Re: Deadlock with 2 processes waiting on transactionids and locking unrelated tables
Hi. Do either of the two tables have triggers? What's the database / transaction isolation level? Do the updates run in a transaction among other read / write operations within the same transaction? Regards.

2018-02-18 23:28 GMT-06:00 David Wheeler:
> Hi,
>
> We’re seeing deadlocks semi-regularly (0-2 per day) that I’m really having
> trouble getting to the bottom of.
>
>   Process 7172 waits for ShareLock on transaction 4078724272; blocked by process 7186.
>   Process 7186 waits for ShareLock on transaction 4078724210; blocked by process 7172.
>
> The two queries in question are updates on unrelated tables. Running the
> queries on their own shows no overlapping entries in pg_locks.
>
>   Process 7172: update ticket set unread = true where ticketid = $1
>   Process 7186: update "planscheduleitem" set "planschedule"=$1 where "psi"=$2
>
> How can I work out why Postgres has decided that the two processes are in
> deadlock? Is there an explainer somewhere on transaction-level locks? I
> can’t see anything in the docs besides that they exist.
>
> [quoted version() and pg_locks output trimmed; see the original message below]
Migrate2Postgres - A new tool for migration from other DBMSs
Hi everybody,

I published a tool that makes it easy to migrate a database from other DBMSs to Postgres: https://github.com/isapir/Migrate2Postgres

Currently it supports migrations from MS SQL Server, but it is written in a way that will make it easy to migrate from other DBMSs as well.

I also published a video (didn't spend too much time editing it, sorry) with an overview and a demo of migrating the AdventureWorks database, which is one of the MS SQL Server sample databases. The migration itself takes about a minute; the overview of the tool and the explanations take longer: https://youtu.be/5eF9_UB73TI

For the impatient, this is the point in the video where I start using the tool (so you can watch that first and then later watch the overview and the configuration options from the beginning): https://youtu.be/5eF9_UB73TI?t=9m51s

Feedback welcome!

Thank you,

Igal
Deadlock with 2 processes waiting on transactionids and locking unrelated tables
Hi,

We’re seeing deadlocks semi-regularly (0-2 per day) that I’m really having trouble getting to the bottom of.

  Process 7172 waits for ShareLock on transaction 4078724272; blocked by process 7186.
  Process 7186 waits for ShareLock on transaction 4078724210; blocked by process 7172.

The two queries in question are updates on unrelated tables. Running the queries on their own shows no overlapping entries in pg_locks.

  Process 7172: update ticket set unread = true where ticketid = $1
  Process 7186: update "planscheduleitem" set "planschedule"=$1 where "psi"=$2

How can I work out why Postgres has decided that the two processes are in deadlock? Is there an explainer somewhere on transaction-level locks? I can’t see anything in the docs besides that they exist.

Details below.

select version();
                                        version
----------------------------------------------------------------------------------------
 PostgreSQL 10.2 (Ubuntu 10.2-1.pgdg14.04+1) on x86_64-pc-linux-gnu, compiled by
 gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4, 64-bit
(1 row)

after running update "planscheduleitem" set "planschedule"=$1 where "psi"=$2:

SELECT locktype, virtualxid, transactionid, virtualtransaction, pid, mode, relname, page, tuple
FROM pg_locks LEFT JOIN pg_class ON (relation=oid) WHERE pid = 41715;

   locktype    | virtualxid | transactionid | virtualtransaction |  pid  |       mode       |         relname         | page | tuple
---------------+------------+---------------+--------------------+-------+------------------+-------------------------+------+-------
 relation      |            |               | 56/2306863         | 41715 | RowExclusiveLock | ticket_parentticketid   |      |
 relation      |            |               | 56/2306863         | 41715 | RowExclusiveLock | ticket_originalticketid |      |
 relation      |            |               | 56/2306863         | 41715 | RowExclusiveLock | ticket_tickettypeid_idx |      |
 relation      |            |               | 56/2306863         | 41715 | RowExclusiveLock | ticket_subject_idx      |      |
 relation      |            |               | 56/2306863         | 41715 | RowExclusiveLock | ticket_closedtime_idx   |      |
 relation      |            |               | 56/2306863         | 41715 | RowExclusiveLock | ticket_assignedto_idx   |      |
 relation      |            |               | 56/2306863         | 41715 | RowExclusiveLock | ticket_serviceuid_idx   |      |
 relation      |            |               | 56/2306863         | 41715 | RowExclusiveLock | ticket_parentuid_idx    |      |
 relation      |            |               | 56/2306863         | 41715 | RowExclusiveLock | ticket_createdtime_idx  |      |
 relation      |            |               | 56/2306863         | 41715 | RowExclusiveLock | ticket_txid             |      |
 relation      |            |               | 56/2306863         | 41715 | RowExclusiveLock | ticket_tickettype       |      |
 relation      |            |               | 56/2306863         | 41715 | RowExclusiveLock | ticket_ticketpriority   |      |
 relation      |            |               | 56/2306863         | 41715 | RowExclusiveLock | ticket_idx_0            |      |
 relation      |            |               | 56/2306863         | 41715 | RowExclusiveLock | ticket_pkey             |      |
 relation      |            |               | 56/2306863         | 41715 | RowExclusiveLock | number_constraint       |      |
 relation      |            |               | 56/2306863         | 41715 | RowExclusiveLock | ticket                  |      |
 virtualxid    | 56/2306863 |               | 56/2306863         | 41715 | ExclusiveLock    |                         |      |
 transactionid |            | 4089785154    | 56/2306863         | 41715 | ExclusiveLock    |                         |      |
 relation      |            |               | 56/2306863         | 41715 | RowExclusiveLock | ticket_fromuid_idx      |      |
(19 rows)

after running update ticket set unread = true where ticketid = $1:

SELECT locktype, virtualxid, transactionid, virtualtransaction, pid, mode, relname, page, tuple
FROM pg_locks LEFT JOIN pg_class ON (relation=oid) WHERE pid = 41715;

 relation | ... [output truncated in the archive]
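Not from the original thread, but as a general starting point for questions like this: on 9.6 and later the server can be asked directly which backends block which, and the transaction-level locks can be listed on their own. A sketch (the pids and exact column list are illustrative):

```sql
-- Which backends are blocked, and by whom (PostgreSQL 9.6+):
SELECT pid, pg_blocking_pids(pid) AS blocked_by, wait_event_type, state, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;

-- Transaction-level (transactionid) locks, granted and waiting:
SELECT locktype, transactionid, virtualtransaction, pid, mode, granted
FROM pg_locks
WHERE locktype = 'transactionid'
ORDER BY transactionid, granted DESC;
```

Setting log_lock_waits = on (together with a small deadlock_timeout) also makes the server log who waits on whom, which helps when deadlocks are only semi-regular. One common cause of cross-table transactionid waits is row locks taken by foreign-key checks or triggers rather than by the visible UPDATE itself.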
Re: pgBackRest backup from standby
On Sun, Feb 18, 2018 at 06:34:46PM -0600, Don Seiler wrote:
> Looking to use pgBackRest to take a backup from a hot standby. I'm reading
> that pgBackRest still needs to connect to the primary and copy some files.
> My questions are:
>
> 1. What files does it need to copy? Config files? WAL files?
> 2. How does it connect? SSH?
> 3. Does pgBackRest need to be installed and configured on the primary as well?

I am adding in CC: Stephen Frost and David Steele, who work on the tool. You may want to contact the maintainers directly through GitHub, where the project is maintained: https://github.com/pgbackrest/pgbackrest
--
Michael
pgBackRest backup from standby
Evening all,

Looking to use pgBackRest to take a backup from a hot standby. I'm reading that pgBackRest still needs to connect to the primary and copy some files. My questions are:

1. What files does it need to copy? Config files? WAL files?
2. How does it connect? SSH?
3. Does pgBackRest need to be installed and configured on the primary as well?

Thanks,
Don.
--
Don Seiler
www.seiler.us
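For reference, backup-from-standby is driven from the pgbackrest.conf on the host taking the backup. A minimal sketch, with placeholder host names, stanza name, and paths (check the pgBackRest user guide for the authoritative option list):

```ini
[global]
repo1-path=/var/lib/pgbackrest
backup-standby=y            ; copy bulk data files from the standby

[demo]
pg1-host=primary.example.com
pg1-path=/var/lib/postgresql/10/main
pg2-host=standby.example.com
pg2-path=/var/lib/postgresql/10/main
```

As I understand it, with backup-standby=y the bulk of the data files are copied from the standby, but pgBackRest still connects to the primary (over SSH in this era's releases) to start and stop the backup and to copy the small set of files that are only consistent there, so it does need to be installed on both hosts.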
Re: Need to fix one more glitch in upgrade to -10.2
Rich Shepard writes:
> On Mon, 19 Feb 2018, Tim Cross wrote:
>
>> It is possible for the target of a symbolic link to be changed, deleted
>> etc (a dangling sym link).
>
> Tim,
>
> Broken symlinks display in a different color, black on a red background if
> I remember correctly, rather than the light cyan of a working symlink. I've
> seen enough of the former to recognize the difference. :-)
>
> Thanks,
>
> Rich

Hi Rich,

Yes, if your ls is configured to show colours and your shell honours them, dangling links will show up with a different colour. However, this will not tell you if, for example, the mode of the file being pointed to does not have the executable bit set for the user/group running/calling the sym link. If the file does not have the right executable bits set, you would get the error message about not being able to find the executable file.

From memory, your cut and paste looked like it had a '*' at the end, which is what ls will show (assuming the alias you have adds that option), so the target probably has at least the owner exec bit set. But when tracking down problems like this, I have learnt it is often best to just verify that the modes and owner/group memberships of the target are correct.

regards,
Tim
--
Tim Cross
Re: Need to fix one more glitch in upgrade to -10.2
On Mon, 19 Feb 2018, Tim Cross wrote:

> It is possible for the target of a symbolic link to be changed, deleted
> etc (a dangling sym link).

Tim,

Broken symlinks display in a different color, black on a red background if I remember correctly, rather than the light cyan of a working symlink. I've seen enough of the former to recognize the difference. :-)

Thanks,

Rich
Re: Need to fix one more glitch in upgrade to -10.2
Rich Shepard writes:
> On Sun, 18 Feb 2018, Tim Cross wrote:
>
>>> # ll /usr/bin/postgres
>>> lrwxrwxrwx 1 root root 35 Feb 17 09:30 /usr/bin/postgres ->
>>> ../lib/postgresql/10.2/bin/postgres*
>
>> Try doing an 'll' on the second part of that output i.e.
>> ll /usr//lib/postgresql/10.2/bin/postgres*
>
> See my message, repeated above.
>
> Regards,
>
> Rich

Your ll command is only showing what the symbolic link is pointing to, not the status of the thing it points to. It is possible for the target of a symbolic link to be changed, deleted etc (a dangling sym link). My point was to verify the target and its permissions, not just the sym link itself.

Tim
--
Tim Cross
Re: Need to fix one more glitch in upgrade to -10.2
Rich Shepard writes:
> On Sun, 18 Feb 2018, Tim Cross wrote:
>
>> This may not be relevant,
>
> Tim,
>
> Nope. Pat goes for stability, not cutting edge. No systemd in the
> forthcoming 15.0, either.
>
> Thanks,
>
> Rich

No worries, though I'm not sure you can call systemd 'cutting edge' anymore. Even many of the distros which argued against it have switched (e.g. Debian, Ubuntu).

Tim
--
Tim Cross
Re: READ COMMITTED vs. index-only scans
On Wed, Jan 17, 2018 at 9:34 PM, Jacek Kołodziej wrote:

> Hi Tom,
>
> On Wed, Jan 17, 2018 at 7:56 PM, Tom Lane wrote:
>
>> Jacek Kołodziej writes:
>>> Here's what's happening to me: the "A" query occasionally (in my case: on the
>>> order of tenths per day) returns an ID _higher_ than any ID present in the
>>> second query's result (other conditions I haven't specified do _not_ filter
>>> any more rows than "id <= max ID") - as if some entries were visible for
>>> the first query, but not for the second one. This is an inconsistency that
>>> is very problematic for me.
>>
>> That sounds problematic to me too, but how certain are you that the "other
>> conditions you haven't specified" aren't suppressing the last row? That'd
>> certainly be the least surprising explanation. If it isn't that, though,
>> this surely seems like a bug.
>
> Yes, I'm fairly sure of that. When I execute that same "B" query again
> some time afterwards, it returns all expected rows - I mean, also those
> that were "included" in the original "A" query and that were "missing" in
> the "B" one the first time around.
>
>> Can you determine whether the row(s) missing in the second query are
>> freshly committed? Or have they been there awhile?
>
> Depends on what would be considered "fresh"; usually it's on the order of
> milliseconds or seconds.
>
>>> Where am I wrong? What am I missing? What information may I provide to
>>> help with investigating this?
>>
>> Probably the best thing to spend time on would be to try to extract a
>> publishable test case. It would be really hard to get to the bottom
>> of an issue like this without having a reproducer. It's okay if it
>> takes awhile to reproduce the fault ...
>
> I'd certainly love to have a working repro. I won't be able to do it for
> the next few days but I'll work on this right after the weekend.
>
>> Also, before spending a whole lot of time on this: are you on 9.6.6?
>> If not, update, just in case this is an already-fixed issue. The
>> symptoms don't sound familiar, but I don't want to waste a lot of
>> time only to find out it's some manifestation of a known bug.
>>
>> regards, tom lane
>
> I'm using 9.6.5; I'm not administrating it so it might take some time
> before updating, but once it's done, I'll get back with whether that fixed
> the situation. In the meantime, when trying to reproduce it locally, I'll
> use both 9.6.5 and 9.6.6 to see whether it makes any difference.
>
> Thank you very much for the suggestions.
>
> --
> Kind regards,
> Jacek Kołodziej
> http://kolodziejj.info

Hello again,

unsurprisingly, the fault was on my side - it was discovered by my colleague. The source of the problem was a misunderstanding of how sequences work - i.e., they produce monotonic numbers at query time (in this case: IDs for the primary key), but the transactions using those numbers may commit in a different order. (T1, T2, T3 are transactions; T1 and T2 append to the events table, T3 reads from it with the "A" and "B" queries.)

- T1 was inserting an event (it got ID 6) - it did not commit yet!
- T2 was inserting an event (it got ID 7) and it commits
- T3 made query "A" (which gets "max event's ID" equal to 7 at that time)
- and selects events with query "B" (which gets the event with ID 7, but not the one with ID 6 - because it has not been committed yet)
- T1 commits, but that event (ID equal to 6) has been omitted already and won't be picked up ever again

REPEATABLE READ could not help us in this situation. We ended up - at least for now - forcing serialization of inserting the events (so that their IDs always match the order in which they are inserted _and committed_ into the database); while it may be suboptimal, it seems like a good-enough solution for our use case for the time being, and we'll be working on a more scalable solution in the future.
Thank you for looking into this and for the inspiration for further investigation.

--
Kind regards,
Jacek Kołodziej
http://kolodziejj.info
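The race Jacek describes can be reproduced without a database. A self-contained sketch (the timestamps and IDs are hypothetical, not his schema):

```python
# Sketch of the race: IDs come from a sequence in insert order, but the
# inserting transactions may commit in a different order.

def visible_rows(events, read_time):
    """IDs whose transaction had committed before the reader's snapshot."""
    return sorted(id_ for id_, commit_time in events if commit_time < read_time)

# T1 gets ID 6 but commits late (t=3); T2 gets ID 7 and commits early (t=1).
events = [(6, 3), (7, 1)]

# T3 runs query "A" at t=2: it sees only ID 7, so "max event ID" is 7 ...
snapshot = visible_rows(events, read_time=2)
max_seen = max(snapshot)

# ... and any later "give me rows with id > max_seen" query skips ID 6
# forever, even though ID 6 is committed by then.
later = visible_rows(events, read_time=10)
missed = [i for i in later if i <= max_seen and i not in snapshot]
# missed == [6]
```

This is why serializing the inserts (making ID order equal commit order) closes the gap: no committed row can ever appear "behind" an already-observed maximum.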
Re: Need to fix one more glitch in upgrade to -10.2 [FIXED]
On Sun, 18 Feb 2018, Rich Shepard wrote:

> Thanks for confirming

Removed all files in the data/ directory, re-initialized the cluster, and restored the dumped .sql file (minus three databases and their roles, manually deleted). All works well now.

Thanks, Adrian!

Best regards,
Rich
Re: Need to fix one more glitch in upgrade to -10.2
On Sun, 18 Feb 2018, Adrian Klaver wrote:

>> Is this appropriate?
>
> Yes.

Adrian,

Thanks for confirming.

> They could not have been removed as they are in the file. I am guessing you
> are saying they are not in use as far as you know. Just a warning (from
> experience): memory is a tricky thing, and removing what is thought to be
> inactive roles is a quick way to find they are not.

Well, one set of roles is related to my former bookkeeping system, and that database had not been removed. Another set of roles was related to a replacement bookkeeping system I didn't use, and I don't recall seeing that database when I last ran 'psql -l'. And anyone who knows Postgres knows that the postgres role is generally always there.

> If you want to do this, at least restrict the user field.

As I'm the only one here, unless someone is sitting here and logging in under my username they won't see a thing. And no one's going to sit here and log in as me other than me. :-) One advantage of working from home.

Regards,
Rich
Re: query performance
On 02/18/2018 06:37 AM, David Rowley wrote:
> On 18 February 2018 at 12:35, hmidi slim wrote:
>> Is there an other optimized solution to make a query such this:
>> select * from (
>> select e.name, e1.name, e.id
>> from establishment as e, establishment as e1
>> where e.id <> e1.id
>> and e1.id = 1
>> and ST_DWithin(geom, ST_MakePoint(e1.longitude, e1.latitude)::geography,
>> 1000)) as tmp inner join product as p on p.establishment_id = tmp.id
>
> You really should state what is wrong with the performance of the
> above version and also include the EXPLAIN (ANALYZE, BUFFERS) of that
> query.
>
> The version of PostgreSQL that you're running it on is also a good
> thing to share.
>
> Details of the indexes which you've defined on the tables are also
> useful information. It would especially be good to know if you've
> added an index on product (establishment_id), for example.

There's actually a wiki page with an overview of what should be included in "slow query" questions: https://wiki.postgresql.org/wiki/Slow_Query_Questions

regards
--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: shared_buffers 8GB maximum
On 02/18/2018 02:41 PM, Vitaliy Garnashevich wrote:
>
>> I certainly wouldn't recommend using 1/2 of RAM right away. There's
>> a good chance it would be a waste of memory - for example due to
>> double buffering, which effectively reduces "total" cache hit
>> ratio.
>
> Double buffering is often mentioned in the context of tuning shared
> buffers. Is there a tool to actually measure the amount of double
> buffering happening in the system?

I'm not aware of such a tool. But I suppose it could be done by integrating information from pg_buffercache and pgfincore [1].

[1] https://github.com/klando/pgfincore

>> Those evictions are performed either by backends or bgwriter, both
>> of which are less efficient than checkpointer. Not only can
>> checkpointer perform various optimizations (e.g. sorting buffers to
>> make the writes more sequential), but it also writes each dirty
>> buffer just once. With smaller shared_buffers the page may have been
>> written multiple times.
>
> In the case when shared_buffers cover most of RAM, most writes
> should happen via checkpointer, and the cache hit ratio should be high.
> So a hypothetical question: could shared_buffers=200GB on a 250 GB RAM
> server ever be a reasonable setting? (assuming there are no other
> applications running except postgres, and 50GB is enough for
> allocating work_mem/maintenance_work_mem and for serving queries)

It depends on how large the active part of the data set is. If it fits into 200GB but not into smaller shared buffers (say, 100GB), then using 200GB may be a win. If the active set is much larger than RAM, smaller shared_buffers values work better in my experience.

regards
--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
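The shared-buffers half of that measurement is straightforward with the pg_buffercache extension alone; comparing its output with OS-cache residency from pgfincore is the manual version of the double-buffering check. A sketch (assumes the extension is installable in the database):

```sql
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- Top relations by share of shared_buffers. The shared_buffers setting is
-- reported in 8kB pages, so setting::int is the total number of buffers.
SELECT c.relname,
       count(*) AS buffers,
       round(100.0 * count(*) / (SELECT setting::int FROM pg_settings
                                 WHERE name = 'shared_buffers'), 1) AS pct
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE b.reldatabase IN (0, (SELECT oid FROM pg_database
                            WHERE datname = current_database()))
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;
```

Any relation that shows up both here and as largely resident in the OS page cache (per pgfincore) is being cached twice.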
Re: shared_buffers 8GB maximum
2018-02-18 14:41 GMT+01:00 Vitaliy Garnashevich:

>> I certainly wouldn't recommend using 1/2 of RAM right away. There's a
>> good chance it would be a waste of memory - for example due to double
>> buffering, which effectively reduces "total" cache hit ratio.
>
> Double buffering is often mentioned in the context of tuning shared buffers.
> Is there a tool to actually measure the amount of double buffering
> happening in the system?
>
>> Those evictions are performed either by backends or bgwriter, both of
>> which are less efficient than checkpointer. Not only can checkpointer
>> perform various optimizations (e.g. sorting buffers to make the writes
>> more sequential), but it also writes each dirty buffer just once. With
>> smaller shared_buffers the page may have been written multiple times.
>
> In the case when shared_buffers cover most of RAM, most writes should
> happen via checkpointer, and the cache hit ratio should be high. So a
> hypothetical question: could shared_buffers=200GB on a 250 GB RAM server
> ever be a reasonable setting? (assuming there are no other applications
> running except postgres, and 50GB is enough for allocating
> work_mem/maintenance_work_mem and for serving queries)
>
>> The best thing you can do is set shared buffers to some conservative
>> value (say, 4-8GB), let the system run for a day or two, compute the
>> cache hit ratio using metrics in pg_stat_database, and then decide if
>> you need to resize shared buffers.
>>
>> Gradual increases are a good approach in general. And yes, having
>>
>>   buffers_checkpoint > buffers_clean > buffers_backend
>>
>> is a good idea too. Together with the cache hit ratio it's probably a
>> more sensible metric than looking at usagecount directly.
>
> Thanks! While increasing shared_buffers we'll be looking at changes in
> cache hit ratio too.

When we ran some analytic tasks, increasing shared_buffers had a negative impact on speed.
Probably the hit ratio was still too low after the change, while the maintenance of shared buffers (searching for free blocks) became slower. So the optimal size of shared_buffers depends very much on the use case. Note: too big a shared_buffers means less memory left for work_mem, which can be worse - work_mem must be multiplied by max_connections and by some constant, 2 or 3.

Regards

Pavel

> Regards,
> Vitaliy
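Pavel's rule of thumb can be written down as arithmetic. A sketch; the multiplier and the example numbers are illustrative, not a recommendation:

```python
# Back-of-the-envelope memory budget per Pavel's rule of thumb.

def memory_budget_gb(shared_buffers_gb, max_connections, work_mem_mb, multiplier=3):
    """Worst-case footprint: shared_buffers plus work_mem per connection,
    times a safety multiplier (one query can use several work_mem-sized
    sort/hash areas at once)."""
    work_gb = max_connections * work_mem_mb * multiplier / 1024.0
    return shared_buffers_gb + work_gb

# The hypothetical 200 GB shared_buffers on a 250 GB box from this thread,
# with 200 connections and work_mem = 64MB:
total = memory_budget_gb(200, 200, 64)  # 237.5 GB -- barely fits in 250 GB
```

The point of the exercise: the bigger shared_buffers is, the less headroom remains for the per-connection term, and the worst case grows linearly with max_connections.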
Re: Need to fix one more glitch in upgrade to -10.2
On Sun, 18 Feb 2018, Tim Cross wrote:

>>> # ll /usr/bin/postgres
>>> lrwxrwxrwx 1 root root 35 Feb 17 09:30 /usr/bin/postgres ->
>>> ../lib/postgresql/10.2/bin/postgres*
>
> Try doing an 'll' on the second part of that output i.e.
> ll /usr//lib/postgresql/10.2/bin/postgres*

See my message, repeated above.

Regards,
Rich
Re: shared_buffers 8GB maximum
> I certainly wouldn't recommend using 1/2 of RAM right away. There's a
> good chance it would be a waste of memory - for example due to double
> buffering, which effectively reduces "total" cache hit ratio.

Double buffering is often mentioned in the context of tuning shared buffers. Is there a tool to actually measure the amount of double buffering happening in the system?

> Those evictions are performed either by backends or bgwriter, both of
> which are less efficient than checkpointer. Not only can checkpointer
> perform various optimizations (e.g. sorting buffers to make the writes
> more sequential), but it also writes each dirty buffer just once. With
> smaller shared_buffers the page may have been written multiple times.

In the case when shared_buffers cover most of RAM, most writes should happen via checkpointer, and the cache hit ratio should be high. So a hypothetical question: could shared_buffers=200GB on a 250 GB RAM server ever be a reasonable setting? (assuming there are no other applications running except postgres, and 50GB is enough for allocating work_mem/maintenance_work_mem and for serving queries)

> The best thing you can do is set shared buffers to some conservative
> value (say, 4-8GB), let the system run for a day or two, compute the
> cache hit ratio using metrics in pg_stat_database, and then decide if
> you need to resize shared buffers.
>
> Gradual increases are a good approach in general. And yes, having
>
>   buffers_checkpoint > buffers_clean > buffers_backend
>
> is a good idea too. Together with the cache hit ratio it's probably a
> more sensible metric than looking at usagecount directly.

Thanks! While increasing shared_buffers we'll be looking at changes in cache hit ratio too.

Regards,
Vitaliy
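The cache-hit-ratio check suggested above boils down to one formula over two counters from pg_stat_database. A sketch; since the counters are cumulative from the last stats reset, pass the delta between two samples rather than lifetime totals:

```python
# Cache hit ratio from pg_stat_database counters (blks_hit, blks_read).

def cache_hit_ratio(blks_hit, blks_read):
    """Share of block requests served from shared buffers."""
    total = blks_hit + blks_read
    return blks_hit / total if total else 0.0

# e.g. deltas over one day: 9,900,000 hits vs 100,000 reads
ratio = cache_hit_ratio(9_900_000, 100_000)  # 0.99
```

The raw counters can be read with `SELECT blks_hit, blks_read FROM pg_stat_database WHERE datname = current_database();`.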
Re: Need to fix one more glitch in upgrade to -10.2
On Sun, 18 Feb 2018, Tim Cross wrote:

> This may not be relevant,

Tim,

Nope. Pat goes for stability, not cutting edge. No systemd in the forthcoming 15.0, either.

Thanks,

Rich