Re: [GENERAL] Avoiding deadlocks when performing bulk update and delete operations
From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Sanjaya Vithanagama Sent: Sunday, November 23, 2014 10:52 PM To: pgsql-general@postgresql.org Subject: [GENERAL] Avoiding deadlocks when performing bulk update and delete operations Hi All, We have a single table which does not have any foreign key references. id_A (bigint) id_B (bigint) val_1 (varchar) val_2 (varchar) The primary key of the table is a composite of id_A and id_B. Reads and writes of this table are highly concurrent and the table has millions of rows. We have several stored procedures which do mass updates and deletes. Those stored procedures are being called concurrently mainly by triggers and application code. The operations usually look like the following where it could match thousands of records to update or delete: DELETE FROM table_name t USING ( SELECT id_A, id_B FROM table_name WHERE id_A = ANY(array_of_id_A) ANDid_B = ANY(array_of_id_B) ORDER BY id_A, id_B FORUPDATE ) del WHERE t.id_A = del.id_A ANDt.id_B = del.id_B; UPDATE table_name t SETval_1 = 'some value' , val_2 = 'some value' FROM ( SELECT id_A, id_B FROM table_name WHERE id_A = ANY(array_of_id_A) ANDid_B = ANY(array_of_id_B) ORDER BY id_A, id_B FORUPDATE ) upd WHERE t.id_A = upd.id_A ANDt.id_B = upd.id_B; We are experiencing deadlocks and all our attempts to perform operations with locks (row level using SELECT FOR UPDATE as used in the above queries and table level locks) do not seem to solve these deadlock issues. (Note that we cannot in any way use access exclusive locking on this table because of the performance impact) Is there another way that we could try to solve these deadlock situations? The reference manual says — The best defense against deadlocks is generally to avoid them by being certain that all applications using a database acquire locks on multiple objects in a consistent order. Is there a guaranteed way to do bulk update/delete operations in a particular order so that we can ensure deadlocks won't occur? Or are there any other tricks to avoid deadlocks in this situation? Thank you in advance, Sanjaya May be I’m missing something here, but it seems that you make the problem worse by using : DELETE … USING (SELECT … FOR UPDATE)… Can’t you just do: DELETE FROM table_name WHERE id_A = ANY(array_of_id_A) AND id_B = ANY(array_of_id_B); ? Regards, Igor Neyman
[GENERAL] How to individually list the DDL for all individual data base objects
Is there a good way to individually list the DDL for all individual data base objects? Running a data base dump like: pg_dump mydatabase mydatabase-database.sql produces one big file with all the DDL and DML to restore the data base, which is very convenient for most cases. Using that I have sometimes cut-and-pasted out of it the DDL for individual items that I want to work on, and then fed that back in with psql mydatabase newstuff.sql What I would like, though, is a set of output files, one for each DDL/DML item currently represented in the one big file. I could maybe attempt some convoluted automated parsing of the big file with, for example, Perl, but that seems, well, convoluted, error-prone, and probably fragile. The directory dump output option for pg_dump is similar to, but not really, what I want (as far as I see the individual files that produces are only the ones for reloading data, so correct me if I am wrong ... and please show me how to do it right!) I have played around with the custom format dump followed by pg_restore and various options, but did not get what I wanted, at least not as elegantly as I wanted. What I have come up with is fairly simple in appearance, but the way it works, by reading one line-at-a-time from the list file associated with the dump file, and then running pg_restore with just that one line, rinse and repeat for each piece of DDL/DML, also seems convoluted and potentially fragile. Something along the lines of (... if anyone thinks this a good idea, or good starting point ...): grep -v '^;' listfile | while read a b c n do a=${a/;} echo $a f pg_restore -L f -f outputdir/$a dumpfile done This, as it is, creates a set of files named according to the id number that pg_dump uses to identify each element. Ideally, I would like the files named after the schema+object it represents. Thanks for your help! --- This email is free from viruses and malware because avast! Antivirus protection is active. http://www.avast.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to avoid a GIN recheck condition
Alexis ale...@bernard.io writes: I have a GIN index but the planner rechecks the condition. When many rows are foud in the index that is very slow: You're assuming a fact not in evidence, namely that the recheck condition is being used at runtime. The planner will *always* attach a recheck condition to a bitmap heap scan, but it only gets used if (1) the index opclass reports that its result is lossy, or (2) the bitmap bloats to the point where it has to be stored lossily. I see no weights in your query so (1) isn't happening, and it seems unlikely that (2) is happening if the rowcount estimate is on target. Lacking EXPLAIN ANALYZE results, it's hard to speculate further. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to individually list the DDL for all individual data base objects
Le 2014-11-24 à 10:14, Berend Tober bto...@broadstripe.net a écrit : Is there a good way to individually list the DDL for all individual data base objects? snip grep -v '^;' listfile | while read a b c n do a=${a/;} echo $a f pg_restore -L f -f outputdir/$a dumpfile done This, as it is, creates a set of files named according to the id number that pg_dump uses to identify each element. Ideally, I would like the files named after the schema+object it represents. Were you aware that pg_restore can restore to STDOUT, and output DDL for only a single named object from a custom dump file? $ pg_restore —help … -f, --file=FILENAME output file name … -t, --table=NAME restore named table … Such that you could run: $ pg_restore -f public.mytable.sql -t public.mytable whatever.pgdump Unfortunately, this does not respect dependencies and you may have issues. The --disable-triggers option can alleviate some of these problems. YMMV. Hope that helps! François -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to individually list the DDL for all individual data base objects
Berend Tober wrote: Is there a good way to individually list the DDL for all individual data base objects? These threads might interest you: http://www.postgresql.org/message-id/AANLkTikLHA2x6U=q-t0j0ys78txhfmdtyxjfsrsrc...@mail.gmail.com http://www.postgresql.org/message-id/caaswcxdkroy2uhlof4zvxx00gosccqxkg_g4yf9vouk8axw...@mail.gmail.com -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to individually list the DDL for all individual data base objects
François Beausoleil wrote: Le 2014-11-24 à 10:14, Berend Tober bto...@broadstripe.net a écrit : Is there a good way to individually list the DDL for all individual data base objects? snip Were you aware that pg_restore can restore to STDOUT, and output DDL for only a single named object from a custom dump file? Yes sir, thank you. That is what the script does. But that little script seems fragilly-dependent upon the format of the list file and runs (starts a new process for) pg_restore for every individual line in the list file. ... which seems like poor practise, generally. Additionally, I'd like stability in the output file names, rather than the (likely changing) internal, automatically numerically-named items. And, btw, I want not just a single table. The dependance issue is potentially a problem, but (as I believe) it looks like the output of pg_restore in a list file is in the correct order to process dependencies, so I could walk backwards of forwards through that if changes to one object were dependent on, or caused cascading effects in other objects. $ pg_restore —help … -f, --file=FILENAME output file name … -t, --table=NAME restore named table … Such that you could run: $ pg_restore -f public.mytable.sql -t public.mytable whatever.pgdump Unfortunately, this does not respect dependencies and you may have issues. The --disable-triggers option can alleviate some of these problems. YMMV. Hope that helps! François --- This email is free from viruses and malware because avast! Antivirus protection is active. http://www.avast.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to individually list the DDL for all individual data base objects
You should probably look at the pg_extractor utility. https://github.com/omniti-labs/pg_extractor With it, you can dump individual or selected objects to separate directories. On Mon, Nov 24, 2014 at 11:00 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Berend Tober wrote: Is there a good way to individually list the DDL for all individual data base objects? These threads might interest you: http://www.postgresql.org/message-id/AANLkTikLHA2x6U=q-t0j0ys78txhfmdtyxjfsrsrc...@mail.gmail.com http://www.postgresql.org/message-id/caaswcxdkroy2uhlof4zvxx00gosccqxkg_g4yf9vouk8axw...@mail.gmail.com -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- *Melvin Davidson* I reserve the right to fantasize. Whether or not you wish to share my fantasy is entirely up to you.
Re: [GENERAL] How to individually list the DDL for all individual data base objects
On 11/24/2014 08:12 AM, Berend Tober wrote: François Beausoleil wrote: Le 2014-11-24 à 10:14, Berend Tober bto...@broadstripe.net a écrit : Is there a good way to individually list the DDL for all individual data base objects? snip Were you aware that pg_restore can restore to STDOUT, and output DDL for only a single named object from a custom dump file? Yes sir, thank you. That is what the script does. But that little script seems fragilly-dependent upon the format of the list file and runs (starts a new process for) pg_restore for every individual line in the list file. ... which seems like poor practise, generally. Additionally, I'd like stability in the output file names, rather than the (likely changing) internal, automatically numerically-named items. And, btw, I want not just a single table. The dependance issue is potentially a problem, but (as I believe) it looks like the output of pg_restore in a list file is in the correct order to process dependencies, so I could walk backwards of forwards through that if changes to one object were dependent on, or caused cascading effects in other objects. At this point I have to ask: What is the problem you are trying to solve? Your last paragraph seems to be reinventing the -Fc TOC, so I am not sure where you going with this? $ pg_restore —help … -f, --file=FILENAME output file name … -t, --table=NAME restore named table … Such that you could run: $ pg_restore -f public.mytable.sql -t public.mytable whatever.pgdump Unfortunately, this does not respect dependencies and you may have issues. The --disable-triggers option can alleviate some of these problems. YMMV. Hope that helps! François --- This email is free from viruses and malware because avast! Antivirus protection is active. http://www.avast.com -- Adrian Klaver adrian.kla...@aklaver.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to individually list the DDL for all individual data base objects
Adrian Klaver wrote: On 11/24/2014 08:12 AM, Berend Tober wrote: François Beausoleil wrote: Le 2014-11-24 à 10:14, Berend Tober bto...@broadstripe.net a écrit : Is there a good way to individually list the DDL for all individual data base objects? snip Were you aware that pg_restore can restore to STDOUT, and output DDL for only a single named object from a custom dump file? snip snip The dependance issue is potentially a problem, but (as I believe) it looks like the output of pg_restore in a list file is in the correct order to process dependencies, so I could walk backwards of forwards through that if changes to one object were dependent on, or caused cascading effects in other objects. At this point I have to ask: What is the problem you are trying to solve? Your last paragraph seems to be reinventing the -Fc TOC, so I am not sure where you going with this? On the small scale, I wanted to have an individual file listing the DDL for each data base object in the case that I need to modify the object ... I could start with the script that created it as a basis for modifications, like for views and functions, etc. In the larger scale, I was thinking I would like to check in all of the individual modules to revision control, retaining a fine-grained control, rather than submitting the one big comprehensive file. --- This email is free from viruses and malware because avast! Antivirus protection is active. http://www.avast.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to individually list the DDL for all individual data base objects
Melvin Davidson wrote: You should probably look at the pg_extractor utility. https://github.com/omniti-labs/pg_extractor With it, you can dump individual or selected objects to separate directories. That looks like what I'm looking for. (Note: I did Google searching, but apparently did not use the right set of search terms, because this tool really is described as being what it is that I am looking to do!) Thanks! --- This email is free from viruses and malware because avast! Antivirus protection is active. http://www.avast.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Avoiding deadlocks when performing bulk update and delete operations
On Mon, 24 Nov 2014 14:51:42 +1100 Sanjaya Vithanagama svithanag...@gmail.com wrote: Hi All, We have a single table which does not have any foreign key references. id_A (bigint) id_B (bigint) val_1 (varchar) val_2 (varchar) The primary key of the table is a composite of id_A and id_B. Reads and writes of this table are highly concurrent and the table has millions of rows. We have several stored procedures which do mass updates and deletes. Those stored procedures are being called concurrently mainly by triggers and application code. The operations usually look like the following where it could match thousands of records to update or delete: DELETE FROM table_name t USING ( SELECT id_A, id_B FROM table_name WHERE id_A = ANY(array_of_id_A) ANDid_B = ANY(array_of_id_B) ORDER BY id_A, id_B FORUPDATE ) del WHERE t.id_A = del.id_A ANDt.id_B = del.id_B; UPDATE table_name t SETval_1 = 'some value' , val_2 = 'some value' FROM ( SELECT id_A, id_B FROM table_name WHERE id_A = ANY(array_of_id_A) ANDid_B = ANY(array_of_id_B) ORDER BY id_A, id_B FORUPDATE ) upd WHERE t.id_A = upd.id_A ANDt.id_B = upd.id_B; We are experiencing deadlocks and all our attempts to perform operations with locks (row level using SELECT FOR UPDATE as used in the above queries and table level locks) do not seem to solve these deadlock issues. (Note that we cannot in any way use access exclusive locking on this table because of the performance impact) Is there another way that we could try to solve these deadlock situations? The reference manual says ? The best defense against deadlocks is generally to avoid them by being certain that all applications using a database acquire locks on multiple objects in a consistent order. Is there a guaranteed way to do bulk update/delete operations in a particular order so that we can ensure deadlocks won't occur? Or are there any other tricks to avoid deadlocks in this situation? Lots of stuff to say about this ... First off, Igor has a valid point that the subselects are not helping any and may be making the situation slightly worse. I can't see any reason not to simiplify the queries as he suggested. Secondly, a lot of your comments are too vague for me to understand what's happening, so I'm going to ask a bunch of questions to clarify: * How many UPDATE/INSERT queries are you running per second? * How many in parallel on average? * What's the typical execution time for an UPDATE/INSERT that might cause this problem? * How frequently do deadlocks occur? * Are there other tables involved in the transactions ... i.e., have you confirmed that these are the _only_ tables causing the deadlock? Since you didn't include any log output, I'm fuzzy on some of those things above ... but I'm assuming that you're unable to post specific details of the precise problem. I have a lot of suggestions, but instead of bombing you with all of them, I think it would be better if you answered those questions, which will tell me which suggestions are most likely to help. -- Bill Moran I need your help to succeed: http://gamesbybill.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general