Re: [GENERAL] Avoiding deadlocks when performing bulk update and delete operations

2014-11-24 Thread Igor Neyman


From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Sanjaya Vithanagama
Sent: Sunday, November 23, 2014 10:52 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Avoiding deadlocks when performing bulk update and delete 
operations

Hi All,

We have a single table which does not have any foreign key references.

id_A (bigint)
id_B (bigint)
val_1 (varchar)
val_2 (varchar)

The primary key of the table is a composite of id_A and id_B.

Reads and writes of this table are highly concurrent and the table has millions 
of rows. We have several stored procedures which do mass updates and deletes. 
Those stored procedures are being called concurrently mainly by triggers and 
application code.

The operations usually look like the following where it could match thousands 
of records to update or delete:

DELETE FROM table_name t
USING (
   SELECT id_A, id_B
   FROM   table_name
   WHERE  id_A = ANY(array_of_id_A)
   ANDid_B = ANY(array_of_id_B)
   ORDER  BY id_A, id_B
   FORUPDATE
   ) del
WHERE  t.id_A = del.id_A
ANDt.id_B = del.id_B;


UPDATE table_name t
SETval_1 = 'some value'
 , val_2 = 'some value'
FROM (
   SELECT id_A, id_B
   FROM   table_name
   WHERE  id_A = ANY(array_of_id_A)
   ANDid_B = ANY(array_of_id_B)
   ORDER  BY id_A, id_B
   FORUPDATE
   ) upd
WHERE  t.id_A = upd.id_A
ANDt.id_B = upd.id_B;

We are experiencing deadlocks and all our attempts to perform operations with 
locks (row level using SELECT FOR UPDATE as used in the above queries and table 
level locks) do not seem to solve these deadlock issues. (Note that we cannot 
in any way use access exclusive locking on this table because of the 
performance impact)

Is there another way that we could try to solve these deadlock situations? The 
reference manual says — The best defense against deadlocks is generally to 
avoid them by being certain that all applications using a database acquire 
locks on multiple objects in a consistent order.

Is there a guaranteed way to do bulk update/delete operations in a particular 
order so that we can ensure deadlocks won't occur? Or are there any other 
tricks to avoid deadlocks in this situation?

Thank you in advance,
Sanjaya


May be I’m missing something here, but it seems that you make the problem worse 
by using :

DELETE … USING (SELECT … FOR UPDATE)…

Can’t you just do:

DELETE FROM table_name
  WHERE id_A = ANY(array_of_id_A)
   AND id_B = ANY(array_of_id_B);

?

Regards,
Igor Neyman



[GENERAL] How to individually list the DDL for all individual data base objects

2014-11-24 Thread Berend Tober
Is there a good way to individually list the DDL for all individual data 
base objects?


Running a data base dump like:

 pg_dump mydatabase   mydatabase-database.sql

produces one big file with all the DDL and DML to restore the data base, 
which is very convenient for most cases.


Using that I have sometimes cut-and-pasted out of it the DDL for 
individual items that I want to work on, and then fed that back in with


psql mydatabase  newstuff.sql


What I would like, though, is a set of output files, one for each 
DDL/DML item currently represented in the one big file.


I could maybe attempt some convoluted automated parsing of the big file 
with, for example, Perl, but that seems, well, convoluted, error-prone, 
and probably fragile.


The directory dump output option for pg_dump is similar to, but not 
really, what I want (as far as I see the individual files that produces 
are only the ones for reloading data, so correct me if I am wrong ... 
and please show me how to do it right!)


I have played around with the custom format dump followed by pg_restore 
and various options, but did not get what I wanted, at least not as 
elegantly as I wanted.


What I have come up with is fairly simple in appearance, but the way it 
works, by reading one line-at-a-time from the list file associated with 
the dump file, and then running pg_restore with just that one line, 
rinse and repeat for each piece of DDL/DML, also seems convoluted and 
potentially fragile.



Something along the lines of (... if anyone thinks this a good idea, or 
good starting point ...):



grep -v '^;' listfile | while read a b c n
do
  a=${a/;}
  echo $a  f
  pg_restore -L f -f outputdir/$a dumpfile
done

This, as it is, creates a set of files named according to the id number 
that pg_dump uses to identify each element. Ideally, I would like the 
files named after the schema+object it represents.


Thanks for your help!


---
This email is free from viruses and malware because avast! Antivirus protection 
is active.
http://www.avast.com



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to avoid a GIN recheck condition

2014-11-24 Thread Tom Lane
Alexis ale...@bernard.io writes:
 I have a GIN index but the planner rechecks the condition. When many rows
 are foud in the index that is very slow:

You're assuming a fact not in evidence, namely that the recheck condition
is being used at runtime.  The planner will *always* attach a recheck
condition to a bitmap heap scan, but it only gets used if
(1) the index opclass reports that its result is lossy, or
(2) the bitmap bloats to the point where it has to be stored lossily.

I see no weights in your query so (1) isn't happening, and it seems
unlikely that (2) is happening if the rowcount estimate is on target.
Lacking EXPLAIN ANALYZE results, it's hard to speculate further.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to individually list the DDL for all individual data base objects

2014-11-24 Thread François Beausoleil

Le 2014-11-24 à 10:14, Berend Tober bto...@broadstripe.net a écrit :

 Is there a good way to individually list the DDL for all individual data base 
 objects?
 

snip

 grep -v '^;' listfile | while read a b c n
do
  a=${a/;}
  echo $a  f
  pg_restore -L f -f outputdir/$a dumpfile
done
 
 This, as it is, creates a set of files named according to the id number that 
 pg_dump uses to identify each element. Ideally, I would like the files named 
 after the schema+object it represents.

Were you aware that pg_restore can restore to STDOUT, and output DDL for only a 
single named object from a custom dump file?

$ pg_restore —help
…

  -f, --file=FILENAME  output file name
…
  -t, --table=NAME restore named table
…

Such that you could run:

$ pg_restore -f public.mytable.sql -t public.mytable whatever.pgdump

Unfortunately, this does not respect dependencies and you may have issues. The 
--disable-triggers option can alleviate some of these problems. YMMV.

Hope that helps!
François



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to individually list the DDL for all individual data base objects

2014-11-24 Thread Alvaro Herrera
Berend Tober wrote:
 Is there a good way to individually list the DDL for all individual data
 base objects?

These threads might interest you:

http://www.postgresql.org/message-id/AANLkTikLHA2x6U=q-t0j0ys78txhfmdtyxjfsrsrc...@mail.gmail.com
http://www.postgresql.org/message-id/caaswcxdkroy2uhlof4zvxx00gosccqxkg_g4yf9vouk8axw...@mail.gmail.com

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to individually list the DDL for all individual data base objects

2014-11-24 Thread Berend Tober

François Beausoleil wrote:


Le 2014-11-24 à 10:14, Berend Tober bto...@broadstripe.net a écrit :


Is there a good way to individually list the DDL for all individual data base 
objects?



snip


Were you aware that pg_restore can restore to STDOUT, and output DDL for only a 
single named object from a custom dump file?


Yes sir, thank you. That is what the script does.

But that little script seems fragilly-dependent upon the format of the 
list file and runs (starts a new process for) pg_restore for every 
individual line in the list file. ... which seems like poor practise, 
generally. Additionally, I'd like stability in the output file names, 
rather than the (likely changing) internal, automatically 
numerically-named items.


And, btw, I want not just a single table.

The dependance issue is potentially a problem, but (as I believe) it 
looks like the output of pg_restore in a list file is in the correct 
order to process dependencies, so I could walk backwards of forwards 
through that if changes to one object were dependent on, or caused 
cascading effects in other objects.






$ pg_restore —help
…

   -f, --file=FILENAME  output file name
…
   -t, --table=NAME restore named table
…

Such that you could run:

$ pg_restore -f public.mytable.sql -t public.mytable whatever.pgdump

Unfortunately, this does not respect dependencies and you may have issues. The 
--disable-triggers option can alleviate some of these problems. YMMV.

Hope that helps!
François




---
This email is free from viruses and malware because avast! Antivirus protection 
is active.
http://www.avast.com



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to individually list the DDL for all individual data base objects

2014-11-24 Thread Melvin Davidson
You should probably look at the pg_extractor utility.

https://github.com/omniti-labs/pg_extractor

With it, you can dump individual or selected objects to separate
directories.

On Mon, Nov 24, 2014 at 11:00 AM, Alvaro Herrera alvhe...@2ndquadrant.com
wrote:

 Berend Tober wrote:
  Is there a good way to individually list the DDL for all individual data
  base objects?

 These threads might interest you:


 http://www.postgresql.org/message-id/AANLkTikLHA2x6U=q-t0j0ys78txhfmdtyxjfsrsrc...@mail.gmail.com

 http://www.postgresql.org/message-id/caaswcxdkroy2uhlof4zvxx00gosccqxkg_g4yf9vouk8axw...@mail.gmail.com

 --
 Álvaro Herrerahttp://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general




-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: [GENERAL] How to individually list the DDL for all individual data base objects

2014-11-24 Thread Adrian Klaver

On 11/24/2014 08:12 AM, Berend Tober wrote:

François Beausoleil wrote:


Le 2014-11-24 à 10:14, Berend Tober bto...@broadstripe.net a écrit :


Is there a good way to individually list the DDL for all individual
data base objects?



snip


Were you aware that pg_restore can restore to STDOUT, and output DDL
for only a single named object from a custom dump file?


Yes sir, thank you. That is what the script does.

But that little script seems fragilly-dependent upon the format of the
list file and runs (starts a new process for) pg_restore for every
individual line in the list file. ... which seems like poor practise,
generally. Additionally, I'd like stability in the output file names,
rather than the (likely changing) internal, automatically
numerically-named items.

And, btw, I want not just a single table.

The dependance issue is potentially a problem, but (as I believe) it
looks like the output of pg_restore in a list file is in the correct
order to process dependencies, so I could walk backwards of forwards
through that if changes to one object were dependent on, or caused
cascading effects in other objects.


At this point I have to ask:

What is the problem you are trying to solve?

Your last paragraph seems to be reinventing the -Fc TOC, so I am not 
sure where you going with this?








$ pg_restore —help
…

   -f, --file=FILENAME  output file name
…
   -t, --table=NAME restore named table
…

Such that you could run:

$ pg_restore -f public.mytable.sql -t public.mytable whatever.pgdump

Unfortunately, this does not respect dependencies and you may have
issues. The --disable-triggers option can alleviate some of these
problems. YMMV.

Hope that helps!
François




---
This email is free from viruses and malware because avast! Antivirus
protection is active.
http://www.avast.com






--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to individually list the DDL for all individual data base objects

2014-11-24 Thread Berend Tober

Adrian Klaver wrote:

On 11/24/2014 08:12 AM, Berend Tober wrote:

François Beausoleil wrote:


Le 2014-11-24 à 10:14, Berend Tober bto...@broadstripe.net a écrit :


Is there a good way to individually list the DDL for all individual
data base objects?



snip


Were you aware that pg_restore can restore to STDOUT, and output DDL
for only a single named object from a custom dump file?


snip snip

The dependance issue is potentially a problem, but (as I believe) it
looks like the output of pg_restore in a list file is in the correct
order to process dependencies, so I could walk backwards of forwards
through that if changes to one object were dependent on, or caused
cascading effects in other objects.


At this point I have to ask:

What is the problem you are trying to solve?

Your last paragraph seems to be reinventing the -Fc TOC, so I am not
sure where you going with this?



On the small scale, I wanted to have an individual file listing the DDL 
for each data base object in the case that I need to modify the object 
... I could start with the script that created it as a basis for 
modifications, like for views and functions, etc. In the larger scale, I 
was thinking I would like to check in all of the individual modules to 
revision control, retaining a fine-grained control, rather than 
submitting the one big comprehensive file.





---
This email is free from viruses and malware because avast! Antivirus protection 
is active.
http://www.avast.com



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to individually list the DDL for all individual data base objects

2014-11-24 Thread Berend Tober

Melvin Davidson wrote:

You should probably look at the pg_extractor utility.

https://github.com/omniti-labs/pg_extractor

With it, you can dump individual or selected objects to separate
directories.



That looks like what I'm looking for. (Note: I did Google searching, but 
apparently did not use the right set of search terms, because this tool 
really is described as being what it is that I am looking to do!)


Thanks!



---
This email is free from viruses and malware because avast! Antivirus protection 
is active.
http://www.avast.com



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Avoiding deadlocks when performing bulk update and delete operations

2014-11-24 Thread Bill Moran
On Mon, 24 Nov 2014 14:51:42 +1100
Sanjaya Vithanagama svithanag...@gmail.com wrote:

 Hi All,
 
 We have a single table which does not have any foreign key references.
 
 id_A (bigint)
 id_B (bigint)
 val_1 (varchar)
 val_2 (varchar)
 
 The primary key of the table is a composite of id_A and id_B.
 
 Reads and writes of this table are highly concurrent and the table has
 millions of rows. We have several stored procedures which do mass updates
 and deletes. Those stored procedures are being called concurrently mainly
 by triggers and application code.
 
 The operations usually look like the following where it could match
 thousands of records to update or delete:
 
 DELETE FROM table_name t
 USING (
SELECT id_A, id_B
FROM   table_name
WHERE  id_A = ANY(array_of_id_A)
ANDid_B = ANY(array_of_id_B)
ORDER  BY id_A, id_B
FORUPDATE
) del
 WHERE  t.id_A = del.id_A
 ANDt.id_B = del.id_B;
 
 
 UPDATE table_name t
 SETval_1 = 'some value'
  , val_2 = 'some value'
 FROM (
SELECT id_A, id_B
FROM   table_name
WHERE  id_A = ANY(array_of_id_A)
ANDid_B = ANY(array_of_id_B)
ORDER  BY id_A, id_B
FORUPDATE
) upd
 WHERE  t.id_A = upd.id_A
 ANDt.id_B = upd.id_B;
 
 We are experiencing deadlocks and all our attempts to perform operations
 with locks (row level using SELECT FOR UPDATE as used in the above queries
 and table level locks) do not seem to solve these deadlock issues. (Note
 that we cannot in any way use access exclusive locking on this table
 because of the performance impact)
 
 Is there another way that we could try to solve these deadlock situations?
 The reference manual says ? The best defense against deadlocks is
 generally to avoid them by being certain that all applications using a
 database acquire locks on multiple objects in a consistent order.
 
 Is there a guaranteed way to do bulk update/delete operations in a
 particular order so that we can ensure deadlocks won't occur? Or are there
 any other tricks to avoid deadlocks in this situation?

Lots of stuff to say about this ...

First off, Igor has a valid point that the subselects are not helping any
and may be making the situation slightly worse. I can't see any reason not
to simiplify the queries as he suggested.

Secondly, a lot of your comments are too vague for me to understand what's
happening, so I'm going to ask a bunch of questions to clarify:

* How many UPDATE/INSERT queries are you running per second?
* How many in parallel on average?
* What's the typical execution time for an UPDATE/INSERT that might cause
  this problem?
* How frequently do deadlocks occur?
* Are there other tables involved in the transactions ... i.e., have you
  confirmed that these are the _only_ tables causing the deadlock?

Since you didn't include any log output, I'm fuzzy on some of those things
above ... but I'm assuming that you're unable to post specific details of
the precise problem.

I have a lot of suggestions, but instead of bombing you with all of them, I
think it would be better if you answered those questions, which will tell
me which suggestions are most likely to help.

-- 
Bill Moran
I need your help to succeed:
http://gamesbybill.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general