Re: [HACKERS] Early locking option to parallel backup

2017-11-08 Thread Lucas B

On 11/06/2017 12:30 PM, Stephen Frost wrote:

* Lucas (luca...@gmail.com) wrote:

pg_dump was taking more than 24 hours to complete on one of my databases, so I
began to research alternatives. Parallel backup reduced the backup time to a
little less than an hour, but it failed almost every time because of
concurrent queries that take exclusive locks. It is difficult to
guarantee that my applications will not issue statements such as DROP TABLE,
ALTER TABLE, TRUNCATE TABLE, CREATE INDEX or DROP INDEX for an hour, and I
would prefer not to build control mechanisms for that if I can work around
it.
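
For reference, the parallel dump in question is invoked along these lines
(path and job count are illustrative):

    pg_dump --format=directory --jobs=8 --file=/backup/mydb mydb

Each worker opens its own connection and takes its own ACCESS SHARE locks,
which is where the conflicts with concurrent DDL come from.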

I certainly understand the value of pg_dump-based backups, but have you
considered doing file-based backups?  That would avoid the need to do
any in-database locking at all, and would give you the ability to do
PITR too.  Further, you could actually restore that backup to another
system and then do a pg_dump there to get a logical representation (and
this would test your physical database backup/restore process too...).
Yes, point-in-time recovery has the advantage of keeping the backup 
more up-to-date, but it has the disadvantage of being more expensive and 
complex. In my case, point-in-time recovery would require an additional 
10 TB of storage space, and my stakeholders have not approved this 
investment yet.
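
For completeness, the kind of file-level backup being suggested would
typically be taken with something along these lines (the target directory
is illustrative):

    pg_basebackup -D /backup/base -X stream -P

plus continuous WAL archiving if point-in-time recovery is wanted.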


I suspect there are many users who rely on pg_dump as their primary 
backup tool and who would benefit from a more reliable parallel backup.


Re: [HACKERS] Early locking option to parallel backup

2017-11-06 Thread Lucas B

On 05/11/2017 21:09, Andres Freund wrote:

On 2017-11-05 17:38:39 -0500, Robert Haas wrote:

On Sun, Nov 5, 2017 at 5:17 AM, Lucas  wrote:

The patch adds a "--lock-early" option which makes pg_dump issue
shared locks on all tables in the backup TOC as soon as each parallel worker
starts. That way, the backup has only a very small chance of failing, and when
it does fail, it happens in the first few seconds of the backup job. My backup
scripts (not included here) are aware of that and retry the backup in case of
failure.
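
Not my actual scripts, but a minimal sketch of how such a retry wrapper
might look (path, job count and retry limit are purely illustrative):

    #!/bin/sh
    # With --lock-early, a lock conflict surfaces within the first few
    # seconds of the run, so retrying the whole dump is cheap.
    for attempt in 1 2 3; do
        rm -rf /backup/mydb
        pg_dump --format=directory --jobs=8 --lock-early \
                --file=/backup/mydb mydb && exit 0
    done
    exit 1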


I wonder why we don't do this already ... and by default.


Well, the current approach afaics requires #relations * 2 locks, whereas
acquiring them in every worker would scale that with the number of
workers.  


Yes, that is why I proposed it as an option. As an option, it will not 
affect anyone who does not want to use it.
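
(To put illustrative numbers on that: the shared lock table holds about
max_locks_per_transaction * (max_connections + max_prepared_transactions)
entries, so with the default max_locks_per_transaction = 64 and, say,
max_connections = 100, that is roughly 6,400 slots. A database with 50,000
tables dumped by 8 lock-early workers would ask for around 400,000 lock
acquisitions, so anyone using the option on a large schema would likely need
to raise max_locks_per_transaction accordingly.)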



IIUC the problem here is that even though a lock is already
held by the main backend, an independent locker's request will prevent
the on-demand lock by the dump worker from being granted.  It seems to
me the correct fix here would be to somehow avoid the fairness logic in
the parallel dump case - although I don't quite know how best to do so.


It seems natural to think of several connections sharing a synchronized 
snapshot as the same connection. It may then be reasonable to grant a shared 
lock out of turn if any connection of the same shared snapshot already holds 
a granted lock on the same relation. Last year Tom mentioned that there 
is already queue-jumping logic of that sort in the lock manager for 
other purposes. Although it seems conceptually simple, I suspect the 
implementation is not.
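
To make the failure mode we are working around concrete, the sequence as I
understand it is (table and column names are illustrative):

    -- leader connection: already holds the lock from the initial catalog pass
    LOCK TABLE t IN ACCESS SHARE MODE;

    -- application connection: queues behind the leader, waiting for
    -- ACCESS EXCLUSIVE
    ALTER TABLE t ADD COLUMN c integer;

    -- worker connection: ACCESS SHARE would normally be compatible with the
    -- leader's lock, but the queued ACCESS EXCLUSIVE request is ahead of it,
    -- and the NOWAIT makes pg_dump abort rather than wait (and possibly
    -- deadlock against its own leader)
    LOCK TABLE t IN ACCESS SHARE MODE NOWAIT;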


On the other hand, the lock-early option is very simple and has no 
impact on anyone who does not want to use it.


---
Lucas




