Re: [GENERAL] [BDR] Best practice to automatically abort a DDL operation when one node is down

Craig Ringer Sun, 17 Jan 2016 16:53:36 -0800

On 13 January 2016 at 21:45, Sylvain MARECHAL <[email protected]>
wrote:



> The problem is that the (1) DDL request will wait indefinitely, meaning
> all transactions will continue to fail until the DDL operation is manually
> aborted (for example, doing CTRL C in psql to abort the "CREATE TABLE").
>

Correct, and by design.

I'd like to do a pre-check where we sync up with the peer nodes and see if
they're all alive before we take the DDL lock. This would reduce the impact
a bit and allow an early ERROR like "ERROR: cannot perform DDL when one or
more nodes are unreachable".

However... we have something pretty close already. You can just set a
statement_timeout in the session doing the DDL. It'll cancel the operation
if it takes too long.
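
As a minimal sketch of that approach: the 30-second limit and the table
name below are illustrative assumptions, not values from the original
report.

```sql
-- Limit every statement in this session to 30 seconds (illustrative value).
SET statement_timeout = '30s';

-- Hypothetical DDL. If it cannot complete in time, for example because it
-- is still waiting for the BDR global DDL lock while a peer node is down,
-- PostgreSQL cancels it with
--   ERROR:  canceling statement due to statement timeout
-- instead of letting it wait indefinitely.
CREATE TABLE example_table (id integer PRIMARY KEY);

-- Restore the session default so later statements are not affected.
RESET statement_timeout;
```

The timeout applies to the whole statement, so it should be set generously
enough for the DDL itself to finish when all nodes are healthy.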

Note that a lock_timeout will NOT work because the BDR global DDL lock is
not recognised as a true lock by PostgreSQL.



> What is the best practice to make sure the DDL operation will fail,
> possibly after a timeout, if one of the nodes is down?


Use statement_timeout, as described above.


> I could check the state of the node before issuing the DDL operation, but
> this solution is far from being perfect as the node may fail right after
> this.
>

Correct, but it's still useful to do.

I'd check in pg_stat_replication that all nodes are connected, then issue
the DDL with a statement_timeout set.
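
A rough sketch of that pre-check, using only standard pg_stat_replication
columns. How to decide that "all nodes" are present is an assumption left
to the operator here: compare the rows in the 'streaming' state against
the peers you expect. The view is cluster-wide, so unrelated replication
connections will show up in it too.

```sql
-- Run on the node that will issue the DDL, just before doing so.
-- Each connected peer should appear as one row with state = 'streaming'.
SELECT application_name, client_addr, state
FROM pg_stat_replication
ORDER BY application_name;
```

A node can still fail right after this check passes, which is why the DDL
itself should still run under a statement_timeout.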


-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
