Re: [Slony1-general] bug in deadlock handling?

Stéphane Schildknecht Thu, 20 Sep 2007 09:53:04 -0700

Jeff Frost a écrit :
> On Thu, 20 Sep 2007, Stéphane Schildknecht wrote:
>
>>> --------
>>> CREATE TABLE orders.amazon_items
>>> (
>>>   id serial NOT NULL,
>>>   order_id integer NOT NULL,
>>>   item_id integer NOT NULL,
>>>   amazon_item_id character varying(14) NOT NULL,
>>>   CONSTRAINT amazon_items_pkey PRIMARY KEY (id),
>>>   CONSTRAINT amazon_items_order_id_fkey FOREIGN KEY (order_id)
>>>       REFERENCES orders.orders (id) MATCH SIMPLE
>>>       ON UPDATE NO ACTION ON DELETE NO ACTION,
>>>   CONSTRAINT amazon_items_item_id_fkey FOREIGN KEY (item_id)
>>>       REFERENCES orders.items (id) MATCH SIMPLE
>>>       ON UPDATE NO ACTION ON DELETE NO ACTION
>>> )
>>> WITH OIDS;
>>> ALTER TABLE orders.amazon_items OWNER TO thenerds;
>>> --------
>>>
>>> It was called by the following slonik script:
>>>
>>> --------
>>> #!/usr/bin/slonik
>>> include </nerds/preamble.slonik>;
>>>
>>>         EXECUTE SCRIPT (
>>>                 SET ID = 1,
>>>                 FILENAME = '/nerds/thenerds.sql',
>>>                 EVENT NODE = 1
>>>         );
>>> --------
>>>
>>> and caused the following deadlock to occur:
>>>
>>> 15:27:54 sql1 slon[12252]: [39-1] 2007-09-18 15:27:54 EDT ERROR
>>> remoteWorkerThread_1: "select "_nerdcluster".ddlScript
>>> _complete_int(1, -1); " PGRES_FATAL_ERROR
>>> Sep 18 15:27:54 sql1 slon[12252]: [39-2]  ERROR:  deadlock detected
>>> Sep 18 15:27:54 sql1 slon[12252]: [39-3] DETAIL:  Process 12263
>>> waits for
>>> AccessExclusiveLock on relation 121589880 of databas
>>> e 121589046; blocked by process 12096.
>>> Sep 18 15:27:54 sql1 slon[12252]: [39-4] Process 12096 waits for
>>> AccessShareLock on relation 121589817 of database 121589046;
>>> blocked by process 12263.
>>>
>>> Which then left the some of the tables on that slave in a bad state
>>> breaking replication:
>>>
>>> 2007-09-18 15:56:06 EDT ERROR  remoteWorkerThread_1: "select
>>> "_nerdcluster".ddlScript_prepare_int(1, -1); " PGRES_FATAL_ERROR ERROR:
>>> Slony-I: alterTableRestore(): Table "public"."carts" is not in altered
>>> state
>>> CONTEXT:  SQL statement "SELECT  "_nerdcluster".alterTableRestore(
>>> $1 )"
>>> PL/pgSQL function "ddlscript_prepare_int" line 46 at perform
>>>
>>> Note that it's just an AccessShareLock that's killing us.  Looks like
>>> that's caused by a select query which does searches.  Our application
>>> does not produce any extraneous locking, it simply does SELECTS on
>>> that server.
>>>
>>> Interestingly, before we started using the slave for queries, we would
>>> have the deadlocks happen on the master when doing DDL changes, but
>>> this never caused the tables on the master to get into a bad state.
>>> You could just re-run your EXECUTE SCRIPT and it would usually work
>>> fine the second time.
>>>
>>> What other info can I provide?
>>>
>> First, I don't think you have to use the execute script command to
>> create a new table.
>>
>> I'd also like to know if you use pgpool, on slave, master, both ?
>>
>> Does the re-executing of the script offer any solution ?
>
> You're quite right and I think we'll start just executing the create
> tables on all nodes independently without execute script, but the same
> problem happens with ALTER TABLE as well, this just happened to be the
> latest example that caused a problem.


I guess no matter what the script contains, using pgpool will break the
replication.
>
> We are using pgpool in master/slave mode.  So, pgpool is load
> balancing between node 1 and node 2.  Node 3 is not in the pgpool
> cluster.
>
> Unfortunately, since the first run of the script leaves node 2 in a
> bad state and breaks replication, running the script again doesn't
> help.  So far I've had to node 2 from replication and re-subscribe to
> fix the problem.
>

So did I.

Cheers,

SAS
_______________________________________________
Slony1-general mailing list
[email protected]
http://lists.slony.info/mailman/listinfo/slony1-general

Re: [Slony1-general] bug in deadlock handling?

Reply via email to