Jeff Frost a écrit : > On Thu, 20 Sep 2007, Stéphane Schildknecht wrote: > >>> -------- >>> CREATE TABLE orders.amazon_items >>> ( >>> id serial NOT NULL, >>> order_id integer NOT NULL, >>> item_id integer NOT NULL, >>> amazon_item_id character varying(14) NOT NULL, >>> CONSTRAINT amazon_items_pkey PRIMARY KEY (id), >>> CONSTRAINT amazon_items_order_id_fkey FOREIGN KEY (order_id) >>> REFERENCES orders.orders (id) MATCH SIMPLE >>> ON UPDATE NO ACTION ON DELETE NO ACTION, >>> CONSTRAINT amazon_items_item_id_fkey FOREIGN KEY (item_id) >>> REFERENCES orders.items (id) MATCH SIMPLE >>> ON UPDATE NO ACTION ON DELETE NO ACTION >>> ) >>> WITH OIDS; >>> ALTER TABLE orders.amazon_items OWNER TO thenerds; >>> -------- >>> >>> It was called by the following slonik script: >>> >>> -------- >>> #!/usr/bin/slonik >>> include </nerds/preamble.slonik>; >>> >>> EXECUTE SCRIPT ( >>> SET ID = 1, >>> FILENAME = '/nerds/thenerds.sql', >>> EVENT NODE = 1 >>> ); >>> -------- >>> >>> and caused the following deadlock to occur: >>> >>> 15:27:54 sql1 slon[12252]: [39-1] 2007-09-18 15:27:54 EDT ERROR >>> remoteWorkerThread_1: "select "_nerdcluster".ddlScript >>> _complete_int(1, -1); " PGRES_FATAL_ERROR >>> Sep 18 15:27:54 sql1 slon[12252]: [39-2] ERROR: deadlock detected >>> Sep 18 15:27:54 sql1 slon[12252]: [39-3] DETAIL: Process 12263 >>> waits for >>> AccessExclusiveLock on relation 121589880 of databas >>> e 121589046; blocked by process 12096. >>> Sep 18 15:27:54 sql1 slon[12252]: [39-4] Process 12096 waits for >>> AccessShareLock on relation 121589817 of database 121589046; >>> blocked by process 12263. >>> >>> Which then left the some of the tables on that slave in a bad state >>> breaking replication: >>> >>> 2007-09-18 15:56:06 EDT ERROR remoteWorkerThread_1: "select >>> "_nerdcluster".ddlScript_prepare_int(1, -1); " PGRES_FATAL_ERROR ERROR: >>> Slony-I: alterTableRestore(): Table "public"."carts" is not in altered >>> state >>> CONTEXT: SQL statement "SELECT "_nerdcluster".alterTableRestore( >>> $1 )" >>> PL/pgSQL function "ddlscript_prepare_int" line 46 at perform >>> >>> Note that it's just an AccessShareLock that's killing us. Looks like >>> that's caused by a select query which does searches. Our application >>> does not produce any extraneous locking, it simply does SELECTS on >>> that server. >>> >>> Interestingly, before we started using the slave for queries, we would >>> have the deadlocks happen on the master when doing DDL changes, but >>> this never caused the tables on the master to get into a bad state. >>> You could just re-run your EXECUTE SCRIPT and it would usually work >>> fine the second time. >>> >>> What other info can I provide? >>> >> First, I don't think you have to use the execute script command to >> create a new table. >> >> I'd also like to know if you use pgpool, on slave, master, both ? >> >> Does the re-executing of the script offer any solution ? > > You're quite right and I think we'll start just executing the create > tables on all nodes independently without execute script, but the same > problem happens with ALTER TABLE as well, this just happened to be the > latest example that caused a problem.
I guess no matter what the script contains, using pgpool will break the replication. > > We are using pgpool in master/slave mode. So, pgpool is load > balancing between node 1 and node 2. Node 3 is not in the pgpool > cluster. > > Unfortunately, since the first run of the script leaves node 2 in a > bad state and breaks replication, running the script again doesn't > help. So far I've had to node 2 from replication and re-subscribe to > fix the problem. > So did I. Cheers, SAS _______________________________________________ Slony1-general mailing list [email protected] http://lists.slony.info/mailman/listinfo/slony1-general
