Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Tom Lane wrote:
> >> You're not considering the possibility of a transient communication
> >> failure.
>
> > Can't the master re-send the request after a timeout?
>
> Not "it can", but "it has to".  The master *must* keep hold of that
> request forever (or until the slave responds, or until we reconfigure
> the system not to consider that slave valid anymore).  Similarly, the
> slave cannot forget the maybe-committed transaction on pain of not being
> a valid slave anymore.  You can make this work, but the resource costs
> are steep.  For instance, in Postgres, you don't get to truncate the WAL
> log, for what could be a really really long time --- more disk space
> than you wanted to spend on WAL anyway.  The locks held by the
> maybe-committed transaction are another potentially unpleasant problem;
> you can't release them, no matter what else they are blocking.
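
To make the WAL cost in the quoted text concrete, here is a toy model in
Python. It is only a sketch under an assumed simplification, not how
Postgres computes anything: each transaction a slave has not yet confirmed
is taken to pin the log from its start position onward. The function name
and the integer log positions are hypothetical.

```python
from typing import Iterable


def wal_truncation_point(checkpoint_lsn: int, unresolved_lsns: Iterable[int]) -> int:
    """Return the position below which the log could be discarded.

    Toy model (assumption): without unresolved transactions the log can be
    cut at the last checkpoint; every maybe-committed transaction holds the
    cut point back to its own start position.
    """
    return min([checkpoint_lsn, *unresolved_lsns])
```

With a checkpoint at 1000 and unresolved transactions starting at 400 and
700, nothing above position 400 can be discarded until the slave answers.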

I think we would need a configurable timeout, say 60 seconds, after which
a slave is no longer considered valid and everyone can release what they
are holding.  We can let the administrator decide how long he wants to
keep trying to get two hosts communicating.  I don't see this as much
different from the problems of multi-master replication.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
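
A rough Python sketch of the timeout idea, for illustration only. Nothing
here is existing Postgres code: the Master class, the send callback (which
returns True when the slave acknowledges), and the injectable clock are all
hypothetical. The master keeps re-sending each unacknowledged request; once
a request has gone unanswered for slave_timeout seconds, the slave is
marked invalid and the pending requests are dropped.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Master:
    send: Callable[[int], bool]          # hypothetical transport; True = slave acknowledged
    slave_timeout: float = 60.0          # seconds; the administrator-chosen limit
    clock: Callable[[], float] = time.monotonic
    slave_valid: bool = True
    pending: Dict[int, float] = field(default_factory=dict)  # txn id -> time first sent

    def request_commit(self, txn_id: int) -> None:
        """Record the request, then make the first delivery attempt."""
        self.pending[txn_id] = self.clock()
        self.retry()

    def retry(self) -> None:
        """Re-send unacknowledged requests; give up on the slave after the timeout."""
        for txn_id, first_sent in list(self.pending.items()):
            if self.send(txn_id):
                del self.pending[txn_id]         # acknowledged, nothing left to hold
            elif self.clock() - first_sent >= self.slave_timeout:
                self.slave_valid = False         # stop treating the slave as valid
                break
        if not self.slave_valid:
            self.pending.clear()                 # the point where everyone releases
```

A caller would invoke retry() periodically. Passing a fake clock makes the
timeout easy to exercise without waiting 60 seconds.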