I just realized this is essentially an instance of the Two Generals' Problem, which I feel should have been more obvious to me.

On Tue, Jun 19, 2012 at 5:50 PM, Leon Smith <leon.p.sm...@gmail.com> wrote:

> On Tue, Jun 19, 2012 at 11:59 AM, Robert Haas <robertmh...@gmail.com> wrote:
>
>> On Tue, Jun 19, 2012 at 1:56 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> > The transaction would be committed before a command success report is
>> > delivered to the client, so I don't think delivered-and-not-marked is
>> > possible.
>>
>> ...unless you have configured synchronous_commit=off, or fsync=off.
>>
>> Or unless your disk melts into a heap of slag and you have to restore
>> from backup.  You can protect against that last case using synchronous
>> replication.
>>
>
> But hard disk failure isn't in the failure model I was concerned about.
> =)  To be perfectly honest, I'm not too concerned with either hard drive
> failure or network failure, as we are deploying on a RAID 1+0 database
> server talking to the client over a redundant LAN, and using asynchronous
> (Slony) replication to an identical database server just in case.  No
> single point of failure is a key philosophy of this app from top to bottom.
> Like I said, this is mostly idle curiosity.
>
> But I'm also accustomed to trying to get work done on shockingly
> unreliable internet connections.  As a result, network failure is
> something I think about quite a lot when writing networked applications.
> So this is not entirely idle curiosity either.
>
> And thinking about this a bit more, it's clear that the database has to
> commit before the result is sent, on the off chance that the transaction
> fails and needs to be retried.  And that an explicit transaction block
> isn't really a solution either, because a "BEGIN; SELECT dequeue_row()"
> would get the row to the client without marking it as taken, but the
> pathological TCP disconnect could then attack the following "COMMIT;",
> leaving the client to think that the row has not actually been taken when
> it in fact has.
>
> It's not clear to me that this is even a solvable problem without
> modifying the schema to include both a "taken" and a "finished processing"
> state, and then letting elements be re-delivered after a period of time.
> But this would then allow a pathological demon with the power to cause
> TCP disconnects to have a single element delivered and processed multiple
> times.
>
> In any case, thanks for the responses...
>
> Best,
> Leon
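
For what it's worth, here is a rough sketch of the "taken" plus "finished processing" idea quoted above, in case it makes the trade-off more concrete. It is only an in-memory Python model, not PostgreSQL code, and LeaseQueue, lease_seconds, dequeue and finish are names invented for illustration. The point is just that an item whose lease expires without a "finished" mark is handed out again, which is exactly where the duplicate processing comes from.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    payload: str
    taken_at: Optional[float] = None  # when a worker last claimed the item
    finished: bool = False            # set once a worker confirms completion


@dataclass
class LeaseQueue:
    """Hypothetical in-memory stand-in for the queue table."""
    lease_seconds: float
    items: dict = field(default_factory=dict)  # item id -> Item

    def enqueue(self, item_id, payload):
        self.items[item_id] = Item(payload)

    def dequeue(self, now=None):
        """Claim the first item that is unfinished and not under a live lease."""
        now = time.monotonic() if now is None else now
        for item_id, item in self.items.items():
            lease_expired = (
                item.taken_at is None
                or now - item.taken_at >= self.lease_seconds
            )
            if not item.finished and lease_expired:
                item.taken_at = now
                return item_id, item.payload
        return None

    def finish(self, item_id):
        """Mark an item as processed. Idempotent, so a lost ack can be retried."""
        self.items[item_id].finished = True


q = LeaseQueue(lease_seconds=30)
q.enqueue(1, "job")
first = q.dequeue(now=0)    # delivered; suppose the reply is lost in transit
hidden = q.dequeue(now=10)  # lease still live, so nothing is handed out
again = q.dequeue(now=31)   # lease expired, so the same item is re-delivered
q.finish(1)
done = q.dequeue(now=100)   # finished items are not handed out again
```

So this gives at-least-once delivery rather than exactly-once: the item is never silently lost, but whoever processes it has to tolerate seeing it twice.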