On Tuesday, August 05 2008, Tom Lane wrote:
> Huh?  The problem case is that the primary server goes down, which would
> certainly mean that a pgbouncer instance on the same machine goes with
> it.  So it seems to me that integrating pgbouncer is 100% backwards.
With all due respect, it seems to me you're missing an important piece of the scheme here; I certainly failed to explain it correctly. I'm not sure that detailing what I have in mind will fully answer your concerns, but still...

What I have in mind is having a pgbouncer listener process at both the master and the slave sites. Your clients can then already connect to the slave for normal operations, and the listener process simply connects them to the master, transparently. When we later provide read-only slaves, some queries could be processed locally instead of being sent to the master. The point is that the client does not have to care whether it's connecting to a master or a slave: -core knows what it can handle for the client and handles it (proxying the connection).

Now, that does not solve client-side automatic failover per se; it's another way to think about it:
 - both master and slave accept connections in any mode,
 - master and slave are able to "speak" to each other (a life link),
 - when the master knows it's crashing (elog(FATAL)), it can say so to the slave,
 - when told so, the slave can switch to master.

This obviously only catches some errors on the master, the ones we're able to log about. So it does nothing on its own to allow HA in case of a master crash. But...

> Failover that actually works is not something we can provide with
> trivial changes to Postgres.  It's really a major project in its
> own right: you need heartbeat detection, STONITH capability,
> IP address redirection, etc.  I think we should be recommending
> external failover-management project(s) instead of offering a
> half-baked home-grown solution.  Searching freshmeat for "failover"
> finds plenty of potential candidates, but not having used any of
> them I'm not sure which are worth closer investigation.

We have worked here with heartbeat, and automating failover is hard.
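To make the handshake concrete, here is a minimal sketch of the idea in Python — hypothetical illustration only, not actual pgbouncer or -core code; the `Node` class, its roles, and the query strings are all invented for the example:

```python
# Sketch of the proposed scheme: both nodes accept connections, the
# slave transparently proxies work to the master over the "life link",
# and a master that knows it is crashing (elog(FATAL)) tells the slave,
# which then switches to master.  All names here are hypothetical.

class Node:
    def __init__(self, name, role, peer=None):
        self.name = name
        self.role = role          # "master", "slave", or "down"
        self.peer = peer          # the "life link" to the other node

    def handle_query(self, sql):
        # Clients may connect to either node; a slave forwards the
        # query to the current master, transparently to the client.
        if self.role == "master":
            return "%s executes: %s" % (self.name, sql)
        if self.role == "slave" and self.peer is not None:
            return self.peer.handle_query(sql)
        raise RuntimeError("%s cannot serve queries (role=%s)"
                           % (self.name, self.role))

    def fatal(self):
        # The master knows it is crashing: say so to the slave first.
        if self.role == "master" and self.peer is not None:
            self.peer.promote()
        self.role = "down"

    def promote(self):
        # When told so, the slave switches to master.
        self.role = "master"

master = Node("pg-a", "master")
slave = Node("pg-b", "slave", peer=master)
master.peer = slave

print(slave.handle_query("UPDATE ..."))  # proxied: "pg-a executes: UPDATE ..."
master.fatal()                           # master signals, then goes down
print(slave.handle_query("UPDATE ..."))  # now "pg-b executes: UPDATE ..."
```

Of course this only models the "clean crash" path where the master still gets to speak; it says nothing about heartbeats, STONITH, or a master that dies silently, which is exactly why automating failover stays hard.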
The difficulty is not only technical; it's also because:
 - current PostgreSQL offers no synchronous replication, so switching means trading away or losing the D in ACID,
 - you do not want to lose any committed data.

If 8.4 resolves this, implementing failover will be a lot easier.

Where I see my proposal fitting is in handling part of the smartness in -core directly, so that the hard parts of STONITH/failover/switchback could be implemented in cooperation with -core rather than by playing tricks against it. For example, switching back when the master comes back online would only mean having the master tell the slave to redirect queries to it again as soon as it's ready --- which is still the hard part: syncing the data back. Having clients able to blindly connect to the master or any slave, with the current cluster topology smartness in -core, would certainly help here, even if it does not fulfill all HA goals.

Of course, in the case of a hard master crash, we still have to make sure it won't restart on its own, and we have to have an external way to make a chosen slave become the master. I'm even envisioning that -core could help STONITH projects by having something like the recovery.conf file for the master to restart in not-up-to-date slave mode. Whether we implement resyncing to the new master in -core or from external scripts is another concern, but -core could certainly help here (even if not in 8.4, of course).

I'm still thinking that this proposal has a place in the scheme of an integrated HA solution and offers interesting bits.

Regards,
-- 
dim