> On 10/05/10 09:33, Gary Fu wrote:
>> On 10/04/10 18:56, Tatsuo Ishii wrote:
>>>> I'm running pgpool2 3.0 in replication mode. I just noticed that
>>>> when pgpool fails over (due to a mismatch error) by shutting down
>>>> the secondary db, my application fails due to the loss of its
>>>> connection. The documentation mentions that when failover is
>>>> performed, pgpool kills all its child processes and starts new
>>>> child processes for new connections from the clients. Does this
>>>> mean that my application has to make a new connection when the
>>>> failover happens?
>>> Yes.
>>>
>>>> If so, the question is how does my application know there is a
>>>> failover?
>>> In this case libpq/your_favorite_driver returns error "server closed
>>> the connection unexpectedly This probably means the server
>>> terminated abnormally before or while processing the request."
>>> --
>>
>> Is there a way to just disable the secondary db when the mismatch
>> error happens, without failover, so that my application can keep
>> working with the primary db without making a new connection?
>>
>> I tested this (as far as I can remember) with an old pgpool2 version:
>> when I shut down one of the dbs, my application kept working without
>> a lost connection error. What's the difference between that case and
>> the failover case?
>>
>> Thanks,
>> Gary
>
> Hi Tatsuo,
>
> Could you provide any answer or suggestion on the above questions?
>
> Thanks,
> Gary

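On the application side, the reconnect described above can be handled with a small retry wrapper. This is only a minimal sketch and not part of pgpool: `run_with_reconnect`, `ConnectionLost` and the three callables are hypothetical names made up for illustration. With a real driver you would pass its own connect function and a check for its own "server closed the connection unexpectedly" error (with psycopg2 that is typically an `OperationalError`, but please verify for your driver).

```python
import time


class ConnectionLost(Exception):
    """Hypothetical stand-in for the driver error seen after a pgpool failover."""


def run_with_reconnect(connect, work, is_connection_lost,
                       max_attempts=3, delay_seconds=0.0):
    """Run work(conn); on a lost connection, reconnect and run it again.

    connect            -- callable returning a new connection (assumption)
    work               -- callable taking the connection; it must be safe to
                          rerun from the start, since the old session is gone
    is_connection_lost -- callable deciding whether an exception means the
                          connection was dropped (driver specific)
    """
    conn = None
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            if conn is None:
                conn = connect()  # new connection goes to a new pgpool child
            return work(conn)
        except Exception as exc:
            if not is_connection_lost(exc):
                raise  # unrelated error: do not hide it behind a retry
            last_error = exc
            conn = None  # drop the dead connection and reconnect next time
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    raise last_error
```

Note that anything done in the lost session (open transaction, temporary tables, session settings) is gone, so the work has to be restarted from the beginning, not resumed.
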
Sorry for the delay. This is a frequently asked question. The answer is
in a comment in main.c:

	/*
	 * Before we tried to minimize restarting pgpool to protect existing
	 * connections from clients to pgpool children. What we did here was,
	 * if children other than master went down, we did not fail over.
	 * This is wrong. Think about following scenario. If someone
	 * accidentally plugs out the network cable, the TCP/IP stack keeps
	 * retrying for long time (typically 2 hours). The only way to stop
	 * the retry is restarting the process.  Bottom line is, we need to
	 * restart all children in any case.  See pgpool-general list posting
	 * "TCP connections are *not* closed when a backend timeout" on Jul 13
	 * 2008 for more details.
	 */

Here is the original complaint.

> Subject: [Pgpool-general] TCP connections are *not* closed when a backend
>  timeout
> From: Maxence DUNNEWIND <[email protected]>
> To: [email protected]
> Date: Fri, 11 Jul 2008 11:34:37 +0200
> Sender: [email protected]
> User-Agent: Mutt/1.5.13 (2006-08-11)
>
> Hi,
>
> I'm working on recovery with pgpool.
> When a backend fails (i.e., for example, when the postgresql server shuts
> down), all seems OK, connections are closed and the backend is set as down
> (status 3).
>
> My problem is in the case of a network problem. If I remove the network
> link to the backend, pgpool correctly detects it as down when the
> healthcheck times out, but it does *not* close the tcp connections to the
> remote backend.
>
> The problem is that when the link comes back and I start
> pcp_recovery_node, the second stage can't proceed because there are
> existing connections to the backend ...
>
> Is this a normal thing? Or is this a bug?
>
> I'm trying to find out how I can close the connections when the
> healthcheck times out...

Here is my reply:

> Thanks for the report.
>
> In case of a network problem, the underlying TCP/IP stack keeps
> retrying, and the only safe way to shut down the connection is
> restarting the process. Even if we could safely close the limbo
> connection, we would need to pass the limbo connection info from the
> parent to the child process (remember that health checking is done in
> the parent, and the child is keeping the connection).
>
> This is not a problem if the master goes down, since all children will
> restart anyway. I guess you removed the network cable for nodes other
> than the master. The difference here is that pgpool tries to minimize
> restarting children. If the master does not fail, pgpool will not do
> the restarting.
>
> From your report I think this logic is wrong and we need to restart
> children in *any* case. Included is the patch for this. Could you
> please try it out?

So we decided to always restart all pgpool child processes. But if you
think you will never unplug a network cable, maybe you could bring back
the ifdef-outed code to treat the master node specially. Right after the
comment in main.c you will see #ifdef NOT_USED. Just remove the ifdef
and try it out...
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
