[Cced to pgpool-hackers] Brian,
Thanks for your contribution. I will look into the code. I would also
like to hear opinions from the pgpool hackers, if any.

If possible, it would be great if you could generate patches (diff)
against CVS HEAD. Also, you should change pool_config.l rather than
pool_config.c, since pool_config.c is generated from it.

One more request: please provide documentation. You can modify
doc/pgpool-en.html.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Tatsuo
>
> I would like to contribute a patch that adds a configuration option for
> killing all child processes rather than waiting for them to disconnect.
> The rationale and the code change are described below and attached.
> Please let me know whether this can be contributed or, even better,
> avoided altogether somehow.
>
> Cheers,
>
> Brian
>
> We changed the source files of Pgpool II 2.3.1 for the following two purposes:
> 1. To avoid the wait for the disconnection of all sessions at the 2nd stage
>    of pgpool recovery.
> 2. To avoid the disconnection of active sessions at the primary node when the
>    secondary node stops communicating with the primary pgpool.
>
> 1. To avoid the wait for the disconnection of all sessions at the 2nd stage of
> pgpool recovery
>
> With the default implementation of Pgpool II 2.3.1, pgpool waits at the 2nd
> stage of recovery for all sessions to disconnect. In a 24x7 production
> database, however, it is difficult to ensure that no connection exists
> without stopping the application. If a session is still connected, the
> recovery process waits for it to disconnect, and if that does not happen
> within a specific period, the recovery fails.
>
> To avoid this recovery failure we added a parameter,
> "kill_all_child_process_before_2nd_stage".
>
> if kill_all_child_process_before_2nd_stage = true/1 then
>     kill all child processes before the start of the 2nd stage, so there is
>     no need to wait for active sessions to disconnect
> else
>     wait for all active sessions to disconnect; this is the same as
>     Pgpool II 2.3.1
> end if
>
> The default value of kill_all_child_process_before_2nd_stage is false/0.
>
> The following 3 files have been modified:
>
> pool.h
> pool_config.c
> recovery.c
>
> File: pool.h //added the parameter
>
> typedef struct
> {
>     ....
>     int kill_all_child_process_before_2nd_stage;
> } POOL_CONFIG;
>
> File: pool_config.c //read the pgpool.conf value set by the pgpool
> administrator
>
> int pool_init_config()
> {
>     ...
>     pool_config->kill_all_child_process_before_2nd_stage = 0;
>
> } //int pool_init_config()
>
> int pool_get_config(char *confpath, POOL_CONFIG_CONTEXT context)
> {
>     ....
>     //kill_all_child_process_before_2nd_stage
>     else if (!strcmp(key, "kill_all_child_process_before_2nd_stage") &&
>              CHECK_CONTEXT(INIT_CONFIG|RELOAD_CONFIG, context))
>     {
>         int v = eval_logical(yytext);
>
>         if (v < 0)
>         {
>             pool_error("pool_config: invalid value %s for %s", yytext, key);
>             return(-1);
>         }
>         pool_config->kill_all_child_process_before_2nd_stage = v;
>     } //kill_all_child_process_before_2nd_stage
>     ....
> }//int pool_get_config
>
> File: recovery.c //added the parameterized functionality in the block
> "modified here"
>
> int start_recovery(int recovery_node)
> {
>     ...
>     pool_log("1st stage is done");
>
> --------------- modified here -------------------------
>     if (pool_config->kill_all_child_process_before_2nd_stage == 1)
>     { /* kill all children processes */
>         int i;
>
>         pool_log("Disconnection of all sessions before 2nd stage started");
>         pool_debug("num_init_children=%d", pool_config->num_init_children);
>         for (i = 0; i < pool_config->num_init_children; i++)
>         {
>             pid_t pid = pids[i].pid;
>             if (pid)
>             {
>                 kill(pid, SIGQUIT);
>                 pool_debug("Disconnect session before 2nd stage: kill %d", pid);
>             }//if (pid)
>         }//for (i = 0; i < pool_config->num_init_children; i++)
>         pool_log("Killed all active sessions");
>     }// if (pool_config->kill_all_child_process_before_2nd_stage == 1)
>
> ------------ end of modification ---------------------------------
>     pool_log("starting 2nd stage");
>
>     /* 2nd stage */
>     *InRecovery = 1;
> -------- modified here ----------------
>     pool_log("---->Waiting for all connection closed <---");
> --------- end of modification ----------
>     if (wait_connection_closed() != 0)
>     {
>         PQfinish(conn);
>         pool_error("start_recovery: timeover for waiting connection closed");
>         return 1;
>     }
>     pool_log("all connections from clients have been closed");
>
>     ...
> } //int start_recovery
>
> 2. To avoid the disconnection of active sessions at the primary node when the
> secondary node stops communicating with the primary pgpool
>
> With the default implementation of Pgpool II 2.3.1, if either node goes down,
> pgpool kills all child processes to stop TCP/IP retrying. This can cause a
> problem in our environment.
> Consider a situation where the secondary node is somehow disconnected and
> stops communicating with the primary node. With the default implementation,
> pgpool kills all active calls and other activities of our application.
> This is not desirable: activity at the primary node must not be hampered when
> the secondary node stops communicating.
>
> To avoid this, we added a parameter,
> "restart_all_process_for_same_master_node_in_failover".
>
> if restart_all_process_for_same_master_node_in_failover = true/1 then
>     restart all child processes; this is the same as the default in
>     Pgpool II 2.3.1
> else
>     don't restart the child processes when the same master node is selected
> end if
>
> The default value of restart_all_process_for_same_master_node_in_failover is
> 1/true.
>
> In effect, this parameter only re-enables the old functionality of not
> restarting child processes when the same master node is selected.
>
> The following 3 files have been changed:
> pool.h
> pool_config.c
> main.c
>
> File: pool.h //added the parameter
>
> typedef struct
> {
>     ....
>     int restart_all_process_for_same_master_node_in_failover;
> } POOL_CONFIG;
>
> File: pool_config.c //read the pgpool.conf value set by the pgpool
> administrator
>
> int pool_init_config()
> {
>     ...
>     pool_config->restart_all_process_for_same_master_node_in_failover = 1;
> } //int pool_init_config()
>
> int pool_get_config(char *confpath, POOL_CONFIG_CONTEXT context)
> {
>     ....
>     else if (!strcmp(key, "restart_all_process_for_same_master_node_in_failover") &&
>              CHECK_CONTEXT(INIT_CONFIG|RELOAD_CONFIG, context))
>     {
>         int v = eval_logical(yytext);
>
>         if (v < 0)
>         {
>             pool_error("pool_config: invalid value %s for %s", yytext, key);
>             fclose(fd);
>             return(-1);
>         }
>         pool_debug("restart_all_process_for_same_master_node_in_failover: %d", v);
>         pool_config->restart_all_process_for_same_master_node_in_failover = v;
>     }
>     ....
>
> } //int pool_get_config(char *confpath, POOL_CONFIG_CONTEXT context)
>
> File: main.c // added the parameterized functionality; the block "modified
> here" indicates our modification
>
> ...
> static void failover(void)
> {
>
>     ...
>     if (new_master == pool_config->backend_desc->num_backends)
>     {
>         pool_error("failover_handler: no valid DB node found");
>     }
>
>     /*
>      * Before we tried to minimize restarting pgpool to protect existing
>      * connections from clients to pgpool children. What we did here was,
>      * if children other than master went down, we did not fail over.
>      * This is wrong. Think about following scenario. If someone
>      * accidentally plugs out the network cable, the TCP/IP stack keeps
>      * retrying for long time (typically 2 hours). The only way to stop
>      * the retry is restarting the process. Bottom line is, we need to
>      * restart all children in any case. See pgpool-general list posting
>      * "TCP connections are *not* closed when a backend timeout" on Jul 13
>      * 2008 for more details.
>      * This is now parameterized, since people may want different behavior
>      * depending on their system.
>      */
>
> ----------- modified here ------------------------
> //#ifdef NOT_USED //
>     else
>     {
>         if (pool_config->restart_all_process_for_same_master_node_in_failover == 0 &&
>             Req_info->master_node_id == new_master && *InRecovery == 0)
>         {
>             pool_log("failover_handler: do not restart pgpool. same master node %d was selected", new_master);
>             if (Req_info->kind == NODE_UP_REQUEST)
>             {
>                 pool_log("failback done. reconnect host %s(%d)",
>                          BACKEND_INFO(node_id).backend_hostname,
>                          BACKEND_INFO(node_id).backend_port);
>             }
>             else
>             {
>                 pool_log("failover done. shutdown host %s(%d)",
>                          BACKEND_INFO(node_id).backend_hostname,
>                          BACKEND_INFO(node_id).backend_port);
>             }
>
>             /* exec failover_command */
>             for (i = 0; i < pool_config->backend_desc->num_backends; i++)
>             {
>                 if (nodes[i])
>                     trigger_failover_command(i, pool_config->failover_command);
>             }
>
>             pool_semaphore_unlock(REQUEST_INFO_SEM);
>             switching = 0;
>             kill(pcp_pid, SIGUSR2);
>             switching = 0;
>             return;
>         }
>     }
> //#endif //
> ----------- end of modification ------------------------
>
>     /* kill all children */
>     for (i = 0; i < pool_config->num_init_children; i++)
>     {
>         pid_t pid = pids[i].pid;
>         if (pid)
>         {
>             kill(pid, SIGQUIT);
>             pool_debug("failover_handler: kill %d", pid);
>         }
>     }
>
>     ...
> } // end of failover()
> ...
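
For anyone reviewing the two switches, the decision logic described in the
quoted patch can be modelled outside pgpool roughly as follows. This is only a
sketch under stated assumptions, not the patch itself: struct sketch_config,
should_restart_children(), signal_children(), signal_fn and dry_run_send() are
hypothetical names made up for illustration, and only the two parameter names
and the same-master / not-in-recovery condition come from the message above.
Unlike the patch, which ignores the return value of kill(), the loop here
counts only the children for which the send callback reports success.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for the two new POOL_CONFIG fields. */
struct sketch_config {
    int kill_all_child_process_before_2nd_stage;              /* patch default: 0 */
    int restart_all_process_for_same_master_node_in_failover; /* patch default: 1 */
};

/* Callback type (an assumption of this sketch) so the loop can be exercised
 * without signalling real processes; kill(2) has a compatible signature. */
typedef int (*signal_fn)(pid_t pid, int sig);

/* Stand-in sender that signals nothing and always reports success. */
int dry_run_send(pid_t pid, int sig)
{
    (void) pid;
    (void) sig;
    return 0;
}

/*
 * Returns 0 when failover may return early without restarting the children:
 * the switch is off, the same master node was selected, and no recovery is
 * in progress. Returns 1 in every other case.
 */
int should_restart_children(const struct sketch_config *cfg,
                            int current_master, int new_master,
                            int in_recovery)
{
    assert(cfg != NULL);
    if (cfg->restart_all_process_for_same_master_node_in_failover == 0 &&
        current_master == new_master && in_recovery == 0)
        return 0;
    return 1;
}

/*
 * Sends sig to every non-zero pid in the table through the callback and
 * returns how many sends reported success.
 */
int signal_children(const pid_t *pids, size_t n, int sig, signal_fn send)
{
    int count = 0;
    size_t i;

    for (i = 0; i < n; i++)
    {
        if (pids[i] != 0 && send(pids[i], sig) == 0)
            count++;
    }
    return count;
}
```

Passing kill itself as the callback, with SIGQUIT from <signal.h>, would give
the behaviour of the loops in the patch; the dry-run sender just makes it
possible to check the branching without touching any process.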
