Hello, I am having a problem with selection of master node when I manually attach slave node (with id #1) which came up after postgresql server reboot. When I attach the slave, pgpool tries to select master node, and for some reason reports that previously working master node (with id #0) is no longer available, marks the master node as unavailable, and selects the newly attached node (id #1) as master. If after that I manually attach the node with id #0 back to the pool it becomes a master.
More detailed explanation. == Setup == Two postgresql nodes (running postgresql 9.0.3 on FreeBSD 8.0): Master node (storage1) and a slave one (storage2). The system uses streaming replication with the slave node running in hot standby mode. pgpool-II-3.0.3-beta1 is installed on storage1 node with the following configuration file: ====================================================================== zaa:...local/etc# cat pgpool.conf | sort | grep -v '^#' | grep -v '^$' authentication_timeout = 30 backend_hostname0 = 'storage1' backend_hostname1 = 'storage2' backend_port0 = 5432 backend_port1 = 5432 backend_weight0 = 0.01 backend_weight1 = 0.99 black_function_list = 'nextval,setval' child_idle_limit = 0 child_life_time = 300 child_max_connections = 0 connection_cache = true connection_life_time = 0 debug_level = 1 enable_pool_hba = false fail_over_on_backend_error = false failover_command = '' health_check_period = 5 health_check_user = 'pgpool' listen_addresses = 'localhost' load_balance_mode = true log_connections = true log_hostname = false log_per_node_statement = true log_standby_delay = 'always' log_statement = true master_slave_mode = true master_slave_sub_mode = 'stream' max_pool = 5 num_init_children = 2 parallel_mode = false pid_file_name = '/var/run/pgpool/pgpool.pid' port = 9999 print_timestamp = true replication_mode = false reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT' socket_dir = '/tmp' white_function_list = '' ====================================================================== I can successfully connect to local port 9999 and issue DML and select queries. == Problem == 1. I shutdown postgresql on slave node storage2 and see the following lines in syslog on the node with pgpool: ====================================================================== Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 53354: detect_stop_postmaster_error: receive admin shutdown error from a node. Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 LOG: pid 53354: postmaster on DB node 1 was shutdown by administrative command Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 LOG: pid 53354: notice_backend_error: 1 fail over request from pid 53354 Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 52999: failover_handler called Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 52999: failover_handler: starting to select new master node Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 LOG: pid 52999: starting degeneration. shutdown host storage2(5432) Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 53354: child received shutdown request signal 3 Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 52999: failover_handler: kill 53354 Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 52999: failover_handler: kill 53355 Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 53355: child received shutdown request signal 3 Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 LOG: pid 52999: failover_handler: set new master node: 0 ... Feb 21 23:22:34 zaa pgpool: 2011-02-21 23:22:34 DEBUG: pid 52999: starting health checking Feb 21 23:22:34 zaa pgpool: 2011-02-21 23:22:34 DEBUG: pid 52999: health_check: 0 th DB node status: 2 Feb 21 23:22:34 zaa pgpool: 2011-02-21 23:22:34 DEBUG: pid 52999: health_check: 1 th DB node status: 3 ====================================================================== As you can see, pgpool correctly marks storage2 node as being down. 2. After a while I start postgres on node storage2. 3. In order to return the node back into the pool I run the following command: pcp_attach_node -d 120 localhost 9898 postgres postgres 1 and see those lines in syslog on storage1: ====================================================================== Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 LOG: pid 53003: send_failback_request: fail back 1 th node request from pid 53003 Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: failover_handler called Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: failover_handler: starting to select new master node Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 LOG: pid 52999: starting fail back. reconnect host storage2(5432) Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: failover_handler: kill 56131 Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 56131: child received shutdown request signal 3 Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: failover_handler: kill 56132 Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 56132: child received shutdown request signal 3 Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 LOG: pid 52999: failover_handler: set new master node: 0 Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 LOG: pid 52999: failback done. reconnect host storage2(5432) Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 56205: I am 56205 Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: pool_ssl: SSL requested but SSL support is not available Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 56206: I am 56206 Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: s_do_auth: auth kind: 0 Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: s_do_auth: parameter status data received Feb 21 23:28:03 zaa last message repeated 10 times Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: s_do_auth: backend key data received Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: s_do_auth: transaction state: I Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: kind: T Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: num_fileds: 1 Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: kind: D Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: kind: C Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: kind: Z Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: kind: T Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: num_fileds: 1 Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: kind: D Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: kind: C Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: kind: Z Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 LOG: pid 52999: find_primary_node: primary node id is 0 Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: reap_handler called Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: reap_handler: call wait3 Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: child 56132 exits with status 0 Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: child 56131 exits with status 0 Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: reap_handler: normally exited Feb 21 23:28:05 zaa pgpool: 2011-02-21 23:28:05 DEBUG: pid 52999: starting health checking Feb 21 23:28:05 zaa pgpool: 2011-02-21 23:28:05 DEBUG: pid 52999: health_check: 0 th DB node status: 2 Feb 21 23:28:05 zaa pgpool: 2011-02-21 23:28:05 DEBUG: pid 52999: health_check: 1 th DB node status: 1 Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56021: do_query: kind: T Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56021: num_fileds: 1 Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56021: do_query: kind: D Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56021: do_query: kind: C Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56021: do_query: kind: Z Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56021: do_query: kind: E Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 ERROR: pid 56021: pool_read: read failed (Connection reset by peer) Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 56021: notice_backend_error: 0 fail over request from pid 56021 Feb 21 23:28:08 zaa postgres[56090]: [5-1] LOG: unexpected EOF on client connection Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: failover_handler called Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: failover_handler: starting to select new master node Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 52999: starting degeneration. shutdown host storage1(5432) Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: failover_handler: kill 56205 Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56205: child received shutdown request signal 3 Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: failover_handler: kill 56206 Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 52999: failover_handler: set new master node: 1 Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56206: child received shutdown request signal 3 Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 52999: failover done. shutdown host storage1(5432) Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56213: I am 56213 Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56214: I am 56214 Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: pool_ssl: SSL requested but SSL support is not available Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: s_do_auth: auth kind: 0 Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: s_do_auth: parameter status data received Feb 21 23:28:08 zaa last message repeated 10 times Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: s_do_auth: backend key data received Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: s_do_auth: transaction state: I Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: kind: T Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: num_fileds: 1 Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: kind: D Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: kind: C Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: kind: Z Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: kind: T Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: num_fileds: 1 Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: kind: E Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: kind: Z Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 52999: find_primary_node: do_query returns no rows Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 52999: find_primary_node: do_query returns no data Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 52999: find_primary_node: primary node id is 1 Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: reap_handler called Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: reap_handler: call wait3 Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: child 56206 exits with status 0 Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: child 56205 exits with status 0 Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 ERROR: pid 52999: Child process 56021 was terminated by segmentation fault Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 52999: worker child 56021 exits with status 11 by signal 11 Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 52999: fork a new worker child pid 56216 Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: reap_handler: normally exited Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56216: I am 56216 Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56216: pool_ssl: SSL requested but SSL support is not available Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56216: s_do_auth: auth kind: 0 Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56216: s_do_auth: parameter status data received Feb 21 23:28:08 zaa last message repeated 10 times Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56216: s_do_auth: backend key data received Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56216: s_do_auth: transaction state: I Feb 21 23:28:10 zaa pgpool: 2011-02-21 23:28:10 DEBUG: pid 52999: starting health checking Feb 21 23:28:10 zaa pgpool: 2011-02-21 23:28:10 DEBUG: pid 52999: health_check: 0 th DB node status: 3 Feb 21 23:28:10 zaa pgpool: 2011-02-21 23:28:10 DEBUG: pid 52999: health_check: 1 th DB node status: 1 ====================================================================== As you can see, at first pgpool correctly chooses storage1 node as master, but then for some reason it gets information about the master node (storage1) unavailability: ====================================================================== Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 ERROR: pid 56021: pool_read: read failed (Connection reset by peer) Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 56021: notice_backend_error: 0 fail over request from pid 56021 Feb 21 23:28:08 zaa postgres[56090]: [5-1] LOG: unexpected EOF on client connection Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: failover_handler called Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: failover_handler: starting to select new master node Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 52999: starting degeneration. shutdown host storage1(5432) ====================================================================== slave node storage2 becomes the master node and health check shows storage1 node as unavailable. If I run "pcp_attach_node -d 120 localhost 9898 postgres postgres 0" command, storage1 node again becomes the master node as it should be. Could someone please let me know what might be causing the issue? Is this a bug or I am doing something wrong? PS. One more interesting issue is presence of those two lines: ====================================================================== Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 ERROR: pid 52999: Child process 56021 was terminated by segmentation fault Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 52999: worker child 56021 exits with status 11 by signal 11 ====================================================================== -- Alexander Zhuravlev http://ikato.com/ _______________________________________________ Pgpool-general mailing list [email protected] http://pgfoundry.org/mailman/listinfo/pgpool-general
