Zhuravlev, I found a bug possibly related to your report. Pgpool worker process have a persistent connection to backends to check replication delay. If a backend goes down and then up, the process tries to keep using the connection without knowing the fact. I think this is the cause of the problems you have seen.
The fix was included in 3.0.3, released yesterday (though no mention about this on the NEWS file). http://lists.pgfoundry.org/pipermail/pgpool-committers/2011-February/001669.html Now the process creates everytime when replication delay checked. This is add a little bit resource consumption, but I couldn't think of any alternative. Could you please test 3.0.3 and report back to me if your problems go away? -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp > Zhuravlev, > > I couldn't reproduce your problem here. Since both > problems(unexpected failover and segfault) were occured in "worker > process", I would like to take a backtrace of the process. > The process looks like: > > 23567 ? S 0:00 pgpool: worker process > -- > Tatsuo Ishii > SRA OSS, Inc. Japan > English: http://www.sraoss.co.jp/index_en.php > Japanese: http://www.sraoss.co.jp > >> Thanks for the report. We are looking into this. >> -- >> Tatsuo Ishii >> SRA OSS, Inc. Japan >> English: http://www.sraoss.co.jp/index_en.php >> Japanese: http://www.sraoss.co.jp >> >>> Hello, >>> >>> I am having a problem with selection of master node when I manually >>> attach slave node (with id #1) which came up after postgresql server reboot. >>> When I attach the slave, pgpool tries to select master node, and for >>> some reason reports that previously working master node (with id #0) is no >>> longer >>> available, marks the master node as unavailable, and selects the newly >>> attached node (id #1) as master. If after that I manually attach the node >>> with >>> id #0 back to the pool it becomes a master. >>> >>> More detailed explanation. >>> >>> == Setup == >>> Two postgresql nodes (running postgresql 9.0.3 on FreeBSD 8.0): >>> Master node (storage1) and a slave one (storage2). >>> The system uses streaming replication with the slave node running in hot >>> standby mode. >>> pgpool-II-3.0.3-beta1 is installed on storage1 node with the following >>> configuration file: >>> >>> ====================================================================== >>> zaa:...local/etc# cat pgpool.conf | sort | grep -v '^#' | grep -v '^$' >>> authentication_timeout = 30 >>> backend_hostname0 = 'storage1' >>> backend_hostname1 = 'storage2' >>> backend_port0 = 5432 >>> backend_port1 = 5432 >>> backend_weight0 = 0.01 >>> backend_weight1 = 0.99 >>> black_function_list = 'nextval,setval' >>> child_idle_limit = 0 >>> child_life_time = 300 >>> child_max_connections = 0 >>> connection_cache = true >>> connection_life_time = 0 >>> debug_level = 1 >>> enable_pool_hba = false >>> fail_over_on_backend_error = false >>> failover_command = '' >>> health_check_period = 5 >>> health_check_user = 'pgpool' >>> listen_addresses = 'localhost' >>> load_balance_mode = true >>> log_connections = true >>> log_hostname = false >>> log_per_node_statement = true >>> log_standby_delay = 'always' >>> log_statement = true >>> master_slave_mode = true >>> master_slave_sub_mode = 'stream' >>> max_pool = 5 >>> num_init_children = 2 >>> parallel_mode = false >>> pid_file_name = '/var/run/pgpool/pgpool.pid' >>> port = 9999 >>> print_timestamp = true >>> replication_mode = false >>> reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT' >>> socket_dir = '/tmp' >>> white_function_list = '' >>> ====================================================================== >>> >>> I can successfully connect to local port 9999 and issue DML and select >>> queries. >>> >>> == Problem == >>> >>> 1. I shutdown postgresql on slave node storage2 and see the following >>> lines in syslog on the node with pgpool: >>> >>> ====================================================================== >>> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 53354: >>> detect_stop_postmaster_error: receive admin shutdown error from a node. >>> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 LOG: pid 53354: >>> postmaster on DB node 1 was shutdown by administrative command >>> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 LOG: pid 53354: >>> notice_backend_error: 1 fail over request from pid 53354 >>> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 52999: >>> failover_handler called >>> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 52999: >>> failover_handler: starting to select new master node >>> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 LOG: pid 52999: starting >>> degeneration. shutdown host storage2(5432) >>> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 53354: child >>> received shutdown request signal 3 >>> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 52999: >>> failover_handler: kill 53354 >>> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 52999: >>> failover_handler: kill 53355 >>> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 53355: child >>> received shutdown request signal 3 >>> Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 LOG: pid 52999: >>> failover_handler: set new master node: 0 >>> ... >>> Feb 21 23:22:34 zaa pgpool: 2011-02-21 23:22:34 DEBUG: pid 52999: starting >>> health checking >>> Feb 21 23:22:34 zaa pgpool: 2011-02-21 23:22:34 DEBUG: pid 52999: >>> health_check: 0 th DB node status: 2 >>> Feb 21 23:22:34 zaa pgpool: 2011-02-21 23:22:34 DEBUG: pid 52999: >>> health_check: 1 th DB node status: 3 >>> ====================================================================== >>> >>> As you can see, pgpool correctly marks storage2 node as being down. >>> >>> 2. After a while I start postgres on node storage2. >>> 3. In order to return the node back into the pool I run the following >>> command: >>> >>> pcp_attach_node -d 120 localhost 9898 postgres postgres 1 >>> >>> and see those lines in syslog on storage1: >>> >>> ====================================================================== >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 LOG: pid 53003: >>> send_failback_request: fail back 1 th node request from pid 53003 >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: >>> failover_handler called >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: >>> failover_handler: starting to select new master node >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 LOG: pid 52999: starting >>> fail back. reconnect host storage2(5432) >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: >>> failover_handler: kill 56131 >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 56131: child >>> received shutdown request signal 3 >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: >>> failover_handler: kill 56132 >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 56132: child >>> received shutdown request signal 3 >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 LOG: pid 52999: >>> failover_handler: set new master node: 0 >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 LOG: pid 52999: failback >>> done. reconnect host storage2(5432) >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 56205: I am 56205 >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: pool_ssl: >>> SSL requested but SSL support is not available >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 56206: I am 56206 >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: >>> s_do_auth: auth kind: 0 >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: >>> s_do_auth: parameter status data received >>> Feb 21 23:28:03 zaa last message repeated 10 times >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: >>> s_do_auth: backend key data received >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: >>> s_do_auth: transaction state: I >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: >>> kind: T >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: >>> num_fileds: 1 >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: >>> kind: D >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: >>> kind: C >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: >>> kind: Z >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: >>> kind: T >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: >>> num_fileds: 1 >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: >>> kind: D >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: >>> kind: C >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: >>> kind: Z >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 LOG: pid 52999: >>> find_primary_node: primary node id is 0 >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: >>> reap_handler called >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: >>> reap_handler: call wait3 >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: child >>> 56132 exits with status 0 >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: child >>> 56131 exits with status 0 >>> Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: >>> reap_handler: normally exited >>> Feb 21 23:28:05 zaa pgpool: 2011-02-21 23:28:05 DEBUG: pid 52999: starting >>> health checking >>> Feb 21 23:28:05 zaa pgpool: 2011-02-21 23:28:05 DEBUG: pid 52999: >>> health_check: 0 th DB node status: 2 >>> Feb 21 23:28:05 zaa pgpool: 2011-02-21 23:28:05 DEBUG: pid 52999: >>> health_check: 1 th DB node status: 1 >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56021: do_query: >>> kind: T >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56021: >>> num_fileds: 1 >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56021: do_query: >>> kind: D >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56021: do_query: >>> kind: C >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56021: do_query: >>> kind: Z >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56021: do_query: >>> kind: E >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 ERROR: pid 56021: >>> pool_read: read failed (Connection reset by peer) >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 56021: >>> notice_backend_error: 0 fail over request from pid 56021 >>> Feb 21 23:28:08 zaa postgres[56090]: [5-1] LOG: unexpected EOF on client >>> connection >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: >>> failover_handler called >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: >>> failover_handler: starting to select new master node >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 52999: starting >>> degeneration. shutdown host storage1(5432) >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: >>> failover_handler: kill 56205 >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56205: child >>> received shutdown request signal 3 >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: >>> failover_handler: kill 56206 >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 52999: >>> failover_handler: set new master node: 1 >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56206: child >>> received shutdown request signal 3 >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 52999: failover >>> done. shutdown host storage1(5432) >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56213: I am 56213 >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56214: I am 56214 >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: pool_ssl: >>> SSL requested but SSL support is not available >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: >>> s_do_auth: auth kind: 0 >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: >>> s_do_auth: parameter status data received >>> Feb 21 23:28:08 zaa last message repeated 10 times >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: >>> s_do_auth: backend key data received >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: >>> s_do_auth: transaction state: I >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: >>> kind: T >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: >>> num_fileds: 1 >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: >>> kind: D >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: >>> kind: C >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: >>> kind: Z >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: >>> kind: T >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: >>> num_fileds: 1 >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: >>> kind: E >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: >>> kind: Z >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 52999: >>> find_primary_node: do_query returns no rows >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 52999: >>> find_primary_node: do_query returns no data >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 52999: >>> find_primary_node: primary node id is 1 >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: >>> reap_handler called >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: >>> reap_handler: call wait3 >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: child >>> 56206 exits with status 0 >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: child >>> 56205 exits with status 0 >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 ERROR: pid 52999: Child >>> process 56021 was terminated by segmentation fault >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 52999: worker >>> child 56021 exits with status 11 by signal 11 >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 52999: fork a >>> new worker child pid 56216 >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: >>> reap_handler: normally exited >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56216: I am 56216 >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56216: pool_ssl: >>> SSL requested but SSL support is not available >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56216: >>> s_do_auth: auth kind: 0 >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56216: >>> s_do_auth: parameter status data received >>> Feb 21 23:28:08 zaa last message repeated 10 times >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56216: >>> s_do_auth: backend key data received >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56216: >>> s_do_auth: transaction state: I >>> Feb 21 23:28:10 zaa pgpool: 2011-02-21 23:28:10 DEBUG: pid 52999: starting >>> health checking >>> Feb 21 23:28:10 zaa pgpool: 2011-02-21 23:28:10 DEBUG: pid 52999: >>> health_check: 0 th DB node status: 3 >>> Feb 21 23:28:10 zaa pgpool: 2011-02-21 23:28:10 DEBUG: pid 52999: >>> health_check: 1 th DB node status: 1 >>> ====================================================================== >>> >>> As you can see, at first pgpool correctly chooses storage1 node as master, >>> but >>> then for some reason it gets information about the master node (storage1) >>> unavailability: >>> >>> ====================================================================== >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 ERROR: pid 56021: >>> pool_read: read failed (Connection reset by peer) >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 56021: >>> notice_backend_error: 0 fail over request from pid 56021 >>> Feb 21 23:28:08 zaa postgres[56090]: [5-1] LOG: unexpected EOF on client >>> connection >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: >>> failover_handler called >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: >>> failover_handler: starting to select new master node >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 52999: starting >>> degeneration. shutdown host storage1(5432) >>> ====================================================================== >>> >>> slave node storage2 becomes the master node and health check shows storage1 >>> node as unavailable. >>> >>> If I run "pcp_attach_node -d 120 localhost 9898 postgres postgres 0" >>> command, >>> storage1 node again becomes the master node as it should be. >>> >>> Could someone please let me know what might be causing the issue? Is this a >>> bug >>> or I am doing something wrong? >>> >>> PS. One more interesting issue is presence of those two lines: >>> ====================================================================== >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 ERROR: pid 52999: Child >>> process 56021 was terminated by segmentation fault >>> Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 52999: worker >>> child 56021 exits with status 11 by signal 11 >>> ====================================================================== >>> >>> -- >>> Alexander Zhuravlev >>> http://ikato.com/ >>> _______________________________________________ >>> Pgpool-general mailing list >>> [email protected] >>> http://pgfoundry.org/mailman/listinfo/pgpool-general >> _______________________________________________ >> Pgpool-general mailing list >> [email protected] >> http://pgfoundry.org/mailman/listinfo/pgpool-general > _______________________________________________ > Pgpool-general mailing list > [email protected] > http://pgfoundry.org/mailman/listinfo/pgpool-general _______________________________________________ Pgpool-general mailing list [email protected] http://pgfoundry.org/mailman/listinfo/pgpool-general
