Thanks for the report. We are looking into this. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
> Hello, > > I am having a problem with selection of master node when I manually > attach slave node (with id #1) which came up after postgresql server reboot. > When I attach the slave, pgpool tries to select master node, and for > some reason reports that previously working master node (with id #0) is no > longer > available, marks the master node as unavailable, and selects the newly > attached node (id #1) as master. If after that I manually attach the node with > id #0 back to the pool it becomes a master. > > More detailed explanation. > > == Setup == > Two postgresql nodes (running postgresql 9.0.3 on FreeBSD 8.0): > Master node (storage1) and a slave one (storage2). > The system uses streaming replication with the slave node running in hot > standby mode. > pgpool-II-3.0.3-beta1 is installed on storage1 node with the following > configuration file: > > ====================================================================== > zaa:...local/etc# cat pgpool.conf | sort | grep -v '^#' | grep -v '^$' > authentication_timeout = 30 > backend_hostname0 = 'storage1' > backend_hostname1 = 'storage2' > backend_port0 = 5432 > backend_port1 = 5432 > backend_weight0 = 0.01 > backend_weight1 = 0.99 > black_function_list = 'nextval,setval' > child_idle_limit = 0 > child_life_time = 300 > child_max_connections = 0 > connection_cache = true > connection_life_time = 0 > debug_level = 1 > enable_pool_hba = false > fail_over_on_backend_error = false > failover_command = '' > health_check_period = 5 > health_check_user = 'pgpool' > listen_addresses = 'localhost' > load_balance_mode = true > log_connections = true > log_hostname = false > log_per_node_statement = true > log_standby_delay = 'always' > log_statement = true > master_slave_mode = true > master_slave_sub_mode = 'stream' > max_pool = 5 > num_init_children = 2 > parallel_mode = false > pid_file_name = '/var/run/pgpool/pgpool.pid' > port = 9999 > print_timestamp = true > replication_mode = false > reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT' > socket_dir = '/tmp' > white_function_list = '' > ====================================================================== > > I can successfully connect to local port 9999 and issue DML and select > queries. > > == Problem == > > 1. I shutdown postgresql on slave node storage2 and see the following > lines in syslog on the node with pgpool: > > ====================================================================== > Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 53354: > detect_stop_postmaster_error: receive admin shutdown error from a node. > Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 LOG: pid 53354: postmaster > on DB node 1 was shutdown by administrative command > Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 LOG: pid 53354: > notice_backend_error: 1 fail over request from pid 53354 > Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 52999: > failover_handler called > Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 52999: > failover_handler: starting to select new master node > Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 LOG: pid 52999: starting > degeneration. shutdown host storage2(5432) > Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 53354: child > received shutdown request signal 3 > Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 52999: > failover_handler: kill 53354 > Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 52999: > failover_handler: kill 53355 > Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 DEBUG: pid 53355: child > received shutdown request signal 3 > Feb 21 23:22:33 zaa pgpool: 2011-02-21 23:22:33 LOG: pid 52999: > failover_handler: set new master node: 0 > ... > Feb 21 23:22:34 zaa pgpool: 2011-02-21 23:22:34 DEBUG: pid 52999: starting > health checking > Feb 21 23:22:34 zaa pgpool: 2011-02-21 23:22:34 DEBUG: pid 52999: > health_check: 0 th DB node status: 2 > Feb 21 23:22:34 zaa pgpool: 2011-02-21 23:22:34 DEBUG: pid 52999: > health_check: 1 th DB node status: 3 > ====================================================================== > > As you can see, pgpool correctly marks storage2 node as being down. > > 2. After a while I start postgres on node storage2. > 3. In order to return the node back into the pool I run the following > command: > > pcp_attach_node -d 120 localhost 9898 postgres postgres 1 > > and see those lines in syslog on storage1: > > ====================================================================== > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 LOG: pid 53003: > send_failback_request: fail back 1 th node request from pid 53003 > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: > failover_handler called > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: > failover_handler: starting to select new master node > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 LOG: pid 52999: starting > fail back. reconnect host storage2(5432) > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: > failover_handler: kill 56131 > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 56131: child > received shutdown request signal 3 > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: > failover_handler: kill 56132 > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 56132: child > received shutdown request signal 3 > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 LOG: pid 52999: > failover_handler: set new master node: 0 > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 LOG: pid 52999: failback > done. reconnect host storage2(5432) > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 56205: I am 56205 > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: pool_ssl: > SSL requested but SSL support is not available > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 56206: I am 56206 > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: s_do_auth: > auth kind: 0 > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: s_do_auth: > parameter status data received > Feb 21 23:28:03 zaa last message repeated 10 times > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: s_do_auth: > backend key data received > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: s_do_auth: > transaction state: I > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: > kind: T > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: num_fileds: > 1 > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: > kind: D > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: > kind: C > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: > kind: Z > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: > kind: T > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: num_fileds: > 1 > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: > kind: D > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: > kind: C > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: do_query: > kind: Z > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 LOG: pid 52999: > find_primary_node: primary node id is 0 > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: > reap_handler called > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: > reap_handler: call wait3 > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: child 56132 > exits with status 0 > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: child 56131 > exits with status 0 > Feb 21 23:28:03 zaa pgpool: 2011-02-21 23:28:03 DEBUG: pid 52999: > reap_handler: normally exited > Feb 21 23:28:05 zaa pgpool: 2011-02-21 23:28:05 DEBUG: pid 52999: starting > health checking > Feb 21 23:28:05 zaa pgpool: 2011-02-21 23:28:05 DEBUG: pid 52999: > health_check: 0 th DB node status: 2 > Feb 21 23:28:05 zaa pgpool: 2011-02-21 23:28:05 DEBUG: pid 52999: > health_check: 1 th DB node status: 1 > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56021: do_query: > kind: T > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56021: num_fileds: > 1 > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56021: do_query: > kind: D > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56021: do_query: > kind: C > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56021: do_query: > kind: Z > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56021: do_query: > kind: E > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 ERROR: pid 56021: pool_read: > read failed (Connection reset by peer) > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 56021: > notice_backend_error: 0 fail over request from pid 56021 > Feb 21 23:28:08 zaa postgres[56090]: [5-1] LOG: unexpected EOF on client > connection > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: > failover_handler called > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: > failover_handler: starting to select new master node > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 52999: starting > degeneration. shutdown host storage1(5432) > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: > failover_handler: kill 56205 > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56205: child > received shutdown request signal 3 > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: > failover_handler: kill 56206 > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 52999: > failover_handler: set new master node: 1 > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56206: child > received shutdown request signal 3 > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 52999: failover > done. shutdown host storage1(5432) > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56213: I am 56213 > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56214: I am 56214 > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: pool_ssl: > SSL requested but SSL support is not available > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: s_do_auth: > auth kind: 0 > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: s_do_auth: > parameter status data received > Feb 21 23:28:08 zaa last message repeated 10 times > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: s_do_auth: > backend key data received > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: s_do_auth: > transaction state: I > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: > kind: T > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: num_fileds: > 1 > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: > kind: D > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: > kind: C > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: > kind: Z > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: > kind: T > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: num_fileds: > 1 > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: > kind: E > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: do_query: > kind: Z > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 52999: > find_primary_node: do_query returns no rows > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 52999: > find_primary_node: do_query returns no data > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 52999: > find_primary_node: primary node id is 1 > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: > reap_handler called > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: > reap_handler: call wait3 > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: child 56206 > exits with status 0 > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: child 56205 > exits with status 0 > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 ERROR: pid 52999: Child > process 56021 was terminated by segmentation fault > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 52999: worker > child 56021 exits with status 11 by signal 11 > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 52999: fork a new > worker child pid 56216 > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: > reap_handler: normally exited > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56216: I am 56216 > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56216: pool_ssl: > SSL requested but SSL support is not available > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56216: s_do_auth: > auth kind: 0 > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56216: s_do_auth: > parameter status data received > Feb 21 23:28:08 zaa last message repeated 10 times > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56216: s_do_auth: > backend key data received > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 56216: s_do_auth: > transaction state: I > Feb 21 23:28:10 zaa pgpool: 2011-02-21 23:28:10 DEBUG: pid 52999: starting > health checking > Feb 21 23:28:10 zaa pgpool: 2011-02-21 23:28:10 DEBUG: pid 52999: > health_check: 0 th DB node status: 3 > Feb 21 23:28:10 zaa pgpool: 2011-02-21 23:28:10 DEBUG: pid 52999: > health_check: 1 th DB node status: 1 > ====================================================================== > > As you can see, at first pgpool correctly chooses storage1 node as master, but > then for some reason it gets information about the master node (storage1) > unavailability: > > ====================================================================== > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 ERROR: pid 56021: pool_read: > read failed (Connection reset by peer) > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 56021: > notice_backend_error: 0 fail over request from pid 56021 > Feb 21 23:28:08 zaa postgres[56090]: [5-1] LOG: unexpected EOF on client > connection > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: > failover_handler called > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 DEBUG: pid 52999: > failover_handler: starting to select new master node > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 52999: starting > degeneration. shutdown host storage1(5432) > ====================================================================== > > slave node storage2 becomes the master node and health check shows storage1 > node as unavailable. > > If I run "pcp_attach_node -d 120 localhost 9898 postgres postgres 0" command, > storage1 node again becomes the master node as it should be. > > Could someone please let me know what might be causing the issue? Is this a > bug > or I am doing something wrong? > > PS. One more interesting issue is presence of those two lines: > ====================================================================== > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 ERROR: pid 52999: Child > process 56021 was terminated by segmentation fault > Feb 21 23:28:08 zaa pgpool: 2011-02-21 23:28:08 LOG: pid 52999: worker > child 56021 exits with status 11 by signal 11 > ====================================================================== > > -- > Alexander Zhuravlev > http://ikato.com/ > _______________________________________________ > Pgpool-general mailing list > [email protected] > http://pgfoundry.org/mailman/listinfo/pgpool-general _______________________________________________ Pgpool-general mailing list [email protected] http://pgfoundry.org/mailman/listinfo/pgpool-general
