>> So it seems we need to wait for find_primary_node_repeatedly to finish
>> before we issue pcp_attach_node. This suggests that your fix might not
>> be appropriate, because it does not correspond to this "timing" issue.
>>
>> I'm going to keep on looking into this...
>
> That's actually not the issue I'm talking about. I'm on V3.0, with a
> single backend and no failover script. See my config in attachment.
Ok. pgpool-II-3.0-stable does not use find_primary_node_repeatedly() and
does not have the problem I'm talking about.

> When I do pcp_detach_node, I get this:
>
> 2011-05-25 20:24:12 LOG: pid 31861: notice_backend_error: 0 fail over
> request from pid 31861
> 2011-05-25 20:24:12 LOG: pid 31828: starting degeneration. shutdown
> host localhost(5432)
> 2011-05-25 20:24:12 ERROR: pid 31828: failover_handler: no valid DB node
> found
> 2011-05-25 20:24:12 LOG: pid 31828: failover done. shutdown host
> localhost(5432)
>
> Which seems fine to me. Then I do pcp_attach_node, and I get this:
>
> 2011-05-25 20:25:23 LOG: pid 31861: send_failback_request: fail back 0
> th node request from pid 31861
> 2011-05-25 20:25:23 ERROR: pid 31861: send_failback_request: node 0 is
> alive.
>
> I was mistaken about the "node 0 is alive" message. I thought it meant
> that node 0 is NOW up. What it really means is that pgpool thought it
> was ALREADY alive (hence the ERROR message level in the
> send_failback_request function). Digging harder into this issue, I
> finally found that the VALID_BACKEND macro returns true when it should
> return false. Actually, there is already this comment in
> get_next_master_node():
>
> /*
>  * Do not use VALID_BACKEND macro in raw mode.
>  * VALID_BACKEND return true only if the argument is master
>  * node id. In other words, standby nodes are false. So need
>  * to check backend status without VALID_BACKEND.
>  */
>
> And I'm actually in raw mode. VALID_BACKEND is used so widely that it
> would be really dangerous to change it, so I'm not sure what we should
> do here. I have a patch that fixes my issue cleanly, though I'm not
> sure it's the best way to do it. See the patch in attachment.

My suggestion is to leave this as it is for 3.0.4. I think we need more
time to investigate it. Let's continue the work after 3.0.4 is released.
We already have critical issues such as "unnamed statement not found"
with 3.0.3, and I have personally sent the 3.0-STABLE CVS tarball to
users who were troubled by this issue, at their request. If we delay the
3.0.4 release, more and more of these questions/requests will come in. I
don't want to be troubled...

> BTW, when I do a pcp_attach_node, I get status 2, but it didn't check
> whether a PostgreSQL backend was actually available. Not sure we want
> to do something about this too. Why doesn't it check if the backend is
> available? It doesn't do so at startup either. I find this really
> weird, but I'm sure there is a reason.

It's a design decision. pcp_attach_node is supposed to be used by a
human (or a smart management tool), and he/she should know what he/she
is doing. That said, he/she should make sure the backend is actually
usable: merely being up and running is not enough. For example, in
replication mode, it must be synched with the other backends before
pcp_attach_node is used.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
_______________________________________________
Pgpool-hackers mailing list
[email protected]
http://pgfoundry.org/mailman/listinfo/pgpool-hackers
