On 05/26/2011 12:57 AM, Tatsuo Ishii wrote:
>>> So it seems we need to wait for find_primary_node_repeatedly to finish
>>> before we issue pcp_attach_node. This suggests that your fix might not
>>> be appropriate because your fix does not correspond to this "timing"
>>> issue.
>>>
>>> I'm going to keep on looking into this...
>>
>> That's actually not the issue I'm talking about. I'm in V3.0, with a
>> single backend, no failover script. See my config in attachment.
>
> Ok. pgpool-II-3.0-stable does not use find_primary_node_repeatedly()
> and it does not have the problem I'm talking about.
>
Yes.

>> When I do the pcp_detach_node, I have this:
>>
>> 2011-05-25 20:24:12 LOG: pid 31861: notice_backend_error: 0 fail over
>> request from pid 31861
>> 2011-05-25 20:24:12 LOG: pid 31828: starting degeneration. shutdown
>> host localhost(5432)
>> 2011-05-25 20:24:12 ERROR: pid 31828: failover_handler: no valid DB node
>> found
>> 2011-05-25 20:24:12 LOG: pid 31828: failover done. shutdown host
>> localhost(5432)
>>
>> Which seems fine to me. Then I do the pcp_attach_node, and I get this:
>>
>> 2011-05-25 20:25:23 LOG: pid 31861: send_failback_request: fail back 0
>> th node request from pid 31861
>> 2011-05-25 20:25:23 ERROR: pid 31861: send_failback_request: node 0 is
>> alive.
>>
>> I was mistaken about the "node 0 is alive" message. I thought it meant
>> that node 0 is NOW up. What it really means is that pgpool thought it was
>> ALREADY alive (hence the ERROR level used by the send_failback_request
>> function). Digging deeper into this issue, I finally found that the
>> VALID_BACKEND macro returns true when it should return false. Actually,
>> there is already this comment in get_next_master_node():
>>
>> /*
>>  * Do not use VALID_BACKEND macro in raw mode.
>>  * VALID_BACKEND return true only if the argument is master
>>  * node id. In other words, standby nodes are false. So need
>>  * to check backend status without VALID_BACKEND.
>>  */
>>
>> And I'm actually in raw mode. VALID_BACKEND is used so widely that it
>> would be really dangerous to change it. So I'm not sure what we should
>> really do here. I've got a patch that fixes my issue cleanly; I'm not
>> sure it's the best way to do this. See the patch in attachment.
>
> My suggestion is to leave this as it is for 3.0.4. I think we need more
> time to investigate it. Let's continue the work after 3.0.4 is released.
> We already have critical issues such as "unnamed statement not found"
> with 3.0.3, and I have personally sent the 3.0-STABLE CVS tarball to
> users who were troubled by this issue, at their request. If we delay the
> 3.0.4 release, more and more questions/requests of this kind will be
> coming. I don't want to be troubled...

I agree. I have no problem with dealing with this in 3.0.5, or even 3.1.
(A rough sketch of the kind of raw-mode check I mean is at the end of
this mail.) If you have a list of open items for 3.0.4, could you share
it so that we can help you close some of them?

>> BTW, when I do a pcp_attach_node, I get the status 2, but it didn't
>> check whether there was a PostgreSQL backend available. Not sure we
>> want to do something about this too. Why doesn't it check if the
>> backend is available? It doesn't do so at startup either. I find this
>> really weird, but I'm sure there is a reason.
>
> It's a design decision. pcp_attach_node is supposed to be used by a
> human (or a smart management tool) and he/she should know what he/she
> is doing. That is, he/she should make sure the backend is actually
> usable: just being up and running is not enough. For example, in
> replication mode, it must be synched with the other backends before
> pcp_attach_node is used.

Fair enough. I didn't check the docs, but they should say so. I will
look into this.
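For reference, here is a minimal sketch of the idea, not the attached
patch itself: node_is_alive() is a hypothetical helper, and the sketch
assumes the existing RAW_MODE, BACKEND_INFO, VALID_BACKEND macros and
the CON_UP/CON_CONNECT_WAIT status values.

static bool node_is_alive(int node_id)
{
	if (RAW_MODE)
	{
		/*
		 * Raw mode: look at the status flag of this node directly,
		 * since VALID_BACKEND only reports true for the master node
		 * id (see the comment in get_next_master_node()).
		 */
		return (BACKEND_INFO(node_id).backend_status == CON_UP ||
				BACKEND_INFO(node_id).backend_status == CON_CONNECT_WAIT);
	}

	/* Other modes: VALID_BACKEND behaves as expected. */
	return VALID_BACKEND(node_id);
}

send_failback_request() could then call node_is_alive(node_id) instead
of VALID_BACKEND(node_id) when deciding whether to raise the "node 0 is
alive" error, without touching VALID_BACKEND itself.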
--
Guillaume
 http://www.postgresql.fr
 http://dalibo.com
_______________________________________________
Pgpool-hackers mailing list
[email protected]
http://pgfoundry.org/mailman/listinfo/pgpool-hackers