> On Fri, 2011-06-10 at 20:20 +0200, Guillaume Lelarge wrote:
>> On Thu, 2011-05-26 at 09:06 +0200, Guillaume Lelarge wrote:
>> > On 05/26/2011 12:57 AM, Tatsuo Ishii wrote:
>> > [...]
>> > >> When I do the pcp_detach_node, I have this:
>> > >>
>> > >> 2011-05-25 20:24:12 LOG: pid 31861: notice_backend_error: 0 fail over
>> > >> request from pid 31861
>> > >> 2011-05-25 20:24:12 LOG: pid 31828: starting degeneration. shutdown
>> > >> host localhost(5432)
>> > >> 2011-05-25 20:24:12 ERROR: pid 31828: failover_handler: no valid DB node
>> > >> found
>> > >> 2011-05-25 20:24:12 LOG: pid 31828: failover done. shutdown host
>> > >> localhost(5432)
>> > >>
>> > >> Which seems fine to me. Then I do the pcp_attach_node, and I got this:
>> > >>
>> > >> 2011-05-25 20:25:23 LOG: pid 31861: send_failback_request: fail back 0
>> > >> th node request from pid 31861
>> > >> 2011-05-25 20:25:23 ERROR: pid 31861: send_failback_request: node 0 is
>> > >> alive.
>> > >>
>> > >> I was mistaken on the "node 0 is alive" message. I thought it means that
>> > >> node 0 is NOW up. What it really means is that pgpool thought it was
>> > >> ALREADY alive (hence the ERROR message level on the
>> > >> send_failback_request function). Digging harder on this issue, I finally
>> > >> found that the VALID_BACKEND macro returns true when it should return
>> > >> false. Actually, there is already this comment in
>> > >> get_next_master_node():
>> > >>
>> > >> /*
>> > >>  * Do not use VALID_BACKEND macro in raw mode.
>> > >>  * VALID_BACKEND return true only if the argument is master
>> > >>  * node id. In other words, standby nodes are false. So need
>> > >>  * to check backend status without VALID_BACKEND.
>> > >>  */
>> > >>
>> > >> And I'm actually in raw mode. VALID_BACKEND is used so much it would be
>> > >> really dangerous to change it. So, I'm not sure what we really should do
>> > >> here. I've got a patch that fixes my issue cleanly, not sure it's the
>> > >> best way to do this. See the patch in attachment.
>> > >
>> > > My suggestion is, leave this as it is for 3.0.4. I think we need more
>> > > time to investigate it. Let's continue the work after 3.0.4 released.
>> > > We already have critical issues such as "unnamed statement not found"
>> > > with 3.0.3, and I have personally sent to users who were troubled by
>> > > this issue the 3.0-STABLE CVS tar ball by their request. If we delay
>> > > the 3.0.4 release, more and more this kind of questions/requests will
>> > > be coming. I don't want to be troubled...
>> > >
>> >
>> > I agree. I have no problem with dealing with this for 3.0.5, or even 3.1.
>> >
>>
>> Now that 3.0.4 is out, maybe it's the right time to work on this.
>>
>> This issue is really a bad one. I had this week a mail from one of our
>> customers, complaining that the online recovery process doesn't work
>> because it thinks the node is still alive. And guess what... it uses the
>> VALID_BACKEND, even if pgpool was working in raw mode.
>>
>> What could we do about this? My patch fixes the previous error, but not
>> this one. I now would be more in favor of a VALID_RAW_BACKEND macro.
>>
>
> No comments on this? meaning I finish my patch and commit it? or meaning
> we don't care about that issue? :)

Can you please explain why you use raw mode *and* online recovery
together? To be honest I have not thought about such a use case.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
_______________________________________________
Pgpool-hackers mailing list
http://pgfoundry.org/mailman/listinfo/pgpool-hackers
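
Note on the quoted discussion: the behaviour described in the get_next_master_node() comment, and the VALID_RAW_BACKEND idea proposed in the thread, can be illustrated with a small model. This is only a sketch under stated assumptions. It is not pgpool's actual source: the variables raw_mode, master_node_id and backend_status, the two-value status enum, and the body of both macros are hypothetical stand-ins chosen to mirror what the thread says, namely that in raw mode VALID_BACKEND answers "is this the master node id?" rather than "is this node up?".

```c
#include <stdbool.h>

/* Hypothetical, simplified model. None of this is pgpool's real code. */
enum backend_status { CON_DOWN, CON_UP };

#define NUM_BACKENDS 2

static enum backend_status backend_status[NUM_BACKENDS] = { CON_UP, CON_UP };
static bool raw_mode = true;       /* assumed: pool is running in raw mode */
static int master_node_id = 0;     /* assumed: node 0 is the master        */

/*
 * Models the behaviour described in the quoted comment: in raw mode the
 * result depends only on whether the id is the master node id, so the
 * recorded status of the node is never consulted.
 */
#define VALID_BACKEND(id) \
    (raw_mode ? ((id) == master_node_id) : (backend_status[(id)] == CON_UP))

/*
 * Hypothetical raw-mode-safe variant in the spirit of the proposed
 * VALID_RAW_BACKEND: always look at the recorded status.
 */
#define VALID_RAW_BACKEND(id) (backend_status[(id)] == CON_UP)

/* Stand-in for what a detach does to the model: mark the node as down. */
static void detach_node(int id)
{
    backend_status[id] = CON_DOWN;
}
```

In this model, after detach_node(0) the expression VALID_BACKEND(0) is still true while VALID_RAW_BACKEND(0) is false, which matches the symptom in the thread: a later attach request is rejected because the node looks alive. Whether a separate macro or a change to the existing one is the better fix is exactly the open question in the discussion.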
