Re: [Pgpool-hackers] pgpool-II 3.0.4 release

Tatsuo Ishii Mon, 27 Jun 2011 15:25:00 -0700

> On Thu, 2011-06-16 at 18:11 +0200, Guillaume Lelarge wrote:
>> On Wed, 2011-06-15 at 19:31 +0900, Tatsuo Ishii wrote:
>> > > On Fri, 2011-06-10 at 20:20 +0200, Guillaume Lelarge wrote:
>> > >> On Thu, 2011-05-26 at 09:06 +0200, Guillaume Lelarge wrote:
>> > >> > On 05/26/2011 12:57 AM, Tatsuo Ishii wrote:
>> > >> > [...]
>> > >> > >> When I do the pcp_detach_node, I have this:
>> > >> > >>
>> > >> > >> 2011-05-25 20:24:12 LOG:   pid 31861: notice_backend_error: 0 fail over request from pid 31861
>> > >> > >> 2011-05-25 20:24:12 LOG:   pid 31828: starting degeneration. shutdown host localhost(5432)
>> > >> > >> 2011-05-25 20:24:12 ERROR: pid 31828: failover_handler: no valid DB node found
>> > >> > >> 2011-05-25 20:24:12 LOG:   pid 31828: failover done. shutdown host localhost(5432)
>> > >> > >>
>> > >> > >> Which seems fine to me. Then I do the pcp_attach_node, and I got this:
>> > >> > >>
>> > >> > >> 2011-05-25 20:25:23 LOG:   pid 31861: send_failback_request: fail back 0 th node request from pid 31861
>> > >> > >> 2011-05-25 20:25:23 ERROR: pid 31861: send_failback_request: node 0 is alive.
>> > >> > >>
>> > >> > >> I was mistaken on the "node 0 is alive" message. I thought it meant
>> > >> > >> that node 0 is NOW up. What it really means is that pgpool thought
>> > >> > >> it was ALREADY alive (hence the ERROR message level in the
>> > >> > >> send_failback_request function). Digging deeper into this issue, I
>> > >> > >> finally found that the VALID_BACKEND macro returns true when it
>> > >> > >> should return false. Actually, there is already this comment in
>> > >> > >> get_next_master_node():
>> > >> > >>
>> > >> > >>         /*
>> > >> > >>          * Do not use VALID_BACKEND macro in raw mode.
>> > >> > >>          * VALID_BACKEND return true only if the argument is master
>> > >> > >>          * node id. In other words, standby nodes are false. So need
>> > >> > >>          * to check backend status without VALID_BACKEND.
>> > >> > >>          */
>> > >> > >>
>> > >> > >> And I'm actually in raw mode. VALID_BACKEND is used so much that it
>> > >> > >> would be really dangerous to change it. So, I'm not sure what we
>> > >> > >> really should do here. I've got a patch that fixes my issue cleanly,
>> > >> > >> but I'm not sure it's the best way to do this. See the attached patch.
>> > >> > > 
>> > >> > > My suggestion is to leave this as it is for 3.0.4. I think we need
>> > >> > > more time to investigate it. Let's continue the work after 3.0.4 is
>> > >> > > released. We already have critical issues such as "unnamed statement
>> > >> > > not found" with 3.0.3, and I have personally sent the 3.0-STABLE CVS
>> > >> > > tarball to users troubled by this issue, at their request. If we
>> > >> > > delay the 3.0.4 release, more and more questions/requests of this
>> > >> > > kind will come in. I don't want to be troubled...
>> > >> > > 
>> > >> > 
>> > >> > I agree. I have no problem with dealing with this for 3.0.5, or even 3.1.
>> > >> > 
>> > >> 
>> > >> Now that 3.0.4 is out, maybe it's the right time to work on this.
>> > >> 
>> > >> This issue is really a bad one. This week I had a mail from one of our
>> > >> customers, complaining that the online recovery process doesn't work
>> > >> because it thinks the node is still alive. And guess what... it uses the
>> > >> VALID_BACKEND macro, even though pgpool was working in raw mode.
>> > >> 
>> > >> What could we do about this? My patch fixes the previous error, but not
>> > >> this one. I now would be more in favor of a VALID_RAW_BACKEND macro.
>> > >> 
>> > > 
>> > > No comments on this? meaning I finish my patch and commit it? or meaning
>> > > we don't care about that issue? :)
>> > 
>> > Can you please explain why you use raw mode *and* online recovery
>> > together? To be honest I have not thought about such a use case.
>> 
>> Yeah, I tried a few things and I didn't find a way to reproduce that
>> behaviour. I sent an email to my customer to know more about this issue.
>> 
> 
> Seems I really can't reproduce the issue.
> 
>> Anyway, my first patch still applies. The one in the mail sent Wed, 25
>> May 2011 22:13:17 +0200, on this thread.
>> 
> 
> Tatsuo, any new comments on that mail?


Your patches look good to me.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
_______________________________________________
Pgpool-hackers mailing list
[email protected]
http://pgfoundry.org/mailman/listinfo/pgpool-hackers
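
The raw-mode behaviour discussed in the quoted thread can be illustrated
with a small toy model. This is only a sketch and is not pgpool-II source:
the struct, the enum and both function names are hypothetical, and the
first function merely follows the wording of the quoted comment from
get_next_master_node() (in raw mode, true only for the master node id).
The second function is a plain status check, in the spirit of the
VALID_RAW_BACKEND idea mentioned in the thread.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not pgpool-II definitions. */
enum node_status { NODE_DOWN = 0, NODE_UP = 1 };

struct toy_pool {
    bool raw_mode;               /* true when running in raw mode */
    int master_node_id;          /* id of the master node */
    enum node_status status[4];  /* per-node status */
};

/*
 * Models the behaviour described in the quoted comment: in raw mode the
 * result depends only on whether id is the master node id, so the node's
 * actual status is never consulted.
 */
static bool toy_valid_backend(const struct toy_pool *p, int id)
{
    if (p->raw_mode)
        return id == p->master_node_id;
    return p->status[id] == NODE_UP;
}

/* Status-only check: looks at the node status regardless of mode. */
static bool toy_status_is_up(const struct toy_pool *p, int id)
{
    return p->status[id] == NODE_UP;
}
```

Under this model, with raw_mode set, master_node_id 0 and status[0] set to
NODE_DOWN (a detached single node, assuming the master node id stays 0
after the detach), toy_valid_backend(&p, 0) still returns true while
toy_status_is_up(&p, 0) returns false. That mismatch is one way to read
the "node 0 is alive" error reported on attach.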
