Hi Ken,
Hi Jehan,

>> 1. Return a "hard" error such as OCF_ERR_ARGS or OCF_ERR_PERM. When
>> Pacemaker gets one of these errors from an agent, it will ban the
>> resource from that node (until the failure is cleared).

Okay!
I will test this fix next week.

Best Regards,
Hideo Yamauchi.

----- Original Message -----
> From: Jehan-Guillaume de Rorthais <[email protected]>
> To: Ken Gaillot <[email protected]>
> Cc: Cluster Labs - All topics related to open-source clustering welcomed <[email protected]>
> Date: 2017/2/4, Sat 01:02
> Subject: Re: [ClusterLabs] [Question] About a change of crm_failcount.
>
> On Fri, 3 Feb 2017 09:45:18 -0600
> Ken Gaillot <[email protected]> wrote:
>
>> On 02/02/2017 12:33 PM, Ken Gaillot wrote:
>> > On 02/02/2017 12:23 PM, [email protected] wrote:
>> >> Hi All,
>> >>
>> >> By the next correction, the user was not able to set a value except zero
>> >> in crm_failcount.
>> >>
>> >> - [Fix: tools: implement crm_failcount command-line options correctly]
>> >> - https://github.com/ClusterLabs/pacemaker/commit/95db10602e8f646eefed335414e40a994498cafd#diff-6e58482648938fd488a920b9902daac4
>> >>
>> >> However, pgsql RA sets INFINITY in a script.
>> >>
>> >> ```
>> >> (snip)
>> >> CRM_FAILCOUNT="${HA_SBIN_DIR}/crm_failcount"
>> >> (snip)
>> >> ocf_exit_reason "My data is newer than new master's one. New master's location : $master_baseline"
>> >> exec_with_retry 0 $CRM_FAILCOUNT -r $OCF_RESOURCE_INSTANCE -U $NODENAME -v INFINITY
>> >> return $OCF_ERR_GENERIC
>> >> (snip)
>> >> ```
>> >>
>> >> There seems to be the influence only in pgsql somehow or other.
>> >>
>> >> Can you revise it to set a value except zero in crm_failcount?
>> >> We make modifications to use crm_attribute in pgsql RA if we cannot revise
>> >> it.
>> >>
>> >> Best Regards,
>> >> Hideo Yamauchi.
>> >
>> > Hmm, I didn't realize that was used. I changed it because it's not a
>> > good idea to set fail-count without also changing last-failure and
>> > having a failed op in the LRM history. I'll have to think about what the
>> > best alternative is.
>>
>> Having a resource agent modify its own fail count is not a good idea,
>> and could lead to unpredictable behavior. I didn't realize the pgsql
>> agent did that.
>>
>> I don't want to re-enable the functionality, because I don't want to
>> encourage more agents doing this.
>>
>> There are two alternatives the pgsql agent can choose from:
>>
>> 1. Return a "hard" error such as OCF_ERR_ARGS or OCF_ERR_PERM. When
>> Pacemaker gets one of these errors from an agent, it will ban the
>> resource from that node (until the failure is cleared).
>>
>> 2. Use crm_resource --ban instead. This would ban the resource from that
>> node until the user removes the ban with crm_resource --clear (or by
>> deleting the ban constraint from the configuration).
>>
>> I'd recommend #1 since it does not require any pacemaker-specific tools.
>>
>> We can make sure resource-agents has a fix for this before we release a
>> new version of Pacemaker. We'll have to publicize as much as possible to
>> pgsql users that they should upgrade resource-agents before or at the
>> same time as pacemaker. I see the alternative PAF agent has the same
>> usage, so it will need to be updated, too.
>
> Yes, I was following this conversation.
>
> I'll do the fix on our side.
>
> Thank you!
>
> _______________________________________________
> Users mailing list: [email protected]
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
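
For reference, alternative 1 from the quoted message (returning a "hard" error instead of setting the fail count to INFINITY) might look roughly like the sketch below. This is not the actual pgsql RA fix. The helper name reject_newer_local_data is hypothetical, the choice of OCF_ERR_ARGS over OCF_ERR_PERM is only an assumption for illustration, and the fallback definitions are there only so the snippet runs without ocf-shellfuncs; a real agent gets them by sourcing that file.

```shell
#!/bin/sh
# Sketch only. A real agent gets these return codes from ocf-shellfuncs;
# the defaults below are the standard OCF numeric values.
: "${OCF_ERR_GENERIC:=1}"
: "${OCF_ERR_ARGS:=2}"
: "${OCF_ERR_PERM:=4}"

# Stand-in for ocf_exit_reason when ocf-shellfuncs has not been sourced.
if ! command -v ocf_exit_reason >/dev/null 2>&1; then
    ocf_exit_reason() { echo "exit reason: $*" >&2; }
fi

# Hypothetical helper: report why the node cannot run the resource and
# return a hard error, instead of calling crm_failcount -v INFINITY and
# returning OCF_ERR_GENERIC.
reject_newer_local_data() {
    master_baseline="$1"
    ocf_exit_reason "My data is newer than new master's one. New master's location : $master_baseline"
    return "$OCF_ERR_ARGS"
}
```

In the agent, the call site would then be something like `reject_newer_local_data "$master_baseline"; return $?` in place of the exec_with_retry/crm_failcount line, so no Pacemaker-specific tool is needed.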
