Re: [ClusterLabs] [Question] About a change of crm_failcount.
On Thu, 09 Feb 2017 18:04:41 +0100
wf...@niif.hu (Ferenc Wágner) wrote:

> Jehan-Guillaume de Rorthais writes:
>
> > PAF use private attribute to give informations between actions. We
> > detect the failure during the notify as well, but raise the error
> > during the promotion itself. See how I dealt with this in PAF:
> >
> > https://github.com/ioguix/PAF/commit/6123025ff7cd9929b56c9af2faaefdf392886e68
>
> This is the first time I hear about private attributes. Since they
> could come useful one day, I'd like to understand them better. After
> some reading, they seem to be node attributes, not resource attributes.
> This may be irrelevant for PAF, but doesn't it mean that two resources
> of the same type on the same node would interfere with each other?
> Also, your _set_priv_attr could fall into an infinite loop if another
> instance used it at the inappropriate moment. Do I miss something here?

No, you are perfectly right. We are aware of this, this is something we need
to fix in the next release of PAF (I was actually discussing this with a user
2 days ago on IRC :)).

Thank you for the report!
--
Jehan-Guillaume de Rorthais
Dalibo

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
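To illustrate the collision raised in the message above: because private attributes are node attributes, two resources of the same type on one node would share an attribute if the agent uses a fixed name. One possible way around that, shown here only as a sketch, is to fold the resource instance into the attribute name. The helper priv_attr_name and the attribute name my_flag are hypothetical and are not taken from PAF (which is written in Perl); the attrd_updater call in the comment assumes a Pacemaker version that supports --private.

```shell
#!/bin/sh
# Hypothetical helper (not PAF code): build a per-resource name for a
# private node attribute so that two instances on the same node do not
# read or overwrite each other's value.
#   $1: logical attribute name
#   $2: resource instance (defaults to OCF_RESOURCE_INSTANCE, which
#       Pacemaker exports to the agent)
priv_attr_name() {
    printf '%s-%s\n' "$1" "${2:-${OCF_RESOURCE_INSTANCE:-}}"
}

# Possible use inside an agent (assumption, left commented out so the
# sketch runs without a cluster):
#   attrd_updater --private --name "$(priv_attr_name my_flag)" --update 1
```

With two resources named pgsqld-a and pgsqld-b, the same logical attribute maps to my_flag-pgsqld-a and my_flag-pgsqld-b, so the two no longer interfere.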
Re: [ClusterLabs] [Question] About a change of crm_failcount.
On Fri, 3 Feb 2017 09:45:18 -0600
Ken Gaillot wrote:

> On 02/02/2017 12:33 PM, Ken Gaillot wrote:
> > On 02/02/2017 12:23 PM, renayama19661...@ybb.ne.jp wrote:
> >> Hi All,
> >>
> >> By the next correction, the user was not able to set a value except zero
> >> in crm_failcount.
> >>
> >> - [Fix: tools: implement crm_failcount command-line options correctly]
> >>   - https://github.com/ClusterLabs/pacemaker/commit/95db10602e8f646eefed335414e40a994498cafd#diff-6e58482648938fd488a920b9902daac4
> >>
> >> However, pgsql RA sets INFINITY in a script.
> >>
> >> ```
> >> (snip)
> >> CRM_FAILCOUNT="${HA_SBIN_DIR}/crm_failcount"
> >> (snip)
> >> ocf_exit_reason "My data is newer than new master's one. New master's location : $master_baseline"
> >> exec_with_retry 0 $CRM_FAILCOUNT -r $OCF_RESOURCE_INSTANCE -U $NODENAME -v INFINITY
> >> return $OCF_ERR_GENERIC
> >> (snip)
> >> ```
> >>
> >> There seems to be the influence only in pgsql somehow or other.
> >>
> >> Can you revise it to set a value except zero in crm_failcount?
> >> We make modifications to use crm_attribute in pgsql RA if we cannot revise
> >> it.
> >>
> >> Best Regards,
> >> Hideo Yamauchi.
> >
> > Hmm, I didn't realize that was used. I changed it because it's not a
> > good idea to set fail-count without also changing last-failure and
> > having a failed op in the LRM history. I'll have to think about what the
> > best alternative is.
>
> Having a resource agent modify its own fail count is not a good idea,
> and could lead to unpredictable behavior. I didn't realize the pgsql
> agent did that.
>
> I don't want to re-enable the functionality, because I don't want to
> encourage more agents doing this.
>
> There are two alternatives the pgsql agent can choose from:
>
> 1. Return a "hard" error such as OCF_ERR_ARGS or OCF_ERR_PERM. When
> Pacemaker gets one of these errors from an agent, it will ban the
> resource from that node (until the failure is cleared).
>
> 2. Use crm_resource --ban instead. This would ban the resource from that
> node until the user removes the ban with crm_resource --clear (or by
> deleting the ban constraint from the configuration).
>
> I'd recommend #1 since it does not require any pacemaker-specific tools.
>
> We can make sure resource-agents has a fix for this before we release a
> new version of Pacemaker. We'll have to publicize as much as possible to
> pgsql users that they should upgrade resource-agents before or at the
> same time as pacemaker. I see the alternative PAF agent has the same
> usage, so it will need to be updated, too.

Yes, I was following this conversation. I'll do the fix on our side.

Thank you!
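The second alternative quoted above (crm_resource --ban, later undone with crm_resource --clear) might look roughly like this. It is a sketch only: the wrapper functions run, ban_resource and clear_ban, the DRY_RUN switch, and the resource and node names are hypothetical and exist so the commands can be inspected without a cluster. Only the crm_resource options --ban, --clear, --resource and --node are assumed from the tool itself.

```shell
#!/bin/sh
# Print commands instead of executing them unless DRY_RUN=0 is set,
# so the sketch can be tried on a machine without Pacemaker.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Ban resource $1 from node $2. The ban stays in place until it is
# cleared explicitly (or the constraint is deleted from the configuration).
ban_resource() {
    run crm_resource --ban --resource "$1" --node "$2"
}

# Remove the ban on resource $1 for node $2.
clear_ban() {
    run crm_resource --clear --resource "$1" --node "$2"
}
```

Calling ban_resource pgsql node1 in dry-run mode prints the crm_resource command line; with DRY_RUN=0 it would execute it. Unlike a returned error code, this ties the agent to a Pacemaker-specific tool, which is the reason given in the thread for preferring the first alternative.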
Re: [ClusterLabs] [Question] About a change of crm_failcount.
On 02/02/2017 12:33 PM, Ken Gaillot wrote:
> On 02/02/2017 12:23 PM, renayama19661...@ybb.ne.jp wrote:
>> Hi All,
>>
>> By the next correction, the user was not able to set a value except zero in
>> crm_failcount.
>>
>> - [Fix: tools: implement crm_failcount command-line options correctly]
>>   - https://github.com/ClusterLabs/pacemaker/commit/95db10602e8f646eefed335414e40a994498cafd#diff-6e58482648938fd488a920b9902daac4
>>
>> However, pgsql RA sets INFINITY in a script.
>>
>> ```
>> (snip)
>> CRM_FAILCOUNT="${HA_SBIN_DIR}/crm_failcount"
>> (snip)
>> ocf_exit_reason "My data is newer than new master's one. New master's location : $master_baseline"
>> exec_with_retry 0 $CRM_FAILCOUNT -r $OCF_RESOURCE_INSTANCE -U $NODENAME -v INFINITY
>> return $OCF_ERR_GENERIC
>> (snip)
>> ```
>>
>> There seems to be the influence only in pgsql somehow or other.
>>
>> Can you revise it to set a value except zero in crm_failcount?
>> We make modifications to use crm_attribute in pgsql RA if we cannot revise
>> it.
>>
>> Best Regards,
>> Hideo Yamauchi.
>
> Hmm, I didn't realize that was used. I changed it because it's not a
> good idea to set fail-count without also changing last-failure and
> having a failed op in the LRM history. I'll have to think about what the
> best alternative is.

Having a resource agent modify its own fail count is not a good idea,
and could lead to unpredictable behavior. I didn't realize the pgsql
agent did that.

I don't want to re-enable the functionality, because I don't want to
encourage more agents doing this.

There are two alternatives the pgsql agent can choose from:

1. Return a "hard" error such as OCF_ERR_ARGS or OCF_ERR_PERM. When
Pacemaker gets one of these errors from an agent, it will ban the
resource from that node (until the failure is cleared).

2. Use crm_resource --ban instead. This would ban the resource from that
node until the user removes the ban with crm_resource --clear (or by
deleting the ban constraint from the configuration).

I'd recommend #1 since it does not require any pacemaker-specific tools.

We can make sure resource-agents has a fix for this before we release a
new version of Pacemaker. We'll have to publicize as much as possible to
pgsql users that they should upgrade resource-agents before or at the
same time as pacemaker. I see the alternative PAF agent has the same
usage, so it will need to be updated, too.
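As a rough sketch of alternative #1 from the message above, the agent could drop the crm_failcount call and return a hard error directly. The function name reject_stale_master is hypothetical, and the choice of OCF_ERR_ARGS over OCF_ERR_PERM is arbitrary here. In a real agent the return codes and ocf_exit_reason come from ocf-shellfuncs; the fallbacks below only keep the example self-contained.

```shell
#!/bin/sh
# Fallback values for the standard OCF return codes, used only when
# ocf-shellfuncs has not been sourced.
: "${OCF_ERR_GENERIC:=1}"
: "${OCF_ERR_ARGS:=2}"
: "${OCF_ERR_PERM:=4}"

# Stand-in for ocf_exit_reason when running outside a resource agent.
if ! command -v ocf_exit_reason >/dev/null 2>&1; then
    ocf_exit_reason() { echo "exit reason: $*" >&2; }
fi

# Hypothetical replacement for the snippet that set fail-count to
# INFINITY: report the reason, then return a hard error so that
# Pacemaker itself bans the resource from this node until the failure
# is cleared.
#   $1: location of the new master's baseline
reject_stale_master() {
    ocf_exit_reason "My data is newer than new master's one. New master's location : $1"
    return "$OCF_ERR_ARGS"
}
```

The agent no longer touches fail-count at all: Pacemaker records the failed operation and applies the ban on its own, and no Pacemaker-specific tool is needed.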
Re: [ClusterLabs] [Question] About a change of crm_failcount.
On 02/02/2017 12:23 PM, renayama19661...@ybb.ne.jp wrote:
> Hi All,
>
> By the next correction, the user was not able to set a value except zero in
> crm_failcount.
>
> - [Fix: tools: implement crm_failcount command-line options correctly]
>   - https://github.com/ClusterLabs/pacemaker/commit/95db10602e8f646eefed335414e40a994498cafd#diff-6e58482648938fd488a920b9902daac4
>
> However, pgsql RA sets INFINITY in a script.
>
> ```
> (snip)
> CRM_FAILCOUNT="${HA_SBIN_DIR}/crm_failcount"
> (snip)
> ocf_exit_reason "My data is newer than new master's one. New master's location : $master_baseline"
> exec_with_retry 0 $CRM_FAILCOUNT -r $OCF_RESOURCE_INSTANCE -U $NODENAME -v INFINITY
> return $OCF_ERR_GENERIC
> (snip)
> ```
>
> There seems to be the influence only in pgsql somehow or other.
>
> Can you revise it to set a value except zero in crm_failcount?
> We make modifications to use crm_attribute in pgsql RA if we cannot revise it.
>
> Best Regards,
> Hideo Yamauchi.

Hmm, I didn't realize that was used. I changed it because it's not a
good idea to set fail-count without also changing last-failure and
having a failed op in the LRM history. I'll have to think about what the
best alternative is.
[ClusterLabs] [Question] About a change of crm_failcount.
Hi All,

By the next correction, the user was not able to set a value except zero in crm_failcount.

- [Fix: tools: implement crm_failcount command-line options correctly]
  - https://github.com/ClusterLabs/pacemaker/commit/95db10602e8f646eefed335414e40a994498cafd#diff-6e58482648938fd488a920b9902daac4

However, pgsql RA sets INFINITY in a script.

```
(snip)
CRM_FAILCOUNT="${HA_SBIN_DIR}/crm_failcount"
(snip)
ocf_exit_reason "My data is newer than new master's one. New master's location : $master_baseline"
exec_with_retry 0 $CRM_FAILCOUNT -r $OCF_RESOURCE_INSTANCE -U $NODENAME -v INFINITY
return $OCF_ERR_GENERIC
(snip)
```

There seems to be the influence only in pgsql somehow or other.

Can you revise it to set a value except zero in crm_failcount?
We make modifications to use crm_attribute in pgsql RA if we cannot revise it.

Best Regards,
Hideo Yamauchi.