On Thu, 2020-03-19 at 13:39 -0400, Marc Smith wrote:
> On Mon, Mar 16, 2020 at 1:26 PM Marc Smith <msmith...@gmail.com> wrote:
> >
> > On Thu, Mar 12, 2020 at 10:51 AM Ken Gaillot <kgail...@redhat.com> wrote:
> > >
> > > On Wed, 2020-03-11 at 17:24 -0400, Marc Smith wrote:
> > > > Hi,
> > > >
> > > > I'm using Pacemaker 1.1.20 (yes, I know, a bit dated now). I noticed
> > >
> > > I'd still consider that recent :)
> > >
> > > > when I modify a resource parameter (eg, update the value), this
> > > > causes the resource itself to restart. And that's fine, but when
> > > > this resource is restarted, it doesn't appear to honor the full set
> > > > of constraints for that resource.
> > > >
> > > > I see the output like this (right after the resource parameter
> > > > change):
> > > > ...
> > > > Mar 11 20:43:25 localhost crmd[1943]: notice: State transition S_IDLE -> S_POLICY_ENGINE
> > > > Mar 11 20:43:25 localhost crmd[1943]: notice: Current ping state: S_POLICY_ENGINE
> > > > Mar 11 20:43:25 localhost pengine[1942]: notice: Clearing failure of p_bmd_140c58-1 on 140c58-1 because resource parameters have changed
> > > > Mar 11 20:43:25 localhost pengine[1942]: notice: * Restart p_bmd_140c58-1 ( 140c58-1 ) due to resource definition change
> > > > Mar 11 20:43:25 localhost pengine[1942]: notice: * Restart p_dummy_g_lvm_140c58-1 ( 140c58-1 ) due to required g_md_140c58-1 running
> > > > Mar 11 20:43:25 localhost pengine[1942]: notice: * Restart p_lvm_140c58_vg_01 ( 140c58-1 ) due to required p_dummy_g_lvm_140c58-1 start
> > > > Mar 11 20:43:25 localhost pengine[1942]: notice: Calculated transition 41, saving inputs in /var/lib/pacemaker/pengine/pe-input-173.bz2
> > > > Mar 11 20:43:25 localhost crmd[1943]: notice: Initiating stop operation p_lvm_140c58_vg_01_stop_0 on 140c58-1
> > > > Mar 11 20:43:25 localhost crmd[1943]: notice: Transition aborted by deletion of lrm_rsc_op[@id='p_bmd_140c58-1_last_failure_0']: Resource operation removal
> > > > Mar 11 20:43:25 localhost crmd[1943]: notice: Current ping state: S_TRANSITION_ENGINE
> > > > ...
> > > >
> > > > The stop on 'p_lvm_140c58_vg_01' then times out, because the other
> > > > constraint (to stop the service above LVM) is never executed. I can
> > > > see from the messages it never even tries to demote the resource
> > > > above that.
> > > >
> > > > Yet, if I use crmsh at the shell, and do a restart on that same
> > > > resource, it works correctly, and all constraints are honored:
> > > > crm resource restart p_bmd_140c58-1
> > > >
> > > > I can certainly provide my full cluster config if needed, but hoping
> > > > to keep this email concise for clarity. =)
> > > >
> > > > I guess my questions are: 1) Is the difference in restart behavior
> > > > expected, and not all constraints are followed when resource
> > > > parameters change (or some other restart event that originated
> > > > internally like this)? 2) Or perhaps this is known bug that was
> > > > already resolved in newer versions of Pacemaker?
> > >
> > > No to both. Can you attach that pe-input-173.bz2 file (with any
> > > sensitive info removed)?
> >
> > Thanks; that system got wiped, so I reproduced it on another system
> > and I am attaching that pe-input file. Log snippet is below for
> > completeness:
> >
> > Mar 16 17:16:50 localhost crmd[1340]: notice: State transition S_IDLE -> S_POLICY_ENGINE
> > Mar 16 17:16:50 localhost pengine[1339]: notice: * Restart p_bmd_126c4f-1 ( 126c4f-1 ) due to resource definition change
> > Mar 16 17:16:50 localhost pengine[1339]: notice: * Restart p_dummy_g_lvm_126c4f-1 ( 126c4f-1 ) due to required g_md_126c4f-1 running
> > Mar 16 17:16:50 localhost pengine[1339]: notice: * Restart p_lvm_126c4f_vg_01 ( 126c4f-1 ) due to required p_dummy_g_lvm_126c4f-1 start
> > Mar 16 17:16:50 localhost pengine[1339]: notice: Calculated transition 149, saving inputs in /var/lib/pacemaker/pengine/pe-input-46.bz2
> >
>
> Hi Ken,
>
> Just a friendly bump to see if you had a chance to take a look at this
> issue? I appreciate your time and expertise! =)
>
> --Marc

Sorry, I've been slammed lately.

There does appear to be a scheduler bug. The relevant constraint is (in
plain language)

    start g_lvm_* then promote ms_alua_*

The implicit inverse of that is

    demote ms_alua_* then stop g_lvm_*

The bug is that ms_alua_* isn't demoted before g_lvm_* is stopped.
(Note however that the configuration does not require ms_alua_* to be
stopped.)

> >
> > --Marc
> >
> > > >
> > > > I searched a bit for #2 but I didn't get many (well any) hits on
> > > > other users experiencing this behavior.
> > > >
> > > > Many thanks in advance.
> > > >
> > > > --Marc
> > >
> > > --
> > > Ken Gaillot <kgail...@redhat.com>
--
Ken Gaillot <kgail...@redhat.com>
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
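
The "implicit inverse" mentioned in the reply above can be illustrated with a small Python sketch. This is only a toy model of a symmetric ordering constraint, not Pacemaker's scheduler code: the `Ordering` class, its `inverse()` and `is_honored()` methods, and the `INVERSE_ACTION` table are hypothetical names invented for illustration, and the resource names are the wildcard placeholders used in the reply.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Opposite of each action, used when inverting a symmetric ordering.
INVERSE_ACTION = {
    "start": "stop",
    "stop": "start",
    "promote": "demote",
    "demote": "promote",
}


@dataclass(frozen=True)
class Ordering:
    """Toy model of '<first_action> <first_rsc> then <then_action> <then_rsc>'."""
    first_action: str
    first_rsc: str
    then_action: str
    then_rsc: str

    def inverse(self) -> "Ordering":
        # Swap the two sides and replace each action with its opposite.
        return Ordering(
            first_action=INVERSE_ACTION[self.then_action],
            first_rsc=self.then_rsc,
            then_action=INVERSE_ACTION[self.first_action],
            then_rsc=self.first_rsc,
        )

    def describe(self) -> str:
        return (f"{self.first_action} {self.first_rsc} "
                f"then {self.then_action} {self.then_rsc}")

    def is_honored(self, plan: List[Tuple[str, str]]) -> bool:
        # Simplified check: the 'then' step may only appear in the plan
        # after the 'first' step has appeared.
        first = (self.first_action, self.first_rsc)
        then = (self.then_action, self.then_rsc)
        seen_first = False
        for step in plan:
            if step == first:
                seen_first = True
            elif step == then and not seen_first:
                return False
        return True


constraint = Ordering("start", "g_lvm_*", "promote", "ms_alua_*")
inverse = constraint.inverse()
print(constraint.describe())
print(inverse.describe())

# A plan shaped like the reported transition: the stop is issued without
# a preceding demote, so the toy check flags it.
print(inverse.is_honored([("stop", "g_lvm_*")]))
print(inverse.is_honored([("demote", "ms_alua_*"), ("stop", "g_lvm_*")]))
```

In this model, the transition from the logs corresponds to the first plan: g_lvm_* is stopped with no earlier demote of ms_alua_*, which is the ordering violation described in the reply.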