Don't worry about the attrd_updater syntax; standby is recorded as a permanent node attribute, so you'd use crm_attribute to query it instead. But from the CIB we can see that the attribute was not successfully recorded for FILE-2, even though the log says it was. That's concerning and may indicate a regression. I'll try to reproduce it on my end.
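For reference, a quick sketch of how you'd query the permanent attribute with crm_attribute (node name FILE-2 taken from this thread; flags as documented in crm_attribute(8), on a host with a running cluster):

```shell
# Permanent node attributes live in the CIB's <nodes> section, not in the
# attribute manager's transient store -- which is why the attrd_updater
# queries above reported "attribute does not exist".
crm_attribute --node FILE-2 --name standby --lifetime forever --query

# Equivalent short form:
crm_attribute -N FILE-2 -n standby -l forever -G
```

If the attribute was recorded, the query prints its scope, name, and value; if not, it reports that the attribute does not exist, which would confirm the CIB state seen in your cibadmin output.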
On Wed, 2023-03-15 at 21:01 +0530, Ayush Siddarath wrote:
> Hi Ken,
>
> Somehow I didn't receive the email for your response.
>
> The system is currently in the same state and here are the required
> command outputs:
>
> > FILE-2:~ # cibadmin -Q | grep standby
> >   <nvpair id="num-1-instance_attributes-standby" name="standby" value="on"/>
> >   <nvpair id="num-3-instance_attributes-standby" name="standby" value="on"/>
> >   <nvpair id="num-4-instance_attributes-standby" name="standby" value="on"/>
>
> Running into some syntax issues when issuing the attrd_updater
> command. Could you review the commands?
>
> > FILE-2:~ # attrd_updater -Q --name="standby" -N FILE-3
> > Could not query value of standby: attribute does not exist
> > FILE-2:~ # attrd_updater -Q -n standby -N FILE-3
> > Could not query value of standby: attribute does not exist
> > FILE-2:~ # attrd_updater -Q -n standby -N FILE-2
> > Could not query value of standby: attribute does not exist
>
> cibadmin -Q -->
>
>   </crm_config>
>   <nodes>
>     <node id="1" uname="FILE-1">
>       <instance_attributes id="num-1-instance_attributes">
>         <nvpair id="num-1-instance_attributes-standby" name="standby" value="on"/>
>       </instance_attributes>
>     </node>
>     <node id="2" uname="FILE-2"/>
>     <node id="3" uname="FILE-3">
>       <instance_attributes id="num-3-instance_attributes">
>         <nvpair id="num-3-instance_attributes-standby" name="standby" value="on"/>
>       </instance_attributes>
>     </node>
>     <node id="4" uname="FILE-4">
>       <instance_attributes id="num-4-instance_attributes">
>         <nvpair id="num-4-instance_attributes-standby" name="standby" value="on"/>
>       </instance_attributes>
>     </node>
>
> After a few minutes, re-running the node standby command for the same
> node works fine.
>
> Thanks,
> Ayush
>
> On Wed, Mar 15, 2023 at 8:55 PM Priyanka Balotra
> <priyanka.14balo...@gmail.com> wrote:
> > +Ayush
> >
> > Thanks
> >
> > On Wed, 15 Mar 2023 at 8:17 PM, Ken Gaillot <kgail...@redhat.com> wrote:
> > > Hi,
> > >
> > > If you can reproduce the problem, the following info would be
> > > helpful:
> > >
> > > * "cibadmin -Q | grep standby" : to show whether it was successfully
> > >   recorded in the CIB (will show info for any node with standby, but
> > >   the XML ID likely has the node name or ID in it)
> > >
> > > * "attrd_updater -Q -n standby -N FILE-2" : to show whether the
> > >   attribute manager has the right value in memory for the affected
> > >   node
> > >
> > > On Wed, 2023-03-15 at 15:51 +0530, Ayush Siddarath wrote:
> > > > Hi All,
> > > >
> > > > We are seeing an issue as part of crm maintenance operations. As
> > > > part of the upgrade process, the crm nodes are put into standby
> > > > mode. But it's observed that one of the nodes fails to go into
> > > > standby mode despite the "crm node standby" returning success.
> > > >
> > > > Commands issued to put nodes into maintenance :
> > > >
> > > > > [2023-03-15 06:07:08 +0000] [468] [INFO] changed: [FILE-1] =>
> > > > > {"changed": true, "cmd": "/usr/sbin/crm node standby FILE-1",
> > > > > "delta": "0:00:00.442615", "end": "2023-03-15 06:07:08.150375",
> > > > > "rc": 0, "start": "2023-03-15 06:07:07.707760", "stderr": "",
> > > > > "stderr_lines": [], "stdout": "\u001b[32mINFO\u001b[0m: standby
> > > > > node FILE-1", "stdout_lines": ["\u001b[32mINFO\u001b[0m: standby
> > > > > node FILE-1"]}
> > > > > .
> > > > > [2023-03-15 06:07:08 +0000] [468] [INFO] changed: [FILE-2] =>
> > > > > {"changed": true, "cmd": "/usr/sbin/crm node standby FILE-2",
> > > > > "delta": "0:00:00.459407", "end": "2023-03-15 06:07:08.223749",
> > > > > "rc": 0, "start": "2023-03-15 06:07:07.764342", "stderr": "",
> > > > > "stderr_lines": [], "stdout": "\u001b[32mINFO\u001b[0m: standby
> > > > > node FILE-2", "stdout_lines": ["\u001b[32mINFO\u001b[0m: standby
> > > > > node FILE-2"]}
> > > >
> > > > ........
> > > >
> > > > Crm status o/p after above command execution:
> > > >
> > > > > FILE-2:/var/log # crm status
> > > > > Cluster Summary:
> > > > >   * Stack: corosync
> > > > >   * Current DC: FILE-1 (version 2.1.2+20211124.ada5c3b36-150400.2.43-2.1.2+20211124.ada5c3b36) - partition with quorum
> > > > >   * Last updated: Wed Mar 15 08:32:27 2023
> > > > >   * Last change:  Wed Mar 15 06:07:08 2023 by root via cibadmin on FILE-4
> > > > >   * 4 nodes configured
> > > > >   * 11 resource instances configured (5 DISABLED)
> > > > > Node List:
> > > > >   * Node FILE-1: standby (with active resources)
> > > > >   * Node FILE-3: standby (with active resources)
> > > > >   * Node FILE-4: standby (with active resources)
> > > > >   * Online: [ FILE-2 ]
> > > >
> > > > pacemaker logs indicate that FILE-2 received the commands to put it
> > > > into standby.
> > > >
> > > > > FILE-2:/var/log # grep standby /var/log/pacemaker/pacemaker.log
> > > > > Mar 15 06:07:08.098 FILE-2 pacemaker-based [8635] (cib_perform_op) info: ++ <nvpair id="num-1-instance_attributes-standby" name="standby" value="on"/>
> > > > > Mar 15 06:07:08.166 FILE-2 pacemaker-based [8635] (cib_perform_op) info: ++ <nvpair id="num-3-instance_attributes-standby" name="standby" value="on"/>
> > > > > Mar 15 06:07:08.170 FILE-2 pacemaker-based [8635] (cib_perform_op) info: ++ <nvpair id="num-2-instance_attributes-standby" name="standby" value="on"/>
> > > > > Mar 15 06:07:08.230 FILE-2 pacemaker-based [8635] (cib_perform_op) info: ++ <nvpair id="num-4-instance_attributes-standby" name="standby" value="on"/>
> > > >
> > > > The issue is quite intermittent and observed on other nodes as well.
> > > > We have seen a similar issue when we try to remove a node from
> > > > standby mode (using the "crm node online" command): one or more
> > > > nodes fail to come out of standby mode.
> > > >
> > > > We suspect it could be an issue with parallel execution of the node
> > > > standby/online command for all nodes, but this issue wasn't observed
> > > > with the pacemaker packaged with SLES15 SP2 OS.
> > > >
> > > > I'm attaching the pacemaker.log from FILE-2 for analysis. Let us
> > > > know if any additional information is required.
> > > >
> > > > OS: SLES15 SP4
> > > > Pacemaker version -->
> > > > crmadmin --version
> > > > Pacemaker 2.1.2+20211124.ada5c3b36-150400.2.43
> > > >
> > > > Thanks,
> > > > Ayush
-- 
Ken Gaillot <kgail...@redhat.com>

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/