Re: [ClusterLabs] pacemaker-fenced /dev/shm errors
Ken Gaillot wrote:
> I'm glad it's resolved, but for future reference, that does indicate a
> serious problem. It means the fencer is not accepting any requests, so
> any fencing attempts or even attempts to monitor a fencing device from
> that node will fail.

That sounds like pacemaker-fenced became some kind of zombie. As a test, I blocked the connection between the node and the IPMI fencing device. The fencing resource stopped and reported an error like the one below:

Failed Resource Actions:
  * fence_ipmi start on c1.example.tw could not be executed (Timed Out) because 'Fence agent did not complete in time' at Tue Mar 28 12:49:58 2023 after 20.004s

It recovered when the connection was restored. Does that mean fencing is still working?

I want to be sure: when I see a message like "pacemaker-fenced[2405] is unresponsive to ipc after 1 tries", does it mean a permanent failure, or did the second try succeed so that it no longer complains?

___
Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/
Re: [ClusterLabs] pacemaker-fenced /dev/shm errors
Christine Caulfield wrote:
> It sounds like you're running an old version of libqb, upgrading to
> libqb 2.0.6 (in RHEL 9.1) should fix those messages

Thanks a lot for the quick response! I will arrange the upgrade.

Regards,
tbskyd
[ClusterLabs] pacemaker-fenced /dev/shm errors
Hi:

The cluster runs on RHEL 9.0 components. Today I saw strange errors in the log, like the ones below:

Mar 27 13:07:06.287 example.com pacemaker-fenced[2405] (qb_sys_mmap_file_open) error: couldn't allocate file /dev/shm/qb-2405-2403-12-A9UUaJ/qb-request-stonith-ng-data: Interrupted system call (4)
Mar 27 13:07:06.288 example.com pacemaker-fenced[2405] (qb_rb_open_2) error: couldn't create file for mmap
Mar 27 13:07:06.288 example.com pacemaker-fenced[2405] (qb_ipcs_shm_rb_open) error: qb_rb_open:/dev/shm/qb-2405-2403-12-A9UUaJ/qb-request-stonith-ng: Interrupted system call (4)
Mar 27 13:07:06.288 example.com pacemaker-fenced[2405] (qb_ipcs_shm_connect) error: shm connection FAILED: Interrupted system call (4)
Mar 27 13:07:06.288 example.com pacemaker-fenced[2405] (handle_new_connection) error: Error in connection setup (/dev/shm/qb-2405-2403-12-A9UUaJ/qb): Interrupted system call (4)
Mar 27 13:07:06.288 example.com pacemakerd [2403] (pcmk__ipc_is_authentic_process_active) info: Could not connect to stonith-ng IPC: Interrupted system call
Mar 27 13:07:06.288 example.com pacemakerd [2403] (check_active_before_startup_processes) notice: pacemaker-fenced[2405] is unresponsive to ipc after 1 tries

There are no further "pacemaker-fenced" entries in the log. The cluster seems fine, and the pacemaker-fenced process with PID 2405 is still running. May I assume the cluster is OK and that I don't need to do anything, since Pacemaker didn't complain any further?
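For anyone who would rather catch this condition from a monitoring script than by reading the log by hand, here is a minimal sketch in Python. It only matches the wording of the pacemakerd notice quoted in the message above; the function name and the shortened sample lines are illustrative assumptions, and the log wording may differ between Pacemaker releases.

```python
import re

# Matches the pacemakerd notice quoted in the message above, for example:
#   "pacemaker-fenced[2405] is unresponsive to ipc after 1 tries"
UNRESPONSIVE = re.compile(
    r"(?P<daemon>pacemaker-\w+)\[(?P<pid>\d+)\] is unresponsive to ipc "
    r"after (?P<tries>\d+) tries"
)


def find_unresponsive(lines):
    """Return a (daemon, pid, tries) tuple for every matching log line."""
    hits = []
    for line in lines:
        match = UNRESPONSIVE.search(line)
        if match:
            hits.append(
                (match.group("daemon"), int(match.group("pid")), int(match.group("tries")))
            )
    return hits


# Two lines shortened from the log excerpt above (illustrative sample only).
SAMPLE = [
    "Mar 27 13:07:06.288 example.com pacemaker-fenced[2405] (qb_ipcs_shm_connect) "
    "error: shm connection FAILED: Interrupted system call (4)",
    "Mar 27 13:07:06.288 example.com pacemakerd [2403] "
    "(check_active_before_startup_processes) notice: "
    "pacemaker-fenced[2405] is unresponsive to ipc after 1 tries",
]

print(find_unresponsive(SAMPLE))
```

A script like this only reports that the notice appeared. It cannot tell whether the daemon recovered afterwards, which is the question asked in this thread.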
[ClusterLabs] cibadmin response unexpected
Hi:

I am using RHEL 9.1 with pacemaker-cli-2.1.4-5. I tried the command below:

> cibadmin -Q -o xxx

I expected the result to tell me that the "xxx" scope does not exist, but the command returned the whole configuration. Is this the normal behavior?

Thanks a lot for the help.

Regards,
tbskyd
Re: [ClusterLabs] best practice for scripting
Ken Gaillot wrote:
> FYI, the old and new status XML are nearly identical. The old XML's
> outermost element is
>
> ...
>
> while the new XML has
>
> ...
>
> As long as you're not looking at those particular elements, parsing the
> XML output is the same.
>
> The format is actually stable and selectable since Pacemaker 2.0.3:
> crm_mon --as-xml gets the old format, crm_mon --output-as=xml gets the
> new. pcs status xml has continued to use --as-xml, but will switch to
> --output-as=xml, thus Tomas's warning.

Thanks a lot! The explanation is so clear that I have no excuse not to rewrite my scripts. I already use 'crm_mon' in my scripts, but parsing the 'pcs' output is often simpler, so I just used that. Now I understand what to do.

Regards,
tbskyd
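As a rough illustration of the advice above (parse the XML, and do not depend on the outermost element), here is a minimal Python sketch. The sample document is a hypothetical, heavily trimmed stand-in and not real crm_mon output: the `resource` element with `id` and `role` attributes is an assumption about the status XML, and the root element name is deliberately a placeholder because the code never looks at it.

```python
import subprocess
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed sample. Real crm_mon output has many more
# elements and attributes; "status-root" is a placeholder root name.
SAMPLE_XML = """<status-root>
  <resources>
    <resource id="vm1" role="Started"/>
    <resource id="fence_ipmi" role="Stopped"/>
  </resources>
</status-root>"""


def read_status():
    """Fetch the status XML once (needs a node with Pacemaker installed)."""
    result = subprocess.run(
        ["crm_mon", "-1", "--output-as=xml"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def resource_roles(xml_text):
    """Return (id, role) pairs for every <resource>, whatever the root is."""
    root = ET.fromstring(xml_text)
    return [(r.get("id"), r.get("role")) for r in root.iter("resource")]


print(resource_roles(SAMPLE_XML))
```

Because `iter()` walks the whole tree, the same function should work whether the resources sit under the old or the new outermost element. On a cluster node, `resource_roles(read_status())` would be the equivalent call.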
Re: [ClusterLabs] best practice for scripting
Tomas Jelinek wrote:
> As we are aware of the difficulties of using pcs in scripts, we indeed
> have a long term goal to provide machine readable output from pcs. Since
> pcs started with a focus on producing human readable output, it's a lot
> of work to do. Quite a big part of pcs code base cannot be easily
> switched to producing machine parsable output. We are slowly moving
> towards it (which in some cases may cause text output changes due to
> code reuse), but it is currently not seen as something to be finished in
> a year.

Thanks for the hint. Maybe the XML output will become stable after Pacemaker 2.1. I will try starting with "pcs status xml" in the future.

Regards,
tbskyd
Re: [ClusterLabs] Antw: [EXT] best practice for scripting
Strahil Nikolov wrote:
> By the way, how do you monitor your pacemaker clusters?
> We are using Nagios and I found only 'check_crm' but it looks like it was
> made for crmsh and most probably won't work with pcs without modifications.
>
> Best Regards,
> Strahil Nikolov

We use Zabbix, so we wrote some scripts for zabbix-agent to report the cluster status. In that case pcs works fine, since it is called from within the scripts.
Re: [ClusterLabs] Antw: [EXT] best practice for scripting
Ulrich Windl wrote:
> "Coming from SUSE" I'm using crm shell for most tasks, and I think the syntax
> is quite good.

Good for SUSE! Unfortunately, RHEL doesn't include that utility...
[ClusterLabs] best practice for scripting
Hi:

I have some scripts which use 'pcs' and 'crm_mon'. I prefer pcs since it is an all-in-one tool, but apart from 'pcs cluster cib' it has no stable text output. Reading the Pacemaker 2.1 documentation, I found that it says:

"In addition to crm_mon and stonith_admin, the crmadmin, crm_resource, crm_simulate, and crm_verify commands now support the --output-as and --output-to options, including XML output (which scripts and higher-level tools are strongly recommended to use instead of trying to parse the text output, which may change from release to release)."

So if I need to parse command output, should I study these commands instead of pcs in the future? Red Hat does a great job of keeping the 'pcs' output consistent between minor versions (e.g. 7.2 -> 7.3), but many things changed between major versions 7 and 8. If there is a more stable method for scripting, I would like to follow it.

Thanks a lot for the help!
Re: [ClusterLabs] how to setup single node cluster
Reid Wahl wrote:
> Disaster recovery is the main use case we had in mind. See the RHEL 8.2
> release notes:
> -
> https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/8.2_release_notes/rhel-8-2-0-release#enhancement_high-availability-and-clusters
>
> I thought I also remembered some other use case involving MS SQL, but I can't
> find anything about it so I might be remembering incorrectly.

Thanks a lot for the confirmation. According to the discussion above, I think the setup procedure for a single-node cluster is similar. I should still use corosync (although it seems it has nothing to sync with). I will try that when I have time.

Thanks again for your kind help!
Re: [ClusterLabs] how to setup single node cluster
Reid Wahl wrote:
> I don't think we do require fencing for single-node clusters. (Anyone at Red
> Hat, feel free to comment.) I vaguely recall an internal mailing list or IRC
> conversation where we discussed this months ago, but I can't find it now.
> I've also checked our support policies documentation, and it's not mentioned
> in the "cluster size" doc or the "fencing" doc.

Since a single-node cluster is either 100% alive or 100% dead, I think fencing and quorum are not required. I am just curious what the use case is. Since Red Hat supports it, it must be useful in real scenarios.
Re: [ClusterLabs] Antw: [EXT] how to setup single node cluster
Ulrich Windl wrote:
> >>> d tbsky wrote on 08.04.2021 at 05:52 in message:
> > Hi:
> > I found RHEL 8.2 support single node cluster now. but I didn't
> > find further document to explain the concept. RHEL 8.2 also support
> > "disaster recovery cluster". so I think maybe a single node disaster
> > recovery cluster is not bad.
> >
> > I think corosync is still necessary under single node cluster. or
> > is there other new style of configuration?
>
> IMHO if you want a single-node cluster, and you are not planning to add more
> nodes, you'll be better off using a utility like monit to manage your
> processes...

Sorry, I didn't mention Pacemaker in my previous post. I want a single-node Pacemaker disaster recovery cluster that can be managed with the normal Pacemaker utilities such as pcs. There may be other cases where a single-node Pacemaker cluster is useful; I just don't know of them yet.
[ClusterLabs] how to setup single node cluster
Hi:

I found that RHEL 8.2 now supports single-node clusters, but I didn't find any further documentation explaining the concept. RHEL 8.2 also supports "disaster recovery clusters", so I think a single-node disaster recovery cluster might not be a bad idea.

I think corosync is still necessary in a single-node cluster. Or is there some other, new style of configuration?

Thanks for the help!
Re: [ClusterLabs] staggered resource start/stop
Klaus Wenninger wrote:
> Guess that heavily depends on what you are running inside your VMs.
> If the services inside don't need each other or anything provided by
> the other cluster-resources (or other way round) or everything is
> synchronizing independently from the cluster ...
> What you could still do is make the timeout more generous and combine
> it with checking for a certain boot-state - so that it either proceeds
> after the generous timeout (when something is broken inside the VM)
> or if a certain state is reached.
>
> Klaus

Boot-state plus timeout is perfect. It would be nice if there were a simple configuration option to do this automatically in the future...
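The "boot-state plus generous timeout" idea can be sketched in a few lines of Python. This is only an illustration of the waiting logic, not a resource agent: `wait_for_boot` and its parameters are hypothetical names, and `check` stands for whatever probe reports that the guest has finished booting.

```python
import time


def wait_for_boot(check, timeout, interval=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True or `timeout` seconds have passed.

    Returns True if the boot state was reached and False if the timeout
    expired first. In both cases the caller simply proceeds, so a guest
    that is broken inside cannot block the start sequence forever.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Illustrative use: a fake probe that reports "booted" on the third poll.
polls = {"count": 0}


def fake_probe():
    polls["count"] += 1
    return polls["count"] >= 3


print(wait_for_boot(fake_probe, timeout=5, interval=0))
```

The `clock` and `sleep` parameters exist only so the function can be exercised without real delays. A real probe would replace `fake_probe`, for example by testing whether a service inside the guest answers.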
Re: [ClusterLabs] staggered resource start/stop
Klaus Wenninger wrote:
> In this case it might be useful not to wait some defined time
> hoping startup of the VM would have gone far enough that
> the IO load has already decayed enough.
> What about a resource that checks for something running
> inside the VM that indicates that startup has completed?
> Don't remember if the VirtualDomain RA might already
> have such a probe possibility.

In my experience, Windows eats most of the disk I/O and Linux suffers. RHEL 7 and RHEL 8 guests sometimes time out mounting /boot or / when disk I/O is too heavy, but Windows always boots successfully.

A resource check inside the VM is a good idea, but I wonder whether the cluster would get stuck if the detection failed for some reason. In my case I think booting in order with a delay can solve 99% of the problem. It just looks a little complex with so many delay RAs.
Re: [ClusterLabs] staggered resource start/stop
Reid Wahl wrote:
> An order constraint set with kind=Serialize (which is mentioned in the first
> reply to the thread you linked) seems like the most logical option to me. You
> could serialize a set of resource sets, where each inner set contains a
> VirtualDomain resource and an ocf:heartbeat:Delay resource.
>
> 5.3.1. Ordering Properties
> (https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Explained/index.html#idm46061192464416)
> 5.6. Ordering Sets of Resources
> (https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Explained/index.html#s-resource-sets-ordering)

Thanks a lot! I didn't know there was an official RA that acts as a delay. That's interesting and useful to me.
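To make the suggestion above more concrete, here is a rough Python sketch that generates constraint XML of that shape: one `rsc_order` with kind "Serialize" containing one `resource_set` per VM, each set holding a VirtualDomain resource and its Delay resource. All resource and constraint IDs are hypothetical, and the exact placement of the `kind` attribute is an assumption that should be checked against the "Ordering Sets of Resources" section linked above (and with crm_verify) before loading anything into a cluster.

```python
import xml.etree.ElementTree as ET


def serialized_order(pairs, constraint_id="order-vm-stagger"):
    """Build an rsc_order element with one resource_set per (vm, delay) pair.

    Assumption: kind="Serialize" is set on the rsc_order element itself.
    """
    order = ET.Element("rsc_order", {"id": constraint_id, "kind": "Serialize"})
    for index, (vm, delay) in enumerate(pairs, start=1):
        rset = ET.SubElement(
            order, "resource_set", {"id": f"{constraint_id}-set-{index}"}
        )
        # Within a set: the VM first, then its delay resource.
        ET.SubElement(rset, "resource_ref", {"id": vm})
        ET.SubElement(rset, "resource_ref", {"id": delay})
    return order


# Hypothetical resource IDs, for illustration only.
constraint = serialized_order([("vm1", "vm1-delay"), ("vm2", "vm2-delay")])
print(ET.tostring(constraint, encoding="unicode"))
```

Generating the XML from a list keeps the configuration manageable when there are many VMs, which addresses the concern in this thread that a setup with many delay RAs looks complex.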
[ClusterLabs] staggered resource start/stop
Hi:

Since starting or stopping all the VMs at once consumes a lot of disk I/O, I want to start and stop the VMs one by one with a delay. Searching the mailing list, I found this discussion:

https://oss.clusterlabs.org/pipermail/pacemaker/2013-August/043128.html

I am now testing RHEL 8 with Pacemaker 2.0.4, and I wonder whether there are new methods to solve the problem. I searched the documentation but didn't find any new parameters for the job. If possible, I don't want to modify the VirtualDomain RA that comes with the standard RPM package.

Maybe I should write a new RA that staggers the node utilization. But if I reset the node utilization when the cluster restarts, there may be a race condition.

Thanks for the help!