Re: [ClusterLabs] pacemaker-fenced /dev/shm errors

2023-03-27 Thread d tbsky
Ken Gaillot 
> I'm glad it's resolved, but for future reference, that does indicate a
> serious problem. It means the fencer is not accepting any requests, so
> any fencing attempts or even attempts to monitor a fencing device from
> that node will fail.
>

   That sounds like pacemaker-fenced became some kind of zombie.
For testing, I blocked the connection between the node and the IPMI
fencing device. The fencing resource stopped and reported an error
like the one below:

Failed Resource Actions:
  * fence_ipmi start on c1.example.tw could not be executed (Timed
Out) because 'Fence agent did not complete in time' at Tue Mar 28
12:49:58 2023 after 20.004s

It recovered when the connection was restored.
Does that mean fencing is still working?
I want to be sure: if I see a message like "pacemaker-fenced[2405] is
unresponsive to ipc after 1 tries", does it mean a permanent failure,
or did a second try succeed so that it no longer complains?
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker-fenced /dev/shm errors

2023-03-27 Thread d tbsky
Christine caulfield 
> It sounds like you're running an old version of libqb, upgrading to
> libqb 2.0.6 (in RHEL 9.1) should fix those messages

Thanks a lot for the quick response! I will arrange the upgrade.

Regards,
tbskyd


[ClusterLabs] pacemaker-fenced /dev/shm errors

2023-03-27 Thread d tbsky
Hi:
   The cluster is running on RHEL 9.0 components. Today I saw the log
report strange errors like the ones below:

Mar 27 13:07:06.287 example.com pacemaker-fenced[2405]
(qb_sys_mmap_file_open) error: couldn't allocate file
/dev/shm/qb-2405-2403-12-A9UUaJ/qb-request-stonith-ng-data:
Interrupted system call (4)
Mar 27 13:07:06.288 example.com pacemaker-fenced[2405]
(qb_rb_open_2)  error: couldn't create file for mmap
Mar 27 13:07:06.288 example.com pacemaker-fenced[2405]
(qb_ipcs_shm_rb_open)   error:
qb_rb_open:/dev/shm/qb-2405-2403-12-A9UUaJ/qb-request-stonith-ng:
Interrupted system call (4)
Mar 27 13:07:06.288 example.com pacemaker-fenced[2405]
(qb_ipcs_shm_connect)   error: shm connection FAILED: Interrupted
system call (4)
Mar 27 13:07:06.288 example.com pacemaker-fenced[2405]
(handle_new_connection) error: Error in connection setup
(/dev/shm/qb-2405-2403-12-A9UUaJ/qb): Interrupted system call (4)
Mar 27 13:07:06.288 example.com pacemakerd  [2403]
(pcmk__ipc_is_authentic_process_active) info: Could not connect to
stonith-ng IPC: Interrupted system call
Mar 27 13:07:06.288 example.com pacemakerd  [2403]
(check_active_before_startup_processes) notice:
pacemaker-fenced[2405] is unresponsive to ipc after 1 tries

There are no further "pacemaker-fenced" entries in the log. The cluster
seems fine, and pacemaker-fenced (process ID 2405) is still running.
May I assume the cluster is OK and that I don't need to do anything,
since Pacemaker didn't complain further?
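
Because a single notice like the one above is easy to miss, a script
could flag it. The following is only a minimal sketch, assuming the log
format shown in this message; the function name is hypothetical, and the
sketch says nothing about whether the daemon recovered afterwards.

```python
import re

# Matches the pacemakerd notice quoted above, e.g.
# "pacemaker-fenced[2405] is unresponsive to ipc after 1 tries"
UNRESPONSIVE = re.compile(
    r"(?P<daemon>pacemaker-\w+)\[(?P<pid>\d+)\] is unresponsive to ipc "
    r"after (?P<tries>\d+) tries"
)


def find_unresponsive(log_text):
    """Return (daemon, pid, tries) for every matching notice in log_text."""
    # Logs pasted into mail are often re-wrapped, so collapse all
    # whitespace runs (including newlines) into single spaces first.
    flat = " ".join(log_text.split())
    return [
        (m.group("daemon"), int(m.group("pid")), int(m.group("tries")))
        for m in UNRESPONSIVE.finditer(flat)
    ]
```

An empty result only means the notice was not logged; it is not proof
that the fencer is accepting requests.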


[ClusterLabs] cibadmin response unexpected

2023-01-17 Thread d tbsky
Hi:
   I am using RHEL 9.1 with pacemaker-cli-2.1.4-5. I tried the command below:

> cibadmin -Q -o xxx

I expected the result to tell me that the "xxx" scope does not exist,
but the command returns the whole configuration. Is this the normal
behavior?
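
Until the behavior is clarified, a script can guard against it on its
own side. This is a minimal sketch, assuming that a scoped query returns
XML rooted at an element named after the scope (for example <resources>
for "-o resources") and that the full configuration is rooted at <cib>;
the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def scope_was_honored(cib_xml, scope):
    """True if the query output is rooted at the requested section."""
    root = ET.fromstring(cib_xml)
    # A scoped query is expected to return just that section, so its
    # root element should carry the scope's name. A root of <cib> means
    # the whole configuration came back instead.
    return root.tag == scope


# Hypothetical usage on a cluster node:
#   out = subprocess.run(["cibadmin", "-Q", "-o", "xxx"],
#                        capture_output=True, text=True).stdout
#   if not scope_was_honored(out, "xxx"):
#       raise SystemExit("scope 'xxx' was not applied")
```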

Thanks a lot for the help.

Regards,
tbskyd


Re: [ClusterLabs] best practice for scripting

2021-04-13 Thread d tbsky
Ken Gaillot 
> FYI, the old and new status XML are nearly identical. The old XML's
> outermost element is
>
>  
>...
>  
>
> while the new XML has
>
>  
>...
>
>  
>
> As long as you're not looking at those particular elements, parsing the
> XML output is the same.
>
> The format is actually stable and selectable since Pacemaker 2.0.3:
> crm_mon --as-xml gets the old format, crm_mon --output-as=xml gets the
> new. pcs status xml has continued to use --as-xml, but will switch to
> --output-as=xml, thus Tomas's warning.

Thanks a lot!
The explanation is so clear that I have no excuse not to rewrite my
scripts. I already use 'crm_mon' in my scripts, but parsing the 'pcs'
output is often simpler, so I just use that.
Now I understand what to do.
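
A minimal sketch of a parser that does not depend on the outermost
element, so it should work with either format. The sample element and
attribute names (resource, id, role) are assumptions made for
illustration and should be checked against real crm_mon output.

```python
import xml.etree.ElementTree as ET


def resource_roles(status_xml):
    """Map resource id -> role without looking at the outermost element."""
    root = ET.fromstring(status_xml)
    # iter() walks the whole tree, so it does not matter what the root
    # element is called or which elements wrap the <resource> entries.
    return {r.get("id"): r.get("role") for r in root.iter("resource")}
```

The same function can be fed the output of either crm_mon invocation,
as long as the inner elements really are identical as described above.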

Regards,
tbskyd


Re: [ClusterLabs] best practice for scripting

2021-04-13 Thread d tbsky
Tomas Jelinek 
> As we are aware of the difficulties of using pcs in scripts, we indeed
> have a long term goal to provide machine readable output from pcs. Since
> pcs started with a focus on producing human readable output, it's a lot
> of work to do. Quite a big part of pcs code base cannot be easily
> switched to producing machine parsable output. We are slowly moving
> towards it (which in some cases may cause text output changes due to
> code reuse), but it is currently not seen as something to be finished in
> a year.

Thanks for the hint. Maybe the XML output will become stable after
Pacemaker 2.1.
I will try to start with "pcs status xml" in the future.

Regards,
tbskyd


Re: [ClusterLabs] Antw: [EXT] best practice for scripting

2021-04-13 Thread d tbsky
Strahil Nikolov 
>
> By the way , how do you monitor your pacemaker clusters ?
> We are using Nagios and I found only 'check_crm' but it looks like it was 
> made for crmsh and most probably won't work with pcs without modifications.
>
> Best Regards,
> Strahil Nikolov

We use Zabbix, so we wrote some scripts for zabbix-agent to report the
cluster status. In that case pcs works fine, since it is only called
from within the scripts.


Re: [ClusterLabs] Antw: [EXT] best practice for scripting

2021-04-13 Thread d tbsky
Ulrich Windl 

> "Coming from SUSE" I'm using crm shell for most tasks, and I think the syntax
> is quite good.

   Good for SUSE! Unfortunately, RHEL doesn't include the utility...


[ClusterLabs] best practice for scripting

2021-04-13 Thread d tbsky
Hi:
I have some scripts which use 'pcs' and 'crm_mon'. I prefer pcs
since it is an all-in-one tool, but apart from 'pcs cluster cib' it has
no stable text output. Reading the Pacemaker 2.1 documentation, I found
that it says:

"In addition to crm_mon and stonith_admin, the crmadmin, crm_resource,
crm_simulate, and crm_verify commands now support the --output-as and
--output-to options, including XML output (which scripts and
higher-level tools are strongly recommended to use instead of trying
to parse the text output, which may change from release to release)."

So if I need to parse command output, should I study these commands
instead of pcs in the future? Red Hat does a great job of keeping the
'pcs' output consistent between minor versions (e.g. 7.2 -> 7.3), but
many things changed with the major version change 7 -> 8. If there is a
more stable method for scripting, I would like to follow it.

Thanks a lot for the help!
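
For illustration, a minimal sketch of calling one of those commands
from a script and parsing its XML rather than its text output. The
function name and the injectable "run" parameter are hypothetical
conveniences; crm_mon's --one-shot and --output-as=xml options are used
as given in the Pacemaker documentation.

```python
import subprocess
import xml.etree.ElementTree as ET

# crm_mon is one of the tools the quoted text lists as supporting
# --output-as.
CRM_MON_XML = ["crm_mon", "--one-shot", "--output-as=xml"]


def cluster_status(run=subprocess.run):
    """Run crm_mon once and return its XML output as a parsed element."""
    # check=True raises CalledProcessError on a non-zero exit status, so
    # a failed call is not mistaken for an empty cluster.
    done = run(CRM_MON_XML, capture_output=True, text=True, check=True)
    return ET.fromstring(done.stdout)
```

Passing the runner as a parameter keeps the function testable on a
machine without Pacemaker installed.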


Re: [ClusterLabs] how to setup single node cluster

2021-04-08 Thread d tbsky
Reid Wahl 
> Disaster recovery is the main use case we had in mind. See the RHEL 8.2 
> release notes:
>   - 
> https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/8.2_release_notes/rhel-8-2-0-release#enhancement_high-availability-and-clusters
>
> I thought I also remembered some other use case involving MS SQL, but I can't 
> find anything about it so I might be remembering incorrectly.

Thanks a lot for the confirmation. According to the discussion above, I
think the setup procedure is similar for a single-node cluster. I should
still use corosync (although it seems to have nothing to sync with).
I will try that when I have time.
Thanks again for your kind help!


Re: [ClusterLabs] how to setup single node cluster

2021-04-08 Thread d tbsky
Reid Wahl 
> I don't think we do require fencing for single-node clusters. (Anyone at Red 
> Hat, feel free to comment.) I vaguely recall an internal mailing list or IRC 
> conversation where we discussed this months ago, but I can't find it now. 
> I've also checked our support policies documentation, and it's not mentioned 
> in the "cluster size" doc or the "fencing" doc.

   Since a single-node cluster is either 100% alive or 100% dead, I
think fencing and quorum are not required. I am just curious what the
use case is. Since Red Hat supports it, it must be useful in real
scenarios.


Re: [ClusterLabs] Antw: [EXT] how to setup single node cluster

2021-04-08 Thread d tbsky
Ulrich Windl 
>
> >>> d tbsky wrote on 08.04.2021 at 05:52:
> > Hi:
> > I found RHEL 8.2 support single node cluster now.  but I didn't
> > find further document to explain the concept. RHEL 8.2 also support
> > "disaster recovery cluster". so I think maybe a single node disaster
> > recovery cluster is not bad.
> >
> >I think corosync is still necessary under single node cluster. or
> > is there other new style of configuration?
>
> IMHO if you want a single-node cluster, and you are not planning to add more
> nodes, you'll be better off using a utility like monit to manage your 
> processes...

Sorry, I didn't mention Pacemaker in my previous post. I want a
single-node Pacemaker disaster recovery cluster, which can be managed
with the normal Pacemaker utilities like pcs.
Maybe there are other cases where a single-node Pacemaker cluster is
useful; I just don't know of them yet.


[ClusterLabs] how to setup single node cluster

2021-04-07 Thread d tbsky
Hi:
I found that RHEL 8.2 now supports single-node clusters, but I didn't
find any further documentation explaining the concept. RHEL 8.2 also
supports "disaster recovery clusters", so I think a single-node
disaster recovery cluster might not be a bad idea.

   I think corosync is still necessary in a single-node cluster. Or
is there some other new style of configuration?

Thanks for the help!


Re: [ClusterLabs] staggered resource start/stop

2021-04-06 Thread d tbsky
Klaus Wenninger 
> Guess that heavily depends on what you are running inside your VMs.
> If the services inside don't need each other or anything provided by
> the other cluster-resources (or other way round) or everything is
> synchronizing independently from the cluster ...
> What you could still do is make the timeout more generous and combine
> it with checking for a certain boot-state - so that it either proceeds
> after the generous timeout (when something is broken inside the VM)
> or if a certain state is reached.
>
> Klaus
>

Boot-state plus timeout is perfect. It would be nice if there were a
simple configuration option to do this automatically in the future.
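
The "boot-state or generous timeout" idea can be sketched as a small
polling loop. This is only an illustration, not an existing resource
agent feature: "is_booted" stands for a hypothetical probe that checks
whatever inside the VM indicates that startup has completed.

```python
import time


def wait_for_boot(is_booted, timeout, interval=5.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Return True once is_booted() succeeds, False when timeout expires."""
    deadline = clock() + timeout
    while True:
        if is_booted():
            return True   # the desired boot state was reached
        if clock() >= deadline:
            return False  # generous timeout used up; proceed regardless
        sleep(interval)
```

Either return value lets the next start proceed; the boolean only
records whether the VM reported ready or the timeout ran out, which is
worth logging.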


Re: [ClusterLabs] staggered resource start/stop

2021-03-31 Thread d tbsky
Klaus Wenninger 
> In this case it might be useful not to wait some defined time
> hoping startup of the VM would have gone far enough that
> the IO load has already decayed enough.
> What about a resource that checks for something running
> inside the VM that indicates that startup has completed?
> Don't remember if the VirtualDomain RA might already
> have such a probe possibility.

   In my experience, Windows eats most of the disk IO and Linux
suffers. rhel7/rhel8 sometimes time out mounting /boot or / when disk
IO is too heavy, but Windows always boots successfully.
A resource check inside the VM is good, but I wonder whether the
cluster will get stuck if the detection fails for some reason. In my
case I think booting in order with a delay can solve 99% of the
problems. It just looks a little complex with many Delay RAs.


Re: [ClusterLabs] staggered resource start/stop

2021-03-29 Thread d tbsky
Reid Wahl 
>
> An order constraint set with kind=Serialize (which is mentioned in the first 
> reply to the thread you linked) seems like the most logical option to me. You 
> could serialize a set of resource sets, where each inner set contains a 
> VirtualDomain resource and an ocf:heartbeat:Delay resource.
>
>  5.3.1. Ordering Properties
> (https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Explained/index.html#idm46061192464416)
>  5.6. Ordering Sets of Resources
> (https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Explained/index.html#s-resource-sets-ordering)

 Thanks a lot! I didn't know there was an official RA that acts as a
delay. That's interesting and useful to me.
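
As a rough, untested sketch of the suggestion above, the constraint XML
could be generated like this. The resource ids (vm1, vm1-delay, ...) and
the constraint id are hypothetical, and whether kind=Serialize across
sets really gives the desired staggering should be verified on a test
cluster before relying on it.

```python
import xml.etree.ElementTree as ET


def serialized_order(constraint_id, pairs):
    """Build an rsc_order element with one resource set per (vm, delay)."""
    order = ET.Element("rsc_order",
                       {"id": constraint_id, "kind": "Serialize"})
    for n, (vm, delay) in enumerate(pairs):
        rset = ET.SubElement(order, "resource_set",
                             {"id": f"{constraint_id}-set-{n}"})
        # Each inner set lists the VM first, then its Delay resource.
        ET.SubElement(rset, "resource_ref", {"id": vm})
        ET.SubElement(rset, "resource_ref", {"id": delay})
    return order


# Serialize to text, e.g. for review before loading it into the CIB.
xml_text = ET.tostring(
    serialized_order("stagger-vms",
                     [("vm1", "vm1-delay"), ("vm2", "vm2-delay")]),
    encoding="unicode",
)
```

The Delay resources themselves (ocf:heartbeat:Delay) still have to be
created separately, one per VM.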


[ClusterLabs] staggered resource start/stop

2021-03-28 Thread d tbsky
Hi:
   Since starting or stopping all the VMs at once consumes a lot of
disk IO, I want to start and stop the VMs one by one, with a delay in
between.

Searching the mailing list, I found this discussion:
https://oss.clusterlabs.org/pipermail/pacemaker/2013-August/043128.html

Now I am testing rhel8 with Pacemaker 2.0.4. I wonder if there are
new methods to solve the problem. I searched the documentation but
didn't find any new parameters for the job.

If possible, I don't want to modify the VirtualDomain RA, which comes
with the standard rpm package. Maybe I should write a new RA which
staggers the node utilization. But if I reset the node utilization
when the cluster restarts, there may be a race condition.

 Thanks for the help!