Re: [ClusterLabs] ClusterLabs.Org Documentation Problem?
Thanks for the reply. Yes, it's a bit confusing. I did end up using the documentation for Corosync 2.X since that seemed newer, but it also assumed CentOS/RHEL 7 and systemd-based commands. It also incorporates cman, pcsd, psmisc, and policycoreutils-python, which are all new to me. If there is anything I can do to assist with getting the documentation cleaned up, I'd be more than glad to help.

--
Eric Robinson

-----Original Message-----
From: Ken Gaillot [mailto:kgail...@redhat.com]
Sent: Tuesday, August 22, 2017 2:08 PM
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] ClusterLabs.Org Documentation Problem?

On Tue, 2017-08-22 at 19:40 +, Eric Robinson wrote:
> The documentation located here…
>
> http://clusterlabs.org/doc/
>
> …is confusing because it offers two combinations:
>
> Pacemaker 1.0 for Corosync 1.x
> Pacemaker 1.1 for Corosync 2.x
>
> According to the documentation, if you use Corosync 1.x you need
> Pacemaker 1.0, but if you use Corosync 2.x then you need Pacemaker 1.1.
>
> However, on my CentOS 6.9 system, when I do 'yum install pacemaker
> corosync' I get the following versions:
>
> pacemaker-1.1.15-5.el6.x86_64
> corosync-1.4.7-5.el6.x86_64
>
> What's the correct answer? Does Pacemaker 1.1.15 work with Corosync
> 1.4.7? If so, is the documentation at ClusterLabs misleading?
>
> --
> Eric Robinson

The page actually offers a third option ... "Pacemaker 1.1 for CMAN or Corosync 1.x". That's the configuration used by CentOS 6.

However, that's still a bit misleading; the documentation set for "Pacemaker 1.1 for Corosync 2.x" is the only one that is updated, and it's mostly independent of the underlying layer, so you should prefer that set. I plan to reorganize that page in the coming months, so I'll try to make it clearer.
--
Ken Gaillot

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] ClusterLabs.Org Documentation Problem?
On Tue, 2017-08-22 at 19:40 +, Eric Robinson wrote:
> The documentation located here…
>
> http://clusterlabs.org/doc/
>
> …is confusing because it offers two combinations:
>
> Pacemaker 1.0 for Corosync 1.x
> Pacemaker 1.1 for Corosync 2.x
>
> According to the documentation, if you use Corosync 1.x you need
> Pacemaker 1.0, but if you use Corosync 2.x then you need Pacemaker 1.1.
>
> However, on my CentOS 6.9 system, when I do 'yum install pacemaker
> corosync' I get the following versions:
>
> pacemaker-1.1.15-5.el6.x86_64
> corosync-1.4.7-5.el6.x86_64
>
> What's the correct answer? Does Pacemaker 1.1.15 work with Corosync
> 1.4.7? If so, is the documentation at ClusterLabs misleading?
>
> --
> Eric Robinson

The page actually offers a third option ... "Pacemaker 1.1 for CMAN or Corosync 1.x". That's the configuration used by CentOS 6.

However, that's still a bit misleading; the documentation set for "Pacemaker 1.1 for Corosync 2.x" is the only one that is updated, and it's mostly independent of the underlying layer, so you should prefer that set. I plan to reorganize that page in the coming months, so I'll try to make it clearer.

--
Ken Gaillot
[ClusterLabs] ClusterLabs.Org Documentation Problem?
The documentation located here...

http://clusterlabs.org/doc/

...is confusing because it offers two combinations:

Pacemaker 1.0 for Corosync 1.x
Pacemaker 1.1 for Corosync 2.x

According to the documentation, if you use Corosync 1.x you need Pacemaker 1.0, but if you use Corosync 2.x then you need Pacemaker 1.1.

However, on my CentOS 6.9 system, when I do 'yum install pacemaker corosync' I get the following versions:

pacemaker-1.1.15-5.el6.x86_64
corosync-1.4.7-5.el6.x86_64

What's the correct answer? Does Pacemaker 1.1.15 work with Corosync 1.4.7? If so, is the documentation at ClusterLabs misleading?

--
Eric Robinson
Re: [ClusterLabs] - webdav/davfs
Hello Philipp,

[First of all, I've noticed you are practising a pretty bad habit of starting a new topic/thread by simply responding to an existing one, which distorts the threaded overview of the ongoing exchanges for some of us ... please stop that; there's nothing to be afraid of in going for "compose new" and copying the correct recipient email address (users@clo).]

On 16/08/17 16:53 +0200, philipp.achmuel...@arz.at wrote:
> are there any resource agents available to mount webdav/davfs filesystem?

In the pacemaker world[*], it's not customary to have a dedicated resource agent for each specific file system, as there is the catch-all ocf:heartbeat:Filesystem. Admittedly, having had a brief look at some internal details, it will need a little bit of tweaking for it to run "mount -t davfs http(s)://address:/path /mount/point" under the hood without complaints. As a hint to start with, I'd try adding "|davfs" after each occurrence of "|tmpfs"; you get the point:

https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/Filesystem
(or locally: /usr/lib/ocf/resource.d/heartbeat/Filesystem)

If you make positive progress and the solution works for you, please share your changes as a pull request against the repository above; otherwise, it may be best to open a new issue in the same place.

[*] unlike with rgmanager, where the composability of the agents used to be a significant configuration construct justifying the plain/shared file system dichotomy

--
Jan (Poki)
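[Editor's note: the "|davfs" tweak suggested above can be sketched as a one-shot sed. The sample pattern line below is illustrative, not copied from the agent; on a real node you would apply the same substitution to a backed-up copy of /usr/lib/ocf/resource.d/heartbeat/Filesystem.]

```shell
# Hypothetical sketch of the suggested tweak: append "|davfs" wherever the
# agent's case patterns list "|tmpfs". Demonstrated on a sample pattern line.
sample='ext2|ext3|ext4|xfs|tmpfs)'
echo "$sample" | sed 's/|tmpfs/|tmpfs|davfs/g'
# prints: ext2|ext3|ext4|xfs|tmpfs|davfs)
```

Against the real agent that would be something like `sed -i.bak 's/|tmpfs/|tmpfs|davfs/g' /usr/lib/ocf/resource.d/heartbeat/Filesystem`, but review each changed line by hand, as a blind global substitution may touch places the author did not intend.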
Re: [ClusterLabs] Antw: Re: big trouble with a DRBD resource
On 08/08/17 09:42 -0500, Ken Gaillot wrote:
> On Tue, 2017-08-08 at 10:18 +0200, Ulrich Windl wrote:
>> Ken Gaillot wrote on 07.08.2017 at 22:26 in message
>> <1502137587.5788.83.ca...@redhat.com>:
>>
>> [...]
>>> Unmanaging doesn't stop monitoring a resource, it only prevents starting
>>> and stopping of the resource. That lets you see the current status, even
>>> if you're in the middle of maintenance or what not. You can disable
>>
>> This feature is discussable IMHO: If you plan to update the RAs, it seems a
>> bad idea to run the monitor (that is part of the RA). Especially if a
>> monitor detects a problem while in maintenance (e.g. the updated RA needs a
>> new or changed parameter), it will cause actions once you stop maintenance
>> mode, right?
>
> Generally, it won't cause any actions if the resource is back in a good
> state when you leave maintenance mode. I'm not sure whether failures
> during maintenance mode count toward the migration fail count -- I'm
> guessing they do but shouldn't. If so, it would be possible that the
> cluster decides to move it even if it's in a good state, due to the
> migration threshold. I'll make a note to look into that.
>
> Unmanaging a resource (or going into maintenance mode) doesn't
> necessarily mean that the user expects that resource to stop working. It
> can be a precaution while doing other work on that node, in which case
> they may very well want to know if it starts having problems.
>
> You can already disable the monitors if you want, so I don't think it
> needs to be changed in pacemaker. My general outlook is that pacemaker
> should be as conservative as possible (in this case, letting the user
> know when there's an error), but higher-level tools can make different
> assumptions if they feel their users would prefer it. So, pcs and crm
> are free to disable monitors by default when unmanaging a resource, if
> they think that's better.

In fact pcs follows along in this regard (i.e. the conservative behaviour per above by default), but as of 0.9.157[1] -- or rather the bug-hunted 0.9.158[2] -- it allows one to disable/enable monitor operations when unmanaging/managing resources (respectively) in one go with the --monitor modifier. That should cater to the mentioned use case.

[1] http://lists.clusterlabs.org/pipermail/users/2017-April/005459.html
[2] http://lists.clusterlabs.org/pipermail/users/2017-May/005824.html

--
Jan (Poki)
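[Editor's note: a minimal sketch of the --monitor modifier described above, assuming pcs >= 0.9.157 and a hypothetical resource named "webserver":]

```
# Unmanage the resource and disable its monitor operations in one go:
pcs resource unmanage webserver --monitor

# Later, put it back under cluster control and re-enable the monitors:
pcs resource manage webserver --monitor
```

Without the --monitor modifier, pcs keeps the default conservative behaviour and leaves the monitor operations running while the resource is unmanaged.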
[ClusterLabs] Pacemaker not starting ISCSI LUNs and Targets
Hi,

I have a strange issue where LIO-T based iSCSI targets and LUNs most of the time simply don't work. They either don't start, or bounce around until no more nodes are tried. The less-than-useful information in the logs is like:

Aug 21 22:49:06 [10531] storage-1-prod pengine: warning: check_migration_threshold: Forcing iscsi0-target away from storage-1-prod after 100 failures (max=100)
Aug 21 22:54:47 storage-1-prod crmd[2757]: notice: Result of start operation for ip-iscsi0-vlan40 on storage-1-prod: 0 (ok)
Aug 21 22:54:47 storage-1-prod iSCSITarget(iscsi0-target)[5427]: WARNING: Configuration parameter "tid" is not supported by the iSCSI implementation and will be ignored.
Aug 21 22:54:48 storage-1-prod iSCSITarget(iscsi0-target)[5427]: INFO: Parameter auto_add_default_portal is now 'false'.
Aug 21 22:54:48 storage-1-prod iSCSITarget(iscsi0-target)[5427]: INFO: Created target iqn.2017-08.acccess.net:prod-1-ha. Created TPG 1.
Aug 21 22:54:48 storage-1-prod iSCSITarget(iscsi0-target)[5427]: ERROR: This Target already exists in configFS
Aug 21 22:54:48 storage-1-prod crmd[2757]: notice: Result of start operation for iscsi0-target on storage-1-prod: 1 (unknown error)
Aug 21 22:54:49 storage-1-prod iSCSITarget(iscsi0-target)[5536]: INFO: Deleted Target iqn.2017-08.access.net:prod-1-ha.
Aug 21 22:54:49 storage-1-prod crmd[2757]: notice: Result of stop operation for iscsi0-target on storage-1-prod: 0 (ok)

Now, the "unknown error" seems to actually be a targetcli type of error: "This Target already exists in configFS". Checking with targetcli shows zero configured items on either node. Manually starting the target gives:

john@storage-1-prod:~$ sudo pcs resource debug-start iscsi0-target
Error performing operation: Operation not permitted
Operation start for iscsi0-target (ocf:heartbeat:iSCSITarget) returned 1
> stderr: WARNING: Configuration parameter "tid" is not supported by the iSCSI implementation and will be ignored.
> stderr: INFO: Parameter auto_add_default_portal is now 'false'.
> stderr: INFO: Created target iqn.2017-08.access.net:prod-1-ha. Created TPG 1.
> stderr: ERROR: This Target already exists in configFS

but now targetcli shows at least the target. Checking with crm status still shows the target as stopped. Manually starting the LUNs gives:

john@storage-1-prod:~$ sudo pcs resource debug-start iscsi0-lun0
Operation start for iscsi0-lun0 (ocf:heartbeat:iSCSILogicalUnit) returned 0
> stderr: INFO: Created block storage object iscsi0-lun0 using /dev/zvol/iscsipool0/iscsi/net.access.prod-1-ha-root.
> stderr: INFO: Created LUN 0.
> stderr: DEBUG: iscsi0-lun0 start : 0

john@storage-1-prod:~$ sudo pcs resource debug-start iscsi0-lun1
Operation start for iscsi0-lun1 (ocf:heartbeat:iSCSILogicalUnit) returned 0
> stderr: INFO: Created block storage object iscsi0-lun1 using /dev/zvol/iscsipool0/iscsi/net.access.prod-1-ha-swap.
> stderr: /usr/lib/ocf/resource.d/heartbeat/iSCSILogicalUnit: line 378: /sys/kernel/config/target/core/iblock_0/iscsi0-lun1/wwn/vpd_unit_serial: No such file or directory
> stderr: INFO: Created LUN 1.
> stderr: DEBUG: iscsi0-lun1 start : 0

So the second LUN seems to have some bad parameters created by the iSCSILogicalUnit script. Checking with targetcli, however, shows both LUNs and the target up and running. Checking again with crm status (and pcs status) shows all three resources still stopped.

Since the LUNs are colocated with the target and the target still has fail counts, I clear them with:

sudo pcs resource cleanup iscsi0-target

Now the LUNs and target are all active in crm status / pcs status. But it's quite a manual process to get this to work! I'm thinking either my configuration is bad or there is some bug somewhere in targetcli / LIO or the iSCSI heartbeat script. On top of all the manual work, it still breaks on any action. A move, failover, reboot, etc. instantly breaks it.
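[Editor's note: a condensed sketch of the manual recovery sequence described above, using the resource names from the logs; the "failcount show" step is an addition for inspecting the counters before clearing them, assuming a pcs version that provides it:]

```
# Inspect the accumulated fail counts that force iscsi0-target away:
sudo pcs resource failcount show iscsi0-target

# Clear them; the colocated LUNs become placeable again along with the target:
sudo pcs resource cleanup iscsi0-target

# Verify that the target and both LUNs now show as Started:
sudo pcs status
```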
Everything else (the underlying ZFS pool, the DRBD device, the IPv4 IPs, etc.) moves just fine; it's only the iSCSI that's being problematic.

Concrete questions:
- Is my config bad?
- Is there a known issue with iSCSI? (I have only found old references about ordering.)

I have added the output of "crm config show" as cib.txt, and the output of a fresh boot of both nodes is:

Current DC: storage-2-prod (version 1.1.16-94ff4df) - partition with quorum
Last updated: Mon Aug 21 22:55:05 2017
Last change: Mon Aug 21 22:36:23 2017 by root via cibadmin on storage-1-prod

2 nodes configured
21 resources configured

Online: [ storage-1-prod storage-2-prod ]

Full list of resources:

ip-iscsi0-vlan10 (ocf::heartbeat:IPaddr2): Started storage-1-prod
ip-iscsi0-vlan20 (ocf::heartbeat:IPaddr2): Started storage-1-prod
ip-iscsi0-vlan30 (ocf::heartbeat:IPaddr2): Started storage-1-prod
ip-iscsi0-vlan40 (ocf::heartbeat:IPaddr2): Started storage-1-prod
Master/Slave