Re: [ClusterLabs] ClusterLabs.Org Documentation Problem?

2017-08-22 Thread Eric Robinson
Thanks for the reply. Yes, it's a bit confusing. I did end up using the
documentation for Corosync 2.X since that seemed newer, but it assumed
CentOS/RHEL 7 and systemd-based commands, and it incorporates cman, pcsd,
psmisc, and policycoreutils-python, which are all new to me. If there is
anything I can do to assist with getting the documentation cleaned up, I'd be
more than glad to help.

--
Eric Robinson

-----Original Message-----
From: Ken Gaillot [mailto:kgail...@redhat.com] 
Sent: Tuesday, August 22, 2017 2:08 PM
To: Cluster Labs - All topics related to open-source clustering welcomed 

Subject: Re: [ClusterLabs] ClusterLabs.Org Documentation Problem?

On Tue, 2017-08-22 at 19:40 +, Eric Robinson wrote:
> The documentation located here…
> 
>  
> 
> http://clusterlabs.org/doc/
> 
>  
> 
> …is confusing because it offers two combinations:
> 
>  
> 
> Pacemaker 1.0 for Corosync 1.x
> 
> Pacemaker 1.1 for Corosync 2.x
> 
>  
> 
> According to the documentation, if you use Corosync 1.x you need 
> Pacemaker 1.0, but if you use Corosync 2.x then you need Pacemaker 
> 1.1.
> 
>  
> 
> However, on my CentOS 6.9 system, when I do 'yum install pacemaker 
> corosync' I get the following versions:
> 
>  
> 
> pacemaker-1.1.15-5.el6.x86_64
> 
> corosync-1.4.7-5.el6.x86_64
> 
>  
> 
> What’s the correct answer? Does Pacemaker 1.1.15 work with Corosync 
> 1.4.7? If so, is the documentation at ClusterLabs misleading?
> 
>  
> 
> --
> Eric Robinson

The page actually offers a third option ... "Pacemaker 1.1 for CMAN or Corosync 
1.x". That's the configuration used by CentOS 6.

However, that's still a bit misleading; the documentation set for "Pacemaker 
1.1 for Corosync 2.x" is the only one that is updated, and it's mostly 
independent of the underlying layer, so you should prefer that set.

I plan to reorganize that page in the coming months, so I'll try to make it 
clearer.

--
Ken Gaillot 





___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] ClusterLabs.Org Documentation Problem?

2017-08-22 Thread Ken Gaillot
On Tue, 2017-08-22 at 19:40 +, Eric Robinson wrote:
> The documentation located here… 
> 
>  
> 
> http://clusterlabs.org/doc/
> 
>  
> 
> …is confusing because it offers two combinations:
> 
>  
> 
> Pacemaker 1.0 for Corosync 1.x
> 
> Pacemaker 1.1 for Corosync 2.x
> 
>  
> 
> According to the documentation, if you use Corosync 1.x you need
> Pacemaker 1.0, but if you use Corosync 2.x then you need Pacemaker
> 1.1. 
> 
>  
> 
> However, on my CentOS 6.9 system, when I do 'yum install pacemaker
> corosync' I get the following versions:
> 
>  
> 
> pacemaker-1.1.15-5.el6.x86_64
> 
> corosync-1.4.7-5.el6.x86_64
> 
>  
> 
> What’s the correct answer? Does Pacemaker 1.1.15 work with Corosync
> 1.4.7? If so, is the documentation at ClusterLabs misleading? 
> 
>  
> 
> --
> Eric Robinson

The page actually offers a third option ... "Pacemaker 1.1 for CMAN or
Corosync 1.x". That's the configuration used by CentOS 6.

However, that's still a bit misleading; the documentation set for
"Pacemaker 1.1 for Corosync 2.x" is the only one that is updated, and
it's mostly independent of the underlying layer, so you should prefer
that set.
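
A quick, generic way to confirm which combination a given machine is
actually running (an illustration only, not taken from Eric's output):

  # list whichever of the relevant cluster packages are installed, with versions
  rpm -q pacemaker corosync cman

On CentOS 6 this shows pacemaker 1.1.x alongside corosync 1.4.x, i.e. the
"Pacemaker 1.1 for CMAN or Corosync 1.x" combination above.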

I plan to reorganize that page in the coming months, so I'll try to make
it clearer.

-- 
Ken Gaillot 





___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] ClusterLabs.Org Documentation Problem?

2017-08-22 Thread Eric Robinson
The documentation located here...

http://clusterlabs.org/doc/

...is confusing because it offers two combinations:

Pacemaker 1.0 for Corosync 1.x
Pacemaker 1.1 for Corosync 2.x

According to the documentation, if you use Corosync 1.x you need Pacemaker 1.0, 
but if you use Corosync 2.x then you need Pacemaker 1.1.

However, on my CentOS 6.9 system, when I do 'yum install pacemaker corosync'
I get the following versions:

pacemaker-1.1.15-5.el6.x86_64
corosync-1.4.7-5.el6.x86_64

What's the correct answer? Does Pacemaker 1.1.15 work with Corosync 1.4.7? If 
so, is the documentation at ClusterLabs misleading?

--
Eric Robinson

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] - webdav/davfs

2017-08-22 Thread Jan Pokorný
Hello Philipp,

[first of all, I've noticed you are practising a pretty bad habit of
starting a new topic/thread by simply responding to an existing one,
hence distorting the clear thread overview of the exchanges for some
of us ... please stop that; there's nothing to be afraid of in going
for "compose new" and copying in the correct recipient email address
(users@clo)]

On 16/08/17 16:53 +0200, philipp.achmuel...@arz.at wrote:
> are there any resource agents available to mount webdav/davfs filesystem?

in the pacemaker world[*], it's not customary to have a dedicated resource
agent just for a specific file system, as there is the catch-all
ocf:heartbeat:Filesystem.  Admittedly, having had a brief look at some
internal details, it will need a little bit of tweaking for it to
run "mount -t davfs http(s)://address:/path /mount/point"
under the hood without complaints.

As a hint to start with, I'd try adding "|davfs" after each occurrence
of "|tmpfs"; you get the point:

https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/Filesystem
(or locally: /usr/lib/ocf/resource.d/heartbeat/Filesystem)
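
An untested sketch of that tweak against a local copy (assuming the
"|tmpfs" case patterns in your installed version match upstream):

  # keep a backup, then treat davfs like the other special filesystems
  cp /usr/lib/ocf/resource.d/heartbeat/Filesystem{,.orig}
  sed -i 's/|tmpfs/|tmpfs|davfs/g' /usr/lib/ocf/resource.d/heartbeat/Filesystem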

If you make positive progress and the solution works for you, please
share your changes as a pull request against the repository above;
otherwise, it may be best to open a new issue in the same place.


[*] unlike with rgmanager, where the composability of the agents used
to be a significant configuration construct justifying the plain/shared
file system dichotomy

-- 
Jan (Poki)


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: big trouble with a DRBD resource

2017-08-22 Thread Jan Pokorný
On 08/08/17 09:42 -0500, Ken Gaillot wrote:
> On Tue, 2017-08-08 at 10:18 +0200, Ulrich Windl wrote:
>> Ken Gaillot  wrote on 07.08.2017 at 22:26 in message
>> <1502137587.5788.83.ca...@redhat.com>:
>> 
>> [...]
>>> Unmanaging doesn't stop monitoring a resource, it only prevents starting
>>> and stopping of the resource. That lets you see the current status, even
>>> if you're in the middle of maintenance or what not. You can disable
>> 
>> This feature is discussable IMHO: If you plan to update the RAs, it seems a 
>> bad idea to run the monitor (that is part of the RA). Especially if a 
>> monitor detects a problem while in maintenance (e.g. the updated RA needs a 
>> new or changed parameter), it will cause actions once you stop maintenance 
>> mode, right?
> 
> Generally, it won't cause any actions if the resource is back in a good
> state when you leave maintenance mode. I'm not sure whether failures
> during maintenance mode count toward the migration fail count -- I'm
> guessing they do but shouldn't. If so, it would be possible that the
> cluster decides to move it even if it's in a good state, due to the
> migration threshold. I'll make a note to look into that.
> 
> Unmanaging a resource (or going into maintenance mode) doesn't
> necessarily mean that the user expects that resource to stop working. It
> can be a precaution while doing other work on that node, in which case
> they may very well want to know if it starts having problems.
>  
> You can already disable the monitors if you want, so I don't think it
> needs to be changed in pacemaker. My general outlook is that pacemaker
> should be as conservative as possible (in this case, letting the user
> know when there's an error), but higher-level tools can make different
> assumptions if they feel their users would prefer it. So, pcs and crm
> are free to disable monitors by default when unmanaging a resource, if
> they think that's better.

In fact pcs follows along in this regard (i.e. conservative behaviour
per above by default), but as of 0.9.157[1] -- or rather the bug-fixed
0.9.158[2] -- it allows one to disable/enable monitor operations when
unmanaging/managing (respectively) resources in one go, via the
--monitor modifier.  That should cater to the mentioned use case.

[1] http://lists.clusterlabs.org/pipermail/users/2017-April/005459.html
[2] http://lists.clusterlabs.org/pipermail/users/2017-May/005824.html
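
Roughly (an untested sketch; "myrsc" is just a placeholder resource id):

  # take the resource out of cluster control and silence its monitors ...
  pcs resource unmanage myrsc --monitor
  # ... and later hand it back, re-enabling the monitor operations
  pcs resource manage myrsc --monitor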

-- 
Jan (Poki)


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Pacemaker not starting ISCSI LUNs and Targets

2017-08-22 Thread John Keates
Hi,

I have a strange issue where LIO-T based iSCSI targets and LUNs most of the
time simply don't work. They either don't start, or bounce around until no
more nodes are left to try.
The less-than-useful information in the logs looks like:

Aug 21 22:49:06 [10531] storage-1-prodpengine:  warning: 
check_migration_threshold: Forcing iscsi0-target away from storage-1-prod after 
100 failures (max=100)

Aug 21 22:54:47 storage-1-prod crmd[2757]:   notice: Result of start operation 
for ip-iscsi0-vlan40 on storage-1-prod: 0 (ok)
Aug 21 22:54:47 storage-1-prod iSCSITarget(iscsi0-target)[5427]: WARNING: 
Configuration parameter "tid" is not supported by the iSCSI implementation and 
will be ignored.
Aug 21 22:54:48 storage-1-prod iSCSITarget(iscsi0-target)[5427]: INFO: 
Parameter auto_add_default_portal is now 'false'.
Aug 21 22:54:48 storage-1-prod iSCSITarget(iscsi0-target)[5427]: INFO: Created 
target iqn.2017-08.access.net:prod-1-ha. Created TPG 1.
Aug 21 22:54:48 storage-1-prod iSCSITarget(iscsi0-target)[5427]: ERROR: This 
Target already exists in configFS
Aug 21 22:54:48 storage-1-prod crmd[2757]:   notice: Result of start operation 
for iscsi0-target on storage-1-prod: 1 (unknown error)
Aug 21 22:54:49 storage-1-prod iSCSITarget(iscsi0-target)[5536]: INFO: Deleted 
Target iqn.2017-08.access.net:prod-1-ha.
Aug 21 22:54:49 storage-1-prod crmd[2757]:   notice: Result of stop operation 
for iscsi0-target on storage-1-prod: 0 (ok)

Now, the unknown error actually seems to be a targetcli-type error: "This
Target already exists in configFS". Checking with targetcli shows zero
configured items on either node.
Manually starting the target and the LUNs gives:


john@storage-1-prod:~$ sudo pcs resource debug-start iscsi0-target
Error performing operation: Operation not permitted
Operation start for iscsi0-target (ocf:heartbeat:iSCSITarget) returned 1
 >  stderr: WARNING: Configuration parameter "tid" is not supported by the 
 > iSCSI implementation and will be ignored.
 >  stderr: INFO: Parameter auto_add_default_portal is now 'false'.
 >  stderr: INFO: Created target iqn.2017-08.access.net:prod-1-ha. Created TPG 
 > 1.
 >  stderr: ERROR: This Target already exists in configFS

but now targetcli shows at least the target. Checking with crm status still 
shows the target as stopped.
Manually starting the LUNs gives:


john@storage-1-prod:~$ sudo pcs resource debug-start iscsi0-lun0
Operation start for iscsi0-lun0 (ocf:heartbeat:iSCSILogicalUnit) returned 0
 >  stderr: INFO: Created block storage object iscsi0-lun0 using 
 > /dev/zvol/iscsipool0/iscsi/net.access.prod-1-ha-root.
 >  stderr: INFO: Created LUN 0.
 >  stderr: DEBUG: iscsi0-lun0 start : 0
john@storage-1-prod:~$ sudo pcs resource debug-start iscsi0-lun1
Operation start for iscsi0-lun1 (ocf:heartbeat:iSCSILogicalUnit) returned 0
 >  stderr: INFO: Created block storage object iscsi0-lun1 using 
 > /dev/zvol/iscsipool0/iscsi/net.access.prod-1-ha-swap.
 >  stderr: /usr/lib/ocf/resource.d/heartbeat/iSCSILogicalUnit: line 378: 
 > /sys/kernel/config/target/core/iblock_0/iscsi0-lun1/wwn/vpd_unit_serial: No 
 > such file or directory
 >  stderr: INFO: Created LUN 1.
 >  stderr: DEBUG: iscsi0-lun1 start : 0

So the second LUN seems to have some bad parameters created by the
iSCSILogicalUnit script. Checking with targetcli, however, shows both LUNs
and the target up and running.
Checking again with crm status (and pcs status) shows all three resources
still stopped. Since the LUNs are colocated with the target and the target
still has fail counts, I clear them with:

sudo pcs resource cleanup iscsi0-target

Now the LUNs and target are all active in crm status / pcs status. But it’s 
quite a manual process to get this to work! I’m thinking either my 
configuration is bad or there is some bug somewhere in targetcli / LIO or the 
iSCSI heartbeat script.
On top of all the manual work, it still breaks on any action: a move,
failover, reboot, etc. instantly breaks it again. Everything else (the
underlying ZFS pool, the DRBD device, the IPv4 IPs, etc.) moves just fine;
it's only the iSCSI that's being problematic.
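
For reference, a generic sketch of the commands around the fail count that
triggered the "Forcing iscsi0-target away ... after 100 failures" message
above (resource name taken from the log; adjust as needed):

  # show the per-node failure counters that feed migration-threshold
  pcs resource failcount show iscsi0-target
  # clear them and let the cluster re-probe / restart the resource
  pcs resource cleanup iscsi0-target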

Concrete questions:

- Is my config bad?
- Is there a known issue with iSCSI? (I have only found old references about
  ordering; see the generic ordering sketch just below)
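
For context, a generic sketch (not my actual configuration, which is in the
attached cib.txt) of how the portal IP / target / LUN ordering is often
expressed with pcs; the resource names are the ones appearing earlier in
this mail:

  # start the portal IP before the target, and the target before its LUNs
  pcs constraint order ip-iscsi0-vlan40 then iscsi0-target
  pcs constraint order iscsi0-target then iscsi0-lun0
  # keep the LUN on the same node as the target
  pcs constraint colocation add iscsi0-lun0 with iscsi0-target INFINITY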

I have attached the output of crm config show as cib.txt; the status output
from a fresh boot of both nodes is:

Current DC: storage-2-prod (version 1.1.16-94ff4df) - partition with quorum
Last updated: Mon Aug 21 22:55:05 2017
Last change: Mon Aug 21 22:36:23 2017 by root via cibadmin on storage-1-prod

2 nodes configured
21 resources configured

Online: [ storage-1-prod storage-2-prod ]

Full list of resources:

 ip-iscsi0-vlan10   (ocf::heartbeat:IPaddr2):   Started storage-1-prod
 ip-iscsi0-vlan20   (ocf::heartbeat:IPaddr2):   Started storage-1-prod
 ip-iscsi0-vlan30   (ocf::heartbeat:IPaddr2):   Started storage-1-prod
 ip-iscsi0-vlan40   (ocf::heartbeat:IPaddr2):   Started storage-1-prod
 Master/Slave