Re: [ClusterLabs] dead cluster after centos update

2017-10-23 Thread Ken Gaillot
On Mon, 2017-10-23 at 15:02 -0500, Dimitri Maziuk wrote:
> I've a 2-node ZFS cluster that was working fine until redhat improved
> my user experience. Now, it's dead.
> 
> How do I get it back?
> 
> There were two resources: ZFS and IP. There was fence_scsi, which is
> now logging
> > Oct 23 14:22:45 hereland fence_scsi: Failed: nodename or key is
> > required
> > Oct 23 14:22:45 hereland fence_scsi: Please use '-h' for usage
> > Oct 23 14:22:45 hereland stonith-ng[1753]: warning:
> > fence_scsi[1929] stderr: [ Failed: nodename or key is required ]

I'm not familiar with fence_scsi, but the above looks like the issue.
Does your fence device configuration have either nodename or key?

If your existing configuration was working previously and now isn't,
open a support ticket with Red Hat, as it sounds like a regression.
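
As a rough sketch of how to check that: the fence device's
configuration can be dumped with pcs and searched for those options.
The device name fence-tank comes from the log above; "pcs stonith show"
is assumed to be the syntax of the pcs version in use (newer pcs
releases renamed it to "pcs stonith config"), and has_option is a
hypothetical helper name, not part of pcs.

```shell
#!/bin/sh
# has_option NAME: read a fence device configuration dump on stdin and
# succeed if it contains a "NAME=value" attribute.
# Hypothetical helper; it only greps text and does not talk to the cluster.
has_option() {
    # Require start-of-line or whitespace before NAME so that, for
    # example, "pcmk_host_key=" does not count as "key=".
    grep -Eq "(^|[[:space:]])$1="
}

# Usage on a cluster node (device name taken from the log above):
#   pcs stonith show fence-tank --full | has_option nodename && echo "nodename is set"
#   pcs stonith show fence-tank --full | has_option key && echo "key is set"
```

If neither option turns up, that would be consistent with the
"nodename or key is required" message, though it does not by itself
say why the update changed the behavior.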

> > Oct 23 14:22:45 hereland stonith-ng[1753]: warning:
> > fence_scsi[1929] stderr: [  ]
> > Oct 23 14:22:45 hereland stonith-ng[1753]: warning:
> > fence_scsi[1929] stderr: [ Please use '-h' for usage ]
> > Oct 23 14:22:45 hereland stonith-ng[1753]: warning:
> > fence_scsi[1929] stderr: [  ]
> > Oct 23 14:22:45 hereland stonith-ng[1753]:  notice: Disabling port
> > list queries for fence-tank (-201): (null)
> > Oct 23 14:22:45 hereland stonith-ng[1753]:  notice: Operation on of
> > hereland-eth1 by  for crmd.1784@flemish-eth1.3e4dd918: No
> > such device
> > Oct 23 14:22:45 hereland crmd[1757]:   error: Unfencing of
> > hereland-eth1 by  failed: No such device (-19)

Probably due to the configuration issue, unfencing fails. fence_scsi
requires unfencing, which in its case means allowing the node access to
the disk. Since that fails, nothing else can proceed.
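
For fence_scsi, that access is granted through SCSI-3 persistent
reservations, so one way to see whether unfencing took effect is to
list the keys registered on the shared device. A minimal sketch,
assuming sg_persist from sg3_utils is installed and prints each
registered key alone on its own indented line; DEVICE is a placeholder
for the shared disk, and registered_keys is a hypothetical helper that
only filters sg_persist's output.

```shell
#!/bin/sh
# registered_keys: read "sg_persist --in --read-keys" output on stdin and
# print one registered reservation key per line.
# Hypothetical helper; assumes keys appear as a lone 0x... field per line.
registered_keys() {
    awk 'NF == 1 && $1 ~ /^0x[0-9a-fA-F]+$/ { print $1 }'
}

# Usage on a cluster node (DEVICE is a placeholder for the shared disk):
#   sg_persist --in --read-keys --device="$DEVICE" | registered_keys
```

An empty result would suggest that no node is currently registered
with the device, which would match unfencing having failed.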

> 
> I disabled it for now, and
> 
> pcs resource debug-start resource-zfs --full
> 
> works fine: the pool is imported, filesystems are mounted and
> exported -- but the resources remain stopped no matter what.
> 
> I don't see anything useful in the logs. How do I unfsck this mess?
-- 
Ken Gaillot 

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Pacemaker 1.1.18 Release Candidate 3

2017-10-23 Thread Ken Gaillot
The third release candidate for Pacemaker version 1.1.18 is now
available at:

https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.18-rc3

This pre-release fixes a few minor bugs. For details, see the
ChangeLog.

This is likely to be the last release candidate before the final
release next week. Any testing you can do is very welcome.
-- 
Ken Gaillot 



Re: [ClusterLabs] (Not) Coming in 1.1.18: deprecating stonith-enabled

2017-10-23 Thread Ken Gaillot
Well, this turned out to be trickier than initially imagined.

It's actually related to a broader issue in the policy engine that has
only been addressed piecemeal: each piece of information has to be
learned before anything needs to use it.

To properly deprecate stonith-enabled, we'll have to move more of the
information-gathering to the beginning of the policy engine's process.
As part of that, I'll probably try to define more clearly the
information that can be relied on at any given point.

In any case, that's a bigger project than fits in the 1.1.18 (or
2.0.0) time frame.

On Mon, 2017-09-25 at 18:53 -0500, Ken Gaillot wrote:
> Hi all,
> 
> I thought I'd call attention to one of the most visible deprecations
> coming in 1.1.18: stonith-enabled. In order to deprecate that option,
> we have to provide an alternate way to do the things that it does.
> 
> stonith-enabled determines whether a resource's "requires"
> meta-attribute defaults to "quorum" or "fencing". This already has an
> alternate method, the rsc_defaults section.
> 
> For everything else, e.g. whether to fence misbehaving nodes, and
> whether to start resources when fencing hasn't been configured, the
> cluster will now check additional criteria.
> 
> This is my plan at the moment:
> 
> Fencing will be considered possible in a configuration if:
> "no-quorum-policy" is "suicide", any resource has "requires" set to
> "unfencing" or "fencing" (the default), any operation has "on-fail"
> set to "fence" (the default for stop operations), or any fence
> resource has been configured.
> 
> If fencing is not possible, the cluster will behave as if
> stonith-enabled is false (even if it's not).
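
The criteria proposed in the quoted message can be restated as a small
predicate. This is only an illustrative sketch, not Pacemaker code:
fencing_possible and its four arguments are hypothetical, and the
caller is assumed to have already counted the relevant resources and
operations (including any that rely on defaults).

```shell
#!/bin/sh
# fencing_possible: hypothetical restatement of the proposed check.
# Nothing is read from a live CIB; the caller supplies all values.
#   $1  value of no-quorum-policy
#   $2  number of resources whose "requires" is "fencing" or "unfencing"
#   $3  number of operations whose "on-fail" is "fence"
#   $4  number of configured fence resources
# Returns 0 if any one criterion holds, 1 otherwise.
fencing_possible() {
    [ "$1" = "suicide" ] && return 0
    [ "$2" -gt 0 ] && return 0
    [ "$3" -gt 0 ] && return 0
    [ "$4" -gt 0 ] && return 0
    return 1
}

# Usage:
#   if fencing_possible stop 0 0 0; then
#       echo "fencing is possible"
#   else
#       echo "behave as if stonith-enabled=false"
#   fi
```

Any single criterion is enough, so the checks are independent and
their order does not matter.
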
-- 
Ken Gaillot 



[ClusterLabs] dead cluster after centos update

2017-10-23 Thread Dimitri Maziuk

I've a 2-node ZFS cluster that was working fine until redhat improved my
user experience. Now, it's dead.

How do I get it back?

There were two resources: ZFS and IP. There was fence_scsi, which is now
logging
> Oct 23 14:22:45 hereland fence_scsi: Failed: nodename or key is required
> Oct 23 14:22:45 hereland fence_scsi: Please use '-h' for usage
> Oct 23 14:22:45 hereland stonith-ng[1753]: warning: fence_scsi[1929] stderr: [ Failed: nodename or key is required ]
> Oct 23 14:22:45 hereland stonith-ng[1753]: warning: fence_scsi[1929] stderr: [  ]
> Oct 23 14:22:45 hereland stonith-ng[1753]: warning: fence_scsi[1929] stderr: [ Please use '-h' for usage ]
> Oct 23 14:22:45 hereland stonith-ng[1753]: warning: fence_scsi[1929] stderr: [  ]
> Oct 23 14:22:45 hereland stonith-ng[1753]:  notice: Disabling port list queries for fence-tank (-201): (null)
> Oct 23 14:22:45 hereland stonith-ng[1753]:  notice: Operation on of hereland-eth1 by  for crmd.1784@flemish-eth1.3e4dd918: No such device
> Oct 23 14:22:45 hereland crmd[1757]:   error: Unfencing of hereland-eth1 by  failed: No such device (-19)

I disabled it for now, and

pcs resource debug-start resource-zfs --full

works fine: the pool is imported, filesystems are mounted and exported
-- but the resources remain stopped no matter what.

I don't see anything useful in the logs. How do I unfsck this mess?

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu


