Re: [ClusterLabs] Possible idea for 2.0.0: renaming the Pacemaker daemons

2018-04-09 Thread Kristoffer Grönlund
Jehan-Guillaume de Rorthais  writes:

>
> I feel like you guys are talking about a solution that already exists and that
> you probably already know, e.g. "etcd".
>
> Etcd provides:
>
> * a cluster-wide key/value storage engine
> * quorum support
> * key locking
> * atomic changes
> * a REST API
> * etc.
>
> However, it does require opening a new TCP port :/
>

My main inspiration and reasoning is indeed to introduce the same
functionality provided by etcd into a corosync-based cluster without
having to add a parallel cluster consensus solution. Simply installing
etcd means 1) now you have two clusters, 2) etcd doesn't handle 2-node
clusters or fencing and doesn't degrade well to a single node, 3)
relying on the presence of the KV-store in pacemaker tools is not an
option unless pacemaker wants to make etcd a requirement.

Cheers,
Kristoffer

> Moreover, as an RA developer, I am currently wrestling with some weird attrd
> behavior[1], so any improvement there is welcome :)
>
> Cheers,
>
> [1] https://github.com/ClusterLabs/PAF/issues/131
>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] How to cancel a fencing request?

2018-04-09 Thread Ken Gaillot
On Tue, 2018-04-10 at 00:02 +0200, Jehan-Guillaume de Rorthais wrote:
> On Tue, 03 Apr 2018 17:35:43 -0500
> Ken Gaillot  wrote:
> 
> > On Tue, 2018-04-03 at 21:46 +0200, Klaus Wenninger wrote:
> > > On 04/03/2018 05:43 PM, Ken Gaillot wrote:  
> > > > On Tue, 2018-04-03 at 07:36 +0200, Klaus Wenninger wrote:  
> > > > > On 04/02/2018 04:02 PM, Ken Gaillot wrote:  
> > > > > > > On Mon, 2018-04-02 at 10:54 +0200, Jehan-Guillaume de Rorthais
> > > > > > > wrote:
> 
> [...]
> > > > > 
> > > > > > -inf constraints like that should effectively prevent
> > > > > > stonith-actions from being executed on those nodes.
> > > > 
> > > > It shouldn't ...
> > > > 
> > > > > Pacemaker respects target-role=Started/Stopped for controlling
> > > > > execution of fence devices, but location (or even whether the
> > > > > device is "running" at all) only affects monitors, not execution.
> > > >   
> > > > > Though there are a few issues with location constraints
> > > > > and stonith-devices.
> > > > > 
> > > > > > When stonithd brings up the devices from the cib it
> > > > > > runs the parts of pengine that fully evaluate these
> > > > > > constraints and it would disable the stonith-device
> > > > > > if the resource is unrunnable on that node.
> > > > > 
> > > > > That should be true only for target-role, not everything that
> > > > > affects runnability
> > > 
> > > cib_device_update bails out via a removal of the device if
> > > - role == stopped
> > > - node not in allowed_nodes-list of stonith-resource
> > > - weight is negative
> > > 
> > > Wouldn't that include a -inf rule for a node?  
> > 
> > > Well, I'll be ... I thought I understood what was going on there. :-)
> > > You're right.
> > > 
> > > I've frequently seen it recommended to ban fence devices from their
> > > target when using one device per target. Perhaps it would be better to
> > > give a lower (but positive) score on the target compared to the other
> > > node(s), so it can be used when no other nodes are available. you could
> > > re-manage.
> 
> Wait, you mean a fencing resource can be triggered from its own
> target? What happens then? Node suicide, and all the cluster nodes are
> shut down?
> 
> Thanks,

A node can fence itself, though it will be the cluster's last resort
when no other node can. It doesn't necessarily imply all other nodes
are shut down ... there may be other nodes up, but they are not allowed
to execute the relevant fence device for whatever reason. But of course
there might be no other nodes up, in which case, yes, the cluster dies
(the idea being that the node is known to be malfunctioning, so stop it
from possibly corrupting data).
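
The behavior discussed above can be pictured with a small sketch. It is only a simplified model, not Pacemaker code: the `FenceDevice` class and the helper names are hypothetical, the three checks mirror the bail-out conditions quoted above for cib_device_update, and the "target only as last resort" rule follows the explanation in this message.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FenceDevice:
    """Hypothetical, simplified view of a stonith resource."""
    name: str
    target_role: str = "Started"
    # Location score per node name; a node missing here is not allowed at all.
    allowed_nodes: Dict[str, float] = field(default_factory=dict)


def usable_on(device: FenceDevice, node: str) -> bool:
    """Return False in the three cases where the device would be removed."""
    if device.target_role.lower() == "stopped":
        return False                      # role == stopped
    if node not in device.allowed_nodes:
        return False                      # node not in the allowed-nodes list
    if device.allowed_nodes[node] < 0:
        return False                      # negative weight, e.g. a -inf rule
    return True


def pick_executor(device: FenceDevice, target: str,
                  nodes_up: List[str]) -> Optional[str]:
    """Prefer any other usable node; fall back to the target itself."""
    usable = [n for n in nodes_up if usable_on(device, n)]
    others = [n for n in usable if n != target]
    if others:
        return others[0]
    return target if target in usable else None
```

With a -inf score on the target, `pick_executor` returns `None` once the target is the only node left; with a low but positive score it returns the target, which is the self-fencing case described above.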
-- 
Ken Gaillot 


Re: [ClusterLabs] Possible idea for 2.0.0: renaming the Pacemaker daemons

2018-04-09 Thread Jan Pokorný
On 09/04/18 12:10 -0500, Ken Gaillot wrote:
> Based on the list discussion and feedback I could coax out of others, I
> will change the Pacemaker daemon names, including the log tags, for
> 2.0.0-rc3.
> 
> I will add symlinks for the old names, to allow help/version/metadata
> calls in user scripts and higher-level tools to continue working during
> a transitional time. (Even if we update all known tools, we need to
> keep compatibility with existing versions for a good while.)
> 
> I won't change the systemd unit file names or API library names, since
> they aren't one-to-one with the daemons, and will have a bigger impact
> on client apps.
> 
> Here's my current plan:
> 
> Old name           New name
> --------           --------
> pacemakerd         pacemakerd
> attrd              pacemaker-attrd
> cib                pacemaker-confd

Let's restate it: do we indeed want to reinforce the misnomer that the CIB
is (user) configuration only?

> crmd               pacemaker-controld
> lrmd               pacemaker-execd
> pengine            pacemaker-schedulerd
> stonithd           pacemaker-fenced
> pacemaker_remoted  pacemaker-remoted
> 
> I had planned to use the "pcmk-" prefix, but I kept thinking about the
> goal of making things more intuitive for novice users, and a novice
> user's first instinct will be to search the logs for "pacemaker"

journalctl -u pacemaker?

We could also ship an example syslog configuration that aggregates
messages from enumerated programs (that we know and the user may not know
offhand) into a dedicated file (well, this would be quite redundant to native
logging into the file).
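
As a rough illustration of "enumerated programs", here is a minimal sketch that picks Pacemaker lines out of a syslog file by program tag. It assumes the traditional "Mon DD HH:MM:SS host tag[pid]: message" line layout, and it uses the old and proposed names from the plan quoted above, which may still change; the function name is hypothetical.

```python
import re

# Old daemon names plus the proposed new ones from the plan under discussion.
PACEMAKER_TAGS = {
    "pacemakerd", "attrd", "cib", "crmd", "lrmd", "pengine", "stonithd",
    "pacemaker_remoted",
    "pacemaker-attrd", "pacemaker-confd", "pacemaker-controld",
    "pacemaker-execd", "pacemaker-schedulerd", "pacemaker-fenced",
    "pacemaker-remoted",
}

# Captures the program tag that precedes an optional "[pid]" and the colon.
SYSLOG_TAG = re.compile(r"^\S+\s+\d+\s+[\d:]+\s+\S+\s+([^\s\[:]+)(?:\[\d+\])?:")


def is_pacemaker_line(line: str) -> bool:
    """True if the line's syslog tag is one of the enumerated daemons."""
    match = SYSLOG_TAG.match(line)
    return match is not None and match.group(1) in PACEMAKER_TAGS
```

Filtering a log then comes down to `[l for l in lines if is_pacemaker_line(l)]`, which is the kind of knowledge an example syslog configuration would encode for the user.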

IOW, I wouldn't worry that much.

> Most of the names stay under the convenient 15-character limit
> anyway.

-- 
Jan (Poki)




Re: [ClusterLabs] Possible idea for 2.0.0: renaming the Pacemaker daemons

2018-04-09 Thread Jehan-Guillaume de Rorthais
On Tue, 03 Apr 2018 16:35:18 -0500
Ken Gaillot  wrote:

> On Tue, 2018-04-03 at 08:33 +0200, Kristoffer Grönlund wrote:
> > Ken Gaillot  writes:
> >   
> > > > I would vote against PREFIX-configd: compared to other cluster
> > > > software, I would expect that daemon name to refer to a more generic
> > > > cluster configuration key/value store, and that is something that I
> > > > have some hope of adding in the future ;) So I'd like to keep "config"
> > > > or "database" for such a possible future component...
> > > 
> > > What's the benefit of another layer over the CIB?
> > >   
> > 
> > The idea is to provide a more generalized key-value store that other
> > applications built on top of pacemaker can use. Something like an
> > HTTP REST API to a key-value store with transactional semantics provided
> > by the cluster. My understanding so far is that the CIB is too heavy to
> > support that kind of functionality well, and besides that the interface
> > is not convenient for non-cluster applications.
> 
> My first impression is that it sounds like a good extension to attrd,
> cluster-wide attributes instead of node attributes. (I would envision a
> REST API daemon sitting in front of all the daemons without providing
> any actual functionality itself.)
> 
> The advantage to extending attrd is that it already has code to
> synchronize attributes at start-up, DC election, partition healing,
> etc., as well as features such as write dampening.

I feel like you guys are talking about a solution that already exists and that
you probably already know, e.g. "etcd".

Etcd provides:

* a cluster-wide key/value storage engine
* quorum support
* key locking
* atomic changes
* a REST API
* etc.

However, it does require opening a new TCP port :/

Moreover, as an RA developer, I am currently wrestling with some weird attrd
behavior[1], so any improvement there is welcome :)

Cheers,

[1] https://github.com/ClusterLabs/PAF/issues/131


Re: [ClusterLabs] How to cancel a fencing request?

2018-04-09 Thread Jehan-Guillaume de Rorthais
On Tue, 03 Apr 2018 16:59:21 -0500
Ken Gaillot  wrote:

> On Tue, 2018-04-03 at 21:33 +0200, Jehan-Guillaume de Rorthais wrote:
[...]
> > > > I'm not sure I understand the doc correctly with regard to this
> > > > property. Does pcmk_delay_max delay the request itself or the
> > > > execution of the request?
> > > > 
> > > > In other words, is it:
> > > > 
> > > >   delay -> fence query -> fencing action
> > > > 
> > > > or 
> > > > 
> > > >   fence query -> delay -> fence action
> > > > 
> > > > ?
> > > > 
> > > > The first definition would solve this issue, but not the second.
> > > > As I understand it, as soon as the fence query has been sent, the
> > > > node status is "UNCLEAN (online)".
> > > 
> > > The latter -- you're correct, the node is already unclean by that
> > > time.
> > > Since the stop did not succeed, the node must be fenced to continue
> > > safely.  
> > 
> > Thank you for this clarification.
> > 
> > Do you want a patch to add this clarification to the documentation?
> 
> Sure, it never hurts :)

I realize this is not as clear in my mind as I thought.

* Who holds the action during the delay: crmd or stonithd?
* In a two-node cluster in a fencing race, if one node is killed, what happens
  to its fencing query that was on hold? I suppose it will be overwritten by
  the new CIB version from the other node once it joins the cluster again?
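
The ordering confirmed earlier in the thread (fence query, then delay, then fence action) and the two-node race can be pictured with a small sketch. It is a toy model only: the function names are hypothetical, the delay is drawn uniformly up to pcmk_delay_max as an assumption, and it says nothing about which daemon actually holds the action.

```python
import random
from typing import Dict, List, Tuple


def fence_timeline(pcmk_delay_max: float,
                   rng: random.Random) -> List[Tuple[float, str]]:
    """Return (time offset in seconds, event) pairs for one fencing request."""
    delay = rng.uniform(0.0, pcmk_delay_max)
    return [
        (0.0, "fence query sent, target already UNCLEAN"),
        (0.0, "random delay starts"),
        (delay, "fence action executed"),
    ]


def race_survivor(delays: Dict[str, float]) -> str:
    """In a two-node race, the node with the shorter delay shoots first."""
    return min(delays, key=delays.get)
```

For example, `race_survivor({"node1": 3.2, "node2": 0.7})` names `node2` as the node whose fence action runs first, so the delayed action still pending on `node1` never gets to execute in this model.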

Thanks,


Re: [ClusterLabs] How to cancel a fencing request?

2018-04-09 Thread Jehan-Guillaume de Rorthais
On Tue, 03 Apr 2018 17:35:43 -0500
Ken Gaillot  wrote:

> On Tue, 2018-04-03 at 21:46 +0200, Klaus Wenninger wrote:
> > On 04/03/2018 05:43 PM, Ken Gaillot wrote:  
> > > On Tue, 2018-04-03 at 07:36 +0200, Klaus Wenninger wrote:  
> > > > On 04/02/2018 04:02 PM, Ken Gaillot wrote:  
> > > > > On Mon, 2018-04-02 at 10:54 +0200, Jehan-Guillaume de Rorthais
> > > > > wrote:  
[...]
> > > > 
> > > > > -inf constraints like that should effectively prevent
> > > > > stonith-actions from being executed on those nodes.
> > > 
> > > It shouldn't ...
> > > 
> > > > Pacemaker respects target-role=Started/Stopped for controlling
> > > > execution of fence devices, but location (or even whether the
> > > > device is "running" at all) only affects monitors, not execution.
> > >   
> > > > Though there are a few issues with location constraints
> > > > and stonith-devices.
> > > > 
> > > > > When stonithd brings up the devices from the cib it
> > > > > runs the parts of pengine that fully evaluate these
> > > > > constraints and it would disable the stonith-device
> > > > > if the resource is unrunnable on that node.
> > > > 
> > > > That should be true only for target-role, not everything that
> > > > affects runnability
> > 
> > cib_device_update bails out via a removal of the device if
> > - role == stopped
> > - node not in allowed_nodes-list of stonith-resource
> > - weight is negative
> > 
> > Wouldn't that include a -inf rule for a node?  
> 
> Well, I'll be ... I thought I understood what was going on there. :-)
> You're right.
> 
> I've frequently seen it recommended to ban fence devices from their
> target when using one device per target. Perhaps it would be better to
> give a lower (but positive) score on the target compared to the other
> node(s), so it can be used when no other nodes are available. you could
> re-manage.  

Wait, you mean a fencing resource can be triggered from its own target? What
happens then? Node suicide, and all the cluster nodes are shut down?

Thanks,


Re: [ClusterLabs] Trouble starting up PAF cluster for first time

2018-04-09 Thread Jehan-Guillaume de Rorthais
On Mon, 09 Apr 2018 10:26:34 -0600
Casey & Gina  wrote:

> > The PAF resource agent needs to connect to your local PostgreSQL instance to
> > check its status in various situations. The "pgport" and "pghost" parameters
> > default to "5432" and "/tmp" (the same defaults as PostgreSQL's policy).
> > The "/tmp" value is the directory where PostgreSQL creates its unix socket
> > on startup, through which local clients can connect. The unix socket will be
> > e.g. "/tmp/.s.PGSQL.5432".
> > 
> > However, the Debian policy is to override the "pghost" default value with
> > "/var/run/postgresql", not "/tmp".
> 
> Since I'm installing PAF using `apt-get install resource-agents-paf`, I would
> expect that it being packaged for Debian/Ubuntu would follow the standard of
> that platform.  Would it be possible for the APT packager to add a patch to
> change this default?

Good idea. I created a feature request issue to track this suggestion. See:
https://github.com/ClusterLabs/PAF/issues/133
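
To make the mismatch concrete, here is a minimal sketch of how the socket file name follows from "pghost" and "pgport", based only on the "/tmp/.s.PGSQL.5432" example quoted above. The function name is hypothetical and this is not code taken from PAF.

```python
from pathlib import PurePosixPath


def unix_socket_path(pghost: str = "/tmp", pgport: int = 5432) -> str:
    """Build the unix socket path a local client would try to open."""
    return str(PurePosixPath(pghost) / f".s.PGSQL.{pgport}")


# Upstream defaults versus the Debian/Ubuntu socket directory.
upstream = unix_socket_path()
debian = unix_socket_path("/var/run/postgresql")
```

If the agent looks for the first path while the instance created the second, the connection fails even though PostgreSQL itself started correctly.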

> > However, your problem doesn't come from the start operation here. Right
> > after the start occurs, PAF connects to PostgreSQL to check whether it
> > started as expected and reports the real status to Pacemaker. Because it
> > couldn't connect to your instance using the wrong pghost, PAF reported
> > an error to Pacemaker.
> 
> It would be great if there were a way for it to log more information back to
> Pacemaker about what exactly it's executing as it does so, even if this
> required a non-default debug configuration.  I guess either way that
> Pacemaker doesn't have any direct awareness of what commands are being
> executed through the resource agent so it's not a job that Pacemaker could do
> itself.

You can set up Pacemaker in debug mode in /etc/default/pacemaker.conf, if I
remember correctly. See the very first variable and the comments above it.


Re: [ClusterLabs] Possible idea for 2.0.0: renaming the Pacemaker daemons

2018-04-09 Thread Ken Gaillot
Based on the list discussion and feedback I could coax out of others, I
will change the Pacemaker daemon names, including the log tags, for
2.0.0-rc3.

I will add symlinks for the old names, to allow help/version/metadata
calls in user scripts and higher-level tools to continue working during
a transitional time. (Even if we update all known tools, we need to
keep compatibility with existing versions for a good while.)

I won't change the systemd unit file names or API library names, since
they aren't one-to-one with the daemons, and will have a bigger impact
on client apps.

Here's my current plan:

Old name           New name
--------           --------
pacemakerd         pacemakerd
attrd              pacemaker-attrd
cib                pacemaker-confd
crmd               pacemaker-controld
lrmd               pacemaker-execd
pengine            pacemaker-schedulerd
stonithd           pacemaker-fenced
pacemaker_remoted  pacemaker-remoted

I had planned to use the "pcmk-" prefix, but I kept thinking about the
goal of making things more intuitive for novice users, and a novice
user's first instinct will be to search the logs for "pacemaker". Most
of the names stay under the convenient 15-character limit anyway.
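
For anyone who wants to check the plan against the 15-character limit, here is a small sketch. The mapping is copied from the table above; the helper name and the constant are only illustrative.

```python
from typing import Iterable, List

# Old name -> new name, as listed in the current plan.
RENAMES = {
    "pacemakerd": "pacemakerd",
    "attrd": "pacemaker-attrd",
    "cib": "pacemaker-confd",
    "crmd": "pacemaker-controld",
    "lrmd": "pacemaker-execd",
    "pengine": "pacemaker-schedulerd",
    "stonithd": "pacemaker-fenced",
    "pacemaker_remoted": "pacemaker-remoted",
}

PS_COLUMN_WIDTH = 15  # the limit mentioned in this thread


def over_limit(names: Iterable[str], limit: int = PS_COLUMN_WIDTH) -> List[str]:
    """Return, sorted, the names that are longer than the limit."""
    return sorted(name for name in names if len(name) > limit)
```

Calling `over_limit(RENAMES.values())` lists the new names that would not fit within that width.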

On Wed, 2018-03-28 at 12:40 -0500, Ken Gaillot wrote:
> Hi all,
> 
> Andrew Beekhof brought up a potential change to help with reading
> Pacemaker logs.
> 
> Currently, pacemaker daemon names are not intuitive, making it
> difficult to search the system log or understand what each one does.
> 
> The idea is to rename the daemons, with a common prefix, and a name
> that better reflects the purpose.
> 
> I think it's a great idea, but we have to consider two drawbacks:
> 
> * I'm about to release 2.0.0-rc2, and it's late in the cycle for a
> major change. But if we don't do it now, it'll probably sit on the back
> burner for a few years, as it wouldn't make sense to introduce such a
> change shortly after a major bump.
> 
> * We can change *only* the names used in the logs, which will be
> simple, but give us inconsistencies with what shows up in "ps", etc. Or
> we can try to change everything -- process names, library names, API
> function/structure names -- but that will impact other projects such as
> sbd, crmsh, etc., potentially causing compatibility headaches.
> 
> What are your thoughts? Change or not? Now or later? Log tags, or
> everything?
> 
> And the fun part, what would we change them to ...
> 
> Beekhof suggested renaming "pengine" to "cluster-planner", as an example.
> 
> I think a prefix indicating pacemaker specifically would be better than
> "cluster-" for grepping and intuitiveness.
> 
> For intuitiveness, long names are better ("pacemaker-FUNCTION"). On the
> other hand, there's an argument for keeping names to 15 characters,
> which is the default "ps" column width, and a reasonable limit for log
> line tags. Maybe "pm-" or "pcmk-"? This prefix could also be used for
> library names.
> 
> Looking at other projects with server processes, most use the
> traditional "d" ending (for example, "rsyslogd"). A few add "-daemon"
> ("rtkit-daemon"), and others don't bother with any suffix ("gdm").
> 
> Here are the current names, with some example replacements:
> 
>  pacemakerd: PREFIX-launchd, PREFIX-launcher
> 
>  attrd: PREFIX-attrd, PREFIX-attributes
> 
>  cib: PREFIX-configd, PREFIX-state
> 
>  crmd: PREFIX-controld, PREFIX-clusterd, PREFIX-controller
> 
>  lrmd: PREFIX-locald, PREFIX-resourced, PREFIX-runner
> 
>  pengine: PREFIX-policyd, PREFIX-scheduler
> 
>  stonithd: PREFIX-fenced, PREFIX-stonithd, PREFIX-executioner
> 
>  pacemaker_remoted: PREFIX-remoted, PREFIX-remote
> 
-- 
Ken Gaillot 


Re: [ClusterLabs] Trouble starting up PAF cluster for first time

2018-04-09 Thread Casey & Gina
> The PAF resource agent needs to connect to your local PostgreSQL instance to
> check its status in various situations. The "pgport" and "pghost" parameters
> default to "5432" and "/tmp" (the same defaults as PostgreSQL's policy). The
> "/tmp" value is the directory where PostgreSQL creates its unix socket on
> startup, through which local clients can connect. The unix socket will be
> e.g. "/tmp/.s.PGSQL.5432".
> 
> However, the Debian policy is to override the "pghost" default value with
> "/var/run/postgresql", not "/tmp".

Since I'm installing PAF using `apt-get install resource-agents-paf`, I would 
expect that it being packaged for Debian/Ubuntu would follow the standard of 
that platform.  Would it be possible for the APT packager to add a patch to 
change this default?  Thanks for explaining this - I've been using 
/var/run/postgresql myself for so long that I forgot /tmp was the upstream 
default...

> However, your problem doesn't come from the start operation here. Right after
> the start occurs, PAF connects to PostgreSQL to check whether it started as
> expected and reports the real status to Pacemaker. Because it couldn't connect
> to your instance using the wrong pghost, PAF reported an error to Pacemaker.

It would be great if there were a way for it to log more information back to 
Pacemaker about what exactly it's executing as it does so, even if this 
required a non-default debug configuration.  I guess either way that Pacemaker 
doesn't have any direct awareness of what commands are being executed through 
the resource agent so it's not a job that Pacemaker could do itself.

Thanks for the help!
-- 
Casey


Re: [ClusterLabs] Possible idea for 2.0.0: renaming the Pacemaker daemons

2018-04-09 Thread Kristoffer Grönlund
Jan Pokorný  writes:

> /me keenly joins the bike-shedding
>
> What about pcmk-based/pcmk-infod.  First, we effectively tone down
> "common information/base" from the expanded CIB abbreviation[*1],
> and second, in the former case, we highlight that's the central point
> providing resident data glue (pcmk-datad?[*2]) amongst the other daemons.

pcmk-infod sounds pretty good to me, it indicates data management /
central information handling etc. Plus it contains at least part of one
of the words of the expansion of "CIB".

Cheers,
Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com