Re: [ClusterLabs] Possible idea for 2.0.0: renaming the Pacemaker daemons

2018-04-10 Thread Andrew Beekhof
On Wed, Apr 11, 2018 at 12:42 AM, Ken Gaillot  wrote:

> On Tue, 2018-04-10 at 08:50 +0200, Jehan-Guillaume de Rorthais wrote:
> > On Tue, 10 Apr 2018 00:54:01 +0200
> > Jan Pokorný  wrote:
> >
> > > On 09/04/18 12:10 -0500, Ken Gaillot wrote:
> > > > Based on the list discussion and feedback I could coax out of
> > > > others, I
> > > > will change the Pacemaker daemon names, including the log tags,
> > > > for
> > > > 2.0.0-rc3.
> > > >
> > > > I will add symlinks for the old names, to allow
> > > > help/version/metadata
> > > > calls in user scripts and higher-level tools to continue working
> > > > during
> > > > a transitional time. (Even if we update all known tools, we need
> > > > to
> > > > keep compatibility with existing versions for a good while.)
> > > >
> > > > I won't change the systemd unit file names or API library names,
> > > > since
> > > > they aren't one-to-one with the daemons, and will have a bigger
> > > > impact
> > > > on client apps.
> > > >
> > > > Here's my current plan:
> > > >
> > > > Old name            New name
> > > > --------            --------
> > > > pacemakerd          pacemakerd
> > > > attrd               pacemaker-attrd
> > > > cib                 pacemaker-confd
> > >
> > > Let's restate it: do we indeed want to reinforce a misnomer that
> > > CIB
> > > is (user) configuration only?
> >
> > Agree. FWIW, +1 for the "Infod" suggestion.
>
> I'm not opposed to it, but no option suggested so far intuitively
> covers what the cib does (including "cib"). All the daemons maintain
> information of some sort for querying and setting -- attrd maintains
> node attributes, stonithd maintains fence history, lrmd maintains a
> list of registered resources, etc.
>
> -confd, -cfgd, or -configd would emphasize the configuration aspect of
> cib's workload, at the cost of hiding the dynamic status aspect.
>

you know... I wouldn't be opposed to running two copies (one for config,
one for status) and having the crmd combine the two before sending to the
PE. I've toyed with the idea in the past to alleviate some of the
performance issues.


>
> -infod, -datad, or -stated (or cib) would emphasize the broadness of
> cib's workload, at the cost of being vague and including aspects of
> other daemons' responsibilities.
>
> -iod would emphasize cib's role as a disk I/O abstraction layer, at the
>  cost of downplaying the more essential configuration+status roles.
>
> Given all that, I'm leaning to -confd because configuration management
> is the cib's primary responsibility.


+1


> The CIB stored on disk is entirely
> configuration (status is only in-memory), so it seems most intuitive to
> me. Keep in mind that the primary purpose of this renaming is to make
> the log files easier to read, so picture the name in front of a typical
> CIB log message (also considering that "info" is a log severity).
>
> But if there's a consensus otherwise, I'm willing to change it.
>
> >
> > > > crmd                pacemaker-controld
> > > > lrmd                pacemaker-execd
> > > > pengine             pacemaker-schedulerd
> > > > stonithd            pacemaker-fenced
> > > > pacemaker_remoted   pacemaker-remoted
> > > >
> > > > I had planned to use the "pcmk-" prefix, but I kept thinking
> > > > about the
> > > > goal of making things more intuitive for novice users, and a
> > > > novice
> > > > user's first instinct will be to search the logs for
> > > > "pacemaker"
> > >
> > > journalctl -u pacemaker?
> > >
> > > We could also ship an example syslog configuration that aggregates
> > > messages from the enumerated programs (which we know and the user may
> > > not know offhand) into a dedicated file (well, this would be quite
> > > redundant with the native logging into the file).
> > >
> > > IOW, I wouldn't worry that much.
> >
> > +1
> >
> > My 2¢...
> --
> Ken Gaillot 
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Possible idea for 2.0.0: renaming the Pacemaker daemons

2018-04-03 Thread Andrew Beekhof
On Tue, Apr 3, 2018 at 10:18 PM, Jehan-Guillaume de Rorthais <
j...@dalibo.com> wrote:

> On Tue, 3 Apr 2018 09:58:50 +1000
> Andrew Beekhof <abeek...@redhat.com> wrote:
> > On Fri, Mar 30, 2018 at 8:36 PM, Jehan-Guillaume de Rorthais <
> > j...@dalibo.com> wrote:
> > > On Thu, 29 Mar 2018 09:32:41 +1100
> > > Andrew Beekhof <abeek...@redhat.com> wrote:
> > > > On Thu, Mar 29, 2018 at 8:07 AM, Jehan-Guillaume de Rorthais <
> > > > j...@dalibo.com> wrote:
> [...]
> > > > Though by now there is surely a decent library for logging to files with
> > > > sub-second timestamps - if we could incorporate that into libqb and have
> > > > corosync use it too...
> > >
> > > In my opinion, this is neither the role of libqb
> >
> >
> > libqb has the logging library that pacemaker and corosync use.
> > it is absolutely where this change should happen
>
> I meant that this could be handled 100% by some external dedicated daemon,
> e.g. syslog or journald.
>
> I was thinking about code simplification.
>
> [...]
>
> > > > then we could consider 1 log per daemon.
> > > > In which case, the outcome of the PREFIX-SUFFIX discussion above could
> > > > instead be used for /var/log/pacemaker/SUFFIX
> > >
> > > I think the best would be to have one log for Corosync and one log for
> > > Pacemaker (and all its sub-processes/children).
> > >
> > > Another good path toward an understandable log file would be to hide
> > > which process is speaking. Experienced users will still know that "LOG:
> > > setting failcount to 3" comes from CRMd and "DEBUG1: failcount set to 3"
> > > comes from attrd.
> > >
> > > However, this would probably be a mess... because again, the cause
> > > might be logged AFTER the effects/reaction :/
> >
> > why?  i've never seen that be the case
>
> Please find attached a demonstration of such behavior I found last week.
> Note that this comes from a SLES 12 SP1 using Pacemaker 1.1.13... People
> there were not able to upgrade the servers before we built the PoC together.
>
> The first column is the order in the log file. The second column is how I
> would expect the messages to appear in the log.
>
> E.g. I would expect L.11
>
>   "pengine: notice: process_pe_message: Calculated Transition 29: [...]"
>
> before CRMd begins to process it at L.6-10.
>
> Another example: I would expect LRMd's L.35
>
>   "lrmd:  notice: log_finished:  finished - rsc:pgsqld action:notify"
>
> before CRMd receives the result at L.26...
>


No, none of these are out of order.


>
> Maybe this is something fixed in 1.1.18 or 2.0.0; I just couldn't find
> commit messages related to this when searching through them quickly.
>
> > > Maybe the solution is to log only messages from CRMD, where all the
> > > orchestration comes from. Everything else might go to some debug level
> > > if needed.
> >
> > sorry, that is a terrible idea
>
> I was throwing out random ideas as I'm not familiar with the internal
> architecture. Maybe pacemakerd should gather the messages from all the
> other processes and spit them to stderr so they are captured by journald
> or redirected to a file...
>
> Regards,


Re: [ClusterLabs] Possible idea for 2.0.0: renaming the Pacemaker daemons

2018-04-02 Thread Andrew Beekhof
On Fri, Mar 30, 2018 at 8:36 PM, Jehan-Guillaume de Rorthais <
j...@dalibo.com> wrote:

> On Thu, 29 Mar 2018 09:32:41 +1100
> Andrew Beekhof <abeek...@redhat.com> wrote:
>
> > On Thu, Mar 29, 2018 at 8:07 AM, Jehan-Guillaume de Rorthais <
> > j...@dalibo.com> wrote:
> >
> > > On Wed, 28 Mar 2018 15:50:26 -0500
> > > Ken Gaillot <kgail...@redhat.com> wrote:
> > > [...]
> > > > > >  pacemakerd: PREFIX-launchd, PREFIX-launcher
> > > > >
> > > > > pacemakerd, alone, sounds perfectly fine to me.
> > > >
> > > > Agreed -- but the argument against it is to allow something like
> > > > "grep pcmk /var/log/messages" to get everything.
> > >
> > > Then I would pick PREFIX-master... But then I would hate having a
> > > pcmk-master
> > > process :(
> > >
> > > Maybe all the logging information should be handled by one process so
> > > syslog/journald/stderr are happy?
> > >
> > > Despite its multi-process architecture, PostgreSQL either:
> > >
> > > * emits to syslog/journald using the same ident for all processes, or
> > > * captures the stderr of all processes and redirects it to one file.
> > >
> > > [...]
> > > > > >  cib: PREFIX-configd, PREFIX-state
> > > > >
> > > > > Tricky...It deals with both config and state...
> > > > >
> > > > > By the way, why must the CIB messages be in the log file in the
> > > > > first place?
> > > > > Filling logs with XML diffs resulting from other actions already
> > > > > logged earlier sounds like duplicated information.
> > > >
> > > > They are kept out of the syslog, which is where most users are
> > > > expected to look. They are in the detail log, which is for more
> > > > advanced troubleshooting.
> > >
> > > oh ok, sorry.
> > >
> > > I just finished a day reading the log file /var/log/pacemaker.log on a
> > > SUSE 12 SP1 that was packed with XML diffs :/
> > >
> > > Maybe I should have checked /var/log/messages or the syslog setup...
> > >
> > > But honestly, I hate having mixed logs all packed into one single file
> > > like /var/log/messages :/
> > >
> >
> > There's a very good reason for it though - the relative timing of events
> > is the only way to establish cause and effect.
>
> Yes. But sometimes, messages are not so well mixed, with causes showing up
> after effects in the logs...
>
> > Though by now there is surely a decent library for logging to files with
> > sub-second timestamps - if we could incorporate that into libqb and have
> > corosync use it too...
>
> In my opinion, this is neither the role of libqb


libqb has the logging library that pacemaker and corosync use.
it is absolutely where this change should happen
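
(For concreteness, a minimal sketch of the sub-second timestamping being
discussed, using only plain POSIX calls -- this is not libqb's actual
logging API, just an illustration of how a file logger could prefix each
message with millisecond resolution; the daemon name is only an example.)

    #include <stdio.h>
    #include <time.h>

    static void log_line(const char *daemon, const char *msg)
    {
        struct timespec ts;
        struct tm tm;
        char stamp[32];

        clock_gettime(CLOCK_REALTIME, &ts);      /* seconds + nanoseconds */
        localtime_r(&ts.tv_sec, &tm);
        strftime(stamp, sizeof(stamp), "%b %d %H:%M:%S", &tm);

        /* e.g. "Apr 10 12:42:07.123 pacemaker-controld: ..." */
        printf("%s.%03ld %s: %s\n", stamp, ts.tv_nsec / 1000000L, daemon, msg);
    }

    int main(void)
    {
        log_line("pacemaker-controld", "sub-second timestamps need no extra library");
        return 0;
    }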



> or corosync...Or you would
> have to include some more configuration params to be able to set the log
> directory, file, format, rotation, etc.
>
> Maybe the easiest path is to rely on syslog/journald. They are both able
> to log with sub-second timestamps (at least journald does) and provide the
> administrator everything they need to deal with the logs.
>
> > then we could consider 1 log per daemon.
> > In which case, the outcome of the PREFIX-SUFFIX discussion above could
> > instead be used for /var/log/pacemaker/SUFFIX
>
> I think the best would be to have one log for Corosync and one log for
> Pacemaker (and all its sub-processes/children).
>
> Another good path toward an understandable log file would be to hide which
> process is speaking. Experienced users will still know that "LOG: setting
> failcount to 3" comes from CRMd and "DEBUG1: failcount set to 3" comes
> from attrd.
>
> However, this would probably be a mess... because again, the cause might
> be logged AFTER the effects/reaction :/
>

why?  i've never seen that be the case


>
> Maybe the solution is to log only messages from CRMD, where all the
> orchestration comes from. Everything else might go to some debug level if
> needed.
>

sorry, that is a terrible idea


>
> > > > The detail log messages are useful mainly because CIB changes can
> > > > come from external sources, not just cluster daemons,
> > >
> > > Sure, but a simple info message explaining that «the CIB has been
> > > updated by tool "" » sounds enough to me.
> > >
> >
> > to you, because you know what you changed.
> > anyone else reading the logs (ie. people doing support) hasn't got a clue
> > and knowing who changed what is crucial to understanding what the cluster
> > did and why
>
> I do some support as well, infrequently. I suppose crm_report is probably
> enough to gather previous CIBs and compare them.
>
>


Re: [ClusterLabs] Possible idea for 2.0.0: renaming the Pacemaker daemons

2018-03-28 Thread Andrew Beekhof
On Thu, Mar 29, 2018 at 6:41 AM, Jehan-Guillaume de Rorthais <
j...@dalibo.com> wrote:

> On Wed, 28 Mar 2018 12:40:25 -0500
> Ken Gaillot <kgail...@redhat.com> wrote:
>
> > Hi all,
> >
> > Andrew Beekhof brought up a potential change to help with reading
> > Pacemaker logs.
> >
> > Currently, pacemaker daemon names are not intuitive, making it
> > difficult to search the system log or understand what each one does.
> >
> > The idea is to rename the daemons, with a common prefix, and a name
> > that better reflects the purpose.
> >
> > I think it's a great idea, but we have to consider two drawbacks:
> >
> > * I'm about to release 2.0.0-rc2, and it's late in the cycle for a
> > major change. But if we don't do it now, it'll probably sit on the back
> > burner for a few years, as it wouldn't make sense to introduce such a
> > change shortly after a major bump.
> >
> > * We can change *only* the names used in the logs, which will be
> > simple, but give us inconsistencies with what shows up in "ps", etc. Or
> > we can try to change everything -- process names, library names, API
> > function/structure names -- but that will impact other projects such as
> > sbd, crmsh, etc., potentially causing compatibility headaches.
> >
> > What are your thoughts? Change or not?
>
> change
>
> > Now or later?
>
> I'm not sure how much work it involves during rc time... But I would pick
> now if possible.
>
> > Log tags, or everything?
>
> Everything.
>
> I'm from the PostgreSQL galaxy. In this galaxy, the parameter
> "update_process_title" controls whether PostgreSQL sets a human-readable
> process title, and it is "on" by default. In ps, it gives:
>
>   ioguix@firost:~$ ps f -o cmd -C postmaster
>   CMD
>   postmaster -D /home/ioguix/var/lib/pgsql/96
>\_ postgres: checkpointer process
>\_ postgres: writer process
>\_ postgres: wal writer process
>\_ postgres: autovacuum launcher process
>\_ postgres: stats collector process
>
> Some processes even add useful information about their status, e.g.
> the current replication status and location (wal receiver/sender processes)
> or the last WAL archived (archiver process).
>
> In source code, it boils down to this function:
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/utils/misc/ps_status.c;h=5742de0802a54e38a2c2e3cfa8e8f445b6822883;hb=65c6b53991e1c56f6a0700ae26928962ddf2b9fe#l321


sbd also has similar code
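
(As an aside on how such process titles get set: a purely illustrative
sketch for Linux -- not code from PostgreSQL, sbd or pacemaker.
prctl(PR_SET_NAME) only changes the kernel "comm" name shown by
/proc/PID/comm, top and "ps -o comm", and it silently truncates to 15
characters plus NUL -- the same 15-character limit mentioned elsewhere in
this thread for the default ps column width. PostgreSQL's ps_status.c goes
further and rewrites argv[] so the full command line shown by "ps -o cmd"
changes too.)

    #include <stdio.h>
    #include <sys/prctl.h>

    int main(void)
    {
        /* "pcmk-schedulerd" is exactly 15 characters, so it fits;
         * "pacemaker-schedulerd" would be truncated to "pacemaker-sched". */
        if (prctl(PR_SET_NAME, "pcmk-schedulerd", 0, 0, 0) != 0) {
            perror("prctl(PR_SET_NAME)");
            return 1;
        }
        getchar();  /* pause so the new name can be inspected with top or ps */
        return 0;
    }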


>
>
> > And the fun part, what would we change them to ...
> >
> > Beekhof suggested renaming "pengine" to "cluster-planner", as an
> > example.
> >
> > I think a prefix indicating pacemaker specifically would be better than
> > "cluster-" for grepping and intuitiveness.
> >
> > For intuitiveness, long names are better ("pacemaker-FUNCTION"). On the
> > other hand, there's an argument for keeping names to 15 characters,
> > which is the default "ps" column width, and a reasonable limit for log
> > line tags. Maybe "pm-" or "pcmk-"? This prefix could also be used for
> > library names.
>
> "pcmk-*" sounds better to me. "cluster" has so many different definiion in
> people mind...
>
> > Looking at other projects with server processes, most use the
> > traditional "d" ending (for example, "rsyslogd"). A few add "-daemon"
> > ("rtkit-daemon"), and others don't bother with any suffix ("gdm").
> >
> > Here are the current names, with some example replacements:
> >
> >  pacemakerd: PREFIX-launchd, PREFIX-launcher
>
> pacemakerd, alone, sounds perfectly fine to me.
>
> >  attrd: PREFIX-attrd, PREFIX-attributes
>
> PREFIX-attributes
>

PREFIX-keystored ?


>
> >  cib: PREFIX-configd, PREFIX-state
>
> Tricky...It deals with both config and state...
>

PREFIX-datastore ?


>
> By the way, why must the CIB messages be in the log file in the first
> place?
>

Because it's the only record of what changed and when.
Which is almost never important, except for those times when the
information is critical.
Which describes almost all of pacemaker's logging, unfortunately.

On a related topic, I think if we made file logging mandatory then we could
move a lot more logs (including most of the cib logs) out of syslog.


> Filling logs with XML diffs resulting from other actions already logged
> earlier sounds like duplicated information.
>

We generally only log configuration 

Re: [ClusterLabs] PCMK_OCF_DEGRADED (_MASTER): exit codes are mapped to PCMK_OCF_UNKNOWN_ERROR

2017-03-01 Thread Andrew Beekhof
On Tue, Feb 28, 2017 at 12:06 AM, Lars Ellenberg
 wrote:
> When I recently tried to make use of the DEGRADED monitoring results,
> I found out that it does still not work.
>
> Because LRMD chooses to filter them in ocf2uniform_rc(),
> and maps them to PCMK_OCF_UNKNOWN_ERROR.
>
> See patch suggestion below.
>
> It also filters away the other "special" rc values.
> Do we really not want to see them in crmd/pengine?

I would think we do.

> Why does LRMD think it needs to outsmart the pengine?

Because the person who implemented the feature incorrectly assumed
the rc would be passed back unmolested.

>
> Note: I did build it, but did not use this yet,
> so I have no idea if the rest of the implementation of the DEGRADED
> stuff works as intended or if there are other things missing as well.

failcount might be the other place that needs some massaging.
specifically, not incrementing it when a degraded rc comes through
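
(A hypothetical sketch of that massaging -- not pacemaker code, and the
numeric values are illustrative only; the real definitions live in
pacemaker's headers. It shows the kind of predicate the failcount update
would need so the degraded codes are reported but never counted as
failures.)

    #include <stdbool.h>
    #include <stdio.h>

    /* Illustrative values; see pacemaker's headers for the real ones. */
    enum ocf_exitcode {
        PCMK_OCF_OK              = 0,
        PCMK_OCF_RUNNING_MASTER  = 8,
        PCMK_OCF_DEGRADED        = 190,
        PCMK_OCF_DEGRADED_MASTER = 191,
    };

    /* Hypothetical helper: should this monitor result bump the failcount? */
    static bool rc_counts_as_failure(enum ocf_exitcode rc)
    {
        switch (rc) {
            case PCMK_OCF_OK:
            case PCMK_OCF_RUNNING_MASTER:   /* success for a promoted instance */
            case PCMK_OCF_DEGRADED:         /* degraded: log/alert, but ...    */
            case PCMK_OCF_DEGRADED_MASTER:  /* ... don't trigger recovery      */
                return false;
            default:
                return true;
        }
    }

    int main(void)
    {
        printf("degraded counts as failure? %s\n",
               rc_counts_as_failure(PCMK_OCF_DEGRADED) ? "yes" : "no");
        return 0;
    }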

>
> Thoughts?

looks good to me

>
> diff --git a/lrmd/lrmd.c b/lrmd/lrmd.c
> index 724edb7..39a7dd1 100644
> --- a/lrmd/lrmd.c
> +++ b/lrmd/lrmd.c
> @@ -800,11 +800,40 @@ hb2uniform_rc(const char *action, int rc, const char *stdout_data)
>  static int
>  ocf2uniform_rc(int rc)
>  {
> -if (rc < 0 || rc > PCMK_OCF_FAILED_MASTER) {
> -return PCMK_OCF_UNKNOWN_ERROR;
> +switch (rc) {
> +default:
> +   return PCMK_OCF_UNKNOWN_ERROR;
> +
> +case PCMK_OCF_OK:
> +case PCMK_OCF_UNKNOWN_ERROR:
> +case PCMK_OCF_INVALID_PARAM:
> +case PCMK_OCF_UNIMPLEMENT_FEATURE:
> +case PCMK_OCF_INSUFFICIENT_PRIV:
> +case PCMK_OCF_NOT_INSTALLED:
> +case PCMK_OCF_NOT_CONFIGURED:
> +case PCMK_OCF_NOT_RUNNING:
> +case PCMK_OCF_RUNNING_MASTER:
> +case PCMK_OCF_FAILED_MASTER:
> +
> +case PCMK_OCF_DEGRADED:
> +case PCMK_OCF_DEGRADED_MASTER:
> +   return rc;
> +
> +#if 0
> +   /* What about these?? */

yes, these should get passed back as-is too

> +/* 150-199 reserved for application use */
> +PCMK_OCF_CONNECTION_DIED = 189, /* Operation failure implied by disconnection of the LRM API to a local or remote node */
> +
> +PCMK_OCF_EXEC_ERROR= 192, /* Generic problem invoking the agent */
> +PCMK_OCF_UNKNOWN   = 193, /* State of the service is unknown - used for recording in-flight operations */
> +PCMK_OCF_SIGNAL= 194,
> +PCMK_OCF_NOT_SUPPORTED = 195,
> +PCMK_OCF_PENDING   = 196,
> +PCMK_OCF_CANCELLED = 197,
> +PCMK_OCF_TIMEOUT   = 198,
> +PCMK_OCF_OTHER_ERROR   = 199, /* Keep the same codes as PCMK_LSB */
> +#endif
>  }
> -
> -return rc;
>  }
>
>  static int
>


Re: [ClusterLabs] question about dc-deadtime

2017-01-09 Thread Andrew Beekhof
On Fri, Dec 16, 2016 at 8:52 AM, Chris Walker
 wrote:
> Thanks for your response Ken.  I'm puzzled ... in my case nodes remain
> UNCLEAN (offline) until dc-deadtime expires, even when both nodes are up
> and corosync is quorate.

I'm guessing you're starting both nodes at the same time?
The behaviour you're seeing is arguably a hangover from the multicast
days (in which case corosync wouldn't have had a node list).

But since that's not the common case anymore, we could probably
shortcut the timeout if we know the complete node list and see that
they are all online.

>
> I see the following from crmd when I have dc-deadtime=2min
>
> Dec 15 21:34:33 max04 crmd[13791]:   notice: Quorum acquired
> Dec 15 21:34:33 max04 crmd[13791]:   notice: pcmk_quorum_notification: Node
> max04[2886730248] - state is now member (was (null))
> Dec 15 21:34:33 max04 crmd[13791]:   notice: pcmk_quorum_notification: Node
> (null)[2886730249] - state is now member (was (null))
> Dec 15 21:34:33 max04 crmd[13791]:   notice: Notifications disabled
> Dec 15 21:34:33 max04 crmd[13791]:   notice: The local CRM is operational
> Dec 15 21:34:33 max04 crmd[13791]:   notice: State transition S_STARTING ->
> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
> ...
> Dec 15 21:36:33 max05 crmd[10365]:  warning: FSA: Input I_DC_TIMEOUT from
> crm_timer_popped() received in state S_PENDING
> Dec 15 21:36:33 max05 crmd[10365]:   notice: State transition S_ELECTION ->
> S_INTEGRATION [ input=I_ELECTION_DC cause=C_TIMER_POPPED
> origin=election_timeout_popped ]
> Dec 15 21:36:33 max05 crmd[10365]:  warning: FSA: Input I_ELECTION_DC from
> do_election_check() received in state S_INTEGRATION
> Dec 15 21:36:33 max05 crmd[10365]:   notice: Notifications disabled
> Dec 15 21:36:33 max04 crmd[13791]:   notice: State transition S_PENDING ->
> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE
> origin=do_cl_join_finalize_respond ]
>
> only after this do the nodes transition to Online.  This is using the
> vanilla RHEL7.2 cluster stack and the following options:
>
> property cib-bootstrap-options: \
> no-quorum-policy=ignore \
> default-action-timeout=120s \
> pe-warn-series-max=1500 \
> pe-input-series-max=1500 \
> pe-error-series-max=1500 \
> stonith-action=poweroff \
> stonith-timeout=900 \
> dc-deadtime=2min \
> maintenance-mode=false \
> have-watchdog=false \
> dc-version=1.1.13-10.el7-44eb2dd \
> cluster-infrastructure=corosync
>
> Thanks again,
> Chris
>
> On Thu, Dec 15, 2016 at 3:26 PM, Ken Gaillot  wrote:
>>
>> On 12/15/2016 02:00 PM, Chris Walker wrote:
>> > Hello,
>> >
>> > I have a quick question about dc-deadtime.  I believe that Digimer and
>> > others on this list might have already addressed this, but I want to
>> > make sure I'm not missing something.
>> >
>> > If my understanding is correct, dc-deadtime sets the amount of time that
>> > must elapse before a cluster is formed (DC is elected, etc), regardless
>> > of which nodes have joined the cluster.  In other words, even if all
>> > nodes that are explicitly enumerated in the nodelist section have
>> > started Pacemaker, they will still wait dc-deadtime before forming a
>> > cluster.
>> >
>> > In my case, I have a two-node cluster on which I'd like to allow a
>> > pretty long time (~5 minutes) for both nodes to join before giving up on
>> > them.  However, if they both join quickly, I'd like to proceed to form a
>> > cluster immediately; I don't want to wait for the full five minutes to
>> > elapse before forming a cluster.  Further, if a node doesn't respond
>> > within five minutes, I want to fence it and start resources on the node
>> > that is up.
>>
>> Pacemaker+corosync behaves as you describe by default.
>>
>> dc-deadtime is how long to wait for an election to finish, but if the
>> election finishes sooner than that (i.e. a DC is elected), it stops
>> waiting. It doesn't even wait for all nodes, just a quorum.
>>
>> Also, with startup-fencing=true (the default), any unseen nodes will be
>> fenced, and the remaining nodes will proceed to host resources. Of
>> course, it needs quorum for this, too.
>>
>> With two nodes, quorum is handled specially, but that's a different topic.
>>
>> > With Pacemaker/Heartbeat, the initdead parameter did exactly what I
>> > want, but I don't see any way to do this with Pacemaker/Corosync.  From
>> > reading other posts, it looks like people use an external agent to start
>> > HA daemons once nodes are up ... is this a correct understanding?
>> >
>> > Thanks very much,
>> > Chris
>>

Re: [ClusterLabs] question about dc-deadtime

2017-01-09 Thread Andrew Beekhof
On Fri, Dec 16, 2016 at 7:26 AM, Ken Gaillot  wrote:
> On 12/15/2016 02:00 PM, Chris Walker wrote:
>> Hello,
>>
>> I have a quick question about dc-deadtime.  I believe that Digimer and
>> others on this list might have already addressed this, but I want to
>> make sure I'm not missing something.
>>
>> If my understanding is correct, dc-deadtime sets the amount of time that
>> must elapse before a cluster is formed (DC is elected, etc), regardless
>> of which nodes have joined the cluster.  In other words, even if all
>> nodes that are explicitly enumerated in the nodelist section have
>> started Pacemaker, they will still wait dc-deadtime before forming a
>> cluster.
>>
>> In my case, I have a two-node cluster on which I'd like to allow a
>> pretty long time (~5 minutes) for both nodes to join before giving up on
>> them.  However, if they both join quickly, I'd like to proceed to form a
>> cluster immediately; I don't want to wait for the full five minutes to
>> elapse before forming a cluster.  Further, if a node doesn't respond
>> within five minutes, I want to fence it and start resources on the node
>> that is up.
>
> Pacemaker+corosync behaves as you describe by default.
>
> dc-deadtime is how long to wait for an election to finish, but if the
> election finishes sooner than that (i.e. a DC is elected), it stops
> waiting. It doesn't even wait for all nodes, just a quorum.

You're confusing dc_deadtime with election_timeout:

./crmd/control.c:899: { XML_CONFIG_ATTR_DC_DEADTIME, "dc_deadtime",
"time", NULL, "20s", _time,
./crmd/control.c-900-  "How long to wait for a response from
other nodes during startup.",
./crmd/control.c-901-  "The \"correct\" value will depend on
the speed/load of your network and the type of switches used."
./crmd/control.c-902-},

./crmd/control.c:934: { XML_CONFIG_ATTR_ELECTION_FAIL,
"election_timeout", "time", NULL, "2min", _timer,
./crmd/control.c-935-  "*** Advanced Use Only ***.", "If need
to adjust this value, it probably indicates the presence of a bug."
./crmd/control.c-936-},

"during startup" is incomplete though... we also start that timer
after partition changes in case the DC was one of the nodes lost.

>
> Also, with startup-fencing=true (the default), any unseen nodes will be
> fenced, and the remaining nodes will proceed to host resources. Of
> course, it needs quorum for this, too.
>
> With two nodes, quorum is handled specially, but that's a different topic.
>
>> With Pacemaker/Heartbeat, the initdead parameter did exactly what I
>> want, but I don't see any way to do this with Pacemaker/Corosync.  From
>> reading other posts, it looks like people use an external agent to start
>> HA daemons once nodes are up ... is this a correct understanding?
>>
>> Thanks very much,
>> Chris
>


Re: [ClusterLabs] Q: crm_compress_string

2016-11-20 Thread Andrew Beekhof
On Tue, Nov 15, 2016 at 6:35 PM, Ulrich Windl
 wrote:
>
> Hi!
>
> I have a question regarding SLES11 SP4: In the pacemaker log I see many
> messages like these:
>
> Nov 14 00:00:02 [22409] h04 cib: info: crm_compress_string: Compressed 162583 bytes into 13009 (ratio 12:1) in 27ms
> Nov 14 01:00:01 [22409] h04 cib: info: cib_process_request: Forwarding cib_query operation for section configuration to master (origin=local/cibadmin/2)
> Nov 14 01:00:02 [22409] h04 cib: info: crm_compress_string: Compressed 162583 bytes into 12960 (ratio 12:1) in 33ms
> Nov 14 02:00:01 [22409] h04 cib: info: cib_process_request: Forwarding cib_query operation for section configuration to master (origin=local/cibadmin/2)
>
> So it seems something is happening every hour. That's OK, and I think it's
> a "cibadmin -Q" or "crm configure show" operation run by some cron job. But
> what about the compression message: first, is it so important that it has
> to be logged?

Not anymore

>
> Second: Why is this kind of compression done?

Corosync has a maximum message size, and compressing cuts down on the
amount of bandwidth we're consuming.

>
> cibadmin -Q displays plain XML.

It's automatically decompressed on the other end.
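
(For readers wondering what that compression amounts to: a rough,
self-contained sketch using libbz2, built with -lbz2. This is an assumption
about the mechanism behind the "Compressed X bytes into Y" message, not a
copy of crm_compress_string(); the point is only that large CIB payloads
get squeezed under corosync's message-size limit before sending and
expanded again on receipt.)

    #include <bzlib.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        const char *xml = "<cib><configuration/><status/></cib>";
        unsigned int src_len = (unsigned int) strlen(xml) + 1;

        /* bzip2's worst case is roughly 1% larger than the input plus 600 bytes */
        unsigned int comp_len = src_len + src_len / 100 + 600;
        char *comp = malloc(comp_len);

        int rc = BZ2_bzBuffToBuffCompress(comp, &comp_len, (char *) xml, src_len,
                                          9 /* blockSize100k */, 0, 0);
        if (rc != BZ_OK) {
            fprintf(stderr, "compression failed: %d\n", rc);
            free(comp);
            return 1;
        }
        printf("Compressed %u bytes into %u\n", src_len, comp_len);
        free(comp);
        return 0;
    }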

>
>
> Regards,
> Ulrich
>
>
>
>


Re: [ClusterLabs] I've been working on a split-brain prevention strategy for 2-node clusters.

2016-10-10 Thread Andrew Beekhof
On Mon, Oct 10, 2016 at 8:04 AM, Digimer  wrote:
>
> The only geo-located/stretch cluster approach that I've seen that makes
> any sense and seems genuinely safe is SUSE's 'pacemaker booth' project.

Also arriving in RHEL 7.3
Might be tech preview though.



Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-10-04 Thread Andrew Beekhof
On Wed, Oct 5, 2016 at 7:03 AM, Ken Gaillot <kgail...@redhat.com> wrote:

> On 10/02/2016 10:02 PM, Andrew Beekhof wrote:
> >> Take a
> >> look at all of nagios' options for deciding when a failure becomes
> "real".
> >
> > I used to take a very hard line on this: if you don't want the cluster
> > to do anything about an error, don't tell us about it.
> > However I'm slowly changing my position... the reality is that many
> > people do want a heads up in advance and we have been forcing that
> > policy (when does an error become real) into the agents where one size
> > must fit all.
> >
> > So I'm now generally in favour of having the PE handle this "somehow".
>
> Nagios is a useful comparison:
>
> check_interval - like pacemaker's monitor interval
>
> retry_interval - if a check returns failure, switch to this interval
> (i.e. check more frequently when trying to decide whether it's a "hard"
> failure)
>
> max_check_attempts - if a check fails this many times in a row, it's a
> hard failure. Before this is reached, it's considered a soft failure.
> Nagios will call event handlers (comparable to pacemaker's alert agents)
> for both soft and hard failures (distinguishing the two). A service is
> also considered to have a "hard failure" if its host goes down.
>
> high_flap_threshold/low_flap_threshold - a service is considered to be
> flapping when its percent of state changes (ok <-> not ok) in the last
> 21 checks (= max. 20 state changes) reaches high_flap_threshold, and
> stable again once the percentage drops to low_flap_threshold. To put it
> another way, a service that passes every monitor is 0% flapping, and a
> service that fails every other monitor is 100% flapping. With these,
> even if a service never reaches max_check_attempts failures in a row, an
> alert can be sent if it's repeatedly failing and recovering.
>

makes sense.

since we're overhauling this functionality anyway, do you think we need to
add an equivalent of retry_interval too?
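
(A small sketch of the flap-detection arithmetic quoted above, simplified --
real nagios also weights recent state changes more heavily: over the last
21 check results there are at most 20 state changes, and the flap
percentage is simply changes divided by 20.)

    #include <stdbool.h>
    #include <stdio.h>

    #define HISTORY_LEN 21   /* last 21 checks => at most 20 state changes */

    static double flap_percent(const bool ok[HISTORY_LEN])
    {
        int changes = 0;

        for (int i = 1; i < HISTORY_LEN; i++) {
            if (ok[i] != ok[i - 1]) {
                changes++;
            }
        }
        return 100.0 * changes / (HISTORY_LEN - 1);
    }

    int main(void)
    {
        bool alternating[HISTORY_LEN], steady[HISTORY_LEN];

        for (int i = 0; i < HISTORY_LEN; i++) {
            alternating[i] = (i % 2 == 0);  /* fails every other monitor */
            steady[i] = true;               /* passes every monitor      */
        }
        printf("steady:      %.0f%% flapping\n", flap_percent(steady));      /* 0%   */
        printf("alternating: %.0f%% flapping\n", flap_percent(alternating)); /* 100% */
        return 0;
    }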


>
> >> If you clear failures after a success, you can't detect/recover a
> >> resource that is flapping.
> >
> > Ah, but you can if the thing you're clearing only applies to other
> > failures of the same action.
> > A completed start doesn't clear a previously failed monitor.
>
> Nope -- a monitor can alternately succeed and fail repeatedly, and that
> indicates a problem, but wouldn't trip an "N-failures-in-a-row" system.
>
> >> It only makes sense to escalate from ignore -> restart -> hard, so maybe
> >> something like:
> >>
> >>   op monitor ignore-fail=3 soft-fail=2 on-hard-fail=ban
> >>
> > I would favour something more concrete than 'soft' and 'hard' here.
> > Do they have a sufficiently obvious meaning outside of us developers?
> >
> > Perhaps (with or without a "failures-" prefix) :
> >
> >ignore-count
> >recover-count
> >escalation-policy
>
> I think the "soft" vs "hard" terminology is somewhat familiar to
> sysadmins -- there's at least nagios, email (SPF failures and bounces),
> and ECC RAM. But throwing "ignore" into the mix does confuse things.
>
> How about ... max-fail-ignore=3 max-fail-restart=2 fail-escalation=ban
>
>
I could live with that :-)


Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-10-02 Thread Andrew Beekhof
On Fri, Sep 30, 2016 at 10:28 AM, Ken Gaillot <kgail...@redhat.com> wrote:
> On 09/28/2016 10:54 PM, Andrew Beekhof wrote:
>> On Sat, Sep 24, 2016 at 9:12 AM, Ken Gaillot <kgail...@redhat.com> wrote:
>>>> "Ignore" is theoretically possible to escalate, e.g. "ignore 3 failures
>>>> then migrate", but I can't think of a real-world situation where that
>>>> makes sense,
>>>>
>>>>
>>>> really?
>>>>
>>>> it is not uncommon to hear "i know its failed, but i dont want the
>>>> cluster to do anything until its _really_ failed"
>>>
>>> Hmm, I guess that would be similar to how monitoring systems such as
>>> nagios can be configured to send an alert only if N checks in a row
>>> fail. That's useful where transient outages (e.g. a webserver hitting
>>> its request limit) are acceptable for a short time.
>>>
>>> I'm not sure that's translatable to Pacemaker. Pacemaker's error count
>>> is not "in a row" but "since the count was last cleared".
>>
>> It would be a major change, but perhaps it should be "in-a-row" and
>> successfully performing the action clears the count.
>> It's entirely possible that the current behaviour is like that because
>> I wasn't smart enough to implement anything else at the time :-)
>
> Or you were smart enough to realize what a can of worms it is. :)

So you're saying two dumbs makes a smart? :-)

>Take a
> look at all of nagios' options for deciding when a failure becomes "real".

I used to take a very hard line on this: if you don't want the cluster
to do anything about an error, don't tell us about it.
However I'm slowly changing my position... the reality is that many
people do want a heads up in advance and we have been forcing that
policy (when does an error become real) into the agents where one size
must fit all.

So I'm now generally in favour of having the PE handle this "somehow".

>
> If you clear failures after a success, you can't detect/recover a
> resource that is flapping.

Ah, but you can if the thing you're clearing only applies to other
failures of the same action.
A completed start doesn't clear a previously failed monitor.
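
(A hypothetical sketch -- not pacemaker code -- of that per-action,
in-a-row bookkeeping: each action keeps its own consecutive-failure
counter and a success resets only its own counter, so a completed start
leaves the monitor's count untouched.)

    #include <stdio.h>
    #include <string.h>

    struct action_failures {
        const char *action;          /* "start", "monitor", "promote", ... */
        unsigned int consecutive;    /* failures in a row for this action only */
    };

    static struct action_failures counters[] = {
        { "start",   0 },
        { "monitor", 0 },
        { "promote", 0 },
    };

    static void record_result(const char *action, int failed)
    {
        for (size_t i = 0; i < sizeof(counters) / sizeof(counters[0]); i++) {
            if (strcmp(counters[i].action, action) == 0) {
                counters[i].consecutive = failed ? counters[i].consecutive + 1 : 0;
                return;
            }
        }
    }

    int main(void)
    {
        record_result("monitor", 1);   /* a monitor fails ...            */
        record_result("start",   0);   /* ... then a start succeeds      */
        printf("monitor failures in a row: %u\n", counters[1].consecutive); /* still 1 */
        return 0;
    }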

>
>>> "Ignore up to three monitor failures if they occur in a row [or, within
>>> 10 minutes?], then try soft recovery for the next two monitor failures,
>>> then ban this node for the next monitor failure." Not sure being able to
>>> say that is worth the complexity.
>>
>> Not disagreeing
>
> It only makes sense to escalate from ignore -> restart -> hard, so maybe
> something like:
>
>   op monitor ignore-fail=3 soft-fail=2 on-hard-fail=ban

The other idea I had, was to create some new return codes:
PCMK_OCF_ERR_BAN, PCMK_OCF_ERR_FENCE, etc.
Ie. make the internal mapping of return codes like
PCMK_OCF_NOT_CONFIGURED and PCMK_OCF_DEGRADED to hard/soft/ignore
recovery logic into something available to the agent.

To use your example above, return PCMK_OCF_DEGRADED for the first 3
monitor failures, PCMK_OCF_ERR_RESTART for the next two and
PCMK_OCF_ERR_BAN for the last.

But the more I think about it, the less I like it.
- We lose precision about what the actual error was
- We're pushing too much user config/policy into the agent (every
agent would end up with equivalents of 'ignore-fail', 'soft-fail', and
'on-hard-fail')
- We might need the agent to know about the fencing config
(enabled/disabled/valid)
- It forces the agent to track the number of operation failures

So I think I'm just mentioning it for completeness and in case it
prompts a good idea in someone else.

>
>
> To express current default behavior:
>
>   op start ignore-fail=0 soft-fail=0        on-hard-fail=ban

I would favour something more concrete than 'soft' and 'hard' here.
Do they have a sufficiently obvious meaning outside of us developers?

Perhaps (with or without a "failures-" prefix) :

   ignore-count
   recover-count
   escalation-policy

>   op stop  ignore-fail=0 soft-fail=0        on-hard-fail=fence
>   op * ignore-fail=0 soft-fail=INFINITY on-hard-fail=ban
>
>
> on-fail, migration-threshold, and start-failure-is-fatal would be
> deprecated (and would be easy to map to the new parameters).
>
> I'd avoid the hassles of counting failures "in a row", and stick with
> counting failures since the last cleanup.

sure



Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-22 Thread Andrew Beekhof
On Fri, Sep 23, 2016 at 1:58 AM, Ken Gaillot  wrote:

> On 09/22/2016 09:53 AM, Jan Pokorný wrote:
> > On 22/09/16 08:42 +0200, Kristoffer Grönlund wrote:
> >> Ken Gaillot  writes:
> >>
> >>> I'm not saying it's a bad idea, just that it's more complicated than it
> >>> first sounds, so it's worth thinking through the implications.
> >>
> >> Thinking about it and looking at how complicated it gets, maybe what
> >> you'd really want, to make it clearer for the user, is the ability to
> >> explicitly configure the behavior, either globally or per-resource. So
> >> instead of having to tweak a set of variables that interact in complex
> >> ways, you'd configure something like rule expressions,
> >>
> >> 
> >>   
> >>   
> >>   
> >> 
> >>
> >> So, try to restart the service 3 times, if that fails migrate the
> >> service, if it still fails, fence the node.
> >>
> >> (obviously the details and XML syntax are just an example)
> >>
> >> This would then replace on-fail, migration-threshold, etc.
> >
> > I must admit that in previous emails in this thread, I wasn't able to
> > follow during the first pass, which is not the case with this procedural
> > (sequence-ordered) approach.  Though someone can argue it doesn't take
> > type of operation into account, which might again open the door for
> > non-obvious interactions.
>
> "restart" is the only on-fail value that it makes sense to escalate.
>
> block/stop/fence/standby are final. Block means "don't touch the
> resource again", so there can't be any further response to failures.
> Stop/fence/standby move the resource off the local node, so failure
> handling is reset (there are 0 failures on the new node to begin with).
>
> "Ignore" is theoretically possible to escalate, e.g. "ignore 3 failures
> then migrate", but I can't think of a real-world situation where that
> makes sense,


really?

it is not uncommon to hear "i know its failed, but i dont want the cluster
to do anything until its _really_ failed"


> and it would be a significant re-implementation of "ignore"
> (which currently ignores the state of having failed, as opposed to a
> particular instance of failure).
>

agreed


>
> What the interface needs to express is: "If this operation fails,
> optionally try a soft recovery [always stop+start], but if  failures
> occur on the same node, proceed to a [configurable] hard recovery".
>
> And of course the interface will need to be different depending on how
> certain details are decided, e.g. whether any failures count toward 
> or just failures of one particular operation type, and whether the hard
> recovery type can vary depending on what operation failed.
>


Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-20 Thread Andrew Beekhof
On Wed, Sep 21, 2016 at 6:25 AM, Ken Gaillot  wrote:

> Hi everybody,
>
> Currently, Pacemaker's on-fail property allows you to configure how the
> cluster reacts to operation failures. The default "restart" means try to
> restart on the same node, optionally moving to another node once
> migration-threshold is reached. Other possibilities are "ignore",
> "block", "stop", "fence", and "standby".
>
> Occasionally, we get requests to have something like migration-threshold
> for values besides restart. For example, try restarting the resource on
> the same node 3 times, then fence.
>
> I'd like to get your feedback on two alternative approaches we're
> considering.
>
> ###
>
> Our first proposed approach would add a new hard-fail-threshold
> operation property. If specified, the cluster would first try restarting
> the resource on the same node,


Well, just as now, it would be _allowed_ to start on the same node, but
this is not guaranteed.


> before doing the on-fail handling.
>
> For example, you could configure a promote operation with
> hard-fail-threshold=3 and on-fail=fence, to fence the node after 3
> failures.


> One point that's not settled is whether failures of *any* operation
> would count toward the 3 failures (which is how migration-threshold
> works now), or only failures of the specified operation.
>

I think if hard-fail-threshold is per-op, then only failures of that
operation should count.


>
> Currently, if a start fails (but is retried successfully), then a
> promote fails (but is retried successfully), then a monitor fails, the
> resource will move to another node if migration-threshold=3. We could
> keep that behavior with hard-fail-threshold, or only count monitor
> failures toward monitor's hard-fail-threshold. Each alternative has
> advantages and disadvantages.
>
> ###
>
> The second proposed approach would add a new on-restart-fail resource
> property.
>
> Same as now, on-fail set to anything but restart would be done
> immediately after the first failure. A new value, "ban", would
> immediately move the resource to another node. (on-fail=ban would behave
> like on-fail=restart with migration-threshold=1.)
>
> When on-fail=restart, and restarting on the same node doesn't work, the
> cluster would do the on-restart-fail handling. on-restart-fail would
> allow the same values as on-fail (minus "restart"), and would default to
> "ban".


I do wish you well tracking "is this a restart" across demote -> stop ->
start -> promote in 4 different transitions :-)


>
> So, if you want to fence immediately after any promote failure, you
> would still configure on-fail=fence; if you want to try restarting a few
> times first, you would configure on-fail=restart and on-restart-fail=fence.
>
> This approach keeps the current threshold behavior -- failures of any
> operation count toward the threshold. We'd rename migration-threshold to
> something like hard-fail-threshold, since it would apply to more than
> just migration, but unlike the first approach, it would stay a resource
> property.
>
> ###
>
> Comparing the two approaches, the first is more flexible, but also more
> complex and potentially confusing.
>

More complex to implement or more complex to configure?


>
> With either approach, we would deprecate the start-failure-is-fatal
> cluster property. start-failure-is-fatal=true would be equivalent to
> hard-fail-threshold=1 with the first approach, and on-fail=ban with the
> second approach. This would be both simpler and more useful -- it allows
> the value to be set differently per resource.
> --
> Ken Gaillot 
>


Re: [ClusterLabs] Doing reload right

2016-07-25 Thread Andrew Beekhof
On Sat, Jul 23, 2016 at 7:10 AM, Ken Gaillot <kgail...@redhat.com> wrote:
> On 07/21/2016 07:46 PM, Andrew Beekhof wrote:
>>>> What do you mean by native restart action? Systemd restart?
>>
>> Whatever the agent supports.
>
> Are you suggesting that pacemaker starting checking whether the agent
> metadata advertises a "restart" action? Or just assume that certain
> resource classes support restart (e.g. systemd) and others don't (e.g. ocf)?

No, I'm suggesting the crm_resource cli start checking... not the same thing

>
>>>>
>>>>> 3. re-enables the recurring monitor operations regardless of whether
>>>>> the reload succeeds, fails, or times out, etc
>>>>>
>>>>> No maintenance mode required, and whatever state the resource ends up
>>>>> in is re-detected by the cluster in step 3.
>>>>
>>>> If you're lucky :-)
>>>>
>>>> The cluster may still mess with the resource even without monitors, e.g.
>>>> a dependency fails or a preferred node comes online.
>>
>> Can you explain how neither of those results in a restart of the service?
>
> Unless the resource is unmanaged, the cluster could do something like
> move it to a different node, disrupting the local force-restart.

But the next time it starts there, it will come up with the new configuration.
Achieving the desired effect.

This is no different to using maintenance-mode and the cluster moving
or stopping it immediately after it is disabled again.
Either way, the resource is no longer running with the old
configuration at the end of the call.

>
> Ideally, we'd be able to disable monitors and unmanage the resource for
> the duration of the force-restart, but only on the local node.
>
>>>> Maintenance
>>>> mode/unmanaging would still be safer (though no --force-* option is
>>>> completely safe, besides check).
>>>
>>> I'm happy with whatever you gurus come up with ;-)  I'm just hoping
>>> that it can be made possible to pinpoint an individual resource on an
>>> individual node, rather than having to toggle maintenance flag(s)
>>> across a whole set of clones, or a whole node.
>>
>> Yep.
>


Re: [ClusterLabs] Antw: Re: Antw: Doing reload right

2016-07-17 Thread Andrew Beekhof
On Sat, Jul 16, 2016 at 9:34 AM, Ken Gaillot <kgail...@redhat.com> wrote:
> On 07/14/2016 06:21 PM, Andrew Beekhof wrote:
>> On Fri, Jul 15, 2016 at 2:33 AM, Ken Gaillot <kgail...@redhat.com> wrote:
>>> On 07/13/2016 11:20 PM, Andrew Beekhof wrote:
>>>> On Wed, Jul 6, 2016 at 12:57 AM, Ken Gaillot <kgail...@redhat.com> wrote:
>>>>> On 07/04/2016 02:01 AM, Ulrich Windl wrote:
>>>>>> For the case of changing the contents of an external configuration file, 
>>>>>> the
>>>>>> RA would have to provide some reloadable dummy parameter then (maybe like
>>>>>> "config_generation=2").
>>>>>
>>>>> That is a widely recommended approach for the current "reload"
>>>>> implementation, but I don't think it's desirable. It still does not
>>>>> distinguish changes in the Pacemaker resource configuration from changes
>>>>> in the service configuration.
>>>>>
>>>>> For example, if an RA has one parameter that is agent-reloadable and
>>>>> another that is service-reloadable, and it gets a "reload" action, it
>>>>> has no way of knowing which of the two (or both) changed. It would have
>>>>> to always reload all agent-reloadable parameters, and trigger a service
>>>>> reload. That seems inefficient to me. Also, only Pacemaker should
>>>>> trigger agent reloads, and only the user should trigger service reloads,
>>>>> so combining them doesn't make sense to me.
>>>>
>>>> Totally disagree :-)
>>>>
>>>> The whole reason service reloads exist is that they are more efficient
>>>> than a stop/start cycle.
>>>>
>>>> So I'm not seeing how calling one, on the rare occasion that the
>>>> parameters change and allow a reload, when it wasn't necessary can be
>>>> classed as inefficient.   On the contrary, trying to avoid it seems
>>>> like over-optimizing when we should be aiming for correctness - ie.
>>>> reloading the whole thing.
>>>
>>> I just don't see any logical connection between modifying a service's
>>> Pacemaker configuration and modifying its service configuration file.
>>
>> There isn't one beyond they are both bypassing a stop/start cycle.
>>
>>>
>>> Is the idea that people will tend to change them together?
>>
>> No, the idea is that the "penalty" of making sure both are up-to-date,
>> in the rare event that either one is changed, does not justify
>> splitting them up.
>
> OK. In that case, we'd keep the "reload" action for doing both types of
> reload together, and the only change we need to consider is unique vs
> reloadable.
>
> Thinking it through some more, I'm leaning to this approach:
>
>
> 1. Let's buckle down and update the OCF spec to reflect the accrued
> real-world practices, as well as this change. This will allow resource
> agents to specify that they comply with the new terminology by setting
>  to 1.1, and both pacemaker and higher-level tools can rely on
> that to determine whether to use the new behavior.

Agreed

>
> The alternative is that pacemaker and higher-level tools could check
> whether a resource agent specifies "reloadable" for any parameter, and
> use the new behavior if so. It's doable, but it's another hacky
> workaround when we're really overdue for this anyway.
>
>
> 2. Since the current usage of "unique" is so broken, I think we should
> abandon it altogether, and use two new attribute names to indicate
> uniqueness and reloadability. We've already converged on "reloadable",
> so we just need something to indicate that two instances of a resource
> cannot share the same value of a given parameter. Maybe "reject_duplicate"?

Agree on concept, not fan of the name.  "exclusive"? "unrepeatable"?

>
> I think it might even be worthwhile for pacemaker (not just high-level
> tools) to enforce the new attribute,

Not necessarily disagreeing, but the implementation would be "fun".
Are you sure pcs isn't the right place in the same way that it removes
related constraints when deleting a resource?

> because it would be used to
> indicate that there's a problem if it's used twice. For example, you can
> start two instances of apache with different config files, but you don't
> want to try to start two instances with the same config file. We can't
> do that currently because unique is often set wrong, but if we create a
> new attribute, we can en

Re: [ClusterLabs] Antw: Re: Antw: Doing reload right

2016-07-14 Thread Andrew Beekhof
On Fri, Jul 15, 2016 at 2:33 AM, Ken Gaillot <kgail...@redhat.com> wrote:
> On 07/13/2016 11:20 PM, Andrew Beekhof wrote:
>> On Wed, Jul 6, 2016 at 12:57 AM, Ken Gaillot <kgail...@redhat.com> wrote:
>>> On 07/04/2016 02:01 AM, Ulrich Windl wrote:
>>>> For the case of changing the contents of an external configuration file,
>>>> the RA would have to provide some reloadable dummy parameter then (maybe like
>>>> "config_generation=2").
>>>
>>> That is a widely recommended approach for the current "reload"
>>> implementation, but I don't think it's desirable. It still does not
>>> distinguish changes in the Pacemaker resource configuration from changes
>>> in the service configuration.
>>>
>>> For example, if an RA has one parameter that is agent-reloadable and
>>> another that is service-reloadable, and it gets a "reload" action, it
>>> has no way of knowing which of the two (or both) changed. It would have
>>> to always reload all agent-reloadable parameters, and trigger a service
>>> reload. That seems inefficient to me. Also, only Pacemaker should
>>> trigger agent reloads, and only the user should trigger service reloads,
>>> so combining them doesn't make sense to me.
>>
>> Totally disagree :-)
>>
>> The whole reason service reloads exist is that they are more efficient
>> than a stop/start cycle.
>>
>> So I'm not seeing how calling one, on the rare occasion that the
>> parameters change and allow a reload, when it wasn't necessary can be
>> classed as inefficient.   On the contrary, trying to avoid it seems
>> like over-optimizing when we should be aiming for correctness - ie.
>> reloading the whole thing.
>
> I just don't see any logical connection between modifying a service's
> Pacemaker configuration and modifying its service configuration file.

There isn't one beyond they are both bypassing a stop/start cycle.

>
> Is the idea that people will tend to change them together?

No, the idea is that the "penalty" of making sure both are up-to-date,
in the rare event that either one is changed, does not justify
splitting them up.

> I'd expect
> that in most environments, the Pacemaker configuration (e.g. where the
> apache config file is) will remain much more stable than the service
> configuration (e.g. adding/modifying websites).
>
> Service reloads can sometimes be expensive (e.g. a complex/busy postfix
> or apache installation) even if they are less expensive than a full restart.

Right. But you just said that the pacemaker config is much less likely
(out of a thing that's already not very likely) to change. So why are
you optimizing for that scenario?

>
>> The most in-efficient part in all this is the current practice of
>> updating a dummy attribute to trigger a reload after changing the
>> application config file.  That we can address by supporting
>> --force-reload for crm_resource like we do for start/stop/monitor (and
>> exposing it nicely in pcs).



Re: [ClusterLabs] Antw: Doing reload right

2016-07-13 Thread Andrew Beekhof
On Sat, Jul 2, 2016 at 1:26 AM, Ken Gaillot  wrote:
> On 07/01/2016 04:48 AM, Jan Pokorný wrote:
>> On 01/07/16 09:23 +0200, Ulrich Windl wrote:
>> Ken Gaillot  wrote on 30.06.2016 at 18:58 in
>> message
>>> <57754f9f.8070...@redhat.com>:
 I've been meaning to address the implementation of "reload" in Pacemaker
 for a while now, and I think the next release will be a good time, as it
 seems to be coming up more frequently.

 In the current implementation, Pacemaker considers a resource parameter
 "reloadable" if the resource agent supports the "reload" action, and the
 agent's metadata marks the parameter with "unique=0". If (only) such
 parameters get changed in the resource's pacemaker configuration,
 pacemaker will call the agent's reload action rather than the
 stop-then-start it usually does for parameter changes.

 This is completely broken for two reasons:
>>>
>>> I agree ;-)
>>>

 1. It relies on "unique=0" to determine reloadability. "unique" was
 originally intended (and is widely used by existing resource agents) as
 a hint to UIs to indicate which parameters uniquely determine a resource
 instance. That is, two resource instances should never have the same
 value of a "unique" parameter. For this purpose, it makes perfect sense
 that (for example) the path to a binary command would have unique=0 --
 multiple resource instances could (and likely would) use the same
 binary. However, such a parameter could never be reloadable.
>>>
>>> I thought unique=0 was reloadable (unique=1 was not)...
>
> Correct. By "could never be reloadable", I mean that if someone changes
> the location of the daemon binary, there's no way the agent could change
> that with anything other than a full restart. So using unique=0 to
> indicate reloadable doesn't make sense.
>
>> I see a doubly-distorted picture here:
>> - actually "unique=1" on a RA parameter (together with this RA supporting
>>   "reload") currently leads to reload-on-change
>> - also the provided example shows why reload for "unique=0" is wrong,
>>   but as the opposite applies as of current state, it's not an argument
>>   why something is broken
>>
>> See also:
>> https://github.com/ClusterLabs/pacemaker/commit/2f5d44d4406e9a8fb5b380cb56ab8a70d7ad9c23
>
> Nope, unique=1 is used for the *restart* list -- the non-reloadable
> parameters.
>
 2. Every known resource agent that implements a reload action does so
 incorrectly. Pacemaker uses reload for changes in the resource's
 *pacemaker* configuration, while all known RAs use reload for a
 service's native reload capability of its own configuration file. As an
 example, the ocf:heartbeat:named RA calls "rndc reload" for its reload
 action, which will have zero effect on any pacemaker-configured
 parameters -- and on top of that, the RA uses "unique=0" in its correct
 UI sense, and none of those parameters are actually reloadable.
>>
>> (per the last subclause, applicable also, after mentioned inversion, for
>> "unique=1", such as a pid file path, which cannot be reloadable for
>> apparent reason)
>>
>>> Maybe LSB confusion...
>>
>> That's not an entirely fair excuse: when you have to do extra handling
>> of parameters in the LSB-aliased "start" action of the RA, you should
>> give the same consideration to "reload".
>
> I think the point is that "reload" for an LSB init script or systemd
> unit always reloads the native service configuration, so it's natural
> for administrators and developers to think of that when they see "reload".

But also because LSB has no parameters to think about.

>
 My proposed solution is:

 * Add a new "reloadable" attribute for resource agent metadata, to
 indicate reloadable parameters. Pacemaker would use this instead of
 "unique".
>>>
>>> No objections if you change the XML metadata version number this time ;-)
>>
>> Good point, but I guess everyone's a bit scared to open this Pandora's
>> box, as there's so much technical debt connected to it (unifying FA/RA
>> metadata if possible, adding new UI-oriented annotations, pacemaker's
>> silent additions like the "private" parameter).
>> I'd imagine an established authority for OCF matters (maintaining
>> https://github.com/ClusterLabs/OCF-spec) and an at least partly formalized
>> process inspired by Python PEPs for coordinated development:
>> https://www.python.org/dev/peps/pep-0001/
>>
>> 
>
> An update to the OCF spec is long overdue. I wouldn't mind those wheels
> starting to turn, but I think this reload change could proceed
> independently (though of course coordinated at the appropriate time).
>
 * Add a new "reload-options" RA action for the ability to reload
 Pacemaker-configured options. Pacemaker would call this instead of
 "reload".
>>>
>>> Why not "reload-parameters"?
>>
>> That came to my mind as well.  Or not wasting time/space on too many
>> letters, just "reload-params", perhaps.
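
To make the quoted proposal concrete, here is a sketch of what a
hypothetical agent's metadata could look like under it.  This reflects
the proposal being discussed, not the OCF standard of the time, and the
agent/parameter names are invented:

meta_data() {
    cat <<EOF
<?xml version="1.0"?>
<resource-agent name="demo">
  <version>1.0</version>
  <parameters>
    <!-- proposed: an explicit "reloadable" flag instead of abusing unique=0 -->
    <parameter name="loglevel" unique="0" reloadable="1">
      <content type="string" default="info"/>
    </parameter>
    <parameter name="binary" unique="0">
      <content type="string" default="/usr/sbin/demod"/>
    </parameter>
  </parameters>
  <actions>
    <action name="start" timeout="20s"/>
    <action name="stop" timeout="20s"/>
    <action name="monitor" timeout="20s" interval="10s"/>
    <!-- proposed new action; plain "reload" would stay reserved for the
         service's own configuration file -->
    <action name="reload-params" timeout="20s"/>
    <action name="meta-data" timeout="5s"/>
  </actions>
</resource-agent>
EOF
}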

Re: [ClusterLabs] Antw: Doing reload right

2016-07-13 Thread Andrew Beekhof
On Fri, Jul 1, 2016 at 7:48 PM, Jan Pokorný  wrote:
> On 01/07/16 09:23 +0200, Ulrich Windl wrote:
> Ken Gaillot  wrote on 30.06.2016 at 18:58 in
> message
>> <57754f9f.8070...@redhat.com>:
>>> I've been meaning to address the implementation of "reload" in Pacemaker
>>> for a while now, and I think the next release will be a good time, as it
>>> seems to be coming up more frequently.
>>>
>>> In the current implementation, Pacemaker considers a resource parameter
>>> "reloadable" if the resource agent supports the "reload" action, and the
>>> agent's metadata marks the parameter with "unique=0". If (only) such
>>> parameters get changed in the resource's pacemaker configuration,
>>> pacemaker will call the agent's reload action rather than the
>>> stop-then-start it usually does for parameter changes.
>>>
>>> This is completely broken for two reasons:
>>
>> I agree ;-)
>>
>>>
>>> 1. It relies on "unique=0" to determine reloadability. "unique" was
>>> originally intended (and is widely used by existing resource agents) as
>>> a hint to UIs to indicate which parameters uniquely determine a resource
>>> instance. That is, two resource instances should never have the same
>>> value of a "unique" parameter. For this purpose, it makes perfect sense
>>> that (for example) the path to a binary command would have unique=0 --
>>> multiple resource instances could (and likely would) use the same
>>> binary. However, such a parameter could never be reloadable.
>>
>> I thought unique=0 was reloadable (unique=1 was not)...

Exactly

> I see a doubly-distorted picture here:
> - actually "unique=1" on a RA parameter (together with this RA supporting
>   "reload") currently leads to reload-on-change

Are you 100% sure about that?

> - also the provided example shows why reload for "unique=0" is wrong,
>   but as the opposite applies as of current state, it's not an argument
>   why something is broken
>
> See also:
> https://github.com/ClusterLabs/pacemaker/commit/2f5d44d4406e9a8fb5b380cb56ab8a70d7ad9c23

What about it?

>
>>> 2. Every known resource agent that implements a reload action does so
>>> incorrectly. Pacemaker uses reload for changes in the resource's
>>> *pacemaker* configuration, while all known RAs use reload for a
>>> service's native reload capability of its own configuration file. As an
>>> example, the ocf:heartbeat:named RA calls "rndc reload" for its reload
>>> action, which will have zero effect on any pacemaker-configured
>>> parameters -- and on top of that, the RA uses "unique=0" in its correct
>>> UI sense, and none of those parameters are actually reloadable.
>
> (per the last subclause, applicable also, after mentioned inversion, for
> "unique=1", such as a pid file path, which cannot be reloadable for
> apparent reason)
>
>> Maybe LSB confusion...
>
> That's not an entirely fair excuse: when you have to do extra handling
> of parameters in the LSB-aliased "start" action of the RA, you should
> give the same consideration to "reload".
>
>>> My proposed solution is:
>>>
>>> * Add a new "reloadable" attribute for resource agent metadata, to
>>> indicate reloadable parameters. Pacemaker would use this instead of
>>> "unique".
>>
>> No objections if you change the XML metadata version number this time ;-)
>
> Good point, but I guess everyone's a bit scared to open this Pandora's
> box, as there's so much technical debt connected to it (unifying FA/RA
> metadata if possible, adding new UI-oriented annotations, pacemaker's
> silent additions like the "private" parameter).
> I'd imagine an established authority for OCF matters (maintaining
> https://github.com/ClusterLabs/OCF-spec) and an at least partly formalized
> process inspired by Python PEPs for coordinated development:
> https://www.python.org/dev/peps/pep-0001/
>
> 
>
>>> * Add a new "reload-options" RA action for the ability to reload
>>> Pacemaker-configured options. Pacemaker would call this instead of "reload".
>>
>> Why not "reload-parameters"?
>
> That came to my mind as well.  Or not wasting time/space on too many
> letters, just "reload-params", perhaps.

The changes for when a reload should take place make plenty of sense;
it is clearly an ongoing source of confusion.  However, I'm not so sure
about this part.

Would it not be better to have a single reload operation that took
into account the new config and any changed parameters?  When would we
want to update from only one source of changes?

Splitting the functionality into two functions seems like it would
increase, not decrease, confusion.  It took even me some time to realise
what you had in mind.

Or is the intention that since most RA writers only think of config
file reload, we are protecting users from incomplete agents? I would
have thought requiring the new 'reloadable' to be added to attributes
would be sufficient for this purpose  (call it
'this-attribute-is-reloadable' if you really want to hammer it home
:-).

>
> 
>

Re: [ClusterLabs] Antw: Re: Regular pengine warnings after a transient failure

2016-07-11 Thread Andrew Beekhof
On Wed, Mar 9, 2016 at 6:31 PM, Ulrich Windl
 wrote:
 Ferenc Wágner  wrote on 08.03.2016 at 15:08 in message
> <87wppdoydv@lant.ki.iif.hu>:
>> Ken Gaillot  writes:
>>
>>> On 03/07/2016 02:03 PM, Ferenc Wágner wrote:
>>>
 The transition-keys match, does this mean that the above is a late
 result from the monitor operation which was considered timed-out
 previously?  How did it reach vhbl07, if the DC at that time was vhbl03?

> The pe-input files from the transitions around here should help.

 They are available.  What shall I look for?
>>>
>>> It's not the most user-friendly of tools, but crm_simulate can show how
>>> the cluster would react to each transition: crm_simulate -Sx $FILE.bz2
>>
>> $ /usr/sbin/crm_simulate -Sx pe-input-430.bz2 -D recover_many.dot
>> [...]
>> $ dot recover_many.dot -Tpng >recover_many.png
>> dot: graph is too large for cairo-renderer bitmaps. Scaling by 0.573572 to
>> fit
>>
>> The result is a 32767x254 bitmap of green ellipses connected by arrows.
>
> That completely agrees with my experience on this: for real-life
> configurations those graphs are gigantic. When outputting SVG instead of
> PNG, you can at least zoom and pan in the graph (even Firefox can do it).
> Or (if you feel crazy enough) you can load the graph in Inkscape and
> delete the details you are not interested in.
>
> The best solution of course would be to omit irrelevant details from the graph
> before creating the dot file...

Personally I just use graphviz to view the raw .dot file.
Even grepping it for the resource name you're interested in can be
highly useful.
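
For example (the pe-input file and resource name are the ones from this
thread; the rest is just the obvious application of that advice):

  $ crm_simulate -Sx pe-input-430.bz2 -D transition.dot
  $ grep vm-eiffel transition.dot               # only the actions touching that resource
  $ dot -Tsvg transition.dot > transition.svg   # SVG pans/zooms better than PNG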

>
>
>> Most arrows are impossible to follow, but the picture seems to agree
>> with the textual output from crm_simulate:

I should hope so.  What the crm_simulate output misses, however, is
details of the ordering, because it can't express what happens in
parallel, nor how long-running actions would affect the order.

>>
>> * 30 FAILED resources on vhbl05 are to be recovered
>> * 32 Stopped resources are to be started (these are actually running,
>>   but considered Stopped as a consequence of the crmd restart on vhbl03)

I've lost all context here; did the crmd process fail?

>>
>> On the other hand, simulation based on pe-input-431.bz2 reports
>> * only 2 FAILED resources to recover on vhbl05
>> * 36 resources to start (the 4 new are the ones whose recoveries started
>>   during the previous -- aborted -- transition)
>>
>> I failed to extract anything out of these simulations than what was
>> already known from the logs.

The dot files and the list of actions aren't normally the interesting
parts of crm_simulate - that's just the "what" that you already knew.

The real power of crm_simulate is the ability to replay events at a
higher verbosity, using any and all of:
* extra '-v' options
* export PCMK_trace_functions="some_function,another_one"
* export PCMK_trace_tags=${resource_name}

to find out WHY it did the thing you already know it did.
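
A typical replay session along those lines (the trace function names are
placeholders, exactly as in the list above):

  $ export PCMK_trace_functions="some_function,another_one"
  $ export PCMK_trace_tags="vm-eiffel"
  $ crm_simulate -Sx pe-input-430.bz2 -VVV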


>  But I'm happy to see that the cluster
>> probes the disappeared resources on vhbl03 (where they disappeared with
>> the crmd restart) even though it plans to start some of them on other
>> nodes.

It may plan to start them elsewhere, but it won't go through with it
unless the probes return what we expect them to return (NOT_RUNNING).
When they fail to do this, we recompute everything and arrive at the
conclusion that at least most of them should stay where they are.
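
Schematically, that probe contract is just the monitor action's return
code; a minimal sketch, assuming the usual ocf-shellfuncs have been
sourced and with a made-up daemon name:

demo_monitor() {
    # A probe is an ordinary one-shot monitor: OCF_SUCCESS (0) if the
    # service is running here, OCF_NOT_RUNNING (7) if it is cleanly stopped.
    if pgrep -f demod >/dev/null 2>&1; then
        return $OCF_SUCCESS
    else
        return $OCF_NOT_RUNNING
    fi
}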

>> --
>> Regards,
>> Feri


Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-23 Thread Andrew Beekhof
On Fri, Jun 24, 2016 at 1:01 AM, Adam Spiers <aspi...@suse.com> wrote:
> Andrew Beekhof <abeek...@redhat.com> wrote:

>> > Well, if you're OK with bending the rules like this then that's good
>> > enough for me to say we should at least try it :)
>>
>> I still say you shouldn't only do it on error.
>
> When else should it be done?

I was thinking whenever a stop() happens.

> IIUC, disabling/enabling the service is independent of the up/down
> state which nova tracks automatically, and which based on slightly
> more than a skim of the code, is dependent on the state of the RPC
> layer.
>
>> > But how would you avoid repeated consecutive invocations of "nova
>> > service-disable" when the monitor action fails, and ditto for "nova
>> > service-enable" when it succeeds?
>>
>> I don't think you can. Not ideal, but I'd not have thought it a deal breaker.
>
> Sounds like a massive deal-breaker to me!  With op monitor
> interval="10s" and 100 compute nodes, that would mean 10 pointless
> calls to nova-api every second.  Am I missing something?

I was thinking you would only call it for the "I detected a failure
case" and service-enable would still be on start().
So the number of pointless calls per second would be capped at one
tenth of the number of failed compute nodes.

One would hope that all of them weren't dead.

>
> Also I don't see any benefit to moving the API calls from start/stop
> actions to the monitor action.  If there's a failure, Pacemaker will
> invoke the stop action, so we can do service-disable there.

I agree. Doing it unconditionally at stop() is my preferred option; I
was only trying to provide a path that might be close to the behaviour
you were looking for.

> If the
> start action is invoked and we successfully initiate startup of
> nova-compute, the RA can undo any service-disable it previously did
> (although it should not reverse a service-disable done elsewhere,
> e.g. manually by the cloud operator).

Agree

>
>> > Earlier in this thread I proposed
>> > the idea of a tiny temporary file in /run which tracks the last known
>> > state and optimizes away the consecutive invocations, but IIRC you
>> > were against that.
>>
>> I'm generally not a fan, but sometimes state files are a necessity.
>> Just make sure you think through what a missing file might mean.
>
> Sure.  A missing file would mean the RA's never called service-disable
> before,

And that is why I generally don't like state files.
The default location for state files doesn't persist across reboots.

t1. stop (ie. disable)
t2. reboot
t3. start with no state file
t4. WHY WONT NOVA USE THE NEW COMPUTE NODE STUPID CLUSTERS

> which means that it shouldn't call service-enable on startup.
>
>> Unless you use the state file to store the date at which the last
>> start operation occurred?
>>
>> If we're calling stop() and date - start_date > threshold, then, if
>> you must, be optimistic, skip service-disable and assume we'll get
>> started again soon.
>>
>> Otherwise if we're calling stop() and date - start_date <= threshold,
>> always call service-disable because we're in a restart loop which is
>> not worth optimising for.
>>
>> ( And always call service-enable at start() )
>>
>> No Pacemaker feature or Beekhof approval required :-)
>
> Hmm ...  it's possible I just don't understand this proposal fully,
> but it sounds a bit woolly to me, e.g. how would you decide a suitable
> threshold?

roll a dice?

> I think I preferred your other suggestion of just skipping the
> optimization, i.e. calling service-disable on the first stop, and
> service-enable on (almost) every start.

good :)


And the use of force-down from your subsequent email sounds excellent
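
For the record, the date/threshold idea sketched above would come out
roughly as follows.  This is a hypothetical sketch, not the shipped
NovaCompute agent; the nova CLI calls, the use of $(hostname), the state
file under $HA_RSCTMP and the "restart_window" parameter are all
assumptions:

demo_start() {
    date +%s > "${HA_RSCTMP}/nova-compute-last-start"   # note: usually cleared on reboot
    nova service-enable "$(hostname)" nova-compute      # always re-enable at start()
    # ... actually start nova-compute here ...
}

demo_stop() {
    # ... always fully stop nova-compute first ...
    now=$(date +%s)
    last=$(cat "${HA_RSCTMP}/nova-compute-last-start" 2>/dev/null || echo 0)
    if [ $((now - last)) -le "${OCF_RESKEY_restart_window:-300}" ]; then
        # started again only recently: assume a restart loop and always
        # disable, rather than optimise for a quick return
        nova service-disable "$(hostname)" nova-compute
    fi
    # otherwise be optimistic and skip the disable, as discussed above
}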



Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-15 Thread Andrew Beekhof
On Wed, Jun 15, 2016 at 10:42 PM, Adam Spiers <aspi...@suse.com> wrote:
> Andrew Beekhof <abeek...@redhat.com> wrote:
>> On Mon, Jun 13, 2016 at 9:34 PM, Adam Spiers <aspi...@suse.com> wrote:
>> > Andrew Beekhof <abeek...@redhat.com> wrote:
>> >> On Wed, Jun 8, 2016 at 6:23 PM, Adam Spiers <aspi...@suse.com> wrote:
>> >> > Andrew Beekhof <abeek...@redhat.com> wrote:
>> >> >> On Wed, Jun 8, 2016 at 12:11 AM, Adam Spiers <aspi...@suse.com> wrote:
>> >> >> > We would also need to ensure that service-enable is called on start
>> >> >> > when necessary.  Perhaps we could track the enable/disable state in a
>> >> >> > local temporary file, and if the file indicates that we've previously
>> >> >> > done service-disable, we know to run service-enable on start.  This
>> >> >> > would avoid calling service-enable on every single start.
>> >> >>
>> >> >> feels like an over-optimization
>> >> >> in fact, the whole thing feels like that if i'm honest.
>> >> >
>> >> > Huh ... You didn't seem to think that when we discussed automating
>> >> > service-disable at length in Austin.
>> >>
>> >> I didn't feel the need to push back because RH uses the systemd agent
>> >> instead so you're only hanging yourself, but more importantly because
>> >> the proposed implementation to facilitate it wasn't leading RA writers
>> >> down a hazardous path :-)
>> >
>> > I'm a bit confused by that statement, because the only proposed
>> > implementation we came up with in Austin was adding this new feature
>> > to Pacemaker.
>>
>> _A_ new feature, not _this_ new feature.
>> The one we discussed was far less prone to being abused but, as it
>> turns out, also far less useful for what you were trying to do.
>
> Was there really that much significant change since the original idea?
> IIRC the only thing which really changed was the type, from "number of
> retries remaining" to a boolean "there are still some retries" left.

The new implementation has nothing to do with retries. Like the new
name, it is based on "is a start action expected".
That's why I got an attack of the heebie-jeebies.

> I'm not sure why the integer approach would be far less open to abuse,
> or even why it would have been far less useful.  I'm probably missing
> something.
>
> [snipped]
>
>> >> >> why are we trying to optimise the projected performance impact
>> >> >
>> >> > It's not really "projected"; we know exactly what the impact is.  And
>> >> > it's not really a performance impact either.  If nova-compute (or a
>> >> > dependency) is malfunctioning on a compute node, there will be a
>> >> > window (bounded by nova.conf's rpc_response_timeout value, IIUC) in
>> >> > which nova-scheduler could still schedule VMs onto that compute node,
>> >> > and then of course they'll fail to boot.
>> >>
>> >> Right, but that window exists regardless of whether the node is or is
>> >> not ever coming back.
>> >
>> > Sure, but the window's a *lot* bigger if we don't do service-disable.
>> > Although perhaps your question "why are we trying to optimise the
>> > projected performance impact" was actually "why are we trying to avoid
>> > extra calls to service-disable" rather than "why do we want to call
>> > service-disable" as I initially assumed.  Is that right?
>>
>> Exactly.  I assumed it was to limit the noise we'd be generating in doing so.
>
> Sort of - not just the noise, but the extra delay introduced by
> calling service-disable, restarting nova-compute, and then calling
> service-enable again when it succeeds.

Ok, but restarting nova-compute is not optional and the bits that are
optional are all but completely asynchronous* - so the overhead should
be negligible.

* Like most API calls, they are Ack'd when the request has been
received, not processed.

>
> [snipped]
>
>> >> > The masakari folks have a lot of operational experience in this space,
>> >> > and they found that this was enough of a problem to justify calling
>> >> > nova service-disable whenever the failure is detected.
>> >>
>> >> If you really want it whenever the failure is detected, call it from
>> >> the monitor operation that finds it broken.
>> >
>

Re: [ClusterLabs] Pacemaker with Zookeeper??

2016-06-14 Thread Andrew Beekhof
tl;dr - don't port pacemaker, use pacemaker-remote instead

On Wed, May 18, 2016 at 5:20 PM, Jan Friesse  wrote:
> Ken Gaillot wrote:
>
>> On 05/17/2016 09:54 AM, Digimer wrote:
>>>
>>> On 16/05/16 04:35 AM, Bogdan Dobrelya wrote:

 On 05/16/2016 09:23 AM, Jan Friesse wrote:
>>
>> Hi,
>>
>> I have an idea: use Pacemaker with Zookeeper (instead of Corosync). Is
>> it possible?
>> Has there been any investigation of that?


 Indeed, would be *great* to have a Pacemaker based control plane on top
 of other "pluggable" distributed KVS & messaging systems, for example
 etcd as well :)
 I'm looking forward to joining any dev efforts around that, although I'm
 not a Java or Go developer.
>>>
>>>
>>> Part of open source is the freedom to do whatever you want, of course.
>>>
>>> Let me ask though; What problems would zookeeper, etcd or other systems
>>> solve that can't be solved in corosync?
>>>
>>> I ask because the HA community just finished a multi-year effort to
>>> merge different projects into one common HA stack. This has a lot of
>>> benefits to the user base, not least of which is lack of confusion.
>>>
>>> Strikes me that the significant time investment in supporting a new
>>> comms layer would be much more beneficially spent on improving the
>>> existing stack.
>>>
>>> Again, anyone is free to do whatever they want... I just don't see the
>>> motivator personally.
>>>
>>> digimer
>>
>>
>> I see one big difference that is both a strength and a weakness: these
>> other packages have a much wider user base beyond the HA cluster use
>> case. The strength is that there will be many more developers working to
>> fix bugs, add features, etc. The weakness is that most of those
>
>
> This is exactly what I was thinking about during 2.x development:
> whether replacing Corosync would make more sense than continuing to
> develop it. I was able to accept implementing some features. Sadly,
> there was exactly ONE project which would be able to replace corosync
> (the Spread toolkit), and it is even less widespread than Corosync.
>
> From my point of view, replacement of corosync must be (at least) able to:
> - Work without quorum

Agreed. It's a non-starter if messaging stops when quorum is lost.

> - Support 2 node clusters
> - Allow multiple links (something like RRP)

Doesn't bonding (and the imminent arrival of knet) make this somewhat optional?

> - Don't include SPOF (so nothing like configuration stored on one node only
> and/or different machine on network)

There can be subtle variations on this.  The pattern in OpenStack is
to have a "management node", which sounds like a SPOF, but they also
require that the service be able to function without it.  So it's a
grey area.

> - Provide EVS/VS

Pacemaker could live without this.  Heartbeat didn't provide it either.

> - Provide something like qdevice

Or the ability to create it.  In fairness, Pacemaker has gotten by for
a long long time without it :-)


It would be nice to be considered for the kinds of scaled deployments
that kubernetes and etcd are built for because that's where all the
excitement and mindshare is.  Zookeeper was one of the options I
thought of too; however, realistically Pacemaker is not what those
folks are looking for.  At those scales our stack's smarts take a back
seat to the idea that there are so many copies that dozens can die and
the only recovery you need is to maybe start some more copies (because
with so many, there is always a master around somewhere).


For those of us with a need to scale _and_ an appreciation of "real"
resource orchestration, I would argue that a better architecture is a
small traditional cluster managing a much larger pool of
pacemaker-remote nodes.  Putting effort into making that really shine
(especially since it's pretty solid already) is likely to have a better
payoff than porting to another messaging layer.
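
As a rough illustration of that layout (host names invented, and assuming
pacemaker_remote is installed and the cluster authkey has been copied to
the workers), the small core cluster just carries one ocf:pacemaker:remote
resource per worker node:

  $ pcs resource create worker01 ocf:pacemaker:remote server=worker01.example.com
  $ pcs resource create worker02 ocf:pacemaker:remote server=worker02.example.com
  # ...and on each worker: systemctl enable --now pacemaker_remote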


>
> Both Zookeeper and etcd build on top of a quite simple-to-understand
> membership mechanism (zookeeper = elected master, something like amoeba,
> etcd = raft), which is nice because it means more contributors. Sadly,
> bare-metal HA must work even in situations where "simple" quorum is not
> enough.
>
>
>
>> developers are ignorant of HA clustering and could easily cause more
>> problems for the HA use case than they fix.
>>
>> Another potential benefit is the familiarity factor -- people are more
>> comfortable with things they recognize from somewhere else. So it might
>> help Pacemaker adoption, especially in the communities that already use
>> these packages.
>>
>> I'm not aware of any technical advantages, and I wouldn't expect any,
>> given corosync's long HA focus.
>>
>  From my point of view (and yes, I'm biased), the biggest problem of
> Zookeeper is the need to have quorum
>
> (https://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_designing).
> Direct consequence is inability to 

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-14 Thread Andrew Beekhof
On Mon, Jun 13, 2016 at 9:34 PM, Adam Spiers <aspi...@suse.com> wrote:
> Andrew Beekhof <abeek...@redhat.com> wrote:
>> On Wed, Jun 8, 2016 at 6:23 PM, Adam Spiers <aspi...@suse.com> wrote:
>> > Andrew Beekhof <abeek...@redhat.com> wrote:
>> >> On Wed, Jun 8, 2016 at 12:11 AM, Adam Spiers <aspi...@suse.com> wrote:
>> >> > Ken Gaillot <kgail...@redhat.com> wrote:
>> >> >> On 06/06/2016 05:45 PM, Adam Spiers wrote:
>> >> >> > Maybe your point was that if the expected start never happens (so
>> >> >> > never even gets a chance to fail), we still want to do a nova
>> >> >> > service-disable?
>> >> >>
>> >> >> That is a good question, which might mean it should be done on every
>> >> >> stop -- or could that cause problems (besides delays)?
>> >> >
>> >> > No, the whole point of adding this feature is to avoid a
>> >> > service-disable on every stop, and instead only do it on the final
>> >> > stop.  If there are corner cases where we never reach the final stop,
>> >> > that's not a disaster because nova will eventually figure it out and
>> >> > do the right thing when the server-agent connection times out.
>> >> >
>> >> >> Another aspect of this is that the proposed feature could only look at 
>> >> >> a
>> >> >> single transition. What if stop is called with start_expected=false, 
>> >> >> but
>> >> >> then Pacemaker is able to start the service on the same node in the 
>> >> >> next
>> >> >> transition immediately afterward? Would having called service-disable
>> >> >> cause problems for that start?
>> >> >
>> >> > We would also need to ensure that service-enable is called on start
>> >> > when necessary.  Perhaps we could track the enable/disable state in a
>> >> > local temporary file, and if the file indicates that we've previously
>> >> > done service-disable, we know to run service-enable on start.  This
>> >> > would avoid calling service-enable on every single start.
>> >>
>> >> feels like an over-optimization
>> >> in fact, the whole thing feels like that if i'm honest.
>> >
>> > Huh ... You didn't seem to think that when we discussed automating
>> > service-disable at length in Austin.
>>
>> I didn't feel the need to push back because RH uses the systemd agent
>> instead so you're only hanging yourself, but more importantly because
>> the proposed implementation to facilitate it wasn't leading RA writers
>> down a hazardous path :-)
>
> I'm a bit confused by that statement, because the only proposed
> implementation we came up with in Austin was adding this new feature
> to Pacemaker.

_A_ new feature, not _this_ new feature.
The one we discussed was far less prone to being abused but, as it
turns out, also far less useful for what you were trying to do.

> Prior to that, AFAICR, you, Dawid, and I had a long
> afternoon discussion in the sun where we tried to figure out a way to
> implement it just by tweaking the OCF RAs, but every approach we
> discussed turned out to have fundamental issues.  That's why we
> eventually turned to the idea of this new feature in Pacemaker.
>
> But anyway, it's water under the bridge now :-)
>
>> > What changed?  Can you suggest a better approach?
>>
>> Either always or never disable the service would be my advice.
>> "Always" specifically getting my vote.
>
> OK, thanks.  We discussed that at the meeting this morning, and it
> looks like we'll give it a try.
>
>> >> why are we trying to optimise the projected performance impact
>> >
>> > It's not really "projected"; we know exactly what the impact is.  And
>> > it's not really a performance impact either.  If nova-compute (or a
>> > dependency) is malfunctioning on a compute node, there will be a
>> > window (bounded by nova.conf's rpc_response_timeout value, IIUC) in
>> > which nova-scheduler could still schedule VMs onto that compute node,
>> > and then of course they'll fail to boot.
>>
>> Right, but that window exists regardless of whether the node is or is
>> not ever coming back.
>
> Sure, but the window's a *lot* bigger if we don't do service-disable.
> Although perhaps your question "why are we trying to optimise the
> projected performance impact" was actually "why are we trying to avoid
> extra calls to service-disable" rather than "why do we want to call
> service-disable" as I initially assumed.  Is that right?

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-09 Thread Andrew Beekhof
On Wed, Jun 8, 2016 at 6:23 PM, Adam Spiers <aspi...@suse.com> wrote:
> Andrew Beekhof <abeek...@redhat.com> wrote:
>> On Wed, Jun 8, 2016 at 12:11 AM, Adam Spiers <aspi...@suse.com> wrote:
>> > Ken Gaillot <kgail...@redhat.com> wrote:
>> >> On 06/06/2016 05:45 PM, Adam Spiers wrote:
>> >> > Adam Spiers <aspi...@suse.com> wrote:
>> >> >> Andrew Beekhof <abeek...@redhat.com> wrote:
>> >> >>> On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers <aspi...@suse.com> wrote:
>> >> >>>> Ken Gaillot <kgail...@redhat.com> wrote:
>> >> >>>>> My main question is how useful would it actually be in the proposed 
>> >> >>>>> use
>> >> >>>>> cases. Considering the possibility that the expected start might 
>> >> >>>>> never
>> >> >>>>> happen (or fail), can an RA really do anything different if
>> >> >>>>> start_expected=true?
>> >> >>>>
>> >> >>>> That's the wrong question :-)
>> >> >>>>
>> >> >>>>> If the use case is there, I have no problem with
>> >> >>>>> adding it, but I want to make sure it's worthwhile.
>> >> >>>>
>> >> >>>> The use case which started this whole thread is for
>> >> >>>> start_expected=false, not start_expected=true.
>> >> >>>
>> >> >>> Isn't this just two sides of the same coin?
>> >> >>> If you're not doing the same thing for both cases, then you're just
>> >> >>> reversing the order of the clauses.
>> >> >>
>> >> >> No, because the stated concern about unreliable expectations
>> >> >> ("Considering the possibility that the expected start might never
>> >> >> happen (or fail)") was regarding start_expected=true, and that's the
>> >> >> side of the coin we don't care about, so it doesn't matter if it's
>> >> >> unreliable.
>> >> >
>> >> > BTW, if the expected start happens but fails, then Pacemaker will just
>> >> > keep repeating until migration-threshold is hit, at which point it
>> >> > will call the RA 'stop' action finally with start_expected=false.
>> >> > So that's of no concern.
>> >>
>> >> To clarify, that's configurable, via start-failure-is-fatal and on-fail
>> >
>> > Sure.
>> >
>> >> > Maybe your point was that if the expected start never happens (so
>> >> > never even gets a chance to fail), we still want to do a nova
>> >> > service-disable?
>> >>
>> >> That is a good question, which might mean it should be done on every
>> >> stop -- or could that cause problems (besides delays)?
>> >
>> > No, the whole point of adding this feature is to avoid a
>> > service-disable on every stop, and instead only do it on the final
>> > stop.  If there are corner cases where we never reach the final stop,
>> > that's not a disaster because nova will eventually figure it out and
>> > do the right thing when the server-agent connection times out.
>> >
>> >> Another aspect of this is that the proposed feature could only look at a
>> >> single transition. What if stop is called with start_expected=false, but
>> >> then Pacemaker is able to start the service on the same node in the next
>> >> transition immediately afterward? Would having called service-disable
>> >> cause problems for that start?
>> >
>> > We would also need to ensure that service-enable is called on start
>> > when necessary.  Perhaps we could track the enable/disable state in a
>> > local temporary file, and if the file indicates that we've previously
>> > done service-disable, we know to run service-enable on start.  This
>> > would avoid calling service-enable on every single start.
>>
>> feels like an over-optimization
>> in fact, the whole thing feels like that if i'm honest.
>
> Huh ... You didn't seem to think that when we discussed automating
> service-disable at length in Austin.

I didn't feel the need to push back because RH uses the systemd agent
instead so you're only hanging yourself, but more importantly because
the proposed implementation to facilitate it wasn't leading RA writers
down a hazardous path :-)

>  What changed?  Can you suggest a better approach?

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-07 Thread Andrew Beekhof
On Wed, Jun 8, 2016 at 10:29 AM, Andrew Beekhof <abeek...@redhat.com> wrote:
> On Wed, Jun 8, 2016 at 12:11 AM, Adam Spiers <aspi...@suse.com> wrote:
>> Ken Gaillot <kgail...@redhat.com> wrote:
>>> On 06/06/2016 05:45 PM, Adam Spiers wrote:
>>> > Adam Spiers <aspi...@suse.com> wrote:
>>> >> Andrew Beekhof <abeek...@redhat.com> wrote:
>>> >>> On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers <aspi...@suse.com> wrote:
>>> >>>> Ken Gaillot <kgail...@redhat.com> wrote:
>>> >>>>> My main question is how useful would it actually be in the proposed 
>>> >>>>> use
>>> >>>>> cases. Considering the possibility that the expected start might never
>>> >>>>> happen (or fail), can an RA really do anything different if
>>> >>>>> start_expected=true?
>>> >>>>
>>> >>>> That's the wrong question :-)
>>> >>>>
>>> >>>>> If the use case is there, I have no problem with
>>> >>>>> adding it, but I want to make sure it's worthwhile.
>>> >>>>
>>> >>>> The use case which started this whole thread is for
>>> >>>> start_expected=false, not start_expected=true.
>>> >>>
>>> >>> Isn't this just two sides of the same coin?
>>> >>> If you're not doing the same thing for both cases, then you're just
>>> >>> reversing the order of the clauses.
>>> >>
>>> >> No, because the stated concern about unreliable expectations
>>> >> ("Considering the possibility that the expected start might never
>>> >> happen (or fail)") was regarding start_expected=true, and that's the
>>> >> side of the coin we don't care about, so it doesn't matter if it's
>>> >> unreliable.
>>> >
>>> > BTW, if the expected start happens but fails, then Pacemaker will just
>>> > keep repeating until migration-threshold is hit, at which point it
>>> > will call the RA 'stop' action finally with start_expected=false.
>>> > So that's of no concern.
>>>
>>> To clarify, that's configurable, via start-failure-is-fatal and on-fail
>>
>> Sure.
>>
>>> > Maybe your point was that if the expected start never happens (so
>>> > never even gets a chance to fail), we still want to do a nova
>>> > service-disable?
>>>
>>> That is a good question, which might mean it should be done on every
>>> stop -- or could that cause problems (besides delays)?
>>
>> No, the whole point of adding this feature is to avoid a
>> service-disable on every stop, and instead only do it on the final
>> stop.  If there are corner cases where we never reach the final stop,
>> that's not a disaster because nova will eventually figure it out and
>> do the right thing when the server-agent connection times out.
>>
>>> Another aspect of this is that the proposed feature could only look at a
>>> single transition. What if stop is called with start_expected=false, but
>>> then Pacemaker is able to start the service on the same node in the next
>>> transition immediately afterward? Would having called service-disable
>>> cause problems for that start?
>>
>> We would also need to ensure that service-enable is called on start
>> when necessary.  Perhaps we could track the enable/disable state in a
>> local temporary file, and if the file indicates that we've previously
>> done service-disable, we know to run service-enable on start.  This
>> would avoid calling service-enable on every single start.
>
> feels like an over-optimization
> in fact, the whole thing feels like that if i'm honest.

Today the stars aligned :-)

   http://xkcd.com/1691/

>
> why are we trying to optimise the projected performance impact when
> the system is in terrible shape already?
>
>>
>>> > Yes that would be nice, but this proposal was never intended to
>>> > address that.  I guess we'd need an entirely different mechanism in
>>> > Pacemaker for that.  But let's not allow perfection to become the
>>> > enemy of the good ;-)
>>>
>>> The ultimate concern is that this will encourage people to write RAs
>>> that leave services in a dangerous state after stop is called.
>>
>> I don't see why it would.
>
> Previous experience suggests it definitely will.
>
> People will do exactly what you're thinking but with something important.

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-07 Thread Andrew Beekhof
On Wed, Jun 8, 2016 at 12:11 AM, Adam Spiers <aspi...@suse.com> wrote:
> Ken Gaillot <kgail...@redhat.com> wrote:
>> On 06/06/2016 05:45 PM, Adam Spiers wrote:
>> > Adam Spiers <aspi...@suse.com> wrote:
>> >> Andrew Beekhof <abeek...@redhat.com> wrote:
>> >>> On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers <aspi...@suse.com> wrote:
>> >>>> Ken Gaillot <kgail...@redhat.com> wrote:
>> >>>>> My main question is how useful would it actually be in the proposed use
>> >>>>> cases. Considering the possibility that the expected start might never
>> >>>>> happen (or fail), can an RA really do anything different if
>> >>>>> start_expected=true?
>> >>>>
>> >>>> That's the wrong question :-)
>> >>>>
>> >>>>> If the use case is there, I have no problem with
>> >>>>> adding it, but I want to make sure it's worthwhile.
>> >>>>
>> >>>> The use case which started this whole thread is for
>> >>>> start_expected=false, not start_expected=true.
>> >>>
>> >>> Isn't this just two sides of the same coin?
>> >>> If you're not doing the same thing for both cases, then you're just
>> >>> reversing the order of the clauses.
>> >>
>> >> No, because the stated concern about unreliable expectations
>> >> ("Considering the possibility that the expected start might never
>> >> happen (or fail)") was regarding start_expected=true, and that's the
>> >> side of the coin we don't care about, so it doesn't matter if it's
>> >> unreliable.
>> >
>> > BTW, if the expected start happens but fails, then Pacemaker will just
>> > keep repeating until migration-threshold is hit, at which point it
>> > will call the RA 'stop' action finally with start_expected=false.
>> > So that's of no concern.
>>
>> To clarify, that's configurable, via start-failure-is-fatal and on-fail
>
> Sure.
>
>> > Maybe your point was that if the expected start never happens (so
>> > never even gets a chance to fail), we still want to do a nova
>> > service-disable?
>>
>> That is a good question, which might mean it should be done on every
>> stop -- or could that cause problems (besides delays)?
>
> No, the whole point of adding this feature is to avoid a
> service-disable on every stop, and instead only do it on the final
> stop.  If there are corner cases where we never reach the final stop,
> that's not a disaster because nova will eventually figure it out and
> do the right thing when the server-agent connection times out.
>
>> Another aspect of this is that the proposed feature could only look at a
>> single transition. What if stop is called with start_expected=false, but
>> then Pacemaker is able to start the service on the same node in the next
>> transition immediately afterward? Would having called service-disable
>> cause problems for that start?
>
> We would also need to ensure that service-enable is called on start
> when necessary.  Perhaps we could track the enable/disable state in a
> local temporary file, and if the file indicates that we've previously
> done service-disable, we know to run service-enable on start.  This
> would avoid calling service-enable on every single start.

feels like an over-optimization
in fact, the whole thing feels like that if i'm honest.

why are we trying to optimise the projected performance impact when
the system is in terrible shape already?

>
>> > Yes that would be nice, but this proposal was never intended to
>> > address that.  I guess we'd need an entirely different mechanism in
>> > Pacemaker for that.  But let's not allow perfection to become the
>> > enemy of the good ;-)
>>
>> The ultimate concern is that this will encourage people to write RAs
>> that leave services in a dangerous state after stop is called.
>
> I don't see why it would.

Previous experience suggests it definitely will.

People will do exactly what you're thinking but with something important.
They'll see it behaves as they expect in best-case testing and never
think about the corner cases.
Then they'll start thinking about optimising their start operations,
write some "optimistic" state recording code and break those too.

Imagine a bug in your state recording code (maybe you forget to handle
a missing state file after reboot) that means the 'enable' doesn't get
run.  The service is up, but nova will never use it.

> The new fea

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-06 Thread Andrew Beekhof
On Tue, Jun 7, 2016 at 9:07 AM, Ken Gaillot <kgail...@redhat.com> wrote:
> On 06/06/2016 05:45 PM, Adam Spiers wrote:
>> Adam Spiers <aspi...@suse.com> wrote:
>>> Andrew Beekhof <abeek...@redhat.com> wrote:
>>>> On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers <aspi...@suse.com> wrote:
>>>>> Ken Gaillot <kgail...@redhat.com> wrote:
>>>>>> My main question is how useful would it actually be in the proposed use
>>>>>> cases. Considering the possibility that the expected start might never
>>>>>> happen (or fail), can an RA really do anything different if
>>>>>> start_expected=true?
>>>>>
>>>>> That's the wrong question :-)
>>>>>
>>>>>> If the use case is there, I have no problem with
>>>>>> adding it, but I want to make sure it's worthwhile.
>>>>>
>>>>> The use case which started this whole thread is for
>>>>> start_expected=false, not start_expected=true.
>>>>
>>>> Isn't this just two sides of the same coin?
>>>> If you're not doing the same thing for both cases, then you're just
>>>> reversing the order of the clauses.
>>>
>>> No, because the stated concern about unreliable expectations
>>> ("Considering the possibility that the expected start might never
>>> happen (or fail)") was regarding start_expected=true, and that's the
>>> side of the coin we don't care about, so it doesn't matter if it's
>>> unreliable.
>>
>> BTW, if the expected start happens but fails, then Pacemaker will just
>> keep repeating until migration-threshold is hit, at which point it
>> will call the RA 'stop' action finally with start_expected=false.
>> So that's of no concern.
>
> To clarify, that's configurable, via start-failure-is-fatal and on-fail
>
>> Maybe your point was that if the expected start never happens (so
>> never even gets a chance to fail), we still want to do a nova
>> service-disable?
>
> That is a good question, which might mean it should be done on every
> stop -- or could that cause problems (besides delays)?
>
> Another aspect of this is that the proposed feature could only look at a
> single transition. What if stop is called with start_expected=false, but
> then Pacemaker is able to start the service on the same node in the next
> transition immediately afterward? Would having called service-disable
> cause problems for that start?
>
>> Yes that would be nice, but this proposal was never intended to
>> address that.  I guess we'd need an entirely different mechanism in
>> Pacemaker for that.  But let's not allow perfection to become the
>> enemy of the good ;-)
>
> The ultimate concern is that this will encourage people to write RAs
> that leave services in a dangerous state after stop is called.
>
> I think with naming and documenting it properly, I'm fine to provide the
> option, but I'm on the fence. Beekhof needs a little more convincing :-)

I think the new name is a big step in the right direction



Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-06 Thread Andrew Beekhof
On Tue, Jun 7, 2016 at 8:45 AM, Adam Spiers <aspi...@suse.com> wrote:
> Adam Spiers <aspi...@suse.com> wrote:
>> Andrew Beekhof <abeek...@redhat.com> wrote:
>> > On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers <aspi...@suse.com> wrote:
>> > > Ken Gaillot <kgail...@redhat.com> wrote:
>> > >> My main question is how useful would it actually be in the proposed use
>> > >> cases. Considering the possibility that the expected start might never
>> > >> happen (or fail), can an RA really do anything different if
>> > >> start_expected=true?
>> > >
>> > > That's the wrong question :-)
>> > >
>> > >> If the use case is there, I have no problem with
>> > >> adding it, but I want to make sure it's worthwhile.
>> > >
>> > > The use case which started this whole thread is for
>> > > start_expected=false, not start_expected=true.
>> >
>> > Isn't this just two sides of the same coin?
>> > If you're not doing the same thing for both cases, then you're just
>> > reversing the order of the clauses.
>>
>> No, because the stated concern about unreliable expectations
>> ("Considering the possibility that the expected start might never
>> happen (or fail)") was regarding start_expected=true, and that's the
>> side of the coin we don't care about, so it doesn't matter if it's
>> unreliable.
>
> BTW, if the expected start happens but fails, then Pacemaker will just
> keep repeating until migration-threshold is hit, at which point it
> will call the RA 'stop' action finally with start_expected=false.

Maybe. Maybe not. People cannot rely on this and I'd put money on them
trying :-)

> So that's of no concern.
>
> Maybe your point was that if the expected start never happens (so
> never even gets a chance to fail), we still want to do a nova
> service-disable?

Exactly :)

>
> Yes that would be nice, but this proposal was never intended to
> address that.  I guess we'd need an entirely different mechanism in
> Pacemaker for that.  But let's not allow perfection to become the
> enemy of the good ;-)


Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-06 Thread Andrew Beekhof
On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers <aspi...@suse.com> wrote:
> Ken Gaillot <kgail...@redhat.com> wrote:
>> On 06/02/2016 08:01 PM, Andrew Beekhof wrote:
>> > On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot <kgail...@redhat.com> wrote:
>> >> A recent thread discussed a proposed new feature, a new environment
>> >> variable that would be passed to resource agents, indicating whether a
>> >> stop action was part of a recovery.
>> >>
>> >> Since that thread was long and covered a lot of topics, I'm starting a
>> >> new one to focus on the core issue remaining:
>> >>
>> >> The original idea was to pass the number of restarts remaining before
>> >> the resource will no longer be started on the same node. This
>> >> involves calculating (fail-count - migration-threshold), and that
>> >> implies certain limitations: (1) it will only be set when the cluster
>> >> checks migration-threshold; (2) it will only be set for the failed
>> >> resource itself, not for other resources that may be recovered due to
>> >> dependencies on it.
>> >>
>> >> Ulrich Windl proposed an alternative: setting a boolean value instead. I
>> >> forgot to cc the list on my reply, so I'll summarize now: We would set a
>> >> new variable like OCF_RESKEY_CRM_recovery=true
>> >
>> > This concept worries me, especially when what we've implemented is
>> > called OCF_RESKEY_CRM_restarting.
>>
>> Agreed; I plan to rename it yet again, to OCF_RESKEY_CRM_start_expected.
>
> [snipped]
>
>> My main question is how useful would it actually be in the proposed use
>> cases. Considering the possibility that the expected start might never
>> happen (or fail), can an RA really do anything different if
>> start_expected=true?
>
> That's the wrong question :-)
>
>> If the use case is there, I have no problem with
>> adding it, but I want to make sure it's worthwhile.
>
> The use case which started this whole thread is for
> start_expected=false, not start_expected=true.

Isn't this just two sides of the same coin?
If you're not doing the same thing for both cases, then you're just
reversing the order of the clauses.

"A isn't different from B, B is different from A!" :-)

> When it's false for
> NovaCompute, we call nova service-disable to ensure that nova doesn't
> attempt to schedule any more VMs on that host.
>
> If start_expected=true, we don't *want* to do anything different.  So
> it doesn't matter even if the expected start never happens.
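
A sketch of how that use case was meant to look inside the agent, using
the OCF_RESKEY_CRM_start_expected name mentioned above and assuming the
value arrives as the string "false"; the nova call is an assumption as
before, and note that the service itself is still stopped unconditionally,
in line with the caution voiced elsewhere in this thread:

demo_stop() {
    # ... always fully stop nova-compute first, whatever the hint says ...
    if [ "${OCF_RESKEY_CRM_start_expected}" = "false" ]; then
        # final stop on this node: keep nova-scheduler away from it
        nova service-disable "$(hostname)" nova-compute
    fi
    return $OCF_SUCCESS
}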



Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-05 Thread Andrew Beekhof
On Sat, Jun 4, 2016 at 12:16 AM, Ken Gaillot <kgail...@redhat.com> wrote:
> On 06/02/2016 08:01 PM, Andrew Beekhof wrote:
>> On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot <kgail...@redhat.com> wrote:
>>> A recent thread discussed a proposed new feature, a new environment
>>> variable that would be passed to resource agents, indicating whether a
>>> stop action was part of a recovery.
>>>
>>> Since that thread was long and covered a lot of topics, I'm starting a
>>> new one to focus on the core issue remaining:
>>>
>>> The original idea was to pass the number of restarts remaining before
>>> the resource will no longer be started on the same node. This
>>> involves calculating (fail-count - migration-threshold), and that
>>> implies certain limitations: (1) it will only be set when the cluster
>>> checks migration-threshold; (2) it will only be set for the failed
>>> resource itself, not for other resources that may be recovered due to
>>> dependencies on it.
>>>
>>> Ulrich Windl proposed an alternative: setting a boolean value instead. I
>>> forgot to cc the list on my reply, so I'll summarize now: We would set a
>>> new variable like OCF_RESKEY_CRM_recovery=true
>>
>> This concept worries me, especially when what we've implemented is
>> called OCF_RESKEY_CRM_restarting.
>
> Agreed; I plan to rename it yet again, to OCF_RESKEY_CRM_start_expected.
>
>> The name alone encourages people to "optimise" the agent to not
>> actually stop the service "because it's just going to start again
>> shortly".  I know that's not what Adam would do, but not everyone
>> understands how clusters work.
>>
>> There are any number of reasons why a cluster that intends to restart
>> a service may not do so.  In such a scenario, a badly written agent
>> would cause the cluster to mistakenly believe that the service is
>> stopped - allowing it to start elsewhere.
>>
>> Its true there are any number of ways to write bad agents, but I would
>> argue that we shouldn't be nudging people in that direction :)
>
> I do have mixed feelings about that. I think if we name it
> start_expected, and document it carefully, we can avoid any casual mistakes.
>
> My main question is how useful would it actually be in the proposed use
> cases. Considering the possibility that the expected start might never
> happen (or fail), can an RA really do anything different if
> start_expected=true?

I would have thought not.  Correctness should trump optimal.
But I'm prepared to be mistaken.

> If the use case is there, I have no problem with
> adding it, but I want to make sure it's worthwhile.



Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-02 Thread Andrew Beekhof
On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot  wrote:
> A recent thread discussed a proposed new feature, a new environment
> variable that would be passed to resource agents, indicating whether a
> stop action was part of a recovery.
>
> Since that thread was long and covered a lot of topics, I'm starting a
> new one to focus on the core issue remaining:
>
> The original idea was to pass the number of restarts remaining before
> the resource will no longer be started on the same node. This
> involves calculating (fail-count - migration-threshold), and that
> implies certain limitations: (1) it will only be set when the cluster
> checks migration-threshold; (2) it will only be set for the failed
> resource itself, not for other resources that may be recovered due to
> dependencies on it.
>
> Ulrich Windl proposed an alternative: setting a boolean value instead. I
> forgot to cc the list on my reply, so I'll summarize now: We would set a
> new variable like OCF_RESKEY_CRM_recovery=true

This concept worries me, especially when what we've implemented is
called OCF_RESKEY_CRM_restarting.

The name alone encourages people to "optimise" the agent to not
actually stop the service "because it's just going to start again
shortly".  I know that's not what Adam would do, but not everyone
understands how clusters work.

There are any number of reasons why a cluster that intends to restart
a service may not do so.  In such a scenario, a badly written agent
would cause the cluster to mistakenly believe that the service is
stopped - allowing it to start elsewhere.

It's true there are any number of ways to write bad agents, but I would
argue that we shouldn't be nudging people in that direction :)

> whenever a start is
> scheduled after a stop on the same node in the same transition. This
> would avoid the corner cases of the previous approach; instead of being
> tied to migration-threshold, it would be set whenever a recovery was
> being attempted, for any reason. And with this approach, it should be
> easier to set the variable for all actions on the resource
> (demote/stop/start/promote), rather than just the stop.
>
> I think the boolean approach fits all the envisioned use cases that have
> been discussed. Any objections to going that route instead of the count?
> --
> Ken Gaillot 
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Regular pengine warnings after a transient failure

2016-03-07 Thread Andrew Beekhof
On Tue, Mar 8, 2016 at 7:03 AM, Ferenc Wágner  wrote:

> Ken Gaillot  writes:
>
> > On 03/07/2016 07:31 AM, Ferenc Wágner wrote:
> >
> >> 12:55:13 vhbl07 crmd[8484]: notice: Transition aborted by
> vm-eiffel_monitor_6 'create' on vhbl05: Foreign event
> (magic=0:0;521:0:0:634eef05-39c1-4093-94d4-8d624b423bb7, cib=0.613.98,
> source=process_graph_event:600, 0)
> >
> > That means the action was initiated by a different node (the previous DC
> > presumably),


I suspect s/previous/other/

With a stuck machine it's entirely possible that the other nodes elected a
new leader.
Would I be right in guessing that fencing is disabled?


> so the new DC wants to recalculate everything.
>
> Time travel was sort of possible in that situation, and recurring
> monitor operations are not logged, so this is indeed possible.  The main
> thing is that it wasn't mishandled.
>
> >> recovery actions turned into start actions for the resources stopped
> >> during the previous transition.  However, almost all other recovery
> >> actions just disappeared without any comment.  This was actually
> >> correct, but I really wonder why the cluster decided to paper over
> >> the previous monitor operation timeouts.  Maybe the operations
> >> finished meanwhile and got accounted somehow, just not logged?
> >
> > I'm not sure why the PE decided recovery was not necessary. Operation
> > results wouldn't be accepted without being logged.
>
> At which logging level?  I can't see recurring monitor operation logs in
> syslog (at default logging level: notice) nor in /var/log/pacemaker.log
> (which contains info level messages as well).
>

The DC will log that the recurring monitor was successfully started, but
due to noise it doesn't log subsequent successes.


>
> However, the info level logs contain more "Transition aborted" lines, as
> if only the first of them got logged with notice level.  This would make
> sense, since the later ones don't make any difference on an already
> aborted transition, so they aren't that important.  And in fact such
> lines were suppressed from the syslog I checked first, for example:
>
> 12:55:39 [8479] vhbl07cib: info: cib_perform_op: Diff: ---
> 0.613.120 2
> 12:55:39 [8479] vhbl07cib: info: cib_perform_op: Diff: +++
> 0.613.121 (null)
> 12:55:39 [8479] vhbl07cib: info: cib_perform_op: +  /cib:
> @num_updates=121
> 12:55:39 [8479] vhbl07cib: info: cib_perform_op: ++
> /cib/status/node_state[@id='167773707']/lrm[@id='167773707']/lrm_resources/lrm_resource[@id='vm-elm']:
>  operation="monitor" crm-debug-origin="do_update_resource"
> crm_feature_set="3.0.10"
> transition-key="473:0:0:634eef05-39c1-4093-94d4-8d624b423bb7"
> transition-magic="0:0;473:0:0:634eef05-39c1-4093-94d4-8d624b423bb7"
> on_node="vhbl05" call-id="645" rc-code="0" op-st
> 12:55:39 [8479] vhbl07cib: info: cib_process_request:
> Completed cib_modify operation for section status: OK (rc=0,
> origin=vhbl05/crmd/362, version=0.613.121)
> 12:55:39 [8484] vhbl07   crmd: info: abort_transition_graph:
>  Transition aborted by vm-elm_monitor_6 'create' on vhbl05: Foreign
> event (magic=0:0;473:0:0:634eef05-39c1-4093-94d4-8d624b423bb7,
> cib=0.613.121, source=process_graph_event:600, 0)
> 12:55:39 [8484] vhbl07   crmd: info: process_graph_event:
> Detected action (0.473) vm-elm_monitor_6.645=ok: initiated by a
> different node
>
> I can very much imagine this cancelling the FAILED state induced by a
> monitor timeout like:
>
> 12:54:52 [8479] vhbl07cib: info: cib_perform_op: ++
> type="TransientDomain" class="ocf" provider="niif">
> 12:54:52 [8479] vhbl07cib: info: cib_perform_op: ++
>   id="vm-elm_last_failure_0" operation_key="vm-elm_monitor_6"
> operation="monitor" crm-debug-origin="build_active_RAs"
> crm_feature_set="3.0.10"
> transition-key="473:0:0:634eef05-39c1-4093-94d4-8d624b423bb7"
> transition-magic="2:1;473:0:0:634eef05-39c1-4093-94d4-8d624b423bb7"
> on_node="vhbl05" call-id="645" rc-code="1" op-status="2" interval="6"
> last-rc-change="1456833279" exe
> 12:54:52 [8479] vhbl07cib: info: cib_perform_op: ++
>   operation_key="vm-elm_start_0" operation="start"
> crm-debug-origin="build_active_RAs" crm_feature_set="3.0.10"
> transition-key="472:0:0:634eef05-39c1-4093-94d4-8d624b423bb7"
> transition-magic="0:0;472:0:0:634eef05-39c1-4093-94d4-8d624b423bb7"
> on_node="vhbl05" call-id="602" rc-code="0" op-status="0" interval="0"
> last-run="1456091121" last-rc-change="1456091121" e
> 12:54:52 [8479] vhbl07cib: info: cib_perform_op: ++
>
>
> The transition-keys match, does this mean that the above is a late
> result from the monitor operation which was considered timed-out
> 

Re: [ClusterLabs] Status of constraint on per instance of cloned resource?

2016-01-27 Thread Andrew Beekhof

> On 27 Jan 2016, at 8:51 PM, Ming-Xun Zhong  wrote:
> 
> Hello,
> 
> What is the roadmap for constraints on instances of cloned resources? Aka 
> rsc-instance/with-rsc-instance/first-instance/then-instance
> 
> I found these features in old mail [1] and are using them in production.
> The newer version of crmsh remove support for these constraints.
> 
> Yes, I read that Andrew recommends not doing this. The story is that each of my
> resources needs to bind two IP addresses on different networks, derived from an
> external ID not under my control.
> 
> Will this feature be promoted in a future version of Pacemaker? 

Not until if/when we can come up with a compelling use-case for it.
Maybe you’ve hit one, could you expand on what you’re doing and why?

> I could add support to crmsh if there is no objection.
> 
> [1] https://www.redhat.com/archives/linux-cluster/2014-February/msg00049.html
> 
> -- 
> Bill Ming-Xun Zhong 鍾明勳
> http://about.me/zhongmx
> http://twitter.com/zmx
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker 1.1.14 released

2016-01-24 Thread Andrew Beekhof
Awesome job taking the lead on this Ken.
Some very important updates in this release, thanks for making sure they got 
out there :)

> On 15 Jan 2016, at 8:49 AM, Ken Gaillot  wrote:
> 
> ClusterLabs is proud to announce the latest release of the Pacemaker
> cluster resource manager, version 1.1.14. The source code is available at:
> 
> https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.14
> 
> This version introduces some valuable new features:
> 
> * Resources will now start as soon as their state has been confirmed on
> all nodes and all dependencies have been satisfied, rather than waiting
> for the state of all resources to be confirmed. This allows for faster
> startup of some services, and more even startup load.
> 
> * Fencing topology levels can now be applied to all nodes whose name
> matches a configurable pattern, or that have a configurable node attribute.
> 
> * When a fencing topology level has multiple devices, reboots are now
> automatically mapped to all-off-then-all-on, allowing much simplified
> configuration of redundant power supplies.
> 
> * Guest nodes can now be included in groups, which simplifies the common
> Pacemaker Remote use case of grouping a storage device, filesystem and VM.
> 
> * Clone resources have a new clone-min metadata option, specifying that
> a certain number of instances must be running before any dependent
> resources can run. This is particularly useful for services behind a
> virtual IP and haproxy, as is often done with OpenStack.
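
As a rough crmsh sketch of what the clone-min option just described enables (the
resource names and address are made up; clone-min itself is the only new piece):

    primitive backend ocf:heartbeat:Dummy
    primitive service-vip ocf:heartbeat:IPaddr2 params ip=192.0.2.10
    clone backend-clone backend meta clone-min=2 interleave=true
    # service-vip may only start once at least two backend instances are active
    order vip-after-backend inf: backend-clone service-vip
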
> 
> As usual, the release includes many bugfixes and minor enhancements. For
> a more detailed list of changes, see the change log:
> 
> https://github.com/ClusterLabs/pacemaker/blob/1.1/ChangeLog
> 
> Feedback is invited and welcome.
> -- 
> Ken Gaillot 
> 
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Cluster resources migration from CMAN to Pacemaker

2016-01-24 Thread Andrew Beekhof

> On 23 Jan 2016, at 7:52 AM, Jan Pokorný  wrote:
> 
> If they are one-off launchers of some long-running process, you may
> want to use the ocf:anything resource (contained in the resource-agents package):
> 

ocf:anything is never a good idea
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker license

2016-01-11 Thread Andrew Beekhof

> On 6 Oct 2015, at 9:39 AM, santosh_bidara...@dell.com wrote:
> 
> Dell - Internal Use - Confidential 
> 
> Hello Pacemaker Admins,
>  
> We have a query regarding licensing for pacemaker header files
>  
> As per the given link http://clusterlabs.org/wiki/License, it is mentioned 
> that “Pacemaker programs are licensed under the GPLv2+ (version 2 or later of 
> the GPL) and its headers and libraries are under the less restrictive LGPLv2+ 
> (version 2 or later of the LGPL) .”
>  
> However, the page at 
> http://clusterlabs.org/doxygen/pacemaker/2927a0f9f25610c331b6a137c846fec27032c9ea/cib_8h.html
> states otherwise. 
> The cib.h header file needs to be included in order to configure Pacemaker using 
> the C API, but that header states that the file is under the GPL 
> license.
> This seems to conflict with the statement regarding the header file license.
>  
> In addition, a similar issue was discussed in the past at 
> http://www.gossamer-threads.com/lists/linuxha/pacemaker/75967, with no additional 
> details on the resolution.

I thought that was a pretty clear statement, but you’re correct that the 
licences were not changed.

Does this satisfy?

   https://github.com/beekhof/pacemaker/commit/6de9fde

>  
> Need your inputs on licensing to proceed further.
>  
> Thanks & Regards
> Santosh Bidaralli
>  
>  
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Parser error with fence_ipmilan

2015-09-14 Thread Andrew Beekhof

> On 14 Sep 2015, at 7:48 pm, dan  wrote:
> 
> Mon 2015-09-14 at 10:02 +0200, dan wrote:
>> Hi
>> 
>> To see if my cluster problems go away with a newer version of pacemaker I
>> have now installed pacemaker 1.1.12+git+a9c8177-3ubuntu1 and I had to get
>> 4.0.19-1 (ubuntu) of fence-agents to get a working fence-ipmilan.
>> 
>> But now when the cluster wants to stonith a node I get:
>> 
>> fence_ipmilan: Parser error: option -n/--plug is not recognize
>> fence_ipmilan: Please use '-h' for usage
>> 
>> Is the problem in fence-agents or in pacemaker?
> 
> Looking at the code producing this, I got it working by adding to my
> cluster config for my stonith devices
> port_as_ip=1 port=192.168.xx.xx
> 
> before I had:
> lanplus=1 ipaddr=192.168.xx.xx
> which worked fine before the new version of pacemaker.
> Now I have:
> lanplus=1 ipaddr=192.168.xx.xx port_as_ip=1 port=192.168.xx.xx
> which works.

I’m glad it works, looks like a regression to me though.
You shouldn’t need to override the value pacemaker supplies for port if ipaddr 
is being set.

Can you comment on this Marek?
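
For anyone else hitting the same parser error, a minimal crmsh sketch of the
workaround dan describes; the device name, credentials and monitor interval are
placeholders, and the addresses keep the masking used above:

    primitive fence-node1 stonith:fence_ipmilan \
        params lanplus=1 ipaddr=192.168.xx.xx login=admin passwd=secret \
               port_as_ip=1 port=192.168.xx.xx \
        op monitor interval=60s
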
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Coming in 1.1.14: Fencing topology based on node attribute

2015-09-09 Thread Andrew Beekhof

> On 9 Sep 2015, at 7:45 pm, Kristoffer Grönlund  wrote:
> 
> Hi,
> 
> Ken Gaillot  writes:
> 
>> Pacemaker's upstream master branch has a new feature that will be part
>> of the eventual 1.1.14 release.
>> 
>> Fencing topology is used when a node requires multiple fencing devices
>> (in combination or as fallbacks). Currently, topologies must be
>> specified by node name (or a regular expression matching node names).
>> 
>> The new feature allows topologies to be specified by node attribute.
> 
> Sounds like a really useful feature. :) I have implemented initial
> support for this syntax in crmsh,

A word of warning: I'm in the process of changing it to avoid overloading the 
‘target’ attribute and exposing quoting issues stemming from people’s use of ‘='

   https://github.com/beekhof/pacemaker/commit/ea4fc1c



> so this will work fine in the next
> version of crmsh.
> 
> Examples of crmsh syntax below:
> 
>> Previously, if node1 was in rack #1, you'd have to register a fencing
>> topology by its name, which at the XML level would look like:
>> 
>>   <fencing-topology>
>>     <fencing-level index="1" target="node1" devices="apc01,apc02"/>
>>   </fencing-topology>
>> 
> 
> crm cfg fencing-topology node1: apc01,apc02
> 
>> 
>> With the new feature, you could instead register a topology for all
>> hosts that have a node attribute "rack" whose value is "1":
>> 
>>   <fencing-topology>
>>     <fencing-level index="1" target="rack=1" devices="apc01,apc02"/>
>>   </fencing-topology>
>> 
> 
> crm cfg fencing-topology rack=1: apc01,apc02
> 
>> 
>> You would assign that attribute to all nodes in that rack, e.g.:
>> 
>>   crm_attribute --type nodes --node node1 --name rack --update 1
>> 
> 
> crm node attr node1 set rack 1
> 
>> 
>> The syntax accepts either '=' or ':' as the separator for the name/value
>> pair, so target="rack:1" would work in the XML as well.
> 
> crm cfg fencing-topology rack:1: apc01,apc02
> 
> (admittedly perhaps not as clean as using '=', but it works)
> 
> Cheers,
> Kristoffer
> 
>> -- 
>> Ken Gaillot 
>> 
>> ___
>> Users mailing list: Users@clusterlabs.org
>> http://clusterlabs.org/mailman/listinfo/users
>> 
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>> 
> 
> -- 
> // Kristoffer Grönlund
> // kgronl...@suse.com
> 
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] SBD & Failed Peer

2015-09-08 Thread Andrew Beekhof

> On 9 Sep 2015, at 12:13 am, Ken Gaillot  wrote:
> 
> On 09/07/2015 07:48 AM, Jorge Fábregas wrote:
>> On 09/07/2015 03:27 AM, Digimer wrote:
>>> And this is why I am nervous; It is always ideal to have a primary fence
>>> method that has a method of confirming the 'off' state. IPMI fencing can
>>> do this, as can hypervisor-based fence methods like fence_virsh and
>>> fence_xvm.
>> 
>> Hi Digimer,
>> 
>> Yes, I thought that confirmation was kind of sacred but now I know it's
>> not always possible.
>> 
>>> I would use IPMI (iLO, DRAC, etc) as the primary fence method and
>>> something else as a secondary, backup method. You can use SBD + watchdog
>>> as the backup method, or as I do, a pair of switched PDUs (I find APC
>>> brand to be very fast in fencing).
>> 
>> This sounds great.  Is there a way to specify a primary & secondary
>> fencing device?  I haven't seen a way to specify such hierarchy in
>> pacemaker.
> 
> Good news/bad news:
> 
> Yes, pacemaker supports complex hierarchies of multiple fencing devices,
> which it calls "fencing topology". There is a small example at
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_advanced_stonith_configurations
> 
> Unfortunately, sbd is not supported in fencing topologies.

Another way to look at it is that sbd is only supported in fencing topologies 
- just not explicit ones.
Self-termination is always the least best option, so we’ll only use it if all 
other options (including topologies) are exhausted.
But we’ll do so automatically.

> Pacemaker
> hooks into sbd via dedicated internal logic, not a conventional fence
> agent, so it's treated differently. You might want to open an RFE bug
> either upstream or with your OS vendor if you want to put it on the
> radar, but sbd isn't entirely under Pacemaker's control, so I'm not sure
> how feasible it would be.
> 
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] fence_sanlock and pacemaker

2015-08-26 Thread Andrew Beekhof

 On 27 Aug 2015, at 4:11 am, Laurent B. laure...@qmail.re wrote:
 
 Gents,
 
 I'm trying to configure a HA cluster with RHEL 6.5. Everything goes well
 except the fencing. The cluster's nodes are not connected to the
 management lan (where all the iLO/UPS/APC devices sit) and it's not
 planned to connecting them to this lan.
 
 With these constraints, I figured out that a way to get fencing working
 is to use *fence_sanlock*. I followed this tutorial:
 https://alteeve.ca/w/Watchdog_Recovery and I it worked (I got some
 problem with SELinux that I finally disabled like specified in the
 following RHEL 6.5 release note:
 https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html-single/6.5_Technical_Notes/
 )
 
 So perfect. The problem is that fence_sanlock relies on cman and not
 pacemaker. So with stonith disabled, pacemaker restarts the resources
 without waiting for the victim to be fenced and with stonith enabled,
 pacemaker complains about the lack of stonith resources and blocks the
 whole cluster.
 I tried to put fence_sanlock as a stonith resource at the pacemaker
 level but as explained there
 http://oss.clusterlabs.org/pipermail/pacemaker/2013-May/017980.html it
 does not work and as explained there
 https://bugzilla.redhat.com/show_bug.cgi?id=962088 it's not planned to
 make it work.
 
 My question: is there a workaround ?

You’d have to build it yourself, but sbd could be an option

 
 Thank you,
 
 Laurent
 
 ___
 Users mailing list: Users@clusterlabs.org
 http://clusterlabs.org/mailman/listinfo/users
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] [ClusterLabs Developers] Resource Agent language discussion

2015-08-19 Thread Andrew Beekhof

 On 19 Aug 2015, at 6:59 pm, Jehan-Guillaume de Rorthais j...@dalibo.com 
 wrote:
 
 On Mon, 17 Aug 2015 09:42:35 +1000
 Andrew Beekhof and...@beekhof.net wrote:
 
 On 11 Aug 2015, at 5:34 pm, Jehan-Guillaume de Rorthais j...@dalibo.com
 wrote:
 
 On Tue, 11 Aug 2015 11:30:03 +1000
 Andrew Beekhof and...@beekhof.net wrote:
 [...]
 You can and should use whatever language you like for your own private RAs.
 But if you want it accepted and maintained by the resource-agents project,
 you would be advised to use the language they have standardised on.
 
 Well, let's imagine our RA was written in bash (in fact, we have a bash
 version pretty close to the current perl version we abandoned). I wonder if
 it would be accepted in the resource-agents project anyway as another one
 already exists there. I can easily list the reasons we rewrote a new one,
 but this is not the subject here.
 
 The discussion here is more about the language, if I should extract a
 ocf-perl-module from my RA and if there is any chance the resource-agents
 project would accept it.
 
 Well, it depends on the reasons you didn’t list :-)
 
 Ok, let's answer the questions then :)
 
 The first questions any maintainer is going to ask are:
 - why did you write a new one?
 
 About the existing pgsql RA:
  * it supports stateless AND multistate pgsql resources. This makes the code
bigger, more complex, and harder to follow and understand
  * some params are for multistate usage only, some others for stateless only,
some for both, making the configuration harder to understand
  * some params are required for multistate where a recent PostgreSQL can live
without them (restore_command)
  * it performs operations an RA should not take care of (switching from
synchronous to asynchronous replication on the master if slaves are gone,
killing all existing xact)
  * ...and this makes the code even bigger and more complex again
  * it supports too many options and has some conventions the DBA should take
care of themselves. This makes it way too complex and touchy to set up and maintain
  * it does not support demote, making the code lie about the real
state of the resource to Pacemaker. This was because demote/switchover was
unsafe for PostgreSQL < 9.3.
 
 What we tried to achieve with a new pgsql RA:
  * multistate only (we already have a stateless RA, in bash)
  * should have simple code: easier to understand and maintain, achieving one
goal at a time
  * be simple to set up
  * should not substitute itself for the DBA
  * support safe (cold) demote/switchover
 
 - can we merge this with the old one?
 
 Well, it would make the code even bigger, maybe conflicting and harder to
 understand. I can already hear questions about such a frankenstein RA (why am
 I able to set up two different multistate architectures? why does this one not
 support this parameter? should I create my recovery.conf or not?)
 
 Some of our ideas could be merged into the old one, though; we could discuss and
 help the maintainers if they are interested and have time. But we only have
 limited R&D time and no time to lead such a development.
 
 - can the new one replace the old one? (ie. full superset)
 
 No. It does not support stateless resources, does not mess with replication
 synchronism, does not kill queries if all the slaves are gone, does not lock
 an instance when it failed, only promotes the resource using pg_ctl
 promote (with no restart), ...
 
 Because if both are included, then they will forevermore be answering the
 question “which one should I use?”.
 
 True.
 
 Basically, if you want it accepted upstream, then yes, you probably want to
 ditch the perl bit. But not having seen the agent or knowing why it exists,
 it's hard to say.
 
 Well, it seems our RA will not make it to the upstream repository,

You made a fairly reasonable argument for separate stateless and stateful 
variants.

 but this is
 not a drama from my PoV, the discussion is not about that. As I wrote earlier:
 «
 The discussion here is more about the language, if I should extract a
 ocf-perl-module from my RA and if there is any chance the resource-agents
 project would accept it
 »
 
 What I was discussing here was:
 
  * if not using bash, are there any traps we should avoid that are already
addressed in the ocf-shellfuncs library?

No, you just might have to re-implement some things.
Particularly logging.

  * is there a chance a perl version of such a library would be accepted 
 upstream?

Depends if you’re volunteering to maintain it too :)

 
 Note that in the Pacemaker galaxy, fencing agents are written in python, perl,
 C, ...
 
 Regards,


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] upgrade from 1.1.9 to 1.1.12 fails to start

2015-08-19 Thread Andrew Beekhof

 On 19 Aug 2015, at 12:15 am, Streeter, Michelle N 
 michelle.n.stree...@boeing.com wrote:
 
 I created a whole new virtual and installed everything with the new version 
 and pacemaker wouldn’t start.
 I have not yet learned how to use the logs to see what they have to say.
 No, I did not upgrade corosync.  I am running the latest which will work with 
 rhel6. 
 When I tried later versions, they failed and I was told it was because we are 
 not running rhel7.
  
 I am getting the feeling this version of Pacemaker does not work on rhel6 
 either.  Do you believe this is the case?

Absolutely not. Red Hat fully supports Pacemaker 1.1.12 on RHEL6. 

 Or is there some configuration that needs to be done between 1.1.9 and 1.1.12?
  
 Michelle Streeter
 ASC2 MCS – SDE/ACL/SDL/EDL OKC Software Engineer
 The Boeing Company 
  
 ___
 Users mailing list: Users@clusterlabs.org
 http://clusterlabs.org/mailman/listinfo/users
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] upgrade from 1.1.9 to 1.1.12 fails to start

2015-08-17 Thread Andrew Beekhof

 On 18 Aug 2015, at 7:13 am, Streeter, Michelle N 
 michelle.n.stree...@boeing.com wrote:
 
 I was recommended to upgrade from 1.1.9 to 1.1.12.  
 I had to uninstall the 1.1.9 version to install the 1.1.12 version

Did you upgrade anything else? cman? corosync? heartbeat? What distro? Logs? 
Stack trace? Where did the packages come from?

 I am not allowed to connect to a repo and so I have to download the rpms and 
 install them individually.
 After I installed pacemaker-lib, cli, cluster-lib, and pacemaker itself, when 
 I rebooted, the cluster failed to start
 When I tried to manually start it, I got
 Starting Pacemaker Cluster Manager/etc/init.d/pacemaker: line 94:  8219 
 Segmentation fault  (core dumped) $prog > /dev/null 2>&1
 I deleted the Cluster.conf file and the cib.xml and all the back up versions 
 and tried again and got the same error.
 I googled this error and really got nothing.   Any ideas?

Not based on what you’ve told us.

  
 Michelle Streeter
 ASC2 MCS – SDE/ACL/SDL/EDL OKC Software Engineer
 The Boeing Company 
  
 ___
 Users mailing list: Users@clusterlabs.org
 http://clusterlabs.org/mailman/listinfo/users
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Ordering constraint restart second resource group

2015-08-17 Thread Andrew Beekhof

 On 17 Aug 2015, at 1:30 pm, Andrei Borzenkov arvidj...@gmail.com wrote:
 
 17.08.2015 02:26, Andrew Beekhof wrote:
 
 On 13 Aug 2015, at 7:33 pm, Andrei Borzenkov arvidj...@gmail.com wrote:
 
 On Thu, Aug 13, 2015 at 11:25 AM, Ulrich Windl
 ulrich.wi...@rz.uni-regensburg.de wrote:
 And what exactly is your problem?
 
 Real life example. Database resource depends on storage resource(s).
 There are multiple filesystems/volumes with database files. Database
 admin needs to increase available space. You add new storage,
 configure it in cluster ... pooh, your database is restarted.
 
 “configure it in cluster” hmmm
 
 if you’re expanding an existing mount point, then I’d expect you don’t need 
 to update the cluster.
 if you’re creating a new mount point, wouldn’t you need to take the db down 
 in order to point to the new location?
 
 
 No. The databases I worked with can use multiple storage locations at the 
 same time, and those storage locations can be added (and removed) online.

Nice.  In that case, you could try adding it as a resource but waiting until it 
is active before creating the ordering constraint.
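
A minimal crmsh sketch of that sequence (the resource and device names are made
up, and it assumes the new filesystem ends up started on the node where the
database already runs):

    crm configure primitive fs_extra ocf:heartbeat:Filesystem \
        params device=/dev/vg1/lv_extra directory=/srv/db/extra fstype=xfs
    # wait until crm_mon reports fs_extra as Started on the database node ...
    crm_mon -1 | grep fs_extra
    # ... and only then add the constraints; nothing has to stop, so the
    # database is not restarted
    crm configure colocation fs_extra-with-db inf: fs_extra db
    crm configure order db-after-fs_extra inf: fs_extra db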

 
 
 
 ___
 Users mailing list: Users@clusterlabs.org
 http://clusterlabs.org/mailman/listinfo/users
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] [Question:pacemaker_remote] By the operation that remote node cannot carry out a cluster, the resource does not move. (STONITH is not carried out, too)

2015-08-17 Thread Andrew Beekhof
Should be fixed now. Thanks for the report!

 On 12 Aug 2015, at 1:20 pm, renayama19661...@ybb.ne.jp wrote:
 
 Hi All,
 
 We confirmed movement of 
 pacemaker_remote.(version:pacemaker-ad1f397a8228a63949f86c96597da5cecc3ed977)
 
 It is the following cluster constitution.
  * bl460g8n3(KVM host)
  * bl460g8n4(KVM host)
  * pgsr01(Guest on the bl460g8n3 host)
  * pgsr02(Guest on the bl460g8n4 host)
 
 
 Step 1) I compose a cluster of a simple resource.
 
 [root@bl460g8n3 ~]# crm_mon -1 -Af
 Last updated: Wed Aug 12 11:52:27 2015  Last change: Wed Aug 12 
 11:51:47 2015 by root via crm_resource on bl460g8n4
 Stack: corosync
 Current DC: bl460g8n3 (version 1.1.13-ad1f397) - partition with quorum
 4 nodes and 10 resources configured
 
 Online: [ bl460g8n3 bl460g8n4 ]
 GuestOnline: [ pgsr01@bl460g8n3 pgsr02@bl460g8n4 ]
 
  prmDB1 (ocf::heartbeat:VirtualDomain): Started bl460g8n3
  prmDB2 (ocf::heartbeat:VirtualDomain): Started bl460g8n4
  Resource Group: grpStonith1
  prmStonith1-2  (stonith:external/ipmi):Started bl460g8n4
  Resource Group: grpStonith2
  prmStonith2-2  (stonith:external/ipmi):Started bl460g8n3
  Resource Group: master-group
  vip-master (ocf::heartbeat:Dummy): Started pgsr02
  vip-rep(ocf::heartbeat:Dummy): Started pgsr02
  Master/Slave Set: msPostgresql [pgsql]
  Masters: [ pgsr02 ]
  Slaves: [ pgsr01 ]
 
 Node Attributes:
 * Node bl460g8n3:
 * Node bl460g8n4:
 * Node pgsr01@bl460g8n3:
 + master-pgsql  : 5 
 * Node pgsr02@bl460g8n4:
 + master-pgsql  : 10
 
 Migration Summary:
 * Node bl460g8n4:
 * Node bl460g8n3:
 * Node pgsr02@bl460g8n4:
 * Node pgsr01@bl460g8n3:
 
 
 Step 2) I cause trouble of pacemaker_remote in pgsr02.
 
 [root@pgsr02 ~]# ps -ef |grep remote
 root  1171 1  0 11:52 ?00:00:00 /usr/sbin/pacemaker_remoted
 root  1428  1377  0 11:53 pts/000:00:00 grep --color=auto remote
 [root@pgsr02 ~]# kill -9 1171
 
 
 Step 3) After trouble, the master-group resource does not start in pgsr01.
 
 [root@bl460g8n3 ~]# crm_mon -1 -Af
 Last updated: Wed Aug 12 11:54:04 2015  Last change: Wed Aug 12 
 11:51:47 2015 by root via crm_resource on bl460g8n4
 Stack: corosync
 Current DC: bl460g8n3 (version 1.1.13-ad1f397) - partition with quorum
 4 nodes and 10 resources configured
 
 Online: [ bl460g8n3 bl460g8n4 ]
 GuestOnline: [ pgsr01@bl460g8n3 ]
 
  prmDB1 (ocf::heartbeat:VirtualDomain): Started bl460g8n3
  prmDB2 (ocf::heartbeat:VirtualDomain): FAILED bl460g8n4
  Resource Group: grpStonith1
  prmStonith1-2  (stonith:external/ipmi):Started bl460g8n4
  Resource Group: grpStonith2
  prmStonith2-2  (stonith:external/ipmi):Started bl460g8n3
  Master/Slave Set: msPostgresql [pgsql]
  Masters: [ pgsr01 ]
 
 Node Attributes:
 * Node bl460g8n3:
 * Node bl460g8n4:
 * Node pgsr01@bl460g8n3:
 + master-pgsql  : 10
 
 Migration Summary:
 * Node bl460g8n4:
pgsr02: migration-threshold=1 fail-count=1 last-failure='Wed Aug 12 
 11:53:39 2015'
 * Node bl460g8n3:
 * Node pgsr01@bl460g8n3:
 
 Failed Actions:
 * pgsr02_monitor_3 on bl460g8n4 'unknown error' (1): call=2, 
 status=Error, exitreason='none',
 last-rc-change='Wed Aug 12 11:53:39 2015', queued=0ms, exec=0ms
 
 
 It seems to be caused by the fact that STONITH is not carried out somehow or 
 other.
 The demote operation that a cluster cannot handle seems to obstruct start in 
 pgsr01.
 --
 Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: Graph 10 with 20 actions: 
 batch-limit=20 jobs, network-delay=0ms
 Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action4]: Pending rsc op 
 prmDB2_stop_0   on bl460g8n4 (priority: 0, waiting:  70)
 Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action   36]: Completed pseudo 
 op master-group_stop_0on N/A (priority: 0, waiting: none)
 Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action   34]: Completed pseudo 
 op master-group_start_0   on N/A (priority: 0, waiting: none)
 Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action   82]: Completed rsc op 
 pgsql_post_notify_demote_0on pgsr01 (priority: 100, waiting: none)
 Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action   81]: Completed rsc op 
 pgsql_pre_notify_demote_0 on pgsr01 (priority: 0, waiting: none)
 Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action   78]: Completed rsc op 
 pgsql_post_notify_stop_0  on pgsr01 (priority: 100, waiting: none)
 Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action   77]: Completed rsc op 
 pgsql_pre_notify_stop_0   on pgsr01 (priority: 0, waiting: none)
 Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action   67]: Completed pseudo 
 op msPostgresql_confirmed-post_notify_demoted_0 on N/A (priority: 100, 
 waiting: none)
 Aug 12 

Re: [ClusterLabs] [Question:pacemaker_remote] About limitation of the placement of the resource to remote node.

2015-08-17 Thread Andrew Beekhof

 On 13 Aug 2015, at 10:23 am, renayama19661...@ybb.ne.jp wrote:
 
 Hi All,
 
 We confirmed movement of 
 pacemaker_remote.(version:pacemaker-ad1f397a8228a63949f86c96597da5cecc3ed977)
 
 It is the following cluster constitution.
  * sl7-01(KVM host)
  * snmp1(Guest on the sl7-01 host)
  * snmp2(Guest on the sl7-01 host)
 
 We prepared for the next CLI file to confirm the resource placement to remote 
 node.
 
 --
 property no-quorum-policy=ignore \
   stonith-enabled=false \
   startup-fencing=false
 
 rsc_defaults resource-stickiness=INFINITY \
   migration-threshold=1
 
 primitive remote-vm2 ocf:pacemaker:remote \
   params server=snmp1 \
   op monitor interval=3 timeout=15
 
 primitive remote-vm3 ocf:pacemaker:remote \
   params server=snmp2 \
   op monitor interval=3 timeout=15
 
 primitive dummy-remote-A Dummy \
   op start interval=0s timeout=60s \
   op monitor interval=30s timeout=60s \
   op stop interval=0s timeout=60s
 
 primitive dummy-remote-B Dummy \
   op start interval=0s timeout=60s \
   op monitor interval=30s timeout=60s \
   op stop interval=0s timeout=60s
 
 location loc1 dummy-remote-A \
   rule 200: #uname eq remote-vm3 \
   rule 100: #uname eq remote-vm2 \
   rule -inf: #uname eq sl7-01
 location loc2 dummy-remote-B \
   rule 200: #uname eq remote-vm3 \
   rule 100: #uname eq remote-vm2 \
   rule -inf: #uname eq sl7-01
 --
 
 Case 1) The resource is placed as follows when I spend the CLI file which we 
 prepared for.
  However, the placement of the dummy-remote resource does not meet a 
 condition.
  dummy-remote-A starts in remote-vm2.
 
 [root@sl7-01 ~]# crm_mon -1 -Af
 Last updated: Thu Aug 13 08:49:09 2015  Last change: Thu Aug 13 
 08:41:14 2015 by root via cibadmin on sl7-01
 Stack: corosync
 Current DC: sl7-01 (version 1.1.13-ad1f397) - partition WITHOUT quorum
 3 nodes and 4 resources configured
 
 Online: [ sl7-01 ]
 RemoteOnline: [ remote-vm2 remote-vm3 ]
 
  dummy-remote-A (ocf::heartbeat:Dummy): Started remote-vm2
  dummy-remote-B (ocf::heartbeat:Dummy): Started remote-vm3
  remote-vm2 (ocf::pacemaker:remote):Started sl7-01
  remote-vm3 (ocf::pacemaker:remote):Started sl7-01

It is possible that there was a time when only remote-vm2 was available (so we 
put dummy-remote-A there) and then before we could start dummy-remote-B there 
too, remote-vm3 showed up but due to resource-stickiness=“INFINITY”, we didn’t 
move dummy-remote-A.

 
 (snip)
 
 Case 2) When we change CLI file of it and spend it,

You lost me here :-)
Can you rephrase please?

 the resource is placed as follows.
  The resource is placed definitely.
  dummy-remote-A starts in remote-vm3.
  dummy-remote-B starts in remote-vm3.
 
 
 (snip)
 location loc1 dummy-remote-A \
   rule 200: #uname eq remote-vm3 \
   rule 100: #uname eq remote-vm2 \
   rule -inf: #uname ne remote-vm2 and #uname ne remote-vm3 \
   rule -inf: #uname eq sl7-01
 location loc2 dummy-remote-B \
   rule 200: #uname eq remote-vm3 \
   rule 100: #uname eq remote-vm2 \
   rule -inf: #uname ne remote-vm2 and #uname ne remote-vm3 \
   rule -inf: #uname eq sl7-01
 (snip)
 
 
 [root@sl7-01 ~]# crm_mon -1 -Af
 Last updated: Thu Aug 13 08:55:28 2015  Last change: Thu Aug 13 
 08:55:22 2015 by root via cibadmin on sl7-01
 Stack: corosync
 Current DC: sl7-01 (version 1.1.13-ad1f397) - partition WITHOUT quorum
 3 nodes and 4 resources configured
 
 Online: [ sl7-01 ]
 RemoteOnline: [ remote-vm2 remote-vm3 ]
 
  dummy-remote-A (ocf::heartbeat:Dummy): Started remote-vm3
  dummy-remote-B (ocf::heartbeat:Dummy): Started remote-vm3
  remote-vm2 (ocf::pacemaker:remote):Started sl7-01
  remote-vm3 (ocf::pacemaker:remote):Started sl7-01
 
 (snip)
 
 As for the placement of the resource being wrong with the first CLI file, the 
 placement limitation of the remote node is like remote resource not being 
 evaluated until it is done start.
 
 The placement becomes right with the CLI file which I revised, but the 
 description of this limitation is very troublesome when I compose a cluster 
 of more nodes.
 
 Does remote node not need processing delaying placement limitation until it 
 is done start?

Potentially.  I’d need a crm_report to confirm though.

 
 Is there a method to easily describe the limitation of the resource to remote 
 node?
 
  * As one means, we know that the placement of the resource goes well by 
 dividing the first CLI file into two.
* After a cluster sent CLI which remote node starts, I send CLI where a 
 cluster starts a resource.
  * However, we do not want to divide CLI file into two if possible.
 
 Best Regards,
 Hideo Yamauchi.
 
 
 ___
 Users mailing list: Users@clusterlabs.org
 http://clusterlabs.org/mailman/listinfo/users
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: 

Re: [ClusterLabs] Antw: Re: Memory leak in crm_mon ?

2015-08-17 Thread Andrew Beekhof

 On 17 Aug 2015, at 4:35 pm, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de 
 wrote:
 
 Andrew Beekhof and...@beekhof.net schrieb am 17.08.2015 um 00:08 in
 Nachricht
 ff78be4f-173c-4a74-a989-92ea6c540...@beekhof.net:
 
 On 16 Aug 2015, at 9:41 pm, Attila Megyeri amegy...@minerva-soft.com
 wrote:
 
 Hi Andrew,
 
 I managed to isolate / reproduce the issue. You might want to take a look,
 
 as it might be present in 1.1.12 as well.
 
 I monitor my cluster from putty, mainly this way:
 - I have a putty (Windows client) session, that connects via SSH to the
 box, 
 authenticates using public key as a non-root user.
 - It immediately sends a sudo crm_mon -Af command, so with a single click
 
 I have a nice view of what the cluster is doing.
 
 Perhaps add -1 to the option list.
 The root cause seems to be that closing the putty window doesn’t actually
 
 kill the process running inside it.
 
 Sorry, the root cause seems to be that crm_mon happily writes to a closed
 filehandle (I guess). If crm_mon handled that error by exiting the loop,
 there would be no need for putty to kill any process.

No, if you want a process to die you need to kill it.

 
 
 
 Whenever I close this putty window (terminate the app), crm_mon process
 gets 
 to 100% cpu usage, starts to leak, in a few hours consumes all memory and 
 then destroys the whole cluster.
 This does not happen if I leave crm_mon with Ctrl-C.
 
 I can reproduce this 100% with crm_mon 1.1.10, with the mainstream ubuntu 
 trusty packages.
 This might be related on how sudo executes crm_mon, and what it signalls to
 
 crm_mon when it gets terminated.
 
 Now I know what I need to pay attention to in order to avoid this problem,
 
 but you might want to check whether this issue is still present.
 
 
 Thanks,
 Attila 
 
 
 
 
 
 
 -Original Message-
 From: Attila Megyeri [mailto:amegy...@minerva-soft.com] 
 Sent: Friday, August 14, 2015 12:40 AM
 To: Cluster Labs - All topics related to open-source clustering welcomed 
 users@clusterlabs.org
 Subject: Re: [ClusterLabs] Memory leak in crm_mon ?
 
 
 
 -Original Message-
 From: Andrew Beekhof [mailto:and...@beekhof.net] 
 Sent: Tuesday, August 11, 2015 2:49 AM
 To: Cluster Labs - All topics related to open-source clustering welcomed 
 users@clusterlabs.org
 Subject: Re: [ClusterLabs] Memory leak in crm_mon ?
 
 
 On 10 Aug 2015, at 5:33 pm, Attila Megyeri amegy...@minerva-soft.com
 wrote:
 
 Hi!
 
 We are building a new cluster on top of pacemaker/corosync and several
 times 
 during the past days we noticed that „crm_mon -Af” used up all the 
 memory+swap and caused high CPU usage. Killing the process solves the
 issue.
 
 We are using the binary package versions available in the latest ubuntu 
 trusty, namely:
 
 crmsh 
 1.2.5+hg1034-1ubuntu4 
 
 pacemaker
 1.1.10+git20130802-1ubuntu2.3  
 pacemaker-cli-utils1.1.10+git20130802-1ubuntu2.3 
 
 corosync 2.3.3-1ubuntu1   
 
 Kernel is 3.13.0-46-generic
 
 Looking back at some „atop" data, the CPU went to 100% many times during
 the 
 last couple of days, at various times, more often around midnight exactly 
 (strange).
 
 08.05 14:00
 08.06 21:41
 08.07 00:00
 08.07 00:00
 08.08 00:00
 08.09 06:27
 
 Checked the corosync log and syslog, but did not find any correlation 
 between the entries in the logs around the specific times.
 For most of the time, the node running the crm_mon was the DC as well –
 not 
 running any resources (e.g. a pairless node for quorum).
 
 
 We have another running system, where everything works perfecly, whereas
 it 
 is almost the same:
 
 crmsh 
 1.2.5+hg1034-1ubuntu4 
 
 pacemaker
 1.1.10+git20130802-1ubuntu2.1 
 pacemaker-cli-utils1.1.10+git20130802-1ubuntu2.1 
 corosync 2.3.3-1ubuntu1  
 
 Kernel is 3.13.0-8-generic
 
 
 Is this perhaps a known issue?
 
 Possibly, that version is over 2 years old.
 
 Any hints?
 
 Getting something a little more recent would be the best place to start
 
 Thanks Andew,
 
 I tried to upgrade to 1.1.12 using the packages availabe at 
 https://launchpad.net/~syseleven-platform . Int he first attept I upgraded a
 
 single node, to see how it works out but I ended up with errors like
 
 Could not establish cib_rw connection: Connection refused (111)
 
 I have disabled the firewall, no changes. The node appears to be running
 but 
 does not see any of the other nodes. On the other nodes I see this node as
 an 
 UNCLEAN one. (I assume corosync is fine, but pacemaker not)
 I use udpu for the transport.
 
 Am I doing something wrong? I tried to look for some howtos on upgrade

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: Antw: pacemaker doesn't correctly handle a resource after time/date change

2015-08-16 Thread Andrew Beekhof

 On 8 Aug 2015, at 12:43 am, Kostiantyn Ponomarenko 
 konstantin.ponomare...@gmail.com wrote:
 
 Hi Andrew,
 
 So the issue is:
 
 With one node up and running, set the time on the node backward by, say, 15 min 
 (generally more than 10 min), then stop a resource.
 That leads to the following: the cluster fails the resource once, then shows it 
 as started, but the resource actually remains stopped.
 
 Do you need more input from me on the issue?

I think “why” :)

I’m struggling to imagine why this would need to happen.

 
 Thank you,
 Kostya
 
 On Wed, Aug 5, 2015 at 3:01 AM, Andrew Beekhof and...@beekhof.net wrote:
 
  On 4 Aug 2015, at 7:31 pm, Kostiantyn Ponomarenko 
  konstantin.ponomare...@gmail.com wrote:
 
 
  On Tue, Aug 4, 2015 at 3:57 AM, Andrew Beekhof and...@beekhof.net wrote:
  Github might be another.
 
  I am not able to open an issue/bug here 
  https://github.com/ClusterLabs/pacemaker
 
 Oh, for pacemaker bugs see http://clusterlabs.org/help.html
 Can someone clearly state what the issue is?  The thread was quite fractured 
 and hard to follow.
 
 
  Thank you,
  Kostya
  ___
  Users mailing list: Users@clusterlabs.org
  http://clusterlabs.org/mailman/listinfo/users
 
  Project Home: http://www.clusterlabs.org
  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs: http://bugs.clusterlabs.org
 
 
 ___
 Users mailing list: Users@clusterlabs.org
 http://clusterlabs.org/mailman/listinfo/users
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org
 
 ___
 Users mailing list: Users@clusterlabs.org
 http://clusterlabs.org/mailman/listinfo/users
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] systemd: xxxx.service start request repeated too quickly

2015-08-16 Thread Andrew Beekhof

 On 6 Aug 2015, at 11:59 pm, Juha Heinanen j...@tutpro.com wrote:
 
 Ken Gaillot writes:
 
 Also, I want to add some delay to the restart attempts so that systemd
 does not complain about too quick restarts.
 
 This is outside of pacemaker control. Service respawning too rapidly
 means systemd itself attempts to restart it. You need to modify
 service definition in systemd to either disable restart on failure
 completely and let pacemaker manage it or at least add delay before
 restarts. See man systemd.service, specifically RestartSec and Restart
 parameters.
 
 The service in question only has an old-style init.d file inherited from
 Debian Wheezy, and I don't have any restart definition in it.  Based on
 systemd.service man page, Restart value defaults to 'no'.  So I'm not
 sure if it is systemd that is automatically restarting the service too
 rapidly.


Well, it's systemd that's printing the message, so it's involved somehow.
What does the resource definition for that resource look like in pacemaker?
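
If it does turn out that systemd's own restart logic is involved, here is a
sketch of the Restart/RestartSec advice quoted above, expressed as a drop-in;
the unit name foo.service is only a placeholder:

    # let pacemaker decide when restarts happen ...
    mkdir -p /etc/systemd/system/foo.service.d
    printf '[Service]\nRestart=no\n' \
        > /etc/systemd/system/foo.service.d/50-pacemaker.conf
    # ... or keep systemd restarts but space them out:
    # printf '[Service]\nRestart=on-failure\nRestartSec=10\n' \
    #     > /etc/systemd/system/foo.service.d/50-pacemaker.conf
    systemctl daemon-reload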

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] [ClusterLabs Developers] Resource Agent language discussion

2015-08-16 Thread Andrew Beekhof

 On 11 Aug 2015, at 5:34 pm, Jehan-Guillaume de Rorthais j...@dalibo.com 
 wrote:
 
 On Tue, 11 Aug 2015 11:30:03 +1000
 Andrew Beekhof and...@beekhof.net wrote:
 
 
 On 8 Aug 2015, at 1:14 am, Jehan-Guillaume de Rorthais j...@dalibo.com
 wrote:
 
 Hi Jan,
 
 On Fri, 7 Aug 2015 15:36:57 +0200
 Jan Pokorný jpoko...@redhat.com wrote:
 
 On 07/08/15 12:09 +0200, Jehan-Guillaume de Rorthais wrote:
 Now, I would like to discuss the language used to write an RA in
 Pacemaker. I have never seen a discussion or page about this so far.
 
 it wasn't in such a heretic :) tone, but I tried to show that it
 is extremely hard (if not impossible in some instances) to write
 bullet-proof code in bash (or POSIX shell, for that matter) because
 it's so cumbersome to move from whitespace-delimited words as
 a single argument and words as standalone arguments back and forth,
 connected with quotation-desired/-counterproductive madness
 (what if one wants to indeed pass quotation marks as legitimate
 characters within the passed value, etc.) few months back:
 
 http://clusterlabs.org/pipermail/users/2015-May/000403.html
 (even on developers list, but with fewer replies and broken threading:
 http://clusterlabs.org/pipermail/developers/2015-May/23.html).
 
 Thanks for the links and history. You add some more argument to my points :)
 
 HINT: I don't want to discuss (neither troll about) what is the best
 language. I would like to know why **ALL** the RA are written in
 bash
 
 I would expect the original influence were the init scripts (as RAs
 are mostly just enriched variants to support more flexible
 configuration and better diagnostics back to the cluster stack),
 which in turn were born having simplicity and ease of debugging
 (maintainability) in mind.
 
 That sounds legitimate. And bash is still appropriate for some simple RA.
 
 But for the same ease of code debugging and maintainability arguments (and
 many others), complex RAs shouldn't use shell as the language.
 
 You can and should use whatever language you like for your own private RAs.
 But if you want it accepted and maintained by the resource-agents project,
 you would be advised to use the language they have standardised on.
 
 Well, let's imagine our RA was written in bash (in fact, we have a bash 
 version
 pretty close to the current perl version we abandoned). I wonder if it would 
 be
 accepted in the resource-agents project anyway as another one already exists
 there. I can easily list the reasons we rewrote a new one, but this is not the
 subject here.
 
 The discussion here is more about the language, if I should extract a
 ocf-perl-module from my RA and if there is any chance the resource-agents
 project would accept it.

Well, it depends on the reasons you didn’t list :-)

The first questions any maintainer is going to ask are:
- why did you write a new one?
- can we merge this with the old one?
- can the new one replace the old one? (ie. full superset)

Because if both are included, then they will forevermore be answering the 
question “which one should I use?”.

Basically, if you want it accepted upstream, then yes, you probably want to 
ditch the perl bit.
But not having seen the agent or knowing why it exists, it's hard to say.
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] [Problem] The SNMP trap which has been already started is transmitted.

2015-08-16 Thread Andrew Beekhof

 On 4 Aug 2015, at 7:36 pm, renayama19661...@ybb.ne.jp wrote:
 
 Hi Andrew,
 
 Thank you for comments.
 
 However, a trap of crm_mon is sent to an SNMP manager.
  
 Are you using the built-in SNMP logic or using -E to give crm_mon a script 
 which 
 is then producing the trap?
 (I’m trying to figure out who could be turning the monitor action into a 
 start)
 
 
 I used the built-in SNMP.
 I started as a daemon with -d option.

Is it running on both nodes or just snmp1?
Because there is no logic in crm_mon that would have remapped the monitor to 
start, so my working theory is that it's a duplicate of an old event.
Can you tell which node the trap is being sent from?
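
A couple of quick, generic checks that should answer that:

    # on each cluster node: is a daemonised crm_mon running here?
    ps -ef | grep '[c]rm_mon'
    # and which host is actually emitting trap packets (SNMP traps use UDP 162)?
    tcpdump -ni any udp port 162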

 
 
 Best Regards,
 Hideo Yamauchi.
 
 
 - Original Message -
 From: Andrew Beekhof and...@beekhof.net
 To: renayama19661...@ybb.ne.jp; Cluster Labs - All topics related to 
 open-source clustering welcomed users@clusterlabs.org
 Cc: 
 Date: 2015/8/4, Tue 14:15
 Subject: Re: [ClusterLabs] [Problem] The SNMP trap which has been already 
 started is transmitted.
 
 
 On 27 Jul 2015, at 4:18 pm, renayama19661...@ybb.ne.jp wrote:
 
 Hi All,
 
 The transmission of the SNMP trap of crm_mon seems to have a problem.
 I identified a problem on latest Pacemaker and Pacemaker1.1.13.
 
 
 Step 1) I constitute a cluster and send simple CLI file.
 
 [root@snmp1 ~]# crm_mon -1 
 Last updated: Mon Jul 27 14:40:37 2015  Last change: Mon Jul 27 
 14:40:29 2015 by root via cibadmin on snmp1
 Stack: corosync
 Current DC: snmp1 (version 1.1.13-3d781d3) - partition with quorum
 2 nodes and 1 resource configured
 
 Online: [ snmp1 snmp2 ]
 
   prmDummy   (ocf::heartbeat:Dummy): Started snmp1
 
 Step 2) I stop a node of the standby once.
 
 [root@snmp2 ~]# stop pacemaker
 pacemaker stop/waiting
 
 
 Step 3) I start a node of the standby again.
 [root@snmp2 ~]# start pacemaker
 pacemaker start/running, process 2284
 
 Step 4) The indication of crm_mon does not change in particular.
 [root@snmp1 ~]# crm_mon -1
 Last updated: Mon Jul 27 14:45:12 2015  Last change: Mon Jul 27 
 14:40:29 2015 by root via cibadmin on snmp1
 Stack: corosync
 Current DC: snmp1 (version 1.1.13-3d781d3) - partition with quorum
 2 nodes and 1 resource configured
 
 Online: [ snmp1 snmp2 ]
 
   prmDummy   (ocf::heartbeat:Dummy): Started snmp1
 
 
 In addition, as for the resource that started in snmp1 node, nothing 
 changes.
 
 ---
 Jul 27 14:41:39 snmp1 crmd[29116]:   notice: State transition S_IDLE - 
 S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL 
 origin=abort_transition_graph ]
 Jul 27 14:41:39 snmp1 cib[29111]: info: Completed cib_modify operation 
 for section status: OK (rc=0, origin=snmp1/attrd/11, version=0.4.20)
 Jul 27 14:41:39 snmp1 attrd[29114]: info: Update 11 for probe_complete: 
 OK (0)
 Jul 27 14:41:39 snmp1 attrd[29114]: info: Update 11 for 
 probe_complete[snmp1]=true: OK (0)
 Jul 27 14:41:39 snmp1 attrd[29114]: info: Update 11 for 
 probe_complete[snmp2]=true: OK (0)
 Jul 27 14:41:39 snmp1 cib[29202]: info: Wrote version 0.4.0 of the CIB 
 to disk (digest: a1f1920279fe0b1466a79cab09fa77d6)
 Jul 27 14:41:39 snmp1 pengine[29115]:   notice: On loss of CCM Quorum: 
 Ignore
 Jul 27 14:41:39 snmp1 pengine[29115]: info: Node snmp2 is online
 Jul 27 14:41:39 snmp1 pengine[29115]: info: Node snmp1 is online
 Jul 27 14:41:39 snmp1 pengine[29115]: info: 
 prmDummy#011(ocf::heartbeat:Dummy):#011Started snmp1
 Jul 27 14:41:39 snmp1 pengine[29115]: info: Leave  
 prmDummy#011(Started snmp1)
 ---
 
 However, a trap of crm_mon is sent to an SNMP manager.
 
 Are you using the built-in SNMP logic or using -E to give crm_mon a script 
 which 
 is then producing the trap?
 (I’m trying to figure out who could be turning the monitor action into a 
 start)
 
 The resource does not restart, but an SNMP trap saying the resource started is 
 sent.
 
 ---
 Jul 27 14:41:39 SNMP-MANAGER snmptrapd[4521]: 2015-07-27 14:41:39 snmp1 
 [UDP: 
 [192.168.40.100]:35265-[192.168.40.2]]:#012DISMAN-EVENT-MIB::sysUpTimeInstance
  
 = Timeticks: (1437975699) 166 days, 10:22:36.99#011SNMPv2-MIB::snmpTrapOID.0 
 = 
 OID: 
 PACEMAKER-MIB::pacemakerNotification#011PACEMAKER-MIB::pacemakerNotificationResource
  
 = STRING: prmDummy#011PACEMAKER-MIB::pacemakerNotificationNode = 
 STRING: snmp1#011PACEMAKER-MIB::pacemakerNotificationOperation = 
 STRING: start#011PACEMAKER-MIB::pacemakerNotificationDescription = 
 STRING: OK#011PACEMAKER-MIB::pacemakerNotificationReturnCode = 
 INTEGER: 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = 
 INTEGER: 
 0#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0
 Jul 27 14:41:39 SNMP-MANAGER snmptrapd[4521]: 2015-07-27 14:41:39 snmp1 
 [UDP: 
 [192.168.40.100]:35265-[192.168.40.2]]:#012DISMAN-EVENT-MIB::sysUpTimeInstance
  
 = Timeticks: (1437975699) 166 days, 10:22:36.99#011SNMPv2-MIB::snmpTrapOID.0 
 = 
 OID: 
 PACEMAKER-MIB::pacemakerNotification#011PACEMAKER-MIB

Re: [ClusterLabs] circumstances under which resources become unmanaged

2015-08-16 Thread Andrew Beekhof

 On 13 Aug 2015, at 2:27 pm, N, Ravikiran ravikira...@hp.com wrote:
 
 Thanks for the reply, Andrei. What happens to resources that have a 
 COLOCATION or an ORDER constraint with this (unmanaged FAILED) 
 resource? Will the constraint be removed?

the resource is considered stopped for the purposes of colocation and ordering
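
(As a concrete illustration with hypothetical pcs constraints; cmsd is the
resource from the question, webapp is made up:

  pcs constraint colocation add webapp with cmsd INFINITY
  pcs constraint order start cmsd then start webapp

While cmsd sits in the unmanaged FAILED state it is treated as stopped, so
webapp has nowhere it is allowed to run and will itself be kept stopped;
the constraints themselves are not removed.)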

 
 Also please point me to any resource to understand this in detail.
 
 Regards
 Ravikiran
 
 -Original Message-
 From: Andrei Borzenkov [mailto:arvidj...@gmail.com] 
 Sent: Thursday, August 13, 2015 9:33 AM
 To: Cluster Labs - All topics related to open-source clustering welcomed
 Subject: Re: [ClusterLabs] circumstances under which resources become 
 unmanaged
 
 
 
 On 12.08.2015 20:46, N, Ravikiran wrote:
 Hi All,
 
 I have a resource in Pacemaker called 'cmsd' which is ending up in the 
 'unmanaged FAILED' state.
 
 Apart from manually setting the resource to unmanaged with pcs resource 
 unmanage cmsd, I'm trying to understand under what circumstances a 
 resource can become unmanaged.
 I have not set any value for the multiple-active field, which means it 
 defaults to stop-start, and hence I believe the resource should never go 
 unmanaged because of being found active on more than one node.
 
 
 unmanaged FAILED means Pacemaker (or rather, the resource agent) failed to stop 
 the resource. At this point the resource state is undefined, so Pacemaker won't do 
 anything with it.
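 
 (For what it's worth, once whatever made the stop fail has been dealt with
 by hand on that node, the usual way to hand the resource back to Pacemaker
 is to clear the failed operation; a sketch, using the pcs command style from
 the question and its lower-level equivalent:
 
   pcs resource cleanup cmsd
   # or:
   crm_resource --cleanup --resource cmsd
 )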
 
 Also, it would be more helpful if anyone can point out to specific sections 
 of the pacemaker manuals for the answer.
 
 Regards,
 Ravikiran
 
 
 
 
 


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Phantom Node

2015-08-16 Thread Andrew Beekhof

 On 14 Aug 2015, at 7:53 am, Allan Brand allan.br...@gmail.com wrote:
 
 I can't seem to track this down and am hoping someone has seen this or can 
 tell me what's happening.

Try this:

- shut down the cluster
- remove the stray node entry from the cib (/var/lib/pacemaker/cib/cib.xml)
- delete the .sig file (/var/lib/pacemaker/cib/cib.xml.sig)
- clear the logs
- start the cluster

if you see the node come back, send us the logs and we should be able to 
determine where it's coming from :)

One possibility… does uname -n return node01 or node01.private? Same for node02?
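
(Roughly, as a shell sketch of the steps above; the CIB paths are the ones
mentioned, while the pcs commands and the log location are assumptions and
vary by distro:

  pcs cluster stop --all
  # on each node: remove the stray <node .../> entry for "node01" from the CIB
  vi /var/lib/pacemaker/cib/cib.xml
  rm -f /var/lib/pacemaker/cib/cib.xml.sig
  > /var/log/cluster/corosync.log        # or wherever this distro keeps the cluster logs
  pcs cluster start --all
)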

 
 I have a 2 node test cluster, node01.private and node02.private.
 
 [root@node01 ~]# cat /etc/hosts
 127.0.0.1   localhost
 ::1 localhost
 
 192.168.168.9   node01.private
 192.168.168.10  node02.private
 192.168.168.14  cluster.private
 
 The issue is that when I run 'pcs status' it shows both nodes online, but also a 
 third node, node01, as offline:
 
 [root@node01 ~]# pcs status
 Cluster name: cluster.private
 Last updated: Thu Aug 13 16:41:54 2015
 Last change: Wed Aug 12 18:23:22 2015
 Stack: cman
 Current DC: node01.private - partition with quorum
 Version: 1.1.11-97629de
 3 Nodes configured
 1 Resources configured
 
 
 Online: [ node01.private node02.private ]
 OFFLINE: [ node01 ]
 
 Full list of resources:
 
  privateIP  (ocf::heartbeat:IPaddr2):   Started node01.private
 
 [root@node01 ~]#
 [root@node01 ~]# pcs config
 Cluster Name: cluster.private
 Corosync Nodes:
  node01.private node02.private
 Pacemaker Nodes:
  node01 node01.private node02.private
 
 Resources:
  Resource: privateIP (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=192.168.168.14 cidr_netmask=32
   Operations: start interval=0s timeout=20s (privateIP-start-interval-0s)
   stop interval=0s timeout=20s (privateIP-stop-interval-0s)
   monitor interval=30s (privateIP-monitor-interval-30s)
 
 Stonith Devices:
 Fencing Levels:
 
 Location Constraints:
   Resource: privateIP
 Enabled on: node01.private (score:INFINITY) 
 (id:location-privateIP-node01.private-INFINITY)
 Ordering Constraints:
 Colocation Constraints:
 
 Resources Defaults:
  No defaults set
 Operations Defaults:
  No defaults set
 
 Cluster Properties:
  cluster-infrastructure: cman
  dc-version: 1.1.11-97629de
  expected-quorum-votes: 2
  no-quorum-policy: ignore
  stonith-enabled: false
 [root@node01 ~]#
 [root@node01 ~]# cat /etc/cluster/cluster.conf
  <cluster config_version="8" name="cluster.private">
    <fence_daemon/>
    <clusternodes>
      <clusternode name="node01.private" nodeid="1">
        <fence>
          <method name="pcmk-redirect">
            <device name="pcmk" port="node01.private"/>
          </method>
        </fence>
      </clusternode>
      <clusternode name="node02.private" nodeid="2">
        <fence>
          <method name="pcmk-redirect">
            <device name="pcmk" port="node02.private"/>
          </method>
        </fence>
      </clusternode>
    </clusternodes>
    <cman/>
    <fencedevices>
      <fencedevice agent="fence_pcmk" name="pcmk"/>
    </fencedevices>
    <rm>
      <failoverdomains/>
      <resources/>
    </rm>
  </cluster>
 [root@node01 ~]#
 
 
 Everything appears to be working correctly, just that phantom offline node 
 shows up.
 
 Thanks,
 Allan


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: stonithd: stonith_choose_peer: Couldn't find anyone to fence node with any

2015-08-13 Thread Andrew Beekhof

 On 13 Aug 2015, at 11:36 pm, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de 
 wrote:
 
 Kostiantyn Ponomarenko konstantin.ponomare...@gmail.com wrote on 
 13.08.2015
 at 13:39 in message
 caenth0fxlzwzw4jmoyk_go0w9o6e2gdd-zfdfohzrahwcgv...@mail.gmail.com:
 Hi,
 
 Brief description of the STONITH problem:
 
 I see two different behaviors with two different STONITH configurations. If
 Pacemaker cannot find a device that can STONITH a problematic node, the
 node remains up and running, which is bad, because it must be STONITHed.
 
 Correct observation. I wonder whether cloning a STONITH resource would help;

no

 for a symmetric STONITH like SBD any node can fence any other node at the 
 same time. Still, Pacemaker waits until the stonith resource (which is something 
 different from SBD itself) is confirmed running on some node (hard to achieve if the 
 one node carrying the STONITH resource in a two-node cluster went down unexpectedly).
 
 In contrast, if Pacemaker finds a device that it thinks can STONITH the
 problematic node, even if the device actually cannot, Pacemaker goes down
 after the STONITH returns a false positive. Pacemaker shuts itself down right
 after the STONITH.
 Is that the expected behavior?
 
 I'd be surprised if it were.
 
 Do I need to configure two more STONITH agents just for rebooting the nodes
 on which they are running (e.g. with # reboot -f)?
 
 Good question ;-)
 
 [...]
 
 
 
 


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] [ClusterLabs Developers] Resource Agent language discussion

2015-08-10 Thread Andrew Beekhof

 On 8 Aug 2015, at 1:14 am, Jehan-Guillaume de Rorthais j...@dalibo.com 
 wrote:
 
 Hi Jan,
 
 On Fri, 7 Aug 2015 15:36:57 +0200
 Jan Pokorný jpoko...@redhat.com wrote:
 
 On 07/08/15 12:09 +0200, Jehan-Guillaume de Rorthais wrote:
 Now, I would like to discuss the language used to write an RA in
 Pacemaker. I have never seen a discussion or page about this so far.
 
 it wasn't in such a heretic :) tone, but I tried to show that it
 is extremely hard (if not impossible in some instances) to write
 bullet-proof code in bash (or POSIX shell, for that matter) because
 it's so cumbersome to move back and forth between whitespace-delimited
 words as a single argument and words as standalone arguments, together
 with the quoting-desired/quoting-counterproductive madness
 (what if one wants to indeed pass quotation marks as legitimate
 characters within the passed value, etc.), a few months back:
 
 http://clusterlabs.org/pipermail/users/2015-May/000403.html
 (even on developers list, but with fewer replies and broken threading:
 http://clusterlabs.org/pipermail/developers/2015-May/23.html).
 
 Thanks for the links and history. You add some more arguments to my points :)
 
 HINT: I don't want to discuss (nor troll about) which language is best.
 I would like to know why **ALL** the RAs are written in
 bash
 
 I would expect the original influence was the init scripts (as RAs
 are mostly just enriched variants to support more flexible
 configuration and better diagnostics back to the cluster stack),
 which in turn were born having simplicity and ease of debugging
 (maintainability) in mind.
 
 That sounds legitimate. And bash is still appropriate for some simple RAs.
 
 But for the same ease-of-debugging and maintainability arguments (and 
 many
 others), complex RAs shouldn't use shell as their language.

You can and should use whatever language you like for your own private RAs.
But if you want it accepted and maintained by the resource-agents project, you 
would be advised to use the language they have standardised on.

As always, the people doing the work get to make the rules.

 
 and whether there are traps (hidden deep in ocf-shellfuncs, for instance)
 to avoid when using a different language. And is it acceptable to
 include new libs for other languages?
 
 https://github.com/ClusterLabs/resource-agents/blob/v3.9.6/doc/dev-guides/ra-dev-guide.txt#L33
 doesn't make any assumption about the target language besides stating
 what the common one is.
 
 Yes, I know that page. But this dev guide focuses on shell and makes some
 assumptions about ocf-shellfuncs.
 
 I'll take the same example as in my previous message: there is nothing
 about best practice for logging. In the Script variables section, some
 variables come from the environment, others from ocf-shellfuncs.
 
 We rewrote the RA in perl, mostly because of me. I was tired of bash/sh
 limitations AND syntax AND useless code complexity for some easy tasks AND
 traps (return codes etc). In my opinion, bash/sh are fine if your RA code is
 short and simple, which was mostly the case back in the time of heartbeat,
 which was stateless only. But it became a nightmare with multi-state agents
 struggling with complex code to fit Pacemaker's behavior. Have a look
 at the mysql or pgsql agents.
 
 Moreover, with bash, I had some weird behaviors (timeouts) from the RA
 between runuser/su/sudo and systemd/pamd some months ago. All three of them
 have implications or side effects deep in the system that you need to
 take care of. A language able to seteuid/setuid after forking is
 much more natural and clean for dropping root privileges and starting the daemon
 (PostgreSQL refuses to start as root and is not able to drop its privileges
 to another system user itself).
 
 Another disadvantage of shell scripts is that frequently many processes
 are spawned for simple changes within the filesystem and for string
 parsing/reformatting, which in turn creates a dependency on plenty
 of external executables.
 
 True. Either you need to pipe many small programs, forking all of them
 (cat|grep|cut|...), sometimes with different behavior depending on the system, 
 or
 use a more complex one that most people don't want to hear about anymore (sed, awk, perl, 
 ...).
 In the latter case, you not only have to master bash, but other languages as
 well.
 
 Now, we are far from having enterprise-class certified code (our RA passed its
 very first tests successfully only yesterday), but here is some quick
 feedback. The downside of picking a language other than bash/sh is that
 there is no OCF module/library available for it. This is quite
 inconvenient when you need system-specific variables or logging
 shortcuts that are only defined in ocf-shellfuncs (and, I would guess, patched by
 packagers?).
 
 For instance, I had to capture the values of $HA_SBIN_DIR and $HA_RSCTMP from
 my perl code.
 
 There could be a shell wrapper that would put these values into the
 environment and then execute the target itself for its disposal
 (generic 
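
(The message is cut off in the archive at this point, but a minimal sketch of
the wrapper idea described above might look like this, assuming the usual OCF
environment is already set by the cluster and the standard resource-agents
location of ocf-shellfuncs; the wrapper name is hypothetical:

  #!/bin/sh
  # hypothetical ocf-env-wrapper: expose shell-only variables to a non-shell RA
  : ${OCF_FUNCTIONS_DIR:=${OCF_ROOT}/lib/heartbeat}
  . ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
  export HA_SBIN_DIR HA_RSCTMP            # values a perl RA would otherwise have to guess
  exec "$@"                               # e.g. ocf-env-wrapper /path/to/perl-ra monitor
)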

Re: [ClusterLabs] Antw: Re: Antw: Re: [Question] About movement of pacemaker_remote.

2015-08-05 Thread Andrew Beekhof
Ok, I’ll look into it. Thanks for retesting. 

 On 5 Aug 2015, at 4:00 pm, renayama19661...@ybb.ne.jp wrote:
 
 Hi Andrew,
 
 Do you know if this behaviour still exists?
 A LOT of work went into the remote node logic in the last couple of months, 
 it's 
 possible this was fixed as a side-effect.
  
  
 I have not confirmed it on the latest yet.
 I will confirm it.
 
 
 I have now confirmed it with the latest 
 Pacemaker (pacemaker-eefdc909a41b571dc2e155f7b14b5ef0368f2de7).
 
 The phenomenon still occurs after all.
 
 
 On the first cleanup, pacemaker fails to connect to pacemaker_remote.
 The second attempt succeeds.
 
 The problem does not seem to have been resolved.
 
 
 
 I used the latest code and incorporated my debug logging again.
 
 ---
 (snip)
 static size_t
 crm_remote_recv_once(crm_remote_t * remote)
 {
     int rc = 0;
     size_t read_len = sizeof(struct crm_remote_header_v0);
     struct crm_remote_header_v0 *header = crm_remote_header(remote);
 
     if (header) {
         /* Stop at the end of the current message */
         read_len = header->size_total;
     }
 
     /* automatically grow the buffer when needed */
     if (remote->buffer_size < read_len) {
         remote->buffer_size = 2 * read_len;
         crm_trace("Expanding buffer to %u bytes", remote->buffer_size);
 
         remote->buffer = realloc_safe(remote->buffer, remote->buffer_size + 1);
         CRM_ASSERT(remote->buffer != NULL);
     }
 
 #ifdef HAVE_GNUTLS_GNUTLS_H
     if (remote->tls_session) {
         if (remote->buffer == NULL) {
             crm_info("### YAMAUCHI buffer is NULL [buffer_zie[%d] readlen[%d]",
                      remote->buffer_size, read_len);
         }
         rc = gnutls_record_recv(*(remote->tls_session),
                                 remote->buffer + remote->buffer_offset,
                                 remote->buffer_size - remote->buffer_offset);
 (snip)
 ---
 
 When Pacemaker fails on the first connection to the remote node, my log message is printed.
 It is not printed on the second connection.
 
 [root@sl7-01 ~]# tail -f /var/log/messages | grep YAMA
 Aug  5 14:46:25 sl7-01 crmd[21306]: info: ### YAMAUCHI buffer is NULL 
 [buffer_zie[1326] readlen[40]
 Aug  5 14:46:26 sl7-01 crmd[21306]: info: ### YAMAUCHI buffer is NULL 
 [buffer_zie[1326] readlen[40]
 Aug  5 14:46:28 sl7-01 crmd[21306]: info: ### YAMAUCHI buffer is NULL 
 [buffer_zie[1326] readlen[40]
 Aug  5 14:46:30 sl7-01 crmd[21306]: info: ### YAMAUCHI buffer is NULL 
 [buffer_zie[1326] readlen[40]
 Aug  5 14:46:31 sl7-01 crmd[21306]: info: ### YAMAUCHI buffer is NULL 
 [buffer_zie[1326] readlen[40]
 (snip)
 
 Best Regards,
 Hideo Yamauchi.
 
 
 
 
 - Original Message -
 From: renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp
 To: Cluster Labs - All topics related to open-source clustering welcomed 
 users@clusterlabs.org
 Cc: 
 Date: 2015/8/4, Tue 18:40
 Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: [Question] About movement of 
 pacemaker_remote.
 
 Hi Andrew,
 
 Do you know if this behaviour still exists?
 A LOT of work went into the remote node logic in the last couple of months, 
 it's 
 possible this was fixed as a side-effect.
 
 
 I have not confirmed it on the latest yet.
 I will confirm it.
 
 Many Thanks!
 Hideo Yamauchi.
 
 
 - Original Message -
 From: Andrew Beekhof and...@beekhof.net
 To: renayama19661...@ybb.ne.jp; Cluster Labs - All topics related to 
 open-source clustering welcomed users@clusterlabs.org
 Cc: 
 Date: 2015/8/4, Tue 13:16
 Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: [Question] About movement of 
 pacemaker_remote.
 
 
   On 12 May 2015, at 12:12 pm, renayama19661...@ybb.ne.jp wrote:
 
   Hi All,
 
   The problem seems to be that the buffer becomes NULL after running 
 crm_resource -C, somehow, after the remote node has been rebooted.
 
   I added a log message to the source code and confirmed it.
 
   
   crm_remote_recv_once(crm_remote_t * remote)
   {
   (snip)
       /* automatically grow the buffer when needed */
       if (remote->buffer_size < read_len) {
           remote->buffer_size = 2 * read_len;
           crm_trace("Expanding buffer to %u bytes", remote->buffer_size);
 
           remote->buffer = realloc_safe(remote->buffer, remote->buffer_size + 1);
           CRM_ASSERT(remote->buffer != NULL);
       }
 
   #ifdef HAVE_GNUTLS_GNUTLS_H
       if (remote->tls_session) {
           if (remote->buffer == NULL) {
               crm_info("### YAMAUCHI buffer is NULL [buffer_zie[%d] readlen[%d]",
                        remote->buffer_size, read_len);
           }
           rc = gnutls_record_recv(*(remote->tls_session),
                                   remote->buffer + remote->buffer_offset,
                                   remote->buffer_size - remote->buffer_offset);
   (snip)
   
 
   May 12 10:54:01 sl7-01 crmd[30447]: info: crm_remote_recv_once: ### 
 YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
   May 12 10:54:02 sl7-01 crmd[30447]: info: crm_remote_recv_once: ### 
 YAMAUCHI buffer is NULL [buffer_zie

Re: [ClusterLabs] KVM RHEL7 cluster

2015-08-03 Thread Andrew Beekhof
So…

your resource definition was:


  <clone id="libvirtd-clone">
    <primitive class="systemd" id="libvirtd" type="libvirtd">
      <instance_attributes id="libvirtd-instance_attributes"/>
      <operations>
        <op id="libvirtd-monitor-interval-60s" interval="60s" name="monitor"/>
      </operations>
      <meta_attributes id="libvirtd-meta_attributes">
        <nvpair id="libvirtd-meta_attributes-interleave" name="interleave" value="true"/>
      </meta_attributes>
    </primitive>
    <meta_attributes id="libvirtd-clone-meta"/>
  </clone>

but needed to be:

  <clone id="libvirtd-clone">
    <primitive class="systemd" id="libvirtd" type="libvirtd">
      <instance_attributes id="libvirtd-instance_attributes"/>
      <operations>
        <op id="libvirtd-monitor-interval-60s" interval="60s" name="monitor"/>
      </operations>
    </primitive>
    <meta_attributes id="libvirtd-clone-meta">
      <nvpair id="libvirtd-meta_attributes-interleave" name="interleave" value="true"/>
    </meta_attributes>
  </clone>

always set clone options on the clone, not the thing being cloned.

Once I made that change it seemed to behave.
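
(For anyone doing the same thing with pcs rather than editing the XML
directly, something along these lines should work, though the exact
behaviour depends on the pcs version:

  pcs resource meta libvirtd-clone interleave=true   # set the option on the clone itself
  pcs resource meta libvirtd interleave=             # and clear it from the cloned primitive
)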

 On 20 May 2015, at 7:00 pm, Ondrej Koch ondrej.k...@techlib.cz wrote:
 
 ...and maybe some better logs using crm_report.
 
 
 On 20.5.2015 05:57, Andrew Beekhof wrote:
 
 On 19 May 2015, at 9:59 pm, Ondrej Koch ondrej.k...@techlib.cz wrote:
 
 Hi,
 we're trying to build a 6-node KVM virtualization cluster on
 CentOS7/RHEL7 using GFS2, CLVM and DLM.
 More or less we were following Red Hat documentation and everything
 seems to be fine but... We can stop cluster service on a single node and
 everything migrates to other nodes. However, as soon as that node
 becomes a cluster member again, every VirtualDomain on every cluster
 node reboots.
 
 We tried to kill clvmd as well, cluster noticed that, reinitialized
 clvmd and... again every VirtualDomain rebooted.
 Our cib is attached.
 
 Logs would be required to say much about this
 
 
 We suspect something about our dependency and/or colocation rules causes
 that. Could you please inspect and give us some directions?
 
 Thanks,
 Ondrej
 cib.xml (attachment)
 
 
 
 lva.tar.bz2 (attachment)


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] [Problem] The SNMP trap which has been already started is transmitted.

2015-08-03 Thread Andrew Beekhof

 On 27 Jul 2015, at 4:18 pm, renayama19661...@ybb.ne.jp wrote:
 
 Hi All,
 
 The SNMP trap transmission of crm_mon seems to have a problem.
 I identified the problem on the latest Pacemaker and on Pacemaker 1.1.13.
 
 
 Step 1) I set up a cluster and load a simple CLI file.
 
 [root@snmp1 ~]# crm_mon -1 
 Last updated: Mon Jul 27 14:40:37 2015  Last change: Mon Jul 27 
 14:40:29 2015 by root via cibadmin on snmp1
 Stack: corosync
 Current DC: snmp1 (version 1.1.13-3d781d3) - partition with quorum
 2 nodes and 1 resource configured
 
 Online: [ snmp1 snmp2 ]
 
  prmDummy   (ocf::heartbeat:Dummy): Started snmp1
 
 Step 2) I stop the standby node once.
 
 [root@snmp2 ~]# stop pacemaker
 pacemaker stop/waiting
 
 
 Step 3) I start the standby node again.
 [root@snmp2 ~]# start pacemaker
 pacemaker start/running, process 2284
 
 Step 4) The crm_mon display does not change in particular.
 [root@snmp1 ~]# crm_mon -1
 Last updated: Mon Jul 27 14:45:12 2015  Last change: Mon Jul 27 
 14:40:29 2015 by root via cibadmin on snmp1
 Stack: corosync
 Current DC: snmp1 (version 1.1.13-3d781d3) - partition with quorum
 2 nodes and 1 resource configured
 
 Online: [ snmp1 snmp2 ]
 
  prmDummy   (ocf::heartbeat:Dummy): Started snmp1
 
 
 In addition, nothing changes for the resource that is started on the snmp1 node.
 
 ---
 Jul 27 14:41:39 snmp1 crmd[29116]:   notice: State transition S_IDLE - 
 S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL 
 origin=abort_transition_graph ]
 Jul 27 14:41:39 snmp1 cib[29111]: info: Completed cib_modify operation 
 for section status: OK (rc=0, origin=snmp1/attrd/11, version=0.4.20)
 Jul 27 14:41:39 snmp1 attrd[29114]: info: Update 11 for probe_complete: 
 OK (0)
 Jul 27 14:41:39 snmp1 attrd[29114]: info: Update 11 for 
 probe_complete[snmp1]=true: OK (0)
 Jul 27 14:41:39 snmp1 attrd[29114]: info: Update 11 for 
 probe_complete[snmp2]=true: OK (0)
 Jul 27 14:41:39 snmp1 cib[29202]: info: Wrote version 0.4.0 of the CIB to 
 disk (digest: a1f1920279fe0b1466a79cab09fa77d6)
 Jul 27 14:41:39 snmp1 pengine[29115]:   notice: On loss of CCM Quorum: Ignore
 Jul 27 14:41:39 snmp1 pengine[29115]: info: Node snmp2 is online
 Jul 27 14:41:39 snmp1 pengine[29115]: info: Node snmp1 is online
 Jul 27 14:41:39 snmp1 pengine[29115]: info: 
 prmDummy#011(ocf::heartbeat:Dummy):#011Started snmp1
 Jul 27 14:41:39 snmp1 pengine[29115]: info: Leave   prmDummy#011(Started 
 snmp1)
 ---
 
 However, a crm_mon trap is sent to the SNMP manager.

Are you using the built-in SNMP logic or using -E to give crm_mon a script 
which is then producing the trap?
(I’m trying to figure out who could be turning the monitor action into a start)

 The resource does not restart, but an SNMP trap saying the resource started is 
 sent.
 
 ---
 Jul 27 14:41:39 SNMP-MANAGER snmptrapd[4521]: 2015-07-27 14:41:39 snmp1 [UDP: 
 [192.168.40.100]:35265-[192.168.40.2]]:#012DISMAN-EVENT-MIB::sysUpTimeInstance
  = Timeticks: (1437975699) 166 days, 10:22:36.99#011SNMPv2-MIB::snmpTrapOID.0 
 = OID: 
 PACEMAKER-MIB::pacemakerNotification#011PACEMAKER-MIB::pacemakerNotificationResource
  = STRING: prmDummy#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: 
 snmp1#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: 
 start#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: 
 OK#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 
 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 
 0#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0
 Jul 27 14:41:39 SNMP-MANAGER snmptrapd[4521]: 2015-07-27 14:41:39 snmp1 [UDP: 
 [192.168.40.100]:35265-[192.168.40.2]]:#012DISMAN-EVENT-MIB::sysUpTimeInstance
  = Timeticks: (1437975699) 166 days, 10:22:36.99#011SNMPv2-MIB::snmpTrapOID.0 
 = OID: 
 PACEMAKER-MIB::pacemakerNotification#011PACEMAKER-MIB::pacemakerNotificationResource
  = STRING: prmDummy#011PACEMAKER-MIB::pacemakerNotificationNode = STRING: 
 snmp1#011PACEMAKER-MIB::pacemakerNotificationOperation = STRING: 
 monitor#011PACEMAKER-MIB::pacemakerNotificationDescription = STRING: 
 OK#011PACEMAKER-MIB::pacemakerNotificationReturnCode = INTEGER: 
 0#011PACEMAKER-MIB::pacemakerNotificationTargetReturnCode = INTEGER: 
 0#011PACEMAKER-MIB::pacemakerNotificationStatus = INTEGER: 0
 ---
 
 A CIB diff produced by stopping and starting the node seems to be the 
 problem.
 Because of this diff, crm_mon transmits an unnecessary SNMP trap.
 ---
 Jul 27 14:41:39 snmp1 cib[29111]: info: +  /cib:  @num_updates=19
 Jul 27 14:41:39 snmp1 cib[29111]: info: +  
 /cib/status/node_state[@id='3232238190']:  
 @crm-debug-origin=do_update_resource
 Jul 27 14:41:39 snmp1 cib[29111]: info: ++ 
 /cib/status/node_state[@id='3232238190']/lrm[@id='3232238190']/lrm_resources: 
  lrm_resource id=prmDummy type=Dummy class=ocf provider=heartbeat/
 Jul 27 14:41:39 snmp1 cib[29111]: info: ++   

Re: [ClusterLabs] Vagrantfile for Clusters_from_Scratch 1.1-pcs tutorial

2015-08-03 Thread Andrew Beekhof

 On 5 May 2015, at 12:57 am, Marcin Dulak marcin.du...@gmail.com wrote:
 
 Hi,
 
 i started working on converting 
 http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Clusters_from_Scratch
  into a Vagrantfile:
 https://github.com/marcindulak/Clusters_From_Scratch-1.1-pcs
 I got stopped at the STONITH configuration (Chapter 8): 
 http://bugs.clusterlabs.org/show_bug.cgi?id=5241
 
 I'm trying out SBD for STONITH, which I plan to handle with VirtualBox 
 shareable hard disks 
 (https://www.virtualbox.org/manual/ch05.html#hdimagewrites)
 I'm using primarily these two documents 
 https://www.suse.com/documentation/sle-ha-12/book_sleha/data/sec_ha_storage_protect_fencing.html
 and http://www.linux-ha.org/wiki/SBD_Fencing but many things are different on 
 Fedora 21.

TBH, the shared-disk side of things might not even be being built on Fedora yet.
No-one on the RH team has had a chance to test it yet, so it's a bit of an 
unknown.

Having said that, you don't need the shared disk side of things to get some 
benefit from sbd (yes, I know how strange that sounds).
On Fedora/RHEL/CentOS (speaking of which, CentOS 7 would be a much better 
target than Fedora) you just need a functioning watchdog device (most virt 
frameworks offer one).
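
(If the guest doesn't expose a hardware watchdog, the kernel's software
watchdog is a common stand-in; a sketch, assuming a systemd-based distro:

  modprobe softdog                                   # provides /dev/watchdog
  echo softdog > /etc/modules-load.d/watchdog.conf   # load it on every boot

Better, where the hypervisor supports it, is an emulated watchdog device
such as i6300esb under KVM/QEMU.)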

Then on each node:
- stop the cluster
- install the sbd package
- configure the following in /etc/sysconfig/sbd

SBD_DELAY_START=no
SBD_PACEMAKER=yes
SBD_STARTMODE=clean
SBD_WATCHDOG_DEV=/dev/watchdog

- if 'uname -n’ is not the same as the name by which the cluster knows your 
node, add:

SBD_OPTS=“-n ${the_uname_from_cib}”

- enable sbd to start: systemctl enable sbd

Once this is complete on all nodes, start that cluster again.
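
(Consolidated as a shell sketch of the per-node steps above; the package
manager and the cluster-stop command are assumptions for illustration:

  pcs cluster stop                      # stop the cluster on this node
  yum install -y sbd                    # install the sbd package

  cat > /etc/sysconfig/sbd <<'EOF'
SBD_DELAY_START=no
SBD_PACEMAKER=yes
SBD_STARTMODE=clean
SBD_WATCHDOG_DEV=/dev/watchdog
# only if 'uname -n' differs from the node name in the CIB:
# SBD_OPTS="-n ${the_uname_from_cib}"
EOF

  systemctl enable sbd                  # sbd is then started along with the cluster

Repeat on every node, then start the cluster again.)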


 
 Before I invest too much time:
 - is SBD on a VirtualBox shareable hard disk a good idea?
 - if so could you point me to a working example of SBD on modern Fedora + pcs?
 - I have experience in packaging RPMs, so if there is any work needed on the sbd 
 RPM I can contribute.
 For example, there is currently no sbd RPM on Fedora 21 or 22, and I'm using the 
 package from Rawhide's (23) koji.
 
 Best regards,
 
 Marcin
 
 
 


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] DRBD Can't Mount Under Pacemaker

2015-08-03 Thread Andrew Beekhof

 On 26 Jun 2015, at 8:34 pm, JA E eshm...@gmail.com wrote:
 
 Hi,
 
 I am a newbie to clustering. I managed to set up a cluster with 
 pacemaker, corosync, drbd and pcs on two nodes. But after a test restart of 
 both nodes, it seems drbd can't mount the desired folder when controlled by 
 pacemaker; manually it's fine.
 
 [root@master Desktop]# pcs config
 Cluster Name: cluster_web
 Corosync Nodes:
  master slave 
 Pacemaker Nodes:
  master slave 
 Resources: 
  Resource: virtual_ip (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=192.168.73.133 cidr_netmask=32 
   Operations: start interval=0s timeout=20s (virtual_ip-start-timeout-20s)
   stop interval=0s timeout=20s (virtual_ip-stop-timeout-20s)
   monitor interval=30s (virtual_ip-monitor-interval-30s)
  Resource: webserver (class=ocf provider=heartbeat type=apache)
   Attributes: configfile=/etc/httpd/conf/httpd.conf 
 statusurl=http://localhost/server-status 
   Operations: start interval=0s timeout=40s (webserver-start-timeout-40s)
   stop interval=0s timeout=60s (webserver-stop-timeout-60s)
   monitor interval=1min (webserver-monitor-interval-1min)
  Master: webserver_data_sync
   Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 
 notify=true 
   Resource: webserver_data (class=ocf provider=linbit type=drbd)
Attributes: drbd_resource=drbd0 
Operations: start interval=0s timeout=240 
 (webserver_data-start-timeout-240)
promote interval=0s timeout=90 
 (webserver_data-promote-timeout-90)
demote interval=0s timeout=90 
 (webserver_data-demote-timeout-90)
stop interval=0s timeout=100 (webserver_data-stop-timeout-100)
monitor interval=60s (webserver_data-monitor-interval-60s)
  Resource: webserver_fs (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/drbd0 directory=/var/www/html fstype=ext3 
   Operations: start interval=0s timeout=60 (webserver_fs-start-timeout-60)
   stop interval=0s timeout=60 (webserver_fs-stop-timeout-60)
   monitor interval=20 timeout=40 
 (webserver_fs-monitor-interval-20)
 Stonith Devices: 
 Fencing Levels: 
 Location Constraints:
   Resource: webserver
 Enabled on: node01 (score:50) (id:location-webserver-node01-50)
 Enabled on: master (score:50) (id:location-webserver-master-50)
 Ordering Constraints:
   start virtual_ip then start webserver (kind:Mandatory) 
 (id:order-virtual_ip-webserver-mandatory)
   start webserver_fs then start webserver (kind:Mandatory) 
 (id:order-webserver_fs-webserver-mandatory)
 Colocation Constraints:
   webserver with virtual_ip (score:INFINITY) 
 (id:colocation-webserver-virtual_ip-INFINITY)
   webserver_fs with webserver_data_sync (score:INFINITY) 
 (with-rsc-role:Master) 
 (id:colocation-webserver_fs-webserver_data_sync-INFINITY)
 Cluster Properties:
  cluster-infrastructure: corosync
  cluster-name: cluster_web
  dc-version: 1.1.12-a14efad
  have-watchdog: false
  no-quorum-policy: ignore
  stonith-enabled: false
 
  
 [root@master Desktop]# drbdadm dump
 # /etc/drbd.conf
 global {
 usage-count yes;
 cmd-timeout-medium 600;
 cmd-timeout-long 0;
 }
 common {
 net {
 protocol   C;
 }
 }
 # resource drbd0 on master: not ignored, not stacked
 # defined at /etc/drbd.d/drbd0.res:1
 resource drbd0 {
 on master {
 volume 0 {
 device   /dev/drbd0 minor 0;
 disk /dev/vg_drbd0/lv_drbd0;
 meta-diskinternal;
 }
 address  ipv4 192.168.73.131:7789;
 }
 on slave {
 volume 0 {
 device   /dev/drbd0 minor 0;
 disk /dev/vg_drbd0/lv_drbd0;
 meta-diskinternal;
 }
 address  ipv4 192.168.73.132:7789;
 }
 }
 
  
 [root@master Desktop]# pcs status
 Cluster name: cluster_web
 Last updated: Fri Jun 26 03:04:18 2015
 Last change: Fri Jun 26 02:13:11 2015
 Stack: corosync
 Current DC: master (1) - partition with quorum
 Version: 1.1.12-a14efad
 2 Nodes configured
 5 Resources configured
 
 Online: [ master slave ]
 Full list of resources:
  virtual_ip   (ocf::heartbeat:IPaddr2):   Started master 
  webserver(ocf::heartbeat:apache):Stopped 
  Master/Slave Set: webserver_data_sync [webserver_data]
  Masters: [ master ]
  Slaves: [ slave ]
  webserver_fs (ocf::heartbeat:Filesystem):Stopped 
 Failed actions:
 webserver_fs_start_0 on master 'unknown error' (1): call=23, 
 status=complete, exit-reason='Couldn't mount filesystem /dev/drbd0 on 
 /var/www/html', last-rc-change='Fri Jun 26 02:20:45 2015', queued=0ms, 
 exec=87ms
 webserver_fs_start_0 on slave 'unknown error' (1): call=23, 
 status=complete, exit-reason='Couldn't mount filesystem /dev/drbd0 on 
 /var/www/html', last-rc-change='Fri Jun 26 02:20:45 2015', queued=0ms, 
 exec=79ms
 
 PCSD Status:
   master: Online