Re: [ClusterLabs] Fwd: FW: heartbeat can monitor virtual IP alive or not .

2016-04-28 Thread Lars Ellenberg
On Thu, Apr 21, 2016 at 01:18:13AM +0800, fu ml wrote:
> Ha.cf:
>
> The question is: we want heartbeat to monitor the virtual IP.
> 
> If the virtual IP on Linux01 stops answering pings or responding,
> 
> we want Linux02 to automatically take over this service IP,
> regardless of whether Linux01's admin IP is alive or not.
> 
> 
> 
> We tried modifying ha.cf as follows (e.g. on Linux01):
> 
> 1) ucast eth0 10.88.222.53
> 2) ucast eth0:0 10.88.222.53
> 3) ucast eth0 10.88.222.51 & ucast eth0 10.88.222.53
> 4) ucast eth0 10.88.222.51 & ucast eth0:0 10.88.222.53

> We tested all four variants, but all failed.

Just to clarify:
in ha.cf, you tell heartbeat which infrastructure to use
for cluster communications.
That means, IPs and NICs you mention there must already exist.

In haresources,
you'd put the resources the cluster is supposed to manage.
That could be an IP address.
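For illustration, a minimal haresources entry managing a virtual IP might look like this sketch (node name and address are invented, not taken from the thread):

```
# /etc/ha.d/haresources -- hypothetical example
# format: preferred-node resource1 [resource2 ...]
linux01 IPaddr::10.88.222.60/24/eth0
```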

But no, *heartbeat* in haresources mode
does NOT do resource monitoring.
It does node-alive checks based on heartbeats,
and it reacts to node-dead events only.
For resource monitoring, you'd have to combine it with pacemaker
(or, as in the old days, mon or similar tools). But don't.

If you need more than "node-dead" detection,
what you should do for a new system is:
==> use pacemaker on corosync.

Or, if all you are going to manage is a bunch of IP addresses,
maybe you should choose a different tool; VRRP with keepalived
may be better for your needs.
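For that pure IP-failover case, a minimal keepalived VRRP instance might look like this sketch (interface, router id, priority, and address are placeholder assumptions, not from the thread):

```
# /etc/keepalived/keepalived.conf -- hedged sketch, values are placeholders
vrrp_instance VI_1 {
    state BACKUP            # let priority decide who becomes master
    interface eth0
    virtual_router_id 51
    priority 100            # higher wins; use e.g. 90 on the peer
    advert_int 1
    virtual_ipaddress {
        10.88.222.60/24 dev eth0
    }
}
```

keepalived then moves the virtual address whenever VRRP advertisements from the master stop, independent of any "admin" IP.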


-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support

DRBD® and LINBIT® are registered trademarks of LINBIT

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts

2016-04-28 Thread Lars Marowsky-Bree
On 2016-04-27T12:10:10, Klaus Wenninger  wrote:

> > Having things in ARGV[] is always risky due to them being exposed more
> > easily via ps. Environment variables or stdin appear better.
> What made you assume the recipient is being passed as argument?
> 
> The environment variable CRM_alert_recipient is being used to pass it.

Ah, excellent! But what made me think that this would be passed as
arguments is that your announcement said: "Each alert may have any
number of recipients configured. These values will simply be passed to
the script as *arguments*." ;-)

Thanks for clarifying this.
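As a sketch of what this means for an alert script (invented for illustration, not from the thread): pacemaker exports the recipient to the script as the CRM_alert_recipient environment variable rather than passing it in argv, so a script would read it like this:

```shell
#!/bin/sh
# Hedged sketch: pacemaker sets CRM_alert_recipient in the environment of
# the alert script; nothing recipient-related appears on the command line.
notify_target() {
    # fall back to a placeholder so the sketch also runs standalone
    printf '%s\n' "${CRM_alert_recipient:-root@localhost}"
}
msg=$(notify_target)
echo "would notify: $msg"
```

Since the value never appears in argv, it is not exposed via ps, which was the concern above.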

> > What I also miss is the ability to filter the events (at least
> > coarsely?) sent to a specific alert/recipient, and to constraint on
> > which nodes it will get executed.  Is that going to happen? On a busy
> > cluster, this could easily cause significant load otherwise.
> I'm aware of that, and in light of reducing the complexity of the
> scripts / being able to use generic scripts coming from anywhere,
> it sounds reasonable, as it does out of load considerations.
> 
> I was planning to see if I could come up with something easy
> and catchy maybe exploiting the already existing possibility
> to define rules via the nvpair-construct already used.
> 
> Intention of this first release was to have something that can
> replace the existing mechanisms in a smooth way in 1.1.15
> and to get feedback on that.

Makes sense, thanks. Just curious as to where you saw this going.

I'm still confused a little as to how I'd control on which node this
would get run. All, or is it always the DC?

> > It's also worth pointing out that this could likely "lose" events during
> > fail-overs, DC crashes, etc. Users probably should not strictly rely on
> > seeing *every* alert in their scripts, so this should be carefully
> > documented to not be considered a transactional, reliable message bus.
> Proper documentation is anyway still missing.
> Thanks for that input.

Thanks, I didn't mean to complain about this. This was actually
triggered by a recent experience "elsewhere" where someone tried to
build a reliable system on top of such notifications - and then some
were getting lost due to timing ... Best to immediately clarify what the
guarantees on this are ;-)



-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde




Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts

2016-04-28 Thread renayama19661014
Hi Klaus,

Because the scripts are executed asynchronously, I think it is difficult to
set "uptime" correctly with the method in the sample.
After all, we may need to request ordered delivery.
#The earlier patch of mine only controls the execution order of the async
#calls and does not put load on crmd.

Japan begins a one-week holiday tomorrow.
I will discuss this with the team after the vacation.

Best Regards,
Hideo Yamauchi.



- Original Message -
> From: Klaus Wenninger 
> To: users@clusterlabs.org
> Cc: 
> Date: 2016/4/28, Thu 03:14
> Subject: Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts
> 
> On 04/27/2016 04:19 PM, renayama19661...@ybb.ne.jp wrote:
>>  Hi All,
>> 
>>  We have a request for a new SNMP function.
>> 
>> 
>>  The order of traps is not right.
>> 
>>  The order of the traps is sometimes not preserved.
>>  This is because the notification handling executes "path" asynchronously.
>>  I think that it is necessary to wait for completion of the execution per
>>  "path" unit of "alerts".
>>   
>>  The order of the traps differs from the real stop order of the resources.
> Writing the alerts to a local list and having the alert scripts called
> in a serialized manner would lead to the snmptrap tool creating
> timestamps in the order of the occurrence of the alerts.
> Having the SNMP manager order the traps by timestamp would then indeed
> show them in the order they had occurred.
> 
> But this approach has a number of drawbacks:
> 
> - it works only when the traps are coming from one node, as there is
>   no way to serialize over nodes - at least none that would work under
>   all circumstances in which we want alerts to be delivered
> 
> - it distorts the timestamps even further from the points in time when
>   the alert had been triggered - making the result in a multi-node
>   scenario even worse and making it hard to correlate with other
>   sources of information like logfiles
> 
> - if you imagine a scenario with multiple mechanisms of delivering an
>   alert plus multiple recipients, we couldn't use a single list but
>   would need something more complicated to prevent unneeded delays,
>   e.g. delays coming from one of the delivery methods not working
>   properly due to a recipient that is not reachable, ...
>   (all solvable of course, but if it doesn't solve your problem in the
>   first place, why the effort)
> 
> The alternative approach taken doesn't create the timestamps in the
> scripts, but provides timestamps to the scripts instead.
> This way it doesn't matter if the execution of the script is delayed.
> 
> 
> A short example how this approach could be used with snmp-traps:
> 
> edit pcmk_snmp_helper.sh:
> 
> ...
> starttickfile="/var/run/starttick"
> 
> # hack to have a reference
> # can have it e.g. in an attribute to be visible throughout the cluster
> if [ ! -f ${starttickfile} ] ; then
>         echo ${CRM_alert_timestamp} > ${starttickfile}
> fi
> 
> starttick=`cat ${starttickfile}`
> # shell arithmetic; a plain `eval` here would not compute the difference
> ticks=$(( CRM_alert_timestamp - starttick ))
> 
> if [[ ${CRM_alert_rc} != 0 && ${CRM_alert_task} == "monitor" ]] || [[
> ${CRM_alert_task} != "monitor" ]] ; then
>     # This trap is compliant with PACEMAKER MIB
>     # 
> https://github.com/ClusterLabs/pacemaker/blob/master/extra/PCMK-MIB.txt
>     /usr/bin/snmptrap -v 2c -c public ${CRM_alert_recipient} ${ticks}
> PACEMAKER-MIB::pacemakerNotificationTrap \
>         PACEMAKER-MIB::pacemakerNotificationNode s "${CRM_alert_node}" 
> \
>         PACEMAKER-MIB::pacemakerNotificationResource s 
> "${CRM_alert_rsc}" \
>         PACEMAKER-MIB::pacemakerNotificationOperation s
> "${CRM_alert_task}" \
>         PACEMAKER-MIB::pacemakerNotificationDescription s
> "${CRM_alert_desc}" \
>         PACEMAKER-MIB::pacemakerNotificationStatus i 
> "${CRM_alert_status}" \
>         PACEMAKER-MIB::pacemakerNotificationReturnCode i ${CRM_alert_rc} \
>         PACEMAKER-MIB::pacemakerNotificationTargetReturnCode i
> ${CRM_alert_target_rc} && exit 0 || exit 1
> fi
> 
> exit 0
> ...
> 
> add a section to the cib:
> 
> cibadmin --create --xml-text '<alerts> <alert id="snmp_traps"
> path="/usr/share/pacemaker/tests/pcmk_snmp_helper.sh">
> <meta_attributes id="snmp_traps-meta"> <nvpair id="snmp_timestamp"
> name="tstamp_format" value="%s%02N"/> </meta_attributes>
> <recipient id="trap_destination" value="192.168.123.3"/>
> </alert> </alerts>'
> 
> 
> This should solve the issue of correct order after being sorted by
> timestamps
> without having the ugly side-effects as described above.
> 
> I hope I understood your scenario correctly and that this small example
> shows roughly how I would suggest coping with the issue.
> 
> Regards,
> Klaus  
>> 
>>  
>>  [root@rh72-01 ~]# grep Operation  /var/log/ha-log | grep stop
>>  Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy1_stop_0: 
> ok (node=rh72-01, call=33, rc=0, cib-update=56, confirmed=true)
>>  Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy3_stop_0: 
> ok (node=rh72-01, call=37, rc=0, cib-update=57, 
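An editor's sketch of the tick arithmetic in the quoted helper, with made-up sample values (not from the thread): with tstamp_format "%s%02N" the timestamp is seconds plus centiseconds, i.e. already in SNMP TimeTicks (1/100 s) units, so a plain subtraction yields the ticks since the reference point. Note that POSIX arithmetic expansion is required; the original `eval` line computes nothing.

```shell
#!/bin/sh
# Assumed sample values, purely for illustration:
CRM_alert_timestamp=146193000042   # as delivered with tstamp_format "%s%02N"
starttick=146192999900             # assumed contents of /var/run/starttick
# arithmetic expansion instead of `eval`:
ticks=$(( CRM_alert_timestamp - starttick ))
echo "$ticks"   # 142 ticks = 1.42 s after the reference point
```

The resulting value can be passed to snmptrap as the sysUpTime argument, which is what lets the SNMP manager sort the traps correctly.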

Re: [ClusterLabs] Antw: Re: Coming in 1.1.15: Event-driven alerts

2016-04-28 Thread Klaus Wenninger
On 04/28/2016 08:33 AM, Ulrich Windl wrote:
> Hi!
>
> I wonder: would passing the CIB generation (like 1.6.122) or a (local?) event 
> sequence number to the notification script (SNMP trap) help?

CRM_alert_node_sequence is there already, but as the name says it is just
a reference within one node, and in the SNMP case you would have to feed
it into snmptrap somehow...

The CIB generation would be something cluster-wide, but only in the case
that the cluster nodes are seeing each other at the moment. Alerts are
often especially interesting during the periods of time when that is not
the case. But definitely something to think about...
And again, it is something abstract that an alert-collection tool
wouldn't know about and thus would probably refuse to sort by that value.
Some kind of time is probably something you'll find support for more
easily.

Regards,
Klaus
>
> Regards,
> Ulrich
>
>>>> Klaus Wenninger  wrote on 27.04.2016 at 20:14 in
> message <57210183.6050...@redhat.com>:
>> On 04/27/2016 04:19 PM, renayama19661...@ybb.ne.jp wrote:
>>> Hi All,
>>>
>>> We have a request for a new SNMP function.
>>>
>>>
>>> The order of traps is not right.
>>>
>>> The order of the traps is sometimes not preserved.
>>> This is because the notification handling executes "path" asynchronously.
>>> I think that it is necessary to wait for completion of the execution per
>>> "path" unit of "alerts".
>>>  
>>> The order of the traps differs from the real stop order of the resources.
>> Writing the alerts to a local list and having the alert scripts called
>> in a serialized manner would lead to the snmptrap tool creating
>> timestamps in the order of the occurrence of the alerts.
>> Having the SNMP manager order the traps by timestamp would then indeed
>> show them in the order they had occurred.
>>
>> But this approach has a number of drawbacks:
>>
>> - it works only when the traps are coming from one node, as there is
>>   no way to serialize over nodes - at least none that would work under
>>   all circumstances in which we want alerts to be delivered
>>
>> - it distorts the timestamps even further from the points in time when
>>   the alert had been triggered - making the result in a multi-node
>>   scenario even worse and making it hard to correlate with other
>>   sources of information like logfiles
>>
>> - if you imagine a scenario with multiple mechanisms of delivering an
>>   alert plus multiple recipients, we couldn't use a single list but
>>   would need something more complicated to prevent unneeded delays,
>>   e.g. delays coming from one of the delivery methods not working
>>   properly due to a recipient that is not reachable, ...
>>   (all solvable of course, but if it doesn't solve your problem in the
>>   first place, why the effort)
>>
>> The alternative approach taken doesn't create the timestamps in the
>> scripts, but provides timestamps to the scripts instead.
>> This way it doesn't matter if the execution of the script is delayed.
>>
>>
>> A short example how this approach could be used with snmp-traps:
>>
>> edit pcmk_snmp_helper.sh:
>>
>> ...
>> starttickfile="/var/run/starttick"
>>
>> # hack to have a reference
>> # can have it e.g. in an attribute to be visible throughout the cluster
>> if [ ! -f ${starttickfile} ] ; then
>> echo ${CRM_alert_timestamp} > ${starttickfile}
>> fi
>>
>> starttick=`cat ${starttickfile}`
>> # shell arithmetic; a plain `eval` here would not compute the difference
>> ticks=$(( CRM_alert_timestamp - starttick ))
>>
>> if [[ ${CRM_alert_rc} != 0 && ${CRM_alert_task} == "monitor" ]] || [[
>> ${CRM_alert_task} != "monitor" ]] ; then
>> # This trap is compliant with PACEMAKER MIB
>> # 
>> https://github.com/ClusterLabs/pacemaker/blob/master/extra/PCMK-MIB.txt 
>> /usr/bin/snmptrap -v 2c -c public ${CRM_alert_recipient} ${ticks}
>> PACEMAKER-MIB::pacemakerNotificationTrap \
>> PACEMAKER-MIB::pacemakerNotificationNode s "${CRM_alert_node}" \
>> PACEMAKER-MIB::pacemakerNotificationResource s "${CRM_alert_rsc}" \
>> PACEMAKER-MIB::pacemakerNotificationOperation s
>> "${CRM_alert_task}" \
>> PACEMAKER-MIB::pacemakerNotificationDescription s
>> "${CRM_alert_desc}" \
>> PACEMAKER-MIB::pacemakerNotificationStatus i "${CRM_alert_status}" \
>> PACEMAKER-MIB::pacemakerNotificationReturnCode i ${CRM_alert_rc} \
>> PACEMAKER-MIB::pacemakerNotificationTargetReturnCode i
>> ${CRM_alert_target_rc} && exit 0 || exit 1
>> fi
>>
>> exit 0
>> ...
>>
>> add a section to the cib:
>>
>> cibadmin --create --xml-text '<alerts> <alert id="snmp_traps"
>> path="/usr/share/pacemaker/tests/pcmk_snmp_helper.sh">
>> <meta_attributes id="snmp_traps-meta"> <nvpair id="snmp_timestamp"
>> name="tstamp_format" value="%s%02N"/> </meta_attributes> <recipient
>> id="trap_destination" value="192.168.123.3"/> </alert> </alerts>
>> '
>>
>>
>> This should solve the issue of correct order after being sorted by
>> timestamps
>> without having the ugly side-effects as described above.
>>
>> I hope I understood your scenario correctly and this small example
>> points out how I roughly would suggest to 

Re: [ClusterLabs] [ClusterLab] : Unable to bring up pacemaker

2016-04-28 Thread Sriram
Thanks Ken and Emmanuel.
It's a big-endian machine. I will try running "pcs cluster setup" and
"pcs cluster start".
Inside cluster.py, "service pacemaker start" and "service corosync start"
are executed to bring up pacemaker and corosync.
Those service scripts, and the infrastructure needed to bring up the
processes in that manner, don't exist on my board.
As it is an embedded board with limited memory, a full-fledged Linux is
not installed.
I am just curious what could be the reason pacemaker throws this error:

"cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 1s"

Thanks for the response.

Regards,
Sriram.

On Thu, Apr 28, 2016 at 8:55 AM, Ken Gaillot  wrote:

> On 04/27/2016 11:25 AM, emmanuel segura wrote:
> > you need to use pcs to do everything, pcs cluster setup and pcs
> > cluster start, try to use the redhat docs for more information.
>
> Agreed -- pcs cluster setup will create a proper corosync.conf for you.
> Your corosync.conf below uses corosync 1 syntax, and there were
> significant changes in corosync 2. In particular, you don't need the
> file created in step 4, because pacemaker is no longer launched via a
> corosync plugin.
>
> > 2016-04-27 17:28 GMT+02:00 Sriram :
> >> Dear All,
> >>
> >> I'm trying to use pacemaker and corosync for the clustering
> >> requirement that came up recently.
> >> We have cross-compiled corosync, pacemaker and pcs (python) for the ppc
> >> environment (the target board where pacemaker and corosync are
> >> supposed to run).
> >> I'm having trouble bringing up pacemaker in that environment, though
> >> I could successfully bring up corosync.
> >> Any help is welcome.
> >>
> >> I'm using these versions of pacemaker and corosync
> >> [root@node_cu pacemaker]# corosync -v
> >> Corosync Cluster Engine, version '2.3.5'
> >> Copyright (c) 2006-2009 Red Hat, Inc.
> >> [root@node_cu pacemaker]# pacemakerd -$
> >> Pacemaker 1.1.14
> >> Written by Andrew Beekhof
> >>
> >> For running corosync, I did the following.
> >> 1. Created the following directories,
> >> /var/lib/pacemaker
> >> /var/lib/corosync
> >> /var/lib/pacemaker
> >> /var/lib/pacemaker/cores
> >> /var/lib/pacemaker/pengine
> >> /var/lib/pacemaker/blackbox
> >> /var/lib/pacemaker/cib
> >>
> >>
> >> 2. Created a file called corosync.conf under /etc/corosync folder with
> the
> >> following contents
> >>
> >> totem {
> >>
> >> version: 2
> >> token:  5000
> >> token_retransmits_before_loss_const: 20
> >> join:   1000
> >> consensus:  7500
> >> vsftype:none
> >> max_messages:   20
> >> secauth:off
> >> cluster_name:   mycluster
> >> transport:  udpu
> >> threads:0
> >> clear_node_high_bit: yes
> >>
> >> interface {
> >> ringnumber: 0
> >> # The following three values need to be set based on
> your
> >> environment
> >> bindnetaddr: 10.x.x.x
> >> mcastaddr: 226.94.1.1
> >> mcastport: 5405
> >> }
> >>  }
> >>
> >>  logging {
> >> fileline: off
> >> to_syslog: yes
> >> to_stderr: no
> >> to_syslog: yes
> >> logfile: /var/log/corosync.log
> >> syslog_facility: daemon
> >> debug: on
> >> timestamp: on
> >>  }
> >>
> >>  amf {
> >> mode: disabled
> >>  }
> >>
> >>  quorum {
> >> provider: corosync_votequorum
> >>  }
> >>
> >> nodelist {
> >>   node {
> >> ring0_addr: node_cu
> >> nodeid: 1
> >>}
> >> }
> >>
> >> 3.  Created authkey under /etc/corosync
> >>
> >> 4.  Created a file called pcmk under /etc/corosync/service.d and
> contents as
> >> below,
> >>   cat pcmk
> >>   service {
> >>  # Load the Pacemaker Cluster Resource Manager
> >>  name: pacemaker
> >>  ver:  1
> >>   }
> >>
> >> 5. Added the node name "node_cu" in /etc/hosts with 10.X.X.X ip
> >>
> >> 6. ./corosync -f -p & --> this step started corosync
> >>
> >> [root@node_cu pacemaker]# netstat -alpn | grep -i coros
> >> udp0  0 10.X.X.X:61841 0.0.0.0:*
> >> 9133/corosync
> >> udp0  0 10.X.X.X:5405  0.0.0.0:*
> >> 9133/corosync
> >> unix  2  [ ACC ] STREAM LISTENING 14 9133/corosync
> >> @quorum
> >> unix  2  [ ACC ] STREAM LISTENING 148884 9133/corosync
> >> @cmap
> >> unix  2  [ ACC ] STREAM LISTENING 148887 9133/corosync
> >> @votequorum
> >> unix  2  [ ACC ] STREAM LISTENING 148885 9133/corosync
> >> @cfg
> >> unix  2  [ ACC ] STREAM LISTENING 148886 9133/corosync
> >> @cpg
> >> unix  2  [ ] DGRAM148840 9133/corosync
> >>
> >> 7. ./pacemakerd -f & gives the following error and exits.
> >> [root@node_cu pacemaker]# pacemakerd -f
> >> cmap 

Re: [ClusterLabs] Antw: Re: Coming in 1.1.15: Event-driven alerts

2016-04-28 Thread Kristoffer Grönlund
Ulrich Windl  writes:

> IMHO it reads too much like the XML (i.e. nobody understands it unless he
> knows the meaning of the XML behind).

Do you have any suggestion as to how that could be improved?

So far we've tried not to stray too far from the XML with the crmsh
syntax, but if there is some other option that would make more sense I
would certainly consider it.

I wouldn't expect anyone to learn the XML to understand the crmsh
syntax, but I would expect them to need some form of documentation, be
it the built-in help, the product documentation or an online guide of
some kind.

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



[ClusterLabs] Antw: Re: Coming in 1.1.15: Event-driven alerts

2016-04-28 Thread Ulrich Windl
Hi!

I wonder: would passing the CIB generation (like 1.6.122) or a (local?) event 
sequence number to the notification script (SNMP trap) help?

Regards,
Ulrich

>>> Klaus Wenninger  wrote on 27.04.2016 at 20:14 in
message <57210183.6050...@redhat.com>:
> On 04/27/2016 04:19 PM, renayama19661...@ybb.ne.jp wrote:
>> Hi All,
>>
>> We have a request for a new SNMP function.
>>
>>
>> The order of traps is not right.
>>
>> The order of the traps is sometimes not preserved.
>> This is because the notification handling executes "path" asynchronously.
>> I think that it is necessary to wait for completion of the execution per
>> "path" unit of "alerts".
>>  
>> The order of the traps differs from the real stop order of the resources.
> Writing the alerts to a local list and having the alert scripts called
> in a serialized manner would lead to the snmptrap tool creating
> timestamps in the order of the occurrence of the alerts.
> Having the SNMP manager order the traps by timestamp would then indeed
> show them in the order they had occurred.
> 
> But this approach has a number of drawbacks:
> 
> - it works only when the traps are coming from one node, as there is
>   no way to serialize over nodes - at least none that would work under
>   all circumstances in which we want alerts to be delivered
> 
> - it distorts the timestamps even further from the points in time when
>   the alert had been triggered - making the result in a multi-node
>   scenario even worse and making it hard to correlate with other
>   sources of information like logfiles
> 
> - if you imagine a scenario with multiple mechanisms of delivering an
>   alert plus multiple recipients, we couldn't use a single list but
>   would need something more complicated to prevent unneeded delays,
>   e.g. delays coming from one of the delivery methods not working
>   properly due to a recipient that is not reachable, ...
>   (all solvable of course, but if it doesn't solve your problem in the
>   first place, why the effort)
> 
> The alternative approach taken doesn't create the timestamps in the
> scripts, but provides timestamps to the scripts instead.
> This way it doesn't matter if the execution of the script is delayed.
> 
> 
> A short example how this approach could be used with snmp-traps:
> 
> edit pcmk_snmp_helper.sh:
> 
> ...
> starttickfile="/var/run/starttick"
> 
> # hack to have a reference
> # can have it e.g. in an attribute to be visible throughout the cluster
> if [ ! -f ${starttickfile} ] ; then
> echo ${CRM_alert_timestamp} > ${starttickfile}
> fi
> 
> starttick=`cat ${starttickfile}`
> # shell arithmetic; a plain `eval` here would not compute the difference
> ticks=$(( CRM_alert_timestamp - starttick ))
> 
> if [[ ${CRM_alert_rc} != 0 && ${CRM_alert_task} == "monitor" ]] || [[
> ${CRM_alert_task} != "monitor" ]] ; then
> # This trap is compliant with PACEMAKER MIB
> # 
> https://github.com/ClusterLabs/pacemaker/blob/master/extra/PCMK-MIB.txt 
> /usr/bin/snmptrap -v 2c -c public ${CRM_alert_recipient} ${ticks}
> PACEMAKER-MIB::pacemakerNotificationTrap \
> PACEMAKER-MIB::pacemakerNotificationNode s "${CRM_alert_node}" \
> PACEMAKER-MIB::pacemakerNotificationResource s "${CRM_alert_rsc}" \
> PACEMAKER-MIB::pacemakerNotificationOperation s
> "${CRM_alert_task}" \
> PACEMAKER-MIB::pacemakerNotificationDescription s
> "${CRM_alert_desc}" \
> PACEMAKER-MIB::pacemakerNotificationStatus i "${CRM_alert_status}" \
> PACEMAKER-MIB::pacemakerNotificationReturnCode i ${CRM_alert_rc} \
> PACEMAKER-MIB::pacemakerNotificationTargetReturnCode i
> ${CRM_alert_target_rc} && exit 0 || exit 1
> fi
> 
> exit 0
> ...
> 
> add a section to the cib:
> 
> cibadmin --create --xml-text '<alerts> <alert id="snmp_traps"
> path="/usr/share/pacemaker/tests/pcmk_snmp_helper.sh">
> <meta_attributes id="snmp_traps-meta"> <nvpair id="snmp_timestamp"
> name="tstamp_format" value="%s%02N"/> </meta_attributes> <recipient
> id="trap_destination" value="192.168.123.3"/> </alert> </alerts>
> '
> 
> 
> This should solve the issue of correct order after being sorted by
> timestamps
> without having the ugly side-effects as described above.
> 
> I hope I understood your scenario correctly and that this small example
> shows roughly how I would suggest coping with the issue.
> 
> Regards,
> Klaus  
>>
>> 
>> [root@rh72-01 ~]# grep Operation  /var/log/ha-log | grep stop
>> Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy1_stop_0: ok 
> (node=rh72-01, call=33, rc=0, cib-update=56, confirmed=true)
>> Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy3_stop_0: ok 
> (node=rh72-01, call=37, rc=0, cib-update=57, confirmed=true)
>> Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy4_stop_0: ok 
> (node=rh72-01, call=39, rc=0, cib-update=58, confirmed=true)
>> Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy2_stop_0: ok 
> (node=rh72-01, call=35, rc=0, cib-update=59, confirmed=true)
>> Apr 25 18:48:48 rh72-01 crmd[28897]:  notice: Operation prmDummy5_stop_0: ok 
> (node=rh72-01, call=41, rc=0, cib-update=60, 

Re: [ClusterLabs] Monitoring action of Pacemaker resources fail because of high load on the nodes

2016-04-28 Thread Dimitri Maziuk
On 04/27/2016 11:01 AM, Dejan Muhamedagic wrote:

>> ... I failed to
>> convince Andrew that wget'ing http://localhost/server-status/ is a
>> wrong thing to do in the first place (apache RA).
> 
> I'm not sure why would it be wrong, but neither can I vouch that
> there's no better way to do a basic apache functionality test. At
> any rate, the test URL can be defined using a parameter.

Define "basic apache functionality".

If the goal is to see that httpd is answering, an HTTP 404 or 302 is
just as good as a 200 OK; a failure is a connection timeout or a TCP
RST. If that is the case with the current version of the RA -- I didn't
look -- then using http://floating_ip/ for the test URL should be good
enough. Certainly way better than the default of the normally disabled
/server-status @ 127.0.0.1.

If you wanted to further shave off a bit of the load, you could assume
that if it's listening, it's answering. That could be the "lightweight"
check, if there's an easy way to get this out of /proc or something.
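A hedged sketch of such a "listening-only" check (invented for illustration, not part of the apache RA): it only verifies that something accepts a TCP connection on the port, without issuing an HTTP request. /dev/tcp is a bash feature, hence the explicit bash -c; the port is just an example.

```shell
#!/bin/sh
# Succeed if anything accepts a TCP connection on the given port.
check_listen() {
    timeout 1 bash -c "exec 3<>/dev/tcp/127.0.0.1/$1" 2>/dev/null \
        && echo "listening" || echo "not listening"
}
state=$(check_listen "${1:-80}")
echo "port ${1:-80}: $state"
```

Whether "accepts connections" is close enough to "is answering" is exactly the trade-off discussed above.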

(As I recall what prompted that back then was that at the time Andrew's
Cluster from Scratch failed to mention that you need to install
wget/curl and enable /server-status in the first place.)

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ClusterLabs] [ClusterLab] : Unable to bring up pacemaker

2016-04-28 Thread Klaus Wenninger
On 04/27/2016 05:28 PM, Sriram wrote:
> Dear All,
>
> I'm trying to use pacemaker and corosync for the clustering
> requirement that came up recently.
> We have cross-compiled corosync, pacemaker and pcs (python) for the ppc
> environment (the target board where pacemaker and corosync are supposed
> to run).

little or big endian?

> I'm having trouble bringing up pacemaker in that environment, though I
> could successfully bring up corosync.
> Any help is welcome.
>
> I'm using these versions of pacemaker and corosync
> [root@node_cu pacemaker]# corosync -v
> *Corosync Cluster Engine, version '2.3.5'*
> Copyright (c) 2006-2009 Red Hat, Inc.
> [root@node_cu pacemaker]# pacemakerd -$
> *Pacemaker 1.1.14
> Written by Andrew Beekhof*
>
> For running corosync, I did the following.
> 1. Created the following directories,
> /var/lib/pacemaker
> /var/lib/corosync
> /var/lib/pacemaker
> /var/lib/pacemaker/cores
> /var/lib/pacemaker/pengine
> /var/lib/pacemaker/blackbox
> /var/lib/pacemaker/cib
>
>
> 2. Created a file called corosync.conf under /etc/corosync folder with
> the following contents
>
> totem {
>
> version: 2
> token:  5000
> token_retransmits_before_loss_const: 20
> join:   1000
> consensus:  7500
> vsftype:none
> max_messages:   20
> secauth:off
> cluster_name:   mycluster
> transport:  udpu
> threads:0
> clear_node_high_bit: yes
>
> interface {
> ringnumber: 0
> # The following three values need to be set based on
> your environment
> bindnetaddr: 10.x.x.x
> mcastaddr: 226.94.1.1
> mcastport: 5405
> }
>  }
>
>  logging {
> fileline: off
> to_syslog: yes
> to_stderr: no
> to_syslog: yes
> logfile: /var/log/corosync.log
> syslog_facility: daemon
> debug: on
> timestamp: on
>  }
>
>  amf {
> mode: disabled
>  }
>
>  quorum {
> provider: corosync_votequorum
>  }
>
> nodelist {
>   node {
> ring0_addr: node_cu
> nodeid: 1
>}
> }
>
> 3.  Created authkey under /etc/corosync
>
> 4.  Created a file called pcmk under /etc/corosync/service.d and
> contents as below,
>   cat pcmk
>   service {
>  # Load the Pacemaker Cluster Resource Manager
>  name: pacemaker
>  ver:  1
>   }
>
> 5. Added the node name "node_cu" in /etc/hosts with 10.X.X.X ip
>
> 6. ./corosync -f -p & --> this step started corosync
>
> [root@node_cu pacemaker]# netstat -alpn | grep -i coros
> udp0  0 10.X.X.X:61841
> 0.0.0.0:*   9133/corosync
> udp0  0 10.X.X.X:5405 
> 0.0.0.0:*   9133/corosync
> unix  2  [ ACC ] STREAM LISTENING 14
> 9133/corosync   @quorum
> unix  2  [ ACC ] STREAM LISTENING 148884
> 9133/corosync   @cmap
> unix  2  [ ACC ] STREAM LISTENING 148887
> 9133/corosync   @votequorum
> unix  2  [ ACC ] STREAM LISTENING 148885
> 9133/corosync   @cfg
> unix  2  [ ACC ] STREAM LISTENING 148886
> 9133/corosync   @cpg
> unix  2  [ ] DGRAM148840 9133/corosync
>
> 7. ./pacemakerd -f & gives the following error and exits.
> [root@node_cu pacemaker]# pacemakerd -f
> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 1s
> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 2s
> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 3s
> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 4s
> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 5s
> Could not connect to Cluster Configuration Database API, error 6
>
> Can you please point me, what is missing in these steps ?
>
> Before trying these steps, I tried running "pcs cluster start", but
> that command fails because the "service" script is not found, as the
> root filesystem contains neither /etc/init.d/ nor /sbin/service.
>
> So, the plan is to bring up corosync and pacemaker manually, later do
> the cluster configuration using "pcs" commands.
>
> Regards,
> Sriram
>
>

