Re: [Linux-ha-dev] [ClusterLabs Developers] moving cluster-glue to github

2016-10-10 Thread Dejan Muhamedagic
Hi,

On Mon, Oct 10, 2016 at 12:07:48PM +0200, Kristoffer Grönlund wrote:
> Adam Spiers  writes:
> 
> > Kristoffer Gronlund  wrote:
> >> We've discussed moving cluster-glue to github.com and the ClusterLabs
> >> organization, but no one has actually done it yet. ;)
> >
> > Out of curiosity what needs to be done for this, other than the
> > obvious "git push" to github, and maybe updating a README / wiki page
> > or two?
> >
> 
> The main thing would be to ensure that everyone who maintains it agrees
> to the move. AFAIK at least Lars Ellenberg and Dejan are both in favor,
> but I am not sure who else might be considered an owner of
> cluster-glue.
> 
> Cc:ing the Linux HA development list as well.
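
For what it's worth, the mechanical part of such a move could look
roughly like this -- a sketch assuming the hg-fast-export tool
(https://github.com/frej/fast-export); the repository paths are
placeholders, and this is not a procedure agreed in the thread:

    git init cluster-glue && cd cluster-glue
    hg-fast-export.sh -r /path/to/hg/cluster-glue  # convert hg history to git
    git checkout HEAD                              # populate the working tree
    git remote add origin git@github.com:ClusterLabs/cluster-glue.git
    git push --all origin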

Lars (aka lge), if you don't see any obstacles, shall we do this?

Cheers,

Dejan

> -- 
> // Kristoffer Grönlund
> // kgronl...@suse.com
> 
> ___
> Developers mailing list
> develop...@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/developers
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Some minor patches for cluster-glue

2016-08-31 Thread Dejan Muhamedagic
On Tue, Aug 30, 2016 at 08:02:26PM +0200, Kristoffer Grönlund wrote:
> Lars Ellenberg  writes:
> 
> > I think what Dejan was expecting is the result of
> > "hg export", which should look more like
> >
> > # HG changeset patch
> > # User Lars Ellenberg 
> > # Date 1413480257 -7200
> > #  Thu Oct 16 19:24:17 2014 +0200
> > # Node ID 0a7add1d9996b6d869d441da6c82fb7b8abcef4f
> > # Parent  f2227d4971baed13958306b2c7cabec0eda93e82
> > fix syslogmsgfmt logging inconsistency for stderr/stdout
> > ...
> >
> > not the output of "hg log -v -p",
> > which looks like what you sent.
> >
> > Though the formats are very similar,
> > and could possibly even be massaged by hand,
> > hg import is best used with the output created by hg export.
> > Or send Dejan an hg bundle which he can then unbundle.
> 
> Hmm, the patches I sent this time were produced by "hg export".
> 
> Maybe it's a matter of mercurial configuration? git has pushed all
> memories of mercurial off the top of my mental stack. :/

Similar here. I was surprised that hg import would always put in my
name etc.; I looked hard for some option to accept another
format, but found nothing.
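
For reference, a minimal illustration of the difference (the patch
file name here is made up):

    hg export -r tip > fix.patch  # writes the "# HG changeset patch" header
    hg import fix.patch           # so the original # User / # Date survive

    hg log -v -p -r tip           # similar-looking diff, but without the
                                  # metadata header, so hg import would
                                  # record the importer as the author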

Maybe we should move the glue and heartbeat to github/clusterlabs
too?

Cheers,

Dejan

> 
> Cheers,
> Kristoffer
> 
> -- 
> // Kristoffer Grönlund
> // kgronl...@suse.com
> ___
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Some minor patches for cluster-glue

2016-08-30 Thread Dejan Muhamedagic
On Fri, Aug 12, 2016 at 07:13:32AM +0200, Kristoffer Grönlund wrote:
> Dejan Muhamedagic <deja...@fastmail.fm> writes:
> 
> > Hi Kristoffer,
> >
> > On Wed, Aug 10, 2016 at 12:32:48PM +0200, Kristoffer Grönlund wrote:
> >> 
> >> Hi everyone (Lars and Dejan in particular),
> >> 
> >> Here are some minor patches for cluster-glue. The first one is an
> >> attempt to get the stonith man page somewhat up to date, and the other
> >> two are minor issues discovered when compiling cluster-glue using GCC
> >> 6.
> >
> > Pushed just now the man page patch which was pending in my queue.
> > Will apply the other two too.
> >
> > Thanks for the contribution!
> 
> Excellent, thank you!

Apparently, the patches as they are cannot be imported with "hg
import", i.e. the metadata gets lost. Did you use "hg export"? Can
you supply them as "hg export" output?

Cheers,

Dejan

> Cheers,
> Kristoffer
> 
> >
> > Cheers,
> >
> > Dejan
> >
> >> 
> >> Cheers,
> >> Kristoffer
> >> 
> >> -- 
> >> // Kristoffer Grönlund
> >> // kgronl...@suse.com
> >> 

Re: [Linux-ha-dev] Some minor patches for cluster-glue

2016-08-11 Thread Dejan Muhamedagic
Hi Kristoffer,

On Wed, Aug 10, 2016 at 12:32:48PM +0200, Kristoffer Grönlund wrote:
> 
> Hi everyone (Lars and Dejan in particular),
> 
> Here are some minor patches for cluster-glue. The first one is an
> attempt to get the stonith man page somewhat up to date, and the other
> two are minor issues discovered when compiling cluster-glue using GCC
> 6.

Pushed just now the man page patch which was pending in my queue.
Will apply the other two too.

Thanks for the contribution!

Cheers,

Dejan

> 
> Cheers,
> Kristoffer
> 
> -- 
> // Kristoffer Grönlund
> // kgronl...@suse.com
> 



> changeset:   2820:13875518ed6b
> parent:  2815:643ac28499bd
> user:Kristoffer Grönlund 
> date:Wed Aug 10 12:13:13 2016 +0200
> files:   doc/stonith.xml.in
> description:
> Low: stonith: Update man page with -E, -m parameters (bsc#970307)
> 
> 
> diff --git a/doc/stonith.xml.in b/doc/stonith.xml.in
> --- a/doc/stonith.xml.in
> +++ b/doc/stonith.xml.in
> @@ -7,22 +7,28 @@
>  @VERSION@
>  
>
> - Alan
> - Robertson
> - stonith
> - al...@unix.sh
> +Alan
> +Robertson
> +stonith
> +al...@unix.sh
>
>
> - Simon
> - Horman
> - man page
> - ho...@vergenet.net
> +Simon
> +Horman
> +man page
> +ho...@vergenet.net
>
>
> - Florian
> - Haas
> - man page
> - florian.h...@linbit.com
> +Florian
> +Haas
> +man page
> +florian.h...@linbit.com
> +  
> +  
> +Kristoffer
> +Gronlund
> +man page
> +kgronl...@suse.com
>
>  
>
> @@ -44,12 +50,14 @@
>  
>stonith
>-s
> +  -v
>-h
>-L
>  
>  
>stonith
>-s
> +  -v
>-h
>-t 
> stonith-device-type
>-n
> @@ -57,14 +65,24 @@
>  
>stonith
>-s
> +  -v
> +  -h
> +  -t 
> stonith-device-type
> +  -m
> +
> +
> +  stonith
> +  -s
> +  -v
>-h
>-t 
> stonith-device-type
>
> - 
> -choice="plain">name=value
> - 
> - -p 
> stonith-device-parameters
> - -F 
> stonith-device-parameters-file
> +
> +   choice="plain">name=value
> +
> +-p 
> stonith-device-parameters
> +-E
> +-F 
> stonith-device-parameters-file
>
>-c 
> count
>-l
> @@ -73,22 +91,24 @@
>  
>stonith
>-s
> +  -v
>-h
>-t 
> stonith-device-type
>
> - 
> -choice="plain">name=value
> - 
> - -p 
> stonith-device-parameters
> - -F 
> stonith-device-parameters-file
> +
> +   choice="plain">name=value
> +
> +-p 
> stonith-device-parameters
> +-E
> +-F 
> stonith-device-parameters-file
>
>-c 
> count
>-T
>  
> -   reset
> -   on
> -   off
> - 
> +  reset
> +  on
> +  off
> +
>
>nodename
>  
> @@ -108,145 +128,161 @@
>  The following options are supported:
>  
>
> - 
> -   -c count
> - 
> - 
> -   Perform any actions identified by the
> -   -l, -S and
> -   -T options count
> -   times.
> - 
> -  
> -  
> - 
> -   -F 
> stonith-device-parameters-file
> - 
> - 
> -   Path of file specifying parameters for a stonith
> -   device. To determine the syntax of the parameters file for a
> -   given device type run:
> -   # stonith -t 
> stonith-device-type -n
> -   All of the listed parameters need to appear in order
> -   on a single line in the parameters file and be delimited by
> -   whitespace.
> - 
> -  
> -  
> - 
> -   -h
> - 
> - 
> -   Display detailed information about a stonith device
> -   including description, configuration information, parameters
> -   and any other related information.  When specified without a
> -   stonith-device-type, detailed information on all stonith
> -   devices is displayed.
> -   If you don't yet own a stonith device and want to know
> -   more about the ones we support, this information is likely
> -   to be helpful.
> - 
> -  
> -  
> - 
> -   -L
> - 
> - 
> -   List the valid stonith device types, suitable for
> -   passing as an argument to the -t
> -   option.
> - 
> -  
> -  
> - 
> -   -l
> - 
> - 
> -   List the hosts controlled by the stonith device.
> - 
> -  
> -  
> - 
> -   -n
> - 
> - 
> -   Output the parameter names of the stonith device.
> - 
> +
> +  -c count
> +
> +
> +  Perform any actions identified by the
> +  -l, -S and
> +  

Re: [Linux-ha-dev] [Problem] The designation of the S option seems to have a problem.

2016-05-03 Thread Dejan Muhamedagic
Hi Hideo-san,

On Mon, May 02, 2016 at 04:57:09PM +0900, renayama19661...@ybb.ne.jp wrote:
> Hi All,
> 
> The S option of hb_report does not work well.
> Mr. Kristoffer made similar modifications in hb_report of the crm shell.
> 
>  * https://github.com/ClusterLabs/crmsh/issues/137
> 
> I just request this correction in glue.

Thanks for the patch. But I think that we should deprecate
hb_report in favour of crm report, no use keeping two copies
around.
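
For reference, crm report takes hb_report-style options; the times and
destination below are only illustrative:

    crm report -f "2016-05-01 00:00" -t "2016-05-02 00:00" /tmp/cluster-report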

Cheers,

Dejan

> Best Regards,
> Hideo Yamauchi.


> ___
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [ha-wg-technical] [Pacemaker] Fw: new important message

2016-02-19 Thread Dejan Muhamedagic
Hi Serge,

On Fri, Feb 19, 2016 at 02:57:04AM +, Serge Dubrouski wrote:
> Got hacked?

Nope. My guess is that somebody with my name in their address
book caught a virus which does this kind of thing. Annoying. Perhaps
the clusterlabs.org owner could unsubscribe this one.

Cheers,

Dejan

> 
> On Thu, Feb 18, 2016, 7:53 PM Dejan Muhamedagic <bunker...@tiscali.it>
> wrote:
> 
> > Hello!
> >
> >
> >
> > *New message, please read* http://estoncamlievler76.com/leaving.php
> > <http://estoncamlievler76.com/leaving.php?u>
> >
> >
> >
> > Dejan Muhamedagic
> > ___
> > Pacemaker mailing list: pacema...@oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
> >

> ___
> ha-wg-technical mailing list
> ha-wg-techni...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/ha-wg-technical

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Call for review of undocumented parameters in resource agent meta data

2015-02-16 Thread Dejan Muhamedagic
Hi Lars,

On Thu, Feb 12, 2015 at 01:29:35AM +0100, Lars Ellenberg wrote:
 On Fri, Jan 30, 2015 at 09:52:49PM +0100, Dejan Muhamedagic wrote:
  Hello,
  
  We've tagged today (Jan 30) a new stable resource-agents release
  (3.9.6) in the upstream repository.
  
  Big thanks go to all contributors! Needless to say, without you
  this release would not be possible.
 
 Big thanks to Dejan,
 who once again finally did
 what I meant to do in late 2013 already, but simply pushed off
 for over a year (and no-one else stepped up, either...)
 
 So: Thank You.

Thanks. But your contributions, which were numerous, are
certainly appreciated.

 I just today noticed that apparently some resource agents
 accept and use parameters that are not documented in their meta data.
 
 I now came up with a bash two-liner,
 which likely still produces a lot of noise,
 because it does not take into account that some agents
 source additional helper files.
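
Such a scan could look roughly like the following reconstruction (not
Lars's actual scriptlet; it assumes the agents are installed under
/usr/lib/ocf/resource.d/heartbeat and can be run with the meta-data
action):

    for ra in /usr/lib/ocf/resource.d/heartbeat/*; do
        # parameter names referenced in the agent source
        used=$(grep -ho 'OCF_RESKEY_[A-Za-z0-9_]*' "$ra" |
            sed 's/^OCF_RESKEY_//' | sort -u)
        # parameter names declared in the agent's meta-data
        described=$("$ra" meta-data 2>/dev/null |
            sed -n 's/.*<parameter name="\([^"]*\)".*/\1/p' | sort -u)
        # used, but not described / described, but apparently not used
        comm -23 <(echo "$used") <(echo "$described") | sed "s|^|$ra -OCF_RESKEY_|"
        comm -13 <(echo "$used") <(echo "$described") | sed "s|^|$ra +OCF_RESKEY_|"
    done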

 But here is the list:
 
 --- used, but not described

This is bad and needs to be fixed.

 +++ described, but apparently not used.

Just drop?

 EvmsSCC   +OCF_RESKEY_ignore_deprecation
 Evmsd +OCF_RESKEY_ignore_deprecation
 
   ?? intentionally undocumented ??

No idea, but I doubt that anybody out there is using evms.

 IPaddr+OCF_RESKEY_iflabel

According to the history, this was never used.

 IPaddr-OCF_RESKEY_netmask

This got renamed to cidr_netmask, in an effort to make it more
consistent with IPaddr2 :) The same as what you found below.

   Not sure.
 
 
 IPaddr2   -OCF_RESKEY_netmask
 
   intentional, backward compat, quoting the agent:
 # Note: We had a version out there for a while which used
 # netmask instead of cidr_netmask. Don't remove this aliasing code!
 
 
 Please help review these:
 
 IPsrcaddr -OCF_RESKEY_ip
 IPsrcaddr +OCF_RESKEY_cidr_netmask
 IPv6addr.c-OCF_RESKEY_cidr_netmask
 IPv6addr.c-OCF_RESKEY_ipv6addr
 IPv6addr.c-OCF_RESKEY_nic
 LinuxSCSI +OCF_RESKEY_ignore_deprecation
 Squid -OCF_RESKEY_squid_confirm_trialcount
 Squid -OCF_RESKEY_squid_opts
 Squid -OCF_RESKEY_squid_suspend_trialcount
 SysInfo   -OCF_RESKEY_clone
 WAS6  -OCF_RESKEY_profileName
 apache+OCF_RESKEY_use_ipv6

This is used in http-mon.sh, sourced by apache.

 conntrackd-OCF_RESKEY_conntrackd

This one got renamed to binary, so it's OK. I can still recall
the discussion--IMO not a biggie to have the program parameter
named differently in various RAs (but at the time the other party
prevailed :)

 dnsupdate -OCF_RESKEY_opts
 dnsupdate +OCF_RESKEY_nsupdate_opts

Bug? lmb? OK, just fixed it. It should be only the latter.

 docker-OCF_RESKEY_container
 ethmonitor-OCF_RESKEY_check_level
 ethmonitor-OCF_RESKEY_multiplicator
 
 galera+OCF_RESKEY_additional_parameters
 galera+OCF_RESKEY_binary
 galera+OCF_RESKEY_client_binary
 galera+OCF_RESKEY_config
 galera+OCF_RESKEY_datadir
 galera+OCF_RESKEY_enable_creation
 galera+OCF_RESKEY_group
 galera+OCF_RESKEY_log
 galera+OCF_RESKEY_pid
 galera+OCF_RESKEY_socket
 galera+OCF_RESKEY_user
 
   Probably all bogus, it sources mysql-common.sh.
   Someone please have a more detailed look.
 
 
 iSCSILogicalUnit  +OCF_RESKEY_product_id
 iSCSILogicalUnit  +OCF_RESKEY_vendor_id
 
   false positive
 
   surprise: florian learned some wizardry back then ;-)
   for var in scsi_id scsi_sn vendor_id product_id; do
   envar=OCF_RESKEY_${var}
   if [ -n "${!envar}" ]; then
   params="${params} ${var}=${!envar}"
   fi
   done
 
   If such magic is used elsewhere,
   that could mask "used, but not documented" cases.
 
 
 iface-bridge  -OCF_RESKEY_multicast_querier
 
 !!Yep, that needs to be documented!
 
 mysql-proxy   -OCF_RESKEY_group
 mysql-proxy   -OCF_RESKEY_user
 
   Oops, apparently my magic scriptlet below needs to learn to
   ignore script comments...
 
 named -OCF_RESKEY_rootdir
 
 !!Probably a bug:
   named_rootdir is documented.
 
 
 nfsserver -OCF_RESKEY_nfs_notify_cmd
 
 !!Yep, that needs to be documented!
 
 
 nginx -OCF_RESKEY_client
 nginx +OCF_RESKEY_testclient
 !!client is used, but not documented,
 !!testclient is documented, but unused...
   Bug?

Yeah. Yet another one of the kind.

 nginx -OCF_RESKEY_nginx
 
   Bogus. Needs to be dropped from leading comment block.
 
 oracle-OCF_RESKEY_tns_admin
 
 !!Yep, that needs to be documented!

Nope. tns_admin is not used in oracle but in oralsnr, but the
two share some initialization stuff. Copy-paste issue. Will fix
that too.

 pingd

Re: [Linux-ha-dev] resource-agents 3.9.6 released

2015-02-06 Thread Dejan Muhamedagic
Hi Krzysztof,

On Fri, Feb 06, 2015 at 02:10:57PM +0100, Krzysztof Gajdemski wrote:
 Hello,
 
 30.01.2015, 21:52:49, Dejan Muhamedagic wrote:
 
  We've tagged today (Jan 30) a new stable resource-agents release
  (3.9.6) in the upstream repository.
 
 [ ... ]
 
  - new resource agents:
  clvm
  dnsupdate
  docker
  galera
  iface-bridge
  iface-vlan
  kamailio
  nfsnotify
  sg_persist
  vsftpd
  zabbixserver
 
 Just a small correction, zabbixserver (written by me in 2012) was
 introduced in release 3.9.4, and has remained virtually unchanged since
 then.

Oh, I think I noticed that too, but then failed to remove it from
the list. Thanks for the correction.

Cheers,

Dejan

 Regards,
 
   k.
 -- 
 Krzysztof Gajdemski | songo (at) debian.org.pl | KG4751-RIPE
 Registered Linux User #133457 | BLUG Registered Member #0005
 PGP key at: http://s.debian.org.pl/gpg/gpgkey * ID: 3C38979D
 I respect all of you who remain in the shadows - Snerg
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] resource-agents 3.9.6 released

2015-01-30 Thread Dejan Muhamedagic
Hello,

We've tagged today (Jan 30) a new stable resource-agents release
(3.9.6) in the upstream repository.

Big thanks go to all contributors! Needless to say, without you
this release would not be possible.

It has been almost two years since the release v3.9.5, hence the
number of changes is quite big. Still, every precaution has been
taken not to introduce regressions.

These are the most significant new features in the linux-ha set:

- new resource agents:

clvm
dnsupdate
docker
galera
iface-bridge
iface-vlan
kamailio
nfsnotify
sg_persist
vsftpd
zabbixserver

- the drbd agent was removed (it has been deprecated for quite
  some time in favour of ocf:linbit:drbd)

The full list of changes for the linux-ha RA set is available in
ChangeLog
(https://github.com/ClusterLabs/resource-agents/blob/v3.9.6/ChangeLog)

Use the agents introduced with this release with due care: they
probably haven't got a lot of field testing.

Please upgrade at the earliest opportunity.

Best,

The resource-agents maintainers
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] new release date for resource-agents release 3.9.6

2015-01-23 Thread Dejan Muhamedagic
Hello everybody,

Someone warned us that three days is too short a period to test a
release, so let's postpone the final release of resource-agents
v3.9.6 to:

Tuesday, Jan 27

Please do more testing in the meantime. The v3.9.6-rc1 packages
are available for most popular platforms:

http://download.opensuse.org/repositories/home:/dmuhamedagic:/branches:/network:/ha-clustering:/Stable

RHEL-7 and Fedora 21 are unfortunately missing, due to some
strange unresolvable-dependency issue.

Debian/Ubuntu people can use alien.
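
For instance, converting the rpm with alien could look like this (the
package file names are illustrative):

    alien --to-deb --scripts resource-agents-3.9.6-1.x86_64.rpm
    dpkg -i resource-agents_3.9.6-2_amd64.deb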

Many thanks!

The resource-agents crowd
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] announcement: schedule for resource-agents release 3.9.6

2015-01-20 Thread Dejan Muhamedagic
Hello,

On Wed, Jan 07, 2015 at 04:25:53PM +0100, Dejan Muhamedagic wrote:
 Hello,
 
 This is a tentative schedule for resource-agents v3.9.6:
 
 3.9.6-rc1: January 16.

The repository was tagged with v3.9.6-rc1 just now, a bit late
due to illness. Packages for some popular distributions (CentOS,
Fedora, openSUSE, RHEL, SLES) are (or will shortly be) available
here:

http://download.opensuse.org/repositories/home:/dmuhamedagic:/branches:/network:/ha-clustering:/Stable

The package for RedHat RHEL-7 is for some reason unresolvable;
please use the CentOS package instead--I guess it should
work just the same.

The changes since v3.9.5 are as usual available in ChangeLog.

Unfortunately, I don't have packages for Debian or Debian based
distributions, but I suspect that alien would produce something
usable. For instance, I was able to run the Filesystem ocft test
successfully on Debian 7 Wheezy (after replacing /var/run with
/run).

Please give them a whirl.

 3.9.6: January 23.

I hope that we can still meet this deadline.

Cheers,

Dejan

 Let's hope that this time the schedule will work out ;-)
 I modified the corresponding milestones at
 https://github.com/ClusterLabs/resource-agents
 
 If there's anything you think should be part of the release
 please open an issue, a pull request, or a bugzilla, as you see
 fit.
 
 If there's anything that hasn't received due attention, please
 let us know.
 
 Finally, if you can help with resolving issues consider yourself
 invited to do so. There are currently 20 issues and 35 pull
 requests still open.
 
 Cheers,
 
 Dejan
 (for the resource-agents crowd)
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] announcement: schedule for resource-agents release 3.9.6

2015-01-07 Thread Dejan Muhamedagic
Hello,

This is a tentative schedule for resource-agents v3.9.6:

3.9.6-rc1: January 16.
3.9.6: January 23.

Let's hope that this time the schedule will work out ;-)
I modified the corresponding milestones at
https://github.com/ClusterLabs/resource-agents

If there's anything you think should be part of the release
please open an issue, a pull request, or a bugzilla, as you see
fit.

If there's anything that hasn't received due attention, please
let us know.

Finally, if you can help with resolving issues consider yourself
invited to do so. There are currently 20 issues and 35 pull
requests still open.

Cheers,

Dejan
(for the resource-agents crowd)
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [Linux-HA] Announcing crmsh release 2.1.1

2014-10-29 Thread Dejan Muhamedagic
Hi Kristoffer,

On Wed, Oct 29, 2014 at 12:33:55AM +0100, Kristoffer Grönlund wrote:
 
 Today we are proud to announce the release of `crmsh` version 2.1.1!
 This version primarily fixes all known issues found since the release
 of `crmsh` 2.1 in June. We recommend that all users of crmsh upgrade
 to this version, especially if using Pacemaker 1.1.12 or newer.
 
 A massive thank you to everyone who has helped out with bug fixes,
 comments and contributions for this release!

Many thanks for the effort and diligence you put into making
crmsh always better. Great work!

Cheers,

Dejan

 For a complete list of changes since the previous version, please
 refer to the changelog:
 
 * https://github.com/crmsh/crmsh/blob/2.1.1/ChangeLog
 
 Packages for several popular Linux distributions can be downloaded
 from the Stable repository at the OBS:
 
 * http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/
 
 Archives of the tagged release:
 
 * https://github.com/crmsh/crmsh/archive/2.1.1.tar.gz
 * https://github.com/crmsh/crmsh/archive/2.1.1.zip
 
 Changes since the previous release:
 
  - cibconfig: Clean up output from crm_verify (bnc#893138)
  - high: constants: Add acl_target and acl_group to cib_cli_map (bnc#894041)
  - high: parse: split shortcuts into valid rules
  - medium: Handle broken CIB in find_objects
  - high: scripts: Handle corosync.conf without nodelist in add-node 
 (bnc#862577)
  - medium: config: Assign default path in all cases
  - high: cibconfig: Generate valid CLI syntax for attribute lists (bnc#897462)
  - high: cibconfig: Add tag:tag to get all resources in tag
  - doc: Documentation for show tag:tag
  - low: report: Sort list of nodes
  - high: parse: Allow empty attribute values in nvpairs (bnc#898625)
  - high: cibconfig: Delay reinitialization after commit
  - low: cibconfig: Improve wording of commit prompt
  - low: cibconfig: Fix vim modeline
  - high: report: Find nodes for any log type (boo#900654)
  - high: hb_report: Collect logs from journald (boo#900654)
  - high: cibconfig: Don't crash if given an invalid pattern (bnc#901714)
  - high: xmlutil: Filter list of referenced resources (bnc#901714)
  - medium: ui_resource: Only act on resources (#64)
  - medium: ui_resource: Flatten, then filter (#64)
  - high: ui_resource: Use correct name for error function (bnc#901453)
  - high: ui_resource: resource trace failed if operation existed (bnc#901453)
  - Improved test suite
 
 Thank you,
 
 -- 
 // Kristoffer Grönlund
 // kgronl...@suse.com
 ___
 Linux-HA mailing list
 linux...@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] RFC: pidfile handling; current worst case: stop failure and node level fencing

2014-10-24 Thread Dejan Muhamedagic
On Thu, Oct 23, 2014 at 08:36:38PM +0200, Lars Ellenberg wrote:
 On Tue, Oct 21, 2014 at 02:06:24PM +0100, Tim Small wrote:
  On 20/10/14 20:17, Lars Ellenberg wrote:
   In other OSes, ps may be able to give a good enough equivalent?
  
  Debian's start-stop-daemon executable might be worth considering here -
  it's used extensively in the init script infrastructure of Debian (and
  derivatives, over several different OS kernels), and so is well
  debugged, and in my experience beats re-implementing its functionality.
  
  http://anonscm.debian.org/cgit/dpkg/dpkg.git/tree/utils/start-stop-daemon.c
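
For illustration, a typical stop invocation matches both the pidfile
and the executable and escalates from TERM to KILL (the daemon paths
here are made up):

    start-stop-daemon --stop --oknodo --retry TERM/30/KILL/5 \
        --pidfile /var/run/mydaemon.pid --exec /usr/sbin/mydaemon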
  
  I've used it in pacemaker resource control scripts before successfully -
  its kill expression support is very useful in particular for HA.
  
  Tim.
  
  
  NAME
  
 start-stop-daemon - start and stop system daemon programs
 
 Really? pasting a man page to a mailing list?
 
 But yes...
 
 If we want to require presence of start-stop-daemon,
 we could make all this somebody else's problem.
 I need to find some time to browse through the code
 to see if it can be improved further.
 But in any case, using (a tool like) start-stop-daemon consistently
 throughout all RAs would improve the situation already.
 
 Do we want to do that?
 Dejan? David? Anyone?

I think I'm happy with a one-liner shell solution.

Cheers,

Dejan

 
   Lars
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] RFC: pidfile handling; current worst case: stop failure and node level fencing

2014-10-22 Thread Dejan Muhamedagic
Hi Lars,

On Mon, Oct 20, 2014 at 09:17:29PM +0200, Lars Ellenberg wrote:
 
 Recent discussions with Dejan made me again more prominently aware of a
 few issues we probably all know about, but usually dismiss as having
 not much relevance in the real world.
 
 The facts:
 
  * a pidfile typically only stores a pid
  * a pidfile may go stale, not properly cleaned up
when the pid it references died.
  * pids are recycled
 
This is more an issue if kernel.pid_max is small
wrt the number of processes created per unit time,
 for example on some embedded systems,
or on some very busy systems.
 
But it may be an issue on any system,
even a mostly idle one, given bad luck^W timing,
see below.
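
For reference, on Linux the limit can be inspected and raised via
sysctl (the value below is only illustrative):

    sysctl kernel.pid_max               # show the current limit
    sysctl -w kernel.pid_max=4194304    # raise it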
 
 A common idiom in resource agents is to
 
 kill_that_pid_and_wait_until_dead()
 {
   local pid=$1
   is_alive $pid || return 0
   kill -TERM $pid
   while is_alive $pid ; do sleep 1; done
   return 0
 }
 
 The naïve implementation of is_alive() is
 is_alive() { kill -0 $1 ; }
 
 This is the main issue:
 ---
 
 If the last-used-pid is just a bit smaller than $pid,
 during the sleep 1, $pid may die,
 and the OS may already have created a new process with that exact pid.
 
 Using above is_alive, kill_that_pid() will not notice that the
 to-be-killed pid has actually terminated while that new process runs.
 Which may be a very long time if that is some other long running daemon.
 
 This may result in stop failure and resulting node level fencing.
 
 The question is, which better way do we have to detect if some pid died
 after we killed it. Or, related, and even better: how to detect if the
 process currently running with some pid is in fact still the process
 referenced by the pidfile.
 
 I have two suggestions.
 
 (I am trying to avoid bashisms in here.
  But maybe I overlook some.
  Also, the code is typed, not sourced from some working script,
  so there may be logic bugs and typos.
  My intent should be obvious enough, though.)
 
 using cd /proc/$pid; stat .
 -
 
 # this is most likely linux specific

Apparently not. According to Wikipedia at least, most UNIX
platforms (including BSD and Solaris) support /proc/$pid.

 kill_that_pid_and_wait_until_dead()
 {
   local pid=$1
   (
   cd /proc/$pid || return 0
   kill -TERM $pid
   while stat . ; do sleep 1; done

I'd rather test -d . (it's more common in shell scripts and
runs faster). BTW, on my laptop, test -d is so fast that the
process doesn't get removed before it runs and the while loop
always gets executed. In that respect, stat or ls -d performs
better.

   )
   return 0
 }
 
 Once pid dies, /proc/$pid will become stale (but not completely go away,
 because it is our cwd), and stat . will return "No such process".

This seems to be a very elegant solution and I cannot find fault
with it. Short and easy to understand too.
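
Putting that together, a minimal sketch of this variant with Dejan's
suggestion (Linux /proc assumed):

    kill_that_pid_and_wait_until_dead()
    {
        local pid=$1
        (
        cd /proc/$pid 2>/dev/null || return 0  # already gone
        kill -TERM $pid
        # "." goes stale once $pid exits, even though the directory
        # entry is pinned as our cwd, so ls -d . starts failing
        while ls -d . >/dev/null 2>&1; do sleep 1; done
        )
        return 0
    }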

[... Skipping other proposals, some of which are quite exotic :) ]

 kill_using_pidfile()
 {
   local pidfile=$1
   local pid starttime proc_pid_starttime
 
   test -e $pidfile || return # already dead
   read pid starttime < $pidfile || return # unreadable

I'd assume that we (the caller) know what the process should
look like in the process table, as in say command and arguments.
We could also test that if there's a possibility that the process
left but the PID file somehow stayed behind.

   # check pid and starttime are both present, numeric only, ...
   # I have a version that distinguishes 16 distinct error

Wow!

   # conditions; this is the short version only...
 
   local i=0
   while
   get_proc_pid_starttime &&
   [ $starttime = $proc_pid_starttime ]
   do
   : $(( i+=1 ))
   [ $i = 1 ] && kill -TERM $pid
   # MAYBE # [ $i = 30 ] && kill -KILL $pid
   sleep 1
   done
 
   # it's not (anymore) the process we were looking for
   # remove that pidfile.
 
   rm -f $pidfile
 }
 
 In other OSes, ps may be able to give a good enough equivalent?
 
 Any comments?

I'd just go with the cd /proc/$pid thing. Perhaps add a test
for ps -o cmd $pid output.
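
A sketch of such a test (the pid and expected command-line prefix are
caller-supplied; this is an illustration, not an agreed interface):

    pid_matches_cmd()
    {
        # $1 = pid, $2 = expected command line prefix
        case "$(ps -o cmd= -p "$1" 2>/dev/null)" in
        "$2"*) return 0 ;;  # still the command we expect
        *) return 1 ;;      # pid gone, or recycled by something else
        esac
    }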

And thanks for giving this such a thorough analysis!

Thanks,

Dejan

 Thanks,
   Lars
 
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] RFC: pidfile handling; current worst case: stop failure and node level fencing

2014-10-22 Thread Dejan Muhamedagic
Hi Alan,

On Mon, Oct 20, 2014 at 02:52:13PM -0600, Alan Robertson wrote:
 For the Assimilation code I use the full pathname of the binary from
 /proc to tell if it's one of mine.  That's not perfect if you're using
 an interpreted language.  It works quite well for compiled languages.

Yes, though not perfect, that may be good enough. I suppose that
the probability that the very same program gets the same recycled
pid is rather low. (Or is it?)
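
On Linux such a check could look like this (a sketch; as noted, for
interpreted languages /proc/$pid/exe points at the interpreter, so it
won't help there):

    pid_matches_exe()
    {
        # $1 = pid, $2 = expected binary path (e.g. /usr/sbin/mydaemon,
        # a made-up example)
        [ "$(readlink /proc/$1/exe 2>/dev/null)" = "$2" ]
    }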

Cheers,

Dejan

 
 On 10/20/2014 01:17 PM, Lars Ellenberg wrote:
  Recent discussions with Dejan made me again more prominently aware of a
  few issues we probably all know about, but usually dismiss as having
  not much relevance in the real world.
 
  The facts:
 
   * a pidfile typically only stores a pid
   * a pidfile may go stale, not properly cleaned up
 when the pid it references died.
   * pids are recycled
 
 This is more an issue if kernel.pid_max is small
 wrt the number of processes created per unit time,
  for example on some embedded systems,
 or on some very busy systems.
 
 But it may be an issue on any system,
 even a mostly idle one, given bad luck^W timing,
 see below.
 
  A common idiom in resource agents is to
 
  kill_that_pid_and_wait_until_dead()
  {
  local pid=$1
  is_alive $pid || return 0
  kill -TERM $pid
  while is_alive $pid ; do sleep 1; done
  return 0
  }
 
  The naïve implementation of is_alive() is
  is_alive() { kill -0 $1 ; }
 
  This is the main issue:
  ---
 
  If the last-used-pid is just a bit smaller than $pid,
  during the sleep 1, $pid may die,
  and the OS may already have created a new process with that exact pid.
 
  Using above is_alive, kill_that_pid() will not notice that the
  to-be-killed pid has actually terminated while that new process runs.
  Which may be a very long time if that is some other long running daemon.
 
  This may result in stop failure and resulting node level fencing.
 
  The question is, which better way do we have to detect if some pid died
  after we killed it. Or, related, and even better: how to detect if the
  process currently running with some pid is in fact still the process
  referenced by the pidfile.
 
  I have two suggestions.
 
  (I am trying to avoid bashisms in here.
   But maybe I overlook some.
   Also, the code is typed, not sourced from some working script,
   so there may be logic bugs and typos.
   My intent should be obvious enough, though.)
 
  using cd /proc/$pid; stat .
  -
 
  # this is most likely linux specific
  kill_that_pid_and_wait_until_dead()
  {
  local pid=$1
  (
  cd /proc/$pid || return 0
  kill -TERM $pid
  while stat . ; do sleep 1; done
  )
  return 0
  }
 
  Once pid dies, /proc/$pid will become stale (but not completely go away,
  because it is our cwd), and stat . will return "No such process".
 
  Variants:
 
  using test -ef
  --
 
  exec 7< /proc/$pid || return 0
  kill -TERM $pid
  while :; do
  exec 8< /proc/$pid || break
  test /proc/self/fd/7 -ef /proc/self/fd/8 || break
  sleep 1
  done
  exec 7<&- 8<&-
 
  using stat -c %Y /proc/$pid
  ---
 
  ctime0=$(stat -c %Y /proc/$pid)
  kill -TERM $pid
  while ctime=$(stat -c %Y /proc/$pid) && [ $ctime = $ctime0 ] ; do sleep 1; done
 
 
  Why not use the inode number, I hear you say.
  Because it is not stable. Sorry.
  Don't believe me? Don't want to read kernel source?
  Try it yourself:

  sleep 120 & k=$!
  stat /proc/$k
  echo 3 > /proc/sys/vm/drop_caches
  stat /proc/$k
 
  But that leads me to another proposal:
  store the starttime together with the pid in a pidfile.
 
  For linux that would be:
 
  (see proc(5) for /proc/pid/stat field meanings.
   note that (comm) may contain both whitespace and ")",
   which is the reason for my sed | cut below)
 
  spawn_create_exclusive_pid_starttime()
  {
  local pidfile=$1
  shift
  local reset
  case $- in *C*) reset=:;; *) set -C; reset="set +C";; esac
  if ! exec 3> "$pidfile" ; then
  $reset
  return 1
  fi

  $reset
  setsid sh -c '
  read pid _ < /proc/self/stat
  starttime=$(sed -e "s/^.*) //" /proc/$pid/stat | cut -d" " -f 20)
  >&3 echo $pid $starttime
  3>&- exec "$@"
  ' -- "$@" &
  return 0
  }
 
  It does not seem possible to cycle through all available pids
  within fractions of time smaller than the granularity of starttime,
  so (pid, starttime) should be a unique tuple (until the next reboot --
  at least on linux, starttime is measured as strictly monotonic uptime).


  If we have pid and starttime in the pidfile,
  we can:
 
  get_proc_pid_starttime()
  {
  proc_pid_starttime=$(sed -e 's/^.*) //' /proc/$pid/stat) || return 1
  proc_pid_starttime=$(echo $proc_pid_starttime | 

Re: [Linux-ha-dev] [Question] About the change of the oracle resource agent.

2014-07-23 Thread Dejan Muhamedagic
On Wed, Jul 23, 2014 at 11:09:55AM +0900, renayama19661...@ybb.ne.jp wrote:
 Hi Dejan,
 
 I confirmed it in an environment where NLS_LANG was set to
 Japanese (Japanese_Japan.AL32UTF8).
 
 I changed the expiration date of the OCFMON user and pushed the system
 date forward by one year.
 I confirmed that the following behaviors worked correctly (...on Oracle 12c).
 
 Confirmed 1) After the OCFMON user expired (EXPIRED), monitoring
 succeeds as the sysdba user.
 Confirmed 2) The grep check for the EXPIRED string works correctly.
 Confirmed 3) When oracle is started again after the OCFMON user expired,
 the expiry of the OCFMON user is reset.
 
  
  415        if echo $output | grep -w EXPIRED >/dev/null; then
  
  Also, could you verify if common_sql_filter() need modifications?
 
 As a result, no correction to that grep was necessary (Confirmed 2,
 Confirmed 3).

Many thanks for the testing and the patch!

Cheers,

Dejan

 Best Regards,
 Hideo Yamauchi.
 
 
 
 - Original Message -
  From: renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp
  To: Dejan Muhamedagic deja...@fastmail.fm; High-Availability Linux 
  Development List linux-ha-dev@lists.linux-ha.org
  Cc: 
  Date: 2014/7/22, Tue 20:50
  Subject: Re: [Linux-ha-dev] [Question] About the change of the oracle 
  resource agent.
  
  Hi Dejan,
  
  All right!!
  
   Is that with the latest version?
  
  
  I confirm RA now in Oracle12c.
  It is the latest edition of oracle.
  
  Many Thanks!
  Hideo Yamauchi.
  
  
  
  - Original Message -
   From: Dejan Muhamedagic deja...@fastmail.fm
   To: renayama19661...@ybb.ne.jp; High-Availability Linux Development List 
  linux-ha-dev@lists.linux-ha.org
   Cc: 
   Date: 2014/7/22, Tue 18:46
   Subject: Re: [Linux-ha-dev] [Question] About the change of the oracle 
  resource agent.
  
   Hi Hideo-san,
  
   On Tue, Jul 22, 2014 at 11:07:29AM +0900, renayama19661...@ybb.ne.jp 
  wrote:
    Hi All,
  
    I am going to explain the next change to our user.
  
     * https://github.com/ClusterLabs/resource-agents/pull/367
     * https://github.com/ClusterLabs/resource-agents/pull/439
  
  
    Let me confirm whether it is the next contents that a patch intends.
  
    1) Because it was a problem that OCFMON user was added while the 
  oracle 
   manager did not know it, patch changed it to appoint it explicitly.
  
   The OCFMON user and password parameters are optional, hence in
   this respect nothing really changed. The user is still created
   by the RA. However, it is good that they're now visible in the
   meta-data.
  
    2) Patch changed a deadline of OCFMON.(A deadline for password of the 
   default may be 180 days.)
  
   That's the problem we had with the previous version. Now there's
   a profile created for the monitoring user which has unlimited
   password expiry. If the password expired in the meantime, due to
   a missing profile, then it is reset.
  
   If the monitor still fails, the RA tries as sysdba again.
  
    3) Patch kept compatibility with old RA.
  
   Yes.
  
    Is there the main point of any other patches?
  
   No.
  
    If there is really the problem that occurred, before this change, 
  please 
   teach to me.
  
   As mentioned above, the issue was that the password could
   expire.
  
    I intend to really show the problem that happened to a user.
     * For example, a time limit of OCFMON expired and failed in a monitor 
  of 
   oracle
  
   Is that with the latest version?
  
   Cheers,
  
   Dejan
  
    I am going to send a patch later.
  
    Best Regards,
    Hideo Yamauchi.
    ___
    Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
    http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
    Home Page: http://linux-ha.org/
  
  ___
  Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
  Home Page: http://linux-ha.org/
 
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [Question] About the change of the oracle resource agent.

2014-07-22 Thread Dejan Muhamedagic
Hi Hideo-san,

On Tue, Jul 22, 2014 at 11:07:29AM +0900, renayama19661...@ybb.ne.jp wrote:
 Hi All,
 
 I am going to explain the following change to our users.
 
  * https://github.com/ClusterLabs/resource-agents/pull/367
  * https://github.com/ClusterLabs/resource-agents/pull/439
 
 
 Let me confirm whether the following is what the patch intends.
 
 1) Because it was a problem that the OCFMON user was added without the
 Oracle administrator knowing it, the patch changed this so the user is
 specified explicitly.

The OCFMON user and password parameters are optional, hence in
this respect nothing really changed. The user is still created
by the RA. However, it is good that they're now visible in the
meta-data.

 2) The patch changed the password expiry of OCFMON. (The default
 password lifetime may be 180 days.)

That's the problem we had with the previous version. Now there's
a profile created for the monitoring user which has unlimited
password expiry. If the password expired in the meantime, due to
a missing profile, then it is reset.

If the monitor still fails, the RA tries as sysdba again.
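
For illustration, the statements behind that approach look roughly
like this (the profile, user and password names are placeholders, not
the RA's actual code):

    printf '%s\n' \
        'CREATE PROFILE mon_profile LIMIT PASSWORD_LIFE_TIME UNLIMITED;' \
        'ALTER USER ocfmon PROFILE mon_profile;' \
        'ALTER USER ocfmon IDENTIFIED BY "monpwd" ACCOUNT UNLOCK;' |
        sqlplus -S / as sysdba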

 3) The patch kept compatibility with the old RA.

Yes.

 Are there any other main points in the patches?

No.

 If there was really a problem that occurred before this change, please
 let me know.

As mentioned above, the issue was that the password could
expire.

 I intend to show users the problem that actually happened.
  * For example, the time limit of OCFMON expired and the monitor of
 oracle failed

Is that with the latest version?

Cheers,

Dejan

 I am going to send a patch later.
 
 Best Regards,
 Hideo Yamauchi.
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [Patch] oracle RA - Change of the judgment of the check_mon_user processing.

2014-07-22 Thread Dejan Muhamedagic
On Tue, Jul 22, 2014 at 11:57:04AM +0900, renayama19661...@ybb.ne.jp wrote:
 Hi All,
 
 The oracle resource agent needs to take into account NLS_LANG being
 set to other languages.
 I attached a patch.

The patch looks good. I wonder if this string is also
translated:

415 if echo $output | grep -w EXPIRED >/dev/null; then
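
One locale-proofing idea (purely illustrative, not part of the patch)
would be to pin NLS_LANG for the monitoring query itself, so the
status strings always come out in English ($monusr/$monpwd stand in
for the monitoring credentials):

    output=$(echo "SELECT account_status FROM user_users;" |
        NLS_LANG=American_America.AL32UTF8 sqlplus -S "$monusr/$monpwd")
    echo "$output" | grep -w EXPIRED >/dev/null && echo "password expired"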

Also, could you verify if common_sql_filter() needs modifications?

Cheers,

Dejan


 Best Regards,
 Hideo Yamauchi.


 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] glue 1.0.12 released

2014-07-21 Thread Dejan Muhamedagic
Hello,

The current glue repository has been tagged as 1.0.12.

It's been a while since the release candidate 1.0.12-rc1. There
were a few minor fixes and additions in the meantime, mostly for
hb_report.

Please upgrade at the earliest possible opportunity.

You can get the 1.0.12 tarball here:

http://hg.linux-ha.org/glue/archive/glue-1.0.12.tar.bz2

The ChangeLog is available here:

http://hg.linux-ha.org/glue/file/glue-1.0.12/ChangeLog

A set of rpms is also available at the openSUSE Build Service:*)

http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/

The packages at the openSUSE Build Service will not work with
pacemaker versions earlier than v1.1.8 because the LRM bits are
not compiled.

Many thanks to all contributors. Without you this release would
not have been possible.

Enjoy!

Lars Ellenberg
Dejan Muhamedagic

*) Currently packages for RHEL6 and RHEL7 are not built due to
missing dependencies. I suppose that you could also use the
CentOS packages, which were built fine. I hope that this issue
will eventually be resolved.
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [PATCH:glue] Correctly locate the logd systemd service file

2014-06-23 Thread Dejan Muhamedagic
On Thu, Jun 19, 2014 at 02:04:45PM +0900, Kazunori INOUE wrote:
 Hi Dejan,
 
 2014-06-19 4:30 GMT+09:00 Dejan Muhamedagic deja...@fastmail.fm:
  Hi Kazunori-san,
 
  On Wed, Jun 18, 2014 at 05:57:14PM +0900, Kazunori INOUE wrote:
  Hi,
 
  make of cluster-glue fails on RHEL7.
 
  $ cat /etc/redhat-release
  Red Hat Enterprise Linux Server release 7.0 (Maipo)
 
  $ hg parents
  changeset:   2792:45e21bc9795d
  tag: tip
  user:Dejan Muhamedagic de...@hello-penguin.com
  date:Thu Jun 12 12:28:59 2014 +0200
  summary: Low: hb_report: gdb debug symbols output change
 
  $ make rpm
  rm -f cluster-glue.tar.bz2
  hg archive -t tbz2 -r tip cluster-glue.tar.bz2
  echo `date`: Rebuilt cluster-glue.tar.bz2
  Wed Jun 18 16:09:36 JST 2014: Rebuilt cluster-glue.tar.bz2
  rm -f *.src.rpm
  To create custom builds, edit the flags and options in
  cluster-glue-fedora.spec first
  rpmbuild -bs --define dist .fedora --define _sourcedir
  /zzz/DEV/glue --define _specdir /zzz/DEV/glue --define _srcrpmdir
  /zzz/DEV/glue cluster-glue-fedora.spec
  Wrote: /zzz/DEV/glue/cluster-glue-1.0.12-0.rc1.fedora.src.rpm
  rpmbuild --define _sourcedir /zzz/DEV/glue --define _specdir
  /zzz/DEV/glue --define _srcrpmdir /zzz/DEV/glue --rebuild
  /zzz/DEV/glue/*.src.rpm
  Installing /zzz/DEV/glue/cluster-glue-1.0.12-0.rc1.fedora.src.rpm
  (snip)
  + /usr/lib/rpm/redhat/brp-java-repack-jars
  Processing files: cluster-glue-1.0.12-0.rc1.el7.x86_64
  error: File not found:
  /root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/etc/init.d/logd
 
  Yes, unfortunately the systemd support was added without adding
  support to the spec files. I'm not an expert on these, so I'd
  like to ask you to test the patch I put together. If you could
  also fix any problems you run into, that'd be great.
 
 
 Since RHEL doesn't have the following macros, I corrected that:
 - service_add_post
 - service_del_preun
 - service_del_postun
 - service_add_pre : since no corresponding macro was found, I deleted
 it. (quick fix)
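
For comparison, the Fedora-native scriptlets would presumably use the
%systemd_* macros along these lines (a sketch, not the patch that was
actually applied):

    %post
    %systemd_post logd.service

    %preun
    %systemd_preun logd.service

    %postun
    %systemd_postun_with_restart logd.service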
 
 $ rpm -qp --scripts
 /root/rpmbuild/RPMS/x86_64/cluster-glue-1.0.12-0.rc1.el7.x86_64.rpm
 preinstall scriptlet (using /bin/sh):
 getent group haclient >/dev/null || groupadd -r haclient
 getent passwd hacluster >/dev/null || \
 useradd -r -g haclient -d /var/lib/heartbeat/cores/hacluster -s /sbin/nologin \
 -c "cluster user" hacluster
   %service_add_pre logd.service
 exit 0
 postinstall scriptlet (using /bin/sh):
 %service_add_post logd.service
 preuninstall scriptlet (using /bin/sh):
 %service_del_preun logd.service
 postuninstall scriptlet (using /bin/sh):
 %service_del_postun logd.service
 
 $ rpm -ivh cluster-glue-1.0.12-0.rc1.el7.x86_64.rpm
 Preparing...  # [100%]
 /var/tmp/rpm-tmp.88IU8v: line 5: fg: no job control
 Updating / installing...
1:cluster-glue-1.0.12-0.rc1.el7# [100%]
 /var/tmp/rpm-tmp.d0hXL3: line 1: fg: no job control
 warning: %post(cluster-glue-1.0.12-0.rc1.el7.x86_64) scriptlet failed,
 exit status 1
 $ rpm -e cluster-glue
 /var/tmp/rpm-tmp.dT37S2: line 1: fg: no job control
 error: %preun(cluster-glue-1.0.12-0.rc1.el7.x86_64) scriptlet failed,
 exit status 1
 error: cluster-glue-1.0.12-0.rc1.el7.x86_64: erase failed
 
 And I have no SUSE environment now, so suse.spec has not been checked.

Many thanks for the fedora spec file. I think I managed to fix
the suse spec file, at least it manages to produce a package with
openSUSE Factory. Both patches pushed.

Cheers,

Dejan

 Regards,
 
  Cheers,
 
  Dejan
 
 
  Executing(%doc): /bin/sh -e /var/tmp/rpm-tmp.kJSlFu
  + umask 022
  + cd /root/rpmbuild/BUILD
  + cd cluster-glue
  + 
  DOCDIR=/root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/usr/share/doc/cluster-glue-1.0.12
  + export DOCDIR
  + /usr/bin/mkdir -p
  /root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/usr/share/doc/cluster-glue-1.0.12
  + cp -pr doc/stonith/README.bladehpi doc/stonith/README.cyclades
  doc/stonith/README.drac3 doc/stonith/README.dracmc
  doc/stonith/README.external doc/stonith/README.ibmrsa
  doc/stonith/README.ibmrsa-telnet doc/stonith/README.ipmilan
  doc/stonith/README.ippower9258 doc/stonith/README.meatware
  doc/stonith/README.rackpdu doc/stonith/README.rcd_serial
  doc/stonith/README.riloe doc/stonith/README.vacm
  doc/stonith/README.vcenter doc/stonith/README.wti_mpc
  doc/stonith/README_kdumpcheck.txt
  /root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/usr/share/doc/cluster-glue-1.0.12
  + cp -pr logd/logd.cf
  /root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/usr/share/doc/cluster-glue-1.0.12
  + cp -pr AUTHORS
  /root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/usr/share/doc/cluster-glue-1.0.12
  + cp -pr COPYING
  /root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/usr/share/doc/cluster-glue-1.0.12
  + cp -pr ChangeLog
  /root/rpmbuild/BUILDROOT/cluster

Re: [Linux-ha-dev] [PATCH:glue] Correctly locate the logd systemd service file

2014-06-18 Thread Dejan Muhamedagic
Hi Kazunori-san,

On Wed, Jun 18, 2014 at 05:57:14PM +0900, Kazunori INOUE wrote:
 Hi,
 
 make of cluster-glue fails on RHEL7.
 
 $ cat /etc/redhat-release
 Red Hat Enterprise Linux Server release 7.0 (Maipo)
 
 $ hg parents
 changeset:   2792:45e21bc9795d
 tag: tip
 user:Dejan Muhamedagic de...@hello-penguin.com
 date:Thu Jun 12 12:28:59 2014 +0200
 summary: Low: hb_report: gdb debug symbols output change
 
 $ make rpm
 rm -f cluster-glue.tar.bz2
 hg archive -t tbz2 -r tip cluster-glue.tar.bz2
 echo `date`: Rebuilt cluster-glue.tar.bz2
 Wed Jun 18 16:09:36 JST 2014: Rebuilt cluster-glue.tar.bz2
 rm -f *.src.rpm
 To create custom builds, edit the flags and options in
 cluster-glue-fedora.spec first
 rpmbuild -bs --define dist .fedora --define _sourcedir
 /zzz/DEV/glue --define _specdir /zzz/DEV/glue --define _srcrpmdir
 /zzz/DEV/glue cluster-glue-fedora.spec
 Wrote: /zzz/DEV/glue/cluster-glue-1.0.12-0.rc1.fedora.src.rpm
 rpmbuild --define _sourcedir /zzz/DEV/glue --define _specdir
 /zzz/DEV/glue --define _srcrpmdir /zzz/DEV/glue --rebuild
 /zzz/DEV/glue/*.src.rpm
 Installing /zzz/DEV/glue/cluster-glue-1.0.12-0.rc1.fedora.src.rpm
 (snip)
 + /usr/lib/rpm/redhat/brp-java-repack-jars
 Processing files: cluster-glue-1.0.12-0.rc1.el7.x86_64
 error: File not found:
 /root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/etc/init.d/logd

Yes, unfortunately the systemd support was added without adding
support to the spec files. I'm not an expert on these, so I'd
like to ask you to test the patch I put together. If you could
also fix any problems you run into, that'd be great.

Cheers,

Dejan


 Executing(%doc): /bin/sh -e /var/tmp/rpm-tmp.kJSlFu
 + umask 022
 + cd /root/rpmbuild/BUILD
 + cd cluster-glue
 + 
 DOCDIR=/root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/usr/share/doc/cluster-glue-1.0.12
 + export DOCDIR
 + /usr/bin/mkdir -p
 /root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/usr/share/doc/cluster-glue-1.0.12
 + cp -pr doc/stonith/README.bladehpi doc/stonith/README.cyclades
 doc/stonith/README.drac3 doc/stonith/README.dracmc
 doc/stonith/README.external doc/stonith/README.ibmrsa
 doc/stonith/README.ibmrsa-telnet doc/stonith/README.ipmilan
 doc/stonith/README.ippower9258 doc/stonith/README.meatware
 doc/stonith/README.rackpdu doc/stonith/README.rcd_serial
 doc/stonith/README.riloe doc/stonith/README.vacm
 doc/stonith/README.vcenter doc/stonith/README.wti_mpc
 doc/stonith/README_kdumpcheck.txt
 /root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/usr/share/doc/cluster-glue-1.0.12
 + cp -pr logd/logd.cf
 /root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/usr/share/doc/cluster-glue-1.0.12
 + cp -pr AUTHORS
 /root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/usr/share/doc/cluster-glue-1.0.12
 + cp -pr COPYING
 /root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/usr/share/doc/cluster-glue-1.0.12
 + cp -pr ChangeLog
 /root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/usr/share/doc/cluster-glue-1.0.12
 + exit 0
 
 
 RPM build errors:
 File not found:
 /root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/etc/init.d/logd
 make: *** [rpm] Error 1
 $


 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/

# HG changeset patch
# User Dejan Muhamedagic de...@hello-penguin.com
# Date 1403119156 -7200
#  Wed Jun 18 21:19:16 2014 +0200
# Node ID 6380f4c77ddc62288eeb854f81288c78bdc06251
# Parent  45e21bc9795d70b86ecc3825b91ef6424db178d8
build: update spec files for systemd

diff -r 45e21bc9795d -r 6380f4c77ddc cluster-glue-fedora.spec
--- a/cluster-glue-fedora.spec	Thu Jun 12 12:28:59 2014 +0200
+++ b/cluster-glue-fedora.spec	Wed Jun 18 21:19:16 2014 +0200
@@ -60,6 +60,11 @@ BuildRequires: libuuid-devel
 BuildRequires: e2fsprogs-devel
 %endif
 
+%if %{defined systemd_requires}
+BuildRequires:  systemd
+%{?systemd_requires}
+%endif
+
 %prep
 %setup -q -n cluster-glue
 
@@ -82,6 +87,9 @@ export docdir=%{glue_docdir}
 --with-daemon-user=%{uname} \
 --localstatedir=%{_var} \
 --libdir=%{_libdir} \
+%if %{defined _unitdir}
+--with-systemdsystemunitdir=%{_unitdir} \
+%endif
 --docdir=%{glue_docdir}
 %endif
 
@@ -112,7 +120,11 @@ standards, and an interface to common ST
 %files
 %defattr(-,root,root)
 %dir %{_datadir}/%{name}
+%if %{defined _unitdir}
+%{_unitdir}/logd.service
+%else
 %{_sysconfdir}/init.d/logd
+%endif
 %{_datadir}/%{name}/ha_cf_support.sh
 %{_datadir}/%{name}/openais_conf_support.sh
 %{_datadir}/%{name}/utillib.sh
@@ -174,8 +186,22 @@ getent group %{gname} >/dev/null || grou
 getent passwd %{uname} >/dev/null || \
 useradd -r -g %{gname} -d %{_var}/lib/heartbeat/cores/hacluster -s /sbin/nologin \
 -c "cluster user" %{uname}
+%if %{defined _unitdir}
+  %service_add_pre logd.service
+%endif
 exit 0

Re: [Linux-ha-dev] crmsh 2.0 released, and moving to Github

2014-04-04 Thread Dejan Muhamedagic
Hi Kristoffer,

On Thu, Apr 03, 2014 at 06:03:33PM +0200, Kristoffer Grönlund wrote:
 Hello everyone,
 
 Today, I have two major announcements to make: crmsh is moving to a
 new location, and I'm releasing the next major version of the crm
 shell!

Congratulations on the new release! crmsh has made big strides
forward since I've been away. Great work and many thanks!

Cheers,

Dejan

 == Find us at crmsh.github.io
 
 Since the rest of the High-Availability stack is being developed over
 at Github, we thought it would make things easier to move crmsh over
 there as well. This means we're not only moving the website and issue
 tracker, we're also switching from Mercurial to git.
 
 From this release forward, you will find everything crmsh-related at
 http://crmsh.github.io, and the source code at
 https://github.com/crmsh/crmsh.
 
 Here are the new URLs related to crmsh:
 
 * Website: http://crmsh.github.io/
 
 * Documentation: http://crmsh.github.io/documentation.html
 
 * Source repository: https://github.com/crmsh/crmsh/
 
 * Issue tracker: https://github.com/crmsh/crmsh/issues/
 
 Not everything has moved quite yet, but the source code and web site
 are in place.
 
 == New stable release: crmsh 2.0
 
 Secondly, we are proud to finally release crmsh 2.0! This is the
 version of crmsh I have been developing since I became a maintainer
 last year, and there are a lot of new and improved features in this
 release.
 
 For a more complete list of changes since the previous version, please
 refer to the changelog:
 
 * https://github.com/crmsh/crmsh/blob/2.0.0/ChangeLog
 
 Packages for several popular Linux distributions (updated soon):
 
 http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/
 
 Zip archive of the tagged release:
 
 * https://github.com/crmsh/crmsh/archive/2.0.0.zip
 
 Here is a short list of some of the biggest changes and features in
 crmsh 2.0:
 
 * *More stable than ever before!* Many bugs and issues have been
   fixed, with plenty of help from the community. At the same time,
   this is a major release with many new features. Testing and pull
   requests are more than welcome!
 
 * *Cluster management commands.* We've added a couple of new
   sub-levels that help with the installation and management of the
   cluster, as well as maintaining and synchronizing the corosync
   configuration across nodes. There are now commands for starting and
   stopping the cluster services, as well as cluster scripts that
   make the installation and configuration of cluster-controlled
   resources a one-line command (see the short example after this list).
 
 * *Cleaner CLI syntax.* The parser for the configure syntax of
   crmsh has been rewritten, allowing for cleaner syntax, better
   error detection and improved error messages.
 
 * *Tab completion everywhere.* Now tab completion works not only in
   the interactive mode, but directly from bash. In addition, the
   completion back end has been completely rewritten and many more
   commands now have full completion. It's not quite every single
   command yet, but we're getting there.
 
 * *New and improved configuration.* The new configuration file is
   installed in /etc/crm/crm.conf by default or per user if desired,
   and allows for a much more flexible configuration of crmsh.
 
 * *Cluster health evaluation.* As part of the cluster script
   functionality, there is now a cluster health command which
   analyses and reports on low disk space, problems with network
   configuration, firewall configuration issues and more. The best part
   of the cluster health command is that it can work without a
   configured cluster, providing a checklist of issues to amend before
   setting up a new cluster.
 
 * *And wait, there's more!* There is now not only an extensive
   regression test suite but a growing set of unit tests as well,
   support for many new features in Pacemaker 1.1.11 such as resource
   sets in location constraints, anonymous shadow CIBs makes it easier
   to avoid race conditions in scripts, full syntax highlighting for
   the built-in help, the assist sub-command helps with more advanced
   configurations... the list goes on.
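 
 As a small taste of the new cluster commands mentioned above (a
 sketch; output omitted, and the exact option set may differ):
 
     # start the cluster services on this node
     crm cluster start
     # analyse the setup and report common configuration problems
     crm cluster health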
 
 Big thanks to everyone who has helped with bug fixes, comments and
 contributions for this release!
 
 -- 
 // Kristoffer Grönlund
 // kgronl...@suse.com
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [PATCH] Refine jboss RA functions about console logfile rotation

2013-10-07 Thread Dejan Muhamedagic
Hi Kazutomo-san,

On Fri, Oct 04, 2013 at 09:11:20PM +0900, NAKAHIRA Kazutomo wrote:
 Hi, all
 
 I wrote 8 patches for the jboss RA that refine its functions
 for console logfile rotation.
 (Some patches were written to refactor the current code.)
 
 Please see the following pull requests and send me your comments

These are not pull requests, but commits in your repository. You
should create one or more pull requests for the upstream. We're
currently about to make a new release, so unless there are bug
fixes or new features, they will have to wait until after the
release.

Cheers,

Dejan

 if you have any questions.
 
 1. Low: jboss: Refine validate_all_jboss(It checks JAVA_HOME,
 JBOSS_HOME, and JAVA)
 
 https://github.com/knakahira/resource-agents/commit/5024ede722c337f273f30f852369cc386d1369b9
 
 2. Low: jboss: Check JBOSS_BASE_DIR at the validate_all_jboss
 
 https://github.com/knakahira/resource-agents/commit/562ab655398b9bbbf6e0a722d61f01cd45b5409c
 
 3. Low: jboss: Check ROTATELOGS command at the validate_all_jboss
 
 https://github.com/knakahira/resource-agents/commit/90122d3d24020c3caa5e504bf22e68d8364bd0af
 
 4. Low: jboss: Avoid starting JBoss without rotatelogs when
 rotate_console is true
 
 https://github.com/knakahira/resource-agents/commit/988ba56520625fc11f83f40c12afb28fc0655e1f
 
 5. Low: jboss: Monitor rotatelogs process and restart when it is stopped
 
 https://github.com/knakahira/resource-agents/commit/83fe1937360b720115403baf787f39532378247c
 
 6. Low: jboss: Avoid overwriting the existing CONSOLE logfile when
 rotate_console was changed
 
 https://github.com/knakahira/resource-agents/commit/bf3b3075bda37aa068cce206f8de665b89d3866c
 
 7. Low: jboss: Change test command operator = to -eq at numerical
 comparison
 
 https://github.com/knakahira/resource-agents/commit/534b4bc299232e5feb8f133a247e8334402f92e5
 
 8. Low: jboss: Avoid starting JBoss if $JBOSS_USER can not write CONSOLE
 logfile that created by rotatelogs command
 
 https://github.com/knakahira/resource-agents/commit/62588ae50408f6897452dedbeb0b1074a5ca4c26
 
 Best regards,
 
 -- 
 NAKAHIRA Kazutomo
 Open Source Business Unit
 NTT DATA INTELLILINK Corporation
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] announcement: glue release candidate 1.0.12

2013-10-04 Thread Dejan Muhamedagic
Hello,

The current glue repository has been tagged as glue-1.0.12-rc1.
It contains several fixes for stonith agents and hb_report.
Please give it a try.

You can get the glue-1.0.12-rc1 tarball here:

http://hg.linux-ha.org/glue/archive/glue-1.0.12-rc1.tar.bz2

The ChangeLog is available here:

http://hg.linux-ha.org/glue/file/glue-1.0.12-rc1/ChangeLog

A set of rpms is also available at the openSUSE Build Service:

http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/

The packages at the openSUSE Build Service will not work with
pacemaker versions earlier than v1.1.8 because the LRM bits are
not compiled.

If there are no serious issues, v1.0.12 will be released on Oct 11.

Many thanks to all contributors.

Enjoy!

Lars Ellenberg
Dejan Muhamedagic
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [Linux-HA] announcement: planning resource-agents release 3.9.6

2013-10-01 Thread Dejan Muhamedagic
Hi Lars,

On Mon, Sep 30, 2013 at 03:47:18PM +0200, Lars Ellenberg wrote:
 On Mon, Sep 30, 2013 at 03:33:35PM +0200, Dejan Muhamedagic wrote:
  Hello,
  
  We released resource-agents v3.9.5 back in February. In the
  meantime there have been quite a few fixes and new features
  pushed to the repository and it is high time for another release.
  
  Lars Ellenberg will run the release this time and do whatever is
  necessary that we have a good set of resource agents.
  Thanks Lars!
 
 Is that so ;-)
 I thought I'd only try to poke all contributers,
 authors and maintainers, as well as the community,
 to either raise issues now,
 or don't get to complain about it later ;)

Yes, that's about it. Only that!

Cheers,

Dejan

  Two milestones were created at github.com today and this is the
  tentative schedule:
  
  3.9.6-rc1: October 9.
  3.9.6: October 16.
  
  If there's anything you think should be part of the release
  please open an issue, a pull request, or a bugzilla, as you see
  fit.
  
  If there's anything that hasn't received due attention, please
  let us know.
  
  Finally, if you can help with resolving issues consider yourself
  invited to do so.
 
 
 Thanks,
 
 -- 
 : Lars Ellenberg
 : LINBIT | Your Way to High Availability
 : DRBD/HA support and consulting http://www.linbit.com
 ___
 Linux-HA mailing list
 linux...@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] announcement: planning resource-agents release 3.9.6

2013-09-30 Thread Dejan Muhamedagic
Hello,

We released resource-agents v3.9.5 back in February. In the
meantime there have been quite a few fixes and new features
pushed to the repository and it is high time for another release.

Lars Ellenberg will run the release this time and do whatever is
necessary that we have a good set of resource agents.
Thanks Lars!

Two milestones were created at github.com today and this is the
tentative schedule:

3.9.6-rc1: October 9.
3.9.6: October 16.

If there's anything you think should be part of the release
please open an issue, a pull request, or a bugzilla, as you see
fit.

If there's anything that hasn't received due attention, please
let us know.

Finally, if you can help with resolving issues consider yourself
invited to do so.

Cheers,

The resource-agents crowd
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] crmsh 1.2.6 is released!

2013-09-26 Thread Dejan Muhamedagic
= CRM shell v1.2.6 Released =

Hello everyone,

We are happy to announce the release of crmsh 1.2.6. Many thanks to everyone 
who contributed to this release! Version 1.2.6 has improved performance, 
several new features and many bug fixes and other improvements. Please refer to 
the changelog and documentation for more information.

== New since 1.2.6-rc3 ==

* Fixed regression in configuration update/refresh
* Fixed regression in removing cluster properties
* cibconf: fix rsc_template referencing (savannah#40011)
* rsctest: add support for STONITH resources

== New features in 1.2.6 ==

* Support for containers (nagios)
* Support for RA tracing
* Switch from minidom to [http://lxml.de/ lxml]
* Many performance improvements
* Element editing improvements
* History feature improvements

For a full list of changes since the previous version, please take a look at 
the changelog:
 
* http://hg.savannah.gnu.org/hgweb/crmsh/file/crmsh-1.2.6/ChangeLog
 
More information on where to download, how to install and how to contribute to 
the crmsh project can be found on the project website at 
[http://crmsh.nongnu.org crmsh.nongnu.org].
 
== Resources ==

Packages for several popular Linux distributions:
 
* http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/
 
The archive of the new release:
 
* http://download.savannah.gnu.org/releases/crmsh/crmsh-1.2.6.tar.bz2

GPG signature:

* http://download.savannah.gnu.org/releases/crmsh/crmsh-1.2.6.tar.bz2.sig

To discuss the ongoing development of crmsh, please join the linux-ha mailing 
list at:

* http://lists.linux-ha.org/mailman/listinfo/linux-ha

Bugs can be reported via the mailing list, or one of the project bug trackers:

* https://savannah.nongnu.org/bugs/?group=crmsh

* 
https://developerbugs.linuxfoundation.org/enter_bug.cgi?product=Pacemaker;component=Shell;version=1.2
 
Enjoy!
 
Dejan and Kristoffer
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] crmsh error : cib-bootstrap-options already exist

2013-08-29 Thread Dejan Muhamedagic
Hi Lars,

On Thu, Aug 29, 2013 at 10:49:33AM +0200, Lars Marowsky-Bree wrote:
 On 2013-08-28T20:13:43, Dejan Muhamedagic de...@suse.de wrote:
 
  A new RC has been released today. It contains both fixes. It
  doesn't do atomic updates anymore, because cibadmin or something
  cannot stomach comments.
 
 Couldn't find the upstream bug report :-( Can you give me the pacemaker
 bugid, please? Thanks!

The bug's been reported here:


https://bugzilla.novell.com/show_bug.cgi?id=836965

Thanks,

Dejan

 
 Regards,
 Lars
 
 -- 
 Architect Storage/HA
 SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, 
 HRB 21284 (AG Nürnberg)
 Experience is the name everyone gives to their mistakes. -- Oscar Wilde
 
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] crmsh error : cib-bootstrap-options already exist

2013-08-28 Thread Dejan Muhamedagic
On Tue, Aug 27, 2013 at 09:14:31PM +0300, Vladislav Bogdanov wrote:
 27.08.2013 19:11, Dejan Muhamedagic wrote:
  Hi,
  
  On Tue, Aug 27, 2013 at 12:06:40PM +0300, Vladislav Bogdanov wrote:
  23.08.2013 16:48, Kristoffer Grönlund wrote:
  Hi,
 
  On Fri, 23 Aug 2013 16:33:28 +0300
  Vladislav Bogdanov bub...@hoster-ok.com wrote:
 
  No-no, it was before that fix too, at least with 19a3f1e5833c.
  Should I still try?
 
 
  Ah, in that case, it has not been fixed.
 
  No need to try. I will investigate further.
 
  I verified that crm_diff produces correct xml diff if I change just one
  property, so problem should really be in crmsh.
  
  Yes, just found where it is. The fix will be pushed tomorrow.
 
 Yeees!
 Thank you for info.

A new RC has been released today. It contains both fixes. It
doesn't do atomic updates anymore, because cibadmin or something
cannot stomach comments. The updates for several distributions
are, as usual, available here:

http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/

And many thanks for testing.

Cheers,

Dejan

 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] crmsh error : cib-bootstrap-options already exist

2013-08-27 Thread Dejan Muhamedagic
Hi,

On Tue, Aug 27, 2013 at 12:06:40PM +0300, Vladislav Bogdanov wrote:
 23.08.2013 16:48, Kristoffer Grönlund wrote:
  Hi,
  
  On Fri, 23 Aug 2013 16:33:28 +0300
  Vladislav Bogdanov bub...@hoster-ok.com wrote:
  
  No-no, it was before that fix too, at least with 19a3f1e5833c.
  Should I still try?
 
  
  Ah, in that case, it has not been fixed.
  
  No need to try. I will investigate further.
 
 I verified that crm_diff produces correct xml diff if I change just one
 property, so problem should really be in crmsh.

Yes, just found where it is. The fix will be pushed tomorrow.

Cheers,

Dejan

 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] crmsh error : cib-bootstrap-options already exist

2013-08-27 Thread Dejan Muhamedagic
Hi all,

On Thu, Aug 22, 2013 at 12:57:20PM +0200, Kristoffer Grönlund wrote:
 Hi Takatoshi-san,
 
 On Wed, 21 Aug 2013 13:56:34 +0900
 Takatoshi MATSUO matsuo@gmail.com wrote:
 
  Hi Kristoffer
  
  I reproduced the error with latest changest(b5ffd99e).
 
 Thank you, with your description I was able to reproduce and create a
 test case for the problem. I have pushed a workaround for the issue in
 the crm shell which stops the crm shell from adding comments to the
 CIB. (changeset e35236439b8e)
 
 However, it may be that this is a problem that ought to be fixed in
 Pacemaker, so I have not created a new release candidate containing the
 workaround. I will try to investigate this possibility before doing so.

This is an issue with cibadmin. Before that gets fixed, we'll
have to disable the cibadmin -P commit method and keep the old
one. At least I don't see any other sensible alternative.
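
(For reference, the -P method assembles and applies an XML patch
roughly like this -- a sketch, with illustrative file names:

    cibadmin -Q > cib-old.xml                      # snapshot the CIB
    # ... produce cib-new.xml with the desired changes ...
    crm_diff -o cib-old.xml -n cib-new.xml > cib.patch
    cibadmin -P -x cib.patch                       # apply the diff atomically

The old commit method updates the affected elements directly instead,
which does not trip over the comments.)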

Cheers,

Dejan

 Thank you,
 
 -- 
 // Kristoffer Grönlund
 // kgronl...@suse.com
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] crmsh error : cib-bootstrap-options already exist

2013-08-08 Thread Dejan Muhamedagic
Hi Takatoshi-san,

On Thu, Aug 08, 2013 at 11:26:54AM +0900, Takatoshi MATSUO wrote:
 Hi Dejan
 
 I caught this error with 1.2.6-rc1 when loading configuration file.
 ---
 crm configure load update config.crm
 ERROR: elements cib-bootstrap-options already exist
 ---
 
 config.crm
 ---
 property \
 no-quorum-policy=ignore \
 stonith-enabled=false
 
 rsc_defaults \
 resource-stickiness=INFINITY \
 migration-threshold=1
 ---
 
 I use
  - RHEL6
  - Pacemaker 83fc351  (latest)

Should be fixed now. Thanks for reporting.

Cheers,

Dejan

 Thanks,
 Takatoshi MATSUO
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] crmsh release candidate 1.2.6-rc1

2013-08-07 Thread Dejan Muhamedagic
Hello,

The first release candidate for CRM shell v1.2.6 has been
released.

This is also the first release in which Kristoffer Grönlund takes
the responsibility of co-maintainership. Welcome Kristoffer!

The highlights of the release:

* Atomic CIB updates (via cibadmin -P, only with pacemaker >= 1.1.10)
* Support for containers (nagios)
* Support for RA tracing
* Many performance improvements
  (including a switch from minidom to lxml)
* Element editing improvements
* General history feature improvements

For the full set of changes, take a look at the changelog:

http://hg.savannah.gnu.org/hgweb/crmsh/file/1.2.6-rc1/ChangeLog

We will allow for a period of testing and bug fixing before the
actual release of v1.2.6.

The final release of CRM shell v1.2.6 is expected to be available
in a few weeks.

Note about Pacemaker versions

CRM shell 1.2.6 supports all Pacemaker 1.1 versions including the
latest v1.1.10.

Installing with pacemaker versions <= v1.1.7

Installing the CRM shell along with Pacemaker 1.1 versions <=
v1.1.7 is possible, but it will result in file conflicts. You
need to enforce file overwriting when installing the crmsh
package.

Note that pacemaker up to v1.1.7 includes an older version of the CRM
shell, and these versions are quite outdated. There are several
interesting new features, including history, not found in these
bundled versions of the shell.

Support and bug reporting

Please report any bugs found in this release in one of the bug
trackers below, or send a message to the linux-ha mailing list. The
mailing list can also be used for other questions related to the CRM
shell, as well as the IRC channel #linux-ha on irc.freenode.net.

https://savannah.nongnu.org/bugs/?group=crmsh
https://developerbugs.linuxfoundation.org/enter_bug.cgi?product=Pacemaker;component=Shell;version=1.2
http://lists.linux-ha.org/mailman/listinfo/linux-ha

Resources

Packages for several popular Linux distributions:

http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/

The archive of the release candidate:

http://hg.savannah.gnu.org/hgweb/crmsh/archive/1.2.6-rc1.tar.bz2

The man page:

http://crmsh.nongnu.org/crm.8.html

The CRM shell project web page at GNU savannah:

https://savannah.nongnu.org/projects/crmsh/

Support and bug reporting: 

http://lists.linux-ha.org/mailman/listinfo/linux-ha
https://savannah.nongnu.org/bugs/?group=crmsh
https://developerbugs.linuxfoundation.org/enter_bug.cgi?product=Pacemaker;component=Shell;version=1.2

The sources repository is available at:

http://hg.savannah.gnu.org/hgweb/crmsh

Enjoy!

Dejan
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Patch/rfc: Add multiple IP support for eDir88 RA

2013-07-15 Thread Dejan Muhamedagic
On Mon, Jul 15, 2013 at 01:40:47PM +0200, Dejan Muhamedagic wrote:
 Hi Sami,
 
  On Fri, Jun 28, 2013 at 03:40:02PM +0300, Sami Kähkönen wrote:
  Hi,
  
  Here is a small patch to enable multiple IP support for eDir88 RA. To
  summarize, eDirectory supports multiple IP numbers in config file separated
  by comma.
  Example line in nds.conf:
n4u.server.interfaces=168.0.0.1@524,10.0.0.1@524
  
  Current resource agent is unable to cope with such configurations.
  
  This patch creates an array of IP:port configurations and checks them
  individually. Tested in SLES 11 SP2 HA environment with one and multiple
  IP's. All comments and additional testing are welcome.
  
  Patch also in github: https://github.com/skahkonen/resource-agents
 
 Sorry for the delay and many thanks for the patch. It looks good
 to me. If you can, it would be good to make a pull request at
 github.

I did that myself and put together the indentation patch. See

https://github.com/ClusterLabs/resource-agents/pull/283

Cheers,

Dejan

 Cheers,
 
 Dejan
 
 
  Regards,
Sami Kähkönen
  
  --
@@ -238,14 +238,34 @@ eDir_status() {
         ocf_log err "Cannot retrieve interfaces from $NDSCONF. eDirectory may not be correctly configured."
         exit $OCF_ERR_GENERIC
     fi
-    NDSD_SOCKS=$(netstat -ntlp | grep -ce "$IFACE.*ndsd")
-
-    if [ $NDSD_SOCKS -eq 1 ] ; then
+    # In case of multiple IP's split into an array
+    # and check all of them
+    IFS=', ' read -a IFACE2 <<< "$IFACE"
+    ocf_log debug "Found ${#IFACE2[@]} interfaces from $NDSCONF."
+
+    counter=${#IFACE2[@]}
+
+    for IFACE in ${IFACE2[@]}
+    do
+        ocf_log debug "Checking ndsd instance for $IFACE"
+        NDSD_SOCKS=$(netstat -ntlp | grep -ce "$IFACE.*ndsd")
+
+        if [ $NDSD_SOCKS -eq 1 ] ; then
+            let counter=counter-1
+            ocf_log debug "Found ndsd instance for $IFACE"
+        elif [ $NDSD_SOCKS -gt 1 ] ; then
+            ocf_log err "More than 1 ndsd listening socket matched. Likely misconfiguration of eDirectory."
+            exit $OCF_ERR_GENERIC
+        fi
+    done
+
+    if [ $counter -eq 0 ] ; then
         # Correct ndsd instance is definitely running
-        # Further checks are superfluous (I think...)
-        return 0
-    elif [ $NDSD_SOCKS -gt 1 ] ; then
-        ocf_log err "More than 1 ndsd listening socket matched. Likely misconfiguration of eDirectory."
+        ocf_log debug "All ndsd instances found."
+        return 0;
+    elif [ $counter -lt ${#IFACE2[@]} ]; then
+        ocf_log err "Only some ndsd listening sockets matched, something is very wrong."
         exit $OCF_ERR_GENERIC
     fi
@@ -270,7 +290,7 @@ eDir_status() {
         exit $OCF_ERR_GENERIC
     fi
 
-    # Instance is not running, but no other error detected.
+    ocf_log debug "ndsd instance is not running, but no other error detected."
     return 1
 }
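
(For clarity, the splitting the patch relies on behaves like this in
bash -- a standalone sketch:

    IFACE="168.0.0.1@524,10.0.0.1@524"
    IFS=', ' read -a IFACE2 <<< "$IFACE"
    echo "${#IFACE2[@]} interfaces: ${IFACE2[*]}"
    # prints: 2 interfaces: 168.0.0.1@524 10.0.0.1@524

so each address@port entry can then be checked individually.)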
 
  ___
  Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
  Home Page: http://linux-ha.org/
 
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [Patch]Exit code of reset of external/libvirt is wrong.

2013-07-05 Thread Dejan Muhamedagic
Hi Hideo-san,

On Fri, Jul 05, 2013 at 08:51:14AM +0900, renayama19661...@ybb.ne.jp wrote:
 Hi All,
 
 The exit code of reset of external/libvirt is wrong.

Indeed. The latest change was quite sloppy, my apologies.

 I attached a patch.

Many thanks for the patch. Applied (slightly modified).
I wonder if we should also ignore the outcome of libvirt_start.
What are the chances that it fails?

Cheers,

Dejan

 Best Regards,
 Hideo Yamauchi.


 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] A patch for crmsh.spec

2013-07-02 Thread Dejan Muhamedagic
Hi Yusuke-san,

On Tue, Jul 02, 2013 at 01:08:18PM +0900, yusuke iida wrote:
 Hi, Dejan
 
 Could you incorporate this patch?

Oh, somehow missed this one. Sorry about that.

Yes, I'll apply this patch.

Anyway, why is it so difficult to install pssh? Or did I already
forget your reasoning?

Cheers,

Dejan

 I want some messages.
 
 Regards,
 Yusuke
 
 2013/6/3 yusuke iida yusk.i...@gmail.com:
  Hi, Dejan
 
  2013/4/18 Dejan Muhamedagic de...@suse.de:
  Hi Yusuke-san,
 
  On Tue, Apr 16, 2013 at 02:55:40PM +0900, yusuke iida wrote:
  Hi, Dejan
 
  2013/4/4 Dejan Muhamedagic de...@suse.de:
   Hi Yusuke,
  
   On Thu, Feb 21, 2013 at 09:04:45PM +0900, yusuke iida wrote:
   Hi, Dejan
  
   I also tested by rhel6.3 and fedora17.
   Since there is no environment, centos is not tested.
  
   The point worried below is shown:
   - I think that %{?fedora_version} and %{?rhel_version} are macro not 
   to exist.
  
   Those macros work in OBS when rhel6 packages are built. I wonder
   if that's some build service extension.
 
  In my environment, the macro of rpmbuild is as follows.
 
  rhel6.3
  # rpmbuild --showrc | grep rhel
  -14: rhel   6
 
  fedora18
  # rpmbuild --showrc | grep fedora
  -14: fedora 18
 
  So I want you to revise it as follows at least.
 
  # hg diff
  diff -r da93d3523e6a crmsh.spec
  --- a/crmsh.specTue Mar 26 11:44:17 2013 +0100
  +++ b/crmsh.specTue Apr 16 13:08:37 2013 +0900
  @@ -6,7 +6,7 @@
   %global upstream_version tip
   %global upstream_prefix crmsh
 
  -%if 0%{?fedora_version} || 0%{?centos_version} || 0%{?rhel_version}
  +%if 0%{?fedora_version} || 0%{?centos_version} || 0%{?rhel_version} || 0%{?rhel} || 0%{?fedora}
   %define pkg_group System Environment/Daemons
   %else
   %define pkg_group Productivity/Clustering/HA
 
  Patch applied. Thanks!
 
   - pssh is not provided in rhel.
 I think that you should not put it in Requires.
  
   OK, but currently the only RPM built is the one in OBS where the
   repository includes pssh RPMs for rhel/centos too. See for
   instance:
  
   http://download.opensuse.org/repositories/network:/ha-clustering/RedHat_RHEL-6/x86_64/
  
   I made a patch to solve the above.
  
   Note that the .spec file in the upstream may not be perfect or
   even work on particular distribution. However, it should advise
   packagers on what it should contain. The pssh requirement is
   there because history would not work well without it. It is
   further rather unfortunate that that feature is used very seldom
   and that it got so little attention.
  
   Therefore, I'm reluctant to apply the pssh part of the patch.
 
  hmm ...
  For example, can't it change so that the function in which pssh is
  used may be disabled by the configure option?
 
  The functionality is still there, even without pssh. For
  instance, static reports can also be examined. It's just that the
  live updates are going to be quite a bit slower, if somebody
  wants to use the history feature to examine changes happening in
  the cluster.
  I made sure the source code.
 
  In an environment without pssh, history collects its
  information using the crm_report command.
  However, I found one more place that uses pssh: rsctest.
 
  In an environment without pssh, that code fails with a
  python error.
 
  Probing resources .
  Traceback (most recent call last):
    File /usr/sbin/crm, line 44, in <module>
  main.run()
File /usr/lib64/python2.6/site-packages/crmsh/main.py, line 413, in run
  do_work()
File /usr/lib64/python2.6/site-packages/crmsh/main.py, line 323, in 
  do_work
  if parse_line(levels,shlex.split(' '.join(l))):
File /usr/lib64/python2.6/site-packages/crmsh/main.py, line 149,
  in parse_line
  rv = d() # execute the command
File /usr/lib64/python2.6/site-packages/crmsh/main.py, line 148, in 
  lambda
  d = lambda: cmd[0](*args)
File /usr/lib64/python2.6/site-packages/crmsh/ui.py, line 1945, in 
  rsc_test
  return test_resources(rsc_l, node_l, all_nodes)
File /usr/lib64/python2.6/site-packages/crmsh/rsctest.py, line
  300, in test_resources
  if not are_all_stopped(rsc_l, all_nodes_l):
File /usr/lib64/python2.6/site-packages/crmsh/rsctest.py, line
  250, in are_all_stopped
  drv.runop(probe)
File /usr/lib64/python2.6/site-packages/crmsh/rsctest.py, line 143, in 
  runop
  from crm_pssh import do_pssh_cmd
File /usr/lib64/python2.6/site-packages/crmsh/crm_pssh.py, line
  24, in <module>
  from psshlib import psshutil
  ImportError: No module named psshlib
 
  Since I considered this a problem, I added a check for
  pssh support.
 
  If you see no problem with it, please apply this patch.
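 
  A guard of roughly this shape lets the code degrade gracefully when
  python-pssh is absent (a sketch; the name has_pssh is illustrative,
  not necessarily what the attached patch uses):
 
      try:
          from psshlib import psshutil
          has_pssh = True
      except ImportError:
          has_pssh = False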
 
  Regards,
  Yusuke
 
  If it is possible, can it not exclude pssh from Requires?
 
  I already reasoned in my previous message (quoted above) why I'm
  reluctant to do that.
 
  Cheers,
 
  Dejan
 
  Regards

Re: [Linux-ha-dev] crmsh issues FutureWarning of python

2013-06-27 Thread Dejan Muhamedagic
Hi Takatoshi-san,

On Thu, Jun 27, 2013 at 02:43:15PM +0900, Takatoshi MATSUO wrote:
 Hi Dejan
 
 Thanks.
 But previous commit c8e7cb61 or 7c9df6e81 (Jun 21) cause this error
 when using comment between configuration.

Oops. Good catch. Fixed now. Thanks for reporting!

Cheers,

Dejan

 config.crm
 
 property \
  no-quorum-policy=ignore \
  stonith-enabled=false \
  startup-fencing=false \
  stonith-timeout=20s
 
 # comment
 primitive dummy ocf:heartbeat:Dummy
 --
 
 # crm configure load update config.crm
 Traceback (most recent call last):
  File /usr/sbin/crm, line 44, in <module>
 main.run()
   File /usr/lib64/python2.6/site-packages/crmsh/main.py, line 414, in run
 do_work()
   File /usr/lib64/python2.6/site-packages/crmsh/main.py, line 323, in 
 do_work
 if parse_line(levels, shlex.split(' '.join(l))):
   File /usr/lib64/python2.6/site-packages/crmsh/main.py, line 149,
 in parse_line
 rv = d() # execute the command
   File /usr/lib64/python2.6/site-packages/crmsh/main.py, line 148, in 
 lambda
 d = lambda: cmd[0](*args)
   File /usr/lib64/python2.6/site-packages/crmsh/ui.py, line 1694, in load
 return set_obj.import_file(method, url)
   File /usr/lib64/python2.6/site-packages/crmsh/cibconfig.py, line
 268, in import_file
 return self.save(s, method == update)
   File /usr/lib64/python2.6/site-packages/crmsh/cibconfig.py, line
 455, in save
 if self.process(cli_list, update) == False:
   File /usr/lib64/python2.6/site-packages/crmsh/cibconfig.py, line
 409, in process
 obj = cib_factory.create_from_cli(cli_list)
   File /usr/lib64/python2.6/site-packages/crmsh/cibconfig.py, line
 2615, in create_from_cli
 node = obj.cli2node(cli_list)
   File /usr/lib64/python2.6/site-packages/crmsh/cibconfig.py, line
 918, in cli2node
 stuff_comments(node, comments)
   File /usr/lib64/python2.6/site-packages/crmsh/xmlutil.py, line
 506, in stuff_comments
 add_comment(node, s)
   File /usr/lib64/python2.6/site-packages/crmsh/xmlutil.py, line
 503, in add_comment
 e.insert(firstelem, comm_elem)
   File lxml.etree.pyx, line 723, in lxml.etree._Element.insert
 (src/lxml/lxml.etree.c:32132)
 TypeError: 'NoneType' object cannot be interpreted as an index
 
 
 
 Commit ff28b19b doesn't issue this error.
 
 Regards,
 Takatoshi MATSUO
 
 2013/6/26 Dejan Muhamedagic de...@suse.de:
  Hi Takatoshi-san,
 
  On Wed, Jun 26, 2013 at 11:48:00AM +0900, Takatoshi MATSUO wrote:
  Hi Dejan
 
  I received another FutureWarning of python.
 
  /usr/lib64/python2.6/site-packages/crmsh/completion.py:88:
  FutureWarning: The behavior of this
  method will change in future versions. Use specific 'len(elem)' or
  'elem is not None' test instead.
if not doc
 
  Fixed now. Thanks!
 
  Dejan
 
  Regards,
  Takatoshi MATSUO
 
  2013/6/21 Takatoshi MATSUO matsuo@gmail.com:
   Hi Dejan
  
   Thank you for your quick response.
   I can inhibit warnings.
  
   Regards,
   Takatoshi MATSUO
  
  
   2013/6/21 Dejan Muhamedagic de...@suse.de:
   Hi Takatoshi-san,
  
   On Fri, Jun 21, 2013 at 04:41:39PM +0900, Takatoshi MATSUO wrote:
   Hi Dejan
  
   I use latest crmsh(ff28b19bdb1d) and it issues FutureWarning of python
   when using comment(#).
  
   config.crm file
   
   # Comment
   property \
   no-quorum-policy=ignore \
   stonith-enabled=false \
   startup-fencing=false \
   stonith-timeout=20s
   
  
   ---
   # crm configure load update config.crm
   /usr/lib64/python2.7/site-packages/crmsh/cibconfig.py:917:
   FutureWarning: The behavior of this method will change in future
   versions. Use specific 'len(elem)' or 'elem is not None' test instead.
 if comments and node:
  
   Fixed now. And some more. Hope there won't be any more in Future.
  
   Cheers,
  
   Dejan
  
   ---
  
   I use python 2.7.3 on Fedora 18.
   Python 2.6.6 issues same warning on RHEL6.
  
   Thanks,
   Takatoshi MATSUO
   ___
   Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
   http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
   Home Page: http://linux-ha.org/
   ___
   Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
   http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
   Home Page: http://linux-ha.org/
  ___
  Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
  Home Page: http://linux-ha.org/
  ___
  Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
  Home Page: http://linux-ha.org/
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org

Re: [Linux-ha-dev] crmsh issues FutureWarning of python

2013-06-26 Thread Dejan Muhamedagic
Hi Takatoshi-san,

On Wed, Jun 26, 2013 at 11:48:00AM +0900, Takatoshi MATSUO wrote:
 Hi Dejan
 
 I received another FutureWarning of python.
 
 /usr/lib64/python2.6/site-packages/crmsh/completion.py:88:
 FutureWarning: The behavior of this
 method will change in future versions. Use specific 'len(elem)' or
 'elem is not None' test instead.
   if not doc

Fixed now. Thanks!

Dejan

 Regards,
 Takatoshi MATSUO
 
 2013/6/21 Takatoshi MATSUO matsuo@gmail.com:
  Hi Dejan
 
  Thank you for your quick response.
  I can inhibit warnings.
 
  Regards,
  Takatoshi MATSUO
 
 
  2013/6/21 Dejan Muhamedagic de...@suse.de:
  Hi Takatoshi-san,
 
  On Fri, Jun 21, 2013 at 04:41:39PM +0900, Takatoshi MATSUO wrote:
  Hi Dejan
 
  I use latest crmsh(ff28b19bdb1d) and it issues FutureWarning of python
  when using comment(#).
 
  config.crm file
  
  # Comment
  property \
  no-quorum-policy=ignore \
  stonith-enabled=false \
  startup-fencing=false \
  stonith-timeout=20s
  
 
  ---
  # crm configure load update config.crm
  /usr/lib64/python2.7/site-packages/crmsh/cibconfig.py:917:
  FutureWarning: The behavior of this method will change in future
  versions. Use specific 'len(elem)' or 'elem is not None' test instead.
if comments and node:
 
  Fixed now. And some more. Hope there won't be any more in Future.
 
  Cheers,
 
  Dejan
 
  ---
 
  I use python 2.7.3 on Fedora 18.
  Python 2.6.6 issues same warning on RHEL6.
 
  Thanks,
  Takatoshi MATSUO
  ___
  Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
  Home Page: http://linux-ha.org/
  ___
  Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
  Home Page: http://linux-ha.org/
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] crmsh issues FutureWarning of python

2013-06-21 Thread Dejan Muhamedagic
Hi Takatoshi-san,

On Fri, Jun 21, 2013 at 04:41:39PM +0900, Takatoshi MATSUO wrote:
 Hi Dejan
 
 I use latest crmsh(ff28b19bdb1d) and it issues FutureWarning of python
 when using comment(#).
 
 config.crm file
 
 # Comment
 property \
 no-quorum-policy=ignore \
 stonith-enabled=false \
 startup-fencing=false \
 stonith-timeout=20s
 
 
 ---
 # crm configure load update config.crm
 /usr/lib64/python2.7/site-packages/crmsh/cibconfig.py:917:
 FutureWarning: The behavior of this method will change in future
 versions. Use specific 'len(elem)' or 'elem is not None' test instead.
   if comments and node:

Fixed now. And some more. Hope there won't be any more in Future.
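
The change is presumably along these lines (a sketch based only on the
code quoted in the warning, not the actual changeset):

    # lxml elements must not be truth-tested; compare against None instead.
    # old (warns): if comments and node:
    if comments and node is not None:
        stuff_comments(node, comments)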

Cheers,

Dejan

 ---
 
 I use python 2.7.3 on Fedora 18.
 Python 2.6.6 issues same warning on RHEL6.
 
 Thanks,
 Takatoshi MATSUO
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [PATCH][crmsh] deal with the case-insentive hostname

2013-05-29 Thread Dejan Muhamedagic
On Tue, Apr 23, 2013 at 04:44:19PM +0200, Dejan Muhamedagic wrote:
 Hi Junko-san,
 
 Can you try the attached patch, instead of this one?

Any news? Was the patch any good?

Cheers,

Dejan

 Cheers,
 
 Dejan
 
 On Wed, Apr 10, 2013 at 06:13:45PM +0900, Junko IKEDA wrote:
  Hi,
  I set upper-case hostname (GUEST03/GUEST4) and run Pacemaker 1.1.9 +
  Corosync 2.3.0.
  
  [root@GUEST04 ~]# crm_mon -1
  Last updated: Wed Apr 10 15:12:48 2013
  Last change: Wed Apr 10 14:02:36 2013 via crmd on GUEST04
  Stack: corosync
  Current DC: GUEST04 (3232242817) - partition with quorum
  Version: 1.1.9-e8caee8
  2 Nodes configured, unknown expected votes
  1 Resources configured.
  
  
  Online: [ GUEST03 GUEST04 ]
  
   dummy  (ocf::pacemaker:Dummy): Started GUEST03
  
  
  for example, call crm shell with lower-case hostname.
  
  [root@GUEST04 ~]# crm node standby guest03
  ERROR: bad lifetime: guest03
  
  crm node standby GUEST03 surely works well,
  so crm shell just doesn't take into account the hostname conversion.
  It's better to accept the both of the upper/lower-case.
  
  node standby, node delete, resource migrate(move)  get hit with this
  issue.
  Please see the attached.
  
  Thanks,
  Junko
 
 
  ___
  Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
  Home Page: http://linux-ha.org/
 

 # HG changeset patch
 # User Dejan Muhamedagic de...@hello-penguin.com
 # Date 1366728211 -7200
 # Node ID cd4d36b347c17b06b76f3386c041947a03c708bb
 # Parent  4a47465b1fe1f48123080b4336f0b4516d9264f6
 Medium: node: ignore case when looking up nodes (thanks to Junko Ikeda)
 
 diff -r 4a47465b1fe1 -r cd4d36b347c1 modules/ui.py.in
 --- a/modules/ui.py.inTue Apr 23 11:23:10 2013 +0200
 +++ b/modules/ui.py.inTue Apr 23 16:43:31 2013 +0200
 @@ -924,7 +924,7 @@ class RscMgmt(UserInterface):
  lifetime = None
  opt_l = fetch_opts(argl, ["force"])
  if len(argl) == 1:
 -if not argl[0] in listnodes():
 +if not is_node(argl[0]):
  lifetime = argl[0]
  else:
  node = argl[0]
 @@ -1186,7 +1186,7 @@ class NodeMgmt(UserInterface):
  if not args:
  node = vars.this_node
  if len(args) == 1:
 -if not args[0] in listnodes():
 +if not is_node(args[0]):
  node = vars.this_node
  lifetime = args[0]
  else:
 @@ -1249,7 +1249,7 @@ class NodeMgmt(UserInterface):
  'usage: delete node'
  if not is_name_sane(node):
  return False
 -if not node in listnodes():
 +if not is_node(node):
  common_err("node %s not found in the CIB" % node)
  return False
  rc = True
 diff -r 4a47465b1fe1 -r cd4d36b347c1 modules/xmlutil.py
 --- a/modules/xmlutil.py  Tue Apr 23 11:23:10 2013 +0200
 +++ b/modules/xmlutil.py  Tue Apr 23 16:43:31 2013 +0200
 @@ -159,6 +159,15 @@ def mk_rsc_type(n):
  if ra_provider:
  s2 = "%s:" % ra_provider
  return ''.join((s1,s2,ra_type))
 +def is_node(s):
 +'''
 +Check if s is in a list of our nodes (ignore case).
 +This is not fast, perhaps should be cached.
 +'''
 +for n in listnodes():
 +if n.lower() == s.lower():
 +return True
 +return False
  def listnodes():
  nodes_elem = cibdump2elem("nodes")
  if nodes_elem is None:

 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] cluster-glue compilation problem

2013-05-29 Thread Dejan Muhamedagic
Hi,

On Mon, Apr 29, 2013 at 02:24:12PM +0200, khaled atteya wrote:
 Hi
 I tried to install the glue package from source. Although I installed
 net-snmp, when I run ./configure this message appears: "checking for
 ucd-snmp/snmp.h... no", even though /usr/include/ucd-snmp/snmp.h exists.
 What is the problem?

Sorry for the delay.

There seems to be an issue with ucd-snmp, i.e. some macro is
supposed to be defined. However, the alternative net-snmp should
be OK and both stonith plugins which need snmp support can use
either. Do you need ucd-snmp specifically for some reason?

 This problem also appear in other libraries.

Which other libraries do you refer to?

Thanks,

Dejan

 Thanks
 
 -- 
 KHALED MOHAMMED ATTEYA
 System Engineer

 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [Pacemaker] A couple of SendArp resource changes

2013-05-29 Thread Dejan Muhamedagic
On Mon, Apr 08, 2013 at 02:33:27PM +0100, Tim Small wrote:
 On 13/03/13 18:01, Dejan Muhamedagic wrote:
 
  If you could split the
  patch we can consider them on a one-by-one basis.
 
 I used Debian's start-stop-daemon utility in my modified script, and
 it looks like Redhat etc. doesn't package it (yet): 
 http://fedoraproject.org/wiki/Features/start-stop-daemon 
 
 ... the comments in that page express why I chose to use
 start-stop-daemon - reworking the script to have the same level of
 functionality as the start-stop-daemon version (but just using lsb
 stuff) would be a bit awkward + time-consuming.
 
 How about I use start-stop-daemon where available, and the LSB functions
 when not?  This would still represent an improvement on the current
 behaviour of the script - which is pretty broken - e.g. stopping an
 already-stopped resource fails, and stuff like this:
 
 
 #
 #   This is always active, because it doesn't do much
 #
 sendarp_monitor() {
 return $OCF_SUCCESS
 }
 
 
 
 and this:
 
 
 
 sendarp_status() {
 if
 [ -f $SENDARPPIDFILE ]
 then
 return $OCF_SUCCESS
 else
 return $OCF_NOT_RUNNING
 fi
 }
 
 
 A pid file is there, so it must be running!
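 
 By contrast, a status check that verifies the process behind the
 pidfile could look roughly like this (a minimal sketch reusing the
 names above; the actual upstream fix took a different route, see
 Dejan's reply below):
 
 sendarp_status() {
     [ -f "$SENDARPPIDFILE" ] || return $OCF_NOT_RUNNING
     pid=$(cat "$SENDARPPIDFILE")
     if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
         return $OCF_SUCCESS
     fi
     # pidfile exists but the process is gone: stale pidfile
     return $OCF_NOT_RUNNING
 }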

The fix for the resource agent itself is already in the
repository. It is based on the standard ha_pseudo_* functions
like in any other pseudo agents (i.e. those that don't have long
running processes).

  Otherwise, I found some patch in my local queue, which never got pushed for 
  some reason. Don't know if that would help (attached).

 
 I'll have a go with them, and check to see if they fix the bug which I
 was seeing.

Did you get a chance to verify the two patches attached? There's
now also a pull request for the socket leaks issue at github.com:

https://github.com/ClusterLabs/resource-agents/pull/247

Cheers,

Dejan

 Tim.
 
 
 -- 
 South East Open Source Solutions Limited
 Registered in England and Wales with company number 06134732.  
 Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
 VAT number: 900 6633 53  http://seoss.co.uk/ +44-(0)1273-808309
 
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [PATCH] cluster-glue memory leak

2013-05-07 Thread Dejan Muhamedagic
Hi,

On Tue, May 07, 2013 at 05:22:24PM +0200, Lars Ellenberg wrote:
 On Tue, May 07, 2013 at 07:10:15PM +0900, Yuichi SEINO wrote:
  Hi All,
  
  I used pacemaker-1.1.9(commit 138556cb0b375a490a96f35e7fbeccc576a22011)
  
  crmd causes a memory leak, and the leak happens in 3 places.
  I could fix one of them; the patch is attached.
  
  However, the rest is not easy to solve. The issue is that the
  stonith API cannot call the DelPILPluginUniv function in pils.c. I think
  we need to call DelPILPluginUniv to completely release the
  memory which the stonith_new function allocated.
 
 Is it just that there is this few bytes that are allocated once,
 and never freed, or is this a real memleak,
 that is accumulating more and more bytes during process lifetime?
 
 I suspect the former.
 In which case I doubt it is even worthwhile to try and fix it.

Agreed. Though the first leak is not related to PILS.

 Why?
 because, in that case we basically have:
 main()
 {
   global_variable = malloc(something);
   endless_loop_that_is_not_expected_to_ever_return();
    /* so, ok, we could free(global_variable) here.
     * but why bother? */
   exit(1);
 }
 
 In that pseudo code above, it is easy to fix.
 In the (over-abstracted) case of PILs, I'm afraid, it's not that easy.
 And apart from academic correctness,
 there is no gain from fixing this for the real world.
 
  -=-
 
 If however we have a *real* memleak, that has to be fixed, of course.

The first one, for which the patch is provided, could be a real
memory leak. I'll apply the patch. Many thanks!

Cheers,

Dejan

   Lars
 
  Here is the Valgrind output; this is the leak I could fix.
  
  ==3484== 76 bytes in 4 blocks are definitely lost in loss record 94 of 161
  ==3484==at 0x4A07A49: malloc (vg_replace_malloc.c:270)
  ==3484==by 0x373FA417D2: g_malloc (gmem.c:132)
  ==3484==by 0xA2C2365: external_run_cmd (external.c:767)
  ==3484==by 0xA2C1AC8: external_getinfo (external.c:598)
  ==3484==by 0x9EB9B7E: stonith_get_info (stonith.c:327)
  ==3484==by 0x3F5100744D: stonith_api_device_metadata (st_client.c:1177)
  ==3484==by 0x3F52407E22: stonith_get_metadata (lrmd_client.c:1478)
  ==3484==by 0x3F52408DB6: lrmd_api_get_metadata (lrmd_client.c:1736)
  ==3484==by 0x427FB2: lrm_state_get_metadata (lrm_state.c:555)
  ==3484==by 0x41F991: get_rsc_metadata (lrm.c:436)
  ==3484==by 0x41FCD4: get_rsc_restart_list (lrm.c:521)
  ==3484==by 0x4201B0: append_restart_list (lrm.c:607)
  ==3484==by 0x420670: build_operation_update (lrm.c:672)
  ==3484==by 0x425AE1: do_update_resource (lrm.c:1906)
  ==3484==by 0x42622E: process_lrm_event (lrm.c:2016)
  ==3484==by 0x41EE10: lrm_op_callback (lrm.c:242)
  ==3484==by 0x3F52404339: lrmd_dispatch_internal (lrmd_client.c:289)
  ==3484==by 0x3F524043DF: lrmd_ipc_dispatch (lrmd_client.c:311)
  ==3484==by 0x3F504308A9: mainloop_gio_callback (mainloop.c:587)
  ==3484==by 0x373FA38F0D: g_main_context_dispatch (gmain.c:1960)
  ==3484==by 0x373FA3C937: g_main_context_iterate (gmain.c:2591)
  ==3484==by 0x373FA3CD54: g_main_loop_run (gmain.c:2799)
  ==3484==by 0x4055E7: crmd_init (main.c:154)
  ==3484==by 0x405419: main (main.c:120)
  
  Here is the rest.
  
  ==3484== 13 bytes in 1 blocks are definitely lost in loss record 29 of 161
  ==3484==at 0x4A07A49: malloc (vg_replace_malloc.c:270)
  ==3484==by 0x373FA417D2: g_malloc (gmem.c:132)
  ==3484==by 0x373FA58F7D: g_strdup (gstrfuncs.c:102)
  ==3484==by 0x4E67713: InterfaceManager_plugin_init (pils.c:611)
  ==3484==by 0x4E69C64: NewPILInterfaceUniv (pils.c:1723)
  ==3484==by 0x4E672DC: NewPILPluginUniv (pils.c:487)
  ==3484==by 0x9EB8FE3: init_pluginsys (stonith.c:75)
  ==3484==by 0x9EB90EC: stonith_new (stonith.c:105)
  ==3484==by 0x3F51008137: get_stonith_provider (st_client.c:1434)
  ==3484==by 0x3F51006E28: stonith_api_device_metadata (st_client.c:1059)
  ==3484==by 0x3F52407E22: stonith_get_metadata (lrmd_client.c:1478)
  ==3484==by 0x3F52408DB6: lrmd_api_get_metadata (lrmd_client.c:1736)
  ==3484==by 0x427FB2: lrm_state_get_metadata (lrm_state.c:555)
  ==3484==by 0x41F991: get_rsc_metadata (lrm.c:436)
  ==3484==by 0x41FCD4: get_rsc_restart_list (lrm.c:521)
  ==3484==by 0x4201B0: append_restart_list (lrm.c:607)
  ==3484==by 0x420670: build_operation_update (lrm.c:672)
  ==3484==by 0x425AE1: do_update_resource (lrm.c:1906)
  ==3484==by 0x42622E: process_lrm_event (lrm.c:2016)
  ==3484==by 0x41EE10: lrm_op_callback (lrm.c:242)
  ==3484==by 0x3F52404339: lrmd_dispatch_internal (lrmd_client.c:289)
  ==3484==by 0x3F524043DF: lrmd_ipc_dispatch (lrmd_client.c:311)
  ==3484==by 0x3F504308A9: mainloop_gio_callback (mainloop.c:587)
  ==3484==by 0x373FA38F0D: g_main_context_dispatch (gmain.c:1960)
  ==3484==by 0x373FA3C937: g_main_context_iterate (gmain.c:2591)
  
  ==3484== 13 

Re: [Linux-ha-dev] [PATCH][crmsh] deal with the case-insentive hostname

2013-04-23 Thread Dejan Muhamedagic
Hi Lars,

On Tue, Apr 23, 2013 at 03:37:30PM +0200, Lars Ellenberg wrote:
 On Wed, Apr 10, 2013 at 06:13:45PM +0900, Junko IKEDA wrote:
  Hi,
  I set upper-case hostname (GUEST03/GUEST4) and run Pacemaker 1.1.9 +
  Corosync 2.3.0.
  
  [root@GUEST04 ~]# crm_mon -1
  Last updated: Wed Apr 10 15:12:48 2013
  Last change: Wed Apr 10 14:02:36 2013 via crmd on GUEST04
  Stack: corosync
  Current DC: GUEST04 (3232242817) - partition with quorum
  Version: 1.1.9-e8caee8
  2 Nodes configured, unknown expected votes
  1 Resources configured.
  
  
  Online: [ GUEST03 GUEST04 ]
  
   dummy  (ocf::pacemaker:Dummy): Started GUEST03
  
  
  for example, call crm shell with lower-case hostname.
  
  [root@GUEST04 ~]# crm node standby guest03
  ERROR: bad lifetime: guest03
  
  crm node standby GUEST03 surely works well,
  so crm shell just doesn't take into account the hostname conversion.
  It's better to accept the both of the upper/lower-case.
  
  node standby, node delete, resource migrate(move)  get hit with this
  issue.
  Please see the attached.
  
  Thanks,
  Junko
 
 Sorry for the late reaction.
 
  diff -r da93d3523e6a modules/ui.py.in
  --- a/modules/ui.py.in  Tue Mar 26 11:44:17 2013 +0100
  +++ b/modules/ui.py.in  Mon Apr 08 17:49:00 2013 +0900
  @@ -924,10 +924,14 @@
   lifetime = None
   opt_l = fetch_opts(argl, [force])
   if len(argl) == 1:
  -if not argl[0] in listnodes():
  -lifetime = argl[0]
  -else:
  -node = argl[0]
  +   for i in listnodes():
  +   pattern = re.compile(i, re.IGNORECASE)
  +   if pattern.match(argl[1]) and len(i) == len(argl[1]):
  +   node = argl[1]
 
 
 This is not exactly equivalent.
 
Before, we had a string comparison.
Now we have a regexp match.
 
This may be considered as a new feature.
But it should then be done intentionally.
 
Otherwise, i would need to be quote-metaed first.
In Perl I'd write \Q$i\E, in python we probably have to
insert some '\' into it first.
 
I admit in most setups it would not make any difference,
as there should at most be dots (".") in there,
and they should be at places where they won't be ambiguous,
especially with the additional len() check.
 
 Maybe rather compare argl[0].lower() with listnodes(), which
 should also return all elements as .lower().
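
(For the record, the quote-meta Lars describes exists in Python as
re.escape, e.g.:

    import re
    pattern = re.compile(re.escape(i), re.IGNORECASE)

though the .lower() comparison suggested above avoids regexps
entirely.)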

Looks like I forgot about this patch, wanted to take a closer
look before applying, thanks for the analysis. There also seems
to be some code repetion, IIRC.

Cheers,

Dejan

 
   Lars
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [PATCH][crmsh] deal with the case-insentive hostname

2013-04-23 Thread Dejan Muhamedagic
Hi Junko-san,

Can you try the attached patch, instead of this one?

Cheers,

Dejan

On Wed, Apr 10, 2013 at 06:13:45PM +0900, Junko IKEDA wrote:
 Hi,
 I set upper-case hostname (GUEST03/GUEST4) and run Pacemaker 1.1.9 +
 Corosync 2.3.0.
 
 [root@GUEST04 ~]# crm_mon -1
 Last updated: Wed Apr 10 15:12:48 2013
 Last change: Wed Apr 10 14:02:36 2013 via crmd on GUEST04
 Stack: corosync
 Current DC: GUEST04 (3232242817) - partition with quorum
 Version: 1.1.9-e8caee8
 2 Nodes configured, unknown expected votes
 1 Resources configured.
 
 
 Online: [ GUEST03 GUEST04 ]
 
  dummy  (ocf::pacemaker:Dummy): Started GUEST03
 
 
 for example, call crm shell with lower-case hostname.
 
 [root@GUEST04 ~]# crm node standby guest03
 ERROR: bad lifetime: guest03
 
 crm node standby GUEST03 surely works well,
 so crm shell just doesn't take into account the hostname conversion.
 It's better to accept the both of the upper/lower-case.
 
 node standby, node delete, resource migrate(move)  get hit with this
 issue.
 Please see the attached.
 
 Thanks,
 Junko


 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/

# HG changeset patch
# User Dejan Muhamedagic de...@hello-penguin.com
# Date 1366728211 -7200
# Node ID cd4d36b347c17b06b76f3386c041947a03c708bb
# Parent  4a47465b1fe1f48123080b4336f0b4516d9264f6
Medium: node: ignore case when looking up nodes (thanks to Junko Ikeda)

diff -r 4a47465b1fe1 -r cd4d36b347c1 modules/ui.py.in
--- a/modules/ui.py.in	Tue Apr 23 11:23:10 2013 +0200
+++ b/modules/ui.py.in	Tue Apr 23 16:43:31 2013 +0200
@@ -924,7 +924,7 @@ class RscMgmt(UserInterface):
 lifetime = None
opt_l = fetch_opts(argl, ["force"])
 if len(argl) == 1:
-if not argl[0] in listnodes():
+if not is_node(argl[0]):
 lifetime = argl[0]
 else:
 node = argl[0]
@@ -1186,7 +1186,7 @@ class NodeMgmt(UserInterface):
 if not args:
 node = vars.this_node
 if len(args) == 1:
-if not args[0] in listnodes():
+if not is_node(args[0]):
 node = vars.this_node
 lifetime = args[0]
 else:
@@ -1249,7 +1249,7 @@ class NodeMgmt(UserInterface):
 'usage: delete node'
 if not is_name_sane(node):
 return False
-if not node in listnodes():
+if not is_node(node):
common_err("node %s not found in the CIB" % node)
 return False
 rc = True
diff -r 4a47465b1fe1 -r cd4d36b347c1 modules/xmlutil.py
--- a/modules/xmlutil.py	Tue Apr 23 11:23:10 2013 +0200
+++ b/modules/xmlutil.py	Tue Apr 23 16:43:31 2013 +0200
@@ -159,6 +159,15 @@ def mk_rsc_type(n):
 if ra_provider:
s2 = "%s:" % ra_provider
 return ''.join((s1,s2,ra_type))
+def is_node(s):
+'''
+Check if s is in a list of our nodes (ignore case).
+This is not fast, perhaps should be cached.
+'''
+for n in listnodes():
+if n.lower() == s.lower():
+return True
+return False
 def listnodes():
nodes_elem = cibdump2elem("nodes")
 if nodes_elem is None:
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] A patch for crmsh.spec

2013-04-22 Thread Dejan Muhamedagic
On Mon, Apr 22, 2013 at 11:27:49AM +0900, yusuke iida wrote:
 Hi, Dejan
 
 Thank you for merging a patch.
 
 However, since there is a typo in one part, please correct it.

Oops. Sorry for the typo. Fixed now.

Cheers,

Dejan

 Regards,
 Yusuke
 
 2013/4/18 Dejan Muhamedagic de...@suse.de:
  Hi Yusuke-san,
 
  On Tue, Apr 16, 2013 at 02:55:40PM +0900, yusuke iida wrote:
  Hi, Dejan
 
  2013/4/4 Dejan Muhamedagic de...@suse.de:
   Hi Yusuke,
  
   On Thu, Feb 21, 2013 at 09:04:45PM +0900, yusuke iida wrote:
   Hi, Dejan
  
   I also tested by rhel6.3 and fedora17.
   Since there is no environment, centos is not tested.
  
   The point worried below is shown:
   - I think that %{?fedora_version} and %{?rhel_version} are macro not to 
   exist.
  
   Those macros work in OBS when rhel6 packages are built. I wonder
   if that's some build service extension.
 
  In my environment, the macro of rpmbuild is as follows.
 
  rhel6.3
  # rpmbuild --showrc | grep rhel
  -14: rhel   6
 
  fedora18
  # rpmbuild --showrc | grep fedora
  -14: fedora 18
 
  So I want you to revise it as follows at least.
 
  # hg diff
  diff -r da93d3523e6a crmsh.spec
  --- a/crmsh.specTue Mar 26 11:44:17 2013 +0100
  +++ b/crmsh.specTue Apr 16 13:08:37 2013 +0900
  @@ -6,7 +6,7 @@
   %global upstream_version tip
   %global upstream_prefix crmsh
 
  -%if 0%{?fedora_version} || 0%{?centos_version} || 0%{?rhel_version}
  +%if 0%{?fedora_version} || 0%{?centos_version} || 0%{?rhel_version} || 0%{?rhel} || 0%{?fedora}
   %define pkg_group System Environment/Daemons
   %else
   %define pkg_group Productivity/Clustering/HA
 
  Patch applied. Thanks!
 
   - pssh is not provided in rhel.
 I think that you should not put it in Requires.
  
   OK, but currently the only RPM built is the one in OBS where the
   repository includes pssh RPMs for rhel/centos too. See for
   instance:
  
   http://download.opensuse.org/repositories/network:/ha-clustering/RedHat_RHEL-6/x86_64/
  
   I made a patch to solve the above.
  
   Note that the .spec file in the upstream may not be perfect or
   even work on particular distribution. However, it should advise
   packagers on what it should contain. The pssh requirement is
   there because history would not work well without it. It is
   further rather unfortunate that that feature is used very seldom
   and that it got so little attention.
  
   Therefore, I'm reluctant to apply the pssh part of the patch.
 
  hmm ...
  For example, couldn't it be changed so that the functionality that uses
  pssh can be disabled by a configure option?
 
  The functionality is still there, even without pssh. For
  instance, static reports can also be examined. It's just that the
  live updates are going to be quite a bit slower, if somebody
  wants to use the history feature to examine changes happening in
  the cluster.
 
  If possible, could pssh be excluded from Requires?
 
  I already reasoned in my previous message (quoted above) why I'm
  reluctant to do that.
 
  Cheers,
 
  Dejan
 
  Regards,
  Yusuke
  
   Cheers,
  
   Dejan
  
   Regards,
   Yusuke
  
   2013/2/19 Dejan Muhamedagic de...@suse.de:
On Tue, Feb 19, 2013 at 11:03:53AM +0100, Dejan Muhamedagic wrote:
On Fri, Feb 15, 2013 at 10:19:41PM +0100, Dejan Muhamedagic wrote:
 Hi,

 On Fri, Feb 15, 2013 at 02:25:41PM +0900, yusuke iida wrote:
  Hi, Dejan
 
 I made a patch for the spec file to build an rpm of crmsh in a rhel
 environment.
 I would like the crmsh repository to merge it if there is no
 problem.
   
This is a problem which I ran into earlier too. Something
(probably one of the rpm macros) does a 'rm -rf' of the doc
directory _after_ the files got installed:
   
[   29s] test -z /usr/share/doc/packages/crmsh || /bin/mkdir -p 
/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh
[   29s]  /usr/bin/install -c -m 644 'AUTHORS' 
'/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh/AUTHORS'
[   29s]  /usr/bin/install -c -m 644 'COPYING' 
'/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh/COPYING'
...
[   30s] Executing(%doc): /bin/sh -e /var/tmp/rpm-tmp.6245
[   30s] + umask 022
[   30s] + cd /usr/src/packages/BUILD
[   30s] + cd crmsh
[   30s] + 
DOCDIR=/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh
[   30s] + export DOCDIR
[   30s] + rm -rf 
/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh
[   30s] + /bin/mkdir -p 
/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh
[   30s] + cp -pr ChangeLog 
/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh
...
[   32s] error: create archive failed on file 
/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh/AUTHORS: 
cpio: open failed - Bad file descriptor
   
If somebody can shed some light or suggest how to deal with
this ...
   
OK. I think I managed to fix it. The result is already upstream.

Re: [Linux-ha-dev] A patch for crmsh.spec

2013-04-18 Thread Dejan Muhamedagic
Hi Yusuke-san,

On Tue, Apr 16, 2013 at 02:55:40PM +0900, yusuke iida wrote:
 Hi, Dejan
 
 2013/4/4 Dejan Muhamedagic de...@suse.de:
  Hi Yusuke,
 
  On Thu, Feb 21, 2013 at 09:04:45PM +0900, yusuke iida wrote:
  Hi, Dejan
 
  I also tested by rhel6.3 and fedora17.
  Since there is no environment, centos is not tested.
 
  The point worried below is shown:
  - I think that %{?fedora_version} and %{?rhel_version} are macros that do
  not exist.
 
  Those macros work in OBS when rhel6 packages are built. I wonder
  if that's some build service extension.
 
 In my environment, the macro of rpmbuild is as follows.
 
 rhel6.3
 # rpmbuild --showrc | grep rhel
 -14: rhel   6
 
 fedora18
 # rpmbuild --showrc | grep fedora
 -14: fedora 18
 
 So I want you to revise it as follows at least.
 
 # hg diff
 diff -r da93d3523e6a crmsh.spec
 --- a/crmsh.specTue Mar 26 11:44:17 2013 +0100
 +++ b/crmsh.specTue Apr 16 13:08:37 2013 +0900
 @@ -6,7 +6,7 @@
  %global upstream_version tip
  %global upstream_prefix crmsh
 
 -%if 0%{?fedora_version} || 0%{?centos_version} || 0%{?rhel_version}
 +%if 0%{?fedora_version} || 0%{?centos_version} || 0%{?rhel_version}
 || 0%{?rhel} || 0%{?fedora}
  %define pkg_group System Environment/Daemons
  %else
  %define pkg_group Productivity/Clustering/HA

Patch applied. Thanks!

  - pssh is not provided in rhel.
I think that you should not put it in Requires.
 
  OK, but currently the only RPM built is the one in OBS where the
  repository includes pssh RPMs for rhel/centos too. See for
  instance:
 
  http://download.opensuse.org/repositories/network:/ha-clustering/RedHat_RHEL-6/x86_64/
 
  I made a patch to solve the above.
 
  Note that the .spec file in the upstream may not be perfect or
  even work on particular distribution. However, it should advise
  packagers on what it should contain. The pssh requirement is
  there because history would not work well without it. It is
  further rather unfortunate that that feature is used very seldom
  and that it got so little attention.
 
  Therefore, I'm reluctant to apply the pssh part of the patch.
 
 hmm ...
 For example, couldn't it be changed so that the functionality that uses
 pssh can be disabled by a configure option?

The functionality is still there, even without pssh. For
instance, static reports can also be examined. It's just that the
live updates are going to be quite a bit slower, if somebody
wants to use the history feature to examine changes happening in
the cluster.

 If possible, could pssh be excluded from Requires?

I already reasoned in my previous message (quoted above) why I'm
reluctant to do that.

Cheers,

Dejan

 Regards,
 Yusuke
 
  Cheers,
 
  Dejan
 
  Regards,
  Yusuke
 
  2013/2/19 Dejan Muhamedagic de...@suse.de:
   On Tue, Feb 19, 2013 at 11:03:53AM +0100, Dejan Muhamedagic wrote:
   On Fri, Feb 15, 2013 at 10:19:41PM +0100, Dejan Muhamedagic wrote:
Hi,
   
On Fri, Feb 15, 2013 at 02:25:41PM +0900, yusuke iida wrote:
 Hi, Dejan

 I made a patch for spec file to make rpm of crmsh in rhel 
 environment.
 I want a crmsh repository to merge it if I do not have any problem.
  
   This is a problem which I ran into earlier too. Something
   (probably one of the rpm macros) does a 'rm -rf' of the doc
   directory _after_ the files got installed:
  
   [   29s] test -z /usr/share/doc/packages/crmsh || /bin/mkdir -p 
   /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh
   [   29s]  /usr/bin/install -c -m 644 'AUTHORS' 
   '/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh/AUTHORS'
   [   29s]  /usr/bin/install -c -m 644 'COPYING' 
   '/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh/COPYING'
   ...
   [   30s] Executing(%doc): /bin/sh -e /var/tmp/rpm-tmp.6245
   [   30s] + umask 022
   [   30s] + cd /usr/src/packages/BUILD
   [   30s] + cd crmsh
   [   30s] + 
   DOCDIR=/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh
   [   30s] + export DOCDIR
   [   30s] + rm -rf 
   /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh
   [   30s] + /bin/mkdir -p 
   /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh
   [   30s] + cp -pr ChangeLog 
   /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh
   ...
   [   32s] error: create archive failed on file 
   /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh/AUTHORS: cpio: 
   open failed - Bad file descriptor
  
   If somebody can shed some light or suggest how to deal with
   this ...
  
   OK. I think I managed to fix it. The result is already upstream.
   I tested it with rhel6, centos6, fedora 17 and 18. Can you
   please test too.
  
   Thanks,
  
   Dejan
  
   Thanks,
  
   Dejan
  
  
No problem. Will test the patch. BTW, did you notice that there
are packages for rhel too at OBS (see the latest news item at
https://savannah.nongnu.org/projects/crmsh/).
   
Cheers,
   
Dejan
   

 Best regards,
 Yusuke

Re: [Linux-ha-dev] crm shell: Inconsistent configure for non-existing objects

2013-04-17 Thread Dejan Muhamedagic
Hi,

On Wed, Apr 17, 2013 at 08:39:30AM +0200, Ulrich Windl wrote:
 Hi!
 
 In SLES11 SP2 when I try to display (show) a resource group that does not 
 exist, there is no error message, but when I try to delete a non-existing 
 object, I get an error message. That's inconsistent: Trying to display an 
 object that does not exist should also display an error.
 
 Example:
 crm(live)configure# show grp_v02
 crm(live)configure# delete grp_v02
 ERROR: object grp_v02 does not exist
 crm(live)configure# show grp_v02xy
 crm(live)configure# show grp_v0s
 crm(live)configure# 

There's a somewhat technical explanation for why this happens, but
it obviously needs to be fixed somehow.

Thanks for reporting.

Cheers,

Dejan

 Regards,
 Ulrich
 
 
 
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Proposal for crm shell and show changed

2013-04-17 Thread Dejan Muhamedagic
Hi,

On Mon, Apr 15, 2013 at 10:38:25AM +0200, Ulrich Windl wrote:
 Hi!
 
 When deleting resources using the crm shell, show changed will show nothing:
 
 crm(live)configure# delete prm_v05_npiv_2
 INFO: resource references in location:cli-standby-grp_v05 updated
 INFO: hanging location:cli-standby-grp_v05 deleted
 crm(live)configure# verify
 crm(live)configure# show changed
 crm(live)configure# commit
 
 It would be nice if that could be improved.

The show command shows all or parts of the current configuration.
Removed elements are no longer in the configuration. If we
were to also (somehow) show deleted elements, that would change
the semantics of the show command.*

I suppose that you'd like to be able to see a kind of list of
changes that happened since the previous commit. Perhaps a diff
would fit better? There's right now no such command. You can
perhaps open an enhancement bugzilla.
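
Until then, a minimal diff-based workaround from the shell, assuming
crmsh is on the PATH (file names and the resource name are illustrative):

    # snapshot the configuration, make the change, then diff the snapshots
    crm configure show > /tmp/cib-before.txt
    crm configure delete prm_v05_npiv_2
    crm configure show > /tmp/cib-after.txt
    diff -u /tmp/cib-before.txt /tmp/cib-after.txt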

Cheers,

Dejan

*) I can see now that it should've been named show modified :)

 Regards,
 Ulrich
 
 
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [PATCH][crmsh] deal with the case-insentive hostname

2013-04-11 Thread Dejan Muhamedagic
Hi Junko-san,

On Wed, Apr 10, 2013 at 06:13:45PM +0900, Junko IKEDA wrote:
 Hi,
 I set upper-case hostnames (GUEST03/GUEST04) and ran Pacemaker 1.1.9 +
 Corosync 2.3.0.
 
 [root@GUEST04 ~]# crm_mon -1
 Last updated: Wed Apr 10 15:12:48 2013
 Last change: Wed Apr 10 14:02:36 2013 via crmd on GUEST04
 Stack: corosync
 Current DC: GUEST04 (3232242817) - partition with quorum
 Version: 1.1.9-e8caee8
 2 Nodes configured, unknown expected votes
 1 Resources configured.
 
 
 Online: [ GUEST03 GUEST04 ]
 
  dummy  (ocf::pacemaker:Dummy): Started GUEST03
 
 
 for example, call crm shell with lower-case hostname.
 
 [root@GUEST04 ~]# crm node standby guest03
 ERROR: bad lifetime: guest03

This message looks awkward.

 crm node standby GUEST03 surely works well,
 so crm shell just doesn't take into account the hostname conversion.
 It would be better to accept both upper- and lower-case.

Yes, indeed.

 node standby, node delete, and resource migrate(move) are affected by this
 issue.
 Please see the attached.

The patch looks correct. Many thanks for the contribution!

Cheers,

Dejan

 Thanks,
 Junko


 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] ManageVE prints bogus errors to the syslog

2013-04-11 Thread Dejan Muhamedagic
On Wed, Apr 10, 2013 at 12:23:38AM +0200, Lars Ellenberg wrote:
 On Fri, Apr 05, 2013 at 12:39:46PM +0200, Dejan Muhamedagic wrote:
  Hi Lars,
  
  On Thu, Apr 04, 2013 at 09:28:00PM +0200, Lars Ellenberg wrote:
   On Wed, Apr 03, 2013 at 06:25:58PM +0200, Dejan Muhamedagic wrote:
Hi,

On Fri, Mar 22, 2013 at 08:41:30AM +0100, Roman Haefeli wrote:
 Hi,
 
 When stopping a node of our cluster managing a bunch of OpenVZ CTs, I
 get a lot of such messages in the syslog:
 
 Mar 20 17:20:44 localhost ManageVE[2586]: ERROR: vzctl status 10002 
 returned: 10002 does not exist.
 Mar 20 17:20:44 localhost lrmd: [2547]: info: operation monitor[6] on 
 opensim for client 2550: pid 2586 exited with return code 7
 
 It looks to me as if lrmd is making sure the CT is not running 
 anymore.
 However, this triggers ManageVE to print an error.

Could be. Looking at the RA, there's a bunch of places where the
status is invoked and where this message could get logged. It
could be improved. The following patch should help:

https://github.com/ClusterLabs/resource-agents/commit/ca987afd35226145f48fb31bef911aa3ed3b6015
   
   BTW, why call `vzctl | awk` *twice*,
   just to get two items out of the vzctl output?
   
   how about losing the awk, and the second invocation?
   something like this:
   (should veexists and vestatus be local as well?)
   
   diff --git a/heartbeat/ManageVE b/heartbeat/ManageVE
   index 56a3d03..53f9bab 100755
   --- a/heartbeat/ManageVE
   +++ b/heartbeat/ManageVE
   @@ -182,10 +182,12 @@ migrate_from_ve()
status_ve()
{ 
  declare -i retcode
   -
   -  veexists=`$VZCTL status $VEID 2>/dev/null | $AWK '{print $3}'`
   -  vestatus=`$VZCTL status $VEID 2>/dev/null | $AWK '{print $5}'`
   +  local vzstatus
   +  vzstatus=`$VZCTL status $VEID 2>/dev/null`
  retcode=$?
   +  set -- $vzstatus
   +  veexists=$3
   +  vestatus=$5

  if [[ $retcode != 0 ]]; then
ocf_log err vzctl status $VEID returned: $retcode
  
  Well, you do have commit rights, don't you? :)
 
 Sure, but I don't have a vz handy to test even obviously correct
 patches with, before I commit...

Looked correct to me too, but then it wouldn't have been the
first time I got something wrong :D

Maybe the reporter can help with testing. Roman?

Cheers,

Dejan
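
The "set --" trick from the patch, as a standalone shell sketch
(the container ID and the sample output are illustrative):

    # run vzctl once, then let word splitting fill the positional
    # parameters instead of piping the same output through awk twice
    out=`vzctl status 101 2>/dev/null`   # e.g. "VEID 101 exist mounted running"
    rc=$?
    set -- $out        # relies on default IFS; no glob characters expected
    veexists=$3
    vestatus=$5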

 
   Lars
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [PATCH] change timeouts, startup behaviour ocf:heartbeat:ManageVE (OpenVZ VE cluster resource)

2013-04-03 Thread Dejan Muhamedagic
Hi,

On Thu, Mar 21, 2013 at 02:59:17PM +, Tim Small wrote:
 On 13/03/13 16:18, Dejan Muhamedagic wrote:
  On Tue, Mar 12, 2013 at 12:58:44PM +, Tim Small wrote:

  The attached patch changes the behaviour of the OpenVZ virtual machine
  cluster resource agent, so that:
 
  1. The default resource stop timeout is greater than the hardcoded
  
  Just for the record: where is this hardcoded actually? Is it
  also documented?

 
 Defined here:
 
 http://git.openvz.org/?p=vzctl;a=blob;f=include/env.h#l26
 
 /** Shutdown timeout.
  */
 #define MAX_SHTD_TM 120
 
 
 
 Used by env_stop() here:
 
 http://git.openvz.org/?p=vzctl;a=blob;f=src/lib/env.c#l821
 http://git.openvz.org/?p=vzctl;a=blob;f=src/lib/env.c;h=2da848d87904d9e572b7da5c0e7dc5d93217ae5b;hb=HEAD#l818
 
  
 
for (i = 0; i < MAX_SHTD_TM; i++) {
 sleep(1);
 if (!vps_is_run(h, veid)) {
 ret = 0;
 goto out;
 }
 }
 
 kill_vps:
 logger(0, 0, "Killing container ...");
 
 
 
 Perhaps something based on wall time would be more consistent, and I can
 think of cases where users might want it to be a bit higher, or a bit
 lower, but currently it's just fixed at 120s.
 
 
 I can't find the timeout documented anywhere.

That makes it hard to reference in other software products. But
we can anyway increase the advised timeout in the metadata.
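
To make the wall-clock idea concrete, a rough shell sketch of such a
stop loop (not the vzctl C code; container ID, budget, and the
"vzctl stop --fast" escalation step are assumptions, not tested):

    # stop politely in the background, poll until the wall-clock
    # budget is spent, then escalate
    deadline=$(( `date +%s` + 120 ))
    vzctl stop 101 &
    while [ `date +%s` -lt $deadline ]; do
        vzctl status 101 | grep -q running || exit 0
        sleep 1
    done
    vzctl stop 101 --fast   # hypothetical escalation once time runs out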

  2. The start operation now waits for resource startup to complete i.e.
  for the VE to boot up (so that the cluster manager can detect VEs
  which are hanging on startup, and also throttle simultaneous startups,
  so as not to overburden the node in question).  Since the start
  operation now does a lot more, the default start operation timeout has
  been increased.
  
  I'm not sure if we can introduce this just like that. It changes
  significantly the agent's behaviour.

 
 Yes.  I think it probably makes the agent's behaviour a bit more correct,
 but that depends on what your definition of a VE resource having started
 is, I suppose.  Currently the agent says that it has started
 as soon as it has begun the boot process, whereas with the proposed
 change, it would mean that it has started when it has booted up (which
 should imply it is operational).
 
 Although my personal reason for the change was so that I had a
 reasonable way to avoid booting tens of VEs on the host machine at the
 same time, I can think of other benefits - such as making other
 resources depend on the fully-booted VE, or detecting the case where a
 faulty VE host node causes the VE to hang during start-up.
 
 
 I suppose other options are:
 
 1. Make start --wait the default, but make starting without waiting
 selectable using a RA parameter.
 
 2. Make start without waiting the default, but make --wait selectable
 using a RA parameter.
 
 
 I suppose that the change will break configurations where the
 administrator has hard coded a short timeout, and this change is
 introduced as part of an upgrade, which I suppose is a bad thing...

Yes, it could be so. I think that we should go for option 2.

  BTW, how does vzctl know when the VE is started?

 
 The vzctl manual page says that 'vzctl start --wait' will attempt to
 wait till the default runlevel is reached within the container.

OK. Though that may mean different things depending on which
init system is running.
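
If --wait does turn out to mean different things across init systems, one
hedged alternative is to poll for a well-known service inside the container
(vzctl exec is a standard vzctl subcommand; the service name is an example):

    vzctl start 101
    i=0
    while [ $i -lt 120 ]; do
        # consider the VE "up" once its sshd answers inside the container
        vzctl exec 101 'pidof sshd' >/dev/null 2>&1 && break
        sleep 1
        i=$((i + 1))
    done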

  If the description above matches
  the code modifications, then there should be three instead of
  one patch.

 
 Fair enough - I was being lazy!

:)

Cheers,

Dejan

 
 Tim.
 
 -- 
 South East Open Source Solutions Limited
 Registered in England and Wales with company number 06134732.  
 Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
 VAT number: 900 6633 53  http://seoss.co.uk/ +44-(0)1273-808309
 

 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [Pacemaker] A couple of SendArp resource changes

2013-04-03 Thread Dejan Muhamedagic
Hello,

Anybody have objections to the patches posted here? If not, I'll
push them upstream.

Cheers,

Dejan

On Wed, Mar 13, 2013 at 07:01:33PM +0100, Dejan Muhamedagic wrote:
 Hi,
 
 On Sat, Mar 09, 2013 at 07:53:34PM +, Tim Small wrote:
  Hi,
  
  I've been using the ocf:heartbeat:SendArp script and notice a couple of
  issues - some problems with starting and monitoring the service, and
  also a file descriptor leak in the binary (which would cause it to
  terminate).
  
  I've detailed the problems and supplied some patches:
  
  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=701913
 
 Cannot just replace the whole RA. Sorry. If you could split the
 patch we can consider them on a one-by-one basis. Otherwise, I
 found some patch in my local queue, which never got pushed for
 some reason. Don't know if that would help (attached).
 
  and
  
  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=701914
 
 Can you try the attached send_arp.libnet.c patch? It builds the
 packets first and then reuses them.
 
 Cheers,
 
 Dejan
 
  ... they're not perfect, but an improvement I think.
  
  HTH,
  
  Tim.
  
  -- 
  South East Open Source Solutions Limited
  Registered in England and Wales with company number 06134732.  
  Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
  VAT number: 900 6633 53  http://seoss.co.uk/ +44-(0)1273-808309
  
  
  ___
  Pacemaker mailing list: pacema...@oss.clusterlabs.org
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
  
  Project Home: http://www.clusterlabs.org
  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs: http://bugs.clusterlabs.org

 From 9dae34616ef62b98b762a1f821f9e1ee749e6315 Mon Sep 17 00:00:00 2001
 From: Dejan Muhamedagic de...@suse.de
 Date: Wed, 13 Mar 2013 18:19:10 +0100
 Subject: [PATCH] Medium: tools: send_arp.libnet: reuse ARP packets
 
 ---
  tools/send_arp.libnet.c | 174 
 
  1 file changed, 115 insertions(+), 59 deletions(-)
 
 diff --git a/tools/send_arp.libnet.c b/tools/send_arp.libnet.c
 index 71148bb..2abeecb 100644
 --- a/tools/send_arp.libnet.c
 +++ b/tools/send_arp.libnet.c
 @@ -49,17 +49,18 @@
  
  #ifdef HAVE_LIBNET_1_0_API
  #define  LTYPE   struct libnet_link_int
 + static u_char *mk_packet(u_int32_t ip, u_char *device, u_char *macaddr, 
 u_char *broadcast, u_char *netmask, u_short arptype);
 + static int send_arp(struct libnet_link_int *l, u_char *device, u_char 
 *buf);
  #endif
  #ifdef HAVE_LIBNET_1_1_API
  #define  LTYPE   libnet_t
 + static libnet_t *mk_packet(libnet_t* lntag, u_int32_t ip, u_char 
 *device, u_char macaddr[6], u_char *broadcast, u_char *netmask, u_short 
 arptype);
 + int send_arp(libnet_t* lntag);
  #endif
  
  #define PIDDIR   HA_VARRUNDIR "/" PACKAGE
  #define PIDFILE_BASE PIDDIR "/send_arp-"
  
 -static int send_arp(LTYPE* l, u_int32_t ip, u_char *device, u_char mac[6]
 -,u_char *broadcast, u_char *netmask, u_short arptype);
 -
  static char print_usage[]={
  "send_arp: sends out custom ARP packet.\n"
  "  usage: send_arp [-i repeatinterval-ms] [-r repeatcount] [-p pidfile] \\\n"
 @@ -135,7 +136,6 @@ main(int argc, char *argv[])
   char*   netmask;
   u_int32_t   ip;
   u_char  src_mac[6];
 - LTYPE*  l;
   int repeatcount = 1;
   int j;
   longmsinterval = 1000;
 @@ -143,6 +143,13 @@ main(int argc, char *argv[])
   charpidfilenamebuf[64];
   char*pidfilename = NULL;
  
 +#ifdef HAVE_LIBNET_1_0_API
 + LTYPE*  l;
 + u_char *request, *reply;
 +#elif defined(HAVE_LIBNET_1_1_API)
 + LTYPE *request, *reply;
 +#endif
 +
   CL_SIGNAL(SIGTERM, byebye);
   CL_SIGINTERRUPT(SIGTERM, 1);
  
 @@ -201,6 +208,24 @@ main(int argc, char *argv[])
   return EXIT_FAILURE;
   }
  
 + if (!strcasecmp(macaddr, AUTO_MAC_ADDR)) {
 + if (get_hw_addr(device, src_mac) < 0) {
 +  cl_log(LOG_ERR, "Cannot find mac address for %s",
 +  device);
 +  unlink(pidfilename);
 +  return EXIT_FAILURE;
 + }
 + }
 + else {
 + convert_macaddr((unsigned char *)macaddr, src_mac);
 + }
 +
 +/*
 + * We need to send both a broadcast ARP request as well as the ARP response 
 we
 + * were already sending.  All the interesting research work for this fix was
 + * done by Masaki Hasegawa masak...@pp.iij4u.or.jp and his colleagues.
 + */
 +
  #if defined(HAVE_LIBNET_1_0_API)
  #ifdef ON_DARWIN
   if ((ip = libnet_name_resolve((unsigned char*)ipaddr, 1)) == -1UL) {
 @@ -219,49 +244,65 @@ main(int argc, char *argv[])
   unlink(pidfilename);
   return EXIT_FAILURE;
   }
 + request = mk_packet(ip, (unsigned char*)device, src_mac
 + , (unsigned char*)broadcast, (unsigned char*)netmask
 + , ARPOP_REQUEST

Re: [Linux-ha-dev] [Pacemaker] [PATCH] change timeouts, startup behaviour ocf:heartbeat:ManageVE (OpenVZ VE cluster resource)

2013-03-13 Thread Dejan Muhamedagic
On Tue, Mar 12, 2013 at 12:58:44PM +, Tim Small wrote:
 The attached patch changes the behaviour of the OpenVZ virtual machine
 cluster resource agent, so that:
 
 1. The default resource stop timeout is greater than the hardcoded

Just for the record: where is this hardcoded actually? Is it
also documented?

 timeout in vzctl stop (after this time, vzctl forcibly stops the
 virtual machine) (since failure to stop a resource can lead to the
 cluster node being evicted from the cluster entirely - and this is
 generally a BAD thing).

Agreed.

 2. The start operation now waits for resource startup to complete i.e.
 for the VE to boot up (so that the cluster manager can detect VEs
 which are hanging on startup, and also throttle simultaneous startups,
  so as not to overburden the node in question).  Since the start
 operation now does a lot more, the default start operation timeout has
 been increased.

I'm not sure if we can introduce this just like that. It changes
significantly the agent's behaviour.

BTW, how does vzctl know when the VE is started?

 3. Backs off the default timeouts and intervals for various operations
 to less aggressive values.

Please make patches which are self-contained, but can be
described in a succinct manner. If the description above matches
the code modifications, then there should be three instead of
one patch.

Please continue the discussion at linux-ha-dev, that's where RA
development discussions take place.

Cheers,

Dejan

 
 Cheers,
 
 Tim.
 
 
 n.b.  There is a bug in the Debian 6.0 (Squeeze) OpenVZ kernel such that
 vzctl start VEID --wait hangs.  The bug doesn't impact the
 OpenVZ.org kernels (and hence won't impact Debian 7.0 Wheezy either).
 
 -- 
 South East Open Source Solutions Limited
 Registered in England and Wales with company number 06134732.  
 Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
 VAT number: 900 6633 53  http://seoss.co.uk/ +44-(0)1273-808309
 

 --- ManageVE.old  2010-10-22 05:54:50.0 +
 +++ ManageVE  2013-03-12 11:39:47.895102380 +
 @@ -26,12 +26,15 @@
  #
  #
  # Created  07. Sep 2006
 -# Updated  18. Sep 2006
 +# Updated  12. Mar 2013
  #
 -# rev. 1.00.3
 +# rev. 1.00.4
  #
  # Changelog
  #
 +# 12/Mar/13 1.00.4 Wait for VE startup to finish, lengthen default start 
 timeout.
 +#  Default stop timeout to longer than the vzctl stop 
 'polite'
 +#  interval.
  # 12/Sep/06 1.00.3 more cleanup
  # 12/Sep/06 1.00.2 fixed some logic in start_ve
  #  general cleanup all over the place
 @@ -67,7 +70,7 @@
  ?xml version=1.0?
  !DOCTYPE resource-agent SYSTEM ra-api-1.dtd
  resource-agent name=ManageVE
 -  version1.00.3/version
 +  version1.00.4/version
  
longdesc lang=en
  This OCF complaint resource agent manages OpenVZ VEs and thus requires
 @@ -87,12 +90,12 @@
/parameters
  
actions
 -action name=start timeout=75 /
 -action name=stop timeout=75 /
 -action name=status depth=0 timeout=10 interval=10 /
 -action name=monitor depth=0 timeout=10 interval=10 /
 -action name=validate-all timeout=5 /
 -action name=meta-data timeout=5 /
 +action name=start timeout=240 /
 +action name=stop timeout=150 /
 +action name=status depth=0 timeout=20 interval=60 /
 +action name=monitor depth=0 timeout=20 interval=60 /
 +action name=validate-all timeout=10 /
 +action name=meta-data timeout=10 /
/actions
  /resource-agent
  END
 @@ -127,7 +130,7 @@
  return $retcode
fi
  
 -  $VZCTL start $VEID > /dev/null
 +  $VZCTL start $VEID --wait > /dev/null
retcode=$?
  
  if [[ $retcode != 0 && $retcode != 32 ]]; then

 ___
 Pacemaker mailing list: pacema...@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [Pacemaker] A couple of SendArp resource changes

2013-03-13 Thread Dejan Muhamedagic
Hi,

On Sat, Mar 09, 2013 at 07:53:34PM +, Tim Small wrote:
 Hi,
 
 I've been using the ocf:heartbeat:SendArp script and noticed a couple of
 issues - some problems with starting and monitoring the service, and
 also a file descriptor leak in the binary (which would cause it to
 terminate).
 
 I've detailed the problems and supplied some patches:
 
 http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=701913

Cannot just replace the whole RA. Sorry. If you could split the
patch we can consider them on a one-by-one basis. Otherwise, I
found some patch in my local queue, which never got pushed for
some reason. Don't know if that would help (attached).

 and
 
 http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=701914

Can you try the attached send_arp.libnet.c patch? It builds the
packets first and then reuses them.

Cheers,

Dejan

 ... they're not perfect, but an improvement I think.
 
 HTH,
 
 Tim.
 
 -- 
 South East Open Source Solutions Limited
 Registered in England and Wales with company number 06134732.  
 Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
 VAT number: 900 6633 53  http://seoss.co.uk/ +44-(0)1273-808309
 
 
 ___
 Pacemaker mailing list: pacema...@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org
From 9dae34616ef62b98b762a1f821f9e1ee749e6315 Mon Sep 17 00:00:00 2001
From: Dejan Muhamedagic de...@suse.de
Date: Wed, 13 Mar 2013 18:19:10 +0100
Subject: [PATCH] Medium: tools: send_arp.libnet: reuse ARP packets

---
 tools/send_arp.libnet.c | 174 
 1 file changed, 115 insertions(+), 59 deletions(-)

diff --git a/tools/send_arp.libnet.c b/tools/send_arp.libnet.c
index 71148bb..2abeecb 100644
--- a/tools/send_arp.libnet.c
+++ b/tools/send_arp.libnet.c
@@ -49,17 +49,18 @@
 
 #ifdef HAVE_LIBNET_1_0_API
 #	define	LTYPE	struct libnet_link_int
+	static u_char *mk_packet(u_int32_t ip, u_char *device, u_char *macaddr, u_char *broadcast, u_char *netmask, u_short arptype);
+	static int send_arp(struct libnet_link_int *l, u_char *device, u_char *buf);
 #endif
 #ifdef HAVE_LIBNET_1_1_API
 #	define	LTYPE	libnet_t
+	static libnet_t *mk_packet(libnet_t* lntag, u_int32_t ip, u_char *device, u_char macaddr[6], u_char *broadcast, u_char *netmask, u_short arptype);
+	int send_arp(libnet_t* lntag);
 #endif
 
 #define PIDDIR   HA_VARRUNDIR "/" PACKAGE
 #define PIDFILE_BASE PIDDIR "/send_arp-"
 
-static int send_arp(LTYPE* l, u_int32_t ip, u_char *device, u_char mac[6]
-,	u_char *broadcast, u_char *netmask, u_short arptype);
-
 static char print_usage[]={
 "send_arp: sends out custom ARP packet.\n"
 "  usage: send_arp [-i repeatinterval-ms] [-r repeatcount] [-p pidfile] \\\n"
@@ -135,7 +136,6 @@ main(int argc, char *argv[])
 	char*	netmask;
 	u_int32_t	ip;
 	u_char  src_mac[6];
-	LTYPE*	l;
 	int	repeatcount = 1;
 	int	j;
 	long	msinterval = 1000;
@@ -143,6 +143,13 @@ main(int argc, char *argv[])
 	charpidfilenamebuf[64];
 	char*pidfilename = NULL;
 
+#ifdef HAVE_LIBNET_1_0_API
+	LTYPE*	l;
+	u_char *request, *reply;
+#elif defined(HAVE_LIBNET_1_1_API)
+	LTYPE *request, *reply;
+#endif
+
 	CL_SIGNAL(SIGTERM, byebye);
 	CL_SIGINTERRUPT(SIGTERM, 1);
 
@@ -201,6 +208,24 @@ main(int argc, char *argv[])
 		return EXIT_FAILURE;
 	}
 
+	if (!strcasecmp(macaddr, AUTO_MAC_ADDR)) {
+		if (get_hw_addr(device, src_mac) < 0) {
+			 cl_log(LOG_ERR, "Cannot find mac address for %s",
+	 device);
+			 unlink(pidfilename);
+			 return EXIT_FAILURE;
+		}
+	}
+	else {
+		convert_macaddr((unsigned char *)macaddr, src_mac);
+	}
+
+/*
+ * We need to send both a broadcast ARP request as well as the ARP response we
+ * were already sending.  All the interesting research work for this fix was
+ * done by Masaki Hasegawa masak...@pp.iij4u.or.jp and his colleagues.
+ */
+
 #if defined(HAVE_LIBNET_1_0_API)
 #ifdef ON_DARWIN
 	if ((ip = libnet_name_resolve((unsigned char*)ipaddr, 1)) == -1UL) {
@@ -219,49 +244,65 @@ main(int argc, char *argv[])
 		unlink(pidfilename);
 		return EXIT_FAILURE;
 	}
+	request = mk_packet(ip, (unsigned char*)device, src_mac
+		, (unsigned char*)broadcast, (unsigned char*)netmask
+		, ARPOP_REQUEST);
+	reply = mk_packet(ip, (unsigned char*)device, src_mac
+		, (unsigned char *)broadcast
+		, (unsigned char *)netmask, ARPOP_REPLY);
+	if (!request || !reply) {
+		cl_log(LOG_ERR, "could not create packets");
+		unlink(pidfilename);
+		return EXIT_FAILURE;
+	}
+	for (j=0; j < repeatcount; ++j) {
+		c = send_arp(l, (unsigned char*)device, request);
+		if (c < 0) {
+			break;
+		}
+		mssleep(msinterval / 2);
+		c = send_arp(l, (unsigned char*)device, reply);
+		if (c < 0) {
+			break;
+		}
+		if (j != repeatcount-1) {
+			mssleep(msinterval / 2);
+		}
+	}
 #elif defined(HAVE_LIBNET_1_1_API)
-	if ((l=libnet_init(LIBNET_LINK, device, errbuf)) == NULL

Re: [Linux-ha-dev] [PATCH] crmsh: fix in python version checking

2013-03-01 Thread Dejan Muhamedagic
Hi Keisuke-san,

On Fri, Mar 01, 2013 at 10:31:46AM +0900, Keisuke MORI wrote:
 Hi Dejan,
 
 Here is a trivial patch for crmsh.
 It is totally harmless because it affects only when the python version
 is too old and crmsh would abort anyway :)

Many thanks for the patch.

Cheers,

Dejan

 Thanks,
 -- 
 Keisuke MORI


 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] A patch for crmsh.spec

2013-02-19 Thread Dejan Muhamedagic
On Fri, Feb 15, 2013 at 10:19:41PM +0100, Dejan Muhamedagic wrote:
 Hi,
 
 On Fri, Feb 15, 2013 at 02:25:41PM +0900, yusuke iida wrote:
  Hi, Dejan
  
  I made a patch for the spec file to build an rpm of crmsh in a rhel environment.
  I would like the crmsh repository to merge it if there is no problem.

This is a problem which I ran into earlier too. Something
(probably one of the rpm macros) does a 'rm -rf' of the doc
directory _after_ the files got installed:

[   29s] test -z /usr/share/doc/packages/crmsh || /bin/mkdir -p 
/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh
[   29s]  /usr/bin/install -c -m 644 'AUTHORS' 
'/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh/AUTHORS'
[   29s]  /usr/bin/install -c -m 644 'COPYING' 
'/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh/COPYING'
...
[   30s] Executing(%doc): /bin/sh -e /var/tmp/rpm-tmp.6245
[   30s] + umask 022
[   30s] + cd /usr/src/packages/BUILD
[   30s] + cd crmsh
[   30s] + DOCDIR=/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh
[   30s] + export DOCDIR
[   30s] + rm -rf /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh
[   30s] + /bin/mkdir -p /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh
[   30s] + cp -pr ChangeLog 
/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh
...
[   32s] error: create archive failed on file 
/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh/AUTHORS: cpio: open 
failed - Bad file descriptor

If somebody can shed some light or suggest how to deal with
this ...

Thanks,

Dejan
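
For the record, one common shape of a fix for this %doc clash (a sketch
only; the actual upstream change, mentioned later in the thread, may
differ): keep "make install" from populating %{_docdir} and let %doc
repopulate it from the build directory.

    %install
    make install DESTDIR=%{buildroot}
    # %doc below re-creates this directory itself, so make sure
    # "make install" has not already claimed it
    rm -rf %{buildroot}%{_docdir}/%{name}

    %files
    %doc AUTHORS COPYING ChangeLog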


 No problem. Will test the patch. BTW, did you notice that there
 are packages for rhel too at OBS (see the latest news item at
 https://savannah.nongnu.org/projects/crmsh/).
 
 Cheers,
 
 Dejan
 
  
  Best regards,
  Yusuke
  --
  
  METRO SYSTEMS CO., LTD
  
  Yusuke Iida
  Mail: yusk.i...@gmail.com
  
 
 
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] A patch for crmsh.spec

2013-02-19 Thread Dejan Muhamedagic
On Tue, Feb 19, 2013 at 11:03:53AM +0100, Dejan Muhamedagic wrote:
 On Fri, Feb 15, 2013 at 10:19:41PM +0100, Dejan Muhamedagic wrote:
  Hi,
  
  On Fri, Feb 15, 2013 at 02:25:41PM +0900, yusuke iida wrote:
   Hi, Dejan
   
   I made a patch for the spec file to build an rpm of crmsh in a rhel environment.
   I would like the crmsh repository to merge it if there is no problem.
 
 This is a problem which I ran into earlier too. Something
 (probably one of the rpm macros) does a 'rm -rf' of the doc
 directory _after_ the files got installed:
 
 [   29s] test -z /usr/share/doc/packages/crmsh || /bin/mkdir -p 
 /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh
 [   29s]  /usr/bin/install -c -m 644 'AUTHORS' 
 '/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh/AUTHORS'
 [   29s]  /usr/bin/install -c -m 644 'COPYING' 
 '/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh/COPYING'
 ...
 [   30s] Executing(%doc): /bin/sh -e /var/tmp/rpm-tmp.6245
 [   30s] + umask 022
 [   30s] + cd /usr/src/packages/BUILD
 [   30s] + cd crmsh
 [   30s] + DOCDIR=/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh
 [   30s] + export DOCDIR
 [   30s] + rm -rf /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh
 [   30s] + /bin/mkdir -p 
 /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh
 [   30s] + cp -pr ChangeLog 
 /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh
 ...
 [   32s] error: create archive failed on file 
 /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh/AUTHORS: cpio: open 
 failed - Bad file descriptor
 
 If somebody can shed some light or suggest how to deal with
 this ...

OK. I think I managed to fix it. The result is already upstream.
I tested it with rhel6, centos6, fedora 17 and 18. Can you
please test too.

Thanks,

Dejan

 Thanks,
 
 Dejan
 
 
  No problem. Will test the patch. BTW, did you notice that there
  are packages for rhel too at OBS (see the latest news item at
  https://savannah.nongnu.org/projects/crmsh/).
  
  Cheers,
  
  Dejan
  
   
   Best regards,
   Yusuke
   --
   
   METRO SYSTEMS CO., LTD
   
   Yusuke Iida
   Mail: yusk.i...@gmail.com
   
  
  
  ___
  Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
  Home Page: http://linux-ha.org/
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] A patch for crmsh.spec

2013-02-15 Thread Dejan Muhamedagic
Hi,

On Fri, Feb 15, 2013 at 02:25:41PM +0900, yusuke iida wrote:
 Hi, Dejan
 
 I made a patch for the spec file to build an rpm of crmsh in a rhel environment.
 I would like the crmsh repository to merge it if there is no problem.

No problem. Will test the patch. BTW, did you notice that there
are packages for rhel too at OBS (see the latest news item at
https://savannah.nongnu.org/projects/crmsh/).

Cheers,

Dejan

 
 Best regards,
 Yusuke
 --
 
 METRO SYSTEMS CO., LTD
 
 Yusuke Iida
 Mail: yusk.i...@gmail.com
 


___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] crmsh release 1.2.5

2013-02-12 Thread Dejan Muhamedagic
Hello,

The CRM shell v1.2.5 is released.

The highlights of the release:

* cibconfig: modgroup command
* cibconfig: directed graph support
* history: diff command (between PE inputs)
* history: show command (show configuration of PE inputs)
* history: graph command

For the full set of changes, take a look at the changelog:

http://hg.savannah.gnu.org/hgweb/crmsh/file/crmsh-1.2.5/ChangeLog

== Note about Pacemaker versions ==

CRM shell 1.2.5 supports all Pacemaker 1.1 versions.

== Installing ==

Installing the CRM shell along with Pacemaker 1.1 versions <=
v1.1.7 is possible, but it will result in file conflicts. You
need to enforce file overwriting when installing packages.
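
With plain rpm that means something like (package file name illustrative):

    rpm -Uvh --force crmsh-1.2.5-*.rpm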

Note that pacemaker v1.1.7 carries a crm shell version which is
the same as in v1.1.6, or put differently quite outdated. There
are several interesting new features, including history, which
never made it in any pacemaker release.

== Resources ==

Packages for several popular Linux distributions:

http://download.opensuse.org/repositories/network:/ha-clustering/

The man page:

http://crmsh.nongnu.org/crm.8.html

The CRM shell project web page at GNU savannah:

https://savannah.nongnu.org/projects/crmsh/

Support and bug reporting:

http://lists.linux-ha.org/mailman/listinfo/linux-ha
https://savannah.nongnu.org/bugs/?group=crmsh
https://developerbugs.linuxfoundation.org/enter_bug.cgi?product=Pacemaker;component=Shell;version=1.2

The sources repository is available at:

http://hg.savannah.gnu.org/hgweb/crmsh

Enjoy!

Dejan
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] announcement: resource-agents release 3.9.5

2013-02-07 Thread Dejan Muhamedagic
Hello,

We've tagged today (Feb 7) a new stable resource-agents release
(3.9.5) in the upstream repository.

Big thanks go to all contributors! Needless to say, without you
this release would not be possible.

The Linux-HA resource agents set changes consist mainly of bug
fixes and a few improvements and new features. The most
important fix is for the missing unsolicited ARPs issue in
IPaddr2.

The following two features are worth mentioning too:

- support for RA tracing (see the README file for more details);
  your favourite UI should provide a way to turn trace on/off

- pgsql: support starting as Hot Standby

The full list of changes for the linux-ha RA set is available in
ChangeLog:

https://github.com/ClusterLabs/resource-agents/blob/master/ChangeLog

The rgmanager resource agents set received mainly bug fixes.

Please upgrade at the earliest opportunity.

Best,

The resource-agents maintainers
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] announcement: resource-agents release candidate 3.9.5rc1

2013-01-31 Thread Dejan Muhamedagic
Hi Keisuke-san,

On Thu, Jan 31, 2013 at 06:33:49PM +0900, Keisuke MORI wrote:
 Hi,
 
 Does IPaddr2 need to support 'eth0:label' format in a single 'nic'
 parameter when you want to use an interface label?
 
 I thought it doesn't from the meta-data description of 'nic', but it
 seems to conflict with the 'iflabel' description;
 
 nic:
  Do NOT specify an alias interface in the form eth0:1 or anything here;
  rather, specify the base interface only.
  If you want a label, see the iflabel parameter.
 
 iflabel:
  (omit)
  If a label is specified in nic name, this parameter has no effect.

Hmpf.

 The latest IPaddr2 (findif.sh version) would reject it as if an invalid
 nic had been specified.
 If we do need to support it I will submit a patch for this.

I'd rather just update the iflabel description. After all,
normally one doesn't need to specify the nic. But you'll
get different preferences from different people :)
However, it seems to be a regression, so we should probably allow
labels. BTW, is this related to
https://github.com/ClusterLabs/resource-agents/issues/200 ?
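
For reference, the supported way to get a label, with nic and iflabel kept
separate, looks like this in crmsh (addresses and names are examples):

    crm configure primitive vip ocf:heartbeat:IPaddr2 \
        params ip=192.168.100.10 cidr_netmask=24 nic=eth0 iflabel=1 \
        op monitor interval=10s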

Cheers,

Dejan


 Thanks,
 
 
 2013/1/30 Dejan Muhamedagic de...@suse.de:
  Hello,
 
  The current resource-agents repository has been tagged
  v3.9.5rc1. It is mainly a bug fix release.
 
  The full list of changes for the linux-ha RA set is available in
  ChangeLog:
 
  https://github.com/ClusterLabs/resource-agents/blob/v3.9.5rc1/ChangeLog
 
  We'll allow a week for testing. The final release is planned for
  Feb 6.
 
  Many thanks to all contributors!
 
  Best,
 
  The resource-agents maintainers
  ___
  ha-wg-technical mailing list
  ha-wg-techni...@lists.linux-foundation.org
  https://lists.linuxfoundation.org/mailman/listinfo/ha-wg-technical
 
 
 
 -- 
 Keisuke MORI
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] announcement: resource-agents release candidate 3.9.5rc1

2013-01-30 Thread Dejan Muhamedagic
Hello,

The current resource-agents repository has been tagged
v3.9.5rc1. It is mainly a bug fix release.

The full list of changes for the linux-ha RA set is available in
ChangeLog:

https://github.com/ClusterLabs/resource-agents/blob/v3.9.5rc1/ChangeLog

We'll allow a week for testing. The final release is planned for
Feb 6.

Many thanks to all contributors!

Best,

The resource-agents maintainers
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] 答复: Re: 答复: Re: New patch for?resource-agents(nfsserver)

2013-01-28 Thread Dejan Muhamedagic
On Wed, Jan 23, 2013 at 11:34:34PM -0700, John Shi wrote:
 Hi Dejan,
 
  You moved the checks to where they should go; that's better!
 Thanks in advance!

Thanks for the review. Applied now.

Cheers,

Dejan

 
 Best regards,
 John
 
  Dejan Muhamedagic de...@suse.de 2013-1-24 上午 0:17 
 On Tue, Jan 22, 2013 at 08:21:34PM -0700, John Shi wrote:
  Hi Dejan,
 Is this patch OK?  I didn't see the patch going upstream yet.
 Thanks!
 
 Applied now. Can you please also test the attached patch. It is
 fairly small and low impact, but still.
 
 Cheers,
 
 Dejan
 
  Best regards,
  John
  
   Dejan Muhamedagic de...@suse.de 2012-12-24 下午 17:57 
  Hi John,
  
  On Wed, Dec 19, 2012 at 11:18:41PM -0700, John Shi wrote:
   Hi all,
   
   Fix everything that causes an error when calling rpc.statd, including:
   
   Specify in meta_data that the value of nfs_notify_cmd should be either
   sm-notify or rpc.statd.
   Rename the parameter nfs_notify_retry_time to nfs_smnotify_retry_time,
   because rpc.statd has no retry-time option.
   Make the parameter nfs_notify_foreground produce the correct option for
   rpc.statd.
  
  Looks good to me.
  
  Cheers,
  
  Dejan
  
   Best regards,
   John
   
  
  
   ___
   Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
   http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
   Home Page: http://linux-ha.org/
  
  ___
  Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
  Home Page: http://linux-ha.org/
  
  
  
 
 
 

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] 答复: Re: ocft updated (test tool of resource-agnets)

2013-01-28 Thread Dejan Muhamedagic
On Tue, Jan 22, 2013 at 02:21:37AM -0700, John Shi wrote:
 Done.

Applied now.

Many thanks for the contribution!

Cheers,

Dejan

  Dejan Muhamedagic de...@suse.de 2013-1-22 上午 4:20 
 Hi John,
 
 On Sun, Jan 20, 2013 at 09:23:05AM -0700, John Shi wrote:
  Medium: tools: ocft: update to version 0.44
  
  Added incremental mode (ocft test -i): cache results and do
  not rerun those cases which already succeeded.
  
  Improved *ocft make*: ocft will only rebuild the updated test-case
  files.
  
  The ocft test prints the actual case names instead of just case numbers.
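
 Typical usage as implied above, assuming ocft takes the agent/case name
 as an argument (name illustrative):

     ocft make IPaddr2     # rebuild only if the test-case file changed
     ocft test -i IPaddr2  # rerun only the cases that have not yet passed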
 
 Can you please split the patch into several self-contained
 patches, each of which fixes a single issue.
 
 Great work!
 
 Thanks,
 
 Dejan
 
 
  ___
  Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
  Home Page: http://linux-ha.org/
 
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
 
 
 




___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] IPsrcaddr bug, and fix recommendation

2013-01-28 Thread Dejan Muhamedagic
Hi Attila,

Sorry for the delay, somehow missed your message.

On Fri, Dec 28, 2012 at 12:52:22PM +0100, Attila Megyeri wrote:
 Hi Dejan,
 
 -Original Message-
 From: linux-ha-dev-boun...@lists.linux-ha.org 
 [mailto:linux-ha-dev-boun...@lists.linux-ha.org] On Behalf Of Dejan 
 Muhamedagic
 Sent: Monday, December 24, 2012 11:07 AM
 To: linux-ha-dev@lists.linux-ha.org
 Subject: Re: [Linux-ha-dev] IPsrcaddr bug, and fix recommendation
 
 Hi,
 
 On Thu, Dec 20, 2012 at 08:03:32PM +0100, Attila Megyeri wrote:
  hi,
  
  I have a cluster configuration with two IPsrcaddr resources (e.g. IP 
  address A and B) They are configured to two different addresses, and 
  are never supposed to run on the same nodes. So A can run on nodes N1 and 
  N2, B can run on  N3,N4.
  
  My problem is, that in some cases, crm_mon shows that an ipsrcaddr resource 
  is running on a node where it shouldn't, and of course it is in unmanaged 
  state and cannot be stopped.
  For instance:
  IP address A is started, unmanaged, on node N3.
  
  I am using pacemaker 1.1.6 on a debian system, with the latest RA from 
  github.
  
  I checked the RA, and here are my findings.
  
  
  -  When status is called, it calls the srca_read() function
  
  -  srca_read() returns 2, if a srcip is running on the given node, 
  but with a different IP address.
  
  -  srca_status(), when gets 2 from srca_read(), returns 
  $OCF_ERR_GENERIC
  
  As a result, in my case IP B is running on N3, which is OK, but 
  CRM_mon reports that IP A is also running on N3 (unmanaged). [for some 
  reason this is how the OCF_ERR_GENERIC is interpreted] This is definitely
  a bug; the question is whether it is in pacemaker or in the RA.
  If I change the script to return $OCF_NOT_RUNNING instead of  
  $OCF_ERR_GENERIC it works properly.
  
  What is the proper behavior in this case?
  My recommendation is to fix the RA so that srca_read() returns 1, if there 
  is a srcip on the node, but it is not the queried one.
 
 The comment in the agent says:
 
 #   NOTES:
 #
 #   1) There must be one and not more than 1 default route! Mainly because
 #   I can't see why you should have more than one.  And if there is more
 #   than one, we would have to box clever to find out which one is to be
 #   modified, or we would have to pass its identity as an argument.
 #
 
 This should actually be in the meta-data, as it is obviously intended for 
 users.
 
 It looks like your use case doesn't fit this description, right?
 Perhaps we could add a parameter like allow_multiple_default_routes.
 
 Thanks,
 
 Dejan
 
 
 On the host where the resource is running I have only one default gateway. 
 The other pair of this host (the other node) uses a different default gateway 
 - but I do not think this should be a limitation (on that host I have a 
 single default gateway as well).

The "must be one and not more than 1" should also say
"cluster-wide".

 The srca_read() function does not fail in the steps that check the default
 gateway. The function runs to its last line, where 2 is returned, although
 it is not a generic error; rather, the SRC IP is not running on the node.

The exit code 2 signifies that the default route has an
unexpected address.

I think that it works as designed. As mentioned earlier, we can
extend the resource agent to support clusters with multiple
default routes, but that would need to be done with an extra
configuration parameter. Patches welcome :)

Thanks,

Dejan

 
 Thanks,
 
 Attila
 
 
 
 
 
 
 
  In this case the RA would return a $OCF_NOT_RUNNING
  
  
  
  Cheers,
  Attila
  
 
  ___
  Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org 
  http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
  Home Page: http://linux-ha.org/
 
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org 
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] IPaddr2 issue in resource-agents 3.9.4 and the next release

2013-01-24 Thread Dejan Muhamedagic
Hello,

IPaddr2 version from the latest resource-agents release 3.9.4
has a serious regression, it won't send unsolicited ARPs on
start. Depending on the ARP cache timeouts of the closest
network device, that would result in slower failover times. All
IP requests would still go to the node which was running the
previous instance of IPaddr2 resource. Unfortunately, the
regression tests are not capable of catching such a regression.
My sincere apologies for the blunder.
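
For anyone verifying their own setup by hand, the unsolicited (gratuitous)
ARPs can be watched for during a failover from another host on the same L2
segment (interface and address are examples):

    tcpdump -ni eth0 arp and host 192.168.100.10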

The issue has been fixed in the meantime and we're planning to
release 3.9.5 imminently, probably by Wednesday next week.

If there are any other urgent RA issues please let us know.

Cheers,

Dejan
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] RA trace facility

2013-01-24 Thread Dejan Muhamedagic
Hi,

On Tue, Nov 27, 2012 at 08:28:04AM +0100, Dejan Muhamedagic wrote:
 On Wed, Nov 21, 2012 at 07:06:35PM +0100, Lars Marowsky-Bree wrote:
  On 2012-11-21T18:02:49, Dejan Muhamedagic de...@suse.de wrote:
  
What would you think of OCF_RESKEY_RA_TRACE ?
   A meta attribute perhaps? That wouldn't cause a resource
   restart.
  
  Point, but - meta attributes so far were mostly for the PE/pacemaker,
  this would be for the RA.
 
 Not exactly for the RA itself. The RA execution would just be
 observed. The attribute is consumed by others. Whether it is PE
 or lrmd or something else makes less of a difference. It is up to
 these subsystems to sort the meta attributes out.

It turns out that pacemaker won't export meta attributes which
were not recognized. At any rate, we can go with
OCF_RESKEY_trace_ra. The good thing is that it can be specified
per operation (op start trace_ra=1).

The interface is simple and it's described in ocf-shellfuncs. It
would get support in the UI.

  Would a changed definition for a resource we're trying to trace be an
  actual problem? I mean, tracing clearly means you want to trace an
  resource action, so one would put the attribute on the resource before
  triggering that.
  
  (It can also be put on in maintenance mode, avoiding the restart.)
  
 Our include script could
enable that; it's unlikely that the problem occurs prior to that.

- never (default): Does nothing
- always: Always trace, write to $(which path?)/raname.rscid.$timestamp
   
   bash has a way to send trace to a separate FD, but that feature
   is available with version >=4.x. Otherwise, it could be messy to
   separate the trace from the other stderr output. Of course, one
   could just redirect stderr in this case. I suppose that that
   would work too.
  
  I assume that'd be easiest.
  
  (And people not using bash can write their own implementation for this.
  ;-)
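
The bash >= 4.x feature in question is BASH_XTRACEFD; a minimal sketch
(trace file name is illustrative):

    # route xtrace to its own fd so it does not mix with regular stderr
    exec 9>"/tmp/ra-trace.$$"
    BASH_XTRACEFD=9
    set -x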
  
- on-error: always trace, but delete on successful exit
   Good idea.

This is not implemented right now.

The patch is attached. It's planned for the release 3.9.5.

Thanks,

Dejan

hb_report/history explorer could gather this too.
   Right.
   
(And yes I know this introduces a fake parameter that doesn't really
exist. But it'd be so helpful.)

Sorry. Maybe I'm getting carried away ;-)
   
   Good points. I didn't really think much (yet) about how to
   further facilitate the feature, just had a vague idea that
   somehow lrmd should set the environment variable.
  
  Sure. LRM is an other obvious entry point for increased
  tracing/logging. That could also work.
  
   Perhaps we could do something like this:
   
   # crm resource trace rsc_id [action] [when-to-trace]
   
   This would set the appropriate meta attribute for the resource which
   would trickle down to the RA. ocf-shellfuncs would then do whatever's
   necessary to setup the trace. The file management could get tricky
   though, as we don't have a single point of exit (and trap is already
   used elsewhere).
  
  The file/log management would be easier to do in the LRM - and also
  handle the timeout situation; that could also make use of the redirect
  trace elsewhere if the shell is new enough.
 
 Indeed. Until then, ocf-shellfuncs can fallback to some well
 known location.
 
 Thanks,
 
 Dejan
 
  
  Regards,
  Lars
  
  -- 
  Architect Storage/HA
  SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, 
  HRB 21284 (AG Nürnberg)
  Experience is the name everyone gives to their mistakes. -- Oscar Wilde
  
  ___
  Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
  Home Page: http://linux-ha.org/
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
From edad4f4f7ef39da0243c1b3444bb8630443a8c38 Mon Sep 17 00:00:00 2001
From: Dejan Muhamedagic de...@suse.de
Date: Wed, 23 Jan 2013 17:36:08 +0100
Subject: [PATCH] Medium: ocf-shellfuncs: RA tracing

---
 doc/dev-guides/ra-dev-guide.txt |  6 +++
 heartbeat/ocf-shellfuncs.in | 82 +
 tools/ocf-tester.8  |  5 ++-
 tools/ocf-tester.in |  4 +-
 4 files changed, 95 insertions(+), 2 deletions(-)

diff --git a/doc/dev-guides/ra-dev-guide.txt b/doc/dev-guides/ra-dev-guide.txt
index af5e3b1..11e9a5d 100644
--- a/doc/dev-guides/ra-dev-guide.txt
+++ b/doc/dev-guides/ra-dev-guide.txt
@@ -1623,6 +1623,12 @@ Beginning tests for /home/johndoe/ra-dev/foobar...
 /home/johndoe/ra-dev/foobar passed all tests
 --
 
+If the resource agent exhibits some difficult to grasp behaviour,
+which is typically the case with just developed software, there
+are +-v+ and +-d+ options to dump more output

Re: [Linux-ha-dev] 答复: Re: New patch for resource-agents(nfsserver)

2013-01-23 Thread Dejan Muhamedagic
On Tue, Jan 22, 2013 at 08:21:34PM -0700, John Shi wrote:
 Hi Dejan,
Is this patch OK? I didn't see the patch going upstream yet.
Thanks!

Applied now. Can you please also test the attached patch. It is
fairly small and low impact, but still.

Cheers,

Dejan

 Best regards,
 John
 
  Dejan Muhamedagic de...@suse.de 2012-12-24 下午 17:57 
 Hi John,
 
 On Wed, Dec 19, 2012 at 11:18:41PM -0700, John Shi wrote:
  Hi all,
  
 Fix all causes of errors when calling rpc.statd, including:
 
 Specify in the meta-data that the value of nfs_notify_cmd should be either
 sm-notify or rpc.statd.
 Rename the parameter nfs_notify_retry_time to nfs_smnotify_retry_time,
 because rpc.statd has no retry-time option.
 Make the parameter nfs_notify_foreground produce the correct option for
 rpc.statd.
 
 Looks good to me.
 
 Cheers,
 
 Dejan
 
  Best regards,
  John
  
 
 
  ___
  Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
  Home Page: http://linux-ha.org/
 
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
 
 
 

From 186919824523c362d63b4ac176ce91511b91c99d Mon Sep 17 00:00:00 2001
From: Dejan Muhamedagic de...@suse.de
Date: Wed, 23 Jan 2013 17:10:11 +0100
Subject: [PATCH] Low: nfsserver: move configuration checks to the validation
 phase

---
 heartbeat/nfsserver | 23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/heartbeat/nfsserver b/heartbeat/nfsserver
index 003fcc6..09136d7 100755
--- a/heartbeat/nfsserver
+++ b/heartbeat/nfsserver
@@ -254,10 +254,6 @@ nfsserver_start ()
 		fi
 
 		if [ -n "$OCF_RESKEY_nfs_smnotify_retry_time" ]; then
-			if ! ocf_is_decimal "$OCF_RESKEY_nfs_smnotify_retry_time"; then
-				ocf_log err "Invalid OCF_RESKEY_nfs_smnotify_retry_time [$OCF_RESKEY_nfs_smnotify_retry_time]"
-				return $OCF_ERR_CONFIGURED
-			fi
 			opts="$opts -m $OCF_RESKEY_nfs_smnotify_retry_time"
 		fi
 
@@ -271,10 +267,6 @@ nfsserver_start ()
 		opts="$opts -n"
 		;;
 
-	*)
-		ocf_log err "Invalid OCF_RESKEY_nfs_notify_cmd [$OCF_RESKEY_nfs_notify_cmd]"
-		return $OCF_ERR_CONFIGURED
-		;;
 	esac
 
 	rm -rf /var/lib/nfs/sm.ha.save > /dev/null 2>&1
@@ -324,6 +316,21 @@ nfsserver_validate ()
 		exit $OCF_ERR_CONFIGURED
 	fi
 
+	if [ -n "$OCF_RESKEY_nfs_smnotify_retry_time" ]; then
+		if ! ocf_is_decimal "$OCF_RESKEY_nfs_smnotify_retry_time"; then
+			ocf_log err "Invalid nfs_smnotify_retry_time [$OCF_RESKEY_nfs_smnotify_retry_time]"
+			exit $OCF_ERR_CONFIGURED
+		fi
+	fi
+
+	case ${OCF_RESKEY_nfs_notify_cmd##*/} in
+	sm-notify|rpc.statd) ;;
+	*)
+		ocf_log err "Invalid nfs_notify_cmd [$OCF_RESKEY_nfs_notify_cmd]"
+		exit $OCF_ERR_CONFIGURED
+		;;
+	esac
+
 	return $OCF_SUCCESS
 }
 
-- 
1.8.0

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] 答复: Re: New patch for resource-agents(nfsserver)

2013-01-22 Thread Dejan Muhamedagic
Hi John,

On Tue, Jan 22, 2013 at 08:21:34PM -0700, John Shi wrote:
 Hi Dejan,
Is this patch OK? I didn't see the patch going upstream yet.
Thanks!

Oh, yes, I think so. Sorry, I missed it somehow. Will take care
of it today.

Cheers,

Dejan

 Best regards,
 John
 
  Dejan Muhamedagic de...@suse.de 2012-12-24 下午 17:57 
 Hi John,
 
 On Wed, Dec 19, 2012 at 11:18:41PM -0700, John Shi wrote:
  Hi all,
  
  Fix all causes of errors when calling rpc.statd, including:
  
  Specify in the meta-data that the value of nfs_notify_cmd should be either
  sm-notify or rpc.statd.
  Rename the parameter nfs_notify_retry_time to nfs_smnotify_retry_time,
  because rpc.statd has no retry-time option.
  Make the parameter nfs_notify_foreground produce the correct option for
  rpc.statd.
 
 Looks good to me.
 
 Cheers,
 
 Dejan
 
  Best regards,
  John
  
 
 
  ___
  Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
  Home Page: http://linux-ha.org/
 
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
 
 
 

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] ocft updated (test tool of resource-agnets)

2013-01-21 Thread Dejan Muhamedagic
Hi John,

On Sun, Jan 20, 2013 at 09:23:05AM -0700, John Shi wrote:
 Medium: tools: ocft: update to version 0.44
 
 Added an incremental mode (ocft test -i): results are cached, so cases
 which already succeeded are not rerun.
 
 Improved *ocft make*: ocft will now only rebuild updated test-case files.
 
 ocft test prints the actual case names instead of just case numbers.
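
For illustration, the incremental workflow described above would be
used roughly like this (the agent name is invented):

  ocft make nfsserver    # rebuilds only test-case files that changed
  ocft test -i nfsserver # reruns only cases that have not yet passed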

Can you please split the patch into several self-contained
patches, each of which fixes a single issue.

Great work!

Thanks,

Dejan


 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [PATCH] NMI Crashdump with external/ipmi

2013-01-16 Thread Dejan Muhamedagic
Hi,

On Mon, Jan 14, 2013 at 11:12:11PM +0100, Tobias D. Oestreicher wrote:
 Hi all,
 
 I've written a small patch for externel/ipmi, so it's possible to
 configure it not to reset a node, but trigger a crashdump via NMI.
 
 If a node becomes unavailable for several reasons it will be fenced, but
 this makes investigating the root cause of the node's unavailability very
 difficult; if you have a crashdump you can reconstruct the root cause.
 
 For this I added 3 new options:
 
 crashdump - set this to true to enable crashdump.
 
 sshcheck  - if this is true, an ssh connection will be
            established to either $sshipaddr or, if that is
            not set, $hostname as the remote address.
 sshipaddr - in case ssh is listening on another interface
            whose DNS name is not equal to $hostname.
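
For illustration, a configuration using the proposed options might look
like this (addresses and credentials are invented):

  primitive fence-node1 stonith:external/ipmi \
      params hostname=node1 ipaddr=10.0.0.1 userid=admin passwd=secret \
             crashdump=true sshcheck=true sshipaddr=10.0.1.1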

ssh is used only in case sshcheck is set to true? If so, then
that should be mentioned in the description of sshipaddr.
Further, the sshcheck parameter should come first (i.e. exchange
place with sshipaddr).

The test is Linux specific, that should also be noted in the
parameter description.

Please see below for notes on code.

 Maybe it could be useful for others too.
 
 For any comments, suggestions I would be glad.
 
 
 Tobias D. Oestreicher
 
 -- 
 Tobias D. Oestreicher
 Linux Consultant  Trainer
 Tel.: +49-160-5329935
 Mail: oestreic...@b1-systems.de
 
 B1 Systems GmbH
 Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
 GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537

 diff -r da5832ae23dd lib/plugins/stonith/external/ipmi
 --- a/lib/plugins/stonith/external/ipmi   Sun Dec 23 16:05:11 2012 +0100
 +++ b/lib/plugins/stonith/external/ipmi   Mon Jan 14 22:01:57 2013 +0100
 @@ -36,7 +36,11 @@
  POWEROFF="power off"
  POWERON="power on"
  STATUS="power status"
 +CRASHDUMP="chassis power diag"
 +
  IPMITOOL=${ipmitool:-`which ipmitool 2>/dev/null`}
 +SYSCTL=`which sysctl 2>/dev/null`

Normally, sysctl is in the PATH. As well as ssh (SSH_BIN below).

 +SSH_OPTS="-q -o PasswordAuthentication=no -o StrictHostKeyChecking=no"

Add -l root, just in case?

  
  have_ipmi() {
   test -x ${IPMITOOL}
 @@ -138,7 +142,11 @@
  	;;
  reset)
  	if ipmi_is_power_on; then
 -		do_ipmi ${RESET}
 +		if [ "${crashdump}" == "true" ]; then
 +			do_ipmi ${CRASHDUMP}
 +		else
 +			do_ipmi ${RESET}
 +		fi
  	else
  		do_ipmi ${POWERON}
  	fi
 @@ -149,11 +157,40 @@
   # the managed node. Hence, only check if we can contact the
   # IPMI device with power status command, don't pay attention
   # to whether the node is in fact powered on or off.
 +		if [ "${crashdump}" == "true" ]; then
 +			if [ "${sshcheck}" == "true" ];then

This should go to a separate function, sth like
check_crashdump_eligibility or check_crashdump_setup. Then you
can do:

if [ "${crashdump}" == "true" -a "${sshcheck}" == "true" ]; then
	check_crashdump_setup || exit
fi

Otherwise, it'd be hard to get the meaning of this big chunk of
code.

 +			if [ -z "${hostname}" -a -z "${sshipaddr}" ]; then
 +				ha_log.sh err "Neigther hostname nor sshipaddr is set, crashdump testing not possible"
  + ha_log.sh err "Neither ..."
 +			elif [ -z "${sshipaddr}" ]; then
 +				REMOTESSHHOST=${hostname}
 +			else
 +				REMOTESSHHOST=${sshipaddr}
 +			fi
 +			SSH_BIN=`which ssh 2>/dev/null`
 + SSH_COMMAND=${SSH_BIN} ${REMOTESSHHOST} ${SSH_OPTS}
 +			remote_crashdump_state=`${SSH_COMMAND} "grep -c crashkernel /proc/cmdline;${SYSCTL} -n kernel.unknown_nmi_panic kernel.panic_on_unrecovered_nmi"`

What if crashkernel is set to nothing? Would crash dump work then
too?

 +			if [ $? -ne 0 ];then
 +				ha_log.sh err "Not possible to connect via ssh to ${REMOTESSHHOST}"
 +				exit 1
 +			fi
 +			unknown_nmi=`echo ${remote_crashdump_state}|awk '{print $2}'`
 +			unrecovered_nmi=`echo ${remote_crashdump_state}|awk '{print $3}'`
 +			crashdump_kernel_option=`echo ${remote_crashdump_state}|awk '{print $1}'`
 +			if [ ${crashdump_kernel_option} -ne 1 ];then
 +				ha_log.sh err "Crashdump seems not to be configured on host ${REMOTESSHHOST}"
 +				exit 1
 +			fi
 +			if [ ${unknown_nmi} -eq 0 -o ${unrecovered_nmi} -eq 0 ]; then
 +				ha_log.sh err "Non Maskerable Interupts do not trigger a reset. Set \"kernel.unknown_nmi_panic\" and \"kernel.panic_on_unrecovered_nmi\" to \"1\""

Replace Non 

[Linux-ha-dev] crmsh release v1.2.4

2013-01-14 Thread Dejan Muhamedagic
Hello,

With a bit of delay, here's the announcement for crmsh v1.2.4.

***

The CRM shell v1.2.4 is released.

The highlights of the release:

* history: fine tuning and several regression fixes
* more pacemaker 1.1.8 compatibility code

For the full set of changes, take a look at the changelog:

http://hg.savannah.gnu.org/hgweb/crmsh/file/crmsh-1.2.4/ChangeLog

== Note about Pacemaker versions ==

CRM shell 1.2.4 supports all Pacemaker 1.1 versions.

== Installing ==

Installing the CRM shell along with Pacemaker 1.1 versions <=
v1.1.7 is possible, but it will result in file conflicts. You
need to enforce file overwriting when installing packages.
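
On RPM-based systems that can be done, for example, like this:

  rpm -Uvh --replacefiles crmsh-1.2.4-*.rpm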

Note that pacemaker v1.1.7 carries a crm shell version which is
the same as in v1.1.6, or put differently quite outdated. There
are several interesting new features, including history, which
never made it in any pacemaker release.

== Resources ==

Packages for several popular Linux distributions:

http://download.opensuse.org/repositories/network:/ha-clustering/

The man page:

http://crmsh.nongnu.org/crm.8.html

The CRM shell project web page at GNU savannah:

https://savannah.nongnu.org/projects/crmsh/

Support and bug reporting:

http://lists.linux-ha.org/mailman/listinfo/linux-ha
https://savannah.nongnu.org/bugs/?group=crmsh
https://developerbugs.linuxfoundation.org/enter_bug.cgi?product=Pacemaker;component=Shell;version=1.2

The sources repository is available at:

http://hg.savannah.gnu.org/hgweb/crmsh

Enjoy!

Dejan

P.S. In case you wonder what happened to v1.2.2 and v1.2.3, well,
let's just say I didn't like the numbers ;-)
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] agents: including LGPL license file

2012-12-24 Thread Dejan Muhamedagic
Hi Keisuke-san,

On Thu, Dec 20, 2012 at 02:14:10PM +0900, Keisuke MORI wrote:
 Hi,
 
 The resource-agents package is licensed under GPL and LGPL,
 but the full copy of LGPL license file is missing
 as opposed to the heartbeat and the glue packages that includes it.
 
 Why don't we include COPYING.LGPL in the agents package too,
 as the verbatim copy of the LGPL license, for consistency?

Not really an expert in the area, but I think there's no problem
adding a copy of a license.

Cheers,

Dejan


 Thanks,
 -- 
 Keisuke MORI
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] New patch for resource-agents(nfsserver)

2012-12-24 Thread Dejan Muhamedagic
Hi John,

On Wed, Dec 19, 2012 at 11:18:41PM -0700, John Shi wrote:
 Hi all,
 
 Fix all causes of errors when calling rpc.statd, including:
 
 Specify in the meta-data that the value of nfs_notify_cmd should be either
 sm-notify or rpc.statd.
 Rename the parameter nfs_notify_retry_time to nfs_smnotify_retry_time,
 because rpc.statd has no retry-time option.
 Make the parameter nfs_notify_foreground produce the correct option for rpc.statd.

Looks good to me.

Cheers,

Dejan

 Best regards,
 John
 


 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] IPsrcaddr bug, and fix recommendation

2012-12-24 Thread Dejan Muhamedagic
Hi,

On Thu, Dec 20, 2012 at 08:03:32PM +0100, Attila Megyeri wrote:
 hi,
 
 I have a cluster configuration with two IPsrcaddr resources (e.g. IP address 
 A and B)
 They are configured to two different addresses, and are never supposed to run 
 on the same nodes. So A can run on nodes N1 and N2, B can run on N3 and N4.
 
 My problem is, that in some cases, crm_mon shows that an ipsrcaddr resource 
 is running on a node where it shouldn't, and of course it is in unmanaged 
 state and cannot be stopped.
 For instance:
 IP address A is started, unmanaged, on node N3.
 
 I am using pacemaker 1.1.6 on a debian system, with the latest RA from github.
 
 I checked the RA, and here are my findings.
 
 
 -  When status is called, it calls the srca_read() function
 
 -  srca_read() returns 2, if a srcip is running on the given node, 
 but with a different IP address.
 
 -  srca_status(), when gets 2 from srca_read(), returns 
 $OCF_ERR_GENERIC
 
 As a result, in my case IP B is running on N3, which is OK, but crm_mon
 reports that IP A is also running on N3 (unmanaged). [for some reason this
 is how OCF_ERR_GENERIC is interpreted]
 This is definitively a bug, the question is whether in pacemaker or in the RA.
 If I change the script to return $OCF_NOT_RUNNING instead of  
 $OCF_ERR_GENERIC it works properly.
 
 What is the proper behavior in this case?
 My recommendation is to fix the RA so that srca_read() returns 1, if there is 
 a srcip on the node, but it is not the queried one.
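
As a sketch of the intended end result only (not the shipped agent
code), the status mapping would then be:

  srca_status() {
      srca_read "$1"
      case $? in
      0) return $OCF_SUCCESS;;      # our source address is set
      1) return $OCF_NOT_RUNNING;;  # no source address set at all
      2) return $OCF_NOT_RUNNING;;  # a different source address is set
      esac
  }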

The comment in the agent says:

#   NOTES:
#
#   1) There must be one and not more than 1 default route! Mainly because
#   I can't see why you should have more than one.  And if there is more
#   than one, we would have to box clever to find out which one is to be
#   modified, or we would have to pass its identity as an argument.
#

This should actually be in the meta-data, as it is obviously
intended for users.

It looks like your use case doesn't fit this description, right?
Perhaps we could add a parameter like allow_multiple_default_routes.

Thanks,

Dejan


 In this case the RA would return a $OCF_NOT_RUNNING
 
 
 
 Cheers,
 Attila
 

 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] 转发: 答复: Re: A patch for resource-agents(nfsserver) ( add attachment )

2012-12-14 Thread Dejan Muhamedagic
Hi John,

On Fri, Dec 14, 2012 at 12:24:34AM -0700, John Shi wrote:
 
 
  John Shi 2012-12-14 下午 15:20 
 
 
  Dejan Muhamedagic de...@suse.de 2012-12-13 下午 17:33 
 
 This might be a bit better:
 
 # set default options
 local opts="-f -v"
 
 # add option for notify_retry_time, if set
 if [ -n "$OCF_RESKEY_nfs_notify_retry_time" ]; then
 	if ! ocf_is_decimal "$OCF_RESKEY_nfs_notify_retry_time"; then
 		ocf_log err "Invalid OCF_RESKEY_nfs_notify_retry_time [$OCF_RESKEY_nfs_notify_retry_time]"
 		return $OCF_ERR_CONFIGURED
 	fi
 	opts="$opts -m $OCF_RESKEY_nfs_notify_retry_time"
 fi
 
 # run in foreground, if requested
 if ocf_is_true "$OCF_RESKEY_nfs_notify_foreground"; then
 	opts="$opts -d"
 fi
 
 What do you say?
 
 You are right, the default value of retry_time should be "" (empty).
 One little adjustment; please see line 229 of nfsserver:
 
 ${OCF_RESKEY_nfs_notify_cmd} $opts $ip -P /var/lib/nfs/sm.ha
 
 $ip is the optarg of -v, so the code may be:

oops, missed that. Well spotted!

Patches applied. Many thanks for the contribution!

Cheers,

Dejan

 # set default options
 local opts="-f -v"
 
 # add option for notify_retry_time, if set
 if [ -n "$OCF_RESKEY_nfs_notify_retry_time" ]; then
 	if ! ocf_is_decimal "$OCF_RESKEY_nfs_notify_retry_time"; then
 		ocf_log err "Invalid OCF_RESKEY_nfs_notify_retry_time [$OCF_RESKEY_nfs_notify_retry_time]"
 		return $OCF_ERR_CONFIGURED
 	fi
 	opts="-m $OCF_RESKEY_nfs_notify_retry_time $opts"
 fi
 
 # run in foreground, if requested
 if ocf_is_true "$OCF_RESKEY_nfs_notify_foreground"; then
 	opts="-d $opts"
 fi
 
 
 
 Best regards,
 John
 



___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] A patch for resource-agents(nfsserver)

2012-12-13 Thread Dejan Muhamedagic
Hi John,

On Thu, Dec 13, 2012 at 12:08:56AM -0700, John Shi wrote:
 
 Hi Dejan,
 
   I have a patch for nfsserver agent in resource-agents package.
 
  sm-notify was invoked in nfsserver with the -d option. With -d, sm-notify
  doesn't fork, and without a time limit it tries to reach each client for
  15 minutes, spending this time in the startup procedure of the resource,
  so this option was removed as a fix in git commit
  ae83f2befdafdfae633afa1553f5d7c4f72d0196
  
  I think it should be an option to set or unset the -d, and we can improve
  this even better: we could use the -m option to set the retry time.

OK, I guess that there could be use cases where the failover
would wait until all clients get notified.

 The patch added 2 parameters for nfsserver agent: 
 
 OCF_RESKEY_nfs_notify_foreground:  
type: boolean
    default: 'false',
    'false' to unset -d, 'true' to set -d
 
 OCF_RESKEY_nfs_notify_retry_time:  
   type: integer  
   default '15'
 
 Best regards,
 John 
 
 From f98a19be89eff8cb3f46a8957c4d81be694a2921 Mon Sep 17 00:00:00 2001
 From: John Shi j...@suse.com
 Date: Wed, 12 Dec 2012 15:50:39 +0800
 Subject: [PATCH] Medium: nfsserver: make the retry time for sm-notify in the
  nfsserver resource agent configurable

The two options are actually independent. So, they should be
either put into two patches or the description modified to
better reflect the change.

 ---
  heartbeat/nfsserver |   38 +-
  1 file changed, 37 insertions(+), 1 deletion(-)
 
 diff --git a/heartbeat/nfsserver b/heartbeat/nfsserver
 index 6414e3a..974bff9 100755
 --- a/heartbeat/nfsserver
 +++ b/heartbeat/nfsserver
 @@ -14,6 +14,8 @@ fi
  
  DEFAULT_INIT_SCRIPT=/etc/init.d/nfsserver
  DEFAULT_NOTIFY_CMD=/sbin/sm-notify
 +DEFAULT_NOTIFY_FOREGROUND=false
 +DEFAULT_NOTIFY_RETRY_TIME=15
  DEFAULT_RPCPIPEFS_DIR=/var/lib/nfs/rpc_pipefs
  
  nfsserver_meta_data() {
 @@ -55,6 +57,28 @@ The tool to send out notification.
 <content type="string" default="$DEFAULT_NOTIFY_CMD" />
 </parameter>
 
+<parameter name="nfs_notify_foreground" unique="0" required="0">
+<longdesc lang="en">
+Keeps sm-notify attached to its controlling terminal and running in the foreground.
+</longdesc>
+<shortdesc lang="en">
+Keeps sm-notify running in the foreground.
+</shortdesc>
+<content type="boolean" default="$DEFAULT_NOTIFY_FOREGROUND" />
+</parameter>
+
+<parameter name="nfs_notify_retry_time" unique="0" required="0">
+<longdesc lang="en">
+Specifies the length of sm-notify retry time, in minutes, to continue
+retrying notifications to unresponsive hosts.  If this option is not
+specified, sm-notify attempts to send notifications for 15 minutes.
+Specifying a value of 0 causes sm-notify to continue sending
+notifications to unresponsive peers until it is manually killed.
+</longdesc>
+<shortdesc lang="en">
+Specifies the length of sm-notify retry time (minutes).
+</shortdesc>
+<content type="integer" default="$DEFAULT_NOTIFY_RETRY_TIME" />
+</parameter>
+
 <parameter name="nfs_shared_infodir" unique="0" required="1">
 <longdesc lang="en">
 The nfsserver resource agent will save nfs related information in this
 specific directory.
 @@ -129,6 +153,8 @@ esac
  fp=$OCF_RESKEY_nfs_shared_infodir
  : ${OCF_RESKEY_nfs_init_script=$DEFAULT_INIT_SCRIPT}
  : ${OCF_RESKEY_nfs_notify_cmd=$DEFAULT_NOTIFY_CMD}
 +: ${OCF_RESKEY_nfs_notify_foreground=$DEFAULT_NOTIFY_FOREGROUND}
 +: ${OCF_RESKEY_nfs_notify_retry_time=$DEFAULT_NOTIFY_RETRY_TIME}
  
 if [ -z "${OCF_RESKEY_rpcpipefs_dir}" ]; then
   rpcpipefs_make_dir=$fp/rpc_pipefs
 @@ -220,7 +246,17 @@ nfsserver_start ()
   #Notify the nfs server has been moved or rebooted
   #The init script do that already, but with the hostname, which may be 
 ignored by client
   #we have to do it again with the nfs_ip 
 -	local opts="-f -v"
 +	local opts
 +
 +	if ! ocf_is_decimal "$OCF_RESKEY_nfs_notify_retry_time"; then
 +		ocf_log err "Invalid OCF_RESKEY_nfs_notify_retry_time [$OCF_RESKEY_nfs_notify_retry_time]"
 +		return $OCF_ERR_CONFIGURED
 +	fi
 +	if ocf_is_true "$OCF_RESKEY_nfs_notify_foreground"; then
 +		opts="-d"
 +	fi
 +	opts="$opts -m $OCF_RESKEY_nfs_notify_retry_time -f -v"

This might be a bit better:

# set default options
local opts="-f -v"

# add option for notify_retry_time, if set
if [ -n "$OCF_RESKEY_nfs_notify_retry_time" ]; then
	if ! ocf_is_decimal "$OCF_RESKEY_nfs_notify_retry_time"; then
		ocf_log err "Invalid OCF_RESKEY_nfs_notify_retry_time [$OCF_RESKEY_nfs_notify_retry_time]"
		return $OCF_ERR_CONFIGURED
	fi
	opts="$opts -m $OCF_RESKEY_nfs_notify_retry_time"
fi

# run in foreground, if requested
if ocf_is_true "$OCF_RESKEY_nfs_notify_foreground"; then
	opts="$opts -d"
fi

What do you say?

Cheers,

Dejan
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org

Re: [Linux-ha-dev] RA trace facility

2012-11-26 Thread Dejan Muhamedagic
On Wed, Nov 21, 2012 at 07:06:35PM +0100, Lars Marowsky-Bree wrote:
 On 2012-11-21T18:02:49, Dejan Muhamedagic de...@suse.de wrote:
 
   What would you think of OCF_RESKEY_RA_TRACE ?
  A meta attribute perhaps? That wouldn't cause a resource
  restart.
 
 Point, but - meta attributes so far were mostly for the PE/pacemaker,
 this would be for the RA.

Not exactly for the RA itself. The RA execution would just be
observed. The attribute is consumed by others. Whether it is PE
or lrmd or something else makes less of a difference. It is up to
these subsystems to sort the meta attributes out.

 Would a changed definition for a resource we're trying to trace be an
 actual problem? I mean, tracing clearly means you want to trace a
 resource action, so one would put the attribute on the resource before
 triggering that.
 
 (It can also be put on in maintenance mode, avoiding the restart.)
 
Our include script could
   enable that; it's unlikely that the problem occurs prior to that.
   
   - never (default): Does nothing
   - always: Always trace, write to $(which path?)/raname.rscid.$timestamp
  
  bash has a way to send trace to a separate FD, but that feature
  is available with version =4.x. Otherwise, it could be messy to
  separate the trace from the other stderr output. Of course, one
  could just redirect stderr in this case. I suppose that that
  would work too.
 
 I assume that'd be easiest.
 
 (And people not using bash can write their own implementation for this.
 ;-)
 
   - on-error: always trace, but delete on successful exit
  Good idea.
  
   hb_report/history explorer could gather this too.
  Right.
  
   (And yes I know this introduces a fake parameter that doesn't really
   exist. But it'd be so helpful.)
   
   Sorry. Maybe I'm getting carried away ;-)
  
  Good points. I didn't really think much (yet) about how to
  further facilitate the feature, just had a vague idea that
  somehow lrmd should set the environment variable.
 
 Sure. LRM is another obvious entry point for increased
 tracing/logging. That could also work.
 
  Perhaps we could do something like this:
  
  # crm resource trace rsc_id [action] [when-to-trace]
  
  This would set the appropriate meta attribute for the resource which
  would trickle down to the RA. ocf-shellfuncs would then do whatever's
  necessary to setup the trace. The file management could get tricky
  though, as we don't have a single point of exit (and trap is already
  used elsewhere).
 
 The file/log management would be easier to do in the LRM - and also
 handle the timeout situation; that could also make use of the redirect
 trace elsewhere if the shell is new enough.

Indeed. Until then, ocf-shellfuncs can fall back to some well-known
location.

Thanks,

Dejan

 
 Regards,
 Lars
 
 -- 
 Architect Storage/HA
 SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, 
 HRB 21284 (AG Nürnberg)
 Experience is the name everyone gives to their mistakes. -- Oscar Wilde
 
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] RA trace facility

2012-11-26 Thread Dejan Muhamedagic
Hi Keisuke-san,

On Thu, Nov 22, 2012 at 06:27:59PM +0900, Keisuke MORI wrote:
 Hi,
 
 2012/11/22 Dejan Muhamedagic de...@suse.de:
  Hi Lars,
 
  On Wed, Nov 21, 2012 at 04:43:08PM +0100, Lars Marowsky-Bree wrote:
  On 2012-11-21T16:33:18, Dejan Muhamedagic de...@suse.de wrote:
 
   Hi,
  
   This is little something which could help while debugging
   resource agents. Setting the environment variable __OCF_TRACE_RA
   would cause the resource agent run to be traced (as in set -x).
   PS4 is set accordingly (that's a bash feature, don't know if
   other shells support it). ocf-tester got an option (-X) to turn
   the feature on. The agent itself can also turn on/off tracing
   via ocf_start_trace/ocf_stop_trace.
  
   Do you find anything amiss?
 
  I *really* like this.
 
  But I'd like a different way to turn it on - a standard one that is
  available via the CIB configuration, without modifying the script.
 
  I don't really want that the script gets modified either.
  The above instructions are for people developing a new RA.
 
 I like this, too.
 It would be useful when you need to diagnose in the production
 environment if you can enable / disable it without any modifications
 to RAs.

Of course.

 It might also be helpful if it has a kind of 'hook' functionality that
 allows you to execute an arbitrary script for collecting the runtime
 information such as CPU usage, memory status, I/O status or the list
 of running processes etc. for diagnosis.

Yes. I guess that one could run such a hook in background. Did
you mean that? Or once the RA instance exited? This is a bit
different feature though.
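
Purely as a sketch of that idea (the trace_hook parameter name is
invented, nothing like it exists yet), such a hook could be launched in
the background from ocf-shellfuncs:

  # hypothetical hook: collect runtime state while the action runs
  if [ -n "$OCF_RESKEY_trace_hook" ] && [ -x "$OCF_RESKEY_trace_hook" ]; then
      "$OCF_RESKEY_trace_hook" "$OCF_RESOURCE_INSTANCE" "$__OCF_ACTION" &
  fi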

Thanks,

Dejan

 -- 
 Keisuke MORI
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] RA trace facility

2012-11-21 Thread Dejan Muhamedagic
Hi,

This is little something which could help while debugging
resource agents. Setting the environment variable __OCF_TRACE_RA
would cause the resource agent run to be traced (as in set -x).
PS4 is set accordingly (that's a bash feature, don't know if
other shells support it). ocf-tester got an option (-X) to turn
the feature on. The agent itself can also turn on/off tracing
via ocf_start_trace/ocf_stop_trace.

Do you find anything amiss?

Thanks,

Dejan
commit 77b3c997289a097fecab13179c4a2364bc34f15a
Author: Dejan Muhamedagic de...@suse.de
Date:   Wed Nov 21 16:24:20 2012 +0100

Dev: add RA trace capability

diff --git a/doc/dev-guides/ra-dev-guide.txt b/doc/dev-guides/ra-dev-guide.txt
index af5e3b1..11e9a5d 100644
--- a/doc/dev-guides/ra-dev-guide.txt
+++ b/doc/dev-guides/ra-dev-guide.txt
@@ -1623,6 +1623,12 @@ Beginning tests for /home/johndoe/ra-dev/foobar...
 /home/johndoe/ra-dev/foobar passed all tests
 --
 
+If the resource agent exhibits some difficult to grasp behaviour,
+which is typically the case with just developed software, there
+are +-v+ and +-d+ options to dump more output. If that does not
+help, instruct +ocf-tester+ to trace the resource agent with
++-X+ (make sure to redirect output to a file, unless you are a
+really fast reader).
 
 === Testing with +ocft+
 
diff --git a/heartbeat/ocf-shellfuncs.in b/heartbeat/ocf-shellfuncs.in
index f3822b7..04e4ecb 100644
--- a/heartbeat/ocf-shellfuncs.in
+++ b/heartbeat/ocf-shellfuncs.in
@@ -675,4 +675,14 @@ ocf_stop_processes() {
 	return 1
 }
 
+ocf_start_trace() {
+	PS4='+ `date +%T`: ${FUNCNAME[0]:+${FUNCNAME[0]}:}${LINENO}: '
+	set -x
+}
+ocf_stop_trace() {
+	set +x
+}
+
 __ocf_set_defaults $@
+
+[ "$__OCF_TRACE_RA" ] && ocf_start_trace
diff --git a/tools/ocf-tester.8 b/tools/ocf-tester.8
index ba07058..850ec0b 100644
--- a/tools/ocf-tester.8
+++ b/tools/ocf-tester.8
@@ -3,7 +3,7 @@
 ocf-tester \- Part of the Linux-HA project
 .SH SYNOPSIS
 .B ocf-tester
-[\fI-Lh\fR] \fI-n resource_name \fR[\fI-o name=value\fR]\fI* /full/path/to/resource/agent\fR
+[\fI-LhvqdX\fR] \fI-n resource_name \fR[\fI-o name=value\fR]\fI* /full/path/to/resource/agent\fR
 .SH DESCRIPTION
 Tool for testing if a cluster resource is OCF compliant
 .SH OPTIONS
@@ -43,6 +43,9 @@ Be quiet while testing
 \fB\-d\fR
 Turn on RA debugging
 .TP
+\fB\-X\fR
+Turn on RA tracing (expect large output)
+.TP
 \fB\-n\fR name
 Name of the resource
 .TP
diff --git a/tools/ocf-tester.in b/tools/ocf-tester.in
index 214e25c..2eaf220 100755
--- a/tools/ocf-tester.in
+++ b/tools/ocf-tester.in
@@ -51,13 +51,14 @@ usage() {
 
 echo "Tool for testing if a cluster resource is OCF compliant"
 echo ""
-echo "Usage: ocf-tester [-Lh] -n resource_name [-o name=value]* /full/path/to/resource/agent"
+echo "Usage: ocf-tester [-LhvqdX] -n resource_name [-o name=value]* /full/path/to/resource/agent"
 echo ""
 echo "Options:"
 echo "	-h       		This text"
 echo "	-v       		Be verbose while testing"
 echo "	-q       		Be quiet while testing"
 echo "	-d       		Turn on RA debugging"
+echo "	-X       		Turn on RA tracing (expect large output)"
 echo "	-n name			Name of the resource"
 echo "	-o name=value		Name and value of any parameters required by the agent"
 echo "	-L			Use lrmadmin/lrmd for tests"
@@ -106,6 +107,7 @@ while test $done = 0; do
 	-L) use_lrmd=1; shift;;
 	-v) verbose=1; shift;;
 	-d) export HA_debug=1; shift;;
+	-X) export __OCF_TRACE_RA=1; verbose=1; shift;;
 	-q) quiet=1; shift;;
 	-?|--help) usage 0;;
 	--version) echo @PACKAGE_VERSION@; exit 0;;
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] RA trace facility

2012-11-21 Thread Dejan Muhamedagic
Hi Lars,

On Wed, Nov 21, 2012 at 04:43:08PM +0100, Lars Marowsky-Bree wrote:
 On 2012-11-21T16:33:18, Dejan Muhamedagic de...@suse.de wrote:
 
  Hi,
  
  This is little something which could help while debugging
  resource agents. Setting the environment variable __OCF_TRACE_RA
  would cause the resource agent run to be traced (as in set -x).
  PS4 is set accordingly (that's a bash feature, don't know if
  other shells support it). ocf-tester got an option (-X) to turn
  the feature on. The agent itself can also turn on/off tracing
  via ocf_start_trace/ocf_stop_trace.
  
  Do you find anything amiss?
 
 I *really* like this.
 
 But I'd like a different way to turn it on - a standard one that is
 available via the CIB configuration, without modifying the script.

I don't really want that the script gets modified either.
The above instructions are for people developing a new RA.

 What would you think of OCF_RESKEY_RA_TRACE ?

A meta attribute perhaps? That wouldn't cause a resource
restart.

  Our include script could
 enable that; it's unlikely that the problem occurs prior to that.
 
 - never (default): Does nothing
 - always: Always trace, write to $(which path?)/raname.rscid.$timestamp

bash has a way to send trace to a separate FD, but that feature
is available with version >= 4.x. Otherwise, it could be messy to
separate the trace from the other stderr output. Of course, one
could just redirect stderr in this case. I suppose that that
would work too.
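
For the record, the bash feature referred to here is BASH_XTRACEFD,
available since bash 4.1. A minimal sketch, with an invented trace path:

  exec 9>"/var/lib/heartbeat/trace_ra/${OCF_RESOURCE_INSTANCE}.$(date +%s)"
  BASH_XTRACEFD=9
  set -x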

 - on-error: always trace, but delete on successful exit

Good idea.

 hb_report/history explorer could gather this too.

Right.

 (And yes I know this introduces a fake parameter that doesn't really
 exist. But it'd be so helpful.)
 
 Sorry. Maybe I'm getting carried away ;-)

Good points. I didn't really think much (yet) about how to
further facilitate the feature, just had a vague idea that
somehow lrmd should set the environment variable. Perhaps we
could do something like this:

# crm resource trace rsc_id [action] [when-to-trace]

This would set the appropriate meta attribute for the resource
which would trickle down to the RA. ocf-shellfuncs would then
do whatever's necessary to setup the trace. The file management
could get tricky though, as we don't have a single point of exit
(and trap is already used elsewhere).
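
Hypothetical usage of such a command (the resource name is invented):

  crm resource trace vip monitor
  # ... reproduce the problem, then:
  crm resource untrace vip monitor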

Cheers,

Dejan

 
 Regards,
 Lars
 
 -- 
 Architect Storage/HA
 SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, 
 HRB 21284 (AG Nürnberg)
 Experience is the name everyone gives to their mistakes. -- Oscar Wilde
 
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Patch: stonith plugin external/vcenter HOSTLIST separator

2012-11-15 Thread Dejan Muhamedagic
Hi,

On Thu, Nov 15, 2012 at 11:52:48AM +0100, Stefan Botter wrote:
 Hi all,
 
 recently I tried to use the STONITH plugin external/vcenter along with 
 vCenter 5 (I doubt, that the version is significant).
 While using the stonith resource for each node separately, I had no 
 problems, but using it in a clone resulted in failures like that one:
 
 
 Nov 14 08:53:57 shermcl1 external/vcenter(vfencing:0)[23236]: [23257]:
ERROR: [reset shermcl2] Invalid target specified
 
 where the cluster consists of virtual machines SHERMCL1, SHERMCL2 and 
 SHERMCL3, with their unames shermcl1, shermcl2 and shermcl3, 
 accordingly. shermcl2 should be fenced, but the remaining cluster 
 members were unable to kill that machine.
 
 The relevant portion of the cluster configuration is here:
 
 
 node shermcl1
 node shermcl2
 node shermcl3
 
 primitive vfencing stonith:external/vcenter \
 params VI_SERVER=virtualcenter.dom.ain
   VI_CREDSTORE=/root/.vmware/credstore/vicredentials.xml
   HOSTLIST="shermcl1=SHERMCL1;shermcl2=SHERMCL2;shermcl3=SHERMCL3"
   RESETPOWERON=0 \
 op monitor interval=3600s
 clone Fencing vfencing

It could be that the issue comes from the bug in fence_legacy,
which has been resolved in the meantime. Can you try to edit that
and replace the split command (line 86) with the following (i.e.
just append , 2):

 ($name,$val)=split /\s*=\s*/, $opt, 2;

The file location should be /usr/sbin/fence_legacy.

Can you please see if that helps?

Cheers,

Dejan

 location l-Fencing_shermcl1 Fencing 0: shermcl1
 location l-Fencing_shermcl2 Fencing 0: shermcl2
 location l-Fencing_shermcl3 Fencing 0: shermcl3
 
 
 The location statements are needed, as the cluster itself is not
 symmetric.
 
 All machines are plain openSUSE 12.2 with corosync 1.4.3 and pacemaker 
 1.1.6.
 
 While running perfectly on the commandline with
 
 stonith -t external/vcenter VI_SERVER=virtualcenter.dom.ain \
  VI_CREDSTORE=/root/.vmware/credstore/vicredentials.xml \
  HOSTLIST="shermcl1=SHERMCL1;shermcl2=SHERMCL2;shermcl3=SHERMCL3" \
  RESETPOWERON=0 -l
 
 and showing the names of the three virtual machines, I found that, when
 called as a resource inside the cluster, only the first hostname up to the
 first = is visible, perhaps caused by the handover as an environment
 variable.
 
 Applying the attached trivial patch to use a colon (:) instead of the 
 equal sign (=), the command line test
 
 stonith -t external/vcenter VI_SERVER=virtualcenter.dom.ain \
  VI_CREDSTORE=/root/.vmware/credstore/vicredentials.xml \
  HOSTLIST="shermcl1:SHERMCL1;shermcl2:SHERMCL2;shermcl3:SHERMCL3" \
  RESETPOWERON=0 -l
 
 as well as fencing inside the cluster with
 
 primitive vfencing stonith:external/vcenter \
 params VI_SERVER=virtualcenter.dom.ain
   VI_CREDSTORE=/root/.vmware/credstore/vicredentials.xml
   HOSTLIST="shermcl1:SHERMCL1;shermcl2:SHERMCL2;shermcl3:SHERMCL3"
   RESETPOWERON=0 \
 op monitor interval=3600s
 
 succeeds.
 
 
 So a question around: Is anyone using the external/vcenter with the 
 cloned resource successfully with the original syntax?
 If so, where is my problem?
 
 If not, the attached patch changes the syntax in the above described 
 way. If there is no objection can it be applied?
 
 Greetings,
 
 Stefan
 
 PS: sorry for the line breaks in the code
 -- 
 Stefan Botter  listrea...@jsj.dyndns.org

 # HG changeset patch
 # User Stefan Botter j...@jsj.dyndns.org
 # Date 1352974761 -3600
 # Node ID 3429be9596a95127e04706c38c5c4d82fb67e206
 # Parent  0809ed6abeb7289f3a8f4229f537df8d509c0854
 - trivial change to use : as hostname delimiter in HOSTLIST instead of =
 
 diff -r 0809ed6abeb7 -r 3429be9596a9 lib/plugins/stonith/external/vcenter
 --- a/lib/plugins/stonith/external/vcenterMon Oct 22 17:35:17 2012 +0200
 +++ b/lib/plugins/stonith/external/vcenterThu Nov 15 11:19:21 2012 +0100
 @@ -55,12 +55,12 @@
  <longdesc lang="en">
  The list of hosts that the VMware vCenter STONITH device controls.
  Syntax is:
 -  hostname1[=VirtualMachineName1];hostname2[=VirtualMachineName2]
 +  hostname1[:VirtualMachineName1];hostname2[:VirtualMachineName2]
  
 -NOTE: omit =VirtualMachineName if hostname and virtual machine names are identical
 +NOTE: omit :VirtualMachineName if hostname and virtual machine names are identical
  
  Example:
 -  cluster1=VMCL1;cluster2=VMCL2
 +  cluster1:VMCL1;cluster2:VMCL2
  </longdesc>
  </parameter>
  <parameter name="VI_SERVER">
 @@ -128,7 +128,7 @@
   my %host_to_vm = ();
   my %vm_to_host = ();
   foreach my $host (@hostlist) {
 - my @config = split(/=/, $host);
 + my @config = split(/:/, $host);
   my $key = $config[0]; my $value = $config[1];
   if (!defined($value)) { $value = $config[0]; }
   $host_to_vm{$key} = $value;

 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 

[Linux-ha-dev] announcement: resource-agents release candidate 3.9.4rc1

2012-11-13 Thread Dejan Muhamedagic
Hello,

The current resource-agents repository has been tagged
v3.9.4rc1. It is mainly a bug fix release.

The full list of changes for the linux-ha RA set is available in
ChangeLog.

We'll allow a week for agents testing. The final release is
planned for Nov 20.

Many thanks to all contributors!

Best,

The resource-agents maintainers
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] planning resource agents v3.9.4

2012-10-31 Thread Dejan Muhamedagic
Hello,

A couple of milestones were created for the resource-agents
project at github.com yesterday. Since most of activity is
happening at github, it seemed like the most logical place to use
for the release planning.

This is the tentative schedule:

3.9.4-rc: November 13.
3.9.4: November 20.

If there's anything you think should be part of the
resource-agents release please open an issue, a pull request, or
a bugzilla, as you see fit.

If there's anything that hasn't received due attention, please
let us know.

Finally, if you can help with resolving issues consider yourself
invited to do so.

Cheers,

The resource-agents crowd
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [resource-agents] Low: pgsql: check existence of instance number in replication mode (#159)

2012-10-29 Thread Dejan Muhamedagic
On Fri, Oct 26, 2012 at 11:36:53AM +1100, Andrew Beekhof wrote:
 On Fri, Oct 26, 2012 at 12:52 AM, Dejan Muhamedagic de...@suse.de wrote:
  On Thu, Oct 25, 2012 at 06:09:38AM -0700, Lars Ellenberg wrote:
  On Thu, Oct 25, 2012 at 03:38:47AM -0700, Takatoshi MATSUO wrote:
   Usually,  we use crm_master command instead of crm_attribute to 
   change master score in RA.
   But PostgreSQL's slave can't get own replication status, so Master 
   changes Slave's master-score
   using instance number on Pacemaker 1.0.x .
   This probably is not ordinary usage.
  
Would the existing resource agent work with globally-unique=true ?
  
   I don't know if it works with true.
   I use it with false and it doesn't need true.
 
  I suggested that you actually should use globally-unique clones,
  as in that case you still get those instance numbers...
 
  Does using different clones make sense in pgsql? What is to be
  different between them? Or would it be just for the sake of
  getting instance numbers? If so, then it somehow looks wrong to
  me :)
 
  But thinking about it once more, I'm not so sure anymore.
 
  Correct me where I'm wrong.
 
  This is about the master score.
  In case the Master instance fails, we preferably want to promote the
  slave instance that is as close as possible to the Master.
  We only know which *node* was best at the last monitoring interval,
  which may be good enough.
 
  We need to then change the master score for *all possible instances*,
  for all nodes, accordingly.
 
  Which is what that loop did.
  (I think skipping the current instance is actually a bug;
    If pacemaker relabels things in a bad way, you may hit it).
 
  Now, with pacemaker 1.1.8, all instances become equal
  (for anonymous clones, aka globally-unique=false),
  and we only need to set the score on the resource-id,
  not for all resource-id:instance combinations.
 
  OK.
 
  Which is great. After all, the master score in this case is attached to
  the node (or, the data set accessible from that node), and not to the
  (arbitrary, potentially relabeled anytime) instance number pacemaker
  assigned to the clone instance running on that node.
 
 
  And that is exactly what your patch does:
   * detect if a version of pacemaker is in use that attaches the instance
 number to the resource id
 * if so, do the loop on all possible instance numbers as before
 * if not, only set the master score on the resource-id
 
 
  Is my understanding correct?
  Then I think you patch is good.
 
  Yes, the patch seems good then. Though there is quite a bit of
  code repetition. The set attribute part should be moved to an
  extra function.
 
  Still, other resource agents that use master scores (or any other
  attributes that reference instance numbers of anonymous clones)
  need to be reviewed.
 
   Though this "I'll set scores for other instances, not only myself"
  logic is unique to pgsql, so most other resource agents should just
  work with whatever is present in the environment, they typically treat
  the $OCF_RESOURCE_INSTANCE as opaque.
 
  Seems like no other RA uses instance numbers. However, quite a
  few use OCF_RESOURCE_INSTANCE which, in case of clone/ms
  resources, may potentially lead to unpredictable results on
  upgrade to 1.1.8.
 
 No. Otherwise all the regression tests would fail.  The PE is smart
 enough to find promotion score and failcounts in either case.

Cool.

 Also, OCF_RESOURCE_INSTANCE contains whatever the local lrmd knows the
 resource as, not what we call it internally to the PE.

What I meant was that some RAs use OCF_RESOURCE_INSTANCE to name
local files which keep some kind of state. If
OCF_RESOURCE_INSTANCE changes on upgrade... Well, I guess that
the worst that can happen is for the probe to fail. But I didn't
take a closer look.

Thanks,

Dejan

  Thanks,
Lars
 
  Cheers,
 
  Dejan
  ___
  Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
  Home Page: http://linux-ha.org/
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [resource-agents] Low: pgsql: check existence of instance number in replication mode (#159)

2012-10-25 Thread Dejan Muhamedagic
On Thu, Oct 25, 2012 at 06:09:38AM -0700, Lars Ellenberg wrote:
 On Thu, Oct 25, 2012 at 03:38:47AM -0700, Takatoshi MATSUO wrote:
  Usually,  we use crm_master command instead of crm_attribute to change 
  master score in RA.
  But PostgreSQL's slave can't get own replication status, so Master changes 
  Slave's master-score 
  using instance number on Pacemaker 1.0.x .
  This probably is not ordinary usage.
  
   Would the existing resource agent work with globally-unique=true ?
  
  I don't know if it works with true.
  I use it with false and it doesn't need true.
 
 I suggested that you actually should use globally-unique clones,
 as in that case you still get those instance numbers...

Does using different clones make sense in pgsql? What is to be
different between them? Or would it be just for the sake of
getting instance numbers? If so, then it somehow looks wrong to
me :)
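
For reference, the anonymous-clone setup under discussion would be
configured along these lines (names invented):

  ms msPostgresql pgsql \
      meta master-max=1 clone-max=2 notify=true globally-unique=false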

 But thinking about it once more, I'm not so sure anymore.
 
 Correct me where I'm wrong.
 
 This is about the master score.
 In case the Master instance fails, we preferably want to promote the
 slave instance that is as close as possible to the Master.
 We only know which *node* was best at the last monitoring interval,
 which may be good enough.
 
 We need to then change the master score for *all possible instances*,
 for all nodes, accordingly.
 
 Which is what that loop did.
 (I think skipping the current instance is actually a bug;
   If pacemaker relabels things in a bad way, you may hit it).
 
 Now, with pacemaker 1.1.8, all instances become equal
 (for anonymous clones, aka globally-unique=false),
 and we only need to set the score on the resource-id,
 not for all resource-id:instance combinations.

OK.

 Which is great. After all, the master score in this case is attached to
 the node (or, the data set accessible from that node), and not to the
 (arbitrary, potentially relabeled anytime) instance number pacemaker
 assigned to the clone instance running on that node.
 
 
 And that is exactly what your patch does:
  * detect if a version of pacemaker is in use that attaches the instance
number to the resource id
* if so, do the loop on all possible instance numbers as before
* if not, only set the master score on the resource-id
 
 
 Is my understanding correct?
 Then I think you patch is good.

Yes, the patch seems good then. Though there is quite a bit of
code repetition. The set attribute part should be moved to an
extra function.
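
Such a helper might look roughly like this (names invented), assuming
the 1.1.8 behaviour where no :N instance suffix is attached:

  set_master_score() {
      # $1 = node name, $2 = score
      local resource=${OCF_RESOURCE_INSTANCE%%:*}  # strip any :N suffix
      crm_attribute -l reboot -N "$1" -n "master-$resource" -v "$2"
  }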

 Still, other resource agents that use master scores (or any other
 attributes that reference instance numbers of anonymous clones)
 need to be reviewed.
 
  Though this "I'll set scores for other instances, not only myself"
 logic is unique to pgsql, so most other resource agents should just
 work with whatever is present in the environment, they typically treat
 the $OCF_RESOURCE_INSTANCE as opaque.

Seems like no other RA uses instance numbers. However, quite a
few use OCF_RESOURCE_INSTANCE which, in case of clone/ms
resources, may potentially lead to unpredictable results on
upgrade to 1.1.8.

 Thanks,
   Lars

Cheers,

Dejan
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] external/vcenter: don't fail when a machine is powered off

2012-10-23 Thread Dejan Muhamedagic
Hi,

On Mon, Oct 22, 2012 at 11:06:07AM +0200, Robbert Muller wrote:
 Hello,
 
 While testing a new cluster we found the following behavior, which I
 discussed on #linux-ha with andreask afterwards and we both agree the
 behavior was wrong.
 
 bug scenario:
 3-node cluster, 1 standby just for having 3 nodes, 2 active nodes.
 When we did a power off of the machine (similar to pulling the power
 cable from a machine), the cluster failed to fail over to the next node.
 
 This is because the following setting:
 RESETPOWERON was set to 0, so a machine powered off stays powered off

Just to make sure: RESETPOWERON was set to 0 in the configuration?

 With the current code path, a machine in the poweroff state is
 considered a failure for the stonith reset operation, which results in
 no resources being started on the second node, and the machine staying
 in an unclean state.
 
 The analogy with real hardware and a powerbar and imho correct behavior:
 ---
 If I pull the plug of node1, node 2 will fence it with the powerbar. The
 powerbar will power-cycle the socket without any result, because I pulled
 the plug. But the fencing operation is a success and all resources are
 started on the second node
 ---
 
 Patch to fix this with i hope a minimal change is attached.

Thanks for the patch. But we'll need to rework it a bit.

 After finding this bug i got ill and have to stay at home for a few
 days, so i don't have access to an environment to test this patch atm.

Get better soon!

Cheers,

Dejan

 Regards
 
 Robbert Müller
 
 
 
 

 diff -r 66f7442698e6 lib/plugins/stonith/external/vcenter
 --- a/lib/plugins/stonith/external/vcenter	Mon Oct 15 15:59:57 2012 +0200
 +++ b/lib/plugins/stonith/external/vcenter	Mon Oct 22 10:38:09 2012 +0200
 @@ -199,6 +199,8 @@
  		if ($powerState eq "poweredOff" && (! exists $ENV{'RESETPOWERON'} || $ENV{'RESETPOWERON'} ne "0")) {
  			$vm->PowerOnVM();
  			system("ha_log.sh", "info", "Machine $esx:$vm->{'name'} has been powered on");
 +		} elsif ($powerState eq "poweredOff") {
 +			system("ha_log.sh", "info", "Machine $esx:$vm->{'name'} is poweredoff and RESETPOWERON was disabled");
  		} else {
  			dielog("Could not complete $esx:$vm->{'name'} power cycle");
  		}

 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] external/vcenter: don't fail when a machine is powered off

2012-10-23 Thread Dejan Muhamedagic
On Tue, Oct 23, 2012 at 01:19:53PM +0200, Robbert Müller wrote:
 Hi,
 
 On 23-10-12 13:13, Dejan Muhamedagic wrote:
  Hi,
 
  On Mon, Oct 22, 2012 at 11:06:07AM +0200, Robbert Muller wrote:
  Hello,
 
  While testing a new cluster we found the following behavior which i
  discussed on #linux-ha with andreask afterwards and we both agree the
  behavior was wrong.
 
  bug scenario:
  3 node cluster, 1 standby just for having 3 nodes, 2 active nodes
  when we did a power off of the machine ( similar to pulling the power
  cable from a machine ) the cluster failed to failover to the next node.
 
  This is because the following setting:
  RESETPOWERON was set to 0, so a machine powered off stays powered off
 
  Just to make sure: RESETPOWERON was set to 0 in the configuration?
 Yes it is.

OK.

  with the current code path, a machine in the state poweroff is
  considered a failure for the stonith reset operation. which results in
  no resources are started on the second node, and the machine stays in a
  unclean state.
 
  The analogy with real hardware and a powerbar and imho correct behavior:
  ---
  If i pull the plug of node1, node 2 will fence it with the powerbar. The
  power will powercycle the socket without any result, because i pulled
  the plug. But the fencing operation is a success and all resources are
  started on the second node
  ---
 
  Patch to fix this with i hope a minimal change is attached.
 
  Thanks for the patch. But we'll need to rework it a bit.
 
 Could you tell me what is wrong with it? I am currently testing it in
 our customer's environment. And it seems to work as expected.

Functionally nothing wrong with it, it's just that the extra if
was repeating part of the previous if, which may be difficult to
understand at times. Please see, and possibly test, the attached
patch.

Cheers,

Dejan


  After finding this bug i got ill and have to stay at home for a few
  days, so i don't have access to an environment to test this patch atm.
 
  Get better soon!
 
 Thx, the antibiotics seem to have killed the infection. So I'm back to work.
 
 
 Regards
 
 Robbert
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
diff -r 0809ed6abeb7 lib/plugins/stonith/external/vcenter
--- a/lib/plugins/stonith/external/vcenter	Mon Oct 22 17:35:17 2012 +0200
+++ b/lib/plugins/stonith/external/vcenter	Tue Oct 23 16:30:54 2012 +0200
@@ -196,9 +196,13 @@ elsif ($command ~~ @netCommands) {
 	} else {
 		system("ha_log.sh", "warn", "Tried to ResetVM $esx:$vm->{'name'} that was $powerState");
 		# Start a virtual machine on reset only if explicitly allowed by RESETPOWERON
-		if ($powerState eq "poweredOff" && (! exists $ENV{'RESETPOWERON'} || $ENV{'RESETPOWERON'} ne "0")) {
-			$vm->PowerOnVM();
-			system("ha_log.sh", "info", "Machine $esx:$vm->{'name'} has been powered on");
+		if ($powerState eq "poweredOff") {
+			if ((! exists $ENV{'RESETPOWERON'} || $ENV{'RESETPOWERON'} ne "0")) {
+				$vm->PowerOnVM();
+				system("ha_log.sh", "info", "Machine $esx:$vm->{'name'} has been powered on");
+			} else {
+				system("ha_log.sh", "info", "Machine $esx:$vm->{'name'} is poweredoff and RESETPOWERON was disabled");
+			}
 		} else {
 			dielog("Could not complete $esx:$vm->{'name'} power cycle");
 		}
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [Pacemaker] A patch for stonith external/libvirt

2012-10-22 Thread Dejan Muhamedagic
Hi Holger,

On Fri, Oct 12, 2012 at 12:32:30PM +0200, Holger Teutsch wrote:
 Dejan,
 I'm no longer in the cluster business.

Good luck!

 I can not recall the reason but I suspect in my configuration nearly 2
 years ago it was the other way round:
 reboot did not work but stop/start did.

OK. The script supports the reboot method, but it still power
cycles by default.

Cheers,

Dejan

 Regards
 Holger
 
 On Thu, Oct 11, 2012 at 4:46 PM, Dejan Muhamedagic deja...@fastmail.fmwrote:
 
  Hi Owen,
 
  On Wed, Oct 10, 2012 at 10:07:41AM +0100, Owen Le Blanc wrote:
   I attach a patch for the stonith agent external/libvirt.  This agent
   was failing on our machines because for rebooting machines it tried to
   stop and then start them, which doesn't work on our system, while
   rebooting them does.  We have cluster glue version 1.0.8-2 installed
   on a Debian system, with libvirt 0.9.12-3.
 
  It would be good to have both, i.e. on-off and reboot method.
  With a parameter which would specify the method. I wonder why
  didn't the author put reboot in the first place. Holger?
 
  Cheers,
 
  Dejan
 
  P.S. Moving discussion to linux-ha-dev.
 
  
-- Owen Le Blanc
 
 
   ___
   Pacemaker mailing list: pacema...@oss.clusterlabs.org
   http://oss.clusterlabs.org/mailman/listinfo/pacemaker
  
   Project Home: http://www.clusterlabs.org
   Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
   Bugs: http://bugs.clusterlabs.org
 
 

 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [Problem] external/vcenter fails in stonith of the guest of the similar name.

2012-10-22 Thread Dejan Muhamedagic
Hi Hideo-san,

On Mon, Oct 22, 2012 at 09:20:53AM +0900, renayama19661...@ybb.ne.jp wrote:
 Hi All,
 
 external/vcenter can fence the wrong guest when two guests have similar names.
 
 For example, when the two guests sr2 and backup-sr2 exist, a stonith
 request for sr2 ends up resetting backup-sr2.
 
 The problem is caused by the following lookup:
 
  $vm = Vim::find_entity_view(view_type => "VirtualMachine", filter => { name => qr/\Q$host_to_vm{$targetHost}\E/i });
 
 
 It seems to be caused by the fact that the fix Lars pointed out earlier
 was never applied:
 
  * http://lists.community.tummy.com/pipermail/linux-ha-dev/2011-April/018397.html
 
 (snip)
 Unless this filter thing has a special mode where it internally does a
 $x eq $y for scalars and $x =~ $y for explicitly designated qr//
 Regexp objects, I'd suggest here to also do
   filter => { name => qr/^\Q$realTarget\E$/i }
 (snip)
 
 Please revise it to add the ^ anchor to the search.
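
 A quick way to reproduce the over-match from a shell (guest names are
 hypothetical; the qr// forms are the ones quoted above):

     for guest in sr2 backup-sr2; do
         # unanchored pattern: matches any name containing "sr2"
         perl -e 'print "loose match: $ARGV[0]\n" if $ARGV[0] =~ qr/\Qsr2\E/i' "$guest"
         # anchored pattern: matches the exact name only
         perl -e 'print "exact match: $ARGV[0]\n" if $ARGV[0] =~ qr/^\Qsr2\E$/i' "$guest"
     done

 The loose pattern matches both sr2 and backup-sr2; the anchored one
 matches only sr2.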

Applied. Thanks!

Dejan

 Best Regards,
 Hideo Yamauchi.
 
 
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [Pacemaker] A patch for stonith external/libvirt

2012-10-15 Thread Dejan Muhamedagic
On Thu, Oct 11, 2012 at 04:46:48PM +0200, Dejan Muhamedagic wrote:
 Hi Owen,
 
 On Wed, Oct 10, 2012 at 10:07:41AM +0100, Owen Le Blanc wrote:
  I attach a patch for the stonith agent external/libvirt.  This agent
  was failing on our machines because for rebooting machines it tried to
  stop and then start them, which doesn't work on our system, while
  rebooting them does.  We have cluster glue version 1.0.8-2 installed
  on a Debian system, with libvirt 0.9.12-3.
 
  It would be good to have both, i.e. the on-off and the reboot method,
  with a parameter to specify which. I wonder why the author
  didn't use reboot in the first place. Holger?

I modified the patch and introduced a reset_method parameter. It
defaults to power_cycle, so the default behaviour remains the
same.
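
The dispatch idea, as a rough shell sketch (not the actual
external/libvirt code; libvirt_off, libvirt_on and libvirt_reboot are
hypothetical helpers):

    reset_method=${reset_method:-"power_cycle"}
    case "$reset_method" in
    power_cycle)
        # default: keep the old behaviour, stop then start
        libvirt_off "$target" && libvirt_on "$target"
        ;;
    reboot)
        # single reboot request instead of a power cycle
        libvirt_reboot "$target"
        ;;
    *)
        echo "unknown reset_method: $reset_method" >&2
        exit 1
        ;;
    esac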

Many thanks for the patch!

Cheers,

Dejan

 Cheers,
 
 Dejan
 
 P.S. Moving discussion to linux-ha-dev.
 
  
   -- Owen Le Blanc
 
 
  ___
  Pacemaker mailing list: pacema...@oss.clusterlabs.org
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
  
  Project Home: http://www.clusterlabs.org
  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs: http://bugs.clusterlabs.org
 
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] glue 1.0.11 released

2012-10-15 Thread Dejan Muhamedagic
Hello,

The current glue repository has been tagged as 1.0.11.

The highlights:

- lrmd sets the max number of children depending on the number of
  processors (see the sketch below)
- compatibility for stonith agents and hb_report for pacemaker
  v1.1.8
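
The children-scaling idea, roughly (the multiplier and formula are an
assumption for illustration, not the actual lrmd code):

    nproc=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    max_children=$((4 * nproc))   # scale concurrent operations with CPUs
    echo "would allow up to $max_children children"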

You can get the 1.0.11 tarball here:

http://hg.linux-ha.org/glue/archive/glue-1.0.11.tar.bz2

Many thanks to all contributors!

Enjoy!

Lars Ellenberg
Dejan Muhamedagic
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] crmsh v1.2.1 released

2012-10-15 Thread Dejan Muhamedagic
Hello,

The CRM shell v1.2.1 is released.

The highlights of the release:

* history: add the exclude (log messages) command (usage sketch below)
* pacemaker 1.1.8 compatibility code
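
A hedged usage sketch of the exclude command (the exact prompt and
syntax may differ between versions):

    crm(live)history# exclude corosync|stonith-ng
    crm(live)history# log    # messages matching the pattern are now skipped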

There are two important bug fixes:

* cibconfig: repair edit for non-vi users
* cibconfig: update schema separately (don't remove the status section)

For the full set of changes, take a look at the changelog:

http://hg.savannah.gnu.org/hgweb/crmsh/file/b6bb311c7bd3/ChangeLog

== Note about Pacemaker versions ==

CRM shell 1.2.1 supports all Pacemaker 1.1 versions. The history
feature is unfortunately not as well supported with version
1.1.8.

== Installing ==

Installing the CRM shell along with Pacemaker 1.1 versions <=
v1.1.7 is possible, but it will result in file conflicts. You
need to enforce file overwriting when installing packages.
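
On an RPM-based system this can be done with rpm's --force option
(a hedged example; the package file name is illustrative):

    rpm -Uvh --force crmsh-1.2.1-1.x86_64.rpm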

== Resources ==

The CRM shell project web page at GNU savannah:

https://savannah.nongnu.org/projects/crmsh/

The sources repository is available at:

http://hg.savannah.gnu.org/hgweb/crmsh

Packages for several popular Linux distributions:

http://download.opensuse.org/repositories/network:/ha-clustering/

The man page:

http://crmsh.nongnu.org/crm.8.html

Support and bug reporting:

http://lists.linux-ha.org/mailman/listinfo/linux-ha
https://savannah.nongnu.org/bugs/?group=crmsh

Enjoy!

Dejan
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [Patch] The problem that the digest computed by crmd becomes mismatched.

2012-10-12 Thread Dejan Muhamedagic
Hi,

On Fri, Oct 12, 2012 at 08:31:21AM +0900, renayama19661...@ybb.ne.jp wrote:
 Hi Andrew,
 Hi Dejan,
 
  Makes sense to me.
  With the patch, the effective options are create+op rather than
  create+op1+op2+op3...
 
  Would it make sense to change the structure of the op-done message?
  I cannot change the op message when I consider the other effects.
  I think the patch is right given the op messages of the present lrmd and crmd.
 
  We would like the patch applied to glue soon, if possible.

I'll do some testing first.

Cheers,

Dejan

 Best Regards,
 Hideo Yamauchi.
 
 --- On Thu, 2012/10/11, Andrew Beekhof beek...@gmail.com wrote:
 
  On Wed, Oct 10, 2012 at 11:21 PM, Dejan Muhamedagic de...@suse.de wrote:
   Hi Hideo-san,
  
   On Wed, Oct 10, 2012 at 03:22:08PM +0900, renayama19661...@ybb.ne.jp 
   wrote:
   Hi All,
  
    We found a case where pacemaker could not correctly judge the result
    of an lrmd operation.
  
    When we use the following crm configuration, a parameter of the start
    operation is given back to crmd as part of the monitor operation result.
  
   (snip)
    primitive prmDiskd ocf:pacemaker:Dummy \
            params name="diskcheck_status_internal" device="/dev/vda" interval="30" \
            op start interval="0" timeout="60s" on-fail="restart" prereq="fencing" \
            op monitor interval="30s" timeout="60s" on-fail="restart" \
            op stop interval="0s" timeout="60s" on-fail="block"
   (snip)
  
    This is because lrmd gives back the prereq parameter of start as part
    of the monitor operation result.
    As a result, crmd sees a mismatch between the parameters it asked lrmd
    to use for the monitor operation and the parameters lrmd reports for
    that operation.
  
   We can confirm this problem by the next command in Pacemaker1.0.12.
  
    Command 1) The crm_verify command reports the digest difference.
  
   [root@rh63-heartbeat1 ~]# crm_verify -L
   crm_verify[19988]: 2012/10/10_20:29:58 CRIT: check_action_definition: 
   Parameters to prmDiskd:0_monitor_3 on rh63-heartbeat1 changed: 
   recorded 7d7c9f601095389fc7cc0c6b29c61a7a vs. 
   d38c85388dea5e8e2568c3d699eb9cce (reload:3.0.1) 
   0:0;6:1:0:ca6a5ad2-0340-4769-bab7-289a00862ba6
  
  
    Command 2) The ptest command reports the digest difference, too.
  
   [root@rh63-heartbeat1 ~]# ptest -L -VV
   ptest[19992]: 2012/10/10_20:30:19 WARN: unpack_nodes: Blind faith: not 
   fencing unseen nodes
   ptest[19992]: 2012/10/10_20:30:19 CRIT: check_action_definition: 
   Parameters to prmDiskd:0_monitor_3 on rh63-heartbeat1 changed: 
   recorded 7d7c9f601095389fc7cc0c6b29c61a7a vs. 
   d38c85388dea5e8e2568c3d699eb9cce (reload:3.0.1) 
   0:0;6:1:0:ca6a5ad2-0340-4769-bab7-289a00862ba6
   [root@rh63-heartbeat1 ~]#
  
    Command 3) After a cibadmin -B command, pengine unnecessarily restarts
    the monitor of the resource.
  
   Oct 10 20:31:00 rh63-heartbeat1 pengine: [19842]: CRIT: 
   check_action_definition: Parameters to prmDiskd:0_monitor_3 on 
   rh63-heartbeat1 changed: recorded 7d7c9f601095389fc7cc0c6b29c61a7a vs. 
   d38c85388dea5e8e2568c3d699eb9cce (reload:3.0.1) 
   0:0;6:1:0:ca6a5ad2-0340-4769-bab7-289a00862ba6
   Oct 10 20:31:00 rh63-heartbeat1 pengine: [19842]: notice: RecurringOp:  
   Start recurring monitor (30s) for prmDiskd:0 on rh63-heartbeat1
   Oct 10 20:31:00 rh63-heartbeat1 pengine: [19842]: notice: LogActions: 
   Leave   resource prmDiskd:0#011(Started rh63-heartbeat1)
   Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: 
    do_state_transition: State transition S_POLICY_ENGINE -> 
   S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE 
   origin=handle_response ]
   Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: unpack_graph: 
   Unpacked transition 2: 1 actions in 1 synapses
   Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: do_te_invoke: 
   Processing graph 2 (ref=pe_calc-dc-1349868660-20) derived from 
   /var/lib/pengine/pe-input-2.bz2
   Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: te_rsc_command: 
   Initiating action 1: monitor prmDiskd:0_monitor_3 on rh63-heartbeat1 
   (local)
   Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: do_lrm_rsc_op: 
   Performing key=1:2:0:ca6a5ad2-0340-4769-bab7-289a00862ba6 
   op=prmDiskd:0_monitor_3 )
   Oct 10 20:31:00 rh63-heartbeat1 lrmd: [19836]: info: cancel_op: 
   operation monitor[4] on prmDiskd:0 for client 19839, its parameters: 
   CRM_meta_clone=[0] CRM_meta_prereq=[fencing] device=[/dev/vda] 
   name=[diskcheck_status_internal] CRM_meta_clone_node_max=[1] 
   CRM_meta_clone_max=[1] CRM_meta_notify=[false] 
   CRM_meta_globally_unique=[false] crm_feature_set=[3.0.1] interval=[30] 
   prereq=[fencing] CRM_meta_on_fail=[restart] CRM_meta_name=[monitor] 
   CRM_meta_interval=[3] CRM_meta_timeout=[6]  cancelled
   Oct 10 20:31:00 rh63-heartbeat1 lrmd: [19836]: info: rsc:prmDiskd:0 
   monitor[5] (pid 20009)
   Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: process_lrm_event: 
   LRM operation prmDiskd:0_monitor_3

Re: [Linux-ha-dev] [Patch] The problem that the digest computed by crmd becomes mismatched.

2012-10-10 Thread Dejan Muhamedagic
Hi Hideo-san,

On Wed, Oct 10, 2012 at 03:22:08PM +0900, renayama19661...@ybb.ne.jp wrote:
 Hi All,
 
  We found a case where pacemaker could not correctly judge the result of an
  lrmd operation.
 
  When we use the following crm configuration, a parameter of the start
  operation is given back to crmd as part of the monitor operation result.
 
 (snip)
  primitive prmDiskd ocf:pacemaker:Dummy \
          params name="diskcheck_status_internal" device="/dev/vda" interval="30" \
          op start interval="0" timeout="60s" on-fail="restart" prereq="fencing" \
          op monitor interval="30s" timeout="60s" on-fail="restart" \
          op stop interval="0s" timeout="60s" on-fail="block"
 (snip)
 
  This is because lrmd gives back the prereq parameter of start as part of
  the monitor operation result.
  As a result, crmd sees a mismatch between the parameters it asked lrmd to
  use for the monitor operation and the parameters lrmd reports for that
  operation.
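  
  A rough illustration of why the digests then differ (the parameter
  strings below are simplified, not pacemaker's exact digest input;
  the digests are MD5 hashes over the recorded parameters):
  
      printf '%s' 'device=/dev/vda name=diskcheck_status_internal interval=30' | md5sum
      printf '%s' 'device=/dev/vda name=diskcheck_status_internal interval=30 prereq=fencing' | md5sum
  
  The leaked start-only parameter (prereq) yields a different hash, so
  the recorded and recalculated digests no longer match.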
 
 We can confirm this problem by the next command in Pacemaker1.0.12.
 
  Command 1) The crm_verify command reports the digest difference.
 
 [root@rh63-heartbeat1 ~]# crm_verify -L
 crm_verify[19988]: 2012/10/10_20:29:58 CRIT: check_action_definition: 
 Parameters to prmDiskd:0_monitor_3 on rh63-heartbeat1 changed: recorded 
 7d7c9f601095389fc7cc0c6b29c61a7a vs. d38c85388dea5e8e2568c3d699eb9cce 
 (reload:3.0.1) 0:0;6:1:0:ca6a5ad2-0340-4769-bab7-289a00862ba6
 
 
  Command 2) The ptest command reports the digest difference, too.
 
 [root@rh63-heartbeat1 ~]# ptest -L -VV
 ptest[19992]: 2012/10/10_20:30:19 WARN: unpack_nodes: Blind faith: not 
 fencing unseen nodes
 ptest[19992]: 2012/10/10_20:30:19 CRIT: check_action_definition: Parameters 
 to prmDiskd:0_monitor_3 on rh63-heartbeat1 changed: recorded 
 7d7c9f601095389fc7cc0c6b29c61a7a vs. d38c85388dea5e8e2568c3d699eb9cce 
 (reload:3.0.1) 0:0;6:1:0:ca6a5ad2-0340-4769-bab7-289a00862ba6
 [root@rh63-heartbeat1 ~]# 
 
  Command 3) After a cibadmin -B command, pengine unnecessarily restarts the
  resource's monitor.
 
 Oct 10 20:31:00 rh63-heartbeat1 pengine: [19842]: CRIT: 
 check_action_definition: Parameters to prmDiskd:0_monitor_3 on 
 rh63-heartbeat1 changed: recorded 7d7c9f601095389fc7cc0c6b29c61a7a vs. 
 d38c85388dea5e8e2568c3d699eb9cce (reload:3.0.1) 
 0:0;6:1:0:ca6a5ad2-0340-4769-bab7-289a00862ba6
 Oct 10 20:31:00 rh63-heartbeat1 pengine: [19842]: notice: RecurringOp:  Start 
 recurring monitor (30s) for prmDiskd:0 on rh63-heartbeat1
 Oct 10 20:31:00 rh63-heartbeat1 pengine: [19842]: notice: LogActions: Leave   
 resource prmDiskd:0#011(Started rh63-heartbeat1)
 Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: do_state_transition: 
  State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS 
 cause=C_IPC_MESSAGE origin=handle_response ]
 Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: unpack_graph: Unpacked 
 transition 2: 1 actions in 1 synapses
 Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: do_te_invoke: Processing 
 graph 2 (ref=pe_calc-dc-1349868660-20) derived from 
 /var/lib/pengine/pe-input-2.bz2
 Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: te_rsc_command: 
 Initiating action 1: monitor prmDiskd:0_monitor_3 on rh63-heartbeat1 
 (local)
 Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: do_lrm_rsc_op: 
 Performing key=1:2:0:ca6a5ad2-0340-4769-bab7-289a00862ba6 
 op=prmDiskd:0_monitor_3 )
 Oct 10 20:31:00 rh63-heartbeat1 lrmd: [19836]: info: cancel_op: operation 
 monitor[4] on prmDiskd:0 for client 19839, its parameters: CRM_meta_clone=[0] 
 CRM_meta_prereq=[fencing] device=[/dev/vda] name=[diskcheck_status_internal] 
 CRM_meta_clone_node_max=[1] CRM_meta_clone_max=[1] CRM_meta_notify=[false] 
 CRM_meta_globally_unique=[false] crm_feature_set=[3.0.1] interval=[30] 
 prereq=[fencing] CRM_meta_on_fail=[restart] CRM_meta_name=[monitor] 
 CRM_meta_interval=[3] CRM_meta_timeout=[6]  cancelled
 Oct 10 20:31:00 rh63-heartbeat1 lrmd: [19836]: info: rsc:prmDiskd:0 
 monitor[5] (pid 20009)
 Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: process_lrm_event: LRM 
 operation prmDiskd:0_monitor_3 (call=4, status=1, cib-update=0, 
 confirmed=true) Cancelled
 Oct 10 20:31:00 rh63-heartbeat1 lrmd: [19836]: info: operation monitor[5] on 
 prmDiskd:0 for client 19839: pid 20009 exited with return code 0
 Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: append_digest:  
 yamauchi Calculated digest 7d7c9f601095389fc7cc0c6b29c61a7a for 
 prmDiskd:0_monitor_3 (0:0;1:2:0:ca6a5ad2-0340-4769-bab7-289a00862ba6). 
 Source: parameters device=/dev/vda name=diskcheck_status_internal 
 interval=30 prereq=fencing CRM_meta_timeout=6/
 Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: process_lrm_event: LRM 
 operation prmDiskd:0_monitor_3 (call=5, rc=0, cib-update=53, 
 confirmed=false) ok
 Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: match_graph_event: 
 Action prmDiskd:0_monitor_3 (1) confirmed on rh63-heartbeat1 (rc=0)
 
 
 It is a problem to judge 

Re: [Linux-ha-dev] Patch for named

2012-10-03 Thread Dejan Muhamedagic
Hi Serge,

On Mon, Oct 01, 2012 at 08:29:50PM -0600, Serge Dubrouski wrote:
 Hi, Dejan -
 
 Will you apply it?

The grep ps part I'll apply. I was just curious why the previous
version didn't work, but I guess it's not worth the time to
investigate.

And I'm trying to understand this part:

  named_getpid () {
      local pattern="$OCF_RESKEY_named"
  
 -    if [ -n "$OCF_RESKEY_named_rootdir" ]; then
 +    if [ -n "$OCF_RESKEY_named_rootdir" -a "x${OCF_RESKEY_named_rootdir}" != "x/" ]; then
          pattern="$pattern.*-t $OCF_RESKEY_named_rootdir"
      fi
 
How would named_rootdir be set to / unless the user sets it as
a parameter? Why would / then be treated differently?

Cheers,

Dejan

 On Fri, Sep 28, 2012 at 5:09 AM, Serge Dubrouski serge...@gmail.com wrote:
 
   Yes it is. It also includes a fix for a small bug. So 2 lines changed.
  On Sep 28, 2012 2:54 AM, Dejan Muhamedagic de...@suse.de wrote:
 
  Hi Serge,
 
  On Sat, Sep 22, 2012 at 09:11:53AM -0600, Serge Dubrouski wrote:
   Hello -
  
    Attached a short patch for the named RA to fix and improve the getpid function.
 
  Sorry for the delay. Is this the same as
  https://github.com/ClusterLabs/resource-agents/issues/134
  and
  https://github.com/ClusterLabs/resource-agents/pull/140
 
  Cheers,
 
  Dejan
 
   --
   Serge Dubrouski.
 
 
   ___
   Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
   http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
   Home Page: http://linux-ha.org/
 
  ___
  Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
  Home Page: http://linux-ha.org/
 
 
 
 
 -- 
 Serge Dubrouski.

 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Patch for named

2012-10-03 Thread Dejan Muhamedagic
On Wed, Oct 03, 2012 at 07:59:55AM -0600, Serge Dubrouski wrote:
 Look at the start function. If one sets the rootdir parameter to /, the start
 function strips it and monitor then fails. So the patch fixes that.

Ah, OK. Missed that. Applied now. Many thanks for the patches!
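
A rough illustration of the mismatch (not the actual RA code; the
command line is hypothetical):

    # named was started with the "/" rootdir stripped, so no usable -t:
    cmdline="/usr/sbin/named -u named"
    OCF_RESKEY_named="named"
    OCF_RESKEY_named_rootdir="/"
    # the old getpid pattern still appended the rootdir...
    pattern="$OCF_RESKEY_named.*-t $OCF_RESKEY_named_rootdir"
    # ...so it never matched and monitor reported named as stopped:
    echo "$cmdline" | grep -E "$pattern" >/dev/null || echo "no match"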

Cheers,

Dejan



 On Oct 3, 2012 7:45 AM, Dejan Muhamedagic de...@suse.de wrote:
 
  Hi Serge,
 
  On Mon, Oct 01, 2012 at 08:29:50PM -0600, Serge Dubrouski wrote:
   Hi, Dejan -
  
   Will you apply it?
 
  The grep ps part I'll apply. I was just curious why the previous
  version didn't work, but I guess it's not worth the time to
  investigate.
 
  And I'm trying to understand this part:
 
    named_getpid () {
        local pattern="$OCF_RESKEY_named"
  
   -    if [ -n "$OCF_RESKEY_named_rootdir" ]; then
   +    if [ -n "$OCF_RESKEY_named_rootdir" -a "x${OCF_RESKEY_named_rootdir}" != "x/" ]; then
            pattern="$pattern.*-t $OCF_RESKEY_named_rootdir"
        fi
 
  How would named_rootdir be set to / unless the user sets it as
  a parameter? Why would / then be treated differently?
 
  Cheers,
 
  Dejan
 
   On Fri, Sep 28, 2012 at 5:09 AM, Serge Dubrouski serge...@gmail.com
  wrote:
  
 Yes it is. It also includes a fix for a small bug. So 2 lines changed.
On Sep 28, 2012 2:54 AM, Dejan Muhamedagic de...@suse.de wrote:
   
Hi Serge,
   
On Sat, Sep 22, 2012 at 09:11:53AM -0600, Serge Dubrouski wrote:
 Hello -

  Attached a short patch for the named RA to fix and improve the getpid function.
   
Sorry for the delay. Is this the same as
https://github.com/ClusterLabs/resource-agents/issues/134
and
https://github.com/ClusterLabs/resource-agents/pull/140
   
Cheers,
   
Dejan
   
 --
 Serge Dubrouski.
   
   
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
   
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
   
   
  
  
   --
   Serge Dubrouski.
 
   ___
   Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
   http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
   Home Page: http://linux-ha.org/
 
  ___
  Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
  Home Page: http://linux-ha.org/
 

 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Patch for named

2012-09-28 Thread Dejan Muhamedagic
Hi Serge,

On Sat, Sep 22, 2012 at 09:11:53AM -0600, Serge Dubrouski wrote:
 Hello -
 
  Attached a short patch for the named RA to fix and improve the getpid function.

Sorry for the delay. Is this the same as
https://github.com/ClusterLabs/resource-agents/issues/134
and
https://github.com/ClusterLabs/resource-agents/pull/140

Cheers,

Dejan

 -- 
 Serge Dubrouski.


 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/

