Re: [Linux-ha-dev] [ClusterLabs Developers] moving cluster-glue to github
Hi,

On Mon, Oct 10, 2016 at 12:07:48PM +0200, Kristoffer Grönlund wrote:
> Adam Spiers writes:
> > Kristoffer Grönlund wrote:
> >> We've discussed moving cluster-glue to github.com and the ClusterLabs
> >> organization, but no one has actually done it yet. ;)
> >
> > Out of curiosity what needs to be done for this, other than the
> > obvious "git push" to github, and maybe updating a README / wiki page
> > or two?
>
> The main thing would be to ensure that everyone who maintains it agrees
> to the move. AFAIK at least Lars Ellenberg and Dejan are both in favor,
> but I am not sure who else might be considered an owner of
> cluster-glue.
>
> Cc:ing the Linux HA development list as well.

Lars (aka lge), if you don't see any obstacles, shall we do this?

Cheers,

Dejan

> --
> // Kristoffer Grönlund
> // kgronl...@suse.com
>
> ___
> Developers mailing list
> develop...@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/developers

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Some minor patches for cluster-glue
On Tue, Aug 30, 2016 at 08:02:26PM +0200, Kristoffer Grönlund wrote:
> Lars Ellenberg writes:
> > I think what Dejan was expecting is the result of
> > "hg export", which should look more like
> >
> > # HG changeset patch
> > # User Lars Ellenberg
> > # Date 1413480257 -7200
> > #      Thu Oct 16 19:24:17 2014 +0200
> > # Node ID 0a7add1d9996b6d869d441da6c82fb7b8abcef4f
> > # Parent  f2227d4971baed13958306b2c7cabec0eda93e82
> > fix syslogmsgfmt logging inconsistency for stderr/stdout
> > ...
> >
> > not the output of "hg log -v -p",
> > which looks like what you sent.
> >
> > Though the formats are very similar,
> > and possibly could be massaged by hand, even,
> > hg import is best used with the output created by hg export.
> > Or send Dejan a hg bundle which he then can unbundle.
>
> Hmm, the patches I sent this time were produced by "hg export".
>
> Maybe it's a matter of mercurial configuration? git has pushed all
> memories of mercurial off the top of my mental stack. :/

Similar here. I was surprised that hg import would always put my name
etc. and looked hard for some option to accept another format, but
found nothing.

Maybe we should move the glue and heartbeat to github/clusterlabs too?

Cheers,

Dejan
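As an aside, the difference is mechanically checkable: "hg export" output starts with a fixed metadata header block, while "hg log -v -p" output does not, which is why "hg import" drops the committer information for the latter. A minimal pre-flight check a receiver could run before importing (the function name is mine, not anything from the thread):

```shell
#!/bin/sh
# Return success if the given patch file looks like "hg export" output,
# i.e. starts with the metadata header that lets "hg import" preserve
# the original committer name and date.
is_hg_export_patch() {
    head -n 1 "$1" | grep -q '^# HG changeset patch$'
}
```

For example, a file produced with `hg export tip > fix.patch` passes this check, while the output of `hg log -v -p` (which starts with a `changeset:` line) does not.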
Re: [Linux-ha-dev] Some minor patches for cluster-glue
On Fri, Aug 12, 2016 at 07:13:32AM +0200, Kristoffer Grönlund wrote:
> Dejan Muhamedagic <deja...@fastmail.fm> writes:
> > Hi Kristoffer,
> >
> > On Wed, Aug 10, 2016 at 12:32:48PM +0200, Kristoffer Grönlund wrote:
> >> Hi everyone (Lars and Dejan in particular),
> >>
> >> Here are some minor patches for cluster-glue. The first one is an
> >> attempt to get the stonith man page somewhat up to date, and the
> >> other two are minor issues discovered when compiling cluster-glue
> >> using GCC 6.
> >
> > Pushed just now the man page patch which was pending in my queue.
> > Will apply the other two too.
> >
> > Thanks for the contribution!
>
> Excellent, thank you!

Apparently, the patches as they are cannot be imported with "hg import",
i.e. the metadata gets lost. Did you do "hg export"? Can you supply
them with "hg export"?

Cheers,

Dejan

[attachment: changeset 2820:13875518ed6b, "Low: stonith: Update man
page with -E, -m parameters (bsc#970307)", a diff of doc/stonith.xml.in
adding Kristoffer Gronlund to the author list and the -v, -m and -E
options to the synopsis; the DocBook markup did not survive archiving]
Re: [Linux-ha-dev] Some minor patches for cluster-glue
Hi Kristoffer,

On Wed, Aug 10, 2016 at 12:32:48PM +0200, Kristoffer Grönlund wrote:
> Hi everyone (Lars and Dejan in particular),
>
> Here are some minor patches for cluster-glue. The first one is an
> attempt to get the stonith man page somewhat up to date, and the other
> two are minor issues discovered when compiling cluster-glue using
> GCC 6.

Pushed just now the man page patch which was pending in my queue.
Will apply the other two too.

Thanks for the contribution!

Cheers,

Dejan

> --
> // Kristoffer Grönlund
> // kgronl...@suse.com

[attachment: changeset 2820:13875518ed6b, "Low: stonith: Update man
page with -E, -m parameters (bsc#970307)", a diff of doc/stonith.xml.in
adding Kristoffer Gronlund to the author list, the -v, -m and -E
options to the synopsis, and re-indenting the descriptions of the -c,
-F, -h, -L, -l and -n options; the DocBook markup did not survive
archiving]
Re: [Linux-ha-dev] [Problem] The designation of the S option seems to have a problem.
Hi Hideo-san,

On Mon, May 02, 2016 at 04:57:09PM +0900, renayama19661...@ybb.ne.jp wrote:
> Hi All,
>
> The S option of hb_report does not work well.
> Mr. Kristoffer made similar modifications in hb_report of the crm shell.
>
> * https://github.com/ClusterLabs/crmsh/issues/137
>
> I just request this correction in glue.

Thanks for the patch. But I think that we should deprecate hb_report in
favour of crm report, no use keeping two copies around.

Cheers,

Dejan

> Best Regards,
> Hideo Yamauchi.
Re: [Linux-ha-dev] [ha-wg-technical] [Pacemaker] Fw: new important message
Hi Serge,

On Fri, Feb 19, 2016 at 02:57:04AM +0000, Serge Dubrouski wrote:
> Got hacked?

Nope. My guess is that somebody featuring my name in the address book
got a virus which does this kind of thing. Annoying. Perhaps the
clusterlabs.org owner could unsubscribe this one.

Cheers,

Dejan

> On Thu, Feb 18, 2016, 7:53 PM Dejan Muhamedagic <bunker...@tiscali.it>
> wrote:
> > Hello!
> >
> > *New message, please read* http://estoncamlievler76.com/leaving.php
> >
> > Dejan Muhamedagic
> >
> > ___
> > Pacemaker mailing list: pacema...@oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
>
> ___
> ha-wg-technical mailing list
> ha-wg-techni...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/ha-wg-technical
Re: [Linux-ha-dev] Call for review of undocumented parameters in resource agent meta data
Hi Lars,

On Thu, Feb 12, 2015 at 01:29:35AM +0100, Lars Ellenberg wrote:
> On Fri, Jan 30, 2015 at 09:52:49PM +0100, Dejan Muhamedagic wrote:
> > Hello,
> >
> > We've tagged today (Jan 30) a new stable resource-agents release
> > (3.9.6) in the upstream repository.
> >
> > Big thanks go to all contributors! Needless to say, without you this
> > release would not be possible.
>
> Big thanks to Dejan. Who once again finally did, what I meant to do in
> late 2013 already, but simply pushed off for over a year (and no-one
> else stepped up, either...)
>
> So: Thank You.

Thanks. But your contributions, which were numerous, are certainly
appreciated.

> I just today noticed that apparently some resource agents accept and
> use parameters that are not documented in their meta data.
>
> I now came up with a bash two-liner, which likely still produces a lot
> of noise, because it does not take into account that some agents
> source additional helper files. But here is the list:
>
> --- used, but not described. This is bad and needs to be fixed.
> +++ described, but apparently not used. Just drop?
>
> EvmsSCC    +OCF_RESKEY_ignore_deprecation
> Evmsd      +OCF_RESKEY_ignore_deprecation
> ?? intentionally undocumented ??

No idea, but I doubt that anybody out there is using evms.

> IPaddr     +OCF_RESKEY_iflabel

According to the history, this was never used.

> IPaddr     -OCF_RESKEY_netmask

This got renamed to cidr_netmask, in an effort to make it more
consistent with IPaddr2 :) The same as what you found below. Not sure.

> IPaddr2    -OCF_RESKEY_netmask
>
> intentional, backward compat, quoting the agent:
> # Note: We had a version out there for a while which used
> # netmask instead of cidr_netmask. Don't remove this aliasing code!
>
> Please help review these:
>
> IPsrcaddr  -OCF_RESKEY_ip
> IPsrcaddr  +OCF_RESKEY_cidr_netmask
> IPv6addr.c -OCF_RESKEY_cidr_netmask
> IPv6addr.c -OCF_RESKEY_ipv6addr
> IPv6addr.c -OCF_RESKEY_nic
> LinuxSCSI  +OCF_RESKEY_ignore_deprecation
> Squid      -OCF_RESKEY_squid_confirm_trialcount
> Squid      -OCF_RESKEY_squid_opts
> Squid      -OCF_RESKEY_squid_suspend_trialcount
> SysInfo    -OCF_RESKEY_clone
> WAS6       -OCF_RESKEY_profileName
> apache     +OCF_RESKEY_use_ipv6

This is used in http-mon.sh, sourced by apache.

> conntrackd -OCF_RESKEY_conntrackd

This one got renamed to binary, so it's OK. I can still recall the
discussion--IMO not a biggie to have in various RAs differently named
parameters for the program (but at the time the other party prevailed :)

> dnsupdate  -OCF_RESKEY_opts
> dnsupdate  +OCF_RESKEY_nsupdate_opts
>
> Bug? lmb?

OK, just fixed it. It should be only the latter.

> docker     -OCF_RESKEY_container
> ethmonitor -OCF_RESKEY_check_level
> ethmonitor -OCF_RESKEY_multiplicator
> galera     +OCF_RESKEY_additional_parameters
> galera     +OCF_RESKEY_binary
> galera     +OCF_RESKEY_client_binary
> galera     +OCF_RESKEY_config
> galera     +OCF_RESKEY_datadir
> galera     +OCF_RESKEY_enable_creation
> galera     +OCF_RESKEY_group
> galera     +OCF_RESKEY_log
> galera     +OCF_RESKEY_pid
> galera     +OCF_RESKEY_socket
> galera     +OCF_RESKEY_user
>
> Probably all bogus, it sources mysql-common.sh. Someone please have a
> more detailed look.
>
> iSCSILogicalUnit +OCF_RESKEY_product_id
> iSCSILogicalUnit +OCF_RESKEY_vendor_id
>
> false positive surprise: florian learned some wizardry back then ;-)
>
> for var in scsi_id scsi_sn vendor_id product_id; do
>     envar=OCF_RESKEY_${var}
>     if [ -n "${!envar}" ]; then
>         params="${params} ${var}=${!envar}"
>     fi
> done
>
> If such magic is used elsewhere, that could mask "used but not
> documented" cases.
>
> iface-bridge -OCF_RESKEY_multicast_querier
> !! Yep, that needs to be documented!
>
> mysql-proxy -OCF_RESKEY_group
> mysql-proxy -OCF_RESKEY_user
>
> Oops, apparently my magic scriptlet below needs to learn to ignore
> script comments...
>
> named -OCF_RESKEY_rootdir
> !! Probably a bug: named_rootdir is documented.
>
> nfsserver -OCF_RESKEY_nfs_notify_cmd
> !! Yep, that needs to be documented!
>
> nginx -OCF_RESKEY_client
> nginx +OCF_RESKEY_testclient
> !! client is used, but not documented,
> !! testclient is documented, but unused... Bug?

Yeah. Yet another one of the kind.

> nginx -OCF_RESKEY_nginx

Bogus. Needs to be dropped from leading comment block.

> oracle -OCF_RESKEY_tns_admin
> !! Yep, that needs to be documented!

Nope. tns_admin is not used in oracle but in oralsnr, but the two share
some initialization stuff. Copy-paste issue. Will fix that too.

> pingd
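The "used" side of the scan Lars describes can be sketched roughly as below. This is my own rough version, not his scriptlet: it strips comments first (the mysql-proxy false positive above), but like his it does not follow sourced helper files such as http-mon.sh or mysql-common.sh, and the indirect `${!envar}` expansion trick from iSCSILogicalUnit would still slip past it.

```shell
#!/bin/sh
# List the OCF_RESKEY_* names referenced in an agent script, one per
# line, for diffing against the parameter names in its meta-data.
# Comments are removed so commented-out references don't count as used.
used_parameters() {
    sed -e 's/#.*//' "$1" |
        grep -o 'OCF_RESKEY_[A-Za-z0-9_]*' |
        sort -u
}
```

Diffing this against the `<parameter name="...">` entries of the agent's meta-data output would then yield the two lists above (used-but-undocumented and documented-but-unused).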
Re: [Linux-ha-dev] resource-agents 3.9.6 released
Hi Krzysztof,

On Fri, Feb 06, 2015 at 02:10:57PM +0100, Krzysztof Gajdemski wrote:
> Hello,
>
> 30.01.2015, 21:52:49, Dejan Muhamedagic wrote:
> > We've tagged today (Jan 30) a new stable resource-agents release
> > (3.9.6) in the upstream repository.
> [ ... ]
> > - new resource agents: clvm dnsupdate docker galera iface-bridge
> >   iface-vlan kamailio nfsnotify sg_persist vsftpd zabbixserver
>
> Just a small correction, zabbixserver (written by me in 2012) was
> introduced in release 3.9.4, and has remained virtually unchanged
> since then.

Oh, I think I noticed that too, but then failed to remove it from the
list. Thanks for the correction.

Cheers,

Dejan

> Regards,
> k.
> --
> Krzysztof Gajdemski | songo (at) debian.org.pl | KG4751-RIPE
> Registered Linux User #133457 | BLUG Registered Member #0005
> PGP key at: http://s.debian.org.pl/gpg/gpgkey * ID: 3C38979D
> "I respect all of you who remain in the shadows" - Snerg
[Linux-ha-dev] resource-agents 3.9.6 released
Hello,

We've tagged today (Jan 30) a new stable resource-agents release
(3.9.6) in the upstream repository.

Big thanks go to all contributors! Needless to say, without you this
release would not be possible.

It has been almost two years since the release v3.9.5, hence the number
of changes is quite big. Still, every precaution has been taken not to
introduce regressions.

These are the most significant new features in the linux-ha set:

- new resource agents: clvm dnsupdate docker galera iface-bridge
  iface-vlan kamailio nfsnotify sg_persist vsftpd zabbixserver

- the drbd agent was removed (it has been deprecated for quite some
  time in favour of ocf:linbit:drbd)

The full list of changes for the linux-ha RA set is available in
ChangeLog
(https://github.com/ClusterLabs/resource-agents/blob/v3.9.6/ChangeLog)

Use the agents introduced with this release with due care: they
probably haven't got a lot of field testing.

Please upgrade at the earliest opportunity.

Best,

The resource-agents maintainers
[Linux-ha-dev] new release date for resource-agents release 3.9.6
Hello everybody,

Someone warned us that three days is too short a period to test a
release, so let's postpone the final release of resource-agents v3.9.6
to:

    Tuesday, Jan 27

Please do more testing in the meantime. The v3.9.6-rc1 packages are
available for most popular platforms:

http://download.opensuse.org/repositories/home:/dmuhamedagic:/branches:/network:/ha-clustering:/Stable

RHEL-7 and Fedora 21 are unfortunately missing, due to some strange
unresolvable dependencies issue. Debian/Ubuntu people can use alien.

Many thanks!

The resource-agents crowd
Re: [Linux-ha-dev] announcement: schedule for resource-agents release 3.9.6
Hello,

On Wed, Jan 07, 2015 at 04:25:53PM +0100, Dejan Muhamedagic wrote:
> Hello,
>
> This is a tentative schedule for resource-agents v3.9.6:
>
> 3.9.6-rc1: January 16.

The repository was tagged with v3.9.6-rc1 just now, a bit late due to
illness. Packages for some popular distributions (CentOS, Fedora,
openSUSE, RHEL, SLES) are (or will shortly be) available here:

http://download.opensuse.org/repositories/home:/dmuhamedagic:/branches:/network:/ha-clustering:/Stable

The package for RedHat RHEL-7 is for some reason unresolvable, please
use the CentOS package instead--I guess that it should work just the
same.

The changes since v3.9.5 are as usual available in ChangeLog.

Unfortunately, I don't have packages for Debian or Debian based
distributions, but I suspect that alien would produce something usable.
For instance, I was able to run the Filesystem ocft test successfully
on Debian 7 Wheezy (after replacing /var/run with /run).

Please give them a whirl.

> 3.9.6: January 23.

I hope that we can still meet this deadline.

Cheers,

Dejan

> Let's hope that this time the schedule will work out ;-)
>
> I modified the corresponding milestones at
> https://github.com/ClusterLabs/resource-agents
>
> If there's anything you think should be part of the release please
> open an issue, a pull request, or a bugzilla, as you see fit. If
> there's anything that hasn't received due attention, please let us
> know. Finally, if you can help with resolving issues consider
> yourself invited to do so. There are currently 20 issues and 35 pull
> requests still open.
>
> Cheers,
>
> Dejan (for the resource-agents crowd)
[Linux-ha-dev] announcement: schedule for resource-agents release 3.9.6
Hello,

This is a tentative schedule for resource-agents v3.9.6:

3.9.6-rc1: January 16.
3.9.6: January 23.

Let's hope that this time the schedule will work out ;-)

I modified the corresponding milestones at
https://github.com/ClusterLabs/resource-agents

If there's anything you think should be part of the release please open
an issue, a pull request, or a bugzilla, as you see fit. If there's
anything that hasn't received due attention, please let us know.
Finally, if you can help with resolving issues consider yourself
invited to do so. There are currently 20 issues and 35 pull requests
still open.

Cheers,

Dejan (for the resource-agents crowd)
Re: [Linux-ha-dev] [Linux-HA] Announcing crmsh release 2.1.1
Hi Kristoffer,

On Wed, Oct 29, 2014 at 12:33:55AM +0100, Kristoffer Grönlund wrote:
> Today we are proud to announce the release of `crmsh` version 2.1.1!
>
> This version primarily fixes all known issues found since the release
> of `crmsh` 2.1 in June. We recommend that all users of crmsh upgrade
> to this version, especially if using Pacemaker 1.1.12 or newer.
>
> A massive thank you to everyone who has helped out with bug fixes,
> comments and contributions for this release!

Many thanks for the effort and diligence you put into making crmsh
always better. Great work!

Cheers,

Dejan

> For a complete list of changes since the previous version, please
> refer to the changelog:
>
> * https://github.com/crmsh/crmsh/blob/2.1.1/ChangeLog
>
> Packages for several popular Linux distributions can be downloaded
> from the Stable repository at the OBS:
>
> * http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/
>
> Archives of the tagged release:
>
> * https://github.com/crmsh/crmsh/archive/2.1.1.tar.gz
> * https://github.com/crmsh/crmsh/archive/2.1.1.zip
>
> Changes since the previous release:
>
> - cibconfig: Clean up output from crm_verify (bnc#893138)
> - high: constants: Add acl_target and acl_group to cib_cli_map (bnc#894041)
> - high: parse: split shortcuts into valid rules
> - medium: Handle broken CIB in find_objects
> - high: scripts: Handle corosync.conf without nodelist in add-node (bnc#862577)
> - medium: config: Assign default path in all cases
> - high: cibconfig: Generate valid CLI syntax for attribute lists (bnc#897462)
> - high: cibconfig: Add tag:tag to get all resources in tag
> - doc: Documentation for show tag:tag
> - low: report: Sort list of nodes
> - high: parse: Allow empty attribute values in nvpairs (bnc#898625)
> - high: cibconfig: Delay reinitialization after commit
> - low: cibconfig: Improve wording of commit prompt
> - low: cibconfig: Fix vim modeline
> - high: report: Find nodes for any log type (boo#900654)
> - high: hb_report: Collect logs from journald (boo#900654)
> - high: cibconfig: Don't crash if given an invalid pattern (bnc#901714)
> - high: xmlutil: Filter list of referenced resources (bnc#901714)
> - medium: ui_resource: Only act on resources (#64)
> - medium: ui_resource: Flatten, then filter (#64)
> - high: ui_resource: Use correct name for error function (bnc#901453)
> - high: ui_resource: resource trace failed if operation existed (bnc#901453)
> - Improved test suite
>
> Thank you,
>
> --
> // Kristoffer Grönlund
> // kgronl...@suse.com
Re: [Linux-ha-dev] RFC: pidfile handling; current worst case: stop failure and node level fencing
On Thu, Oct 23, 2014 at 08:36:38PM +0200, Lars Ellenberg wrote:
> On Tue, Oct 21, 2014 at 02:06:24PM +0100, Tim Small wrote:
> > On 20/10/14 20:17, Lars Ellenberg wrote:
> > > In other OSes, ps may be able to give a good enough equivalent?
> >
> > Debian's start-stop-daemon executable might be worth considering
> > here - it's used extensively in the init script infrastructure of
> > Debian (and derivatives, over several different OS kernels), and so
> > is well debugged, and in my experience beats re-implementing its
> > functionality.
> >
> > http://anonscm.debian.org/cgit/dpkg/dpkg.git/tree/utils/start-stop-daemon.c
> >
> > I've used it in pacemaker resource control scripts before
> > successfully - its kill expression support is very useful in
> > particular on HA.
> >
> > Tim.
> >
> > NAME
> >     start-stop-daemon - start and stop system daemon programs
>
> Really? pasting a man page to a mailing list?
>
> But yes... If we want to require presence of start-stop-daemon, we
> could make all this somebody else's problem.
>
> I need to find some time to browse through the code to see if it can
> be improved further. But in any case, using (a tool like)
> start-stop-daemon consistently throughout all RAs would improve the
> situation already.
>
> Do we want to do that? Dejan? David? Anyone?

I think I'm happy with a one-liner shell solution.

Cheers,

Dejan

> Lars
Re: [Linux-ha-dev] RFC: pidfile handling; current worst case: stop failure and node level fencing
Hi Lars,

On Mon, Oct 20, 2014 at 09:17:29PM +0200, Lars Ellenberg wrote:
> Recent discussions with Dejan made me again more prominently aware of
> a few issues we probably all know about, but usually dismiss as having
> not much relevance in the real world.
>
> The facts:
>
> * a pidfile typically only stores a pid
> * a pidfile may go stale, not properly cleaned up when the pid it
>   references died
> * pids are recycled
>
> This is more an issue if kernel.pid_max is small wrt the number of
> processes created per unit time, for example on some embedded systems,
> or on some very busy systems. But it may be an issue on any system,
> even a mostly idle one, given bad luck^W timing, see below.
>
> A common idiom in resource agents is to
>
> kill_that_pid_and_wait_until_dead() {
>     local pid=$1
>     is_alive $pid || return 0
>     kill -TERM $pid
>     while is_alive $pid ; do sleep 1; done
>     return 0
> }
>
> The naïve implementation of is_alive() is
>
> is_alive() { kill -0 $1 ; }
>
> This is the main issue:
> -----------------------
> If the last-used-pid is just a bit smaller than $pid, during the
> "sleep 1", $pid may die, and the OS may already have created a new
> process with that exact pid. Using above is_alive, kill_that_pid()
> will not notice that the to-be-killed pid has actually terminated,
> while that new process runs. Which may be a very long time if that is
> some other long running daemon.
>
> This may result in stop failure and resulting node level fencing.
>
> The question is, which better way do we have to detect if some pid
> died after we killed it. Or, related, and even better: how to detect
> if the process currently running with some pid is in fact still the
> process referenced by the pidfile.
>
> I have two suggestions. (I am trying to avoid bashisms in here. But
> maybe I overlook some. Also, the code is typed, not sourced from some
> working script, so there may be logic bugs and typos. My intent should
> be obvious enough, though.)
>
> using cd /proc/$pid; stat .
> ---------------------------
> # this is most likely linux specific

Apparently not. According to Wikipedia at least, most UNIX platforms
(including BSD and Solaris) support /proc/$pid.

> kill_that_pid_and_wait_until_dead() {
>     local pid=$1
>     (
>         cd /proc/$pid || return 0
>         kill -TERM $pid
>         while stat . ; do sleep 1; done

I'd rather test -d . (it's more common in shell scripts and runs
faster). BTW, on my laptop, test -d is so fast that the process doesn't
get removed before it runs and the while loop always gets executed. In
that respect, stat or ls -d performs better.

>     )
>     return 0
> }
>
> Once pid dies, /proc/$pid will become stale (but not completely go
> away, because it is our cwd), and stat . will return "No such
> process".

This seems to be a very elegant solution and I cannot find fault with
it. Short and easy to understand too.

[... Skipping other proposals, some of which are quite exotic :) ]

> kill_using_pidfile() {
>     local pidfile=$1
>     local pid starttime proc_pid_starttime
>     test -e "$pidfile"                 || return  # already dead
>     read pid starttime < "$pidfile"    || return  # unreadable

I'd assume that we (the caller) know what the process should look like
in the process table, as in say command and arguments. We could also
test that, if there's a possibility that the process left but the PID
file somehow stayed behind.

>     # check pid and starttime are both present, numeric only, ...
>     # I have a version that distinguishes 16 distinct error

Wow!

>     # conditions; this is the short version only...
>
>     local i=0
>     while get_proc_pid_starttime &&
>           [ "$starttime" = "$proc_pid_starttime" ]
>     do
>         : $(( i+=1 ))
>         [ $i = 1 ] && kill -TERM $pid
>         # MAYBE
>         # [ $i = 30 ] && kill -KILL $pid
>         sleep 1
>     done
>
>     # it's not (anymore) the process we were looking for
>     # remove that pidfile.
>     rm -f "$pidfile"
> }
>
> In other OSes, ps may be able to give a good enough equivalent?
>
> Any comments?

I'd just go with the cd /proc/$pid thing. Perhaps add a test for
"ps -o cmd $pid" output.

And thanks for giving this such a thorough analysis!

Thanks,

Dejan

> Thanks,
>
> Lars
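The cd /proc/$pid variant preferred above can be sketched as a self-contained function. This is my own rendering, not a quote from the thread: Linux /proc semantics are assumed, stat(1) comes from coreutils, and the redirects to /dev/null are additions to keep it quiet.

```shell
#!/bin/sh
# Kill a pid and wait until the *original* process is gone. The subshell
# makes /proc/<pid> its cwd before killing; once that process exits and
# is reaped, stat on the (now stale) cwd fails, even if a new process
# has meanwhile been created with the same recycled pid.
kill_that_pid_and_wait_until_dead() {
    pid=$1
    (
        cd "/proc/$pid" 2>/dev/null || exit 0   # already dead
        kill -TERM "$pid"
        while stat . >/dev/null 2>&1; do
            sleep 1
        done
    )
    return 0
}
```

The subshell is essential: the cwd pins the /proc entry, so the check cannot be fooled by pid recycling the way `kill -0 $pid` can.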
Re: [Linux-ha-dev] RFC: pidfile handling; current worst case: stop failure and node level fencing
Hi Alan,

On Mon, Oct 20, 2014 at 02:52:13PM -0600, Alan Robertson wrote:
> For the Assimilation code I use the full pathname of the binary from
> /proc to tell if it's one of mine. That's not perfect if you're using
> an interpreted language. It works quite well for compiled languages.

Yes, though not perfect, that may be good enough. I supposed that the
probability that the very same program gets the same recycled pid is
rather low. (Or is it?)

Cheers,

Dejan

> On 10/20/2014 01:17 PM, Lars Ellenberg wrote:
> > Recent discussions with Dejan made me again more prominently aware
> > of a few issues we probably all know about, but usually dismiss as
> > having not much relevance in the real world.
> >
> > [... same text as quoted in the previous message ...]
> >
> > The question is, which better way do we have to detect if some pid
> > died after we killed it.
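On Linux, the check Alan describes can be sketched via the /proc/<pid>/exe symlink. The function name is mine and readlink(1) is assumed; as he notes, for interpreted programs this resolves to the interpreter, so it cannot tell two Python daemons apart.

```shell
#!/bin/sh
# Return success if the given pid is currently executing the binary at
# the expected path. Reads the /proc/<pid>/exe symlink (Linux-specific);
# fails cleanly if the pid is gone or not readable.
pid_runs_binary() {
    [ "$(readlink "/proc/$1/exe" 2>/dev/null)" = "$2" ]
}
```

Combined with the pid from a pidfile, this gives a cheap sanity check that the pid was not recycled by an unrelated (compiled) program.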
Or, related, and even better: how to detect if the process currently running with some pid is in fact still the process referenced by the pidfile. I have two suggestions. (I am trying to avoid bashisms in here. But maybe I overlook some. Also, the code is typed, not sourced from some working script, so there may be logic bugs and typos. My intent should be obvious enough, though.)

Using cd /proc/$pid; stat . (this is most likely Linux specific):

kill_that_pid_and_wait_until_dead() {
    local pid=$1
    (
    cd /proc/$pid || return 0
    kill -TERM $pid
    while stat . ; do sleep 1; done
    )
    return 0
}

Once pid dies, /proc/$pid will become stale (but not completely go away, because it is our cwd), and "stat ." will return "No such process".

Variant using test -ef:

exec 7< /proc/$pid || return 0
kill -TERM $pid
while :; do
    exec 8< /proc/$pid || break
    test /proc/self/fd/7 -ef /proc/self/fd/8 || break
    sleep 1
done
exec 7<&- 8<&-

Variant using stat -c %Y /proc/$pid:

ctime0=$(stat -c %Y /proc/$pid)
kill -TERM $pid
while ctime=$(stat -c %Y /proc/$pid) && [ "$ctime" = "$ctime0" ] ; do sleep 1; done

Why not use the inode number, I hear you say. Because it is not stable. Sorry. Don't believe me? Don't want to read kernel source? Try it yourself:

sleep 120 &
k=$!
stat /proc/$k
echo 3 > /proc/sys/vm/drop_caches
stat /proc/$k

But that leads me to an other proposal: store the starttime together with the pid in a pidfile. For Linux that would be (see proc(5) for /proc/pid/stat field meanings; note that (comm) may contain both whitespace and ")", which is the reason for my sed | cut below):

spawn_create_exclusive_pid_starttime() {
    local pidfile=$1
    shift
    local reset
    case $- in
    *C*) reset=: ;;
    *)   set -C; reset="set +C" ;;
    esac
    if ! exec 3> "$pidfile" ; then
        $reset
        return 1
    fi
    $reset
    setsid sh -c '
        read pid _ < /proc/self/stat
        starttime=$(sed -e "s/^.*) //" /proc/$pid/stat | cut -d" " -f 20)
        echo "$pid $starttime" >&3
        exec 3>&-
        exec "$@"
    ' -- "$@"
    return 0
}

It does not seem possible to cycle through all available pids within fractions of time smaller than the granularity of starttime, so pid starttime should be a unique tuple (until the next reboot -- at least on Linux, starttime is measured as strictly monotonic uptime). If we have pid starttime in the pidfile, we can:

get_proc_pid_starttime() {
    proc_pid_starttime=$(sed -e 's/^.*) //' /proc/$pid/stat) || return 1
    proc_pid_starttime=$(echo $proc_pid_starttime |
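The pid-plus-starttime verification that the message above builds toward can be sketched as a small standalone check. This is a sketch under stated assumptions (Linux /proc layout, POSIX sh); the helper names are mine, not from cluster-glue:

```shell
#!/bin/sh
# Sketch of the pid+starttime identity check discussed above.
# Assumes Linux /proc; helper names are illustrative, not from any agent.

# starttime is field 22 of /proc/PID/stat; after stripping "pid (comm) "
# with sed (comm may contain spaces and ')'), it is field 20 of the rest.
get_starttime() {
    sed -e 's/^.*) //' "/proc/$1/stat" 2>/dev/null | cut -d' ' -f 20
}

# True only if the pid exists AND its starttime matches the recorded one,
# i.e. it is still the process we wrote to the pidfile, not a recycled pid.
same_process() {
    _now=$(get_starttime "$1") && [ -n "$_now" ] && [ "$_now" = "$2" ]
}

# Usage: record at spawn time, verify before sending any signal.
recorded=$(get_starttime $$)
same_process $$ "$recorded" && echo "pid $$ is still the process we recorded"
```

Unlike `kill -0`, this cannot be fooled by a recycled pid: a new process with the same pid necessarily has a later starttime.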
Re: [Linux-ha-dev] [Question] About the change of the oracle resource agent.
On Wed, Jul 23, 2014 at 11:09:55AM +0900, renayama19661...@ybb.ne.jp wrote: Hi Dejan, I confirmed it in the environment where NLS_LANG was set in Japanese(Japanese_Japan.AL32UTF8). I changed the expiration date of the OCFMON user and pushed forward the date of the system for one year. I confirmed that the next processing worked definitely.(...on oracle12c) Confirmed 1) After OCFMON user became expired (EXPIRED), the monitor processing in the sysdba user succeeds. Confirmed 2) The grep judgment of the EXPIRED character string is carried out definitely. Confirmed 3) When we start oracle again after OCFMON user expired, the time limit of the OCFMON user is changed. 415 if echo $output | grep -w EXPIRED >/dev/null; then Also, could you verify if common_sql_filter() needs modifications? As a result, the correction of the next grep was not necessary.(Confirmed 2, Confirmed 3) Many thanks for the testing and the patch! Cheers, Dejan Best Regards, Hideo Yamauchi. - Original Message - From: renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp To: Dejan Muhamedagic deja...@fastmail.fm; High-Availability Linux Development List linux-ha-dev@lists.linux-ha.org Cc: Date: 2014/7/22, Tue 20:50 Subject: Re: [Linux-ha-dev] [Question] About the change of the oracle resource agent. Hi Dejan, All right!! Is that with the latest version? I confirm RA now in Oracle12c. It is the latest edition of oracle. Many Thanks! Hideo Yamauchi. - Original Message - From: Dejan Muhamedagic deja...@fastmail.fm To: renayama19661...@ybb.ne.jp; High-Availability Linux Development List linux-ha-dev@lists.linux-ha.org Cc: Date: 2014/7/22, Tue 18:46 Subject: Re: [Linux-ha-dev] [Question] About the change of the oracle resource agent. Hi Hideo-san, On Tue, Jul 22, 2014 at 11:07:29AM +0900, renayama19661...@ybb.ne.jp wrote: Hi All, I am going to explain the next change to our user. 
* https://github.com/ClusterLabs/resource-agents/pull/367 * https://github.com/ClusterLabs/resource-agents/pull/439 Let me confirm whether it is the next contents that a patch intends. 1) Because it was a problem that OCFMON user was added while the oracle manager did not know it, patch changed it to appoint it explicitly. The OCFMON user and password parameters are optional, hence in this respect nothing really changed. The user is still created by the RA. However, it is good that they're now visible in the meta-data. 2) Patch changed a deadline of OCFMON.(A deadline for password of the default may be 180 days.) That's the problem we had with the previous version. Now there's a profile created for the monitoring user which has unlimited password expiry. If the password expired in the meantime, due to a missing profile, then it is reset. If the monitor still fails, the RA tries as sysdba again. 3) Patch kept compatibility with old RA. Yes. Is there the main point of any other patches? No. If there is really the problem that occurred, before this change, please teach to me. As mentioned above, the issue was that the password could expire. I intend to really show the problem that happened to a user. * For example, a time limit of OCFMON expired and failed in a monitor of oracle Is that with the latest version? Cheers, Dejan I am going to send a patch later. Best Regards, Hideo Yamauchi. ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Question] About the change of the oracle resource agent.
Hi Hideo-san, On Tue, Jul 22, 2014 at 11:07:29AM +0900, renayama19661...@ybb.ne.jp wrote: Hi All, I am going to explain the next change to our user. * https://github.com/ClusterLabs/resource-agents/pull/367 * https://github.com/ClusterLabs/resource-agents/pull/439 Let me confirm whether it is the next contents that a patch intends. 1) Because it was a problem that OCFMON user was added while the oracle manager did not know it, patch changed it to appoint it explicitly. The OCFMON user and password parameters are optional, hence in this respect nothing really changed. The user is still created by the RA. However, it is good that they're now visible in the meta-data. 2) Patch changed a deadline of OCFMON.(A deadline for password of the default may be 180 days.) That's the problem we had with the previous version. Now there's a profile created for the monitoring user which has unlimited password expiry. If the password expired in the meantime, due to a missing profile, then it is reset. If the monitor still fails, the RA tries as sysdba again. 3) Patch kept compatibility with old RA. Yes. Is there the main point of any other patches? No. If there is really the problem that occurred, before this change, please teach to me. As mentioned above, the issue was that the password could expire. I intend to really show the problem that happened to a user. * For example, a time limit of OCFMON expired and failed in a monitor of oracle Is that with the latest version? Cheers, Dejan I am going to send a patch later. Best Regards, Hideo Yamauchi. ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Patch] oracle RA - Change of the judgment of the check_mon_user processing.
On Tue, Jul 22, 2014 at 11:57:04AM +0900, renayama19661...@ybb.ne.jp wrote: Hi All, Consideration when NLS_LANG is set for other languages in oracle resource agent is necessary. I attached a patch. The patch looks good. I wonder if this string is also translated: 415 if echo $output | grep -w EXPIRED >/dev/null; then Also, could you verify if common_sql_filter() needs modifications? Cheers, Dejan Best Regards, Hideo Yamauchi. ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
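One way to sidestep the translated-output problem discussed in this thread is to pin the Oracle client locale for the monitoring query itself. This is a hedged sketch, not the RA's actual code; `run_sql` is an illustrative stand-in for the agent's sqlplus invocation:

```shell
#!/bin/sh
# Hedged sketch: pin NLS_LANG for the monitoring query so that status
# strings such as "EXPIRED" can be grepped reliably even when the user's
# environment sets NLS_LANG to e.g. Japanese_Japan.AL32UTF8.
# run_sql is an illustrative stand-in, not a function from the oracle RA.
run_sql() {
    # stand-in; a real implementation would feed "$1" to sqlplus
    echo "ACCOUNT_STATUS: OPEN"
}

check_account_status() {
    # Force a known message/format locale for this one invocation only.
    output=$(NLS_LANG=American_America.AL32UTF8 run_sql \
        "select account_status from dba_users where username='OCFMON';")
    if echo "$output" | grep -w EXPIRED >/dev/null; then
        return 1  # expired; the agent would reset the password here
    fi
    return 0
}

check_account_status && echo "OCFMON account looks usable"
```

Whether pinning the locale is preferable to adjusting the grep pattern is a design choice; the advantage is that the parsed strings stay stable regardless of what the user exports.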
[Linux-ha-dev] glue 1.0.12 released
Hello, The current glue repository has been tagged as 1.0.12. It's been a while since the release candidate 1.0.12-rc1. There were a few minor fixes and additions in the meantime, mostly for hb_report. Please upgrade at the earliest possible opportunity. You can get the 1.0.12 tarball here: http://hg.linux-ha.org/glue/archive/glue-1.0.12.tar.bz2 The ChangeLog is available here: http://hg.linux-ha.org/glue/file/glue-1.0.12/ChangeLog A set of rpms is also available at the openSUSE Build Service:*) http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/ The packages at the openSUSE Build Service will not work with pacemaker versions earlier than v1.1.8 because the LRM bits are not compiled. Many thanks to all contributors. Without you this release would not have been possible. Enjoy! Lars Ellenberg Dejan Muhamedagic *) Currently packages for RHEL6 and RHEL7 are not built due to missing dependencies. I suppose that you could also use the CentOS packages which were built fine. I hope that that issue will eventually be resolved. ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH:glue] Correctly locate the logd systemd service file
On Thu, Jun 19, 2014 at 02:04:45PM +0900, Kazunori INOUE wrote: Hi Dejan, 2014-06-19 4:30 GMT+09:00 Dejan Muhamedagic deja...@fastmail.fm: Hi Kazunori-san, On Wed, Jun 18, 2014 at 05:57:14PM +0900, Kazunori INOUE wrote: Hi, make of cluster-glue fails on RHEL7. $ cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.0 (Maipo) $ hg parents changeset: 2792:45e21bc9795d tag: tip user:Dejan Muhamedagic de...@hello-penguin.com date:Thu Jun 12 12:28:59 2014 +0200 summary: Low: hb_report: gdb debug symbols output change $ make rpm rm -f cluster-glue.tar.bz2 hg archive -t tbz2 -r tip cluster-glue.tar.bz2 echo `date`: Rebuilt cluster-glue.tar.bz2 Wed Jun 18 16:09:36 JST 2014: Rebuilt cluster-glue.tar.bz2 rm -f *.src.rpm To create custom builds, edit the flags and options in cluster-glue-fedora.spec first rpmbuild -bs --define dist .fedora --define _sourcedir /zzz/DEV/glue --define _specdir /zzz/DEV/glue --define _srcrpmdir /zzz/DEV/glue cluster-glue-fedora.spec Wrote: /zzz/DEV/glue/cluster-glue-1.0.12-0.rc1.fedora.src.rpm rpmbuild --define _sourcedir /zzz/DEV/glue --define _specdir /zzz/DEV/glue --define _srcrpmdir /zzz/DEV/glue --rebuild /zzz/DEV/glue/*.src.rpm Installing /zzz/DEV/glue/cluster-glue-1.0.12-0.rc1.fedora.src.rpm (snip) + /usr/lib/rpm/redhat/brp-java-repack-jars Processing files: cluster-glue-1.0.12-0.rc1.el7.x86_64 error: File not found: /root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/etc/init.d/logd Yes, unfortunately the systemd support was added without adding support to the spec files. I'm not an expert on these, so I'd like to ask you to test the patch I put together. If you could also fix any problems you run into, that'd be great. Since RHEL didn't have the following macro, it corrected. - service_add_post - service_del_preun - service_del_postun - service_add_pre : since the macro corresponding to service_add_pre was not found, it deleted. 
(quick fix) $ rpm -qp --scripts /root/rpmbuild/RPMS/x86_64/cluster-glue-1.0.12-0.rc1.el7.x86_64.rpm preinstall scriptlet (using /bin/sh): getent group haclient >/dev/null || groupadd -r haclient getent passwd hacluster >/dev/null || \ useradd -r -g haclient -d /var/lib/heartbeat/cores/hacluster -s /sbin/nologin \ -c "cluster user" hacluster %service_add_pre logd.service exit 0 postinstall scriptlet (using /bin/sh): %service_add_post logd.service preuninstall scriptlet (using /bin/sh): %service_del_preun logd.service postuninstall scriptlet (using /bin/sh): %service_del_postun logd.service $ rpm -ivh cluster-glue-1.0.12-0.rc1.el7.x86_64.rpm Preparing... ################# [100%] /var/tmp/rpm-tmp.88IU8v: line 5: fg: no job control Updating / installing... 1:cluster-glue-1.0.12-0.rc1.el7 ################# [100%] /var/tmp/rpm-tmp.d0hXL3: line 1: fg: no job control warning: %post(cluster-glue-1.0.12-0.rc1.el7.x86_64) scriptlet failed, exit status 1 $ rpm -e cluster-glue /var/tmp/rpm-tmp.dT37S2: line 1: fg: no job control error: %preun(cluster-glue-1.0.12-0.rc1.el7.x86_64) scriptlet failed, exit status 1 error: cluster-glue-1.0.12-0.rc1.el7.x86_64: erase failed And I have no SUSE environment now, so suse.spec is not checking. Many thanks for the fedora spec file. I think I managed to fix the suse spec file, at least it manages to produce a package with openSUSE Factory. Both patches pushed. 
Cheers, Dejan Regards, Cheers, Dejan Executing(%doc): /bin/sh -e /var/tmp/rpm-tmp.kJSlFu + umask 022 + cd /root/rpmbuild/BUILD + cd cluster-glue + DOCDIR=/root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/usr/share/doc/cluster-glue-1.0.12 + export DOCDIR + /usr/bin/mkdir -p /root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/usr/share/doc/cluster-glue-1.0.12 + cp -pr doc/stonith/README.bladehpi doc/stonith/README.cyclades doc/stonith/README.drac3 doc/stonith/README.dracmc doc/stonith/README.external doc/stonith/README.ibmrsa doc/stonith/README.ibmrsa-telnet doc/stonith/README.ipmilan doc/stonith/README.ippower9258 doc/stonith/README.meatware doc/stonith/README.rackpdu doc/stonith/README.rcd_serial doc/stonith/README.riloe doc/stonith/README.vacm doc/stonith/README.vcenter doc/stonith/README.wti_mpc doc/stonith/README_kdumpcheck.txt /root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/usr/share/doc/cluster-glue-1.0.12 + cp -pr logd/logd.cf /root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/usr/share/doc/cluster-glue-1.0.12 + cp -pr AUTHORS /root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/usr/share/doc/cluster-glue-1.0.12 + cp -pr COPYING /root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/usr/share/doc/cluster-glue-1.0.12 + cp -pr ChangeLog /root/rpmbuild/BUILDROOT/cluster
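The "fg: no job control" errors in the transcript above come from the unexpanded SUSE-style %service_add_* macros reaching /bin/sh literally: a non-interactive bash treats a command starting with "%" as a job spec and tries to foreground it. A hedged sketch of probing for such a macro before relying on it, using rpm's `%{defined ...}` built-in (the same construct the spec patch uses in its `%if` guards):

```shell
#!/bin/sh
# Hedged sketch: check whether an rpm macro (e.g. the SUSE-only
# %service_add_pre) is defined before emitting it into a scriptlet.
# rpm --eval '%{defined NAME}' prints 1 when the macro exists, else 0.
macro_defined() {
    [ "$(rpm --eval "%{defined $1}" 2>/dev/null)" = 1 ]
}

if macro_defined service_add_pre; then
    echo "spec may use %service_add_pre"
else
    echo "fall back to plain chkconfig/systemctl scriptlets"
fi
```

Run on a RHEL box this prints the fallback branch, matching the failure Kazunori saw; on openSUSE with systemd-rpm-macros installed it takes the first branch.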
Re: [Linux-ha-dev] [PATCH:glue] Correctly locate the logd systemd service file
Hi Kazunori-san, On Wed, Jun 18, 2014 at 05:57:14PM +0900, Kazunori INOUE wrote: Hi, make of cluster-glue fails on RHEL7. $ cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.0 (Maipo) $ hg parents changeset: 2792:45e21bc9795d tag: tip user:Dejan Muhamedagic de...@hello-penguin.com date:Thu Jun 12 12:28:59 2014 +0200 summary: Low: hb_report: gdb debug symbols output change $ make rpm rm -f cluster-glue.tar.bz2 hg archive -t tbz2 -r tip cluster-glue.tar.bz2 echo `date`: Rebuilt cluster-glue.tar.bz2 Wed Jun 18 16:09:36 JST 2014: Rebuilt cluster-glue.tar.bz2 rm -f *.src.rpm To create custom builds, edit the flags and options in cluster-glue-fedora.spec first rpmbuild -bs --define dist .fedora --define _sourcedir /zzz/DEV/glue --define _specdir /zzz/DEV/glue --define _srcrpmdir /zzz/DEV/glue cluster-glue-fedora.spec Wrote: /zzz/DEV/glue/cluster-glue-1.0.12-0.rc1.fedora.src.rpm rpmbuild --define _sourcedir /zzz/DEV/glue --define _specdir /zzz/DEV/glue --define _srcrpmdir /zzz/DEV/glue --rebuild /zzz/DEV/glue/*.src.rpm Installing /zzz/DEV/glue/cluster-glue-1.0.12-0.rc1.fedora.src.rpm (snip) + /usr/lib/rpm/redhat/brp-java-repack-jars Processing files: cluster-glue-1.0.12-0.rc1.el7.x86_64 error: File not found: /root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/etc/init.d/logd Yes, unfortunately the systemd support was added without adding support to the spec files. I'm not an expert on these, so I'd like to ask you to test the patch I put together. If you could also fix any problems you run into, that'd be great. 
Cheers, Dejan Executing(%doc): /bin/sh -e /var/tmp/rpm-tmp.kJSlFu + umask 022 + cd /root/rpmbuild/BUILD + cd cluster-glue + DOCDIR=/root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/usr/share/doc/cluster-glue-1.0.12 + export DOCDIR + /usr/bin/mkdir -p /root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/usr/share/doc/cluster-glue-1.0.12 + cp -pr doc/stonith/README.bladehpi doc/stonith/README.cyclades doc/stonith/README.drac3 doc/stonith/README.dracmc doc/stonith/README.external doc/stonith/README.ibmrsa doc/stonith/README.ibmrsa-telnet doc/stonith/README.ipmilan doc/stonith/README.ippower9258 doc/stonith/README.meatware doc/stonith/README.rackpdu doc/stonith/README.rcd_serial doc/stonith/README.riloe doc/stonith/README.vacm doc/stonith/README.vcenter doc/stonith/README.wti_mpc doc/stonith/README_kdumpcheck.txt /root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/usr/share/doc/cluster-glue-1.0.12 + cp -pr logd/logd.cf /root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/usr/share/doc/cluster-glue-1.0.12 + cp -pr AUTHORS /root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/usr/share/doc/cluster-glue-1.0.12 + cp -pr COPYING /root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/usr/share/doc/cluster-glue-1.0.12 + cp -pr ChangeLog /root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/usr/share/doc/cluster-glue-1.0.12 + exit 0 RPM build errors: File not found: /root/rpmbuild/BUILDROOT/cluster-glue-1.0.12-0.rc1.fedora.x86_64/etc/init.d/logd make: *** [rpm] Error 1 $ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ # HG changeset patch # User Dejan Muhamedagic de...@hello-penguin.com # Date 1403119156 -7200 # Wed Jun 18 21:19:16 2014 +0200 # Node ID 6380f4c77ddc62288eeb854f81288c78bdc06251 # Parent 45e21bc9795d70b86ecc3825b91ef6424db178d8 build: update spec files for systemd diff -r 45e21bc9795d -r 
6380f4c77ddc cluster-glue-fedora.spec
--- a/cluster-glue-fedora.spec Thu Jun 12 12:28:59 2014 +0200
+++ b/cluster-glue-fedora.spec Wed Jun 18 21:19:16 2014 +0200
@@ -60,6 +60,11 @@ BuildRequires: libuuid-devel
 BuildRequires: e2fsprogs-devel
 %endif
+%if %{defined systemd_requires}
+BuildRequires: systemd
+%{?systemd_requires}
+%endif
+
 %prep
 %setup -q -n cluster-glue
@@ -82,6 +87,9 @@ export docdir=%{glue_docdir}
 --with-daemon-user=%{uname} \
 --localstatedir=%{_var} \
 --libdir=%{_libdir} \
+%if %{defined _unitdir}
+--with-systemdsystemunitdir=%{_unitdir} \
+%endif
 --docdir=%{glue_docdir}
 %endif
@@ -112,7 +120,11 @@ standards, and an interface to common ST
 %files
 %defattr(-,root,root)
 %dir %{_datadir}/%{name}
+%if %{defined _unitdir}
+%{_unitdir}/logd.service
+%else
 %{_sysconfdir}/init.d/logd
+%endif
 %{_datadir}/%{name}/ha_cf_support.sh
 %{_datadir}/%{name}/openais_conf_support.sh
 %{_datadir}/%{name}/utillib.sh
@@ -174,8 +186,22 @@
 getent group %{gname} >/dev/null || groupadd -r %{gname}
 getent passwd %{uname} >/dev/null || \
 useradd -r -g %{gname} -d %{_var}/lib/heartbeat/cores/hacluster -s /sbin/nologin \
 -c "cluster user" %{uname}
+%if %{defined _unitdir}
+ %service_add_pre logd.service
+%endif
 exit 0
Re: [Linux-ha-dev] crmsh 2.0 released, and moving to Github
Hi Kristoffer, On Thu, Apr 03, 2014 at 06:03:33PM +0200, Kristoffer Grönlund wrote: Hello everyone, Today, I have two major announcements to make: crmsh is moving to a new location, and I'm releasing the next major version of the crm shell! Congratulations for the new release! The crmsh made big strides forward since I've been away. Great work and many thanks! Cheers, Dejan == Find us at crmsh.github.io Since the rest of the High-Availability stack is being developed over at Github, we thought it would make things easier to move crmsh over there as well. This means we're not only moving the website and issue tracker, we're also switching from Mercurial to git. From this release forward, you will find everything crmsh-related at http://crmsh.github.io, and the source code at https://github.com/crmsh/crmsh. Here are the new URLs related to crmsh: * Website: http://crmsh.github.io/ * Documentation: http://crmsh.github.io/documentation.html * Source repository: https://github.com/crmsh/crmsh/ * Issue tracker: https://github.com/crmsh/crmsh/issues/ Not everything has moved quite yet, but the source code and web site are in place. == New stable release: crmsh 2.0 Secondly, we are proud to finally release crmsh 2.0! This is the version of crmsh I have been developing since I became a maintainer last year, and there are a lot of new and improved features in this release. For a more complete list of changes since the previous version, please refer to the changelog: * https://github.com/crmsh/crmsh/blob/2.0.0/ChangeLog Packages for several popular Linux distributions (updated soon): http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/ Zip archive of the tagged release: * https://github.com/crmsh/crmsh/archive/2.0.0.zip Here is a short list of some of the biggest changes and features in crmsh 2.0: * *More stable than ever before!* Many bugs and issues have been fixed, with plenty of help from the community. 
At the same time, this is a major release with many new features. Testing and pull requests are more than welcome! * *Cluster management commands.* We've added a couple of new sub-levels that help with the installation and management of the cluster, as well as maintaining and synchronizing the corosync configuration across nodes. There are now commands for starting and stopping the cluster services, as well as cluster scripts that make the installation and configuration of cluster-controlled resources a one-line command. * *Cleaner CLI syntax.* The parser for the configure syntax of crmsh has been rewritten, allowing for cleaner syntax, better error detection and improved error messages. * *Tab completion everywhere.* Now tab completion works not only in the interactive mode, but directly from bash. In addition, the completion back end has been completely rewritten and many more commands now have full completion. It's not quite every single command yet, but we're getting there. * *New and improved configuration.* The new configuration file is installed in /etc/crm/crm.conf by default or per user if desired, and allows for a much more flexible configuration of crmsh. * *Cluster health evaluation.* As part of the cluster script functionality, there is now a cluster health command which analyses and reports on low disk space, problems with network configuration, firewall configuration issues and more. The best part of the cluster health command is that it can work without a configured cluster, providing a checklist of issues to amend before setting up a new cluster. 
* *And wait, there's more!* There is now not only an extensive regression test suite but a growing set of unit tests as well, support for many new features in Pacemaker 1.1.11 such as resource sets in location constraints, anonymous shadow CIBs makes it easier to avoid race conditions in scripts, full syntax highlighting for the built-in help, the assist sub-command helps with more advanced configurations... the list goes on. Big thanks to everyone who has helped with bug fixes, comments and contributions for this release! -- // Kristoffer Grönlund // kgronl...@suse.com ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH] Refine jboss RA functions about console logfile rotation
Hi Kazutomo-san, On Fri, Oct 04, 2013 at 09:11:20PM +0900, NAKAHIRA Kazutomo wrote: Hi, all I wrote 8 patches for jboss RA that refine jboss RA functions about console logfile rotation. (some patches were written for current code refactoring.) Please see following pull requests and comment me if you have any question. These are not pull requests, but commits in your repository. You should create one or more pull requests for the upstream. We're currently about to make a new release, so unless there are bug fixes or new features, they will have to wait until after the release. Cheers, Dejan 1. Low: jboss: Refine validate_all_jboss(It checks JAVA_HOME, JBOSS_HOME, and JAVA) https://github.com/knakahira/resource-agents/commit/5024ede722c337f273f30f852369cc386d1369b9 2. Low: jboss: Check JBOSS_BASE_DIR at the validate_all_jboss https://github.com/knakahira/resource-agents/commit/562ab655398b9bbbf6e0a722d61f01cd45b5409c 3. Low: jboss: Check ROTATELOGS command at the validate_all_jboss https://github.com/knakahira/resource-agents/commit/90122d3d24020c3caa5e504bf22e68d8364bd0af 4. Low: jboss: Avoid starting JBoss without rotatelogs when rotate_console is true https://github.com/knakahira/resource-agents/commit/988ba56520625fc11f83f40c12afb28fc0655e1f 5. Low: jboss: Monitor rotatelogs process and restart when it is stopped https://github.com/knakahira/resource-agents/commit/83fe1937360b720115403baf787f39532378247c 6. Low: jboss: Avoid overwriting the existing CONSOLE logfile when rotate_console was changed https://github.com/knakahira/resource-agents/commit/bf3b3075bda37aa068cce206f8de665b89d3866c 7. Low: jboss: Change test command operator = to -eq at numerical comparison https://github.com/knakahira/resource-agents/commit/534b4bc299232e5feb8f133a247e8334402f92e5 8. 
Low: jboss: Avoid starting JBoss if $JBOSS_USER can not write CONSOLE logfile that created by rotatelogs command https://github.com/knakahira/resource-agents/commit/62588ae50408f6897452dedbeb0b1074a5ca4c26 Best regards, -- NAKAHIRA Kazutomo Open Source Business Unit NTT DATA INTELLILINK Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] announcement: glue release candidate 1.0.12
Hello, The current glue repository has been tagged as glue-1.0.12-rc1. It contains several fixes for stonith agents and hb_report. Please give it a try. You can get the glue-1.0.12-rc1 tarball here: http://hg.linux-ha.org/glue/archive/glue-1.0.12-rc1.tar.bz2 The ChangeLog is available here: http://hg.linux-ha.org/glue/file/glue-1.0.12-rc1/ChangeLog A set of rpms is also available at the openSUSE Build Service: http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/ The packages at the openSUSE Build Service will not work with pacemaker versions earlier than v1.1.8 because the LRM bits are not compiled. If there are no serious issues, v1.0.12 will be released on Oct 11. Many thanks to all contributors. Enjoy! Lars Ellenberg Dejan Muhamedagic ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Linux-HA] announcement: planning resource-agents release 3.9.6
Hi Lars, On Mon, Sep 30, 2013 at 03:47:18PM +0200, Lars Ellenberg wrote: On Mon, Sep 30, 2013 at 03:33:35PM +0200, Dejan Muhamedagic wrote: Hello, We released resource-agents v3.9.5 back in February. In the meantime there have been quite a few fixes and new features pushed to the repository and it is high time for another release. Lars Ellenberg will run the release this time and do whatever is necessary that we have a good set of resource agents. Thanks Lars! Is that so ;) I thought I'd only try to poke all contributors, authors and maintainers, as well as the community, to either raise issues now, or don't get to complain about it later ;) Yes, that's about it. Only that! Cheers, Dejan Two milestones were created at github.com today and this is the tentative schedule: 3.9.6-rc1: October 9. 3.9.6: October 16. If there's anything you think should be part of the release please open an issue, a pull request, or a bugzilla, as you see fit. If there's anything that hasn't received due attention, please let us know. Finally, if you can help with resolving issues consider yourself invited to do so. Thanks, -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com ___ Linux-HA mailing list linux...@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] announcement: planning resource-agents release 3.9.6
Hello, We released resource-agents v3.9.5 back in February. In the meantime there have been quite a few fixes and new features pushed to the repository and it is high time for another release. Lars Ellenberg will run the release this time and do whatever is necessary that we have a good set of resource agents. Thanks Lars! Two milestones were created at github.com today and this is the tentative schedule: 3.9.6-rc1: October 9. 3.9.6: October 16. If there's anything you think should be part of the release please open an issue, a pull request, or a bugzilla, as you see fit. If there's anything that hasn't received due attention, please let us know. Finally, if you can help with resolving issues consider yourself invited to do so. Cheers, The resource-agents crowd ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] crmsh 1.2.6 is released!
= CRM shell v1.2.6 Released = Hello everyone, We are happy to announce the release of crmsh 1.2.6. Many thanks to everyone who contributed to this release! Version 1.2.6 has improved performance, several new features and many bug fixes and other improvements. Please refer to the changelog and documentation for more information. == New since 1.2.6-rc3 == * Fixed regression in configuration update/refresh * Fixed regression in removing cluster properties * cibconf: fix rsc_template referencing (savannah#40011) * rsctest: add support for STONITH resources == New features in 1.2.6 == * Support for containers (nagios) * Support for RA tracing * Switch from minidom to [http://lxml.de/ lxml] * Many performance improvements * Element editing improvements * History feature improvements For a full list of changes since the previous version, please take a look at the changelog: * http://hg.savannah.gnu.org/hgweb/crmsh/file/crmsh-1.2.6/ChangeLog More information on where to download, how to install and how to contribute to the crmsh project can be found on the project website at [http://crmsh.nongnu.org crmsh.nongnu.org]. == Resources == Packages for several popular Linux distributions: * http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/ The archive of the new release: * http://download.savannah.gnu.org/releases/crmsh/crmsh-1.2.6.tar.bz2 GPG signature: * http://download.savannah.gnu.org/releases/crmsh/crmsh-1.2.6.tar.bz2.sig To discuss the ongoing development of crmsh, please join the linux-ha mailing list at: * http://lists.linux-ha.org/mailman/listinfo/linux-ha Bugs can be reported via the mailing list, or one of the project bug trackers: * https://savannah.nongnu.org/bugs/?group=crmsh * https://developerbugs.linuxfoundation.org/enter_bug.cgi?product=Pacemaker;component=Shell;version=1.2 Enjoy! Dejan and Kristoffer ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] crmsh error : cib-bootstrap-options already exist
Hi Lars, On Thu, Aug 29, 2013 at 10:49:33AM +0200, Lars Marowsky-Bree wrote: On 2013-08-28T20:13:43, Dejan Muhamedagic de...@suse.de wrote: A new RC has been released today. It contains both fixes. It doesn't do atomic updates anymore, because cibadmin or something cannot stomach comments. Couldn't find the upstream bug report :-( Can you give me the pacemaker bugid, please? Thanks! The bug's been reported here: https://bugzilla.novell.com/show_bug.cgi?id=836965 Thanks, Dejan Regards, Lars -- Architect Storage/HA SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg) Experience is the name everyone gives to their mistakes. -- Oscar Wilde ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] crmsh error : cib-bootstrap-options already exist
On Tue, Aug 27, 2013 at 09:14:31PM +0300, Vladislav Bogdanov wrote: 27.08.2013 19:11, Dejan Muhamedagic wrote: Hi, On Tue, Aug 27, 2013 at 12:06:40PM +0300, Vladislav Bogdanov wrote: 23.08.2013 16:48, Kristoffer Grönlund wrote: Hi, On Fri, 23 Aug 2013 16:33:28 +0300 Vladislav Bogdanov bub...@hoster-ok.com wrote: No-no, it was before that fix too, at least with 19a3f1e5833c. Should I still try? Ah, in that case, it has not been fixed. No need to try. I will investigate further. I verified that crm_diff produces correct xml diff if I change just one property, so problem should really be in crmsh. Yes, just found where it is. The fix will be pushed tomorrow. Yeees! Thank you for info. A new RC has been released today. It contains both fixes. It doesn't do atomic updates anymore, because cibadmin or something cannot stomach comments. The updates for several distributions are, as usual, available here: http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/ And many thanks for testing. Cheers, Dejan ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] crmsh error : cib-bootstrap-options already exist
Hi, On Tue, Aug 27, 2013 at 12:06:40PM +0300, Vladislav Bogdanov wrote: 23.08.2013 16:48, Kristoffer Grönlund wrote: Hi, On Fri, 23 Aug 2013 16:33:28 +0300 Vladislav Bogdanov bub...@hoster-ok.com wrote: No-no, it was before that fix too, at least with 19a3f1e5833c. Should I still try? Ah, in that case, it has not been fixed. No need to try. I will investigate further. I verified that crm_diff produces correct xml diff if I change just one property, so problem should really be in crmsh. Yes, just found where it is. The fix will be pushed tomorrow. Cheers, Dejan ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] crmsh error : cib-bootstrap-options already exist
Hi all,

On Thu, Aug 22, 2013 at 12:57:20PM +0200, Kristoffer Grönlund wrote:

Hi Takatoshi-san,

On Wed, 21 Aug 2013 13:56:34 +0900 Takatoshi MATSUO matsuo@gmail.com wrote:

Hi Kristoffer

I reproduced the error with the latest changeset (b5ffd99e).

Thank you, with your description I was able to reproduce and create a test case for the problem. I have pushed a workaround for the issue in the crm shell which stops the crm shell from adding comments to the CIB (changeset e35236439b8e). However, it may be that this is a problem that ought to be fixed in Pacemaker, so I have not created a new release candidate containing the workaround. I will try to investigate this possibility before doing so.

This is an issue with cibadmin. Before that gets fixed, we'll have to disable the cibadmin -P commit method and keep the old one. At least I don't see any other sensible alternative.

Cheers, Dejan

Thank you,

--
// Kristoffer Grönlund
// kgronl...@suse.com

___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] crmsh error : cib-bootstrap-options already exist
Hi Takatoshi-san, On Thu, Aug 08, 2013 at 11:26:54AM +0900, Takatoshi MATSUO wrote: Hi Dejan I caught this error with 1.2.6-rc1 when loading configuration file. --- crm configure load update config.crm ERROR: elements cib-bootstrap-options already exist --- config.crm --- property \ no-quorum-policy=ignore \ stonith-enabled=false rsc_defaults \ resource-stickiness=INFINITY \ migration-threshold=1 --- I use - RHEL6 - Pacemaker 83fc351 (latest) Should be fixed now. Thanks for reporting. Cheers, Dejan Thanks, Takatoshi MATSUO ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] crmsh release candidate 1.2.6-rc1
Hello,

The first release candidate for CRM shell v1.2.6 has been released. This is also the first release in which Kristoffer Grönlund takes on the responsibility of co-maintainership. Welcome Kristoffer!

The highlights of the release:

* Atomic CIB updates (via cibadmin -P, only with pacemaker >= 1.1.10)
* Support for containers (nagios)
* Support for RA tracing
* Many performance improvements (including a switch from minidom to lxml)
* Element editing improvements
* General history feature improvements

For the full set of changes, take a look at the changelog:

http://hg.savannah.gnu.org/hgweb/crmsh/file/1.2.6-rc1/ChangeLog

We will allow for a period of testing and bug fixing before the actual release of v1.2.6. The final release of CRM shell v1.2.6 is expected to be available in a few weeks.

Note about Pacemaker versions

CRM shell 1.2.6 supports all Pacemaker 1.1 versions including the latest v1.1.10.

Installing with pacemaker versions <= v1.1.7

Installing the CRM shell along with Pacemaker 1.1 versions <= v1.1.7 is possible, but it will result in file conflicts. You need to enforce file overwriting when installing the crmsh package. Note that pacemaker up to v1.1.7 bundles an older version of the CRM shell, and these bundled versions are quite outdated. There are several interesting new features, including history, not found in these bundled versions of the shell.

Support and bug reporting

Please report any bugs found in this release in one of the bug trackers below, or send a message to the linux-ha mailing list. The mailing list can also be used for other questions related to the CRM shell, as can the IRC channel #linux-ha on irc.freenode.net.
https://savannah.nongnu.org/bugs/?group=crmsh
https://developerbugs.linuxfoundation.org/enter_bug.cgi?product=Pacemaker;component=Shell;version=1.2
http://lists.linux-ha.org/mailman/listinfo/linux-ha

Resources

Packages for several popular Linux distributions:
http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/

The archive of the release candidate:
http://hg.savannah.gnu.org/hgweb/crmsh/archive/1.2.6-rc1.tar.bz2

The man page:
http://crmsh.nongnu.org/crm.8.html

The CRM shell project web page at GNU savannah:
https://savannah.nongnu.org/projects/crmsh/

Support and bug reporting:
http://lists.linux-ha.org/mailman/listinfo/linux-ha
https://savannah.nongnu.org/bugs/?group=crmsh
https://developerbugs.linuxfoundation.org/enter_bug.cgi?product=Pacemaker;component=Shell;version=1.2

The sources repository is available at:
http://hg.savannah.gnu.org/hgweb/crmsh

Enjoy!

Dejan

___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Patch/rfc: Add multiple IP support for eDir88 RA
On Mon, Jul 15, 2013 at 01:40:47PM +0200, Dejan Muhamedagic wrote:

Hi Sami,

On Fri, Jun 28, 2013 at 03:40:02PM +0300, Sami Kähkönen wrote:

Hi,

Here is a small patch to enable multiple IP support for the eDir88 RA. To summarize, eDirectory supports multiple IP numbers in its config file, separated by commas. Example line in nds.conf:

n4u.server.interfaces=168.0.0.1@524,10.0.0.1@524

The current resource agent is unable to cope with such configurations. This patch creates an array of IP:port configurations and checks them individually. Tested in a SLES 11 SP2 HA environment with one and multiple IPs. All comments and additional testing are welcome. The patch is also in github: https://github.com/skahkonen/resource-agents

Sorry for the delay and many thanks for the patch. It looks good to me. If you can, it would be good to make a pull request at github.

I did that myself and put together the indentation patch. See https://github.com/ClusterLabs/resource-agents/pull/283

Cheers, Dejan

Regards, Sami Kähkönen

--

@@ -238,14 +238,34 @@ eDir_status() {
         ocf_log err "Cannot retrieve interfaces from $NDSCONF. eDirectory may not be correctly configured."
         exit $OCF_ERR_GENERIC
     fi
-    NDSD_SOCKS=$(netstat -ntlp | grep -ce "$IFACE.*ndsd")
 
-    if [ $NDSD_SOCKS -eq 1 ] ; then
+    # In case of multiple IP's split into an array
+    # and check all of them
+    IFS=', ' read -a IFACE2 <<< "$IFACE"
+    ocf_log debug "Found ${#IFACE2[@]} interfaces from $NDSCONF."
+
+    counter=${#IFACE2[@]}
+
+    for IFACE in ${IFACE2[@]}
+    do
+        ocf_log debug "Checking ndsd instance for $IFACE"
+        NDSD_SOCKS=$(netstat -ntlp | grep -ce "$IFACE.*ndsd")
+
+        if [ $NDSD_SOCKS -eq 1 ] ; then
+            let counter=counter-1
+            ocf_log debug "Found ndsd instance for $IFACE"
+        elif [ $NDSD_SOCKS -gt 1 ] ; then
+            ocf_log err "More than 1 ndsd listening socket matched. Likely misconfiguration of eDirectory."
+            exit $OCF_ERR_GENERIC
+        fi
+    done
+
+    if [ $counter -eq 0 ] ; then
         # Correct ndsd instance is definitely running
-        # Further checks are superfluous (I think...)
-        return 0
-    elif [ $NDSD_SOCKS -gt 1 ] ; then
-        ocf_log err "More than 1 ndsd listening socket matched. Likely misconfiguration of eDirectory."
+        ocf_log debug "All ndsd instances found."
+        return 0;
+    elif [ $counter -lt ${#IFACE2[@]} ]; then
+        ocf_log err "Only some ndsd listening sockets matched, something is very wrong."
         exit $OCF_ERR_GENERIC
     fi
@@ -270,7 +290,7 @@ eDir_status() {
         exit $OCF_ERR_GENERIC
     fi
 
-    # Instance is not running, but no other error detected.
+    ocf_log debug "ndsd instance is not running, but no other error detected."
     return 1
 }

___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
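The split-and-count approach in the patch above can be sketched outside the shell. This is an illustrative Python model, not the RA itself (which is a shell script): parse the comma-separated `n4u.server.interfaces` value into IP/port pairs, then require every pair to have a listener, mirroring the patch's counter-reaches-zero logic. The function names here are invented for the sketch.

```python
def parse_interfaces(conf_line):
    """Split 'n4u.server.interfaces=IP@PORT,IP@PORT,...' into (ip, port) pairs."""
    _, _, value = conf_line.partition("=")
    pairs = []
    for entry in value.split(","):
        ip, _, port = entry.strip().partition("@")
        pairs.append((ip, int(port)))
    return pairs

def all_listening(pairs, listening):
    """True only if every configured (ip, port) has a matching listener,
    like the patch's counter that must be decremented once per interface."""
    return all(pair in listening for pair in pairs)
```

With the example nds.conf line from the mail, `parse_interfaces` yields two pairs, and `all_listening` fails as soon as any one of them has no ndsd socket.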
Re: [Linux-ha-dev] [Patch]Exit code of reset of external/libvirt is wrong.
Hi Hideo-san, On Fri, Jul 05, 2013 at 08:51:14AM +0900, renayama19661...@ybb.ne.jp wrote: Hi All, The exit code of reset of external/libvirt is wrong. Indeed. Quite sloppy the latest change, my apologies. I attached a patch. Many thanks for the patch. Applied (slightly modified). I wonder if we should also ignore the outcome of libvirt_start. What are the chances that it fails? Cheers, Dejan Best Regards, Hideo Yamauchi. ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] A patch for crmsh.spec
Hi Yusuke-san,

On Tue, Jul 02, 2013 at 01:08:18PM +0900, yusuke iida wrote:

Hi, Dejan

Could you incorporate this patch?

Oh, somehow missed this one. Sorry about that. Yes, I'll apply this patch. Anyway, why is it so difficult to install pssh? Or did I already forget your reasoning?

Cheers, Dejan

I want some messages.

Regards, Yusuke

2013/6/3 yusuke iida yusk.i...@gmail.com:

Hi, Dejan

2013/4/18 Dejan Muhamedagic de...@suse.de:

Hi Yusuke-san,

On Tue, Apr 16, 2013 at 02:55:40PM +0900, yusuke iida wrote:

Hi, Dejan

2013/4/4 Dejan Muhamedagic de...@suse.de:

Hi Yusuke,

On Thu, Feb 21, 2013 at 09:04:45PM +0900, yusuke iida wrote:

Hi, Dejan

I also tested on rhel6.3 and fedora17. Since there is no environment, centos was not tested. The points I was worried about are shown below:

- I think that %{?fedora_version} and %{?rhel_version} are macros that do not exist.

Those macros work in OBS when rhel6 packages are built.

I wonder if that's some build service extension. In my environment, the macros of rpmbuild are as follows.

rhel6.3:

# rpmbuild --showrc | grep rhel
-14: rhel	6

fedora18:

# rpmbuild --showrc | grep fedora
-14: fedora	18

So I want you to revise it as follows at least.

# hg diff
diff -r da93d3523e6a crmsh.spec
--- a/crmsh.spec	Tue Mar 26 11:44:17 2013 +0100
+++ b/crmsh.spec	Tue Apr 16 13:08:37 2013 +0900
@@ -6,7 +6,7 @@
 %global upstream_version tip
 %global upstream_prefix crmsh
-%if 0%{?fedora_version} || 0%{?centos_version} || 0%{?rhel_version}
+%if 0%{?fedora_version} || 0%{?centos_version} || 0%{?rhel_version} || 0%{?rhel} || 0%{?fedora}
 %define pkg_group System Environment/Daemons
 %else
 %define pkg_group Productivity/Clustering/HA

Patch applied. Thanks!

- pssh is not provided in rhel. I think that you should not put it in Requires.

OK, but currently the only RPM built is the one in OBS where the repository includes pssh RPMs for rhel/centos too. See for instance:

http://download.opensuse.org/repositories/network:/ha-clustering/RedHat_RHEL-6/x86_64/

I made a patch to solve the above.
Note that the .spec file upstream may not be perfect or even work on a particular distribution. However, it should advise packagers on what the package should contain. The pssh requirement is there because history would not work well without it. It is further rather unfortunate that that feature is used very seldom and has got so little attention. Therefore, I'm reluctant to apply the pssh part of the patch.

Hmm... For example, could it be changed so that the functionality which uses pssh is disabled by a configure option?

The functionality is still there, even without pssh. For instance, static reports can still be examined. It's just that the live updates are going to be quite a bit slower, if somebody wants to use the history feature to examine changes happening in the cluster.

I checked the source code. In an environment where pssh is not available, history collects information using the crm_report command. However, I found one piece of processing which still uses pssh: rsctest. That code raises a Python error in an environment where pssh is not available.
Probing resources .
Traceback (most recent call last):
  File "/usr/sbin/crm", line 44, in <module>
    main.run()
  File "/usr/lib64/python2.6/site-packages/crmsh/main.py", line 413, in run
    do_work()
  File "/usr/lib64/python2.6/site-packages/crmsh/main.py", line 323, in do_work
    if parse_line(levels, shlex.split(' '.join(l))):
  File "/usr/lib64/python2.6/site-packages/crmsh/main.py", line 149, in parse_line
    rv = d() # execute the command
  File "/usr/lib64/python2.6/site-packages/crmsh/main.py", line 148, in <lambda>
    d = lambda: cmd[0](*args)
  File "/usr/lib64/python2.6/site-packages/crmsh/ui.py", line 1945, in rsc_test
    return test_resources(rsc_l, node_l, all_nodes)
  File "/usr/lib64/python2.6/site-packages/crmsh/rsctest.py", line 300, in test_resources
    if not are_all_stopped(rsc_l, all_nodes_l):
  File "/usr/lib64/python2.6/site-packages/crmsh/rsctest.py", line 250, in are_all_stopped
    drv.runop(probe)
  File "/usr/lib64/python2.6/site-packages/crmsh/rsctest.py", line 143, in runop
    from crm_pssh import do_pssh_cmd
  File "/usr/lib64/python2.6/site-packages/crmsh/crm_pssh.py", line 24, in <module>
    from psshlib import psshutil
ImportError: No module named psshlib

Since I thought that this was a problem, I added processing which checks whether pssh is available. If there is no problem, I want you to apply this patch.

Regards, Yusuke

If it is possible, can we not exclude pssh from Requires?

I already reasoned in my previous message (quoted above) why I'm reluctant to do that.

Cheers, Dejan

Regards
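The guard Yusuke describes can be sketched simply. This is an illustrative example, not crmsh's actual code (names like `HAVE_PSSH` and `run_on_nodes` are invented here): probe for `psshlib` once at import time and fall back to a slower serial path when it is missing, instead of crashing with ImportError inside rsctest.

```python
# Illustrative sketch: degrade gracefully when the third-party
# psshlib module is absent (as on stock RHEL).
try:
    from psshlib import psshutil  # noqa: F401  (only probing availability)
    HAVE_PSSH = True
except ImportError:
    HAVE_PSSH = False

def run_on_nodes(nodes, cmd):
    """Run cmd on each node; prefer parallel pssh, else a serial fallback."""
    mode = "pssh" if HAVE_PSSH else "serial-fallback"
    # real code would dispatch to a pssh call or per-node ssh here
    return [(node, mode) for node in nodes]
```

The point is that the import failure is handled exactly once, at module load, and every caller then takes the working path.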
Re: [Linux-ha-dev] crmsh issues FutureWarning of python
Hi Takatoshi-san,

On Thu, Jun 27, 2013 at 02:43:15PM +0900, Takatoshi MATSUO wrote:

Hi Dejan

Thanks. But a previous commit, c8e7cb61 or 7c9df6e81 (Jun 21), causes this error when using a comment between configuration elements.

Oops. Good catch. Fixed now. Thanks for reporting!

Cheers, Dejan

config.crm:

property \
    no-quorum-policy=ignore \
    stonith-enabled=false \
    startup-fencing=false \
    stonith-timeout=20s
# comment
primitive dummy ocf:heartbeat:Dummy

--

# crm configure load update config.crm
Traceback (most recent call last):
  File "/usr/sbin/crm", line 44, in <module>
    main.run()
  File "/usr/lib64/python2.6/site-packages/crmsh/main.py", line 414, in run
    do_work()
  File "/usr/lib64/python2.6/site-packages/crmsh/main.py", line 323, in do_work
    if parse_line(levels, shlex.split(' '.join(l))):
  File "/usr/lib64/python2.6/site-packages/crmsh/main.py", line 149, in parse_line
    rv = d() # execute the command
  File "/usr/lib64/python2.6/site-packages/crmsh/main.py", line 148, in <lambda>
    d = lambda: cmd[0](*args)
  File "/usr/lib64/python2.6/site-packages/crmsh/ui.py", line 1694, in load
    return set_obj.import_file(method, url)
  File "/usr/lib64/python2.6/site-packages/crmsh/cibconfig.py", line 268, in import_file
    return self.save(s, method == update)
  File "/usr/lib64/python2.6/site-packages/crmsh/cibconfig.py", line 455, in save
    if self.process(cli_list, update) == False:
  File "/usr/lib64/python2.6/site-packages/crmsh/cibconfig.py", line 409, in process
    obj = cib_factory.create_from_cli(cli_list)
  File "/usr/lib64/python2.6/site-packages/crmsh/cibconfig.py", line 2615, in create_from_cli
    node = obj.cli2node(cli_list)
  File "/usr/lib64/python2.6/site-packages/crmsh/cibconfig.py", line 918, in cli2node
    stuff_comments(node, comments)
  File "/usr/lib64/python2.6/site-packages/crmsh/xmlutil.py", line 506, in stuff_comments
    add_comment(node, s)
  File "/usr/lib64/python2.6/site-packages/crmsh/xmlutil.py", line 503, in add_comment
    e.insert(firstelem, comm_elem)
  File "lxml.etree.pyx", line 723, in lxml.etree._Element.insert (src/lxml/lxml.etree.c:32132)
TypeError: 'NoneType' object cannot be interpreted as an index

Commit ff28b19b doesn't issue this error.

Regards, Takatoshi MATSUO

2013/6/26 Dejan Muhamedagic de...@suse.de:

Hi Takatoshi-san,

On Wed, Jun 26, 2013 at 11:48:00AM +0900, Takatoshi MATSUO wrote:

Hi Dejan

I received another FutureWarning of python.

/usr/lib64/python2.6/site-packages/crmsh/completion.py:88: FutureWarning: The behavior of this method will change in future versions. Use specific 'len(elem)' or 'elem is not None' test instead.
  if not doc

Fixed now. Thanks!

Dejan

Regards, Takatoshi MATSUO

2013/6/21 Takatoshi MATSUO matsuo@gmail.com:

Hi Dejan

Thank you for your quick response. I can inhibit the warnings.

Regards, Takatoshi MATSUO

2013/6/21 Dejan Muhamedagic de...@suse.de:

Hi Takatoshi-san,

On Fri, Jun 21, 2013 at 04:41:39PM +0900, Takatoshi MATSUO wrote:

Hi Dejan

I use the latest crmsh (ff28b19bdb1d) and it issues a FutureWarning of python when using a comment (#).

config.crm file:

# Comment
property \
    no-quorum-policy=ignore \
    stonith-enabled=false \
    startup-fencing=false \
    stonith-timeout=20s

---

# crm configure load update config.crm
/usr/lib64/python2.7/site-packages/crmsh/cibconfig.py:917: FutureWarning: The behavior of this method will change in future versions. Use specific 'len(elem)' or 'elem is not None' test instead.
  if comments and node:

Fixed now. And some more. Hope there won't be any more in Future.

Cheers, Dejan

---

I use python 2.7.3 on Fedora 18. Python 2.6.6 issues the same warning on RHEL6.
Thanks, Takatoshi MATSUO ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org
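The TypeError in the traceback above happens because the computed insertion index is None when the element has no matching child, and `insert()` cannot take None as an index. A small sketch of the failure mode and the guard, using the stdlib ElementTree in place of lxml (the helper names here are illustrative, not crmsh's actual functions):

```python
import xml.etree.ElementTree as ET

def first_child_index(node):
    """Index of the first child, or None if the node is empty --
    the None case is exactly what blew up in lxml's insert()."""
    for i, _child in enumerate(node):
        return i
    return None

def add_comment_safely(node, text):
    comm = ET.Comment(text)
    idx = first_child_index(node)
    if idx is None:
        node.append(comm)   # empty node: append instead of insert(None, ...)
    else:
        node.insert(idx, comm)
```

Guarding the None index (or appending) avoids the crash regardless of whether the configuration element already has children.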
Re: [Linux-ha-dev] crmsh issues FutureWarning of python
Hi Takatoshi-san, On Wed, Jun 26, 2013 at 11:48:00AM +0900, Takatoshi MATSUO wrote: Hi Dejan I received another FutureWarning of python. /usr/lib64/python2.6/site-packages/crmsh/completion.py:88: FutureWarning: The behavior of this method will change in future versions. Use specific 'len(elem)' or 'elem is not None' test instead. if not doc Fixed now. Thanks! Dejan Regards, Takatoshi MATSUO 2013/6/21 Takatoshi MATSUO matsuo@gmail.com: Hi Dejan Thank you for your quick response. I can inhibit warnings. Regards, Takatoshi MATSUO 2013/6/21 Dejan Muhamedagic de...@suse.de: Hi Takatoshi-san, On Fri, Jun 21, 2013 at 04:41:39PM +0900, Takatoshi MATSUO wrote: Hi Dejan I use latest crmsh(ff28b19bdb1d) and it issues FutureWarning of python when using comment(#). config.crm file # Comment property \ no-quorum-policy=ignore \ stonith-enabled=false \ startup-fencing=false \ stonith-timeout=20s --- # crm configure load update config.crm /usr/lib64/python2.7/site-packages/crmsh/cibconfig.py:917: FutureWarning: The behavior of this method will change in future versions. Use specific 'len(elem)' or 'elem is not None' test instead. if comments and node: Fixed now. And some more. Hope there won't be any more in Future. Cheers, Dejan --- I use python 2.7.3 on Fedora 18. Python 2.6.6 issues same warning on RHEL6. Thanks, Takatoshi MATSUO ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] crmsh issues FutureWarning of python
Hi Takatoshi-san, On Fri, Jun 21, 2013 at 04:41:39PM +0900, Takatoshi MATSUO wrote: Hi Dejan I use latest crmsh(ff28b19bdb1d) and it issues FutureWarning of python when using comment(#). config.crm file # Comment property \ no-quorum-policy=ignore \ stonith-enabled=false \ startup-fencing=false \ stonith-timeout=20s --- # crm configure load update config.crm /usr/lib64/python2.7/site-packages/crmsh/cibconfig.py:917: FutureWarning: The behavior of this method will change in future versions. Use specific 'len(elem)' or 'elem is not None' test instead. if comments and node: Fixed now. And some more. Hope there won't be any more in Future. Cheers, Dejan --- I use python 2.7.3 on Fedora 18. Python 2.6.6 issues same warning on RHEL6. Thanks, Takatoshi MATSUO ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
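The FutureWarning in this thread is about truth-testing elements: `if not doc` is ambiguous because an element with no children is falsy even though it exists. A short sketch of the explicit tests the warning asks for, using the stdlib ElementTree here in place of lxml (the behaviour is the same for this point):

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring("<node/>")   # the element exists but has no children

# 'if not elem:' would take the branch here although elem is not None --
# that is the ambiguity the FutureWarning complains about.
exists = elem is not None      # explicit existence test
has_children = len(elem) > 0   # explicit emptiness test
```

Replacing the implicit boolean test with one of these two makes the intent unambiguous and silences the warning.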
Re: [Linux-ha-dev] [PATCH][crmsh] deal with the case-insentive hostname
On Tue, Apr 23, 2013 at 04:44:19PM +0200, Dejan Muhamedagic wrote: Hi Junko-san, Can you try the attached patch, instead of this one? Any news? Was the patch any good? Cheers, Dejan Cheers, Dejan On Wed, Apr 10, 2013 at 06:13:45PM +0900, Junko IKEDA wrote: Hi, I set upper-case hostname (GUEST03/GUEST4) and run Pacemaker 1.1.9 + Corosync 2.3.0. [root@GUEST04 ~]# crm_mon -1 Last updated: Wed Apr 10 15:12:48 2013 Last change: Wed Apr 10 14:02:36 2013 via crmd on GUEST04 Stack: corosync Current DC: GUEST04 (3232242817) - partition with quorum Version: 1.1.9-e8caee8 2 Nodes configured, unknown expected votes 1 Resources configured. Online: [ GUEST03 GUEST04 ] dummy (ocf::pacemaker:Dummy): Started GUEST03 for example, call crm shell with lower-case hostname. [root@GUEST04 ~]# crm node standby guest03 ERROR: bad lifetime: guest03 crm node standby GUEST03 surely works well, so crm shell just doesn't take into account the hostname conversion. It's better to accept the both of the upper/lower-case. node standby, node delete, resource migrate(move) get hit with this issue. Please see the attached. 
Thanks, Junko

___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/

# HG changeset patch
# User Dejan Muhamedagic de...@hello-penguin.com
# Date 1366728211 -7200
# Node ID cd4d36b347c17b06b76f3386c041947a03c708bb
# Parent 4a47465b1fe1f48123080b4336f0b4516d9264f6
Medium: node: ignore case when looking up nodes (thanks to Junko Ikeda)

diff -r 4a47465b1fe1 -r cd4d36b347c1 modules/ui.py.in
--- a/modules/ui.py.in	Tue Apr 23 11:23:10 2013 +0200
+++ b/modules/ui.py.in	Tue Apr 23 16:43:31 2013 +0200
@@ -924,7 +924,7 @@ class RscMgmt(UserInterface):
         lifetime = None
         opt_l = fetch_opts(argl, [force])
         if len(argl) == 1:
-            if not argl[0] in listnodes():
+            if not is_node(argl[0]):
                 lifetime = argl[0]
             else:
                 node = argl[0]
@@ -1186,7 +1186,7 @@ class NodeMgmt(UserInterface):
         if not args:
             node = vars.this_node
         if len(args) == 1:
-            if not args[0] in listnodes():
+            if not is_node(args[0]):
                 node = vars.this_node
                 lifetime = args[0]
             else:
@@ -1249,7 +1249,7 @@ class NodeMgmt(UserInterface):
         'usage: delete node'
         if not is_name_sane(node):
             return False
-        if not node in listnodes():
+        if not is_node(node):
             common_err("node %s not found in the CIB" % node)
             return False
         rc = True
diff -r 4a47465b1fe1 -r cd4d36b347c1 modules/xmlutil.py
--- a/modules/xmlutil.py	Tue Apr 23 11:23:10 2013 +0200
+++ b/modules/xmlutil.py	Tue Apr 23 16:43:31 2013 +0200
@@ -159,6 +159,15 @@ def mk_rsc_type(n):
     if ra_provider:
         s2 = "%s:" % ra_provider
     return ''.join((s1, s2, ra_type))
+def is_node(s):
+    '''
+    Check if s is in a list of our nodes (ignore case).
+    This is not fast, perhaps should be cached.
+    '''
+    for n in listnodes():
+        if n.lower() == s.lower():
+            return True
+    return False
 def listnodes():
     nodes_elem = cibdump2elem("nodes")
     if nodes_elem is None:

___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
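The patch's `is_node()` lowercases the whole node list on every call, and its own docstring suggests caching. A hedged sketch of that cached variant, with `listnodes()` stubbed out here since the real one queries the CIB:

```python
def listnodes():
    # placeholder for the real CIB lookup; real node names can be
    # upper-case, as in Junko's GUEST03/GUEST04 cluster
    return ["GUEST03", "GUEST04"]

_node_cache = None

def is_node(name):
    """Case-insensitive node lookup with a one-time lowercased cache."""
    global _node_cache
    if _node_cache is None:
        _node_cache = {n.lower() for n in listnodes()}
    return name.lower() in _node_cache
```

The set is built once, so repeated lookups (standby, delete, migrate all call this) become O(1) instead of re-walking the node list. The cache would of course need invalidating if nodes are added or removed.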
Re: [Linux-ha-dev] cluster-glue compilation problem
Hi,

On Mon, Apr 29, 2013 at 02:24:12PM +0200, khaled atteya wrote:

Hi,

I tried to install the glue package from source code. Although I installed net-snmp, when I run ./configure this message appears: "checking for ucd-snmp/snmp.h... no", even though /usr/include/ucd-snmp/snmp.h exists. What is the problem?

Sorry for the delay. There seems to be an issue with ucd-snmp, i.e. some macro is supposed to be defined. However, the alternative net-snmp should be OK, and both stonith plugins which need snmp support can use either. Do you need ucd-snmp specifically for some reason?

This problem also appears with other libraries.

Which other libraries do you refer to?

Thanks, Dejan

Thanks

--
KHALED MOHAMMED ATTEYA
System Engineer

___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Pacemaker] A couple of SendArp resource changes
On Mon, Apr 08, 2013 at 02:33:27PM +0100, Tim Small wrote:

On 13/03/13 18:01, Dejan Muhamedagic wrote:

If you could split the patch we can consider them on a one-by-one basis.

I used Debian's start-stop-daemon utility in my modified script, and it looks like Redhat etc. doesn't package it (yet):

http://fedoraproject.org/wiki/Features/start-stop-daemon

... the comments in that page express why I chose to use start-stop-daemon - reworking the script to have the same level of functionality as the start-stop-daemon version (but just using lsb stuff) would be a bit awkward + time-consuming. How about I use start-stop-daemon where available, and the LSB functions when not? This would still represent an improvement on the current behaviour of the script - which is pretty broken - e.g. stopping an already-stopped resource fails, and stuff like this:

#
# This is always active, because it doesn't do much
#
sendarp_monitor() {
    return $OCF_SUCCESS
}

and this:

sendarp_status() {
    if [ -f $SENDARPPIDFILE ]
    then
        return $OCF_SUCCESS
    else
        return $OCF_NOT_RUNNING
    fi
}

A pid file is there, so it must be running!

The fix for the resource agent itself is already in the repository. It is based on the standard ha_pseudo_* functions like in any other pseudo agents (i.e. those that don't have long running processes). Otherwise, I found some patch in my local queue, which never got pushed for some reason. Don't know if that would help (attached).

I'll have a go with them, and check to see if they fix the bug which I was seeing.

Did you get a chance to verify the two patches attached? There's now also a pull request for the socket leaks issue at github.com:

https://github.com/ClusterLabs/resource-agents/pull/247

Cheers, Dejan

Tim.

--
South East Open Source Solutions Limited
Registered in England and Wales with company number 06134732.
Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ VAT number: 900 6633 53 http://seoss.co.uk/ +44-(0)1273-808309 ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
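Tim's complaint is that `sendarp_status()` equates the pid file's mere existence with a running daemon. A sturdier check reads the pid and probes it with signal 0, the usual pidfile idiom. This is an illustrative Python sketch of that idiom, not the agent's actual shell code:

```python
import os

def daemon_running(pidfile):
    """True only if the pid file exists, holds a number, and that
    process is actually alive -- not merely if the file exists."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False          # no pid file, or garbage in it
    try:
        os.kill(pid, 0)       # signal 0: existence/permission probe only
    except ProcessLookupError:
        return False          # stale pid file left behind
    except PermissionError:
        return True           # process exists but is owned by another user
    return True
```

This catches exactly the failure mode described in the mail: a stale pid file left over after a crash no longer makes the status check report success.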
Re: [Linux-ha-dev] [PATCH] cluster-glue memory leak
Hi,

On Tue, May 07, 2013 at 05:22:24PM +0200, Lars Ellenberg wrote:

On Tue, May 07, 2013 at 07:10:15PM +0900, Yuichi SEINO wrote:

Hi All,

I used pacemaker-1.1.9 (commit 138556cb0b375a490a96f35e7fbeccc576a22011). crmd caused a memory leak, and the leak happens in 3 places. I could fix 1 place, so I attached a patch. However, the rest are not easy to solve. The issue is that the stonith API can't call the DelPILPluginUniv function in pils.c. I think that we need to call DelPILPluginUniv to completely release the memory which stonith_new allocated.

Is it just that there are these few bytes that are allocated once, and never freed, or is this a real memleak, that is accumulating more and more bytes during process lifetime? I suspect the former. In which case I doubt it is even worthwhile to try and fix it.

Agreed. Though the first leak is not related to PILS.

Why? Because, in that case we basically have:

main()
{
    global_variable = malloc(something);
    endless_loop_that_is_not_expected_to_ever_return();
    /* so, ok, we could free(global_variable) here.
     * but why bother? */
    exit(1);
}

In the pseudo code above, it is easy to fix. In the (over-abstracted) case of PILs, I'm afraid, it's not that easy. And apart from academic correctness, there is no gain from fixing this for the real world.

-=-

If however we have a *real* memleak, that has to be fixed, of course.

The first one, for which the patch is provided, could be a real memory leak. I'll apply the patch. Many thanks!

Cheers, Dejan

Lars

Here is the Valgrind output. This is the leak that I could fix:
==3484== 76 bytes in 4 blocks are definitely lost in loss record 94 of 161
==3484==    at 0x4A07A49: malloc (vg_replace_malloc.c:270)
==3484==    by 0x373FA417D2: g_malloc (gmem.c:132)
==3484==    by 0xA2C2365: external_run_cmd (external.c:767)
==3484==    by 0xA2C1AC8: external_getinfo (external.c:598)
==3484==    by 0x9EB9B7E: stonith_get_info (stonith.c:327)
==3484==    by 0x3F5100744D: stonith_api_device_metadata (st_client.c:1177)
==3484==    by 0x3F52407E22: stonith_get_metadata (lrmd_client.c:1478)
==3484==    by 0x3F52408DB6: lrmd_api_get_metadata (lrmd_client.c:1736)
==3484==    by 0x427FB2: lrm_state_get_metadata (lrm_state.c:555)
==3484==    by 0x41F991: get_rsc_metadata (lrm.c:436)
==3484==    by 0x41FCD4: get_rsc_restart_list (lrm.c:521)
==3484==    by 0x4201B0: append_restart_list (lrm.c:607)
==3484==    by 0x420670: build_operation_update (lrm.c:672)
==3484==    by 0x425AE1: do_update_resource (lrm.c:1906)
==3484==    by 0x42622E: process_lrm_event (lrm.c:2016)
==3484==    by 0x41EE10: lrm_op_callback (lrm.c:242)
==3484==    by 0x3F52404339: lrmd_dispatch_internal (lrmd_client.c:289)
==3484==    by 0x3F524043DF: lrmd_ipc_dispatch (lrmd_client.c:311)
==3484==    by 0x3F504308A9: mainloop_gio_callback (mainloop.c:587)
==3484==    by 0x373FA38F0D: g_main_context_dispatch (gmain.c:1960)
==3484==    by 0x373FA3C937: g_main_context_iterate (gmain.c:2591)
==3484==    by 0x373FA3CD54: g_main_loop_run (gmain.c:2799)
==3484==    by 0x4055E7: crmd_init (main.c:154)
==3484==    by 0x405419: main (main.c:120)

The rest of the leaks follow:
==3484== 13 bytes in 1 blocks are definitely lost in loss record 29 of 161
==3484==    at 0x4A07A49: malloc (vg_replace_malloc.c:270)
==3484==    by 0x373FA417D2: g_malloc (gmem.c:132)
==3484==    by 0x373FA58F7D: g_strdup (gstrfuncs.c:102)
==3484==    by 0x4E67713: InterfaceManager_plugin_init (pils.c:611)
==3484==    by 0x4E69C64: NewPILInterfaceUniv (pils.c:1723)
==3484==    by 0x4E672DC: NewPILPluginUniv (pils.c:487)
==3484==    by 0x9EB8FE3: init_pluginsys (stonith.c:75)
==3484==    by 0x9EB90EC: stonith_new (stonith.c:105)
==3484==    by 0x3F51008137: get_stonith_provider (st_client.c:1434)
==3484==    by 0x3F51006E28: stonith_api_device_metadata (st_client.c:1059)
==3484==    by 0x3F52407E22: stonith_get_metadata (lrmd_client.c:1478)
==3484==    by 0x3F52408DB6: lrmd_api_get_metadata (lrmd_client.c:1736)
==3484==    by 0x427FB2: lrm_state_get_metadata (lrm_state.c:555)
==3484==    by 0x41F991: get_rsc_metadata (lrm.c:436)
==3484==    by 0x41FCD4: get_rsc_restart_list (lrm.c:521)
==3484==    by 0x4201B0: append_restart_list (lrm.c:607)
==3484==    by 0x420670: build_operation_update (lrm.c:672)
==3484==    by 0x425AE1: do_update_resource (lrm.c:1906)
==3484==    by 0x42622E: process_lrm_event (lrm.c:2016)
==3484==    by 0x41EE10: lrm_op_callback (lrm.c:242)
==3484==    by 0x3F52404339: lrmd_dispatch_internal (lrmd_client.c:289)
==3484==    by 0x3F524043DF: lrmd_ipc_dispatch (lrmd_client.c:311)
==3484==    by 0x3F504308A9: mainloop_gio_callback (mainloop.c:587)
==3484==    by 0x373FA38F0D: g_main_context_dispatch (gmain.c:1960)
==3484==    by 0x373FA3C937: g_main_context_iterate (gmain.c:2591)
==3484== 13
Re: [Linux-ha-dev] [PATCH][crmsh] deal with the case-insensitive hostname
Hi Lars,

On Tue, Apr 23, 2013 at 03:37:30PM +0200, Lars Ellenberg wrote:
On Wed, Apr 10, 2013 at 06:13:45PM +0900, Junko IKEDA wrote:

Hi,

I set upper-case hostnames (GUEST03/GUEST04) and run Pacemaker 1.1.9 + Corosync 2.3.0.

[root@GUEST04 ~]# crm_mon -1
Last updated: Wed Apr 10 15:12:48 2013
Last change: Wed Apr 10 14:02:36 2013 via crmd on GUEST04
Stack: corosync
Current DC: GUEST04 (3232242817) - partition with quorum
Version: 1.1.9-e8caee8
2 Nodes configured, unknown expected votes
1 Resources configured.

Online: [ GUEST03 GUEST04 ]

dummy (ocf::pacemaker:Dummy): Started GUEST03

For example, call crm shell with a lower-case hostname:

[root@GUEST04 ~]# crm node standby guest03
ERROR: bad lifetime: guest03

"crm node standby GUEST03" surely works well, so crm shell just doesn't take the hostname conversion into account. It's better to accept both upper- and lower-case. node standby, node delete, and resource migrate (move) are hit by this issue. Please see the attached.

Thanks,
Junko

Sorry for the late reaction.

diff -r da93d3523e6a modules/ui.py.in
--- a/modules/ui.py.in	Tue Mar 26 11:44:17 2013 +0100
+++ b/modules/ui.py.in	Mon Apr 08 17:49:00 2013 +0900
@@ -924,10 +924,14 @@
         lifetime = None
         opt_l = fetch_opts(argl, ["force"])
         if len(argl) == 1:
-            if not argl[0] in listnodes():
-                lifetime = argl[0]
-            else:
-                node = argl[0]
+            for i in listnodes():
+                pattern = re.compile(i, re.IGNORECASE)
+                if pattern.match(argl[1]) and len(i) == len(argl[1]):
+                    node = argl[1]

This is not exactly equivalent. Before, we had a string comparison; now we have a regexp match. This may be considered a new feature, but it should then be done intentionally. Otherwise, i would need to be quote-metaed first. In Perl I'd write \Q$i\E; in python we probably have to insert some '\' into it first. I admit that in most setups it would not make any difference, as there should at most be dots in there, and they should be at places where they won't be ambiguous, especially with the additional len() check.
Maybe rather compare argl[0].lower() with listnodes(), which should also return all elements as .lower().

Looks like I forgot about this patch; I wanted to take a closer look before applying. Thanks for the analysis. There also seems to be some code repetition, IIRC.

Cheers,

Dejan

Lars

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
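Lars's point about quote-meta'ing can be made concrete. The sketch below is a standalone illustration, not crmsh code, and the node names are invented: it shows how an unescaped regexp match diverges from a case-folded string comparison as soon as a node name contains a metacharacter such as a dot, and how re.escape() (Python's equivalent of Perl's \Q...\E) restores the original semantics.

```python
import re

nodes = ["node.example.com"]  # hypothetical node list

def is_node_regex(name):
    # The patch's approach: treat each node name as an (unescaped) regexp,
    # with a length check to avoid prefix matches.
    return any(re.match(n, name, re.IGNORECASE) and len(n) == len(name)
               for n in nodes)

def is_node_lower(name):
    # Lars's suggestion: plain case-folded string comparison.
    return name.lower() in (n.lower() for n in nodes)

def is_node_escaped(name):
    # re.escape() quotes metacharacters, so fullmatch gives the same
    # semantics as the string comparison.
    return any(re.fullmatch(re.escape(n), name, re.IGNORECASE)
               for n in nodes)

# All three accept the intended name regardless of case ...
assert is_node_regex("NODE.EXAMPLE.COM")
assert is_node_lower("NODE.EXAMPLE.COM")
assert is_node_escaped("NODE.EXAMPLE.COM")

# ... but the unescaped dots match any character, so a different name
# slips through the regexp variant only.
assert is_node_regex("nodeXexampleYcom")       # surprising match
assert not is_node_lower("nodeXexampleYcom")
assert not is_node_escaped("nodeXexampleYcom")
```

In most clusters node names contain no metacharacters beyond dots, as Lars notes, so the difference is rarely visible — but the string comparison is both simpler and safer.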
Re: [Linux-ha-dev] [PATCH][crmsh] deal with the case-insensitive hostname
Hi Junko-san,

Can you try the attached patch, instead of this one?

Cheers,

Dejan

On Wed, Apr 10, 2013 at 06:13:45PM +0900, Junko IKEDA wrote:

Hi,

I set upper-case hostnames (GUEST03/GUEST04) and run Pacemaker 1.1.9 + Corosync 2.3.0.

[root@GUEST04 ~]# crm_mon -1
Last updated: Wed Apr 10 15:12:48 2013
Last change: Wed Apr 10 14:02:36 2013 via crmd on GUEST04
Stack: corosync
Current DC: GUEST04 (3232242817) - partition with quorum
Version: 1.1.9-e8caee8
2 Nodes configured, unknown expected votes
1 Resources configured.

Online: [ GUEST03 GUEST04 ]

dummy (ocf::pacemaker:Dummy): Started GUEST03

For example, call crm shell with a lower-case hostname:

[root@GUEST04 ~]# crm node standby guest03
ERROR: bad lifetime: guest03

"crm node standby GUEST03" surely works well, so crm shell just doesn't take the hostname conversion into account. It's better to accept both upper- and lower-case. node standby, node delete, and resource migrate (move) are hit by this issue. Please see the attached.

Thanks,
Junko

# HG changeset patch
# User Dejan Muhamedagic de...@hello-penguin.com
# Date 1366728211 -7200
# Node ID cd4d36b347c17b06b76f3386c041947a03c708bb
# Parent 4a47465b1fe1f48123080b4336f0b4516d9264f6
Medium: node: ignore case when looking up nodes (thanks to Junko Ikeda)

diff -r 4a47465b1fe1 -r cd4d36b347c1 modules/ui.py.in
--- a/modules/ui.py.in	Tue Apr 23 11:23:10 2013 +0200
+++ b/modules/ui.py.in	Tue Apr 23 16:43:31 2013 +0200
@@ -924,7 +924,7 @@ class RscMgmt(UserInterface):
         lifetime = None
         opt_l = fetch_opts(argl, ["force"])
         if len(argl) == 1:
-            if not argl[0] in listnodes():
+            if not is_node(argl[0]):
                 lifetime = argl[0]
             else:
                 node = argl[0]
@@ -1186,7 +1186,7 @@ class NodeMgmt(UserInterface):
         if not args:
             node = vars.this_node
         if len(args) == 1:
-            if not args[0] in listnodes():
+            if not is_node(args[0]):
                 node = vars.this_node
                 lifetime = args[0]
             else:
@@ -1249,7 +1249,7 @@ class NodeMgmt(UserInterface):
         'usage: delete <node>'
         if not is_name_sane(node):
             return False
-        if not node in listnodes():
+        if not is_node(node):
             common_err("node %s not found in the CIB" % node)
             return False
         rc = True
diff -r 4a47465b1fe1 -r cd4d36b347c1 modules/xmlutil.py
--- a/modules/xmlutil.py	Tue Apr 23 11:23:10 2013 +0200
+++ b/modules/xmlutil.py	Tue Apr 23 16:43:31 2013 +0200
@@ -159,6 +159,15 @@ def mk_rsc_type(n):
     if ra_provider:
         s2 = "%s:" % ra_provider
     return ''.join((s1,s2,ra_type))
+def is_node(s):
+    '''
+    Check if s is in a list of our nodes (ignore case).
+    This is not fast, perhaps should be cached.
+    '''
+    for n in listnodes():
+        if n.lower() == s.lower():
+            return True
+    return False
 def listnodes():
     nodes_elem = cibdump2elem("nodes")
     if nodes_elem is None:
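The docstring in the patch itself notes that the lookup "is not fast, perhaps should be cached". A minimal sketch of such a cache follows — a standalone illustration only: in real crmsh, listnodes() queries the CIB, and the cache would have to be invalidated whenever cluster membership changes. The node names and the invalidation helper here are hypothetical.

```python
_node_cache = None

def listnodes():
    # Stand-in for crmsh's CIB query; these node names are invented.
    return ["GUEST03", "GUEST04"]

def invalidate_node_cache():
    # Must be called whenever the set of cluster nodes changes.
    global _node_cache
    _node_cache = None

def is_node(s):
    """Check whether s names a known node, ignoring case (cached lookup)."""
    global _node_cache
    if _node_cache is None:
        # Build the case-folded set once; subsequent lookups are O(1).
        _node_cache = {n.lower() for n in listnodes()}
    return s.lower() in _node_cache

assert is_node("guest03")
assert is_node("GUEST04")
assert not is_node("guest05")
```

The trade-off is the usual one for caches: the O(n) scan per lookup disappears, at the cost of having to know when the node list is stale.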
Re: [Linux-ha-dev] A patch for crmsh.spec
On Mon, Apr 22, 2013 at 11:27:49AM +0900, yusuke iida wrote:

Hi, Dejan

Thank you for merging the patch. However, there is a typo in one part; please correct it again.

Oops. Sorry for the typo. Fixed now.

Cheers,

Dejan

Regards,
Yusuke

2013/4/18 Dejan Muhamedagic de...@suse.de:

Hi Yusuke-san,

On Tue, Apr 16, 2013 at 02:55:40PM +0900, yusuke iida wrote:

Hi, Dejan

2013/4/4 Dejan Muhamedagic de...@suse.de:

Hi Yusuke,

On Thu, Feb 21, 2013 at 09:04:45PM +0900, yusuke iida wrote:

Hi, Dejan

I also tested on rhel6.3 and fedora17. Since there is no environment, centos is not tested. The points I'm worried about are below:

- I think that %{?fedora_version} and %{?rhel_version} are macros that do not exist.

Those macros work in OBS when rhel6 packages are built. I wonder if that's some build service extension.

In my environment, the rpmbuild macros are as follows:

rhel6.3
# rpmbuild --showrc | grep rhel
-14: rhel 6

fedora18
# rpmbuild --showrc | grep fedora
-14: fedora 18

So I want you to revise it as follows at least:

# hg diff
diff -r da93d3523e6a crmsh.spec
--- a/crmsh.spec	Tue Mar 26 11:44:17 2013 +0100
+++ b/crmsh.spec	Tue Apr 16 13:08:37 2013 +0900
@@ -6,7 +6,7 @@
 %global upstream_version tip
 %global upstream_prefix crmsh
-%if 0%{?fedora_version} || 0%{?centos_version} || 0%{?rhel_version}
+%if 0%{?fedora_version} || 0%{?centos_version} || 0%{?rhel_version} || 0%{?rhel} || 0%{?fedora}
 %define pkg_group System Environment/Daemons
 %else
 %define pkg_group Productivity/Clustering/HA

Patch applied. Thanks!

- pssh is not provided in rhel. I think that you should not put it in Requires.

OK, but currently the only RPM built is the one in OBS, where the repository includes pssh RPMs for rhel/centos too. See for instance:

http://download.opensuse.org/repositories/network:/ha-clustering/RedHat_RHEL-6/x86_64/

I made a patch to solve the above.

Note that the .spec file in the upstream may not be perfect or even work on a particular distribution.
However, it should advise packagers on what it should contain. The pssh requirement is there because history would not work well without it. It is further rather unfortunate that that feature is used very seldom and that it got so little attention. Therefore, I'm reluctant to apply the pssh part of the patch. hmm ... For example, can't it change so that the function in which pssh is used may be disabled by the configure option? The functionality is still there, even without pssh. For instance, static reports can also be examined. It's just that the live updates are going to be quite a bit slower, if somebody wants to use the history feature to examine changes happening in the cluster. If it is possible, can it not exclude pssh from Requires? I already reasoned in my previous message (quoted above) why I'm reluctant to do that. Cheers, Dejan Regards, Yusuke Cheers, Dejan Regards, Yusuke 2013/2/19 Dejan Muhamedagic de...@suse.de: On Tue, Feb 19, 2013 at 11:03:53AM +0100, Dejan Muhamedagic wrote: On Fri, Feb 15, 2013 at 10:19:41PM +0100, Dejan Muhamedagic wrote: Hi, On Fri, Feb 15, 2013 at 02:25:41PM +0900, yusuke iida wrote: Hi, Dejan I made a patch for spec file to make rpm of crmsh in rhel environment. I want a crmsh repository to merge it if I do not have any problem. This is a problem which I ran into earlier too. Something (probably one of the rpm macros) does a 'rm -rf' of the doc directory _after_ the files got installed: [ 29s] test -z /usr/share/doc/packages/crmsh || /bin/mkdir -p /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh [ 29s] /usr/bin/install -c -m 644 'AUTHORS' '/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh/AUTHORS' [ 29s] /usr/bin/install -c -m 644 'COPYING' '/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh/COPYING' ... 
[ 30s] Executing(%doc): /bin/sh -e /var/tmp/rpm-tmp.6245 [ 30s] + umask 022 [ 30s] + cd /usr/src/packages/BUILD [ 30s] + cd crmsh [ 30s] + DOCDIR=/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh [ 30s] + export DOCDIR [ 30s] + rm -rf /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh [ 30s] + /bin/mkdir -p /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh [ 30s] + cp -pr ChangeLog /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh ... [ 32s] error: create archive failed on file /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh/AUTHORS: cpio: open failed - Bad file descriptor If somebody can shed some light or suggest how to deal with this ... OK. I think I managed to fix it. The result is already
Re: [Linux-ha-dev] A patch for crmsh.spec
Hi Yusuke-san, On Tue, Apr 16, 2013 at 02:55:40PM +0900, yusuke iida wrote: Hi, Dejan 2013/4/4 Dejan Muhamedagic de...@suse.de: Hi Yusuke, On Thu, Feb 21, 2013 at 09:04:45PM +0900, yusuke iida wrote: Hi, Dejan I also tested by rhel6.3 and fedora17. Since there is no environment, centos is not tested. The point worried below is shown: - I think that %{?fedora_version} and %{?rhel_version} are macro not to exist. Those macros work in OBS when rhel6 packages are built. I wonder if that's some build service extension. In my environment, the macro of rpmbuild is as follows. rhel6.3 # rpmbuild --showrc | grep rhel -14: rhel 6 fedora18 # rpmbuild --showrc | grep fedora -14: fedora 18 So I want you to revise it as follows at least. # hg diff diff -r da93d3523e6a crmsh.spec --- a/crmsh.specTue Mar 26 11:44:17 2013 +0100 +++ b/crmsh.specTue Apr 16 13:08:37 2013 +0900 @@ -6,7 +6,7 @@ %global upstream_version tip %global upstream_prefix crmsh -%if 0%{?fedora_version} || 0%{?centos_version} || 0%{?rhel_version} +%if 0%{?fedora_version} || 0%{?centos_version} || 0%{?rhel_version} || 0%{?rhel} || 0%{?fedora} %define pkg_group System Environment/Daemons %else %define pkg_group Productivity/Clustering/HA Patch applied. Thanks! - pssh is not provided in rhel. I think that you should not put it in Requires. OK, but currently the only RPM built is the one in OBS where the repository includes pssh RPMs for rhel/centos too. See for instance: http://download.opensuse.org/repositories/network:/ha-clustering/RedHat_RHEL-6/x86_64/ I made a patch to solve the above. Note that the .spec file in the upstream may not be perfect or even work on particular distribution. However, it should advise packagers on what it should contain. The pssh requirement is there because history would not work well without it. It is further rather unfortunate that that feature is used very seldom and that it got so little attention. Therefore, I'm reluctant to apply the pssh part of the patch. hmm ... 
For example, can't it change so that the function in which pssh is used may be disabled by the configure option? The functionality is still there, even without pssh. For instance, static reports can also be examined. It's just that the live updates are going to be quite a bit slower, if somebody wants to use the history feature to examine changes happening in the cluster. If it is possible, can it not exclude pssh from Requires? I already reasoned in my previous message (quoted above) why I'm reluctant to do that. Cheers, Dejan Regards, Yusuke Cheers, Dejan Regards, Yusuke 2013/2/19 Dejan Muhamedagic de...@suse.de: On Tue, Feb 19, 2013 at 11:03:53AM +0100, Dejan Muhamedagic wrote: On Fri, Feb 15, 2013 at 10:19:41PM +0100, Dejan Muhamedagic wrote: Hi, On Fri, Feb 15, 2013 at 02:25:41PM +0900, yusuke iida wrote: Hi, Dejan I made a patch for spec file to make rpm of crmsh in rhel environment. I want a crmsh repository to merge it if I do not have any problem. This is a problem which I ran into earlier too. Something (probably one of the rpm macros) does a 'rm -rf' of the doc directory _after_ the files got installed: [ 29s] test -z /usr/share/doc/packages/crmsh || /bin/mkdir -p /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh [ 29s] /usr/bin/install -c -m 644 'AUTHORS' '/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh/AUTHORS' [ 29s] /usr/bin/install -c -m 644 'COPYING' '/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh/COPYING' ... [ 30s] Executing(%doc): /bin/sh -e /var/tmp/rpm-tmp.6245 [ 30s] + umask 022 [ 30s] + cd /usr/src/packages/BUILD [ 30s] + cd crmsh [ 30s] + DOCDIR=/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh [ 30s] + export DOCDIR [ 30s] + rm -rf /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh [ 30s] + /bin/mkdir -p /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh [ 30s] + cp -pr ChangeLog /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh ... 
[ 32s] error: create archive failed on file /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh/AUTHORS: cpio: open failed - Bad file descriptor If somebody can shed some light or suggest how to deal with this ... OK. I think I managed to fix it. The result is already upstream. I tested it with rhel6, centos6, fedora 17 and 18. Can you please test too. Thanks, Dejan Thanks, Dejan No problem. Will test the patch. BTW, did you notice that there are packages for rhel too at OBS (see the latest news item at https://savannah.nongnu.org/projects/crmsh/). Cheers, Dejan Best regards, Yusuke
Re: [Linux-ha-dev] crm shell: Inconsistent configure for non-existing objects
Hi,

On Wed, Apr 17, 2013 at 08:39:30AM +0200, Ulrich Windl wrote:

Hi!

In SLES11 SP2, when I try to display (show) a resource group that does not exist, there is no error message, but when I try to delete a non-existing object, I get an error message. That's inconsistent: trying to display an object that does not exist should also display an error. Example:

crm(live)configure# show grp_v02
crm(live)configure# delete grp_v02
ERROR: object grp_v02 does not exist
crm(live)configure# show grp_v02xy
crm(live)configure# show grp_v0s
crm(live)configure#

There's a somewhat technical explanation why this happens, but it obviously needs to be fixed somehow. Thanks for reporting.

Cheers,

Dejan

Regards,
Ulrich
Re: [Linux-ha-dev] Proposal for crm shell and show changed
Hi, On Mon, Apr 15, 2013 at 10:38:25AM +0200, Ulrich Windl wrote: Hi! When deleting resources using the crm shell, show changed will show nothing: crm(live)configure# delete prm_v05_npiv_2 INFO: resource references in location:cli-standby-grp_v05 updated INFO: hanging location:cli-standby-grp_v05 deleted crm(live)configure# verify crm(live)configure# show changed crm(live)configure# commit It would be nice if that could be improved. The show command shows all or parts of the current configuration. Removed elements are not any more in the configuration. If we were to also (somehow) show deleted elements, that would change the semantics of the show command.* I suppose that you'd like to be able to see a kind of list of changes that happened since the previous commit. Perhaps a diff would fit better? There's right now no such command. You can perhaps open an enhancement bugzilla. Cheers, Dejan *) I can see now that it should've been named show modified :) Regards, Ulrich ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
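The diff-style "changes since last commit" view suggested above could, in principle, be derived from plain-text dumps of the configuration before and after the edits. A rough standalone illustration using Python's difflib follows — this is not crmsh code, and the configuration lines are invented for the example:

```python
import difflib

# Hypothetical CLI dumps of the configuration before and after a delete.
before = [
    "primitive prm_v05_npiv_2 ocf:heartbeat:Dummy",
    "group grp_v05 prm_v05_npiv_2",
]
after = [
    "group grp_v05",
]

# unified_diff yields ---/+++ headers, @@ hunk markers, and +/- lines.
diff = list(difflib.unified_diff(before, after,
                                 fromfile="committed", tofile="pending",
                                 lineterm=""))
for line in diff:
    print(line)

# Deleted elements show up with a leading "-", which is exactly the
# information that "show changed" cannot display today.
```

A command built along these lines would sidestep the semantic problem Dejan describes: show would keep displaying only the current configuration, while the diff view would carry the deletions.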
Re: [Linux-ha-dev] [PATCH][crmsh] deal with the case-insensitive hostname
Hi Junko-san, On Wed, Apr 10, 2013 at 06:13:45PM +0900, Junko IKEDA wrote: Hi, I set upper-case hostname (GUEST03/GUEST4) and run Pacemaker 1.1.9 + Corosync 2.3.0. [root@GUEST04 ~]# crm_mon -1 Last updated: Wed Apr 10 15:12:48 2013 Last change: Wed Apr 10 14:02:36 2013 via crmd on GUEST04 Stack: corosync Current DC: GUEST04 (3232242817) - partition with quorum Version: 1.1.9-e8caee8 2 Nodes configured, unknown expected votes 1 Resources configured. Online: [ GUEST03 GUEST04 ] dummy (ocf::pacemaker:Dummy): Started GUEST03 for example, call crm shell with lower-case hostname. [root@GUEST04 ~]# crm node standby guest03 ERROR: bad lifetime: guest03 This message looks awkward. crm node standby GUEST03 surely works well, so crm shell just doesn't take into account the hostname conversion. It's better to accept the both of the upper/lower-case. Yes, indeed. node standby, node delete, resource migrate(move) get hit with this issue. Please see the attached. The patch looks correct. Many thanks for the contribution! Cheers, Dejan Thanks, Junko ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] ManageVE prints bogus errors to the syslog
On Wed, Apr 10, 2013 at 12:23:38AM +0200, Lars Ellenberg wrote:
On Fri, Apr 05, 2013 at 12:39:46PM +0200, Dejan Muhamedagic wrote:

Hi Lars,

On Thu, Apr 04, 2013 at 09:28:00PM +0200, Lars Ellenberg wrote:
On Wed, Apr 03, 2013 at 06:25:58PM +0200, Dejan Muhamedagic wrote:

Hi,

On Fri, Mar 22, 2013 at 08:41:30AM +0100, Roman Haefeli wrote:

Hi,

When stopping a node of our cluster managing a bunch of OpenVZ CTs, I get a lot of such messages in the syslog:

Mar 20 17:20:44 localhost ManageVE[2586]: ERROR: vzctl status 10002 returned: 10002 does not exist.
Mar 20 17:20:44 localhost lrmd: [2547]: info: operation monitor[6] on opensim for client 2550: pid 2586 exited with return code 7

It looks to me as if lrmd is making sure the CT is not running anymore. However, this triggers ManageVE to print an error.

Could be. Looking at the RA, there's a bunch of places where the status is invoked and where this message could get logged. It could be improved.

The following patch should help:

https://github.com/ClusterLabs/resource-agents/commit/ca987afd35226145f48fb31bef911aa3ed3b6015

BTW, why call `vzctl | awk` *twice*, just to get two items out of the vzctl output? How about losing the awk, and the second invocation? Something like this (should veexists and vestatus be local as well?):

diff --git a/heartbeat/ManageVE b/heartbeat/ManageVE
index 56a3d03..53f9bab 100755
--- a/heartbeat/ManageVE
+++ b/heartbeat/ManageVE
@@ -182,10 +182,12 @@ migrate_from_ve()
 status_ve()
 {
   declare -i retcode
-
-  veexists=`$VZCTL status $VEID 2>/dev/null | $AWK '{print $3}'`
-  vestatus=`$VZCTL status $VEID 2>/dev/null | $AWK '{print $5}'`
+  local vzstatus
+  vzstatus=`$VZCTL status $VEID 2>/dev/null`
   retcode=$?
+  set -- $vzstatus
+  veexists=$3
+  vestatus=$5

   if [[ $retcode != 0 ]]; then
     ocf_log err "vzctl status $VEID returned: $retcode"

Well, you do have commit rights, don't you? :)

Sure, but I don't have a vz handy to test even obviously correct patches with, before I commit...
Looked correct to me too, but then it wouldn't have been the first time I got something wrong :D Maybe the reporter can help with testing. Roman?

Cheers,

Dejan

Lars

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH] change timeouts, startup behaviour ocf:heartbeat:ManageVE (OpenVZ VE cluster resource)
Hi,

On Thu, Mar 21, 2013 at 02:59:17PM +0000, Tim Small wrote:
On 13/03/13 16:18, Dejan Muhamedagic wrote:
On Tue, Mar 12, 2013 at 12:58:44PM +0000, Tim Small wrote:

The attached patch changes the behaviour of the OpenVZ virtual machine cluster resource agent, so that:

1. The default resource stop timeout is greater than the hardcoded

Just for the record: where is this hardcoded actually? Is it also documented?

Defined here: http://git.openvz.org/?p=vzctl;a=blob;f=include/env.h#l26

/** Shutdown timeout. */
#define MAX_SHTD_TM 120

Used by env_stop() here: http://git.openvz.org/?p=vzctl;a=blob;f=src/lib/env.c#l821
http://git.openvz.org/?p=vzctl;a=blob;f=src/lib/env.c;h=2da848d87904d9e572b7da5c0e7dc5d93217ae5b;hb=HEAD#l818

	for (i = 0; i < MAX_SHTD_TM; i++) {
		sleep(1);
		if (!vps_is_run(h, veid)) {
			ret = 0;
			goto out;
		}
	}

kill_vps:
	logger(0, 0, "Killing container ...");

Perhaps something based on wall time would be more consistent, and I can think of cases where users might want it to be a bit higher, or a bit lower, but currently it's just fixed at 120s. I can't find the timeout documented anywhere.

That makes it hard to reference in other software products. But we can anyway increase the advised timeout in the metadata.

2. The start operation now waits for resource startup to complete, i.e. for the VE to boot up (so that the cluster manager can detect VEs which hang on startup, and also throttle simultaneous startups, so as not to overburden the node in question). Since the start operation now does a lot more, the default start operation timeout has been increased.

I'm not sure if we can introduce this just like that. It changes the agent's behaviour significantly.

Yes. I think it probably makes the agent's behaviour a bit more correct, but that depends on what your definition of a VE resource having "started" is, I suppose.
Currently the agent says that the VE has started as soon as it has begun the boot process, whereas with the proposed change it would mean that it has started when it has booted up (which should imply it is operational). Although my personal reason for the change was so that I had a reasonable way to avoid booting tens of VEs on the host machine at the same time, I can think of other benefits - such as making other resources depend on the fully-booted VE, or detecting the case where a faulty VE host node causes the VE to hang during start-up.

I suppose other options are:

1. Make start --wait the default, but make starting without waiting selectable using an RA parameter.
2. Make start without waiting the default, but make --wait selectable using an RA parameter.

I suppose that the change will break configurations where the administrator has hard-coded a short timeout, and this change is introduced as part of an upgrade, which I suppose is a bad thing...

Yes, it could be so. I think that we should go for option 2.

BTW, how does vzctl know when the VE is started?

The vzctl manual page says that 'vzctl start --wait' will attempt to wait till the default runlevel is reached within the container.

OK. Though that may mean different things depending on which init system is running.

If the description above matches the code modifications, then there should be three patches instead of one.

Fair enough - I was being lazy! :)

Cheers,

Dejan

Tim.
Re: [Linux-ha-dev] [Pacemaker] A couple of SendArp resource changes
Hello, Anybody have objections to the patches posted here? If not, I'll push them upstream. Cheers, Dejan On Wed, Mar 13, 2013 at 07:01:33PM +0100, Dejan Muhamedagic wrote: Hi, On Sat, Mar 09, 2013 at 07:53:34PM +, Tim Small wrote: Hi, I've been using the ocf:heartbeat:SendArp script and notice a couple of issues - some problems with starting and monitoring the service, and also a file descriptor leak in the binary (which would cause it to terminate). I've detailed the problems and supplied some patches: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=701913 Cannot just replace the whole RA. Sorry. If you could split the patch we can consider them on a one-by-one basis. Otherwise, I found some patch in my local queue, which never got pushed for some reason. Don't know if that would help (attached). and http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=701914 Can you try the attached send_arp.libnet.c patch. It does first packet build then reuses them. Cheers, Dejan ... they're not perfect, but an improvement I think. HTH, Tim. -- South East Open Source Solutions Limited Registered in England and Wales with company number 06134732. 
Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ VAT number: 900 6633 53 http://seoss.co.uk/ +44-(0)1273-808309 ___ Pacemaker mailing list: pacema...@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org From 9dae34616ef62b98b762a1f821f9e1ee749e6315 Mon Sep 17 00:00:00 2001 From: Dejan Muhamedagic de...@suse.de Date: Wed, 13 Mar 2013 18:19:10 +0100 Subject: [PATCH] Medium: tools: send_arp.libnet: reuse ARP packets --- tools/send_arp.libnet.c | 174 1 file changed, 115 insertions(+), 59 deletions(-) diff --git a/tools/send_arp.libnet.c b/tools/send_arp.libnet.c index 71148bb..2abeecb 100644 --- a/tools/send_arp.libnet.c +++ b/tools/send_arp.libnet.c @@ -49,17 +49,18 @@ #ifdef HAVE_LIBNET_1_0_API #define LTYPE struct libnet_link_int + static u_char *mk_packet(u_int32_t ip, u_char *device, u_char *macaddr, u_char *broadcast, u_char *netmask, u_short arptype); + static int send_arp(struct libnet_link_int *l, u_char *device, u_char *buf); #endif #ifdef HAVE_LIBNET_1_1_API #define LTYPE libnet_t + static libnet_t *mk_packet(libnet_t* lntag, u_int32_t ip, u_char *device, u_char macaddr[6], u_char *broadcast, u_char *netmask, u_short arptype); + int send_arp(libnet_t* lntag); #endif #define PIDDIR HA_VARRUNDIR / PACKAGE #define PIDFILE_BASE PIDDIR /send_arp- -static int send_arp(LTYPE* l, u_int32_t ip, u_char *device, u_char mac[6] -,u_char *broadcast, u_char *netmask, u_short arptype); - static char print_usage[]={ send_arp: sends out custom ARP packet.\n usage: send_arp [-i repeatinterval-ms] [-r repeatcount] [-p pidfile] \\\n @@ -135,7 +136,6 @@ main(int argc, char *argv[]) char* netmask; u_int32_t ip; u_char src_mac[6]; - LTYPE* l; int repeatcount = 1; int j; longmsinterval = 1000; @@ -143,6 +143,13 @@ main(int argc, char *argv[]) charpidfilenamebuf[64]; char*pidfilename = NULL; +#ifdef 
HAVE_LIBNET_1_0_API + LTYPE* l; + u_char *request, *reply; +#elif defined(HAVE_LIBNET_1_1_API) + LTYPE *request, *reply; +#endif + CL_SIGNAL(SIGTERM, byebye); CL_SIGINTERRUPT(SIGTERM, 1); @@ -201,6 +208,24 @@ main(int argc, char *argv[]) return EXIT_FAILURE; } + if (!strcasecmp(macaddr, AUTO_MAC_ADDR)) { + if (get_hw_addr(device, src_mac) 0) { + cl_log(LOG_ERR, Cannot find mac address for %s, + device); + unlink(pidfilename); + return EXIT_FAILURE; + } + } + else { + convert_macaddr((unsigned char *)macaddr, src_mac); + } + +/* + * We need to send both a broadcast ARP request as well as the ARP response we + * were already sending. All the interesting research work for this fix was + * done by Masaki Hasegawa masak...@pp.iij4u.or.jp and his colleagues. + */ + #if defined(HAVE_LIBNET_1_0_API) #ifdef ON_DARWIN if ((ip = libnet_name_resolve((unsigned char*)ipaddr, 1)) == -1UL) { @@ -219,49 +244,65 @@ main(int argc, char *argv[]) unlink(pidfilename); return EXIT_FAILURE; } + request = mk_packet(ip, (unsigned char*)device, src_mac + , (unsigned char*)broadcast, (unsigned char*)netmask + , ARPOP_REQUEST
Re: [Linux-ha-dev] [Pacemaker] [PATCH] change timeouts, startup behaviour ocf:heartbeat:ManageVE (OpenVZ VE cluster resource)
On Tue, Mar 12, 2013 at 12:58:44PM +, Tim Small wrote:

> The attached patch changes the behaviour of the OpenVZ virtual machine
> cluster resource agent, so that:
>
> 1. The default resource stop timeout is greater than the hardcoded

Just for the record: where is this hardcoded actually? Is it also documented?

> timeout in vzctl stop (after this time, vzctl forcibly stops the
> virtual machine) (since failure to stop a resource can lead to the
> cluster node being evicted from the cluster entirely - and this is
> generally a BAD thing).

Agreed.

> 2. The start operation now waits for resource startup to complete,
> i.e. for the VE to boot up (so that the cluster manager can detect
> VEs which are hanging on startup, and also throttle simultaneous
> startups, so as not to overburden the node in question). Since the
> start operation now does a lot more, the default start operation
> timeout has been increased.

I'm not sure if we can introduce this just like that. It changes the agent's behaviour significantly. BTW, how does vzctl know when the VE is started?

> 3. Backs off the default timeouts and intervals for various operations
> to less aggressive values.

Please make patches which are self-contained, but can be described in a succinct manner. If the description above matches the code modifications, then there should be three patches instead of one.

Please continue the discussion at linux-ha-dev; that's where RA development discussions take place.

Cheers, Dejan

> Cheers, Tim.
>
> n.b. There is a bug in the Debian 6.0 (Squeeze) OpenVZ kernel such that
> vzctl start VEID --wait hangs. The bug doesn't impact the OpenVZ.org
> kernels (and hence won't impact Debian 7.0 Wheezy either).

-- South East Open Source Solutions Limited Registered in England and Wales with company number 06134732. 
Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ VAT number: 900 6633 53 http://seoss.co.uk/ +44-(0)1273-808309 --- ManageVE.old 2010-10-22 05:54:50.0 + +++ ManageVE 2013-03-12 11:39:47.895102380 + @@ -26,12 +26,15 @@ # # # Created 07. Sep 2006 -# Updated 18. Sep 2006 +# Updated 12. Mar 2013 # -# rev. 1.00.3 +# rev. 1.00.4 # # Changelog # +# 12/Mar/13 1.00.4 Wait for VE startup to finish, lengthen default start timeout. +# Default stop timeout to longer than the vzctl stop 'polite' +# interval. # 12/Sep/06 1.00.3 more cleanup # 12/Sep/06 1.00.2 fixed some logic in start_ve # general cleanup all over the place @@ -67,7 +70,7 @@ ?xml version=1.0? !DOCTYPE resource-agent SYSTEM ra-api-1.dtd resource-agent name=ManageVE - version1.00.3/version + version1.00.4/version longdesc lang=en This OCF complaint resource agent manages OpenVZ VEs and thus requires @@ -87,12 +90,12 @@ /parameters actions -action name=start timeout=75 / -action name=stop timeout=75 / -action name=status depth=0 timeout=10 interval=10 / -action name=monitor depth=0 timeout=10 interval=10 / -action name=validate-all timeout=5 / -action name=meta-data timeout=5 / +action name=start timeout=240 / +action name=stop timeout=150 / +action name=status depth=0 timeout=20 interval=60 / +action name=monitor depth=0 timeout=20 interval=60 / +action name=validate-all timeout=10 / +action name=meta-data timeout=10 / /actions /resource-agent END @@ -127,7 +130,7 @@ return $retcode fi - $VZCTL start $VEID /dev/null + $VZCTL start $VEID --wait /dev/null retcode=$? if [[ $retcode != 0 $retcode != 32 ]]; then ___ Pacemaker mailing list: pacema...@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
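The question raised above — how the cluster can tell when the VE has actually booted — is what `vzctl start $VEID --wait` answers in the patch. The general pattern (a start action that polls a status probe until the resource is up or a retry budget runs out) can be sketched generically. Everything below is illustrative: `wait_for_startup` and its probe argument are hypothetical names; in ManageVE the probe would be something like `vzctl status`, and the loop would in practice be bounded by the operation timeout rather than a fixed retry count.

```shell
# Minimal sketch of a "wait for startup" start action.
# $1 is a probe command that succeeds once the resource is up
# (hypothetical; stands in for e.g. "vzctl status $VEID"),
# $2 is the number of polls to attempt before giving up.
wait_for_startup() {
    probe=$1
    tries=$2
    i=0
    while [ "$i" -lt "$tries" ]; do
        if $probe; then
            echo started
            return 0        # maps to OCF_SUCCESS
        fi
        i=$((i + 1))
        sleep 1
    done
    echo timeout
    return 1                # start failed; the CRM can then recover
}

wait_for_startup true 3     # prints "started"
```

This is why the thread argues the default start timeout must grow alongside the change: the start action now spends real time waiting instead of returning as soon as `vzctl start` forks.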
Re: [Linux-ha-dev] [Pacemaker] A couple of SendArp resource changes
Hi, On Sat, Mar 09, 2013 at 07:53:34PM +, Tim Small wrote: Hi, I've been using the ocf:heartbeat:SendArp script and notice a couple of issues - some problems with starting and monitoring the service, and also a file descriptor leak in the binary (which would cause it to terminate). I've detailed the problems and supplied some patches: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=701913 Cannot just replace the whole RA. Sorry. If you could split the patch we can consider them on a one-by-one basis. Otherwise, I found some patch in my local queue, which never got pushed for some reason. Don't know if that would help (attached). and http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=701914 Can you try the attached send_arp.libnet.c patch. It does first packet build then reuses them. Cheers, Dejan ... they're not perfect, but an improvement I think. HTH, Tim. -- South East Open Source Solutions Limited Registered in England and Wales with company number 06134732. Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ VAT number: 900 6633 53 http://seoss.co.uk/ +44-(0)1273-808309 ___ Pacemaker mailing list: pacema...@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org From 9dae34616ef62b98b762a1f821f9e1ee749e6315 Mon Sep 17 00:00:00 2001 From: Dejan Muhamedagic de...@suse.de Date: Wed, 13 Mar 2013 18:19:10 +0100 Subject: [PATCH] Medium: tools: send_arp.libnet: reuse ARP packets --- tools/send_arp.libnet.c | 174 1 file changed, 115 insertions(+), 59 deletions(-) diff --git a/tools/send_arp.libnet.c b/tools/send_arp.libnet.c index 71148bb..2abeecb 100644 --- a/tools/send_arp.libnet.c +++ b/tools/send_arp.libnet.c @@ -49,17 +49,18 @@ #ifdef HAVE_LIBNET_1_0_API # define LTYPE struct libnet_link_int + static u_char *mk_packet(u_int32_t ip, u_char *device, u_char *macaddr, u_char *broadcast, u_char 
*netmask, u_short arptype); + static int send_arp(struct libnet_link_int *l, u_char *device, u_char *buf); #endif #ifdef HAVE_LIBNET_1_1_API # define LTYPE libnet_t + static libnet_t *mk_packet(libnet_t* lntag, u_int32_t ip, u_char *device, u_char macaddr[6], u_char *broadcast, u_char *netmask, u_short arptype); + int send_arp(libnet_t* lntag); #endif #define PIDDIR HA_VARRUNDIR / PACKAGE #define PIDFILE_BASE PIDDIR /send_arp- -static int send_arp(LTYPE* l, u_int32_t ip, u_char *device, u_char mac[6] -, u_char *broadcast, u_char *netmask, u_short arptype); - static char print_usage[]={ send_arp: sends out custom ARP packet.\n usage: send_arp [-i repeatinterval-ms] [-r repeatcount] [-p pidfile] \\\n @@ -135,7 +136,6 @@ main(int argc, char *argv[]) char* netmask; u_int32_t ip; u_char src_mac[6]; - LTYPE* l; int repeatcount = 1; int j; long msinterval = 1000; @@ -143,6 +143,13 @@ main(int argc, char *argv[]) charpidfilenamebuf[64]; char*pidfilename = NULL; +#ifdef HAVE_LIBNET_1_0_API + LTYPE* l; + u_char *request, *reply; +#elif defined(HAVE_LIBNET_1_1_API) + LTYPE *request, *reply; +#endif + CL_SIGNAL(SIGTERM, byebye); CL_SIGINTERRUPT(SIGTERM, 1); @@ -201,6 +208,24 @@ main(int argc, char *argv[]) return EXIT_FAILURE; } + if (!strcasecmp(macaddr, AUTO_MAC_ADDR)) { + if (get_hw_addr(device, src_mac) 0) { + cl_log(LOG_ERR, Cannot find mac address for %s, + device); + unlink(pidfilename); + return EXIT_FAILURE; + } + } + else { + convert_macaddr((unsigned char *)macaddr, src_mac); + } + +/* + * We need to send both a broadcast ARP request as well as the ARP response we + * were already sending. All the interesting research work for this fix was + * done by Masaki Hasegawa masak...@pp.iij4u.or.jp and his colleagues. 
+ */ + #if defined(HAVE_LIBNET_1_0_API) #ifdef ON_DARWIN if ((ip = libnet_name_resolve((unsigned char*)ipaddr, 1)) == -1UL) { @@ -219,49 +244,65 @@ main(int argc, char *argv[]) unlink(pidfilename); return EXIT_FAILURE; } + request = mk_packet(ip, (unsigned char*)device, src_mac + , (unsigned char*)broadcast, (unsigned char*)netmask + , ARPOP_REQUEST); + reply = mk_packet(ip, (unsigned char*)device, src_mac + , (unsigned char *)broadcast + , (unsigned char *)netmask, ARPOP_REPLY); + if (!request || !reply) { + cl_log(LOG_ERR, could not create packets); + unlink(pidfilename); + return EXIT_FAILURE; + } + for (j=0; j repeatcount; ++j) { + c = send_arp(l, (unsigned char*)device, request); + if (c 0) { + break; + } + mssleep(msinterval / 2); + c = send_arp(l, (unsigned char*)device, reply); + if (c 0) { + break; + } + if (j != repeatcount-1) { + mssleep(msinterval / 2); + } + } #elif defined(HAVE_LIBNET_1_1_API) - if ((l=libnet_init(LIBNET_LINK, device, errbuf)) == NULL
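The structure of the patched main loop — build the ARP request and the ARP reply once, then alternate sending them with half the repeat interval between packets — can be sketched outside of C. In this shell sketch `send_pkt` and `run_loop` are hypothetical stand-ins for the libnet `send_arp()` call and the C `for` loop; the sleeps are shown as comments only.

```shell
# Sketch of the send loop introduced by the patch: packets are built
# once and reused on every iteration instead of being rebuilt.
send_pkt() { echo "send $1"; }      # stand-in for send_arp()

run_loop() {
    count=$1                        # corresponds to repeatcount
    j=0
    while [ "$j" -lt "$count" ]; do
        send_pkt request
        # the C code does mssleep(msinterval / 2) here
        send_pkt reply
        j=$((j + 1))
        # ...and sleeps another half interval here, skipped on the
        # last pass (the "j != repeatcount-1" test in the patch)
    done
}

run_loop 2
```

Sending the broadcast request in addition to the reply is the substantive fix credited to Masaki Hasegawa in the patch comment; reusing the prebuilt packets is the performance cleanup.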
Re: [Linux-ha-dev] [PATCH] crmsh: fix in python version checking
Hi Keisuke-san, On Fri, Mar 01, 2013 at 10:31:46AM +0900, Keisuke MORI wrote: Hi Dejan, Here is a trivial patch for crmsh. It is totally harmless because it affects only when the python version is too old and crmsh would abort anyway :) Many thanks for the patch. Cheers, Dejan Thanks, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] A patch for crmsh.spec
On Fri, Feb 15, 2013 at 10:19:41PM +0100, Dejan Muhamedagic wrote: Hi, On Fri, Feb 15, 2013 at 02:25:41PM +0900, yusuke iida wrote: Hi, Dejan I made a patch for spec file to make rpm of crmsh in rhel environment. I want a crmsh repository to merge it if I do not have any problem. This is a problem which I ran into earlier too. Something (probably one of the rpm macros) does a 'rm -rf' of the doc directory _after_ the files got installed: [ 29s] test -z /usr/share/doc/packages/crmsh || /bin/mkdir -p /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh [ 29s] /usr/bin/install -c -m 644 'AUTHORS' '/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh/AUTHORS' [ 29s] /usr/bin/install -c -m 644 'COPYING' '/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh/COPYING' ... [ 30s] Executing(%doc): /bin/sh -e /var/tmp/rpm-tmp.6245 [ 30s] + umask 022 [ 30s] + cd /usr/src/packages/BUILD [ 30s] + cd crmsh [ 30s] + DOCDIR=/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh [ 30s] + export DOCDIR [ 30s] + rm -rf /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh [ 30s] + /bin/mkdir -p /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh [ 30s] + cp -pr ChangeLog /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh ... [ 32s] error: create archive failed on file /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh/AUTHORS: cpio: open failed - Bad file descriptor If somebody can shed some light or suggest how to deal with this ... Thanks, Dejan No problem. Will test the patch. BTW, did you notice that there are packages for rhel too at OBS (see the latest news item at https://savannah.nongnu.org/projects/crmsh/). 
Cheers, Dejan Best regards, Yusuke -- METRO SYSTEMS CO., LTD Yusuke Iida Mail: yusk.i...@gmail.com ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] A patch for crmsh.spec
On Tue, Feb 19, 2013 at 11:03:53AM +0100, Dejan Muhamedagic wrote: On Fri, Feb 15, 2013 at 10:19:41PM +0100, Dejan Muhamedagic wrote: Hi, On Fri, Feb 15, 2013 at 02:25:41PM +0900, yusuke iida wrote: Hi, Dejan I made a patch for spec file to make rpm of crmsh in rhel environment. I want a crmsh repository to merge it if I do not have any problem. This is a problem which I ran into earlier too. Something (probably one of the rpm macros) does a 'rm -rf' of the doc directory _after_ the files got installed: [ 29s] test -z /usr/share/doc/packages/crmsh || /bin/mkdir -p /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh [ 29s] /usr/bin/install -c -m 644 'AUTHORS' '/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh/AUTHORS' [ 29s] /usr/bin/install -c -m 644 'COPYING' '/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh/COPYING' ... [ 30s] Executing(%doc): /bin/sh -e /var/tmp/rpm-tmp.6245 [ 30s] + umask 022 [ 30s] + cd /usr/src/packages/BUILD [ 30s] + cd crmsh [ 30s] + DOCDIR=/var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh [ 30s] + export DOCDIR [ 30s] + rm -rf /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh [ 30s] + /bin/mkdir -p /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh [ 30s] + cp -pr ChangeLog /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh ... [ 32s] error: create archive failed on file /var/tmp/crmsh-1.2.5-build/usr/share/doc/packages/crmsh/AUTHORS: cpio: open failed - Bad file descriptor If somebody can shed some light or suggest how to deal with this ... OK. I think I managed to fix it. The result is already upstream. I tested it with rhel6, centos6, fedora 17 and 18. Can you please test too. Thanks, Dejan Thanks, Dejan No problem. Will test the patch. BTW, did you notice that there are packages for rhel too at OBS (see the latest news item at https://savannah.nongnu.org/projects/crmsh/). 
Cheers, Dejan Best regards, Yusuke -- METRO SYSTEMS CO., LTD Yusuke Iida Mail: yusk.i...@gmail.com ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
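For readers hitting the same rpm failure: the build log above shows the `%doc` directive re-running `rm -rf $DOCDIR` after `make install` has already populated it, which is what makes the later cpio archiving fail. One common way to avoid this class of problem is sketched below; this is illustrative only (the thread does not say which fix actually went upstream, and the paths are placeholders): remove the copies that `make install` put into the doc directory and let `%doc` own the files instead.

```
%install
make install DESTDIR=%{buildroot}
# "make install" already installed AUTHORS etc. into the docdir;
# remove them so the implicit "rm -rf $DOCDIR" done by %doc cannot
# delete files that the %files list still expects to package
rm -rf %{buildroot}%{_defaultdocdir}/%{name}

%files
%doc AUTHORS COPYING ChangeLog
```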
Re: [Linux-ha-dev] A patch for crmsh.spec
Hi, On Fri, Feb 15, 2013 at 02:25:41PM +0900, yusuke iida wrote: Hi, Dejan I made a patch for spec file to make rpm of crmsh in rhel environment. I want a crmsh repository to merge it if I do not have any problem. No problem. Will test the patch. BTW, did you notice that there are packages for rhel too at OBS (see the latest news item at https://savannah.nongnu.org/projects/crmsh/). Cheers, Dejan Best regards, Yusuke -- METRO SYSTEMS CO., LTD Yusuke Iida Mail: yusk.i...@gmail.com ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] crmsh release 1.2.5
Hello, The CRM shell v1.2.5 is released. The highlights of the release: * cibconfig: modgroup command * cibconfig: directed graph support * history: diff command (between PE inputs) * history: show command (show configuration of PE inputs) * history: graph command For the full set of changes, take a look at the changelog: http://hg.savannah.gnu.org/hgweb/crmsh/file/crmsh-1.2.5/ChangeLog == Note about Pacemaker versions == CRM shell 1.2.5 supports all Pacemaker 1.1 versions. == Installing == Installing the CRM shell along with Pacemaker 1.1 versions = v1.1.7 is possible, but it will result in file conflicts. You need to enforce file overwriting when installing packages. Note that pacemaker v1.1.7 carries a crm shell version which is the same as in v1.1.6, or put differently quite outdated. There are several interesting new features, including history, which never made it in any pacemaker release. == Resources == Packages for several popular Linux distributions: http://download.opensuse.org/repositories/network:/ha-clustering/ The man page: http://crmsh.nongnu.org/crm.8.html The CRM shell project web page at GNU savannah: https://savannah.nongnu.org/projects/crmsh/ Support and bug reporting: http://lists.linux-ha.org/mailman/listinfo/linux-ha https://savannah.nongnu.org/bugs/?group=crmsh https://developerbugs.linuxfoundation.org/enter_bug.cgi?product=Pacemaker;component=Shell;version=1.2 The sources repository is available at: http://hg.savannah.gnu.org/hgweb/crmsh Enjoy! Dejan ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] announcement: resource-agents release 3.9.5
Hello, We've tagged today (Feb 7) a new stable resource-agents release (3.9.5) in the upstream repository. Big thanks go to all contributors! Needless to say, without you this release would not be possible. The Linux-HA resource agents set changes consist mainly of bug fixes and a few improvements and new features. The most important fix is for the missing unsolicited ARPs issue in IPaddr2. The following two features are worth mentioning too: - support for RA tracing (see the README file for more details); your favourite UI should provide a way to turn trace on/off - pgsql: support starting as Hot Standby The full list of changes for the linux-ha RA set is available in ChangeLog: https://github.com/ClusterLabs/resource-agents/blob/master/ChangeLog The rgmanager resource agents set received mainly bug fixes. Please upgrade at the earliest opportunity. Best, The resource-agents maintainers ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] announcement: resource-agents release candidate 3.9.5rc1
Hi Keisuke-san,

On Thu, Jan 31, 2013 at 06:33:49PM +0900, Keisuke MORI wrote:

Hi, Does IPaddr2 need to support the 'eth0:label' format in a single 'nic' parameter when you want to use an interface label? I thought it doesn't, going by the meta-data description of 'nic', but that appears to conflict with the 'iflabel' description:

nic: Do NOT specify an alias interface in the form eth0:1 or anything here; rather, specify the base interface only. If you want a label, see the iflabel parameter.

iflabel: (omit) If a label is specified in nic name, this parameter has no effect.

Hmpf.

The latest IPaddr2 (findif.sh version) would reject it because an invalid nic was specified. If we do need to support it I will submit a patch for this.

I'd rather just update the iflabel description. After all, normally one doesn't need to specify the nic. But you'll get different preferences from different people :) However, it seems to be a regression, so we should probably allow labels. BTW, is this related to https://github.com/ClusterLabs/resource-agents/issues/200 ?

Cheers, Dejan

Thanks,

2013/1/30 Dejan Muhamedagic de...@suse.de: Hello, The current resource-agents repository has been tagged v3.9.5rc1. It is mainly a bug fix release. The full list of changes for the linux-ha RA set is available in ChangeLog: https://github.com/ClusterLabs/resource-agents/blob/v3.9.5rc1/ChangeLog We'll allow a week for testing. The final release is planned for Feb 6. Many thanks to all contributors! Best, The resource-agents maintainers ___ ha-wg-technical mailing list ha-wg-techni...@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/ha-wg-technical

-- Keisuke MORI

___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
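As the quoted meta-data intends, the documented way to get a labelled alias is to keep the base interface in `nic` and put the label in `iflabel`, rather than writing `nic=eth0:1` (which the findif.sh-based IPaddr2 rejects). A sketch in crm shell syntax, with placeholder address and resource name:

```
primitive vip ocf:heartbeat:IPaddr2 \
    params ip=192.168.1.10 cidr_netmask=24 nic=eth0 iflabel=1 \
    op monitor interval=10s
```

The result is the address being brought up under the alias `eth0:1` without encoding the label into the `nic` parameter itself.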
[Linux-ha-dev] announcement: resource-agents release candidate 3.9.5rc1
Hello, The current resource-agents repository has been tagged v3.9.5rc1. It is mainly a bug fix release. The full list of changes for the linux-ha RA set is available in ChangeLog: https://github.com/ClusterLabs/resource-agents/blob/v3.9.5rc1/ChangeLog We'll allow a week for testing. The final release is planned for Feb 6. Many thanks to all contributors! Best, The resource-agents maintainers ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Re: Re: New patch for resource-agents (nfsserver)
On Wed, Jan 23, 2013 at 11:34:34PM -0700, John Shi wrote:

Hi Dejan, You moved the checks to the place where they belong, better! Thanks in advance!

Thanks for the review. Applied now.

Cheers, Dejan

Best regards, John

Dejan Muhamedagic de...@suse.de 2013-1-24 0:17 AM

On Tue, Jan 22, 2013 at 08:21:34PM -0700, John Shi wrote:

Hi Dejan, Is this patch OK? I didn't see the patch going upstream yet. Thanks!

Applied now. Can you please also test the attached patch? It is fairly small and low impact, but still.

Cheers, Dejan

Best regards, John

Dejan Muhamedagic de...@suse.de 2012-12-24 17:57

Hi John,

On Wed, Dec 19, 2012 at 11:18:41PM -0700, John Shi wrote:

Hi all, This fixes everything that causes an error when calling rpc.statd, including: specify in the meta-data that the value of nfs_notify_cmd should be either sm-notify or rpc.statd; rename the parameter nfs_notify_retry_time to nfs_smnotify_retry_time, because rpc.statd has no retry-time option; make the parameter nfs_notify_foreground produce the correct option for rpc.statd.

Looks good to me.

Cheers, Dejan

Best regards, John

___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Re: ocft updated (test tool of resource-agents)
On Tue, Jan 22, 2013 at 02:21:37AM -0700, John Shi wrote:

Done.

Applied now. Many thanks for the contribution!

Cheers, Dejan

Dejan Muhamedagic de...@suse.de 2013-1-22 4:20 AM

Hi John,

On Sun, Jan 20, 2013 at 09:23:05AM -0700, John Shi wrote:

Medium: tools: ocft: update to version 0.44. Added an incremental mode (ocft test -i): results are cached, and cases which already succeeded are not rerun. Improved *ocft make*: ocft now rebuilds only the updated test-case files. ocft test prints the actual case names instead of just case numbers.

Can you please split the patch into several self-contained patches, each of which fixes a single issue? Great work!

Thanks, Dejan

___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] IPsrcaddr bug, and fix recommendation
Hi Attila, Sorry for the delay, somehow missed your message. On Fri, Dec 28, 2012 at 12:52:22PM +0100, Attila Megyeri wrote: Hi Dejan, -Original Message- From: linux-ha-dev-boun...@lists.linux-ha.org [mailto:linux-ha-dev-boun...@lists.linux-ha.org] On Behalf Of Dejan Muhamedagic Sent: Monday, December 24, 2012 11:07 AM To: linux-ha-dev@lists.linux-ha.org Subject: Re: [Linux-ha-dev] IPsrcaddr bug, and fix recommendation Hi, On Thu, Dec 20, 2012 at 08:03:32PM +0100, Attila Megyeri wrote: hi, I have a cluster configuration with two IPsrcaddr resources (e.g. IP address A and B) They are configured to two different addresses, and are never supposed to run on the same nodes. So A can run on nodes N1 and N2, B can run on N3,N4. My problem is, that in some cases, crm_mon shows that an ipsrcaddr resource is running on a node where it shouldn't, and of course it is in unmanaged state and cannot be stopped. For instance: IP address A is started, unamanged on node N3. I am using pacemaker 1.1.6 on a debian system, with the latest RA from github. I checked the RA, and here are my findings. - When status is called, it calls the srca_read() function - srca_read() returns 2, if a srcip is running on the given node, but with a different IP address. - srca_status(), when gets 2 from srca_read(), returns $OCF_ERR_GENERIC As a result, in my case IP B is running on N3, which is OK, but CRM_mon reports that IP A is also running on N3 (unmanaged). [for some reason this is how the OCF_ERR_GENERIC is interpreted] This is definitively a bug, the question is whether in pacemaker or in the RA. If I change the script to return $OCF_NOT_RUNNING instead of $OCF_ERR_GENERIC it works properly. What is the proper behavior in this case? My recommendation is to fix the RA so that srca_read() returns 1, if there is a srcip on the node, but it is not the queried one. The comment in the agent says: # NOTES: # # 1) There must be one and not more than 1 default route! 
Mainly because # I can't see why you should have more than one. And if there is more # than one, we would have to box clever to find out which one is to be # modified, or we would have to pass its identity as an argument. # This should actually be in the meta-data, as it is obviously intended for users. It looks like your use case doesn't fit this description, right? Perhaps we could add a parameter like allow_multiple_default_routes. Thanks, Dejan On the host where the resource is running I have only one default gateway. The other pair of this host (the other node) uses a different default gateway - but I do not think this should be a limitation (on that host I have a single default gateway as well). The must be one and not more than 1 should also say cluster-wide. The srca_read() function does not fail in the steps that check the default gateway. The function runs till the last line where 2 is returned, although it is not a generic error, rather the SRC ip is not running on the node. The exit code 2 signifies that the default route has an unexpected address. I think that it works as designed. As mentioned earlier, we can extend the resource agent to support clusters with multiple default routes, but that would need to be done with an extra configuration parameter. Patches welcome :) Thanks, Dejan Thanks, Attila In this case the RA would return a $OCF_NOT_RUNNING Cheers, Attila ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
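The disagreement above boils down to a small mapping: `srca_read()` yields 2 when a default route exists but carries a different source address, and `srca_status()` currently turns that into `OCF_ERR_GENERIC`, which Pacemaker treats as a failure rather than "not running". The sketch below is not the agent's code, just a minimal model of the mapping under discussion; Attila's proposal amounts to changing one case, as noted in the comment.

```shell
# OCF exit codes (per the OCF resource agent API)
OCF_SUCCESS=0
OCF_NOT_RUNNING=7
OCF_ERR_GENERIC=1

srca_status() {
    case $1 in                  # $1 stands in for srca_read()'s result
        0) return $OCF_SUCCESS ;;
        2) return $OCF_ERR_GENERIC ;;   # current behaviour; the proposal
                                        # is $OCF_NOT_RUNNING here
        *) return $OCF_ERR_GENERIC ;;
    esac
}

srca_status 2 || echo "$?"      # prints 1 (OCF_ERR_GENERIC)
```

Returning `OCF_NOT_RUNNING` for case 2 is what made crm_mon stop reporting the resource as (unmanaged) on the wrong node in Attila's setup; Dejan's reply is that case 2 is working as designed for the single-default-route assumption.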
[Linux-ha-dev] IPaddr2 issue in resource-agents 3.9.4 and the next release
Hello, IPaddr2 version from the latest resource-agents release 3.9.4 has a serious regression, it won't send unsolicited ARPs on start. Depending on the ARP cache timeouts of the closest network device, that would result in slower failover times. All IP requests would still go to the node which was running the previous instance of IPaddr2 resource. Unfortunately, the regression tests are not capable of catching such a regression. My sincere apologies for the blunder. The issue has been fixed in the meantime and we're planning to release 3.9.5 imminently, probably by Wednesday next week. If there are any other urgent RA issues please let us know. Cheers, Dejan ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] RA trace facility
Hi, On Tue, Nov 27, 2012 at 08:28:04AM +0100, Dejan Muhamedagic wrote: On Wed, Nov 21, 2012 at 07:06:35PM +0100, Lars Marowsky-Bree wrote: On 2012-11-21T18:02:49, Dejan Muhamedagic de...@suse.de wrote: What would you think of OCF_RESKEY_RA_TRACE ? A meta attribute perhaps? That wouldn't cause a resource restart. Point, but - meta attributes so far were mostly for the PE/pacemaker, this would be for the RA. Not exactly for the RA itself. The RA execution would just be observed. The attribute is consumed by others. Whether it is PE or lrmd or something else makes less of a difference. It is up to these subsystems to sort the meta attributes out. It turns out that pacemaker won't export meta attributes which were not recognized. At any rate, we can go with OCF_RESKEY_trace_ra. The good thing is that it can be specified per operation (op start trace_ra=1). The interface is simple and it's described in ocf-shellfuncs. It would get support in the UI. Would a changed definition for a resource we're trying to trace be an actual problem? I mean, tracing clearly means you want to trace an resource action, so one would put the attribute on the resource before triggering that. (It can also be put on in maintenance mode, avoiding the restart.) Our include script could enable that; it's unlikely that the problem occurs prior to that. - never (default): Does nothing - always: Always trace, write to $(which path?)/raname.rscid.$timestamp bash has a way to send trace to a separate FD, but that feature is available with version =4.x. Otherwise, it could be messy to separate the trace from the other stderr output. Of course, one could just redirect stderr in this case. I suppose that that would work too. I assume that'd be easiest. (And people not using bash can write their own implementation for this. ;-) - on-error: always trace, but delete on successful exit Good idea. This is not implemented right now. The patch is attached. It's planned for the release 3.9.5. 
Thanks, Dejan hb_report/history explorer could gather this too. Right. (And yes I know this introduces a fake parameter that doesn't really exist. But it'd be so helpful.) Sorry. Maybe I'm getting carried away ;-) Good points. I didn't really think much (yet) about how to further facilitate the feature, just had a vague idea that somehow lrmd should set the environment variable. Sure. LRM is an other obvious entry point for increased tracing/logging. That could also work. Perhaps we could do something like this: # crm resource trace rsc_id [action] [when-to-trace] This would set the appropriate meta attribute for the resource which would trickle down to the RA. ocf-shellfuncs would then do whatever's necessary to setup the trace. The file management could get tricky though, as we don't have a single point of exit (and trap is already used elsewhere). The file/log management would be easier to do in the LRM - and also handle the timeout situation; that could also make use of the redirect trace elsewhere if the shell is new enough. Indeed. Until then, ocf-shellfuncs can fallback to some well known location. Thanks, Dejan Regards, Lars -- Architect Storage/HA SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg) Experience is the name everyone gives to their mistakes. 
-- Oscar Wilde ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ From edad4f4f7ef39da0243c1b3444bb8630443a8c38 Mon Sep 17 00:00:00 2001 From: Dejan Muhamedagic de...@suse.de Date: Wed, 23 Jan 2013 17:36:08 +0100 Subject: [PATCH] Medium: ocf-shellfuncs: RA tracing --- doc/dev-guides/ra-dev-guide.txt | 6 +++ heartbeat/ocf-shellfuncs.in | 82 + tools/ocf-tester.8 | 5 ++- tools/ocf-tester.in | 4 +- 4 files changed, 95 insertions(+), 2 deletions(-) diff --git a/doc/dev-guides/ra-dev-guide.txt b/doc/dev-guides/ra-dev-guide.txt index af5e3b1..11e9a5d 100644 --- a/doc/dev-guides/ra-dev-guide.txt +++ b/doc/dev-guides/ra-dev-guide.txt @@ -1623,6 +1623,12 @@ Beginning tests for /home/johndoe/ra-dev/foobar... /home/johndoe/ra-dev/foobar passed all tests -- +If the resource agent exhibits some difficult to grasp behaviour, +which is typically the case with just developed software, there +are +-v+ and +-d+ options to dump more output
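Putting the pieces of this thread together: tracing ended up being requested per operation via the `trace_ra` attribute, and a UI helper (`crm resource trace`) was proposed in the discussion. A usage sketch, with a hypothetical resource name:

```
# enable RA tracing for the start operation of resource r0
crm configure primitive r0 ocf:heartbeat:Dummy \
    op start trace_ra=1

# or, via the UI helper proposed in this thread:
#   crm resource trace r0 start
```

The trace output lands in a well-known location that hb_report/the history explorer can then collect, as discussed above.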
Re: [Linux-ha-dev] Re: New patch for resource-agents (nfsserver)
On Tue, Jan 22, 2013 at 08:21:34PM -0700, John Shi wrote: Hi Dejan, Is this patch OK? I didn't see the patch going upstream yet. Thanks! Applied now. Can you please also test the attached patch. It is fairly small and low impact, but still. Cheers, Dejan Best regards, John Dejan Muhamedagic de...@suse.de 2012-12-24 17:57 Hi John, On Wed, Dec 19, 2012 at 11:18:41PM -0700, John Shi wrote: Hi all, Fixes everything that causes an error when calling rpc.statd, including: Specify that the value of nfs_notify_cmd should be either sm-notify or rpc.statd in meta_data. Rename the parameter nfs_notify_retry_time to nfs_smnotify_retry_time, because there is no retrytime option for rpc.statd. The parameter nfs_notify_foreground now produces the correct option for rpc.statd. Looks good to me. Cheers, Dejan Best regards, John ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ From 186919824523c362d63b4ac176ce91511b91c99d Mon Sep 17 00:00:00 2001 From: Dejan Muhamedagic de...@suse.de Date: Wed, 23 Jan 2013 17:10:11 +0100 Subject: [PATCH] Low: nfsserver: move configuration checks to the validation phase --- heartbeat/nfsserver | 23 +++ 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/heartbeat/nfsserver b/heartbeat/nfsserver index 003fcc6..09136d7 100755 --- a/heartbeat/nfsserver +++ b/heartbeat/nfsserver @@ -254,10 +254,6 @@ nfsserver_start () fi if [ -n "$OCF_RESKEY_nfs_smnotify_retry_time" ]; then - if !
ocf_is_decimal $OCF_RESKEY_nfs_smnotify_retry_time; then - ocf_log err "Invalid OCF_RESKEY_nfs_smnotify_retry_time [$OCF_RESKEY_nfs_smnotify_retry_time]" - return $OCF_ERR_CONFIGURED - fi opts="$opts -m $OCF_RESKEY_nfs_smnotify_retry_time" fi @@ -271,10 +267,6 @@ nfsserver_start () opts="$opts -n" ;; - *) - ocf_log err "Invalid OCF_RESKEY_nfs_notify_cmd [$OCF_RESKEY_nfs_notify_cmd]" - return $OCF_ERR_CONFIGURED - ;; esac rm -rf /var/lib/nfs/sm.ha.save >/dev/null 2>&1 @@ -324,6 +316,21 @@ nfsserver_validate () exit $OCF_ERR_CONFIGURED fi + if [ -n "$OCF_RESKEY_nfs_smnotify_retry_time" ]; then + if ! ocf_is_decimal $OCF_RESKEY_nfs_smnotify_retry_time; then + ocf_log err "Invalid nfs_smnotify_retry_time [$OCF_RESKEY_nfs_smnotify_retry_time]" + exit $OCF_ERR_CONFIGURED + fi + fi + + case ${OCF_RESKEY_nfs_notify_cmd##*/} in + sm-notify|rpc.statd) ;; + *) + ocf_log err "Invalid nfs_notify_cmd [$OCF_RESKEY_nfs_notify_cmd]" + exit $OCF_ERR_CONFIGURED + ;; + esac + return $OCF_SUCCESS } -- 1.8.0 ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
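[Editorial note] The validation hunk above keys on `${OCF_RESKEY_nfs_notify_cmd##*/}`, i.e. the notify command with any leading path stripped, so both `sm-notify` and `/sbin/sm-notify` pass the check. A minimal, standalone sketch of that pattern (the function name `is_valid_notify_cmd` is illustrative, not from the agent):

```shell
# is_valid_notify_cmd mirrors the case pattern from the hunk above.
is_valid_notify_cmd() {
    # ${1##*/} strips everything up to the last '/', leaving the command name
    case "${1##*/}" in
        sm-notify|rpc.statd) return 0 ;;
        *) return 1 ;;
    esac
}
```

The same parameter expansion is what `basename` does, but without forking a process, which matters in hot paths of an agent.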
Re: [Linux-ha-dev] Reply: Re: New patch for resource-agents(nfsserver)
Hi John, On Tue, Jan 22, 2013 at 08:21:34PM -0700, John Shi wrote: Hi Dejan, Is this patch OK? I didn't see the patch going upstream yet. Thanks! Oh, yes, I think so. Sorry, I missed it somehow. Will take care of it today. Cheers, Dejan Best regards, John Dejan Muhamedagic de...@suse.de 2012-12-24 17:57 Hi John, On Wed, Dec 19, 2012 at 11:18:41PM -0700, John Shi wrote: Hi all, Fixes everything that causes an error when calling rpc.statd, including: Specify that the value of nfs_notify_cmd should be either sm-notify or rpc.statd in meta_data. Rename the parameter nfs_notify_retry_time to nfs_smnotify_retry_time, because there is no retrytime option for rpc.statd. The parameter nfs_notify_foreground now produces the correct option for rpc.statd. Looks good to me. Cheers, Dejan Best regards, John ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] ocft updated (test tool of resource-agents)
Hi John, On Sun, Jan 20, 2013 at 09:23:05AM -0700, John Shi wrote: Medium: tools: ocft: update to version 0.44 Added an incremental mode (ocft test -i), which caches results so that cases which already succeeded are not rerun. Improved *ocft make*: ocft will now only rebuild the updated test-case file. ocft test prints the actual case names instead of just case numbers. Can you please split the patch into several self-contained patches, each of which fixes a single issue? Great work! Thanks, Dejan ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH] NMI Crashdump with external/ipmi
Hi, On Mon, Jan 14, 2013 at 11:12:11PM +0100, Tobias D. Oestreicher wrote: Hi all, I've written a small patch for external/ipmi, so it's possible to configure it not to reset a node, but to trigger a crashdump via NMI. If a node becomes unavailable for whatever reason it will be fenced, but this makes investigating the root cause of the node's unavailability very difficult; if you have a crashdump you can reconstruct the root cause. For this I added 3 new options: crashdump - set this to true to enable crashdump. sshcheck - if this is true, an ssh connection will be established to either $sshipaddr or, if that is not set, $hostname will be used as the remote address. sshipaddr - in case ssh is listening on another interface, where DNS doesn't match $hostname. ssh is used only in case sshcheck is set to true? If so, then that should be mentioned in the description of sshipaddr. Further, the sshcheck parameter should come first (i.e. exchange place with sshipaddr). The test is Linux specific; that should also be noted in the parameter description. Please see below for notes on code. Maybe it could be useful for others too. I would be glad for any comments or suggestions. Tobias D. Oestreicher -- Tobias D. Oestreicher Linux Consultant Trainer Tel.: +49-160-5329935 Mail: oestreic...@b1-systems.de B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt, HRB 3537 diff -r da5832ae23dd lib/plugins/stonith/external/ipmi --- a/lib/plugins/stonith/external/ipmi Sun Dec 23 16:05:11 2012 +0100 +++ b/lib/plugins/stonith/external/ipmi Mon Jan 14 22:01:57 2013 +0100 @@ -36,7 +36,11 @@ POWEROFF="power off" POWERON="power on" STATUS="power status" +CRASHDUMP="chassis power diag" + IPMITOOL=${ipmitool:-`which ipmitool 2>/dev/null`} +SYSCTL=`which sysctl 2>/dev/null` Normally, sysctl is in the PATH. As well as ssh (SSH_BIN below). +SSH_OPTS="-q -o PasswordAuthentication=no -o StrictHostKeyChecking=no" Add -l root, just in case?
have_ipmi() { test -x ${IPMITOOL} @@ -138,7 +142,11 @@ ;; reset) if ipmi_is_power_on; then - do_ipmi ${RESET} + if [ ${crashdump} == true ]; then + do_ipmi ${CRASHDUMP} + else + do_ipmi ${RESET} + fi else do_ipmi ${POWERON} fi @@ -149,11 +157,40 @@ # the managed node. Hence, only check if we can contact the # IPMI device with power status command, don't pay attention # to whether the node is in fact powered on or off. + if [ ${crashdump} == true ]; then + if [ ${sshcheck} == true ]; then This should go to a separate function, sth like check_crashdump_eligibility or check_crashdump_setup. Then you can do: if [ ${crashdump} == true -a ${sshcheck} == true ]; then check_crashdump_setup || exit fi Otherwise, it'd be hard to get the meaning of this big chunk of code. + if [ -z ${hostname} -a -z ${sshipaddr} ]; then + ha_log.sh err Neigther hostname nor sshipaddr is set, crashdump testing not possible + ha_log.sh err Neither ... + elif [ -z ${sshipaddr} ]; then + REMOTESSHHOST=${hostname} + else + REMOTESSHHOST=${sshipaddr} + fi + SSH_BIN=`which ssh 2>/dev/null` + SSH_COMMAND="${SSH_BIN} ${REMOTESSHHOST} ${SSH_OPTS}" + remote_crashdump_state=`${SSH_COMMAND} grep -c crashkernel /proc/cmdline; ${SYSCTL} -n kernel.unknown_nmi_panic kernel.panic_on_unrecovered_nmi` What if crashkernel is set to nothing? Would crash dump work then too? + if [ $? -ne 0 ]; then + ha_log.sh err "Not possible to connect via ssh to ${REMOTESSHHOST}" + exit 1 + fi + unknown_nmi=`echo ${remote_crashdump_state}|awk '{print $2}'` + unrecovered_nmi=`echo ${remote_crashdump_state}|awk '{print $3}'` + crashdump_kernel_option=`echo ${remote_crashdump_state}|awk '{print $1}'` + if [ ${crashdump_kernel_option} -ne 1 ]; then + ha_log.sh err "Crashdump seems not to be configured on host ${REMOTESSHHOST}" + exit 1 + fi + if [ ${unknown_nmi} -eq 0 -o ${unrecovered_nmi} -eq 0 ]; then +ha_log.sh err Non Maskerable Interupts do not trigger a reset.
Set "kernel.unknown_nmi_panic" and "kernel.panic_on_unrecovered_nmi" to "1" Replace Non
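[Editorial note] Dejan's suggestion above, pulling the big status-time block into something like check_crashdump_setup, could look roughly like the sketch below. This is only an illustration: it takes the kernel command line and the two sysctl values as function arguments instead of fetching them over ssh, and the function name follows the proposal in the review, not any shipped code.

```shell
# Hypothetical helper: verify that the target node is set up so that an
# NMI actually produces a crash dump. Arguments stand in for data the
# real plugin would collect remotely via ssh.
check_crashdump_setup() {
    cmdline=$1          # contents of /proc/cmdline
    unknown_nmi=$2      # sysctl kernel.unknown_nmi_panic
    unrecovered_nmi=$3  # sysctl kernel.panic_on_unrecovered_nmi
    case "$cmdline" in
        *crashkernel=*) : ;;
        *) echo "crashdump kernel option not configured" >&2; return 1 ;;
    esac
    if [ "$unknown_nmi" -eq 0 ] || [ "$unrecovered_nmi" -eq 0 ]; then
        echo "NMIs would not trigger a panic" >&2
        return 1
    fi
    return 0
}
```

With such a helper, the status branch reduces to a single guarded call, which addresses the readability concern raised in the review.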
[Linux-ha-dev] crmsh release v1.2.4
Hello, With a bit of delay, here's the announcement for crmsh v1.2.4. *** The CRM shell v1.2.4 is released. The highlights of the release: * history: fine tuning and several regression fixes * more pacemaker 1.1.8 compatibility code For the full set of changes, take a look at the changelog: http://hg.savannah.gnu.org/hgweb/crmsh/file/crmsh-1.2.4/ChangeLog == Note about Pacemaker versions == CRM shell 1.2.4 supports all Pacemaker 1.1 versions. == Installing == Installing the CRM shell along with Pacemaker 1.1 versions <= v1.1.7 is possible, but it will result in file conflicts. You need to enforce file overwriting when installing packages. Note that pacemaker v1.1.7 carries a crm shell version which is the same as in v1.1.6, or, put differently, quite outdated. There are several interesting new features, including history, which never made it into any pacemaker release. == Resources == Packages for several popular Linux distributions: http://download.opensuse.org/repositories/network:/ha-clustering/ The man page: http://crmsh.nongnu.org/crm.8.html The CRM shell project web page at GNU savannah: https://savannah.nongnu.org/projects/crmsh/ Support and bug reporting: http://lists.linux-ha.org/mailman/listinfo/linux-ha https://savannah.nongnu.org/bugs/?group=crmsh https://developerbugs.linuxfoundation.org/enter_bug.cgi?product=Pacemaker;component=Shell;version=1.2 The sources repository is available at: http://hg.savannah.gnu.org/hgweb/crmsh Enjoy! Dejan P.S. In case you wonder what happened to v1.2.2 and v1.2.3, well, let's just say I didn't like the numbers ;-) ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] agents: including LGPL license file
Hi Keisuke-san, On Thu, Dec 20, 2012 at 02:14:10PM +0900, Keisuke MORI wrote: Hi, The resource-agents package is licensed under the GPL and LGPL, but the full copy of the LGPL license file is missing, as opposed to the heartbeat and glue packages, which include it. Why don't we include COPYING.LGPL in the agents package too, as a verbatim copy of the LGPL license, for consistency? Not really an expert in the area, but I think there's no problem adding a copy of a license. Cheers, Dejan Thanks, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] New patch for resource-agents(nfsserver)
Hi John, On Wed, Dec 19, 2012 at 11:18:41PM -0700, John Shi wrote: Hi all, Fixes everything that causes an error when calling rpc.statd, including: Specify that the value of nfs_notify_cmd should be either sm-notify or rpc.statd in meta_data. Rename the parameter nfs_notify_retry_time to nfs_smnotify_retry_time, because there is no retrytime option for rpc.statd. The parameter nfs_notify_foreground now produces the correct option for rpc.statd. Looks good to me. Cheers, Dejan Best regards, John ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] IPsrcaddr bug, and fix recommendation
Hi, On Thu, Dec 20, 2012 at 08:03:32PM +0100, Attila Megyeri wrote: Hi, I have a cluster configuration with two IPsrcaddr resources (e.g. IP addresses A and B). They are configured to two different addresses, and are never supposed to run on the same nodes. So A can run on nodes N1 and N2, B can run on N3, N4. My problem is that in some cases crm_mon shows that an ipsrcaddr resource is running on a node where it shouldn't, and of course it is in unmanaged state and cannot be stopped. For instance: IP address A is started, unmanaged on node N3. I am using pacemaker 1.1.6 on a Debian system, with the latest RA from github. I checked the RA, and here are my findings. - When status is called, it calls the srca_read() function - srca_read() returns 2 if a srcip is running on the given node, but with a different IP address. - srca_status(), when it gets 2 from srca_read(), returns $OCF_ERR_GENERIC As a result, in my case IP B is running on N3, which is OK, but crm_mon reports that IP A is also running on N3 (unmanaged). [for some reason this is how the OCF_ERR_GENERIC is interpreted] This is definitely a bug, the question is whether in pacemaker or in the RA. If I change the script to return $OCF_NOT_RUNNING instead of $OCF_ERR_GENERIC it works properly. What is the proper behavior in this case? My recommendation is to fix the RA so that srca_read() returns 1 if there is a srcip on the node, but it is not the queried one. The comment in the agent says: # NOTES: # # 1) There must be one and not more than 1 default route! Mainly because # I can't see why you should have more than one. And if there is more # than one, we would have to box clever to find out which one is to be # modified, or we would have to pass its identity as an argument. # This should actually be in the meta-data, as it is obviously intended for users. It looks like your use case doesn't fit this description, right? Perhaps we could add a parameter like allow_multiple_default_routes.
Thanks, Dejan In this case the RA would return $OCF_NOT_RUNNING Cheers, Attila ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
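[Editorial note] The fix proposed in this thread, reporting "some other source IP is configured" as "not running" rather than as a generic error, can be sketched as a tiny mapping. The function name and the srca_read() return-code convention below follow the description in the mail, not the actual agent source; the OCF exit-code values (0 and 7) are the standard ones.

```shell
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Map srca_read()-style results onto OCF monitor exit codes, per the
# recommendation in the thread: 0 = our source IP is active, anything
# else (e.g. 2: a different srcip is set) = not running here.
srca_status_from_read() {
    case "$1" in
        0) return $OCF_SUCCESS ;;
        *) return $OCF_NOT_RUNNING ;;
    esac
}
```

Returning $OCF_NOT_RUNNING instead of $OCF_ERR_GENERIC matters because pacemaker treats a generic monitor error as a failed resource, which is exactly the unmanaged-ghost symptom Attila describes.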
Re: [Linux-ha-dev] Fwd: Reply: Re: A patch for resource-agents(nfsserver) (add attachment)
Hi John, On Fri, Dec 14, 2012 at 12:24:34AM -0700, John Shi wrote: John Shi 2012-12-14 15:20 Dejan Muhamedagic de...@suse.de 2012-12-13 17:33 This might be a bit better: # set default options local opts="-f -v" # add option for notify_retry_time, if set if [ -n "$OCF_RESKEY_nfs_notify_retry_time" ]; then if ! ocf_is_decimal $OCF_RESKEY_nfs_notify_retry_time; then ocf_log err "Invalid OCF_RESKEY_nfs_notify_retry_time [$OCF_RESKEY_nfs_notify_retry_time]" return $OCF_ERR_CONFIGURED fi opts="$opts -m $OCF_RESKEY_nfs_notify_retry_time" fi # run in foreground, if requested if ocf_is_true $OCF_RESKEY_nfs_notify_foreground; then opts="$opts -d" fi What do you say? You are right, the default value of retry_time should be . A little adjustment, please see line 229 of nfsserver: ${OCF_RESKEY_nfs_notify_cmd} $opts $ip -P /var/lib/nfs/sm.ha $ip is the optarg of -v, so the code may be: oops, missed that. Well spotted! Patches applied. Many thanks for the contribution! Cheers, Dejan # set default options local opts="-f -v" # add option for notify_retry_time, if set if [ -n "$OCF_RESKEY_nfs_notify_retry_time" ]; then if ! ocf_is_decimal $OCF_RESKEY_nfs_notify_retry_time; then ocf_log err "Invalid OCF_RESKEY_nfs_notify_retry_time [$OCF_RESKEY_nfs_notify_retry_time]" return $OCF_ERR_CONFIGURED fi opts="-m $OCF_RESKEY_nfs_notify_retry_time $opts" fi # run in foreground, if requested if ocf_is_true $OCF_RESKEY_nfs_notify_foreground; then opts="-d $opts" fi Best regards, John ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
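[Editorial note] The final pattern from this exchange, prepending new options so that -v stays immediately before its $ip argument, can be tried as a standalone helper. build_sm_notify_opts is a hypothetical name, and ocf_is_decimal is inlined as a stand-in for the ocf-shellfuncs version:

```shell
# Stand-in for ocf-shellfuncs' ocf_is_decimal: non-empty, digits only.
ocf_is_decimal() {
    case "$1" in ''|*[!0-9]*) return 1 ;; *) return 0 ;; esac
}

# Build the sm-notify option string. New options are PREPENDED because
# the caller runs "${cmd} $opts $ip ...": -v must stay last so that $ip
# ends up as its argument (John's point about line 229).
build_sm_notify_opts() {
    retry_time=$1
    foreground=$2
    opts="-f -v"
    if [ -n "$retry_time" ]; then
        ocf_is_decimal "$retry_time" || return 1
        opts="-m $retry_time $opts"
    fi
    if [ "$foreground" = "true" ]; then
        opts="-d $opts"
    fi
    echo "$opts"
}
```

For example, a retry time of 10 with foreground enabled yields "-d -m 10 -f -v", keeping -v adjacent to the IP that follows.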
Re: [Linux-ha-dev] A patch for resource-agents(nfsserver)
Hi John, On Thu, Dec 13, 2012 at 12:08:56AM -0700, John Shi wrote: Hi Dejan, I have a patch for the nfsserver agent in the resource-agents package. sm-notify was invoked in nfsserver; with the -d option sm-notify doesn't fork, and without a time limit it tries to reach each client for 15 minutes, spending this time in the startup procedure of the resource, so this option was removed as a fix in git commit ae83f2befdafdfae633afa1553f5d7c4f72d0196 I think it should be an option to set or unset the -d, and we can improve this even better: we could use the option -m to set the retry time. OK, I guess that there could be use cases where the failover would wait until all clients get notified. The patch adds 2 parameters for the nfsserver agent: OCF_RESKEY_nfs_notify_foreground: type: boolean, default 'false'; 'false' to unset -d, 'true' to set -d OCF_RESKEY_nfs_notify_retry_time: type: integer, default '15' Best regards, John From f98a19be89eff8cb3f46a8957c4d81be694a2921 Mon Sep 17 00:00:00 2001 From: John Shi j...@suse.com Date: Wed, 12 Dec 2012 15:50:39 +0800 Subject: [PATCH] Medium: nfsserver: make the retry time for sm-notify in the nfsserver resource agent configurable The two options are actually independent. So, they should be either put into two patches or the description modified to better reflect the change. --- heartbeat/nfsserver | 38 +- 1 file changed, 37 insertions(+), 1 deletion(-) diff --git a/heartbeat/nfsserver b/heartbeat/nfsserver index 6414e3a..974bff9 100755 --- a/heartbeat/nfsserver +++ b/heartbeat/nfsserver @@ -14,6 +14,8 @@ fi DEFAULT_INIT_SCRIPT=/etc/init.d/nfsserver DEFAULT_NOTIFY_CMD=/sbin/sm-notify +DEFAULT_NOTIFY_FOREGROUND=false +DEFAULT_NOTIFY_RETRY_TIME=15 DEFAULT_RPCPIPEFS_DIR=/var/lib/nfs/rpc_pipefs nfsserver_meta_data() { @@ -55,6 +57,28 @@ The tool to send out notification.
<content type="string" default="$DEFAULT_NOTIFY_CMD" /> </parameter> +<parameter name="nfs_notify_foreground" unique="0" required="0"> +<longdesc lang="en"> +Keeps sm-notify attached to its controlling terminal and running in the foreground. +</longdesc> +<shortdesc lang="en"> +Keeps sm-notify running in the foreground. +</shortdesc> +<content type="boolean" default="$DEFAULT_NOTIFY_FOREGROUND" /> +</parameter> + +<parameter name="nfs_notify_retry_time" unique="0" required="0"> +<longdesc lang="en"> +Specifies the length of sm-notify retry time, in minutes, to continue retrying notifications to unresponsive hosts. +If this option is not specified, sm-notify attempts to send notifications for 15 minutes. Specifying a value of 0 +causes sm-notify to continue sending notifications to unresponsive peers until it is manually killed. +</longdesc> +<shortdesc lang="en"> +Specifies the length of sm-notify retry time (minutes). +</shortdesc> +<content type="integer" default="$DEFAULT_NOTIFY_RETRY_TIME" /> +</parameter> + <parameter name="nfs_shared_infodir" unique="0" required="1"> <longdesc lang="en"> The nfsserver resource agent will save nfs related information in this specific directory. @@ -129,6 +153,8 @@ esac fp=$OCF_RESKEY_nfs_shared_infodir : ${OCF_RESKEY_nfs_init_script=$DEFAULT_INIT_SCRIPT} : ${OCF_RESKEY_nfs_notify_cmd=$DEFAULT_NOTIFY_CMD} +: ${OCF_RESKEY_nfs_notify_foreground=$DEFAULT_NOTIFY_FOREGROUND} +: ${OCF_RESKEY_nfs_notify_retry_time=$DEFAULT_NOTIFY_RETRY_TIME} if [ -z "${OCF_RESKEY_rpcpipefs_dir}" ]; then rpcpipefs_make_dir=$fp/rpc_pipefs @@ -220,7 +246,17 @@ nfsserver_start () #Notify the nfs server has been moved or rebooted #The init script does that already, but with the hostname, which may be ignored by the client #we have to do it again with the nfs_ip - local opts="-f -v" + local opts + + if !
ocf_is_decimal $OCF_RESKEY_nfs_notify_retry_time; then + ocf_log err "Invalid OCF_RESKEY_nfs_notify_retry_time [$OCF_RESKEY_nfs_notify_retry_time]" + return $OCF_ERR_CONFIGURED + fi + if ocf_is_true $OCF_RESKEY_nfs_notify_foreground; then + opts="-d" + fi + opts="$opts -m $OCF_RESKEY_nfs_notify_retry_time -f -v" This might be a bit better: # set default options local opts="-f -v" # add option for notify_retry_time, if set if [ -n "$OCF_RESKEY_nfs_notify_retry_time" ]; then if ! ocf_is_decimal $OCF_RESKEY_nfs_notify_retry_time; then ocf_log err "Invalid OCF_RESKEY_nfs_notify_retry_time [$OCF_RESKEY_nfs_notify_retry_time]" return $OCF_ERR_CONFIGURED fi opts="$opts -m $OCF_RESKEY_nfs_notify_retry_time" fi # run in foreground, if requested if ocf_is_true $OCF_RESKEY_nfs_notify_foreground; then opts="$opts -d" fi What do you say? Cheers, Dejan ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
Re: [Linux-ha-dev] RA trace facility
On Wed, Nov 21, 2012 at 07:06:35PM +0100, Lars Marowsky-Bree wrote: On 2012-11-21T18:02:49, Dejan Muhamedagic de...@suse.de wrote: What would you think of OCF_RESKEY_RA_TRACE ? A meta attribute perhaps? That wouldn't cause a resource restart. Point, but - meta attributes so far were mostly for the PE/pacemaker, this would be for the RA. Not exactly for the RA itself. The RA execution would just be observed. The attribute is consumed by others. Whether it is PE or lrmd or something else makes less of a difference. It is up to these subsystems to sort the meta attributes out. Would a changed definition for a resource we're trying to trace be an actual problem? I mean, tracing clearly means you want to trace a resource action, so one would put the attribute on the resource before triggering that. (It can also be put on in maintenance mode, avoiding the restart.) Our include script could enable that; it's unlikely that the problem occurs prior to that. - never (default): Does nothing - always: Always trace, write to $(which path?)/raname.rscid.$timestamp bash has a way to send trace to a separate FD, but that feature is only available with version >= 4.x. Otherwise, it could be messy to separate the trace from the other stderr output. Of course, one could just redirect stderr in this case. I suppose that that would work too. I assume that'd be easiest. (And people not using bash can write their own implementation for this. ;-) - on-error: always trace, but delete on successful exit Good idea. hb_report/history explorer could gather this too. Right. (And yes I know this introduces a fake parameter that doesn't really exist. But it'd be so helpful.) Sorry. Maybe I'm getting carried away ;-) Good points. I didn't really think much (yet) about how to further facilitate the feature, just had a vague idea that somehow lrmd should set the environment variable. Sure. LRM is another obvious entry point for increased tracing/logging. That could also work.
Perhaps we could do something like this: # crm resource trace rsc_id [action] [when-to-trace] This would set the appropriate meta attribute for the resource which would trickle down to the RA. ocf-shellfuncs would then do whatever's necessary to set up the trace. The file management could get tricky though, as we don't have a single point of exit (and trap is already used elsewhere). The file/log management would be easier to do in the LRM - and also handle the timeout situation; that could also make use of the redirect trace elsewhere if the shell is new enough. Indeed. Until then, ocf-shellfuncs can fall back to some well-known location. Thanks, Dejan Regards, Lars -- Architect Storage/HA SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg) Experience is the name everyone gives to their mistakes. -- Oscar Wilde ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
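[Editorial note] The per-resource trace-file handling discussed in this thread might look like the following sketch. OCF_TRACE_DIR, the fallback path, and the file-name layout are assumptions for illustration; only the general idea (a well-known location, one file per agent/resource/run) comes from the mail:

```shell
# Fallback trace directory, overridable via an (assumed) OCF_TRACE_DIR.
ocf_default_trace_dir() {
    echo "${OCF_TRACE_DIR:-/var/lib/heartbeat/trace_ra}"
}

# One trace file per invocation: <dir>/<agent>.<resource instance>.<epoch>
ocf_trace_file() {
    echo "$(ocf_default_trace_dir)/${0##*/}.${OCF_RESOURCE_INSTANCE:-unknown}.$(date +%s)"
}
```

Embedding the resource instance and a timestamp in the name is what makes the "on-error: delete on successful exit" policy workable, since each run can find and remove exactly its own file.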
Re: [Linux-ha-dev] RA trace facility
Hi Keisuke-san, On Thu, Nov 22, 2012 at 06:27:59PM +0900, Keisuke MORI wrote: Hi, 2012/11/22 Dejan Muhamedagic de...@suse.de: Hi Lars, On Wed, Nov 21, 2012 at 04:43:08PM +0100, Lars Marowsky-Bree wrote: On 2012-11-21T16:33:18, Dejan Muhamedagic de...@suse.de wrote: Hi, This is a little something which could help while debugging resource agents. Setting the environment variable __OCF_TRACE_RA would cause the resource agent run to be traced (as in set -x). PS4 is set accordingly (that's a bash feature, don't know if other shells support it). ocf-tester got an option (-X) to turn the feature on. The agent itself can also turn on/off tracing via ocf_start_trace/ocf_stop_trace. Do you find anything amiss? I *really* like this. But I'd like a different way to turn it on - a standard one that is available via the CIB configuration, without modifying the script. I don't really want the script to get modified either. The above instructions are for people developing a new RA. I like this, too. It would be useful when you need to diagnose in the production environment if you can enable/disable it without any modifications to RAs. Of course. It might also be helpful if it has a kind of 'hook' functionality that allows you to execute an arbitrary script for collecting runtime information such as CPU usage, memory status, I/O status or the list of running processes etc. for diagnosis. Yes. I guess that one could run such a hook in the background. Did you mean that? Or once the RA instance exited? This is a bit of a different feature though. Thanks, Dejan -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] RA trace facility
Hi, This is a little something which could help while debugging resource agents. Setting the environment variable __OCF_TRACE_RA would cause the resource agent run to be traced (as in set -x). PS4 is set accordingly (that's a bash feature, don't know if other shells support it). ocf-tester got an option (-X) to turn the feature on. The agent itself can also turn on/off tracing via ocf_start_trace/ocf_stop_trace. Do you find anything amiss? Thanks, Dejan commit 77b3c997289a097fecab13179c4a2364bc34f15a Author: Dejan Muhamedagic de...@suse.de Date: Wed Nov 21 16:24:20 2012 +0100 Dev: add RA trace capability diff --git a/doc/dev-guides/ra-dev-guide.txt b/doc/dev-guides/ra-dev-guide.txt index af5e3b1..11e9a5d 100644 --- a/doc/dev-guides/ra-dev-guide.txt +++ b/doc/dev-guides/ra-dev-guide.txt @@ -1623,6 +1623,12 @@ Beginning tests for /home/johndoe/ra-dev/foobar... /home/johndoe/ra-dev/foobar passed all tests -- +If the resource agent exhibits some difficult to grasp behaviour, +which is typically the case with just developed software, there +are +-v+ and +-d+ options to dump more output. If that does not +help, instruct +ocf-tester+ to trace the resource agent with ++-X+ (make sure to redirect output to a file, unless you are a +really fast reader).
=== Testing with +ocft+ diff --git a/heartbeat/ocf-shellfuncs.in b/heartbeat/ocf-shellfuncs.in index f3822b7..04e4ecb 100644 --- a/heartbeat/ocf-shellfuncs.in +++ b/heartbeat/ocf-shellfuncs.in @@ -675,4 +675,14 @@ ocf_stop_processes() { return 1 } +ocf_start_trace() { + PS4='+ `date +%T`: ${FUNCNAME[0]:+${FUNCNAME[0]}:}${LINENO}: ' + set -x +} +ocf_stop_trace() { + set +x +} + __ocf_set_defaults "$@" + +[ "$__OCF_TRACE_RA" ] && ocf_start_trace diff --git a/tools/ocf-tester.8 b/tools/ocf-tester.8 index ba07058..850ec0b 100644 --- a/tools/ocf-tester.8 +++ b/tools/ocf-tester.8 @@ -3,7 +3,7 @@ ocf-tester \- Part of the Linux-HA project .SH SYNOPSIS .B ocf-tester -[\fI-Lh\fR] \fI-n resource_name \fR[\fI-o name=value\fR]\fI* /full/path/to/resource/agent\fR +[\fI-LhvqdX\fR] \fI-n resource_name \fR[\fI-o name=value\fR]\fI* /full/path/to/resource/agent\fR .SH DESCRIPTION Tool for testing if a cluster resource is OCF compliant .SH OPTIONS @@ -43,6 +43,9 @@ Be quiet while testing \fB\-d\fR Turn on RA debugging .TP +\fB\-X\fR +Turn on RA tracing (expect large output) +.TP \fB\-n\fR name Name of the resource .TP diff --git a/tools/ocf-tester.in b/tools/ocf-tester.in index 214e25c..2eaf220 100755 --- a/tools/ocf-tester.in +++ b/tools/ocf-tester.in @@ -51,13 +51,14 @@ usage() { echo "Tool for testing if a cluster resource is OCF compliant" echo -echo "Usage: ocf-tester [-Lh] -n resource_name [-o name=value]* /full/path/to/resource/agent" +echo "Usage: ocf-tester [-LhvqdX] -n resource_name [-o name=value]* /full/path/to/resource/agent" echo echo "Options:" echo " -h This text" echo " -v Be verbose while testing" echo " -q Be quiet while testing" echo " -d Turn on RA debugging" +echo " -X Turn on RA tracing (expect large output)" echo " -n name Name of the resource" echo " -o name=value Name and value of any parameters required by the agent" echo " -L Use lrmadmin/lrmd for tests" @@ -106,6 +107,7 @@ while test $done = 0; do -L) use_lrmd=1; shift;; -v) verbose=1; shift;; -d) export HA_debug=1; shift;; + -X) export
__OCF_TRACE_RA=1; verbose=1; shift;; -q) quiet=1; shift;; -?|--help) usage 0;; --version) echo @PACKAGE_VERSION@; exit 0;; ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
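[Editorial note] The core of the patch above, set -x plus a descriptive PS4, can be tried standalone. This simplified variant keeps only ${LINENO} so it also runs in plain sh; the actual patch additionally embeds the time and, under bash, ${FUNCNAME[0]}:

```shell
# run_traced stands in for an RA action being traced.
run_traced() {
    # single quotes: ${LINENO} is expanded when each trace line is printed
    PS4='+ line ${LINENO}: '
    set -x
    status="started"
    set +x
}

# The trace goes to stderr, so it can be captured or redirected
# separately from the agent's normal output:
trace=$( { run_traced; } 2>&1 )
```

This stderr/stdout separation is exactly why the thread discusses bash's separate trace FD (BASH_XTRACEFD, bash >= 4.x): without it, trace output and genuine stderr messages end up interleaved.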
Re: [Linux-ha-dev] RA trace facility
Hi Lars, On Wed, Nov 21, 2012 at 04:43:08PM +0100, Lars Marowsky-Bree wrote: On 2012-11-21T16:33:18, Dejan Muhamedagic de...@suse.de wrote: Hi, This is a little something which could help while debugging resource agents. Setting the environment variable __OCF_TRACE_RA would cause the resource agent run to be traced (as in set -x). PS4 is set accordingly (that's a bash feature, don't know if other shells support it). ocf-tester got an option (-X) to turn the feature on. The agent itself can also turn on/off tracing via ocf_start_trace/ocf_stop_trace. Do you find anything amiss? I *really* like this. But I'd like a different way to turn it on - a standard one that is available via the CIB configuration, without modifying the script. I don't really want the script to get modified either. The above instructions are for people developing a new RA. What would you think of OCF_RESKEY_RA_TRACE ? A meta attribute perhaps? That wouldn't cause a resource restart. Our include script could enable that; it's unlikely that the problem occurs prior to that. - never (default): Does nothing - always: Always trace, write to $(which path?)/raname.rscid.$timestamp bash has a way to send trace to a separate FD, but that feature is only available with version >= 4.x. Otherwise, it could be messy to separate the trace from the other stderr output. Of course, one could just redirect stderr in this case. I suppose that that would work too. - on-error: always trace, but delete on successful exit Good idea. hb_report/history explorer could gather this too. Right. (And yes I know this introduces a fake parameter that doesn't really exist. But it'd be so helpful.) Sorry. Maybe I'm getting carried away ;-) Good points. I didn't really think much (yet) about how to further facilitate the feature, just had a vague idea that somehow lrmd should set the environment variable.
Perhaps we could do something like this: # crm resource trace rsc_id [action] [when-to-trace] This would set the appropriate meta attribute for the resource which would trickle down to the RA. ocf-shellfuncs would then do whatever's necessary to set up the trace. The file management could get tricky though, as we don't have a single point of exit (and trap is already used elsewhere). Cheers, Dejan Regards, Lars -- Architect Storage/HA SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg) Experience is the name everyone gives to their mistakes. -- Oscar Wilde ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Patch: stonith plugin external/vcenter HOSTLIST separator
Hi, On Thu, Nov 15, 2012 at 11:52:48AM +0100, Stefan Botter wrote: Hi all, recently I tried to use the STONITH plugin external/vcenter along with vCenter 5 (I doubt that the version is significant). While using the stonith resource for each node separately, I had no problems, but using it in a clone resulted in failures like this one: Nov 14 08:53:57 shermcl1 external/vcenter(vfencing:0)[23236]: [23257]: ERROR: [reset shermcl2] Invalid target specified where the cluster consists of virtual machines SHERMCL1, SHERMCL2 and SHERMCL3, with their unames shermcl1, shermcl2 and shermcl3, respectively. shermcl2 should be fenced, but the remaining cluster members were unable to kill that machine. The relevant portion of the cluster configuration is here: node shermcl1 node shermcl2 node shermcl3 primitive vfencing stonith:external/vcenter \ params VI_SERVER=virtualcenter.dom.ain VI_CREDSTORE=/root/.vmware/credstore/vicredentials.xml HOSTLIST=shermcl1=SHERMCL1;shermcl2=SHERMCL2;shermcl3=SHERMCL3 RESETPOWERON=0 \ op monitor interval=3600s clone Fencing vfencing It could be that the issue comes from the bug in fence_legacy, which has been resolved in the meantime. Can you try to edit that and replace the split command (line 86) with the following (i.e. just append ", 2"): ($name,$val)=split /\s*=\s*/, $opt, 2; The file location should be /usr/sbin/fence_legacy. Can you please see if that helps? Cheers, Dejan location l-Fencing_shermcl1 Fencing 0: shermcl1 location l-Fencing_shermcl2 Fencing 0: shermcl2 location l-Fencing_shermcl3 Fencing 0: shermcl3 The location statements are needed, as the cluster itself is not symmetric. All machines are plain openSUSE 12.2 with corosync 1.4.3 and pacemaker 1.1.6. 
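The ", 2" limit Dejan suggests matters because the HOSTLIST value itself contains "=" characters: an unlimited split on "=" would cut the value short at the next delimiter. A small shell illustration of the same first-"="-only split (the actual fence_legacy fix is in Perl):

```shell
# A parameter line as fence_legacy receives it; the value itself
# contains '=' characters.
opt='HOSTLIST=shermcl1=SHERMCL1;shermcl2=SHERMCL2;shermcl3=SHERMCL3'

# Split on the FIRST '=' only -- the shell analogue of Perl's
#   split /\s*=\s*/, $opt, 2
name=${opt%%=*}   # text before the first '='
val=${opt#*=}     # everything after the first '=', inner '=' signs intact

echo "$name"      # HOSTLIST
echo "$val"       # shermcl1=SHERMCL1;shermcl2=SHERMCL2;shermcl3=SHERMCL3
```

Without the limit, Perl's split would have returned ("HOSTLIST", "shermcl1", "SHERMCL1;shermcl2", ...), so only the part up to the second "=" would reach the plugin.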
While running perfectly on the commandline with stonith -t external/vcenter VI_SERVER=virtualcenter.dom.ain \ VI_CREDSTORE=/root/.vmware/credstore/vicredentials.xml \ HOSTLIST=shermcl1=SHERMCL1;shermcl2=SHERMCL2;shermcl3=SHERMCL3 \ RESETPOWERON=0 -l and showing the names of the three virtual machines, I found, that called as resource inside the cluster only the first hostname until the first = is visible, perhaps caused by the handover as environment variable. Applying the attached trivial patch to use a colon (:) instead of the equal sign (=) the command line test stonith -t external/vcenter VI_SERVER=virtualcenter.dom.ain \ VI_CREDSTORE=/root/.vmware/credstore/vicredentials.xml \ HOSTLIST=shermcl1:SHERMCL1;shermcl2:SHERMCL2;shermcl3:SHERMCL3 \ RESETPOWERON=0 -l as well as fencing inside the cluster with primitive vfencing stonith:external/vcenter \ params VI_SERVER=virtualcenter.dom.ain VI_CREDSTORE=/root/.vmware/credstore/vicredentials.xml HOSTLIST=shermcl1:SHERMCL1;shermcl2:SHERMCL2;shermcl3:SHERMCL3 RESETPOWERON=0 \ op monitor interval=3600s succeeds. So a question around: Is anyone using the external/vcenter with the cloned resource successfully with the original syntax? If so, where is my problem? If not, the attached patch changes the syntax in the above described way. If there is no objection can it be applied? 
Greetings, Stefan PS: sorry for the line breaks in the code -- Stefan Botter listrea...@jsj.dyndns.org

# HG changeset patch
# User Stefan Botter j...@jsj.dyndns.org
# Date 1352974761 -3600
# Node ID 3429be9596a95127e04706c38c5c4d82fb67e206
# Parent 0809ed6abeb7289f3a8f4229f537df8d509c0854
- trivial change to use : as hostname delimiter in HOSTLIST instead of =

diff -r 0809ed6abeb7 -r 3429be9596a9 lib/plugins/stonith/external/vcenter
--- a/lib/plugins/stonith/external/vcenter  Mon Oct 22 17:35:17 2012 +0200
+++ b/lib/plugins/stonith/external/vcenter  Thu Nov 15 11:19:21 2012 +0100
@@ -55,12 +55,12 @@
 <longdesc lang="en">
 The list of hosts that the VMware vCenter STONITH device controls.
 Syntax is:
-    hostname1[=VirtualMachineName1];hostname2[=VirtualMachineName2]
+    hostname1[:VirtualMachineName1];hostname2[:VirtualMachineName2]
-NOTE: omit =VirtualMachineName if hostname and virtual machine names are identical
+NOTE: omit :VirtualMachineName if hostname and virtual machine names are identical
 Example:
-    cluster1=VMCL1;cluster2=VMCL2
+    cluster1:VMCL1;cluster2:VMCL2
 </longdesc>
 </parameter>
 <parameter name="VI_SERVER">
@@ -128,7 +128,7 @@
 my %host_to_vm = ();
 my %vm_to_host = ();
 foreach my $host (@hostlist) {
-    my @config = split(/=/, $host);
+    my @config = split(/:/, $host);
     my $key = $config[0];
     my $value = $config[1];
     if (!defined($value)) { $value = $config[0]; }
     $host_to_vm{$key} = $value;
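The patched plugin parses each hostname[:vmname] entry and falls back to the hostname when no VM name is given. Sketched here in shell (the plugin itself is Perl, and the helper name below is invented for illustration):

```shell
# Map one "hostname[:vmname]" HOSTLIST entry to its VM name,
# defaulting to the hostname itself when no ':' mapping is given --
# the same behaviour as the patched Perl split(/:/, $host).
vm_for_host() {
    case $1 in
        *:*) echo "${1#*:}" ;;   # text after the first ':'
        *)   echo "$1" ;;        # no mapping: VM name equals hostname
    esac
}

vm_for_host cluster1:VMCL1   # -> VMCL1
vm_for_host cluster2         # -> cluster2
```

The point of the patch is that ":" does not collide with the "name=value" splitting that parameters go through on their way into the plugin, whereas "=" did.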
[Linux-ha-dev] announcement: resource-agents release candidate 3.9.4rc1
Hello, The current resource-agents repository has been tagged v3.9.4rc1. It is mainly a bug fix release. The full list of changes for the linux-ha RA set is available in ChangeLog. We'll allow a week for testing the agents. The final release is planned for Nov 20. Many thanks to all contributors! Best, The resource-agents maintainers ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] planning resource agents v3.9.4
Hello, A couple of milestones were created for the resource-agents project at github.com yesterday. Since most of the activity is happening at github, it seemed like the most logical place to use for the release planning. This is the tentative schedule: 3.9.4-rc: November 13. 3.9.4: November 20. If there's anything you think should be part of the resource-agents release, please open an issue, a pull request, or a bugzilla, as you see fit. If there's anything that hasn't received due attention, please let us know. Finally, if you can help with resolving issues, consider yourself invited to do so. Cheers, The resource-agents crowd ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [resource-agents] Low: pgsql: check existence of instance number in replication mode (#159)
On Fri, Oct 26, 2012 at 11:36:53AM +1100, Andrew Beekhof wrote: On Fri, Oct 26, 2012 at 12:52 AM, Dejan Muhamedagic de...@suse.de wrote: On Thu, Oct 25, 2012 at 06:09:38AM -0700, Lars Ellenberg wrote: On Thu, Oct 25, 2012 at 03:38:47AM -0700, Takatoshi MATSUO wrote: Usually, we use the crm_master command instead of crm_attribute to change the master score in an RA. But PostgreSQL's slave can't get its own replication status, so the Master changes the Slave's master-score using the instance number on Pacemaker 1.0.x. This is probably not ordinary usage. Would the existing resource agent work with globally-unique=true? I don't know whether it works with true. I use it with false and it doesn't need true. I suggested that you actually should use globally-unique clones, as in that case you still get those instance numbers... Does using different clones make sense in pgsql? What is to be different between them? Or would it be just for the sake of getting instance numbers? If so, then it somehow looks wrong to me :) But thinking about it once more, I'm not so sure anymore. Correct me where I'm wrong. This is about the master score. In case the Master instance fails, we preferably want to promote the slave instance that is as close as possible to the Master. We only know which *node* was best at the last monitoring interval, which may be good enough. We need to then change the master score for *all possible instances*, for all nodes, accordingly. Which is what that loop did. (I think skipping the current instance is actually a bug; if pacemaker relabels things in a bad way, you may hit it). Now, with pacemaker 1.1.8, all instances become equal (for anonymous clones, aka globally-unique=false), and we only need to set the score on the resource-id, not for all resource-id:instance combinations. OK. Which is great. 
After all, the master score in this case is attached to the node (or, the data set accessible from that node), and not to the (arbitrary, potentially relabeled anytime) instance number pacemaker assigned to the clone instance running on that node. And that is exactly what your patch does: * detect if a version of pacemaker is in use that attaches the instance number to the resource id * if so, do the loop on all possible instance numbers as before * if not, only set the master score on the resource-id Is my understanding correct? Then I think your patch is good. Yes, the patch seems good then. Though there is quite a bit of code repetition. The set-attribute part should be moved to an extra function. Still, other resource agents that use master scores (or any other attributes that reference instance numbers of anonymous clones) need to be reviewed. Though this "I'll set scores for other instances, not only myself" logic is unique to pgsql, so most other resource agents should just work with whatever is present in the environment; they typically treat the $OCF_RESOURCE_INSTANCE as opaque. Seems like no other RA uses instance numbers. However, quite a few use OCF_RESOURCE_INSTANCE which, in case of clone/ms resources, may potentially lead to unpredictable results on upgrade to 1.1.8. No. Otherwise all the regression tests would fail. The PE is smart enough to find promotion score and failcounts in either case. Cool. Also, OCF_RESOURCE_INSTANCE contains whatever the local lrmd knows the resource as, not what we call it internally to the PE. What I meant was that some RAs use OCF_RESOURCE_INSTANCE to name local files which keep some kind of state. If OCF_RESOURCE_INSTANCE changes on upgrade... Well, I guess that the worst that can happen is for the probe to fail. But I didn't take a closer look. 
Thanks, Dejan Thanks, Lars Cheers, Dejan ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
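The version-dependent behaviour discussed in this thread can be summarised as: pre-1.1.8 pacemaker attaches an instance number to anonymous clone instances, so the master score has to be set for every possible rsc:N, while from 1.1.8 on the plain resource id suffices. A hypothetical sketch of that decision (the function name is invented; crm_master stores the score in a node attribute named master-<rsc>):

```shell
# Enumerate the attribute names a pgsql-style RA would have to update,
# depending on whether pacemaker attaches ":N" instance numbers to
# anonymous clone instances (pre-1.1.8) or not (1.1.8+).
master_score_targets() {
    rsc=$1; clone_max=$2; has_instance_numbers=$3
    if [ "$has_instance_numbers" = yes ]; then
        i=0
        while [ "$i" -lt "$clone_max" ]; do
            echo "master-$rsc:$i"    # one attribute per possible instance
            i=$((i + 1))
        done
    else
        echo "master-$rsc"           # single attribute, instance-agnostic
    fi
}

master_score_targets pgsql 3 yes   # master-pgsql:0 .. master-pgsql:2
master_score_targets pgsql 3 no    # master-pgsql
```

This is only an illustration of the loop-vs-single-attribute distinction; the real patch detects the pacemaker behaviour at runtime and then calls crm_attribute/crm_master accordingly.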
Re: [Linux-ha-dev] external/vcenter: don't fail when a machine is powered off
Hi, On Mon, Oct 22, 2012 at 11:06:07AM +0200, Robbert Muller wrote: Hello, While testing a new cluster we found the following behavior, which I discussed on #linux-ha with andreask afterwards, and we both agree the behavior was wrong. Bug scenario: 3 node cluster, 1 standby just for having 3 nodes, 2 active nodes. When we did a power off of the machine (similar to pulling the power cable from a machine), the cluster failed to failover to the next node. This is because of the following setting: RESETPOWERON was set to 0, so a machine powered off stays powered off. Just to make sure: RESETPOWERON was set to 0 in the configuration? With the current code path, a machine in the state poweroff is considered a failure for the stonith reset operation, which results in no resources being started on the second node, and the machine stays in an unclean state. The analogy with real hardware and a powerbar, and imho the correct behavior: --- If I pull the plug of node1, node 2 will fence it with the powerbar. The powerbar will powercycle the socket without any result, because I pulled the plug. But the fencing operation is a success and all resources are started on the second node --- Patch to fix this with, I hope, a minimal change is attached. Thanks for the patch. But we'll need to rework it a bit. After finding this bug I got ill and have to stay at home for a few days, so I don't have access to an environment to test this patch atm. Get better soon! Cheers, Dejan Regards Robbert Müller

diff -r 66f7442698e6 lib/plugins/stonith/external/vcenter
--- a/lib/plugins/stonith/external/vcenter  Mon Oct 15 15:59:57 2012 +0200
+++ b/lib/plugins/stonith/external/vcenter  Mon Oct 22 10:38:09 2012 +0200
@@ -199,6 +199,8 @@
     if ($powerState eq "poweredOff" && (! exists $ENV{'RESETPOWERON'} || $ENV{'RESETPOWERON'} ne "0")) {
         $vm->PowerOnVM();
         system("ha_log.sh", "info", "Machine $esx:$vm->{'name'} has been powered on");
+    } elsif ($powerState eq "poweredOff") {
+        system("ha_log.sh", "info", "Machine $esx:$vm->{'name'} is poweredoff and RESETPOWERON was disabled");
     } else {
         dielog("Could not complete $esx:$vm->{'name'} power cycle");
     }
___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] external/vcenter: don't fail when a machine is powered off
On Tue, Oct 23, 2012 at 01:19:53PM +0200, Robbert Müller wrote: Hi, On 23-10-12 13:13, Dejan Muhamedagic wrote: Hi, On Mon, Oct 22, 2012 at 11:06:07AM +0200, Robbert Muller wrote: Hello, While testing a new cluster we found the following behavior which i discussed on #linux-ha with andreask afterwards and we both agree the behavior was wrong. bug scenario: 3 node cluster, 1 standby just for having 3 nodes, 2 active nodes when we did a power off of the machine ( similar to pulling the power cable from a machine ) the cluster failed to failover to the next node. This is because the following setting: RESETPOWERON was set to 0, so a machine powered off stays powered off Just to make sure: RESETPOWERON was set to 0 in the configuration? Yes it is. OK. with the current code path, a machine in the state poweroff is considered a failure for the stonith reset operation. which results in no resources are started on the second node, and the machine stays in a unclean state. The analogy with real hardware and a powerbar and imho correct behavior: --- If i pull the plug of node1, node 2 will fence it with the powerbar. The power will powercycle the socket without any result, because i pulled the plug. But the fencing operation is a success and all resources are started on the second node --- Patch to fix this with i hope a minimal change is attached. Thanks for the patch. But we'll need to rework it a bit. Could you tell me what is wrong with it? i am currently testing it on our customers environment. And it seems to work as expected. Functionally nothing wrong with it, it's just that the extra if was repeating part of the previous if, which may be difficult to understand at times. Please see, and possibly test, the attached patch. Cheers, Dejan After finding this bug i got ill and have to stay at home for a few days, so i don't have access to an environment to test this patch atm. Get better soon! Thx, the antibiotics seem to have killed the infection. So i'm back to work. 
Regards Robbert ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/

diff -r 0809ed6abeb7 lib/plugins/stonith/external/vcenter
--- a/lib/plugins/stonith/external/vcenter  Mon Oct 22 17:35:17 2012 +0200
+++ b/lib/plugins/stonith/external/vcenter  Tue Oct 23 16:30:54 2012 +0200
@@ -196,9 +196,13 @@ elsif ($command ~~ @netCommands) {
 } else {
     system("ha_log.sh", "warn", "Tried to ResetVM $esx:$vm->{'name'} that was $powerState");
     # Start a virtual machine on reset only if explicitly allowed by RESETPOWERON
-    if ($powerState eq "poweredOff" && (! exists $ENV{'RESETPOWERON'} || $ENV{'RESETPOWERON'} ne "0")) {
-        $vm->PowerOnVM();
-        system("ha_log.sh", "info", "Machine $esx:$vm->{'name'} has been powered on");
+    if ($powerState eq "poweredOff") {
+        if ((! exists $ENV{'RESETPOWERON'} || $ENV{'RESETPOWERON'} ne "0")) {
+            $vm->PowerOnVM();
+            system("ha_log.sh", "info", "Machine $esx:$vm->{'name'} has been powered on");
+        } else {
+            system("ha_log.sh", "info", "Machine $esx:$vm->{'name'} is poweredoff and RESETPOWERON was disabled");
+        }
     } else {
         dielog("Could not complete $esx:$vm->{'name'} power cycle");
     }
___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
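The control flow of the reworked patch, reduced to a shell sketch (the plugin is Perl; the function and outcome names here are invented): the key point of the thread is that a machine found already powered off counts as a successful fence, whether or not RESETPOWERON allows powering it back on.

```shell
# Outcome of a reset request, mirroring the reworked Perl logic:
# poweredOff is always a fencing success; RESETPOWERON only decides
# whether the VM is started again afterwards.
reset_outcome() {
    power_state=$1; resetpoweron=$2
    if [ "$power_state" = "poweredOff" ]; then
        if [ "${resetpoweron:-1}" != "0" ]; then
            echo "power-on"    # finish the cycle by starting the VM
        else
            echo "leave-off"   # fenced successfully, VM stays down
        fi
        return 0
    fi
    echo "failed"              # any other state: power cycle failed
    return 1
}

reset_outcome poweredOff 0   # -> leave-off (success, no restart)
reset_outcome poweredOff 1   # -> power-on
```

This matches the powerbar analogy above: pulling the plug and then "power cycling" a dead socket is still a successful fence.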
Re: [Linux-ha-dev] [Pacemaker] A patch for stonith external/libvirt
Hi Holger, On Fri, Oct 12, 2012 at 12:32:30PM +0200, Holger Teutsch wrote: Dejan, I'm no longer in the cluster business. Good luck! I can not recall the reason but I suspect in my configuration nearly 2 years ago it was the other way round: reboot did not work but stop/start did. OK. The script supports the reboot method, but it still power cycles by default. Cheers, Dejan Regards Holger On Thu, Oct 11, 2012 at 4:46 PM, Dejan Muhamedagic deja...@fastmail.fmwrote: Hi Owen, On Wed, Oct 10, 2012 at 10:07:41AM +0100, Owen Le Blanc wrote: I attach a patch for the stonith agent external/libvirt. This agent was failing on our machines because for rebooting machines it tried to stop and then start them, which doesn't work on our system, while rebooting them does. We have cluster glue version 1.0.8-2 installed on a Debian system, with libvirt 0.9.12-3. It would be good to have both, i.e. on-off and reboot method. With a parameter which would specify the method. I wonder why didn't the author put reboot in the first place. Holger? Cheers, Dejan P.S. Moving discussion to linux-ha-dev. -- Owen Le Blanc ___ Pacemaker mailing list: pacema...@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Problem] external/vcenter fails in stonith of the guest of the similar name.
Hi Hideo-san, On Mon, Oct 22, 2012 at 09:20:53AM +0900, renayama19661...@ybb.ne.jp wrote: Hi All, external/vcenter fences the wrong guest when guest names are similar. For example, when sr2 should be fenced and the two guests sr2 and backup-sr2 both exist, stonith acts on backup-sr2 instead. The problem is caused by the following lookup: $vm = Vim::find_entity_view(view_type => "VirtualMachine", filter => { name => qr/\Q$host_to_vm{$targetHost}\E/i }); It seems that the correction Mr. Lars pointed out before was not fully applied. * http://lists.community.tummy.com/pipermail/linux-ha-dev/2011-April/018397.html (snip) Unless this filter thing has a special mode where it internally does a $x eq $y for scalars and $x =~ $y for explicitly designated qr// Regexp objects, I'd suggest to here also do filter => { name => qr/^\Q$realTarget\E$/i } (snip) Please revise the search to add the ^ anchor. Applied. Thanks! Dejan Best Regards, Hideo Yamauchi. ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
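The effect of the missing anchors can be shown with grep's case-insensitive extended regexes (Perl's \Q...\E, which quotes regex metacharacters in the VM name, has no short grep equivalent and is omitted here):

```shell
# Case-insensitive match helper: pattern $1 against candidate $2.
matches() { printf '%s\n' "$2" | grep -qiE "$1"; }

# Unanchored: the substring "sr2" also hits "backup-sr2" -- the bug.
matches 'sr2' 'backup-sr2' && echo "unanchored: backup-sr2 wrongly matches"

# Anchored as Lars suggested: only the exact (case-insensitive) name.
matches '^sr2$' 'backup-sr2' || echo "anchored: backup-sr2 rejected"
matches '^sr2$' 'SR2'        && echo "anchored: SR2 still matches"
```

The same reasoning applies to the Perl qr// filter: qr/\Qsr2\E/i matches anywhere inside the candidate name, while qr/^\Qsr2\E$/i matches only the whole name.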
Re: [Linux-ha-dev] [Pacemaker] A patch for stonith external/libvirt
On Thu, Oct 11, 2012 at 04:46:48PM +0200, Dejan Muhamedagic wrote: Hi Owen, On Wed, Oct 10, 2012 at 10:07:41AM +0100, Owen Le Blanc wrote: I attach a patch for the stonith agent external/libvirt. This agent was failing on our machines because for rebooting machines it tried to stop and then start them, which doesn't work on our system, while rebooting them does. We have cluster glue version 1.0.8-2 installed on a Debian system, with libvirt 0.9.12-3. It would be good to have both, i.e. on-off and reboot method. With a parameter which would specify the method. I wonder why didn't the author put reboot in the first place. Holger? I modified the patch and introduced a reset_method parameter. It defaults to power_cycle, so the default behaviour remains the same. Many thanks for the patch! Cheers, Dejan Cheers, Dejan P.S. Moving discussion to linux-ha-dev. -- Owen Le Blanc ___ Pacemaker mailing list: pacema...@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] glue 1.0.11 released
Hello, The current glue repository has been tagged as 1.0.11. The highlights: - lrmd sets max number of children depending on the number of processors - compatibility for stonith agents and hb_report for pacemaker v1.1.8 You can get the 1.0.11 tarball here: http://hg.linux-ha.org/glue/archive/glue-1.0.11.tar.bz2 Many thanks to all contributors! Enjoy! Lars Ellenberg Dejan Muhamedagic ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] crmsh v1.2.1 released
Hello, The CRM shell v1.2.1 is released. The highlights of the release: * history: add the exclude (log messages) command * pacemaker 1.1.8 compatibility code There are two important bug fixes: * cibconfig: repair edit for non-vi users * cibconfig: update schema separately (don't remove the status section) For the full set of changes, take a look at the changelog: http://hg.savannah.gnu.org/hgweb/crmsh/file/b6bb311c7bd3/ChangeLog == Note about Pacemaker versions == CRM shell 1.2.1 supports all Pacemaker 1.1 versions. The history feature is unfortunately not as well supported with version 1.1.8. == Installing == Installing the CRM shell along with Pacemaker 1.1 versions <= v1.1.7 is possible, but it will result in file conflicts. You need to enforce file overwriting when installing packages. == Resources == The CRM shell project web page at GNU savannah: https://savannah.nongnu.org/projects/crmsh/ The sources repository is available at: http://hg.savannah.gnu.org/hgweb/crmsh Packages for several popular Linux distributions: http://download.opensuse.org/repositories/network:/ha-clustering/ The man page: http://crmsh.nongnu.org/crm.8.html Support and bug reporting: http://lists.linux-ha.org/mailman/listinfo/linux-ha https://savannah.nongnu.org/bugs/?group=crmsh Enjoy! Dejan ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Patch] The problem that the cord of the digest cord of crmd becomes mismatched for.
Hi, On Fri, Oct 12, 2012 at 08:31:21AM +0900, renayama19661...@ybb.ne.jp wrote: Hi Andrew, Hi Dejan, Makes sense to me. With the patch, the effective options are create+op rather than create+op1+op2+op3... Will it be a meaning to change the structure of the op-done message? I cannot change op message when I think about other influence. I think that a patch is right by the op message of present lrmd and crmd. We want to apply a patch to glue early if we can do it. I'll do some testing first. Cheers, Dejan Best Regards, Hideo Yamauchi. --- On Thu, 2012/10/11, Andrew Beekhof beek...@gmail.com wrote: On Wed, Oct 10, 2012 at 11:21 PM, Dejan Muhamedagic de...@suse.de wrote: Hi Hideo-san, On Wed, Oct 10, 2012 at 03:22:08PM +0900, renayama19661...@ybb.ne.jp wrote: Hi All, We found pacemaker that we could not judge a result of the operation of lrmd well. When we carry out following crm, a parameter of the operation of start is given back to crmd as a result of operation of monitor. (snip) primitive prmDiskd ocf:pacemaker:Dummy \ params name=diskcheck_status_internal device=/dev/vda interval=30 \ op start interval=0 timeout=60s on-fail=restart prereq=fencing \ op monitor interval=30s timeout=60s on-fail=restart \ op stop interval=0s timeout=60s on-fail=block (snip) This is because lrmd gives back prereq parameter of start as a result of monitor operation. As a result, crmd judge mismatched with a parameter of the monitor operation that crmd asked lrmd for for the parameter that Irmd carried out of the monitor operation. We can confirm this problem by the next command in Pacemaker1.0.12. Command 1) crm_verify command outputs the difference in digest cord. [root@rh63-heartbeat1 ~]# crm_verify -L crm_verify[19988]: 2012/10/10_20:29:58 CRIT: check_action_definition: Parameters to prmDiskd:0_monitor_3 on rh63-heartbeat1 changed: recorded 7d7c9f601095389fc7cc0c6b29c61a7a vs. 
d38c85388dea5e8e2568c3d699eb9cce (reload:3.0.1) 0:0;6:1:0:ca6a5ad2-0340-4769-bab7-289a00862ba6 Command 2) The ptest command outputs the difference in digest cord, too. [root@rh63-heartbeat1 ~]# ptest -L -VV ptest[19992]: 2012/10/10_20:30:19 WARN: unpack_nodes: Blind faith: not fencing unseen nodes ptest[19992]: 2012/10/10_20:30:19 CRIT: check_action_definition: Parameters to prmDiskd:0_monitor_3 on rh63-heartbeat1 changed: recorded 7d7c9f601095389fc7cc0c6b29c61a7a vs. d38c85388dea5e8e2568c3d699eb9cce (reload:3.0.1) 0:0;6:1:0:ca6a5ad2-0340-4769-bab7-289a00862ba6 [root@rh63-heartbeat1 ~]# Command 3) By cibadmin -B command, pengine restart monitor of an unnecessary resource. Oct 10 20:31:00 rh63-heartbeat1 pengine: [19842]: CRIT: check_action_definition: Parameters to prmDiskd:0_monitor_3 on rh63-heartbeat1 changed: recorded 7d7c9f601095389fc7cc0c6b29c61a7a vs. d38c85388dea5e8e2568c3d699eb9cce (reload:3.0.1) 0:0;6:1:0:ca6a5ad2-0340-4769-bab7-289a00862ba6 Oct 10 20:31:00 rh63-heartbeat1 pengine: [19842]: notice: RecurringOp: Start recurring monitor (30s) for prmDiskd:0 on rh63-heartbeat1 Oct 10 20:31:00 rh63-heartbeat1 pengine: [19842]: notice: LogActions: Leave resource prmDiskd:0#011(Started rh63-heartbeat1) Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: do_state_transition: State transition S_POLICY_ENGINE - S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ] Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: unpack_graph: Unpacked transition 2: 1 actions in 1 synapses Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: do_te_invoke: Processing graph 2 (ref=pe_calc-dc-1349868660-20) derived from /var/lib/pengine/pe-input-2.bz2 Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: te_rsc_command: Initiating action 1: monitor prmDiskd:0_monitor_3 on rh63-heartbeat1 (local) Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: do_lrm_rsc_op: Performing key=1:2:0:ca6a5ad2-0340-4769-bab7-289a00862ba6 op=prmDiskd:0_monitor_3 ) Oct 
10 20:31:00 rh63-heartbeat1 lrmd: [19836]: info: cancel_op: operation monitor[4] on prmDiskd:0 for client 19839, its parameters: CRM_meta_clone=[0] CRM_meta_prereq=[fencing] device=[/dev/vda] name=[diskcheck_status_internal] CRM_meta_clone_node_max=[1] CRM_meta_clone_max=[1] CRM_meta_notify=[false] CRM_meta_globally_unique=[false] crm_feature_set=[3.0.1] interval=[30] prereq=[fencing] CRM_meta_on_fail=[restart] CRM_meta_name=[monitor] CRM_meta_interval=[3] CRM_meta_timeout=[6] cancelled Oct 10 20:31:00 rh63-heartbeat1 lrmd: [19836]: info: rsc:prmDiskd:0 monitor[5] (pid 20009) Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: process_lrm_event: LRM operation prmDiskd:0_monitor_3
Re: [Linux-ha-dev] [Patch] The problem that the cord of the digest cord of crmd becomes mismatched for.
Hi Hideo-san, On Wed, Oct 10, 2012 at 03:22:08PM +0900, renayama19661...@ybb.ne.jp wrote: Hi All, We found pacemaker that we could not judge a result of the operation of lrmd well. When we carry out following crm, a parameter of the operation of start is given back to crmd as a result of operation of monitor. (snip) primitive prmDiskd ocf:pacemaker:Dummy \ params name=diskcheck_status_internal device=/dev/vda interval=30 \ op start interval=0 timeout=60s on-fail=restart prereq=fencing \ op monitor interval=30s timeout=60s on-fail=restart \ op stop interval=0s timeout=60s on-fail=block (snip) This is because lrmd gives back prereq parameter of start as a result of monitor operation. As a result, crmd judge mismatched with a parameter of the monitor operation that crmd asked lrmd for for the parameter that Irmd carried out of the monitor operation. We can confirm this problem by the next command in Pacemaker1.0.12. Command 1) crm_verify command outputs the difference in digest cord. [root@rh63-heartbeat1 ~]# crm_verify -L crm_verify[19988]: 2012/10/10_20:29:58 CRIT: check_action_definition: Parameters to prmDiskd:0_monitor_3 on rh63-heartbeat1 changed: recorded 7d7c9f601095389fc7cc0c6b29c61a7a vs. d38c85388dea5e8e2568c3d699eb9cce (reload:3.0.1) 0:0;6:1:0:ca6a5ad2-0340-4769-bab7-289a00862ba6 Command 2) The ptest command outputs the difference in digest cord, too. [root@rh63-heartbeat1 ~]# ptest -L -VV ptest[19992]: 2012/10/10_20:30:19 WARN: unpack_nodes: Blind faith: not fencing unseen nodes ptest[19992]: 2012/10/10_20:30:19 CRIT: check_action_definition: Parameters to prmDiskd:0_monitor_3 on rh63-heartbeat1 changed: recorded 7d7c9f601095389fc7cc0c6b29c61a7a vs. d38c85388dea5e8e2568c3d699eb9cce (reload:3.0.1) 0:0;6:1:0:ca6a5ad2-0340-4769-bab7-289a00862ba6 [root@rh63-heartbeat1 ~]# Command 3) By cibadmin -B command, pengine restart monitor of an unnecessary resource. 
Oct 10 20:31:00 rh63-heartbeat1 pengine: [19842]: CRIT: check_action_definition: Parameters to prmDiskd:0_monitor_3 on rh63-heartbeat1 changed: recorded 7d7c9f601095389fc7cc0c6b29c61a7a vs. d38c85388dea5e8e2568c3d699eb9cce (reload:3.0.1) 0:0;6:1:0:ca6a5ad2-0340-4769-bab7-289a00862ba6
Oct 10 20:31:00 rh63-heartbeat1 pengine: [19842]: notice: RecurringOp: Start recurring monitor (30s) for prmDiskd:0 on rh63-heartbeat1
Oct 10 20:31:00 rh63-heartbeat1 pengine: [19842]: notice: LogActions: Leave resource prmDiskd:0#011(Started rh63-heartbeat1)
Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: do_state_transition: State transition S_POLICY_ENGINE - S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: unpack_graph: Unpacked transition 2: 1 actions in 1 synapses
Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: do_te_invoke: Processing graph 2 (ref=pe_calc-dc-1349868660-20) derived from /var/lib/pengine/pe-input-2.bz2
Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: te_rsc_command: Initiating action 1: monitor prmDiskd:0_monitor_3 on rh63-heartbeat1 (local)
Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: do_lrm_rsc_op: Performing key=1:2:0:ca6a5ad2-0340-4769-bab7-289a00862ba6 op=prmDiskd:0_monitor_3 )
Oct 10 20:31:00 rh63-heartbeat1 lrmd: [19836]: info: cancel_op: operation monitor[4] on prmDiskd:0 for client 19839, its parameters: CRM_meta_clone=[0] CRM_meta_prereq=[fencing] device=[/dev/vda] name=[diskcheck_status_internal] CRM_meta_clone_node_max=[1] CRM_meta_clone_max=[1] CRM_meta_notify=[false] CRM_meta_globally_unique=[false] crm_feature_set=[3.0.1] interval=[30] prereq=[fencing] CRM_meta_on_fail=[restart] CRM_meta_name=[monitor] CRM_meta_interval=[3] CRM_meta_timeout=[6] cancelled
Oct 10 20:31:00 rh63-heartbeat1 lrmd: [19836]: info: rsc:prmDiskd:0 monitor[5] (pid 20009)
Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: process_lrm_event: LRM operation prmDiskd:0_monitor_3 (call=4, status=1, cib-update=0, confirmed=true) Cancelled
Oct 10 20:31:00 rh63-heartbeat1 lrmd: [19836]: info: operation monitor[5] on prmDiskd:0 for client 19839: pid 20009 exited with return code 0
Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: append_digest: yamauchi Calculated digest 7d7c9f601095389fc7cc0c6b29c61a7a for prmDiskd:0_monitor_3 (0:0;1:2:0:ca6a5ad2-0340-4769-bab7-289a00862ba6). Source: parameters device=/dev/vda name=diskcheck_status_internal interval=30 prereq=fencing CRM_meta_timeout=6/
Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: process_lrm_event: LRM operation prmDiskd:0_monitor_3 (call=5, rc=0, cib-update=53, confirmed=false) ok
Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: match_graph_event: Action prmDiskd:0_monitor_3 (1) confirmed on rh63-heartbeat1 (rc=0)

It is a problem to judge
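The mismatch in the logs above can be illustrated with a small sketch. Pacemaker records a digest of the operation parameters, so an extra parameter in the reported set (here prereq, leaked from the start operation into the monitor result) yields a different digest than the recorded one. This is a simplified illustration using md5sum over flat parameter strings taken from the log; it is not Pacemaker's actual digest routine, which hashes a canonicalized XML parameter list.

```shell
#!/bin/sh
# Simplified illustration: one extra parameter changes the parameter digest.
# Parameter strings are taken from the append_digest log line above.

monitor_params="device=/dev/vda name=diskcheck_status_internal interval=30"
leaked_params="$monitor_params prereq=fencing"   # prereq leaked from the start op

recorded=$(printf '%s' "$monitor_params" | md5sum | cut -d' ' -f1)
reported=$(printf '%s' "$leaked_params"  | md5sum | cut -d' ' -f1)

if [ "$recorded" != "$reported" ]; then
    echo "digest mismatch: $recorded vs. $reported"
fi
```

Any difference in the parameter set, however small, produces a different digest, which is why check_action_definition flags the operation as changed and pengine schedules an unnecessary monitor restart.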
Re: [Linux-ha-dev] Patch for named
Hi Serge,

On Mon, Oct 01, 2012 at 08:29:50PM -0600, Serge Dubrouski wrote:
> Hi, Dejan - Will you apply it?

The grep/ps part I'll apply. I was just curious why the previous version didn't work, but I guess it's not worth the time to investigate.

And I'm trying to understand this part:

 named_getpid () {
     local pattern="$OCF_RESKEY_named"
-    if [ -n "$OCF_RESKEY_named_rootdir" ]; then
+    if [ -n "$OCF_RESKEY_named_rootdir" -a "x${OCF_RESKEY_named_rootdir}" != "x/" ]; then
         pattern="$pattern.*-t $OCF_RESKEY_named_rootdir"
     fi

How would named_rootdir be set to / unless the user sets it as a parameter? Why would / then be treated differently?

Cheers,
Dejan

On Fri, Sep 28, 2012 at 5:09 AM, Serge Dubrouski serge...@gmail.com wrote:
> Yes it is. It also includes a fix for a small bug. So 2 lines changed.
>
> On Sep 28, 2012 2:54 AM, Dejan Muhamedagic de...@suse.de wrote:
>> Hi Serge,
>>
>> On Sat, Sep 22, 2012 at 09:11:53AM -0600, Serge Dubrouski wrote:
>>> Hello - Attached a short patch for the named RA to fix/improve the getpid function.
>>
>> Sorry for the delay. Is this the same as
>> https://github.com/ClusterLabs/resource-agents/issues/134
>> and
>> https://github.com/ClusterLabs/resource-agents/pull/140
>>
>> Cheers,
>> Dejan

--
Serge Dubrouski.

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Patch for named
On Wed, Oct 03, 2012 at 07:59:55AM -0600, Serge Dubrouski wrote:
> Look at the start function. If one sets the rootdir parameter to "/", then the start function strips it and monitor fails. So the patch fixes it.

Ah, OK. Missed that. Applied now. Many thanks for the patches!

Cheers,
Dejan

On Oct 3, 2012 7:45 AM, Dejan Muhamedagic de...@suse.de wrote:
> Hi Serge,
>
> On Mon, Oct 01, 2012 at 08:29:50PM -0600, Serge Dubrouski wrote:
>> Hi, Dejan - Will you apply it?
>
> The grep/ps part I'll apply. I was just curious why the previous version didn't work, but I guess it's not worth the time to investigate.
>
> And I'm trying to understand this part:
>
>  named_getpid () {
>      local pattern="$OCF_RESKEY_named"
> -    if [ -n "$OCF_RESKEY_named_rootdir" ]; then
> +    if [ -n "$OCF_RESKEY_named_rootdir" -a "x${OCF_RESKEY_named_rootdir}" != "x/" ]; then
>          pattern="$pattern.*-t $OCF_RESKEY_named_rootdir"
>      fi
>
> How would named_rootdir be set to / unless the user sets it as a parameter? Why would / then be treated differently?
>
> Cheers,
> Dejan
>
> On Fri, Sep 28, 2012 at 5:09 AM, Serge Dubrouski serge...@gmail.com wrote:
>> Yes it is. It also includes a fix for a small bug. So 2 lines changed.
>>
>> On Sep 28, 2012 2:54 AM, Dejan Muhamedagic de...@suse.de wrote:
>>> Hi Serge,
>>>
>>> On Sat, Sep 22, 2012 at 09:11:53AM -0600, Serge Dubrouski wrote:
>>>> Hello - Attached a short patch for the named RA to fix/improve the getpid function.
>>>
>>> Sorry for the delay. Is this the same as
>>> https://github.com/ClusterLabs/resource-agents/issues/134
>>> and
>>> https://github.com/ClusterLabs/resource-agents/pull/140
>>>
>>> Cheers,
>>> Dejan

--
Serge Dubrouski.
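To make the fix concrete, here is a minimal standalone sketch of the corrected rootdir check (variable names follow the named RA; the surrounding agent code is assumed, and the OCF_RESKEY_* values are set inline purely for illustration). Per Serge's explanation, the start action strips a bare "/" from the command line, so the pattern used to find the process must not include "-t /" in that case.

```shell
#!/bin/sh
# Standalone sketch of the fixed rootdir check from named_getpid.
# OCF_RESKEY_* values are normally provided by the cluster environment;
# they are hardcoded here only to demonstrate the "/" special case.
OCF_RESKEY_named="named"
OCF_RESKEY_named_rootdir="/"

pattern="$OCF_RESKEY_named"
# Skip the "-t <rootdir>" part when rootdir is "/": start strips a bare
# "/", so the running named has no "-t /" on its command line and a
# pattern containing it would never match.
if [ -n "$OCF_RESKEY_named_rootdir" ] && [ "x${OCF_RESKEY_named_rootdir}" != "x/" ]; then
    pattern="$pattern.*-t $OCF_RESKEY_named_rootdir"
fi
echo "$pattern"
```

With rootdir set to "/" this prints just "named"; with e.g. rootdir="/var/named" it prints "named.*-t /var/named", which matches the chrooted process as before.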
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Patch for named
Hi Serge,

On Sat, Sep 22, 2012 at 09:11:53AM -0600, Serge Dubrouski wrote:
> Hello - Attached a short patch for the named RA to fix/improve the getpid function.

Sorry for the delay. Is this the same as
https://github.com/ClusterLabs/resource-agents/issues/134
and
https://github.com/ClusterLabs/resource-agents/pull/140

Cheers,
Dejan

--
Serge Dubrouski.

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/