Re: [Cluster-devel] [PATCH dlm-tool 1/4] fence: make pkg-config binary as passable make var
Hi Alex, all 4 patches look good to me.

Cheers
Fabio

On 11/04/2023 16.49, Alexander Aring wrote:
This patch defines a PKG_CONFIG make variable which can be overridden by the user, as is already the case for the dlm_controld Makefile.
---
 fence/Makefile | 5 +++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fence/Makefile b/fence/Makefile
index ee4dfb88..894f6396 100644
--- a/fence/Makefile
+++ b/fence/Makefile
@@ -19,7 +19,10 @@ CFLAGS += -D_GNU_SOURCE -O2 -ggdb \
 CFLAGS += -fPIE -DPIE
 CFLAGS += -I../include
-CFLAGS += $(shell pkg-config --cflags pacemaker-fencing)
+
+PKG_CONFIG ?= pkg-config
+
+CFLAGS += $(shell $(PKG_CONFIG) --cflags pacemaker-fencing)
 LDFLAGS += -Wl,-z,relro -Wl,-z,now -pie
 LDFLAGS += -ldl
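For anyone wondering what the `?=` change buys: it lets a build override the pkg-config binary from the make command line or environment without patching the Makefile. A minimal sketch of the same pattern, against a throwaway Makefile (the i686-linux-gnu-pkg-config name is only an illustrative cross-toolchain binary, not something this patch ships):

```shell
# Write a throwaway Makefile that uses the same ?= pattern as the patch.
printf 'PKG_CONFIG ?= pkg-config\nall:\n\t@echo "using: $(PKG_CONFIG)"\n' > /tmp/pkgconf-demo.mk

# Default: ?= falls back to plain pkg-config (env -u keeps the demo deterministic).
env -u PKG_CONFIG make -s -f /tmp/pkgconf-demo.mk
# prints: using: pkg-config

# Override from the make command line, e.g. for a cross toolchain
# (i686-linux-gnu-pkg-config is a hypothetical name here).
make -s -f /tmp/pkgconf-demo.mk PKG_CONFIG=i686-linux-gnu-pkg-config
# prints: using: i686-linux-gnu-pkg-config
```

A command-line variable assignment always wins over `?=` (and even over `=`) in make, which is why no further Makefile changes are needed for cross builds.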
Re: [Cluster-devel] [ClusterLabs] gfs2-utils 3.5.0 released
On 14/02/2023 21.17, Valentin Vidic wrote: On Tue, Feb 14, 2023 at 06:18:55AM +0100, Fabio M. Di Nitto wrote:
The process would have to look like:
(usual) apt-get install
git clone gfs2-utils
export CFLAGS/LDFLAGS/CC or whatever env var
./autogen.sh
./configure..
make
make check
Using other build tools like debbuild or mock has been problematic in the past for other projects; it might not be the case for gfs2-utils. So you can try that all in a local VM and let me know the steps, then we can add it to CI.
Sure, the commands to build and test a 32-bit version look like this for me:
dpkg --add-architecture i386
doh.. didn't think of cross compilation.
apt-get update
apt-get install --yes build-essential crossbuild-essential-i386 autoconf automake autopoint autotools-dev bison flex check:i386 libblkid-dev:i386 libbz2-dev:i386 libncurses-dev:i386 libtool pkg-config:i386 zlib1g-dev:i386
./configure --build=x86_64-linux-gnu --host=i686-linux-gnu
make
make check
ack, perfect: we already have a Debian CI builder dedicated to arm cross compilation, we can tweak it to add i386 as well. Thanks Fabio
Re: [Cluster-devel] [ClusterLabs] gfs2-utils 3.5.0 released
On 13/02/2023 10.58, Andrew Price wrote: On 11/02/2023 17:16, Valentin Vidić wrote: On Thu, Feb 09, 2023 at 01:12:58PM +, Andrew Price wrote: gfs2-utils contains the tools needed to create, check, modify and inspect gfs2 filesystems, along with support scripts needed on every gfs2 cluster node. Hi, some tests seem to be failing for the new version in Debian:
gfs2_edit tests
37: Save/restoremeta, defaults FAILED (edit.at:14)
38: Save/restoremeta, no compression FAILED (edit.at:24)
39: Save/restoremeta, min. block size FAILED (edit.at:34)
40: Save/restoremeta, 4 journals FAILED (edit.at:44)
41: Save/restoremeta, min. block size, 4 journals FAILED (edit.at:54)
42: Save metadata to /dev/null ok
It seems this is all on 32-bit architectures, more info here:
https://buildd.debian.org/status/fetch.php?pkg=gfs2-utils=armel=3.5.0-1=1676127480=0
https://buildd.debian.org/status/fetch.php?pkg=gfs2-utils=armhf=3.5.0-1=1676127632=0
https://buildd.debian.org/status/fetch.php?pkg=gfs2-utils=i386=3.5.0-1=1676127477=0
https://buildd.debian.org/status/fetch.php?pkg=gfs2-utils=mipsel=3.5.0-1=1676130593=0
Can you check?
The smoking gun is:
"stderr: Error: File system is too small to restore this metadata. File system is 524287 blocks. Restore block = 537439"
It's caused by size_t being used for a variable relating to file size, and size_t is too small in 32-bit environments. It should be fixed by this commit: https://pagure.io/fork/andyp/gfs2-utils/c/a3f3aadc789f214cd24606808f5d8a6608e10219 It's waiting for the CI queue to flush after last week's outage, but it should be in main shortly. I doubt we have any users on 32-bit architectures, but perhaps we can get a 32-bit test runner added to the CI pool to prevent these issues slipping through anyway. We had to drop i686 from CI for lack of BaseOS support for 32-bit OpenStack / Cloud images. Also, other HA tools like pacemaker dropped 32-bit support a while back. Not sure it's worth the trouble any more.
If Valentin has an easy way to set up a 64-bit Debian-based system that will build a 32-bit env with easy env-var overrides, I am happy to add it to the pool for gfs2-utils, but I am not going to build pure i686 images for that. The process would have to look like:
(usual) apt-get install
git clone gfs2-utils
export CFLAGS/LDFLAGS/CC or whatever env var
./autogen.sh
./configure..
make
make check
Using other build tools like debbuild or mock has been problematic in the past for other projects; it might not be the case for gfs2-utils. So you can try that all in a local VM and let me know the steps, then we can add it to CI. Fabio
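The 32-bit failure mode Andy describes is easy to reproduce with plain arithmetic: bash does 64-bit math, so masking to 32 bits mimics what a 32-bit size_t does to a large file size before it is divided into blocks (the 5 GiB figure below is illustrative, not the size the test suite actually uses):

```shell
fssize=$((5 * 1024 * 1024 * 1024))   # 5 GiB, comfortably past 2^32 bytes
blksz=4096

# 64-bit arithmetic gives the real block count...
echo "correct:   $((fssize / blksz)) blocks"
# prints: correct:   1310720 blocks

# ...while truncating the size to 32 bits first (what a 32-bit size_t does)
# silently wraps 5 GiB down to 1 GiB and undercounts the blocks.
echo "truncated: $(( (fssize & 0xFFFFFFFF) / blksz )) blocks"
# prints: truncated: 262144 blocks
```

This is why the symptom only shows on armel/armhf/i386/mipsel: the same code is correct wherever size_t is 64 bits wide.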
Re: [Cluster-devel] [ClusterLabs Developers] Pacemaker 2.1.0: Should we rename the master branch?
On 10/21/2020 7:25 PM, Ken Gaillot wrote: Maybe we should wait until github finishes putting its plans in place. Especially if we want to do all projects at once, there's no need to tie it to a particular Pacemaker release. Right, I don´t see any reason to tie releases with branch changes. Let´s keep operations as-is till github has all the infra in place and that will make the change much more smooth. It might give me time to start changing CI to handle main and master as if they were the same in the meantime. Cheers Fabio On Wed, 2020-10-21 at 06:10 +0200, Fabio M. Di Nitto wrote: On 10/20/2020 7:26 PM, Andrew Price wrote: [CC+ cluster-devel] On 19/10/2020 23:59, Ken Gaillot wrote: On Mon, 2020-10-19 at 07:19 +0200, Fabio M. Di Nitto wrote: Hi Ken, On 10/2/2020 8:02 PM, Digimer wrote: On 2020-10-02 1:12 p.m., Ken Gaillot wrote: Hi all, I sent a message to the us...@clusterlabs.org list about releasing Pacemaker 2.1.0 next year. Coincidentally, there is a plan in the git and Github communities to change the default git branch from "master" to "main": https://github.com/github/renaming The rationale for the change is not the specific meaning as used in branching, but rather to avoid any possibility of fostering an exclusionary environment, and to replace generic metaphors with something more obvious (especially to non-native English speakers). No objections to the change, but please let´s coordinate the change across all HA projects at once, or CI is going to break badly as the concept of master branch is embedded everywhere and not per- project. Presumably this would be all the projects built by jenkins? correct. booth corosync fence-agents fence-virt knet libqb pacemaker pcs qdevice resource-agents sbd Maintainers, do you think that's practical and desirable? I think I have super powers all repos to do the switch when github is ready to make us the switch. Practical no, there will be disruptions... 
desirable no, it's extra work, but the point is that it is doable. If the ClusterLabs projects switch together I might take the opportunity to make the switch in gfs2-utils.git at the same time, for consistency. Is there a single name that makes sense for all projects? "next", "development" or "unstable" captures how pacemaker uses master; not sure about other projects. "main" is generic enough for all projects, but so generic it doesn't give an idea of how it's used. Or we could go for something distinctive like fedora's "rawhide" or suse's "tumbleweed". "main" works for me; it seems to be the most widely adopted alternative thanks to Github, so its purpose will be clear by convention. That said, it doesn't matter too much as long as the remote HEAD is set to the new branch. I would go for main and follow github recommendations. They are putting automatic redirects in place to smooth the transition, and we can avoid spending time finding a name that won't offend some delicate soul over the internet. Another question is how to do the switch without causing confusion the next time someone pulls. It might be safest to simply create the main branch and delete the master branch (rather than, say, replacing all of the content in master with an explanatory note). That way a 'git pull' gives a hint of the change and no messy conflicts:
$ git pull
From /tmp/gittest/upstream
 * [new branch] main -> origin/main
Your configuration specifies to merge with the ref 'refs/heads/master' from the remote, but no such ref was fetched.
Maybe also push a 'master_is_now_main' tag annotated with 'use git branch -u origin/main to fix tracking branches'. Or maybe that's excessive :) Let's wait for github to put those in place for us. No point in re-inventing the wheel. The last blog post I read said they were working to do it at the infrastructure level, and that would save us a lot of headaches and complications. IIRC they will add the main branch automatically to new projects and transition old ones.
the master branch will be an automatic redirect to main. That will basically solve 99% of our issues: git pull won't break, etc. Cheers Fabio Cheers, Andy Since we are admins of all repositories, we can do it in one shot without too much pain and suffering in CI. It will probably require a day or two of CI downtime to rebuild the world as well. Fabio The change would not affect existing repositories/projects. However I am wondering if we should take the opportunity of the minor-version bump to do the same for Pacemaker. The impact on developers would be a one-time process for each checkout/fork: https://wiki.clusterlabs.org/wiki/Pacemaker_2.1_Changes#Development_changes In my opinion, this is a minor usage that many existing projects will not bother changing, but I do think that since all new projects will default to "main", sometime in the future any project still using "master" will appear outdated to young developers.
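For reference, the one-time dance each contributor ends up doing locally is small. A sketch against a throwaway repo (the path and identity below are placeholders; on a real clone you would also run `git fetch` and `git branch -u origin/main` afterwards to fix the tracking branch):

```shell
# Throwaway repo standing in for a local clone (path/identity are placeholders).
cd /tmp && rm -rf branch-demo && git init -q branch-demo && cd branch-demo
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'initial commit'

# Rename whatever the current branch is called to main.
git branch -m main

git symbolic-ref --short HEAD
# prints: main
```

`git branch -m` renames the current branch in place, so it works whether the checkout started on master or anything else.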
Re: [Cluster-devel] [ClusterLabs Developers] Pacemaker 2.1.0: Should we rename the master branch?
On 10/20/2020 7:26 PM, Andrew Price wrote: [CC+ cluster-devel] On 19/10/2020 23:59, Ken Gaillot wrote: On Mon, 2020-10-19 at 07:19 +0200, Fabio M. Di Nitto wrote: Hi Ken, On 10/2/2020 8:02 PM, Digimer wrote: On 2020-10-02 1:12 p.m., Ken Gaillot wrote: Hi all, I sent a message to the us...@clusterlabs.org list about releasing Pacemaker 2.1.0 next year. Coincidentally, there is a plan in the git and Github communities to change the default git branch from "master" to "main": https://github.com/github/renaming The rationale for the change is not the specific meaning as used in branching, but rather to avoid any possibility of fostering an exclusionary environment, and to replace generic metaphors with something more obvious (especially to non-native English speakers). No objections to the change, but please let's coordinate the change across all HA projects at once, or CI is going to break badly, as the concept of a master branch is embedded everywhere and not per-project. Presumably this would be all the projects built by jenkins? Correct: booth corosync fence-agents fence-virt knet libqb pacemaker pcs qdevice resource-agents sbd Maintainers, do you think that's practical and desirable? I think I have super powers on all repos to do the switch when github is ready for us to make the switch. Practical no, there will be disruptions... desirable no, it's extra work, but the point is that it is doable. If the ClusterLabs projects switch together I might take the opportunity to make the switch in gfs2-utils.git at the same time, for consistency. Is there a single name that makes sense for all projects? "next", "development" or "unstable" captures how pacemaker uses master; not sure about other projects. "main" is generic enough for all projects, but so generic it doesn't give an idea of how it's used. Or we could go for something distinctive like fedora's "rawhide" or suse's "tumbleweed".
"main" works for me; it seems to be the most widely adopted alternative thanks to Github, so its purpose will be clear by convention. That said, it doesn't matter too much as long as the remote HEAD is set to the new branch. I would go for main and follow github recommendations. They are putting automatic redirects in place to smooth the transition, and we can avoid spending time finding a name that won't offend some delicate soul over the internet. Another question is how to do the switch without causing confusion the next time someone pulls. It might be safest to simply create the main branch and delete the master branch (rather than, say, replacing all of the content in master with an explanatory note). That way a 'git pull' gives a hint of the change and no messy conflicts:
$ git pull
From /tmp/gittest/upstream
 * [new branch] main -> origin/main
Your configuration specifies to merge with the ref 'refs/heads/master' from the remote, but no such ref was fetched.
Maybe also push a 'master_is_now_main' tag annotated with 'use git branch -u origin/main to fix tracking branches'. Or maybe that's excessive :) Let's wait for github to put those in place for us. No point in re-inventing the wheel. The last blog post I read said they were working to do it at the infrastructure level, and that would save us a lot of headaches and complications. IIRC they will add the main branch automatically to new projects and transition old ones. the master branch will be an automatic redirect to main. That will basically solve 99% of our issues: git pull won't break, etc. Cheers Fabio Cheers, Andy Since we are admins of all repositories, we can do it in one shot without too much pain and suffering in CI. It will probably require a day or two of CI downtime to rebuild the world as well. Fabio The change would not affect existing repositories/projects. However I am wondering if we should take the opportunity of the minor-version bump to do the same for Pacemaker.
The impact on developers would be a one-time process for each checkout/fork: https://wiki.clusterlabs.org/wiki/Pacemaker_2.1_Changes#Development_changes In my opinion, this is a minor usage that many existing projects will not bother changing, but I do think that since all new projects will default to "main", sometime in the future any project still using "master" will appear outdated to young developers. We could use "main" or something else. Some projects are switching to names like "release", "stable", or "next" depending on how they're actually using the branch ("next" would be appropriate in Pacemaker's case). This will probably go on for years, so I am fine with either changing it with 2.1.0 (since it has bigger changes than usual, and we can get ahead of the curve) or waiting until the dust settles and future conventions are
Re: [Cluster-devel] [Linux-cluster] fence-agents-4.0.16 stable release
On 3/5/2015 12:47 PM, Marek marx Grac wrote: Welcome to the fence-agents 4.0.16 release. This release includes several bugfixes and features:
* fence_kdump has implemented a 'monitor' action that checks whether the local node is capable of working with kdump
* the path to snmp(walk|get|set) can be set at runtime
* a new operation 'validate-all' for the majority of agents that checks whether the entered parameters are sufficient, without connecting to the fence device. Be aware that some checks can only be done after we receive information from the fence device, so these are not tested.
* a new operation 'list-status' that presents CSV output (plug_number, plug_alias, plug_status) where status is ON/OFF/UNKNOWN
The Git repository was moved to https://github.com/ClusterLabs/fence-agents/ so this is the last release made from fedorahosted. The new source tarball can be downloaded here: https://fedorahosted.org/releases/f/e/fence-agents/fence-agents-4.0.16.tar.xz To report bugs or issues: https://bugzilla.redhat.com/ Would you like to meet the cluster team or members of its community? Join us on IRC (irc.freenode.net #linux-cluster) and share your experience with other sysadmins or power users. There is a new IRC channel in use now: #clusterlabs on Freenode. We are slowly phasing out #linux-cluster and centralizing all cluster-related activities on the new channel. Fabio
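The point of the CSV shape of 'list-status' is that it is trivially machine-consumable; for example, a wrapper script could pick out powered-off plugs with nothing more than awk. A sketch (the sample lines below are made up, since the actual plug numbers and aliases depend on the fence device):

```shell
# Hypothetical 'list-status' output: plug_number,plug_alias,plug_status
printf '1,node1,ON\n2,node2,OFF\n3,spare,UNKNOWN\n' > /tmp/list-status.csv

# Print the aliases of all plugs reported OFF.
awk -F, '$3 == "OFF" { print $2 }' /tmp/list-status.csv
# prints: node2
```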
Re: [Cluster-devel] [ha-wg] [Planning] Organizing HA Summit 2015
All, On 1/13/2015 6:31 AM, Digimer wrote: Hi all, With Fabio away for now, I (and others) are working on the final preparations for the summit. This is your chance to speak up and influence the planning! Objections/suggestions? Speak now please. :) Digimer, I would like to thank you very much for helping with the organization of the summit. I unfortunately have to cancel my travel and won't be able to attend myself. Maybe I'll join some sessions remotely if time allows. I wish everybody a great time in Brno; make the best out of it! I am really looking forward to seeing the outcome when so many brilliant people sit in the same room. Cheers Fabio
Re: [Cluster-devel] [Pacemaker] Wiki for planning created - Re: [RFC] Organizing HA Summit 2015
On 11/28/2014 8:10 PM, Jan Pokorný wrote: On 28/11/14 00:37 -0500, Digimer wrote: On 28/11/14 12:33 AM, Fabio M. Di Nitto wrote: On 11/27/2014 5:52 PM, Digimer wrote: I just created a dedicated/fresh wiki for planning and organizing: http://plan.alteeve.ca/index.php/Main_Page [...] Awesome! Thanks for taking care of it. Do you have a chance to add an instance of etherpad to the site as well? Mostly to do collaborative editing while we all sit around the same table. Otherwise we can use a public instance and copy-paste the info into the wiki afterwards. Never tried setting up etherpad before, but if it runs on rhel 6, I should have no problem setting it up. Provided no conspiracy is to be started, there are a bunch of popular instances, e.g. http://piratepad.net/ Right, but some of them only store etherpads for 30 days. We just need to be careful which one we choose, or we make our own. Fabio
Re: [Cluster-devel] [ha-wg-technical] [Pacemaker] [ha-wg] [Linux-HA] [RFC] Organizing HA Summit 2015
On 11/27/2014 1:33 PM, Kristoffer Grönlund wrote: On 27 Nov 2014, at 2:41 am, Lars Marowsky-Bree l...@suse.com wrote: On 2014-11-25T16:46:01, David Vossel dvos...@redhat.com wrote: Okay, okay, apparently we have got enough topics to discuss. I'll grumble a bit more about Brno, but let's get the organisation of that thing on track ... Sigh. Always so much work! Will Chris Feist be at the summit? Yes :) Fabio I would be happy to have a roundtable discussion or something similar about clients, exchange ideas and so on. I don't necessarily think that there is an urgent need to unify the efforts code-wise, but I think there is a lot we could do together on the level of idea exchange without giving up our independence, so to speak ;) Of course I would be happy to talk about such things with anyone else who is interested as well.
Re: [Cluster-devel] [ha-wg-technical] [Pacemaker] [ha-wg] [Linux-HA] [RFC] Organizing HA Summit 2015
On 11/27/2014 1:33 PM, Kristoffer Grönlund wrote: On 27 Nov 2014, at 2:41 am, Lars Marowsky-Bree l...@suse.com wrote: On 2014-11-25T16:46:01, David Vossel dvos...@redhat.com wrote: Okay, okay, apparently we have got enough topics to discuss. I'll grumble a bit more about Brno, but let's get the organisation of that thing on track ... Sigh. Always so much work! Will Chris Feist be at the summit? I would be happy to have a roundtable discussion or something similar about clients, exchange ideas and so on. I don't necessarily think that there is an urgent need to unify the efforts code-wise, but I think there is a lot we could do together on the level of idea exchange without giving up our independence, so to speak ;) Of course I would be happy to talk about such things with anyone else who is interested as well. sorry, I keep replying from my private email address... Yes Chris will be there too. Fabio
Re: [Cluster-devel] Wiki for planning created - Re: [Pacemaker] [RFC] Organizing HA Summit 2015
On 11/27/2014 5:52 PM, Digimer wrote: I just created a dedicated/fresh wiki for planning and organizing: http://plan.alteeve.ca/index.php/Main_Page Other than the domain, it has no association with any existing project, so it should be a neutral enough platform. Also, it's not owned by $megacorp (I wish!), so spying/privacy shouldn't be an issue I hope. If there is concern, I can set up https. If no one else gets to it before me, I'll start collating the data from the mailing list onto that wiki tomorrow (maaaybe today, depends). The wiki requires registration, but that's it. I'm not bothering with captchas because, in my experience, spammers walk right through them anyway. I do have edits email me, so I can catch and roll back any spam quickly. Awesome! Thanks for taking care of it. Do you have a chance to add an instance of etherpad to the site as well? Mostly to do collaborative editing while we all sit around the same table. Otherwise we can use a public instance and copy-paste the info into the wiki afterwards. Fabio
Re: [Cluster-devel] [ha-wg] [Pacemaker] [Linux-HA] [RFC] Organizing HA Summit 2015
On 11/26/2014 4:41 PM, Lars Marowsky-Bree wrote: On 2014-11-25T16:46:01, David Vossel dvos...@redhat.com wrote: Okay, okay, apparently we have got enough topics to discuss. I'll grumble a bit more about Brno, but let's get the organisation of that thing on track ... Sigh. Always so much work! I'm assuming arrival on the 3rd and departure on the 6th would be the plan? Yes, that's correct. Devconf starts on the 6th. Fabio Personally I'm interested in talking about scaling - with pacemaker-remoted and/or a new messaging/membership layer. If we're going to talk about scaling, we should throw in our new docker support in the same discussion. Docker lends itself well to the pet vs cattle analogy. I see management of docker with pacemaker making quite a bit of sense now that we have the ability to scale into the cattle territory. While we're on that, I'd like to throw in a heretic thought and suggest that one might want to look at etcd and fleetd. Other design-y topics:
- SBD
Point taken. I have actually not forgotten this, Andrew, and am reading your development. I probably just need to pull the code over ...
- degraded mode
- improved notifications
- containerisation of services (cgroups, docker, virt)
- resource-agents (upstream releases, handling of pull requests, testing)
Yep, we definitely need to talk about the resource-agents. Agreed. User-facing topics could include recent features (ie. pacemaker-remoted, crm_resource --restart) and common deployment scenarios (eg. NFS) that people get wrong. Adding to the list, it would be a good idea to talk about deployment integration testing, what's going on with the phd project, and why it's important regardless of whether you're interested in what the project functionally does. OK. So QA is within scope as well. It seems the agenda will fill up quite nicely. Regards, Lars
Re: [Cluster-devel] [ha-wg] [Linux-HA] [RFC] Organizing HA Summit 2015
On 11/25/2014 10:54 AM, Lars Marowsky-Bree wrote: On 2014-11-24T16:16:05, Fabio M. Di Nitto fdini...@redhat.com wrote: Yeah, well, devconf.cz is not such an interesting event for those who do not wear the fedora ;-) That would be the perfect opportunity for you to convert users to Suse ;) I'd prefer, at least for this round, to keep the dates/location and explore the option of allowing people to join remotely. After all, there are tons of tools, between google hangouts and others, that would allow that. That is, in my experience, the absolute worst. It creates second-class participants and is a PITA for everyone. I agree; it is still a way for people to join in, though. I personally disagree. In my experience, one either does a face-to-face meeting, or a virtual one that puts everyone on the same footing. Mixing both works really badly unless the team already knows each other. I know that an in-person meeting is useful, but we have a large team in Beijing, the US, Tasmania (OK, one crazy guy), various countries in Europe etc. Yes, same here. No difference.. we have one crazy guy in Australia.. Yeah, but you're already bringing him for your personal conference. That's a bit different. ;-) OK, let's switch tracks a bit. What *topics* do we actually have? Can we fill two days? Where would we want to collect them? I'd say either a google doc or any random etherpad/wiki instance will do just fine. As for the topics:
- corosync qdevice and plugins (network, disk, integration with sbd?, others?)
- corosync RRP / libknet integration/replacement
- fence autodetection/autoconfiguration
For the user-facing topics (that is, if there are enough participants; I only got 1 user confirmation so far):
- demos, cluster 101, tutorials
- get feedback
- get feedback
- get more feedback
Fabio
Re: [Cluster-devel] [ha-wg] [RFC] Organizing HA Summit 2015
On 11/24/2014 3:39 PM, Lars Marowsky-Bree wrote: On 2014-09-08T12:30:23, Fabio M. Di Nitto fdini...@redhat.com wrote: Folks, Fabio, thanks for organizing this and getting the ball rolling. And again sorry for being late to said game; I was busy elsewhere. However, it seems that the idea for such a HA Summit in Brno/Feb 2015 hasn't exactly fallen on fertile ground, even with the suggested user/client day. (Or if there was a lot of feedback, it wasn't public.) I wonder why that is, and if/how we can make this more attractive? Frankly, as might have been obvious ;-), for me the venue is an issue. It's not easy to reach, and I'm theoretically fairly close in Germany already. I wonder if we could increase participation with a virtual meeting (on either those dates or another), similar to what the Ceph Developer Summit does? Those appear really productive and make it possible for a wide range of interested parties from all over the world to attend, regardless of travel times, or even just attend select sessions (that would otherwise make it hard to justify the travel expenses and time off). Alternatively, would a relocation to a more connected venue help, such as Vienna xor Prague? I'd love to get some more feedback from the community. I agree, some feedback would be useful. As Fabio put it, yes, I *can* suck it up and go to Brno if that's where everyone goes to play ;-), but I'd also prefer to have a broader participation. Dates and location were chosen to piggy-back on devconf.cz and allow people to travel for more than just the HA Summit. I'd prefer, at least for this round, to keep the dates/location and explore the option of allowing people to join remotely. After all, there are tons of tools, between google hangouts and others, that would allow that. Fabio
Re: [Cluster-devel] [ha-wg] [RFC] Organizing HA Summit 2015
On 11/24/2014 4:12 PM, Lars Marowsky-Bree wrote: On 2014-11-24T15:54:33, Fabio M. Di Nitto fdini...@redhat.com wrote: Dates and location were chosen to piggy-back on devconf.cz and allow people to travel for more than just the HA Summit. Yeah, well, devconf.cz is not such an interesting event for those who do not wear the fedora ;-) That would be the perfect opportunity for you to convert users to Suse ;) I'd prefer, at least for this round, to keep the dates/location and explore the option of allowing people to join remotely. After all, there are tons of tools, between google hangouts and others, that would allow that. That is, in my experience, the absolute worst. It creates second-class participants and is a PITA for everyone. I agree; it is still a way for people to join in, though. I know that an in-person meeting is useful, but we have a large team in Beijing, the US, Tasmania (OK, one crazy guy), various countries in Europe etc. Yes, same here. No difference.. we have one crazy guy in Australia.. Fabio
Re: [Cluster-devel] [ha-wg] [ha-wg-technical] [Linux-HA] [RFC] Organizing HA Summit 2015
On 11/5/2014 4:16 PM, Lars Ellenberg wrote: On Sat, Nov 01, 2014 at 01:19:35AM -0400, Digimer wrote: All the cool kids will be there. You want to be a cool kid, right? Well, no. ;-) But I'll still be there, and a few other Linbit'ers as well. Fabio, let us know what we could do to help make it happen. I appreciate the offer. Assuming we achieve quorum to do the event, I'd say that I'll take care of the meeting rooms/hotel logistics and one lunch-and-learn pizza event. It would be nice if others could organize a dinner event. Cheers Fabio Lars On 01/11/14 01:06 AM, Fabio M. Di Nitto wrote: just a kind reminder. On 9/8/2014 12:30 PM, Fabio M. Di Nitto wrote: All, it's been almost 6 years since we had a face-to-face meeting for all developers and vendors involved in Linux HA. I'd like to try and organize a new event and piggy-back on DevConf in Brno [1]. DevConf will start Friday the 6th of Feb 2015 in the Red Hat Brno offices. My suggestion would be to have a 2-day dedicated HA summit on the 4th and the 5th of February. The goal for this meeting is, besides getting to know each other and all the social aspects of those events, to tune the directions of the various HA projects and explore common areas of improvement. I am also very open to the idea of extending it to 3 days, 1 dedicated to customers/users and 2 dedicated to developers, by starting on the 3rd. Thoughts? Fabio PS Please hit reply-all or include me in CC just to make sure I'll see an answer :) [1] http://devconf.cz/ Could you please let me know by end of Nov if you are interested or not? I have heard only from a few people so far. Cheers Fabio
Re: [Cluster-devel] [ha-wg] [RFC] Organizing HA Summit 2015
just a kind reminder. On 9/8/2014 12:30 PM, Fabio M. Di Nitto wrote: All, it's been almost 6 years since we had a face-to-face meeting for all developers and vendors involved in Linux HA. I'd like to try and organize a new event and piggy-back on DevConf in Brno [1]. DevConf will start Friday the 6th of Feb 2015 in the Red Hat Brno offices. My suggestion would be to have a 2-day dedicated HA summit on the 4th and the 5th of February. The goal for this meeting is, besides getting to know each other and all the social aspects of those events, to tune the directions of the various HA projects and explore common areas of improvement. I am also very open to the idea of extending it to 3 days, 1 dedicated to customers/users and 2 dedicated to developers, by starting on the 3rd. Thoughts? Fabio PS Please hit reply-all or include me in CC just to make sure I'll see an answer :) [1] http://devconf.cz/ Could you please let me know by end of Nov if you are interested or not? I have heard only from a few people so far. Cheers Fabio
Re: [Cluster-devel] [Linux-HA] [RFC] Organizing HA Summit 2015
Hi Alan, On 09/09/2014 03:11 PM, Alan Robertson wrote: Hi Fabio, Do you know much about the Brno DevConf? It would be my first visit to DevConf so not much really :) I was wondering if the Assimilation Project might be interesting to the audience there. http://assimilationsystems.com/ http://assimproj.org/ It's related to High Availability in that we monitor systems and services with zero configuration - we even use OCF RAs ;-). Because of that, we could eventually intervene in systems - restarting services, or even migrating them. That's not in current plans, but it is technically very possible. I don't see why not. HA Summit != pacemaker ;) Having a pool of presentations from other HA-related projects would be cool. But it's so much more than that - and HUGELY scalable - 10K servers without breathing hard, and 100K servers without proxies, etc. It also discovers systems, services, dependencies, switch connections, and lots of other things. Basically everything is done with near-zero configuration. We wind up with a graph database describing everything in great detail - and it's continually up to date. Sounds interesting. Would you be willing to join us for a presentation/demo? I don't know if you know me, but I founded the Linux-HA project and led it for about 10 years. Yep, your name is very well known :) Cheers Fabio -- Alan Robertson al...@unix.sh On 09/08/2014 04:30 AM, Fabio M. Di Nitto wrote: All, it's been almost 6 years since we had a face-to-face meeting for all developers and vendors involved in Linux HA. I'd like to try and organize a new event and piggy-back on DevConf in Brno [1]. DevConf will start Friday the 6th of Feb 2015 in the Red Hat Brno offices. My suggestion would be to have a 2-day dedicated HA summit on the 4th and the 5th of February. The goal for this meeting is, besides getting to know each other and all the social aspects of those events, to tune the directions of the various HA projects and explore common areas of improvement.
I am also very open to the idea of extending to 3 days, one dedicated to customers/users and two dedicated to developers, starting on the 3rd. Thoughts? Fabio PS Please hit reply-all or include me in CC just to make sure I'll see an answer :) [1] http://devconf.cz/
Re: [Cluster-devel] [Linux-HA] [RFC] Organizing HA Summit 2015
On 09/09/2014 06:31 PM, Alan Robertson wrote: My apologies for spamming everyone. I thought I deleted all the other email addresses. I failed. Apologies :-( I think it's good that we have an open discussion with all parties involved; I fail to see that as an issue. Apologies not accepted ;) Fabio
[Cluster-devel] [RFC] Organizing HA Summit 2015
All, it's been almost 6 years since we had a face-to-face meeting for all developers and vendors involved in Linux HA. I'd like to try and organize a new event and piggy-back on DevConf in Brno [1]. DevConf will start Friday the 6th of Feb 2015 in the Red Hat Brno offices. My suggestion would be to have a dedicated 2-day HA summit on the 4th and the 5th of February. The goal for this meeting is, besides getting to know each other and all the social aspects of such events, to tune the directions of the various HA projects and explore common areas of improvement. I am also very open to the idea of extending to 3 days, one dedicated to customers/users and two dedicated to developers, starting on the 3rd. Thoughts? Fabio PS Please hit reply-all or include me in CC just to make sure I'll see an answer :) [1] http://devconf.cz/
Re: [Cluster-devel] [PATCH]fence-virtd: Fix typo in debug mesage of do_fence_request_tcp
On 05/15/2014 08:45 PM, Masatake YAMATO wrote: I'm sorry. I should post this to the linux-cluster list. nope, cluster-devel is the right place! thanks for the patch. Masatake YAMATO fence-virtd: Fix typo in debug mesage of do_fence_request_tcp Signed-off-by: Masatake YAMATO yam...@redhat.com --- server/mcast.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/server/mcast.c b/server/mcast.c index e850ec7..5fbe46a 100644 --- a/server/mcast.c +++ b/server/mcast.c @@ -250,7 +250,7 @@ do_fence_request_tcp(fence_req_t *req, mcast_info *info) fd = connect_tcp(req, info->args.auth, info->key, info->key_len); if (fd < 0) { -dbg_printf(2, "Could call back for fence request: %s\n", +dbg_printf(2, "Could not call back for fence request: %s\n", strerror(errno)); goto out; } -- 1.9.0
Re: [Cluster-devel] [cluster.git/STABLE32][PATCH] xml: ccs_update_schema: be verbose about extraction fail
ACK Fabio On 4/29/2014 11:30 PM, Jan Pokorný wrote: Previously, the distillation of resource-agents' metadata could fail for unexpected reasons without any evidence ever being made, unlike in the case of fence-agents. Also, missing metadata and an issue with their extraction will allegedly yield the same outcome, so this is reflected in the comments being emitted to the schema for both sorts of agents. Signed-off-by: Jan Pokorný jpoko...@redhat.com --- config/tools/xml/ccs_update_schema.in | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/config/tools/xml/ccs_update_schema.in b/config/tools/xml/ccs_update_schema.in index 98ed885..b63c987 100644 --- a/config/tools/xml/ccs_update_schema.in +++ b/config/tools/xml/ccs_update_schema.in @@ -215,6 +215,9 @@ generate_ras() { lecho ras: processing $(basename $i) $i meta-data 2>/dev/null | xsltproc $rngdir/ra2rng.xsl - \ >> $outputdir/resources.rng.cache 2>/dev/null + [ $? != 0 ] && \ + echo "<!-- Problem evaluating metadata for $i" \ + " -->" >> $outputdir/resources.rng.cache done cat $rngdir/resources.rng.mid >> $outputdir/resources.rng.cache lecho ras: generating ref data @@ -301,8 +304,8 @@ generate_fas() { xsltproc $rngdir/fence2rng.xsl - \ >> $outputdir/fence_agents.rng.cache 2>/dev/null [ $? != 0 ] && \ - echo "<!-- No metadata for $i -->" \ - >> $outputdir/fence_agents.rng.cache + echo "<!-- Problem evaluating metadata for $i" \ + " -->" >> $outputdir/fence_agents.rng.cache done cat $rngdir/fence.rng.tail >> $outputdir/fence_agents.rng.cache }
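The annotate-on-failure pattern in Jan's patch — run the metadata-extraction pipeline and, if it fails, append an XML comment so the generated schema itself records the problem — can be sketched as follows. This is a hypothetical illustration, not the actual ccs_update_schema code; the agent path and `extract_metadata` stand-in are invented for the demo.

```shell
#!/bin/sh
# Hypothetical sketch of the annotate-on-failure pattern: run the
# extraction pipeline; on failure, leave an XML comment in the
# generated schema cache so the failure is visible later.
outputdir=$(mktemp -d)
cache="$outputdir/resources.rng.cache"
: > "$cache"

extract_metadata() {
    # stand-in for: "$1" meta-data 2>/dev/null | xsltproc ra2rng.xsl -
    false
}

agent="/usr/share/cluster/dummy.sh"   # illustrative agent path
extract_metadata "$agent" >> "$cache" 2>/dev/null
[ $? != 0 ] && \
    echo "<!-- Problem evaluating metadata for $agent -->" >> "$cache"

grep -c 'Problem evaluating' "$cache"
```

Because the marker lands inside the cache file that later becomes cluster.rng, a broken agent leaves a visible trace in the shipped schema instead of failing silently — which is exactly the behavioural change the patch describes.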
Re: [Cluster-devel] [PATCH] fencing: Replace printing to stderr with proper logging solution
On 04/02/2014 05:06 PM, Marek 'marx' Grac wrote: This patch replaces local solutions with the standard Python logging module. The levels of messages are not final; they just reflect the previous state. So, the debug level is available only with the -v / verbose option. Hi Marek, are we keeping out-of-tree agents in sync too? specifically fence_virt and fence_sanlock. Fabio
Re: [Cluster-devel] [PATCH] fencing: Add support for ipmitool/amttool binaries during autoconf
Thanks for doing it, we still need to change the agent to use IPMITOOL_PATH & co. :) Fabio On 12/02/2013 04:39 PM, Marek 'marx' Grac wrote: Configuration of autoconf was extended to dynamically find ipmitool/amttool. If the binary is not found on the system then we will switch to default values (Fedora/RHEL). The path to the binaries is exported and replaced in fencebuild using the same process as the version number or sbin/logdir. --- configure.ac | 6 ++ make/fencebuild.mk | 2 ++ 2 files changed, 8 insertions(+), 0 deletions(-) diff --git a/configure.ac b/configure.ac index 6f4baa0..02c46b8 100644 --- a/configure.ac +++ b/configure.ac @@ -163,6 +163,9 @@ LOGDIR=${localstatedir}/log/cluster CLUSTERVARRUN=${localstatedir}/run/cluster CLUSTERDATA=${datadir}/cluster +## path to 3rd-party binaries +AC_PATH_PROG([IPMITOOL_PATH], [ipmitool], [/usr/bin/ipmitool]) +AC_PATH_PROG([AMTTOOL_PATH], [amttool], [/usr/bin/amttool]) ## do subst AC_SUBST([DEFAULT_CONFIG_DIR]) @@ -187,6 +190,9 @@ AC_SUBST([SNMPBIN]) AC_SUBST([AGENTS_LIST]) AM_CONDITIONAL(BUILD_XENAPILIB, test $XENAPILIB -eq 1) +AC_SUBST([IPMITOOL_PATH]) +AC_SUBST([AMTTOOL_PATH]) + ## *FLAGS handling ENV_CFLAGS=$CFLAGS diff --git a/make/fencebuild.mk b/make/fencebuild.mk index 15a47fd..5cbe3bd 100644 --- a/make/fencebuild.mk +++ b/make/fencebuild.mk @@ -9,6 +9,8 @@ $(TARGET): $(SRC) -e 's#@''LOGDIR@#${LOGDIR}#g' \ -e 's#@''SBINDIR@#${sbindir}#g' \ -e 's#@''LIBEXECDIR@#${libexecdir}#g' \ + -e 's#@''IPMITOOL_PATH#${IPMITOOL_PATH}#g' \ + -e 's#@''AMTTOOL_PATH#${AMTTOOL_PATH}#g' \ > $@ if [ 0 -eq `echo $(SRC) | grep fence_ > /dev/null; echo $$?` ]; then \
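The mechanism Marek's patch wires up — configure discovers the binary via AC_PATH_PROG, then the Makefile sed-substitutes an `@IPMITOOL_PATH@` placeholder in the agent source — can be re-created in a few lines. This is an illustrative sketch, not the actual fencebuild.mk rule; the value is hardcoded here where configure would normally supply it.

```shell
#!/bin/sh
# Illustrative re-creation of the fencebuild.mk substitution step:
# AC_PATH_PROG would normally set IPMITOOL_PATH; hardcoded for demo.
IPMITOOL_PATH=/usr/bin/ipmitool

src=$(mktemp)
tgt=$(mktemp)
printf 'IPMITOOL = "@IPMITOOL_PATH@"\n' > "$src"

# same s#@NAME@#value#g idea as the sed rule in make/fencebuild.mk
sed -e "s#@IPMITOOL_PATH@#${IPMITOOL_PATH}#g" "$src" > "$tgt"
cat "$tgt"
```

Note the `#` delimiter in the sed expression: the substituted value is a filesystem path containing `/`, so using `#` avoids having to escape every slash.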
Re: [Cluster-devel] [PATCH 2/3] fence_ipmilan: option --method and new option --ipmitool-path
On 11/29/2013 05:32 PM, Ondrej Mular wrote: Add support for option --method and new option --ipmitool-path --- fence/agents/ipmilan/fence_ipmilan.py | 80 +++ 1 file changed, 54 insertions(+), 26 deletions(-) diff --git a/fence/agents/ipmilan/fence_ipmilan.py b/fence/agents/ipmilan/fence_ipmilan.py index 5c32690..4d33234 100644 --- a/fence/agents/ipmilan/fence_ipmilan.py +++ b/fence/agents/ipmilan/fence_ipmilan.py @@ -11,14 +11,6 @@ REDHAT_COPYRIGHT= BUILD_DATE= #END_VERSION_GENERATION -PATHS = [/usr/local/bull/NSMasterHW/bin/ipmitool, -/usr/bin/ipmitool, -/usr/sbin/ipmitool, -/bin/ipmitool, -/sbin/ipmitool, -/usr/local/bin/ipmitool, -/usr/local/sbin/ipmitool] - def get_power_status(_, options): cmd = create_command(options, status) @@ -28,9 +20,8 @@ def get_power_status(_, options): try: process = subprocess.Popen(shlex.split(cmd), stdout=subprocess.PIPE, stderr=subprocess.PIPE) -except OSError, ex: -print ex -fail(EC_TOOL_FAIL) +except OSError: +fail_usage(Ipmitool not found or not accessible) process.wait() @@ -54,13 +45,31 @@ def set_power_status(_, options): process = subprocess.Popen(shlex.split(cmd), stdout=null, stderr=null) except OSError: null.close() -fail(EC_TOOL_FAIL) +fail_usage(Ipmitool not found or not accessible) process.wait() null.close() return +def reboot_cycle(_, options): +cmd = create_command(options, cycle) + +if options[log] = LOG_MODE_VERBOSE: +options[debug_fh].write(executing: + cmd + \n) + +try: +process = subprocess.Popen(shlex.split(cmd), stdout=subprocess.PIPE, stderr=subprocess.PIPE) +except OSError: +fail_usage(Ipmitool not found or not accessible) + +process.wait() + +out = process.communicate() +process.stdout.close() + +return bool(re.search('chassis power control: cycle', str(out).lower())) + def is_executable(path): if os.path.exists(path): stats = os.stat(path) @@ -68,13 +77,17 @@ def is_executable(path): return True return False -def get_ipmitool_path(): -for path in PATHS: -if is_executable(path): -return path +def 
get_ipmitool_path(options): +if type(options[--ipmitool-path]) == type(list()): +for path in options[--ipmitool-path]: +if is_executable(path): +return path +else: +if is_executable(options[--ipmitool-path]): +return options[--ipmitool-path] return None -def create_command(options, action): +def create_command(options, action): cmd = options[ipmitool_path] # --lanplus / -L @@ -120,7 +133,7 @@ def define_new_opts(): all_opt[lanplus] = { getopt : L, longopt : lanplus, -help : -L, --lanplusUse Lanplus to improve security of connection, +help : -L, --lanplus Use Lanplus to improve security of connection, required : 0, shortdesc : Use Lanplus to improve security of connection, order: 1 @@ -128,7 +141,7 @@ def define_new_opts(): all_opt[auth] = { getopt : A:, longopt : auth, -help : -A, --auth=[auth]IPMI Lan Auth type (md5|password|none), +help : -A, --auth=[auth] IPMI Lan Auth type (md5|password|none), required : 0, shortdesc : IPMI Lan Auth type., default : none, @@ -138,7 +151,7 @@ def define_new_opts(): all_opt[cipher] = { getopt : C:, longopt : cipher, -help : -C, --cipher=[cipher]Ciphersuite to use (same as ipmitool -C parameter), +help : -C, --cipher=[cipher] Ciphersuite to use (same as ipmitool -C parameter), required : 0, shortdesc : Ciphersuite to use (same as ipmitool -C parameter), default : 0, @@ -147,28 +160,44 @@ def define_new_opts(): all_opt[privlvl] = { getopt : P:, longopt : privlvl, -help : -P, --privlvl=[level]Privilege level on IPMI device (callback|user|operator|administrator), +help : -P, --privlvl=[level] Privilege level on IPMI device (callback|user|operator|administrator), required : 0, shortdesc : Privilege level on IPMI device, default : administrator, choices : [callback, user, operator, administrator], order: 1 } +all_opt[ipmitool_path] = { +getopt : i:, +longopt : ipmitool-path, +help : --ipmitool-path=[path] Path to ipmitool binary, +required : 0, +shortdesc : Path to ipmitool binary, +default : [/usr/local/bull/NSMasterHW/bin/ipmitool, 
+/usr/bin/ipmitool, +
Re: [Cluster-devel] [PATCH 3/3] fence_amt: option --method and new option --amttool-path
On 11/29/2013 05:32 PM, Ondrej Mular wrote: Add support for option --method and new option --amttool-path --- fence/agents/amt/fence_amt.py | 72 ++- 1 file changed, 57 insertions(+), 15 deletions(-) diff --git a/fence/agents/amt/fence_amt.py b/fence/agents/amt/fence_amt.py index 8fe2dbc..7077828 100755 --- a/fence/agents/amt/fence_amt.py +++ b/fence/agents/amt/fence_amt.py @@ -1,6 +1,6 @@ #!/usr/bin/python -import sys, subprocess, re +import sys, subprocess, re, os, stat from pipes import quote sys.path.append(@FENCEAGENTSLIBDIR@) from fencing import * @@ -21,12 +21,11 @@ def get_power_status(_, options): try: process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True) except OSError: -fail(EC_TOOL_FAIL) +fail_usage(Amttool not found or not accessible) process.wait() output = process.communicate() - process.stdout.close() match = re.search('Powerstate:[\\s]*(..)', str(output)) @@ -51,19 +50,44 @@ def set_power_status(_, options): process = subprocess.Popen(cmd, stdout=null, stderr=null, shell=True) except OSError: null.close() -fail(EC_TOOL_FAIL) +fail_usage(Amttool not found or not accessible) process.wait() null.close() return +def reboot_cycle(_, options): +cmd = create_command(options, cycle) + +if options[log] = LOG_MODE_VERBOSE: +options[debug_fh].write(executing: + cmd + \n) + +null = open('/dev/null', 'w') +try: +process = subprocess.Popen(cmd, stdout=null, stderr=null, shell=True) +except OSError: +null.close() +fail_usage(Amttool not found or not accessible) + +status = process.wait() +null.close() + +return not bool(status) + +def is_executable(path): +if os.path.exists(path): +stats = os.stat(path) +if stat.S_ISREG(stats.st_mode) and os.access(path, os.X_OK): +return True +return False + def create_command(options, action): # --password / -p cmd = AMT_PASSWORD= + quote(options[--password]) -cmd += + options[amttool_path] +cmd += + options[--amttool-path] # --ip / -a cmd += + options[--ip] @@ -77,7 +101,10 @@ def 
create_command(options, action): elif action == off: cmd = echo \y\| + cmd cmd += powerdown -if action in [on, off] and options.has_key(--boot-options): +elif action == cycle: +cmd = echo \y\| + cmd +cmd += powercycle +if action in [on, off, cycle] and options.has_key(--boot-options): cmd += options[--boot-options] # --use-sudo / -d @@ -86,25 +113,40 @@ def create_command(options, action): return cmd -def main(): - -atexit.register(atexit_handler) - -device_opt = [ ipaddr, no_login, passwd, boot_option, no_port, sudo] - +def define_new_opts(): all_opt[boot_option] = { getopt : b:, longopt : boot-option, -help:-b, --boot-option=[option] Change the default boot behavior of the machine. (pxe|hd|hdsafe|cd|diag), +help:-b, --boot-option=[option] Change the default boot behavior of the machine. (pxe|hd|hdsafe|cd|diag), required : 0, shortdesc : Change the default boot behavior of the machine., choices : [pxe, hd, hdsafe, cd, diag], order : 1 } +all_opt[amttool_path] = { +getopt : i:, +longopt : amttool-path, +help : --amttool-path=[path] Path to amttool binary, +required : 0, +shortdesc : Path to amttool binary, +default : /usr/bin/amttool, similar here. Hardcoding paths is bad. Fabio
Re: [Cluster-devel] [PATCH 2/3] fence_ipmilan: option --method and new option --ipmitool-path
On 11/29/2013 05:32 PM, Ondrej Mular wrote: @@ -147,28 +160,44 @@ def define_new_opts(): all_opt[privlvl] = { getopt : P:, longopt : privlvl, -help : -P, --privlvl=[level]Privilege level on IPMI device (callback|user|operator|administrator), +help : -P, --privlvl=[level] Privilege level on IPMI device (callback|user|operator|administrator), All the reformatting and cosmetic changes should be in a separate commit. Also, this patch assumes that the first patch you posted is applied to the tree. It's not. Sending incremental patches over patches makes it difficult to rebuild the final binary and test it (yes I have ipmi devices at home :)) Fabio
Re: [Cluster-devel] [PATCH 1/2] fence_ipmilan: port fencing agent to fencing library
On 11/22/2013 5:18 PM, Jan Pokorný wrote: On 21/11/13 16:48 +0100, Fabio M. Di Nitto wrote: On 11/21/2013 4:16 PM, Ondrej Mular wrote: +PATHS = ["/usr/local/bull/NSMasterHW/bin/ipmitool", +"/usr/bin/ipmitool", +"/usr/sbin/ipmitool", +"/bin/ipmitool", +"/sbin/ipmitool", +"/usr/local/bin/ipmitool", +"/usr/local/sbin/ipmitool"] this hard-coding is bad. Always use the OS-defined PATH and, if really necessary, allow the user to override it with an option (for example: --pathtoipmitool=/usr/local) see, e.g., http://git.engineering.redhat.com/users/jpokorny/clufter/tree/utils.py?id=d37db7470f4e44598af0b91d02221182178677ff#n22 that mimics the `which` standard utility Hope this helps I'd like to understand why we need a search path in the first place though, and why we can't just rely on the shell hitting the right tool :) Fabio
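Fabio's suggestion — honour an explicit user override, otherwise defer to the OS-defined PATH instead of walking a hardcoded directory list — could look roughly like this in shell. This is a hypothetical helper, not the agents' actual code; the function name and arguments are invented.

```shell
#!/bin/sh
# Hypothetical: prefer an explicit user override if it is executable,
# otherwise let the OS PATH decide via command -v, instead of probing
# a hardcoded list of candidate directories.
find_tool() {
    override=$1
    name=$2
    if [ -n "$override" ] && [ -x "$override" ]; then
        printf '%s\n' "$override"
        return 0
    fi
    command -v "$name"    # prints nothing and fails if not installed
}

find_tool "" sh                         # resolved via PATH
find_tool "" no-such-tool-xyz || echo "not found"
```

`command -v` is the POSIX-specified equivalent of `which`, so this keeps the lookup portable while still letting `--ipmitool-path`-style options win when the user knows better.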
Re: [Cluster-devel] [PATCH 1/2] fence_ipmilan: port fencing agent to fencing library
Hi Ondrej, On 11/21/2013 4:16 PM, Ondrej Mular wrote: This is port of fence_ipmilan to fencing library. Also added fail message to fencing library if tool (e.g. impitool, amttool...) is not accessible. --- fence/agents/ipmilan/fence_ipmilan.py | 184 ++ fence/agents/lib/fencing.py.py| 4 +- 2 files changed, 187 insertions(+), 1 deletion(-) create mode 100644 fence/agents/ipmilan/fence_ipmilan.py diff --git a/fence/agents/ipmilan/fence_ipmilan.py b/fence/agents/ipmilan/fence_ipmilan.py new file mode 100644 index 000..5c32690 --- /dev/null +++ b/fence/agents/ipmilan/fence_ipmilan.py @@ -0,0 +1,184 @@ +#!/usr/bin/python + +import sys, shlex, stat, subprocess, re, os +from pipes import quote +sys.path.append(@FENCEAGENTSLIBDIR@) +from fencing import * + +#BEGIN_VERSION_GENERATION +RELEASE_VERSION= +REDHAT_COPYRIGHT= +BUILD_DATE= +#END_VERSION_GENERATION + +PATHS = [/usr/local/bull/NSMasterHW/bin/ipmitool, +/usr/bin/ipmitool, +/usr/sbin/ipmitool, +/bin/ipmitool, +/sbin/ipmitool, +/usr/local/bin/ipmitool, +/usr/local/sbin/ipmitool] this hard-cording it bad. 
Always use OS define PATH and if really necessary allow user to override with an option (for example: --pathtoipmitool=/usr/local) Fabio + +def get_power_status(_, options): + +cmd = create_command(options, status) + +if options[log] = LOG_MODE_VERBOSE: +options[debug_fh].write(executing: + cmd + \n) + +try: +process = subprocess.Popen(shlex.split(cmd), stdout=subprocess.PIPE, stderr=subprocess.PIPE) +except OSError, ex: +print ex +fail(EC_TOOL_FAIL) + +process.wait() + +out = process.communicate() +process.stdout.close() + +match = re.search('[Cc]hassis [Pp]ower is [\\s]*([a-zA-Z]{2,3})', str(out)) +status = match.group(1) if match else None + +return status + +def set_power_status(_, options): + +cmd = create_command(options, options[--action]) + +if options[log] = LOG_MODE_VERBOSE: +options[debug_fh].write(executing: + cmd + \n) + +null = open('/dev/null', 'w') +try: +process = subprocess.Popen(shlex.split(cmd), stdout=null, stderr=null) +except OSError: +null.close() +fail(EC_TOOL_FAIL) + +process.wait() +null.close() + +return + +def is_executable(path): +if os.path.exists(path): +stats = os.stat(path) +if stat.S_ISREG(stats.st_mode) and os.access(path, os.X_OK): +return True +return False + +def get_ipmitool_path(): +for path in PATHS: +if is_executable(path): +return path +return None + +def create_command(options, action): +cmd = options[ipmitool_path] + +# --lanplus / -L +if options.has_key(--lanplus): +cmd += -I lanplus +else: +cmd += -I lan +# --ip / -a +cmd += -H + options[--ip] + +# --username / -l +if options.has_key(--username) and len(options[--username]) != 0: +cmd += -U + quote(options[--username]) + +# --auth / -A +if options.has_key(--auth): +cmd += -A + options[--auth] + +# --password / -p +if options.has_key(--password): +cmd += -P + quote(options[--password]) + +# --cipher / -C +cmd += -C + options[--cipher] + +# --port / -n +if options.has_key(--ipport): +cmd += -p + options[--ipport] + +if options.has_key(--privlvl): +cmd += -L + 
options[--privlvl] + +# --action / -o +cmd += chassis power + action + + # --use-sudo / -d +if options.has_key(--use-sudo): +cmd = SUDO_PATH + + cmd + +return cmd + +def define_new_opts(): +all_opt[lanplus] = { +getopt : L, +longopt : lanplus, +help : -L, --lanplusUse Lanplus to improve security of connection, +required : 0, +shortdesc : Use Lanplus to improve security of connection, +order: 1 +} +all_opt[auth] = { +getopt : A:, +longopt : auth, +help : -A, --auth=[auth]IPMI Lan Auth type (md5|password|none), +required : 0, +shortdesc : IPMI Lan Auth type., +default : none, +choices : [md5, password, none], +order: 1 +} +all_opt[cipher] = { +getopt : C:, +longopt : cipher, +help : -C, --cipher=[cipher]Ciphersuite to use (same as ipmitool -C parameter), +required : 0, +shortdesc : Ciphersuite to use (same as ipmitool -C parameter), +default : 0, +order: 1 +} +all_opt[privlvl] = { +getopt : P:, +longopt : privlvl, +help : -P, --privlvl=[level]Privilege level on IPMI device (callback|user|operator|administrator), +required :
Re: [Cluster-devel] fence-agents: master - fence_ipmilan: Better description of lanplus parameter
Hi Marek, On 7/18/2013 12:04 PM, Marek Grác wrote: Gitweb: http://git.fedorahosted.org/git/?p=fence-agents.git;a=commitdiff;h=7117a54a55aafb9f6ea97fe7b3a7b56355f609e4 Commit: 7117a54a55aafb9f6ea97fe7b3a7b56355f609e4 Parent: c61430f65c843c4e4b7b3487f378d306efe1d52a Author: Marek 'marx' Grac mg...@redhat.com AuthorDate: Thu Jul 18 12:04:01 2013 +0200 Committer: Marek 'marx' Grac mg...@redhat.com CommitterDate: Thu Jul 18 12:04:01 2013 +0200 fence_ipmilan: Better description of lanplus parameter resolves: rhbz#981086 --- fence/agents/ipmilan/ipmilan.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/fence/agents/ipmilan/ipmilan.c b/fence/agents/ipmilan/ipmilan.c index 4d286ea..3561456 100644 --- a/fence/agents/ipmilan/ipmilan.c +++ b/fence/agents/ipmilan/ipmilan.c @@ -167,7 +167,7 @@ struct xml_parameter_s xml_parameters[]={ {"ipaddr","-a",1,"string",NULL,"IPMI Lan IP to talk to"}, {"passwd","-p",0,"string",NULL,"Password (if required) to control power on IPMI device"}, {"passwd_script","-S",0,"string",NULL,"Script to retrieve password (if required)"}, - {"lanplus","-P",0,"boolean",NULL,"Use Lanplus"}, + {"lanplus","-P",0,"boolean",NULL,"Use Lanplus to improve security of connection"}, Can you be just a bit more descriptive and explain what "improve security" means? thanks Fabio
Re: [Cluster-devel] [PATCH] fsck.gfs2: Don't rely on cluster.conf when rebuilding sb
You also want to get rid of this code in RHEL6 btw. It's just broken in many different ways. Fabio On 07/17/2013 01:51 PM, Andrew Price wrote: As cluster.conf no longer exists we can't sniff the locking options from it when rebuilding the superblock and in any case we shouldn't assume that fsck.gfs2 is running on the cluster the volume belongs to. This patch removes the get_lockproto_table function and instead sets the lock table name to a placeholder (unknown) and sets lockproto to lock_dlm. It warns the user at the end of the run that the locktable will need to be set before mounting. Signed-off-by: Andrew Price anpr...@redhat.com --- gfs2/fsck/initialize.c | 57 -- gfs2/fsck/main.c | 4 2 files changed, 8 insertions(+), 53 deletions(-) diff --git a/gfs2/fsck/initialize.c b/gfs2/fsck/initialize.c index b01b240..869d2de 100644 --- a/gfs2/fsck/initialize.c +++ b/gfs2/fsck/initialize.c @@ -33,6 +33,7 @@ static int was_mounted_ro = 0; static uint64_t possible_root = HIGHEST_BLOCK; static struct master_dir fix_md; static unsigned long long blks_2free = 0; +extern int sb_fixed; /** * block_mounters @@ -828,58 +829,6 @@ static int init_system_inodes(struct gfs2_sbd *sdp) return -1; } -static int get_lockproto_table(struct gfs2_sbd *sdp) -{ - FILE *fp; - char line[PATH_MAX]; - char *cluname, *end; - const char *fsname, *cfgfile = /etc/cluster/cluster.conf; - - memset(sdp-lockproto, 0, sizeof(sdp-lockproto)); - memset(sdp-locktable, 0, sizeof(sdp-locktable)); - fp = fopen(cfgfile, rt); - if (!fp) { - /* no cluster.conf; must be a stand-alone file system */ - strcpy(sdp-lockproto, lock_nolock); - log_warn(_(Lock protocol determined to be: lock_nolock\n)); - log_warn(_(Stand-alone file system: No need for a lock -table.\n)); - return 0; - } - /* We found a cluster.conf so assume it's a clustered file system */ - log_warn(_(Lock protocol assumed to be: GFS2_DEFAULT_LOCKPROTO -\n)); - strcpy(sdp-lockproto, GFS2_DEFAULT_LOCKPROTO); - - while (fgets(line, sizeof(line) - 1, fp)) { 
- cluname = strstr(line,cluster name=); - if (cluname) { - cluname += 15; - end = strchr(cluname,''); - if (end) - *end = '\0'; - break; - } - } - if (cluname == NULL || end == NULL || end - cluname 1) { - log_err(_(Error: Unable to determine cluster name from %s\n), - cfgfile); - } else { - fsname = strrchr(opts.device, '/'); - if (fsname) - fsname++; - else - fsname = repaired; - snprintf(sdp-locktable, sizeof(sdp-locktable), %.*s:%.16s, - (int)(sizeof(sdp-locktable) - strlen(fsname) - 2), - cluname, fsname); - log_warn(_(Lock table determined to be: %s\n), - sdp-locktable); - } - fclose(fp); - return 0; -} - /** * is_journal_copy - Is this a real dinode or a copy inside a journal? * A real dinode will be located at the block number in its no_addr. @@ -1256,7 +1205,8 @@ static int sb_repair(struct gfs2_sbd *sdp) } } /* Step 3 - Rebuild the lock protocol and file system table name */ - get_lockproto_table(sdp); + strcpy(sdp-lockproto, GFS2_DEFAULT_LOCKPROTO); + strcpy(sdp-locktable, unknown); if (query(_(Okay to fix the GFS2 superblock? (y/n { log_info(_(Found system master directory at: 0x%llx\n), sdp-sd_sb.sb_master_dir.no_addr); @@ -1280,6 +1230,7 @@ static int sb_repair(struct gfs2_sbd *sdp) build_sb(sdp, uuid); inode_put(sdp-md.rooti); inode_put(sdp-master_dir); + sb_fixed = 1; } else { log_crit(_(GFS2 superblock not fixed; fsck cannot proceed without a valid superblock.\n)); diff --git a/gfs2/fsck/main.c b/gfs2/fsck/main.c index 9c3b06d..f9e7166 100644 --- a/gfs2/fsck/main.c +++ b/gfs2/fsck/main.c @@ -36,6 +36,7 @@ struct osi_root dirtree = (struct osi_root) { NULL, }; struct osi_root inodetree = (struct osi_root) { NULL, }; int dups_found = 0, dups_found_first = 0; struct gfs_sb *sbd1 = NULL; +int sb_fixed = 0; /* This function is for libgfs2's sake. */ void print_it(const char *label, const char *fmt, const char *fmt2, ...) @@ -315,6 +316,9 @@ int main(int argc, char **argv) log_notice( _(Writing changes to disk\n));
Re: [Cluster-devel] [gfs2-utils PATCH 1/7] fsck.gfs2: Fix reference to uninitialized variable
On 07/16/2013 02:56 PM, Bob Peterson wrote: This patch initializes a variable so that it is no longer referenced while uninitialized. rhbz#984085 --- gfs2/fsck/initialize.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gfs2/fsck/initialize.c b/gfs2/fsck/initialize.c index b01b240..936fd5e 100644 --- a/gfs2/fsck/initialize.c +++ b/gfs2/fsck/initialize.c @@ -832,7 +832,7 @@ static int get_lockproto_table(struct gfs2_sbd *sdp) { FILE *fp; char line[PATH_MAX]; - char *cluname, *end; + char *cluname, *end = NULL; const char *fsname, *cfgfile = "/etc/cluster/cluster.conf"; Just spotted this reference to cluster.conf ^^ remember it doesn't exist anymore in the new era. Fabio
Re: [Cluster-devel] qdisk - memcpy incorrect(?)
This is already fixed in more recent releases. See commit: 8edb0d0eb31d94b8a3ba81f6d5b4c398accc950d your patch also misses another incorrect memcpy() in diskRawWrite. Fabio On 05/16/2013 08:00 PM, Neale Ferguson wrote: Hi, In diskRawRead in disk.c there is the following code: readret = posix_memalign((void **)&alignedBuf, disk->d_pagesz, disk->d_blksz); if (readret < 0) { return -1; } io_state(STATE_READ); readret = read(disk->d_fd, alignedBuf, readlen); io_state(STATE_NONE); if (readret > 0) { if (readret > len) { memcpy(alignedBuf, buf, len); readret = len; } else { memcpy(alignedBuf, buf, readret); } } free(alignedBuf); The memcpy() calls above have the src/dst operands swapped. We read into alignedBuf and are supposed to copy to buf. I'm not sure why qdiskd works sometimes and not others. --- cluster-3.0.12.1/cman/qdisk/disk.c 2013/05/16 16:45:49 1.1 +++ cluster-3.0.12.1/cman/qdisk/disk.c 2013/05/16 16:46:29 @@ -430,14 +430,14 @@ io_state(STATE_READ); readret = read(disk->d_fd, alignedBuf, readlen); io_state(STATE_NONE); if (readret > 0) { if (readret > len) { -memcpy(alignedBuf, buf, len); +memcpy(buf, alignedBuf, len); readret = len; } else { -memcpy(alignedBuf, buf, readret); +memcpy(buf, alignedBuf, readret); } } free(alignedBuf); if (readret != len) { Neale
Re: [Cluster-devel] Heads-up: retiring gfs_controld
On 02/15/2013 02:18 PM, Andrew Price wrote: Hi, Now that Fedora 16 has EOL'd we have little reason to keep gfs_controld and gfs_control in gfs2-utils. They're currently disabled by default but can be enabled with the configure option --enable-gfs_controld, which adds additional dependencies on corosynclib, clusterlib (discontinued) and openaislib (discontinued). My intention is to remove gfs_control* from gfs2-utils.git before the next release unless there are any good reasons to keep them around. Andy Just make sure it is clear from which exact kernel version it is possible to operate without gfs_control*, so that maintainers will not try to backport to Linux 1.0. Fabio
Re: [Cluster-devel] [PATCH] config/tools/xml: validate resulting cluster.rng with relaxng.rng
Hi Jan, On 2/6/2013 9:47 PM, Jan Pokorný wrote: Doing so will guarantee the file is valid RELAX NG schema, not just a valid XML. Validating schema, relaxng.rng, was obtained directly from [1] and matches directly to a version bundled with xmlcopyeditor in Fedora 17. The same (modulo VCS headers, comments and spacing details) can be obtained by combining schema as in the specification [2] and its errata [3]. [1] http://relaxng.org/relaxng.rng [2] http://relaxng.org/spec-20011203.html [3] http://relaxng.org/spec-20011203-errata.html this looks like a good idea, but i have one question. Is there a specific reason why we need to ship/embed the file with our tarball? How bad is it to require the one installed on a system? I can see it´s rather stable and hardly updated, but i prefer to avoid duplication if we can. Fabio Signed-off-by: Jan Pokorný jpoko...@redhat.com --- config/tools/xml/Makefile | 2 +- config/tools/xml/ccs_update_schema.in | 3 +- config/tools/xml/relaxng.rng | 335 ++ 3 files changed, 338 insertions(+), 2 deletions(-) create mode 100644 config/tools/xml/relaxng.rng diff --git a/config/tools/xml/Makefile b/config/tools/xml/Makefile index 3c9e97c..a86eb01 100644 --- a/config/tools/xml/Makefile +++ b/config/tools/xml/Makefile @@ -7,7 +7,7 @@ TARGET4 = cluster.rng SBINDIRT = $(TARGET1) $(TARGET2) $(TARGET3) SHAREDIRSYMT = $(TARGET4) -RELAXNGDIRT = cluster.rng.in.head cluster.rng.in.tail +RELAXNGDIRT = cluster.rng.in.head cluster.rng.in.tail relaxng.rng all: $(TARGET1) $(TARGET2) $(TARGET3) $(TARGET4) diff --git a/config/tools/xml/ccs_update_schema.in b/config/tools/xml/ccs_update_schema.in index a5aa351..16ce9f7 100644 --- a/config/tools/xml/ccs_update_schema.in +++ b/config/tools/xml/ccs_update_schema.in @@ -316,7 +316,8 @@ build_schema() { return 1 } - xmllint --noout $outputdir/cluster.rng || { + xmllint --noout --relaxng $rngdir/relaxng.rng $outputdir/cluster.rng \ + || { echo generated schema does not pass xmllint validation 2 return 1 }
Re: [Cluster-devel] [PATCH] cman: Prevent libcman from causing SIGPIPE
ACK On 12/17/2012 10:23 AM, Christine Caulfield wrote: If corosync goes down/is shut down, cman will return 0 from cman_dispatch and close the socket. However, if a cman write operation is issued before this happens then SIGPIPE can result from the writev() call to an open, but disconnected, FD. This patch changes writev() to sendmsg() so it can pass MSG_NOSIGNAL to the system call and prevent SIGPIPEs from occurring. Signed-Off-By: Christine Caulfield ccaul...@redhat.com
Re: [Cluster-devel] Fence agents - supported fence devices in next major release
On 11/25/2012 02:55 PM, Marek Grac wrote: Hi, In the next major version of fence agents we would like to include only those fence agents which are used and can be tested. We have access to various fence devices but there is still a need for more, and we would like to include you and your hardware in the testing process. The testing process will consist of creating a simple configuration file for your device (almost a copy-paste from cluster.conf) and running a simple script (< 5 minutes). We believe that with your help we will be able to test more devices and make the upstream code even better. We are looking for these fence devices (and their owners) used by the following fence agents: * fence_baytech * fence_bullpap * fence_vixel * fence_zvm * fence_cpint * fence_rackswitch * fence_brocade * fence_mcdata Thanks for your help. If you would like to help with testing please send me a mail directly. We can probably drop fence_na too from this list. The hardware has not made it to production and I don't think that will happen any time soon. Digimer? Fabio
Re: [Cluster-devel] bug reports
On 10/24/2012 12:38 PM, Heiko Nardmann wrote: Hi together! Since all (or almost all?) GFS2 developers (as far as I can tell) are employed by Red Hat, I wonder whether it makes sense to additionally post bug reports to this mailing list besides reporting them to RH support? No, please report bugs via RH support. This list is for development only. Fabio
Re: [Cluster-devel] [PATCH] rgmanager: Fix return code when a service would deadlock
ACK On 10/13/2012 03:18 AM, Ryan McCabe wrote: When we detect that starting a service would cause a deadlock, return 0 instead of -1. This fixes a crash that occurred when -1 was returned. Resolves: rhbz#861157 Signed-off-by: Ryan McCabe rmcc...@redhat.com --- rgmanager/src/daemons/rg_thread.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rgmanager/src/daemons/rg_thread.c b/rgmanager/src/daemons/rg_thread.c index 5e551c3..b888717 100644 --- a/rgmanager/src/daemons/rg_thread.c +++ b/rgmanager/src/daemons/rg_thread.c @@ -756,7 +756,7 @@ rt_enqueue_request(const char *resgroupname, int request, logt_print(LOG_DEBUG, Failed to queue %d request for %s: Would block\n, request, resgroupname); - return -1; + return 0; } ret = rq_queue_request(resgroup-rt_queue, resgroup-rt_name,
Re: [Cluster-devel] [PATCH 2/2] checkquorum.wdmd: add integration script with wdmd
On 10/10/2012 6:33 AM, Dietmar Maurer wrote: Will you add some documentation on how to use those scripts? Yes, our documentation overlord is preparing an upstream wiki page for it. It will be ready before a release. It seems those scripts do not check if the node is joined to the fence domain? It doesn't really need to. I'll put this as simply as possible: - real fencing == murder: there can only be one killer in the cluster at a time; the fence domain coordinates who can/should be killed by whom - checkquorum.wdmd == suicide: there are N nodes in the cluster that can decide to commit suicide without really caring about what the others are doing. This can run without any fencing configuration at all. Anyway, examples, setups, limitations... all in the doc as soon as it's ready. Be a bit patient :) Fabio
Re: [Cluster-devel] [PATCH 2/2] checkquorum.wdmd: add integration script with wdmd
On 10/10/2012 10:06 AM, Dietmar Maurer wrote: On 10/10/2012 6:26 AM, Dietmar Maurer wrote:

+# rpm based distros
+[ -d /etc/sysconfig ] && \
+	[ -f /etc/sysconfig/checkquorum ] && \
+	. /etc/sysconfig/checkquorum
+
+# deb based distros
+[ ! -d /etc/sysconfig ] && \
+	[ -f /etc/default/checkquorum ] && \
+	. /etc/default/checkquorum
+

FYI: Some RAID tool vendors deliver utilities for Debian which create the directory '/etc/sysconfig' on Debian boxes, so that test is not reliable. This might be a controversial argument. I just thought there are better tests to see if you run on Debian, for example: [ -f /etc/debian_version -a -d /etc/default ] that doesn't scale well for Debian derivatives that don't ship debian_version :) (see Ubuntu & co..) You can't even use something like `which dpkg` since the tool is available on rpm based distributions... or vice versa... there is rpm for Debian derivatives. Hardcoding all distributions is not optimal either, as they might change policy by version. Fabio
Re: [Cluster-devel] [PATCH 2/2] checkquorum.wdmd: add integration script with wdmd
On 10/10/2012 1:04 PM, Heiko Nardmann wrote: On 10.10.2012 10:11, Fabio M. Di Nitto wrote: [snip] that doesn't scale well for Debian derivatives that don't ship debian_version :) (see Ubuntu & co..) You can't even use something like `which dpkg` since the tool is available on rpm based distributions... or vice versa... there is rpm for Debian derivatives. Hardcoding all distributions is not optimal either, as they might change policy by version. Fabio What about 'lsb_release'? Is that executable available on all platforms? Not installed by default; it's generally shipped with a $distro-lsb metapackage that pulls in half a gazillion dependencies. I doubt it would solve anything since you still need to parse the output. It's really no different than hardcoding /etc/$distro_release, actually with a few GB of extra packages ;) Fabio
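The /etc/sysconfig-vs-/etc/default sourcing pattern debated in this thread can be distilled into a small helper. This is a hypothetical sketch for illustration only: the `load_defaults` name and the optional root prefix are inventions of the example, not part of the shipped script.

```shell
#!/bin/sh
# Hypothetical sketch of the defaults-sourcing pattern under discussion:
# prefer /etc/sysconfig when the directory exists (rpm-based distros),
# otherwise fall back to /etc/default (deb-based distros). The optional
# $root prefix exists only so the helper can be exercised without root.
load_defaults() {
    name="$1"
    root="${2:-}"
    if [ -d "$root/etc/sysconfig" ]; then
        [ -f "$root/etc/sysconfig/$name" ] && . "$root/etc/sysconfig/$name"
    else
        [ -f "$root/etc/default/$name" ] && . "$root/etc/default/$name"
    fi
    return 0    # a missing defaults file is not an error
}
```

As the thread points out, this keys off packaging layout rather than distribution identity, so it also covers derivatives that do not ship /etc/debian_version.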
[Cluster-devel] [PATCH 1/2] cman init: make sure we start after fence_sanlockd and warn users
From: Fabio M. Di Nitto fdini...@redhat.com Resolves: rhbz#509056 Signed-off-by: Fabio M. Di Nitto fdini...@redhat.com --- cman/init.d/cman.in | 13 +++-- 1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/cman/init.d/cman.in b/cman/init.d/cman.in
index a88f52f..849739b 100644
--- a/cman/init.d/cman.in
+++ b/cman/init.d/cman.in
@@ -8,8 +8,8 @@
 #
 ### BEGIN INIT INFO
 # Provides:		cman
-# Required-Start:	$network $time
-# Required-Stop:	$network $time
+# Required-Start:	$network $time fence_sanlockd
+# Required-Stop:	$network $time fence_sanlockd
 # Default-Start:
 # Default-Stop:
 # Short-Description:	Starts and stops cman
@@ -740,6 +740,13 @@ stop_cmannotifyd()
 	stop_daemon cmannotifyd
 }
 
+fence_sanlock_check()
+{
+	service fence_sanlockd status >/dev/null 2>&1 && \
+		echo "fence_sanlockd detected. Unfencing might take several minutes!"
+	return 0
+}
+
 unfence_self()
 {
 	# fence_node returns 0 on success, 1 on failure, 2 if unconfigured
@@ -881,6 +888,8 @@ start()
 
 	[ "$breakpoint" = "daemons" ] && exit 0
 
+	fence_sanlock_check
+
 	runwrap unfence_self \
 		none \
 		"Unfencing self"
-- 1.7.7.6
[Cluster-devel] [PATCH 2/2] checkquorum.wdmd: add integration script with wdmd
From: Fabio M. Di Nitto fdini...@redhat.com requires wdmd >= 2.6 Resolves: rhbz#509056 Signed-off-by: Fabio M. Di Nitto fdini...@redhat.com --- cman/scripts/Makefile | 2 +- cman/scripts/checkquorum.wdmd | 104 + 2 files changed, 105 insertions(+), 1 deletions(-) create mode 100644 cman/scripts/checkquorum.wdmd

diff --git a/cman/scripts/Makefile b/cman/scripts/Makefile
index b4866c8..7950311 100644
--- a/cman/scripts/Makefile
+++ b/cman/scripts/Makefile
@@ -1,4 +1,4 @@
-SHAREDIRTEX=checkquorum
+SHAREDIRTEX=checkquorum checkquorum.wdmd
 
 include ../../make/defines.mk
 include $(OBJDIR)/make/clean.mk
diff --git a/cman/scripts/checkquorum.wdmd b/cman/scripts/checkquorum.wdmd
new file mode 100644
index 000..1d81ff6
--- /dev/null
+++ b/cman/scripts/checkquorum.wdmd
@@ -0,0 +1,104 @@
+#!/bin/bash
+# Quorum detection watchdog script
+#
+# This script will return -2 if the node had quorum at one point
+# and then subsequently lost it
+#
+# Copyright 2012 Red Hat, Inc.
+
+# defaults
+
+# Amount of time in seconds to wait after quorum is lost to fail script
+waittime=60
+
+# action to take if quorum is missing for over waittime
+# autodetect|hardreboot|crashdump|watchdog
+action=autodetect
+
+# Location of temporary file to capture timeouts
+timerfile=/var/run/cluster/checkquorum-timer
+
+# rpm based distros
+[ -d /etc/sysconfig ] && \
+	[ -f /etc/sysconfig/checkquorum ] && \
+	. /etc/sysconfig/checkquorum
+
+# deb based distros
+[ ! -d /etc/sysconfig ] && \
+	[ -f /etc/default/checkquorum ] && \
+	. /etc/default/checkquorum
+
+has_quorum() {
+	corosync-quorumtool -s 2>/dev/null | \
+		grep "^Quorate:" | \
+		grep -q "Yes$"
+}
+
+had_quorum() {
+	output=$(corosync-objctl 2>/dev/null | \
+		grep runtime.totem.pg.mrp.srp.operational_entered | cut -d = -f 2)
+	[ -n "$output" ] && {
+		[ "$output" -ge 1 ] && return 0
+		return 1
+	}
+}
+
+take_action() {
+	case $action in
+	watchdog)
+		[ -n "$wdmd_action" ] && return 1
+		;;
+	hardreboot)
+		echo 1 > /proc/sys/kernel/sysrq
+		echo b > /proc/sysrq-trigger
+		;;
+	crashdump)
+		echo 1 > /proc/sys/kernel/sysrq
+		echo c > /proc/sysrq-trigger
+		;;
+	autodetect)
+		service kdump status >/dev/null 2>&1
+		usekexec=$?
+		[ -n "$wdmd_action" ] && [ "$usekexec" != 0 ] && return 1
+		echo 1 > /proc/sys/kernel/sysrq
+		[ "$usekexec" = 0 ] && echo c > /proc/sysrq-trigger
+		echo b > /proc/sysrq-trigger
+	esac
+}
+
+# watchdog uses $1 = test or = repair
+# with no arguments we are called by wdmd
+[ -z "$1" ] && wdmd_action=yes
+
+# we don't support watchdog repair action
+[ "$1" = "repair" ] && exit 1
+
+service corosync status >/dev/null 2>&1
+ret=$?
+
+case $ret in
+	3) # corosync is not running (clean)
+		rm -f $timerfile
+		exit 0
+		;;
+	1) # corosync crashed or did exit abnormally (dirty - take action)
+		logger -t checkquorum.wdmd "corosync crashed or exited abnormally. Node will soon reboot"
+		take_action
+		;;
+	0) # corosync is running (clean)
+		# check quorum here
+		has_quorum && {
+			echo -e "oldtime=$(date +%s)" > $timerfile
+			exit 0
+		}
+		. $timerfile
+		newtime=$(date +%s)
+		delta=$((newtime - oldtime))
+		logger -t checkquorum.wdmd "Node has lost quorum. Node will soon reboot"
+		had_quorum && [ "$delta" -gt "$waittime" ] && {
+			take_action
+		}
+		;;
esac
+
+exit $?
-- 1.7.7.6
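The grace-period handling at the heart of the script above boils down to a timestamp delta. A minimal standalone distillation of that logic (the function name is hypothetical; the oldtime/newtime arithmetic mirrors the script):

```shell
#!/bin/sh
# Hypothetical distillation of the script's timer logic: quorum loss is
# acted upon only once it has persisted for more than $2 (waittime)
# seconds since $1, the epoch timestamp when quorum was last seen.
quorum_loss_expired() {
    oldtime="$1"
    waittime="$2"
    newtime=$(date +%s)
    delta=$((newtime - oldtime))
    [ "$delta" -gt "$waittime" ]
}
```

The real script persists oldtime in /var/run/cluster/checkquorum-timer, so the grace period survives across the repeated short-lived invocations made by wdmd.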
Re: [Cluster-devel] checkquorum script for self fencing
On 10/02/2012 08:07 PM, Dietmar Maurer wrote: Hi Fabio, was there any progress on that topic? As a matter of fact, yes, we are completing the first implementation and writing down the docs and howto's. I think the first cut will be available for testing within a week, maybe two. Fabio -Original Message- From: cluster-devel-boun...@redhat.com [mailto:cluster-devel- boun...@redhat.com] On Behalf Of Fabio M. Di Nitto Sent: Donnerstag, 22. Dezember 2011 06:57 To: cluster-devel@redhat.com Subject: Re: [Cluster-devel] checkquorum script for self fencing On 12/21/2011 08:28 PM, Dietmar Maurer wrote: I recently detected that checkquorum script for self fencing. That seems to work reliable, but the remaining nodes (with quorum) does not get any fence acknowledge. I wonder if it would be possible to extend the checkquorum script so that it runs fence_ack_manual on the fence master after some safety timeout? Or do you think that is a bad idea? We are already working on a similar feature based on checkquorum, but I got injured on my hand and I had to delay a bit the write up for the feature (I am incredibly slow writing with one hand, never mind the typos ;)). The way you suggest is dangerous, so no, don't take that route. The full feature proposal will come soon after this December holiday/xmas break. Fabio
[Cluster-devel] [PATCH] cman init: increase default shutdown timeouts
From: Fabio M. Di Nitto fdini...@redhat.com In some conditions, especially when shutting down all nodes at the same time, corosync takes a lot longer than 10 seconds to stabilize membership. That means that daemons will not quit fast enough before cman init declares a shutdown error. Increase the default shutdown timeouts from 10 to 30 seconds. Resolves: rhbz#854032 Signed-off-by: Fabio M. Di Nitto fdini...@redhat.com --- cman/init.d/cman.in | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/cman/init.d/cman.in b/cman/init.d/cman.in
index 1917abd..a88f52f 100644
--- a/cman/init.d/cman.in
+++ b/cman/init.d/cman.in
@@ -305,7 +305,7 @@ stop_daemon()
 	shift
 	retryforsec=$1
-	[ -z "$retryforsec" ] && retryforsec=1
+	[ -z "$retryforsec" ] && retryforsec=30
 	retries=0
 	if check_sleep; then
@@ -661,7 +661,7 @@ start_qdiskd()
 
 stop_qdiskd()
 {
-	stop_daemon qdiskd 5
+	stop_daemon qdiskd
 }
 
 start_groupd()
@@ -770,7 +770,7 @@ join_fence_domain()
 
 leave_fence_domain()
 {
 	if status fenced >/dev/null 2>&1; then
-		errmsg=$( fence_tool leave -w 10 2>&1 )
+		errmsg=$( fence_tool leave -w 30 2>&1 )
 		return $?
 	fi
 }
-- 1.7.7.6
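For context, the retryforsec behavior being tuned here is essentially a bounded wait loop. A rough standalone sketch of that idea (assumed semantics; the real stop_daemon in cman.in also handles signalling and per-daemon status checks):

```shell
#!/bin/sh
# Rough sketch (assumed semantics, not the shipped stop_daemon): wait up
# to $2 seconds for process $1 to exit, polling once per second, and fail
# if it is still alive when the budget runs out.
wait_for_exit() {
    pid="$1"
    retryforsec="${2:-30}"   # 30 is the new default the patch introduces
    retries=0
    while kill -0 "$pid" 2>/dev/null; do
        [ "$retries" -ge "$retryforsec" ] && return 1
        sleep 1
        retries=$((retries + 1))
    done
    return 0
}
```

The patch's point is simply that the old 1-second and 10-second budgets were too small for corosync membership to settle during a whole-cluster shutdown.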
Re: [Cluster-devel] cluster: RHEL6 - fsck.gfs2: Fix buffer overflow in get_lockproto_table
On 8/17/2012 11:57 AM, Andrew Price wrote: On 17/08/12 05:02, Fabio M. Di Nitto wrote: On 08/16/2012 11:01 PM, Andrew Price wrote: Gitweb: http://git.fedorahosted.org/git/?p=cluster.git;a=commitdiff;h=f796ee8752712e9e523e1516bb9165b274552753 Commit: f796ee8752712e9e523e1516bb9165b274552753 Parent: 638deec0ccbf45862eee97294f09ba9d6b3f56d0 Author: Andrew Price anpr...@redhat.com AuthorDate: Sat Jul 7 22:03:24 2012 +0100 Committer: Andrew Price anpr...@redhat.com CommitterDate: Thu Aug 16 21:54:56 2012 +0100 fsck.gfs2: Fix buffer overflow in get_lockproto_table Coverity discovered a buffer overflow in this function where an overly long cluster name in cluster.conf could cause a crash while repairing the superblock. This patch fixes the bug by making sure the lock table is composed sensibly, limiting the fsname to 16 chars as documented, and only allowing the cluster name (which doesn't seem to have a documented max size) to use the remaining space in the locktable name string. The cluster name is max 16 bytes too (including \0). It's actually verified by cman at startup, so it can't be longer than that. OK, thanks for clearing that up. There are other places in gfs2-utils which we can tighten up now that we know that the cluster name has a solid limit, so I'm going to push this patch (which fixes the overflow bug) and we'll address the limit issues separately. BTW, now that cman has disappeared upstream, is anything checking the length of the cluster name now? I am not sure. I don't think corosync enforces any limit, but best to check with Jan. Fabio
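The fix under discussion is essentially bounded string formatting. A hedged shell illustration of the idea follows; the 32-byte total field size, the helper name, and the truncation strategy are assumptions of this example only, since the real limits live in the gfs2 superblock definition and in the fsck.gfs2 patch itself:

```shell
#!/bin/sh
# Hypothetical illustration of composing a lock table "sensibly":
# a lock table looks like "clustername:fsname"; fsname is capped at
# 16 chars as documented, and the cluster name gets whatever space
# remains in the field. total=32 is an assumed field size.
build_locktable() {
    cluster="$1"
    fsname=$(printf '%.16s' "$2")            # fsname: at most 16 chars
    total="${3:-32}"
    maxclu=$((total - ${#fsname} - 2))       # room for ':' and a NUL
    cluster=$(printf '%s' "$cluster" | cut -c1-"$maxclu")
    printf '%s:%s' "$cluster" "$fsname"
}
```

The point Fabio adds in the thread is that the cluster-name half does in fact have a hard 16-byte limit as well, which lets the remaining gfs2-utils call sites be tightened the same way.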
Re: [Cluster-devel] [PATCH 0/3] minor edits of cluster.rng (fixed head part)
ACK all 3 of them. Please push them to the STABLE32 branch. Fabio On 08/16/2012 09:52 PM, Jan Pokorný wrote: Jan Pokorný (3): cluster.rng: fix trailing whitespaces in head cluster.rng: fencedevice initial non-digit note to description cluster.rng: retab the head (use space uniformly) config/tools/xml/cluster.rng.in.head | 41 ++-- 1 file changed, 21 insertions(+), 20 deletions(-)
Re: [Cluster-devel] cluster: RHEL6 - fsck.gfs2: Fix buffer overflow in get_lockproto_table
On 08/16/2012 11:01 PM, Andrew Price wrote: Gitweb: http://git.fedorahosted.org/git/?p=cluster.git;a=commitdiff;h=f796ee8752712e9e523e1516bb9165b274552753 Commit:f796ee8752712e9e523e1516bb9165b274552753 Parent:638deec0ccbf45862eee97294f09ba9d6b3f56d0 Author:Andrew Price anpr...@redhat.com AuthorDate:Sat Jul 7 22:03:24 2012 +0100 Committer: Andrew Price anpr...@redhat.com CommitterDate: Thu Aug 16 21:54:56 2012 +0100 fsck.gfs2: Fix buffer overflow in get_lockproto_table Coverity discovered a buffer overflow in this function where an overly long cluster name in cluster.conf could cause a crash while repairing the superblock. This patch fixes the bug by making sure the lock table is composed sensibly, limiting the fsname to 16 chars as documented, and only allowing the cluster name (which doesn't seem to have a documented max size) to use the remaining space in the locktable name string. cluster name is max 16 bytes too (including \0). It's actually verified by cman at startup so it can't be longer than that. Fabio
[Cluster-devel] cluster 3.1.93 release (Release Candidate)
Welcome to the cluster 3.1.93 (Release Candidate) release. This release addresses a few major issues. Users of previous releases are strongly encouraged to upgrade to this version. This release also strictly requires corosync 1.4.4 to build and run. Unless major issues are reported, the next release will be marked stable 3.2.0. The new source tarball can be downloaded here: https://fedorahosted.org/releases/c/l/cluster/cluster-3.1.93.tar.xz ChangeLog: https://fedorahosted.org/releases/c/l/cluster/Changelog-3.1.93 To report bugs or issues: https://bugzilla.redhat.com/ Would you like to meet the cluster team or members of its community? Join us on IRC (irc.freenode.net #linux-cluster) and share your experience with other sysadmins and users. Thanks/congratulations to all people that contributed to this release! Happy clustering, Fabio
[Cluster-devel] [PATCH] qdiskd: backport dual socket connection to cman
From: Fabio M. Di Nitto fdini...@redhat.com Patch 76741bb2a94ae94e493c609d50f570d02e2f3029 had a not so obvious dependency on 08ae3ce147b2771c5ee6e1d364a5e48c88384427. Backport portion of 08ae3ce147b2771c5ee6e1d364a5e48c88384427 to handle dual cman socket (admin and user) and use the correct socket (user) for send/receive data. Move cman_alive check and heartbeat (for dispatch) to ch_user. Resolves: rhbz#782900 Signed-off-by: Fabio M. Di Nitto fdini...@redhat.com --- cman/qdisk/disk.h | 5 ++- cman/qdisk/disk_util.c | 7 +++-- cman/qdisk/iostate.c | 8 +++--- cman/qdisk/main.c | 69 ++- 4 files changed, 43 insertions(+), 46 deletions(-)

diff --git a/cman/qdisk/disk.h b/cman/qdisk/disk.h
index d491de1..83167ea 100644
--- a/cman/qdisk/disk.h
+++ b/cman/qdisk/disk.h
@@ -270,7 +270,8 @@ typedef struct {
 	int qc_master;		/* Master?! */
 	int qc_status_sock;
 	run_flag_t qc_flags;
-	cman_handle_t qc_ch;
+	cman_handle_t qc_ch_admin;
+	cman_handle_t qc_ch_user;
 	char *qc_device;
 	char *qc_label;
 	char *qc_status_file;
@@ -299,7 +300,7 @@ typedef struct {
 int qd_write_status(qd_ctx *ctx, int nid, disk_node_state_t state, disk_msg_t *msg, memb_mask_t mask, memb_mask_t master);
 int qd_read_print_status(target_info_t *disk, int nid);
-int qd_init(qd_ctx *ctx, cman_handle_t ch, int me);
+int qd_init(qd_ctx *ctx, cman_handle_t ch_admin, cman_handle_t ch_user, int me);
 void qd_destroy(qd_ctx *ctx);
 
 /* proc.c */
diff --git a/cman/qdisk/disk_util.c b/cman/qdisk/disk_util.c
index f5539c0..25f4013 100644
--- a/cman/qdisk/disk_util.c
+++ b/cman/qdisk/disk_util.c
@@ -312,16 +312,17 @@ generate_token(void)
 	Initialize a quorum disk context, given a CMAN handle and a nodeid.
 */
 int
-qd_init(qd_ctx *ctx, cman_handle_t ch, int me)
+qd_init(qd_ctx *ctx, cman_handle_t ch_admin, cman_handle_t ch_user, int me)
 {
-	if (!ctx || !ch || !me) {
+	if (!ctx || !ch_admin || !ch_user || !me) {
 		errno = EINVAL;
 		return -1;
 	}
 	memset(ctx, 0, sizeof(*ctx));
 	ctx->qc_incarnation = generate_token();
-	ctx->qc_ch = ch;
+	ctx->qc_ch_admin = ch_admin;
+	ctx->qc_ch_user = ch_user;
 	ctx->qc_my_id = me;
 	ctx->qc_status_sock = -1;
diff --git a/cman/qdisk/iostate.c b/cman/qdisk/iostate.c
index eb74ad2..ba7ad12 100644
--- a/cman/qdisk/iostate.c
+++ b/cman/qdisk/iostate.c
@@ -69,7 +69,7 @@ io_nanny_thread(void *arg)
 	iostate_t last_main_state = 0, current_main_state = 0;
 	int last_main_incarnation = 0, current_main_incarnation = 0;
 	int logged_incarnation = 0;
-	cman_handle_t ch = (cman_handle_t)arg;
+	cman_handle_t ch_user = (cman_handle_t)arg;
 	int32_t whine_state;
 
 	/* Start with wherever we're at now */
@@ -105,7 +105,7 @@ io_nanny_thread(void *arg)
 		/* Whine on CMAN api */
 		whine_state = (int32_t)current_main_state;
 		swab32(whine_state);
-		cman_send_data(ch, &whine_state, sizeof(int32_t), 0, CLUSTER_PORT_QDISKD, 0);
+		cman_send_data(ch_user, &whine_state, sizeof(int32_t), 0, CLUSTER_PORT_QDISKD, 0);
 
 		/* Don't log things twice */
 		if (logged_incarnation == current_main_incarnation)
@@ -125,7 +125,7 @@ io_nanny_thread(void *arg)
 
 int
-io_nanny_start(cman_handle_t ch, int timeout)
+io_nanny_start(cman_handle_t ch_user, int timeout)
 {
 	int ret;
 
@@ -135,7 +135,7 @@ io_nanny_start(cman_handle_t ch, int timeout)
 	qdisk_timeout = timeout;
 	thread_active = 1;
-	ret = pthread_create(&io_nanny_tid, NULL, io_nanny_thread, ch);
+	ret = pthread_create(&io_nanny_tid, NULL, io_nanny_thread, ch_user);
 	pthread_mutex_unlock(&state_mutex);
 
 	return ret;
diff --git a/cman/qdisk/main.c b/cman/qdisk/main.c
index 90d00ab..72a3c07 100644
--- a/cman/qdisk/main.c
+++ b/cman/qdisk/main.c
@@ -287,7 +287,7 @@ check_transitions(qd_ctx *ctx, node_info_t *ni, int max, memb_mask_t mask)
 			if (ctx->qc_flags & RF_ALLOW_KILL) {
 				clulog(LOG_DEBUG, "Telling CMAN to kill the node\n");
-				cman_kill_node(ctx->qc_ch,
+				cman_kill_node(ctx->qc_ch_admin,
 					ni[x].ni_status.ps_nodeid);
 			}
 		}
@@ -325,7 +325,7 @@ check_transitions(qd_ctx *ctx, node_info_t *ni, int max, memb_mask_t mask)
 			if (ctx->qc_flags & RF_ALLOW_KILL) {
 				clulog(LOG_DEBUG, "Telling CMAN to kill the node\n");
-				cman_kill_node(ctx->qc_ch
Re: [Cluster-devel] Fence driver for the Digital Loggers Web Power Switches
On 07/31/2012 10:24 PM, Dwight Hubbard wrote: Hopefully this is a correct patch; it's been a long while since I've generated one. Don't worry... I'll have Marek review it and send comments back. My only minor concern is the license. Do you think you can make your agent GPLv2+? Otherwise I guess it's time to fix the build system and packaging to deal with multiple licenses, though having the whole tree under the same umbrella is easier ;) Thanks Fabio On Tue, Jul 31, 2012 at 12:06 PM, Fabio M. Di Nitto fdini...@redhat.com wrote: On 07/31/2012 06:59 PM, Dwight Hubbard wrote: If I knew where to submit it I'd be happy to. Here is just fine :) either in the form of a patch to the fence-agents.git master branch or as a standalone agent, and we can help integrating it in the current tree. Fabio On Mon, Jul 23, 2012 at 11:18 PM, Fabio M. Di Nitto fdini...@redhat.com wrote: On 07/23/2012 10:12 PM, Dwight Hubbard wrote: I updated the Fence driver I wrote back in 2009 for the Digital Loggers network power switches (http://digital-loggers.com/lpc.html) to work with some additional powerswitch models and put the code in a github repo http://github.com/dwighthubbard/python-dlipower. In case it's useful for anyone else... Is there a specific reason why you don't submit the code upstream and have it part of fence-agents.git? Thanks Fabio
Re: [Cluster-devel] [PATCH] rgmanager: Exit uncleanly only when CMAN_SHUTDOWN_ANYWAY is set
ACK we will need an upstream/rhel6 equivalent too for this one. See my comment in BZ. Fabio On 07/27/2012 07:07 PM, Ryan McCabe wrote: Only exit uncleanly when the CMAN_SHUTDOWN_ANYWAY flag is set in the argument passed when handling the CMAN_REASON_TRY_SHUTDOWN event. This fixes the case where args is 2, where we want to refuse to shut down. Resolves: rhbz#769730 Signed-off-by: Ryan McCabe rmcc...@redhat.com --- rgmanager/src/clulib/msg_cluster.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/rgmanager/src/clulib/msg_cluster.c b/rgmanager/src/clulib/msg_cluster.c
index e864853..e4b6b39 100644
--- a/rgmanager/src/clulib/msg_cluster.c
+++ b/rgmanager/src/clulib/msg_cluster.c
@@ -211,7 +211,7 @@ poll_cluster_messages(int timeout)
 		if (cman_dispatch(ch, 0) < 0) {
 			process_cman_event(ch, NULL,
-					   CMAN_REASON_TRY_SHUTDOWN, 1);
+					   CMAN_REASON_TRY_SHUTDOWN, CMAN_SHUTDOWN_ANYWAY);
 		}
 		ret = 0;
 	}
@@ -987,7 +987,9 @@ process_cman_event(cman_handle_t handle, void *private, int reason, int arg)
 	printf("EVENT: %p %p %d %d\n", handle, private, reason, arg);
 #endif
 
-	if (reason == CMAN_REASON_TRY_SHUTDOWN && !arg) {
+	if (reason == CMAN_REASON_TRY_SHUTDOWN &&
+	    !(arg & CMAN_SHUTDOWN_ANYWAY))
+	{
 		cman_replyto_shutdown(handle, 0);
 		return;
 	}
Re: [Cluster-devel] Fence driver for the Digital Loggers Web Power Switches
On 07/23/2012 10:12 PM, Dwight Hubbard wrote: I updated the Fence driver I wrote back in 2009 for the Digital loggers network power switches (http://digital-loggers.com/lpc.html) to work with some additional powerswitch models and put the code in a github repo http://github.com/dwighthubbard/python-dlipower. In case it's useful for anyone else... Is there a specific reason why you don't submit the code upstream and have it part of fence-agents.git? Thanks Fabio
[Cluster-devel] [PATCH] cman init: allow dlm hash table sizes to be tunable at startup
From: Fabio M. Di Nitto fdini...@redhat.com Resolves: rhbz#842370 Signed-off-by: Fabio M. Di Nitto fdini...@redhat.com --- cman/init.d/cman.in | 28 cman/init.d/cman.init.defaults.in | 7 +++ 2 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/cman/init.d/cman.in b/cman/init.d/cman.in
index 9a0d726..9de349d 100644
--- a/cman/init.d/cman.in
+++ b/cman/init.d/cman.in
@@ -110,6 +110,13 @@ fi
 # DLM_CONTROLD_OPTS -- allow extra options to be passed to dlm_controld daemon.
 [ -z "$DLM_CONTROLD_OPTS" ] && DLM_CONTROLD_OPTS=
 
+# DLM_LKBTBL_SIZE - DLM_RSBTBL_SIZE - DLM_DIRTBL_SIZE
+# Allow tuning of DLM kernel hash table sizes.
+# do NOT change unless instructed to do so.
+[ -z "$DLM_LKBTBL_SIZE" ] && DLM_LKBTBL_SIZE=
+[ -z "$DLM_RSBTBL_SIZE" ] && DLM_RSBTBL_SIZE=
+[ -z "$DLM_DIRTBL_SIZE" ] && DLM_DIRTBL_SIZE=
+
 # FENCE_JOIN_TIMEOUT -- seconds to wait for fence domain join to
 # complete. If the join hasn't completed in this time, fence_tool join
 # exits with an error, and this script exits with an error. To wait
@@ -706,6 +713,23 @@ leave_fence_domain()
 	fi
 }
 
+tune_dlm_hash_sizes()
+{
+	dlmdir=/sys/kernel/config/dlm/cluster
+
+	[ -n "$DLM_LKBTBL_SIZE" ] && [ -f "$dlmdir/lkbtbl_size" ] && \
+		echo "$DLM_LKBTBL_SIZE" > "$dlmdir/lkbtbl_size"
+
+	[ -n "$DLM_RSBTBL_SIZE" ] && [ -f "$dlmdir/rsbtbl_size" ] && \
+		echo "$DLM_RSBTBL_SIZE" > "$dlmdir/rsbtbl_size"
+
+	[ -n "$DLM_DIRTBL_SIZE" ] && [ -f "$dlmdir/dirtbl_size" ] && \
+		echo "$DLM_DIRTBL_SIZE" > "$dlmdir/dirtbl_size"
+
+	return 0
+}
+
 start()
 {
 	currentaction=start
@@ -773,6 +797,10 @@ start()
 		none \
 		"Starting dlm_controld"
 
+	runwrap tune_dlm_hash_sizes \
+		none \
+		"Tuning DLM kernel hash tables"
+
 	runwrap start_gfs_controld \
 		none \
 		"Starting gfs_controld"
diff --git a/cman/init.d/cman.init.defaults.in b/cman/init.d/cman.init.defaults.in
index 1b7913e..bbaa049 100644
--- a/cman/init.d/cman.init.defaults.in
+++ b/cman/init.d/cman.init.defaults.in
@@ -34,6 +34,13 @@
 # DLM_CONTROLD_OPTS -- allow extra options to be passed to dlm_controld daemon.
 #DLM_CONTROLD_OPTS=
 
+# DLM_LKBTBL_SIZE - DLM_RSBTBL_SIZE - DLM_DIRTBL_SIZE
+# Allow tuning of DLM kernel hash table sizes.
+# do NOT change unless instructed to do so.
+#DLM_LKBTBL_SIZE=
+#DLM_RSBTBL_SIZE=
+#DLM_DIRTBL_SIZE=
+
 # FENCE_JOIN_TIMEOUT -- seconds to wait for fence domain join to
 # complete. If the join hasn't completed in this time, fence_tool join
 # exits with an error, and this script exits with an error. To wait
-- 1.7.7.6
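The guarded-write pattern the patch uses (write a tunable only when the admin configured it and the kernel actually exposes the configfs knob) can be factored into a tiny helper. This is a hypothetical sketch, not the shipped code:

```shell
#!/bin/sh
# Hypothetical distillation of the patch's guarded configfs writes: the
# value is written only when it was configured AND the knob file exists,
# so the init script stays a harmless no-op on kernels without these
# tunables.
tune_knob() {
    value="$1"
    knob="$2"
    [ -n "$value" ] && [ -f "$knob" ] && echo "$value" > "$knob"
    return 0    # a missing value or knob is not an error
}
```

In the init script this would be invoked as, e.g., `tune_knob "$DLM_RSBTBL_SIZE" /sys/kernel/config/dlm/cluster/rsbtbl_size`.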
Re: [Cluster-devel] [PATCH] rgmanager: Add IP interface parameter
On 07/20/2012 11:07 PM, Lon Hohberger wrote: On 07/14/2012 03:54 PM, Fabio M. Di Nitto wrote: On 07/13/2012 10:08 PM, Lon Hohberger wrote: On 07/13/2012 12:08 AM, Fabio M. Di Nitto wrote: Hi Ryan, only one comment here.. many times we have been asked to implement interface parameter to allow any random IP on any specific interface (beside the pre configured ip on that interface). We haven't done that because we might end up owning routing. However, if we make it explicit that this is not the case, then we could in theory do both. hmm right.. forgot about that. I would still prefer to avoid the use of interface= option if possible tho. Maybe something slightly less overloaded. force_interface or force_net_device. Sure; that's fine. prefer_interface= maybe? If more than one match, use this one, otherwise, use the one that matches Yes that sounds a lot better than force_* :) Thanks Fabio
Re: [Cluster-devel] [PATCH] rgmanager: Add IP interface parameter
On 07/13/2012 10:08 PM, Lon Hohberger wrote: On 07/13/2012 12:08 AM, Fabio M. Di Nitto wrote: Hi Ryan, only one comment here.. many times we have been asked to implement interface parameter to allow any random IP on any specific interface (beside the pre configured ip on that interface). We haven't done that because we might end up owning routing. However, if we make it explicit that this is not the case, then we could in theory do both. hmm right.. forgot about that. I would still prefer to avoid the use of interface= option if possible tho. Maybe something slightly less overloaded. force_interface or force_net_device. Fabio
Re: [Cluster-devel] [PATCH] rgmanager: Add IP interface parameter
Hi Ryan, only one comment here.. many times we have been asked to implement an interface parameter to allow any random IP on any specific interface (besides the pre-configured IP on that interface). Can we change the patch to simply fix both problems at once? Effectively, the fact that 2 interfaces have 2 IPs on the same subnet is simply a corner case. Maybe later on we can add something like: ifconfig iface up / down. When doing ifconfig up we need to store the output of IP addresses automatically assigned to that interface. On shutdown, we need to check if the IP we are removing is the last one on that interface _before_ issuing an ifconfig down, in case there are more IP resources associated to it. The patch looks ok, but I would probably use a different term than interface as it sounds very similar to the expected feature above. Fabio On 07/12/2012 07:23 PM, Ryan McCabe wrote: This patch adds an interface parameter for IP resources. The interface must already be configured and active. This parameter should be used only when at least two active interfaces have IP addresses on the same subnet and it's necessary to specify which particular interface should be used. Signed-off-by: Ryan McCabe rmcc...@redhat.com --- rgmanager/src/resources/ip.sh | 17 + 1 file changed, 17 insertions(+)

diff --git a/rgmanager/src/resources/ip.sh b/rgmanager/src/resources/ip.sh
index 38d1ab9..3adbb12 100755
--- a/rgmanager/src/resources/ip.sh
+++ b/rgmanager/src/resources/ip.sh
@@ -132,6 +132,15 @@ meta_data()
 			<content type="boolean"/>
 		</parameter>
 
+		<parameter name="interface">
+			<longdesc lang="en">
+			The network interface to which the IP address should be added. The interface must already be configured and active. This parameter should be used only when at least two active interfaces have IP addresses on the same subnet and it is desired to have the IP address added to a particular interface.
+			</longdesc>
+			<shortdesc lang="en">
+			Network interface
+			</shortdesc>
+			<content type="string"/>
+		</parameter>
 	</parameters>
 
 	<actions>
@@ -587,6 +596,10 @@ ipv6()
 	fi
 
 	if [ "$1" = "add" ]; then
+		if [ -n "$OCF_RESKEY_interface" ] && \
+		   [ "$OCF_RESKEY_interface" != "$dev" ]; then
+			continue
+		fi
 		ipv6_same_subnet $ifaddr_exp/$maskbits $addr_exp
 		if [ $? -ne 0 ]; then
 			continue
@@ -670,6 +683,10 @@ ipv4()
 	fi
 
 	if [ "$1" = "add" ]; then
+		if [ -n "$OCF_RESKEY_interface" ] && \
+		   [ "$OCF_RESKEY_interface" != "$dev" ]; then
+			continue
+		fi
 		ipv4_same_subnet $ifaddr/$maskbits $addr
 		if [ $? -ne 0 ]; then
 			continue
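For context, the proposed parameter would be consumed from cluster.conf roughly like this. This is a hypothetical usage sketch (the addresses and service name are invented; the attribute is named interface as in the patch as posted, though the thread later leans toward prefer_interface):

```xml
<rm>
  <service name="web" autostart="1">
    <!-- hypothetical example: pin the VIP to eth1 when two active
         interfaces carry addresses on the same subnet -->
    <ip address="192.168.1.50" monitor_link="1" interface="eth1"/>
  </service>
</rm>
```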
Re: [Cluster-devel] cluster.cman.nodename vanish on config reload
On 7/11/2012 9:37 AM, Dietmar Maurer wrote: Ok, bisected it myself. This led directly to commit f3f4499d4ace7a3bf5fe09ce6d9f04ed6d8958f6. But this is just the check you introduced. If I revert that patch, everything works as before, but I noticed that it still deletes the values from the corosync objdb after config reload - even in 3.1.8! Both cluster.cman.nodename and cluster.cman.cluster_id get removed. Testing with earlier versions now. That even happens with 3.1.4 (can't test easily with older versions). Any ideas? No, not yet, but what kind of operational problem do you get? Does it affect runtime? If so, how? Fabio
Re: [Cluster-devel] cluster.cman.nodename vanish on config reload
On 7/11/2012 10:14 AM, Fabio M. Di Nitto wrote: On 7/11/2012 9:37 AM, Dietmar Maurer wrote: Ok, bisected it myself. This led directly to commit f3f4499d4ace7a3bf5fe09ce6d9f04ed6d8958f6. But this is just the check you introduced. If I revert that patch, everything works as before, but I noticed that it still deletes the values from the corosync objdb after config reload - even in 3.1.8! Both cluster.cman.nodename and cluster.cman.cluster_id get removed. Testing with earlier versions now. That even happens with 3.1.4 (can't test easily with older versions). Any ideas? No, not yet, but what kind of operational problem do you get? Does it affect runtime? If so, how? Fabio Never mind... I answered my own question. Fabio
Re: [Cluster-devel] cluster.cman.nodename vanish on config reload
On 7/11/2012 10:21 AM, Dietmar Maurer wrote: This led directly to commit f3f4499d4ace7a3bf5fe09ce6d9f04ed6d8958f6. But this is just the check you introduced. If I revert that patch, everything works as before, but I noticed that it still deletes the values from the corosync objdb after config reload - even in 3.1.8! Both cluster.cman.nodename and cluster.cman.cluster_id get removed. Testing with earlier versions now. That even happens with 3.1.4 (can't test easily with older versions). Any ideas? No, not yet, but what kind of operational problem do you get? I cannot change/reload the configuration with commit f3f4499d4ace7a3bf5fe09ce6d9f04ed6d8958f6. When I revert that commit everything works fine. I just wonder why those values get removed from the corosync objdb? That's the root cause of the issue. Note: You added that check, so I guess it has negative side effects when there is no nodename (why did you add that check)? Well yes, it is an error if we can't determine our nodename. The issue now is to understand why it fails for you but doesn't fail for me using git. Fabio
Re: [Cluster-devel] cluster.cman.nodename vanish on config reload
On 7/11/2012 10:32 AM, Dietmar Maurer wrote: Well yes, it is an error if we can't determine our nodename. The issue now is to understand why it fails for you but doesn't fail for me using git. Oh, you can't reproduce the bug? Found it: it is triggered only when cluster.conf has a cman section. Working on a fix now. Fabio
Re: [Cluster-devel] cluster.cman.nodename vanish on config reload
On 7/11/2012 10:32 AM, Dietmar Maurer wrote: Well yes, it is an error if we can´t determine our nodename. The issue now is to understand why it fails for you but doesn´t fail for me using git. Oh, you can't reproduce the bug? Can you please try the patch I just posted to the list? it works for me, but a couple of extra eyes won´t hurt. Thanks fabio
Re: [Cluster-devel] cluster.cman.nodename vanish on config reload
If are running stable32 from git, can you please revert: commit 8975bd6341b2d94c1f89279b1b00d4360da1f5ff and see if it´s still a problem? Thanks Fabio On 7/10/2012 1:33 PM, Dietmar Maurer wrote: I just updated from 3.1.8 to latest STABLE32: I use this cluster.conf: # cat /etc/cluster/cluster.conf ?xml version=1.0? cluster config_version=235 name=test cman keyfile=/var/lib/pve-cluster/corosync.authkey transport=udpu/ clusternodes clusternode name=maui nodeid=3 votes=1/ clusternode name=cnode1 nodeid=1 votes=1/ /clusternodes rm pvevm autostart=0 vmid=100/ /rm /cluster cman service starts without problems: # /etc/init.d/cman start Starting cluster: Checking if cluster has been disabled at boot... [ OK ] Checking Network Manager... [ OK ] Global setup... [ OK ] Loading kernel modules... [ OK ] Mounting configfs... [ OK ] Starting cman... [ OK ] Waiting for quorum... [ OK ] Starting fenced... [ OK ] Starting dlm_controld... [ OK ] Starting GFS2 Control Daemon: gfs_controld. Unfencing self... [ OK ] Joining fence domain... [ OK ] And the corosync objdb contains: # corosync-objctl|grep cluster.cman cluster.cman.keyfile=/var/lib/pve-cluster/corosync.authkey cluster.cman.transport=udpu cluster.cman.nodename=maui cluster.cman.cluster_id=1678 Note: there is a value for ‘nodename’ and ‘cluster_id’ Now I simply increase the version inside cluster.conf (on both nodes): # cat /etc/cluster/cluster.conf ?xml version=1.0? cluster config_version=236 name=test cman keyfile=/var/lib/pve-cluster/corosync.authkey transport=udpu/ clusternodes clusternode name=maui nodeid=3 votes=1/ clusternode name=cnode1 nodeid=1 votes=1/ /clusternodes rm pvevm autostart=0 vmid=100/ /rm /cluster And trigger a reload: # cman_tool version -r –S cman_tool: Error loading configuration in corosync/cman And the syslog have more details: Jul 10 13:28:25 maui corosync[488675]: [CMAN ] cman was unable to determine our node name! 
Jul 10 13:28:25 maui corosync[488675]: [CMAN ] Can't get updated config version: Successfully read config from /etc/cluster/cluster.conf#012. Jul 10 13:28:25 maui corosync[488675]: [CMAN ] Continuing activity with old configuration Somehow the nodename and cluster_id values are removed from the corosync objdb: # corosync-objctl|grep cluster.cman cluster.cman.keyfile=/var/lib/pve-cluster/corosync.authkey cluster.cman.transport=udpu Any Idea why that happens? - Dietmar
Re: [Cluster-devel] cluster.cman.nodename vanish on config reload
On 7/10/2012 2:09 PM, Dietmar Maurer wrote: If are running stable32 from git, can you please revert: commit 8975bd6341b2d94c1f89279b1b00d4360da1f5ff and see if it´s still a problem? Yes, same problem. - Dietmar Ok. then please file a bugzilla. I´ll need to bisect and see when the problem has been introduced (unless you want to give bisect a shot). Fabio
[Cluster-devel] [PATCH] qdiskd: restrict master_wins to 2 node cluster
From: Fabio M. Di Nitto fdini...@redhat.com given enough mingling of cluster.conf it was possible to break quorum rule #1: there is only one quorum in a cluster at any given time. this change restricts master_wins to 2 node cluster only and provides extra feedback to the user (via logging) on why the mode is disabled. Resolves: rhbz#838047 Signed-off-by: Fabio M. Di Nitto fdini...@redhat.com --- cman/man/qdisk.5 |5 +++-- cman/qdisk/disk.h |1 + cman/qdisk/main.c | 22 +++--- 3 files changed, 19 insertions(+), 9 deletions(-) diff --git a/cman/man/qdisk.5 b/cman/man/qdisk.5 index ca974fa..938ed69 100644 --- a/cman/man/qdisk.5 +++ b/cman/man/qdisk.5 @@ -297,8 +297,9 @@ and qdiskd's timeout (interval*tko) should be less than half of Totem's token timeout. See section 3.3.1 for more information. This option only takes effect if there are no heuristics -configured. Usage of this option in configurations with more than -two cluster nodes is undefined and should not be done. +configured and it is valid only for 2 node cluster. +This option is automatically disabled if heuristics are +defined or cluster has more than 2 nodes configured. In a two-node cluster with no heuristics and no defined vote count (see above), this mode is turned by default. If enabled in diff --git a/cman/qdisk/disk.h b/cman/qdisk/disk.h index fd80fa6..1792377 100644 --- a/cman/qdisk/disk.h +++ b/cman/qdisk/disk.h @@ -249,6 +249,7 @@ typedef struct { int qc_master; /* Master?! 
*/ int qc_config; int qc_token_timeout; + int qc_auto_votes; disk_node_state_t qc_disk_status; disk_node_state_t qc_status; run_flag_t qc_flags; diff --git a/cman/qdisk/main.c b/cman/qdisk/main.c index 32677a2..e14d534 100644 --- a/cman/qdisk/main.c +++ b/cman/qdisk/main.c @@ -1444,7 +1444,7 @@ auto_qdisk_votes(int desc) logt_print(LOG_ERR, Unable to determine qdiskd votes automatically\n); else - logt_print(LOG_DEBUG, Setting votes to %d\n, ret); + logt_print(LOG_DEBUG, Setting autocalculated votes to %d\n, ret); return (ret); } @@ -1606,6 +1606,8 @@ get_dynamic_config_data(qd_ctx *ctx, int ccsfd) ctx-qc_flags = ~RF_AUTO_VOTES; } + ctx-qc_auto_votes = auto_qdisk_votes(ccsfd); + snprintf(query, sizeof(query), /cluster/quorumd/@votes); if (ccs_get(ccsfd, query, val) == 0) { ctx-qc_votes = atoi(val); @@ -1613,7 +1615,7 @@ get_dynamic_config_data(qd_ctx *ctx, int ccsfd) if (ctx-qc_votes 0) ctx-qc_votes = 0; } else { - ctx-qc_votes = auto_qdisk_votes(ccsfd); + ctx-qc_votes = ctx-qc_auto_votes; if (ctx-qc_votes 0) { if (ctx-qc_config) { logt_print(LOG_WARNING, Unable to determine @@ -1879,15 +1881,21 @@ get_config_data(qd_ctx *ctx, struct h_data *h, int maxh, int *cfh) *cfh = configure_heuristics(ccsfd, h, maxh, ctx-qc_interval * (ctx-qc_tko - 1)); - if (*cfh) { - if (ctx-qc_flags RF_MASTER_WINS) { - logt_print(LOG_WARNING, Master-wins mode disabled\n); + if (ctx-qc_flags RF_MASTER_WINS) { + if (*cfh) { + logt_print(LOG_WARNING, Master-wins mode disabled + (not compatible with heuristics)\n); + ctx-qc_flags = ~RF_MASTER_WINS; + } + if (ctx-qc_auto_votes != 1) { + logt_print(LOG_WARNING, Master-wins mode disabled + (not compatible with more than 2 nodes)\n); ctx-qc_flags = ~RF_MASTER_WINS; } } else { if (ctx-qc_flags RF_AUTO_VOTES - !(ctx-qc_flags RF_MASTER_WINS) - ctx-qc_votes == 1) { + !*cfh + ctx-qc_auto_votes == 1) { /* Two node cluster, no heuristics, 1 vote for * quorum disk daemon. Safe to enable master-wins. 
* In fact, qdiskd without master-wins in this config -- 1.7.7.6
Re: [Cluster-devel] [PATCH 1/5] rgmanager: Fix orainstance.sh error checking
ACK On 6/28/2012 9:57 PM, Ryan McCabe wrote: Pull in the fixed error checking that was added to oracledb.sh as a fix for rhbz#471066. Resolves: rhbz#723819 Signed-off-by: Ryan McCabe rmcc...@redhat.com --- rgmanager/src/resources/orainstance.sh |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/rgmanager/src/resources/orainstance.sh b/rgmanager/src/resources/orainstance.sh index 6f2ff15..a9f690d 100755 --- a/rgmanager/src/resources/orainstance.sh +++ b/rgmanager/src/resources/orainstance.sh @@ -105,7 +105,7 @@ start_db() { # If we see: # ORA-.: failure, we failed -grep -q failure $logfile +grep -q ^ORA- $logfile rv=$? rm -f $logfile @@ -155,7 +155,7 @@ stop_db() { return 1 fi - grep -q failure $logfile + grep -q ^ORA- $logfile rv=$? rm -f $logfile
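The change above replaces a match on the literal word "failure" with a match on Oracle's "ORA-" diagnostic prefix. A minimal runnable sketch of why that matters, using an invented log excerpt (the log contents here are hypothetical, not from a real Oracle instance):

```shell
#!/bin/sh
# Oracle reports errors as lines beginning with "ORA-", so matching the
# "^ORA-" prefix is more reliable than searching for the word "failure",
# which an error message may never contain.

logfile=$(mktemp)

# A startup log where an error occurred but the word "failure" never appears.
cat > "$logfile" <<'EOF'
SQL*Plus: Release 10.2.0.1.0
ORA-01034: ORACLE not available
EOF

# Old check: misses this error entirely.
grep -q failure "$logfile" && old_rv=0 || old_rv=1

# New check: catches any ORA-xxxxx diagnostic at the start of a line.
grep -q '^ORA-' "$logfile" && new_rv=0 || new_rv=1

rm -f "$logfile"
echo "old=$old_rv new=$new_rv"
# prints: old=1 new=0
```

With the old check the agent would have reported a clean start (rv=1 means "no failure string found") even though the database never came up.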
Re: [Cluster-devel] [PATCH 2/5] rgmanager: Don't exit uncleanly when cman asks us to shut down.
ACK On 6/28/2012 9:57 PM, Ryan McCabe wrote: Original patch from Lon rediffed to apply to the current tree: Previous to this, rgmanager would uncleanly exit if you issued a 'service cman stop'. This patch makes it uncleanly exit if 'cman_tool leave force' or a corosync/openais crash occurs, but in a simple cman_tool leave, rgmanager will no longer exit uncleanly. Without this patch, issuing 'service cman stop' when rgmanager is running will make it impossible to stop the cman service because rgmanager will have exited without releasing its dlm lockspace. Resolves: rhbz#769730 Signed-off-by: Ryan McCabe rmcc...@redhat.com --- rgmanager/src/clulib/msg_cluster.c |7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/rgmanager/src/clulib/msg_cluster.c b/rgmanager/src/clulib/msg_cluster.c index 8dc22d0..e864853 100644 --- a/rgmanager/src/clulib/msg_cluster.c +++ b/rgmanager/src/clulib/msg_cluster.c @@ -211,7 +211,7 @@ poll_cluster_messages(int timeout) if (cman_dispatch(ch, 0) 0) { process_cman_event(ch, NULL, -CMAN_REASON_TRY_SHUTDOWN, 0); +CMAN_REASON_TRY_SHUTDOWN, 1); } ret = 0; } @@ -987,6 +987,11 @@ process_cman_event(cman_handle_t handle, void *private, int reason, int arg) printf(EVENT: %p %p %d %d\n, handle, private, reason, arg); #endif + if (reason == CMAN_REASON_TRY_SHUTDOWN !arg) { + cman_replyto_shutdown(handle, 0); + return; + } + /* Allocate queue node */ while ((node = malloc(sizeof(*node))) == NULL) { sleep(1);
Re: [Cluster-devel] [PATCH 5/5] rgmanager: Fix a possible NULL pointer dereference
ACK On 6/28/2012 9:58 PM, Ryan McCabe wrote: Fix a NULL pointer dereference that could happen when cman_get_node_count() returns 0 with errno set to EINTR. Possibly resolves rhbz#820632 Signed-off-by: Ryan McCabe rmcc...@redhat.com --- rgmanager/src/clulib/members.c |4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/rgmanager/src/clulib/members.c b/rgmanager/src/clulib/members.c index f705297..72f4529 100644 --- a/rgmanager/src/clulib/members.c +++ b/rgmanager/src/clulib/members.c @@ -367,8 +367,10 @@ get_member_list(cman_handle_t h) do { ++tries; - if (nodes) + if (nodes) { free(nodes); + nodes = NULL; + } c = cman_get_node_count(h); if (c = 0) {
Re: [Cluster-devel] [PATCH 4/5] rgmanager: Treat exit status 16 from umount as success
ACK, but please add Masatake YAMATO suggestion to the final patch. Fabio On 6/28/2012 9:57 PM, Ryan McCabe wrote: When the filesystem /etc lives on is completely full, umount will exit with exit status 16 if the umount syscall succeeded but it was unable to write a new mtab file because the disk is full. umount won't exit with status 16 under any other circumstances. This patch changes the fs.sh, clusterfs.sh, and netfs.sh resource agents to check treat both exit status 0 and exit status 16 as success. Resolves: rhbz#819595 Signed-off-by: Ryan McCabe rmcc...@redhat.com --- rgmanager/src/resources/clusterfs.sh |3 ++- rgmanager/src/resources/fs.sh|3 ++- rgmanager/src/resources/netfs.sh |3 ++- 3 files changed, 6 insertions(+), 3 deletions(-) diff --git a/rgmanager/src/resources/clusterfs.sh b/rgmanager/src/resources/clusterfs.sh index 49eb724..eae1ee0 100755 --- a/rgmanager/src/resources/clusterfs.sh +++ b/rgmanager/src/resources/clusterfs.sh @@ -793,7 +793,8 @@ stop: Could not match $OCF_RESKEY_device with a real device ocf_log info unmounting $dev ($mp) umount $mp - if [ $? -eq 0 ]; then + retval=$? + if [ $retval -eq 0 -o $retval -eq 16 ]; then umount_failed= done=$YES continue diff --git a/rgmanager/src/resources/fs.sh b/rgmanager/src/resources/fs.sh index a98cddc..5d6bc1b 100755 --- a/rgmanager/src/resources/fs.sh +++ b/rgmanager/src/resources/fs.sh @@ -1103,7 +1103,8 @@ stop: Could not match $OCF_RESKEY_device with a real device ocf_log info unmounting $mp umount $mp - if [ $? -eq 0 ]; then + retval=$? + if [ $retval -eq 0 -o $retval -eq 16 ]; then umount_failed= done=$YES continue diff --git a/rgmanager/src/resources/netfs.sh b/rgmanager/src/resources/netfs.sh index 837a4c4..9f0daa4 100755 --- a/rgmanager/src/resources/netfs.sh +++ b/rgmanager/src/resources/netfs.sh @@ -560,7 +560,8 @@ stopNFSFilesystem() { ocf_log info unmounting $mp umount $umount_flag $mp - if [ $? -eq 0 ]; then + retval=$? 
+ if [ $retval -eq 0 -o $retval -eq 16 ]; then umount_failed= done=$YES continue
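The status handling the patch applies in all three agents can be exercised in isolation. In this sketch, umount_stub is a hypothetical stand-in for the real umount so the logic is runnable anywhere; exit status 16 means the umount(2) syscall succeeded but /etc/mtab could not be rewritten (for example because the filesystem holding /etc is full):

```shell
#!/bin/sh
# Sketch of the stop-path logic from the patch: treat umount exit status
# 0 and 16 as success, everything else as failure.

umount_stub() {
    # Hypothetical: pretend the unmount worked but mtab could not be written.
    return 16
}

try_umount() {
    umount_stub "$1"
    retval=$?
    if [ $retval -eq 0 -o $retval -eq 16 ]; then
        echo "unmounted $1"
        return 0
    fi
    echo "umount of $1 failed (rv=$retval)"
    return 1
}

try_umount /mnt/data
# prints: unmounted /mnt/data
```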
[Cluster-devel] [PATCH] qdiskd: Make multipath issues go away
From: Lon Hohberger l...@redhat.com Qdiskd historically has required significant tuning to work around delays which occur during multipath failover, overloaded I/O, and LUN trespasses in both device-mapper-multipath and EMC PowerPath environments. This patch goes a very long way towards eliminating false evictions when these conditions occur by making qdiskd whine to the other cluster members when it detects hung system calls. When a cluster member whines, it indicates the source of the problem (which system call is hung), and the act of receiving a whine from a host indicates that qdiskd is operational, but that I/O is hung. Hung I/O is different from losing storage entirely (where you get I/O errors). Possible problems: - Receive queue getting very full, causing messages to become blocked on a node where I/O is hung. 1) that would take a very long time, and 2) node should get evicted at that point anyway. Resolves: rhbz#782900 this version of the patch is a backport of: e2937eb33f224f86904fead08499a6178868ca6a 34d2872fb7e60be1594158acaaeb8acd74f78d22 There is a minor change vs original patch based on how qdiskd in RHEL5 handles cman connection. We add an extra call to cman_alive in main qdisk_loop to make sure data are not stalled on the cman port, and data_callback to qdiskd_whine executed. Signed-off-by: Lon Hohberger l...@redhat.com Signed-off-by: Fabio M.
Di Nitto fdini...@redhat.com --- cman/daemon/cnxman-socket.h |1 + cman/qdisk/Makefile |2 +- cman/qdisk/disk.h |6 cman/qdisk/iostate.c| 17 +++-- cman/qdisk/iostate.h|4 ++- cman/qdisk/main.c | 54 +++ 6 files changed, 74 insertions(+), 10 deletions(-) diff --git a/cman/daemon/cnxman-socket.h b/cman/daemon/cnxman-socket.h index 351c97c..1d01b44 100644 --- a/cman/daemon/cnxman-socket.h +++ b/cman/daemon/cnxman-socket.h @@ -79,6 +79,7 @@ #define CLUSTER_PORT_SERVICES2 #define CLUSTER_PORT_SYSMAN 10/* Remote execution daemon */ #define CLUSTER_PORT_CLVMD 11/* Cluster LVM daemon */ +#defineCLUSTER_PORT_QDISKD 178/* Quorum disk daemon */ /* Port numbers above this will be blocked when the cluster is inquorate or in * transition */ diff --git a/cman/qdisk/Makefile b/cman/qdisk/Makefile index f58806b..9bfc486 100644 --- a/cman/qdisk/Makefile +++ b/cman/qdisk/Makefile @@ -32,7 +32,7 @@ qdiskd: disk.o crc32.o disk_util.o main.o score.o bitmap.o clulog.o \ gcc -o $@ $^ -lpthread -L../lib -L${ccslibdir} -lccs -lrt mkqdisk: disk.o crc32.o disk_util.o iostate.o \ -proc.o mkqdisk.o scandisk.o clulog.o gettid.o +proc.o mkqdisk.o scandisk.o clulog.o gettid.o ../lib/libcman.a gcc -o $@ $^ -lrt %.o: %.c diff --git a/cman/qdisk/disk.h b/cman/qdisk/disk.h index b784220..d491de1 100644 --- a/cman/qdisk/disk.h +++ b/cman/qdisk/disk.h @@ -290,6 +290,12 @@ typedef struct { status_block_t ni_status; } node_info_t; +typedef struct { + qd_ctx *ctx; + node_info_t *ni; + size_t ni_len; +} qd_priv_t; + int qd_write_status(qd_ctx *ctx, int nid, disk_node_state_t state, disk_msg_t *msg, memb_mask_t mask, memb_mask_t master); int qd_read_print_status(target_info_t *disk, int nid); diff --git a/cman/qdisk/iostate.c b/cman/qdisk/iostate.c index 65b4d50..eb74ad2 100644 --- a/cman/qdisk/iostate.c +++ b/cman/qdisk/iostate.c @@ -1,10 +1,14 @@ #include pthread.h +#include libcman.h #include iostate.h #include unistd.h #include time.h #include sys/time.h #include clulog.h +#include stdint.h +#include 
platform.h #include iostate.h +#include ../daemon/cnxman-socket.h static iostate_t main_state = 0; static int main_incarnation = 0; @@ -26,7 +30,7 @@ static struct state_table io_state_table[] = { { STATE_LSEEK,seek }, { -1, NULL} }; -static const char * +const char * state_to_string(iostate_t state) { static const char *ret = unknown; @@ -65,6 +69,8 @@ io_nanny_thread(void *arg) iostate_t last_main_state = 0, current_main_state = 0; int last_main_incarnation = 0, current_main_incarnation = 0; int logged_incarnation = 0; + cman_handle_t ch = (cman_handle_t)arg; + int32_t whine_state; /* Start with wherever we're at now */ pthread_mutex_lock(state_mutex); @@ -96,6 +102,11 @@ io_nanny_thread(void *arg) continue; } + /* Whine on CMAN api */ + whine_state = (int32_t)current_main_state; + swab32(whine_state); + cman_send_data(ch, whine_state, sizeof(int32_t), 0, CLUSTER_PORT_QDISKD, 0); + /* Don't log things twice */ if (logged_incarnation == current_main_incarnation) continue; @@ -114,7 +125,7 @@ io_nanny_thread(void *arg) int -io_nanny_start(int timeout
[Cluster-devel] [PATCH] cman-preconfig: allow host aliases as valid cluster nodenames
From: Fabio M. Di Nitto fdini...@redhat.com Resolves: rhbz#786118 Signed-off-by: Fabio M. Di Nitto fdini...@redhat.com --- cman/daemon/cman-preconfig.c | 91 +++--- 1 files changed, 76 insertions(+), 15 deletions(-) diff --git a/cman/daemon/cman-preconfig.c b/cman/daemon/cman-preconfig.c index d88ff3d..68fec22 100644 --- a/cman/daemon/cman-preconfig.c +++ b/cman/daemon/cman-preconfig.c @@ -451,7 +451,7 @@ static int verify_nodename(struct objdb_iface_ver0 *objdb, char *node) struct sockaddr *sa; hdb_handle_t nodes_handle; hdb_handle_t find_handle = 0; - int error; + int found = 0; /* nodename is either from commandline or from uname */ if (nodelist_byname(objdb, cluster_parent_handle, node)) @@ -497,12 +497,11 @@ static int verify_nodename(struct objdb_iface_ver0 *objdb, char *node) } objdb-object_find_destroy(find_handle); - - /* The cluster.conf names may not be related to uname at all, - they may match a hostname on some network interface. - NOTE: This is IPv4 only */ - error = getifaddrs(ifa_list); - if (error) + /* +* The cluster.conf names may not be related to uname at all, +* they may match a hostname on some network interface. 
+*/ + if (getifaddrs(ifa_list)) return -1; for (ifa = ifa_list; ifa; ifa = ifa-ifa_next) { @@ -521,12 +520,13 @@ static int verify_nodename(struct objdb_iface_ver0 *objdb, char *node) if (sa-sa_family == AF_INET6) salen = sizeof(struct sockaddr_in6); - error = getnameinfo(sa, salen, nodename2, - sizeof(nodename2), NULL, 0, 0); - if (!error) { + if (getnameinfo(sa, salen, + nodename2, sizeof(nodename2), + NULL, 0, 0) == 0) { if (nodelist_byname(objdb, cluster_parent_handle, nodename2)) { strncpy(node, nodename2, sizeof(nodename) - 1); + found = 1; goto out; } @@ -537,27 +537,88 @@ static int verify_nodename(struct objdb_iface_ver0 *objdb, char *node) if (nodelist_byname(objdb, cluster_parent_handle, nodename2)) { strncpy(node, nodename2, sizeof(nodename) - 1); + found = 1; goto out; } } } /* See if it's the IP address that's in cluster.conf */ - error = getnameinfo(sa, sizeof(*sa), nodename2, - sizeof(nodename2), NULL, 0, NI_NUMERICHOST); - if (error) + if (getnameinfo(sa, sizeof(*sa), + nodename2, sizeof(nodename2), + NULL, 0, NI_NUMERICHOST)) continue; if (nodelist_byname(objdb, cluster_parent_handle, nodename2)) { strncpy(node, nodename2, sizeof(nodename) - 1); + found = 1; goto out; } } - error = -1; out: + if (found) { + freeifaddrs(ifa_list); + return 0; + } + + /* +* This section covers the usecase where the nodename specified in cluster.conf +* is an alias specified in /etc/hosts. For example: +* ipaddr hostname alias1 alias2 +* and clusternode name=alias2 +* the above calls use uname and getnameinfo does not return aliases. +* here we take the name specified in cluster.conf, resolve it to an address +* and then compare against all known local ip addresses. +* if we have a match, we found our nodename. In theory this chunk of code +* could replace all the checks above, but let's avoid any possible regressions +* and use it as last. 
+*/ + + nodes_handle = nodeslist_init(objdb, cluster_parent_handle, find_handle); + while (nodes_handle) { + char *dbnodename = NULL; + struct addrinfo hints; + struct addrinfo *result = NULL, *rp = NULL; + + if (objdb_get_string(objdb, nodes_handle, name, dbnodename)) { + goto next; + } + + memset(hints, 0, sizeof(struct addrinfo)); + hints.ai_family = AF_UNSPEC; + hints.ai_socktype = SOCK_DGRAM; + hints.ai_flags = 0; + hints.ai_protocol = IPPROTO_UDP; + + if (getaddrinfo(dbnodename, NULL, hints, result)) + goto next; + + for (rp
[Cluster-devel] [PATCH] rgmanager: fix nfsrestart option to be effective
From: Fabio M. Di Nitto fdini...@redhat.com The original patch e512a9ce367 was still racy in some conditions as other rpc.* and nfs* processes were holding a lock on the filesystem. stopping nfs in kernel is simply not enough in rhel5 this fixed version does stop nfs completely and re-instante nfs exports. Resolves: rhbz#822066 Signed-off-by: Fabio M. Di Nitto fdini...@redhat.com --- rgmanager/src/resources/clusterfs.sh | 31 --- rgmanager/src/resources/fs.sh| 31 --- 2 files changed, 40 insertions(+), 22 deletions(-) diff --git a/rgmanager/src/resources/clusterfs.sh b/rgmanager/src/resources/clusterfs.sh index 89b30a2..49eb724 100755 --- a/rgmanager/src/resources/clusterfs.sh +++ b/rgmanager/src/resources/clusterfs.sh @@ -681,7 +681,10 @@ stopFilesystem() { typeset -i max_tries=3 # how many times to try umount typeset -i sleep_time=2 # time between each umount failure typeset -i refs=0 - typeset nfsdthreads + typeset nfsexports= + typeset nfsexp= + typeset nfsopts= + typeset nfsacl= typeset done= typeset umount_failed= typeset force_umount= @@ -804,16 +807,22 @@ stop: Could not match $OCF_RESKEY_device with a real device if [ $OCF_RESKEY_nfsrestart = yes ] || \ [ $OCF_RESKEY_nfsrestart = 1 ]; then - if [ -f /proc/fs/nfsd/threads ]; then - ocf_log warning Restarting nfsd/nfslock - nfsdthreads=$(cat /proc/fs/nfsd/threads) - service nfslock stop - echo 0 /proc/fs/nfsd/threads - echo $nfsdthreads /proc/fs/nfsd/threads - service nfslock start - else - ocf_log err Unable to determin nfsd information. 
nfsd restart aborted - fi + ocf_log warning Restarting nfsd/nfslock + nfsexports=$(cat /var/lib/nfs/etab) + service nfslock stop + service nfs stop + service nfs start + service nfslock start + echo $nfsexports | { while read line; do + nfsexp=$(echo $line | awk '{print $1}') + nfsopts=$(echo $line | sed -e 's#.*(##g' -e 's#).*##g') + nfsacl=$(echo $line | awk '{print $2}' | sed -e 's#(.*##g') + if [ -n $nfsopts ]; then + exportfs -i -o $nfsopts $nfsacl:$nfsexp + else + exportfs -i $nfsacl:$nfsexp + fi + done; } fi else diff --git a/rgmanager/src/resources/fs.sh b/rgmanager/src/resources/fs.sh index 5724352..a98cddc 100755 --- a/rgmanager/src/resources/fs.sh +++ b/rgmanager/src/resources/fs.sh @@ -1019,7 +1019,10 @@ stopFilesystem() { typeset -i max_tries=3 # how many times to try umount typeset -i sleep_time=5 # time between each umount failure typeset -i nfslock_reclaim=0 - typeset nfsdthreads + typeset nfsexports= + typeset nfsexp= + typeset nfsopts= + typeset nfsacl= typeset done= typeset umount_failed= typeset force_umount= @@ -1126,16 +1129,22 @@ stop: Could not match $OCF_RESKEY_device with a real device if [ $OCF_RESKEY_nfsrestart = yes ] || \ [ $OCF_RESKEY_nfsrestart = 1 ]; then - if [ -f /proc/fs/nfsd/threads ]; then - ocf_log warning Restarting nfsd/nfslock - nfsdthreads=$(cat /proc/fs/nfsd/threads) - service nfslock stop - echo 0 /proc/fs/nfsd/threads
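The clusterfs.sh hunk above saves /var/lib/nfs/etab before stopping nfs and then re-exports each saved entry. An etab line has the shape `/export/path client(options)`; the field extraction used by the patch can be tested on its own against a hypothetical entry (the path, client, and options below are invented):

```shell
#!/bin/sh
# Sketch of the etab parsing done in the patch: split an export entry into
# the export path, the option string, and the client ACL, the three pieces
# needed to re-run exportfs after nfs has been restarted.

line='/srv/share 192.168.1.0/24(rw,sync,no_root_squash)'

nfsexp=$(echo "$line" | awk '{print $1}')                       # export path
nfsopts=$(echo "$line" | sed -e 's#.*(##g' -e 's#).*##g')       # option string
nfsacl=$(echo "$line" | awk '{print $2}' | sed -e 's#(.*##g')   # client ACL

echo "path=$nfsexp opts=$nfsopts acl=$nfsacl"
# prints: path=/srv/share opts=rw,sync,no_root_squash acl=192.168.1.0/24

# The patch would then re-instate the export with:
#   exportfs -i -o "$nfsopts" "$nfsacl:$nfsexp"
```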
Re: [Cluster-devel] [PATCH] rgmanager: fix nfsrestart option to be effective
On 6/21/2012 3:26 PM, Lon Hohberger wrote: On 06/21/2012 04:07 AM, Fabio M. Di Nitto wrote: From: Fabio M. Di Nittofdini...@redhat.com The original patch e512a9ce367 was still racy in some conditions as other rpc.* and nfs* processes were holding a lock on the filesystem. stopping nfs in kernel is simply not enough in rhel5 this fixed version does stop nfs completely and re-instante nfs exports. Resolves: rhbz#822066 This is okay; ideally we wouldn't have to do this in the first place, however. and I would like some ponies, rainbows and unicorns.. however. Fabio
Re: [Cluster-devel] [PATCH] mkfs.gfs2: Follow symlinks before checking device contents
Hi, On 6/20/2012 6:15 PM, Bob Peterson wrote: - Original Message - | + absname = canonicalize_file_name(sdp->device_name); Hi Andy, Thanks for the patch. I just wanted to point out that in the past we've used realpath rather than canonicalize_file_name. For example, see this patch we did a long time ago to gfs2_tool: http://git.fedorahosted.org/git/?p=cluster.git;a=commitdiff;h=e70898cfa09939a7100a057433fff3a4ad666bdd It would be nice if our use was consistent. I'm not sure if there's an advantage of one over the other. If canonicalize_file_name is now preferred upstream over realpath, we should probably replace all occurrences of that. On the other hand, if realpath is now preferred upstream, we should adjust this patch to use it instead. AFAIK, they are the same, and I don't have a personal preference; whatever is most favoured by the upstream community. :) Otherwise, the patch looks good. I don´t remember what other mkfs.* tools do, but I would prefer to see something like: # ./mkfs.gfs2 -p lock_nolock /dev/vg/test WARNING: /dev/vg/test appears to be a symlink to /dev/real/device This will destroy any data on /dev/real/device It appears to contain: RANDOM_FS_OF_DOOM (blocksize..) Fabio
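The warning Fabio sketches is built on ordinary path resolution. A hedged shell sketch of the detection step, using readlink -f on a throwaway symlink (the paths are created in a temp dir purely for illustration; the actual mkfs.gfs2 code would do this in C via realpath or canonicalize_file_name):

```shell
#!/bin/sh
# Sketch: resolve the device path the user gave and warn when it is a
# symlink pointing at a different node, before destroying any data.

tmpdir=$(mktemp -d)
touch "$tmpdir/real_device"
ln -s "$tmpdir/real_device" "$tmpdir/vg_test"

dev="$tmpdir/vg_test"
resolved=$(readlink -f "$dev")   # canonical path with symlinks resolved

if [ "$resolved" != "$dev" ]; then
    echo "WARNING: $dev appears to be a symlink to $resolved"
    echo "This will destroy any data on $resolved"
fi

rm -rf "$tmpdir"
```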
Re: [Cluster-devel] when do I need to start cpglockd
On 6/19/2012 6:23 AM, Dietmar Maurer wrote: Yes, that's a bug. cpglockd will be started from the rgmanager init script when RRP mode is enabled. Ryan Actually no, it's not a bug. cpglockd has its own init script too. Yes, and that script 'unconditionally' (always) starts cpglockd Nothing wrong with that. If you ask a daemon to start it will start :) On top of that, cpglockd is harmless if there is no RRP mode active, or forcefully disabled. The Required-Start: tells sysvinit that if cpglockd is enabled, it has to be started before rgmanager. That tells sysvinit to always start that script before rgmanager. So we end up with cpglockd always running, although it is not required at all. What do I miss? It tells sysvinit to start cpglockd before rgmanager IF cpglockd is enabled via chkconfig, otherwise it is not started. That value is used only to calculate the symlink S* K** values for rc.d/ Fabio
Re: [Cluster-devel] when do I need to start cpglockd
On 6/19/2012 8:54 AM, Dietmar Maurer wrote: Yes, and that script 'unconditionally' (always) starts cpglockd Nothing wrong with that. If you ask a daemon to start it will start :) For me this is wrong. I have to maintain a debian package, and I do not want to start unnecessary daemons. So I simply remove that dependency. If Debian handling of daemons has changed, then the change is debian specific, it doesn´t make it a bug for all distributions. Last I checked if I run: apt-get install bind9 - bind9 will start automatically. Or for that matter also apache2 or The init scripts we deliver are as generic as possible, it doesn´t mean that they fit everything everywhere. And then again, expressing an order is correct. If Required-Start behavior in Debian is different than in other distro (I can speak for Fedora/RHEL here), then clearly there needs to be some distro specific tuning. Fabio
Re: [Cluster-devel] when do I need to start cpglockd
On 6/19/2012 9:24 AM, Dietmar Maurer wrote: And then again, expressing an order is correct. If Required-Start behavior in Debian is different than in other distro (I can speak for Fedora/RHEL here), then clearly there needs to be some distro specific tuning. You simply start a daemon which is not necessary. And I guess you do that on all distros if there is a Required-Start start dependency. Fresh install on Fedora: root@fedora16-node2 ~]# chkconfig --list |grep cpg cpglockd0:off 1:off 2:off 3:off 4:off 5:off 6:off [root@fedora16-node2 ~]# chkconfig rgmanager on [root@fedora16-node2 ~]# chkconfig --list |grep rg rgmanager 0:off 1:off 2:on3:on4:on5:on6:off [root@fedora16-node2 ~]# chkconfig --list |grep cpg cpglockd0:off 1:off 2:off 3:off 4:off 5:off 6:off [reboot] [root@fedora16-node2 ~]# ps ax|grep cpglockd 3741 pts/1S+ 0:00 grep --color=auto cpglockd [root@fedora16-node2 ~]# [root@fedora16-node2 ~]# clustat [SNIP] service:vip1 fedora16-node2 started As you can see, rgmanager is on, cpglockd off. At boot rgmanager starts fine, without cpglockd running. I think the problem here is the interpretation of the LSB specifications between different distributions. I am not going to argue which one is right or wrong but the key issue is here: An init.d shell script may declare using the Required-Start: header that it shall not be run until certain boot facilities are provided. This information is used by the installation tool or the boot-time boot-script execution facility to assure that init scripts are run in the correct order. In the fedora world that means that if cpglockd is enabled (via chkconfig), the Required-Start: make sure that cpglockd is started before rgmanager, always. It is possible that other distributions might interpret that as: cpglockd must be started even if disabled when rgmanager Required-Start: cpglockd and rgmanager is enabled. So based on the platform I use for testing/development, the daemon does not start unless it is necessary :) Fabio
Re: [Cluster-devel] when do I need to start cpglockd
On 6/19/2012 10:12 AM, Dietmar Maurer wrote: At boot rgmanager starts fine, without cpglockd running. I think the problem here is the interpretation of the LSB specifications between different distributions. I am not going to argue which one is right or wrong but the key issue is here: An init.d shell script may declare using the Required-Start: header that it shall not be run until certain boot facilities are provided. This information is used by the installation tool or the boot-time boot-script execution facility to assure that init scripts are run in the correct order. In the fedora world that means that if cpglockd is enabled (via chkconfig), the Required-Start: make sure that cpglockd is started before rgmanager, always. It is possible that other distributions might interpret that as: cpglockd must be started even if disabled when rgmanager Required-Start: cpglockd and rgmanager is enabled. So based on the platform I use for testing/development, the daemon does not start unless it is necessary :) OK, I was not aware of that. Many thanks for that detailed reply! So let´s instead try to figure out the correct fix. Let´s put one minute aside the possibility that some distributions might use the second interpretation of LSB header and focus only on the ordering instead. Dropping Required-Start: might look like an easy fix in the Debian world, but that could cripple the startup order as cpglockd could theoretically land after rgmanager (i don´t think it´s possible, but let´s not take a chance). I think the correct fix should be: move the conditional start start_cpglockd function/check from rgmanager.init to cpglockd.init. move the cpglockd is up and running test from rgmanager.init to cpglockd.init (that´s a bug as-is now). cpglockd.init should return 0 (success) if it does not need to run and would allow rgmanager to start given Debian current interpretation of LSB header. 
rgmanager.init can simply fire cpglockd.init without any check, as those would be done properly by cpglockd.init. I think this should solve the issue for Debian and keep current behavior in Fedora. Fabio
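The proposed split can be sketched as an init-script pattern: the dependent script decides for itself whether it is needed and exits 0 when it is not, so callers may invoke it unconditionally. rrp_enabled and start_daemon below are hypothetical stand-ins for the real RRP-mode detection and daemon startup:

```shell
#!/bin/sh
# Sketch of the conditional-start pattern proposed for cpglockd.init:
# returning success when the daemon is not required means rgmanager.init
# can fire this script without any checks of its own, and Debian's
# Required-Start interpretation no longer forces the daemon to run.

rrp_enabled() {
    # Real code would inspect cluster.conf/corosync for RRP mode.
    [ "${RRP_MODE:-none}" != "none" ]
}

start_daemon() {
    echo "cpglockd started"
}

cpglockd_start() {
    if ! rrp_enabled; then
        echo "cpglockd not required (no RRP), nothing to do"
        return 0   # success, so dependents like rgmanager still start
    fi
    start_daemon
}

RRP_MODE=none
cpglockd_start      # prints: cpglockd not required (no RRP), nothing to do
RRP_MODE=passive
cpglockd_start      # prints: cpglockd started
```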
Re: [Cluster-devel] when do I need to start cpglockd
On 06/14/2012 06:06 PM, Ryan McCabe wrote: On Thu, Jun 14, 2012 at 03:41:39PM +, Dietmar Maurer wrote: I can't see that in the current cman init script. Instead, the rgmanager init script depends on the cpglockd unconditionally: # Required-Start: cman cpglockd So that is a bug? Hi, Yes, that's a bug. cpglockd will be started from the rgmanager init script when RRP mode is enabled. Ryan Actually no, it's not a bug. cpglockd has its own init script too. The Required-Start: tells sysvinit that if cpglockd is enabled, it has to be started before rgmanager. rgmanager snippet to start cpglockd is there only for backward compatibility mode that avoids breaking upgrades from non RRP environments to RRP. This was done so that users didn't need to enable cpglockd via chkconfig (being a new daemon and all is not known yet). A perfect install would see the user doing: chkconfig cpglockd on chkconfig rgmanager on only for RRP installations. But then again, docs are fresh, cpglockd is new.. might as well help the users not to shoot their foot with an RRP gun ;) Fabio
[Cluster-devel] [PATCH] rgmanager: add nfsdrestart option as last resource to umount fs
From: Fabio M. Di Nitto fdini...@redhat.com Resolves: rhbz#822053 Signed-off-by: Fabio M. Di Nitto fdini...@redhat.com --- rgmanager/src/resources/fs.sh.in | 26 ++ 1 files changed, 26 insertions(+), 0 deletions(-) diff --git a/rgmanager/src/resources/fs.sh.in b/rgmanager/src/resources/fs.sh.in index c43c177..404fe01 100644 --- a/rgmanager/src/resources/fs.sh.in +++ b/rgmanager/src/resources/fs.sh.in @@ -135,6 +135,18 @@ do_metadata() content type=boolean/ /parameter + parameter name=nfsrestart inherit=nfsrestart + longdesc lang=en + If set and unmounting the file system fails, the node will + try to restart nfs daemon and nfs lockd to drop all filesystem + references. Use this option as last resource. + /longdesc + shortdesc lang=en + Enable NFS daemon and lockd workaround + /shortdesc + content type=boolean/ + /parameter + parameter name=fsid longdesc lang=en File system ID for NFS exports. This can be overridden @@ -446,6 +458,20 @@ do_force_unmount() { export nfslock_reclaim=1 fi + if [ $OCF_RESKEY_nfsrestart = yes ] || \ + [ $OCF_RESKEY_nfsrestart = 1 ]; then + if [ -f /proc/fs/nfsd/threads ]; then + ocf_log warning Restarting nfsd/nfslock + nfsdthreads=$(cat /proc/fs/nfsd/threads) + service nfslock stop + rpc.nfsd 0 + rpc.nfsd $nfsdthreads + service nfslock start + else + ocf_log err Unable to determin nfsd information. nfsd restart aborted + fi + fi + # Proceed with fuser -kvm... return 1 } -- 1.7.7.6
[Cluster-devel] [PATCH] rgmanager: add nfsdrestart option as last resource to umount fs
From: Fabio M. Di Nitto fdini...@redhat.com Resolves: rhbz#822066 Signed-off-by: Fabio M. Di Nitto fdini...@redhat.com --- rgmanager/src/resources/fs.sh | 27 +++ 1 files changed, 27 insertions(+), 0 deletions(-) diff --git a/rgmanager/src/resources/fs.sh b/rgmanager/src/resources/fs.sh index 49912c2..f67f80e 100755 --- a/rgmanager/src/resources/fs.sh +++ b/rgmanager/src/resources/fs.sh @@ -202,6 +202,18 @@ meta_data() <content type="boolean"/> </parameter> + <parameter name="nfsrestart" inherit="nfsrestart"> + <longdesc lang="en"> + If set and unmounting the file system fails, the node will + try to restart the nfs daemon and nfs lockd to drop all filesystem + references. Use this option as a last resort. + </longdesc> + <shortdesc lang="en"> + Enable NFS daemon and lockd workaround + </shortdesc> + <content type="boolean"/> + </parameter> + <parameter name="fsid"> <longdesc lang="en"> File system ID for NFS exports. This can be overridden @@ -1005,6 +1017,7 @@ stopFilesystem() { typeset -i max_tries=3 # how many times to try umount typeset -i sleep_time=5 # time between each umount failure typeset -i nfslock_reclaim=0 + typeset nfsdthreads typeset done= typeset umount_failed= typeset force_umount= @@ -1108,6 +1121,20 @@ stop: Could not match $OCF_RESKEY_device with a real device notify_list_store $mp/.clumanager/statd nfslock_reclaim=1 fi + + if [ "$OCF_RESKEY_nfsrestart" = "yes" ] || \ + [ "$OCF_RESKEY_nfsrestart" = "1" ]; then + if [ -f /proc/fs/nfsd/threads ]; then + ocf_log warning "Restarting nfsd/nfslock" + nfsdthreads=$(cat /proc/fs/nfsd/threads) + service nfslock stop + echo 0 > /proc/fs/nfsd/threads + echo $nfsdthreads > /proc/fs/nfsd/threads + service nfslock start + else + ocf_log err "Unable to determine nfsd information. nfsd restart aborted" + fi + fi else fuser -kvm $mp fi -- 1.7.7.6
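The core of the workaround above is: save the nfsd thread count, drop it to zero to release the filesystem references, then restore it. A runnable sketch of that logic (the helper name and parameters are hypothetical, used here so the thread-count handling can be exercised without a real NFS server; in the agent the file is /proc/fs/nfsd/threads and the commands are "service nfslock stop/start"):

```shell
# restart_nfsd THREADS_FILE STOP_CMD START_CMD
# Saves the nfsd thread count, drops to 0 threads to release all
# filesystem references, then restores the saved count.
restart_nfsd() {
    threads_file=$1   # normally /proc/fs/nfsd/threads
    stop_cmd=$2       # normally "service nfslock stop"
    start_cmd=$3      # normally "service nfslock start"

    if [ ! -f "$threads_file" ]; then
        echo "Unable to determine nfsd thread count; restart aborted" >&2
        return 1
    fi
    nfsdthreads=$(cat "$threads_file")
    $stop_cmd
    echo 0 > "$threads_file"              # kill all nfsd threads
    echo "$nfsdthreads" > "$threads_file" # bring them back up
    $start_cmd
}
```

Writing the saved count back is equivalent to the rpc.nfsd calls used in the RHEL5 variant of this patch; both end up poking the same kernel knob.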
[Cluster-devel] [PATCH] cman init: allow sysconfig/cman to pass options to dlm_controld
From: Fabio M. Di Nitto fdini...@redhat.com DLM_CONTROLD_OPTS= can now be used to pass startup options to the daemon. Resolves: rhbz#821016 Signed-off-by: Fabio M. Di Nitto fdini...@redhat.com --- cman/init.d/cman.in | 5 ++++- cman/init.d/cman.init.defaults.in | 3 +++ 2 files changed, 7 insertions(+), 1 deletions(-) diff --git a/cman/init.d/cman.in b/cman/init.d/cman.in index a39f19f..dddfe6e 100644 --- a/cman/init.d/cman.in +++ b/cman/init.d/cman.in @@ -116,6 +116,9 @@ fi # empty or any other value (default) | cman init will start the daemons #CMAN_DAEMONS_START= +# DLM_CONTROLD_OPTS -- allow extra options to be passed to dlm_controld daemon. +[ -z "$DLM_CONTROLD_OPTS" ] && DLM_CONTROLD_OPTS="" # FENCE_JOIN_TIMEOUT -- seconds to wait for fence domain join to # complete. If the join hasn't completed in this time, fence_tool join # exits with an error, and this script exits with an error. To wait @@ -674,7 +677,7 @@ stop_fenced() start_dlm_controld() { - start_daemon dlm_controld || return 1 + start_daemon dlm_controld $DLM_CONTROLD_OPTS || return 1 if [ "$INITLOGLEVEL" = "full" ]; then ok diff --git a/cman/init.d/cman.init.defaults.in b/cman/init.d/cman.init.defaults.in index 04b3b5b..adde8d9 100644 --- a/cman/init.d/cman.init.defaults.in +++ b/cman/init.d/cman.init.defaults.in @@ -39,6 +39,9 @@ # empty or any other value (default) | cman init will start the daemons #CMAN_DAEMONS_START= +# DLM_CONTROLD_OPTS -- allow extra options to be passed to dlm_controld daemon. +#DLM_CONTROLD_OPTS= + # FENCE_JOIN_TIMEOUT -- seconds to wait for fence domain join to # complete. If the join hasn't completed in this time, fence_tool join # exits with an error, and this script exits with an error. To wait -- 1.7.7.6
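A quick sketch of how the new variable flows from sysconfig into the daemon invocation. The option string and the start_daemon stub are only illustrative assumptions, not real recommendations; in the init script start_daemon comes from the init function library and the variable is sourced from /etc/sysconfig/cman:

```shell
# Example value, as a user might set in /etc/sysconfig/cman (illustrative only):
DLM_CONTROLD_OPTS="-q 0"

# Stub standing in for the init library's start_daemon helper.
start_daemon() { echo "starting: $*"; }

# Matches the patched call site: extra options are appended unquoted,
# so a multi-option string splits into separate daemon arguments.
start_daemon dlm_controld $DLM_CONTROLD_OPTS
# prints: starting: dlm_controld -q 0
```

Leaving $DLM_CONTROLD_OPTS unquoted at the call site is deliberate here: an empty value expands to nothing, and a value with several flags expands to several arguments.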
[Cluster-devel] [PATCH] cman init: add extra documentation for FENCE_JOIN=
From: Fabio M. Di Nitto fdini...@redhat.com Related: rhbz#821016 Signed-off-by: Fabio M. Di Nitto fdini...@redhat.com --- cman/init.d/cman.in |3 +++ cman/init.d/cman.init.defaults.in |3 +++ 2 files changed, 6 insertions(+), 0 deletions(-) diff --git a/cman/init.d/cman.in b/cman/init.d/cman.in index dddfe6e..95323b4 100644 --- a/cman/init.d/cman.in +++ b/cman/init.d/cman.in @@ -135,6 +135,9 @@ fi # set to yes, then the script will attempt to join the fence domain. # If FENCE_JOIN is set to any other value, the default behavior is # to join the fence domain (equivalent to yes). +# When setting FENCE_JOIN to no, it is important to check +# DLM_CONTROLD_OPTS to reflect expected behavior regarding fencing +# and quorum. [ -z $FENCE_JOIN ] FENCE_JOIN=yes # FENCED_OPTS -- allow extra options to be passed to fence daemon. diff --git a/cman/init.d/cman.init.defaults.in b/cman/init.d/cman.init.defaults.in index adde8d9..b981bab 100644 --- a/cman/init.d/cman.init.defaults.in +++ b/cman/init.d/cman.init.defaults.in @@ -58,6 +58,9 @@ # set to yes, then the script will attempt to join the fence domain. # If FENCE_JOIN is set to any other value, the default behavior is # to join the fence domain (equivalent to yes). +# When setting FENCE_JOIN to no, it is important to check +# DLM_CONTROLD_OPTS to reflect expected behavior regarding fencing +# and quorum. #FENCE_JOIN=yes # FENCED_OPTS -- allow extra options to be passed to fence daemon. -- 1.7.7.6
Re: [Cluster-devel] GFS2: Update main gfs2 doc
On 5/10/2012 2:11 PM, Steven Whitehouse wrote: From 49f30789fc33c4516fbe123f05ea4313866381d3 Mon Sep 17 00:00:00 2001 From: Steven Whitehouse swhit...@redhat.com Date: Thu, 10 May 2012 11:45:31 +0100 Subject: [PATCH 1/2] GFS2: Update main gfs2 doc Various items were a bit out of date, so this is a refresh to the latest info. Signed-off-by: Steven Whitehouse swhit...@redhat.com diff --git a/Documentation/filesystems/gfs2.txt b/Documentation/filesystems/gfs2.txt index 4cda926..cc4f230 100644 --- a/Documentation/filesystems/gfs2.txt +++ b/Documentation/filesystems/gfs2.txt @@ -1,7 +1,7 @@ Global File System -- -http://sources.redhat.com/cluster/wiki/ +https://fedorahosted.org/cluster/wiki/HomePage GFS is a cluster file system. It allows a cluster of computers to simultaneously use a block device that is shared between them (with FC, @@ -30,7 +30,8 @@ needed, simply: If you are using Fedora, you need to install the gfs2-utils package and, for lock_dlm, you will also need to install the cman package -and write a cluster.conf as per the documentation. +and write a cluster.conf as per the documentation. For F17 and above +cman has been replaced by the dlm package. ^^^ cman has been replaced by corosync 2.0 (or higher) in combination with the votequorum provider (see votequorum.5). gfs2 still requires dlm for its dependencies, but it's not a replacement. Fabio
Re: [Cluster-devel] GFS2: Update main gfs2 doc
On 5/10/2012 3:13 PM, Steven Whitehouse wrote: Hi, On Thu, 2012-05-10 at 15:09 +0200, Fabio M. Di Nitto wrote: On 5/10/2012 2:11 PM, Steven Whitehouse wrote: From 49f30789fc33c4516fbe123f05ea4313866381d3 Mon Sep 17 00:00:00 2001 From: Steven Whitehouse swhit...@redhat.com Date: Thu, 10 May 2012 11:45:31 +0100 Subject: [PATCH 1/2] GFS2: Update main gfs2 doc Various items were a bit out of date, so this is a refresh to the latest info. Signed-off-by: Steven Whitehouse swhit...@redhat.com diff --git a/Documentation/filesystems/gfs2.txt b/Documentation/filesystems/gfs2.txt index 4cda926..cc4f230 100644 --- a/Documentation/filesystems/gfs2.txt +++ b/Documentation/filesystems/gfs2.txt @@ -1,7 +1,7 @@ Global File System -- -http://sources.redhat.com/cluster/wiki/ +https://fedorahosted.org/cluster/wiki/HomePage GFS is a cluster file system. It allows a cluster of computers to simultaneously use a block device that is shared between them (with FC, @@ -30,7 +30,8 @@ needed, simply: If you are using Fedora, you need to install the gfs2-utils package and, for lock_dlm, you will also need to install the cman package -and write a cluster.conf as per the documentation. +and write a cluster.conf as per the documentation. For F17 and above +cman has been replaced by the dlm package. ^^^ cman has been replaced by corosync 2.0 (or higher) in combination with votequorum provide (see votequorum.5). corosync was always a requirement though, it gets pulled in through the deps No disagreement on the dependency here, but cman is not replaced by dlm in terms of functionality, that would be incorrect. gfs2 still requires dlm for it´s dependencies but it´s not a replacement. 
Well, it is kind of, since that's where dlm_controld resides, and that now deals with all the recovery stuff since gfs_controld is gone; so maybe it could have been worded better, but it is at least correct in terms of what needs to be installed package-wise. Right, package-wise you are right: you install dlm and you get corosync indirectly. I was only pointing out the functionality chain here vs. the package chain. It might be better to express both in the doc, since the landscape has changed substantially. Fabio
[Cluster-devel] [PATCH] qdisk: Fix man page example (take 2)
From: Fabio M. Di Nitto fdini...@redhat.com Resolves: rhbz#745538 Signed-off-by: Fabio M. Di Nitto fdini...@redhat.com --- cman/man/qdisk.5 |6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/cman/man/qdisk.5 b/cman/man/qdisk.5 index e0b0ff6..ca974fa 100644 --- a/cman/man/qdisk.5 +++ b/cman/man/qdisk.5 @@ -479,11 +479,11 @@ by the qdiskd timeout. .br quorumd interval=1 tko=10 votes=3 label=testing .in 12 -heuristic program=ping A -c1 -t1 score=1 interval=2 tko=3/ +heuristic program=ping A -c1 -w1 score=1 interval=2 tko=3/ .br -heuristic program=ping B -c1 -t1 score=1 interval=2 tko=3/ +heuristic program=ping B -c1 -w1 score=1 interval=2 tko=3/ .br -heuristic program=ping C -c1 -t1 score=1 interval=2 tko=3/ +heuristic program=ping C -c1 -w1 score=1 interval=2 tko=3/ .br .in 8 /quorumd -- 1.7.7.6
[Cluster-devel] [PATCH] cmannotifyd: deliver cluster status at startup and fix daemon init
From: Fabio M. Di Nitto fdini...@redhat.com cmannotifyd is very often (if not always) started _after_ cman is completely settled. That means cmannotifyd does not receive/dispatch any notifications on the current cluster status at startup. change cman connection loop to generate a fake notification that config and membership have changed (we can't poll if they did) and use those information internally too, to reinit logging with new cman connection. Resolves: rhbz#819787 Signed-off-by: Fabio M. Di Nitto fdini...@redhat.com --- cman/notifyd/main.c | 14 ++ 1 files changed, 14 insertions(+), 0 deletions(-) diff --git a/cman/notifyd/main.c b/cman/notifyd/main.c index 3091d2f..4a9f868 100644 --- a/cman/notifyd/main.c +++ b/cman/notifyd/main.c @@ -189,6 +189,10 @@ static void init_logging(int reconf) ccs_read_logging(ccs_handle, cmannotifyd, debug, mode, syslog_facility, syslog_priority, logfile_priority, logfile); ccs_disconnect(ccs_handle); + } else { + if (debug) { + logfile_priority = LOG_DEBUG; + } } if (!daemonize) @@ -311,6 +315,8 @@ static void byebye_cman(void) static void setup_cman(int forever) { int init = 0, active = 0; + int quorate; + const char *str = NULL; retry_init: cman_handle = cman_init(NULL); @@ -346,6 +352,14 @@ retry_active: exit(EXIT_FAILURE); } + logt_print(LOG_DEBUG, Dispatching first cluster status\n); + init_logging(1); + str = CMAN_REASON_CONFIG_UPDATE; + dispatch_notification(str, 0); + str = CMAN_REASON_STATECHANGE; + quorate = cman_is_quorate(cman_handle); + dispatch_notification(str, quorate); + return; out: -- 1.7.7.6
Re: [Cluster-devel] [PATCH 1/2] fence_scsi: fix typos in debug messages
ACK On 04/18/2012 02:01 AM, Ryan O'Hara wrote: Resolves: rhbz#674497 Signed-off-by: Ryan O'Hara roh...@redhat.com --- fence/agents/scsi/fence_scsi.pl |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/fence/agents/scsi/fence_scsi.pl b/fence/agents/scsi/fence_scsi.pl index 91f113d..84cee91 100755 --- a/fence/agents/scsi/fence_scsi.pl +++ b/fence/agents/scsi/fence_scsi.pl @@ -111,7 +111,7 @@ sub get_node_id sub get_node_name { -print [$pname]: get_hode_name = $opt_n\n if $opt_v; +print [$pname]: get_node_name = $opt_n\n if $opt_v; return $opt_n; } @@ -163,7 +163,7 @@ sub get_host_name } } -print [$pname]: get_host_nam = $host_name\n if $opt_v; +print [$pname]: get_host_name = $host_name\n if $opt_v; return $host_name; }
Re: [Cluster-devel] [PATCH 2/2] fence_scsi: remove limitations section from man page
ACK On 04/18/2012 02:02 AM, Ryan O'Hara wrote: Resolves: rhbz#753839 Signed-off-by: Ryan O'Hara roh...@redhat.com --- fence/man/fence_scsi.8 |7 --- 1 files changed, 0 insertions(+), 7 deletions(-) diff --git a/fence/man/fence_scsi.8 b/fence/man/fence_scsi.8 index 8a2d5a8..d9ab03f 100644 --- a/fence/man/fence_scsi.8 +++ b/fence/man/fence_scsi.8 @@ -99,12 +99,5 @@ Name of the node to be fenced. \fIverbose = param \fR Verbose output. -.SH LIMITATIONS -The fence_scsi fencing agent requires a minimum of three nodes in the -cluster to operate. For SAN devices connected via fiber channel, -these must be physical nodes. SAN devices connected via iSCSI may use -virtual or physical nodes. In addition, fence_scsi cannot be used in -conjunction with qdisk. - .SH SEE ALSO fence(8), fence_node(8), sg_persist(8), lvs(8), lvm.conf(5)
Re: [Cluster-devel] cluster: RHEL6 - Apply patch from John Ruemker to resolve rhbz#803474
Hi Ryan, This patch is not upstream (STABLE32 branch) and has not been reviewed/ack'ed for inclusion. The commit has been reverted from the RHEL6 branch. Please also write a more comprehensive changelog entry in the commit, because not all bugzillas are visible to the outside world. Example: Fix this or that by init var foo to NULL and compare blabla Patch from Resolves: rhbz#123456 Thanks Fabio On 04/09/2012 09:35 PM, Ryan McCabe wrote: Gitweb: http://git.fedorahosted.org/git/cluster.git?p=cluster.git;a=commitdiff;h=cd9d9be98b4276c4e73eac81563f54e92a08045d Commit: cd9d9be98b4276c4e73eac81563f54e92a08045d Parent: 54a29913c5de797da6adb69e03b38487fef451b4 Author: Ryan McCabe rmcc...@redhat.com AuthorDate: Mon Apr 9 15:34:08 2012 -0400 Committer: Ryan McCabe rmcc...@redhat.com CommitterDate: Mon Apr 9 15:35:50 2012 -0400 Apply patch from John Ruemker to resolve rhbz#803474 --- rgmanager/src/daemons/main.c | 8 +++- rgmanager/src/daemons/rg_event.c | 4 ++-- 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/rgmanager/src/daemons/main.c b/rgmanager/src/daemons/main.c index 94047c3..9a1e5e9 100644 --- a/rgmanager/src/daemons/main.c +++ b/rgmanager/src/daemons/main.c @@ -456,7 +456,13 @@ dispatch_msg(msgctx_t *ctx, int nodeid, int need_close) /* Centralized processing or request is from clusvcadm */ nid = event_master(); - if (nid != my_id()) { + if (nid < 0) { + logt_print(LOG_ERR, "#40b: Unable to determine " + "event master\n"); + ret = -1; + goto out; + } + else if (nid != my_id()) { /* Forward the message to the event master */ forward_message(ctx, msg_sm, nid); } else { diff --git a/rgmanager/src/daemons/rg_event.c b/rgmanager/src/daemons/rg_event.c index 7048bc6..e6a2abd 100644 --- a/rgmanager/src/daemons/rg_event.c +++ b/rgmanager/src/daemons/rg_event.c @@ -247,7 +247,7 @@ static int find_master(void) { event_master_t *masterinfo = NULL; - void *data; + void *data = NULL; uint32_t sz; cluster_member_list_t *m; uint64_t vn; @@ -255,7 +255,7 @@ find_master(void) m = member_list(); if (vf_read(m, "Transition-Master", &vn, - (void **)(&data), &sz) < 0) { + (void **)(&data), &sz) != VFR_OK) { logt_print(LOG_ERR, "Unable to discover master status\n"); masterinfo = NULL;
[Cluster-devel] [PATCH 1/2] config: update relax ng schema to include totem miss_count_const
From: Fabio M. Di Nitto fdini...@redhat.com Resolves: rhbz#804938 Signed-off-by: Fabio M. Di Nitto fdini...@redhat.com --- config/tools/xml/cluster.rng.in.head |9 + 1 files changed, 9 insertions(+), 0 deletions(-) diff --git a/config/tools/xml/cluster.rng.in.head b/config/tools/xml/cluster.rng.in.head index c2fed3e..4e3d901 100644 --- a/config/tools/xml/cluster.rng.in.head +++ b/config/tools/xml/cluster.rng.in.head @@ -255,6 +255,15 @@ To validate your cluster.conf against this schema, run: calculated from retransmits_before_loss and token. rha:default=4 rha:sample=5/ /optional + optional +attribute name=miss_count_const + rha:description=This constant defines the maximum number of times + on receipt of a token a message is checked for retransmission before + retransmission occurs. This parameter is useful to modify for switches + that delay multicast packets compared to unicast packets. + The default setting works well for nearly all modern switches. + rha:default=5 rha:sample=10/ + /optional !-- FIXME: The following description was adapted from the man page. It may be tool long for the schema document. Consider cutting text after the second sentence and referring the reader to the openais.conf -- 1.7.7.6
[Cluster-devel] [PATCH 2/2] cman init: fix start sequence error handling
From: Fabio M. Di Nitto fdini...@redhat.com Any daemon that fails to start would leave no traces. the problem with cman init is that we need to handle multiple daemons and tools. If one in the chain fails, we never reverted to the original state of the system. This can indeed cause other issues. Fix the init script to stop cman if any error happens during start. Resolves: rhbz#806002 Signed-off-by: Fabio M. Di Nitto fdini...@redhat.com --- cman/init.d/cman.in |7 +++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/cman/init.d/cman.in b/cman/init.d/cman.in index d0c6f70..a39f19f 100644 --- a/cman/init.d/cman.in +++ b/cman/init.d/cman.in @@ -19,6 +19,9 @@ # set secure PATH PATH=/bin:/usr/bin:/sbin:/usr/sbin:@SBINDIR@ +# save invokation for rollback ops +thisinvokation=$0 + chkconfig2() { case $1 in @@ -199,6 +202,9 @@ nok() { echo -e $errmsg failure echo + if [ $currentaction = start ]; then + $thisinvokation stop + fi exit 1 } @@ -744,6 +750,7 @@ leave_fence_domain() start() { + currentaction=start breakpoint=$1 sshd_enabled cd @INITDDIR@ ./sshd start -- 1.7.7.6
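The rollback idea in the patch, reduced to a runnable sketch. The dispatcher function and daemon names here are hypothetical stand-ins; the real script remembers its own path in $thisinvokation and re-invokes itself with "stop" when any step of "start" fails:

```shell
# Stand-in daemon chain: the first starts fine, the second fails,
# mimicking a partial cman startup.
start_daemons() {
    true  || return 1   # e.g. first daemon comes up
    false || return 1   # e.g. dlm_controld fails to start
}

nok() {
    echo "start failed, rolling back"
    # Mirror the patch: a failed start re-enters the script in "stop"
    # mode so the daemons that did come up are torn down again.
    [ "$currentaction" = start ] && cman_init stop
}

cman_init() {
    case "$1" in
        start)
            currentaction=start
            if ! start_daemons; then
                nok
                return 1
            fi
            echo "start OK"
            ;;
        stop)
            echo "stopping daemons"
            ;;
    esac
}
```

The key point is the same as in the patch: "start" records that it is the action in flight, so the error path knows it is safe (and necessary) to run the full "stop" sequence rather than leaving half the daemon chain running.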
Re: [Cluster-devel] [Patch] GFS2: Add gfs2_lockgather script and man page
On 03/05/2012 06:51 PM, Adam Drew wrote: This is a backport of the gfs2_lockgather script and manpage from gfs2_utils upstream. I have to NACK this backport for now. I already explained to Adam what needs changing. Fabio
Re: [Cluster-devel] [PATCH] resource-agnets: Add support for using tunnelled migrations with qemu
Looks good to me. ACK Fabio On 03/05/2012 11:36 PM, Chris Feist wrote: Add support for using tunnelled migrations with qemu Resolves: rhbz#712174 Allow using the --tunnelled option when migrating with virsh
Re: [Cluster-devel] [Patch] GFS2: Add gfs2_lockgather script and man page
On 03/05/2012 07:34 PM, Steven Whitehouse wrote: Hi, On Mon, 2012-03-05 at 19:27 +0100, Fabio M. Di Nitto wrote: On 03/05/2012 06:51 PM, Adam Drew wrote: This is a backport of the gfs2_lockgather script and manpage from gfs2_utils upstream. I have to NACK this backport for now. I already explain to Adam what needs changing. Fabio What is the issue?

There are several:
- The script is GPLv3 and we can't pull it into cluster.git (GPLv2+) without some re-licensing work.
- Some parts of the script make use of /tmp in an unsafe way that can cause security problems (mostly DoS in this case).
- Execution of some cluster commands is not safe. If the cluster is hanging and you want to use this tool to gather data, the script won't work because it will hang as well, creating extra load on the cluster.
- The script needs to handle shell errors correctly, and AFAICT it doesn't. Basically it can give the impression of running correctly without collecting any data (missing set -e or error handling per call).
- (minor) The backport patch needs fixing for the Makefile or it will fail to build/install.

Fabio
Re: [Cluster-devel] [Patch] GFS2: Add gfs2_lockgather script and man page
On 03/05/2012 07:34 PM, Steven Whitehouse wrote: Hi, On Mon, 2012-03-05 at 19:27 +0100, Fabio M. Di Nitto wrote: On 03/05/2012 06:51 PM, Adam Drew wrote: This is a backport of the gfs2_lockgather script and manpage from gfs2_utils upstream. I have to NACK this backport for now. I already explain to Adam what needs changing. Fabio What is the issue?

Forgot to mention in the previous email: since this is a long-running script (tar/ssh/scp..), it needs to handle signal traps and locking differently; if a user hits ctrl+c or the script is killed for whatever reason, it doesn't clean up after itself, leaking disk space and leaving the lock file around, which would block the next run. I didn't check all the paths it uses, but an update to the SELinux policies might be necessary too. Fabio
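The temp-file and signal-handling concerns above boil down to the classic mktemp-plus-trap pattern. A minimal sketch (names are illustrative; this is not the gfs2_lockgather code):

```shell
#!/bin/sh
# Create an unpredictable work directory instead of a fixed /tmp path,
# avoiding the symlink/DoS races raised in the review.
workdir=$(mktemp -d) || exit 1

cleanup() {
    # Remove partial data (and any lock file kept inside the workdir)
    # whether we exit normally or were interrupted.
    rm -rf "$workdir"
}
# EXIT fires on any exit path; INT/TERM turn Ctrl+C or kill into a
# normal exit so the EXIT trap runs.
trap cleanup EXIT
trap 'exit 1' INT TERM

# ... the long-running collection (tar/ssh/scp) would go here ...
: > "$workdir/partial-data"
```

Keeping the lock file inside the trap-cleaned directory means an interrupted run can never block the next one.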
Re: [Cluster-devel] [PATCH] rgmanager: Retry when config is out of sync [RHEL5]
ACK. Fabio On 03/01/2012 12:53 AM, Lon Hohberger wrote: [This patch is already in RHEL5] If you add a service to rgmanager v1 or v2 and that service fails to start on the first node but succeeds in its initial stop operation, there is a chance that the remote instance of rgmanager has not yet reread the configuration, causing the service to be placed into the 'recovering' state without further action. This patch causes the originator of the request to retry the operation. Later versions of rgmanager (ex STABLE3 branch and derivatives) are unlikely to have this problem since configuration updates are not polled, but rather delivered to clients. Update 22-Feb-2012: The above is incorrect, this was reproduced a rgmanager v3 installation. Resolves: rhbz#796272 Signed-off-by: Lon Hohberger l...@redhat.com --- rgmanager/src/daemons/rg_state.c | 19 +++ 1 files changed, 19 insertions(+), 0 deletions(-) diff --git a/rgmanager/src/daemons/rg_state.c b/rgmanager/src/daemons/rg_state.c index 23a4bec..8c5af5b 100644 --- a/rgmanager/src/daemons/rg_state.c +++ b/rgmanager/src/daemons/rg_state.c @@ -1801,6 +1801,7 @@ handle_relocate_req(char *svcName, int orig_request, int preferred_target, rg_state_t svcStatus; int target = preferred_target, me = my_id(); int ret, x, request = orig_request; + int retries; get_rg_state_local(svcName, svcStatus); if (svcStatus.rs_state == RG_STATE_DISABLED || @@ -1933,6 +1934,8 @@ handle_relocate_req(char *svcName, int orig_request, int preferred_target, if (target == me) goto exhausted; + retries = 0; +retry: ret = svc_start_remote(svcName, request, target); switch (ret) { case RG_ERUN: @@ -1942,6 +1945,22 @@ handle_relocate_req(char *svcName, int orig_request, int preferred_target, *new_owner = svcStatus.rs_owner; free_member_list(allowed_nodes); return 0; + case RG_ENOSERVICE: + /* + * Configuration update pending on remote node? Give it + * a few seconds to sync up. 
rhbz#568126 + * + * Configuration updates are synchronized in later releases + * of rgmanager; this should not be needed. + */ + if (retries++ < 4) { + sleep(3); + goto retry; + } + logt_print(LOG_WARNING, "Member #%d has a different " + "configuration than I do; trying next " + "member.", target); + /* Deliberate */ case RG_EDEPEND: case RG_EFAIL: /* Uh oh - we failed to relocate to this node.
Re: [Cluster-devel] [PATCH] rgmanager: Fix clusvcadm message when run with -F [RHEL6]
ACK On 02/21/2012 07:53 PM, Lon Hohberger wrote: The new_owner was not being correctly set when enabling a service with -F when run without central processing enabled. Resolves: rhbz#727326 Signed-off-by: Lon Hohberger l...@redhat.com --- rgmanager/src/daemons/rg_state.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/rgmanager/src/daemons/rg_state.c b/rgmanager/src/daemons/rg_state.c index 5501b3f..23a4bec 100644 --- a/rgmanager/src/daemons/rg_state.c +++ b/rgmanager/src/daemons/rg_state.c @@ -2293,6 +2293,7 @@ handle_fd_start_req(char *svcName, int request, int *new_owner) switch(ret) { case RG_ESUCCESS: + *new_owner = target; ret = RG_ESUCCESS; goto out; case RG_ERUN: