2012/3/2 Kadlecsik József <kadlecsik.joz...@wigner.mta.hu>:
> On Fri, 2 Mar 2012, Andrew Beekhof wrote:
>
>> 2012/3/2 Kadlecsik József <kadlecsik.joz...@wigner.mta.hu>:
>> >
>> > After upgrading to pacemaker 1.1.6, cluster-glue 1.0.8 on Debian, our
>> > working apcmastersnmp resources stopped working:
>> >
>> > Feb 29 14:22:03 atlas0 stonith: [35438]: ERROR: apcmastersnmp device not accessible.
>> > Feb 29 14:22:03 atlas0 stonith-ng: [32972]: notice: log_operation: Operation 'monitor' [35404] for device 'stonith-atlas6' returned: -2
>> > Feb 29 14:22:03 atlas0 stonith-ng: [32972]: ERROR: log_operation: stonith-atlas6: Performing: stonith -t apcmastersnmp -S 161
>> > Feb 29 14:22:03 atlas0 stonith-ng: [32972]: ERROR: log_operation: stonith-atlas6: Invalid config info for apcmastersnmp device
>> >
>> > Please note the strange "161" argument of stonith.
>> >
>> > After checking the source code and stracing stonithd, as far as I see, the
>> > following happens:
>> >
>> > - stonithd calls fence_legacy, which steals the "port=161" parameter from
>> >   apcmastersnmp. This produces the error message
>> >   "Invalid config info for apcmastersnmp device"
>>
>> You keep saying steals, what do you mean by that? Where is it stolen from?
>
> fence_legacy passes the parameters to the stonith drivers via environment
> variables, except the "port".

I had totally forgotten we do that.
Everything you've done makes complete sense now.

The second part is already pushed as:
   https://github.com/beekhof/pacemaker/commit/797d740

I'll add the first part that adds the port as an environment variable now.

> However "port" is mandatory for
> apcmastersnmp. I should have worded it better.
>
>> What does your config look like?
>
> Before the upgrade, the working apcmastersnmp resource was
>
> primitive stonith-atlas5 stonith:apcmastersnmp \
>         params ipaddr="192.168.40.252" community="private" port="161" \
>         ...
>
> "ipaddr" and "community" are passed via environment variables by
> fence_legacy, but "port" isn't.
>
> We converted the resource to external/rackpdu, but that cannot handle
> nodes attached to multiple outlets, so we need to get apcmastersnmp
> working again.
>
>> > - When stealing "port=161", fence_legacy uses the port value as the node
>> >   name and passes it to stonith, even in status mode. Therefore we
>> >   get "stonith -t apcmastersnmp -S 161"
>> > - However stonith cannot catch the invalid node parameter:
>> >
>> >        if (!(argcount == 1 || (argcount < 1
>> >            && (status||listhosts||listtypes||listparanames||metadata)))) {
>> >                ++errors;
>> >        }
>>
>> where is this fragment from?
>
> The C code fragments are from cluster-glue-1.0.8/lib/stonith/main.c.
>
>> >   and even in status mode wants to run the reset request too:
>> >
>> >        if (status) {
>> >                < no exit >
>> >        }
>> >        if (listhosts) {
>> >                < no exit >
>> >        }
>> >        if (optind < argc) {
>> >                ...
>> >                rc = stonith_req_reset(s, reset_type, nodename);
>> >        }
>> >
>> > Fortunately the port value does not match nodename, so it won't kill any
>> > node, but the agent fails.
>> >
>> > Am I on the right track? Would the following patch fix the issue? I'm
>> > asking because I don't know why "port=" is handled separately and
>> > what the implications of deleting $opt_n below are.
>> >
>> > --- fence_legacy.orig  2012-02-29 23:03:36.594945717 +0100
>> > +++ fence_legacy       2012-03-01 14:41:46.454859212 +0100
>> > @@ -105,6 +105,7 @@
>> >              elsif ($name eq "port" )
>> >              {
>> >                  $opt_n = $val;
>> > +                $ENV{$name} = $val;
>>
>> what is this for?
>
> Passing "port" to the stonith drivers in the same way as the other
> parameters.
>
>> >              }
>> >              elsif ($name eq "stonith" )
>> >              {
>> > @@ -176,8 +177,8 @@
>> >  }
>> >  elsif ( $opt_o eq "monitor" || $opt_o eq "stat" || $opt_o eq "status" )
>> >  {
>> > -   print "Performing: $opt_s -t $opt_t -S $opt_n\n" unless defined $opt_q;
>> > -   exec "$opt_s -t $opt_t $extra_args -S $opt_n" or die "failed to exec \"$opt_s\"\n";
>> > +   print "Performing: $opt_s -t $opt_t -S\n" unless defined $opt_q;
>> > +   exec "$opt_s -t $opt_t $extra_args -S" or die "failed to exec \"$opt_s\"\n";
>>
>> I was under the impression that -S needed a node name, I see however
>> that this isn't the case.
>> Some devices can query the state of an individual port, it seems that
>> the stonith binary doesn't expose this.
>>
>> Does everything work when you have this patch?
>
> We'll give it a try today. It's the usual issue: we have to experiment
> on an in-production cluster.
>
> Best regards,
> Jozsef
> --
> E-mail : kadlecsik.joz...@wigner.mta.hu
> PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
> Address: Wigner Research Centre for Physics, Hungarian Academy of Sciences
>          H-1525 Budapest 114, POB.
>          49, Hungary
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org