Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2017-12-05 Thread Jan Pokorný
On 02/12/17 21:00 +0100, Jan Pokorný wrote:
> https://jdebp.eu/FGA/unix-daemon-readiness-protocol-problems.html
> 
> Quoting it:
>   Of course, only the service program itself can determine exactly
>   when this point [of being ready, that, "is about to enter its main
>   request processing loop"] is.
> 
> There's no way around this.
> 
> The whole objective of the OCF standard looks, in retrospect, pretty
> sidetracked through this lens: instead of pulling the weight of the
> semiformal standardization body (comprising significant industry
> players[*]) to raise awareness of this solvable reliability
> discrepancy, possibly contributing to a generally acknowledged,
> resource-manager-agnostic solution (one that could be inherited by
> the next generation of init systems), it just put a little bit of
> a systematic approach to configuration management and monitoring on
> top of the legacy of organically grown "good enough" initscripts,
> which are clearly (because of inherent raciness and whatnot) not
> very suitable for the act of supervision, nor for any sort of
> reactive balancing to satisfy the requirements (crucial in HA, where
> a polling, interval-based approach needlessly loses trailing nines
> in cases you could be notified about directly).

... although there was clearly a notion of employing asynchronous
mechanisms (one can infer, for a technically more sound binding between
the resource manager and the supervised processes) even some 14+ years
ago:
https://github.com/ClusterLabs/OCF-spec/commit/2331bb8d3624a2697afaf3429cec1f47d19251f5#diff-316ade5241704833815c8fa2c2b71d4dR422

> Basically, that page also provides an overview of the existing
> "formalized interfaces" I had in mind above, in its "Several
> incompatible protocols with low adoption" section, including
> the mentioned sd_notify way of doing that in systemd realms
> (and its criticism just as well).
> 
> Apparently, this is a recurring topic because, to this day, the problem
> hasn't been overcome in a generic enough way; see NetBSD, as another
> example:
> https://mail-index.netbsd.org/tech-userlevel/2014/01/28/msg008401.html
> 
> This situation, caused by a lack of interest in getting things right
> in the past, plus OS ecosystem segmentation working against any
> conceivable attempt to unify on a portable solution, is pretty
> unsettling :-/
> 
> [*] see https://en.wikipedia.org/wiki/Open_Cluster_Framework

-- 
Jan (Poki)




Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2017-12-02 Thread Jan Pokorný
On 07/11/17 02:01 +0100, Jan Pokorný wrote:
> On 07/11/17 01:02 +0300, Andrei Borzenkov wrote:
>> 06.11.2017 22:38, Valentin Vidic wrote:
>>> On Fri, Oct 13, 2017 at 02:07:33PM +0100, Adam Spiers wrote:
 I think it depends on exactly what you mean by "synchronous" here. You can
 start up a daemon, or a process which is responsible for forking into a
 daemon, but how can you know for sure that a service is really up and
 running?  Even if the daemon ran for a few seconds, it might die soon 
 after.
 At what point do you draw the line and say "OK start-up is now over, any
 failures after this are failures of a running service"?  In that light,
 "systemctl start" could return at a number of points in the startup 
 process,
 but there's probably always an element of asynchronicity in there.
 Interested to hear other opinions on this.
>>> 
>>> systemd.service(5) describes a started (running) service depending
>>> on the service type:
>>> 
>>> simple  - systemd will immediately proceed starting follow-up units
>>>           (after exec)
>>> forking - systemd will proceed with starting follow-up units as soon as
>>>           the parent process exits
>>> oneshot - process has to exit before systemd starts follow-up units
>>> dbus    - systemd will proceed with starting follow-up units after the
>>>           D-Bus bus name has been acquired
>>> notify  - systemd will proceed with starting follow-up units after this
>>>           notification message has been sent
>>> 
>>> Obviously notify is best here
>> 
>> forking, dbus and notify all allow the daemon to signal to systemd that
>> the daemon is ready to service requests.  Unfortunately ...
>> 
>>> but not all daemons implement sending
>>> sd_notify(READY=1) when they are ready to serve clients.
>>> 
>> 
>> ... just as not all daemons properly daemonize themselves or register
>> on D-Bus only after they are ready.
> 
> Sharing the sentiment about the situation, arising probably primarily
> from daemon authors never having been pushed to indicate full ability
> to provide service, precisely because 1/ it's not the primary objective
> of init systems -- the only thing daemons would need to comply with
> regarding getting started (as opposed to real service-oriented
> supervisors, which is also the realm of HA, right?), and 2/ even if it
> had been desirable to indicate that, no formalized interface (and, in
> turn, system conventions) that would become widespread was ever devised
> for that purpose.  On the other hand, sd_notify seems to reconcile that
> in my eyes (+1 to Valentin's qualifying it the best of the above
> options) as it doesn't impose any other effect (casting extra
> interpretation on, say, a fork event makes it a possibly unintended,
> or at least not-well-timed, side effect of the main, intended effect).

I had some information deficits that are only now being remedied.
Specifically, I discovered this nice, elaborate study on the
"Readiness protocol problems with Unix dæmons":
https://jdebp.eu/FGA/unix-daemon-readiness-protocol-problems.html

Quoting it:
  Of course, only the service program itself can determine exactly
  when this point [of being ready, that, "is about to enter its main
  request processing loop"] is.

There's no way around this.

The whole objective of the OCF standard looks, in retrospect, pretty
sidetracked through this lens: instead of pulling the weight of the
semiformal standardization body (comprising significant industry
players[*]) to raise awareness of this solvable reliability
discrepancy, possibly contributing to a generally acknowledged,
resource-manager-agnostic solution (one that could be inherited by
the next generation of init systems), it just put a little bit of
a systematic approach to configuration management and monitoring on
top of the legacy of organically grown "good enough" initscripts,
which are clearly (because of inherent raciness and whatnot) not
very suitable for the act of supervision, nor for any sort of
reactive balancing to satisfy the requirements (crucial in HA, where
a polling, interval-based approach needlessly loses trailing nines
in cases you could be notified about directly).

Basically, that page also provides an overview of the existing
"formalized interfaces" I had in mind above, in its "Several
incompatible protocols with low adoption" section, including
the mentioned sd_notify way of doing that in systemd realms
(and its criticism just as well).

Apparently, this is a recurring topic because, to this day, the problem
hasn't been overcome in a generic enough way; see NetBSD, as another
example:
https://mail-index.netbsd.org/tech-userlevel/2014/01/28/msg008401.html

This situation, caused by a lack of interest in getting things right
in the past, plus OS ecosystem segmentation working against any
conceivable attempt to unify on a portable solution, is pretty
unsettling :-/

[*] see https://en.wikipedia.org/wiki/Open_Cluster_Framework

> To elaborate more, historically, it's customary to perform double

Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2017-11-06 Thread Jan Pokorný
[sorry, I managed to drop the most recent modifications just before
sending, but fortunately recovered them from the editor's backups, so
please skip the previous entry in the thread in favor of this one, also
to avoid some typos bleeding through]

On 07/11/17 01:02 +0300, Andrei Borzenkov wrote:
> 06.11.2017 22:38, Valentin Vidic wrote:
>> On Fri, Oct 13, 2017 at 02:07:33PM +0100, Adam Spiers wrote:
>>> I think it depends on exactly what you mean by "synchronous" here. You can
>>> start up a daemon, or a process which is responsible for forking into a
>>> daemon, but how can you know for sure that a service is really up and
>>> running?  Even if the daemon ran for a few seconds, it might die soon after.
>>> At what point do you draw the line and say "OK start-up is now over, any
>>> failures after this are failures of a running service"?  In that light,
>>> "systemctl start" could return at a number of points in the startup process,
>>> but there's probably always an element of asynchronicity in there.
>>> Interested to hear other opinions on this.
>> 
>> systemd.service(5) describes a started (running) service depending
>> on the service type:
>> 
>> simple  - systemd will immediately proceed starting follow-up units
>>           (after exec)
>> forking - systemd will proceed with starting follow-up units as soon as
>>           the parent process exits
>> oneshot - process has to exit before systemd starts follow-up units
>> dbus    - systemd will proceed with starting follow-up units after the
>>           D-Bus bus name has been acquired
>> notify  - systemd will proceed with starting follow-up units after this
>>           notification message has been sent
>> 
>> Obviously notify is best here
> 
> forking, dbus and notify all allow the daemon to signal to systemd that
> the daemon is ready to service requests.  Unfortunately ...
> 
>> but not all daemons implement sending
>> sd_notify(READY=1) when they are ready to serve clients.
>> 
> 
> ... just as not all daemons properly daemonize themselves or register
> on D-Bus only after they are ready.

Sharing the sentiment about the situation, arising probably primarily
from daemon authors never having been pushed to indicate full ability
to provide service, precisely because 1/ it's not the primary objective
of init systems -- the only thing daemons would need to comply with
regarding getting started (as opposed to real service-oriented
supervisors, which is also the realm of HA, right?), and 2/ even if it
had been desirable to indicate that, no formalized interface (and, in
turn, system conventions) that would become widespread was ever devised
for that purpose.  On the other hand, sd_notify seems to reconcile that
in my eyes (+1 to Valentin's qualifying it the best of the above
options) as it doesn't impose any other effect (casting extra
interpretation on, say, a fork event makes it a possibly unintended,
or at least not-well-timed, side effect of the main, intended effect).

To elaborate more: historically, it's customary to perform a double
fork in daemons to make them as isolated from controlling terminals
and whatnot as possible.  But it may not be desirable to perform
anything security-sensitive prior to at least the first fork, hence
with "forking" you've already lost the preciseness of the "ready"
indication, unless there is some further synchronization between the
parent and its child processes (I have yet to see that in practice).
So I'd say that unless the daemon is specifically fine-tuned, both
forking and dbus types of services are bound to carry some amount of
asynchronicity, as mentioned.  To the distaste of said service
supervisors, which strive to maximize service usefulness over a
considerable timeframe -- way more than ticking the "should be running
OK because it got started by me without any early failure" checkbox.
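
(To illustrate the kind of parent/child synchronization meant above,
here is a hedged sketch -- the daemon name, its --ready-fifo option and
the FIFO path are all made up -- of a "forking"-style start script
whose parent only exits, i.e. only lets the supervisor consider the
unit started, once the child has reported readiness over a FIFO:)

#!/bin/sh
# parent side of a Type=forking start-up with explicit readiness sync
fifo=/run/mydaemon/ready.fifo
mkdir -p /run/mydaemon && mkfifo "$fifo" || exit 1
# the (hypothetical) daemon writes "ready" to the FIFO once it has
# finished its initialization; until then the parent blocks on the read
setsid /usr/sbin/mydaemon --ready-fifo "$fifo" &
read token < "$fifo"
rm -f "$fifo"
[ "$token" = ready ]    # the parent's exit status mirrors readiness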

The main issue (though sometimes workable) with the sd_notify approach
is that in your composite application you may not have a direct
"consider me ready" hook throughout the underlying stack, and tying it
to the processing of the first request is out of the question because
its timing is not guaranteed (if it ever arrives at all).

Sorry, I didn't add much to the discussion; I mainly meant to defend
sd_notify's perceived supremacy.  Getting rid of asynchronicities (and
the related magic, fragile sleeps in various places) is tough in a
world that was never widely interested in a unified "as a service
guarantee, reporting true readiness" signalling, IMHO.  (Verging on
what-ifs... had some supervisor-agnostic interface been designed back
in prehistory, it could have been adopted by systemd as well, just as
the unified logging interface, syslog, remains widespread and
functional to this day.)

-- 
Poki




Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2017-11-06 Thread Jan Pokorný
On 07/11/17 01:02 +0300, Andrei Borzenkov wrote:
> 06.11.2017 22:38, Valentin Vidic wrote:
>> On Fri, Oct 13, 2017 at 02:07:33PM +0100, Adam Spiers wrote:
>>> I think it depends on exactly what you mean by "synchronous" here. You can
>>> start up a daemon, or a process which is responsible for forking into a
>>> daemon, but how can you know for sure that a service is really up and
>>> running?  Even if the daemon ran for a few seconds, it might die soon after.
>>> At what point do you draw the line and say "OK start-up is now over, any
>>> failures after this are failures of a running service"?  In that light,
>>> "systemctl start" could return at a number of points in the startup process,
>>> but there's probably always an element of asynchronicity in there.
>>> Interested to hear other opinions on this.
>> 
>> systemd.service(5) describes a started (running) service depending
>> on the service type:
>> 
>> simple  - systemd will immediately proceed starting follow-up units
>>           (after exec)
>> forking - systemd will proceed with starting follow-up units as soon as
>>           the parent process exits
>> oneshot - process has to exit before systemd starts follow-up units
>> dbus    - systemd will proceed with starting follow-up units after the
>>           D-Bus bus name has been acquired
>> notify  - systemd will proceed with starting follow-up units after this
>>           notification message has been sent
>> 
>> Obviously notify is best here
> 
> forking, dbus and notify all allow the daemon to signal to systemd that
> the daemon is ready to service requests.  Unfortunately ...
> 
>> but not all daemons implement sending
>> sd_notify(READY=1) when they are ready to serve clients.
>> 
> 
> ... just as not all daemons properly daemonize themselves or register
> on D-Bus only after they are ready.

Sharing the sentiment about the situation, arising probably primarily
from daemon authors never having been pushed to indicate full ability
to provide service, precisely because 1/ it's not the primary objective
of init systems -- the only thing daemons would need to comply with
regarding getting started (as opposed to real service-oriented
supervisors, which is also the realm of HA, right?), and 2/ even if it
had been desirable to indicate that, no formalized interface (and, in
turn, system conventions) that would become widespread was ever devised
for that purpose.  On the other hand, sd_notify seems to reconcile that
in my eyes (+1 to Valentin's qualifying it the best of the above
options) as it doesn't impose any other effect (casting extra
interpretation on, say, a fork event makes it a possibly unintended,
or at least not-well-timed, side effect of the main, intended effect).

To elaborate more: historically, it's customary to perform a double
fork in daemons to make them as isolated from controlling terminals
and whatnot as possible.  But it may not be desirable to perform
anything security-sensitive prior to at least the first fork, hence
with "forking" you've already lost the preciseness of the "ready"
indication, unless there is some further synchronization between the
parent and its child processes (I have yet to see that in practice).
So I'd say that unless the daemon is specifically fine-tuned, both
forking and dbus types of services are bound to carry some amount of
asynchronicity, as mentioned.  To the distaste of said service
supervisors, which strive to maximize service usefulness over a
considerable timeframe -- way more than ticking the "should be running
OK because it got started by me without any early failure" checkbox.

The main issue (though sometimes workable) with the sd_notify approach
is that in your composite application you may not have a direct
"consider me ready" hook throughout the underlying stack, and tying it
to the processing of the first request is out of the question because
its timing is not guaranteed (if it ever arrives at all).

Sorry, didn't add much to the discussion; getting rid of
asynchronicities is tough in a world that was never widely interested
in a poll/check-less "true ready" state.

-- 
Poki




Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2017-11-06 Thread Andrei Borzenkov
06.11.2017 22:38, Valentin Vidic wrote:
> On Fri, Oct 13, 2017 at 02:07:33PM +0100, Adam Spiers wrote:
>> I think it depends on exactly what you mean by "synchronous" here. You can
>> start up a daemon, or a process which is responsible for forking into a
>> daemon, but how can you know for sure that a service is really up and
>> running?  Even if the daemon ran for a few seconds, it might die soon after.
>> At what point do you draw the line and say "OK start-up is now over, any
>> failures after this are failures of a running service"?  In that light,
>> "systemctl start" could return at a number of points in the startup process,
>> but there's probably always an element of asynchronicity in there.
>> Interested to hear other opinions on this.
> 
> systemd.service(5) describes a started (running) service depending
> on the service type:
> 
> simple  - systemd will immediately proceed starting follow-up units
>           (after exec)
> forking - systemd will proceed with starting follow-up units as soon as
>           the parent process exits
> oneshot - process has to exit before systemd starts follow-up units
> dbus    - systemd will proceed with starting follow-up units after the
>           D-Bus bus name has been acquired
> notify  - systemd will proceed with starting follow-up units after this
>           notification message has been sent
> 
> Obviously notify is best here

forking, dbus and notify all allow the daemon to signal to systemd that
the daemon is ready to service requests.  Unfortunately ...

> but not all daemons implement sending
> sd_notify(READY=1) when they are ready to serve clients.
> 

... just as not all daemons properly daemonize themselves or register
on D-Bus only after they are ready.



Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2017-11-06 Thread Valentin Vidic
On Fri, Oct 13, 2017 at 02:07:33PM +0100, Adam Spiers wrote:
> I think it depends on exactly what you mean by "synchronous" here. You can
> start up a daemon, or a process which is responsible for forking into a
> daemon, but how can you know for sure that a service is really up and
> running?  Even if the daemon ran for a few seconds, it might die soon after.
> At what point do you draw the line and say "OK start-up is now over, any
> failures after this are failures of a running service"?  In that light,
> "systemctl start" could return at a number of points in the startup process,
> but there's probably always an element of asynchronicity in there.
> Interested to hear other opinions on this.

systemd.service(5) describes a started (running) service depending
on the service type:

simple  - systemd will immediately proceed starting follow-up units (after exec)
forking - systemd will proceed with starting follow-up units as soon as
  the parent process exits
oneshot - process has to exit before systemd starts follow-up units
dbus    - systemd will proceed with starting follow-up units after the
  D-Bus bus name has been acquired
notify  - systemd will proceed with starting follow-up units after this
  notification message has been sent

Obviously notify is best here but not all daemons implement sending
sd_notify(READY=1) when they are ready to serve clients.
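
For illustration, a minimal notify-type pairing could look like the
following sketch (unit name, port and script path are made up; it
assumes an OpenBSD-style nc, and NotifyAccess=all so that the
systemd-notify helper -- the command-line counterpart of
sd_notify(READY=1) -- is accepted from a child process of the service):

-- /etc/systemd/system/toy-notify.service --
[Service]
Type=notify
NotifyAccess=all
ExecStart=/usr/local/bin/toy-notify-service

-- /usr/local/bin/toy-notify-service --
#!/bin/sh
nc -l -k 127.0.0.1 4242 &        # start the actual listener
pid=$!
until nc -z 127.0.0.1 4242 2>/dev/null; do
    sleep 1                      # wait until it really accepts connections
done
systemd-notify --ready           # only now report readiness to systemd
wait "$pid"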

-- 
Valentin



Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2017-10-13 Thread Adam Spiers
Lars Ellenberg wrote:
> On Mon, May 22, 2017 at 12:26:36PM -0500, Ken Gaillot wrote:
>> Resurrecting an old thread, because I stumbled on something relevant ...


> /me too :-)

>> There had been some discussion about having the ability to run a more
>> useful monitor operation on an otherwise systemd-based resource. We had
>> talked about a couple approaches with advantages and disadvantages.

>> I had completely forgotten about an older capability of pacemaker that
>> could be repurposed here: the (undocumented) "container" meta-attribute.


> Which is nice to know.

> The wrapper approach is appealing as well, though.

> I have just implemented a PoC ocf:pacemaker:systemd "wrapper" RA,
> to give my brain something different to do for a change.


Cool!  Really nice to see a PoC for this, which BTW I mentioned at the 
recent Clusterlabs Summit, for those who missed the event: 

https://aspiers.github.io/clusterlabs-summit-2017-openstack-ha/#/control-plane-api-5 

> Takes two parameters,
> unit=(systemd unit), and
> monitor_hook=(some executable)

> The monitor_hook has access to the environment, obviously,
> in case it needs that.  For monitor, it will only be called
> if "systemctl is-active" thinks the thing is active.

> It is expected to return 0 (OCF_SUCCESS) for "running",
> and 7 (OCF_NOT_RUNNING) for "not running".  It can return anything else,
> all exit codes are directly propagated for the "monitor" action.
> "Unexpected" exit codes will be logged with ocf_exit_reason
> (does that make sense?).

> systemctl start and stop commands apparently are "synchronous"
> (have always been? only nowadays? is that relevant?)


I think it depends on exactly what you mean by "synchronous" here. 
You can start up a daemon, or a process which is responsible for 
forking into a daemon, but how can you know for sure that a service is 
really up and running?  Even if the daemon ran for a few seconds, it 
might die soon after.  At what point do you draw the line and say "OK 
start-up is now over, any failures after this are failures of a 
running service"?  In that light, "systemctl start" could return at a 
number of points in the startup process, but there's probably always 
an element of asynchronicity in there.  Interested to hear other 
opinions on this. 

> but to be so, they need properly written unit files.
> If there is an ExecStop command defined which will only trigger
> stopping, but not wait for it, systemd cannot wait, either
> (it has no way to know what it should wait for in that case),
> and no-one should blame systemd for that.


Exactly.

> That's why you would need to fix such systemd units,
> but that's also why I added the additional _monitor loops
> after systemctl start / stop.


Yes, those loops sound critical to me for the success of this 
approach.


> Maybe it should not be named systemd, but systemd-wrapper.

> Other comments?


I'm not sure if/when I'll get round to testing this approach.  The 
conclusion from the summit seemed to be that in the OpenStack context 
at least, it should be sufficient to just have stateless active/active 
REST API services managed by systemd (not Pacemaker) with auto-restart 
enabled, and HAProxy doing load-balancing and health monitoring via 
its httpchk option.  But if for some reason that doesn't cover all the 
cases we need, I'll definitely bear this approach in mind as an 
alternative option.  So again, thanks a lot for sharing! 
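
(For reference, the HAProxy health monitoring mentioned above boils
down to a couple of lines of configuration; the backend name, addresses
and check URL below are purely illustrative:)

backend openstack-api
    option httpchk GET /healthcheck
    http-check expect status 200
    server node1 192.168.122.11:8080 check inter 2s
    server node2 192.168.122.12:8080 check inter 2s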




Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2017-09-26 Thread Lars Ellenberg
On Mon, May 22, 2017 at 12:26:36PM -0500, Ken Gaillot wrote:
> Resurrecting an old thread, because I stumbled on something relevant ...

/me too :-)

> There had been some discussion about having the ability to run a more
> useful monitor operation on an otherwise systemd-based resource. We had
> talked about a couple approaches with advantages and disadvantages.
> 
> I had completely forgotten about an older capability of pacemaker that
> could be repurposed here: the (undocumented) "container" meta-attribute.

Which is nice to know.

The wrapper approach is appealing as well, though.

I have just implemented a PoC ocf:pacemaker:systemd "wrapper" RA,
to give my brain something different to do for a change.

Takes two parameters,
unit=(systemd unit), and
monitor_hook=(some executable)

The monitor_hook has access to the environment, obviously,
in case it needs that.  For monitor, it will only be called
if "systemctl is-active" thinks the thing is active.

It is expected to return 0 (OCF_SUCCESS) for "running",
and 7 (OCF_NOT_RUNNING) for "not running".  It can return anything else,
all exit codes are directly propagated for the "monitor" action.
"Unexpected" exit codes will be logged with ocf_exit_reason
(does that make sense?).

systemctl start and stop commands apparently are "synchronous"
(have always been? only nowadays? is that relevant?)
but to be so, they need properly written unit files.
If there is an ExecStop command defined which will only trigger
stopping, but not wait for it, systemd cannot wait, either
(it has no way to know what it should wait for in that case),
and no-one should blame systemd for that.

That's why you would need to fix such systemd units,
but that's also why I added the additional _monitor loops
after systemctl start / stop.

Maybe it should not be named systemd, but systemd-wrapper.

Other comments?

Lars


So here is my RFC,
tested only "manually" via

for x in monitor stop monitor start monitor ; do
  for try in 1 2; do
    OCF_ROOT=/usr/lib/ocf \
    OCF_RESKEY_monitor_hook=/usr/local/bin/my-monitoring-hook \
    OCF_RESKEY_unit=postfix@- ./systemd $x ; echo $try. $x $?
  done
done

-- /usr/local/bin/my-monitoring-hook 
#!/bin/sh
echo quit | nc 127.0.0.1 25  2>/dev/null | grep -q ^220 || exit 7

- /usr/lib/ocf/resource.d/pacemaker/systemd -
#!/bin/bash

: ${OCF_FUNCTIONS=${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs}
. ${OCF_FUNCTIONS}
: ${__OCF_ACTION=$1}


meta_data() {
cat <<END
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="systemd" version="1.0">
<version>1.0</version>

<longdesc lang="en">
This Resource Agent delegates start and stop to systemctl start and stop,
but monitor will in addition to systemctl status also run the monitor_hook you
specify.
</longdesc>
<shortdesc lang="en">systemd service with monitor hook</shortdesc>

<parameters>

<parameter name="unit" unique="1" required="1">
<longdesc lang="en">
What systemd unit to manage.
</longdesc>
<shortdesc lang="en">systemd unit</shortdesc>
<content type="string"/>
</parameter>

<parameter name="monitor_hook" unique="0" required="1">
<longdesc lang="en">
What executable to run in addition to systemctl status.
</longdesc>
<shortdesc lang="en">monitor hook</shortdesc>
<content type="string"/>
</parameter>

</parameters>

<!-- action timeouts/interval below are placeholders -->
<actions>
<action name="start"        timeout="100s"/>
<action name="stop"         timeout="100s"/>
<action name="monitor"      timeout="100s" interval="60s"/>
<action name="meta-data"    timeout="5s"/>
<action name="validate-all" timeout="30s"/>
</actions>

</resource-agent>
END
}

_monitor()
{
    local ex check
    if [[ -n "$OCF_RESKEY_monitor_hook" ]] &&
       [[ -x "$OCF_RESKEY_monitor_hook" ]]; then
        "$OCF_RESKEY_monitor_hook"
        ex=$?
        : "$__OCF_ACTION/$ex"
        case $__OCF_ACTION/$ex in
        stop/7)  : "not running after stop: expected" ;;
        stop/*)  ocf_exit_reason "returned exit code $ex after stop: $OCF_RESKEY_monitor_hook" ;;
        start/0) : "running after start: expected" ;;
        start/*) ocf_exit_reason "returned exit code $ex after start: $OCF_RESKEY_monitor_hook" ;;
        monitor/0|monitor/7) : "expected running (0) or not running (7)" ;;
        monitor/*)
                 ocf_exit_reason "returned exit code $ex during monitor: $OCF_RESKEY_monitor_hook" ;;
        esac
        return $ex
    else
        ocf_exit_reason "missing or not executable: $OCF_RESKEY_monitor_hook"
    fi
    return $OCF_ERR_GENERIC
}

case $__OCF_ACTION in
meta-data) meta_data ;;
validate-all) : "Tbd. Maybe." ;;
stop)   systemctl stop $OCF_RESKEY_unit || exit $OCF_ERR_GENERIC
        # TODO make time/retries of monitor after stop configurable
        while _monitor; do sleep 1; done
        exit $OCF_SUCCESS
        ;;
start)  systemctl start $OCF_RESKEY_unit || exit $OCF_ERR_GENERIC
        # TODO make time/retries of monitor after start configurable
        while ! _monitor; do sleep 1; done
        exit $OCF_SUCCESS
        ;;
monitor)
        systemctl is-active --quiet $OCF_RESKEY_unit || exit $OCF_NOT_RUNNING
        _monitor
        ;;
*)
        ocf_exit_reason "not implemented: $__OCF_ACTION"
        exit $OCF_ERR_GENERIC
esac

exit $?



Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2017-05-22 Thread Ken Gaillot
Resurrecting an old thread, because I stumbled on something relevant ...

There had been some discussion about having the ability to run a more
useful monitor operation on an otherwise systemd-based resource. We had
talked about a couple approaches with advantages and disadvantages.

I had completely forgotten about an older capability of pacemaker that
could be repurposed here: the (undocumented) "container" meta-attribute.

It was originally designed for running nagios checks on services inside
a virtual domain. The idea is that you can create an OCF VirtualDomain
resource, then create a nagios resource with its container set to the
VirtualDomain.

The effect is this: a resource with the container meta-attribute will be
started, stopped, and monitored normally, but if its monitor fails, it
will be recovered by recovering its container instead. Also, the
resource will be colocated with its container resource, and ordered
relative to it.

This works with the nagios use case because start and stop are
essentially no-ops for nagios resources. The nagios resource can "start"
on the same host that the VirtualDomain starts on, and the host will run
the nagios check at each monitor interval. If the monitor fails,
pacemaker will recover the VirtualDomain.

I haven't tested it, but this approach should work identically with a
systemd resource and a custom OCF resource with the extended monitor.
The OCF resource would function as a dummy resource (to know when it's
"running" or not), so start/stop would only set up the dummy state. If
the monitor fails, the systemd resource should be recovered.
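
For the record, a configuration of that untested combination might look
roughly like this (the resource IDs, the OCF agent name and its
parameter are made up for illustration):

pcs resource create web-api systemd:httpd
pcs resource create web-api-check ocf:local:api-check \
    url=http://localhost:8080/healthcheck \
    op monitor interval=10s timeout=30s \
    meta container=web-api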

If someone wants to verify that works, I'll make sure that the
documentation gets updated.
-- 
Ken Gaillot 



Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2017-02-08 Thread Adam Spiers

Ken Gaillot  wrote:

On 11/03/2016 02:37 PM, Adam Spiers wrote:

Hi again Ken,

Sorry for the delayed reply, caused by Barcelona amongst other things ...


Hmm, seems I have to apologise for yet another delayed reply :-(  I'm
deliberately not trimming the context, so everyone can refresh their
memory of this old thread!


Ken Gaillot  wrote:

On 10/21/2016 07:40 PM, Adam Spiers wrote:

Ken Gaillot  wrote:

On 09/26/2016 09:15 AM, Adam Spiers wrote:

For example, could Pacemaker be extended to allow hybrid resources,
where some actions (such as start, stop, status) are handled by (say)
the systemd backend, and other actions (such as monitor) are handled
by (say) the OCF backend?  Then we could cleanly rely on dbus for
collaborating with systemd, whilst adding arbitrarily complex
monitoring via OCF RAs.  That would have several advantages:

1. Get rid of grotesque layering violations and maintenance boundaries
   where the OCF RA duplicates knowledge of all kinds of things which
   are distribution-specific, e.g.:

 
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/apache#L56


A simplified agent will likely still need distro-specific intelligence
to do even a limited subset of actions, so I'm not sure there's a gain
there.


What distro-specific intelligence would it need?  If the OCF RA was
only responsible for monitoring, it wouldn't need to know a lot of the
things which are only required for starting / stopping the service and
checking whether it's running, e.g.:

  - Name of the daemon executable
  - uid/gid it should be started as
  - Daemon CLI arguments
  - Location of pid file

In contrast, an OCF RA only responsible for monitoring would only need
to know how to talk to the service, which is not typically
distro-specific; in the REST API case, it only needs to know the endpoint
URL, which would be configured via Pacemaker resource parameters anyway.


If you're only talking about monitors, that does simplify things. As you
mention, you'd still need to configure resource parameters that would
only be relevant to the enhanced monitor action -- parameters that other
actions might also need, and get elsewhere, so there's the minor admin
complication of setting the same value in multiple places.


Which same value(s)?


Nothing particular in mind, just app-specific information that's
required in both the app's own configuration and in the resource
configuration. Even stuff like an IP address/port.


That could potentially be reduced by simply pointing the RA at the
same config file(s) which the systemd service definition uses, and
then it could use something like crudini (in the OpenStack case) to
extract values like the port the service is listening on.  (Assuming
the service is listening on localhost, you could simply use that
instead of the external IP address.)  So yes, you'd have to set the
location of the config file(s) in two places, but at least then you
wouldn't have to duplicate anything else.
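
As a hedged illustration of that (the file path, section and option
names are only examples), the monitor-only RA could do something like:

port=$(crudini --get /etc/keystone/keystone.conf DEFAULT public_port)
curl -fsS "http://127.0.0.1:${port:-5000}/v3" >/dev/null || exit 7   # OCF_NOT_RUNNING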


In the OpenStack case (which is the only use case I have), I don't
think this will happen, because the "monitor" action only needs to
know the endpoint URL and associated credentials, which doesn't
overlap with what the other actions need to know.  This separation of
concerns feels right to me: the start/stop/status actions are
responsible for managing the state of the service, and the monitor
action is responsible for monitoring whether it's delivering what it
should be.  It's just like the separation between admins and end
users.


2. Drastically simplify OCF RAs by delegating start/stop/status etc.
   to systemd, thereby increasing readability and reducing maintenance
   burden.

3. OCF RAs are more likely to work out of the box with any distro,
   or at least require less work to get working.

4. Services behave more similarly regardless of whether managed by
   Pacemaker or the standard pid 1 service manager.  For example, they
   will always use the same pidfile, run as the same user, in the
   right cgroup, be invoked with the same arguments etc.

5. Pacemaker can still monitor services accurately at the
   application-level, rather than just relying on naive pid-level
   monitoring.

Or is this a terrible idea? ;-)


I considered this, too. I don't think it's a terrible idea, but it does
pose its own questions.

* What hybrid actions should be allowed? It seems dangerous to allow
starting from one code base and stopping from another, or vice versa,
and really dangerous to allow something like migrate_to/migrate_from to
be reimplemented. At one extreme, we allow anything and leave that
responsibility on the user; at the other, we only allow higher-level
monitors (i.e. using OCF_CHECK_LEVEL) to be hybridized.


Just monitors would be good enough for me.


The tomcat RA (which could also benefit from something like this) would
extend start and stop as well, e.g. start = systemctl start plus some
bookkeeping.


Ahh OK, interesting.  What kind of bookkeeping?


I don't remember ... something like node attributes or a pid/status file.

Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2016-11-10 Thread Ken Gaillot
On 11/03/2016 02:37 PM, Adam Spiers wrote:
> Hi again Ken,
> 
> Sorry for the delayed reply, caused by Barcelona amongst other things ...
> 
> Ken Gaillot  wrote:
>> On 10/21/2016 07:40 PM, Adam Spiers wrote:
>>> Ken Gaillot  wrote:
 On 09/26/2016 09:15 AM, Adam Spiers wrote:
> For example, could Pacemaker be extended to allow hybrid resources,
> where some actions (such as start, stop, status) are handled by (say)
> the systemd backend, and other actions (such as monitor) are handled
> by (say) the OCF backend?  Then we could cleanly rely on dbus for
> collaborating with systemd, whilst adding arbitrarily complex
> monitoring via OCF RAs.  That would have several advantages:
>
> 1. Get rid of grotesque layering violations and maintenance boundaries
>where the OCF RA duplicates knowledge of all kinds of things which
>are distribution-specific, e.g.:
>
>  
> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/apache#L56

 A simplified agent will likely still need distro-specific intelligence
 to do even a limited subset of actions, so I'm not sure there's a gain
 there.
>>>
>>> What distro-specific intelligence would it need?  If the OCF RA was
>>> only responsible for monitoring, it wouldn't need to know a lot of the
>>> things which are only required for starting / stopping the service and
>>> checking whether it's running, e.g.:
>>>
>>>   - Name of the daemon executable
>>>   - uid/gid it should be started as
>>>   - Daemon CLI arguments
>>>   - Location of pid file
>>>
>>> In contrast, an OCF RA only responsible for monitoring would only need
>>> to know how to talk to the service, which is not typically
>>> distro-specific; in the REST API case, it only needs to know the endpoint
>>> URL, which would be configured via Pacemaker resource parameters anyway.
>>
>> If you're only talking about monitors, that does simplify things. As you
>> mention, you'd still need to configure resource parameters that would
>> only be relevant to the enhanced monitor action -- parameters that other
>> actions might also need, and get elsewhere, so there's the minor admin
>> complication of setting the same value in multiple places.
> 
> Which same value(s)?

Nothing particular in mind, just app-specific information that's
required in both the app's own configuration and in the resource
configuration. Even stuff like an IP address/port.

> In the OpenStack case (which is the only use case I have), I don't
> think this will happen, because the "monitor" action only needs to
> know the endpoint URL and associated credentials, which doesn't
> overlap with what the other actions need to know.  This separation of
> concerns feels right to me: the start/stop/status actions are
> responsible for managing the state of the service, and the monitor
> action is responsible for monitoring whether it's delivering what it
> should be.  It's just like the separation between admins and end
> users.
> 
> 2. Drastically simplify OCF RAs by delegating start/stop/status etc.
>to systemd, thereby increasing readability and reducing maintenance
>burden.
>
> 3. OCF RAs are more likely to work out of the box with any distro,
>or at least require less work to get working.
>
> 4. Services behave more similarly regardless of whether managed by
>Pacemaker or the standard pid 1 service manager.  For example, they
>will always use the same pidfile, run as the same user, in the
>right cgroup, be invoked with the same arguments etc.
>
> 5. Pacemaker can still monitor services accurately at the
>application-level, rather than just relying on naive pid-level
>monitoring.
>
> Or is this a terrible idea? ;-)

 I considered this, too. I don't think it's a terrible idea, but it does
 pose its own questions.

 * What hybrid actions should be allowed? It seems dangerous to allow
 starting from one code base and stopping from another, or vice versa,
 and really dangerous to allow something like migrate_to/migrate_from to
 be reimplemented. At one extreme, we allow anything and leave that
 responsibility on the user; at the other, we only allow higher-level
 monitors (i.e. using OCF_CHECK_LEVEL) to be hybridized.
>>>
>>> Just monitors would be good enough for me.
>>
>> The tomcat RA (which could also benefit from something like this) would
>> extend start and stop as well, e.g. start = systemctl start plus some
>> bookkeeping.
> 
> Ahh OK, interesting.  What kind of bookkeeping?

I don't remember ... something like node attributes or a pid/status
file. That could potentially be handled by separate OCF resources for
those bits, grouped with the systemd+monitor resource, but that would be
hacky and maybe insufficient in some use cases.
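
(To make the "bookkeeping" a bit more concrete -- this is only a
hypothetical sketch and the attribute name is made up -- an extended
start/stop could record a transient node attribute next to the plain
systemctl calls:)

# start:
systemctl start tomcat || exit $OCF_ERR_GENERIC
attrd_updater -n tomcat_started -U 1    # remember that we started it
# stop:
systemctl stop tomcat || exit $OCF_ERR_GENERIC
attrd_updater -n tomcat_started -D      # clear the note again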

 * Should the wrapper's actions be done instead of, or in addition to,
 the main resource's actions? Or maybe even allow the user to choose?

Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2016-11-03 Thread Jan Pokorný
On 03/11/16 19:37 +, Adam Spiers wrote:
> Ken Gaillot  wrote:
>> On 10/21/2016 07:40 PM, Adam Spiers wrote:
>>> Ken Gaillot  wrote:
 On 09/26/2016 09:15 AM, Adam Spiers wrote:
> For example, could Pacemaker be extended to allow hybrid resources,
> where some actions (such as start, stop, status) are handled by (say)
> the systemd backend, and other actions (such as monitor) are handled
> by (say) the OCF backend?  Then we could cleanly rely on dbus for
> collaborating with systemd, whilst adding arbitrarily complex
> monitoring via OCF RAs.  That would have several advantages:
> 
> 1. Get rid of grotesque layering violations and maintenance boundaries
>where the OCF RA duplicates knowledge of all kinds of things which
>are distribution-specific, e.g.:
> 
>  
> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/apache#L56
 
 A simplified agent will likely still need distro-specific intelligence
 to do even a limited subset of actions, so I'm not sure there's a gain
 there.
>>> 
>>> What distro-specific intelligence would it need?  If the OCF RA was
>>> only responsible for monitoring, it wouldn't need to know a lot of the
>>> things which are only required for starting / stopping the service and
>>> checking whether it's running, e.g.:
>>> 
>>>   - Name of the daemon executable
>>>   - uid/gid it should be started as
>>>   - Daemon CLI arguments
>>>   - Location of pid file
>>> 
>>> In contrast, an OCF RA only responsible for monitoring would only need
>>> to know how to talk to the service, which is not typically
>>> distro-specific; in the REST API case, it only needs to know the endpoint
>>> URL, which would be configured via Pacemaker resource parameters anyway.
>> 
>> If you're only talking about monitors, that does simplify things. As you
>> mention, you'd still need to configure resource parameters that would
>> only be relevant to the enhanced monitor action -- parameters that other
>> actions might also need, and get elsewhere, so there's the minor admin
>> complication of setting the same value in multiple places.
> 
> Which same value(s)?
> 
> In the OpenStack case (which is the only use case I have), I don't
> think this will happen, because the "monitor" action only needs to
> know the endpoint URL and associated credentials, which doesn't
> overlap with what the other actions need to know.  This separation of
> concerns feels right to me: the start/stop/status actions are
> responsible for managing the state of the service, and the monitor
> action is responsible for monitoring whether it's delivering what it
> should be.  It's just like the separation between admins and end
> users.

This is a side topic: if I am not mistaken, so far no resource agents
(at least those to be found under the ClusterLabs GitHub organization)
could have sensitive data like passwords specified.  There was no
reason to.  Now, it sounds like this could change.  Fence agents (FAs)
generally allow sensitive data to be obtained also via external
scripts.  If this design principle were to be followed, perhaps it
would make sense to consider some kind of "value-or-getter" provision
in future OCF revisions.

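(Purely as an illustration of the "value-or-getter" idea -- nothing
like this exists in OCF today, and the "exec:" prefix is made up -- an
agent could then accept either a literal secret or a command that
fetches it at run time:)

case "$OCF_RESKEY_password" in
    exec:*) password=$(sh -c "${OCF_RESKEY_password#exec:}") || exit $OCF_ERR_GENERIC ;;
    *)      password=$OCF_RESKEY_password ;;
esac
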
> 2. Drastically simplify OCF RAs by delegating start/stop/status etc.
>to systemd, thereby increasing readability and reducing maintenance
>burden.
> 
> 3. OCF RAs are more likely to work out of the box with any distro,
>or at least require less work to get working.
> 
> 4. Services behave more similarly regardless of whether managed by
>Pacemaker or the standard pid 1 service manager.  For example, they
>will always use the same pidfile, run as the same user, in the
>right cgroup, be invoked with the same arguments etc.
> 
> 5. Pacemaker can still monitor services accurately at the
>application-level, rather than just relying on naive pid-level
>monitoring.
> 
> Or is this a terrible idea? ;-)
 
 I considered this, too. I don't think it's a terrible idea, but it does
 pose its own questions.
 
 * What hybrid actions should be allowed? It seems dangerous to allow
 starting from one code base and stopping from another, or vice versa,
 and really dangerous to allow something like migrate_to/migrate_from to
 be reimplemented. At one extreme, we allow anything and leave that
 responsibility on the user; at the other, we only allow higher-level
 monitors (i.e. using OCF_CHECK_LEVEL) to be hybridized.
>>> 
>>> Just monitors would be good enough for me.
>> 
>> The tomcat RA (which could also benefit from something like this) would
>> extend start and stop as well, e.g. start = systemctl start plus some
>> bookkeeping.
> 
> Ahh OK, interesting.  What kind of bookkeeping?

I think he means anything on top of a plain start.  The value added by
resource agents is usually parameterization of at least basics like
some configuration ...

Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2016-11-03 Thread Adam Spiers
Hi again Ken,

Sorry for the delayed reply, caused by Barcelona amongst other things ...

Ken Gaillot  wrote:
> On 10/21/2016 07:40 PM, Adam Spiers wrote:
> > Ken Gaillot  wrote:
> >> On 09/26/2016 09:15 AM, Adam Spiers wrote:
> >>> For example, could Pacemaker be extended to allow hybrid resources,
> >>> where some actions (such as start, stop, status) are handled by (say)
> >>> the systemd backend, and other actions (such as monitor) are handled
> >>> by (say) the OCF backend?  Then we could cleanly rely on dbus for
> >>> collaborating with systemd, whilst adding arbitrarily complex
> >>> monitoring via OCF RAs.  That would have several advantages:
> >>>
> >>> 1. Get rid of grotesque layering violations and maintenance boundaries
> >>>where the OCF RA duplicates knowledge of all kinds of things which
> >>>are distribution-specific, e.g.:
> >>>
> >>>  
> >>> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/apache#L56
> >>
> >> A simplified agent will likely still need distro-specific intelligence
> >> to do even a limited subset of actions, so I'm not sure there's a gain
> >> there.
> > 
> > What distro-specific intelligence would it need?  If the OCF RA was
> > only responsible for monitoring, it wouldn't need to know a lot of the
> > things which are only required for starting / stopping the service and
> > checking whether it's running, e.g.:
> > 
> >   - Name of the daemon executable
> >   - uid/gid it should be started as
> >   - Daemon CLI arguments
> >   - Location of pid file
> > 
> > In contrast, an OCF RA only responsible for monitoring would only need
> > to know how to talk to the service, which is not typically
> > distro-specific; in the REST API case, it only needs to know the endpoint
> > URL, which would be configured via Pacemaker resource parameters anyway.
> 
> If you're only talking about monitors, that does simplify things. As you
> mention, you'd still need to configure resource parameters that would
> only be relevant to the enhanced monitor action -- parameters that other
> actions might also need, and get elsewhere, so there's the minor admin
> complication of setting the same value in multiple places.

Which same value(s)?

In the OpenStack case (which is the only use case I have), I don't
think this will happen, because the "monitor" action only needs to
know the endpoint URL and associated credentials, which doesn't
overlap with what the other actions need to know.  This separation of
concerns feels right to me: the start/stop/status actions are
responsible for managing the state of the service, and the monitor
action is responsible for monitoring whether it's delivering what it
should be.  It's just like the separation between admins and end
users.

> >>> 2. Drastically simplify OCF RAs by delegating start/stop/status etc.
> >>>to systemd, thereby increasing readability and reducing maintenance
> >>>burden.
> >>>
> >>> 3. OCF RAs are more likely to work out of the box with any distro,
> >>>or at least require less work to get working.
> >>>
> >>> 4. Services behave more similarly regardless of whether managed by
> >>>Pacemaker or the standard pid 1 service manager.  For example, they
> >>>will always use the same pidfile, run as the same user, in the
> >>>right cgroup, be invoked with the same arguments etc.
> >>>
> >>> 5. Pacemaker can still monitor services accurately at the
> >>>application-level, rather than just relying on naive pid-level
> >>>monitoring.
> >>>
> >>> Or is this a terrible idea? ;-)
> >>
> >> I considered this, too. I don't think it's a terrible idea, but it does
> >> pose its own questions.
> >>
> >> * What hybrid actions should be allowed? It seems dangerous to allow
> >> starting from one code base and stopping from another, or vice versa,
> >> and really dangerous to allow something like migrate_to/migrate_from to
> >> be reimplemented. At one extreme, we allow anything and leave that
> >> responsibility on the user; at the other, we only allow higher-level
> >> monitors (i.e. using OCF_CHECK_LEVEL) to be hybridized.
> > 
> > Just monitors would be good enough for me.
> 
> The tomcat RA (which could also benefit from something like this) would
> extend start and stop as well, e.g. start = systemctl start plus some
> bookkeeping.

Ahh OK, interesting.  What kind of bookkeeping?

> >> * Should the wrapper's actions be done instead of, or in addition to,
> >> the main resource's actions? Or maybe even allow the user to choose? I
> >> could see some wrappers intended to replace the native handling, and
> >> others to supplement it.
> > 
> > For my use case, in addition, because the only motivation is to
> > delegate start/stop/status to systemd (as happens currently with
> > systemd:* RAs) whilst retaining the ability to do service-level
> > testing of the resource via the OCF RA.  So it wouldn't really be a
> > wrapper, but rather an extension.
> > 
> > In contrast, with the wrapper approach

Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2016-10-24 Thread Ken Gaillot
On 10/21/2016 07:40 PM, Adam Spiers wrote:
> Ken Gaillot  wrote:
>> On 09/26/2016 09:15 AM, Adam Spiers wrote:
>>> [Sending this as a separate mail, since the last one was already (too)
>>> long and focused on specific details, whereas this one takes a step
>>> back to think about the bigger picture again.]
>>>
>>> Adam Spiers  wrote:
> On 09/21/2016 03:25 PM, Adam Spiers wrote:
>> As a result I have been thinking about the idea of changing the
>> start/stop/status actions of these RAs so that they wrap around
>> service(8) (which would be even more portable across distros than
>> systemctl).
>>>
>>> [snipped discussion of OCF wrapper RA idea]
>>>
 The fact that I don't see any problems where you apparently do makes
 me deeply suspicious of my own understanding ;-)  Please tell me what
 I'm missing.
>>>
>>> [snipped]
>>>
>>> To clarify: I am not religiously defending this "wrapper OCF RA" idea
>>> of mine to the death.  It certainly sounds like it's not as clean as I
>>> originally thought.  But I'm still struggling to see any dealbreaker.
>>>
>>> OTOH, I'm totally open to better ideas.
>>>
>>> For example, could Pacemaker be extended to allow hybrid resources,
>>> where some actions (such as start, stop, status) are handled by (say)
>>> the systemd backend, and other actions (such as monitor) are handled
>>> by (say) the OCF backend?  Then we could cleanly rely on dbus for
>>> collaborating with systemd, whilst adding arbitrarily complex
>>> monitoring via OCF RAs.  That would have several advantages:
>>>
>>> 1. Get rid of grotesque layering violations and maintenance boundaries
>>>where the OCF RA duplicates knowledge of all kinds of things which
>>>are distribution-specific, e.g.:
>>>
>>>  
>>> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/apache#L56
>>
>> A simplified agent will likely still need distro-specific intelligence
>> to do even a limited subset of actions, so I'm not sure there's a gain
>> there.
> 
> What distro-specific intelligence would it need?  If the OCF RA was
> only responsible for monitoring, it wouldn't need to know a lot of the
> things which are only required for starting / stopping the service and
> checking whether it's running, e.g.:
> 
>   - Name of the daemon executable
>   - uid/gid it should be started as
>   - Daemon CLI arguments
>   - Location of pid file
> 
> In contrast, an OCF RA only responsible for monitoring would only need
> to know how to talk to the service, which is not typically
> distro-specific; in the REST API case, it only needs to know the endpoint
> URL, which would be configured via Pacemaker resource parameters anyway.

If you're only talking about monitors, that does simplify things. As you
mention, you'd still need to configure resource parameters that would
only be relevant to the enhanced monitor action -- parameters that other
actions might also need, and get elsewhere, so there's the minor admin
complication of setting the same value in multiple places.

>>> 2. Drastically simplify OCF RAs by delegating start/stop/status etc.
>>>to systemd, thereby increasing readability and reducing maintenance
>>>burden.
>>>
>>> 3. OCF RAs are more likely to work out of the box with any distro,
>>>or at least require less work to get working.
>>>
>>> 4. Services behave more similarly regardless of whether managed by
>>>Pacemaker or the standard pid 1 service manager.  For example, they
>>>will always use the same pidfile, run as the same user, in the
>>>right cgroup, be invoked with the same arguments etc.
>>>
>>> 5. Pacemaker can still monitor services accurately at the
>>>application-level, rather than just relying on naive pid-level
>>>monitoring.
>>>
>>> Or is this a terrible idea? ;-)
>>
>> I considered this, too. I don't think it's a terrible idea, but it does
>> pose its own questions.
>>
>> * What hybrid actions should be allowed? It seems dangerous to allow
>> starting from one code base and stopping from another, or vice versa,
>> and really dangerous to allow something like migrate_to/migrate_from to
>> be reimplemented. At one extreme, we allow anything and leave that
>> responsibility on the user; at the other, we only allow higher-level
>> monitors (i.e. using OCF_CHECK_LEVEL) to be hybridized.
> 
> Just monitors would be good enough for me.

The tomcat RA (which could also benefit from something like this) would
extend start and stop as well, e.g. start = systemctl start plus some
bookkeeping.

>> * Should the wrapper's actions be done instead of, or in addition to,
>> the main resource's actions? Or maybe even allow the user to choose? I
>> could see some wrappers intended to replace the native handling, and
>> others to supplement it.
> 
> For my use case, in addition, because the only motivation is to
> delegate start/stop/status to systemd (as happens currently with
> systemd:* RAs) whilst retaining the ability to do service-level testing of the resource via the OCF RA.

Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2016-10-21 Thread Adam Spiers
Ken Gaillot  wrote:
> On 09/26/2016 09:15 AM, Adam Spiers wrote:
> > [Sending this as a separate mail, since the last one was already (too)
> > long and focused on specific details, whereas this one takes a step
> > back to think about the bigger picture again.]
> > 
> > Adam Spiers  wrote:
> >>> On 09/21/2016 03:25 PM, Adam Spiers wrote:
>  As a result I have been thinking about the idea of changing the
>  start/stop/status actions of these RAs so that they wrap around
>  service(8) (which would be even more portable across distros than
>  systemctl).
> > 
> > [snipped discussion of OCF wrapper RA idea]
> > 
> >> The fact that I don't see any problems where you apparently do makes
> >> me deeply suspicious of my own understanding ;-)  Please tell me what
> >> I'm missing.
> > 
> > [snipped]
> > 
> > To clarify: I am not religiously defending this "wrapper OCF RA" idea
> > of mine to the death.  It certainly sounds like it's not as clean as I
> > originally thought.  But I'm still struggling to see any dealbreaker.
> > 
> > OTOH, I'm totally open to better ideas.
> > 
> > For example, could Pacemaker be extended to allow hybrid resources,
> > where some actions (such as start, stop, status) are handled by (say)
> > the systemd backend, and other actions (such as monitor) are handled
> > by (say) the OCF backend?  Then we could cleanly rely on dbus for
> > collaborating with systemd, whilst adding arbitrarily complex
> > monitoring via OCF RAs.  That would have several advantages:
> > 
> > 1. Get rid of grotesque layering violations and maintenance boundaries
> >where the OCF RA duplicates knowledge of all kinds of things which
> >are distribution-specific, e.g.:
> > 
> >  
> > https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/apache#L56
> 
> A simplified agent will likely still need distro-specific intelligence
> to do even a limited subset of actions, so I'm not sure there's a gain
> there.

What distro-specific intelligence would it need?  If the OCF RA was
only responsible for monitoring, it wouldn't need to know a lot of the
things which are only required for starting / stopping the service and
checking whether it's running, e.g.:

  - Name of the daemon executable
  - uid/gid it should be started as
  - Daemon CLI arguments
  - Location of pid file

In contrast, an OCF RA only responsible for monitoring would only need
to know how to talk to the service, which is not typically
distro-specific; in the REST API case, it only needs to know the endpoint
URL, which would be configured via Pacemaker resource parameters anyway.
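(For concreteness, a monitor-only agent along these lines might look
roughly like the sketch below.  The endpoint_url parameter name, the
health-check URL and the curl-based check are illustrative assumptions,
not anything that exists today.)

    #!/bin/sh
    # Sketch of a monitor-only OCF agent: everything except "monitor" is
    # left to the systemd side.  OCF_RESKEY_endpoint_url is a hypothetical
    # resource parameter holding the service's health-check URL.
    : "${OCF_RESKEY_endpoint_url:=http://localhost:8080/healthcheck}"

    OCF_SUCCESS=0
    OCF_ERR_UNIMPLEMENTED=3
    OCF_NOT_RUNNING=7

    monitor() {
        # Application-level check: does the service answer over HTTP?
        if curl --silent --fail --max-time 10 \
               "$OCF_RESKEY_endpoint_url" >/dev/null; then
            return $OCF_SUCCESS
        fi
        return $OCF_NOT_RUNNING
    }

    case "$1" in
        monitor|status) monitor ;;
        meta-data)      echo "...metadata XML elided in this sketch..." ;;
        *)              exit $OCF_ERR_UNIMPLEMENTED ;;
    esac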

> > 2. Drastically simplify OCF RAs by delegating start/stop/status etc.
> >to systemd, thereby increasing readability and reducing maintenance
> >burden.
> > 
> > 3. OCF RAs are more likely to work out of the box with any distro,
> >or at least require less work to get working.
> > 
> > 4. Services behave more similarly regardless of whether managed by
> >Pacemaker or the standard pid 1 service manager.  For example, they
> >will always use the same pidfile, run as the same user, in the
> >right cgroup, be invoked with the same arguments etc.
> > 
> > 5. Pacemaker can still monitor services accurately at the
> >application-level, rather than just relying on naive pid-level
> >monitoring.
> > 
> > Or is this a terrible idea? ;-)
> 
> I considered this, too. I don't think it's a terrible idea, but it does
> pose its own questions.
> 
> * What hybrid actions should be allowed? It seems dangerous to allow
> starting from one code base and stopping from another, or vice versa,
> and really dangerous to allow something like migrate_to/migrate_from to
> be reimplemented. At one extreme, we allow anything and leave that
> responsibility on the user; at the other, we only allow higher-level
> monitors (i.e. using OCF_CHECK_LEVEL) to be hybridized.

Just monitors would be good enough for me.

> * Should the wrapper's actions be done instead of, or in addition to,
> the main resource's actions? Or maybe even allow the user to choose? I
> could see some wrappers intended to replace the native handling, and
> others to supplement it.

For my use case, in addition, because the only motivation is to
delegate start/stop/status to systemd (as happens currently with
systemd:* RAs) whilst retaining the ability to do service-level
testing of the resource via the OCF RA.  So it wouldn't really be a
wrapper, but rather an extension.

In contrast, with the wrapper approach, it sounds like the delegation
would have to happen via systemctl not via Pacemaker's dbus code.  And
if systemctl start/stop really are asynchronous (non-blocking), the
delegation would need to be able to wrap these start/stop calls in a
polling loop as previously mentioned, in order to make them
synchronous and blocking (which is the behaviour I think most people
would expect).
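(Concretely, the polling wrapper being talked about here would be
something like the sketch below; the unit name and timeout are
placeholders, and as noted elsewhere in the thread systemctl's default
mode may already be synchronous, in which case the loop is redundant.)

    # Sketch: make "systemctl start" behave synchronously by polling.
    # "foo.service" and the 60-second budget are placeholders.
    unit="foo.service"
    timeout=60

    systemctl start "$unit"     # may return before the unit is fully up

    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if systemctl is-active --quiet "$unit"; then
            exit 0              # OCF_SUCCESS: start really finished
        fi
        sleep 2
        elapsed=$((elapsed + 2))
    done
    exit 1                      # OCF_ERR_GENERIC: never became active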

> * The answers to the above will help decide whether the wrapper is a
> separate resource (with its own parameters, operations, timeouts, etc.),
> or just a property of the main resource.

Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2016-09-29 Thread Ken Gaillot
On 09/26/2016 01:34 PM, Andrei Borzenkov wrote:
> 26.09.2016 19:39, Ken Gaillot пишет:
> ...
>>>
 The main drawbacks I see are that I'm not sure you can solve the
 problems with polling without the dbus interface
>>>
>>> I still don't get why not - but that's most likely due to my
>>> ignorance of the details.  Any pointers gratefully received if you
>>> have time.
>>
>> We hope to get around the "inactive" ambiguity by using DBus signalling
>> to receive notifications when a start job is complete, rather than poll
>> the status repeatedly. I don't know of any equivalent way to do that
>> with systemctl.
>>
> 
> That's exactly what systemctl does - it calls StartUnit and then waits
> for the job that was queued. The default invocation mode is synchronous,
> so when systemctl returns, the job has finished. But systemctl does not
> return a detailed error code to indicate why the job failed (if it did).

That would make the wrapper approach more appealing. It would still have
drawbacks, but at least it would be feasible.


___
Developers mailing list
Developers@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2016-09-26 Thread Andrei Borzenkov
26.09.2016 19:39, Ken Gaillot пишет:
...
>>
>>> The main drawbacks I see are that I'm not sure you can solve the
>>> problems with polling without the dbus interface
>>
>> I still don't get why not - but that's most likely due to my
>> ignorance of the details.  Any pointers gratefully received if you
>> have time.
> 
> We hope to get around the "inactive" ambiguity by using DBus signalling
> to receive notifications when a start job is complete, rather than poll
> the status repeatedly. I don't know of any equivalent way to do that
> with systemctl.
> 

That's exactly what systemctl does - it calls StartUnit and then waits
for the job that was queued. The default invocation mode is synchronous,
so when systemctl returns, the job has finished. But systemctl does not
return a detailed error code to indicate why the job failed (if it did).
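(For anyone who wants to see this from a shell, the same DBus traffic
can be approximated as below; just a sketch, with "foo.service" as a
placeholder.)

    # StartUnit(ss) queues a start job and returns the job's object path:
    busctl call org.freedesktop.systemd1 /org/freedesktop/systemd1 \
        org.freedesktop.systemd1.Manager StartUnit ss foo.service replace

    # Completion of that job is announced via the JobRemoved signal, whose
    # "result" argument ("done", "failed", "timeout", ...) says how it ended:
    dbus-monitor --system \
        "type='signal',interface='org.freedesktop.systemd1.Manager',member='JobRemoved'"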



___
Developers mailing list
Developers@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2016-09-26 Thread Ken Gaillot
On 09/26/2016 12:03 PM, Jan Pokorný wrote:
> On 26/09/16 11:39 -0500, Ken Gaillot wrote:
>> On 09/26/2016 09:10 AM, Adam Spiers wrote:
>>> Now, here I *do* see a potential problem.  If service B is managed by
>>> Pacemaker, is configured with Requires=A and After=A, but service A is
>>> *not* managed by Pacemaker, we would need to ensure that on system
>>> shutdown, systemd would shutdown Pacemaker (and hence B) *before* it
>>> (systemd) shuts down A, otherwise A could be stopped before B,
>>> effectively pulling the rug from underneath B's feet.
>>>
>>> But isn't that an issue even if Pacemaker only uses systemd resources?
>>> I don't see how the currently used override files protect against this
>>> issue.  Have I just "discovered" a bug, or more likely, is there again
>>> a gap in my understanding?
>>
>> Systemd handles the dependencies properly here:
>>
>> - A must be stopped after B (B's After=A)
>> - B must be stopped after pacemaker (B's Before=pacemaker via override)
>> - therefore, stop pacemaker, then A (which will be a no-op because
>>   pacemaker will already have stopped it), then B
> 
> without reading too much about systemd behavior here, shouldn't this be:
> 
> - therefore, stop pacemaker, then B (which will be a no-op because
>   pacemaker will already have stopped it), then A
> 
> (i.e., A and B swapped)?

Whoops, yes, that's what I meant


___
Developers mailing list
Developers@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2016-09-26 Thread Ken Gaillot
On 09/26/2016 09:15 AM, Adam Spiers wrote:
> [Sending this as a separate mail, since the last one was already (too)
> long and focused on specific details, whereas this one takes a step
> back to think about the bigger picture again.]
> 
> Adam Spiers  wrote:
>>> On 09/21/2016 03:25 PM, Adam Spiers wrote:
 As a result I have been thinking about the idea of changing the
 start/stop/status actions of these RAs so that they wrap around
 service(8) (which would be even more portable across distros than
 systemctl).
> 
> [snipped discussion of OCF wrapper RA idea]
> 
>> The fact that I don't see any problems where you apparently do makes
>> me deeply suspicious of my own understanding ;-)  Please tell me what
>> I'm missing.
> 
> [snipped]
> 
> To clarify: I am not religiously defending this "wrapper OCF RA" idea
> of mine to the death.  It certainly sounds like it's not as clean as I
> originally thought.  But I'm still struggling to see any dealbreaker.
> 
> OTOH, I'm totally open to better ideas.
> 
> For example, could Pacemaker be extended to allow hybrid resources,
> where some actions (such as start, stop, status) are handled by (say)
> the systemd backend, and other actions (such as monitor) are handled
> by (say) the OCF backend?  Then we could cleanly rely on dbus for
> collaborating with systemd, whilst adding arbitrarily complex
> monitoring via OCF RAs.  That would have several advantages:
> 
> 1. Get rid of grotesque layering violations and maintenance boundaries
>where the OCF RA duplicates knowledge of all kinds of things which
>are distribution-specific, e.g.:
> 
>  
> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/apache#L56

A simplified agent will likely still need distro-specific intelligence
to do even a limited subset of actions, so I'm not sure there's a gain
there.

> 2. Drastically simplify OCF RAs by delegating start/stop/status etc.
>to systemd, thereby increasing readability and reducing maintenance
>burden.
> 
> 3. OCF RAs are more likely to work out of the box with any distro,
>or at least require less work to get working.
> 
> 4. Services behave more similarly regardless of whether managed by
>Pacemaker or the standard pid 1 service manager.  For example, they
>will always use the same pidfile, run as the same user, in the
>right cgroup, be invoked with the same arguments etc.
> 
> 5. Pacemaker can still monitor services accurately at the
>application-level, rather than just relying on naive pid-level
>monitoring.
> 
> Or is this a terrible idea? ;-)

I considered this, too. I don't think it's a terrible idea, but it does
pose its own questions.

* What hybrid actions should be allowed? It seems dangerous to allow
starting from one code base and stopping from another, or vice versa,
and really dangerous to allow something like migrate_to/migrate_from to
be reimplemented. At one extreme, we allow anything and leave that
responsibility on the user; at the other, we only allow higher-level
monitors (i.e. using OCF_CHECK_LEVEL) to be hybridized.

* Should the wrapper's actions be done instead of, or in addition to,
the main resource's actions? Or maybe even allow the user to choose? I
could see some wrappers intended to replace the native handling, and
others to supplement it.

* The answers to the above will help decide whether the wrapper is a
separate resource (with its own parameters, operations, timeouts, etc.),
or just a property of the main resource.

* If we allow anything other than monitors to be hybridized, I think we
get into a pacemaker-specific implementation. I don't think it's
feasible to include this in the OCF standard -- it would essentially
mandate pacemaker's "resource class" mechanism on all OCF users (which
is beyond OCF's scope), and would likely break manual/scripted use
altogether. We could possibly modify OCF so that no actions are
mandatory, and it's up to the OCF-using software to verify
that any actions it requires are supported. Or maybe wrappers just
implement some actions as no-ops, and it's up to the user to know the
limitations.
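(To illustrate the "only higher-level monitors" option above: agents
conventionally branch on OCF_CHECK_LEVEL, so a hybrid could leave level 0
to the systemd backend and only own the deeper checks.  A rough sketch,
with the unit name and the HTTP check as placeholders:)

    monitor() {
        case "${OCF_CHECK_LEVEL:-0}" in
            0)
                # Shallow check - in the hybrid model this level would be
                # the systemd backend's job rather than the agent's.
                systemctl is-active --quiet foo.service
                ;;
            *)
                # Deeper, application-level check owned by the OCF side.
                curl --silent --fail --max-time 10 \
                    http://localhost:8080/health >/dev/null
                ;;
        esac
    }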

___
Developers mailing list
Developers@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2016-09-26 Thread Jan Pokorný
On 26/09/16 11:39 -0500, Ken Gaillot wrote:
> On 09/26/2016 09:10 AM, Adam Spiers wrote:
>> Now, here I *do* see a potential problem.  If service B is managed by
>> Pacemaker, is configured with Requires=A and After=A, but service A is
>> *not* managed by Pacemaker, we would need to ensure that on system
>> shutdown, systemd would shutdown Pacemaker (and hence B) *before* it
>> (systemd) shuts down A, otherwise A could be stopped before B,
>> effectively pulling the rug from underneath B's feet.
>> 
>> But isn't that an issue even if Pacemaker only uses systemd resources?
>> I don't see how the currently used override files protect against this
>> issue.  Have I just "discovered" a bug, or more likely, is there again
>> a gap in my understanding?
> 
> Systemd handles the dependencies properly here:
> 
> - A must be stopped after B (B's After=A)
> - B must be stopped after pacemaker (B's Before=pacemaker via override)
> - therefore, stop pacemaker, then A (which will be a no-op because
>   pacemaker will already have stopped it), then B

without reading too much about systemd behavior here, shouldn't this be:

- therefore, stop pacemaker, then B (which will be a no-op because
  pacemaker will already have stopped it), then A

(i.e., A and B swapped)?

-- 
Jan (Poki)


pgpImw3LZliTj.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2016-09-26 Thread Ken Gaillot
On 09/26/2016 09:10 AM, Adam Spiers wrote:
> Ken Gaillot  wrote:
>> On 09/22/2016 10:39 AM, Adam Spiers wrote:
>>> Ken Gaillot  wrote:
 On 09/22/2016 08:49 AM, Adam Spiers wrote:
> Ken Gaillot  wrote:
>> On 09/21/2016 03:25 PM, Adam Spiers wrote:
>>> As a result I have been thinking about the idea of changing the
>>> start/stop/status actions of these RAs so that they wrap around
>>> service(8) (which would be even more portable across distros than
>>> systemctl).
>>>
>>> The primary difference with your approach is that we probably wouldn't
>>> need to make the RAs dynamically create any systemd configuration, since
>>> that would already be provided by the packages which install the 
>>> OpenStack
>>> services.  But then AFAIK none of the OpenStack services use the
>>> multi-instance feature of systemd (foo@{one,two,three,etc}.service).
>>
>> The main complication I see is that pacemaker expects OCF agents to
>> return success only after an action is complete. For example, start
>> should not return until the service is fully active. I believe systemctl
>> does not behave this way, rather it initiates the action and returns
>> immediately.
>
> But that's trivial to work around: polling via "service foo status"
> after "service foo start" converts it back from an asynchronous
> operation to a synchronous one.

 Yes, that's exactly what pacemaker does now: start/stop, then every two
 seconds, poll the status.

 However, I'm currently working on a project to change that, so that we
 use DBus signalling to be notified when the job completes, rather than
 (or in addition to) polling.

 The reason is twofold: the two-second wait can be an unnecessary
 recovery delay in some cases; and (at least from the DBus API, not sure
 about systemctl status) there's no reliable way to distinguish "service
 is inactive because the start didn't work properly" from "service is
 inactive because systemd has some slow-starting dependencies of its own
 to start first".
>>>
>>> OK, that makes sense - thanks.
> 
> Although thinking about it more - why couldn't systemctl return
> different exit codes for these two cases, or add an "is-starting"
> subcommand, or similar?

That would be nice. I'm not sure what systemctl returns now, since we
use the DBus API, but I'm guessing it's equivalent.

systemd does have an "activating" state when the service is starting.
However, it does not enter that state while (After=) dependencies are
being started, only when the service itself is being started. It shows
"inactive" when waiting for dependencies to start, and also when the
service is cleanly stopped, and as far as I know, there's no reliable
way to distinguish those two cases.
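(The ambiguity is easy to reproduce from a shell, for what it's worth;
"foo.service" below is a placeholder.)

    # "activating" only shows up while foo.service itself is starting:
    systemctl show -p ActiveState,SubState foo.service

    # While systemd is still starting foo's After= dependencies, and also
    # after a clean stop, this prints "inactive" in both cases, so the two
    # situations cannot be told apart from the state alone:
    systemctl is-active foo.service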

>> Pacemaker's native systemd integration has a lot of workarounds for
>> quirks in systemd behavior (and more every release). I'm not sure
>> moving/duplicating that logic to the RA is a good approach.
>
> What other quirks are there?

 When pacemaker starts a systemd service, it creates a unit override in
 /run/systemd/system/<name>.service.d/50-pacemaker.conf, with these
 overrides (and removes the file when stopping the resource):

 * It prefixes the description with "Cluster Controlled" (e.g. "Postfix
 Mail Transport Agent" -> "Cluster Controlled Postfix Mail Transport
 Agent"). This gives a clear indicator in systemd messages in the syslog
 that it's a cluster resource.

 * "Before=pacemaker.service": This ensures that when someone shuts down
 the system via systemd, systemd doesn't stop pacemaker before pacemaker
 can stop the resource.

 * "Restart=no": This ensures that pacemaker stays in control of
 responding to service failures.
>>>
>>> Yes, I was aware of that, and you're right that my approach of making
>>> the RA wrap service(8) or systemctl(8) would need to duplicate this
>>> functionality - *unless* the creation of the unit override could be
>>> moved out of Pacemaker's C code into a shell script which both
>>> Pacemaker and external RAs which want to adopt this wrapping technique
>>> could call.
>>>
 Additionally:

 * Pacemaker uses intelligent timeout values (based on cluster
 configuration) when making systemd calls.
>>>
>>> I guess I'd need more details to fully understand this, but couldn't
>>> those intelligently chosen timeout values be passed to the RA if
>>> necessary?  Although that does put a bit of a dampener on my hope of
>>> using service(8) to remain agnostic to whichever pid-1 system happened
>>> to be in use on the current machine.  Having said that, maybe everyone
>>> in the OpenStack (HA) community has already moved to systemd by now
>>> anyway.
>>
>> One pacemaker action (start/stop/whatever) may involve multiple
>> interactions with systemd. At each step, pacemaker knows the remaining

Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2016-09-26 Thread Jan Pokorný
On 26/09/16 15:15 +0100, Adam Spiers wrote:
> [snipped]
> 
> To clarify: I am not religiously defending this "wrapper OCF RA" idea
> of mine to the death.  It certainly sounds like it's not as clean as I
> originally thought.  But I'm still struggling to see any dealbreaker.
> 
> OTOH, I'm totally open to better ideas.
> 
> For example, could Pacemaker be extended to allow hybrid resources,
> where some actions (such as start, stop, status) are handled by (say)
> the systemd backend, and other actions (such as monitor) are handled
> by (say) the OCF backend?  Then we could cleanly rely on dbus for
> collaborating with systemd, whilst adding arbitrarily complex
> monitoring via OCF RAs.

Yes, I totally forgot about the "monitor" action in the original post.
It would also likely be implemented by the mentioned "systemd+hooks"
class, just like the mentioned "pre-start" and "post-stop" equivalents
(note that the behavior of standard OCF agents could be split so that,
say, the "start" action is the "pre-start" action plus the daemon
executable invocation, which would make the parts of the behavior more
reusable, e.g. as systemd hooks, than is the case nowadays).

-- 
Jan (Poki)


pgpkxQUOI7KCu.pgp
Description: PGP signature
___
Developers mailing list
Developers@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2016-09-26 Thread Adam Spiers
[Sending this as a separate mail, since the last one was already (too)
long and focused on specific details, whereas this one takes a step
back to think about the bigger picture again.]

Adam Spiers  wrote:
> >  On 09/21/2016 03:25 PM, Adam Spiers wrote:
> > > As a result I have been thinking about the idea of changing the
> > > start/stop/status actions of these RAs so that they wrap around
> > > service(8) (which would be even more portable across distros than
> > > systemctl).

[snipped discussion of OCF wrapper RA idea]

> The fact that I don't see any problems where you apparently do makes
> me deeply suspicious of my own understanding ;-)  Please tell me what
> I'm missing.

[snipped]

To clarify: I am not religiously defending this "wrapper OCF RA" idea
of mine to the death.  It certainly sounds like it's not as clean as I
originally thought.  But I'm still struggling to see any dealbreaker.

OTOH, I'm totally open to better ideas.

For example, could Pacemaker be extended to allow hybrid resources,
where some actions (such as start, stop, status) are handled by (say)
the systemd backend, and other actions (such as monitor) are handled
by (say) the OCF backend?  Then we could cleanly rely on dbus for
collaborating with systemd, whilst adding arbitrarily complex
monitoring via OCF RAs.  That would have several advantages:

1. Get rid of grotesque layering violations and maintenance boundaries
   where the OCF RA duplicates knowledge of all kinds of things which
   are distribution-specific, e.g.:

 
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/apache#L56

2. Drastically simplify OCF RAs by delegating start/stop/status etc.
   to systemd, thereby increasing readability and reducing maintenance
   burden.

3. OCF RAs are more likely to work out of the box with any distro,
   or at least require less work to get working.

4. Services behave more similarly regardless of whether managed by
   Pacemaker or the standard pid 1 service manager.  For example, they
   will always use the same pidfile, run as the same user, in the
   right cgroup, be invoked with the same arguments etc.

5. Pacemaker can still monitor services accurately at the
   application-level, rather than just relying on naive pid-level
   monitoring.

Or is this a terrible idea? ;-)

___
Developers mailing list
Developers@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2016-09-26 Thread Adam Spiers
Ken Gaillot  wrote:
> On 09/22/2016 10:39 AM, Adam Spiers wrote:
> > Ken Gaillot  wrote:
> >> On 09/22/2016 08:49 AM, Adam Spiers wrote:
> >>> Ken Gaillot  wrote:
>  On 09/21/2016 03:25 PM, Adam Spiers wrote:
> > As a result I have been thinking about the idea of changing the
> > start/stop/status actions of these RAs so that they wrap around
> > service(8) (which would be even more portable across distros than
> > systemctl).
> >
> > The primary difference with your approach is that we probably wouldn't
> > need to make the RAs dynamically create any systemd configuration, since
> > that would already be provided by the packages which install the 
> > OpenStack
> > services.  But then AFAIK none of the OpenStack services use the
> > multi-instance feature of systemd (foo@{one,two,three,etc}.service).
> 
>  The main complication I see is that pacemaker expects OCF agents to
>  return success only after an action is complete. For example, start
>  should not return until the service is fully active. I believe systemctl
>  does not behave this way, rather it initiates the action and returns
>  immediately.
> >>>
> >>> But that's trivial to work around: polling via "service foo status"
> >>> after "service foo start" converts it back from an asynchronous
> >>> operation to a synchronous one.
> >>
> >> Yes, that's exactly what pacemaker does now: start/stop, then every two
> >> seconds, poll the status.
> >>
> >> However, I'm currently working on a project to change that, so that we
> >> use DBus signalling to be notified when the job completes, rather than
> >> (or in addition to) polling.
> >>
> >> The reason is twofold: the two-second wait can be an unnecessary
> >> recovery delay in some cases; and (at least from the DBus API, not sure
> >> about systemctl status) there's no reliable way to distinguish "service
> >> is inactive because the start didn't work properly" from "service is
> >> inactive because systemd has some slow-starting dependencies of its own
> >> to start first".
> > 
> > OK, that makes sense - thanks.

Although thinking about it more - why couldn't systemctl return
different exit codes for these two cases, or add an "is-starting"
subcommand, or similar?

>  Pacemaker's native systemd integration has a lot of workarounds for
>  quirks in systemd behavior (and more every release). I'm not sure
>  moving/duplicating that logic to the RA is a good approach.
> >>>
> >>> What other quirks are there?
> >>
> >> When pacemaker starts a systemd service, it creates a unit override in
> >> /run/systemd/system/<name>.service.d/50-pacemaker.conf, with these
> >> overrides (and removes the file when stopping the resource):
> >>
> >> * It prefixes the description with "Cluster Controlled" (e.g. "Postfix
> >> Mail Transport Agent" -> "Cluster Controlled Postfix Mail Transport
> >> Agent"). This gives a clear indicator in systemd messages in the syslog
> >> that it's a cluster resource.
> >>
> >> * "Before=pacemaker.service": This ensures that when someone shuts down
> >> the system via systemd, systemd doesn't stop pacemaker before pacemaker
> >> can stop the resource.
> >>
> >> * "Restart=no": This ensures that pacemaker stays in control of
> >> responding to service failures.
> > 
> > Yes, I was aware of that, and you're right that my approach of making
> > the RA wrap service(8) or systemctl(8) would need to duplicate this
> > functionality - *unless* the creation of the unit override could be
> > moved out of Pacemaker's C code into a shell script which both
> > Pacemaker and external RAs which want to adopt this wrapping technique
> > could call.
> > 
> >> Additionally:
> >>
> >> * Pacemaker uses intelligent timeout values (based on cluster
> >> configuration) when making systemd calls.
> > 
> > I guess I'd need more details to fully understand this, but couldn't
> > those intelligently chosen timeout values be passed to the RA if
> > necessary?  Although that does put a bit of a dampener on my hope of
> > using service(8) to remain agnostic to whichever pid-1 system happened
> > to be in use on the current machine.  Having said that, maybe everyone
> > in the OpenStack (HA) community has already moved to systemd by now
> > anyway.
> 
> One pacemaker action (start/stop/whatever) may involve multiple
> interactions with systemd. At each step, pacemaker knows the remaining
> timeout for the whole action, so it can use an appropriate timeout with
> each systemd action.
> 
> There's no way for the RA to know how much time is remaining.

Stupid question - why not?  Couldn't Pacemaker tell it?

> But I guess it's not important, since pacemaker will time out the entire
> RA action if necessary.
> 
> >> * Pacemaker interprets/remaps systemd return status as needed. For
> >> example, a stop followed by a status poll that returns "OK" means the
> >> service is still running. Fairly obvious, but there are a lot of cases
> >> that need to be handled.

Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2016-09-22 Thread Andrew Beekhof

> On 23 Sep 2016, at 12:49 AM, Ken Gaillot  wrote:
> 
> On 09/22/2016 08:49 AM, Adam Spiers wrote:
>> Ken Gaillot  wrote:
>>> On 09/21/2016 03:25 PM, Adam Spiers wrote:
 Jan Pokorný  wrote:
> Just thinking aloud before the can is open.
 
 Thanks for sharing - I'm very interested to hear your ideas on this,
 because I was thinking along somewhat similar lines for the
 openstack-resource-agents repository which I maintain.
 
 Currently the OpenStack RAs duplicate much of the logic and config of
 corresponding systemd / LSB init scripts for starting / stopping
 OpenStack services and checking their status.  The main difference is
 that RAs also have a "monitor" action which can check the health of
 the service at application level, e.g. via HTTP rather than a naive
 "is this pid running" kind of check.
 
 This duplication causes issues with portability between Linux
 distributions, since each distribution has a slightly different way of
 starting and stopping the services.  It also results in subtlely
 different behaviour for OpenStack clouds depending on whether or not
 they are deployed in HA mode using Pacemaker.
 
 As a result I have been thinking about the idea of changing the
 start/stop/status actions of these RAs so that they wrap around
 service(8) (which would be even more portable across distros than
 systemctl).
 
 The primary difference with your approach is that we probably wouldn't
 need to make the RAs dynamically create any systemd configuration, since
 that would already be provided by the packages which install the OpenStack
 services.  But then AFAIK none of the OpenStack services use the
 multi-instance feature of systemd (foo@{one,two,three,etc}.service).
>>> 
>>> The main complication I see is that pacemaker expects OCF agents to
>>> return success only after an action is complete. For example, start
>>> should not return until the service is fully active. I believe systemctl
>>> does not behave this way, rather it initiates the action and returns
>>> immediately.
>> 
>> But that's trivial to work around: polling via "service foo status"
>> after "service foo start" converts it back from an asynchronous
>> operation to a synchronous one.
> 
> Yes, that's exactly what pacemaker does now: start/stop, then every two
> seconds, poll the status.
> 
> However, I'm currently working on a project to change that, so that we
> use DBus signalling to be notified when the job completes, rather than
> (or in addition to) polling.
> 
> The reason is twofold: the two-second wait can be an unnecessary
> recovery delay in some cases; and (at least from the DBus API, not sure
> about systemctl status) there's no reliable way to distinguish "service
> is inactive because the start didn't work properly" from "service is
> inactive because systemd has some slow-starting dependencies of its own
> to start first".

The systemd folks are telling us that the only real way to reliably and
synchronously start a service is by watching DBus, which suggests that a
shell-based approach is doomed to fail.

> 
>>> Pacemaker's native systemd integration has a lot of workarounds for
>>> quirks in systemd behavior (and more every release). I'm not sure
>>> moving/duplicating that logic to the RA is a good approach.
>> 
>> What other quirks are there?
> 
> When pacemaker starts a systemd service, it creates a unit override in
> /run/systemd/system/<name>.service.d/50-pacemaker.conf, with these
> overrides (and removes the file when stopping the resource):
> 
> * It prefixes the description with "Cluster Controlled" (e.g. "Postfix
> Mail Transport Agent" -> "Cluster Controlled Postfix Mail Transport
> Agent"). This gives a clear indicator in systemd messages in the syslog
> that it's a cluster resource.
> 
> * "Before=pacemaker.service": This ensures that when someone shuts down
> the system via systemd, systemd doesn't stop pacemaker before pacemaker
> can stop the resource.
> 
> * "Restart=no": This ensures that pacemaker stays in control of
> responding to service failures.
> 
> 
> Additionally:
> 
> * Pacemaker uses intelligent timeout values (based on cluster
> configuration) when making systemd calls.
> 
> * Pacemaker interprets/remaps systemd return status as needed. For
> example, a stop followed by a status poll that returns "OK" means the
> service is still running. Fairly obvious, but there are a lot of cases
> that need to be handled.
> 
> All of these were added gradually over the past few years, so I'd expect
> the list to grow over the next few years.
> 
> 
> ___
> Developers mailing list
> Developers@clusterlabs.org 
> http://clusterlabs.org/mailman/listinfo/developers 
> 
___
Developers mailing list
Developers@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/developers

Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2016-09-22 Thread Ken Gaillot
On 09/22/2016 10:39 AM, Adam Spiers wrote:
> Ken Gaillot  wrote:
>> On 09/22/2016 08:49 AM, Adam Spiers wrote:
>>> Ken Gaillot  wrote:
 On 09/21/2016 03:25 PM, Adam Spiers wrote:
> As a result I have been thinking about the idea of changing the
> start/stop/status actions of these RAs so that they wrap around
> service(8) (which would be even more portable across distros than
> systemctl).
>
> The primary difference with your approach is that we probably wouldn't
> need to make the RAs dynamically create any systemd configuration, since
> that would already be provided by the packages which install the OpenStack
> services.  But then AFAIK none of the OpenStack services use the
> multi-instance feature of systemd (foo@{one,two,three,etc}.service).

 The main complication I see is that pacemaker expects OCF agents to
 return success only after an action is complete. For example, start
 should not return until the service is fully active. I believe systemctl
 does not behave this way, rather it initiates the action and returns
 immediately.
>>>
>>> But that's trivial to work around: polling via "service foo status"
>>> after "service foo start" converts it back from an asynchronous
>>> operation to a synchronous one.
>>
>> Yes, that's exactly what pacemaker does now: start/stop, then every two
>> seconds, poll the status.
>>
>> However, I'm currently working on a project to change that, so that we
>> use DBus signalling to be notified when the job completes, rather than
>> (or in addition to) polling.
>>
>> The reason is twofold: the two-second wait can be an unnecessary
>> recovery delay in some cases; and (at least from the DBus API, not sure
>> about systemctl status) there's no reliable way to distinguish "service
>> is inactive because the start didn't work properly" from "service is
>> inactive because systemd has some slow-starting dependencies of its own
>> to start first".
> 
> OK, that makes sense - thanks.
> 
 Pacemaker's native systemd integration has a lot of workarounds for
 quirks in systemd behavior (and more every release). I'm not sure
 moving/duplicating that logic to the RA is a good approach.
>>>
>>> What other quirks are there?
>>
>> When pacemaker starts a systemd service, it creates a unit override in
>> /run/systemd/system/<name>.service.d/50-pacemaker.conf, with these
>> overrides (and removes the file when stopping the resource):
>>
>> * It prefixes the description with "Cluster Controlled" (e.g. "Postfix
>> Mail Transport Agent" -> "Cluster Controlled Postfix Mail Transport
>> Agent"). This gives a clear indicator in systemd messages in the syslog
>> that it's a cluster resource.
>>
>> * "Before=pacemaker.service": This ensures that when someone shuts down
>> the system via systemd, systemd doesn't stop pacemaker before pacemaker
>> can stop the resource.
>>
>> * "Restart=no": This ensures that pacemaker stays in control of
>> responding to service failures.
> 
> Yes, I was aware of that, and you're right that my approach of making
> the RA wrap service(8) or systemctl(8) would need to duplicate this
> functionality - *unless* the creation of the unit override could be
> moved out of Pacemaker's C code into a shell script which both
> Pacemaker and external RAs which want to adopt this wrapping technique
> could call.
> 
>> Additionally:
>>
>> * Pacemaker uses intelligent timeout values (based on cluster
>> configuration) when making systemd calls.
> 
> I guess I'd need more details to fully understand this, but couldn't
> those intelligently chosen timeout values be passed to the RA if
> necessary?  Although that does put a bit of a dampener on my hope of
> using service(8) to remain agnostic to whichever pid-1 system happened
> to be in use on the current machine.  Having said that, maybe everyone
> in the OpenStack (HA) community has already moved to systemd by now
> anyway.

One pacemaker action (start/stop/whatever) may involve multiple
interactions with systemd. At each step, pacemaker knows the remaining
timeout for the whole action, so it can use an appropriate timeout with
each systemd action.

There's no way for the RA to know how much time is remaining.

But I guess it's not important, since pacemaker will time out the entire
RA action if necessary.

>> * Pacemaker interprets/remaps systemd return status as needed. For
>> example, a stop followed by a status poll that returns "OK" means the
>> service is still running. Fairly obvious, but there are a lot of cases
>> that need to be handled.
> 
> Other than (obviously) start followed by status, what other cases are
> there?

It's just a matter of looking at all the possible return values of each
systemd call, and then mapping that to something the cluster can
interpret. Pacemaker uses the DBus API so the specifics will be
different compared to systemctl. It's just important to get right.
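(For the systemctl flavour of this, the remapping would boil down to a
table along the following lines; a sketch only, and how to treat the
transient and "failed" states is a policy choice.)

    # Sketch: remap "systemctl is-active" output to OCF monitor codes.
    state=$(systemctl is-active foo.service)
    case "$state" in
        active)                   exit 0 ;;  # OCF_SUCCESS
        inactive|failed)          exit 7 ;;  # OCF_NOT_RUNNING
        activating|deactivating)  exit 1 ;;  # in transition: generic error,
                                             # or retry, depending on policy
        *)                        exit 1 ;;  # OCF_ERR_GENERIC
    esac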

> All of this stuff sounds like generic problems which could be solved
> once for all wrapper RAs via a simple shell library.  I'd happily
> maintain this in openstack-resource-agents, although TBH it would
> probably belong in resource-agents if anywhere.

Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2016-09-22 Thread Adam Spiers
Ken Gaillot  wrote:
> On 09/22/2016 08:49 AM, Adam Spiers wrote:
> > Ken Gaillot  wrote:
> >> On 09/21/2016 03:25 PM, Adam Spiers wrote:
> >>> As a result I have been thinking about the idea of changing the
> >>> start/stop/status actions of these RAs so that they wrap around
> >>> service(8) (which would be even more portable across distros than
> >>> systemctl).
> >>>
> >>> The primary difference with your approach is that we probably wouldn't
> >>> need to make the RAs dynamically create any systemd configuration, since
> >>> that would already be provided by the packages which install the OpenStack
> >>> services.  But then AFAIK none of the OpenStack services use the
> >>> multi-instance feature of systemd (foo@{one,two,three,etc}.service).
> >>
> >> The main complication I see is that pacemaker expects OCF agents to
> >> return success only after an action is complete. For example, start
> >> should not return until the service is fully active. I believe systemctl
> >> does not behave this way, rather it initiates the action and returns
> >> immediately.
> > 
> > But that's trivial to work around: polling via "service foo status"
> > after "service foo start" converts it back from an asynchronous
> > operation to a synchronous one.
> 
> Yes, that's exactly what pacemaker does now: start/stop, then every two
> seconds, poll the status.
> 
> However, I'm currently working on a project to change that, so that we
> use DBus signalling to be notified when the job completes, rather than
> (or in addition to) polling.
> 
> The reason is twofold: the two-second wait can be an unnecessary
> recovery delay in some cases; and (at least from the DBus API, not sure
> about systemctl status) there's no reliable way to distinguish "service
> is inactive because the start didn't work properly" from "service is
> inactive because systemd has some slow-starting dependencies of its own
> to start first".

OK, that makes sense - thanks.

> >> Pacemaker's native systemd integration has a lot of workarounds for
> >> quirks in systemd behavior (and more every release). I'm not sure
> >> moving/duplicating that logic to the RA is a good approach.
> > 
> > What other quirks are there?
> 
> When pacemaker starts a systemd service, it creates a unit override in
> /run/systemd/system/<name>.service.d/50-pacemaker.conf, with these
> overrides (and removes the file when stopping the resource):
> 
> * It prefixes the description with "Cluster Controlled" (e.g. "Postfix
> Mail Transport Agent" -> "Cluster Controlled Postfix Mail Transport
> Agent"). This gives a clear indicator in systemd messages in the syslog
> that it's a cluster resource.
> 
> * "Before=pacemaker.service": This ensures that when someone shuts down
> the system via systemd, systemd doesn't stop pacemaker before pacemaker
> can stop the resource.
> 
> * "Restart=no": This ensures that pacemaker stays in control of
> responding to service failures.

Yes, I was aware of that, and you're right that my approach of making
the RA wrap service(8) or systemctl(8) would need to duplicate this
functionality - *unless* the creation of the unit override could be
moved out of Pacemaker's C code into a shell script which both
Pacemaker and external RAs which want to adopt this wrapping technique
could call.
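(Such a shared helper could be quite small.  A sketch, assuming a
hypothetical script name and taking the three overrides described above
as fixed:)

    #!/bin/sh
    # Hypothetical shared helper recreating the unit override that
    # Pacemaker writes natively.
    # Usage: pcmk-systemd-override install|remove <unit>
    action="$1" unit="$2"
    dropin_dir="/run/systemd/system/${unit}.d"

    case "$action" in
        install)
            mkdir -p "$dropin_dir"
            desc=$(systemctl show -p Description "$unit" | sed 's/^Description=//')
            {
                printf '[Unit]\n'
                printf 'Description=Cluster Controlled %s\n' "$desc"
                printf 'Before=pacemaker.service\n'
                printf '\n'
                printf '[Service]\n'
                printf 'Restart=no\n'
            } > "$dropin_dir/50-pacemaker.conf"
            ;;
        remove)
            rm -f "$dropin_dir/50-pacemaker.conf"
            ;;
    esac
    systemctl daemon-reload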

> Additionally:
> 
> * Pacemaker uses intelligent timeout values (based on cluster
> configuration) when making systemd calls.

I guess I'd need more details to fully understand this, but couldn't
those intelligently chosen timeout values be passed to the RA if
necessary?  Although that does put a bit of a dampener on my hope of
using service(8) to remain agnostic to whichever pid-1 system happened
to be in use on the current machine.  Having said that, maybe everyone
in the OpenStack (HA) community has already moved to systemd by now
anyway.

> * Pacemaker interprets/remaps systemd return status as needed. For
> example, a stop followed by a status poll that returns "OK" means the
> service is still running. Fairly obvious, but there are a lot of cases
> that need to be handled.

Other than (obviously) start followed by status, what other cases are
there?

All of this stuff sounds like generic problems which could be solved
once for all wrapper RAs via a simple shell library.  I'd happily
maintain this in openstack-resource-agents, although TBH it would
probably belong in resource-agents if anywhere.

> All of these were added gradually over the past few years, so I'd expect
> the list to grow over the next few years.

Well, hopefully they could be grown in a way which also supported
wrapper RAs :-)

Alternatively, if you think that there's a better solution than this
wrapper RA idea, I'm all ears.  The two main problems are essentially:

  1. RAs duplicate a whole bunch of logic / config already provided
 by vendor packages and systemd service units.

  2. RAs have a "monitor" action which can do proper application-level
 monitoring (e.g. HTTP pings), whereas apparentl

Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2016-09-22 Thread Ken Gaillot
On 09/22/2016 08:49 AM, Adam Spiers wrote:
> Ken Gaillot  wrote:
>> On 09/21/2016 03:25 PM, Adam Spiers wrote:
>>> Jan Pokorný  wrote:
 Just thinking aloud before the can is open.
>>>
>>> Thanks for sharing - I'm very interested to hear your ideas on this,
>>> because I was thinking along somewhat similar lines for the
>>> openstack-resource-agents repository which I maintain.
>>>
>>> Currently the OpenStack RAs duplicate much of the logic and config of
>>> corresponding systemd / LSB init scripts for starting / stopping
>>> OpenStack services and checking their status.  The main difference is
>>> that RAs also have a "monitor" action which can check the health of
>>> the service at application level, e.g. via HTTP rather than a naive
>>> "is this pid running" kind of check.
>>>
>>> This duplication causes issues with portability between Linux
>>> distributions, since each distribution has a slightly different way of
>>> starting and stopping the services.  It also results in subtly
>>> different behaviour for OpenStack clouds depending on whether or not
>>> they are deployed in HA mode using Pacemaker.
>>>
>>> As a result I have been thinking about the idea of changing the
>>> start/stop/status actions of these RAs so that they wrap around
>>> service(8) (which would be even more portable across distros than
>>> systemctl).
>>>
>>> The primary difference with your approach is that we probably wouldn't
>>> need to make the RAs dynamically create any systemd configuration, since
>>> that would already be provided by the packages which install the OpenStack
>>> services.  But then AFAIK none of the OpenStack services use the
>>> multi-instance feature of systemd (foo@{one,two,three,etc}.service).
>>
>> The main complication I see is that pacemaker expects OCF agents to
>> return success only after an action is complete. For example, start
>> should not return until the service is fully active. I believe systemctl
>> does not behave this way, rather it initiates the action and returns
>> immediately.
> 
> But that's trivial to work around: polling via "service foo status"
> after "service foo start" converts it back from an asynchronous
> operation to a synchronous one.

Yes, that's exactly what pacemaker does now: start/stop, then every two
seconds, poll the status.

However, I'm currently working on a project to change that, so that we
use DBus signalling to be notified when the job completes, rather than
(or in addition to) polling.

The reason is twofold: the two-second wait can be an unnecessary
recovery delay in some cases; and (at least from the DBus API, not sure
about systemctl status) there's no reliable way to distinguish "service
is inactive because the start didn't work properly" from "service is
inactive because systemd has some slow-starting dependencies of its own
to start first".

>> Pacemaker's native systemd integration has a lot of workarounds for
>> quirks in systemd behavior (and more every release). I'm not sure
>> moving/duplicating that logic to the RA is a good approach.
> 
> What other quirks are there?

When pacemaker starts a systemd service, it creates a unit override in
/run/systemd/system/<name>.service.d/50-pacemaker.conf, with these
overrides (and removes the file when stopping the resource):

* It prefixes the description with "Cluster Controlled" (e.g. "Postfix
Mail Transport Agent" -> "Cluster Controlled Postfix Mail Transport
Agent"). This gives a clear indicator in systemd messages in the syslog
that it's a cluster resource.

* "Before=pacemaker.service": This ensures that when someone shuts down
the system via systemd, systemd doesn't stop pacemaker before pacemaker
can stop the resource.

* "Restart=no": This ensures that pacemaker stays in control of
responding to service failures.


Additionally:

* Pacemaker uses intelligent timeout values (based on cluster
configuration) when making systemd calls.

* Pacemaker interprets/remaps systemd return status as needed. For
example, a stop followed by a status poll that returns "OK" means the
service is still running. Fairly obvious, but there are a lot of cases
that need to be handled.

All of these were added gradually over the past few years, so I'd expect
the list to grow over the next few years.


___
Developers mailing list
Developers@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2016-09-22 Thread Adam Spiers
Ken Gaillot  wrote:
> On 09/21/2016 03:25 PM, Adam Spiers wrote:
> > Jan Pokorný  wrote:
> >> Just thinking aloud before the can is open.
> > 
> > Thanks for sharing - I'm very interested to hear your ideas on this,
> > because I was thinking along somewhat similar lines for the
> > openstack-resource-agents repository which I maintain.
> > 
> > Currently the OpenStack RAs duplicate much of the logic and config of
> > corresponding systemd / LSB init scripts for starting / stopping
> > OpenStack services and checking their status.  The main difference is
> > that RAs also have a "monitor" action which can check the health of
> > the service at application level, e.g. via HTTP rather than a naive
> > "is this pid running" kind of check.
> > 
> > This duplication causes issues with portability between Linux
> > distributions, since each distribution has a slightly different way of
> > starting and stopping the services.  It also results in subtly
> > different behaviour for OpenStack clouds depending on whether or not
> > they are deployed in HA mode using Pacemaker.
> > 
> > As a result I have been thinking about the idea of changing the
> > start/stop/status actions of these RAs so that they wrap around
> > service(8) (which would be even more portable across distros than
> > systemctl).
> > 
> > The primary difference with your approach is that we probably wouldn't
> > need to make the RAs dynamically create any systemd configuration, since
> > that would already be provided by the packages which install the OpenStack
> > services.  But then AFAIK none of the OpenStack services use the
> > multi-instance feature of systemd (foo@{one,two,three,etc}.service).
> 
> The main complication I see is that pacemaker expects OCF agents to
> return success only after an action is complete. For example, start
> should not return until the service is fully active. I believe systemctl
> does not behave this way, rather it initiates the action and returns
> immediately.

But that's trivial to work around: polling via "service foo status"
after "service foo start" converts it back from an asynchronous
operation to a synchronous one.

> Pacemaker's native systemd integration has a lot of workarounds for
> quirks in systemd behavior (and more every release). I'm not sure
> moving/duplicating that logic to the RA is a good approach.

What other quirks are there?

___
Developers mailing list
Developers@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2016-09-21 Thread Ken Gaillot
On 09/21/2016 03:25 PM, Adam Spiers wrote:
> Hi Jan,
> 
> Jan Pokorný  wrote:
>> Hello,
>>
>> https://github.com/ClusterLabs/resource-agents/pull/846 seems to be
>> a first crack on integrating systemd to otherwise init-system-unaware
>> resource-agents.
>>
>> As pacemaker already handles native systemd integration, I wonder if
>> it wouldn't be better to just allow, on top of that, perhaps as
>> special "systemd+hooks" class of resources that would also accept
>> "hooks" (meta) attribute pointing to an executable implementing
>> formalized API akin to OCF (say on-start, on-stop, meta-data
>> actions) that would take care of initial reflecting on the rest of
>> the parameters + possibly a cleanup later on.

I can see the usefulness of having "hooks" for OS resources
(systemd/lsb/upstart/service). Let pacemaker start and stop the resource
via the OS mechanism, but do a little bit of extra housekeeping.

It could easily get ugly, though. Version dependencies, extra overhead, etc.

>> Technically, something akin to injecting Environment, ExecStartPre
>> and ExecStopPost into the service definition might also achieve the
>> same goal if there's a transparent way to do it from pacemaker using
>> just systemd API (I don't know).

Sure, pacemaker already creates a unit override before starting a
systemd resource. It would be trivial to add this. It could even simply
be configured as meta-attributes of systemd resources.

However, that wouldn't let you change the behavior of a status call, for
example.
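(For illustration, the kind of injection being discussed would amount to
an extra drop-in along these lines; the hook script path, file name and
environment variable are made-up placeholders.)

    # Sketch: add hook points to an existing unit via a second drop-in.
    mkdir -p /run/systemd/system/foo.service.d
    {
        printf '[Service]\n'
        printf 'Environment=EXAMPLE_PARAM=value\n'
        printf 'ExecStartPre=/usr/local/libexec/foo-hooks on-start\n'
        printf 'ExecStopPost=/usr/local/libexec/foo-hooks on-stop\n'
    } > /run/systemd/system/foo.service.d/60-hooks.conf
    systemctl daemon-reload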

>> Indeed, the scenario I have in mind would make do with a separate
>> "prepare grounds" agent, suitably grouped with such systemd-class
>> resource, but that seems more fragile configuration-wise (this
>> is not the granularity cluster administrator would be supposed
>> to be thinking in, IMHO, just as with ocf class).

That isn't pretty either, but it's probably the best approach currently.

There are some non-obvious pitfalls when writing a "secondary" OCF agent
like this, but it's easy to document what they are and how to avoid them.

Nagios agents are another possibility; essentially, they implement a
status action and nothing else. So, a systemd resource + nagios resource
would provide an application-aware status.

Constraints and failure handling become trickier with this "two agents"
approach.

>> Just thinking aloud before the can is open.
> 
> Thanks for sharing - I'm very interested to hear your ideas on this,
> because I was thinking along somewhat similar lines for the
> openstack-resource-agents repository which I maintain.
> 
> Currently the OpenStack RAs duplicate much of the logic and config of
> corresponding systemd / LSB init scripts for starting / stopping
> OpenStack services and checking their status.  The main difference is
> that RAs also have a "monitor" action which can check the health of
> the service at application level, e.g. via HTTP rather than a naive
> "is this pid running" kind of check.
> 
> This duplication causes issues with portability between Linux
> distributions, since each distribution has a slightly different way of
> > starting and stopping the services.  It also results in subtly
> different behaviour for OpenStack clouds depending on whether or not
> they are deployed in HA mode using Pacemaker.
> 
> As a result I have been thinking about the idea of changing the
> start/stop/status actions of these RAs so that they wrap around
> service(8) (which would be even more portable across distros than
> systemctl).
> 
> The primary difference with your approach is that we probably wouldn't
> need to make the RAs dynamically create any systemd configuration, since
> that would already be provided by the packages which install the OpenStack
> services.  But then AFAIK none of the OpenStack services use the
> multi-instance feature of systemd (foo@{one,two,three,etc}.service).
> 
> Cheers,
> Adam

The main complication I see is that pacemaker expects OCF agents to
return success only after an action is complete. For example, start
should not return until the service is fully active. I believe systemctl
does not behave this way, rather it initiates the action and returns
immediately.

Pacemaker's native systemd integration has a lot of workarounds for
quirks in systemd behavior (and more every release). I'm not sure
moving/duplicating that logic to the RA is a good approach.

___
Developers mailing list
Developers@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] RA as a systemd wrapper -- the right way?

2016-09-21 Thread Adam Spiers
Hi Jan,

Jan Pokorný  wrote:
> Hello,
> 
> https://github.com/ClusterLabs/resource-agents/pull/846 seems to be
> a first crack on integrating systemd to otherwise init-system-unaware
> resource-agents.
> 
> As pacemaker already handles native systemd integration, I wonder if
> it wouldn't be better to just allow, on top of that, perhaps as
> special "systemd+hooks" class of resources that would also accept
> "hooks" (meta) attribute pointing to an executable implementing
> formalized API akin to OCF (say on-start, on-stop, meta-data
> actions) that would take care of initial reflecting on the rest of
> the parameters + possibly a cleanup later on.
> 
> Technically, something akin to injecting Environment, ExecStartPre
> and ExecStopPost into the service definition might also achieve the
> same goal if there's a transparent way to do it from pacemaker using
> just systemd API (I don't know).
> 
> Indeed, the scenario I have in mind would make do with a separate
> "prepare grounds" agent, suitably grouped with such systemd-class
> resource, but that seems more fragile configuration-wise (this
> is not the granularity cluster administrator would be supposed
> to be thinking in, IMHO, just as with ocf class).
> 
> Just thinking aloud before the can is open.

Thanks for sharing - I'm very interested to hear your ideas on this,
because I was thinking along somewhat similar lines for the
openstack-resource-agents repository which I maintain.

Currently the OpenStack RAs duplicate much of the logic and config of
corresponding systemd / LSB init scripts for starting / stopping
OpenStack services and checking their status.  The main difference is
that RAs also have a "monitor" action which can check the health of
the service at application level, e.g. via HTTP rather than a naive
"is this pid running" kind of check.

This duplication causes issues with portability between Linux
distributions, since each distribution has a slightly different way of
starting and stopping the services.  It also results in subtly
different behaviour for OpenStack clouds depending on whether or not
they are deployed in HA mode using Pacemaker.

As a result I have been thinking about the idea of changing the
start/stop/status actions of these RAs so that they wrap around
service(8) (which would be even more portable across distros than
systemctl).
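(Roughly, the delegation would let the distro-specific part of such an
RA shrink to something like the sketch below; "foo" is a placeholder for
the OpenStack service name and the status mapping is deliberately
simplified.)

    # Sketch: start/stop/status delegated to service(8).
    SERVICE=foo

    ra_start()  { service "$SERVICE" start;  }
    ra_stop()   { service "$SERVICE" stop;   }

    ra_status() {
        # LSB "status" exit codes: 0 = running, 3 = not running; map the
        # simple cases onto the OCF codes Pacemaker expects.
        if service "$SERVICE" status >/dev/null 2>&1; then
            return 0    # OCF_SUCCESS
        else
            return 7    # OCF_NOT_RUNNING
        fi
    }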

The primary difference with your approach is that we probably wouldn't
need to make the RAs dynamically create any systemd configuration, since
that would already be provided by the packages which install the OpenStack
services.  But then AFAIK none of the OpenStack services use the
multi-instance feature of systemd (foo@{one,two,three,etc}.service).

Cheers,
Adam

___
Developers mailing list
Developers@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/developers