Re: [ClusterLabs Developers] Opinions wanted: OCF agent types

2022-08-18 Thread Klaus Wenninger
On Thu, Aug 18, 2022 at 1:06 AM Reid Wahl wrote:
>
> On Wed, Aug 17, 2022 at 1:01 PM Ken Gaillot  wrote:
> >
> > Hi all,
> >
> > OCF 1.1 hasn't been out that long but I'm already looking ahead to OCF
> > 1.2 (which would remain backward-compatible).
> >
> > One big addition I'm contemplating is defining OCF resource agent
> > types, to address these problems:
> >
> > * Fence agents have a completely different standard from OCF resource
> > agents, and lack some of the features available to OCF agents (such as
> > meaningful error statuses and exit reasons for failures).
> >
> > * Pacemaker's node health feature uses OCF agents to monitor node
> > conditions, but there are some user pain points involved since they are
> > indistinguishable from regular OCF agents.
> >
> > * In the past there has been discussion of implementing "storage
> > agents" to help manage replication of external storage devices,
> > primarily for disaster recovery purposes.
> >
> > Visually, the agent type would be another field in
> > the agent specification, for example ocf:fence:heartbeat:iscsi or
> > ocf:health:pacemaker:cpu.
> >
> > "Regular" OCF agents would be (for example)
> > ocf:service:heartbeat:apache in full, but for backward compatibility
> > "service" would be the default, and ocf:heartbeat:apache would continue
> > to work.
> >
> > Alternatively, if we want to keep it to three fields, we could do
> > something like ocf-fence:heartbeat:iscsi and ocf-health:pacemaker:cpu.
> >
> > The OCF standard would have a shared section that all agent types would
> > be required to support. This could include things like exit status
> > codes, environment variables, and the meta-data action. Each agent type
> > would then have its own section with anything specific to that type --
> > for example, service agents need to support start and stop actions,
> > while fence agents need to support off and optionally reboot.
> >
> > The benefits would include:
> >
> > * Agent writers would have fewer differences to worry about and
> > libraries to learn.
> >
> > * Pacemaker and higher-level tools could easily distinguish agent types
> > and respond intelligently. For example, higher-level shells could list
> > all health agents and clone them automatically when used, and Pacemaker
> > could automatically exempt health agents from health restrictions so
> > that the agent can automatically detect when the node becomes healthy
> > again.
> >
> > * We would have a framework for adding new types if the need arises.
> >
> > Thoughts?
>
> It sounds like a good idea.
>
> With regard to "service" as the default OCF resource agent type, this
> may be confusing since we already have a "service" standard.

I stumbled over that as well. Maybe simply 'resource'?

Klaus

>
> > --
> > Ken Gaillot 
> >
> > ___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/developers
> >
> > ClusterLabs home: https://www.clusterlabs.org/
> >
>
>
> --
> Regards,
>
> Reid Wahl (He/Him)
> Senior Software Engineer, Red Hat
> RHEL High Availability - Pacemaker
>


Re: [ClusterLabs Developers] Opinions wanted: OCF agent types

2022-08-17 Thread Reid Wahl
On Wed, Aug 17, 2022 at 1:01 PM Ken Gaillot wrote:
>
> Hi all,
>
> OCF 1.1 hasn't been out that long but I'm already looking ahead to OCF
> 1.2 (which would remain backward-compatible).
>
> One big addition I'm contemplating is defining OCF resource agent
> types, to address these problems:
>
> * Fence agents have a completely different standard from OCF resource
> agents, and lack some of the features available to OCF agents (such as
> meaningful error statuses and exit reasons for failures).
>
> * Pacemaker's node health feature uses OCF agents to monitor node
> conditions, but there are some user pain points involved since they are
> indistinguishable from regular OCF agents.
>
> * In the past there has been discussion of implementing "storage
> agents" to help manage replication of external storage devices,
> primarily for disaster recovery purposes.
>
> Visually, the agent type would be another field in
> the agent specification, for example ocf:fence:heartbeat:iscsi or
> ocf:health:pacemaker:cpu.
>
> "Regular" OCF agents would be (for example)
> ocf:service:heartbeat:apache in full, but for backward compatibility
> "service" would be the default, and ocf:heartbeat:apache would continue
> to work.
>
> Alternatively, if we want to keep it to three fields, we could do
> something like ocf-fence:heartbeat:iscsi and ocf-health:pacemaker:cpu.
>
> The OCF standard would have a shared section that all agent types would
> be required to support. This could include things like exit status
> codes, environment variables, and the meta-data action. Each agent type
> would then have its own section with anything specific to that type --
> for example, service agents need to support start and stop actions,
> while fence agents need to support off and optionally reboot.
>
> The benefits would include:
>
> * Agent writers would have fewer differences to worry about and
> libraries to learn.
>
> * Pacemaker and higher-level tools could easily distinguish agent types
> and respond intelligently. For example, higher-level shells could list
> all health agents and clone them automatically when used, and Pacemaker
> could automatically exempt health agents from health restrictions so
> that the agent can automatically detect when the node becomes healthy
> again.
>
> * We would have a framework for adding new types if the need arises.
>
> Thoughts?

It sounds like a good idea.

With regard to "service" as the default OCF resource agent type, this
may be confusing since we already have a "service" standard.

> --
> Ken Gaillot 
>


-- 
Regards,

Reid Wahl (He/Him)
Senior Software Engineer, Red Hat
RHEL High Availability - Pacemaker


[ClusterLabs Developers] Opinions wanted: OCF agent types

2022-08-17 Thread Ken Gaillot
Hi all,

OCF 1.1 hasn't been out that long but I'm already looking ahead to OCF
1.2 (which would remain backward-compatible).

One big addition I'm contemplating is defining OCF resource agent
types, to address these problems:

* Fence agents have a completely different standard from OCF resource
agents, and lack some of the features available to OCF agents (such as
meaningful error statuses and exit reasons for failures).

* Pacemaker's node health feature uses OCF agents to monitor node
conditions, but there are some user pain points involved since they are
indistinguishable from regular OCF agents.

* In the past there has been discussion of implementing "storage
agents" to help manage replication of external storage devices,
primarily for disaster recovery purposes.

Visually, the agent type would be another field in
the agent specification, for example ocf:fence:heartbeat:iscsi or
ocf:health:pacemaker:cpu.

"Regular" OCF agents would be (for example)
ocf:service:heartbeat:apache in full, but for backward compatibility
"service" would be the default, and ocf:heartbeat:apache would continue
to work.

Alternatively, if we want to keep it to three fields, we could do
something like ocf-fence:heartbeat:iscsi and ocf-health:pacemaker:cpu.
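
To make the two naming schemes concrete, here is a minimal Python sketch
of how a tool might parse them. Nothing in it is existing Pacemaker or
OCF code: AgentSpec, parse_agent_spec, and DEFAULT_TYPE are hypothetical
names, and treating a three-field "ocf:" spec as the default type is only
one reading of the backward-compatibility idea above.

```python
from typing import NamedTuple

# Assumption: "service" is the default type, as proposed in this message.
DEFAULT_TYPE = "service"


class AgentSpec(NamedTuple):
    """Hypothetical parsed form of an agent specification."""
    standard: str
    agent_type: str
    provider: str
    agent: str


def parse_agent_spec(spec: str) -> AgentSpec:
    """Parse either the four-field or the three-field naming scheme."""
    fields = spec.split(":")
    if not all(fields):
        raise ValueError(f"empty field in agent spec: {spec!r}")

    # Three-field alternative: ocf-<type>:<provider>:<agent>
    if len(fields) == 3 and fields[0].startswith("ocf-"):
        standard, _, agent_type = fields[0].partition("-")
        if not agent_type:
            raise ValueError(f"missing agent type in: {spec!r}")
        return AgentSpec(standard, agent_type, fields[1], fields[2])

    if fields[0] != "ocf":
        raise ValueError(f"not an OCF agent spec: {spec!r}")

    # Four-field form: ocf:<type>:<provider>:<agent>
    if len(fields) == 4:
        return AgentSpec(*fields)

    # Legacy form: ocf:<provider>:<agent>, type falls back to the default
    if len(fields) == 3:
        return AgentSpec("ocf", DEFAULT_TYPE, fields[1], fields[2])

    raise ValueError(f"unexpected number of fields in: {spec!r}")
```

With this sketch, ocf:heartbeat:apache and ocf:service:heartbeat:apache
parse to the same result, which is the backward-compatible behavior
described above.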

The OCF standard would have a shared section that all agent types would
be required to support. This could include things like exit status
codes, environment variables, and the meta-data action. Each agent type
would then have its own section with anything specific to that type --
for example, service agents need to support start and stop actions,
while fence agents need to support off and optionally reboot.
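
As a rough illustration of the shared-plus-per-type layout, the Python
sketch below checks an agent's advertised actions against both sets of
requirements. The table and function names are hypothetical, and the
contents only reflect the examples given in this message (meta-data as
shared, start/stop for service, off plus optional reboot for fence); a
real OCF 1.2 would define the actual lists.

```python
# Assumption: only the actions named in this message are listed here.
SHARED_REQUIRED = frozenset({"meta-data"})

TYPE_REQUIRED = {
    "service": frozenset({"start", "stop"}),
    "fence": frozenset({"off"}),
}

# Optional actions are recorded for documentation; they are never "missing".
TYPE_OPTIONAL = {
    "fence": frozenset({"reboot"}),
}


def missing_actions(agent_type, advertised):
    """Return the sorted required actions the agent does not advertise.

    Types with no entry in TYPE_REQUIRED (for example "health", whose
    requirements are not spelled out above) only need the shared actions.
    """
    required = SHARED_REQUIRED | TYPE_REQUIRED.get(agent_type, frozenset())
    return sorted(required - set(advertised))
```

For example, a service agent advertising only meta-data and start would
be reported as missing stop, while a fence agent advertising meta-data
and off would pass even without reboot.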

The benefits would include:

* Agent writers would have fewer differences to worry about and fewer
libraries to learn.

* Pacemaker and higher-level tools could easily distinguish agent types
and respond intelligently. For example, higher-level shells could list
all health agents and clone them automatically when used, and Pacemaker
could automatically exempt health agents from health restrictions so
that the agent can automatically detect when the node becomes healthy
again.

* We would have a framework for adding new types if the need arises.

Thoughts?
-- 
Ken Gaillot 
