On 2008-05-16T12:15:03, Lon Hohberger <[EMAIL PROTECTED]> wrote:

> rgmanager:
> * parent/child relationships for implicit start-after/stop-before
> * attribute inheritance (we have talked about this in the past;
>   it isn't hard, and may be beneficial)
> * specification of child resource type ordering to prevent major
>   "gotchas" when defining resource groups (e.g. putting a script
>   on a file system but putting them in the wrong order, causing
>   errors)
> * 'primary' attribute specification (not OCF compliant) is used
>   to identify resource instances
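[To illustrate the ordering "gotcha" above, here is a hypothetical
rgmanager cluster.conf fragment; the resource names, device, and paths
are made up. The nesting of the script inside the fs resource is what
gives the implicit start-after/stop-before ordering -- reversing the
nesting would try to run the script before its file system is mounted:]

```xml
<!-- Hypothetical rgmanager resource group: the parent/child nesting,
     not textual order alone, implies start-after/stop-before -->
<service name="app_svc">
  <fs name="app_fs" mountpoint="/srv/app" device="/dev/sdb1" fstype="ext3">
    <!-- child of the fs: started after it, stopped before it -->
    <script name="app_init" file="/opt/app/bin/app.sh"/>
  </fs>
</service>
```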
That's all just meta-data, right?

> * use of LSB 'status' to implement OCF 'monitor' function (status
>   isn't specified in the RA API, but the monitor function as
>   specified appears to map to the LSB status function... so most
>   of our agents do monitor->status, though depth is still
>   supported - maybe yours are the same; haven't fully investigated)

monitor is _not_ 1:1 the LSB status. That's exactly why we're not
using status. ;-)

http://www.linux-foundation.org/spec/refspecs/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html

In particular, 3 vs 7 is a crucial difference, and we didn't want to
have to special-case the exit codes depending on the action being
called.

> * multiple references to the same resource instance - reference
>   counts are used to prevent starting the same resource on the
>   same node multiple times

We use explicit dependencies and thus can reference the same
primitive/clone/group in as many places as needed.

> * rgmanager allows reconfiguration of resource parameters without
>   restarting the resource; maybe pacemaker does too; haven't
>   checked; uses <parameter name="xxx" reconfig="1" .../> in the
>   meta-data to enable it.

Our instance_attributes support a "reload" setting.

> pacemaker:
> * promote / demote resource operations
> * UUIDs used to identify resource instances (I like this better
>   than what we do with type:primary_attr in rgmanager)

Yeah, well, the UUIDs are not the grandest idea we ever had -
nowadays at least the GUI tries to generate a shorter unique id
without the full cumbersomeness of UUIDs.

> * clone resources and operations used to start (more or less) the
>   same resource on multiple nodes

> General:
> * resource migrate is likely done differently; not sure though
>   (maybe you can tell me?):
>   <resource-agent> migrate <target_host_name>

Our model is both push and pull compatible. On the source, we execute
a "migrate_to" command (the target host is passed via the
environment), and on the target, a "migrate_from".
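[For illustration, the dispatch described above might look roughly
like the following in an agent. This is a sketch, not a real agent:
the VM_RUNNING probe is faked, and the environment variable name used
for the migration target is an assumption, not confirmed by this
thread. The monitor case also shows the 3-vs-7 point from earlier:
OCF uses 7 for "not running", where LSB status would use 3:]

```shell
#!/bin/sh
# Sketch of an OCF-style action dispatcher for a hypothetical VM agent.

OCF_SUCCESS=0
OCF_ERR_UNIMPLEMENTED=3
OCF_NOT_RUNNING=7   # note: LSB 'status' uses 3 for "not running"

vm_dispatch() {
    case $1 in
    monitor)
        # A real agent would probe the VM here; the point is that a
        # cleanly stopped resource returns 7, not the LSB code 3.
        if [ -n "$VM_RUNNING" ]; then
            return $OCF_SUCCESS
        else
            return $OCF_NOT_RUNNING
        fi
        ;;
    migrate_to)
        # Source side: push the VM toward the target host. The
        # variable name below is an assumed example.
        echo "migrating to ${OCF_RESKEY_CRM_meta_migrate_target:-unknown}"
        ;;
    migrate_from)
        # Target side: finish the migration; this action doubles as
        # the check that the migration actually succeeded.
        return $OCF_SUCCESS
        ;;
    *)
        return $OCF_ERR_UNIMPLEMENTED
        ;;
    esac
}
```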
(That makes sense if you consider these as _commands_ given to the
nodes; otherwise it seems kind of the wrong way around ;-)

The migrate_from is also our way of checking whether the migration
succeeded; I guess in your case you then run a monitor/status on the
target?

> There will be more that I will come across, no doubt. Those are
> just the ones on the surface. I do not believe any of them are
> hard to deal with.

Right. I was particularly interested in understanding the differences
which affect the RA API, as those could affect the usability of RAs
written for RHCS versus those written for ours.

I think it's probably a good idea to find some time to sit down and
chat about how to resolve these. I've got a presentation from last
year's BrainShare on what our scripts do; that should be a usable
starting point. Not much has changed since.

A further matter might be the shell scripts calling out to various
helper scripts which assume things in the environment - i.e., we
supply ocf-shellfuncs (a shell source file) which defines ocf_log()
and a few others.

> I think we both diverged in a compatible way here:
> * <parameter ... required="1" .../> means this parameter must be
>   specified for a given resource instance.

A compatible divergence can't possibly be a divergence ;-)

> I believe the idea was to use virtual machine resources, with
> those virtual machines in a cluster of their own.

Ah, OK.

> To clarify the requirements as stated: they were in the context of
> an existing implementation.
>
> Generally, with clustered virtual machines that can run on more
> than one physical node, at a bare minimum, you need to know only a
> few things on the physical hosts in order to implement fencing:
>
> * where a particular vm is and its current state, or
> * where that vm "was", and
> * the state of the host running the vm, and
> * if "bad" or "Dead", whether fencing has completed
>
> Certainly, pacemaker knows all of the above!

Right, of course.
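[The checks in the list above can be sketched as a small decision
helper. Purely illustrative: the function and its return strings are
hypothetical, and a real fence agent would obtain the host state and
fencing status from the cluster (e.g. by querying pacemaker) rather
than taking them as arguments:]

```shell
#!/bin/sh
# Hypothetical sketch of the fencing decision for a clustered VM.
#   $1 = state of the host the VM is (or was) on: "ok", "bad", "dead"
#   $2 = "yes" if fencing of that host has completed, else "no"
# Prints the action the fence agent can safely take.
vm_fence_decision() {
    host_state=$1
    fencing_done=$2
    case $host_state in
    ok)
        # Host is healthy: ask it to kill the VM directly.
        echo "kill-vm-on-host"
        ;;
    bad|dead)
        if [ "$fencing_done" = yes ]; then
            # The host itself was fenced, so the VM is certainly gone.
            echo "vm-is-dead"
        else
            # Cannot declare the VM dead until host fencing completes.
            echo "fence-host-first"
        fi
        ;;
    *)
        echo "unknown host state" >&2
        return 1
        ;;
    esac
}
```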
The external/xen STONITH script which we already have could likely
use crm_resource to find out and/or control the state of the resource
representing the DomU in the Dom0 cluster. Now I see what you're
saying.

> I doubt it would be difficult to make the existing fence agent/host
> preferentially use pacemaker to locate & kill VMs when possible (as
> opposed to simply talking to libvirt + AIS Checkpoint APIs as it
> does now).

I think at least some interaction here would be needed, because
otherwise pacemaker/LRM would eventually run the monitor action, find
out that the VM is gone, and restart it, which might not be what is
desired ;-)

Regards,
    Lars

-- 
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

_______________________________________________
Pacemaker mailing list
[email protected]
http://list.clusterlabs.org/mailman/listinfo/pacemaker
