On 13-05-09 09:53 PM, Andrew Beekhof wrote:
>
> May 7 02:36:16 node1 crmd[16836]: info: delete_resource: Removing resource testfs-resource1 for 18002_crm_resource (internal) on node1
> May 7 02:36:16 node1 lrmd: [16833]: info: flush_op: process for operation monitor[8] on ocf::Target::testfs-resource1 for client 16836 still running, flush delayed
> May 7 02:36:16 node1 crmd[16836]: info: lrm_remove_deleted_op: Removing op testfs-resource1_monitor_0:8 for deleted resource testfs-resource1
>
> So apparently a badly timed cleanup was run.

:-( I didn't know there could be such timing problems. I might have to change my process a bit then, perhaps.

> Did you do that or was it the crm shell?

That was "me" doing a "crm resource cleanup" (soon to become "crm_resource -r ... --cleanup"). The process is typically:

- create resource
- start resource
- wait for resource to start

where "start resource" is:

- "clean it to start with a known clean resource" (crm resource cleanup)
- "start resource" (crm_resource -r ... -p target-role -m -v Started)

and "wait for resource" is a loop of "crm resource status ..." (soon to be "crm_resource -r ... --locate").

So the create, clean, start operations happen in quite quick succession (i.e. scripted). Is that pathological? Is a clean between create and start known to be problematic?

FWIW, the reason for the clean before the start, even right after creating the resource, is that "clean" and "start" are lumped together into a function that is called after create but can also be called at other times during the life-cycle, when it could be necessary to clean a resource before trying to start it. I was hoping that cleaning a just-created resource would effectively be a NOOP.

I guess for completeness, I should add here that creating the resource is a "cibadmin -o resource -C ..." operation.

> If the machine is heavily loaded, or just very busy with file I/O, that can
> still take quite a long time.

Yeah, not very loaded at all, especially at this point. This is all happening before anything really gets started on the machine... this is the process of getting the resources up and running, and the machine is dedicated to running the tasks associated with these resources.

Cheers,
b.
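
For reference, the clean/start/wait sequence described above could be sketched roughly as follows. This is only an illustration, not the actual script: the function names (`run_cmd`, `clean_and_start`, `wait_for_resource`), the timeout and interval values, and the assumption that `crm_resource --locate` prints "is running on" for a started resource are all mine. The create step (`cibadmin -o resource -C ...`) is left out because its arguments are elided above. The sketch does nothing to avoid the cleanup timing issue discussed in this thread.

```python
import subprocess
import time


def run_cmd(argv):
    """Run a command; return (exit code, stdout text)."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stdout


def clean_and_start(resource, run=run_cmd):
    """Clean a resource, then set its target-role to Started."""
    commands = [
        # "clean it to start with a known clean resource"
        ["crm_resource", "-r", resource, "--cleanup"],
        # "start resource"
        ["crm_resource", "-r", resource, "-p", "target-role", "-m", "-v", "Started"],
    ]
    for argv in commands:
        rc, _ = run(argv)
        if rc != 0:
            raise RuntimeError("command failed: " + " ".join(argv))


def wait_for_resource(resource, run=run_cmd, timeout=60.0, interval=1.0,
                      sleep=time.sleep):
    """Poll 'crm_resource --locate' until the resource is reported running.

    Returns True once it is running, False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        _, out = run(["crm_resource", "-r", resource, "--locate"])
        # ASSUMPTION: a started resource is reported with "is running on".
        if "is running on" in out:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Passing `run` and `sleep` as parameters is just a convenience so the loop can be exercised without a live cluster; a real script would call `clean_and_start("testfs-resource1")` followed by `wait_for_resource("testfs-resource1")`.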
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org