On Wed, Dec 2, 2009 at 3:22 PM, Frank DiMeo <frank.di...@bigbandnet.com> wrote:
> Ask and ye shall receive. :)
>
> I'm enclosing my openais init script, which I'm running on my two-node
> cluster made up of identical Ubuntu (9.04) machines called ubuntu_2 and
> ubuntu_1.

If the node takes more than 30s to shut down, the init script kills openais.
In that case, it's no surprise that the lrmd and pengine are still
around, because the cluster didn't have time to shut down cleanly.

> Running pacemaker 1.0.6 from the tip as of a month ago or so.
>
> I'm also enclosing two sets of files which may help you see what's happening.
>
> The "working" set:
>
> 4rsc_worlds_coloc_ordered.xml - this is my initial configuration file.  When
> I use this to initialize my cluster, the 4 resources all start up in order, on
> the right node, and move together when I put nodes in and out of standby.
>
> goodconfig_debug.txt - the log file (from ubuntu_1) showing what happens when
> the resources are running on node "ubuntu_2" and I put that node into
> standby.  All resources are moved to "ubuntu_1".  If I stop openais,
> everything shuts down quickly and cleanly, and no processes (like lrmd,
> pengine, etc.) are left running.
>
> The "not working" set:

Can you attach /var/lib/pengine/pe-input-12434.bz2 from ubuntu_1 please?

>
> 4rsc_worlds_coloc_ordered_alt1.xml - this is identical to the xml file in the 
> working set, except I use the compact syntax for ordering.
>
> badconfig_debug.txt - the log file (from ubuntu_1) showing what happens when 
> the resources are running on node "ubuntu_2" and I put that node into 
> standby.  The pe wants to move them to ubuntu_1, but the pe only seems to 
> generate "pseudo actions" and never really moves anything.  The resources 
> continue to run on node ubuntu_2 even when the node is in standby!  Further, 
> if I try to shut down openais on ubuntu_2 at this point (using the 
> /etc/init.d/openais script enclosed), after a long time, corosync stops, but 
> lrmd and pengine keep running, and become children of the init process.  
> Again, the resources keep running even at this point, which is because they 
> are never commanded to stop.
>
> I can send you my RAs and the resources themselves (which are just bash
> scripts) if you'd like.
>
> I'll apply the patch you pointed to and let you know what happens.
>
> Thanks very much,
> -Frank
>
>
>> -----Original Message-----
>> From: Andrew Beekhof [mailto:and...@beekhof.net]
>> Sent: Wednesday, December 02, 2009 6:00 AM
>> To: pacemaker@oss.clusterlabs.org
>> Subject: Re: [Pacemaker] bug in ordering syntax?
>>
>> On Mon, Nov 30, 2009 at 9:19 PM, Frank DiMeo
>> <frank.di...@bigbandnet.com> wrote:
>> > I'm experimenting with startup sequence and co-location control, and think I
>> > may have stumbled across a bug.
>> >
>> >
>> >
>> > I have two xml files that I use in my testing as my initial configuration of
>> > a two node cluster.  I start each node with no configuration, and then use
>> > cibadmin to "source in" the xml file.  Each file defines two resources as
>> > well as a startup order and collocation definition.  The only difference
>> > between the two files is the syntax I use to specify the startup order.
>> >
>> >
>> >
>> > When I use the syntax:
>> >
>> >
>> >
>> > <rsc_order id="order-1" first="world1" then="world2" score="INFINITY" />
>> >
>> >
>> >
>> > Everything works fine.  I can put either of the two nodes into standby while
>> > resources are running there, and the resources move to the other node as
>> > expected.
>> >
>> >
>> >
>> > However, when I use the syntax:
>> >
>> >
>> >
>> > <rsc_order id="order-1">
>>
>> You're missing a score.  Without one it defaults to 0 (which means
>> optional).
>> However, IIRC, the 1.0.6 schema won't allow you to set a score there
>> so you'll need to apply the following patch:
>>    http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/c8585629629c
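>>
>> A sketch of what the set-based constraint might look like once the
>> patch is applied, assuming the patched schema accepts a score
>> attribute on a set-based rsc_order (ids taken from the quoted
>> configuration; not verified against the patched schema):
>>
>>    <rsc_order id="order-1" score="INFINITY">
>>      <resource_set id="order-1-set-1" sequential="true">
>>        <resource_ref id="world1" />
>>        <resource_ref id="world2" />
>>      </resource_set>
>>    </rsc_order>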
>>
>> >
>> >   <resource_set id="order-1-set-1" sequential="true">
>> >
>> >     <resource_ref id="world1" />
>> >
>> >     <resource_ref id="world2" />
>> >
>> >   </resource_set>
>> >
>> > </rsc_order>
>> >
>> >
>> >
>> >
>> >
>> > Several bad things happen.  First, the resources don't move off the node
>> > that is put into standby, even though the alternate node is running and able
>> > to run the resources.
>>
>> Did you remove the other ordering constraint first?
>>
>> > Second, attempting to shut down openais on the node
>> > running the resources after attempting a forced move (by putting the node
>> > into standby) leaves both the lrmd and pengine processes running (but as
>> > children of process 1 (init)), and the resources continue to run on that
>> > node even after openais is stopped.
>>
>> I suspect you've a faulty init script there.  See other email.
>>
>> > I turned debug on in crmd and in the logs and recorded what happens when I
>> > force standby, and I notice that using the first syntax causes
>> > te_rsc_command to be executed to send a shut down message to the node where
>> > the resources are running (which seems to work), while using the second
>> > syntax causes te_pseudo_action to be called in approximately the same place
>> > in the log, but no shutdown of resources happens (I can't really tell what
>> > this is supposed to be doing).
>>
>> Neither can I - you didn't attach the logs :-)
>>
>> _______________________________________________
>> Pacemaker mailing list
>> Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
