On Thu, Sep 10, 2009 at 02:25:34PM -0500, David Teigland wrote:
> On Wed, Sep 02, 2009 at 09:23:08AM -0500, David Teigland wrote:
> > > > 1. correlating events from different services locally
> > > > 
> > > > I get nodedown from both cman (or quorum service) and cpg.  I need to
> > > > correlate them with each other.  When I get a cpg nodedown for node A,
> > > > I don't know which cman nodedown for A it refers to: one of multiple in
> > > > the past or one in the future that cman hasn't reported yet.
> > > > 
> > > 
> > > Correlation could be solved by addition of api to cman, cpg, and quorum
> > > to retrieve the globally unique ring id for the last configuration
> > > change delivered to the application.
> > > 
> > > If you agree, we can work on the implementation for corosync 1.1.
> > > Adding this to CPG is trivial, not sure about other services.
> > > 
> > > Our policies wrt x.y.z would not be violated with this change.
> > > 
> > > As an example, the API for cpg might look like
> > > 
> > > cpg_ringid_get (handle, &ring_id);
> > > 
> > > Then ring_id could be memcmp'ed in the application.
> > > 
> > > This would retrieve the last ring id delivered to the application (not
> > > the current ring id known to the cpg service).
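
To make the memcmp idea concrete, here is a minimal sketch of how an
application might compare two ring ids.  The struct layout and the name
ring_id_equal are assumptions for illustration only; the thread does not
define the ring id type.  The explicit reserved field is there so the struct
has no implicit padding, which keeps memcmp meaningful.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical ring id layout (not taken from any real header).
 * The reserved field must always be zero so that memcmp only ever
 * compares meaningful bytes. */
struct ring_id {
    uint32_t rep_nodeid;  /* node id of the ring representative (assumed) */
    uint32_t reserved;    /* always zero, replaces implicit padding */
    uint64_t seq;         /* ring sequence number */
};

/* Returns 1 when both ids name the same ring, 0 otherwise. */
int ring_id_equal(const struct ring_id *a, const struct ring_id *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}
```

If the real type turns out to contain padding, a field-by-field comparison
would be the safer choice.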
> 
> Thinking more about this, I think there are two different kinds of ringid
> queries that we'd want from cpg.  That's because every new ringid results in
> a cman/quorum confchg, but not every ringid change results in a cpg confchg.

I describe the specific test that fails and why at the end of this commit
message:

http://git.fedorahosted.org/git/cluster.git?p=cluster.git;a=commitdiff;h=bcc5fdef8473d99399c624a7bc15423a2af645c1


> 
> My understanding is that ringid (actually ringid sequence number) is
> incremented for each new ring (each cluster membership change).
> 
> a. For a given ringid from cpg for a nodedown confchg, need to know that
>    cman/quorum has seen the same nodedown.
> 
> Comparing the ringid of the cpg nodedown confchg and the ringid from cman
> should work for this.  If cman ringid is greater than or equal to the ringid
> of the cpg nodedown confchg, then we know cman is aware of the cpg nodedown.
> cman ringid may be larger if another node has since joined the cluster but not
> the cpg, or if a cluster member failed that was not a member of the cpg.
> 
> b. For a given ringid from cman/quorum, need to know that any confchgs up to
>    that same ringid have been delivered to the cpg.
> 
> These imply two different ringid values for cpg:
> 
> 1. the ringid of the last confchg delivered to the cpg
> 2. the ringid that cpg deliveries are up to date with, which may be greater
>    than the ringid of the last confchg delivered if the latest ring changes
>    have not altered the cpg membership
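
Checks (a) and (b) and the two cpg values can be sketched as a pair of
comparisons on ring sequence numbers.  This is only an illustration: the
struct and function names are hypothetical, and it assumes sequence numbers
only ever increase.

```c
#include <stdint.h>

/* Hypothetical pair of values a cpg ring id query could report. */
struct cpg_ring_state {
    uint64_t last_confchg_seq; /* 1. ring seq of the last confchg delivered */
    uint64_t up_to_date_seq;   /* 2. ring seq that deliveries are current with */
};

/* Check (a): cman has already seen the nodedown behind the cpg confchg. */
int cman_has_seen_cpg_confchg(uint64_t cman_seq,
                              const struct cpg_ring_state *cpg)
{
    return cman_seq >= cpg->last_confchg_seq;
}

/* Check (b): every cpg confchg up to the cman ring has been delivered. */
int cpg_caught_up_with_cman(uint64_t cman_seq,
                            const struct cpg_ring_state *cpg)
{
    return cpg->up_to_date_seq >= cman_seq;
}
```

Check (b) is the reason a single value is not enough: when a ring change
leaves the cpg membership alone, last_confchg_seq stays behind the cman
ringid and only up_to_date_seq can show that nothing is still pending.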
> 
> example
> 
> cluster ringid = 40
> cluster members = 1,2,3,4,5
> cpg members = 1,2,3,4
> 
> node 1 fails
> cluster ringid = 44
> cluster members = 2,3,4,5
> cpg members = 2,3,4
> 
> cman_ringid(h, &id)
>   id = 44
> 
> cpg_ringid(h, &id1, &id2)
>   before the app dispatches the cpg confchg callback
>   id1 = 40, id2 = 40
>   after the app dispatches the cpg confchg callback
>   id1 = 44, id2 = 44
> 
> node 5 fails
> cluster ringid = 48
> cluster members = 2,3,4
> cpg members = 2,3,4
> 
> cman_ringid(h, &id)
>   id = 48
> 
> cpg_ringid(h, &id1, &id2)
>   id1 = 44, id2 = 48
>   (there are no confchgs for the cpg in response to 5 failing)
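
The example above can be replayed with a toy model of the two cpg values.
All names here are hypothetical, and the model makes two assumptions of its
own: both values advance only when the application dispatches, and at most
one ring change is pending at a time.

```c
#include <stdint.h>

/* Toy model of the example; not a real cpg data structure. */
struct cpg_model {
    uint64_t id1;         /* ring seq of the last confchg dispatched */
    uint64_t id2;         /* ring seq that dispatched deliveries are current with */
    uint64_t pending_seq; /* ring seq waiting to be dispatched, 0 if none */
    int pending_confchg;  /* nonzero if the pending ring changed cpg membership */
};

/* A new ring forms; affects_cpg says whether cpg membership changed. */
void model_new_ring(struct cpg_model *m, uint64_t seq, int affects_cpg)
{
    m->pending_seq = seq;
    m->pending_confchg = affects_cpg;
}

/* The application dispatches whatever is pending. */
void model_dispatch(struct cpg_model *m)
{
    if (m->pending_seq == 0)
        return;                     /* nothing to deliver */
    if (m->pending_confchg)
        m->id1 = m->pending_seq;    /* a confchg callback was delivered */
    m->id2 = m->pending_seq;        /* deliveries are current with this ring */
    m->pending_seq = 0;
    m->pending_confchg = 0;
}
```

Starting from id1 = id2 = 40, a ring 44 that removes node 1 from the cpg
moves both values to 44 after dispatch, and a ring 48 that only removes
node 5 moves id2 to 48 while id1 stays at 44.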
> 
> > Turns out that libcman already has a call that returns the ring id, so all I
> > need now is the addition to cpg.
> 
> Chrissie pointed out that libcman only returns the 64-bit ringid as uint32,
> but I doubt we'll see ringids bigger than that.  Even if we do, I'm just
> comparing consecutive ids, so the lower 32 bits should be fine.
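
As a rough illustration of why the lower 32 bits can be enough for ids that
are close together, here is a sketch of a wraparound-tolerant comparison.
It uses serial-number style arithmetic, which is a swapped-in technique and
not something proposed in the thread; the function names are hypothetical,
and it assumes the two full ids are less than 2^31 apart.

```c
#include <stdint.h>

/* Keep only the low 32 bits, as a uint32 return value would. */
uint32_t ringid_low32(uint64_t seq)
{
    return (uint32_t)(seq & 0xffffffffu);
}

/* Returns nonzero when a is at or ahead of b, even if the low 32 bits
 * wrapped between them.  Valid only when the full ids differ by less
 * than 2^31. */
int ringid32_at_or_after(uint32_t a, uint32_t b)
{
    /* Unsigned subtraction wraps modulo 2^32; a small forward distance
     * means a is at or after b. */
    return (uint32_t)(a - b) < 0x80000000u;
}
```

A plain >= on the truncated values would give the wrong answer for the one
comparison that straddles the 32-bit boundary; the subtraction form does not.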
> 
> Dave
> 
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais