On Thu, Sep 10, 2009 at 02:25:34PM -0500, David Teigland wrote:
> On Wed, Sep 02, 2009 at 09:23:08AM -0500, David Teigland wrote:
> > > > 1. correlating events from different services locally
> > > >
> > > > I get nodedown from both cman (or quorum service) and cpg.  I need to
> > > > correlate them with each other.  When I get a cpg nodedown for node A,
> > > > I don't know which cman nodedown for A it refers to: one of multiple in
> > > > the past or one in the future that cman hasn't reported yet.
> > >
> > > Correlation could be solved by addition of api to cman, cpg, and quorum
> > > to retrieve the globally unique ring id for the last configuration
> > > change delivered to the application.
> > >
> > > If you agree, we can work on the implementation for corosync 1.1.
> > > Adding this to CPG is trivial, not sure about other services.
> > >
> > > Our policies wrt x.y.z would not be violated with this change.
> > >
> > > As an example, the API for cpg might look like
> > >
> > > cpg_ringid_get (handle, &ring_id);
> > >
> > > Then ring_id could be memcmp'ed in the application.
> > >
> > > This would retrieve the last ring id delivered to the application (not
> > > the current ring id known to the cpg service).
>
> Thinking more about this, I think there are two different kinds of ringid
> queries that we'd want from cpg.  It's because all new ringids result in
> cman/quorum confchgs, but not all ringid changes result in cpg confchgs.
> I describe the specific test that fails and why at the end of this commit
> message:
> http://git.fedorahosted.org/git/cluster.git?p=cluster.git;a=commitdiff;h=bcc5fdef8473d99399c624a7bc15423a2af645c1
>
> My understanding is that ringid (actually ringid sequence number) is
> incremented for each new ring (each cluster membership change).
>
> a. For a given ringid from cpg for a nodedown confchg, need to know that
>    cman/quorum has seen the same nodedown.
>
> Comparing the ringid of the cpg nodedown confchg and the ringid from cman
> should work for this.  If cman ringid is greater than or equal to the ringid
> of the cpg nodedown confchg, then we know cman is aware of the cpg nodedown.
> cman ringid may be larger if another node has since joined the cluster but not
> the cpg, or if a cluster member failed that was not a member of the cpg.
>
> b. For a given ringid from cman/quorum, need to know that any confchgs up to
>    that same ringid have been delivered to the cpg.
>
> These imply two different ringid values for cpg:
>
> 1. the ringid of the last confchg delivered to the cpg
> 2. the ringid that cpg deliveries are up to date with, which may be greater
>    than the ringid of the last confchg delivered if the latest ring changes
>    have not altered the cpg membership
>
> example
>
> cluster ringid = 40
> cluster members = 1,2,3,4,5
> cpg members = 1,2,3,4
>
> node 1 fails
> cluster ringid = 44
> cluster members = 2,3,4,5
> cpg members = 2,3,4
>
> cman_ringid(h, &id)
>   id = 44
>
> cpg_ringid(h, &id1, &id2)
>   before the app dispatches the cpg confchg callback
>   id1 = 40, id2 = 40
>   after the app dispatches the cpg confchg callback
>   id1 = 44, id2 = 44
>
> node 5 fails
> cluster ringid = 48
> cluster members = 2,3,4
> cpg members = 2,3,4
>
> cman_ringid(h, &id)
>   id = 48
>
> cpg_ringid(h, &id1, &id2)
>   id1 = 44, id2 = 48
>   (there are no confchgs for the cpg in response to 5 failing)
>
> > Turns out that libcman already has a call that returns the ring id, so all I
> > need now is the addition to cpg.
>
> Chrissie pointed out that libcman only returns the 64 bit ringid as uint32,
> but I doubt we'll see ringids bigger than that.... even if we do I'm just
> comparing consecutive ids so the lower 32 bits should be fine.
>
> Dave
>
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais
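
For reference, checks (a) and (b) from the quoted message can be sketched in C
and replayed against the quoted example. This is only a sketch under stated
assumptions: struct cpg_ringids and every function name below are hypothetical
and are not existing corosync or libcman calls. The wrap-tolerant comparison in
ringid_ge() is one possible way to compare ids that have been truncated to 32
bits; the thread itself only says the lower 32 bits should be fine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result of the proposed two-value cpg query; this is NOT an
 * existing corosync API. */
struct cpg_ringids {
    uint32_t last_confchg; /* id1: ringid of the last confchg delivered */
    uint32_t up_to_date;   /* id2: ringid that cpg deliveries have reached */
};

/* Assumption: wrap-tolerant "a >= b" on truncated 32-bit ringids. It is only
 * meaningful while the two ids are fewer than 2^31 apart. */
bool ringid_ge(uint32_t a, uint32_t b)
{
    return (uint32_t)(a - b) < UINT32_C(0x80000000);
}

/* (a) Has cman/quorum seen the nodedown behind the last cpg confchg?
 * True when the cman ringid is >= the ringid of that confchg. */
bool cman_has_seen(uint32_t cman_ringid, struct cpg_ringids cpg)
{
    return ringid_ge(cman_ringid, cpg.last_confchg);
}

/* (b) Has cpg delivered every confchg up to the given cman ringid?
 * True when the cpg "up to date" ringid is >= the cman ringid. */
bool cpg_caught_up(struct cpg_ringids cpg, uint32_t cman_ringid)
{
    return ringid_ge(cpg.up_to_date, cman_ringid);
}

/* Replays the quoted example: node 1 (a cpg member) fails at ringid 44,
 * then node 5 (not a cpg member) fails at ringid 48. */
void check_example_trace(void)
{
    struct cpg_ringids before = { 40, 40 }; /* confchg not yet dispatched */
    struct cpg_ringids after  = { 44, 44 }; /* confchg dispatched */
    struct cpg_ringids later  = { 44, 48 }; /* node 5 gone, no cpg confchg */

    assert(!cpg_caught_up(before, 44)); /* cpg still behind cman */
    assert(cpg_caught_up(after, 44));
    assert(cman_has_seen(44, after));
    assert(cman_has_seen(48, later));   /* cman ringid may be larger */
    assert(cpg_caught_up(later, 48));   /* id2 advanced without a confchg */
}
```

The last two assertions are the case that motivates a second value: after node
5 fails, id1 stays at 44 while id2 moves to 48, so check (b) can only succeed
if it looks at id2.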
