On Wed, Sep 02, 2009 at 09:23:08AM -0500, David Teigland wrote:
> > > 1. correlating events from different services locally
> > >
> > > I get nodedown from both cman (or quorum service) and cpg.  I need to
> > > correlate them with each other.  When I get a cpg nodedown for node A,
> > > I don't know which cman nodedown for A it refers to: one of multiple in
> > > the past or one in the future that cman hasn't reported yet.
> > >
> >
> > Correlation could be solved by addition of api to cman, cpg, and quorum
> > to retrieve the globally unique ring id for the last configuration
> > change delivered to the application.
> >
> > If you agree, we can work on the implementation for corosync 1.1.
> > Adding this to CPG is trivial, not sure about other services.
> >
> > Our policies wrt x.y.z would not be violated with this change.
> >
> > As an example, the API for cpg might look like
> >
> > cpg_ringid_get (handle, &ring_id);
> >
> > Then ring_id could be memcmp'ed in the application.
> >
> > This would retrieve the last ring id delivered to the application (not
> > the current ring id known to the cpg service).

Thinking more about this, I think there are two different kinds of ringid
queries that we'd want from cpg.  That's because all new ringids result in
cman/quorum confchgs, but not all ringid changes result in cpg confchgs.  My
understanding is that the ringid (actually the ringid sequence number) is
incremented for each new ring (each cluster membership change).

a. For a given ringid from cpg for a nodedown confchg, need to know that
   cman/quorum has seen the same nodedown.

   Comparing the ringid of the cpg nodedown confchg and the ringid from cman
   should work for this.  If the cman ringid is greater than or equal to the
   ringid of the cpg nodedown confchg, then we know cman is aware of the cpg
   nodedown.  The cman ringid may be larger if another node has since joined
   the cluster but not the cpg, or if a cluster member failed that was not a
   member of the cpg.

b. For a given ringid from cman/quorum, need to know that any confchgs up to
   that same ringid have been delivered to the cpg.

These imply two different ringid values for cpg:

1. the ringid of the last confchg delivered to the cpg

2. the ringid that cpg deliveries are up to date with, which may be greater
   than the ringid of the last confchg delivered if the latest ring changes
   have not altered the cpg membership

example

  cluster ringid = 40
  cluster members = 1,2,3,4,5
  cpg members = 1,2,3,4

  node 1 fails

  cluster ringid = 44
  cluster members = 2,3,4,5
  cpg members = 2,3,4

  cman_ringid(h, &id)
  id = 44

  cpg_ringid(h, &id1, &id2)
  before the app dispatches the cpg confchg callback
  id1 = 40, id2 = 40
  after the app dispatches the cpg confchg callback
  id1 = 44, id2 = 44

  node 5 fails

  cluster ringid = 48
  cluster members = 2,3,4
  cpg members = 2,3,4

  cman_ringid(h, &id)
  id = 48

  cpg_ringid(h, &id1, &id2)
  id1 = 44, id2 = 48
  (there are no confchgs for the cpg in response to 5 failing)

> Turns out that libcman already has a call that returns the ring id, so all I
> need now is the addition to cpg.

Chrissie pointed out that libcman only returns the 64 bit ringid as uint32,
but I doubt we'll see ringids bigger than that.  Even if we do, I'm just
comparing consecutive ids, so the lower 32 bits should be fine.

Dave
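
For illustration, checks (a) and (b) could be sketched in C roughly as below.
This is only a sketch under assumptions: struct cpg_ringids and the functions
seq32_ge, cman_has_seen and cpg_caught_up are hypothetical names invented
here, not part of libcpg or libcman, and the ids are assumed to be plain
sequence numbers that are never more than 2^31 apart, so comparing only the
lower 32 bits stays valid even across a wrap.

```c
#include <stdint.h>

/* Hypothetical holder for the two values the proposed cpg_ringid() call
 * would return; not an existing corosync type. */
struct cpg_ringids {
    uint64_t last_confchg; /* id1: ringid of the last confchg delivered to the cpg */
    uint64_t up_to_date;   /* id2: ringid that cpg deliveries are up to date with */
};

/* "a >= b" on the lower 32 bits only.  The unsigned subtraction wraps, so
 * the result is correct as long as the two ids differ by less than 2^31. */
static int seq32_ge(uint32_t a, uint32_t b)
{
    return (uint32_t)(a - b) < 0x80000000u;
}

/* Check (a): cman/quorum has seen the ring change behind a cpg nodedown
 * confchg if its ringid is at least the ringid of that confchg. */
static int cman_has_seen(uint64_t cman_ringid, uint64_t cpg_confchg_ringid)
{
    return seq32_ge((uint32_t)cman_ringid, (uint32_t)cpg_confchg_ringid);
}

/* Check (b): all cpg confchgs up to a given cman ringid have been delivered
 * once the cpg "up to date" ringid has reached it. */
static int cpg_caught_up(const struct cpg_ringids *ids, uint64_t cman_ringid)
{
    return seq32_ge((uint32_t)ids->up_to_date, (uint32_t)cman_ringid);
}
```

With the values from the example, after node 5 fails (id1 = 44, id2 = 48,
cman id = 48) both checks return 1.  Before the app dispatches the confchg
for node 1 failing (id1 = id2 = 40, cman id = 44), check (b) returns 0, so
the app knows a cpg confchg may still be pending.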
