Re: [Openais] stale CPG members in confchg callback

Jan Friesse Fri, 26 Feb 2010 01:28:49 -0800

Attached is a better version of the patch, which returns CS_ERR_TRY_AGAIN (rather than ERR_EXISTS).
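
A client that gets the "try again" error back from a join can simply retry after a short pause. Below is a minimal sketch of such a loop, stubbed so that it compiles on its own. The names fake_cpg_join(), join_with_retry() and the SKETCH_* codes are hypothetical stand-ins, not part of the openais/corosync API; a real client would call cpg_join() and sleep between attempts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the library's error codes. */
typedef enum {
	SKETCH_OK,
	SKETCH_ERR_TRY_AGAIN,
	SKETCH_ERR_EXIST
} sketch_error_t;

/* Number of join attempts that will still see the undelivered leave. */
static int pending_leave_rounds = 3;

/*
 * Hypothetical stub standing in for cpg_join(): it reports TRY_AGAIN
 * while the previous leave of this process is still "undelivered".
 */
static sketch_error_t fake_cpg_join(void)
{
	if (pending_leave_rounds > 0) {
		pending_leave_rounds--;
		return SKETCH_ERR_TRY_AGAIN;
	}
	return SKETCH_OK;
}

/*
 * Call join_fn up to max_attempts times, stopping at the first result
 * that is not TRY_AGAIN. The number of calls made is stored in
 * *attempts_used when that pointer is not NULL.
 */
static sketch_error_t join_with_retry(sketch_error_t (*join_fn)(void),
				      int max_attempts, int *attempts_used)
{
	sketch_error_t err = SKETCH_ERR_TRY_AGAIN;
	int i;

	assert(join_fn != NULL && max_attempts > 0);

	for (i = 0; i < max_attempts; i++) {
		err = join_fn();
		if (attempts_used != NULL) {
			*attempts_used = i + 1;
		}
		if (err != SKETCH_ERR_TRY_AGAIN) {
			break;
		}
		/* A real client would pause briefly here before retrying. */
	}
	return err;
}
```

The attempt limit keeps the caller from spinning forever if the leave is never delivered; any other error is passed straight back to the caller.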


Also attached is a backport for whitetank.


BZ for RHEL5 - https://bugzilla.redhat.com/show_bug.cgi?id=568650

Regards,
  Honza

Jan Friesse wrote:

Dietmar,
I think the attached patch will solve your problem. See the comment in the
patch to understand what is happening and why.

I've created BZ https://bugzilla.redhat.com/show_bug.cgi?id=568356

If you feel that your problem is gone, please add a comment to the BZ.

About error 14: this is not really a bug. It can happen when you try to
make another join from the same application and, with this patch, also in
the situation that leads to the described problem.

Regards,
  Honza

Dietmar Maurer wrote:

commit 41a2a23e5da11cfbae054dcd79512fb816c37d30
Author: Jan Friesse <[email protected]>
Date:   Thu Feb 25 15:38:25 2010 +0100

    Cpg join with undelivered leave message
    
    This patch handles the situation where, on one node, one process:
    - joins a cpg
    - does some actions
    - leaves the cpg
    - joins the cpg again
    
    This sequence can, due to a race, end with a broken process_info list.
    
    To solve this problem, one more check is done in
    message_handler_req_lib_cpg_join so if process_info with same pid and
    group as new join request exists, CPG_ERR_TRY_AGAIN is returned.

diff --git a/trunk/services/cpg.c b/trunk/services/cpg.c
index 68bd1ed..af5f6c4 100644
--- a/trunk/services/cpg.c
+++ b/trunk/services/cpg.c
@@ -1067,6 +1067,21 @@ static void message_handler_req_lib_cpg_join (void *conn, const void *message)
 		}
 	}
 
+	/*
+	 * Same check must be done in process info list, because there may be not yet delivered
+	 * leave of client.
+	 */
+	for (iter = process_info_list_head.next; iter != &process_info_list_head; iter = iter->next) {
+		struct process_info *pi = list_entry (iter, struct process_info, list);
+
+		if (pi->nodeid == api->totem_nodeid_get () && pi->pid == req_lib_cpg_join->pid &&
+		    mar_name_compare(&req_lib_cpg_join->group_name, &pi->group) == 0) {
+			/* We have same pid and group name joined -> return error */
+			error = CPG_ERR_TRY_AGAIN;
+			goto response_send;
+		}
+	}
+
 	switch (cpd->cpd_state) {
 	case CPD_STATE_UNJOINED:
 		error = CPG_OK;

commit 2d336506d95914afdaf0ecb7e345a33dac15d212
Author: Jan Friesse <[email protected]>
Date:   Fri Feb 26 10:08:39 2010 +0100

    Allow only one connection per (node, pid, grp)
    
    This patch allows only one connection per (node, pid, grp_name) tuple.
    This means you cannot make more than one connection from one process to
    the same group_name. This is (I hope) how cpg should behave. If you try
    to do that, the CPG_ERR_EXIST error is returned.
    
    Of course, there is no problem with creating:
    - more connections with the same (pid, grp) if nodeid is different
    - more connections with the same (node, grp) if pid is different (for
      example after fork, or two distinct processes)
    - more connections with the same (node, pid) if grp is different
      (connecting one process to several cpgs).
    
    It also handles the situation where, on one node, one process:
    - joins a cpg
    - does some actions
    - leaves the cpg
    - joins the cpg again
    
    This sequence can, due to a race, end with a broken process_info list.
    
    To solve this problem, one more check is done in
    message_handler_req_lib_cpg_join so if process_info with same pid and
    group as new join request exists, CPG_ERR_TRY_AGAIN is returned.

diff --git a/branches/whitetank/exec/cpg.c b/branches/whitetank/exec/cpg.c
index 5323db2..bc68213 100644
--- a/branches/whitetank/exec/cpg.c
+++ b/branches/whitetank/exec/cpg.c
@@ -954,6 +954,35 @@ static void message_handler_req_lib_cpg_join (void *conn, void *message)
 	struct cpg_pd *cpd = (struct cpg_pd *)openais_conn_private_data_get (conn);
 	struct res_lib_cpg_join res_lib_cpg_join;
 	SaAisErrorT error = CPG_OK;
+	struct list_head *iter;
+
+	/* Test, if we don't have same pid and group name joined */
+	for (iter = cpg_pd_list_head.next; iter != &cpg_pd_list_head; iter = iter->next) {
+		struct cpg_pd *cpd_item = list_entry (iter, struct cpg_pd, list);
+
+		if (cpd_item->pid == req_lib_cpg_join->pid &&
+			mar_name_compare(&req_lib_cpg_join->group_name, &cpd_item->group_name) == 0) {
+
+			/* We have same pid and group name joined -> return error */
+			error = CPG_ERR_EXIST;
+			goto response_send;
+		}
+	}
+
+	/*
+	 * Same check must be done in process info list, because there may be not yet delivered
+	 * leave of client.
+	 */
+	for (iter = process_info_list_head.next; iter != &process_info_list_head; iter = iter->next) {
+		struct process_info *pi = list_entry (iter, struct process_info, list);
+
+		if (pi->nodeid == totempg_my_nodeid_get () && pi->pid == req_lib_cpg_join->pid &&
+		    mar_name_compare(&req_lib_cpg_join->group_name, &pi->group) == 0) {
+			/* We have same pid and group name joined -> return error */
+			error = CPG_ERR_TRY_AGAIN;
+			goto response_send;
+		}
+	}
 
 	switch (cpd->cpd_state) {
 	case CPD_STATE_UNJOINED:
@@ -978,6 +1007,7 @@ static void message_handler_req_lib_cpg_join (void *conn, void *message)
 		break;
 	}
 
+response_send:
 	res_lib_cpg_join.header.size = sizeof(res_lib_cpg_join);
         res_lib_cpg_join.header.id = MESSAGE_RES_CPG_JOIN;
         res_lib_cpg_join.header.error = error;
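
The two lookups added by the whitetank patch above can be modelled outside the daemon. The sketch below is a simplified illustration only: it uses plain arrays and strcmp() instead of the daemon's linked lists and mar_name_compare(), and the names check_join(), conn_entry, proc_entry and JOIN_* are hypothetical, not taken from cpg.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes standing in for the CPG error values. */
enum join_result { JOIN_OK, JOIN_ERR_EXIST, JOIN_ERR_TRY_AGAIN };

/* Simplified model of a local library connection (cpg_pd in the patch). */
struct conn_entry {
	unsigned int pid;
	const char *group;
};

/* Simplified model of a cluster-wide member entry (process_info). */
struct proc_entry {
	unsigned int nodeid;
	unsigned int pid;
	const char *group;
};

/*
 * Decide whether a join request for (local_nodeid, pid, group) is allowed.
 * First check: an existing local connection with the same pid and group
 * means the process is already joined, so report EXIST.
 * Second check: a member entry for the same (node, pid, group) that is
 * still present means an earlier leave has not been delivered yet, so
 * report TRY_AGAIN.
 */
static enum join_result check_join(const struct conn_entry *conns, size_t n_conns,
				   const struct proc_entry *procs, size_t n_procs,
				   unsigned int local_nodeid, unsigned int pid,
				   const char *group)
{
	size_t i;

	assert(group != NULL);

	for (i = 0; i < n_conns; i++) {
		if (conns[i].pid == pid && strcmp(conns[i].group, group) == 0) {
			return JOIN_ERR_EXIST;
		}
	}

	for (i = 0; i < n_procs; i++) {
		if (procs[i].nodeid == local_nodeid && procs[i].pid == pid &&
		    strcmp(procs[i].group, group) == 0) {
			return JOIN_ERR_TRY_AGAIN;
		}
	}

	return JOIN_OK;
}
```

An entry with the same pid and group on a different node does not block the join, which matches the (node, pid, grp) rule described in the commit message.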

