[Openais] [Problem] A node that fails and leaves the cluster receives COMMIT-TOKEN.

renayama19661014 Wed, 09 Feb 2011 17:11:13 -0800

Hi All,

We have run into another problem.

This problem occurs only rarely.

Note that it occurred on a version that did not yet have the following patches 
applied:
  * [Openais] [PATCH] When a failed to recv state happens,stop forwarding the token
  * [Openais] [PATCH] When a failed to recv state happens,stop forwarding the token(take 2)

The problem occurred in our environment (corosync 1.3.0 + pacemaker-1.0.10-1) 
as follows.

Step 1) corosync forms a cluster of 12 nodes.
 * The nodes begin communicating via the token.

Step 2) One node raises [FAILED TO RECEIVE].

Step 3) All 12 nodes begin reconfiguring the cluster.

Step 4) The node that caused the trouble then receives an invalid COMMIT-TOKEN.
 * This node then aborts in assert().

(gdb) up
#1  0x00000031bdc34185 in abort () from /lib64/libc.so.6
(gdb) up
#2  0x00000031bdc2b935 in __assert_fail () from /lib64/libc.so.6
(gdb) up
#3  0x000000313d812326 in memb_state_commit_token_update (instance=0x7ffda0239010) at totemsrp.c:2780
2780            assert (instance->commit_token->memb_index <= instance->commit_token->addr_entries);
(gdb) list
2775                    }
2776            }
2777
2778            instance->commit_token->header.nodeid = instance->my_id.addr[0].nodeid;
2779            instance->commit_token->memb_index += 1;
2780            assert (instance->commit_token->memb_index <= instance->commit_token->addr_entries);
2781            assert (instance->commit_token->header.nodeid);
2782    }
2783
2784    static void memb_state_commit_token_target_set (
(gdb) print instance->commit_token->memb_index
$1 = 12
(gdb) print instance->commit_token->addr_entries
$2 = 11


The failed node should be a member that does not receive the COMMIT-TOKEN, 
but it appears to receive it anyway.
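
To make the failing condition concrete, here is a minimal sketch of the 
invariant checked at totemsrp.c:2780. It is not corosync code: the struct 
layout, the members[] array, and the sketch_* names are hypothetical and exist 
only for illustration. It also assumes that the token listed only the 11 
remaining nodes, which the gdb values suggest but do not prove.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the commit token.
 * This is NOT the real corosync structure layout. */
struct sketch_commit_token {
	unsigned int addr_entries;  /* number of members listed in the token */
	unsigned int memb_index;    /* number of members that already updated it */
	unsigned int members[16];   /* node IDs of the listed members (assumed) */
};

/* Returns true if node_id is one of the members listed in the token. */
static bool sketch_token_lists_node(const struct sketch_commit_token *t,
				    unsigned int node_id)
{
	assert(t != NULL);
	for (unsigned int i = 0; i < t->addr_entries; i++) {
		if (t->members[i] == node_id) {
			return true;
		}
	}
	return false;
}

/* Mirrors the increment and check seen at totemsrp.c:2779-2780,
 * but reports a violation by returning false instead of aborting. */
static bool sketch_token_update(struct sketch_commit_token *t)
{
	assert(t != NULL);
	t->memb_index += 1;
	return t->memb_index <= t->addr_entries;
}
```

With addr_entries = 11 and memb_index = 11 (every listed member has already 
updated the token), one more call to sketch_token_update() leaves memb_index 
at 12 and returns false. Those are the values printed by gdb above, which is 
consistent with a twelfth node that is not listed in the token processing it.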

Is there any known information about this problem?
Is this a problem (a bug?) in how the COMMIT-TOKEN is communicated?

Best Regards,
Hideo Yamauchi.


_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais
