Hi All,
We have run into another problem.
It occurs only rarely.
Note that it occurred on a version that does not yet include the following
patches:
* [Openais] [PATCH] When a failed to recv state happens,stop forwarding the
token
* [Openais] [PATCH] When a failed to recv state happens,stop forwarding the
token(take 2)
The problem occurred in our environment (corosync 1.3.0 +
pacemaker-1.0.10-1) as follows.
Step 1) corosync forms a cluster of 12 nodes.
* Token-based communication begins.
Step 2) One node reports [FAILED TO RECEIVE].
Step 3) All 12 nodes begin reconfiguring the cluster.
Step 4) The node that reported the failure receives an unexpected COMMIT-TOKEN.
* This node then aborts in assert().
(gdb) up
#1 0x00000031bdc34185 in abort () from /lib64/libc.so.6
(gdb) up
#2 0x00000031bdc2b935 in __assert_fail () from /lib64/libc.so.6
(gdb) up
#3 0x000000313d812326 in memb_state_commit_token_update (instance=
0x7ffda0239010) at totemsrp.c:2780
2780 assert (instance->commit_token->memb_index <=
instance->commit_token->addr_entries);
(gdb) list
2775 }
2776 }
2777
2778 instance->commit_token->header.nodeid =
instance->my_id.addr[0].nodeid;
2779 instance->commit_token->memb_index += 1;
2780 assert (instance->commit_token->memb_index <=
instance->commit_token->addr_entries);
2781 assert (instance->commit_token->header.nodeid);
2782 }
2783
2784 static void memb_state_commit_token_target_set (
(gdb) print instance->commit_token->memb_index
$1 = 12
(gdb) print instance->commit_token->addr_entries
$2 = 11
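To make the numbers above concrete, here is a minimal sketch in C of the bound that the assertion at totemsrp.c:2780 checks. The struct and function names (commit_token_view, commit_token_can_advance) are hypothetical and only mirror the two fields printed in gdb; this is not corosync code and not a proposed fix. With addr_entries = 11, memb_index must have been 11 before the increment on line 2779, so the token had presumably already visited every listed member when this node processed it.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the two commit-token fields
 * inspected in the gdb session; not the real corosync structure. */
struct commit_token_view {
    unsigned int memb_index;   /* incremented once per member that handles the token */
    unsigned int addr_entries; /* number of members listed in the token */
};

/* Returns true when incrementing memb_index by one would still satisfy
 * memb_index <= addr_entries, the condition asserted at totemsrp.c:2780. */
static bool commit_token_can_advance(const struct commit_token_view *t)
{
    return t->memb_index < t->addr_entries;
}
```

For the values in the core (memb_index 11 before the increment, addr_entries 11) this check returns false, which matches the assertion failure seen here.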
The failed node should not be a member that receives the COMMIT-TOKEN,
but it appears to receive it anyway.
Is there any known information about this problem?
Is this a problem (bug?) in how the COMMIT-TOKEN is communicated?
Best Regards,
Hideo Yamauchi.
_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais