Your firewall may be blocking the ports corosync uses to communicate. Newer versions of corosync include a diagnostic that warns the user when this may be the problem.
The firewall needs to be configured properly if it is enabled (which it
is by default in RHEL/Fedora). In a RHEL environment, this can be done
via the System -> Preferences -> Firewall GUI or by adding your own
iptables rules (a minimal sketch follows at the end of this message).

Regards
-steve

On 02/21/2011 12:56 AM, NAKAHIRA Kazutomo wrote:
> Hi, all
>
> # This problem is related to the following previous subject, and we
> # use the same test environment:
> https://lists.linux-foundation.org/pipermail/openais/2011-February/015673.html
>
> The corosync start process fell into an infinite loop
> in my test environment.
>
> The corosync process wrote many copies of the following log message
> to the debug logfile, and the start-up process stalled.
>
> -- ha-debug --
> Feb 21 15:39:46 node1 corosync[19268]: [TOTEM ] totemsrp.c:1852
> entering GATHER state from 11.
> -- ha-debug --
>
> It seems that all nodes are sending a lot of "join" messages and
> have no way out of the GATHER state.
>
> Is this loop expected operation?
>
> The backtrace of the corosync process is:
> (gdb) bt
> #0  0x00000031bdca6a8d in nanosleep () from /lib64/libc.so.6
> #1  0x00000031bdcda904 in usleep () from /lib64/libc.so.6
> #2  0x000000351ae11245 in memb_join_message_send (instance=0x7f81483aa010)
>     at totemsrp.c:2959
> #3  0x000000351ae13aeb in memb_state_gather_enter (instance=0x7f81483aa010,
>     gather_from=11) at totemsrp.c:1815
> #4  0x000000351ae16e22 in memb_join_process (instance=0x7f81483aa010,
>     memb_join=0x232e6c8) at totemsrp.c:3997
> #5  0x000000351ae175a9 in message_handler_memb_join (instance=0x7f81483aa010,
>     msg=<value optimized out>, msg_len=<value optimized out>,
>     endian_conversion_needed=<value optimized out>) at totemsrp.c:4161
> #6  0x000000351ae0e9a4 in rrp_deliver_fn (context=0x23022e0, msg=0x232e6c8,
>     msg_len=596) at totemrrp.c:1511
> #7  0x000000351ae0b4d6 in net_deliver_fn (handle=<value optimized out>,
>     fd=<value optimized out>, revents=<value optimized out>, data=0x232e020)
>     at totemudp.c:1244
> #8  0x000000351ae07202 in poll_run (handle=1265737887312248832)
>     at coropoll.c:510
> #9  0x0000000000406cfd in main (argc=<value optimized out>,
>     argv=<value optimized out>, envp=<value optimized out>) at main.c:1813
>
> Our test environment is:
> RHEL 6 (kernel 2.6.32-71.14.1.el6.x86_64)
> Corosync-1.3.0-1
> Pacemaker-1.0.10-1
> cluster-glue-1.0.6-1
> resource-agents-1.0.3-1
>
> corosync.conf is:
> -- corosync.conf --
> compatibility: whitetank
>
> aisexec {
>     user: root
>     group: root
> }
>
> service {
>     name: pacemaker
>     ver: 0
> }
>
> totem {
>     version: 2
>     secauth: off
>     rrp_mode: active
>     token: 16000
>     consensus: 20000
>     clear_node_high_bit: yes
>     rrp_problem_count_timeout: 30000
>     fail_recv_const: 50
>     send_join: 10
>     interface {
>         ringnumber: 0
>         bindnetaddr: AAA.BBB.xxx.0
>         mcastaddr: 226.94.1.1
>         mcastport: 5405
>     }
>     interface {
>         ringnumber: 1
>         bindnetaddr: AAA.BBB.yyy.0
>         mcastaddr: 226.94.1.1
>         mcastport: 5405
>     }
> }
>
> logging {
>     fileline: on
>     to_syslog: yes
>     syslog_facility: local1
>     syslog_priority: info
>     debug: on
>     timestamp: on
> }
> -- corosync.conf --
>
> We tried "fail_recv_const: 5000" and it reduced the incidence of the
> problem, but the corosync start-up problem still occurs.
>
> If "send_join: 10" is not set, the flood of multicast packets crowds
> the network and other network communications are blocked.
>
> Best Regards,
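For reference, a minimal iptables sketch for opening the totem ports on
RHEL 6, assuming the mcastport 5405 from the quoted corosync.conf.
Corosync uses two UDP ports, mcastport and mcastport - 1, so both are
opened here; adjust the port range and addresses to your site.

-- iptables --
# Run on every cluster node. UDP 5404-5405 covers both rings, since
# both interface sections use mcastport 5405.
iptables -I INPUT -p udp -m udp --dport 5404:5405 -j ACCEPT
# Allow IGMP so the nodes can join the multicast group 226.94.1.1.
iptables -I INPUT -p igmp -j ACCEPT
# Persist the rules across reboots on RHEL 6.
service iptables save
-- iptables --

The same ports can instead be added as trusted ports in the
System -> Preferences -> Firewall GUI mentioned above.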
