Hi Steven,

Thank you for your speedy response.
I use iptables, but it has no DROP/REJECT rules in the INPUT and OUTPUT chains.
My iptables settings are below:

[root@test1 ~]# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     udp  --  anywhere             anywhere            udp dpt:domain
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:domain
ACCEPT     udp  --  anywhere             anywhere            udp dpt:bootps
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:bootps

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  anywhere             192.168.xxx.0/24    state RELATED,ESTABLISHED
ACCEPT     all  --  192.168.xxx.0/24     anywhere
ACCEPT     all  --  anywhere             anywhere
REJECT     all  --  anywhere             anywhere            reject-with icmp-port-unreachable
REJECT     all  --  anywhere             anywhere            reject-with icmp-port-unreachable
ACCEPT     all  --  anywhere             anywhere            PHYSDEV match --physdev-is-bridged

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Are there any problems with this?

BTW, I am using Corosync 1.3 on RHEL6 with a 12-node cluster.
Does anyone have a good track record running Corosync 1.3 on RHEL6
in a large-scale cluster?

Best Regards,

(2011/02/22 3:08), Steven Dake wrote:
> Your firewall may be enabled for the ports corosync uses to communicate.
> Newer versions of corosync have a diagnostic that tells the user this may
> be a problem for them.
>
> The firewall needs to be configured properly if it is enabled (which it
> is by default in RHEL/Fedora). In a RHEL environment, this can be done
> via the System->Preferences->Firewall GUI or by adding your own iptables rules.
>
> Regards
> -steve
>
> On 02/21/2011 12:56 AM, NAKAHIRA Kazutomo wrote:
>> Hi, all
>>
>> # This problem is related to the following previous thread, and we use
>> the same test environment.
>> https://lists.linux-foundation.org/pipermail/openais/2011-February/015673.html
>>
>> The corosync start-up process fell into an infinite loop
>> in my test environment.
>>
>> The corosync process wrote a lot of the following log messages to the
>> debug logfile, and the start-up process stalled.
>>
>> -- ha-debug --
>> Feb 21 15:39:46 node1 corosync[19268]: [TOTEM ] totemsrp.c:1852
>> entering GATHER state from 11.
>> -- ha-debug --
>>
>> It seems that all nodes are sending a lot of "join messages" and
>> they have no way out of the GATHER state.
>>
>> Is this loop expected behavior?
>>
>> The backtrace of the corosync process is:
>> (gdb) bt
>> #0  0x00000031bdca6a8d in nanosleep () from /lib64/libc.so.6
>> #1  0x00000031bdcda904 in usleep () from /lib64/libc.so.6
>> #2  0x000000351ae11245 in memb_join_message_send (instance=0x7f81483aa010)
>>     at totemsrp.c:2959
>> #3  0x000000351ae13aeb in memb_state_gather_enter (instance=0x7f81483aa010,
>>     gather_from=11) at totemsrp.c:1815
>> #4  0x000000351ae16e22 in memb_join_process (instance=0x7f81483aa010,
>>     memb_join=0x232e6c8) at totemsrp.c:3997
>> #5  0x000000351ae175a9 in message_handler_memb_join
>>     (instance=0x7f81483aa010,
>>     msg=<value optimized out>, msg_len=<value optimized out>,
>>     endian_conversion_needed=<value optimized out>) at totemsrp.c:4161
>> #6  0x000000351ae0e9a4 in rrp_deliver_fn (context=0x23022e0, msg=0x232e6c8,
>>     msg_len=596) at totemrrp.c:1511
>> #7  0x000000351ae0b4d6 in net_deliver_fn (handle=<value optimized out>,
>>     fd=<value optimized out>, revents=<value optimized out>, data=0x232e020)
>>     at totemudp.c:1244
>> #8  0x000000351ae07202 in poll_run (handle=1265737887312248832)
>>     at coropoll.c:510
>> #9  0x0000000000406cfd in main (argc=<value optimized out>,
>>     argv=<value optimized out>, envp=<value optimized out>) at main.c:1813
>>
>>
>> Our test environment is:
>> RHEL6 (kernel 2.6.32-71.14.1.el6.x86_64)
>> Corosync-1.3.0-1
>> Pacemaker-1.0.10-1
>> cluster-glue-1.0.6-1
>> resource-agents-1.0.3-1
>>
>>
>> Our corosync.conf is:
>> -- corosync.conf --
>> compatibility: whitetank
>>
>> aisexec {
>>     user: root
>>     group: root
>> }
>>
>> service {
>>     name: pacemaker
>>     ver: 0
>> }
>>
>> totem {
>>     version: 2
>>     secauth: off
>>     rrp_mode: active
>>     token: 16000
>>     consensus: 20000
>>     clear_node_high_bit: yes
>>     rrp_problem_count_timeout: 30000
>>     fail_recv_const: 50
>>     send_join: 10
>>     interface {
>>         ringnumber: 0
>>         bindnetaddr: AAA.BBB.xxx.0
>>         mcastaddr: 226.94.1.1
>>         mcastport: 5405
>>     }
>>     interface {
>>         ringnumber: 1
>>         bindnetaddr: AAA.BBB.yyy.0
>>         mcastaddr: 226.94.1.1
>>         mcastport: 5405
>>     }
>> }
>>
>> logging {
>>     fileline: on
>>     to_syslog: yes
>>     syslog_facility: local1
>>     syslog_priority: info
>>     debug: on
>>     timestamp: on
>> }
>> -- corosync.conf --
>>
>> We tried "fail_recv_const: 5000" and it reduced how often the problem
>> occurs, but the corosync start-up problem still happens.
>>
>> If "send_join: 10" is not set, the flood of multicast packets congests
>> the network and blocks other network communication.
>>
>>
>> Best Regards,
>>
>

-- 
NAKAHIRA Kazutomo
Infrastructure Software Technology Unit
NTT Open Source Software Center

_______________________________________________
Openais mailing list
https://lists.linux-foundation.org/mailman/listinfo/openais
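For anyone repeating the firewall check discussed in this thread, here is a minimal Python sketch, assuming the default non-verbose `iptables -L` layout shown above. It summarizes each chain's policy and any DROP/REJECT rules, and lists the UDP ports that corosync.conf(5) documents for a given mcastport (mcastport for multicast receives, mcastport - 1 for sends, i.e. 5405 and 5404 with the configuration above). The helper names `summarize_chains` and `corosync_udp_ports` are hypothetical. The sketch does not follow jumps into user-defined chains, and note that FORWARD-chain rules only apply to routed packets, not to traffic addressed to the host itself.

```python
import re
from typing import Dict, List, Optional

# Matches built-in chain headers ("Chain INPUT (policy ACCEPT)") and
# user-defined ones ("Chain foo (1 references)"), which have no policy.
CHAIN_RE = re.compile(r"^Chain (\S+) \((.*)\)")


def summarize_chains(listing: str) -> Dict[str, dict]:
    """Hypothetical helper: map chain name -> {"policy", "blocking"}.

    "blocking" collects rule lines whose target is DROP or REJECT.
    Jumps into user-defined chains are NOT followed (sketch limitation).
    """
    chains: Dict[str, dict] = {}
    current: Optional[str] = None
    for line in listing.splitlines():
        header = CHAIN_RE.match(line)
        if header:
            current = header.group(1)
            detail = header.group(2)
            policy = detail.split()[1] if detail.startswith("policy ") else None
            chains[current] = {"policy": policy, "blocking": []}
            continue
        fields = line.split()
        # Rule lines start with the target; the "target prot ..." header does not match.
        if current is not None and fields and fields[0] in ("DROP", "REJECT"):
            chains[current]["blocking"].append(line.strip())
    return chains


def corosync_udp_ports(mcastport: int) -> List[int]:
    """Per corosync.conf(5): mcastport - 1 (sends) and mcastport (receives)."""
    return [mcastport - 1, mcastport]


if __name__ == "__main__":
    sample = """Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     udp  --  anywhere             anywhere            udp dpt:domain
Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
REJECT     all  --  anywhere             anywhere            reject-with icmp-port-unreachable
Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
"""
    for name, info in summarize_chains(sample).items():
        print(name, info["policy"], len(info["blocking"]))
    print(corosync_udp_ports(5405))
```

Applied to the listing posted above, INPUT and OUTPUT would report an ACCEPT policy with no blocking rules, which matches the poster's reading; only FORWARD contains REJECT rules.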
