25.05.2010 02:01, Steven Dake wrote:
> On 05/24/2010 02:51 PM, Vladislav Bogdanov wrote:
>> Hi all,
>>
>> Sorry for being out of "References", just subscribed.
>>
>>> On Fri, 2010-05-21 at 16:19 +0200, Alain.Moulle wrote:
>>>
>>>> Hi
>>>>
>>>> These new releases of corosync do not start successfully on RHEL5 :
>>>> corosync-1.2.2-1.1.el5
>>>> corosynclib-1.2.2-1.1.el5
>>>> I 've joined the messages trace.
>>>>
>>>> whereas on same machines, these old ones works fine :
>>>> corosync-1.2.1-1.el5
>>>> corosynclib-1.2.1-1.el5
>>>>
>>>> I've reinstalled these old ones and it works fine again.
>>>> And ... I can't test furthermore with the new releases before around 10
>>>> days.
>>>>
>>>> Regards
>>>> Alain
>>>>
>>> Building from sources on rhel5, corosync starts properly. I didn't give
>>> pacemaker a go.
>>>
>>> could you provide more information:
>>> 1) where did you download the corosync rpms
>>> 2) Which version of RHEL are you running
>>>
>>> Then I can look into reproducing
>>>
>> I confirm that both 1.2.2 and 1.2.3 segfault on CentOS 5.5 when
>> pacemaker is enabled (this is critical, corosync alone starts just fine).
>> Tried with both clusterlabs 1.2.2-1.1 RPM and home-brew 1.2.3 RPM.
>>
>> Segfault is originated from exec/logsys.c:760, in strlen(rec->buffer)
>>
>> Can't post gdb output, console buffer is lost yet due to urgent
>> downgrade.
>>
>
> Reproducible?
100%. I should note that the arch is x86_64.

>
> can you run corosync-fplay and send the list the output.

Not much info there.

Starting replay: head [1311] tail [0]
rec=[1] Log Message=Corosync Cluster Engine ('1.2.3'): started and ready to provide service.
rec=[2] Log Message=Corosync built-in features: nss rdma
rec=[3] Log Message=Successfully read main configuration file '/etc/corosync/corosync.conf'.
rec=[4] Log Message=Token Timeout (3000 ms) retransmit timeout (294 ms)
rec=[5] Log Message=token hold (225 ms) retransmits before loss (10 retrans)
rec=[6] Log Message=join (60 ms) send_join (0 ms) consensus (3600 ms) merge (200 ms)
rec=[7] Log Message=downcheck (1000 ms) fail to recv const (50 msgs)
rec=[8] Log Message=seqno unchanged const (30 rotations) Maximum network MTU 1402
rec=[9] Log Message=window size per rotation (50 messages) maximum messages per rotation (20 messages)
rec=[10] Log Message=send threads (0 threads)
rec=[11] Log Message=RRP token expired timeout (294 ms)
rec=[12] Log Message=RRP token problem counter (2000 ms)
rec=[13] Log Message=RRP threshold (10 problem count)
rec=[14] Log Message=RRP mode set to passive.
rec=[15] Log Message=heartbeat_failures_allowed (0)
rec=[16] Log Message=max_network_delay (50 ms)
rec=[17] Log Message=HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
rec=[18] Log Message=Initializing transport (UDP/IP).
rec=[19] Log Message=Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
rec=[20] Log Message=Initializing transport (UDP/IP).
rec=[21] Log Message=Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
rec=[22] Log Message=you are using ipc api v2
rec=[23] Log Message=Receive multicast socket recv buffer size (262142 bytes).
rec=[24] Log Message=Transmit multicast socket send buffer size (262142 bytes).
rec=[25] Log Message=The network interface [10.5.250.2] is now up.
rec=[26] Log Message=Created or loaded sequence id 296.10.5.250.2 for this ring.
rec=[27] Log Message=info: process_ais_conf: Reading configure
rec=[28] Log Message=info: config_find_init: Local handle: 2730409743423111170 for logging
rec=[29] Log Message=info: config_find_next: Processing additional logging options...
rec=[30] Log Message=info: get_config_opt: Found 'off' for option: debug
rec=[31] Log Message=info: get_config_opt: Defaulting to 'off' for option: to_file
rec=[32] Log Message=info: get_config_opt: Found 'yes' for option: to_syslog
rec=[33] Log Message=info: get_config_opt: Defaulting to 'daemon' for option: syslog_facility
rec=[34] Log Message=info: config_find_init: Local handle: 5880381755227111427 for service
rec=[35] Log Message=info: config_find_next: Processing additional service options...
rec=[36] Log Message=info: get_config_opt: Defaulting to 'pcmk' for option: clustername
rec=[37] Log Message=info: get_config_opt: Defaulting to 'no' for option: use_logd
rec=[38] Log Message=info: get_config_opt: Defaulting to 'no' for option: use_mgmtd
rec=[39] Log Message=info: pcmk_startup: CRM: Initialized
rec=[40] Log Message=Logging: Initialized pcmk_startup
rec=[41] Log Message=info: pcmk_startup: Maximum core file size is: 18446744073709551615
rec=[42] Log Message=info: pcmk_startup: Service: 9
Finishing replay: records found [42]

Hmm... I'm wrong. There IS some info.
rec id 42 didn't show on stderr if I enable later.

>
> Please send your conf file.
# cat /etc/corosync/corosync.conf
compatibility: none

totem {
        version: 2
        token: 3000
        token_retransmits_before_loss_const: 10
        join: 60
#       consensus: 1500
#       vsftype: none
        max_messages: 20
        clear_node_high_bit: yes
#       secauth: on
        threads: 0
        rrp_mode: passive
        interface {
                ringnumber: 0
                bindnetaddr: 10.5.250.0
                mcastaddr: 239.94.1.1
                mcastport: 5405
        }
        interface {
                ringnumber: 1
                bindnetaddr: 10.5.4.0
                mcastaddr: 239.94.2.1
                mcastport: 5405
        }
}

logging {
        fileline: off
        to_stderr: no
        to_logfile: no
        to_syslog: yes
        logfile: /tmp/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
        }
}

amf {
        mode: disabled
}

service {
        name: pacemaker
        ver: 0
}

aisexec {
        user: root
        group: root
}

Here is the gdb backtrace:

# stty -tostop
# gdb `which corosync`
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/corosync...Reading symbols from /usr/lib/debug/usr/sbin/corosync.debug...done.
done.
(gdb) set args -f
(gdb) run
Starting program: /usr/sbin/corosync -f
[Thread debugging using libthread_db enabled]
[New Thread 0x40a00940 (LWP 30752)]
[New Thread 0x40a18fe0 (LWP 30753)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x40a00940 (LWP 30752)]
0x00000033812797c0 in strlen () from /lib64/libc.so.6
(gdb) bt
#0  0x00000033812797c0 in strlen () from /lib64/libc.so.6
#1  0x00002aaaaace4c6b in logsys_worker_thread (data=<value optimized out>) at logsys.c:760
#2  0x0000003381a0673d in start_thread () from /lib64/libpthread.so.0
#3  0x00000033812d3d1d in clone () from /lib64/libc.so.6
(gdb) bt full
#0  0x00000033812797c0 in strlen () from /lib64/libc.so.6
No symbol table info available.
#1  0x00002aaaaace4c6b in logsys_worker_thread (data=<value optimized out>) at logsys.c:760
        rec = 0x2aaaaaee5cc8
        dropped = 0
#2  0x0000003381a0673d in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#3  0x00000033812d3d1d in clone () from /lib64/libc.so.6
No symbol table info available.
(gdb) info threads
  3 Thread 0x40a18fe0 (LWP 30753)  0x0000003381a0d48e in __lll_lock_wait_private () from /lib64/libpthread.so.0
* 2 Thread 0x40a00940 (LWP 30752)  0x00000033812797c0 in strlen () from /lib64/libc.so.6
  1 Thread 0x2aaaab0f3a60 (LWP 30749)  0x0000003380a145f2 in strcmp () from /lib64/ld-linux-x86-64.so.2

Best,
Vladislav

>
> Thanks
> -steve
>
>> Best,
>> Vladislav
>> _______________________________________________
>> Openais mailing list
>> Openais@lists.linux-foundation.org
>> https://lists.linux-foundation.org/mailman/listinfo/openais
>>
>