Hi Steve,

On Wed, Jan 26, 2011 at 6:53 PM, Steven Dake <[email protected]> wrote:

> Gather from state 3 back to back is an indicator that iptables are not
> properly configured on the node.  I know you said iptables are turned
> off, but if iptables are off, the node would at least form a singleton
> ring.
>
> Could you send your config file?
>

I haven't read about the MCP deployment model that you mentioned in your
previous email. I'd like to know more, so if you can point me to the right
documentation I'd appreciate it.

Here are the config file, the iptables output, and the output of
corosync-cfgtool -s and corosync-fplay: http://pastebin.com/bChQZgaE
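
In case it helps while the paste is up: the totem section in that config uses
the UDPU transport from corosync 1.3, with RRP in active mode (hence the two
"Initializing transport (UDP/IP Unicast)" lines in the fplay output). A
minimal sketch of the relevant part, reduced to a single ring, looks roughly
like this (the bindnetaddr and the second member address are my assumptions
based on the 10.0.2.11 interface in the logs, not copied verbatim from the
file):

    totem {
        version: 2
        transport: udpu
        interface {
            ringnumber: 0
            bindnetaddr: 10.0.2.0
            mcastport: 5405
            member {
                memberaddr: 10.0.2.11
            }
            member {
                memberaddr: 10.0.2.12
            }
        }
    }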

Let me know if there's anything else that I can provide.
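
For reference, the outputs in that paste were gathered along these lines (a
rough sketch rather than the literal shell history; the corosync-objctl line
is just an extra check I can run, on the assumption that the object database
exposes runtime membership on 1.3):

    # firewall state (should be empty / ACCEPT on all chains)
    iptables -L -n -v
    # local node id and ring status as corosync sees it
    corosync-cfgtool -s
    # dumps the flight recorder data via corosync-fplay
    corosync-blackbox
    # object database dump, looking for runtime member entries
    corosync-objctl | grep -i member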

Regards,
Dan


> Regards
> -steve
>
>
> On 01/26/2011 09:01 AM, Dan Frincu wrote:
> > Update: increased verbosity to debug and I get the following
> >
> > Jan 26 16:36:45 corosync [TOTEM ] The consensus timeout expired.
> > Jan 26 16:36:45 corosync [TOTEM ] entering GATHER state from 3.
> > Jan 26 16:36:53 corosync [TOTEM ] The consensus timeout expired.
> > Jan 26 16:36:53 corosync [TOTEM ] entering GATHER state from 3.
> > Jan 26 16:37:00 corosync [TOTEM ] The consensus timeout expired.
> > Jan 26 16:37:00 corosync [TOTEM ] entering GATHER state from 3.
> > Jan 26 16:37:08 corosync [TOTEM ] The consensus timeout expired.
> > Jan 26 16:37:08 corosync [TOTEM ] entering GATHER state from 3.
> > Jan 26 16:37:12 cluster1 crmd: [16266]: ERROR: crm_timer_popped:
> > Integration Timer (I_INTEGRATED) just popped!
> > Jan 26 16:37:12 cluster1 crmd: [16266]: info: crm_timer_popped:
> > Welcomed: 1, Integrated: 0
> > Jan 26 16:37:12 cluster1 crmd: [16266]: debug: s_crmd_fsa: Processing
> > I_INTEGRATED: [ state=S_INTEGRATION cause=C_TIMER_POPPED
> > origin=crm_timer_popped ]
> > Jan 26 16:37:12 cluster1 crmd: [16266]: info: do_state_transition: State
> > transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED
> > cause=C_TIMER_POPPED origin=crm_timer_popped ]
> > Jan 26 16:37:12 cluster1 crmd: [16266]: WARN: do_state_transition:
> > Progressed to state S_FINALIZE_JOIN after C_TIMER_POPPED
> > Jan 26 16:37:12 cluster1 crmd: [16266]: WARN: do_state_transition: 1
> > cluster nodes failed to respond to the join offer.
> > Jan 26 16:37:12 cluster1 crmd: [16266]: info: ghash_print_node:
> > Welcome reply not received from: cluster1 42
> > Jan 26 16:37:12 cluster1 crmd: [16266]: debug: do_fsa_action:
> > actions:trace:    // A_DC_TIMER_STOP
> > Jan 26 16:37:12 cluster1 crmd: [16266]: debug: do_fsa_action:
> > actions:trace:    // A_INTEGRATE_TIMER_STOP
> > Jan 26 16:37:12 cluster1 crmd: [16266]: debug: do_fsa_action:
> > actions:trace:    // A_FINALIZE_TIMER_START
> > Jan 26 16:37:12 cluster1 crmd: [16266]: debug: crm_timer_start: Started
> > Finalization Timer (I_ELECTION:1800000ms), src=102
> > Jan 26 16:37:12 cluster1 crmd: [16266]: debug: do_fsa_action:
> > actions:trace:    // A_DC_JOIN_FINALIZE
> > Jan 26 16:37:12 cluster1 crmd: [16266]: debug: do_dc_join_finalize:
> > Finializing join-42 for 0 clients
> > Jan 26 16:37:12 cluster1 crmd: [16266]: debug: s_crmd_fsa: Processing
> > I_ELECTION_DC: [ state=S_FINALIZE_JOIN cause=C_FSA_INTERNAL
> > origin=do_dc_join_finalize ]
> > Jan 26 16:37:12 cluster1 crmd: [16266]: debug: do_fsa_action:
> > actions:trace:    // A_WARN
> > Jan 26 16:37:12 cluster1 crmd: [16266]: WARN: do_log: FSA: Input
> > I_ELECTION_DC from do_dc_join_finalize() received in state
> S_FINALIZE_JOIN
> > Jan 26 16:37:12 cluster1 crmd: [16266]: info: do_state_transition: State
> > transition S_FINALIZE_JOIN -> S_INTEGRATION [ input=I_ELECTION_DC
> > cause=C_FSA_INTERNAL origin=do_dc_join_finalize ]
> > Jan 26 16:37:12 cluster1 crmd: [16266]: debug: do_fsa_action:
> > actions:trace:    // A_DC_TIMER_STOP
> > Jan 26 16:37:12 cluster1 crmd: [16266]: debug: do_fsa_action:
> > actions:trace:    // A_INTEGRATE_TIMER_START
> > Jan 26 16:37:12 cluster1 crmd: [16266]: debug: crm_timer_start: Started
> > Integration Timer (I_INTEGRATED:180000ms), src=103
> > Jan 26 16:37:12 cluster1 crmd: [16266]: debug: do_fsa_action:
> > actions:trace:    // A_FINALIZE_TIMER_STOP
> > Jan 26 16:37:12 cluster1 crmd: [16266]: debug: do_fsa_action:
> > actions:trace:    // A_ELECTION_VOTE
> > Jan 26 16:37:12 corosync [TOTEM ] mcasted message added to pending queue
> > Jan 26 16:37:12 cluster1 crmd: [16266]: debug: do_election_vote: Started
> > election 44
> > Jan 26 16:37:12 cluster1 crmd: [16266]: debug: do_fsa_action:
> > actions:trace:    // A_DC_JOIN_OFFER_ALL
> > Jan 26 16:37:12 cluster1 crmd: [16266]: debug: initialize_join: join-43:
> > Initializing join data (flag=true)
> > Jan 26 16:37:12 cluster1 crmd: [16266]: debug: join_make_offer: join-43:
> > Sending offer to cluster1
> > Jan 26 16:37:12 cluster1 crmd: [16266]: info: do_dc_join_offer_all:
> > join-43: Waiting on 1 outstanding join acks
> > Jan 26 16:37:15 corosync [TOTEM ] The consensus timeout expired.
> > Jan 26 16:37:16 corosync [TOTEM ] entering GATHER state from 3.
> > Jan 26 16:37:23 corosync [TOTEM ] The consensus timeout expired.
> > Jan 26 16:37:23 corosync [TOTEM ] entering GATHER state from 3.
> > Jan 26 16:37:31 corosync [TOTEM ] The consensus timeout expired.
> >
> > Running corosync-blackbox gives me:
> >
> > # corosync-blackbox
> > Starting replay: head [67420] tail [0]
> > rec=[1] Log Message=Corosync Cluster Engine ('1.3.0'): started and ready
> > to provide service.
> > rec=[2] Log Message=Corosync built-in features: nss rdma
> > rec=[3] Log Message=Successfully read main configuration file
> > '/etc/corosync/corosync.conf'.
> > rec=[4] Log Message=Token Timeout (5000 ms) retransmit timeout (490 ms)
> > rec=[5] Log Message=token hold (382 ms) retransmits before loss (10
> retrans)
> > rec=[6] Log Message=join (1000 ms) send_join (45 ms) consensus (2500 ms)
> > merge (200 ms)
> > rec=[7] Log Message=downcheck (1000 ms) fail to recv const (50 msgs)
> > rec=[8] Log Message=seqno unchanged const (30 rotations) Maximum network
> > MTU 1402
> > rec=[9] Log Message=window size per rotation (50 messages) maximum
> > messages per rotation (25 messages)
> > rec=[10] Log Message=send threads (0 threads)
> > rec=[11] Log Message=RRP token expired timeout (490 ms)
> > rec=[12] Log Message=RRP token problem counter (2000 ms)
> > rec=[13] Log Message=RRP threshold (10 problem count)
> > rec=[14] Log Message=RRP mode set to active.
> > rec=[15] Log Message=heartbeat_failures_allowed (0)
> > rec=[16] Log Message=max_network_delay (50 ms)
> > rec=[17] Log Message=HeartBeat is Disabled. To enable set
> > heartbeat_failures_allowed > 0
> > rec=[18] Log Message=Initializing transport (UDP/IP Unicast).
> > rec=[19] Log Message=Initializing transmit/receive security: libtomcrypt
> > SOBER128/SHA1HMAC (mode 0).
> > rec=[20] Log Message=Initializing transport (UDP/IP Unicast).
> > rec=[21] Log Message=Initializing transmit/receive security: libtomcrypt
> > SOBER128/SHA1HMAC (mode 0).
> > rec=[22] Log Message=you are using ipc api v2
> > rec=[23] Log Message=The network interface [10.0.2.11] is now up.
> > rec=[24] Log Message=Created or loaded sequence id 0.10.0.2.11 for this
> > ring.
> > rec=[25] Log Message=debug: pcmk_user_lookup: Cluster user root has
> > uid=0 gid=0
> > rec=[26] Log Message=info: process_ais_conf: Reading configure
> > rec=[27] Log Message=info: config_find_init: Local handle:
> > 4552499517957603332 for logging
> > rec=[28] Log Message=info: config_find_next: Processing additional
> > logging options...
> > rec=[29] Log Message=info: get_config_opt: Found 'on' for option: debug
> > rec=[30] Log Message=info: get_config_opt: Found 'yes' for option:
> > to_logfile
> > rec=[31] Log Message=info: get_config_opt: Found
> > '/var/log/cluster/corosync.log' for option: logfile
> > rec=[32] Log Message=info: get_config_opt: Found 'no' for option:
> to_syslog
> > rec=[33] Log Message=info: process_ais_conf: User configured file based
> > logging and explicitly disabled syslog.
> > rec=[34] Log Message=info: config_find_init: Local handle:
> > 8972265949260414981 for service
> > rec=[35] Log Message=info: config_find_next: Processing additional
> > service options...
> > rec=[36] Log Message=info: get_config_opt: Defaulting to 'pcmk' for
> > option: clustername
> > rec=[37] Log Message=info: get_config_opt: Found 'no' for option:
> use_logd
> > rec=[38] Log Message=info: get_config_opt: Found 'yes' for option:
> use_mgmtd
> > rec=[39] Log Message=info: pcmk_startup: CRM: Initialized
> > rec=[40] Log Message=Logging: Initialized pcmk_startup
> > rec=[41] Log Message=info: pcmk_startup: Maximum core file size is:
> > 18446744073709551615
> > rec=[42] Log Message=debug: pcmk_user_lookup: Cluster user hacluster has
> > uid=101 gid=102
> > rec=[43] Log Message=info: pcmk_startup: Service: 9
> > rec=[44] Log Message=info: pcmk_startup: Local hostname: cluster1
> > rec=[45] Log Message=info: pcmk_update_nodeid: Local node id: 184680458
> > rec=[46] Log Message=info: update_member: Creating entry for node
> > 184680458 born on 0
> > rec=[47] Log Message=info: update_member: 0x5fe73e0 Node 184680458 now
> > known as cluster1 (was: (null))
> > rec=[48] Log Message=info: update_member: Node cluster1 now has 1 quorum
> > votes (was 0)
> > rec=[49] Log Message=info: update_member: Node 184680458/cluster1 is
> > now: member
> > rec=[50] Log Message=info: spawn_child: Forked child 16261 for process
> > stonithd
> > rec=[51] Log Message=debug: pcmk_user_lookup: Cluster user hacluster has
> > uid=101 gid=102
> > rec=[52] Log Message=info: spawn_child: Forked child 16262 for process
> cib
> > rec=[53] Log Message=info: spawn_child: Forked child 16263 for process
> lrmd
> > rec=[54] Log Message=debug: pcmk_user_lookup: Cluster user hacluster has
> > uid=101 gid=102
> > rec=[55] Log Message=info: spawn_child: Forked child 16264 for process
> attrd
> > rec=[56] Log Message=debug: pcmk_user_lookup: Cluster user hacluster has
> > uid=101 gid=102
> > rec=[57] Log Message=info: spawn_child: Forked child 16265 for process
> > pengine
> > rec=[58] Log Message=debug: pcmk_user_lookup: Cluster user hacluster has
> > uid=101 gid=102
> > rec=[59] Log Message=info: spawn_child: Forked child 16266 for process
> crmd
> > *rec=[60] Log Message=spawn_child: FATAL: Cannot exec
> > /usr/lib64/heartbeat/mgmtd: (2) No such file or directory*
> > *** buffer overflow detected ***: corosync-fplay terminated
> > ======= Backtrace: =========
> > /lib64/libc.so.6(__chk_fail+0x2f)[0x37a72e6c2f]
> > corosync-fplay[0x400c0b]
> > /lib64/libc.so.6(__libc_start_main+0xf4)[0x37a721d974]
> > corosync-fplay[0x4008d9]
> > ======= Memory map: ========
> > 00400000-00402000 r-xp 00000000 08:05 1041788
> >  /usr/sbin/corosync-fplay
> > 00602000-00603000 rw-p 00002000 08:05 1041788
> >  /usr/sbin/corosync-fplay
> > 00603000-0060d000 rw-p 00603000 00:00 0
> > 0af45000-0af66000 rw-p 0af45000 00:00 0
> >  [heap]
> > 3190400000-319040d000 r-xp 00000000 08:02 911558
> > /lib64/libgcc_s-4.1.2-20080825.so.1
> > 319040d000-319060d000 ---p 0000d000 08:02 911558
> > /lib64/libgcc_s-4.1.2-20080825.so.1
> > 319060d000-319060e000 rw-p 0000d000 08:02 911558
> > /lib64/libgcc_s-4.1.2-20080825.so.1
> > 37a6e00000-37a6e1c000 r-xp 00000000 08:02 911525
> > /lib64/ld-2.5.so
> > 37a701b000-37a701c000 r--p 0001b000 08:02 911525
> > /lib64/ld-2.5.so
> > 37a701c000-37a701d000 rw-p 0001c000 08:02 911525
> > /lib64/ld-2.5.so
> > 37a7200000-37a734c000 r-xp 00000000 08:02 911526
> > /lib64/libc-2.5.so
> > 37a734c000-37a754c000 ---p 0014c000 08:02 911526
> > /lib64/libc-2.5.so
> > 37a754c000-37a7550000 r--p 0014c000 08:02 911526
> > /lib64/libc-2.5.so
> > 37a7550000-37a7551000 rw-p 00150000 08:02 911526
> > /lib64/libc-2.5.so
> > 37a7551000-37a7556000 rw-p 37a7551000 00:00 0
> > 37a7600000-37a7602000 r-xp 00000000 08:02 911527
> > /lib64/libdl-2.5.so
> > 37a7602000-37a7802000 ---p 00002000 08:02 911527
> > /lib64/libdl-2.5.so
> > 37a7802000-37a7803000 r--p 00002000 08:02 911527
> > /lib64/libdl-2.5.so
> > 37a7803000-37a7804000 rw-p 00003000 08:02 911527
> > /lib64/libdl-2.5.so
> > 37a7e00000-37a7e16000 r-xp 00000000 08:02 911531
> > /lib64/libpthread-2.5.so
> > 37a7e16000-37a8015000 ---p 00016000 08:02 911531
> > /lib64/libpthread-2.5.so
> > 37a8015000-37a8016000 r--p 00015000 08:02 911531
> > /lib64/libpthread-2.5.so
> > 37a8016000-37a8017000 rw-p 00016000 08:02 911531
> > /lib64/libpthread-2.5.so
> > 37a8017000-37a801b000 rw-p 37a8017000 00:00 0
> > 37a8e00000-37a8e07000 r-xp 00000000 08:02 911532
> > /lib64/librt-2.5.so
> > 37a8e07000-37a9007000 ---p 00007000 08:02 911532
> > /lib64/librt-2.5.so
> > 37a9007000-37a9008000 r--p 00007000 08:02 911532
> > /lib64/librt-2.5.so
> > 37a9008000-37a9009000 rw-p 00008000 08:02 911532
> > /lib64/librt-2.5.so
> > 2b3bae55d000-2b3bae55e000 rw-p 2b3bae55d000 00:00 0
> > 2b3bae56a000-2b3bae943000 rw-p 2b3bae56a000 00:00 0
> > 7ffffc537000-7ffffc54c000 rw-p 7ffffffea000 00:00 0
> >  [stack]
> > ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0
> >  [vdso]
> > /usr/bin/corosync-blackbox: line 34: 16676 Aborted
> > corosync-fplay
> >
> > I see the error message about mgmtd; however, I've performed the same
> > test on a pair of XEN VMs with the exact same packages (a clean install,
> > not an upgrade from openais-0.80 like the real hardware), and mgmtd
> > doesn't exist there either, yet the blackbox there says
> >
> > rec=[55] Log Message=info: spawn_child: Forked child 4459 for process
> mgmtd
> >
> > # ll /usr/lib64/heartbeat/mgmtd
> > ls: /usr/lib64/heartbeat/mgmtd: No such file or directory
> >
> > # rpm -ql pacemaker-1.0.10-1.4 | grep mgm
> > /usr/lib64/python2.4/site-packages/crm/idmgmt.py
> > /usr/lib64/python2.4/site-packages/crm/idmgmt.pyc
> > /usr/lib64/python2.4/site-packages/crm/idmgmt.pyo
> >
> > Anyone?
> >
> > Regards,
> > Dan
> >
> > On Wed, Jan 26, 2011 at 1:35 PM, Dan Frincu <[email protected]> wrote:
> >
> >     Hi,
> >
> >     I've got a pair of servers running on RHEL5 x86_64 with openais-0.80
> >     (older install) which I want to upgrade to corosync-1.3.0 +
> >     pacemaker-1.0.10. Downtime is not an issue and corosync 1.3.0 is
> >     needed for UDPU, so I built it from the corosync.org website.
> >
> >     With pacemaker, we won't be using the heartbeat stack, so I built
> >     the pacemaker package from the clusterlabs.org src.rpm without
> >     heartbeat support. To be more precise, I used
> >
> >     rpmbuild --without heartbeat --with ais --with snmp --with esmtp -ba
> >     pacemaker-epel.spec
> >
> >     Now I've tested the rpm list below on a pair of XEN VM's, it works
> >     just fine.
> >
> >     cluster-glue-1.0.6-1.6.el5.x86_64.rpm
> >     cluster-glue-libs-1.0.6-1.6.el5.x86_64.rpm
> >     corosync-1.3.0-1.x86_64.rpm
> >     corosynclib-1.3.0-1.x86_64.rpm
> >     libesmtp-1.0.4-5.el5.x86_64.rpm
> >     libibverbs-1.1.2-1.el5.x86_64.rpm
> >     librdmacm-1.0.8-1.el5.x86_64.rpm
> >     libtool-ltdl-1.5.22-6.1.x86_64.rpm
> >     openais-1.1.4-2.x86_64.rpm
> >     openaislib-1.1.4-2.x86_64.rpm
> >     openhpi-2.10.2-1.el5.x86_64.rpm
> >     openib-1.3.2-0.20080728.0355.3.el5.noarch.rpm
> >     pacemaker-1.0.10-1.4.x86_64.rpm
> >     pacemaker-libs-1.0.10-1.4.x86_64.rpm
> >     perl-TimeDate-1.16-5.el5.noarch.rpm
> >     resource-agents-1.0.3-2.6.el5.x86_64.rpm
> >
> >     However, when performing the upgrade on the servers running
> >     openais-0.80, I first removed the heartbeat, heartbeat-libs and
> >     PyXML rpms (conflicting dependencies), then ran rpm -Uvh on the rpm
> >     list above. Installation went fine; I removed the existing cib.xml
> >     and signatures for a fresh start. Then I configured corosync and
> >     started it on both servers, and got nothing. At first I got an error
> >     related to pacemaker mgmt, which was an old package installed with
> >     the old rpms. Removed it, tried again. Nothing. Removed all
> >     cluster-related rpms, old and new, plus their deps, except for DRBD,
> >     then installed the list above, and again, nothing. What "nothing"
> >     means:
> >     - corosync starts, never elects a DC, and never sees the other node
> >     (or itself, for that matter).
> >     - when stopping corosync via the init script, it goes into an
> >     endless phase where it just prints dots to the screen; I have to
> >     kill the process to make it stop.
> >
> >     Troubleshooting done so far:
> >     - tested network sockets (nc from node to node) and firewall rules
> >     (iptables down); communication is OK
> >     - searched for the original RPM list, removed all remaining RPMs,
> >     ran ldconfig, then removed and reinstalled the new RPMs
> >
> >     My guess is that there are some leftovers from the old openais-0.80
> >     installation that interfere with the current installation, seeing as
> >     the same set of RPMs works fine on a pair of XEN VMs with the same
> >     OS; however, I cannot put my finger on the culprit for the real
> >     servers' issue.
> >
> >     Logs: http://pastebin.com/i0maZM4p
> >
> >     After removing the RPMs I also deleted every file they had owned,
> >     just to be extra paranoid about leftovers (rpm -qpl *.rpm >> file &&
> >     for i in `cat file`; do [[ -e "$i" ]] && echo "$i" >> newfile ;
> >     done && for i in `cat newfile` ; do rm -rf $i ; done)
> >
> >     Installed RPMs (without openais)
> >
> >     Same output
> >
> >     http://pastebin.com/3iPHSXua
> >
> >     It seems to go into some sort of loop.
> >
> >     Jan 26 12:13:41 cluster1 crmd: [15612]: ERROR: crm_timer_popped:
> >     Integration Timer (I_INTEGRATED) just popped!
> >     Jan 26 12:13:41 cluster1 crmd: [15612]: info: crm_timer_popped:
> >     Welcomed: 1, Integrated: 0
> >     Jan 26 12:13:41 cluster1 crmd: [15612]: info: do_state_transition:
> >     State transition S_INTEGRATION -> S_FINALIZE_JOIN [
> >     input=I_INTEGRATED cause=C_TIMER_POPPED origin=crm_timer_popped ]
> >     Jan 26 12:13:41 cluster1 crmd: [15612]: WARN: do_state_transition:
> >     Progressed to state S_FINALIZE_JOIN after C_TIMER_POPPED
> >     Jan 26 12:13:41 cluster1 crmd: [15612]: WARN: do_state_transition: 1
> >     cluster nodes failed to respond to the join offer.
> >     Jan 26 12:13:41 cluster1 crmd: [15612]: info: ghash_print_node:
> >     Welcome reply not received from: cluster1 7
> >     Jan 26 12:13:41 cluster1 crmd: [15612]: WARN: do_log: FSA: Input
> >     I_ELECTION_DC from do_dc_join_finalize() received in state
> >     S_FINALIZE_JOIN
> >     Jan 26 12:13:41 cluster1 crmd: [15612]: info: do_state_transition:
> >     State transition S_FINALIZE_JOIN -> S_INTEGRATION [
> >     input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_dc_join_finalize ]
> >     Jan 26 12:13:41 cluster1 crmd: [15612]: info: do_dc_join_offer_all:
> >     join-8: Waiting on 1 outstanding join acks
> >     Jan 26 12:16:41 cluster1 crmd: [15612]: ERROR: crm_timer_popped:
> >     Integration Timer (I_INTEGRATED) just popped!
> >     Jan 26 12:16:41 cluster1 crmd: [15612]: info: crm_timer_popped:
> >     Welcomed: 1, Integrated: 0
> >     Jan 26 12:16:41 cluster1 crmd: [15612]: info: do_state_transition:
> >     State transition S_INTEGRATION -> S_FINALIZE_JOIN [
> >     input=I_INTEGRATED cause=C_TIMER_POPPED origin=crm_timer_popped ]
> >     Jan 26 12:16:41 cluster1 crmd: [15612]: WARN: do_state_transition:
> >     Progressed to state S_FINALIZE_JOIN after C_TIMER_POPPED
> >     Jan 26 12:16:41 cluster1 crmd: [15612]: WARN: do_state_transition: 1
> >     cluster nodes failed to respond to the join offer.
> >     Jan 26 12:16:41 cluster1 crmd: [15612]: info: ghash_print_node:
> >     Welcome reply not received from: cluster1 8
> >     Jan 26 12:16:41 cluster1 crmd: [15612]: WARN: do_log: FSA: Input
> >     I_ELECTION_DC from do_dc_join_finalize() received in state
> >     S_FINALIZE_JOIN
> >     Jan 26 12:16:41 cluster1 crmd: [15612]: info: do_state_transition:
> >     State transition S_FINALIZE_JOIN -> S_INTEGRATION [
> >     input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_dc_join_finalize ]
> >     Jan 26 12:16:41 cluster1 crmd: [15612]: info: do_dc_join_offer_all:
> >     join-9: Waiting on 1 outstanding join acks
> >     Jan 26 12:19:41 cluster1 crmd: [15612]: ERROR: crm_timer_popped:
> >     Integration Timer (I_INTEGRATED) just popped!
> >     Jan 26 12:19:41 cluster1 crmd: [15612]: info: crm_timer_popped:
> >     Welcomed: 1, Integrated: 0
> >     Jan 26 12:19:41 cluster1 crmd: [15612]: info: do_state_transition:
> >     State transition S_INTEGRATION -> S_FINALIZE_JOIN [
> >     input=I_INTEGRATED cause=C_TIMER_POPPED origin=crm_timer_popped ]
> >     Jan 26 12:19:41 cluster1 crmd: [15612]: WARN: do_state_transition:
> >     Progressed to state S_FINALIZE_JOIN after C_TIMER_POPPED
> >     Jan 26 12:19:41 cluster1 crmd: [15612]: WARN: do_state_transition: 1
> >     cluster nodes failed to respond to the join offer.
> >     Jan 26 12:19:41 cluster1 crmd: [15612]: info: ghash_print_node:
> >     Welcome reply not received from: cluster1 9
> >     Jan 26 12:19:41 cluster1 crmd: [15612]: WARN: do_log: FSA: Input
> >     I_ELECTION_DC from do_dc_join_finalize() received in state
> >     S_FINALIZE_JOIN
> >     Jan 26 12:19:41 cluster1 crmd: [15612]: info: do_state_transition:
> >     State transition S_FINALIZE_JOIN -> S_INTEGRATION [
> >     input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_dc_join_finalize ]
> >     Jan 26 12:19:41 cluster1 crmd: [15612]: info: do_dc_join_offer_all:
> >     join-10: Waiting on 1 outstanding join acks
> >     Jan 26 12:20:11 cluster1 cib: [15608]: info: cib_stats: Processed 1
> >     operations (0.00us average, 0% utilization) in the last 10min
> >
> >     Any suggestions?
> >
> >     TIA.
> >
> >     Regards,
> >     Dan
> >
> >     --
> >     Dan Frîncu
> >     CCNA, RHCE
> >
> >
> >
> >
> > --
> > Dan Frîncu
> > CCNA, RHCE
> >
> >
> >
> > _______________________________________________
> > Openais mailing list
> > [email protected]
> > https://lists.linux-foundation.org/mailman/listinfo/openais
>
>


-- 
Dan Frîncu
CCNA, RHCE
_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais
