Hi,

This sounds like something that someone on the openais list would know, so I've CC'd that list.
-- Lon

On Fri, 2008-06-27 at 16:03 +1000, Bevan Broun wrote:
> Hi All
>
> I have a 2 node RHEL-5.1 cluster. A quorum disk is configured.
> The hosts have 4 NICs. These are bonded:
> (eth0+eth2) -> bond0
> (eth1+eth3) -> bond1
> Unfortunately I was not able to use a dedicated interface for cluster
> communications - bond1 is being used. This is where I think I'm in trouble.
>
> The cluster has been configured using IP addresses. I did have to use
> http://archives.free.net.ph/message/20080130.074958.5c7a211c.en.html
> as the hostname is related to the bond0 IP.
>
> I have not defined the interface to be used by the cluster; I am just
> relying on the IP address configured.
> The cluster's purpose is 2 GFS file systems.
>
> The cluster was configured and working for 4 days before there were problems.
>
> I now have almost constant token-lost messages in /var/log/messages. They
> are almost exactly 5 minutes apart. A typical section of the messages file
> is shown below my sig.
>
> Just before the problem started, a samba message shows nmbd becoming local
> master browser for a workgroup on the interface used for cluster
> communications.
>
> Jun 20 13:39:27 HOST1 nmbd[24506]: [2008/06/20 13:39:27, 0] nmbd/nmbd_become_lmb.c:become_local_master_stage2(396)
> Jun 20 13:39:27 HOST1 nmbd[24506]: *****
> Jun 20 13:39:27 HOST1 nmbd[24506]:
> Jun 20 13:39:27 HOST1 nmbd[24506]: Samba name server NBM1 is now a local master browser for workgroup SMS_DOMAIN on subnet 162.16.96.229
> Jun 20 13:39:27 HOST1 nmbd[24506]:
> Jun 20 13:39:27 HOST1 nmbd[24506]: *****
> Jun 20 13:43:27 HOST1 openais[15265]: [TOTEM] The token was lost in the OPERATIONAL state.
>
> "cman_tool status" shows both nodes and looks normal. It looks like clvmd is
> not happy: df commands are hanging.
>
> Could nmbd be causing this token loss? Any ideas on how to proceed?
>
> (names and IPs have been changed).
>
> Thanks
>
> Bevan Broun
> Solutions Architect
> Ardec International
> http://www.ardec.com.au
> http://www.lisasoft.com
> http://www.terrapages.com
> Sydney
> -----------------------
> Suite 112,The Lower Deck
> 19-21 Jones Bay Wharf
> Pirrama Road, Pyrmont 2009
> Ph: +61 2 8570 5000
> Fax: +61 2 8570 5099
>
>
>
> Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] The token was lost in the OPERATIONAL state.
> Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
> Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
> Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] entering GATHER state from 2.
> Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] Creating commit token because I am the rep.
> Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] Saving state aru 16 high seq received 16
> Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] Storing new sequence id for ring 20ce34
> Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] entering COMMIT state.
> Jun 20 13:48:41 HOST1 openais[15265]: [TOTEM] The token was lost in the COMMIT state.
> Jun 20 13:48:41 HOST1 openais[15265]: [TOTEM] entering GATHER state from 4.
> Jun 20 13:48:41 HOST1 openais[15265]: [TOTEM] Creating commit token because I am the rep.
> Jun 20 13:48:41 HOST1 openais[15265]: [TOTEM] Storing new sequence id for ring 20ce38
> Jun 20 13:48:41 HOST1 openais[15265]: [TOTEM] entering COMMIT state.
> Jun 20 13:48:51 HOST1 openais[15265]: [TOTEM] The token was lost in the COMMIT state.
> Jun 20 13:48:51 HOST1 openais[15265]: [TOTEM] entering GATHER state from 4.
> Jun 20 13:48:51 HOST1 openais[15265]: [TOTEM] Creating commit token because I am the rep.
> Jun 20 13:48:51 HOST1 openais[15265]: [TOTEM] Storing new sequence id for ring 20ce3c
> Jun 20 13:48:51 HOST1 openais[15265]: [TOTEM] entering COMMIT state.
> Jun 20 13:49:01 HOST1 openais[15265]: [TOTEM] The token was lost in the COMMIT state.
> Jun 20 13:49:01 HOST1 openais[15265]: [TOTEM] entering GATHER state from 4.
> Jun 20 13:49:01 HOST1 openais[15265]: [TOTEM] Creating commit token because I am the rep.
> Jun 20 13:49:01 HOST1 openais[15265]: [TOTEM] Storing new sequence id for ring 20ce40
> Jun 20 13:49:01 HOST1 openais[15265]: [TOTEM] entering COMMIT state.
> Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] entering RECOVERY state.
> Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] position [0] member 162.16.96.229:
> Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] previous ring seq 2149936 rep 162.16.96.229
> Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] aru 16 high delivered 16 received flag 1
> Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] position [1] member 162.16.96.230:
> Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] previous ring seq 2149936 rep 162.16.96.229
> Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] aru 16 high delivered 16 received flag 1
> Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] Did not need to originate any messages in recovery.
> Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] Sending initial ORF token
> Jun 20 13:49:06 HOST1 openais[15265]: [CLM ] CLM CONFIGURATION CHANGE
> Jun 20 13:49:06 HOST1 openais[15265]: [CLM ] New Configuration:
> Jun 20 13:49:06 HOST1 openais[15265]: [CLM ] r(0) ip(162.16.96.229)
> Jun 20 13:49:06 HOST1 openais[15265]: [CLM ] r(0) ip(162.16.96.230)
> Jun 20 13:49:06 HOST1 openais[15265]: [CLM ] Members Left:
> Jun 20 13:49:06 HOST1 openais[15265]: [CLM ] Members Joined:
> Jun 20 13:49:06 HOST1 openais[15265]: [CLM ] CLM CONFIGURATION CHANGE
> Jun 20 13:49:06 HOST1 openais[15265]: [CLM ] New Configuration:
> Jun 20 13:49:06 HOST1 openais[15265]: [CLM ] r(0) ip(162.16.96.229)
> Jun 20 13:49:06 HOST1 openais[15265]: [CLM ] r(0) ip(162.16.96.230)
> Jun 20 13:49:06 HOST1 openais[15265]: [CLM ] Members Left:
> Jun 20 13:49:06 HOST1 openais[15265]: [CLM ] Members Joined:
> Jun 20 13:49:06 HOST1 openais[15265]: [SYNC ] This node is within the primary component and will provide service.
> Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] entering OPERATIONAL state.
> Jun 20 13:49:06 HOST1 openais[15265]: [CLM ] got nodejoin message 162.16.96.229
> Jun 20 13:49:06 HOST1 openais[15265]: [CLM ] got nodejoin message 162.16.96.230
> Jun 20 13:49:06 HOST1 openais[15265]: [CPG ] got joinlist message from node 2
> Jun 20 13:49:06 HOST1 openais[15265]: [CPG ] got joinlist message from node 1
> Jun 20 13:53:38 HOST1 openais[15265]: [TOTEM] The token was lost in the OPERATIONAL state.
>
> The contents of this email are confidential and may be subject to legal or
> professional privilege and copyright. No representation is made that this
> email is free of viruses or other defects. If you have received this
> communication in error, you may not copy or distribute any part of it or
> otherwise disclose its contents to anyone. Please advise the sender of your
> incorrect receipt of this correspondence.
>
> --
> Linux-cluster mailing list
> [EMAIL PROTECTED]
> https://www.redhat.com/mailman/listinfo/linux-cluster

_______________________________________________
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais
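
The "almost exactly 5 minutes apart" observation in the quoted message can be checked against the three OPERATIONAL-state token-loss timestamps that appear in the quoted log. The following is only a minimal sketch: the helper names (token_loss_times, intervals_seconds) are made up for illustration, the year 2008 is an assumption because syslog lines omit it, and the sample lines are copied from the excerpt above rather than read from a live log.

```python
from datetime import datetime

# Sample lines copied from the quoted log excerpt. The COMMIT-state line is
# included only to show that it is filtered out.
LOG_LINES = [
    "Jun 20 13:43:27 HOST1 openais[15265]: [TOTEM] The token was lost in the OPERATIONAL state.",
    "Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] The token was lost in the OPERATIONAL state.",
    "Jun 20 13:48:41 HOST1 openais[15265]: [TOTEM] The token was lost in the COMMIT state.",
    "Jun 20 13:53:38 HOST1 openais[15265]: [TOTEM] The token was lost in the OPERATIONAL state.",
]

MARKER = "The token was lost in the OPERATIONAL state"


def token_loss_times(lines, year=2008):
    """Return datetimes of OPERATIONAL-state token losses.

    The year is an assumption: syslog timestamps do not carry one.
    """
    times = []
    for line in lines:
        if MARKER in line:
            # The first three whitespace-separated fields are month, day, time.
            stamp = " ".join(line.split()[:3])
            times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return times


def intervals_seconds(times):
    """Return the gaps between consecutive timestamps, in whole seconds."""
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for gap in intervals_seconds(token_loss_times(LOG_LINES)):
        print(f"{gap} s between token losses ({gap / 60:.1f} min)")
```

Run over the full messages file instead of the sample list, the same filter would show whether the spacing stays near 300 seconds, which would point to something periodic on the cluster network rather than random packet loss.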