here's a link from the searchable PLUG archive regarding a simple Linux HA howto.
http://marc.free.net.ph/message/20030211.111335.d4ec7690.html
- can you give me more info regarding your problem? does the problem also occur when the master is shut down properly (or when the heartbeat service is stopped)?
- try specifying the UDP port explicitly:
udpport 694
- try lowering the HA timing settings, for example: keepalive 1, deadtime 3, initdead 6
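to make the two suggestions above concrete, here is a rough ha.cf fragment with those values. the comments are my reading of what each directive does. the numbers are only a starting point: too short a deadtime can trigger false takeovers on a busy network, and initdead should be at least twice deadtime.

```
# seconds between heartbeat packets
keepalive 1
# seconds of silence before the peer is declared dead
deadtime 3
# longer allowance right after boot, while the network comes up
initdead 6
# UDP port used for bcast heartbeats (694 is the default)
udpport 694
```

these lines would go into the ha.cf on both nodes, ahead of the "bcast eth0" line.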
i've experienced this kind of problem before and the culprit was the NIC driver. i simply recompiled the driver and everything worked fine.
btw, are you only using a single NIC on each machine? coz you can't effectively use heartbeat + DRBD with that kind of setup. DRBD will be confused about which node should take ownership of the block device and will end up in "standalone" mode, which then leads to a time-consuming "Full-Sync" every time the master rejoins the failover cluster.
try this setup: use ETH0 for your typical LAN connection as well as your heartbeat medium, and use ETH1 as your dedicated disk replication (and OpenMosix Cluster) network. this setup will not only eliminate the "split-brain" on the block device but can also decongest your LAN traffic and increase the replication throughput.

you also have to add some lines to your HA configuration, such as the RESPAWN IPFAIL and PING node directives, so that heartbeat can determine which of the two nodes is experiencing a network failure. this eliminates the HA state wherein both nodes acquire the Virtual IP and other resources. much better if you use a non-IP heartbeat medium such as a serial cable.

if PING is blocked on your network then you can use another technique for checking the network connection, such as Nagios' check commands (check_ftp, check_tcp, etc.) or SNMP, then simply call heartbeat's "hb_standby" tool to make the master node release its resources.
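as a rough sketch, the extra ha.cf lines for the serial link, the ping node and ipfail could look like the fragment below. the serial device, the baud rate, the ping address (192.168.1.1, standing in for your LAN gateway) and the ipfail path are all assumptions on my part, so adjust them to your machines. it also assumes your heartbeat version actually ships ipfail.

```
# non-IP heartbeat path over a null-modem cable (assumed port and speed)
baud 19200
serial /dev/ttyS0
# IP heartbeat on the LAN interface, as in your current ha.cf
bcast eth0
# an address that should always answer, e.g. the LAN gateway (assumed)
ping 192.168.1.1
# start ipfail under heartbeat and restart it if it dies
# (the path varies between packages)
respawn hacluster /usr/lib/heartbeat/ipfail
```

with a ping node defined, ipfail compares how many ping nodes each side can still reach and lets the better-connected node hold the resources.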
and one more thing, if you're going to use Samba for the failover cluster, you might as well use WINBIND to unify the user accounts between the 2 nodes as well as on the domain, to ease account management. care to add openmosix on your network? since samba uses a separate process for every client connection, that process "might be" migratable to other openmosix-enabled nodes such as your backup server, instead of wasting its computing capability, which sits idle most of the time.
HTH
AT wrote:
Care to share your howto / links that you use in setting up HA-Cluster using heartbeat + DRBD?
I'm trying to set up an HA cluster using heartbeat on RH AS 2.1.
The heartbeat service is running on both the primary and backup machines. When I intentionally remove the connection of the primary machine (pull the network cable), the virtual IP address and the hostname are taken over by the backup machine, which is as expected. Then, when I reconnect the primary machine to the network, the virtual IP address and hostname are taken back by the primary machine, but the problem is that the backup machine does not release the virtual IP. So both machines have the same virtual IP address.
Anything that I missed? Or any suggestions on what went wrong?
--------------------------------- *ha.cf
logfile /var/log/ha-log
keepalive 5
deadtime 10
bcast eth0
node svr01
node svr02
--------------------------------- *haresources
svr01 192.168.1.3 smb
--------------------------------- where:
svr01 - primary machine
svr02 - backup machine
192.168.1.3 - virtual ip
eth0 - heartbeat
smb - service to stop / start
thanks!
