Hi,
I am setting up a two-node Samba cluster with CTDB in AWS, with the nodes in
two different subnets. All IP ports are open between the two subnets. I am
initially forming the cluster with one node and will add the second node
after CTDB has started. I am not using public_addresses for CTDB because AWS
does not support VIPs. I am running 64-bit Amazon Linux with two NICs: eth0 as
the primary NIC and eth1 as the private-IP NIC. With clustering off and no
CTDB, Samba works. The only errors reported are in /var/log/log.ctdb. I need
to get this running for a project. Please help.
CTDB Configs:
I edited /etc/sysconfig/ctdb to change the following from the defaults:
CTDB_RECOVERY_LOCK="/samba/samba_lock"
CTDB_NODES=/etc/ctdb/nodes
CTDB_DEBUGLEVEL=3
I also edited /etc/ctdb/nodes to add the private IP address of eth1.
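For reference, a minimal sketch of the nodes file layout and a quick sanity
check on it. CTDB expects one private address per line, and the file should be
identical on every node. 10.22.1.20 is the address CTDB reports in the log
below; 10.22.2.20 is only a hypothetical placeholder for the second node, and
the check_nodes_file helper and the temporary path are made up for
illustration (the real file is /etc/ctdb/nodes).

```shell
# Sketch only: write an example nodes file to a temporary path.
# 10.22.2.20 is a hypothetical placeholder for the second node.
nodes_file=$(mktemp)
printf '%s\n' 10.22.1.20 10.22.2.20 > "$nodes_file"

# Hypothetical helper: succeed only if every line of the file looks like a
# dotted-quad IPv4 address (blank or malformed lines count as failures).
check_nodes_file() {
    bad=$(grep -c -v -E '^([0-9]{1,3}\.){3}[0-9]{1,3}$' "$1" || true)
    [ "$bad" -eq 0 ]
}

if check_nodes_file "$nodes_file"; then
    echo "nodes file looks well-formed"
else
    echo "nodes file has malformed lines"
fi
```

This only checks the shape of each line. It does not confirm that the
addresses are reachable or that both nodes carry the same file.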
The complete /var/log/log.ctdb:
2013/04/09 16:09:59.881679 [30574]: CTDB starting on node
2013/04/09 16:09:59.886133 [30575]: Starting CTDBD as pid : 30575
2013/04/09 16:09:59.886305 [30575]: Set scheduler to SCHED_FIFO
2013/04/09 16:09:59.886637 [30575]: ctdb chose network address 10.22.1.20:4379
pnn 0
2013/04/09 16:09:59.887035 [30575]: server/eventscript.c:800 Starting
eventscript init
2013/04/09 16:09:59.969022 [30575]: 10.interface: No public addresses file
found. Nothing to do for 10.interfaces
2013/04/09 16:10:00.246654 [30575]: server/eventscript.c:486 Eventscript init
finished with state 0
2013/04/09 16:10:00.248978 [30575]: Keepalive monitoring has been started
2013/04/09 16:10:00.249024 [30575]: Monitoring has been started
2013/04/09 16:10:00.249057 [30575]: server/eventscript.c:800 Starting
eventscript setup
2013/04/09 16:10:00.249415 [recoverd:30648]: monitor_cluster starting
2013/04/09 16:10:00.251621 [30575]: server/ctdb_daemon.c:182 Registered message
handler for srvid=17870283321406128128
2013/04/09 16:10:00.251760 [30575]: server/ctdb_daemon.c:182 Registered message
handler for srvid=17870564796382838784
2013/04/09 16:10:00.251858 [30575]: server/ctdb_daemon.c:182 Registered message
handler for srvid=17870846271359549440
2013/04/09 16:10:00.251952 [30575]: server/ctdb_daemon.c:182 Registered message
handler for srvid=17365880163140632576
2013/04/09 16:10:00.252050 [30575]: server/ctdb_daemon.c:182 Registered message
handler for srvid=17582052945254416384
2013/04/09 16:10:00.252150 [30575]: server/ctdb_daemon.c:182 Registered message
handler for srvid=17942340915444056064
2013/04/09 16:10:00.252243 [30575]: server/ctdb_daemon.c:182 Registered message
handler for srvid=17798225727368200192
2013/04/09 16:10:00.252332 [30575]: server/ctdb_daemon.c:182 Registered message
handler for srvid=18014398509481984000
2013/04/09 16:10:00.252422 [30575]: server/ctdb_daemon.c:182 Registered message
handler for srvid=18086456103519911936
2013/04/09 16:10:00.252511 [30575]: server/ctdb_daemon.c:182 Registered message
handler for srvid=18087019053473333248
2013/04/09 16:10:00.252600 [30575]: server/ctdb_daemon.c:182 Registered message
handler for srvid=18158513697557839872
2013/04/09 16:10:00.252688 [30575]: server/ctdb_daemon.c:182 Registered message
handler for srvid=17654110539292344320
2013/04/09 16:10:00.252776 [30575]: server/ctdb_daemon.c:182 Registered message
handler for srvid=18086737578496622592
2013/04/09 16:10:00.253577 [recoverd:30648]: server/ctdb_recoverd.c:3415
Initial recovery master set - forcing election
2013/04/09 16:10:00.253609 [recoverd:30648]: server/ctdb_recoverd.c:2521 Force
an election
2013/04/09 16:10:00.253673 [30575]: Freeze priority 1
2013/04/09 16:10:00.253783 [30575]: Freeze priority 2
2013/04/09 16:10:00.253901 [30575]: Freeze priority 3
2013/04/09 16:10:00.254181 [recoverd:30648]: server/ctdb_recoverd.c:2005 Send
election request to all active nodes
2013/04/09 16:10:01.249677 [30575]: CTDB_WAIT_UNTIL_RECOVERED
2013/04/09 16:10:02.249961 [30575]: CTDB_WAIT_UNTIL_RECOVERED
2013/04/09 16:10:03.250141 [30575]: CTDB_WAIT_UNTIL_RECOVERED
2013/04/09 16:10:03.257560 [recoverd:30648]: server/ctdb_recoverd.c:1055
Election timed out
2013/04/09 16:10:03.258563 [recoverd:30648]: The interfaces status has changed
on local node 0 - force takeover run
2013/04/09 16:10:03.258805 [recoverd:30648]: Trigger takeoverrun
2013/04/09 16:10:03.259041 [recoverd:30648]: server/ctdb_recoverd.c:2702 Node:0
was in recovery mode. Restart recovery process
2013/04/09 16:10:03.259071 [recoverd:30648]: server/ctdb_recoverd.c:1555
Starting do_recovery
2013/04/09 16:10:03.259085 [recoverd:30648]: Taking out recovery lock from
recovery daemon
2013/04/09 16:10:03.259108 [recoverd:30648]: Take the recovery lock
2013/04/09 16:10:03.267903 [recoverd:30648]: Recovery lock taken successfully
2013/04/09 16:10:03.267933 [recoverd:30648]: ctdb_recovery_lock: Got recovery
lock on '/mnt/prod-assets/samba/samba_lock'
2013/04/09 16:10:03.268052 [recoverd:30648]: Recovery lock taken successfully
by recovery daemon
2013/04/09 16:10:03.268071 [recoverd:30648]: server/ctdb_recoverd.c:1592
Recovery initiated due to problem with node 0
2013/04/09 16:10:03.268190 [recoverd:30648]: server/ctdb_recoverd.c:1617
Recovery - created remote databases
2013/04/09 16:10:03.268211 [recoverd:30648]: server/ctdb_recoverd.c:1624
Recovery - updated db priority for all databases
2013/04/09 16:10:03.268351 [30575]: Freeze priority 1
2013/04/09 16:10:03.268455 [30575]: Freeze priority 2
2013/04/09 16:10:03.268552 [30575]: Freeze priority 3
2013/04/09 16:10:03.268723 [30575]: server/ctdb_recover.c:1035 startrecovery
eventscript has been invoked
2013/04/09 16:10:03.268744 [30575]: Monitoring has been disabled
2013/04/09 16:10:03.268763 [30575]: server/eventscript.c:800 Starting
eventscript startrecovery
2013/04/09 16:10:03.617562 [30575]: server/eventscript.c:486 Eventscript
startrecovery finished with state 0
2013/04/09 16:10:03.618061 [30575]: Control modflags on node 0 - Unchanged -
flags 0x2
2013/04/09 16:10:03.618127 [recoverd:30648]: server/ctdb_recoverd.c:1661
Recovery - updated flags
2013/04/09 16:10:03.618311 [recoverd:30648]: server/ctdb_recoverd.c:1705
started transactions on all nodes
2013/04/09 16:10:03.618333 [recoverd:30648]: server/ctdb_recoverd.c:1718
Recovery - starting database commits
2013/04/09 16:10:03.618389 [30575]: server/ctdb_freeze.c:408 healthy_nodes[0]
2013/04/09 16:10:03.618450 [recoverd:30648]: server/ctdb_recoverd.c:1730
Recovery - committed databases
2013/04/09 16:10:03.618621 [recoverd:30648]: server/ctdb_recoverd.c:1780
Recovery - updated vnnmap
2013/04/09 16:10:03.618717 [recoverd:30648]: server/ctdb_recoverd.c:1789
Recovery - updated recmaster
2013/04/09 16:10:03.618916 [30575]: Control modflags on node 0 - Unchanged -
flags 0x2
2013/04/09 16:10:03.618973 [recoverd:30648]: server/ctdb_recoverd.c:1806
Recovery - updated flags
2013/04/09 16:10:03.619034 [30575]: server/ctdb_recover.c:665 Recovery mode set
to NORMAL
2013/04/09 16:10:03.619053 [30575]: Thawing priority 1
2013/04/09 16:10:03.619066 [30575]: Release freeze handler for prio 1
2013/04/09 16:10:03.619110 [30575]: Thawing priority 2
2013/04/09 16:10:03.619126 [30575]: Release freeze handler for prio 2
2013/04/09 16:10:03.619150 [30575]: Thawing priority 3
2013/04/09 16:10:03.619164 [30575]: Release freeze handler for prio 3
2013/04/09 16:10:03.622723 [recoverd:30648]: server/ctdb_recoverd.c:1815
Recovery - disabled recovery mode
2013/04/09 16:10:03.623218 [recoverd:30648]: Disabling ip check for 9 seconds
2013/04/09 16:10:03.623228 [30575]: Running eventscripts with arguments
ipreallocated
2013/04/09 16:10:03.623260 [30575]: Monitoring has been disabled
2013/04/09 16:10:03.623283 [30575]: server/eventscript.c:800 Starting
eventscript ipreallocated
2013/04/09 16:10:03.971720 [30575]: server/eventscript.c:486 Eventscript
ipreallocated finished with state 0
2013/04/09 16:10:03.971788 [30575]: Monitoring has been enabled
2013/04/09 16:10:03.972044 [30575]: Recovery has finished
2013/04/09 16:10:03.972067 [30575]: Monitoring has been disabled
2013/04/09 16:10:03.972083 [30575]: server/eventscript.c:800 Starting
eventscript recovered
2013/04/09 16:10:04.250561 [30575]: CTDB_WAIT_UNTIL_RECOVERED
2013/04/09 16:10:04.250613 [30575]: server/ctdb_monitor.c:261 wait for pending
recoveries to end. Wait one more second.
2013/04/09 16:10:04.322804 [30575]: server/eventscript.c:486 Eventscript
recovered finished with state 0
2013/04/09 16:10:04.322870 [30575]: Monitoring has been enabled
2013/04/09 16:10:04.322983 [recoverd:30648]: server/ctdb_recoverd.c:1841
Recovery - finished the recovered event
2013/04/09 16:10:04.323022 [recoverd:30648]: server/ctdb_recoverd.c:1847
Recovery complete
2013/04/09 16:10:04.323038 [recoverd:30648]: Resetting ban count to 0 for all
nodes
2013/04/09 16:10:04.323057 [recoverd:30648]: Just finished a recovery. New
recoveries will now be supressed for the rerecovery timeout (10 seconds)
2013/04/09 16:10:05.251440 [30575]: CTDB_WAIT_UNTIL_RECOVERED
2013/04/09 16:10:05.251473 [30575]: server/ctdb_monitor.c:261 wait for pending
recoveries to end. Wait one more second.
2013/04/09 16:10:06.251582 [30575]: CTDB_WAIT_UNTIL_RECOVERED
2013/04/09 16:10:06.251634 [30575]: server/ctdb_monitor.c:261 wait for pending
recoveries to end. Wait one more second.
2013/04/09 16:10:07.251744 [30575]: CTDB_WAIT_UNTIL_RECOVERED
2013/04/09 16:10:07.251775 [30575]: server/ctdb_monitor.c:261 wait for pending
recoveries to end. Wait one more second.
2013/04/09 16:10:08.251886 [30575]: CTDB_WAIT_UNTIL_RECOVERED
2013/04/09 16:10:08.251925 [30575]: server/ctdb_monitor.c:261 wait for pending
recoveries to end. Wait one more second.
2013/04/09 16:10:09.252062 [30575]: CTDB_WAIT_UNTIL_RECOVERED
2013/04/09 16:10:09.252117 [30575]: server/ctdb_monitor.c:261 wait for pending
recoveries to end. Wait one more second.
Thanks for any help.
Chuck