Hi,

last weekend I updated Samba and ctdb on my 2-node cluster. Samba is now at 3.5.6 (from 3.3.4), ctdb at 1.0.114 (from 1.0.84). Both were installed from the repo via yum and ctdb-packages.
After restarting both nodes everything was fine, and we could access files on the cluster. On Monday I noticed that the nodes no longer had their initial addresses:

Node 1: hostname dscln01, public IP 10.0.0.41/8, now 10.0.0.42/8
/etc/sysconfig/network-scripts/ifcfg-bond0:
    DEVICE=bond0
    BOOTPROTO=none
    IPADDR=10.0.0.41
    NETWORK=10.0.0.0
    BROADCAST=10.0.0.255
    NETMASK=255.0.0.0
    ONBOOT=yes
    USERCTL=no

Node 2: hostname dscln02, public IP 10.0.0.42/8, now 10.0.0.41/8
/etc/sysconfig/network-scripts/ifcfg-bond0:
    DEVICE=bond0
    BOOTPROTO=none
    IPADDR=10.0.0.42
    NETWORK=10.0.0.0
    BROADCAST=10.0.0.255
    NETMASK=255.0.0.0
    ONBOOT=yes
    USERCTL=no

Yesterday it fell over, so we had to reboot both nodes, and the IPs were still mixed up. log.ctdb got some interesting entries after the reboot:

2010/11/17 09:48:02.613807 [ 4383]: killed 30 TCP connections to released IP 10.0.0.42
2010/11/17 09:48:02.633263 [ 4383]: re-adding secondary address 10.0.0.41/8 to dev bond0
2010/11/17 09:48:02.646140 [ 4383]: /etc/ctdb/interface_modify.sh: line 71: /etc/ctdb/state/interface_modify/bond0.readd.d/10.0.0.41.8/*: No such file or directory
2010/11/17 09:48:02.646446 [ 4383]: /etc/ctdb/state/interface_modify/bond0.readd.d/10.0.0.41.8/* 'bond0' '10.0.0.41' '8' - failed - 127
2010/11/17 09:48:02.646514 [ 4383]: call /etc/ctdb/state/interface_modify/bond0.readd.d/10.0.0.41.8/* 'bond0' '10.0.0.41' '8'
2010/11/17 09:48:02.647412 [ 4383]: Failed to del 10.0.0.42 on dev bond0
2010/11/17 09:48:02.649354 [ 4383]: server/ctdb_daemon.c:688 waitpid() returned error. errno:10

I also notice, or rather users report, slow performance when they shut down their PCs. At closing time the load climbs to ~70 on both nodes, with high CPU load on ctdbd and mmfsd. OK, that is 220 PCs writing back their profiles. Could ctdb be the blocking element when writing to its persistent DB, since the local disks are not that fast? Both nodes are hooked up to an Infortrend SAN, connected via FC-AL; the FS is GPFS, running on CentOS 5.3.
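As a side note on the ifcfg-bond0 values above: here is a minimal Python sketch (not part of the cluster setup; the helper name derive_network_settings is purely illustrative) that derives the network and broadcast addresses from IPADDR and NETMASK with the standard ipaddress module. For a 255.0.0.0 netmask it gives a broadcast address of 10.255.255.255, which differs from the BROADCAST=10.0.0.255 set in both files. Whether that has anything to do with the swapped addresses I can't say.

```python
import ipaddress


def derive_network_settings(ipaddr: str, netmask: str) -> dict:
    """Derive prefix length, network and broadcast from an address and netmask.

    Illustrative helper only; the name and return layout are assumptions,
    not something provided by ctdb or the CentOS network scripts.
    """
    iface = ipaddress.ip_interface(f"{ipaddr}/{netmask}")
    net = iface.network
    return {
        "prefix": net.prefixlen,
        "network": str(net.network_address),
        "broadcast": str(net.broadcast_address),
    }


if __name__ == "__main__":
    # Values taken from the two ifcfg-bond0 files quoted above.
    configured_broadcast = "10.0.0.255"
    for ipaddr in ("10.0.0.41", "10.0.0.42"):
        derived = derive_network_settings(ipaddr, "255.0.0.0")
        matches = derived["broadcast"] == configured_broadcast
        print(ipaddr, derived, "BROADCAST matches:", matches)
```

The sketch only compares the static values in the files; it does not look at what ctdb has actually assigned to bond0 at runtime.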
Did I do something wrong after or before upgrading?

Matthias
