[Bug 613793] Re: o2cb stopping Failed
[Expired for ocfs2-tools (Ubuntu) because there has been no activity for 60 days.] ** Changed in: ocfs2-tools (Ubuntu) Status: Incomplete => Expired -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/613793 Title: o2cb stopping Failed To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/ocfs2-tools/+bug/613793/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 613793] Re: o2cb stopping Failed
This bug was originally reported against a version of Ubuntu that is not supported anymore (Ubuntu 10.04.1 LTS, see [1]). On which Ubuntu release are you experiencing the issue? While waiting for additional information I'm marking this bug as Incomplete. Thanks for your feedback! [1] https://wiki.ubuntu.com/Releases ** Changed in: ocfs2-tools (Ubuntu) Status: Confirmed => Incomplete
[Bug 613793] Re: o2cb stopping Failed
In an OCFS2 cluster of XenServer 7.1.1 hosts, we met the same issue.
[Bug 613793] Re: o2cb stopping Failed
** Description changed:

Binary package hint: ocfs2-tools

Ubuntu release:
Description: Ubuntu 10.04.1 LTS
Release: 10.04

Package version:
ocfs2-tools 1.4.3-1

The script /etc/init.d/o2cb exits with an error when stopped, and the services do not stop. Here is the error message:

    /etc/init.d/o2cb stop
    Stopping O2CB cluster ocfs2: Failed
    Unable to stop cluster as heartbeat region still active

I have identified a first error in the script. In the function clean_heartbeat the following if:

    if [ ! -f "$(configfs_path)/cluster/${CLUSTER}/heartbeat/*" ]
    then
        return
    fi

is always true and the function returns. If the intention was to check that the directory exists, the code must be:

    if [ ! -d "$(configfs_path)/cluster/${CLUSTER}/heartbeat/" ]
    then
        echo "OK"
        return
    fi

An error persists even after these changes.

    /etc/init.d/o2cb stop
    Cleaning heartbeat on ocfs2: Failed
    At least one heartbeat region still active

I added some debugging lines, changing the function as follows:

    #
    # clean_heartbeat()
    # Removes the inactive heartbeat regions
    #
    clean_heartbeat()
    {
        if [ "$#" -lt "1" -o -z "$1" ]
        then
            echo "clean_heartbeat(): Requires an argument" >&2
            return 1
        fi
        CLUSTER="$1"

        if [ ! -d "$(configfs_path)/cluster/${CLUSTER}/heartbeat/" ]
        then
            echo "OK"
            return
        fi

        echo -n "Cleaning heartbeat on ${CLUSTER}: "

        ls -1 "$(configfs_path)/cluster/${CLUSTER}/heartbeat/" | while read HBUUID
        do
            if [ ! -d "$(configfs_path)/cluster/${CLUSTER}/heartbeat/${HBUUID}" ]
            then
                continue
            fi
            echo
            echo "DEBUG ocfs2_hb_ctl -I -u ${HBUUID} 2>&1"
            OUTPUT="`ocfs2_hb_ctl -I -u ${HBUUID} 2>&1`"
            if [ $? != 0 ]
            then
                echo "Failed"
                echo "${OUTPUT}" >&2
                exit 1
            fi
            echo "DEBUG ${OUTPUT}"
            REF="`echo ${OUTPUT} | awk '/refs/ {print $2; exit;}' 2>&1`"
            echo "DEBUG REF=$REF"
            if [ $REF != 0 ]
            then
                echo "Failed"
                echo "At least one heartbeat region still active" >&2
                exit 1
            else
                OUTPUT="`ocfs2_hb_ctl -K -u ${HBUUID} 2>&1`"
            fi
        done
        if [ $? = 1 ]
        then
            exit 1
        fi
        echo "OK"
    }

The new output is:

    /etc/init.d/o2cb stop
    Cleaning heartbeat on ocfs2:
    DEBUG ocfs2_hb_ctl -I -u FC046AD7B2584E7EB12A7293993C81B0 2>&1
    DEBUG FC046AD7B2584E7EB12A7293993C81B0: 2 refs
    DEBUG REF=2
    Failed
    At least one heartbeat region still active

At this point I checked the source code of ocfs2_hb_ctl. The command ocfs2_hb_ctl -I -u ${HBUUID} returns the number of references held in a semaphore used by the programs that manage OCFS2 filesystems. In the source file libo2cb/o2cb_api.c:

- the function o2cb_mutex_down increases the second semaphore;
- the function o2cb_mutex_up decreases the first semaphore;
- the function __o2cb_get_ref increases the first semaphore;
- the function __o2cb_drop_ref decreases the first semaphore.

I have not found the point where the second semaphore is decreased. This could be the cause of the error.
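The first error reported in the description can be reproduced without a cluster. The following is a minimal sketch, not the init script itself: a temporary directory (HB_DIR, a hypothetical stand-in for $(configfs_path)/cluster/${CLUSTER}/heartbeat) holds one region directory. Because the * sits inside double quotes, the shell does not expand it, so -f looks for a file literally named "*" and the negated test succeeds. The last step replays the awk expression from clean_heartbeat on the "2 refs" line quoted in the report.

```shell
#!/bin/sh
# Hypothetical stand-in for $(configfs_path)/cluster/${CLUSTER}/heartbeat
HB_DIR="$(mktemp -d)"
# One heartbeat region, represented as a directory named after its UUID
mkdir "${HB_DIR}/FC046AD7B2584E7EB12A7293993C81B0"

# Test as shipped: the quoted * is not expanded, so -f checks for a
# regular file literally named "*", which does not exist.
if [ ! -f "${HB_DIR}/*" ]; then
    ORIGINAL_TEST="returns early"
else
    ORIGINAL_TEST="continues"
fi

# Test proposed in the report: check that the heartbeat directory exists.
if [ ! -d "${HB_DIR}/" ]; then
    PROPOSED_TEST="returns early"
else
    PROPOSED_TEST="continues"
fi

# Reference count extraction, fed with the output line quoted in the report.
REF="$(echo "FC046AD7B2584E7EB12A7293993C81B0: 2 refs" | awk '/refs/ {print $2; exit;}')"

echo "original test: ${ORIGINAL_TEST}"
echo "proposed test: ${PROPOSED_TEST}"
echo "REF=${REF}"

rm -rf "${HB_DIR}"
```

With the shipped test the function leaves before any heartbeat region is inspected, while the -d variant lets the loop over the regions run and reach the reference check.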
[Bug 613793] Re: o2cb stopping Failed
** Changed in: ocfs2-tools (Ubuntu) Importance: Undecided => Medium ** Changed in: ocfs2-tools (Ubuntu) Status: New => Confirmed -- You received this bug notification because you are a member of Ubuntu Server Team, which is subscribed to ocfs2-tools in ubuntu. https://bugs.launchpad.net/bugs/613793 Title: o2cb stopping Failed -- Ubuntu-server-bugs mailing list Ubuntu-server-bugs@lists.ubuntu.com Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs
[Bug 613793] Re: o2cb stopping Failed
Continuing the investigation, I have better defined the causes of the bug.

The following command gives the number of references to my OCFS2 filesystem. Here is its result before starting the o2cb service:

    r...@db1:~# ocfs2_hb_ctl -I -u FC046AD7B2584E7EB12A7293993C81B0
    ocfs2_hb_ctl: Unable to access cluster service
    Cannot initialize cluster

After starting the o2cb service the number of references is zero.

    r...@db1:~# /etc/init.d/o2cb start
    Loading filesystem configfs: OK
    Mounting configfs filesystem at /sys/kernel/config: OK
    Loading stack plugin o2cb: OK
    Loading filesystem ocfs2_dlmfs: OK
    Mounting ocfs2_dlmfs filesystem at /dlm: OK
    Setting cluster stack o2cb: OK
    Starting O2CB cluster ocfs2: OK
    r...@db1:~# ocfs2_hb_ctl -I -u FC046AD7B2584E7EB12A7293993C81B0
    FC046AD7B2584E7EB12A7293993C81B0: 0 refs

After mounting, the number of references becomes 1.

    r...@db1:~# mount -v -t ocfs2 /dev/gfsapp/gfs /mnt/gfs
    device=/dev/mapper/gfsapp-gfs
    /dev/mapper/gfsapp-gfs on /mnt/gfs type ocfs2 (rw,_netdev,heartbeat=local)
    r...@db1:~# ocfs2_hb_ctl -I -u FC046AD7B2584E7EB12A7293993C81B0
    FC046AD7B2584E7EB12A7293993C81B0: 1 refs

The number of references is still 1 after unmounting.

    r...@db1:~# umount -v /mnt/gfs
    /dev/mapper/gfsapp-gfs umounted
    r...@db1:~# ocfs2_hb_ctl -I -u FC046AD7B2584E7EB12A7293993C81B0
    FC046AD7B2584E7EB12A7293993C81B0: 1 refs

Now the o2cb service cannot be stopped.

    r...@db1:~# /etc/init.d/o2cb stop
    Cleaning heartbeat on ocfs2:
    DEBUG ocfs2_hb_ctl -I -u FC046AD7B2584E7EB12A7293993C81B0 2>&1
    DEBUG FC046AD7B2584E7EB12A7293993C81B0: 1 refs
    DEBUG REF=1
    Failed
    At least one heartbeat region still active

If you mount the OCFS2 filesystem again, the number of references becomes 2.

    r...@db1:~# mount -v -t ocfs2 /dev/gfsapp/gfs /mnt/gfs
    device=/dev/mapper/gfsapp-gfs
    /dev/mapper/gfsapp-gfs on /mnt/gfs type ocfs2 (rw,_netdev,heartbeat=local)
    r...@db1:~# ocfs2_hb_ctl -I -u FC046AD7B2584E7EB12A7293993C81B0
    FC046AD7B2584E7EB12A7293993C81B0: 2 refs

At this point I tried to stop the OCFS2 heartbeat directly with the command ocfs2_hb_ctl -K.

    r...@db1:~# umount -v /mnt/gfs
    /dev/mapper/gfsapp-gfs umounted
    r...@db1:~# ocfs2_hb_ctl -I -u FC046AD7B2584E7EB12A7293993C81B0
    FC046AD7B2584E7EB12A7293993C81B0: 2 refs
    r...@db1:~# ocfs2_hb_ctl -K -u FC046AD7B2584E7EB12A7293993C81B0
    ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping heartbeat

ocfs2_hb_ctl exits with an error because it cannot find the device corresponding to the OCFS2 filesystem. This program reads the file /proc/partitions and uses the name column to find the OCFS2 filesystem device. My /proc/partitions file is:

    r...@db1:~# cat /proc/partitions
    major minor     #blocks  name

      104     0    71652960  cciss/c0d0
      104     1    63865856  cciss/c0d0p1
      104     2     7785472  cciss/c0d0p2
        8     0   244141056  sda
        8     1   244139741  sda1
        8    32   488282112  sdc
        8    16  1758130176  sdb
        8    48   488282112  sdd
      251     0   488280064  dm-0
      251     1   488280064  dm-1
      251     2    20971520  dm-2
        8    64   244141056  sde
        8    65   244139741  sde1
        8    80  1758130176  sdf
        8   112   488282112  sdh
      251     3  1737154560  dm-3
        8    96   488282112  sdg

The name that corresponds to the OCFS2 filesystem device is dm-0. This device node does not exist, as shown by the following command.

    r...@db1:~# ll /dev/dm-0
    ls: cannot access /dev/dm-0: No such file or directory

The device dm-0 is not created even though udev contains the rules to create it, as shown by the following commands (I use the default udev rules of this distribution).

    r...@db1:~# udevadm info --query=all --name=gfsapp/gfs
    P: /devices/virtual/block/dm-0
    N: mapper/gfsapp-gfs
    L: 50
    W: 197
    S: block/251:0
    S: dm-0
    S: disk/by-id/dm-name-gfsapp-gfs
    S: disk/by-id/dm-uuid-LVM-Se1805vbwYqSzX2KEfSWPZ9PqdWcQtMcfhaRcHV0oE9j814Nj97Af4vjRbCEYaYN
    S: disk/by-uuid/fc046ad7-b258-4e7e-b12a-7293993c81b0
    S: disk/by-label/ocfs2
    S: gfsapp/gfs
    E: UDEV_LOG=3
    E: DEVPATH=/devices/virtual/block/dm-0
    E: MAJOR=251
    E: MINOR=0
    E: DEVNAME=/dev/mapper/gfsapp-gfs
    E: DEVTYPE=disk
    E: SUBSYSTEM=block
    E: DM_NAME=gfsapp-gfs
    E: DM_UUID=LVM-Se1805vbwYqSzX2KEfSWPZ9PqdWcQtMcfhaRcHV0oE9j814Nj97Af4vjRbCEYaYN
    E: DM_SUSPENDED=0
    E: DM_UDEV_RULES=1
    E: DM_VG_NAME=gfsapp
    E: DM_LV_NAME=gfs
    E: DEVLINKS=/dev/block/251:0 /dev/dm-0 /dev/disk/by-id/dm-name-gfsapp-gfs /dev/disk/by-id/dm-uuid-LVM-Se1805vbwYqSzX2KEfSWPZ9PqdWcQtMcfhaRcHV0oE9j814Nj97Af4vjRbCEYaYN /dev/disk/by-uuid/fc046ad7-b258-4e7e-b12a-7293993c81b0 /dev/disk/by-label/ocfs2 /dev/gfsapp/gfs
    E: ID_FS_LABEL=ocfs2
    E: ID_FS_LABEL_ENC=ocfs2
    E: ID_FS_UUID=fc046ad7-b258-4e7e-b12a-7293993c81b0
    E: ID_FS_UUID_ENC=fc046ad7-b258-4e7e-b12a-7293993c81b0
    E: ID_FS_VERSION=0.90
    E: ID_FS_TYPE=ocfs2
    E: ID_FS_USAGE=filesystem
    E: FSTAB_NAME=/dev/gfsapp/gfs
    E: FSTAB_DIR=/mnt/gfs
    E: FSTAB_TYPE=ocfs2
    E: FSTAB_OPTS=_netdev,noauto
    E: FSTAB_FREQ=0
    E: FSTAB_PASSNO=0
    E: DM_TABLE_STATE=LIVE
    E: DM_STATE=ACTIVE
    E: DM_TYPE=raid
    r...@db1:~# udevadm