It looks like the link to the master domain is not in the tree.
I need to see the full logs, including the engine's, to understand what happened.
Are you sure you don't have them? Even if they were rotated, they should be kept
as vdsm.log.*.xz under /var/log/vdsm/.
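
A minimal sketch for checking whether older logs survived rotation, assuming Python 3 is available on the host. It collects the live vdsm.log plus any vdsm.log.*.xz archives under /var/log/vdsm/ and prints the lines that mention a given string, here the master domain UUID from this thread. The helper names find_vdsm_logs and grep_logs are hypothetical, not part of VDSM.

```python
import glob
import lzma
import os


def find_vdsm_logs(log_dir="/var/log/vdsm"):
    """Return the live vdsm.log (if present) followed by rotated .xz archives, sorted by name."""
    live = os.path.join(log_dir, "vdsm.log")
    rotated = sorted(glob.glob(os.path.join(log_dir, "vdsm.log.*.xz")))
    return ([live] if os.path.exists(live) else []) + rotated


def grep_logs(paths, needle):
    """Yield (path, line) for every log line that contains needle."""
    for path in paths:
        # Rotated archives are xz-compressed; the live log is plain text.
        opener = lzma.open if path.endswith(".xz") else open
        with opener(path, "rt", errors="replace") as handle:
            for line in handle:
                if needle in line:
                    yield path, line.rstrip("\n")


if __name__ == "__main__":
    # Master domain UUID taken from the error quoted in this thread.
    master_uuid = "1083422e-a5db-41b6-b667-b9ef1ef244f0"
    for path, line in grep_logs(find_vdsm_logs(), master_uuid):
        print(f"{path}: {line}")
```

Reading /var/log/vdsm/ may require root. If the script prints nothing from the .xz files, the logs from before the failure are probably gone.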

----- Original Message -----
> From: "Yuval M" <>
> To: "Yeela Kaplan" <>
> Cc: "Limor Gavish" <>,, "Nezer Zaidenberg" <>
> Sent: Wednesday, April 17, 2013 4:56:55 PM
> Subject: Re: [Users] oVirt storage is down and doesn't come up
> 1. We do not have the logs from before the problem.
> 2.
> --------
> $ tree /rhev/data-center/
> /rhev/data-center/
> ├── hsm-tasks
> └── mnt
>     ├── bufferoverflow.home:_home_BO__ISO__Domain
>     │   ├── 45d24e2a-705e-440f-954c-fda3cab61298
>     │   │   ├── dom_md
>     │   │   │   ├── ids
>     │   │   │   ├── inbox
>     │   │   │   ├── leases
>     │   │   │   ├── metadata
>     │   │   │   └── outbox
>     │   │   └── images
>     │   │       └── 11111111-1111-1111-1111-111111111111
>     │   │           ├── Fedora-18-x86_64-DVD.iso
>     │   │           └── Fedora-18-x86_64-Live-Desktop.iso
>     │   └── __DIRECT_IO_TEST__
>     ├── bufferoverflow.home:_home_BO__Ovirt__Storage
>     └── kernelpanic.home:_home_KP__Data__Domain
>         ├── a8286508-db45-40d7-8645-e573f6bacdc7
>         │   ├── dom_md
>         │   │   ├── ids
>         │   │   ├── inbox
>         │   │   ├── leases
>         │   │   ├── metadata
>         │   │   └── outbox
>         │   └── images
>         │       ├── 0df45336-de35-4dc0-9958-95b27d5d4701
>         │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d
>         │       │   ├──
>         │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.meta
>         │       │   ├── b245184f-f8e3-479b-8559-8b6af2473b7c
>         │       │   ├──
>         │       │   └── b245184f-f8e3-479b-8559-8b6af2473b7c.meta
>         │       ├── 0e1ebaf7-3909-44cd-8560-d05a63eb4c4e
>         │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d
>         │       │   ├──
>         │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.meta
>         │       │   ├── 562b9043-bde8-4595-bbea-fa8871f0e19e
>         │       │   ├──
>         │       │   └── 562b9043-bde8-4595-bbea-fa8871f0e19e.meta
>         │       ├── 32ebb85a-0dde-47fe-90c7-7f4fb2c0f1e5
>         │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d
>         │       │   ├──
>         │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.meta
>         │       │   ├── 4774095e-db3d-4561-8284-53eabfd28f66
>         │       │   ├──
>         │       │   └── 4774095e-db3d-4561-8284-53eabfd28f66.meta
>         │       └── a7e13a25-1694-4509-9e6b-e88583a4d970
>         │           ├── 0d33efc8-a608-439f-abe2-43884c1ce72d
>         │           ├──
>         │           └── 0d33efc8-a608-439f-abe2-43884c1ce72d.meta
>         └── __DIRECT_IO_TEST__
> 16 directories, 35 files
> --------------------
> 3. We have 3 domains:
> BO_Ovirt_Storage (data domain, on the same machine as engine and vdsm, via
> NFS)
> BO_ISO_Domain (ISO domain, same machine via NFS)
> KP_Data_Domain (data domain on an NFS mount on a different machine)
> Yuval
> On Wed, Apr 17, 2013 at 4:28 PM, Yeela Kaplan <> wrote:
> > Hi Limor,
> > 1) Your log starts exactly after the vdsm restart. I need to see the full
> > vdsm log from before the domains went down in order to understand the
> > problem. Can you attach them?
> > 2) Can you send the printout of 'tree /rhev/data-center/'?
> > 3) How many domains are attached to your DC, and what type are they (ISO,
> > export, data)? The DC is NFS, right?
> >
> > Thanks,
> > Yeela
> >
> > ----- Original Message -----
> > > From: "Limor Gavish" <>
> > > To: "Tal Nisan" <>
> > > Cc: "Yuval M" <>,, "Nezer Zaidenberg" <>
> > > Sent: Monday, April 15, 2013 5:10:16 PM
> > > Subject: Re: [Users] oVirt storage is down and doesn't come up
> > >
> > > Thank you very much for your reply.
> > > I ran the commands you asked for (see below), but a directory named after
> > > the UUID of the master domain is not mounted. We tried restarting VDSM and
> > > the entire machine, but it didn't help.
> > > We managed to manually mount "/home/BO_Ovirt_Storage" to a temporary
> > > directory.
> > >
> > > postgres=# \connect engine;
> > > You are now connected to database "engine" as user "postgres".
> > > engine=# select current_database();
> > > current_database
> > > ------------------
> > > engine
> > > (1 row)
> > > engine=# select , ssc.connection from storage_domain_static sds join
> > > storage_server_connections ssc on where
> > > ='1083422e-a5db-41b6-b667-b9ef1ef244f0';
> > >                   id                  |                 connection
> > > --------------------------------------+--------------------------------------------
> > >  1083422e-a5db-41b6-b667-b9ef1ef244f0 | bufferoverflow.home:/home/BO_Ovirt_Storage
> > > (1 row)
> > >
> > > [wil@bufferoverflow ~] $ mount
> > > proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
> > > sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
> > > devtmpfs on /dev type devtmpfs
> > > (rw,nosuid,size=8131256k,nr_inodes=2032814,mode=755)
> > > securityfs on /sys/kernel/security type securityfs
> > > (rw,nosuid,nodev,noexec,relatime)
> > > tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
> > > devpts on /dev/pts type devpts
> > > (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
> > > tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
> > > tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,mode=755)
> > > cgroup on /sys/fs/cgroup/systemd type cgroup
> > > (rw,nosuid,nodev,noexec,relatime,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
> > > cgroup on /sys/fs/cgroup/cpuset type cgroup
> > > (rw,nosuid,nodev,noexec,relatime,cpuset)
> > > cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup
> > > (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
> > > cgroup on /sys/fs/cgroup/memory type cgroup
> > > (rw,nosuid,nodev,noexec,relatime,memory)
> > > cgroup on /sys/fs/cgroup/devices type cgroup
> > > (rw,nosuid,nodev,noexec,relatime,devices)
> > > cgroup on /sys/fs/cgroup/freezer type cgroup
> > > (rw,nosuid,nodev,noexec,relatime,freezer)
> > > cgroup on /sys/fs/cgroup/net_cls type cgroup
> > > (rw,nosuid,nodev,noexec,relatime,net_cls)
> > > cgroup on /sys/fs/cgroup/blkio type cgroup
> > > (rw,nosuid,nodev,noexec,relatime,blkio)
> > > cgroup on /sys/fs/cgroup/perf_event type cgroup
> > > (rw,nosuid,nodev,noexec,relatime,perf_event)
> > > /dev/sda3 on / type ext4 (rw,relatime,data=ordered)
> > > rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
> > > debugfs on /sys/kernel/debug type debugfs (rw,relatime)
> > > sunrpc on /proc/fs/nfsd type nfsd (rw,relatime)
> > > hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
> > > systemd-1 on /proc/sys/fs/binfmt_misc type autofs
> > > (rw,relatime,fd=34,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
> > > mqueue on /dev/mqueue type mqueue (rw,relatime)
> > > tmpfs on /tmp type tmpfs (rw)
> > > configfs on /sys/kernel/config type configfs (rw,relatime)
> > > binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
> > > /dev/sda5 on /home type ext4 (rw,relatime,data=ordered)
> > > /dev/sda1 on /boot type ext4 (rw,relatime,data=ordered)
> > > kernelpanic.home:/home/KP_Data_Domain on
> > > /rhev/data-center/mnt/kernelpanic.home:_home_KP__Data__Domain type nfs
> > > (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=
> > > bufferoverflow.home:/home/BO_ISO_Domain on
> > > /rhev/data-center/mnt/bufferoverflow.home:_home_BO__ISO__Domain type nfs
> > > (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=
> > >
> > > [wil@bufferoverflow ~]$ ls -la /home/
> > > total 36
> > > drwxr-xr-x. 6 root root 4096 Mar 22 11:25 .
> > > dr-xr-xr-x. 19 root root 4096 Apr 12 18:53 ..
> > > drwxr-xr-x. 3 vdsm kvm 4096 Mar 27 17:33 BO_ISO_Domain
> > > drwxr-xr-x. 3 vdsm kvm 4096 Mar 27 17:33 BO_Ovirt_Storage
> > > drwx------. 2 root root 16384 Mar 6 09:11 lost+found
> > > drwx------. 27 wil wil 4096 Apr 15 01:50 wil
> > > [wil@bufferoverflow ~]$ cd /home/BO_Ovirt_Storage/
> > > [wil@bufferoverflow BO_Ovirt_Storage]$ ls -la
> > > total 12
> > > drwxr-xr-x. 3 vdsm kvm 4096 Mar 27 17:33 .
> > > drwxr-xr-x. 6 root root 4096 Mar 22 11:25 ..
> > > drwxr-xr-x 5 vdsm kvm 4096 Mar 20 23:06 1083422e-a5db-41b6-b667-b9ef1ef244f0
> > > -rwxr-xr-x 1 vdsm kvm 0 Mar 27 17:33 __DIRECT_IO_TEST__
> > >
> > > Thanks,
> > > Limor
> > >
> > >
> > > On Mon, Apr 15, 2013 at 4:02 PM, Tal Nisan < > wrote:
> > >
> > >
> > >
> > > Hi Limor,
> > > First, we should check which mount is the master storage domain that is
> > > reported as not found. This should be checked against the oVirt server
> > > database; please run
> > >
> > > select , ssc.connection from storage_domain_static sds join
> > > storage_server_connections ssc on
> > > where ='1083422e-a5db-41b6-b667-b9ef1ef244f0';
> > >
> > > You can run this via psql or a Postgres UI if you have one.
> > > In the results you will see the storage connection in the format
> > > %hostname%:/%mountName%. Then, on the VDSM server, check the mount list to
> > > confirm that it is mounted. The mount itself should contain a directory
> > > named after the UUID of the master domain. Let me know the result.
> > >
> > > Tal.
> > >
> > >
> > >
> > >
> > > On 04/12/2013 07:29 PM, Limor Gavish wrote:
> > >
> > >
> > >
> > > Hi,
> > >
> > > For some reason, without us doing anything, all the storage domains went
> > > down, and restarting VDSM or the entire machine does not bring them up.
> > > I am not using LVM.
> > > The following errors appear several times in vdsm.log (full logs are
> > > attached):
> > >
> > > Thread-22::WARNING::2013-04-12
> > > 19:00:08,597::lvm::378::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] ['
> > > Volume group "1083422e-a5db-41b6-b667-b9ef1ef244f0" not found']
> > > Thread-22::DEBUG::2013-04-12
> > > 19:00:08,598::lvm::402::OperationMutex::(_reloadvgs) Operation 'lvm
> > > reload operation' released the operation mutex
> > > Thread-22::DEBUG::2013-04-12
> > > 19:00:08,681::resourceManager::615::ResourceManager::(releaseResource)
> > > Trying to release resource 'Storage.5849b030-626e-47cb-ad90-3ce782d831b3'
> > > Thread-22::DEBUG::2013-04-12
> > > 19:00:08,681::resourceManager::634::ResourceManager::(releaseResource)
> > > Released resource 'Storage.5849b030-626e-47cb-ad90-3ce782d831b3' (0
> > > active users)
> > > Thread-22::DEBUG::2013-04-12
> > > 19:00:08,681::resourceManager::640::ResourceManager::(releaseResource)
> > > Resource 'Storage.5849b030-626e-47cb-ad90-3ce782d831b3' is free, finding
> > > out if anyone is waiting for it.
> > > Thread-22::DEBUG::2013-04-12
> > > 19:00:08,682::resourceManager::648::ResourceManager::(releaseResource) No
> > > one is waiting for resource 'Storage.5849b030-626e-47cb-ad90-3ce782d831b3',
> > > Clearing records.
> > > Thread-22::ERROR::2013-04-12
> > > 19:00:08,682::task::850::TaskManager.Task::(_setError)
> > > Task=`e35a22ac-771a-4916-851f-2fe9d60a0ae6`::Unexpected error
> > > Traceback (most recent call last):
> > > File "/usr/share/vdsm/storage/", line 857, in _run
> > > return fn(*args, **kargs)
> > > File "/usr/share/vdsm/", line 45, in wrapper
> > > res = f(*args, **kwargs)
> > > File "/usr/share/vdsm/storage/", line 939, in connectStoragePool
> > > masterVersion, options)
> > > File "/usr/share/vdsm/storage/", line 986, in _connectStoragePool
> > > res = pool.connect(hostID, scsiKey, msdUUID, masterVersion)
> > > File "/usr/share/vdsm/storage/", line 695, in connect
> > > self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion)
> > > File "/usr/share/vdsm/storage/", line 1232, in __rebuild
> > > masterVersion=masterVersion)
> > > File "/usr/share/vdsm/storage/", line 1576, in getMasterDomain
> > > raise se.StoragePoolMasterNotFound(self.spUUID, msdUUID)
> > > StoragePoolMasterNotFound: Cannot find master domain:
> > > 'spUUID=5849b030-626e-47cb-ad90-3ce782d831b3,
> > > msdUUID=1083422e-a5db-41b6-b667-b9ef1ef244f0'
> > > Thread-22::DEBUG::2013-04-12
> > > 19:00:08,685::task::869::TaskManager.Task::(_run)
> > > Task=`e35a22ac-771a-4916-851f-2fe9d60a0ae6`::Task._run:
> > > e35a22ac-771a-4916-851f-2fe9d60a0ae6
> > > ('5849b030-626e-47cb-ad90-3ce782d831b3', 1,
> > > '5849b030-626e-47cb-ad90-3ce782d831b3',
> > > '1083422e-a5db-41b6-b667-b9ef1ef244f0', 3942) {} failed - stopping task
> > > Thread-22::DEBUG::2013-04-12
> > > 19:00:08,685::task::1194::TaskManager.Task::(stop)
> > > Task=`e35a22ac-771a-4916-851f-2fe9d60a0ae6`::stopping in state preparing
> > > (force False)
> > > Thread-22::DEBUG::2013-04-12
> > > 19:00:08,685::task::974::TaskManager.Task::(_decref)
> > > Task=`e35a22ac-771a-4916-851f-2fe9d60a0ae6`::ref 1 aborting True
> > > Thread-22::INFO::2013-04-12
> > > 19:00:08,686::task::1151::TaskManager.Task::(prepare)
> > > Task=`e35a22ac-771a-4916-851f-2fe9d60a0ae6`::aborting: Task is aborted:
> > > 'Cannot find master domain' - code 304
> > >
> > > [wil@bufferoverflow ~]$ sudo vgs --noheadings --units b --nosuffix
> > > --separator \| -o
> > > uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free
> > > No volume groups found
> > >
> > > [wil@bufferoverflow ~]$ mount
> > > proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
> > > sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
> > > devtmpfs on /dev type devtmpfs
> > > (rw,nosuid,size=8131256k,nr_inodes=2032814,mode=755)
> > > securityfs on /sys/kernel/security type securityfs
> > > (rw,nosuid,nodev,noexec,relatime)
> > > tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
> > > devpts on /dev/pts type devpts
> > > (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
> > > tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
> > > tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,mode=755)
> > > cgroup on /sys/fs/cgroup/systemd type cgroup
> > > (rw,nosuid,nodev,noexec,relatime,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
> > > cgroup on /sys/fs/cgroup/cpuset type cgroup
> > > (rw,nosuid,nodev,noexec,relatime,cpuset)
> > > cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup
> > > (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
> > > cgroup on /sys/fs/cgroup/memory type cgroup
> > > (rw,nosuid,nodev,noexec,relatime,memory)
> > > cgroup on /sys/fs/cgroup/devices type cgroup
> > > (rw,nosuid,nodev,noexec,relatime,devices)
> > > cgroup on /sys/fs/cgroup/freezer type cgroup
> > > (rw,nosuid,nodev,noexec,relatime,freezer)
> > > cgroup on /sys/fs/cgroup/net_cls type cgroup
> > > (rw,nosuid,nodev,noexec,relatime,net_cls)
> > > cgroup on /sys/fs/cgroup/blkio type cgroup
> > > (rw,nosuid,nodev,noexec,relatime,blkio)
> > > cgroup on /sys/fs/cgroup/perf_event type cgroup
> > > (rw,nosuid,nodev,noexec,relatime,perf_event)
> > > /dev/sda3 on / type ext4 (rw,relatime,data=ordered)
> > > rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
> > > debugfs on /sys/kernel/debug type debugfs (rw,relatime)
> > > sunrpc on /proc/fs/nfsd type nfsd (rw,relatime)
> > > hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
> > > systemd-1 on /proc/sys/fs/binfmt_misc type autofs
> > > (rw,relatime,fd=34,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
> > > mqueue on /dev/mqueue type mqueue (rw,relatime)
> > > tmpfs on /tmp type tmpfs (rw)
> > > configfs on /sys/kernel/config type configfs (rw,relatime)
> > > binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
> > > /dev/sda5 on /home type ext4 (rw,relatime,data=ordered)
> > > /dev/sda1 on /boot type ext4 (rw,relatime,data=ordered)
> > > kernelpanic.home:/home/KP_Data_Domain on
> > > /rhev/data-center/mnt/kernelpanic.home:_home_KP__Data__Domain type nfs
> > > (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=
> > > bufferoverflow.home:/home/BO_ISO_Domain on
> > > /rhev/data-center/mnt/bufferoverflow.home:_home_BO__ISO__Domain type nfs
> > > (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=
> > >
> > > [wil@bufferoverflow ~]$ sudo find / -name
> > > 5849b030-626e-47cb-ad90-3ce782d831b3
> > > /run/vdsm/pools/5849b030-626e-47cb-ad90-3ce782d831b3
> > >
> > > [wil@bufferoverflow ~]$ sudo find / -name
> > > 1083422e-a5db-41b6-b667-b9ef1ef244f0
> > > /home/BO_Ovirt_Storage/1083422e-a5db-41b6-b667-b9ef1ef244f0
> > >
> > > I would greatly appreciate any help,
> > > Limor Gavish
> > > _______________________________________________
> > > Users mailing list
> > >
> > >
> > >