1. We do not have the logs from before the problem.

2.
--------
$ tree /rhev/data-center/
/rhev/data-center/
├── hsm-tasks
└── mnt
    ├── bufferoverflow.home:_home_BO__ISO__Domain
    │   ├── 45d24e2a-705e-440f-954c-fda3cab61298
    │   │   ├── dom_md
    │   │   │   ├── ids
    │   │   │   ├── inbox
    │   │   │   ├── leases
    │   │   │   ├── metadata
    │   │   │   └── outbox
    │   │   └── images
    │   │       └── 11111111-1111-1111-1111-111111111111
    │   │           ├── Fedora-18-x86_64-DVD.iso
    │   │           └── Fedora-18-x86_64-Live-Desktop.iso
    │   └── __DIRECT_IO_TEST__
    ├── bufferoverflow.home:_home_BO__Ovirt__Storage
    └── kernelpanic.home:_home_KP__Data__Domain
        ├── a8286508-db45-40d7-8645-e573f6bacdc7
        │   ├── dom_md
        │   │   ├── ids
        │   │   ├── inbox
        │   │   ├── leases
        │   │   ├── metadata
        │   │   └── outbox
        │   └── images
        │       ├── 0df45336-de35-4dc0-9958-95b27d5d4701
        │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d
        │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.lease
        │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.meta
        │       │   ├── b245184f-f8e3-479b-8559-8b6af2473b7c
        │       │   ├── b245184f-f8e3-479b-8559-8b6af2473b7c.lease
        │       │   └── b245184f-f8e3-479b-8559-8b6af2473b7c.meta
        │       ├── 0e1ebaf7-3909-44cd-8560-d05a63eb4c4e
        │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d
        │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.lease
        │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.meta
        │       │   ├── 562b9043-bde8-4595-bbea-fa8871f0e19e
        │       │   ├── 562b9043-bde8-4595-bbea-fa8871f0e19e.lease
        │       │   └── 562b9043-bde8-4595-bbea-fa8871f0e19e.meta
        │       ├── 32ebb85a-0dde-47fe-90c7-7f4fb2c0f1e5
        │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d
        │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.lease
        │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.meta
        │       │   ├── 4774095e-db3d-4561-8284-53eabfd28f66
        │       │   ├── 4774095e-db3d-4561-8284-53eabfd28f66.lease
        │       │   └── 4774095e-db3d-4561-8284-53eabfd28f66.meta
        │       └── a7e13a25-1694-4509-9e6b-e88583a4d970
        │           ├── 0d33efc8-a608-439f-abe2-43884c1ce72d
        │           ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.lease
        │           └── 0d33efc8-a608-439f-abe2-43884c1ce72d.meta
        └── __DIRECT_IO_TEST__
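The escaped directory names under /rhev/data-center/mnt/ in the tree above map back to the NFS connection strings shown later in the thread. A minimal sketch of that mapping, with the escaping rule inferred from these listings (each literal '_' is doubled, then '/' becomes '_') rather than taken from vdsm's source; the helper name is ours:

```shell
# Map an NFS connection string to its /rhev/data-center/mnt directory name.
# Rule inferred from the listings in this thread: double each '_', then
# replace every '/' with '_'.
escape_mount_name() {
    printf '%s\n' "$1" | sed -e 's/_/__/g' -e 's|/|_|g'
}

escape_mount_name "bufferoverflow.home:/home/BO_Ovirt_Storage"
# -> bufferoverflow.home:_home_BO__Ovirt__Storage
```

Note that in the tree above the bufferoverflow.home:_home_BO__Ovirt__Storage directory is empty, which matches the missing master domain 1083422e-a5db-41b6-b667-b9ef1ef244f0 reported below.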
16 directories, 35 files
--------------------

3. We have 3 domains:
   BO_Ovirt_Storage (data domain, on the same machine as the engine and vdsm, via NFS)
   BO_ISO_Domain (ISO domain, on the same machine, via NFS)
   KP_Data_Domain (data domain on an NFS mount on a different machine)

Yuval

On Wed, Apr 17, 2013 at 4:28 PM, Yeela Kaplan <ykap...@redhat.com> wrote:
> Hi Limor,
> 1) Your log starts exactly after the vdsm restart. I need to see the full
> vdsm logs from before the domains went down in order to understand the
> problem. Can you attach them?
> 2) Can you send the printout of 'tree /rhev/data-center/'?
> 3) How many domains are attached to your DC, and what type are they (ISO,
> export, data)? The DC is NFS, right?
>
> Thanks,
> Yeela
>
> ----- Original Message -----
> > From: "Limor Gavish" <lgav...@gmail.com>
> > To: "Tal Nisan" <tni...@redhat.com>
> > Cc: "Yuval M" <yuva...@gmail.com>, users@ovirt.org, "Nezer Zaidenberg" <nzaidenb...@mac.com>
> > Sent: Monday, April 15, 2013 5:10:16 PM
> > Subject: Re: [Users] oVirt storage is down and doesn't come up
> >
> > Thank you very much for your reply.
> > I ran the commands you asked for (see below), but the directory named after
> > the UUID of the master domain is not mounted. We tried restarting VDSM and
> > then the entire machine; it didn't help.
> > We succeeded in manually mounting /home/BO_Ovirt_Storage to a temporary
> > directory.
> >
> > postgres=# \connect engine;
> > You are now connected to database "engine" as user "postgres".
> > engine=# select current_database();
> >  current_database
> > ------------------
> >  engine
> > (1 row)
> >
> > engine=# select sds.id, ssc.connection from storage_domain_static sds join
> > storage_server_connections ssc on sds.storage = ssc.id
> > where sds.id = '1083422e-a5db-41b6-b667-b9ef1ef244f0';
> >                   id                  |                 connection
> > --------------------------------------+--------------------------------------------
> >  1083422e-a5db-41b6-b667-b9ef1ef244f0 | bufferoverflow.home:/home/BO_Ovirt_Storage
> > (1 row)
> >
> > [wil@bufferoverflow ~]$ mount
> > proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
> > sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
> > devtmpfs on /dev type devtmpfs (rw,nosuid,size=8131256k,nr_inodes=2032814,mode=755)
> > securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
> > tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
> > devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
> > tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
> > tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,mode=755)
> > cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
> > cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
> > cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
> > cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
> > cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
> > cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
> > cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls)
> > cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
> > cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
> > /dev/sda3 on / type ext4 (rw,relatime,data=ordered)
> > rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
> > debugfs on /sys/kernel/debug type debugfs (rw,relatime)
> > sunrpc on /proc/fs/nfsd type nfsd (rw,relatime)
> > hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
> > systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=34,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
> > mqueue on /dev/mqueue type mqueue (rw,relatime)
> > tmpfs on /tmp type tmpfs (rw)
> > configfs on /sys/kernel/config type configfs (rw,relatime)
> > binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
> > /dev/sda5 on /home type ext4 (rw,relatime,data=ordered)
> > /dev/sda1 on /boot type ext4 (rw,relatime,data=ordered)
> > kernelpanic.home:/home/KP_Data_Domain on /rhev/data-center/mnt/kernelpanic.home:_home_KP__Data__Domain type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=10.100.101.100,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=10.100.101.100)
> > bufferoverflow.home:/home/BO_ISO_Domain on /rhev/data-center/mnt/bufferoverflow.home:_home_BO__ISO__Domain type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=10.100.101.108,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=10.100.101.108)
> >
> > [wil@bufferoverflow ~]$ ls -la /home/
> > total 36
> > drwxr-xr-x.  6 root root  4096 Mar 22 11:25 .
> > dr-xr-xr-x. 19 root root  4096 Apr 12 18:53 ..
> > drwxr-xr-x.  3 vdsm kvm   4096 Mar 27 17:33 BO_ISO_Domain
> > drwxr-xr-x.  3 vdsm kvm   4096 Mar 27 17:33 BO_Ovirt_Storage
> > drwx------.  2 root root 16384 Mar  6 09:11 lost+found
> > drwx------. 27 wil  wil   4096 Apr 15 01:50 wil
> >
> > [wil@bufferoverflow ~]$ cd /home/BO_Ovirt_Storage/
> > [wil@bufferoverflow BO_Ovirt_Storage]$ ls -la
> > total 12
> > drwxr-xr-x. 3 vdsm kvm  4096 Mar 27 17:33 .
> > drwxr-xr-x. 6 root root 4096 Mar 22 11:25 ..
> > drwxr-xr-x  5 vdsm kvm  4096 Mar 20 23:06 1083422e-a5db-41b6-b667-b9ef1ef244f0
> > -rwxr-xr-x  1 vdsm kvm     0 Mar 27 17:33 __DIRECT_IO_TEST__
> >
> > Thanks,
> > Limor
> >
> > On Mon, Apr 15, 2013 at 4:02 PM, Tal Nisan <tni...@redhat.com> wrote:
> >
> > Hi Limor,
> > First we should probably start by checking which mount is the master
> > storage domain that appears as not found. This should be checked against
> > the oVirt server database; please run:
> >
> > select sds.id, ssc.connection from storage_domain_static sds join
> > storage_server_connections ssc on sds.storage = ssc.id
> > where sds.id = '1083422e-a5db-41b6-b667-b9ef1ef244f0';
> >
> > You can run this via psql or a Postgres UI if you have one.
> > In the results you will see the storage connection in the format
> > %hostname%:/%mountName%. Then, on the VDSM server, check in the mount list
> > that it is mounted; the mount itself should contain a directory named
> > after the UUID of the master domain. Let me know the result.
> >
> > Tal.
> >
> > On 04/12/2013 07:29 PM, Limor Gavish wrote:
> >
> > Hi,
> >
> > For some reason, without doing anything, all the storage domains went
> > down, and restarting VDSM or the entire machine does not bring them up.
> > I am not using LVM.
> > The following errors appear several times in vdsm.log (full logs are
> > attached):
> >
> > Thread-22::WARNING::2013-04-12 19:00:08,597::lvm::378::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] ['  Volume group "1083422e-a5db-41b6-b667-b9ef1ef244f0" not found']
> > Thread-22::DEBUG::2013-04-12 19:00:08,598::lvm::402::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex
> > Thread-22::DEBUG::2013-04-12 19:00:08,681::resourceManager::615::ResourceManager::(releaseResource) Trying to release resource 'Storage.5849b030-626e-47cb-ad90-3ce782d831b3'
> > Thread-22::DEBUG::2013-04-12 19:00:08,681::resourceManager::634::ResourceManager::(releaseResource) Released resource 'Storage.5849b030-626e-47cb-ad90-3ce782d831b3' (0 active users)
> > Thread-22::DEBUG::2013-04-12 19:00:08,681::resourceManager::640::ResourceManager::(releaseResource) Resource 'Storage.5849b030-626e-47cb-ad90-3ce782d831b3' is free, finding out if anyone is waiting for it.
> > Thread-22::DEBUG::2013-04-12 19:00:08,682::resourceManager::648::ResourceManager::(releaseResource) No one is waiting for resource 'Storage.5849b030-626e-47cb-ad90-3ce782d831b3', Clearing records.
> > Thread-22::ERROR::2013-04-12 19:00:08,682::task::850::TaskManager.Task::(_setError) Task=`e35a22ac-771a-4916-851f-2fe9d60a0ae6`::Unexpected error
> > Traceback (most recent call last):
> >   File "/usr/share/vdsm/storage/task.py", line 857, in _run
> >     return fn(*args, **kargs)
> >   File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
> >     res = f(*args, **kwargs)
> >   File "/usr/share/vdsm/storage/hsm.py", line 939, in connectStoragePool
> >     masterVersion, options)
> >   File "/usr/share/vdsm/storage/hsm.py", line 986, in _connectStoragePool
> >     res = pool.connect(hostID, scsiKey, msdUUID, masterVersion)
> >   File "/usr/share/vdsm/storage/sp.py", line 695, in connect
> >     self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion)
> >   File "/usr/share/vdsm/storage/sp.py", line 1232, in __rebuild
> >     masterVersion=masterVersion)
> >   File "/usr/share/vdsm/storage/sp.py", line 1576, in getMasterDomain
> >     raise se.StoragePoolMasterNotFound(self.spUUID, msdUUID)
> > StoragePoolMasterNotFound: Cannot find master domain:
> > 'spUUID=5849b030-626e-47cb-ad90-3ce782d831b3, msdUUID=1083422e-a5db-41b6-b667-b9ef1ef244f0'
> > Thread-22::DEBUG::2013-04-12 19:00:08,685::task::869::TaskManager.Task::(_run) Task=`e35a22ac-771a-4916-851f-2fe9d60a0ae6`::Task._run: e35a22ac-771a-4916-851f-2fe9d60a0ae6 ('5849b030-626e-47cb-ad90-3ce782d831b3', 1, '5849b030-626e-47cb-ad90-3ce782d831b3', '1083422e-a5db-41b6-b667-b9ef1ef244f0', 3942) {} failed - stopping task
> > Thread-22::DEBUG::2013-04-12 19:00:08,685::task::1194::TaskManager.Task::(stop) Task=`e35a22ac-771a-4916-851f-2fe9d60a0ae6`::stopping in state preparing (force False)
> > Thread-22::DEBUG::2013-04-12 19:00:08,685::task::974::TaskManager.Task::(_decref) Task=`e35a22ac-771a-4916-851f-2fe9d60a0ae6`::ref 1 aborting True
> > Thread-22::INFO::2013-04-12 19:00:08,686::task::1151::TaskManager.Task::(prepare) Task=`e35a22ac-771a-4916-851f-2fe9d60a0ae6`::aborting: Task is aborted: 'Cannot find master domain' - code 304
> >
> > [wil@bufferoverflow ~]$ sudo vgs --noheadings --units b --nosuffix --separator \| -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free
> >   No volume groups found
> >
> > [wil@bufferoverflow ~]$ mount
> > proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
> > sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
> > devtmpfs on /dev type devtmpfs (rw,nosuid,size=8131256k,nr_inodes=2032814,mode=755)
> > securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
> > tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
> > devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
> > tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
> > tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,mode=755)
> > cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
> > cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
> > cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
> > cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
> > cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
> > cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
> > cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls)
> > cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
> > cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
> > /dev/sda3 on / type ext4 (rw,relatime,data=ordered)
> > rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
> > debugfs on /sys/kernel/debug type debugfs (rw,relatime)
> > sunrpc on /proc/fs/nfsd type nfsd (rw,relatime)
> > hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
> > systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=34,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
> > mqueue on /dev/mqueue type mqueue (rw,relatime)
> > tmpfs on /tmp type tmpfs (rw)
> > configfs on /sys/kernel/config type configfs (rw,relatime)
> > binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
> > /dev/sda5 on /home type ext4 (rw,relatime,data=ordered)
> > /dev/sda1 on /boot type ext4 (rw,relatime,data=ordered)
> > kernelpanic.home:/home/KP_Data_Domain on /rhev/data-center/mnt/kernelpanic.home:_home_KP__Data__Domain type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=10.100.101.100,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=10.100.101.100)
> > bufferoverflow.home:/home/BO_ISO_Domain on /rhev/data-center/mnt/bufferoverflow.home:_home_BO__ISO__Domain type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=10.100.101.108,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=10.100.101.108)
> >
> > [wil@bufferoverflow ~]$ sudo find / -name 5849b030-626e-47cb-ad90-3ce782d831b3
> > /run/vdsm/pools/5849b030-626e-47cb-ad90-3ce782d831b3
> >
> > [wil@bufferoverflow ~]$ sudo find / -name 1083422e-a5db-41b6-b667-b9ef1ef244f0
> > /home/BO_Ovirt_Storage/1083422e-a5db-41b6-b667-b9ef1ef244f0
> >
> > I would greatly appreciate any help,
> > Limor Gavish
> >
> > _______________________________________________
> > Users mailing list
> > Users@ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/users
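The check Tal describes above (the domain's mount must be present and must contain a directory named after the master domain UUID) can be sketched as a small script. The paths and UUID are the ones from this thread; the helper name is ours:

```shell
# Sketch of the master-domain check from this thread: the domain's mount
# point must exist and contain a directory named after the master domain UUID.
master_domain_present() {
    mnt="$1"    # mount point under /rhev/data-center/mnt/
    msd="$2"    # master storage domain UUID
    [ -d "$mnt/$msd" ]
}

if master_domain_present \
    "/rhev/data-center/mnt/bufferoverflow.home:_home_BO__Ovirt__Storage" \
    "1083422e-a5db-41b6-b667-b9ef1ef244f0"
then
    echo "master domain visible under the mount"
else
    echo "master domain directory missing - consistent with StoragePoolMasterNotFound"
fi
```

In this thread the `find` output shows the UUID directory only under /home/BO_Ovirt_Storage, not under /rhev/data-center/mnt/, so the check would report it missing until the NFS export is mounted where vdsm expects it.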