Re: [Users] oVirt storage is down and doesn't come up

2013-04-21 Thread Yeela Kaplan
Is the host up? Do you have another storage domain besides the master that you can use (from the logs I saw that you have another one)? Maybe you can try to re-initialize on it.
Sorry, please attach the output of 'tree /rhev/data-center/'.


- Original Message -
> From: "Yuval M" 
> To: "Yeela Kaplan" 
> Cc: users@ovirt.org, "Nezer Zaidenberg" , "Limor Gavish" 
> 
> Sent: Sunday, April 21, 2013 5:02:19 PM
> Subject: Re: [Users] oVirt storage is down and doesn't come up
> 
> Hi,
> I am unable to add an additional storage domain - the "Use Host" selection
> box is empty and I cannot select any host:
> 
> 
> [image: Inline image 2]
> 
> # ls -ltr /rhev/data-center/
> total 8
> drwxr-xr-x  2 vdsm kvm 4096 Apr 18 18:27 hsm-tasks
> drwxr-xr-x. 4 vdsm kvm 4096 Apr 21 16:56 mnt
> 
> 
> Yuval
> 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [Users] oVirt storage is down and doesn't come up

2013-04-21 Thread Yuval M
Hi,
I am unable to add an additional storage domain - the "Use Host" selection
box is empty and I cannot select any host:


[image: Inline image 2]

# ls -ltr /rhev/data-center/
total 8
drwxr-xr-x  2 vdsm kvm 4096 Apr 18 18:27 hsm-tasks
drwxr-xr-x. 4 vdsm kvm 4096 Apr 21 16:56 mnt


Yuval
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [Users] oVirt storage is down and doesn't come up

2013-04-21 Thread Yeela Kaplan
It looks to me like the master domain it is looking for is no longer there; I'm not sure what happened.
What you can try, to get your storage back up, is to create a new storage domain (it won't be able to attach since your pool is not connected to the host).
Then right-click on the new SD and choose to re-initialize the data center.
It will try to reconstruct your master SD.
Also, just so that I have more information about the state of your system, please attach the result of 'ls -ltr /rhev/data-center/'.
If you want to make this more interactive, you can connect to IRC.
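
For reference, a quick way to confirm from the host that the remaining data domain's export is reachable before re-initializing (a minimal sketch; it uses the kernelpanic.home export mentioned elsewhere in this thread and a throwaway mount point):

# list what the storage server exports
showmount -e kernelpanic.home

# test-mount the export and look for the domain's UUID directory
tmp=$(mktemp -d)
sudo mount -t nfs kernelpanic.home:/home/KP_Data_Domain "$tmp"
ls -l "$tmp"
sudo umount "$tmp" && rmdir "$tmp"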

- Original Message -
> From: "Yuval M" 
> To: users@ovirt.org, "Nezer Zaidenberg" , "Limor Gavish" 
> 
> Sent: Thursday, April 18, 2013 7:11:57 PM
> Subject: [Users]  oVirt storage is down and doesn't come up
> 
> No luck.
> 
> [wil@bufferoverflow ~]$ sudo systemctl stop vdsmd.service
> [wil@bufferoverflow ~]$ sudo rm -rf /rhev/data-canter/*
> [wil@bufferoverflow ~]$ ls -lad /rhev/data-center/
> drwxr-xr-x. 4 vdsm kvm 4096 Apr 18 18:27 /rhev/data-center/
> [wil@bufferoverflow ~]$ ls -la /rhev/data-center/
> total 16
> drwxr-xr-x. 4 vdsm kvm 4096 Apr 18 18:27 .
> drwxr-xr-x. 3 root root 4096 Mar 13 15:32 ..
> drwxr-xr-x 2 vdsm kvm 4096 Apr 18 18:27 hsm-tasks
> drwxr-xr-x. 4 vdsm kvm 4096 Apr 18 18:25 mnt
> [wil@bufferoverflow ~]$ sudo reboot
> Connection to bufferoverflow closed by remote host.
> ...
> Last login: Thu Apr 18 18:40:19 2013
> [wil@bufferoverflow ~]$ ./engine-service stop
> Stopping engine-service: [ OK ]
> [wil@bufferoverflow ~]$ ./engine-service start
> Starting engine-service: [ OK ]
> [wil@bufferoverflow ~]$
> 
> 
> Logs attached.
> 
> 
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
> 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [Users] oVirt storage is down and doesn't come up

2013-04-18 Thread Yeela Kaplan
The vdsm.log.44.xz file is exactly what I needed. It looks like your storage should be fine.
Please try 'rm -rf /rhev/data-canter/*' and then reboot the host.
Let me know if it solves the problem, and if not, attach the new logs.
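
For what it's worth, a more cautious version of that cleanup (a sketch only; it assumes the intended path is /rhev/data-center/ and that any domain still NFS-mounted under it is unmounted first, since the entries there point into live mounts):

sudo systemctl stop vdsmd.service
# nothing should still be mounted under the tree before clearing it
mount | grep /rhev/data-center/mnt
# unmount anything listed, e.g.:
#   sudo umount /rhev/data-center/mnt/bufferoverflow.home:_home_BO__ISO__Domain
sudo rm -rf /rhev/data-center/*
sudo reboot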

- Original Message -
> From: "Limor Gavish" 
> To: "Yeela Kaplan" 
> Cc: "Yuval M" , users@ovirt.org, "Nezer Zaidenberg" 
> 
> Sent: Wednesday, April 17, 2013 9:41:16 PM
> Subject: Re: [Users] oVirt storage is down and doesn't come up
> 
> Thank you very much for your reply.
> 
> I see that the problem appears in vdsm.log.44.xz but doesn't appear in
> vdsm.log.45.xz
> 
> *[wil@bufferoverflow vdsm]$ xzcat vdsm.log.45.xz | grep
> StoragePoolMasterNotFound | wc -l*
> *0*
> *[wil@bufferoverflow vdsm]$ xzcat vdsm.log.44.xz | grep
> StoragePoolMasterNotFound | wc -l*
> *52*
> 
> so I hope the source of the problem is in one of them (attached).
> 
> *[wil@bufferoverflow vdsm]$ ls -la vdsm.log.44.xz*
> *-rw-r--r-- 1 vdsm kvm 763808 Mar 24 20:00 vdsm.log.44.xz*
> *[wil@bufferoverflow vdsm]$ ls -la vdsm.log.45.xz*
> *-rw-r--r-- 1 vdsm kvm 706212 Mar 22 11:00 vdsm.log.45.xz*
> 
> Unfortunately, I do not have any engine logs from that time (between Mar 22
> 11:00 and Mar 24 20:00)
> 
> *[wil@bufferoverflow ovirt-engine]$ ls -la*
> *total 148720*
> *drwxrwxr-x 2 wil wil 4096 Apr 17 09:07 .*
> *drwxrwxr-x 3 wil wil 4096 Mar 26 20:13 ..*
> *-rw-rw-r-- 1 wil wil  304 Apr 17 16:31 boot.log*
> *-rw-rw 1 wil wil  510 Apr 17 16:31 console.log*
> *-rw-rw-r-- 1 wil wil  7398188 Apr 17 21:35 engine.log*
> *-rw-rw-r-- 1 wil wil 10485813 Apr 13 09:20 engine.log.1*
> *-rw-rw-r-- 1 wil wil 10485766 Apr 11 13:19 engine.log.2*
> *-rw-rw-r-- 1 wil wil 10486016 Apr 11 08:14 engine.log.3*
> *-rw-rw-r-- 1 wil wil 10485972 Apr 11 03:06 engine.log.4*
> *-rw-rw-r-- 1 wil wil 10486208 Apr 10 22:01 engine.log.5*
> *-rw-rw-r-- 1 wil wil  8439424 Apr 17 16:31 server.log*
> *-rw-rw-r-- 1 wil wil 10485867 Apr 17 09:07 server.log.1*
> *-rw-rw-r-- 1 wil wil 10485943 Apr 17 02:40 server.log.2*
> *-rw-rw-r-- 1 wil wil 10485867 Apr 16 20:15 server.log.3*
> *-rw-rw-r-- 1 wil wil 10485943 Apr 16 13:54 server.log.4*
> *-rw-rw-r-- 1 wil wil 10485867 Apr 16 07:32 server.log.5*
> *-rw-rw-r-- 1 wil wil 10485943 Apr 16 01:05 server.log.6*
> *-rw-rw-r-- 1 wil wil 10485867 Apr 15 18:46 server.log.7*
> *-rw-rw-r-- 1 wil wil 10485781 Apr 15 12:28 server.log.8*
> *[wil@bufferoverflow ovirt-engine]$ pwd*
> */home/wil/ovirt-engine/installation/var/log/ovirt-engine*
> 
> 
> 
> On Wed, Apr 17, 2013 at 6:54 PM, Yeela Kaplan  wrote:
> 
> > It looks like the link to the master domain is not in the tree.
> > I need to see the full logs and understand what happened. Including the
> > engine log.
> > Are you sure you don't have them? even if they were rotated they should be
> > kept as a vdsm.log.*.xz under /var/log/vdsm/
> >
> > - Original Message -
> > > From: "Yuval M" 
> > > To: "Yeela Kaplan" 
> > > Cc: "Limor Gavish" , users@ovirt.org, "Nezer
> > Zaidenberg" 
> > > Sent: Wednesday, April 17, 2013 4:56:55 PM
> > > Subject: Re: [Users] oVirt storage is down and doesn't come up
> > >
> > > 1. we do not have the logs from before the problem.
> > > 2.
> > > 
> > > $ tree /rhev/data-center/
> > > /rhev/data-center/
> > > ├── hsm-tasks
> > > └── mnt
> > >     ├── bufferoverflow.home:_home_BO__ISO__Domain
> > >     │   ├── 45d24e2a-705e-440f-954c-fda3cab61298
> > >     │   │   ├── dom_md
> > >     │   │   │   ├── ids
> > >     │   │   │   ├── inbox
> > >     │   │   │   ├── leases
> > >     │   │   │   ├── metadata
> > >     │   │   │   └── outbox
> > >     │   │   └── images
> > >     │   │       └── ----
> > >     │   │           ├── Fedora-18-x86_64-DVD.iso
> > >     │   │           └── Fedora-18-x86_64-Live-Desktop.iso
> > >     │   └── __DIRECT_IO_TEST__
> > >     ├── bufferoverflow.home:_home_BO__Ovirt__Storage
> > >     └── kernelpanic.home:_home_KP__Data__Domain
> > >         ├── a8286508-db45-40d7-8645-e573f6bacdc7
> > >         │   ├── dom_md
> > >         │   │   ├── ids
> > >         │   │   ├── inbox
> > >         │   │   ├── leases
> > >         │   │   ├── metadata
> > >         │   │   └── outbox
> > >         │   └── images
> > >

Re: [Users] oVirt storage is down and doesn't come up

2013-04-17 Thread Yeela Kaplan
It looks like the link to the master domain is not in the tree.
I need to see the full logs to understand what happened, including the engine log.
Are you sure you don't have them? Even if they were rotated, they should be kept as vdsm.log.*.xz under /var/log/vdsm/.
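
A quick way to see which rotated log first mentions the failure without unpacking them by hand (a sketch; assumes xzgrep from xz-utils is available on the host):

cd /var/log/vdsm
for f in vdsm.log.*.xz; do
    printf '%s: %s\n' "$f" "$(xzgrep -c StoragePoolMasterNotFound "$f")"
done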

- Original Message -
> From: "Yuval M" 
> To: "Yeela Kaplan" 
> Cc: "Limor Gavish" , users@ovirt.org, "Nezer Zaidenberg" 
> 
> Sent: Wednesday, April 17, 2013 4:56:55 PM
> Subject: Re: [Users] oVirt storage is down and doesn't come up
> 
> 1. we do not have the logs from before the problem.
> 2.
> 
> $ tree /rhev/data-center/
> /rhev/data-center/
> ├── hsm-tasks
> └── mnt
>     ├── bufferoverflow.home:_home_BO__ISO__Domain
>     │   ├── 45d24e2a-705e-440f-954c-fda3cab61298
>     │   │   ├── dom_md
>     │   │   │   ├── ids
>     │   │   │   ├── inbox
>     │   │   │   ├── leases
>     │   │   │   ├── metadata
>     │   │   │   └── outbox
>     │   │   └── images
>     │   │       └── ----
>     │   │           ├── Fedora-18-x86_64-DVD.iso
>     │   │           └── Fedora-18-x86_64-Live-Desktop.iso
>     │   └── __DIRECT_IO_TEST__
>     ├── bufferoverflow.home:_home_BO__Ovirt__Storage
>     └── kernelpanic.home:_home_KP__Data__Domain
>         ├── a8286508-db45-40d7-8645-e573f6bacdc7
>         │   ├── dom_md
>         │   │   ├── ids
>         │   │   ├── inbox
>         │   │   ├── leases
>         │   │   ├── metadata
>         │   │   └── outbox
>         │   └── images
>         │       ├── 0df45336-de35-4dc0-9958-95b27d5d4701
>         │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d
>         │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.lease
>         │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.meta
>         │       │   ├── b245184f-f8e3-479b-8559-8b6af2473b7c
>         │       │   ├── b245184f-f8e3-479b-8559-8b6af2473b7c.lease
>         │       │   └── b245184f-f8e3-479b-8559-8b6af2473b7c.meta
>         │       ├── 0e1ebaf7-3909-44cd-8560-d05a63eb4c4e
>         │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d
>         │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.lease
>         │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.meta
>         │       │   ├── 562b9043-bde8-4595-bbea-fa8871f0e19e
>         │       │   ├── 562b9043-bde8-4595-bbea-fa8871f0e19e.lease
>         │       │   └── 562b9043-bde8-4595-bbea-fa8871f0e19e.meta
>         │       ├── 32ebb85a-0dde-47fe-90c7-7f4fb2c0f1e5
>         │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d
>         │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.lease
>         │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.meta
>         │       │   ├── 4774095e-db3d-4561-8284-53eabfd28f66
>         │       │   ├── 4774095e-db3d-4561-8284-53eabfd28f66.lease
>         │       │   └── 4774095e-db3d-4561-8284-53eabfd28f66.meta
>         │       └── a7e13a25-1694-4509-9e6b-e88583a4d970
>         │           ├── 0d33efc8-a608-439f-abe2-43884c1ce72d
>         │           ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.lease
>         │           └── 0d33efc8-a608-439f-abe2-43884c1ce72d.meta
>         └── __DIRECT_IO_TEST__
> 
> 16 directories, 35 files
> 
> 
> 3. We have 3 domains:
> BO_Ovirt_Storage (data domain, on the same machine as engine and vdsm, via
> NFS)
> BO_ISO_Domain (ISO domain, same machine via NFS)
> KP_Data_Domain (data domain on an NFS mount on a different machine)
> 
> Yuval
> 
> 
> 
> On Wed, Apr 17, 2013 at 4:28 PM, Yeela Kaplan  wrote:
> 
> > Hi Limor,
> > 1) Your log starts exactly after the vdsm restart. I need to see the full
> > vdsm log from before the domains went down in order to understand the
> > problem. Can you attach them?
> > 2) can you send the printout of 'tree /rhev/data-center/'
> > 3) how many domains are attached to your DC, and what type are they(ISO,
> > export,data) and (The DC is nfs right)?
> >
> > Thanks,
> > Yeela
> >
> > - Original Message -
> > > From: "Limor Gavish" 
> > > To: "Tal Nisan" 
> > > Cc: "Yuval M" , users@ovirt.org, "Nezer Zaidenberg" <
> > nzaidenb...@mac.com>
> > > Sent: Monday, April 15, 2013 5:10:16 PM
> > > Subject: Re: [Users] oVirt storage is down and doesn't come up
> > >
> > > Thank you very much for your reply.
> > > I ran the commands you asked (see below)

Re: [Users] oVirt storage is down and doesn't come up

2013-04-17 Thread Yuval M
1. We do not have the logs from before the problem.
2.

$ tree /rhev/data-center/
/rhev/data-center/
├── hsm-tasks
└── mnt
    ├── bufferoverflow.home:_home_BO__ISO__Domain
    │   ├── 45d24e2a-705e-440f-954c-fda3cab61298
    │   │   ├── dom_md
    │   │   │   ├── ids
    │   │   │   ├── inbox
    │   │   │   ├── leases
    │   │   │   ├── metadata
    │   │   │   └── outbox
    │   │   └── images
    │   │       └── ----
    │   │           ├── Fedora-18-x86_64-DVD.iso
    │   │           └── Fedora-18-x86_64-Live-Desktop.iso
    │   └── __DIRECT_IO_TEST__
    ├── bufferoverflow.home:_home_BO__Ovirt__Storage
    └── kernelpanic.home:_home_KP__Data__Domain
        ├── a8286508-db45-40d7-8645-e573f6bacdc7
        │   ├── dom_md
        │   │   ├── ids
        │   │   ├── inbox
        │   │   ├── leases
        │   │   ├── metadata
        │   │   └── outbox
        │   └── images
        │       ├── 0df45336-de35-4dc0-9958-95b27d5d4701
        │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d
        │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.lease
        │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.meta
        │       │   ├── b245184f-f8e3-479b-8559-8b6af2473b7c
        │       │   ├── b245184f-f8e3-479b-8559-8b6af2473b7c.lease
        │       │   └── b245184f-f8e3-479b-8559-8b6af2473b7c.meta
        │       ├── 0e1ebaf7-3909-44cd-8560-d05a63eb4c4e
        │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d
        │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.lease
        │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.meta
        │       │   ├── 562b9043-bde8-4595-bbea-fa8871f0e19e
        │       │   ├── 562b9043-bde8-4595-bbea-fa8871f0e19e.lease
        │       │   └── 562b9043-bde8-4595-bbea-fa8871f0e19e.meta
        │       ├── 32ebb85a-0dde-47fe-90c7-7f4fb2c0f1e5
        │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d
        │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.lease
        │       │   ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.meta
        │       │   ├── 4774095e-db3d-4561-8284-53eabfd28f66
        │       │   ├── 4774095e-db3d-4561-8284-53eabfd28f66.lease
        │       │   └── 4774095e-db3d-4561-8284-53eabfd28f66.meta
        │       └── a7e13a25-1694-4509-9e6b-e88583a4d970
        │           ├── 0d33efc8-a608-439f-abe2-43884c1ce72d
        │           ├── 0d33efc8-a608-439f-abe2-43884c1ce72d.lease
        │           └── 0d33efc8-a608-439f-abe2-43884c1ce72d.meta
        └── __DIRECT_IO_TEST__

16 directories, 35 files


3. We have 3 domains:
BO_Ovirt_Storage (data domain, on the same machine as engine and vdsm, via
NFS)
BO_ISO_Domain (ISO domain, same machine via NFS)
KP_Data_Domain (data domain on an NFS mount on a different machine)
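
A quick cross-check of that list against what the host actually has mounted (a sketch; the pattern simply matches vdsm's mount directory):

mount | grep /rhev/data-center/mnt
# expect one NFS mount per attached domain; in this thread it is the
# bufferoverflow.home:/home/BO_Ovirt_Storage export that turns out to be missing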

Yuval



On Wed, Apr 17, 2013 at 4:28 PM, Yeela Kaplan  wrote:

> Hi Limor,
> 1) Your log starts exactly after the vdsm restart. I need to see the full
> vdsm log from before the domains went down in order to understand the
> problem. Can you attach them?
> 2) can you send the printout of 'tree /rhev/data-center/'
> 3) how many domains are attached to your DC, and what type are they(ISO,
> export,data) and (The DC is nfs right)?
>
> Thanks,
> Yeela
>
> - Original Message -
> > From: "Limor Gavish" 
> > To: "Tal Nisan" 
> > Cc: "Yuval M" , users@ovirt.org, "Nezer Zaidenberg" <
> nzaidenb...@mac.com>
> > Sent: Monday, April 15, 2013 5:10:16 PM
> > Subject: Re: [Users] oVirt storage is down and doesn't come up
> >
> > Thank you very much for your reply.
> > I ran the commands you asked (see below) but a directory named as the
> uuid of
> > the master domain is not mounted. We tried to restart the VDSM and the
> > entire machine it didn't help.
> > We succeeded to manually mount " /home/BO_Ovirt_Storage" to a temporary
> > directory.
> >
> > postgres=# \connect engine;
> > You are now connected to database "engine" as user "postgres".
> > engine=# select current_database();
> > current_database
> > --
> > engine
> > (1 row)
> > engine=# select sds.id , ssc.connection from storage_domain_static sds
> join
> > storage_server_connections ssc on sds.storage= ssc.id where sds.id
> > ='1083422e-a5db-41b6-b667-b9ef1ef244f0';
> > id | connection
> >
> --+
> > 1083422e-a5db-41b6-b667-b9ef1ef244f0 |
> > bufferoverflow.home:/home/BO_Ovirt_Storage
> > (1 row)
> >
> &

Re: [Users] oVirt storage is down and doesn't come up

2013-04-17 Thread Yeela Kaplan
Hi Limor,
1) Your log starts exactly after the vdsm restart. I need to see the full vdsm log from before the domains went down in order to understand the problem. Can you attach it?
2) Can you send the output of 'tree /rhev/data-center/'?
3) How many domains are attached to your DC, and what type are they (ISO, export, data)? The DC is NFS, right?

Thanks,
Yeela
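
If it is easier, the requested information can be gathered in one go (a sketch; the paths are the defaults used elsewhere in this thread):

tree /rhev/data-center/ > tree.txt
mount | grep /rhev/data-center/mnt > mounts.txt
tar czf vdsm-logs.tar.gz /var/log/vdsm/vdsm.log*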

- Original Message -
> From: "Limor Gavish" 
> To: "Tal Nisan" 
> Cc: "Yuval M" , users@ovirt.org, "Nezer Zaidenberg" 
> 
> Sent: Monday, April 15, 2013 5:10:16 PM
> Subject: Re: [Users] oVirt storage is down and doesn't come up
> 
> Thank you very much for your reply.
> I ran the commands you asked (see below) but a directory named as the uuid of
> the master domain is not mounted. We tried to restart the VDSM and the
> entire machine it didn't help.
> We succeeded to manually mount " /home/BO_Ovirt_Storage" to a temporary
> directory.
> 
> postgres=# \connect engine;
> You are now connected to database "engine" as user "postgres".
> engine=# select current_database();
> current_database
> --
> engine
> (1 row)
> engine=# select sds.id , ssc.connection from storage_domain_static sds join
> storage_server_connections ssc on sds.storage= ssc.id where sds.id
> ='1083422e-a5db-41b6-b667-b9ef1ef244f0';
> id | connection
> --+
> 1083422e-a5db-41b6-b667-b9ef1ef244f0 |
> bufferoverflow.home:/home/BO_Ovirt_Storage
> (1 row)
> 
> [wil@bufferoverflow ~] $ mount
> proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
> sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
> devtmpfs on /dev type devtmpfs
> (rw,nosuid,size=8131256k,nr_inodes=2032814,mode=755)
> securityfs on /sys/kernel/security type securityfs
> (rw,nosuid,nodev,noexec,relatime)
> tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
> devpts on /dev/pts type devpts
> (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
> tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
> tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,mode=755)
> cgroup on /sys/fs/cgroup/systemd type cgroup
> (rw,nosuid,nodev,noexec,relatime,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
> cgroup on /sys/fs/cgroup/cpuset type cgroup
> (rw,nosuid,nodev,noexec,relatime,cpuset)
> cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup
> (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
> cgroup on /sys/fs/cgroup/memory type cgroup
> (rw,nosuid,nodev,noexec,relatime,memory)
> cgroup on /sys/fs/cgroup/devices type cgroup
> (rw,nosuid,nodev,noexec,relatime,devices)
> cgroup on /sys/fs/cgroup/freezer type cgroup
> (rw,nosuid,nodev,noexec,relatime,freezer)
> cgroup on /sys/fs/cgroup/net_cls type cgroup
> (rw,nosuid,nodev,noexec,relatime,net_cls)
> cgroup on /sys/fs/cgroup/blkio type cgroup
> (rw,nosuid,nodev,noexec,relatime,blkio)
> cgroup on /sys/fs/cgroup/perf_event type cgroup
> (rw,nosuid,nodev,noexec,relatime,perf_event)
> /dev/sda3 on / type ext4 (rw,relatime,data=ordered)
> rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
> debugfs on /sys/kernel/debug type debugfs (rw,relatime)
> sunrpc on /proc/fs/nfsd type nfsd (rw,relatime)
> hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
> systemd-1 on /proc/sys/fs/binfmt_misc type autofs
> (rw,relatime,fd=34,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
> mqueue on /dev/mqueue type mqueue (rw,relatime)
> tmpfs on /tmp type tmpfs (rw)
> configfs on /sys/kernel/config type configfs (rw,relatime)
> binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
> /dev/sda5 on /home type ext4 (rw,relatime,data=ordered)
> /dev/sda1 on /boot type ext4 (rw,relatime,data=ordered)
> kernelpanic.home:/home/KP_Data_Domain on
> /rhev/data-center/mnt/kernelpanic.home:_home_KP__Data__Domain type nfs
> (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=10.100.101.100,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=10.100.101.100)
> bufferoverflow.home:/home/BO_ISO_Domain on
> /rhev/data-center/mnt/bufferoverflow.home:_home_BO__ISO__Domain type nfs
> (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=10.100.101.108,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=10.100.101.108)
> 
> [wil@bufferoverflow ~]$ ls -la /home/
> total 36
> drwxr-xr-x. 6 root root 4096 Mar 22 11:25 .
> dr-xr-xr-x. 19 root root 4096 Apr 12 18:53 ..
> drwxr-xr-x. 3 vdsm kvm 4096 Mar 27 17:33 BO_ISO_Domain
> drwxr-xr-x. 3

Re: [Users] oVirt storage is down and doesn't come up

2013-04-15 Thread Limor Gavish
Thank you very much for your reply.
I ran the commands you asked for (see below), but a directory named after the UUID of the master domain is not mounted. We tried restarting VDSM and the entire machine; it didn't help.
We did manage to manually mount "/home/BO_Ovirt_Storage" to a temporary directory.

*postgres=#* \connect engine;
You are now connected to database "engine" as user "postgres".
*engine=#* select current_database();
 current_database
--
 engine
(1 row)
*engine=#* select sds.id, ssc.connection from storage_domain_static sds
join storage_server_connections ssc on sds.storage=ssc.id where sds.id
='1083422e-a5db-41b6-b667-b9ef1ef244f0';
                  id                  |                 connection
--------------------------------------+--------------------------------------------
 1083422e-a5db-41b6-b667-b9ef1ef244f0 | bufferoverflow.home:/home/BO_Ovirt_Storage
(1 row)

*[wil@bufferoverflow ~]**$ mount*
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs
(rw,nosuid,size=8131256k,nr_inodes=2032814,mode=755)
securityfs on /sys/kernel/security type securityfs
(rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts
(rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup
(rw,nosuid,nodev,noexec,relatime,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/cpuset type cgroup
(rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup
(rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/memory type cgroup
(rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup
(rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup
(rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls type cgroup
(rw,nosuid,nodev,noexec,relatime,net_cls)
cgroup on /sys/fs/cgroup/blkio type cgroup
(rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup
(rw,nosuid,nodev,noexec,relatime,perf_event)
/dev/sda3 on / type ext4 (rw,relatime,data=ordered)
rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
sunrpc on /proc/fs/nfsd type nfsd (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs
(rw,relatime,fd=34,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
mqueue on /dev/mqueue type mqueue (rw,relatime)
tmpfs on /tmp type tmpfs (rw)
configfs on /sys/kernel/config type configfs (rw,relatime)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
/dev/sda5 on /home type ext4 (rw,relatime,data=ordered)
/dev/sda1 on /boot type ext4 (rw,relatime,data=ordered)
kernelpanic.home:/home/KP_Data_Domain on
/rhev/data-center/mnt/kernelpanic.home:_home_KP__Data__Domain type nfs
(rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=10.100.101.100,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=10.100.101.100)
bufferoverflow.home:/home/BO_ISO_Domain on
/rhev/data-center/mnt/bufferoverflow.home:_home_BO__ISO__Domain type nfs
(rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=10.100.101.108,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=10.100.101.108)

*[wil@bufferoverflow ~]$* ls -la /home/
total 36
drwxr-xr-x.  6 root root  4096 Mar 22 11:25 .
dr-xr-xr-x. 19 root root  4096 Apr 12 18:53 ..
drwxr-xr-x.  3 vdsm kvm   4096 Mar 27 17:33 BO_ISO_Domain
drwxr-xr-x.  3 vdsm kvm   4096 Mar 27 17:33 BO_Ovirt_Storage
drwx--.  2 root root 16384 Mar  6 09:11 lost+found
drwx--. 27 wil  wil   4096 Apr 15 01:50 wil
*[wil@bufferoverflow ~]$* cd /home/BO_Ovirt_Storage/
*[wil@bufferoverflow BO_Ovirt_Storage]$ *ls -la
total 12
drwxr-xr-x. 3 vdsm kvm  4096 Mar 27 17:33 .
drwxr-xr-x. 6 root root 4096 Mar 22 11:25 ..
drwxr-xr-x  5 vdsm kvm  4096 Mar 20 23:06
1083422e-a5db-41b6-b667-b9ef1ef244f0
-rwxr-xr-x  1 vdsm kvm 0 Mar 27 17:33 __DIRECT_IO_TEST__
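
A minimal sketch of that manual mount check, for completeness (the temporary directory is arbitrary and the dom_md/metadata read is only illustrative):

tmp=$(mktemp -d)
sudo mount -t nfs bufferoverflow.home:/home/BO_Ovirt_Storage "$tmp"
ls "$tmp"/1083422e-a5db-41b6-b667-b9ef1ef244f0/dom_md/
cat "$tmp"/1083422e-a5db-41b6-b667-b9ef1ef244f0/dom_md/metadata
sudo umount "$tmp" && rmdir "$tmp"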

Thanks,
Limor


On Mon, Apr 15, 2013 at 4:02 PM, Tal Nisan  wrote:

> **
> Hi Limor,
> First we should probably start with checking which mount is the master
> storage domain that appears as not found, this should be checked against
> the oVirt server database, please run
>
> select sds.id, ssc.connection from storage_domain_static sds join
> storage_server_connections ssc on sds.storage=ssc.id
> where sds.id='1083422e-a5db-41b6-b667-b9ef1ef244f0';
>
> You can run this via psql or a Postgres ui if you have one.
> In the results you will see the storage connection in th

Re: [Users] oVirt storage is down and doesn't come up

2013-04-15 Thread Tal Nisan

Hi Limor,
First we should probably start by checking which mount backs the master storage domain that is reported as not found. This should be checked against the oVirt engine database; please run:

select sds.id, ssc.connection from storage_domain_static sds
join storage_server_connections ssc on sds.storage=ssc.id
where sds.id='1083422e-a5db-41b6-b667-b9ef1ef244f0';

You can run this via psql or a Postgres UI if you have one.
In the results you will see the storage connection in the format %hostname%:/%mountName%. Then, on the VDSM server, check in the mount list that it is mounted; the mount itself should contain a directory named after the UUID of the master domain. Let me know the result.
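
If psql is available on the engine machine, the same check can be scripted (a sketch; it assumes a local database named 'engine' readable by the postgres user):

sudo -u postgres psql -d engine -c "select sds.id, ssc.connection from storage_domain_static sds join storage_server_connections ssc on sds.storage=ssc.id where sds.id='1083422e-a5db-41b6-b667-b9ef1ef244f0';"

# then, on the VDSM host, confirm the returned export is mounted and that the
# mount contains a directory named after the master domain UUID
mount | grep /rhev/data-center/mnt
ls -d /rhev/data-center/mnt/*/1083422e-a5db-41b6-b667-b9ef1ef244f0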


Tal.



On 04/12/2013 07:29 PM, Limor Gavish wrote:

Hi,

For some reason, without our doing anything, all the storage domains went down, and restarting VDSM or the entire machine does not bring them back up.

I am not using LVM.
The following errors appear several times in vdsm.log (full logs are attached):


Thread-22::WARNING::2013-04-12 
19:00:08,597::lvm::378::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] 
['  Volume group "1083422e-a5db-41b6-b667-b9ef1ef244f0" not found']
Thread-22::DEBUG::2013-04-12 
19:00:08,598::lvm::402::OperationMutex::(_reloadvgs) Operation 'lvm 
reload operation' released the operation mutex
Thread-22::DEBUG::2013-04-12 
19:00:08,681::resourceManager::615::ResourceManager::(releaseResource) 
Trying to release resource 'Storage.5849b030-626e-47cb-ad90-3ce782d831b3'
Thread-22::DEBUG::2013-04-12 
19:00:08,681::resourceManager::634::ResourceManager::(releaseResource) 
Released resource 'Storage.5849b030-626e-47cb-ad90-3ce782d831b3' (0 
active users)
Thread-22::DEBUG::2013-04-12 
19:00:08,681::resourceManager::640::ResourceManager::(releaseResource) 
Resource 'Storage.5849b030-626e-47cb-ad90-3ce782d831b3' is free, 
finding out if anyone is waiting for it.
Thread-22::DEBUG::2013-04-12 
19:00:08,682::resourceManager::648::ResourceManager::(releaseResource) 
No one is waiting for resource 
'Storage.5849b030-626e-47cb-ad90-3ce782d831b3', Clearing records.
Thread-22::ERROR::2013-04-12 
19:00:08,682::task::850::TaskManager.Task::(_setError) 
Task=`e35a22ac-771a-4916-851f-2fe9d60a0ae6`::Unexpected error

Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 939, in connectStoragePool
masterVersion, options)
  File "/usr/share/vdsm/storage/hsm.py", line 986, in _connectStoragePool
res = pool.connect(hostID, scsiKey, msdUUID, masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 695, in connect
self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 1232, in __rebuild
masterVersion=masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 1576, in getMasterDomain
raise se.StoragePoolMasterNotFound(self.spUUID, msdUUID)
StoragePoolMasterNotFound: Cannot find master domain: 
'spUUID=5849b030-626e-47cb-ad90-3ce782d831b3, 
msdUUID=1083422e-a5db-41b6-b667-b9ef1ef244f0'
Thread-22::DEBUG::2013-04-12 
19:00:08,685::task::869::TaskManager.Task::(_run) 
Task=`e35a22ac-771a-4916-851f-2fe9d60a0ae6`::Task._run: 
e35a22ac-771a-4916-851f-2fe9d60a0ae6 
('5849b030-626e-47cb-ad90-3ce782d831b3', 1, 
'5849b030-626e-47cb-ad90-3ce782d831b3', 
'1083422e-a5db-41b6-b667-b9ef1ef244f0', 3942) {} failed - stopping task
Thread-22::DEBUG::2013-04-12 
19:00:08,685::task::1194::TaskManager.Task::(stop) 
Task=`e35a22ac-771a-4916-851f-2fe9d60a0ae6`::stopping in state 
preparing (force False)
Thread-22::DEBUG::2013-04-12 
19:00:08,685::task::974::TaskManager.Task::(_decref) 
Task=`e35a22ac-771a-4916-851f-2fe9d60a0ae6`::ref 1 aborting True
Thread-22::INFO::2013-04-12 
19:00:08,686::task::1151::TaskManager.Task::(prepare) 
Task=`e35a22ac-771a-4916-851f-2fe9d60a0ae6`::aborting: Task is 
aborted: 'Cannot find master domain' - code 304
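
Those two UUIDs are the things to chase (a sketch; it just pulls the pool and master-domain UUIDs out of the current log and checks whether the master shows up under any mount):

grep -o 'spUUID=[0-9a-f-]*' /var/log/vdsm/vdsm.log | sort -u
grep -o 'msdUUID=[0-9a-f-]*' /var/log/vdsm/vdsm.log | sort -u

# the master domain UUID should exist as a directory inside one of the mounts
ls -d /rhev/data-center/mnt/*/1083422e-a5db-41b6-b667-b9ef1ef244f0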


[wil@bufferoverflow ~]$ sudo vgs --noheadings --units b --nosuffix --separator \| -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free

  No volume groups found

[wil@bufferoverflow ~]$ mount
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs 
(rw,nosuid,size=8131256k,nr_inodes=2032814,mode=755)
securityfs on /sys/kernel/security type securityfs 
(rw,nosuid,nodev,noexec,relatime)

tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts 
(rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)

tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup 
(rw,nosuid,nodev,noexec,relatime,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgro