[ovirt-users] VMs paused due to IO issues - Dell Equallogic controller failover

2016-10-04 Thread Gary Lloyd
Hi

We have Ovirt 3.65 with a Dell Equallogic SAN and we use Direct Luns for
all our VMs.
At the weekend during early hours an Equallogic controller failed over to
its standby on one of our arrays and this caused about 20 of our VMs to be
paused due to IO problems.

I have also noticed that this happens during Equallogic firmware upgrades
since we moved onto Ovirt 3.65.

As recommended by Dell disk timeouts within the VMs are set to 60 seconds
when they are hosted on an EqualLogic SAN.
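
(For reference, the guest-side timeout Dell refers to is the SCSI command
timeout exposed in sysfs; a minimal sketch of how it is usually applied inside
a Linux guest - the device name and rule file path below are only illustrative:

    # one-off, per disk; takes effect immediately but is lost on reboot
    echo 60 > /sys/block/sda/device/timeout

    # persistent, e.g. /etc/udev/rules.d/99-disk-timeout.rules
    ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd*", ATTR{device/timeout}="60"
)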

Is there any other timeout value that we can configure in vdsm.conf to stop
VMs from getting paused when a controller fails over ?

Also is there anything that we can tweak to automatically unpause the VMs
once connectivity with the arrays is re-established ?

At the moment we are running a customized version of storageServer.py, as
Ovirt has yet to include iscsi multipath support for Direct Luns out of the
box.

Many Thanks


*Gary Lloyd*

I.T. Systems:Keele University
Finance & IT Directorate
Keele:Staffs:IC1 Building:ST5 5NB:UK
+44 1782 733063

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] DISCARD support?

2016-10-04 Thread Nicolas Ecarnot

Hello,

Sending this here to share knowledge.

Here is what I learned from reading many BZs and mailing list posts. I'm 
not working at Red Hat, so please correct me if I'm wrong.


We are using thin-provisioned block storage LUNs (Equallogic), on which 
oVirt is creating numerous Logical Volumes, and we're very happy with it.
When oVirt is removing a virtual disk, the SAN is not informed, because 
the LVM layer is not sending the "issue_discard" flag.


/etc/lvm/lvm.conf is not the natural place to try to change this 
parameter, as VDSM is not using it.


Efforts are presently being made to include issue_discard support 
directly in vdsm.conf, first at datacenter scope (4.0.x), then per 
storage domain (4.1.x), and maybe via a web GUI check-box. Part of the 
effort is to make sure every bit of an LV that is planned for removal gets 
wiped out. Part is to inform the block storage side about the deletion, in 
the case of thin-provisioned LUNs.
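
As a side note, whether a given device actually advertises discard, and a way
to issue one by hand, can be checked from the host with standard tools (the
device and LV names below are only placeholders):

    lsblk -D /dev/mapper/<lun-wwid>   # non-zero DISC-GRAN/DISC-MAX means the LUN accepts UNMAP
    blkdiscard /dev/<vg>/<lv>         # manually discard a whole, about-to-be-removed LV (use with care)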


https://bugzilla.redhat.com/show_bug.cgi?id=1342919
https://bugzilla.redhat.com/show_bug.cgi?id=981626

--
Nicolas ECARNOT

On Mon, Oct 3, 2016 at 2:24 PM, Nicolas Ecarnot wrote:


   Yaniv,

   As a pure random way of web surfing, I found that you posted on
   twitter an information about DISCARD support.
   (https://twitter.com/YanivKaul/status/773513216664174592)

   I did not dig any further, but does it have any relation to the fact
   that, so far, oVirt has not reclaimed lost storage space amongst the
   logical volumes of its storage domains?

   A BZ exists about this, but we were told no work would be done about
   it until 4.x.y; now that we're there, I was wondering if you knew more?


Feel free to send such questions to the mailing list (ovirt users or 
devel), so others will be able to both chime in and see the response.
We've supported a custom hook for enabling discard per disk (which is 
only relevant for virtio-SCSI and IDE) for some versions now (3.5 I 
believe).

We are planning to add this via a UI and API in 4.1.
In addition, we are looking into discard (instead of wipe after delete, 
when discard is also zero'ing content) as well as discard when removing LVs.

See:
http://www.ovirt.org/develop/release-management/features/storage/pass-discard-from-guest-to-underlying-storage/
http://www.ovirt.org/develop/release-management/features/storage/wipe-volumes-using-blkdiscard/
http://www.ovirt.org/develop/release-management/features/storage/discard-after-delete/

Y.


   Best,

   -- 
   Nicolas ECARNOT



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] 4.0 - 2nd node fails on deploy

2016-10-04 Thread Simone Tiraboschi
On Mon, Oct 3, 2016 at 11:56 PM, Jason Jeffrey  wrote:

> Hi,
>
>
>
> Another problem has appeared, after rebooting the primary the VM will not
> start.
>
>
>
> Appears the symlink is broken between gluster mount ref and vdsm
>

The first host was correctly deployed, but it seems that you are facing some
issue connecting to the storage.
Can you please attach vdsm logs and /var/log/messages from the first host?


>
>
> From broker.log
>
>
>
> Thread-169::ERROR::2016-10-04 22:44:16,189::storage_broker::
> 138::ovirt_hosted_engine_ha.broker.storage_broker.
> StorageBroker::(get_raw_stats_for_service_type) Failed to read metadata
> from /rhev/data-center/mnt/glusterSD/dcastor01:engine/
> bbb70623-194a-46d2-a164-76a4876ecaaf/ha_agent/hosted-engine.metadata
>
>
>
> [root@dcasrv01 ovirt-hosted-engine-ha]# ls -al /rhev/data-center/mnt/
> glusterSD/dcastor01\:engine/bbb70623-194a-46d2-a164-76a4876ecaaf/ha_agent/
>
> total 9
>
> drwxrwx---. 2 vdsm kvm 4096 Oct  3 17:27 .
>
> drwxr-xr-x. 5 vdsm kvm 4096 Oct  3 17:17 ..
>
> lrwxrwxrwx. 1 vdsm kvm  132 Oct  3 17:27 hosted-engine.lockspace ->
> /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/23d81b73-bcb7-
> 4742-abde-128522f43d78/11d6a3e1-1817-429d-b2e0-9051a3cf41a4
>
> lrwxrwxrwx. 1 vdsm kvm  132 Oct  3 17:27 hosted-engine.metadata ->
> /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/fd44dbf9-473a-
> 496a-9996-c8abe3278390/cee9440c-4eb8-453b-bc04-c47e6f9cbc93
>
>
>
> [root@dcasrv01 /]# ls -al /var/run/vdsm/storage/bbb70623-194a-46d2-a164-
> 76a4876ecaaf/
>
> ls: cannot access /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/:
> No such file or directory
>
>
>
> Though file appears to be there
>
>
>
> Gluster is setup as xpool/engine
>
>
>
> [root@dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# pwd
>
> /xpool/engine/brick/bbb70623-194a-46d2-a164-76a4876ecaaf/
> images/fd44dbf9-473a-496a-9996-c8abe3278390
>
> [root@dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# ls -al
>
> total 2060
>
> drwxr-xr-x. 2 vdsm kvm4096 Oct  3 17:17 .
>
> drwxr-xr-x. 6 vdsm kvm4096 Oct  3 17:17 ..
>
> -rw-rw. 2 vdsm kvm 1028096 Oct  3 20:48 cee9440c-4eb8-453b-bc04-
> c47e6f9cbc93
>
> -rw-rw. 2 vdsm kvm 1048576 Oct  3 17:17 cee9440c-4eb8-453b-bc04-
> c47e6f9cbc93.lease
>
> -rw-r--r--. 2 vdsm kvm 283 Oct  3 17:17 
> cee9440c-4eb8-453b-bc04-c47e6f9cbc93.meta
>
>
>
>
>
>
> [root@dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# gluster volume info
>
>
>
> Volume Name: data
>
> Type: Replicate
>
> Volume ID: 54fbcafc-fed9-4bce-92ec-fa36cdcacbd4
>
> Status: Started
>
> Number of Bricks: 1 x (2 + 1) = 3
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: dcastor01:/xpool/data/brick
>
> Brick2: dcastor03:/xpool/data/brick
>
> Brick3: dcastor02:/xpool/data/bricky (arbiter)
>
> Options Reconfigured:
>
> performance.readdir-ahead: on
>
> performance.quick-read: off
>
> performance.read-ahead: off
>
> performance.io-cache: off
>
> performance.stat-prefetch: off
>
> cluster.eager-lock: enable
>
> network.remote-dio: enable
>
> cluster.quorum-type: auto
>
> cluster.server-quorum-type: server
>
> storage.owner-uid: 36
>
> storage.owner-gid: 36
>
>
>
> Volume Name: engine
>
> Type: Replicate
>
> Volume ID: dd4c692d-03aa-4fc6-9011-a8dad48dad96
>
> Status: Started
>
> Number of Bricks: 1 x (2 + 1) = 3
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: dcastor01:/xpool/engine/brick
>
> Brick2: dcastor02:/xpool/engine/brick
>
> Brick3: dcastor03:/xpool/engine/brick (arbiter)
>
> Options Reconfigured:
>
> performance.readdir-ahead: on
>
> performance.quick-read: off
>
> performance.read-ahead: off
>
> performance.io-cache: off
>
> performance.stat-prefetch: off
>
> cluster.eager-lock: enable
>
> network.remote-dio: enable
>
> cluster.quorum-type: auto
>
> cluster.server-quorum-type: server
>
> storage.owner-uid: 36
>
> storage.owner-gid: 36
>
>
>
> Volume Name: export
>
> Type: Replicate
>
> Volume ID: 23f14730-d264-4cc2-af60-196b943ecaf3
>
> Status: Started
>
> Number of Bricks: 1 x (2 + 1) = 3
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: dcastor02:/xpool/export/brick
>
> Brick2: dcastor03:/xpool/export/brick
>
> Brick3: dcastor01:/xpool/export/brick (arbiter)
>
> Options Reconfigured:
>
> performance.readdir-ahead: on
>
> storage.owner-uid: 36
>
> storage.owner-gid: 36
>
>
>
> Volume Name: iso
>
> Type: Replicate
>
> Volume ID: b2d3d7e2-9919-400b-8368-a0443d48e82a
>
> Status: Started
>
> Number of Bricks: 1 x (2 + 1) = 3
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: dcastor01:/xpool/iso/brick
>
> Brick2: dcastor02:/xpool/iso/brick
>
> Brick3: dcastor03:/xpool/iso/brick (arbiter)
>
> Options Reconfigured:
>
> performance.readdir-ahead: on
>
> storage.owner-uid: 36
>
> storage.owner-gid: 36
>
>
>
>
>
> [root@dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# gluster volume
> status
>
> Status of volume: data
>
> Gluster process TCP Port  RDMA Port  Online
> Pid
>
> 
> --

[ovirt-users] Host keeps disconnecting

2016-10-04 Thread Chris Cowley
Hello all

I am trying out ovirt 4 on a single host using the hosted engine. All
storage is on the same machine, with NFS3 in the middle.

Regularly the host disconnects from the engine - although all the VMs
continue working happily. I can re-activate it immediately and it will come
online happily, I can then manipulate VMs to my hearts content for a few
minutes until it disconnects again and I have to re-activate it.

First, am I being stupid trying to run this on a single node? Am I obliged
to at least break out the storage into a separate node/NAS?

Second, assuming this is technically possible, where should I be looking
for clues?
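
(The usual starting points, as far as I can tell, are the host-side and
engine-side logs - the paths below are the standard locations:

    /var/log/vdsm/vdsm.log                 # on the host
    /var/log/ovirt-engine/engine.log       # on the hosted engine VM
    journalctl -u vdsmd -u sanlock         # host services involved in storage monitoring
)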
​
vdsm.log (attached)
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] 4.0 - 2nd node fails on deploy

2016-10-04 Thread Simone Tiraboschi
On Tue, Oct 4, 2016 at 10:51 AM, Simone Tiraboschi 
wrote:

>
>
> On Mon, Oct 3, 2016 at 11:56 PM, Jason Jeffrey  wrote:
>
>> Hi,
>>
>>
>>
>> Another problem has appeared, after rebooting the primary the VM will not
>> start.
>>
>>
>>
>> Appears the symlink is broken between gluster mount ref and vdsm
>>
>
> The first host was correctly deployed, but it seems that you are facing some
> issue connecting to the storage.
> Can you please attach vdsm logs and /var/log/messages from the first host?
>

Thanks Jason,
I suspect that your issue is related to this:
Oct  4 18:24:39 dcasrv01 etc-glusterfs-glusterd.vol[2252]: [2016-10-04
17:24:39.522620] C [MSGID: 106002]
[glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action]
0-management: Server quorum lost for volume data. Stopping local bricks.
Oct  4 18:24:39 dcasrv01 etc-glusterfs-glusterd.vol[2252]: [2016-10-04
17:24:39.523272] C [MSGID: 106002]
[glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action]
0-management: Server quorum lost for volume engine. Stopping local bricks.

and for some time your gluster volume has been working.

But then:
Oct  4 19:02:09 dcasrv01 systemd: Started /usr/bin/mount -t glusterfs -o
backup-volfile-servers=dcastor02:dcastor03 dcastor01:engine
/rhev/data-center/mnt/glusterSD/dcastor01:engine.
Oct  4 19:02:09 dcasrv01 systemd: Starting /usr/bin/mount -t glusterfs -o
backup-volfile-servers=dcastor02:dcastor03 dcastor01:engine
/rhev/data-center/mnt/glusterSD/dcastor01:engine.
Oct  4 19:02:11 dcasrv01 ovirt-ha-agent:
/usr/lib/python2.7/site-packages/yajsonrpc/stomp.py:352:
DeprecationWarning: Dispatcher.pending is deprecated. Use
Dispatcher.socket.pending instead.
Oct  4 19:02:11 dcasrv01 ovirt-ha-agent: pending = getattr(dispatcher,
'pending', lambda: 0)
Oct  4 19:02:11 dcasrv01 ovirt-ha-agent:
/usr/lib/python2.7/site-packages/yajsonrpc/stomp.py:352:
DeprecationWarning: Dispatcher.pending is deprecated. Use
Dispatcher.socket.pending instead.
Oct  4 19:02:11 dcasrv01 ovirt-ha-agent: pending = getattr(dispatcher,
'pending', lambda: 0)
Oct  4 19:02:11 dcasrv01 journal: vdsm vds.dispatcher ERROR SSL error
during reading data: unexpected eof
Oct  4 19:02:11 dcasrv01 journal: ovirt-ha-agent
ovirt_hosted_engine_ha.agent.agent.Agent ERROR Error: 'Connection to
storage server failed' - trying to restart agent
Oct  4 19:02:11 dcasrv01 ovirt-ha-agent:
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Error: 'Connection to
storage server failed' - trying to restart agent
Oct  4 19:02:12 dcasrv01 etc-glusterfs-glusterd.vol[2252]: [2016-10-04
18:02:12.384611] C [MSGID: 106003]
[glusterd-server-quorum.c:346:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume data. Starting local bricks.
Oct  4 19:02:12 dcasrv01 etc-glusterfs-glusterd.vol[2252]: [2016-10-04
18:02:12.388981] C [MSGID: 106003]
[glusterd-server-quorum.c:346:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume engine. Starting local
bricks.

And at that point VDSM started complaining that the hosted-engine-storage
domain doesn't exist anymore:
Oct  4 19:02:30 dcasrv01 journal: ovirt-ha-agent
ovirt_hosted_engine_ha.lib.image.Image ERROR Error fetching volumes list:
Storage domain does not exist: (u'bbb70623-194a-46d2-a164-76a4876ecaaf',)
Oct  4 19:02:30 dcasrv01 ovirt-ha-agent:
ERROR:ovirt_hosted_engine_ha.lib.image.Image:Error fetching volumes list:
Storage domain does not exist: (u'bbb70623-194a-46d2-a164-76a4876ecaaf',)

I see from the logs that the ovirt-ha-agent is trying to mount the
hosted-engine storage domain as:
/usr/bin/mount -t glusterfs -o backup-volfile-servers=dcastor02:dcastor03
dcastor01:engine /rhev/data-center/mnt/glusterSD/dcastor01:engine.

Pointing to dcastor01, dcastor02 and dcastor03 while your server is
dcasrv01.
But at the same time it seems that dcasrv01 also has local bricks for the
same engine volume.

So, is dcasrv01 just an alias for dcastor01? If not, you probably have some
issue with the configuration of your gluster volume.
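
(A quick way to check that on the host - the hostnames are the ones from this
thread:

    getent hosts dcasrv01 dcastor01           # do both names resolve, and to what?
    gluster peer status                       # which names/IPs were the peers probed with?
    gluster volume info engine | grep Brick   # addresses used in the brick definitions
)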



>
>>
>> From broker.log
>>
>>
>>
>> Thread-169::ERROR::2016-10-04 22:44:16,189::storage_broker::138::
>> ovirt_hosted_engine_ha.broker.storage_broker.StorageBro
>> ker::(get_raw_stats_for_service_type) Failed to read metadata from
>> /rhev/data-center/mnt/glusterSD/dcastor01:engine/bbb70623-
>> 194a-46d2-a164-76a4876ecaaf/ha_agent/hosted-engine.metadata
>>
>>
>>
>> [root@dcasrv01 ovirt-hosted-engine-ha]# ls -al
>> /rhev/data-center/mnt/glusterSD/dcastor01\:engine/bbb70623-
>> 194a-46d2-a164-76a4876ecaaf/ha_agent/
>>
>> total 9
>>
>> drwxrwx---. 2 vdsm kvm 4096 Oct  3 17:27 .
>>
>> drwxr-xr-x. 5 vdsm kvm 4096 Oct  3 17:17 ..
>>
>> lrwxrwxrwx. 1 vdsm kvm  132 Oct  3 17:27 hosted-engine.lockspace ->
>> /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/
>> 23d81b73-bcb7-4742-abde-128522f43d78/11d6a3e1-1817-429d-b2e0-9051a3cf41a4
>>
>> lrwxrwxrwx. 1 vdsm kvm  132 Oct  3 17:27 hosted-engine.metadata ->
>> /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/
>> 

[ovirt-users] Ovirt 4.04 unable to umount LUNs on hosts

2016-10-04 Thread Andrea Ghelardi
Hello ovirt gurus,
I'd like to bother you with a long lasting issue I face.
My (new) test setup (not production one) is running ovirt 4.04 hosted engine on 
a Dell R510 server connected to SAN Compellent SC040 via iscsi.
All systems installed from scratch following guidelines.
Ovirt hosted engine installed using ovirt-appliance.
Everything is working "fine".
BUT
We are unable to perform a clear LUN removal.
We can perform all actions from web interface (storage add, maintenance, 
detach, delete) with no error.
However, underlying device remains mapped to server. As a result, we are unable 
to unmap the LUN otherwise multipath fails, server fails, ovirt fails etc.

Steps to reproduce:

1)  Create LUN on SAN, map it to server -> OK

2)  Log in ovirt web interface, add a new storage targeting LUN -> OK

3)  Create disk, create VM setup VM, (optional) -> OK

4)  Shutdown VM -> OK

5)  Ovirt: Put storage in maintenance -> OK

6)  Ovirt: Detach storage -> OK

7)  Ovirt: Delete storage -> OK

Expected result:
Ovirt unmaps and remove device from multipath so that it can be destroyed at 
SAN storage level.
It should be possible to perform this action even when host is not in 
maintenance.

Current result: volume remains locked in multipath (map in use).
Trying vgchange -a n  and then multipath -f  removes the device, 
but then it is automatically re-added after a while (???)
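
For what it's worth, the manual cleanup sequence usually suggested for removing
a multipath device by hand looks roughly like this (the WWID and sdX names are
placeholders, not the ones from the output below):

    multipath -f <wwid>                     # flush the multipath map (fails if still open)
    blockdev --flushbufs /dev/sdX           # flush each underlying path device
    echo 1 > /sys/block/sdX/device/delete   # remove each SCSI path device

If the map keeps coming back, it is usually because an iSCSI rescan re-discovers
the LUN, so the volume also has to be unmapped (or the session logged out) on the
array side first.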

Please note: this is the same behavior we always faced also with our ovirt 
production environment.
Details attached. Volume to be removed: LUN 64

Any ideas?
Thanks
AG
[root@barbera log]# multipath -ll
36000d310004edc64 dm-0 COMPELNT,Compellent Vol
size=100G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 18:0:0:2 sdk  8:160  active ready running
  |- 19:0:0:2 sdm  8:192  active ready running
  |- 22:0:0:2 sdq  65:0   active ready running
  |- 23:0:0:2 sds  65:32  active ready running
  |- 24:0:0:2 sdu  65:64  active ready running
  |- 28:0:0:2 sdaa 65:160 active ready running
  |- 29:0:0:2 sdab 65:176 active ready running
  |- 5:0:0:2  sdc  8:32   active ready running
  `- 6:0:0:2  sdg  8:96   active ready running
36000d310004edc62 dm-1 COMPELNT,Compellent Vol
size=1.0T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 5:0:0:1  sdb  8:16   active ready running
  |- 6:0:0:1  sdd  8:48   active ready running
  |- 19:0:0:1 sdl  8:176  active ready running
  |- 18:0:0:1 sdj  8:144  active ready running
  |- 22:0:0:1 sdp  8:240  active ready running
  |- 23:0:0:1 sdr  65:16  active ready running
  |- 24:0:0:1 sdt  65:48  active ready running
  |- 29:0:0:1 sdz  65:144 active ready running
  `- 28:0:0:1 sdy  65:128 active ready running
36000d310004edc61 dm-19 COMPELNT,Compellent Vol
size=120G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 8:0:0:1  sdf  8:80   active ready running
  |- 7:0:0:1  sde  8:64   active ready running
  |- 11:0:0:1 sdh  8:112  active ready running
  |- 12:0:0:1 sdi  8:128  active ready running
  |- 20:0:0:1 sdn  8:208  active ready running
  |- 21:0:0:1 sdo  8:224  active ready running
  |- 25:0:0:1 sdv  65:80  active ready running
  |- 26:0:0:1 sdw  65:96  active ready running
  `- 27:0:0:1 sdx  65:112 active ready running
[root@barbera log]#
[root@barbera log]# lsscsi
[0:0:0:0]    disk    SEAGATE  ST3300657SS      ES66  -
[0:0:1:0]    disk    SEAGATE  ST3300657SS      ES66  -
[0:1:0:0]    disk    Dell     Virtual Disk     1028  /dev/sda
[3:0:0:0]    cd/dvd  PLDS     DVD+-RW DS-8A8SH KD51  /dev/sr0
[5:0:0:1]    disk    COMPELNT Compellent Vol   0606  /dev/sdb
[5:0:0:2]    disk    COMPELNT Compellent Vol   0606  /dev/sdc
[6:0:0:1]    disk    COMPELNT Compellent Vol   0606  /dev/sdd
[6:0:0:2]    disk    COMPELNT Compellent Vol   0606  /dev/sdg
[7:0:0:1]    disk    COMPELNT Compellent Vol   0606  /dev/sde
[8:0:0:1]    disk    COMPELNT Compellent Vol   0606  /dev/sdf
[11:0:0:1]   disk    COMPELNT Compellent Vol   0606  /dev/sdh
[12:0:0:1]   disk    COMPELNT Compellent Vol   0606  /dev/sdi
[18:0:0:1]   disk    COMPELNT Compellent Vol   0606  /dev/sdj
[18:0:0:2]   disk    COMPELNT Compellent Vol   0606  /dev/sdk
[19:0:0:1]   disk    COMPELNT Compellent Vol   0606  /dev/sdl
[19:0:0:2]   disk    COMPELNT Compellent Vol   0606  /dev/sdm
[20:0:0:1]   disk    COMPELNT Compellent Vol   0606  /dev/sdn
[21:0:0:1]   disk    COMPELNT Compellent Vol   0606  /dev/sdo
[22:0:0:1]   disk    COMPELNT Compellent Vol   0606  /dev/sdp
[22:0:0:2]   disk    COMPELNT Compellent Vol   0606  /dev/sdq
[23:0:0:1]   disk    COMPELNT Compellent Vol   0606  /dev/sdr
[23:0:0:2]   disk    COMPELNT Compellent Vol   0606  /dev/sds
[24:0:0:1]   disk    COMPELNT Compellent Vol   0606  /dev/sdt
[24:0:0:2]   disk    COMPELNT Compellent Vol   0606  /dev/sdu
[25:0:0:1]   disk    COMPELNT Compellent Vol   0606  /dev/sdv
[26:0:0:1]   disk    COMPELNT Compellent Vol   0606  /dev/

[ovirt-users] hosted-engine and GlusterFS on Vlan help

2016-10-04 Thread Hanson

Hi Guys,

I've converted my lab from using 802.3ad with bonding>bridged vlans to 
one link with two vlan bridges and am now having traffic jumping to the 
gateway when moving VM's/ISO/etc.


802.3ad = node1>switch1>node2
802.1Q = node1>switch1>gateway>switch1>node2

I assume I've setup the same vlan style, though this time I used the gui 
on the initial host install... setting up the vlans with their parent 
being eth0.


Hosted-engine on deploy then creates ovirtmgmt on top of eth0.11 ...

Switch is tagged for vlans 10 & 11. Including a PVID of 11 for good 
measure. (Gluster is vlan 11)


I'd expect the traffic from node to node to be going from port to port 
like it did in 802.3ad, what have I done wrong or is it using the gui 
initially?


This is how the current setup looks:

/var/lib/vdsm/Persistent/netconf/nets/ovirtmgmt:
{
"ipv6autoconf": false,
"nameservers": [],
"nic": "eth0",
"vlan": 11,
"ipaddr": "10.0.3.11",
"switch": "legacy",
"mtu": 1500,
"netmask": "255.255.255.0",
"dhcpv6": false,
"stp": false,
"bridged": true,
"gateway": "10.0.3.1",
"defaultRoute": true
}

/etc/sysconfig/network-scripts/ifcfg-ovirtmgmt:
# Generated by VDSM version 4.18.13-1.el7.centos
DEVICE=ovirtmgmt
TYPE=Bridge
DELAY=0
STP=off
ONBOOT=yes
IPADDR=10.0.3.11
NETMASK=255.255.255.0
GATEWAY=10.0.3.1
BOOTPROTO=none
DEFROUTE=yes
NM_CONTROLLED=no
IPV6INIT=no
VLAN_ID=11
MTU=1500

Thanks!!

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVirt 4.0.4 and Active Directory Kerberos SSO for Administration/User Portal. Troubleshooting

2016-10-04 Thread aleksey . maksimov
Martin, thanks for the help. It works.

03.10.2016, 15:01, "Martin Perina" :
> ​Ahh, this is the issue. Above configuration is valid for oVirt 3.x, but in 
> 4.0 we have quite new OAuth base SSO, so you need to use following 
> configuration:
>
> <LocationMatch ^/ovirt-engine/sso/(interactive-login-negotiate|oauth/token-http-auth)|^/ovirt-engine/api>
>   <If "req('Authorization') !~ /^(Bearer|Basic)/i">
>     RewriteEngine on
>     RewriteCond %{LA-U:REMOTE_USER} ^(.*)$
>     RewriteRule ^(.*)$ - [L,NS,P,E=REMOTE_USER:%1]
>     RequestHeader set X-Remote-User %{REMOTE_USER}s
>     AuthType Kerberos
>     AuthName "Kerberos Login"
>     Krb5Keytab /etc/httpd/s-oVirt-Krb.keytab
>     KrbAuthRealms AD.HOLDING.COM
>     KrbMethodK5Passwd off
>     Require valid-user
>     ErrorDocument 401 "<html><meta http-equiv=\"refresh\" content=\"0; url=/ovirt-engine/sso/login-unauthorized\"/><body><a href=\"/ovirt-engine/sso/login-unauthorized\">Here</a></body></html>"
>   </If>
> </LocationMatch>
> ​
>
> ​Also as 4.0 is working on EL7 you may use mod_auth_gssapi/mod_session 
> instead of quite old mod_auth_krb. For mod_auth_gssapi/mod_sessions you need 
> to do following:
>
>   1. yum install mod_session mod_auth_gssapi
>   2. Use following Apache configuration ​
>
> <LocationMatch ^/ovirt-engine/sso/(interactive-login-negotiate|oauth/token-http-auth)|^/ovirt-engine/api>
>   <If "req('Authorization') !~ /^(Bearer|Basic)/i">
>     RewriteEngine on
>     RewriteCond %{LA-U:REMOTE_USER} ^(.*)$
>     RewriteRule ^(.*)$ - [L,NS,P,E=REMOTE_USER:%1]
>     RequestHeader set X-Remote-User %{REMOTE_USER}s
>
>     AuthType GSSAPI
>     AuthName "Kerberos Login"
>
>     # Modify to match installation
>     GssapiCredStore keytab:/etc/httpd/s-oVirt-Krb.keytab
>     GssapiUseSessions On
>     Session On
>     SessionCookieName ovirt_gssapi_session path=/private;httponly;secure;
>
>     Require valid-user
>     ErrorDocument 401 "<html><meta http-equiv=\"refresh\" content=\"0; url=/ovirt-engine/sso/login-unauthorized\"/><body><a href=\"/ovirt-engine/sso/login-unauthorized\">Here</a></body></html>"
>   </If>
> </LocationMatch>
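
(For reference, a quick way to verify the setup from a client that already has
a valid ticket - the engine hostname below is just an example:

    kinit someuser@AD.HOLDING.COM
    curl -k --negotiate -u : https://engine.example.com/ovirt-engine/api
)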
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] 4.0 - 2nd node fails on deploy

2016-10-04 Thread Jason Jeffrey
Hi,

 

DCASTORXX is a hosts entry for the dedicated direct 10GB links (each a private /28) 
between the three servers (i.e. 1 => 2&3, 2 => 1&3, etc.), planned to be used solely 
for storage.

 

I.e.

 

10.100.50.81    dcasrv01

10.100.101.1    dcastor01

10.100.50.82    dcasrv02

10.100.101.2    dcastor02

10.100.50.83    dcasrv03

10.100.103.3    dcastor03

 

These were setup with the gluster commands

 

* gluster volume create iso replica 3 arbiter 1  
dcastor01:/xpool/iso/brick   dcastor02:/xpool/iso/brick   
dcastor03:/xpool/iso/brick

* gluster volume create export replica 3 arbiter 1  
dcastor02:/xpool/export/brick  dcastor03:/xpool/export/brick  
dcastor01:/xpool/export/brick  

* gluster volume create engine replica 3 arbiter 1 
dcastor01:/xpool/engine/brick dcastor02:/xpool/engine/brick 
dcastor03:/xpool/engine/brick

* gluster volume create data replica 3 arbiter 1  
dcastor01:/xpool/data/brick  dcastor03:/xpool/data/brick  
dcastor02:/xpool/data/bricky

 

 

So yes, DCASRV01 is the server (primary) and has local brick access through the 
DCASTOR01 interface.

 

Is the issue here not the incorrect soft link ?

 

lrwxrwxrwx. 1 vdsm kvm  132 Oct  3 17:27 hosted-engine.metadata -> 
/var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/fd44dbf9-473a-496a-9996-c8abe3278390/cee9440c-4eb8-453b-bc04-c47e6f9cbc93


[root@dcasrv01 /]# ls -al 
/var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/

ls: cannot access /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/: 
No such file or directory   

But the data does exist 

[root@dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# ls -al

drwxr-xr-x. 2 vdsm kvm4096 Oct  3 17:17 .

drwxr-xr-x. 6 vdsm kvm4096 Oct  3 17:17 ..

-rw-rw. 2 vdsm kvm 1028096 Oct  3 20:48 cee9440c-4eb8-453b-bc04-c47e6f9cbc93

-rw-rw. 2 vdsm kvm 1048576 Oct  3 17:17 
cee9440c-4eb8-453b-bc04-c47e6f9cbc93.lease

-rw-r--r--. 2 vdsm kvm 283 Oct  3 17:17 
cee9440c-4eb8-453b-bc04-c47e6f9cbc93.meta   

 

Thanks 

 

Jason 

 

 

 

From: Simone Tiraboschi [mailto:stira...@redhat.com] 
Sent: 04 October 2016 14:40
To: Jason Jeffrey 
Cc: users 
Subject: Re: [ovirt-users] 4.0 - 2nd node fails on deploy

 

 

 

On Tue, Oct 4, 2016 at 10:51 AM, Simone Tiraboschi <stira...@redhat.com> wrote:

 

 

On Mon, Oct 3, 2016 at 11:56 PM, Jason Jeffrey <ja...@sudo.co.uk> wrote:

Hi,

 

Another problem has appeared, after rebooting the primary the VM will not start.

 

Appears the symlink is broken between gluster mount ref and vdsm

 

The first host was correctly deployed, but it seems that you are facing some 
issue connecting to the storage.

Can you please attach vdsm logs and /var/log/messages from the first host?

 

Thanks Jason,

I suspect that your issue is related to this:

Oct  4 18:24:39 dcasrv01 etc-glusterfs-glusterd.vol[2252]: [2016-10-04 
17:24:39.522620] C [MSGID: 106002] 
[glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action] 0-management: 
Server quorum lost for volume data. Stopping local bricks.

Oct  4 18:24:39 dcasrv01 etc-glusterfs-glusterd.vol[2252]: [2016-10-04 
17:24:39.523272] C [MSGID: 106002] 
[glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action] 0-management: 
Server quorum lost for volume engine. Stopping local bricks.

 

and for some time your gluster volume has been working.

 

But then:

Oct  4 19:02:09 dcasrv01 systemd: Started /usr/bin/mount -t glusterfs -o 
backup-volfile-servers=dcastor02:dcastor03 dcastor01:engine 
/rhev/data-center/mnt/glusterSD/dcastor01:engine.

Oct  4 19:02:09 dcasrv01 systemd: Starting /usr/bin/mount -t glusterfs -o 
backup-volfile-servers=dcastor02:dcastor03 dcastor01:engine 
/rhev/data-center/mnt/glusterSD/dcastor01:engine.

Oct  4 19:02:11 dcasrv01 ovirt-ha-agent: 
/usr/lib/python2.7/site-packages/yajsonrpc/stomp.py:352: DeprecationWarning: 
Dispatcher.pending is deprecated. Use Dispatcher.socket.pending instead.

Oct  4 19:02:11 dcasrv01 ovirt-ha-agent: pending = getattr(dispatcher, 
'pending', lambda: 0)

Oct  4 19:02:11 dcasrv01 ovirt-ha-agent: 
/usr/lib/python2.7/site-packages/yajsonrpc/stomp.py:352: DeprecationWarning: 
Dispatcher.pending is deprecated. Use Dispatcher.socket.pending instead.

Oct  4 19:02:11 dcasrv01 ovirt-ha-agent: pending = getattr(dispatcher, 
'pending', lambda: 0)

Oct  4 19:02:11 dcasrv01 journal: vdsm vds.dispatcher ERROR SSL error during 
reading data: unexpected eof

Oct  4 19:02:11 dcasrv01 journal: ovirt-ha-agent 
ovirt_hosted_engine_ha.agent.agent.Agent ERROR Error: 'Connection to storage 
server failed' - trying to restart agent

Oct  4 19:02:11 dcasrv01 ovirt-ha-agent: 
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Error: 'Connection to storage 
server failed' - trying to restart agent

Oct  4 19:02:12 dcasrv01 etc-glusterfs-glusterd.vol[2252]: [2016-10-04 
18:02:12.384611] C [MSGID: 106003] 
[glusterd-server-quorum.c:346:glusterd_do_volume_quorum_action] 0-management: 
Server quorum regained for volume data.

Re: [ovirt-users] 4.0 - 2nd node fails on deploy

2016-10-04 Thread Simone Tiraboschi
On Tue, Oct 4, 2016 at 5:22 PM, Jason Jeffrey  wrote:

> Hi,
>
>
>
> DCASTORXX is a hosts entry for dedicated  direct 10GB links (each private
> /28) between the x3 servers  i.e 1=> 2&3, 2=> 1&3, etc) planned to be used
> solely for storage.
>
>
>
> I,e
>
>
>
> 10.100.50.81    dcasrv01
>
> 10.100.101.1    dcastor01
>
> 10.100.50.82    dcasrv02
>
> 10.100.101.2    dcastor02
>
> 10.100.50.83    dcasrv03
>
> 10.100.103.3    dcastor03
>
>
>
> These were setup with the gluster commands
>
>
>
> · gluster volume create iso replica 3 arbiter 1
> dcastor01:/xpool/iso/brick   dcastor02:/xpool/iso/brick
> dcastor03:/xpool/iso/brick
>
> · gluster volume create export replica 3 arbiter 1
> dcastor02:/xpool/export/brick  dcastor03:/xpool/export/brick
> dcastor01:/xpool/export/brick
>
> · gluster volume create engine replica 3 arbiter 1
> dcastor01:/xpool/engine/brick dcastor02:/xpool/engine/brick
> dcastor03:/xpool/engine/brick
>
> · gluster volume create data replica 3 arbiter 1
> dcastor01:/xpool/data/brick  dcastor03:/xpool/data/brick
> dcastor02:/xpool/data/bricky
>
>
>
>
>
> So yes, DCASRV01 is the server (pri) and have local bricks access through
> DCASTOR01 interface
>
>
>
> Is the issue here not the incorrect soft link ?
>

No, this should be fine.

The issue is that periodically your gluster volume losses its server quorum
and become unavailable.
It happened more than once from your logs.

Can you please attach also gluster logs for that volume?


>
>
> lrwxrwxrwx. 1 vdsm kvm  132 Oct  3 17:27 hosted-engine.metadata ->
> /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/fd44dbf9-473a-
> 496a-9996-c8abe3278390/cee9440c-4eb8-453b-bc04-c47e6f9cbc93
>
> [root@dcasrv01 /]# ls -al /var/run/vdsm/storage/bbb70623-194a-46d2-a164-
> 76a4876ecaaf/
>
> ls: cannot access /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/:
> No such file or directory
>
> But the data does exist
>
> [root@dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# ls -al
>
> drwxr-xr-x. 2 vdsm kvm4096 Oct  3 17:17 .
>
> drwxr-xr-x. 6 vdsm kvm4096 Oct  3 17:17 ..
>
> -rw-rw. 2 vdsm kvm 1028096 Oct  3 20:48 cee9440c-4eb8-453b-bc04-
> c47e6f9cbc93
>
> -rw-rw. 2 vdsm kvm 1048576 Oct  3 17:17 cee9440c-4eb8-453b-bc04-
> c47e6f9cbc93.lease
>
> -rw-r--r--. 2 vdsm kvm 283 Oct  3 17:17 
> cee9440c-4eb8-453b-bc04-c47e6f9cbc93.meta
>
>
>
>
> Thanks
>
>
>
> Jason
>
>
>
>
>
>
>
> *From:* Simone Tiraboschi [mailto:stira...@redhat.com]
> *Sent:* 04 October 2016 14:40
>
> *To:* Jason Jeffrey 
> *Cc:* users 
> *Subject:* Re: [ovirt-users] 4.0 - 2nd node fails on deploy
>
>
>
>
>
>
>
> On Tue, Oct 4, 2016 at 10:51 AM, Simone Tiraboschi 
> wrote:
>
>
>
>
>
> On Mon, Oct 3, 2016 at 11:56 PM, Jason Jeffrey  wrote:
>
> Hi,
>
>
>
> Another problem has appeared, after rebooting the primary the VM will not
> start.
>
>
>
> Appears the symlink is broken between gluster mount ref and vdsm
>
>
>
> The first host was correctly deployed, but it seems that you are facing some
> issue connecting to the storage.
>
> Can you please attach vdsm logs and /var/log/messages from the first host?
>
>
>
> Thanks Jason,
>
> I suspect that your issue is related to this:
>
> Oct  4 18:24:39 dcasrv01 etc-glusterfs-glusterd.vol[2252]: [2016-10-04
> 17:24:39.522620] C [MSGID: 106002] [glusterd-server-quorum.c:351:
> glusterd_do_volume_quorum_action] 0-management: Server quorum lost for
> volume data. Stopping local bricks.
>
> Oct  4 18:24:39 dcasrv01 etc-glusterfs-glusterd.vol[2252]: [2016-10-04
> 17:24:39.523272] C [MSGID: 106002] [glusterd-server-quorum.c:351:
> glusterd_do_volume_quorum_action] 0-management: Server quorum lost for
> volume engine. Stopping local bricks.
>
>
>
> and for some time your gluster volume has been working.
>
>
>
> But then:
>
> Oct  4 19:02:09 dcasrv01 systemd: Started /usr/bin/mount -t glusterfs -o
> backup-volfile-servers=dcastor02:dcastor03 dcastor01:engine
> /rhev/data-center/mnt/glusterSD/dcastor01:engine.
>
> Oct  4 19:02:09 dcasrv01 systemd: Starting /usr/bin/mount -t glusterfs -o
> backup-volfile-servers=dcastor02:dcastor03 dcastor01:engine
> /rhev/data-center/mnt/glusterSD/dcastor01:engine.
>
> Oct  4 19:02:11 dcasrv01 ovirt-ha-agent: /usr/lib/python2.7/site-
> packages/yajsonrpc/stomp.py:352: DeprecationWarning: Dispatcher.pending
> is deprecated. Use Dispatcher.socket.pending instead.
>
> Oct  4 19:02:11 dcasrv01 ovirt-ha-agent: pending = getattr(dispatcher,
> 'pending', lambda: 0)
>
> Oct  4 19:02:11 dcasrv01 ovirt-ha-agent: /usr/lib/python2.7/site-
> packages/yajsonrpc/stomp.py:352: DeprecationWarning: Dispatcher.pending
> is deprecated. Use Dispatcher.socket.pending instead.
>
> Oct  4 19:02:11 dcasrv01 ovirt-ha-agent: pending = getattr(dispatcher,
> 'pending', lambda: 0)
>
> Oct  4 19:02:11 dcasrv01 journal: vdsm vds.dispatcher ERROR SSL error
> during reading data: unexpected eof
>
> Oct  4 19:02:11 dcasrv01 journal: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Error: 'Con

Re: [ovirt-users] VMs paused due to IO issues - Dell Equallogic controller failover

2016-10-04 Thread Michal Skrivanek

> On 4 Oct 2016, at 09:51, Gary Lloyd  wrote:
> 
> Hi
> 
> We have Ovirt 3.65 with a Dell Equallogic SAN and we use Direct Luns for all 
> our VMs.
> At the weekend during early hours an Equallogic controller failed over to its 
> standby on one of our arrays and this caused about 20 of our VMs to be paused 
> due to IO problems.
> 
> I have also noticed that this happens during Equallogic firmware upgrades 
> since we moved onto Ovirt 3.65.
> 
> As recommended by Dell disk timeouts within the VMs are set to 60 seconds 
> when they are hosted on an EqualLogic SAN.
> 
> Is there any other timeout value that we can configure in vdsm.conf to stop 
> VMs from getting paused when a controller fails over ?

not really. but things are not so different when you look at it from the guest 
perspective. If the intention is to hide the fact that there is a problem and 
the guest should just see a delay (instead of dealing with error) then pausing 
and unpausing is the right behavior. From guest point of view this is just a 
delay it sees.

> 
> Also is there anything that we can tweak to automatically unpause the VMs 
> once connectivity with the arrays is re-established ?

that should happen when the storage domain monitoring detects an error and then 
reactivates (http://gerrit.ovirt.org/16244). It may be that since you have direct 
luns it’s not working with those….dunno, storage people should chime in I 
guess...

Thanks,
michal

> 
> At the moment we are running a customized version of storageServer.py, as 
> Ovirt has yet to include iscsi multipath support for Direct Luns out of the 
> box.
> 
> Many Thanks
> 
> 
> Gary Lloyd
> 
> I.T. Systems:Keele University
> Finance & IT Directorate
> Keele:Staffs:IC1 Building:ST5 5NB:UK
> +44 1782 733063
> 
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] VM pauses/hangs after migration

2016-10-04 Thread Michal Skrivanek

> On 3 Oct 2016, at 10:39, Davide Ferrari  wrote:
> 
> 
> 
> 2016-09-30 15:35 GMT+02:00 Michal Skrivanek:
> 
> 
> that is a very low level error really pointing at HW issues. It may or may 
> not be detected by memtest…but I would give it a try
> 
> 
> I left memtest86 running for 2 days and no error detected :(
>  
>> The only difference that this host (vmhost01) has is that it was the first 
>> host installed in my self-hosted engine installation. But I have already 
>> reinstalled it from GUI and menawhile I've upgraded to 4.0.4 from 4.0.3.
> 
> does it happen only for the big 96GB VM? The others which you said are 
> working, are they all small?
> Might be worth trying other system stability tests, playing with safer/slower 
> settings in BIOS, use lower CPU cluster, etc
> 
> 
> Yep, it happens only for the 96GB VM. Other VMs with fewer RAM (16GB for 
> example) can be created on or migrated to that host flawlessly. I'll try to 
> play a little with BIOS settings but otherwise I'll have the HW replaced. I 
> was only trying to rule out possible oVirt SW problems due to that host being 
> the first I deployed (from CLI) when I installed the cluster.

I understand. Unfortunately it really does look like some sort of 
incompatibility rather than a sw issue:/

> 
> Thanks!
> 
> -- 
> Davide Ferrari
> Senior Systems Engineer
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] hosted-engine and GlusterFS on Vlan help

2016-10-04 Thread Hanson
Running iperf3 between node1 & node2, I can achieve almost 10gbps 
without ever going out to the gateway...


So switching between port to port on the switch is working properly on 
the vlan.


This must be a problem in the gluster settings? Where do I start 
troubleshooting here?
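
A reasonable first thing to check is which addresses the peers were actually
probed with, and which route the storage traffic takes (the peer IP below is
just an example on the same subnet as the config further down):

    gluster peer status                # hostnames/IPs used when the peers were probed
    gluster volume info | grep Brick   # addresses used in the brick definitions
    ip route get 10.0.3.12             # does this leave via the bridge or via the gateway?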



On 10/04/2016 10:38 AM, Hanson wrote:

Hi Guys,

I've converted my lab from using 802.3ad with bonding>bridged vlans to 
one link with two vlan bridges and am now having traffic jumping to 
the gateway when moving VM's/ISO/etc.


802.3ad = node1>switch1>node2
802.1Q = node1>switch1>gateway>switch1>node2

I assume I've setup the same vlan style, though this time I used the 
gui on the initial host install... setting up the vlans with their 
parent being eth0.


Hosted-engine on deploy then creates ovirtmgmt on top of eth0.11 ...

Switch is tagged for vlans 10 & 11. Including a PVID of 11 for good 
measure. (Gluster is vlan 11)


I'd expect the traffic from node to node to be going from port to port 
like it did in 802.3ad, what have I done wrong or is it using the gui 
initially?


This is how the current setup looks:

/var/lib/vdsm/Persistent/netconf/nets/ovirtmgmt:
{
"ipv6autoconf": false,
"nameservers": [],
"nic": "eth0",
"vlan": 11,
"ipaddr": "10.0.3.11",
"switch": "legacy",
"mtu": 1500,
"netmask": "255.255.255.0",
"dhcpv6": false,
"stp": false,
"bridged": true,
"gateway": "10.0.3.1",
"defaultRoute": true
}

/etc/sysconfig/network-scripts/ifcfg-ovirtmgmt:
# Generated by VDSM version 4.18.13-1.el7.centos
DEVICE=ovirtmgmt
TYPE=Bridge
DELAY=0
STP=off
ONBOOT=yes
IPADDR=10.0.3.11
NETMASK=255.255.255.0
GATEWAY=10.0.3.1
BOOTPROTO=none
DEFROUTE=yes
NM_CONTROLLED=no
IPV6INIT=no
VLAN_ID=11
MTU=1500

Thanks!!

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] VMs paused due to IO issues - Dell Equallogic controller failover

2016-10-04 Thread Nir Soffer
On Tue, Oct 4, 2016 at 10:51 AM, Gary Lloyd  wrote:

> Hi
>
> We have Ovirt 3.65 with a Dell Equallogic SAN and we use Direct Luns for
> all our VMs.
> At the weekend during early hours an Equallogic controller failed over to
> its standby on one of our arrays and this caused about 20 of our VMs to be
> paused due to IO problems.
>
> I have also noticed that this happens during Equallogic firmware upgrades
> since we moved onto Ovirt 3.65.
>
> As recommended by Dell disk timeouts within the VMs are set to 60 seconds
> when they are hosted on an EqualLogic SAN.
>
> Is there any other timeout value that we can configure in vdsm.conf to
> stop VMs from getting paused when a controller fails over ?
>

You can set the timeout in multipath.conf.

With current multipath configuration (deployed by vdsm), when all paths to
a device
are lost (e.g. you take down all ports on the server during upgrade), all
io will fail
immediately.

If you want to allow 60 seconds gracetime in such case, you can configure:

no_path_retry 12

This will continue to monitor the paths 12 times, each 5 seconds
(assuming polling_interval=5). If some path recover during this time, the io
can complete and the vm will not be paused.

If no path is available after these retries, io will fail and vms with
pending io
will pause.

Note that this will also cause delays in vdsm in various flows, increasing
the chance
of timeouts in engine side, or delays in storage domain monitoring.

However, the 60 seconds delay is expected only on the first time all paths
become
faulty. Once the timeout has expired, any access to the device will fail
immediately.

To configure this, you must add the # VDSM PRIVATE tag at the second line of
multipath.conf, otherwise vdsm will override your configuration in the next
time
you run vdsm-tool configure.

multipath.conf should look like this:

# VDSM REVISION 1.3
# VDSM PRIVATE

defaults {
    polling_interval        5
    no_path_retry           12
    user_friendly_names     no
    flush_on_last_del       yes
    fast_io_fail_tmo        5
    dev_loss_tmo            30
    max_fds                 4096
}

devices {
    device {
        all_devs                yes
        no_path_retry           12
    }
}

This will use 12 retries (a 60 second timeout) for any device. If you would like
to configure only your specific device, you can add a device section for
your specific server instead.
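
For example, for an EqualLogic array that could look something like this (the
vendor/product strings are what these arrays typically report; please verify
them against your own multipath -ll output):

devices {
    device {
        vendor                  "EQLOGIC"
        product                 "100E-00"
        no_path_retry           12
    }
}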


>
> Also is there anything that we can tweak to automatically unpause the VMs
> once connectivity with the arrays is re-established ?
>

Vdsm will resume the vms when storage monitor detect that storage became
available again.
However we cannot guarantee that storage monitoring will detect that
storage was down.
This should be improved in 4.0.


> At the moment we are running a customized version of storageServer.py, as
> Ovirt has yet to include iscsi multipath support for Direct Luns out of the
> box.
>

Would you like to share this code?

Nir
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] VMs paused due to IO issues - Dell Equallogic controller failover

2016-10-04 Thread Nir Soffer
On Tue, Oct 4, 2016 at 7:03 PM, Michal Skrivanek <
michal.skriva...@redhat.com> wrote:

>
> > On 4 Oct 2016, at 09:51, Gary Lloyd  wrote:
> >
> > Hi
> >
> > We have Ovirt 3.65 with a Dell Equallogic SAN and we use Direct Luns for
> all our VMs.
> > At the weekend during early hours an Equallogic controller failed over
> to its standby on one of our arrays and this caused about 20 of our VMs to
> be paused due to IO problems.
> >
> > I have also noticed that this happens during Equallogic firmware
> upgrades since we moved onto Ovirt 3.65.
> >
> > As recommended by Dell disk timeouts within the VMs are set to 60
> seconds when they are hosted on an EqualLogic SAN.
> >
> > Is there any other timeout value that we can configure in vdsm.conf to
> stop VMs from getting paused when a controller fails over ?
>
> not really. but things are not so different when you look at it from the
> guest perspective. If the intention is to hide the fact that there is a
> problem and the guest should just see a delay (instead of dealing with
> error) then pausing and unpausing is the right behavior. From guest point
> of view this is just a delay it sees.
>
> >
> > Also is there anything that we can tweak to automatically unpause the
> VMs once connectivity with the arrays is re-established ?
>
> that should happen when the storage domain monitoring detects error and
> then reactivate(http://gerrit.ovirt.org/16244). It may be that since you
> have direct luns it’s not working with those….dunno, storage people should
> chime in I guess...
>


We don't monitor direct luns, only storage domains, so we do not support
resuming vms using direct luns.

multipath does monitor all devices, so we could monitor the devices status
via multipath, and resume paused vms when a device move from faulty
state to active state.

Maybe open an RFE for this?

Nir
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVirt 4.0.4 and Active Directory Kerberos SSO for Administration/User Portal. Troubleshooting

2016-10-04 Thread Martin Perina
On Tue, Oct 4, 2016 at 5:16 PM,  wrote:

> Martin, thanks for the help. It works.
>

​Glad to hear that, thanks.

Martin
​


>
> 03.10.2016, 15:01, "Martin Perina" :
> > ​Ahh, this is the issue. Above configuration is valid for oVirt 3.x, but
> in 4.0 we have quite new OAuth base SSO, so you need to use following
> configuration:
> >
> > <LocationMatch ^/ovirt-engine/sso/(interactive-login-negotiate|oauth/token-http-auth)|^/ovirt-engine/api>
> >   <If "req('Authorization') !~ /^(Bearer|Basic)/i">
> > RewriteEngine on
> > RewriteCond %{LA-U:REMOTE_USER} ^(.*)$
> > RewriteRule ^(.*)$ - [L,NS,P,E=REMOTE_USER:%1]
> > RequestHeader set X-Remote-User %{REMOTE_USER}s
> > AuthType Kerberos
> > AuthName "Kerberos Login"
> > Krb5Keytab /etc/httpd/s-oVirt-Krb.keytab
> > KrbAuthRealms AD.HOLDING.COM
> > KrbMethodK5Passwd off
> > Require valid-user
> > ErrorDocument 401 "<html><meta http-equiv=\"refresh\" content=\"0; url=/ovirt-engine/sso/login-unauthorized\"/><body><a href=\"/ovirt-engine/sso/login-unauthorized\">Here</a></body></html>"
> >   </If>
> > </LocationMatch>
> > ​
> >
> > ​Also as 4.0 is working on EL7 you may use mod_auth_gssapi/mod_session
> instead of quite old mod_auth_krb. For mod_auth_gssapi/mod_sessions you
> need to do following:
> >
> >   1. yum install mod_session mod_auth_gssapi
> >   2. Use following Apache configuration ​
> >
> > <LocationMatch ^/ovirt-engine/sso/(interactive-login-negotiate|oauth/token-http-auth)|^/ovirt-engine/api>
> >   <If "req('Authorization') !~ /^(Bearer|Basic)/i">
> > RewriteEngine on
> > RewriteCond %{LA-U:REMOTE_USER} ^(.*)$
> > RewriteRule ^(.*)$ - [L,NS,P,E=REMOTE_USER:%1]
> > RequestHeader set X-Remote-User %{REMOTE_USER}s
> >
> > AuthType GSSAPI
> > AuthName "Kerberos Login"
> >
> > # Modify to match installation
> > GssapiCredStore keytab:/etc/httpd/s-oVirt-Krb.keytab
> > GssapiUseSessions On
> > Session On
> > SessionCookieName ovirt_gssapi_session path=/private;httponly;secure;
> >
> > Require valid-user
> > ErrorDocument 401 "<html><meta http-equiv=\"refresh\" content=\"0; url=/ovirt-engine/sso/login-unauthorized\"/><body><a href=\"/ovirt-engine/sso/login-unauthorized\">Here</a></body></html>"
> >   </If>
> > </LocationMatch>
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] DISCARD support?

2016-10-04 Thread Yaniv Kaul
On Oct 4, 2016 11:11 AM, "Nicolas Ecarnot"  wrote:
>
> Hello,
>
> Sending this here to share knowledge.
>
> Here is what I learned from many BZ and mailing list posts readings. I'm
not working at Redhat, so please correct me if I'm wrong.
>
> We are using thin-provisioned block storage LUNs (Equallogic), on which
oVirt is creating numerous Logical Volumes, and we're very happy with it.
> When oVirt is removing a virtual disk, the SAN is not informed, because
the LVM layer is not sending the "issue_discard" flag.
>
> /etc/lvm/lvm.conf is not the natural place to try to change this
parameter, as VDSM is not using it.
>
> Efforts are presently made to include issue_discard setting support
directly into vdsm.conf, first on a datacenter scope (4.0.x), then per
storage domain (4.1.x) and maybe via a web GUI check-box. Part of the
effort is to make sure every bit of a planned to be removed LV get wiped
out. Part is to inform the block storage side about the deletion, in case
of thin provisioned LUNs.

Our implementation will be independent of the LVM setting issue_discard,
will not be based on it and it won't be needed.
Y.

>
> https://bugzilla.redhat.com/show_bug.cgi?id=1342919
> https://bugzilla.redhat.com/show_bug.cgi?id=981626
>
> --
> Nicolas ECARNOT
>
> On Mon, Oct 3, 2016 at 2:24 PM, Nicolas Ecarnot 
wrote:
>>
>> Yaniv,
>>
>> As a pure random way of web surfing, I found that you posted on twitter
an information about DISCARD support. (
https://twitter.com/YanivKaul/status/773513216664174592)
>>
>> I did not dig any further, but has it any relation with the fact that so
far, oVirt did not reclaim lost storage space amongst its logical volumes
of its storage domains?
>>
>> A BZ exist about this, but one was told no work would be done about it
until 4.x.y, so now we're there, I was wondering if you knew more?
>
>
> Feel free to send such questions on the mailing list (ovirt users or
devel), so other will be able to both chime in and see the response.
> We've supported a custom hook for enabling discard per disk (which is
only relevant for virtio-SCSI and IDE) for some versions now (3.5 I
believe).
> We are planning to add this via a UI and API in 4.1.
> In addition, we are looking into discard (instead of wipe after delete,
when discard is also zero'ing content) as well as discard when removing LVs.
> See:
>
http://www.ovirt.org/develop/release-management/features/storage/pass-discard-from-guest-to-underlying-storage/
>
http://www.ovirt.org/develop/release-management/features/storage/wipe-volumes-using-blkdiscard/
>
http://www.ovirt.org/develop/release-management/features/storage/discard-after-delete/
>
> Y.
>
>>
>>
>> Best,
>>
>> --
>> Nicolas ECARNOT
>
>
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Migrate machines in unknown state?

2016-10-04 Thread Ekin Meroğlu
Hi,

On Fri, Sep 30, 2016 at 9:27 PM, Yaniv Kaul  wrote:
>
> > btw, both of the environments were RHEV-H based RHEV 3.5 clusters, and both
> > were busy systems, so restarting vdsm service took quite a long time.
> > I'm guessing this might be a factor.
>
> That indeed might be the factor - but vdsm should not take long to
> restart. If it happens on a more recent version, I'd be happy to know about
> it, as we've done work on ensuring that it restarts and answers quickly to
> the engine (as far as I remember, even before it fully completed the
> restart).
> Y.
>
​Since the last message we've updated the environments to an up-to-date
3.6.x actually, but I'm not sure if restarting vdsmd is still taking a long
time. I'll check back and let you know.​

​Thanks & Regards,​
-- 
*Ekin Meroğlu** Red Hat Certified Architect*

linuxera Özgür Yazılım Çözüm ve Hizmetleri
*T* +90 (850) 22 LINUX | *GSM* +90 (532) 137 77 04
www.linuxera.com | bi...@linuxera.com
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] DISCARD support?

2016-10-04 Thread Nir Soffer
On Tue, Oct 4, 2016 at 11:11 AM, Nicolas Ecarnot 
wrote:

> Hello,
>
> Sending this here to share knowledge.
>
> Here is what I learned from many BZ and mailing list posts readings. I'm
> not working at Redhat, so please correct me if I'm wrong.
>
> We are using thin-provisioned block storage LUNs (Equallogic), on which
> oVirt is creating numerous Logical Volumes, and we're very happy with it.
> When oVirt is removing a virtual disk, the SAN is not informed, because
> the LVM layer is not sending the "issue_discard" flag.
>
> /etc/lvm/lvm.conf is not the natural place to try to change this
> parameter, as VDSM is not using it.
>

> Efforts are presently made to include issue_discard setting support
> directly into vdsm.conf, first on a datacenter scope (4.0.x), then per
> storage domain (4.1.x) and maybe via a web GUI check-box. Part of the
> effort is to make sure every bit of a planned to be removed LV get wiped
> out. Part is to inform the block storage side about the deletion, in case
> of thin provisioned LUNs.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1342919
> https://bugzilla.redhat.com/show_bug.cgi?id=981626
>

This is already included in 4.0, added in:
https://gerrit.ovirt.org/58036

However it is disabled by default. To enable discard, you need to
enable the irs:discard_enable option.

The best way to do this is to create a dropin conf:
/etc/vdsm/vdsm.conf.d/50_discard.conf

[irs]
discard_enable = true

And restart vdsm.

You need to deploy this file on all hosts.
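
For example (a minimal sketch, assuming the dropin directory is supported by
your vdsm version):

    mkdir -p /etc/vdsm/vdsm.conf.d
    printf '[irs]\ndiscard_enable = true\n' > /etc/vdsm/vdsm.conf.d/50_discard.conf
    systemctl restart vdsmd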

In the next version we want to enable this automatically if the storage
domain supports discard; no configuration on the host will be needed.

Nir


>
> --
> Nicolas ECARNOT
>
> On Mon, Oct 3, 2016 at 2:24 PM, Nicolas Ecarnot 
> wrote:
>
>> Yaniv,
>>
>> As a pure random way of web surfing, I found that you posted on twitter
>> an information about DISCARD support. (https://twitter.com/YanivKaul
>> /status/773513216664174592)
>>
>> I did not dig any further, but has it any relation with the fact that so
>> far, oVirt did not reclaim lost storage space amongst its logical volumes
>> of its storage domains?
>>
>> A BZ exist about this, but one was told no work would be done about it
>> until 4.x.y, so now we're there, I was wondering if you knew more?
>>
>
> Feel free to send such questions on the mailing list (ovirt users or
> devel), so other will be able to both chime in and see the response.
> We've supported a custom hook for enabling discard per disk (which is only
> relevant for virtio-SCSI and IDE) for some versions now (3.5 I believe).
> We are planning to add this via a UI and API in 4.1.
> In addition, we are looking into discard (instead of wipe after delete,
> when discard is also zero'ing content) as well as discard when removing LVs.
> See:
> http://www.ovirt.org/develop/release-management/features/
> storage/pass-discard-from-guest-to-underlying-storage/
> http://www.ovirt.org/develop/release-management/features/
> storage/wipe-volumes-using-blkdiscard/
> http://www.ovirt.org/develop/release-management/features/
> storage/discard-after-delete/
>
> Y.
>
>
>>
>> Best,
>>
>> --
>> Nicolas ECARNOT
>>
>
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] oVirt 3.6.4 / PXE guest boot issues

2016-10-04 Thread Alan Griffiths
An explanation/work-around for this issue, which was raised back in April.

It seems that if, in UCS, you configure a vNIC with a single native VLAN it
will still add an 802.1q header with tag 0 - possibly to do with QoS. And
this extra header prevents iPXE from parsing the DHCP response.

The solution for me was to present all VLANs on a single trunked vNIC to
the blade and configure VLAN tagging as per normal. The result is the tags
are stripped off the packets before being passed to the VM and DHCP now
works.

The same issue applies to VM-FEX as packets coming off the VF will have the
802.1q header. The only solution I can see here is to configure a bridged
interface for initial build of the VM and then switch to VM-FEX afterwards.

I found a discussion on the iPXE mailing list about addressing the vlan 0
issue, but I could see no agreed solution.

http://lists.ipxe.org/pipermail/ipxe-devel/2016-April/004901.html

Alan
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users