Re: [ovirt-users] oVirt storage best practise

2017-06-12 Thread Stefano Bovina
Thank you very much.
What about "direct LUN" usage and the database example?


2017-06-08 16:40 GMT+02:00 Elad Ben Aharon <ebena...@redhat.com>:

> Hi,
> Answer inline
>
> On Thu, Jun 8, 2017 at 1:07 PM, Stefano Bovina <bov...@gmail.com> wrote:
>
>> Hi,
>> does a storage best practise document for oVirt exist?
>>
>>
>> Some examples:
>>
>> oVirt allows extending an existing storage domain: is it better to keep a
>> 1:1 relation between a LUN and an oVirt storage domain?
>>
> What do you mean by a 1:1 relation? Between a storage domain and the number of
> LUNs the domain resides on?
>
>> If not, is it better to avoid adding LUNs to an already existing storage
>> domain?
>>
> No problems with storage domain extension.
>
>>
>> Following the previous questions:
>>
>> Is it better to have one big oVirt storage domain or many small oVirt
>> storage domains?
>>
> Depends on your needs; be aware of the following:
> - Each domain has its own metadata, which allocates ~5 GB of the domain size.
> - Each domain is constantly monitored by the system, so a large number of
> domains can decrease system performance.
> There are also downsides to having big domains, like less flexibility
>
>> Is there a maximum number of VMs/disks per storage domain?
>>
>>
>> In which cases is it better to use a "direct attached LUN" rather than
>> an image on an oVirt storage domain?
>>
>
>>
>
>> Example:
>>
>> Simple web server: -> image
>> Large database (simple example):
>>    - root, swap, etc.: 30 GB -> image?
>>    - data disk: 500 GB -> (direct or image?)
>>
>> Regards,
>>
>> Stefano
>>
>
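
For reference, on a block (FC/iSCSI) storage domain the per-domain metadata mentioned above lives in a handful of special LVs inside the domain's VG, which is named after the storage domain UUID. A minimal sketch of how to inspect it on a host, assuming the lvm2 tools are available; the UUID is one that appears in the sanlock logs later in this digest and is only an example:

# Special LVs of a block storage domain (metadata, ids, leases, inbox, outbox, master):
SD_UUID=6a386652-629d-4045-835b-21d2f5c104aa
lvs -o lv_name,lv_size,lv_attr "$SD_UUID"

# The same VG also holds one LV per disk image/snapshot, so in practice the
# number of images a domain can hold is bounded by LVM metadata size and
# monitoring overhead.
vgs -o vg_name,vg_size,vg_free,lv_count "$SD_UUID"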


[ovirt-users] oVirt storage best practise

2017-06-08 Thread Stefano Bovina
Hi,
does a storage best practise document for oVirt exist?


Some examples:

oVirt allows extending an existing storage domain: is it better to keep a
1:1 relation between a LUN and an oVirt storage domain?
If not, is it better to avoid adding LUNs to an already existing storage
domain?

Following the previous questions:

Is it better to have one big oVirt storage domain or many small oVirt storage
domains?
Is there a maximum number of VMs/disks per storage domain?


In which cases is it better to use a "direct attached LUN" rather than an
image on an oVirt storage domain?

Example:

Simple web server: -> image
Large database (simple example):
   - root, swap, etc.: 30 GB -> image?
   - data disk: 500 GB -> (direct or image?)

Regards,

Stefano


Re: [ovirt-users] High latency on storage domains and sanlock renewal error

2017-05-13 Thread Stefano Bovina
It's FC/FCoE.

This is with the configuration suggested by EMC/Red Hat:

360060160a62134002818778f949de411 dm-5 DGC,VRAID
size=11T features='2 queue_if_no_path retain_attached_hw_handler'
hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 1:0:1:2 sdr  65:16  active ready running
| `- 2:0:1:2 sdy  65:128 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 1:0:0:2 sdd  8:48   active ready running
  `- 2:0:0:2 sdk  8:160  active ready running
360060160a6213400e622de69949de411 dm-2 DGC,VRAID
size=6.0T features='2 queue_if_no_path retain_attached_hw_handler'
hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 1:0:1:0 sdp  8:240  active ready running
| `- 2:0:1:0 sdw  65:96  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 1:0:0:0 sdb  8:16   active ready running
  `- 2:0:0:0 sdi  8:128  active ready running
360060160a6213400cce46e40949de411 dm-4 DGC,VRAID
size=560G features='2 queue_if_no_path retain_attached_hw_handler'
hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 1:0:1:3 sds  65:32  active ready running
| `- 2:0:1:3 sdz  65:144 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 1:0:0:3 sde  8:64   active ready running
  `- 2:0:0:3 sdl  8:176  active ready running
360060160a6213400c4b39e80949de411 dm-3 DGC,VRAID
size=500G features='2 queue_if_no_path retain_attached_hw_handler'
hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 1:0:1:1 sdq  65:0   active ready running
| `- 2:0:1:1 sdx  65:112 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 1:0:0:1 sdc  8:32   active ready running
  `- 2:0:0:1 sdj  8:144  active ready running
360060160a6213400fa2d31acbbfce511 dm-8 DGC,RAID 5
size=5.4T features='2 queue_if_no_path retain_attached_hw_handler'
hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 1:0:0:6 sdh  8:112  active ready running
| `- 2:0:0:6 sdo  8:224  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 1:0:1:6 sdv  65:80  active ready running
  `- 2:0:1:6 sdac 65:192 active ready running
360060160a621340040652b7582f5e511 dm-7 DGC,RAID 5
size=3.6T features='2 queue_if_no_path retain_attached_hw_handler'
hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 1:0:0:4 sdf  8:80   active ready running
| `- 2:0:0:4 sdm  8:192  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 1:0:1:4 sdt  65:48  active ready running
  `- 2:0:1:4 sdaa 65:160 active ready running
360060160a621340064b1034cbbfce511 dm-6 DGC,RAID 5
size=1.0T features='2 queue_if_no_path retain_attached_hw_handler'
hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 1:0:1:5 sdu  65:64  active ready running
| `- 2:0:1:5 sdab 65:176 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 1:0:0:5 sdg  8:96   active ready running
  `- 2:0:0:5 sdn  8:208  active ready running


This is with the oVirt default conf:

360060160a6213400848e60af82f5e511 dm-3 DGC ,RAID 5
size=3.6T features='1 retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 12:0:0:4 sdj 8:144 active ready  running
| `- 13:0:1:4 sdd 8:48  active ready  running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 12:0:1:4 sdf 8:80  active ready  running
  `- 13:0:0:4 sdh 8:112 active ready  running
360060160a621345e425b6b10e611 dm-2 DGC ,RAID 10
size=4.2T features='1 retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 12:0:1:0 sde 8:64  active ready  running
| `- 13:0:0:0 sdg 8:96  active ready  running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 13:0:1:0 sdc 8:32  active ready  running
  `- 12:0:0:0 sdi 8:128 active ready  running
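
A quick way to confirm which of the two configurations a host is actually running with is to ask the daemon for its merged configuration; a sketch, assuming device-mapper-multipath is installed and multipathd is running:

# Effective settings for the DGC (CLARiiON/VNX) arrays as seen by the running daemon:
multipathd show config | grep -A 20 '"DGC"'

# The same information without talking to the daemon (compiled-in defaults plus
# /etc/multipath.conf):
multipath -t | grep -A 20 '"DGC"'

The difference is already visible in the two multipath -ll listings above: the EMC-recommended file yields features '2 queue_if_no_path retain_attached_hw_handler' with round-robin path groups, while the oVirt default yields '1 retain_attached_hw_handler' with the service-time selector.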


2017-05-13 18:50 GMT+02:00 Juan Pablo <pablo.localh...@gmail.com>:

> can you please give the output of:
> multipath -ll
> and
> iscsiadm -m session -P3
>
> JP
>
> 2017-05-13 6:48 GMT-03:00 Stefano Bovina <bov...@gmail.com>:
>
>> Hi,
>>
>> 2.6.32-696.1.1.el6.x86_64
>> 3.10.0-514.10.2.el7.x86_64
>>
>> I tried the ioping test from different groups of servers using multipath,
>> members of different storage groups (different LUN, different RAID, etc.), and
>> all of them report latency.
>> I tried the same test (ioping) on a server with PowerPath instead of
>> multipath, with a dedicated RAID group, and ioping doesn't report latency.
>>
>>
>> 2017-05-13 2:00 GMT+02:00 Juan Pablo <pablo.localh...@gmail.com>:
>>
>>> sorry to jump in, but what kernel version are you using? had similar
>>> issue with kernel 4.10/4.11
>>>
>>>
>>> 2017-05-12 16:36 GMT-03:00 Stefano Bovina <bov...@gmail.com>:

Re: [ovirt-users] High latency on storage domains and sanlock renewal error

2017-05-13 Thread Stefano Bovina
Hi,

2.6.32-696.1.1.el6.x86_64
3.10.0-514.10.2.el7.x86_64

I tried the ioping test from different groups of servers using multipath,
members of different storage groups (different LUN, different RAID, etc.), and
all of them report latency.
I tried the same test (ioping) on a server with PowerPath instead of
multipath, with a dedicated RAID group, and ioping doesn't report latency.
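
For anyone wanting to reproduce the measurement, a minimal sketch of the ioping test, assuming the ioping package is installed on the host; the storage-domain UUID and WWID below are taken from elsewhere in this thread and are only illustrative:

# 10 direct-I/O probes against the storage domain's metadata LV:
ioping -c 10 -D /dev/6a386652-629d-4045-835b-21d2f5c104aa/metadata

# The same probe against the underlying multipath device, to separate LVM from
# the path layer:
ioping -c 10 -D /dev/mapper/360060160a62134002818778f949de411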


2017-05-13 2:00 GMT+02:00 Juan Pablo <pablo.localh...@gmail.com>:

> sorry to jump in, but what kernel version are you using? had similar issue
> with kernel 4.10/4.11
>
>
> 2017-05-12 16:36 GMT-03:00 Stefano Bovina <bov...@gmail.com>:
>
>> Hi,
>> a little update:
>>
>> The command multipath -ll hangs when executed on the host while the
>> problem occurs (nothing is logged in /var/log/messages or dmesg).
>>
>> I tested latency with ioping:
>> ioping /dev/6a386652-629d-4045-835b-21d2f5c104aa/metadata
>>
>> Usually it returns "time=15.6 ms"; sometimes it returns "time=19 s" (yes,
>> seconds).
>>
>> The systems are up to date and I tried both path_checker values (emc_clariion
>> and directio), without results.
>> (https://access.redhat.com/solutions/139193 refers to Rev A31 of the
>> EMC document; the latest is A42 and suggests emc_clariion.)
>>
>> Any idea or suggestion?
>>
>> Thanks,
>>
>> Stefano
>>
>> 2017-05-08 11:56 GMT+02:00 Yaniv Kaul <yk...@redhat.com>:
>>
>>>
>>>
>>> On Mon, May 8, 2017 at 11:50 AM, Stefano Bovina <bov...@gmail.com>
>>> wrote:
>>>
>>>> Yes,
>>>> this configuration is the one suggested by EMC for EL7.
>>>>
>>>
>>> https://access.redhat.com/solutions/139193 suggests that for alua, the
>>> path checker needs to be different.
>>>
>>> Anyway, it is very likely that you have storage issues - they need to be
>>> resolved first and I believe they have little to do with oVirt at the
>>> moment.
>>> Y.
>>>
>>>
>>>>
>>>> By the way,
>>>> "The parameters rr_min_io vs. rr_min_io_rq mean the same thing but are
>>>> used for device-mapper-multipath on differing kernel versions." and
>>>> rr_min_io_rq default value is 1, rr_min_io default value is 1000, so it
>>>> should be fine.
>>>>
>>>>
>>>> 2017-05-08 9:39 GMT+02:00 Yaniv Kaul <yk...@redhat.com>:
>>>>
>>>>>
>>>>> On Sun, May 7, 2017 at 1:27 PM, Stefano Bovina <bov...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Sense data are 0x0/0x0/0x0
>>>>>
>>>>>
>>>>> Interesting - first time I'm seeing 0/0/0. The 1st is usually 0x2 (see
>>>>> [1]), and then the rest [2], [3] make sense.
>>>>>
>>>>> A Google search found another user with a CLARiiON with the exact same
>>>>> error [4], so I'm leaning toward a misconfiguration of multipathing/CLARiiON
>>>>> here.
>>>>>
>>>>> Is your multipathing configuration working well for you?
>>>>> Are you sure it's an EL7 configuration? For example, I believe you
>>>>> should have rr_min_io_rq and not rr_min_io.
>>>>> Y.
>>>>>
>>>>> [1] http://www.t10.org/lists/2status.htm
>>>>> [2] http://www.t10.org/lists/2sensekey.htm
>>>>> [3] http://www.t10.org/lists/asc-num.htm
>>>>> [4] http://www.linuxquestions.org/questions/centos-111/multipath-problems-4175544908/
>>>>>
>>>>
>>>>
>>>
>>
>


Re: [ovirt-users] High latency on storage domains and sanlock renewal error

2017-05-12 Thread Stefano Bovina
Hi,
a little update:

The command multipath -ll hangs when executed on the host while the problem
occurs (nothing is logged in /var/log/messages or dmesg).

I tested latency with ioping:
ioping /dev/6a386652-629d-4045-835b-21d2f5c104aa/metadata

Usually it returns "time=15.6 ms"; sometimes it returns "time=19 s" (yes,
seconds).

The systems are up to date and I tried both path_checker values (emc_clariion
and directio), without results.
(https://access.redhat.com/solutions/139193 refers to Rev A31 of the
EMC document; the latest is A42 and suggests emc_clariion.)
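
To be sure which path_checker is actually in effect (as opposed to what is written in /etc/multipath.conf), the running daemon can be queried directly; a sketch, assuming multipathd is up:

# Checker, prio and features as merged by the running daemon:
multipathd show config | grep -E 'path_checker|prio |features'

# Current checker state of every path:
multipathd show paths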

Any idea or suggestion?

Thanks,

Stefano

2017-05-08 11:56 GMT+02:00 Yaniv Kaul <yk...@redhat.com>:

>
>
> On Mon, May 8, 2017 at 11:50 AM, Stefano Bovina <bov...@gmail.com> wrote:
>
>> Yes,
>> this configuration is the one suggested by EMC for EL7.
>>
>
> https://access.redhat.com/solutions/139193 suggests that for alua, the
> path checker needs to be different.
>
> Anyway, it is very likely that you have storage issues - they need to be
> resolved first and I believe they have little to do with oVirt at the
> moment.
> Y.
>
>
>>
>> By the way,
>> "The parameters rr_min_io vs. rr_min_io_rq mean the same thing but are
>> used for device-mapper-multipath on differing kernel versions." and
>> rr_min_io_rq default value is 1, rr_min_io default value is 1000, so it
>> should be fine.
>>
>>
>> 2017-05-08 9:39 GMT+02:00 Yaniv Kaul <yk...@redhat.com>:
>>
>>>
>>> On Sun, May 7, 2017 at 1:27 PM, Stefano Bovina <bov...@gmail.com> wrote:
>>>
>>>> Sense data are 0x0/0x0/0x0
>>>
>>>
>>> Interesting - first time I'm seeing 0/0/0. The 1st is usually 0x2 (see
>>> [1]), and then the rest [2], [3] make sense.
>>>
>>> A Google search found another user with a CLARiiON with the exact same
>>> error [4], so I'm leaning toward a misconfiguration of multipathing/CLARiiON
>>> here.
>>>
>>> Is your multipathing configuration working well for you?
>>> Are you sure it's an EL7 configuration? For example, I believe you should
>>> have rr_min_io_rq and not rr_min_io.
>>> Y.
>>>
>>> [1] http://www.t10.org/lists/2status.htm
>>> [2] http://www.t10.org/lists/2sensekey.htm
>>> [3] http://www.t10.org/lists/asc-num.htm
>>> [4] http://www.linuxquestions.org/questions/centos-111/multipath-problems-4175544908/
>>>
>>
>>
>


Re: [ovirt-users] High latency on storage domains and sanlock renewal error

2017-05-08 Thread Stefano Bovina
Yes,
this configuration is the one suggested by EMC for EL7.

By the way,
"The parameters rr_min_io vs. rr_min_io_rq mean the same thing but are used
for device-mapper-multipath on differing kernel versions." and rr_min_io_rq
default value is 1, rr_min_io default value is 1000, so it should be fine.
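
If there is any doubt about the value the kernel ended up with, the loaded device-mapper table shows the path-selector arguments directly; a sketch, using one of the WWIDs from the multipath -ll output earlier in this thread as an example:

# For the round-robin selector, the number following each major:minor pair is
# the effective repeat count, i.e. what rr_min_io / rr_min_io_rq resolved to.
dmsetup table 360060160a62134002818778f949de411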


2017-05-08 9:39 GMT+02:00 Yaniv Kaul <yk...@redhat.com>:

>
> On Sun, May 7, 2017 at 1:27 PM, Stefano Bovina <bov...@gmail.com> wrote:
>
>> Sense data are 0x0/0x0/0x0
>
>
> Interesting - first time I'm seeing 0/0/0. The 1st is usually 0x2 (see
> [1]), and then the rest [2], [3] make sense.
>
> A Google search found another user with a CLARiiON with the exact same
> error [4], so I'm leaning toward a misconfiguration of multipathing/CLARiiON
> here.
>
> Is your multipathing configuration working well for you?
> Are you sure it's an EL7 configuration? For example, I believe you should
> have rr_min_io_rq and not rr_min_io.
> Y.
>
> [1] http://www.t10.org/lists/2status.htm
> [2] http://www.t10.org/lists/2sensekey.htm
> [3] http://www.t10.org/lists/asc-num.htm
> [4] http://www.linuxquestions.org/questions/centos-111/multipath-problems-4175544908/
>


Re: [ovirt-users] VM disk update failure

2017-05-08 Thread Stefano Bovina
Hi,
OK, ignore what I said regarding step 2. The volume exists.
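
For the record, a sketch of the kind of check that settles this: on block domains each disk image is an LV in the storage domain's VG, tagged with its image-group UUID. The VG and disk UUIDs below appear elsewhere in this digest and are used here purely as placeholders:

# VG name = storage domain UUID; LV tags include IU_<image-group-uuid>.
SD_VG=2c501858-bf8d-49a5-a42b-bca341b47827
DISK_ID=c5fb9190-d059-4d9b-af23-07618ff660ce
lvs -o lv_name,lv_size,lv_tags "$SD_VG" | grep "$DISK_ID"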

2017-05-07 20:27 GMT+02:00 Stefano Bovina <bov...@gmail.com>:

> Hi,
> 1)
> What do you need exactly? Shutdown VM, edit disk, "extend size by" --->
> error
>
> 2)
> The storage domain is up and running, but with the lvm commands the VG exists
> while the LV does not exist.
>
> Powering up a machine, I also saw these lines in vdsm log:
>
>
> Thread-22::DEBUG::2017-05-07 18:59:54,115::blockSD::596::St
> orage.Misc.excCmd::(getReadDelay) SUCCESS:  = '1+0 records in\n1+0
> records out\n4096 bytes (4.1 kB) copied, 0.000411491 s, 10.0 MB/s\n'; 
> = 0
> Thread-300::DEBUG::2017-05-07 
> 18:59:55,539::libvirtconnection::151::root::(wrapper)
> Unknown libvirterror: ecode: 80 edom: 20 level: 2 message: metadata not
> found: Requested metadata element is not present
> Thread-1101::DEBUG::2017-05-07 
> 18:59:55,539::libvirtconnection::151::root::(wrapper)
> Unknown libvirterror: ecode: 80 edom: 20 level: 2 message: metadata not
> found: Requested metadata element is not present
> VM Channels Listener::INFO::2017-05-07 
> 18:59:55,560::guestagent::180::vm.Vm::(_handleAPIVersion)
> vmId=`79cd29dc-bb6c-4f1f-ae38-54d6cac05489`::Guest API version changed
> from 3 to 1
> VM Channels Listener::INFO::2017-05-07 
> 18:59:55,865::guestagent::180::vm.Vm::(_handleAPIVersion)
> vmId=`d37b37e9-1dd7-4a90-8d8c-755743c714ad`::Guest API version changed
> from 2 to 1
> VM Channels Listener::INFO::2017-05-07 
> 18:59:57,414::guestagent::180::vm.Vm::(_handleAPIVersion)
> vmId=`8a034ac2-2646-4fe7-8fda-7c33affa8009`::Guest API version changed
> from 2 to 1
> VM Channels Listener::INFO::2017-05-07 
> 18:59:58,178::guestagent::180::vm.Vm::(_handleAPIVersion)
> vmId=`4d1dad66-4ada-445a-b5f6-c695220d6b19`::Guest API version changed
> from 3 to 1
> Thread-272::DEBUG::2017-05-07 
> 18:59:58,187::libvirtconnection::151::root::(wrapper)
> Unknown libvirterror: ecode: 80 edom: 20 level: 2 message: metadata not
> found: Requested metadata element is not present
> Thread-166::DEBUG::2017-05-07 
> 18:59:58,539::libvirtconnection::151::root::(wrapper)
> Unknown libvirterror: ecode: 80 edom: 20 level: 2 message: metadata not
> found: Requested metadata element is not present
> Thread-18::DEBUG::2017-05-07 19:00:33,805::persistentDict::
> 234::Storage.PersistentDict::(refresh) read lines (LvMetadataRW)=[]
> Thread-18::DEBUG::2017-05-07 19:00:33,805::persistentDict::
> 252::Storage.PersistentDict::(refresh) Empty metadata
> Thread-18::DEBUG::2017-05-07 19:00:33,805::persistentDict::
> 192::Storage.PersistentDict::(__init__) Created a persistent dict with
> VGTagMetadataRW backend
> Thread-18::DEBUG::2017-05-07 19:00:33,805::lvm::504::Storag
> e.OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation'
> got the operation mutex
> Thread-18::DEBUG::2017-05-07 19:00:33,806::lvm::506::Storag
> e.OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation'
> released the operation mutex
> Thread-18::DEBUG::2017-05-07 19:00:33,806::lvm::514::Storag
> e.OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation'
> got the operation mutex
> Thread-18::DEBUG::2017-05-07 19:00:33,806::lvm::526::Storag
> e.OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation'
> released the operation mutex
> Thread-18::DEBUG::2017-05-07 
> 19:00:33,806::lvm::371::Storage.OperationMutex::(_reloadvgs)
> Operation 'lvm reload operation' got the operation mutex
> Thread-18::DEBUG::2017-05-07 
> 19:00:33,806::lvm::291::Storage.Misc.excCmd::(cmd)
> /usr/bin/sudo -n /sbin/lvm vgs --config ' devices { preferred_names =
> ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0
> disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [
> '\''a|/dev/mapper/360060160a62134002818778f949de411|/dev/
> mapper/360060160a621340040652b7582f5e511|/dev/mapper/3600601
> 60a621340064b1034cbbfce511|/dev/mapper/360060160a6213400c4
> b39e80949de411|/dev/mapper/360060160a6213400cce46e40949de411
> |/dev/mapper/360060160a6213400e622de69949de411|/dev/mapper/3
> 60060160a6213400fa2d31acbbfce511|'\'', '\''r|.*|'\'' ] }  global {
>  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0
> }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b
> --nosuffix --separator '|' --ignoreskippedcluster -o
> uuid,name,attr,size,free,extent_size,extent_count,free_count
> ,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
> 2c501858-bf8d-49a5-a42b-bca341b47827 (cwd None)
> Thread-18::DEBUG::2017-05-07 
> 19:00:33,962::lvm::291::Storage.Misc.excCmd::(cmd)
> SUCCESS:  = '  WARNING: lvmetad is running but disabled. Restart
> lvmetad before enabling it!\n';  = 0
> Thread-18::DEBUG::2

Re: [ovirt-users] High latency on storage domains and sanlock renewal error

2017-05-08 Thread Stefano Bovina
Hi,

thanks for the advice. The upgrade is already scheduled, but I would like
to fix this issue before proceeding with a big upgrade (unless an upgrade
would fix the problem).

The problem is on all hypervisors.

We have 2 clusters (both connected to the same storage system):
 - the old one with FC
 - the new one with FCoE


With dmesg -T and looking at /var/log/messages we found several problems
like these:

1)
[Wed May  3 10:40:11 2017] sd 12:0:0:3: Parameters changed
[Wed May  3 10:40:11 2017] sd 12:0:1:3: Parameters changed
[Wed May  3 10:40:11 2017] sd 12:0:1:1: Parameters changed
[Wed May  3 10:40:12 2017] sd 13:0:0:1: Parameters changed
[Wed May  3 10:40:12 2017] sd 13:0:0:3: Parameters changed
[Wed May  3 10:40:12 2017] sd 13:0:1:3: Parameters changed
[Wed May  3 12:39:32 2017] device-mapper: multipath: Failing path 65:144.
[Wed May  3 12:39:37 2017] sd 13:0:1:2: alua: port group 01 state A
preferred supports tolUsNA

2)
[Wed May  3 17:08:17 2017] perf interrupt took too long (2590 > 2500),
lowering kernel.perf_event_max_sample_rate to 5

3)
[Wed May  3 19:16:21 2017] bnx2fc: els 0x5: tgt not ready
[Wed May  3 19:16:21 2017] bnx2fc: Relogin to the tgt

4)
sd 13:0:1:0: [sdx] FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
sd 13:0:1:0: [sdx] CDB: Read(16) 88 00 00 00 00 00 00 58 08 00 00 00 04 00
00 00
blk_update_request: I/O error, dev sdx, sector 5769216
device-mapper: multipath: Failing path 65:112.
sd 13:0:1:0: alua: port group 01 state A preferred supports tolUsNA

5)
multipathd: 360060160a6213400cce46e40949de411: sdaa - emc_clariion_checker:
Read error for WWN 60060160a6213400cce46e40949de411.  Sense data are
0x0/0x0/0x0.
multipathd: checker failed path 65:160 in map
360060160a6213400cce46e40949de411
multipathd: 360060160a6213400cce46e40949de411: remaining active paths: 3
kernel: device-mapper: multipath: Failing path 65:160.
multipathd: 360060160a6213400cce46e40949de411: sdaa - emc_clariion_checker:
Active path is healthy.
multipathd: 65:160: reinstated
multipathd: 360060160a6213400cce46e40949de411: remaining active paths: 4

6)
[Sat May  6 11:37:07 2017] megaraid_sas :02:00.0: Firmware crash dump
is not available


The multipath configuration is the following (recommended by EMC):

# RHEV REVISION 1.1
# RHEV PRIVATE

devices {
    device {
        vendor                "DGC"
        product               ".*"
        product_blacklist     "LUNZ"
        path_grouping_policy  group_by_prio
        path_selector         "round-robin 0"
        path_checker          emc_clariion
        features              "1 queue_if_no_path"
        hardware_handler      "1 alua"
        prio                  alua
        failback              immediate
        rr_weight             uniform
        no_path_retry         60
        rr_min_io             1
    }
}
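
One note on this file: as far as I recall, the "# RHEV PRIVATE" header is what tells VDSM not to overwrite /etc/multipath.conf with its own defaults, so it has to be kept when editing. After a change, a hedged sequence to make it effective and verify it (commands may differ slightly between the EL6 and EL7 hosts in this setup):

# EL7: reload the daemon; on EL6 the equivalent is "service multipathd reload"
systemctl reload multipathd
# or talk to the daemon directly:
multipathd -k"reconfigure"
# then confirm features, selector and path groups:
multipath -ll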

Regards,

Stefano


2017-05-07 8:36 GMT+02:00 Yaniv Kaul <yk...@redhat.com>:

>
>
> On Tue, May 2, 2017 at 11:09 PM, Stefano Bovina <bov...@gmail.com> wrote:
>
>> Hi, the engine logs show high latency on storage domains: "Storage domain
>>  experienced a high latency of 19.2814 seconds from .. This may
>> cause performance and functional issues."
>>
>> Looking at host logs, I found also these locking errors:
>>
>> 2017-05-02 20:52:13+0200 33883 [10098]: s1 renewal error -202
>> delta_length 10 last_success 33853
>> 2017-05-02 20:52:19+0200 33889 [10098]: 6a386652 aio collect 0
>> 0x7f1fb80008c0:0x7f1fb80008d0:0x7f1fbe9fb000 result 1048576:0 other free
>> 2017-05-02 21:08:51+0200 34880 [10098]: 6a386652 aio timeout 0
>> 0x7f1fb80008c0:0x7f1fb80008d0:0x7f1fbe4f2000 ioto 10 to_count 24
>> 2017-05-02 21:08:51+0200 34880 [10098]: s1 delta_renew read rv -202
>> offset 0 /dev/6a386652-629d-4045-835b-21d2f5c104aa/ids
>> 2017-05-02 21:08:51+0200 34880 [10098]: s1 renewal error -202
>> delta_length 10 last_success 34850
>> 2017-05-02 21:08:53+0200 34883 [10098]: 6a386652 aio collect 0
>> 0x7f1fb80008c0:0x7f1fb80008d0:0x7f1fbe4f2000 result 1048576:0 other free
>> 2017-05-02 21:30:40+0200 36189 [10098]: 6a386652 aio timeout 0
>> 0x7f1fb80008c0:0x7f1fb80008d0:0x7f1fbe9fb000 ioto 10 to_count 25
>> 2017-05-02 21:30:40+0200 36189 [10098]: s1 delta_renew read rv -202
>> offset 0 /dev/6a386652-629d-4045-835b-21d2f5c104aa/ids
>> 2017-05-02 21:30:40+0200 36189 [10098]: s1 renewal error -202
>> delta_length 10 last_success 36159
>> 2017-05-02 21:30:45+0200 36195 [10098]: 6a386652 aio collect 0
>> 0x7f1fb80008c0:0x7f1fb80008d0:0x7f1fbe9fb000 result 1048576:0 other free
>>
>> and this vdsm errors too:
>>
>> Thread-22::ERROR::2017-05-02 21:53:48,147::sdc::137::Storag
>> e.StorageDomainCache::(_findDomain) looking for unfetched domain
>> f8f21d6c-2425-45c4-aded-4cb9b53ebd96
>> Thread-22::ERROR::2

[ovirt-users] VM disk update failure

2017-05-07 Thread Stefano Bovina
Hi, while trying to update a VM disk, a failure was returned (forcing me to
add a new disk)

Any advice on how to resolve this error?

Thanks


Installation info:

ovirt-release35-006-1.noarch
libgovirt-0.3.3-1.el7_2.1.x86_64
vdsm-4.16.30-0.el7.centos.x86_64
vdsm-xmlrpc-4.16.30-0.el7.centos.noarch
vdsm-yajsonrpc-4.16.30-0.el7.centos.noarch
vdsm-jsonrpc-4.16.30-0.el7.centos.noarch
vdsm-python-zombiereaper-4.16.30-0.el7.centos.noarch
vdsm-python-4.16.30-0.el7.centos.noarch
vdsm-cli-4.16.30-0.el7.centos.noarch
qemu-kvm-ev-2.3.0-29.1.el7.x86_64
qemu-kvm-common-ev-2.3.0-29.1.el7.x86_64
qemu-kvm-tools-ev-2.3.0-29.1.el7.x86_64
libvirt-client-1.2.17-13.el7_2.3.x86_64
libvirt-daemon-driver-storage-1.2.17-13.el7_2.3.x86_64
libvirt-python-1.2.17-2.el7.x86_64
libvirt-daemon-driver-nwfilter-1.2.17-13.el7_2.3.x86_64
libvirt-daemon-driver-nodedev-1.2.17-13.el7_2.3.x86_64
libvirt-lock-sanlock-1.2.17-13.el7_2.3.x86_64
libvirt-glib-0.1.9-1.el7.x86_64
libvirt-daemon-driver-network-1.2.17-13.el7_2.3.x86_64
libvirt-daemon-driver-lxc-1.2.17-13.el7_2.3.x86_64
libvirt-daemon-driver-interface-1.2.17-13.el7_2.3.x86_64
libvirt-1.2.17-13.el7_2.3.x86_64
libvirt-daemon-1.2.17-13.el7_2.3.x86_64
libvirt-daemon-config-network-1.2.17-13.el7_2.3.x86_64
libvirt-daemon-driver-secret-1.2.17-13.el7_2.3.x86_64
libvirt-daemon-config-nwfilter-1.2.17-13.el7_2.3.x86_64
libvirt-daemon-kvm-1.2.17-13.el7_2.3.x86_64
libvirt-daemon-driver-qemu-1.2.17-13.el7_2.3.x86_64


engine.log:

2017-05-02 09:48:26,505 INFO  [org.ovirt.engine.core.bll.UpdateVmDiskCommand]
(ajp--127.0.0.1-8702-6) [c3d7125] Lock Acquired to object EngineLock
[exclusiveLocks= key: 25c0bcc0-0d3d-4ddc-b103-24ed2ac5aa05 value:
VM_DISK_BOOT
key: c5fb9190-d059-4d9b-af23-07618ff660ce value: DISK
, sharedLocks= key: 25c0bcc0-0d3d-4ddc-b103-24ed2ac5aa05 value: VM
]
2017-05-02 09:48:26,515 INFO  [org.ovirt.engine.core.bll.UpdateVmDiskCommand]
(ajp--127.0.0.1-8702-6) [c3d7125] Running command: UpdateVmDiskCommand
internal: false. Entities affected :  ID: c5fb9190-d059-4d9b-af23-07618ff660ce
Type: DiskAction group EDIT_DISK_PROPERTIES with role type USER
2017-05-02 09:48:26,562 INFO
[org.ovirt.engine.core.bll.ExtendImageSizeCommand]
(ajp--127.0.0.1-8702-6) [ae718d8] Running command: ExtendImageSizeCommand
internal: true. Entities affected :  ID: c5fb9190-d059-4d9b-af23-07618ff660ce
Type: DiskAction group EDIT_DISK_PROPERTIES with role type USER
2017-05-02 09:48:26,565 INFO
[org.ovirt.engine.core.vdsbroker.irsbroker.ExtendImageSizeVDSCommand]
(ajp--127.0.0.1-8702-6) [ae718d8] START, ExtendImageSizeVDSCommand(
storagePoolId = 715d1ba2-eabe-48db-9aea-c28c30359808, ignoreFailoverLimit =
false), log id: 52aac743
2017-05-02 09:48:26,604 INFO
[org.ovirt.engine.core.vdsbroker.irsbroker.ExtendImageSizeVDSCommand]
(ajp--127.0.0.1-8702-6) [ae718d8] FINISH, ExtendImageSizeVDSCommand, log
id: 52aac743
2017-05-02 09:48:26,650 INFO
[org.ovirt.engine.core.bll.tasks.CommandAsyncTask]
(ajp--127.0.0.1-8702-6) [ae718d8] CommandAsyncTask::Adding
CommandMultiAsyncTasks object for command cb7958d9-6eae-44a9-891a-
7fe088a79df8
2017-05-02 09:48:26,651 INFO
[org.ovirt.engine.core.bll.CommandMultiAsyncTasks]
(ajp--127.0.0.1-8702-6) [ae718d8] CommandMultiAsyncTasks::AttachTask:
Attaching task 769a4b18-182b-4048-bb34-a276a55ccbff to command
cb7958d9-6eae-44a9-891a-7fe088a79df8.
2017-05-02 09:48:26,661 INFO
[org.ovirt.engine.core.bll.tasks.AsyncTaskManager]
(ajp--127.0.0.1-8702-6) [ae718d8] Adding task
769a4b18-182b-4048-bb34-a276a55ccbff
(Parent Command UpdateVmDisk, Parameters Type org.ovirt.engine.core.common.
asynctasks.AsyncTaskParameters), polling hasn't started yet..
2017-05-02 09:48:26,673 INFO  [org.ovirt.engine.core.dal.
dbbroker.auditloghandling.AuditLogDirector] (ajp--127.0.0.1-8702-6)
[ae718d8] Correlation ID: c3d7125, Call Stack: null, Custom Event ID: -1,
Message: VM sysinfo-73 sysinfo-73_Disk3 disk was updated by admin@internal.
2017-05-02 09:48:26,674 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask]
(ajp--127.0.0.1-8702-6) [ae718d8] BaseAsyncTask::startPollingTask: Starting
to poll task 769a4b18-182b-4048-bb34-a276a55ccbff.
2017-05-02 09:48:28,430 INFO
[org.ovirt.engine.core.bll.tasks.AsyncTaskManager]
(DefaultQuartzScheduler_Worker-48) [36cd2f7] Polling and updating Async
Tasks: 1 tasks, 1 tasks to poll now
2017-05-02 09:48:28,435 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.
HSMGetAllTasksStatusesVDSCommand] (DefaultQuartzScheduler_Worker-48)
[36cd2f7] Failed in HSMGetAllTasksStatusesVDS method
2017-05-02 09:48:28,436 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask]
(DefaultQuartzScheduler_Worker-48) [36cd2f7] SPMAsyncTask::PollTask:
Polling task 769a4b18-182b-4048-bb34-a276a55ccbff (Parent Command
UpdateVmDisk, Parameters Type
org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters)
returned status finished, result 'cleanSuccess'.
2017-05-02 09:48:28,446 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask]
(DefaultQuartzScheduler_Worker-48) [36cd2f7] 
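
When a task comes back with 'cleanSuccess' like this, the vdsm side usually has more detail than the engine log; a sketch using the vdsm-cli package listed in the installation info, to be run on the SPM host (the task UUID is the one from the engine log above):

# List all tasks known to vdsm and their statuses:
vdsClient -s 0 getAllTasksInfo
vdsClient -s 0 getAllTasksStatuses

# Status of the specific extend task:
vdsClient -s 0 getTaskStatus 769a4b18-182b-4048-bb34-a276a55ccbff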

[ovirt-users] High latency on storage domains and sanlock renewal error

2017-05-07 Thread Stefano Bovina
Hi, the engine logs show high latency on storage domains: "Storage domain
 experienced a high latency of 19.2814 seconds from .. This may
cause performance and functional issues."

Looking at the host logs, I also found these locking errors:

2017-05-02 20:52:13+0200 33883 [10098]: s1 renewal error -202 delta_length
10 last_success 33853
2017-05-02 20:52:19+0200 33889 [10098]: 6a386652 aio collect 0
0x7f1fb80008c0:0x7f1fb80008d0:0x7f1fbe9fb000 result 1048576:0 other free
2017-05-02 21:08:51+0200 34880 [10098]: 6a386652 aio timeout 0
0x7f1fb80008c0:0x7f1fb80008d0:0x7f1fbe4f2000 ioto 10 to_count 24
2017-05-02 21:08:51+0200 34880 [10098]: s1 delta_renew read rv -202 offset
0 /dev/6a386652-629d-4045-835b-21d2f5c104aa/ids
2017-05-02 21:08:51+0200 34880 [10098]: s1 renewal error -202 delta_length
10 last_success 34850
2017-05-02 21:08:53+0200 34883 [10098]: 6a386652 aio collect 0
0x7f1fb80008c0:0x7f1fb80008d0:0x7f1fbe4f2000 result 1048576:0 other free
2017-05-02 21:30:40+0200 36189 [10098]: 6a386652 aio timeout 0
0x7f1fb80008c0:0x7f1fb80008d0:0x7f1fbe9fb000 ioto 10 to_count 25
2017-05-02 21:30:40+0200 36189 [10098]: s1 delta_renew read rv -202 offset
0 /dev/6a386652-629d-4045-835b-21d2f5c104aa/ids
2017-05-02 21:30:40+0200 36189 [10098]: s1 renewal error -202 delta_length
10 last_success 36159
2017-05-02 21:30:45+0200 36195 [10098]: 6a386652 aio collect 0
0x7f1fb80008c0:0x7f1fb80008d0:0x7f1fbe9fb000 result 1048576:0 other free

and these vdsm errors too:

Thread-22::ERROR::2017-05-02 21:53:48,147::sdc::137::
Storage.StorageDomainCache::(_findDomain) looking for unfetched domain
f8f21d6c-2425-45c4-aded-4cb9b53ebd96
Thread-22::ERROR::2017-05-02 21:53:48,148::sdc::154::
Storage.StorageDomainCache::(_findUnfetchedDomain) looking for domain
f8f21d6c-2425-45c4-aded-4cb9b53ebd96

The engine instead is showing these errors:

2017-05-02 21:40:38,089 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand]
(DefaultQuartzScheduler_Worker-96) Command SpmStatusVDSCommand(HostName = <
myhost.example.com>, HostId = dcc0275a-b011-4e33-bb95-366ffb0697b3,
storagePoolId = 715d1ba2-eabe-48db-9aea-c28c30359808) execution failed.
Exception: VDSErrorException: VDSGenericException: VDSErrorException:
Failed to SpmStatusVDS, error = (-202, 'Sanlock resource read failure',
'Sanlock exception'), code = 100
2017-05-02 21:41:08,431 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand]
(DefaultQuartzScheduler_Worker-53) [6e0d5ebf] Failed in SpmStatusVDS method
2017-05-02 21:41:08,443 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand]
(DefaultQuartzScheduler_Worker-53) [6e0d5ebf] Command
SpmStatusVDSCommand(HostName = , HostId =
7991933e-5f30-48cd-88bf-b0b525613384, storagePoolId =
4bd73239-22d0-4c44-ab8c-17adcd580309) execution failed. Exception:
VDSErrorException: VDSGenericException: VDSErrorException: Failed to
SpmStatusVDS, error = (-202, 'Sanlock resource read failure', 'Sanlock
exception'), code = 100
2017-05-02 21:41:31,975 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand]
(DefaultQuartzScheduler_Worker-61) [2a54a1b2] Failed in SpmStatusVDS method
2017-05-02 21:41:31,987 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand]
(DefaultQuartzScheduler_Worker-61) [2a54a1b2] Command
SpmStatusVDSCommand(HostName = , HostId =
dcc0275a-b011-4e33-bb95-366ffb0697b3, storagePoolId =
715d1ba2-eabe-48db-9aea-c28c30359808) execution failed. Exception:
VDSErrorException: VDSGenericException: VDSErrorException: Failed to
SpmStatusVDS, error = (-202, 'Sanlock resource read failure', 'Sanlock
exception'), code = 100
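
Since the renewal errors above are sanlock timing out on reads of the domain's 'ids' volume, a quick sketch to exercise the same I/O path by hand and to see sanlock's own view of the lockspaces (the UUID is taken from the sanlock log above and is only an example):

# sanlock's view of its lockspaces and renewal state:
sanlock client status

# Time a small direct read from the same 'ids' LV that sanlock renews against;
# with the latencies reported here this can take seconds instead of milliseconds.
time dd if=/dev/6a386652-629d-4045-835b-21d2f5c104aa/ids \
        of=/dev/null bs=1M count=1 iflag=direct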


I'm using Fibre Channel or FCoE connectivity; storage array technical
support has analyzed it (also switch and OS configurations), but nothing
has been found.

Any advice?

Thanks


Installation info:

ovirt-release35-006-1.noarch
libgovirt-0.3.3-1.el7_2.1.x86_64
vdsm-4.16.30-0.el7.centos.x86_64
vdsm-xmlrpc-4.16.30-0.el7.centos.noarch
vdsm-yajsonrpc-4.16.30-0.el7.centos.noarch
vdsm-jsonrpc-4.16.30-0.el7.centos.noarch
vdsm-python-zombiereaper-4.16.30-0.el7.centos.noarch
vdsm-python-4.16.30-0.el7.centos.noarch
vdsm-cli-4.16.30-0.el7.centos.noarch
qemu-kvm-ev-2.3.0-29.1.el7.x86_64
qemu-kvm-common-ev-2.3.0-29.1.el7.x86_64
qemu-kvm-tools-ev-2.3.0-29.1.el7.x86_64
libvirt-client-1.2.17-13.el7_2.3.x86_64
libvirt-daemon-driver-storage-1.2.17-13.el7_2.3.x86_64
libvirt-python-1.2.17-2.el7.x86_64
libvirt-daemon-driver-nwfilter-1.2.17-13.el7_2.3.x86_64
libvirt-daemon-driver-nodedev-1.2.17-13.el7_2.3.x86_64
libvirt-lock-sanlock-1.2.17-13.el7_2.3.x86_64
libvirt-glib-0.1.9-1.el7.x86_64
libvirt-daemon-driver-network-1.2.17-13.el7_2.3.x86_64
libvirt-daemon-driver-lxc-1.2.17-13.el7_2.3.x86_64
libvirt-daemon-driver-interface-1.2.17-13.el7_2.3.x86_64
libvirt-1.2.17-13.el7_2.3.x86_64
libvirt-daemon-1.2.17-13.el7_2.3.x86_64
libvirt-daemon-config-network-1.2.17-13.el7_2.3.x86_64
libvirt-daemon-driver-secret-1.2.17-13.el7_2.3.x86_64
libvirt-daemon-config-nwfilter-1.2.17-13.el7_2.3.x86_64