Re: [ovirt-users] vms in paused state

2016-04-29 Thread Nir Soffer
/run/vdsm/*.recovery

On Fri, Apr 29, 2016 at 10:59 PM, Bill James  wrote:

> where do I find the recovery files?
>
> [root@ovirt1 test vdsm]# pwd
> /var/lib/vdsm
> [root@ovirt1 test vdsm]# ls -la
> total 16
> drwxr-xr-x   6 vdsm kvm100 Mar 17 16:33 .
> drwxr-xr-x. 45 root root  4096 Apr 29 12:01 ..
> -rw-r--r--   1 vdsm kvm  10170 Jan 19 05:04 bonding-defaults.json
> drwxr-xr-x   2 vdsm root 6 Apr 19 11:34 netconfback
> drwxr-xr-x   3 vdsm kvm 54 Apr 19 11:35 persistence
> drwxr-x---.  2 vdsm kvm  6 Mar 17 16:33 transient
> drwxr-xr-x   2 vdsm kvm 40 Mar 17 16:33 upgrade
> [root@ovirt1 test vdsm]# locate recovery
> /opt/hp/hpdiags/en/tcstorage.ldinterimrecovery.htm
> /opt/hp/hpdiags/en/tcstorage.ldrecoveryready.htm
> /usr/share/doc/postgresql-9.2.15/html/archive-recovery-settings.html
> /usr/share/doc/postgresql-9.2.15/html/recovery-config.html
> /usr/share/doc/postgresql-9.2.15/html/recovery-target-settings.html
> /usr/share/pgsql/recovery.conf.sample
> /var/lib/nfs/v4recovery
>
>
> [root@ovirt1 test vdsm]# locate 757a5  (disk id)
>
> /ovirt-store/nfs1/7e566f55-e060-47b7-bfa4-ac3c48d70dda/images/757a5e69-a791-4391-9d7d-9516bf7f2118
>
> /ovirt-store/nfs1/7e566f55-e060-47b7-bfa4-ac3c48d70dda/images/757a5e69-a791-4391-9d7d-9516bf7f2118/211581dc-fa98-41be-a0b9-ace236149bc2
>
> /ovirt-store/nfs1/7e566f55-e060-47b7-bfa4-ac3c48d70dda/images/757a5e69-a791-4391-9d7d-9516bf7f2118/211581dc-fa98-41be-a0b9-ace236149bc2.lease
>
> /ovirt-store/nfs1/7e566f55-e060-47b7-bfa4-ac3c48d70dda/images/757a5e69-a791-4391-9d7d-9516bf7f2118/211581dc-fa98-41be-a0b9-ace236149bc2.meta
> [root@ovirt1 test vdsm]# locate 5bfb140 (vm id)
>
> /var/lib/libvirt/qemu/channels/5bfb140a-a971-4c9c-82c6-277929eb45d4.com.redhat.rhevm.vdsm
>
> /var/lib/libvirt/qemu/channels/5bfb140a-a971-4c9c-82c6-277929eb45d4.org.qemu.guest_agent.0
>
>
>
>
> On 4/29/16 10:02 AM, Michal Skrivanek wrote:
>
>
>
> On 29 Apr 2016, at 18:26, Bill James < 
> bill.ja...@j2.com> wrote:
>
> yes they are still saying "paused" state.
> No, bouncing libvirt didn't help.
>
>
> Then my suspicion of vm recovery gets closer to a certainty:)
> Can you get one of the paused vm's .recovery file from /var/lib/vdsm and
> check it says Paused there? It's worth a shot to try to remove that file
> and restart vdsm, then check logs and that vm status...it should recover
> "good enough" from libvirt only.
> Try it with one first
>
> I noticed the errors about the ISO domain. Didn't think that was related.
> I have been migrating a lot of VMs to ovirt lately, and recently added
> another node.
> Also had some problems with /etc/exports for a while, but I think those
> issues are all resolved.
>
>
> Last "unresponsive" message in vdsm.log was:
>
> vdsm.log.49.xz:jsonrpc.Executor/0::WARNING::*2016-04-21*
> 11:00:54,703::vm::5067::virt.vm::(_setUnresponsiveIfTimeout)
> vmId=`b6a13808-9552-401b-840b-4f7022e8293d`::monitor become unresponsive
> (command timeout, age=310323.97)
> vdsm.log.49.xz:jsonrpc.Executor/0::WARNING::2016-04-21
> 11:00:54,703::vm::5067::virt.vm::(_setUnresponsiveIfTimeout)
> vmId=`5bfb140a-a971-4c9c-82c6-277929eb45d4`::monitor become unresponsive
> (command timeout, age=310323.97)
>
>
>
> Thanks.
>
>
>
> On 4/29/16 1:40 AM, Michal Skrivanek wrote:
>
>
> On 28 Apr 2016, at 19:40, Bill James  wrote:
>
> thank you for response.
> I bold-ed the ones that are listed as "paused".
>
>
> [root@ovirt1 test vdsm]# virsh -r list --all
>  Id    Name                           State
> 
>
>
>
>
> Looks like problem started around 2016-04-17 20:19:34,822, based on
> engine.log attached.
>
>
> yes, that time looks correct. Any idea what might have been a trigger?
> Anything interesting happened at that time (power outage of some host, some
> maintenance action, anything)?
> logs indicate a problem when vdsm talks to libvirt(all those "monitor
> become unresponsive")
>
> It does seem that at that time you started to have some storage
> connectivity issues - first one at 2016-04-17 20:06:53,929. And it
> doesn’t look temporary because such errors are still there couple hours
> later(in your most recent file you attached I can see at 23:00:54)
> When I/O gets blocked the VMs may experience issues (then VM gets Paused),
> or their qemu process gets stuck(resulting in libvirt either reporting
> error or getting stuck as well -> resulting in what vdsm sees as “monitor
> unresponsive”)
>
> Since you now bounced libvirtd - did it help? Do you still see wrong
> status for those VMs and still those "monitor unresponsive" errors in
> vdsm.log?
> If not…then I would suspect the “vm recovery” code not working
> correctly. Milan is looking at that.
>
> Thanks,
> michal
>
>
> There's a lot of vdsm logs!
>
> fyi, the storage domain for these Vms is a "local" nfs share,
> 7e566f55-e060-47b7-bfa4-ac3c48d70dda.
>
> attached more logs.
>
>

Re: [ovirt-users] VMs becoming non-responsive sporadically

2016-04-29 Thread Nir Soffer
On Fri, Apr 29, 2016 at 9:17 PM,   wrote:
> Hi,
>
> We're running oVirt 3.6.5.3-1 and lately we're experiencing some issues with
> some VMs being paused because they're marked as non-responsive. Mostly,
> after a few seconds they recover, but we want to debug precisely this
> problem so we can fix it consistently.
>
> Our scenario is the following:
>
> ~495 VMs, of which ~120 are constantly up
> 3 datastores, all of them iSCSI-based:
>   * ds1: 2T, currently has 276 disks
>   * ds2: 2T, currently has 179 disks
>   * ds3: 500G, currently has 65 disks
> 7 hosts: all have mostly the same hardware. CPU and memory utilization is
> currently very low (< 10%).
>
>   ds1 and ds2 are physically the same backend which exports two 2TB volumes.
> ds3 is a different storage backend where we're currently migrating some
> disks from ds1 and ds2.

What is the storage backend behind ds1 and ds2?

>
> Usually, when VMs become unresponsive, the whole host where they run gets
> unresponsive too, so that gives a hint about the problem; my bet is the
> culprit is somewhere on the host side and not on the VMs' side.

Probably the vm became unresponsive because connection to the host was lost.

> When that
> happens, the host itself gets non-responsive and only recoverable after
> reboot, since it's unable to reconnect.

Piotr, can you check engine log and explain why host is not reconnected?

> I must say this is not specific to
> this oVirt version; when we were using v3.6.4 the same happened, and it's
> also worth mentioning we've not done any configuration changes and
> everything had been working quite well for a long time.
>
> We were monitoring our ds1 and ds2 physical backend to see performance and
> we suspect we've run out of IOPS since we're reaching the maximum specified
> by the manufacturer, probably at certain times the host cannot perform a
> storage operation within some time limit and it marks VMs as unresponsive.
> That's why we've set up ds3 and we're migrating ds1 and ds2 to ds3. When we
> run out of space on ds3 we'll create more smaller volumes to keep migrating.
>
> On the host side, when this happens, we've run repoplot on the vdsm log and
> I'm attaching the result. Clearly there's a *huge* LVM response time (~30
> secs.).

Indeed, the log shows very slow vgck and vgs commands - these are called every
5 minutes to check the VG health and refresh vdsm's LVM cache.

1. starting vgck

Thread-96::DEBUG::2016-04-29
13:17:48,682::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset
--cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgck --config ' devices
{ pre
ferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1
write_cache_state=0 disable_after_error_count=3 filter = [
'\''a|/dev/mapper/36000eb3a4f1acbc20043|'\
'', '\''r|.*|'\'' ] }  global {  locking_type=1
prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {
retain_min = 50  retain_days = 0 } ' 5de4a000-a9c4-48
9c-8eee-10368647c413 (cwd None)

2. vgck ends after 55 seconds

Thread-96::DEBUG::2016-04-29
13:18:43,173::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '
WARNING: lvmetad is running but disabled. Restart lvmetad before
enabling it!\n'; <rc> = 0

3. starting vgs

Thread-96::DEBUG::2016-04-29
13:17:11,963::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset
--cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices
{ pref
erred_names = ["^/dev/mapper/"] ignore_suspended_devices=1
write_cache_state=0 disable_after_error_count=3 filter = [
'\''a|/dev/mapper/36000eb3a4f1acbc20043|/de
v/mapper/36000eb3a4f1acbc200b9|/dev/mapper/360014056f0dc8930d744f83af8ddc709|/dev/mapper/WDC_WD5003ABYZ-011FA0_WD-WMAYP0J73DU6|'\'',
'\''r|.*|'\'' ] }  global {
 locking_type=1  prioritise_write_locks=1  wait_for_locks=1
use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } '
--noheadings --units b --nosuffix --separator '|
' --ignoreskippedcluster -o
uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
5de4a000-a9c4-489c-8eee-10368
647c413 (cwd None)

4. vgs finished after 37 seconds

Thread-96::DEBUG::2016-04-29
13:17:48,680::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '
WARNING: lvmetad is running but disabled. Restart lvmetad before
enabling it!\n'; <rc> = 0

Zdenek, how do you suggest to debug this slow lvm commands?

Can you run the following commands on a host in trouble, and on some other
hosts in the same timeframe?

time vgck - --config ' devices { filter =
['\''a|/dev/mapper/36000eb3a4f1acbc20043|'\'',
'\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1
wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50
retain_days = 0 } ' 5de4a000-a9c4-489c-8eee-10368647c413

time vgs - --config ' global { locking_type=1
prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {
retain_min = 50  retain_days = 0 } '
5de4a000-a9c4-489c-8eee-10368647c413
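
A simplified variant of the two timing commands above, for wall-clock comparison
only (a sketch: it drops vdsm's device filter, so it scans all visible devices
rather than exactly the set vdsm uses; the full --config from the vdsm.log
excerpts remains the authoritative form):

  VG=5de4a000-a9c4-489c-8eee-10368647c413
  time sudo vgck --config 'global { locking_type=1 use_lvmetad=0 }' "$VG"
  time sudo vgs  --config 'global { locking_type=1 use_lvmetad=0 }' -o vg_name,vg_size,vg_free "$VG"

Comparing the times between a misbehaving host and a healthy one in the same
window should show whether the slowness is local to the host (SCSI/multipath
layer) or coming from the storage side.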

Note that I 

Re: [ovirt-users] vms in paused state

2016-04-29 Thread Bill James

where do I find the recovery files?

[root@ovirt1 test vdsm]# pwd
/var/lib/vdsm
[root@ovirt1 test vdsm]# ls -la
total 16
drwxr-xr-x   6 vdsm kvm100 Mar 17 16:33 .
drwxr-xr-x. 45 root root  4096 Apr 29 12:01 ..
-rw-r--r--   1 vdsm kvm  10170 Jan 19 05:04 bonding-defaults.json
drwxr-xr-x   2 vdsm root 6 Apr 19 11:34 netconfback
drwxr-xr-x   3 vdsm kvm 54 Apr 19 11:35 persistence
drwxr-x---.  2 vdsm kvm  6 Mar 17 16:33 transient
drwxr-xr-x   2 vdsm kvm 40 Mar 17 16:33 upgrade
[root@ovirt1 test vdsm]# locate recovery
/opt/hp/hpdiags/en/tcstorage.ldinterimrecovery.htm
/opt/hp/hpdiags/en/tcstorage.ldrecoveryready.htm
/usr/share/doc/postgresql-9.2.15/html/archive-recovery-settings.html
/usr/share/doc/postgresql-9.2.15/html/recovery-config.html
/usr/share/doc/postgresql-9.2.15/html/recovery-target-settings.html
/usr/share/pgsql/recovery.conf.sample
/var/lib/nfs/v4recovery


[root@ovirt1 test vdsm]# locate 757a5  (disk id)
/ovirt-store/nfs1/7e566f55-e060-47b7-bfa4-ac3c48d70dda/images/757a5e69-a791-4391-9d7d-9516bf7f2118
/ovirt-store/nfs1/7e566f55-e060-47b7-bfa4-ac3c48d70dda/images/757a5e69-a791-4391-9d7d-9516bf7f2118/211581dc-fa98-41be-a0b9-ace236149bc2
/ovirt-store/nfs1/7e566f55-e060-47b7-bfa4-ac3c48d70dda/images/757a5e69-a791-4391-9d7d-9516bf7f2118/211581dc-fa98-41be-a0b9-ace236149bc2.lease
/ovirt-store/nfs1/7e566f55-e060-47b7-bfa4-ac3c48d70dda/images/757a5e69-a791-4391-9d7d-9516bf7f2118/211581dc-fa98-41be-a0b9-ace236149bc2.meta
[root@ovirt1 test vdsm]# locate 5bfb140 (vm id)
/var/lib/libvirt/qemu/channels/5bfb140a-a971-4c9c-82c6-277929eb45d4.com.redhat.rhevm.vdsm
/var/lib/libvirt/qemu/channels/5bfb140a-a971-4c9c-82c6-277929eb45d4.org.qemu.guest_agent.0



On 4/29/16 10:02 AM, Michal Skrivanek wrote:



On 29 Apr 2016, at 18:26, Bill James > wrote:



yes they are still saying "paused" state.
No, bouncing libvirt didn't help.


Then my suspicion of vm recovery gets closer to a certainty:)
Can you get one of the paused vm's .recovery file from /var/lib/vdsm 
and check it says Paused there? It's worth a shot to try to remove 
that file and restart vdsm, then check logs and that vm status...it 
should recover "good enough" from libvirt only.

Try it with one first


I noticed the errors about the ISO domain. Didn't think that was related.
I have been migrating a lot of VMs to ovirt lately, and recently 
added another node.
Also had some problems with /etc/exports for a while, but I think 
those issues are all resolved.



Last "unresponsive" message in vdsm.log was:

vdsm.log.49.xz:jsonrpc.Executor/0::WARNING::*2016-04-21* 
11:00:54,703::vm::5067::virt.vm::(_setUnresponsiveIfTimeout) 
vmId=`b6a13808-9552-401b-840b-4f7022e8293d`::monitor become 
unresponsive (command timeout, age=310323.97)
vdsm.log.49.xz:jsonrpc.Executor/0::WARNING::2016-04-21 
11:00:54,703::vm::5067::virt.vm::(_setUnresponsiveIfTimeout) 
vmId=`5bfb140a-a971-4c9c-82c6-277929eb45d4`::monitor become 
unresponsive (command timeout, age=310323.97)




Thanks.



On 4/29/16 1:40 AM, Michal Skrivanek wrote:



On 28 Apr 2016, at 19:40, Bill James  wrote:

thank you for response.
I bold-ed the ones that are listed as "paused".


[root@ovirt1 test vdsm]# virsh -r list --all
 Id   Name                          State






Looks like problem started around 2016-04-17 20:19:34,822, based on 
engine.log attached.


yes, that time looks correct. Any idea what might have been a 
trigger? Anything interesting happened at that time (power outage of 
some host, some maintenance action, anything)?Â
logs indicate a problem when vdsm talks to libvirt(all those 
"monitor become unresponsive”)


It does seem that at that time you started to have some storage 
connectivity issues - first one at 2016-04-17 20:06:53,929. And it 
doesn’t look temporary because such errors are still there couple 
hours later(in your most recent file you attached I can see at 23:00:54)
When I/O gets blocked the VMs may experience issues (then VM gets 
Paused), or their qemu process gets stuck(resulting in libvirt 
either reporting error or getting stuck as well -> resulting in what 
vdsm sees as “monitor unresponsive”)


Since you now bounced libvirtd - did it help? Do you still see wrong 
status for those VMs and still those "monitor unresponsive" errors 
in vdsm.log?
If not…then I would suspect the “vm recovery” code not working 
correctly. Milan is looking at that.


Thanks,
michal



There's a lot of vdsm logs!

fyi, the storage domain for these Vms is a "local" nfs share, 
7e566f55-e060-47b7-bfa4-ac3c48d70dda.


attached more logs.


On 04/28/2016 12:53 AM, Michal Skrivanek wrote:

On 27 Apr 2016, at 19:16, Bill James  wrote:

virsh # list --all
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such 

[ovirt-users] CINLUG: Virtualization Management, The oVirt Way

2016-04-29 Thread Brian Proffitt
The world of virtualization seems to be getting passed by with all of the
advances in containers and container management technology. But don't count
virtual machines out just yet. Large-scale, centralized management for
server and desktop virtual machines is available now, with the free and
open source software platform oVirt. This KVM-based management tool
provides production-ready VM management to organizations large and small,
and is used by universities, businesses, and even major airports. Join Red
Hat's Brian Proffitt on a tour of oVirt plus a fun look at how VM
management and cloud computing *do* work together.


http://www.meetup.com/CINLUG/events/230746101/


-- 
Brian Proffitt
Principal Community Analyst
Open Source and Standards
@TheTechScribe
574.383.9BKP
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] vms in paused state

2016-04-29 Thread Michal Skrivanek


> On 29 Apr 2016, at 18:26, Bill James  wrote:
> 
> yes they are still saying "paused" state.
> No, bouncing libvirt didn't help.

Then my suspicion of vm recovery gets closer to a certainty:)
Can you get one of the paused vm's .recovery file from /var/lib/vdsm and check 
it says Paused there? It's worth a shot to try to remove that file and restart 
vdsm, then check logs and that vm status...it should recover "good enough" from 
libvirt only. 
Try it with one first
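
For reference, a minimal shell sketch of that procedure, assuming the file is
named <vm-id>.recovery (the exact name is not spelled out here) and lives under
/var/lib/vdsm or, per Nir's note earlier in this digest, /run/vdsm:

  VMID=5bfb140a-a971-4c9c-82c6-277929eb45d4    # one of the paused VMs from this thread
  ls /run/vdsm/*.recovery /var/lib/vdsm/*.recovery 2>/dev/null | grep "$VMID"
  grep -a -i paused /run/vdsm/${VMID}.recovery          # confirm it records a Paused state
  mv /run/vdsm/${VMID}.recovery /root/${VMID}.recovery.bak
  systemctl restart vdsmd
  tail -f /var/log/vdsm/vdsm.log | grep -i recovery     # watch the VM be re-read from libvirt

Do it on a single VM first, as suggested above, and keep the backup so the file
can be put back if the reported state gets worse instead of better.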

> I noticed the errors about the ISO domain. Didn't think that was related.
> I have been migrating a lot of VMs to ovirt lately, and recently added 
> another node.
> Also had some problems with /etc/exports for a while, but I think those 
> issues are all resolved.
> 
> 
> Last "unresponsive" message in vdsm.log was:
> 
> vdsm.log.49.xz:jsonrpc.Executor/0::WARNING::2016-04-21 
> 11:00:54,703::vm::5067::virt.vm::(_setUnresponsiveIfTimeout) 
> vmId=`b6a13808-9552-401b-840b-4f7022e8293d`::monitor become unresponsive 
> (command timeout, age=310323.97)
> vdsm.log.49.xz:jsonrpc.Executor/0::WARNING::2016-04-21 
> 11:00:54,703::vm::5067::virt.vm::(_setUnresponsiveIfTimeout) 
> vmId=`5bfb140a-a971-4c9c-82c6-277929eb45d4`::monitor become unresponsive 
> (command timeout, age=310323.97)
> 
> 
> 
> Thanks.
> 
> 
> 
>> On 4/29/16 1:40 AM, Michal Skrivanek wrote:
>> 
>>> On 28 Apr 2016, at 19:40, Bill James  wrote:
>>> 
>>> thank you for response.
>>> I bold-ed the ones that are listed as "paused".
>>> 
>>> 
>>> [root@ovirt1 test vdsm]# virsh -r list --all
>>>  Id    Name                           State
>>> 
>> 
>> 
>>> 
>>> 
>>> Looks like problem started around 2016-04-17 20:19:34,822, based on 
>>> engine.log attached.
>> 
>> yes, that time looks correct. Any idea what might have been a trigger? 
>> Anything interesting happened at that time (power outage of some host, some 
>> maintenance action, anything)? 
>> logs indicate a problem when vdsm talks to libvirt(all those "monitor become 
>> unresponsive”)
>> 
>> It does seem that at that time you started to have some storage connectivity 
>> issues - first one at 2016-04-17 20:06:53,929. And it doesn’t look 
>> temporary because such errors are still there couple hours later(in your 
>> most recent file you attached I can see at 23:00:54)
>> When I/O gets blocked the VMs may experience issues (then VM gets Paused), 
>> or their qemu process gets stuck(resulting in libvirt either reporting error 
>> or getting stuck as well -> resulting in what vdsm sees as “monitor 
>> unresponsive”)
>> 
>> Since you now bounced libvirtd - did it help? Do you still see wrong status 
>> for those VMs and still those "monitor unresponsive" errors in vdsm.log?
>> If not…then I would suspect the “vm recovery” code not working 
>> correctly. Milan is looking at that.
>> 
>> Thanks,
>> michal
>> 
>> 
>>> There's a lot of vdsm logs!
>>> 
>>> fyi, the storage domain for these Vms is a "local" nfs share, 
>>> 7e566f55-e060-47b7-bfa4-ac3c48d70dda.
>>> 
>>> attached more logs.
>>> 
>>> 
 On 04/28/2016 12:53 AM, Michal Skrivanek wrote:
>> On 27 Apr 2016, at 19:16, Bill James  wrote:
>> 
>> virsh # list --all
>> error: failed to connect to the hypervisor
>> error: no valid connection
>> error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No 
>> such file or directory
>> 
> you need to run virsh in read-only mode
> virsh -r list --all
> 
> [root@ovirt1 test vdsm]# systemctl status libvirtd
> ● libvirtd.service - Virtualization daemon
>   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; 
> vendor preset: enabled)
>  Drop-In: /etc/systemd/system/libvirtd.service.d
>   └─unlimited-core.conf
>   Active: active (running) since Thu 2016-04-21 16:00:03 PDT; 5 days ago
> 
> 
> tried systemctl restart libvirtd.
> No change.
> 
> Attached vdsm.log and supervdsm.log.
> 
> 
> [root@ovirt1 test vdsm]# systemctl status vdsmd
> ● vdsmd.service - Virtual Desktop Server Manager
>   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor 
> preset: enabled)
>   Active: active (running) since Wed 2016-04-27 10:09:14 PDT; 3min 46s ago
> 
> 
> vdsm-4.17.18-0.el7.centos.noarch
 the vdsm.log attach is good, but it’s too short interval, it only shows 
 recovery(vdsm restart) phase when the VMs are identified as paused….can 
 you add earlier logs? Did you restart vdsm yourself or did it crash?
 
 
> libvirt-daemon-1.2.17-13.el7_2.4.x86_64
> 
> 
> Thanks.
> 
> 
> On 04/26/2016 11:35 PM, Michal Skrivanek wrote:
 On 27 Apr 2016, at 02:04, Nir Soffer  wrote:
 
 On Wed, Apr 27, 2016 at 2:03 AM, Bill James 

Re: [ovirt-users] ovirt and JMX

2016-04-29 Thread Fabrice Bacchella

> On 29 Apr 2016, at 18:11, Juan Hernández  wrote:
> 

> So, it is the string "public", which means "any-address". The meaning of
> "any-address" depends on what IP version is used by default by the Java
> Virtual Machine, and in 3.6 the default is IPv6. Unless you want to
> change that file you will need to specify one of the IPv6 addresses of
> your machine, for example the loopback address:

OK, thanks, but I disable IPv6 on my servers, so it fails silently; that was my
problem.

The other problem is that it doesn't use a standard transport, like RMI or JMX. I
think it uses HTTP, so jconsole or other standard JMX tools can't be used.
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Hosts temporarily in "Non Operational" state after upgrade

2016-04-29 Thread Will Dennis
Answers inline below...

> From: Michal Skrivanek [mailto:michal.skriva...@redhat.com] 

> what exactly did you do in the UI?
Clicked on the node, and in the bottom pane, clicked on the "Upgrade" link 
showing there (the nodes also had an icon indicating that updates were 
available)

> so..it was not in Maintenance when you run the update?
> You should avoid doing that as an update to any package may interfere with 
> running guests. 
> E.g. a qemu rpm update can (and likely will) simply kill all your VMs, I 
> suppose similarly for Gluster before updating anything 
> the volumes should be in some kind of maintenance mode as well

No, the "Upgrade" link once clicked migrates any running VM off the target node 
onto another node, then sets the target node into Maintenance mode, and then 
performs the updates. Once the updates are completed successfully, it 
re-activates the node and makes it available again. On the second and third
nodes, this coming-out-of-Maintenance step apparently hit a problem with
mounting the Gluster storage, and they had the problems I'd indicated.

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] vms in paused state

2016-04-29 Thread Bill James

yes they are still saying "paused" state.
No, bouncing libvirt didn't help.

I noticed the errors about the ISO domain. Didn't think that was related.
I have been migrating a lot of VMs to ovirt lately, and recently added 
another node.
Also had some problems with /etc/exports for a while, but I think those 
issues are all resolved.



Last "unresponsive" message in vdsm.log was:

vdsm.log.49.xz:jsonrpc.Executor/0::WARNING::*2016-04-21* 
11:00:54,703::vm::5067::virt.vm::(_setUnresponsiveIfTimeout) 
vmId=`b6a13808-9552-401b-840b-4f7022e8293d`::monitor become unresponsive 
(command timeout, age=310323.97)
vdsm.log.49.xz:jsonrpc.Executor/0::WARNING::2016-04-21 
11:00:54,703::vm::5067::virt.vm::(_setUnresponsiveIfTimeout) 
vmId=`5bfb140a-a971-4c9c-82c6-277929eb45d4`::monitor become unresponsive 
(command timeout, age=310323.97)




Thanks.



On 4/29/16 1:40 AM, Michal Skrivanek wrote:


On 28 Apr 2016, at 19:40, Bill James > wrote:


thank you for response.
I bold-ed the ones that are listed as "paused".


[root@ovirt1 test vdsm]# virsh -r list --all
 IdName   State






Looks like problem started around 2016-04-17 20:19:34,822, based on 
engine.log attached.


yes, that time looks correct. Any idea what might have been a trigger? 
Anything interesting happened at that time (power outage of some host, 
some maintenance action, anything)?
logs indicate a problem when vdsm talks to libvirt(all those "monitor 
become unresponsive”)


It does seem that at that time you started to have some storage 
connectivity issues - first one at 2016-04-17 20:06:53,929. And it 
doesn’t look temporary because such errors are still there couple 
hours later(in your most recent file you attached I can see at 23:00:54)
When I/O gets blocked the VMs may experience issues (then VM gets 
Paused), or their qemu process gets stuck(resulting in libvirt either 
reporting error or getting stuck as well -> resulting in what vdsm 
sees as “monitor unresponsive”)


Since you now bounced libvirtd - did it help? Do you still see wrong 
status for those VMs and still those "monitor unresponsive" errors in 
vdsm.log?
If not…then I would suspect the “vm recovery” code not working 
correctly. Milan is looking at that.


Thanks,
michal



There's a lot of vdsm logs!

fyi, the storage domain for these Vms is a "local" nfs share, 
7e566f55-e060-47b7-bfa4-ac3c48d70dda.


attached more logs.


On 04/28/2016 12:53 AM, Michal Skrivanek wrote:

On 27 Apr 2016, at 19:16, Bill James  wrote:

virsh # list --all
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such 
file or directory


you need to run virsh in read-only mode
virsh -r list --all


[root@ovirt1 test vdsm]# systemctl status libvirtd
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor 
preset: enabled)
  Drop-In: /etc/systemd/system/libvirtd.service.d
   └─unlimited-core.conf
   Active: active (running) since Thu 2016-04-21 16:00:03 PDT; 5 days ago


tried systemctl restart libvirtd.
No change.

Attached vdsm.log and supervdsm.log.


[root@ovirt1 test vdsm]# systemctl status vdsmd
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor 
preset: enabled)
   Active: active (running) since Wed 2016-04-27 10:09:14 PDT; 3min 46s ago


vdsm-4.17.18-0.el7.centos.noarch

the vdsm.log attach is good, but it’s too short interval, it only shows 
recovery(vdsm restart) phase when the VMs are identified as paused….can you add 
earlier logs? Did you restart vdsm yourself or did it crash?



libvirt-daemon-1.2.17-13.el7_2.4.x86_64


Thanks.


On 04/26/2016 11:35 PM, Michal Skrivanek wrote:

On 27 Apr 2016, at 02:04, Nir Soffer  wrote:

On Wed, Apr 27, 2016 at 2:03 AM, Bill James  wrote:

I have a hardware node that has 26 VMs.
9 are listed as "running", 17 are listed as "paused".

In truth all VMs are up and running fine.

I tried telling the db they are up:

engine=> update vm_dynamic set status = 1 where vm_guid = (select
vm_guid from vm_static where vm_name = 'api1.test.j2noc.com');

GUI then shows it up for a short while,

then puts it back in paused state.

2016-04-26 15:16:46,095 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer]
(DefaultQuartzScheduler_Worker-16) [157cc21e] VM '242ca0af-4ab2-4dd6-b515-5
d435e6452c4'(api1.test.j2noc.com ) moved from 'Up' 
--> 'Paused'
2016-04-26 15:16:46,221 INFO [org.ovirt.engine.core.dal.dbbroker.auditlogh
andling.AuditLogDirector] (DefaultQuartzScheduler_Worker-16) [157cc21e] Cor
relation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM api1.
test.j2noc.com   has 

Re: [ovirt-users] ovirt and JMX

2016-04-29 Thread Juan Hernández
On 04/29/2016 04:46 PM, Fabrice Bacchella wrote:
> I'm trying to communicate with ovirt-engine using jmx.
> 
> I read https://www.ovirt.org/develop/developer-guide/engine/jmx-support/
> 
> In the line 
> ENGINE_JMX_INTERFACE=public 
> 
> what is "public"? Is that the string 'public'? If I set that, ovirt-engine
> doesn't listen any more on port 8706. If I set it to the public IP of the
> server or to the interface name it fails with:
> 

It is the name of the network interface, as defined in the application
server configuration file
/usr/share/ovirt-engine/services/ovirt-engine/ovirt-engine.xml.in:

  (the <interfaces>/<interface> snippet from that file did not survive the list archive formatting)

So, it is the string "public", which means "any-address". The meaning of
"any-address" depends on what IP version is used by default by the Java
Virtual Machine, and in 3.6 the default is IPv6. Unless you want to
change that file you will need to specify one of the IPv6 addresses of
your machine, for example the loopback address:

  $ $JBOSS_HOME/jboss-cli.sh --controller=[::1]:8706 --connect
--user=admin@internal

You can also set ENGINE_JMX_INTERFACE=loopback, and then you should be
able to use the IPv4 loopback address:

  $ $JBOSS_HOME/jboss-cli.sh --controller=127.0.0.1:8706 --connect
--user=admin@internal
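
A hedged note on where that variable can be set (not spelled out in this
message): ovirt-engine reads configuration overrides from
/etc/ovirt-engine/engine.conf.d/, so something like the following should work;
the drop-in file name is illustrative:

  echo 'ENGINE_JMX_INTERFACE=loopback' > /etc/ovirt-engine/engine.conf.d/99-jmx.conf
  systemctl restart ovirt-engine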

> 16:40:46,622 ERROR [org.jboss.as.server] JBAS015956: Caught exception during 
> boot: org.jboss.as.controller.persistence.ConfigurationPersistenceException: 
> JBAS014676: Failed to parse configuration
> at 
> org.jboss.as.controller.persistence.XmlConfigurationPersister.load(XmlConfigurationPersister.java:112)
>  [wildfly-controller-8.2.1.Final.jar:8.2.1.Final]
> at org.jboss.as.server.ServerService.boot(ServerService.java:331) 
> [wildfly-server-8.2.1.Final.jar:8.2.1.Final]
> at 
> org.jboss.as.controller.AbstractControllerService$1.run(AbstractControllerService.java:259)
>  [wildfly-controller-8.2.1.Final.jar:8.2.1.Final]
> at java.lang.Thread.run(Thread.java:745) [rt.jar:1.7.0_101]
> Caused by: javax.xml.stream.XMLStreamException: ParseError at 
> [row,col]:[477,70]
> Message: JBAS014796: Unknown interface bond0 interface must be declared in 
> element interfaces
> at 
> org.jboss.as.server.parsing.CommonXml.parseSocketBinding(CommonXml.java:691) 
> [wildfly-server-8.2.1.Final.jar:8.2.1.Final]
> at 
> org.jboss.as.server.parsing.StandaloneXml.parseSocketBindingGroup_1_1(StandaloneXml.java:1093)
>  [wildfly-server-8.2.1.Final.jar:8.2.1.Final]
> at 
> org.jboss.as.server.parsing.StandaloneXml.readServerElement_1_4(StandaloneXml.java:470)
>  [wildfly-server-8.2.1.Final.jar:8.2.1.Final]
> at 
> org.jboss.as.server.parsing.StandaloneXml.readElement(StandaloneXml.java:145) 
> [wildfly-server-8.2.1.Final.jar:8.2.1.Final]
> at 
> org.jboss.as.server.parsing.StandaloneXml.readElement(StandaloneXml.java:107) 
> [wildfly-server-8.2.1.Final.jar:8.2.1.Final]
> at 
> org.jboss.staxmapper.XMLMapperImpl.processNested(XMLMapperImpl.java:110) 
> [staxmapper-1.1.0.Final.jar:1.1.0.Final]
> at 
> org.jboss.staxmapper.XMLMapperImpl.parseDocument(XMLMapperImpl.java:69) 
> [staxmapper-1.1.0.Final.jar:1.1.0.Final]
> at 
> org.jboss.as.controller.persistence.XmlConfigurationPersister.load(XmlConfigurationPersister.java:104)
>  [wildfly-controller-8.2.1.Final.jar:8.2.1.Final]
> ... 3 more

-- 
Dirección Comercial: C/Jose Bardasano Baos, 9, Edif. Gorbea 3, planta
3ºD, 28016 Madrid, Spain
Inscrita en el Reg. Mercantil de Madrid – C.I.F. B82657941 - Red Hat S.L.
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] ovirt and JMX

2016-04-29 Thread Fabrice Bacchella
I'm trying to communicate with ovirt-engine using jmx.

I read https://www.ovirt.org/develop/developer-guide/engine/jmx-support/

In the line 
ENGINE_JMX_INTERFACE=public 

what is "public"? Is that the string 'public'? If I set that, ovirt-engine doesn't
listen any more on port 8706. If I set it to the public IP of the server or to
the interface name it fails with:

16:40:46,622 ERROR [org.jboss.as.server] JBAS015956: Caught exception during 
boot: org.jboss.as.controller.persistence.ConfigurationPersistenceException: 
JBAS014676: Failed to parse configuration
at 
org.jboss.as.controller.persistence.XmlConfigurationPersister.load(XmlConfigurationPersister.java:112)
 [wildfly-controller-8.2.1.Final.jar:8.2.1.Final]
at org.jboss.as.server.ServerService.boot(ServerService.java:331) 
[wildfly-server-8.2.1.Final.jar:8.2.1.Final]
at 
org.jboss.as.controller.AbstractControllerService$1.run(AbstractControllerService.java:259)
 [wildfly-controller-8.2.1.Final.jar:8.2.1.Final]
at java.lang.Thread.run(Thread.java:745) [rt.jar:1.7.0_101]
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[477,70]
Message: JBAS014796: Unknown interface bond0 interface must be declared in 
element interfaces
at 
org.jboss.as.server.parsing.CommonXml.parseSocketBinding(CommonXml.java:691) 
[wildfly-server-8.2.1.Final.jar:8.2.1.Final]
at 
org.jboss.as.server.parsing.StandaloneXml.parseSocketBindingGroup_1_1(StandaloneXml.java:1093)
 [wildfly-server-8.2.1.Final.jar:8.2.1.Final]
at 
org.jboss.as.server.parsing.StandaloneXml.readServerElement_1_4(StandaloneXml.java:470)
 [wildfly-server-8.2.1.Final.jar:8.2.1.Final]
at 
org.jboss.as.server.parsing.StandaloneXml.readElement(StandaloneXml.java:145) 
[wildfly-server-8.2.1.Final.jar:8.2.1.Final]
at 
org.jboss.as.server.parsing.StandaloneXml.readElement(StandaloneXml.java:107) 
[wildfly-server-8.2.1.Final.jar:8.2.1.Final]
at 
org.jboss.staxmapper.XMLMapperImpl.processNested(XMLMapperImpl.java:110) 
[staxmapper-1.1.0.Final.jar:1.1.0.Final]
at 
org.jboss.staxmapper.XMLMapperImpl.parseDocument(XMLMapperImpl.java:69) 
[staxmapper-1.1.0.Final.jar:1.1.0.Final]
at 
org.jboss.as.controller.persistence.XmlConfigurationPersister.load(XmlConfigurationPersister.java:104)
 [wildfly-controller-8.2.1.Final.jar:8.2.1.Final]
... 3 more


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Errors while trying to join an external LDAP provider

2016-04-29 Thread Ondra Machacek

On 04/29/2016 03:03 PM, Alexis HAUSER wrote:



pool.default.ssl.truststore.file = /tmp/.jks


Maybe trailing space here ^ ?


pool.default.ssl.truststore.password = 



Sadly it doesn't help




So please ensure also that file '/tmp/.jks' is readable by ovirt
user. The configuration looks fine.



All permissions are given. The problem is still the same...


Should I report this on the bugzilla ?



You can, but I believe this is not a bug but some misconfiguration; many
times I've tried a completely similar setup and it worked.


Btw, did you use 'ovirt-engine-extension-aaa-ldap-setup'? If not, you
can install it:

 $ yum install ovirt-engine-extension-aaa-ldap-setup

Then just run:
 $ ovirt-engine-extension-aaa-ldap-setup

And follow the steps. This tool handles all the permission and typo
issues for you, which could be introduced by manually creating those properties
files.

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Hosts temporarily in "Non Operational" state after upgrade

2016-04-29 Thread Will Dennis
(so noted)   ...or anyone else who knows the answer ;)

-Original Message-
From: Michal Skrivanek [mailto:michal.skriva...@redhat.com] 
Sent: Friday, April 29, 2016 9:02 AM
To: Will Dennis
Cc: users@ovirt.org
Subject: Re: [ovirt-users] Hosts temporarily in "Non Operational" state after 
upgrade


> On 29 Apr 2016, at 14:46, Will Dennis  wrote:
> 
> Bump - can any RHAT folks comment on this?

note oVirt is a community project;-)

> 
> -Original Message-
> From: Will Dennis 
> Sent: Wednesday, April 27, 2016 11:00 PM
> To: users@ovirt.org
> Subject: Hosts temporarily in "Non Operational" state after upgrade
> 
> Hi all,
> 
> Had run updates tonight on my three oVirt hosts (3.6 hyperconverged) and on
> two of them, they went into “Non Operational” state for a few minutes each
> before springing back to life… The synopsis was this:
> 
> - Ran updates through the web Admin UI ...then I got the following series
> of messages via the “Events” tab in the UI:

what exactly did you do in the UI?

> - Updates successfully ran
> - VDSM “command failed: Heartbeat exceeded” message
> - host is not responding message
> - "Failed to connect to hosted_storage" message
> - “The error message for connection localhost:/engine returned by VDSM was: 
> Problem while trying to mount target”
> - "Host  reports about one of the Active Storage Domains as Problematic”
> - “Host  cannot access the Storage Domain(s) hosted_storage attached to 
> the data center Default. Setting host state to Non-Operational.”
> - "Detected change in status of brick {…} of volume {…} from DOWN to UP.” 
> (once for every brick on the host for every Gluster volume.)
> - "Host  was autorecovered.”
> - "Status of host  was set to Up.”

so..it was not in Maintenance when you run the update?
You should avoid doing that as an update to any package may interfere with 
running guests. E.g. a qemu rpm update can (and likely will) simply kill all 
your VMs, I suppose similarly for Gluster before updating anything the volumes 
should be in some kind of maintenance mode as well

> 
> (BTW, it would be awesome if the UI’s Events log could be copied and pasted… 
> Doesn’t work for me at least…)
> 
> Duration of outage was ~3 mins per each affected host. Didn’t happen on the 
> first host I upgraded, but did on the last two.
> 
> I know I’m a little over the bleeding edge running hyperconverged on 3.6 :) 
> but, should this behavior be expected?
> 
> Also, if I go onto the hosts directly and run a ‘yum update’ after this 
> upgrade process (not that I went thru with it, just wanted to see what was 
> available to be upgraded) I see a bunch of ovirt-* packages that can be 
> upgraded, which didn’t get updated thru the web UI’s upgrade process —
> ovirt-engine-sdk-pythonnoarch   3.6.5.0-1.el7.centos 
> ovirt-3.6  480 k
> ovirt-hosted-engine-ha noarch   1.3.5.3-1.1.el7  
> centos-ovirt36 295 k
> ovirt-hosted-engine-setup  noarch   1.3.5.0-1.1.el7  
> centos-ovirt36 270 k
> ovirt-release36noarch   007-1
> ovirt-3.6  9.5 k
> 
> Are these packages not related to the “Upgrade” process available thru the 
> web UI?
> 
> FYI, here’s what did get updated thru the web UI “Upgrade” process — Apr 27 
> 21:36:28 Updated: libvirt-client-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:28 Updated: libvirt-daemon-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:28 Updated: 
> libvirt-daemon-driver-network-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:28 Updated: libvirt-daemon-driver-qemu-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:28 Updated: 
> libvirt-daemon-driver-nwfilter-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:28 Updated: vdsm-infra-4.17.26-1.el7.noarch Apr 27 21:36:28 
> Updated: vdsm-python-4.17.26-1.el7.noarch Apr 27 21:36:28 Updated: 
> vdsm-xmlrpc-4.17.26-1.el7.noarch Apr 27 21:36:28 Updated: 
> libvirt-daemon-config-nwfilter-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: mom-0.5.3-1.1.el7.noarch Apr 27 21:36:29 Updated: 
> libvirt-lock-sanlock-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: libvirt-daemon-driver-secret-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: 
> libvirt-daemon-driver-nodedev-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: 
> libvirt-daemon-driver-interface-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: 
> libvirt-daemon-driver-storage-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: libvirt-daemon-kvm-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: 1:libguestfs-1.28.1-1.55.el7.centos.2.x86_64
> Apr 27 21:36:29 Updated: 1:libguestfs-tools-c-1.28.1-1.55.el7.centos.2.x86_64
> Apr 27 21:36:29 Installed: libguestfs-winsupport-7.2-1.el7.x86_64
> Apr 27 21:36:29 Updated: vdsm-yajsonrpc-4.17.26-1.el7.noarch
> Apr 27 21:36:29 Updated: vdsm-jsonrpc-4.17.26-1.el7.noarch Apr 27 21:36:29 
> Installed: unzip-6.0-15.el7.x86_64 

Re: [ovirt-users] Errors while trying to join an external LDAP provider

2016-04-29 Thread Alexis HAUSER

>> pool.default.ssl.truststore.file = /tmp/.jks
>
> Maybe trailing space here ^ ?
>
>> pool.default.ssl.truststore.password = 
>>
>
> Sadly it doesn't help
>

>So please ensure also that file '/tmp/.jks' is readable by ovirt 
>user. The configuration looks fine.

> All permissions are given. The problem is still the same...

Should I report this on the bugzilla ?
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Hosts temporarily in "Non Operational" state after upgrade

2016-04-29 Thread Michal Skrivanek

> On 29 Apr 2016, at 14:46, Will Dennis  wrote:
> 
> Bump - can any RHAT folks comment on this?

note oVirt is a community project;-)

> 
> -Original Message-
> From: Will Dennis 
> Sent: Wednesday, April 27, 2016 11:00 PM
> To: users@ovirt.org
> Subject: Hosts temporarily in "Non Operational" state after upgrade
> 
> Hi all,
> 
> Had run updates tonight on my three oVirt hosts (3.6 hyperconverged) and on
> two of them, they went into “Non Operational” state for a few minutes each
> before springing back to life… The synopsis was this:
> 
> - Ran updates through the web Admin UI ...then I got the following series
> of messages via the “Events” tab in the UI:

what exactly did you do in the UI?

> - Updates successfully ran
> - VDSM “command failed: Heartbeat exceeded” message
> - host is not responding message
> - "Failed to connect to hosted_storage" message
> - “The error message for connection localhost:/engine returned by VDSM was: 
> Problem while trying to mount target”
> - "Host  reports about one of the Active Storage Domains as Problematic”
> - “Host  cannot access the Storage Domain(s) hosted_storage attached to 
> the data center Default. Setting host state to Non-Operational.”
> - "Detected change in status of brick {…} of volume {…} from DOWN to UP.” 
> (once for every brick on the host for every Gluster volume.)
> - "Host  was autorecovered.”
> - "Status of host  was set to Up.”

so..it was not in Maintenance when you run the update?
You should avoid doing that as an update to any package may interfere with 
running guests. E.g. a qemu rpm update can (and likely will) simply kill all 
your VMs, I suppose similarly for Gluster before updating anything the volumes 
should be in some kind of maintenance mode as well

> 
> (BTW, it would be awesome if the UI’s Events log could be copied and pasted… 
> Doesn’t work for me at least…)
> 
> Duration of outage was ~3 mins per each affected host. Didn’t happen on the 
> first host I upgraded, but did on the last two.
> 
> I know I’m a little over the bleeding edge running hyperconverged on 3.6 :) 
> but, should this behavior be expected?
> 
> Also, if I go onto the hosts directly and run a ‘yum update’ after this 
> upgrade process (not that I went thru with it, just wanted to see what was 
> available to be upgraded) I see a bunch of ovirt-* packages that can be 
> upgraded, which didn’t get updated thru the web UI’s upgrade process —
> ovirt-engine-sdk-pythonnoarch   3.6.5.0-1.el7.centos 
> ovirt-3.6  480 k
> ovirt-hosted-engine-ha noarch   1.3.5.3-1.1.el7  
> centos-ovirt36 295 k
> ovirt-hosted-engine-setup  noarch   1.3.5.0-1.1.el7  
> centos-ovirt36 270 k
> ovirt-release36noarch   007-1
> ovirt-3.6  9.5 k
> 
> Are these packages not related to the “Upgrade” process available thru the 
> web UI?
> 
> FYI, here’s what did get updated thru the web UI “Upgrade” process — Apr 27 
> 21:36:28 Updated: libvirt-client-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:28 Updated: libvirt-daemon-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:28 Updated: 
> libvirt-daemon-driver-network-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:28 Updated: libvirt-daemon-driver-qemu-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:28 Updated: 
> libvirt-daemon-driver-nwfilter-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:28 Updated: vdsm-infra-4.17.26-1.el7.noarch Apr 27 21:36:28 
> Updated: vdsm-python-4.17.26-1.el7.noarch Apr 27 21:36:28 Updated: 
> vdsm-xmlrpc-4.17.26-1.el7.noarch Apr 27 21:36:28 Updated: 
> libvirt-daemon-config-nwfilter-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: mom-0.5.3-1.1.el7.noarch Apr 27 21:36:29 Updated: 
> libvirt-lock-sanlock-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: libvirt-daemon-driver-secret-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: 
> libvirt-daemon-driver-nodedev-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: 
> libvirt-daemon-driver-interface-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: 
> libvirt-daemon-driver-storage-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: libvirt-daemon-kvm-1.2.17-13.el7_2.4.x86_64
> Apr 27 21:36:29 Updated: 1:libguestfs-1.28.1-1.55.el7.centos.2.x86_64
> Apr 27 21:36:29 Updated: 1:libguestfs-tools-c-1.28.1-1.55.el7.centos.2.x86_64
> Apr 27 21:36:29 Installed: libguestfs-winsupport-7.2-1.el7.x86_64
> Apr 27 21:36:29 Updated: vdsm-yajsonrpc-4.17.26-1.el7.noarch
> Apr 27 21:36:29 Updated: vdsm-jsonrpc-4.17.26-1.el7.noarch Apr 27 21:36:29 
> Installed: unzip-6.0-15.el7.x86_64 Apr 27 21:36:30 Installed: 
> gtk2-2.24.28-8.el7.x86_64 Apr 27 21:36:31 Installed: 
> 1:virt-v2v-1.28.1-1.55.el7.centos.2.x86_64
> Apr 27 21:36:31 Updated: safelease-1.0-7.el7.x86_64 Apr 27 21:36:31 Updated: 
> vdsm-hook-vmfex-dev-4.17.26-1.el7.noarch
> Apr 27 21:36:32 Updated: vdsm-4.17.26-1.el7.noarch 

Re: [ovirt-users] Hosts temporarily in "Non Operational" state after upgrade

2016-04-29 Thread Will Dennis
Bump - can any RHAT folks comment on this?

-Original Message-
From: Will Dennis 
Sent: Wednesday, April 27, 2016 11:00 PM
To: users@ovirt.org
Subject: Hosts temporarily in "Non Operational" state after upgrade

Hi all,

Had run updates tonight on my three oVirt hosts (3.6 hyperconverged) and on two 
of them, they went into “Non Operational” state for a few minutes each before 
springing back to life… The synopsis was this:

- Ran updates through the web Admin UI ...then I got the following series of 
messages via the “Events” tab in the UI:
- Updates successfully ran
- VDSM “command failed: Heartbeat exceeded” message
- host is not responding message
- "Failed to connect to hosted_storage" message
- “The error message for connection localhost:/engine returned by VDSM was: 
Problem while trying to mount target”
- "Host  reports about one of the Active Storage Domains as Problematic”
- “Host  cannot access the Storage Domain(s) hosted_storage attached to 
the data center Default. Setting host state to Non-Operational.”
- "Detected change in status of brick {…} of volume {…} from DOWN to UP.” (once 
for every brick on the host for every Gluster volume.)
- "Host  was autorecovered.”
- "Status of host  was set to Up."

(BTW, it would be awesome if the UI’s Events log could be copied and pasted… 
Doesn’t work for me at least…)

Duration of outage was ~3 mins per each affected host. Didn’t happen on the 
first host I upgraded, but did on the last two.

I know I’m a little over the bleeding edge running hyperconverged on 3.6 :) 
but, should this behavior be expected?

Also, if I go onto the hosts directly and run a ‘yum update’ after this upgrade 
process (not that I went thru with it, just wanted to see what was available to 
be upgraded) I see a bunch of ovirt-* packages that can be upgraded, which 
didn’t get updated thru the web UI’s upgrade process —
ovirt-engine-sdk-pythonnoarch   3.6.5.0-1.el7.centos 
ovirt-3.6  480 k
ovirt-hosted-engine-ha noarch   1.3.5.3-1.1.el7  
centos-ovirt36 295 k
ovirt-hosted-engine-setup  noarch   1.3.5.0-1.1.el7  
centos-ovirt36 270 k
ovirt-release36noarch   007-1
ovirt-3.6  9.5 k

Are these packages not related to the “Upgrade” process available thru the web 
UI?

FYI, here’s what did get updated thru the web UI “Upgrade” process — Apr 27 
21:36:28 Updated: libvirt-client-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:28 Updated: libvirt-daemon-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:28 Updated: libvirt-daemon-driver-network-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:28 Updated: libvirt-daemon-driver-qemu-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:28 Updated: libvirt-daemon-driver-nwfilter-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:28 Updated: vdsm-infra-4.17.26-1.el7.noarch Apr 27 21:36:28 
Updated: vdsm-python-4.17.26-1.el7.noarch Apr 27 21:36:28 Updated: 
vdsm-xmlrpc-4.17.26-1.el7.noarch Apr 27 21:36:28 Updated: 
libvirt-daemon-config-nwfilter-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:29 Updated: mom-0.5.3-1.1.el7.noarch Apr 27 21:36:29 Updated: 
libvirt-lock-sanlock-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:29 Updated: libvirt-daemon-driver-secret-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:29 Updated: libvirt-daemon-driver-nodedev-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:29 Updated: 
libvirt-daemon-driver-interface-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:29 Updated: libvirt-daemon-driver-storage-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:29 Updated: libvirt-daemon-kvm-1.2.17-13.el7_2.4.x86_64
Apr 27 21:36:29 Updated: 1:libguestfs-1.28.1-1.55.el7.centos.2.x86_64
Apr 27 21:36:29 Updated: 1:libguestfs-tools-c-1.28.1-1.55.el7.centos.2.x86_64
Apr 27 21:36:29 Installed: libguestfs-winsupport-7.2-1.el7.x86_64
Apr 27 21:36:29 Updated: vdsm-yajsonrpc-4.17.26-1.el7.noarch
Apr 27 21:36:29 Updated: vdsm-jsonrpc-4.17.26-1.el7.noarch Apr 27 21:36:29 
Installed: unzip-6.0-15.el7.x86_64 Apr 27 21:36:30 Installed: 
gtk2-2.24.28-8.el7.x86_64 Apr 27 21:36:31 Installed: 
1:virt-v2v-1.28.1-1.55.el7.centos.2.x86_64
Apr 27 21:36:31 Updated: safelease-1.0-7.el7.x86_64 Apr 27 21:36:31 Updated: 
vdsm-hook-vmfex-dev-4.17.26-1.el7.noarch
Apr 27 21:36:32 Updated: vdsm-4.17.26-1.el7.noarch Apr 27 21:36:32 Updated: 
vdsm-gluster-4.17.26-1.el7.noarch Apr 27 21:36:32 Updated: 
vdsm-cli-4.17.26-1.el7.noarch

Thanks,
Will
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Errors while trying to join an external LDAP provider

2016-04-29 Thread Ondra Machacek

On 04/29/2016 02:27 PM, Alexis HAUSER wrote:



pool.default.ssl.truststore.file = /tmp/.jks


Maybe trailing space here ^ ?


pool.default.ssl.truststore.password = 



Sadly it doesn't help




So please ensure also that file '/tmp/.jks' is readable by ovirt
user. The configuration looks fine.


All permissions are given. The problem is still the same...



Please check also SELinux.
Can you please send 'tool.log' generated from the following command?

 $ ovirt-engine-extensions-tool --log-level=FINEST --log-file=tool.log 
aaa search --entity-name=* --extension-name=your_openldap_authz_name
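
For the SELinux/permissions angle mentioned above, a quick hedged check on the
engine host (the keystore path below is a placeholder; use the value from
pool.default.ssl.truststore.file in your properties file):

  KEYSTORE=/tmp/example.jks        # placeholder path
  getenforce                       # Enforcing / Permissive / Disabled
  sudo -u ovirt head -c 1 "$KEYSTORE" >/dev/null && echo "readable by the ovirt user"
  ausearch -m avc -ts recent 2>/dev/null | grep -i "$(basename "$KEYSTORE")"   # recent denials, if any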

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Errors while trying to join an external LDAP provider

2016-04-29 Thread Alexis HAUSER

>> pool.default.ssl.truststore.file = /tmp/.jks
>
> Maybe trailing space here ^ ?
>
>> pool.default.ssl.truststore.password = 
>>
>
> Sadly it doesn't help
>

>So please ensure also that file '/tmp/.jks' is readable by ovirt 
>user. The configuration looks fine.

All permissions are given. The problem is still the same...
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] missing GPG key (key ID d55f98a6)

2016-04-29 Thread Николаев Алексей
Hi community! How can I resolve this error (using reposync)? I've already installed the latest ovirt-release36 pkg.

warning: /var/www/html/resources.ovirt.org/pub/ovirt-3.6/rpm/el7/noarch/ovirt-optimizer-0.9.1-1.el7.centos.noarch.rpm: Header V3 RSA/SHA1 Signature, key ID d55f98a6: NOKEY
Public key for ovirt-optimizer-0.9.1-1.el7.centos.noarch.rpm is not installed
(1/4): ovirt-optimizer-0.9.1-1.el7.centos.noarch.rpm
(2/4): ovirt-optimizer-jboss-0.9.1-1.el7.centos.noarch.rpm
(3/4): ovirt-optimizer-dependencies-0.9.1-1.el7.centos.noarch.rpm
(4/4): ovirt-optimizer-ui-0.9.1-1.el7.centos.noarch.rpm
Removing ovirt-optimizer-0.9.1-1.el7.centos.noarch.rpm, due to missing GPG key.
Removing ovirt-optimizer-dependencies-0.9.1-1.el7.centos.noarch.rpm, due to missing GPG key.
Removing ovirt-optimizer-jboss-0.9.1-1.el7.centos.noarch.rpm, due to missing GPG key.
Removing ovirt-optimizer-ui-0.9.1-1.el7.centos.noarch.rpm, due to missing GPG key.
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
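
A hedged way to clear the NOKEY error above is to import the oVirt signing key
(key ID d55f98a6) into the rpm database before running reposync; the gpgkey URL
comes from whichever repo files the ovirt-release36 package installed, and the
exact file names may differ:

  awk -F= '/^gpgkey/ {print $2}' /etc/yum.repos.d/ovirt*.repo | xargs -r -n1 rpm --import
  rpm -q gpg-pubkey --qf '%{VERSION}-%{RELEASE}  %{SUMMARY}\n' | grep -i d55f98a6   # confirm the key is known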


Re: [ovirt-users] Fwd: Having issues with Hosted Engine

2016-04-29 Thread Luiz Claudio Prazeres Goncalves
Got it. It should be included in the 3.6.6 GA, then.

Thanks
Luiz

On Fri, 29 Apr 2016 at 04:26, Simone Tiraboschi
wrote:

> On Fri, Apr 29, 2016 at 4:44 AM, Luiz Claudio Prazeres Goncalves
>  wrote:
> > Hi Simone, I was reviewing the changelog of 3.6.6, on the link below,
> but i
> > was not able to find the bug (https://bugzilla.redhat.com/1327516) as
> fixed
> > on the list. According to Bugzilla the target is really 3.6.6, so what's
> > wrong?
> >
> >
> > http://www.ovirt.org/release/3.6.6/
>
> ' oVirt 3.6.6 first release candidate' so it's still not the GA.
>
> > Thanks
> > Luiz
> >
> > On Thu, 28 Apr 2016 at 11:33, Luiz Claudio Prazeres Goncalves
> >  wrote:
> >>
> >> Nice!... so, I'll survive a bit more with these issues until the version
> >> 3.6.6 gets released...
> >>
> >>
> >> Thanks
> >> -Luiz
> >>
> >> 2016-04-28 4:50 GMT-03:00 Simone Tiraboschi :
> >>>
> >>> On Thu, Apr 28, 2016 at 8:32 AM, Sahina Bose 
> wrote:
> >>> > This seems like issue reported in
> >>> > https://bugzilla.redhat.com/show_bug.cgi?id=1327121
> >>> >
> >>> > Nir, Simone?
> >>>
> >>> The issue is here:
> >>> MainThread::INFO::2016-04-27
> >>>
> >>>
> 03:26:27,185::storage_server::229::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(disconnect_storage_server)
> >>> Disconnecting storage server
> >>> MainThread::INFO::2016-04-27
> >>>
> >>>
> 03:26:27,816::upgrade::983::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(fix_storage_path)
> >>> Fixing storage path in conf file
> >>>
> >>> And it's tracked here: https://bugzilla.redhat.com/1327516
> >>>
> >>> We already have a patch, it will be fixed with 3.6.6
> >>>
> >>> As far as I saw, this issue will only cause a lot of mess in the logs
> >>> and some false alerts, but it's basically harmless
> >>>
> >>> > On 04/28/2016 05:35 AM, Luiz Claudio Prazeres Goncalves wrote:
> >>> >
> >>> >
> >>> > Hi everyone,
> >>> >
> >>> > Until today my environment was fully updated (3.6.5+centos7.2) with 3
> >>> > nodes
> >>> > (kvm1,kvm2 and kvm3 hosts) . I also have 3 external gluster nodes
> >>> > (gluster-root1,gluster1 and gluster2 hosts ) , replica 3, which the
> >>> > engine
> >>> > storage domain is sitting on top (3.7.11 fully updated+centos7.2)
> >>> >
> >>> > For some weird reason I've been receiving emails from oVirt with
> >>> > EngineUnexpectedDown (attached picture) on a daily basis, more or less,
> >>> > but the engine seems to be working fine and my VMs are up and running
> >>> > normally.
> >>> > I've never had any issue accessing the User Interface to manage the
> >>> > VMs.
> >>> >
> >>> > Today I ran "yum update" on the nodes and realised that vdsm was
> >>> > outdated,
> >>> > so I updated the kvm hosts and they are now, again, fully updated.
> >>> >
> >>> >
> >>> > Reviewing the logs, it seems to be an intermittent connectivity issue
> >>> > when
> >>> > trying to access the gluster engine storage domain as you can see
> >>> > below. I
> >>> > don't have any network issue in place and I'm 100% sure about it. I
> >>> > have
> >>> > another oVirt Cluster using the same network and using a engine
> storage
> >>> > domain on top of an iSCSI Storage Array with no issues.
> >>> >
> >>> > Here seems to be the issue:
> >>> >
> >>> > Thread-::INFO::2016-04-27
> >>> > 23:01:27,864::fileSD::357::Storage.StorageDomain::(validate)
> >>> > sdUUID=03926733-1872-4f85-bb21-18dc320560db
> >>> >
> >>> > Thread-::DEBUG::2016-04-27
> >>> > 23:01:27,865::persistentDict::234::Storage.PersistentDict::(refresh)
> >>> > read
> >>> > lines (FileMetadataRW)=[]
> >>> >
> >>> > Thread-::DEBUG::2016-04-27
> >>> > 23:01:27,865::persistentDict::252::Storage.PersistentDict::(refresh)
> >>> > Empty
> >>> > metadata
> >>> >
> >>> > Thread-::ERROR::2016-04-27
> >>> > 23:01:27,865::task::866::Storage.TaskManager.Task::(_setError)
> >>> > Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::Unexpected error
> >>> >
> >>> > Traceback (most recent call last):
> >>> >
> >>> >   File "/usr/share/vdsm/storage/task.py", line 873, in _run
> >>> >
> >>> > return fn(*args, **kargs)
> >>> >
> >>> >   File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
> >>> >
> >>> > res = f(*args, **kwargs)
> >>> >
> >>> >   File "/usr/share/vdsm/storage/hsm.py", line 2835, in
> >>> > getStorageDomainInfo
> >>> >
> >>> > dom = self.validateSdUUID(sdUUID)
> >>> >
> >>> >   File "/usr/share/vdsm/storage/hsm.py", line 278, in validateSdUUID
> >>> >
> >>> > sdDom.validate()
> >>> >
> >>> >   File "/usr/share/vdsm/storage/fileSD.py", line 360, in validate
> >>> >
> >>> > raise se.StorageDomainAccessError(self.sdUUID)
> >>> >
> >>> > StorageDomainAccessError: Domain is either partially accessible or
> >>> > entirely
> >>> > inaccessible: (u'03926733-1872-4f85-bb21-18dc320560db',)
> >>> >
> >>> > Thread-::DEBUG::2016-04-27
> >>> > 

Re: [ovirt-users] vms in paused state

2016-04-29 Thread Michal Skrivanek

> On 28 Apr 2016, at 19:40, Bill James  wrote:
> 
> thank you for response.
> I bold-ed the ones that are listed as "paused".
> 
> 
> [root@ovirt1 test vdsm]# virsh -r list --all
>  Id    Name                           State
> ----------------------------------------------------
>  2     puppet.test.j2noc.com          running
>  4     sftp2.test.j2noc.com           running
>  5     oct.test.j2noc.com             running
>  6     sftp2.dev.j2noc.com            running
>  10    darmaster1.test.j2noc.com      running
>  14    api1.test.j2noc.com            running
>  25    ftp1.frb.test.j2noc.com        running
>  26    auto7.test.j2noc.com           running
>  32    epaymv02.j2noc.com             running
>  34    media2.frb.test.j2noc.com      running
>  36    auto2.j2noc.com                running
>  44    nfs.testhvy2.colo.j2noc.com    running
>  53    billapp-zuma1.dev.j2noc.com    running
>  54    billing-ci.dev.j2noc.com       running
>  60    log2.test.j2noc.com            running
>  63    log1.test.j2noc.com            running
>  69    sonar.dev.j2noc.com            running
>  73    billapp-ui1.dev.j2noc.com      running
>  74    billappvm01.dev.j2noc.com      running
>  75    db2.frb.test.j2noc.com         running
>  83    billapp-ui1.test.j2noc.com     running
>  84    epayvm01.test.j2noc.com        running
>  87    billappvm01.test.j2noc.com     running
>  89    etapi1.test.j2noc.com          running
>  93    billapp-zuma2.test.j2noc.com   running
>  94    git.dev.j2noc.com              running
> 
> Yes I did "systemctl restart libvirtd", which apparently also restarts vdsm?

yes, it does. 
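
If you want to double-check how the two are tied together on your host, something
like this should show the systemd dependency (the exact unit settings may differ
between vdsm versions):

    systemctl show vdsmd | grep -E '^(Requires|BindsTo|PartOf|After)='
    systemctl list-dependencies --reverse libvirtd | grep -i vdsm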

> 
> 
> Looks like the problem started around 2016-04-17 20:19:34,822, based on the
> engine.log attached.

yes, that time looks correct. Any idea what might have been the trigger? Did anything
interesting happen at that time (power outage of some host, some maintenance
action, anything)?
The logs indicate a problem when vdsm talks to libvirt (all those "monitor become
unresponsive" entries).
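
If you want to pin down when exactly those started, something along these lines
should work, assuming the default /var/log/vdsm location and xz-compressed rotated
logs (the timestamp is the 3rd "::"-separated field of each log line):

    xzgrep -h "monitor become unresponsive" /var/log/vdsm/vdsm.log.*.xz \
        | awk -F'::' '{print $3}' | sort | head -5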

It does seem that at that time you started to have some storage connectivity
issues - the first one at 2016-04-17 20:06:53,929. And it doesn't look temporary,
because such errors are still there a couple of hours later (in the most recent file
you attached I can see one at 23:00:54).
When I/O gets blocked, the VMs may experience issues (then the VM gets Paused), or
their qemu process gets stuck (resulting in libvirt either reporting an error or
getting stuck as well -> resulting in what vdsm sees as "monitor unresponsive").

Since you now bounced libvirtd - did it help? Do you still see the wrong status for
those VMs, and still those "monitor unresponsive" errors in vdsm.log?
If not… then I would suspect the "vm recovery" code is not working correctly. Milan
is looking at that.
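
A quick way to check both, assuming the default log location and taking any of the
affected VM names from your list:

    # any new occurrences since the libvirtd/vdsm restart?
    tail -n 20000 /var/log/vdsm/vdsm.log | grep -c "monitor become unresponsive"
    # and what libvirt itself currently reports for one of the "paused" VMs
    virsh -r domstate puppet.test.j2noc.com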

Thanks,
michal


> There are a lot of vdsm logs!
> 
> FYI, the storage domain for these VMs is a "local" NFS share, 
> 7e566f55-e060-47b7-bfa4-ac3c48d70dda.
> 
> attached more logs.
> 
> 
> On 04/28/2016 12:53 AM, Michal Skrivanek wrote:
>>> On 27 Apr 2016, at 19:16, Bill James  
>>>  wrote:
>>> 
>>> virsh # list --all
>>> error: failed to connect to the hypervisor
>>> error: no valid connection
>>> error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such 
>>> file or directory
>>> 
>> you need to run virsh in read-only mode
>> virsh -r list --all
>> 
>>> [root@ovirt1 test vdsm]# systemctl status libvirtd
>>> ● libvirtd.service - Virtualization daemon
>>>   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor 
>>> preset: enabled)
>>>  Drop-In: /etc/systemd/system/libvirtd.service.d
>>>   └─unlimited-core.conf
>>>   Active: active (running) since Thu 2016-04-21 16:00:03 PDT; 5 days ago
>>> 
>>> 
>>> tried systemctl restart libvirtd.
>>> No change.
>>> 
>>> Attached vdsm.log and supervdsm.log.
>>> 
>>> 
>>> [root@ovirt1 test vdsm]# systemctl status vdsmd
>>> ● vdsmd.service - Virtual Desktop Server Manager
>>>   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor 
>>> preset: enabled)
>>>   Active: active (running) since Wed 2016-04-27 10:09:14 PDT; 3min 46s ago
>>> 
>>> 
>>> vdsm-4.17.18-0.el7.centos.noarch
>> the vdsm.log attachment is good, but it covers too short an interval; it only shows 
>> the recovery (vdsm restart) phase, when the VMs are identified as paused… can you 
>> add earlier logs? Did you restart vdsm yourself or did it crash?
>> 
>> 
>>> libvirt-daemon-1.2.17-13.el7_2.4.x86_64
>>> 
>>> 
>>> Thanks.
>>> 
>>> 
>>> On 04/26/2016 11:35 PM, Michal Skrivanek wrote:
> On 27 Apr 2016, at 02:04, Nir Soffer  
>  wrote:
> 
> On Wed, Apr 27, 2016 at 2:03 AM, Bill James  
>  wrote:
>> I have a hardware node that has 26 VMs.
>> 9 are listed as "running", 17 are listed as "paused".
>> 
>> In truth all VMs are up and running fine.
>> 
>> I tried 

Re: [ovirt-users] Fwd: Having issues with Hosted Engine

2016-04-29 Thread Simone Tiraboschi
On Fri, Apr 29, 2016 at 4:44 AM, Luiz Claudio Prazeres Goncalves
 wrote:
> Hi Simone, I was reviewing the changelog of 3.6.6 at the link below, but I
> was not able to find the bug (https://bugzilla.redhat.com/1327516) listed as
> fixed. According to Bugzilla the target really is 3.6.6, so what's wrong?
>
>
> http://www.ovirt.org/release/3.6.6/

'oVirt 3.6.6 first release candidate', so it's still not the GA.
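
Until the GA is out you can check what you are currently running with something
like:

    rpm -q ovirt-hosted-engine-ha ovirt-hosted-engine-setup ovirt-release36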

> Thanks
> Luiz
>
> Em qui, 28 de abr de 2016 11:33, Luiz Claudio Prazeres Goncalves
>  escreveu:
>>
>> Nice!... so I'll survive a bit longer with these issues until version
>> 3.6.6 gets released...
>>
>>
>> Thanks
>> -Luiz
>>
>> 2016-04-28 4:50 GMT-03:00 Simone Tiraboschi :
>>>
>>> On Thu, Apr 28, 2016 at 8:32 AM, Sahina Bose  wrote:
>>> > This seems like issue reported in
>>> > https://bugzilla.redhat.com/show_bug.cgi?id=1327121
>>> >
>>> > Nir, Simone?
>>>
>>> The issue is here:
>>> MainThread::INFO::2016-04-27
>>>
>>> 03:26:27,185::storage_server::229::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(disconnect_storage_server)
>>> Disconnecting storage server
>>> MainThread::INFO::2016-04-27
>>>
>>> 03:26:27,816::upgrade::983::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(fix_storage_path)
>>> Fixing storage path in conf file
>>>
>>> And it's tracked here: https://bugzilla.redhat.com/1327516
>>>
>>> We already have a patch; it will be fixed in 3.6.6.
>>>
>>> As far as I saw, this issue will only cause a lot of mess in the logs
>>> and some false alerts, but it's basically harmless
>>>
>>> > On 04/28/2016 05:35 AM, Luiz Claudio Prazeres Goncalves wrote:
>>> >
>>> >
>>> > Hi everyone,
>>> >
>>> > Until today my environment was fully updated (3.6.5 + CentOS 7.2) with 3
>>> > nodes (kvm1, kvm2 and kvm3 hosts). I also have 3 external gluster nodes
>>> > (gluster-root1, gluster1 and gluster2 hosts), replica 3, on top of which the
>>> > engine storage domain is sitting (3.7.11 fully updated + CentOS 7.2).
>>> >
>>> > For some weird reason I've been receiving emails from oVirt with
>>> > EngineUnexpectedDown (attached picture) on a more or less daily basis, but
>>> > the engine seems to be working fine and my VMs are up and running normally.
>>> > I've never had any issue accessing the User Interface to manage the VMs.
>>> >
>>> > Today I ran "yum update" on the nodes and realised that vdsm was outdated,
>>> > so I updated the kvm hosts and they are now, again, fully updated.
>>> >
>>> >
>>> > Reviewing the logs, it seems to be an intermittent connectivity issue when
>>> > trying to access the gluster engine storage domain, as you can see below. I
>>> > don't have any network issue in place and I'm 100% sure about it. I have
>>> > another oVirt cluster using the same network and an engine storage domain
>>> > on top of an iSCSI storage array with no issues.
>>> >
>>> > Here seems to be the issue:
>>> >
>>> > Thread-::INFO::2016-04-27
>>> > 23:01:27,864::fileSD::357::Storage.StorageDomain::(validate)
>>> > sdUUID=03926733-1872-4f85-bb21-18dc320560db
>>> >
>>> > Thread-::DEBUG::2016-04-27
>>> > 23:01:27,865::persistentDict::234::Storage.PersistentDict::(refresh)
>>> > read
>>> > lines (FileMetadataRW)=[]
>>> >
>>> > Thread-::DEBUG::2016-04-27
>>> > 23:01:27,865::persistentDict::252::Storage.PersistentDict::(refresh)
>>> > Empty
>>> > metadata
>>> >
>>> > Thread-::ERROR::2016-04-27
>>> > 23:01:27,865::task::866::Storage.TaskManager.Task::(_setError)
>>> > Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::Unexpected error
>>> >
>>> > Traceback (most recent call last):
>>> >
>>> >   File "/usr/share/vdsm/storage/task.py", line 873, in _run
>>> >
>>> > return fn(*args, **kargs)
>>> >
>>> >   File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
>>> >
>>> > res = f(*args, **kwargs)
>>> >
>>> >   File "/usr/share/vdsm/storage/hsm.py", line 2835, in
>>> > getStorageDomainInfo
>>> >
>>> > dom = self.validateSdUUID(sdUUID)
>>> >
>>> >   File "/usr/share/vdsm/storage/hsm.py", line 278, in validateSdUUID
>>> >
>>> > sdDom.validate()
>>> >
>>> >   File "/usr/share/vdsm/storage/fileSD.py", line 360, in validate
>>> >
>>> > raise se.StorageDomainAccessError(self.sdUUID)
>>> >
>>> > StorageDomainAccessError: Domain is either partially accessible or
>>> > entirely
>>> > inaccessible: (u'03926733-1872-4f85-bb21-18dc320560db',)
>>> >
>>> > Thread-::DEBUG::2016-04-27
>>> > 23:01:27,865::task::885::Storage.TaskManager.Task::(_run)
>>> > Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::Task._run:
>>> > d2acf575-1a60-4fa0-a5bb-cd4363636b94
>>> > ('03926733-1872-4f85-bb21-18dc320560db',) {} failed - stopping task
>>> >
>>> > Thread-::DEBUG::2016-04-27
>>> > 23:01:27,865::task::1246::Storage.TaskManager.Task::(stop)
>>> > Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::stopping in state
>>> > preparing
>>> > (force False)
>>> >
>>> > 

Re: [ovirt-users] hosted-engine --deploy errors out with code "29" -- "no link present"

2016-04-29 Thread Sandro Bonazzola
On Thu, Apr 28, 2016 at 11:06 PM, Beckman, Daniel <
daniel.beck...@ingramcontent.com> wrote:

> Hello,
>
>
>
> I’m trying to set up oVirt for the first time using hosted engine. This is
> on a Dell PowerEdge R720 (512GB RAM), with 2 10G interfaces (connected to
> regular access ports on the switch, DHCP), and using external iSCSI
> storage. This is on CentOS 7.2 (latest) with the 4.5 kernel from EPEL.
> Here’s the main error I’m getting at the end of setup:
>
>
>
> RuntimeError: Failed to setup networks {'ovirtmgmt': {'nic': 'p1p1',
> 'bootproto': 'dhcp', 'blockingdhcp': True, 'defaultRoute': True}}. Error
> code: "29" message: "Determining IP information for ovirtmgmt... failed; no
> link present.  Check cable?"
>


This message comes from vdsm; can you please attach the vdsm log?
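
On the host they should normally be under /var/log/vdsm/. Since the failure is
about the link not coming up in time, it may also be worth measuring how long p1p1
takes to get carrier - a rough sketch, assuming ethtool is installed (adjust the
interface name if needed):

    # locate the vdsm logs to attach
    ls -lt /var/log/vdsm/vdsm.log* /var/log/vdsm/supervdsm.log* | head

    # rough timing of how long the NIC needs before it reports a link
    ip link set p1p1 down; ip link set p1p1 up
    for i in $(seq 1 30); do
        ethtool p1p1 | grep -q 'Link detected: yes' && { echo "carrier after ~${i}s"; break; }
        sleep 1
    done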




>
>
> Here is what that interface ‘p1p1’ looks like:
>
>
>
> [root@labvmhostt01 ovirt-hosted-engine-setup]# cat /etc/sysconfig/network-scripts/ifcfg-p1p1
>
> # Generated by dracut initrd
> DEVICE="p1p1"
> ONBOOT=yes
> UUID="9d2666a5-9b72-4f9e-b4e9-4bfb6ad9b263"
> IPV6INIT=no
> BOOTPROTO=dhcp
> DEFROUTE=yes
> HWADDR="a0:36:9f:33:39:e8"
> TYPE=Ethernet
> NAME="p1p1"
> PERSISTENT_DHCLIENT=1
> NM_CONTROLLED=no
> LINKDELAY=10
>
>
>
> Note that I had added ‘linkdelay=10’ because that interface takes a while
> to come up. Without it, an ‘ifup p1p1’ will generate that same error about
> “no link present. Check cable?”. It works after a second ‘ifup p1p1’. With
> the linkdelay option it works right away. I wonder if that’s related.  From
> /var/log/messages:
>
>
>
> Apr 28 15:24:51 localhost dhclient[5976]: dhclient.c:2680: Failed to bind
> fallback interface to ovirtmgmt: No such device
>
> Apr 28 15:25:01 localhost dhclient[5976]: DHCPREQUEST on ovirtmgmt to
> 10.50.3.2 port 67 (xid=0x6d98d072)
>
> Apr 28 15:25:01 localhost dhclient[5976]: dhclient.c:2680: Failed to bind
> fallback interface to ovirtmgmt: No such device
>
> Apr 28 15:25:06 localhost systemd: Started /usr/sbin/ifup ovirtmgmt.
>
> Apr 28 15:25:06 localhost systemd: Starting /usr/sbin/ifup ovirtmgmt.
>
> Apr 28 15:25:06 localhost kernel: IPv6: ADDRCONF(NETDEV_UP): ovirtmgmt:
> link is not ready
>
> Apr 28 15:25:12 localhost kernel: ovirtmgmt: port 1(p1p1) entered disabled
> state
>
> Apr 28 15:25:48 localhost journal: vdsm vds ERROR Determining IP
> information for ovirtmgmt... failed; no link present.  Check
> cable?#012Traceback (most recent call last):#012  File
> "/usr/share/vdsm/API.py", line 1648, in _rollback#012yield
> rollbackCtx#012  File "/usr/share/vdsm/API.py", line 1500, in
> setupNetworks#012supervdsm.getProxy().setupNetworks(networks, bondings,
> options)#012  File "/usr/share/vdsm/supervdsm.py", line 50, in
> __call__#012return callMethod()#012  File
> "/usr/share/vdsm/supervdsm.py", line 48, in #012**kwargs)#012
> File "", line 2, in setupNetworks#012  File
> "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in
> _callmethod#012raise convert_to_error(kind,
> result)#012ConfigNetworkError: (29, 'Determining IP information for
> ovirtmgmt... failed; no link present.  Check cable?')
>
>
>
> I’m attaching the setup log file. The physical interface p1p1 is indeed
> stable once up. Any help would be appreciated!
>
>
>
> Thanks,
>
> Daniel
>
>


-- 
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com