[ovirt-users] Re: VMs paused due to IO issues - Dell Equallogic controller failover

2019-05-14 Thread InterNetX - Juergen Gotteswinter
You need the EQL Hit Kit to make it work at least somewhat better, but
the Hit Kit requires multipathd to be disabled, and multipathd is a dependency of oVirt.

so far, no real workaround seems to be known

On 06.10.2016 at 09:19, Gary Lloyd wrote:
> I asked on the Dell Storage Forum and they recommend the following:
> 
> I recommend not using a numeric value for the "no_path_retry" variable
> within /etc/multipath.conf as once that numeric value is reached, if no
> healthy LUNs were discovered during that defined time multipath will
> disable the I/O queue altogether.
> 
> I do recommend, however, changing the variable value from "12" (or even
> "60") to "queue" which will then allow multipathd to continue queueing I/O
> until a healthy LUN is discovered (time of fail-over between
> controllers) and I/O is allowed to flow once again.
> 
> Can you see any issues with this recommendation as far as Ovirt is
> concerned ?
> 
> Thanks again
> 
> 
> Gary Lloyd
> 
> I.T. Systems:Keele University
> Finance & IT Directorate
> Keele:Staffs:IC1 Building:ST5 5NB:UK
> +44 1782 733063 
> 
> 
> On 4 October 2016 at 19:11, Nir Soffer wrote:
> 
> On Tue, Oct 4, 2016 at 10:51 AM, Gary Lloyd wrote:
> 
> Hi
> 
> We have Ovirt 3.65 with a Dell Equallogic SAN and we use Direct
> Luns for all our VMs.
> At the weekend during early hours an Equallogic controller
> failed over to its standby on one of our arrays and this caused
> about 20 of our VMs to be paused due to IO problems.
> 
> I have also noticed that this happens during Equallogic firmware
> upgrades since we moved onto Ovirt 3.65.
> 
> As recommended by Dell disk timeouts within the VMs are set to
> 60 seconds when they are hosted on an EqualLogic SAN.
> 
> Is there any other timeout value that we can configure in
> vdsm.conf to stop VMs from getting paused when a controller
> fails over ?
> 
> 
> You can set the timeout in multipath.conf.
> 
> With current multipath configuration (deployed by vdsm), when all
> paths to a device
> are lost (e.g. you take down all ports on the server during
> upgrade), all io will fail
> immediately.
> 
> If you want to allow 60 seconds gracetime in such case, you can
> configure:
> 
> no_path_retry 12
> 
> This will continue to monitor the paths 12 times, each 5 seconds 
> (assuming polling_interval=5). If some path recover during this
> time, the io
> can complete and the vm will not be paused.
> 
> If no path is available after these retries, io will fail and vms
> with pending io
> will pause.
> 
> Note that this will also cause delays in vdsm in various flows,
> increasing the chance
> of timeouts in engine side, or delays in storage domain monitoring.
> 
> However, the 60 seconds delay is expected only on the first time all
> paths become
> faulty. Once the timeout has expired, any access to the device will
> fail immediately.
> 
> To configure this, you must add the # VDSM PRIVATE tag at the second
> line of
> multipath.conf, otherwise vdsm will override your configuration in
> the next time
> you run vdsm-tool configure.
> 
> multipath.conf should look like this:
> 
> # VDSM REVISION 1.3
> # VDSM PRIVATE
> 
> defaults {
>     polling_interval    5
>     no_path_retry       12
>     user_friendly_names no
>     flush_on_last_del   yes
>     fast_io_fail_tmo    5
>     dev_loss_tmo        30
>     max_fds             4096
> }
> 
> devices {
>     device {
>         all_devs        yes
>         no_path_retry   12
>     }
> }
> 
> This will use 12 retries (60 seconds) timeout for any device. If you
> like to 
> configure only your specific device, you can add a device section for
> your specific server instead.
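
For example, a device-specific section for the Dell EqualLogic arrays might look
roughly like this (the vendor/product strings are an assumption, check them against
the output of "multipath -ll"; no_path_retry could also be set to "queue" as per the
Dell recommendation quoted above, at the cost of I/O queueing forever if all paths
stay down):

devices {
    device {
        vendor          "EQLOGIC"
        product         "100E-00"
        no_path_retry   12
    }
}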
>  
> 
> 
> Also is there anything that we can tweak to automatically
> unpause the VMs once connectivity with the arrays is
> re-established ?
> 
> 
> Vdsm will resume the vms when storage monitor detect that storage
> became available again.
> However we cannot guarantee that storage monitoring will detect that
> storage was down.
> This should be improved in 4.0.
>  
> 
> At the moment we are running a customized version of
> storageServer.py, as Ovirt has yet to include iscsi multipath
> support for Direct Luns out of the box.
> 
> 
> Would you like to share this code?
> 
> Nir
> 
> 
> 
> 

Re: [ovirt-users] Moving thin provisioned disks question

2017-06-27 Thread InterNetX - Juergen Gotteswinter

> >
> > Suppose I have one 500Gb thin provisioned disk
> > Why can I indirectly see that the actual size is 300Gb only in Snapshots
> > tab --> Disks of its VM ?
> 
> if you are using live storage migration, ovirt creates a qcow/lvm
> snapshot of the vm block device. but for whatever reason, it does NOT
> remove the snapshot after the migration has finished. you have to remove
> it yourself, otherwise disk usage will grow more and more.
> 
> 
> I believe you are referring to the "Auto-generated" snapshot created
> during live storage migration. This behavior is reported
> in https://bugzilla.redhat.com/1317434 and fixed since 4.0.0.

Yep, that's what I meant. I just wasn't aware that this isn't
the case anymore for 4.x and above. Sorry for the confusion.

> 
> 
> >
> > Thanks,
> > Gianluca
> >
> >
> > ___
> > Users mailing list
> > Users@ovirt.org 
> > http://lists.ovirt.org/mailman/listinfo/users
> 
> >
> ___
> Users mailing list
> Users@ovirt.org 
> http://lists.ovirt.org/mailman/listinfo/users
> 
> 
> 

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Moving thin provisioned disks question

2017-06-27 Thread InterNetX - Juergen Gotteswinter


On 27.06.2017 at 11:27, Gianluca Cecchi wrote:
> Hello,
> I have a storage domain that I have to empty, moving its disks to
> another storage domain,
> 
> Both source and target domains are iSCSI
> What is the behavior in case of preallocated and thin provisioned disk?
> Are they preserved with their initial configuration?

Yes, they keep their initial configuration.

> 
> Suppose I have one 500Gb thin provisioned disk
> Why can I indirectly see that the actual size is 300Gb only in Snapshots
> tab --> Disks of its VM ?

if you are using live storage migration, ovirt creates a qcow/lvm
snapshot of the vm block device. but for whatever reason, it does NOT
remove the snapshot after the migration has finished. you have to remove
it yourself, otherwise disk usage will grow more and more.

> 
> Thanks,
> Gianluca
> 
> 
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
> 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Frustration defines the deployment of Hosted Engine

2017-06-26 Thread InterNetX - Juergen Gotteswinter
> 2. Should I migrate from XenServer to oVirt? This is biased, I know, but
> I would like to hear opinions. The folks with @redhat.com email
> addresses will know how to advocate in favor of oVirt.
> 

In terms of reliability, better stay with XenServer.
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] [HEADS UP] CentOS 7.3 is rolling out, need qemu-kvm-ev 2.6

2016-12-20 Thread InterNetX - Juergen Gotteswinter


On 20.12.2016 at 11:08, InterNetX - Juergen Gotteswinter wrote:
> On 15.12.2016 at 17:26, Paolo Bonzini wrote:
>>
>>
>> On 15/12/2016 16:46, Sandro Bonazzola wrote:
>>>
>>>
>>> On 15/Dec/2016 16:17, "InterNetX - Juergen Gotteswinter"
>>> <j...@internetx.com> wrote:
>>>
>>> On 15.12.2016 at 15:51, Sandro Bonazzola wrote:
>>> >
>>> >
>>> > On Thu, Dec 15, 2016 at 3:02 PM, InterNetX - Juergen Gotteswinter
>>> > <j...@internetx.com> wrote:
>>> >
>>> > i can confirm that it will break ...
>>> >
>>> > Dec 15 14:58:43 vm1 journal: internal error: qemu unexpectedly
>>> closed
>>> > the monitor: Unexpected error in object_property_find() at
>>> > qom/object.c:1003:#0122016-12-15T13:58:43.140073Z qemu-kvm:
>>> can't apply
>>> > global Opteron_G4-x86_64-cpu.x1apic=off: Property '.x1apic'
>>> not found
>>> >
>>> >
>>> > Just an heads up that qemu-kvm-ev 2.6 is now
>>> > in http://mirror.centos.org/centos/7/virt/x86_64/kvm-common/
>>>
>>> [16:16:47][root@vm1:/var/log]$rpm -aq |grep qemu-kvm-ev
>>> qemu-kvm-ev-2.6.0-27.1.el7.x86_64
>>> [16:16:52][root@vm1:/var/log]$
>>>
>>> this message is from 2.6
>>>
>>>
>>> Adding Paolo and Michal.
>>
>> The message is ugly, but that "x1apic" should have read "x2apic".
>>
>> Paolo
>>
> 
> Yep, it seems to be introduced with
> "0002-Add-RHEL-7-machine-types.patch" starting at line 862
> 
> +},\
> +{\
> +.driver = "Opteron_G4" "-" TYPE_X86_CPU,\
> +.property = "x1apic",\
> +.value = "off",\
> +},\
> +{\
> 
> 
> going to test this right now
> 
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
> 

works after fixing the typo & rebuilding
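
For reference, the corrected hunk presumably reads (property name per Paolo's note):

+{\
+.driver = "Opteron_G4" "-" TYPE_X86_CPU,\
+.property = "x2apic",\
+.value = "off",\
+},\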
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] [HEADS UP] CentOS 7.3 is rolling out, need qemu-kvm-ev 2.6

2016-12-20 Thread InterNetX - Juergen Gotteswinter
On 15.12.2016 at 17:26, Paolo Bonzini wrote:
> 
> 
> On 15/12/2016 16:46, Sandro Bonazzola wrote:
>>
>>
>> On 15/Dec/2016 16:17, "InterNetX - Juergen Gotteswinter"
>> <j...@internetx.com> wrote:
>>
>> On 15.12.2016 at 15:51, Sandro Bonazzola wrote:
>> >
>> >
>> > On Thu, Dec 15, 2016 at 3:02 PM, InterNetX - Juergen Gotteswinter
>> > <j...@internetx.com> wrote:
>> >
>> > i can confirm that it will break ...
>> >
>> > Dec 15 14:58:43 vm1 journal: internal error: qemu unexpectedly
>> closed
>> > the monitor: Unexpected error in object_property_find() at
>> > qom/object.c:1003:#0122016-12-15T13:58:43.140073Z qemu-kvm:
>> can't apply
>> > global Opteron_G4-x86_64-cpu.x1apic=off: Property '.x1apic'
>> not found
>> >
>> >
>> > Just an heads up that qemu-kvm-ev 2.6 is now
>> > in http://mirror.centos.org/centos/7/virt/x86_64/kvm-common/
>>
>> [16:16:47][root@vm1:/var/log]$rpm -aq |grep qemu-kvm-ev
>> qemu-kvm-ev-2.6.0-27.1.el7.x86_64
>> [16:16:52][root@vm1:/var/log]$
>>
>> this message is from 2.6
>>
>>
>> Adding Paolo and Michal.
> 
> The message is ugly, but that "x1apic" should have read "x2apic".
> 
> Paolo
> 

Yep, it seems to be introduced with
"0002-Add-RHEL-7-machine-types.patch" starting at line 862

+},\
+{\
+.driver = "Opteron_G4" "-" TYPE_X86_CPU,\
+.property = "x1apic",\
+.value = "off",\
+},\
+{\


going to test this right now

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] [HEADS UP] CentOS 7.3 is rolling out, need qemu-kvm-ev 2.6

2016-12-15 Thread InterNetX - Juergen Gotteswinter
On 15.12.2016 at 16:46, Sandro Bonazzola wrote:
> 
> 
> On 15/Dec/2016 16:17, "InterNetX - Juergen Gotteswinter"
> <j...@internetx.com> wrote:
> 
> On 15.12.2016 at 15:51, Sandro Bonazzola wrote:
> >
> >
> > On Thu, Dec 15, 2016 at 3:02 PM, InterNetX - Juergen Gotteswinter
> > <j...@internetx.com> wrote:
> >
> > i can confirm that it will break ...
> >
> > Dec 15 14:58:43 vm1 journal: internal error: qemu unexpectedly
> closed
> > the monitor: Unexpected error in object_property_find() at
> > qom/object.c:1003:#0122016-12-15T13:58:43.140073Z qemu-kvm:
> can't apply
> > global Opteron_G4-x86_64-cpu.x1apic=off: Property '.x1apic'
> not found
> >
> >
> > Just an heads up that qemu-kvm-ev 2.6 is now
> > in http://mirror.centos.org/centos/7/virt/x86_64/kvm-common/
> 
> [16:16:47][root@vm1:/var/log]$rpm -aq |grep qemu-kvm-ev
> qemu-kvm-ev-2.6.0-27.1.el7.x86_64
> [16:16:52][root@vm1:/var/log]$
> 
> this message is from 2.6
> 
> 
> Adding Paolo and Michal.

Sorry, there's a little bit more in the startup log which might be helpful:

Unexpected error in object_property_find() at qom/object.c:1003:
2016-12-15T13:58:43.140073Z qemu-kvm: can't apply global
Opteron_G4-x86_64-cpu.x1apic=off: Property '.x1apic' not found


the complete startup parameters in that case are

LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name
guest=jg123_vm1_loadtest,debug-threads=on -S -object
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-jg123_vm1_loadtest/master-key.aes
-machine rhel6.5.0,accel=kvm,usb=off -cpu Opteron_G4 -m 65536 -realtime
mlock=off -smp 8,maxcpus=64,sockets=16,cores=4,threads=1 -uuid
20047459-7e48-4160-ac77-0e26a4f99472 -smbios
'type=1,manufacturer=oVirt,product=oVirt
Node,version=7-3.1611.el7.centos,serial=4C4C4544-0039-3310-8043-B2C04F463032,uuid=20047459-7e48-4160-ac77-0e26a4f99472'
-no-user-config -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-2-jg123_vm1_loadtest/monitor.sock,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc
base=2016-12-15T13:58:41,driftfix=slew -global
kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -device
virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x5
-drive if=none,id=drive-ide0-1-0,readonly=on -device
ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive
file=/rhev/data-center/0002-0002-0002-0002-02f7/d5b56ea4-782e-4002-bb9a-478b337b5c9f/images/f022eca0-1af3-43ad-acad-4731ceceed3e/94b35a95-c80b-434c-afe7-e8ab4391395c,format=qcow2,if=none,id=drive-scsi0-0-0-0,serial=f022eca0-1af3-43ad-acad-4731ceceed3e,cache=none,werror=stop,rerror=stop,aio=native
-device
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
-netdev tap,fd=29,id=hostnet0,vhost=on,vhostfd=30 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:5e:43:04,bus=pci.0,addr=0x3,bootindex=2
-chardev
socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/20047459-7e48-4160-ac77-0e26a4f99472.com.redhat.rhevm.vdsm,server,nowait
-device
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm
-chardev
socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/20047459-7e48-4160-ac77-0e26a4f99472.org.qemu.guest_agent.0,server,nowait
-device
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0
-chardev spicevmc,id=charchannel2,name=vdagent -device
virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0
-spice
tls-port=5900,addr=192.168.210.80,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=default,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on
-k en-us -device
qxl-vga,id=video0,ram_size=67108864,vram_size=33554432,vram64_size_mb=0,vgamem_mb=16,bus=pci.0,addr=0x2
-msg timestamp=on



> 
> 
> 
> 
> 
> >
> >
> >
> >
> > cheers,
> >
> > Juergen
> >
> > Am 13.12.2016 

Re: [ovirt-users] [HEADS UP] CentOS 7.3 is rolling out, need qemu-kvm-ev 2.6

2016-12-15 Thread InterNetX - Juergen Gotteswinter
On 15.12.2016 at 15:51, Sandro Bonazzola wrote:
> 
> 
> On Thu, Dec 15, 2016 at 3:02 PM, InterNetX - Juergen Gotteswinter
> <j...@internetx.com> wrote:
> 
> i can confirm that it will break ...
> 
> Dec 15 14:58:43 vm1 journal: internal error: qemu unexpectedly closed
> the monitor: Unexpected error in object_property_find() at
> qom/object.c:1003:#0122016-12-15T13:58:43.140073Z qemu-kvm: can't apply
> global Opteron_G4-x86_64-cpu.x1apic=off: Property '.x1apic' not found
> 
> 
> Just an heads up that qemu-kvm-ev 2.6 is now
> in http://mirror.centos.org/centos/7/virt/x86_64/kvm-common/

[16:16:47][root@vm1:/var/log]$rpm -aq |grep qemu-kvm-ev
qemu-kvm-ev-2.6.0-27.1.el7.x86_64
[16:16:52][root@vm1:/var/log]$

this message is from 2.6

> 
> 
>  
> 
> cheers,
> 
> Juergen
> 
> On 13.12.2016 at 10:30, Ralf Schenk wrote:
> > Hello
> >
> > by browsing the repository on
> > http://mirror.centos.org/centos/7/virt/x86_64/kvm-common/
> > I can't see
> > any qemu-kvm-ev-2.6.* RPM.
> >
> > I think this will break if I update the Ovirt-Hosts...
> >
> > [root@microcloud21 yum.repos.d]# yum check-update | grep libvirt
> > libvirt.x86_64  2.0.0-10.el7_3.2
> > updates
> > libvirt-client.x86_64   2.0.0-10.el7_3.2
> > updates
> > libvirt-daemon.x86_64   2.0.0-10.el7_3.2
> > updates
> > libvirt-daemon-config-network.x86_642.0.0-10.el7_3.2
> > updates
> > libvirt-daemon-config-nwfilter.x86_64   2.0.0-10.el7_3.2
> > updates
> > libvirt-daemon-driver-interface.x86_64  2.0.0-10.el7_3.2
> > updates
> > libvirt-daemon-driver-lxc.x86_642.0.0-10.el7_3.2
> > updates
> > libvirt-daemon-driver-network.x86_642.0.0-10.el7_3.2
> > updates
> > libvirt-daemon-driver-nodedev.x86_642.0.0-10.el7_3.2
> > updates
> > libvirt-daemon-driver-nwfilter.x86_64   2.0.0-10.el7_3.2
> > updates
> > libvirt-daemon-driver-qemu.x86_64   2.0.0-10.el7_3.2
> > updates
> > libvirt-daemon-driver-secret.x86_64 2.0.0-10.el7_3.2
> > updates
> > libvirt-daemon-driver-storage.x86_642.0.0-10.el7_3.2
> > updates
> > libvirt-daemon-kvm.x86_64   2.0.0-10.el7_3.2
> > updates
> > libvirt-lock-sanlock.x86_64 2.0.0-10.el7_3.2
> > updates
> > libvirt-python.x86_64   2.0.0-2.el7
> > base
> >
> > [root@microcloud21 yum.repos.d]# yum check-update | grep qemu*
> > ipxe-roms-qemu.noarch   20160127-5.git6366fa7a.el7
> > base
> > libvirt-daemon-driver-qemu.x86_64   2.0.0-10.el7_3.2
> > updates
> >
> >
> > On 13.12.2016 at 08:43, Sandro Bonazzola wrote:
> >>
> >>
> >> On Mon, Dec 12, 2016 at 6:38 PM, Chris Adams <c...@cmadams.net> wrote:
> >>
> >> Once upon a time, Sandro Bonazzola <sbona...@redhat.com> said:
> >> > In terms of ovirt repositories, qemu-kvm-ev 2.6 is available
> >> right now in
> >> > ovirt-master-snapshot-static, ovirt-4.0-snapshot-static, and
> >> ovirt-4.0-pre
> >> > (contains 4.0.6 RC4 rpms going to be announced in a few minutes.)
> >>
> >> Will qemu-kvm-ev 2.6 be added to any of the oVirt repos for prior
> >> versions (such as 3.5 or 3.6)?
> >>
> >>
> >> You can enable CentOS Virt SIG repo by running "yum install
> >> centos-release-qemu-ev" on your CentOS 7 systems.
> >> and you'll have updated qemu-kvm-ev.
> >>
> >>
> >>
> >> --
> >> Chris Adams <c...@cmadams.net>
> >> ___
> >> Users mailing list
> >> Users@ovirt.org <m

Re: [ovirt-users] [HEADS UP] CentOS 7.3 is rolling out, need qemu-kvm-ev 2.6

2016-12-15 Thread InterNetX - Juergen Gotteswinter
i can confirm that it will break ...

Dec 15 14:58:43 vm1 journal: internal error: qemu unexpectedly closed
the monitor: Unexpected error in object_property_find() at
qom/object.c:1003:#0122016-12-15T13:58:43.140073Z qemu-kvm: can't apply
global Opteron_G4-x86_64-cpu.x1apic=off: Property '.x1apic' not found

cheers,

Juergen

On 13.12.2016 at 10:30, Ralf Schenk wrote:
> Hello
> 
> by browsing the repository on
> http://mirror.centos.org/centos/7/virt/x86_64/kvm-common/ I can't see
> any qemu-kvm-ev-2.6.* RPM.
> 
> I think this will break if I update the Ovirt-Hosts...
> 
> [root@microcloud21 yum.repos.d]# yum check-update | grep libvirt
> libvirt.x86_64  2.0.0-10.el7_3.2
> updates
> libvirt-client.x86_64   2.0.0-10.el7_3.2
> updates
> libvirt-daemon.x86_64   2.0.0-10.el7_3.2
> updates
> libvirt-daemon-config-network.x86_642.0.0-10.el7_3.2
> updates
> libvirt-daemon-config-nwfilter.x86_64   2.0.0-10.el7_3.2
> updates
> libvirt-daemon-driver-interface.x86_64  2.0.0-10.el7_3.2
> updates
> libvirt-daemon-driver-lxc.x86_642.0.0-10.el7_3.2
> updates
> libvirt-daemon-driver-network.x86_642.0.0-10.el7_3.2
> updates
> libvirt-daemon-driver-nodedev.x86_642.0.0-10.el7_3.2
> updates
> libvirt-daemon-driver-nwfilter.x86_64   2.0.0-10.el7_3.2
> updates
> libvirt-daemon-driver-qemu.x86_64   2.0.0-10.el7_3.2
> updates
> libvirt-daemon-driver-secret.x86_64 2.0.0-10.el7_3.2
> updates
> libvirt-daemon-driver-storage.x86_642.0.0-10.el7_3.2
> updates
> libvirt-daemon-kvm.x86_64   2.0.0-10.el7_3.2
> updates
> libvirt-lock-sanlock.x86_64 2.0.0-10.el7_3.2
> updates
> libvirt-python.x86_64   2.0.0-2.el7 
> base
> 
> [root@microcloud21 yum.repos.d]# yum check-update | grep qemu*
> ipxe-roms-qemu.noarch   20160127-5.git6366fa7a.el7  
> base
> libvirt-daemon-driver-qemu.x86_64   2.0.0-10.el7_3.2
> updates
> 
> 
> On 13.12.2016 at 08:43, Sandro Bonazzola wrote:
>>
>>
>> On Mon, Dec 12, 2016 at 6:38 PM, Chris Adams wrote:
>>
>> Once upon a time, Sandro Bonazzola said:
>> > In terms of ovirt repositories, qemu-kvm-ev 2.6 is available
>> right now in
>> > ovirt-master-snapshot-static, ovirt-4.0-snapshot-static, and
>> ovirt-4.0-pre
>> > (contains 4.0.6 RC4 rpms going to be announced in a few minutes.)
>>
>> Will qemu-kvm-ev 2.6 be added to any of the oVirt repos for prior
>> versions (such as 3.5 or 3.6)?
>>
>>
>> You can enable CentOS Virt SIG repo by running "yum install
>> centos-release-qemu-ev" on your CentOS 7 systems.
>> and you'll have updated qemu-kvm-ev.
>>
>>  
>>
>> --
>> Chris Adams
>> ___
>> Users mailing list
>> Users@ovirt.org 
>> http://lists.phx.ovirt.org/mailman/listinfo/users
>> 
>>
>>
>>
>>
>> -- 
>> Sandro Bonazzola
>> Better technology. Faster innovation. Powered by community collaboration.
>> See how it works at redhat.com 
>>
>>
>> ___
>> Users mailing list
>> Users@ovirt.org
>> http://lists.phx.ovirt.org/mailman/listinfo/users
> 
> -- 
> 
> 
> *Ralf Schenk*
> fon +49 (0) 24 05 / 40 83 70
> fax +49 (0) 24 05 / 40 83 759
> mail *r...@databay.de* 
>   
> *Databay AG*
> Jens-Otto-Krag-Straße 11
> D-52146 Würselen
> *www.databay.de* 
> 
> Sitz/Amtsgericht Aachen • HRB:8437 • USt-IdNr.: DE 210844202
> Vorstand: Ralf Schenk, Dipl.-Ing. Jens Conze, Aresch Yavari, Dipl.-Kfm.
> Philipp Hermanns
> Aufsichtsratsvorsitzender: Wilhelm Dohmen
> 
> 
> 
> 
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.phx.ovirt.org/mailman/listinfo/users
> 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] VMs paused due to IO issues - Dell Equallogic controller failover

2016-10-06 Thread InterNetX - Juergen Gotteswinter
You need the EQL Hit Kit to make it work at least somewhat better, but
the Hit Kit requires multipathd to be disabled, and multipathd is a dependency of oVirt.

so far, no real workaround seems to be known

On 06.10.2016 at 09:19, Gary Lloyd wrote:
> I asked on the Dell Storage Forum and they recommend the following:
> 
> I recommend not using a numeric value for the "no_path_retry" variable
> within /etc/multipath.conf as once that numeric value is reached, if no
> healthy LUNs were discovered during that defined time multipath will
> disable the I/O queue altogether.
> 
> I do recommend, however, changing the variable value from "12" (or even
> "60") to "queue" which will then allow multipathd to continue queueing I/O
> until a healthy LUN is discovered (time of fail-over between
> controllers) and I/O is allowed to flow once again.
> 
> Can you see any issues with this recommendation as far as Ovirt is
> concerned ?
> 
> Thanks again
> 
> 
> Gary Lloyd
> 
> I.T. Systems:Keele University
> Finance & IT Directorate
> Keele:Staffs:IC1 Building:ST5 5NB:UK
> +44 1782 733063 
> 
> 
> On 4 October 2016 at 19:11, Nir Soffer wrote:
> 
> On Tue, Oct 4, 2016 at 10:51 AM, Gary Lloyd wrote:
> 
> Hi
> 
> We have Ovirt 3.65 with a Dell Equallogic SAN and we use Direct
> Luns for all our VMs.
> At the weekend during early hours an Equallogic controller
> failed over to its standby on one of our arrays and this caused
> about 20 of our VMs to be paused due to IO problems.
> 
> I have also noticed that this happens during Equallogic firmware
> upgrades since we moved onto Ovirt 3.65.
> 
> As recommended by Dell disk timeouts within the VMs are set to
> 60 seconds when they are hosted on an EqualLogic SAN.
> 
> Is there any other timeout value that we can configure in
> vdsm.conf to stop VMs from getting paused when a controller
> fails over ?
> 
> 
> You can set the timeout in multipath.conf.
> 
> With current multipath configuration (deployed by vdsm), when all
> paths to a device
> are lost (e.g. you take down all ports on the server during
> upgrade), all io will fail
> immediately.
> 
> If you want to allow 60 seconds gracetime in such case, you can
> configure:
> 
> no_path_retry 12
> 
> This will continue to monitor the paths 12 times, each 5 seconds 
> (assuming polling_interval=5). If some path recover during this
> time, the io
> can complete and the vm will not be paused.
> 
> If no path is available after these retries, io will fail and vms
> with pending io
> will pause.
> 
> Note that this will also cause delays in vdsm in various flows,
> increasing the chance
> of timeouts in engine side, or delays in storage domain monitoring.
> 
> However, the 60 seconds delay is expected only on the first time all
> paths become
> faulty. Once the timeout has expired, any access to the device will
> fail immediately.
> 
> To configure this, you must add the # VDSM PRIVATE tag at the second
> line of
> multipath.conf, otherwise vdsm will override your configuration in
> the next time
> you run vdsm-tool configure.
> 
> multipath.conf should look like this:
> 
> # VDSM REVISION 1.3
> # VDSM PRIVATE
> 
> defaults {
>     polling_interval    5
>     no_path_retry       12
>     user_friendly_names no
>     flush_on_last_del   yes
>     fast_io_fail_tmo    5
>     dev_loss_tmo        30
>     max_fds             4096
> }
> 
> devices {
>     device {
>         all_devs        yes
>         no_path_retry   12
>     }
> }
> 
> This will use 12 retries (60 seconds) timeout for any device. If you
> like to 
> configure only your specific device, you can add a device section for
> your specific server instead.
>  
> 
> 
> Also is there anything that we can tweak to automatically
> unpause the VMs once connectivity with the arrays is
> re-established ?
> 
> 
> Vdsm will resume the vms when storage monitor detect that storage
> became available again.
> However we cannot guarantee that storage monitoring will detect that
> storage was down.
> This should be improved in 4.0.
>  
> 
> At the moment we are running a customized version of
> storageServer.py, as Ovirt has yet to include iscsi multipath
> support for Direct Luns out of the box.
> 
> 
> Would you like to share this code?
> 
> Nir
> 
> 
> 
> 

Re: [ovirt-users] iSCSI Multipathing -> host inactive

2016-08-29 Thread InterNetX - Juergen Gotteswinter
On 29.08.2016 at 12:25, Nir Soffer wrote:
> On Thu, Aug 25, 2016 at 2:37 PM, InterNetX - Juergen Gotteswinter
> <j...@internetx.com> wrote:
>> currently, iscsi multipathed with solaris based filer as backend. but
>> this is already in progress of getting migrated to a different, less
>> fragile, plattform. ovirt is nice, but too bleeding edge and way to much
>> acting like a girly
> 
> "acting like a girly" is not  appropriate  for this list.
> 
> Nir
> 

I am sorry, this was never meant to discriminate against anyone. If it did, I
promise that it was not meant to.
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] iSCSI Multipathing -> host inactive

2016-08-26 Thread InterNetX - Juergen Gotteswinter
One more thing, which I am sure most people are not aware of.

When using thin provisioned disks for VMs hosted on an iSCSI SAN, oVirt
uses what is, to me, an unusual approach.

oVirt adds a new LVM LV for a VM and generates a thin qcow image which is
written directly onto that raw LV. So far, OK, it can be done like this.

But try generating some write load from within the guest and see for
yourself what will happen. Support's answer to this is: use raw, without
thin provisioning.

This seems to me like a wrong design decision, and I can only warn everyone
against using it.
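
You can check this yourself on a host; a rough sketch, with placeholder UUIDs
(the VG name is the storage domain UUID, the LV name is the volume UUID):

# list the LVs of the block storage domain
lvs -o vg_name,lv_name,lv_size d5b56ea4-782e-4002-bb9a-478b337b5c9f

# inspect the content of a thin provisioned disk's LV
qemu-img info /dev/d5b56ea4-782e-4002-bb9a-478b337b5c9f/94b35a95-c80b-434c-afe7-e8ab4391395c
# -> reports "file format: qcow2" although the LV itself is a plain block device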

On 26.08.2016 at 12:33, InterNetX - Juergen Gotteswinter wrote:
> 
> 
> On 25.08.2016 at 15:53, Yaniv Kaul wrote:
>>
>>
>> On Wed, Aug 24, 2016 at 6:15 PM, InterNetX - Juergen Gotteswinter
>> <juergen.gotteswin...@internetx.com> wrote:
>>
>> iSCSI & Ovirt is an awful combination, no matter if multipathed or
>> bonded. its always gambling how long it will work, and when it fails why
>> did it fail.
>>
>>
>> I disagree. In most cases, it's actually a lower layer issues. In most
>> cases, btw, it's because multipathing was not configured (or not
>> configured correctly).
>>  
> 
> Experience tells me it is like I said; this was something I have seen
> from 3.0 up to 3.6. oVirt, and - surprise - RHEV. Both act the same
> way. I am absolutely aware of multipath configurations, iSCSI multipathing
> is very widespread in use in our DC. But such problems are an exclusive
> oVirt/RHEV feature.
> 
>>
>>
>> its supersensitive to latency, and superfast with setting an host to
>> inactive because the engine thinks something is wrong with it. in most
>> cases there was no real reason for.
>>
>>
>> Did you open bugs for those issues? I'm not aware of 'no real reason'
>> issues.
>>  
> 
> Support tickets for the RHEV installation; after support (even after massive
> escalation requests) kept telling me the same thing again and again, I gave up
> and we dropped the RHEV subscriptions to migrate the VMs to a different
> platform solution (still iSCSI backend). Problems gone.
> 
> 
>>
>>
>> we had this in several different hardware combinations, self built
>> filers up on FreeBSD/Illumos & ZFS, Equallogic SAN, Nexenta Filer
>>
>> Been there, done that, wont do again.
>>
>>
>> We've had good success and reliability with most enterprise level
>> storage, such as EMC, NetApp, Dell filers.
>> When properly configured, of course.
>> Y.
>>
> 
> Dell Equallogic? Can't really believe it, since oVirt/RHEV and the
> Equallogic network configuration won't play nice together (EQL wants all
> interfaces in the same subnet). And they only work as expected when
> their Hit Kit driver package is installed. Without it, path failover is like
> Russian roulette. But oVirt hates the Hit Kit, so this combo ends up in
> a huge mess, because oVirt makes changes to iSCSI, as well as the Hit Kit
> -> kaboom, host not available.
> 
> There are several KB articles in the RHN, without a real solution.
> 
> 
> But as you try to suggest between the lines, this must be the customer's
> misconfiguration. Yep, a typical support-killer answer. Same style as in
> RHN tickets; I am done with this.
> 
> Thanks.
> 
>>  
>>
>>
>> On 24.08.2016 at 16:04, Uwe Laverenz wrote:
>> > Hi Elad,
>> >
>> > thank you very much for clearing things up.
>> >
>> > Initiator/iface 'a' tries to connect target 'b' and vice versa. As 'a'
>> > and 'b' are in completely separate networks this can never work as
>> long
>> > as there is no routing between the networks.
>> >
>> > So it seems the iSCSI-bonding feature is not useful for my setup. I
>> > still wonder how and where this feature is supposed to be used?
>> >
>> > thank you,
>> > Uwe
>> >
>> >> On 24.08.2016 at 15:35, Elad Ben Aharon wrote:
>> >> Thanks.
>> >>
>> >> You're getting an iSCSI connection timeout [1], [2]. It means the
>> host
>> >> cannot connect to the targets from iface: enp9s0f1 nor iface:
>> enp9s0f0.
>> >>
>> >> This causes the host to loose its connection to the storage and also,
>> >> the connection to the engine becomes inactive. Therefore, the host
>> >> changes its status to Non-responsive [3] and since it's the SPM, the
>> >> whole DC, with all its storage domains b

Re: [ovirt-users] iSCSI Multipathing -> host inactive

2016-08-25 Thread InterNetX - Juergen Gotteswinter


On 25.08.2016 at 08:42, Uwe Laverenz wrote:
> Hi Jürgen,
> 
> On 24.08.2016 at 17:15, InterNetX - Juergen Gotteswinter wrote:
>> iSCSI & Ovirt is an awful combination, no matter if multipathed or
>> bonded. its always gambling how long it will work, and when it fails why
>> did it fail.
>>
>> its supersensitive to latency, and superfast with setting an host to
>> inactive because the engine thinks something is wrong with it. in most
>> cases there was no real reason for.
>>
>> we had this in several different hardware combinations, self built
>> filers up on FreeBSD/Illumos & ZFS, Equallogic SAN, Nexenta Filer
>>
>> Been there, done that, wont do again.
> 
> Thank you, I take this as a warning. :)
> 
> For my testbed I chose to ignore the iSCSI-bond feature and change the
> multipath default to round robin instead.
> 
> What kind of storage do you use in production? Fibre channel, gluster,
> ceph, ...?

>>
>> we had this in several different hardware combinations, self built
>> filers up on FreeBSD/Illumos & ZFS, Equallogic SAN, Nexenta Filer

currently, iscsi multipathed with solaris based filer as backend. but
this is already in progress of getting migrated to a different, less
fragile, plattform. ovirt is nice, but too bleeding edge and way to much
acting like a girly

> 
> thanks,
> Uwe
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] iSCSI Multipathing -> host inactive

2016-08-24 Thread InterNetX - Juergen Gotteswinter
iSCSI & oVirt is an awful combination, no matter if multipathed or
bonded. It's always gambling how long it will work, and when it fails, why
it failed.

It's supersensitive to latency, and superfast at setting a host to
inactive because the engine thinks something is wrong with it. In most
cases there was no real reason for it.

We had this in several different hardware combinations: self-built
filers on FreeBSD/Illumos & ZFS, Equallogic SAN, Nexenta filer.

Been there, done that, won't do again.

On 24.08.2016 at 16:04, Uwe Laverenz wrote:
> Hi Elad,
> 
> thank you very much for clearing things up.
> 
> Initiator/iface 'a' tries to connect target 'b' and vice versa. As 'a'
> and 'b' are in completely separate networks this can never work as long
> as there is no routing between the networks.
> 
> So it seems the iSCSI-bonding feature is not useful for my setup. I
> still wonder how and where this feature is supposed to be used?
> 
> thank you,
> Uwe
> 
> On 24.08.2016 at 15:35, Elad Ben Aharon wrote:
>> Thanks.
>>
>> You're getting an iSCSI connection timeout [1], [2]. It means the host
>> cannot connect to the targets from iface: enp9s0f1 nor iface: enp9s0f0.
>>
>> This causes the host to loose its connection to the storage and also,
>> the connection to the engine becomes inactive. Therefore, the host
>> changes its status to Non-responsive [3] and since it's the SPM, the
>> whole DC, with all its storage domains become inactive.
>>
>>
>> vdsm.log:
>> [1]
>> Traceback (most recent call last):
>>   File "/usr/share/vdsm/storage/hsm.py", line 2400, in
>> connectStorageServer
>> conObj.connect()
>>   File "/usr/share/vdsm/storage/storageServer.py", line 508, in connect
>> iscsi.addIscsiNode(self._iface, self._target, self._cred)
>>   File "/usr/share/vdsm/storage/iscsi.py", line 204, in addIscsiNode
>> iscsiadm.node_login(iface.name , portalStr,
>> target.iqn)
>>   File "/usr/share/vdsm/storage/iscsiadm.py", line 336, in node_login
>> raise IscsiNodeError(rc, out, err)
>> IscsiNodeError: (8, ['Logging in to [iface: enp9s0f0, target:
>> iqn.2005-10.org.freenas.ctl:tgtb, portal: 10.0.132.121,3260]
>> (multiple)'], ['iscsiadm: Could not login to [iface: enp9s0f0, targ
>> et: iqn.2005-10.org.freenas.ctl:tgtb, portal: 10.0.132.121,3260].',
>> 'iscsiadm: initiator reported error (8 - connection timed out)',
>> 'iscsiadm: Could not log into all portals'])
>>
>>
>>
>> vdsm.log:
>> [2]
>> Traceback (most recent call last):
>>   File "/usr/share/vdsm/storage/hsm.py", line 2400, in
>> connectStorageServer
>> conObj.connect()
>>   File "/usr/share/vdsm/storage/storageServer.py", line 508, in connect
>> iscsi.addIscsiNode(self._iface, self._target, self._cred)
>>   File "/usr/share/vdsm/storage/iscsi.py", line 204, in addIscsiNode
>> iscsiadm.node_login(iface.name , portalStr,
>> target.iqn)
>>   File "/usr/share/vdsm/storage/iscsiadm.py", line 336, in node_login
>> raise IscsiNodeError(rc, out, err)
>> IscsiNodeError: (8, ['Logging in to [iface: enp9s0f1, target:
>> iqn.2005-10.org.freenas.ctl:tgta, portal: 10.0.131.121,3260]
>> (multiple)'], ['iscsiadm: Could not login to [iface: enp9s0f1, target:
>> iqn.2005-10.org.freenas.ctl:tgta, portal: 10.0.131.121,3260].',
>> 'iscsiadm: initiator reported error (8 - connection timed out)',
>> 'iscsiadm: Could not log into all portals'])
>>
>>
>> engine.log:
>> [3]
>>
>>
>> 2016-08-24 14:10:23,222 WARN
>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>> (default task-25) [15d1637f] Correlation ID: 15d1637f, Call Stack: null,
>> Custom Event ID:
>>  -1, Message: iSCSI bond 'iBond' was successfully created in Data Center
>> 'Default' but some of the hosts encountered connection issues.
>>
>>
>>
>> 2016-08-24 14:10:23,208 INFO
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand]
>>
>> (org.ovirt.thread.pool-8-thread-25) [15d1637f] Command
>> 'org.ovirt.engine.core.vdsbrok
>> er.vdsbroker.ConnectStorageServerVDSCommand' return value '
>> ServerConnectionStatusReturnForXmlRpc:{status='StatusForXmlRpc
>> [code=5022, message=Message timeout which can be caused by communication
>> issues]'}
>>
>>
>>
>> On Wed, Aug 24, 2016 at 4:04 PM, Uwe Laverenz wrote:
>>
>> Hi Elad,
>>
>> I sent you a download message.
>>
>> thank you,
>> Uwe
>> ___
>> Users mailing list
>> Users@ovirt.org 
>> http://lists.ovirt.org/mailman/listinfo/users
>> 
>>
>>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Sanlock add Lockspace Errors

2016-06-04 Thread InterNetX - Juergen Gotteswinter
On 6/3/2016 at 6:37 PM, Nir Soffer wrote:
> On Fri, Jun 3, 2016 at 11:27 AM, InterNetX - Juergen Gotteswinter
> <juergen.gotteswin...@internetx.com> wrote:
>> What if we move all vm off the lun which causes this error, drop the lun
>> and recreated it. Will we "migrate" the error with the VM to a different
>> lun or could this be a fix?
> 
> This should fix the ids file, but since we don't know why this corruption
> happened, it may happen again.
>

I am pretty sure I know when/why this happened: after a major outage
with the engine going crazy fencing hosts, plus a crash/hard reset of the SAN,
these messages occurred for the first time.

but i can provide a log package, no problem


> Please open a bug with the log I requested so we can investigate this issue.
> 
> To fix the ids file you don't have to recreate the lun, just
> initialize the ids lv.
> 
> 1. Put the domain to maintenance (via engine)
> 
> No host should access it while you reconstruct the ids file
> 
> 2. Activate the ids lv
> 
> You may need to connect to this iscsi target first, unless you have other
> vgs connected on the same target.
> 
> lvchange -ay sd_uuid/ids
> 
> 3. Initialize the lockspace
> 
> sanlock direct init -s sd_uuid:0:/dev/sd_uuid/ids:0
> 
> 4. Deactivate the ids lv
> 
> lvchange -an sd_uuid/ids
> 
> 6. Activate the domain (via engine)
> 
> The domain should become active after a while.
> 

Oh, this is great, going to announce a maintenance window. Thanks a lot,
this already started to drive me crazy. Will report after we did this!
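
For the archive, a minimal consolidated sketch of the steps above (the UUID is only a
placeholder for the storage domain UUID; the domain has to be in maintenance and no
host may access it while doing this):

SD_UUID=f757b127-a951-4fa9-bf90-81180c0702e6   # placeholder, use your own storage domain UUID

# 1. activate the ids LV (connect to the iSCSI target first if needed)
lvchange -ay "$SD_UUID/ids"

# 2. re-initialize the sanlock lockspace on the ids LV
sanlock direct init -s "$SD_UUID:0:/dev/$SD_UUID/ids:0"

# 3. deactivate the ids LV again
lvchange -an "$SD_UUID/ids"

# 4. activate the storage domain again via the engine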

> Nir
> 
>>
>> On 6/3/2016 at 10:08 AM, InterNetX - Juergen Gotteswinter wrote:
>>> Hello David,
>>>
>>> thanks for your explanation of those messages, is there any possibility
>>> to get rid of this? i already figured out that it might be an corruption
>>> of the ids file, but i didnt find anything about re-creating or other
>>> solutions to fix this.
>>>
>>> Imho this occoured after an outage where several hosts, and the iscsi
>>> SAN has been fenced and/or rebooted.
>>>
>>> Thanks,
>>>
>>> Juergen
>>>
>>>
>>> On 6/2/2016 at 6:03 PM, David Teigland wrote:
>>>> On Thu, Jun 02, 2016 at 06:47:37PM +0300, Nir Soffer wrote:
>>>>>> This is a mess that's been caused by improper use of storage, and various
>>>>>> sanity checks in sanlock have all reported errors for "impossible"
>>>>>> conditions indicating that something catastrophic has been done to the
>>>>>> storage it's using.  Some fundamental rules are not being followed.
>>>>>
>>>>> Thanks David.
>>>>>
>>>>> Do you need more output from sanlock to understand this issue?
>>>>
>>>> I can think of nothing more to learn from sanlock.  I'd suggest tighter,
>>>> higher level checking or control of storage.  Low level sanity checks
>>>> detecting lease corruption are not a convenient place to work from.
>>>>
>>>
>>> ___
>>> Users mailing list
>>> Users@ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/users
>>>

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Sanlock add Lockspace Errors

2016-06-03 Thread InterNetX - Juergen Gotteswinter
What if we move all VMs off the LUN which causes this error, drop the LUN
and recreate it? Will we "migrate" the error with the VMs to a different
LUN, or could this be a fix?

On 6/3/2016 at 10:08 AM, InterNetX - Juergen Gotteswinter wrote:
> Hello David,
> 
> thanks for your explanation of those messages, is there any possibility
> to get rid of this? i already figured out that it might be an corruption
> of the ids file, but i didnt find anything about re-creating or other
> solutions to fix this.
> 
> Imho this occoured after an outage where several hosts, and the iscsi
> SAN has been fenced and/or rebooted.
> 
> Thanks,
> 
> Juergen
> 
> 
> On 6/2/2016 at 6:03 PM, David Teigland wrote:
>> On Thu, Jun 02, 2016 at 06:47:37PM +0300, Nir Soffer wrote:
>>>> This is a mess that's been caused by improper use of storage, and various
>>>> sanity checks in sanlock have all reported errors for "impossible"
>>>> conditions indicating that something catastrophic has been done to the
>>>> storage it's using.  Some fundamental rules are not being followed.
>>>
>>> Thanks David.
>>>
>>> Do you need more output from sanlock to understand this issue?
>>
>> I can think of nothing more to learn from sanlock.  I'd suggest tighter,
>> higher level checking or control of storage.  Low level sanity checks
>> detecting lease corruption are not a convenient place to work from.
>>
> 
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
> 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Sanlock add Lockspace Errors

2016-06-03 Thread InterNetX - Juergen Gotteswinter
Hello David,

thanks for your explanation of those messages. Is there any possibility
to get rid of this? I already figured out that it might be a corruption
of the ids file, but I didn't find anything about re-creating it or other
solutions to fix this.

IMHO this occurred after an outage where several hosts, and the iSCSI
SAN, were fenced and/or rebooted.

Thanks,

Juergen


On 6/2/2016 at 6:03 PM, David Teigland wrote:
> On Thu, Jun 02, 2016 at 06:47:37PM +0300, Nir Soffer wrote:
>>> This is a mess that's been caused by improper use of storage, and various
>>> sanity checks in sanlock have all reported errors for "impossible"
>>> conditions indicating that something catastrophic has been done to the
>>> storage it's using.  Some fundamental rules are not being followed.
>>
>> Thanks David.
>>
>> Do you need more output from sanlock to understand this issue?
> 
> I can think of nothing more to learn from sanlock.  I'd suggest tighter,
> higher level checking or control of storage.  Low level sanity checks
> detecting lease corruption are not a convenient place to work from.
> 

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] What recovers a VM from pause?

2016-05-30 Thread InterNetX - Juergen Gotteswinter
On 5/30/2016 at 3:59 PM, Nicolas Ecarnot wrote:
> On 30/05/2016 15:30, InterNetX - Juergen Gotteswinter wrote:
>> Hi,
>>
>> you are aware of the fact that eql sync replication is just about
>> replication, no single piece of high availability? i am not even sure if
>> it does ip failover itself. so better think about minutes of
>> interruptions than seconds.
> 
> Hi Juergen,
> 
> I'm absolutely aware that there is no HA discussed here, at least in my
> mind.
> It does ip fail-over, but I'm not even blindly trusting it enough,
> that's why I'm doing numerous tests and measures.
> I'm gladly surprised by how the iSCSI stack is reacting, and its log
> files are readable enough for me to decide.
> 
> Actually, I was more worrying about the iSCSI reconnection storm, but
> googling about it does not seem to get any warnings.

This works pretty well with the EQL boxes, unless you use the EQL
without the Hit Kit. With the Hit Kit installed on each client I don't think
that this will cause problems.


> 
>> anyway, dont count on ovirts pause/unpause. theres a real chance that it
>> will go horrible wrong. a scheduled maint. window where everything gets
>> shut down whould be best practice
> 
> Indeed, this would the best choice, if I had it.
> 

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] What recovers a VM from pause?

2016-05-30 Thread InterNetX - Juergen Gotteswinter
Hi,

You are aware of the fact that EQL sync replication is just about
replication, with not a single piece of high availability? I am not even sure if
it does IP failover itself. So better think in terms of minutes of
interruption rather than seconds.

Anyway, don't count on oVirt's pause/unpause. There's a real chance that it
will go horribly wrong. A scheduled maintenance window where everything gets
shut down would be best practice.

Juergen

On 5/30/2016 at 3:07 PM, Nicolas Ecarnot wrote:
> Hello,
> 
> We're planning a move from our old building towards a new one a few
> meters away.
> 
> 
> 
> In a similar way of Martijn
> (https://www.mail-archive.com/users@ovirt.org/msg33182.html), I have
> maintenance planed on our storage side.
> 
> Say an oVirt DC is using a SAN's LUN via iSCSI (Equallogic).
> This SAN allows me to setup block replication between two SANs, seen by
> oVirt as one (Dell is naming it SyncRep).
> Then switch all the iSCSI accesses to the replicated LUN.
> 
> When doing this, the iSCSI stack of each oVirt host notices the
> de-connection, tries to reconnect, and succeeds.
> Amongst our hosts, this happens between 4 and 15 seconds.
> 
> When this happens fast enough, oVirt engine and the VMs don't even
> notice, and they keep running happily.
> 
> When this takes more than 4 seconds, there are 2 cases :
> 
> 1 - The hosts and/or oVirt and/or the SPM (I actually don't know)
> notices that there is a storage failure, and pauses the VMs.
> When the iSCSI stack reconnects, the VMs are automatically recovered
> from pause, and this all takes less than 30 seconds. That is very
> acceptable for us, as this action is extremely rare.
> 
> 2 - Same storage failure, VMs paused, and some VMs stay in pause mode
> forever.
> Manual "run" action is mandatory.
> When done, everything recovers correctly.
> This is also quite acceptable, but here come my questions :
> 
> My questions : (!)
> - *WHAT* process or piece of code or what oVirt parts is responsible for
> deciding when to UN-pause a VM, and at what conditions?
> That would help me to understand why some cases are working even more
> smoothly than others.
> - Are there related timeouts I could play with in engine-config options?
> - [a bit off-topic] Is it safe to increase some iSCSI timeouts of
> buffer-sizes in the hope this kind of disconnection would get un-noticed?
> 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Sanlock add Lockspace Errors

2016-05-30 Thread InterNetX - Juergen Gotteswinter
Hi,

For some time now we have been getting error messages from sanlock, and so far
I was not able to figure out what exactly they are trying to tell us and, more
importantly, whether this is something which can be ignored or needs to be fixed (and how).

Here are the Versions we are using currently:

Engine

ovirt-engine-3.5.6.2-1.el6.noarch

Nodes

vdsm-4.16.34-0.el7.centos.x86_64
sanlock-3.2.4-1.el7.x86_64
libvirt-lock-sanlock-1.2.17-13.el7_2.3.x86_64
libvirt-daemon-1.2.17-13.el7_2.3.x86_64
libvirt-lock-sanlock-1.2.17-13.el7_2.3.x86_64
libvirt-1.2.17-13.el7_2.3.x86_64

-- snip --
May 30 09:55:27 vm2 sanlock[1094]: 2016-05-30 09:55:27+0200 294109
[60137]: verify_leader 2 wrong space name
4643f652-8014-4951-8a1a-02af41e67d08
f757b127-a951-4fa9-bf90-81180c0702e6
/dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids
May 30 09:55:27 vm2 sanlock[1094]: 2016-05-30 09:55:27+0200 294109
[60137]: leader1 delta_acquire_begin error -226 lockspace
f757b127-a951-4fa9-bf90-81180c0702e6 host_id 2
May 30 09:55:27 vm2 sanlock[1094]: 2016-05-30 09:55:27+0200 294109
[60137]: leader2 path /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids offset 0
May 30 09:55:27 vm2 sanlock[1094]: 2016-05-30 09:55:27+0200 294109
[60137]: leader3 m 12212010 v 30003 ss 512 nh 0 mh 1 oi 2 og 8 lv 0
May 30 09:55:27 vm2 sanlock[1094]: 2016-05-30 09:55:27+0200 294109
[60137]: leader4 sn 4643f652-8014-4951-8a1a-02af41e67d08 rn
1eed8aa9-8fb5-4d27-8d1c-03ebce2c36d4.vm2.intern ts 3786679 cs 1474f033
May 30 09:55:28 vm2 sanlock[1094]: 2016-05-30 09:55:28+0200 294110
[1099]: s9703 add_lockspace fail result -226
May 30 09:55:58 vm2 sanlock[1094]: 2016-05-30 09:55:58+0200 294140
[60331]: verify_leader 2 wrong space name
4643f652-8014-4951-8a1a-02af41e67d08
f757b127-a951-4fa9-bf90-81180c0702e6
/dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids
May 30 09:55:58 vm2 sanlock[1094]: 2016-05-30 09:55:58+0200 294140
[60331]: leader1 delta_acquire_begin error -226 lockspace
f757b127-a951-4fa9-bf90-81180c0702e6 host_id 2
May 30 09:55:58 vm2 sanlock[1094]: 2016-05-30 09:55:58+0200 294140
[60331]: leader2 path /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids offset 0
May 30 09:55:58 vm2 sanlock[1094]: 2016-05-30 09:55:58+0200 294140
[60331]: leader3 m 12212010 v 30003 ss 512 nh 0 mh 1 oi 2 og 8 lv 0
May 30 09:55:58 vm2 sanlock[1094]: 2016-05-30 09:55:58+0200 294140
[60331]: leader4 sn 4643f652-8014-4951-8a1a-02af41e67d08 rn
1eed8aa9-8fb5-4d27-8d1c-03ebce2c36d4.vm2.intern ts 3786679 cs 1474f033
May 30 09:55:59 vm2 sanlock[1094]: 2016-05-30 09:55:59+0200 294141
[1098]: s9704 add_lockspace fail result -226
May 30 09:56:05 vm2 sanlock[1094]: 2016-05-30 09:56:05+0200 294148
[1094]: s1527 check_other_lease invalid for host 0 0 ts 7566376 name  in
4643f652-8014-4951-8a1a-02af41e67d08
May 30 09:56:05 vm2 sanlock[1094]: 2016-05-30 09:56:05+0200 294148
[1094]: s1527 check_other_lease leader 12212010 owner 1 11 ts 7566376 sn
f757b127-a951-4fa9-bf90-81180c0702e6 rn
f888524b-27aa-4724-8bae-051f9e950a21.vm1.intern
May 30 09:56:28 vm2 sanlock[1094]: 2016-05-30 09:56:28+0200 294170
[60496]: verify_leader 2 wrong space name
4643f652-8014-4951-8a1a-02af41e67d08
f757b127-a951-4fa9-bf90-81180c0702e6
/dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids
May 30 09:56:28 vm2 sanlock[1094]: 2016-05-30 09:56:28+0200 294170
[60496]: leader1 delta_acquire_begin error -226 lockspace
f757b127-a951-4fa9-bf90-81180c0702e6 host_id 2
May 30 09:56:28 vm2 sanlock[1094]: 2016-05-30 09:56:28+0200 294170
[60496]: leader2 path /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids offset 0
May 30 09:56:28 vm2 sanlock[1094]: 2016-05-30 09:56:28+0200 294170
[60496]: leader3 m 12212010 v 30003 ss 512 nh 0 mh 1 oi 2 og 8 lv 0
May 30 09:56:28 vm2 sanlock[1094]: 2016-05-30 09:56:28+0200 294170
[60496]: leader4 sn 4643f652-8014-4951-8a1a-02af41e67d08 rn
1eed8aa9-8fb5-4d27-8d1c-03ebce2c36d4.vm2.intern ts 3786679 cs 1474f033
May 30 09:56:29 vm2 sanlock[1094]: 2016-05-30 09:56:29+0200 294171
[6415]: s9705 add_lockspace fail result -226
May 30 09:56:58 vm2 sanlock[1094]: 2016-05-30 09:56:58+0200 294200
[60645]: verify_leader 2 wrong space name
4643f652-8014-4951-8a1a-02af41e67d08
f757b127-a951-4fa9-bf90-81180c0702e6
/dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids
May 30 09:56:58 vm2 sanlock[1094]: 2016-05-30 09:56:58+0200 294200
[60645]: leader1 delta_acquire_begin error -226 lockspace
f757b127-a951-4fa9-bf90-81180c0702e6 host_id 2
May 30 09:56:58 vm2 sanlock[1094]: 2016-05-30 09:56:58+0200 294200
[60645]: leader2 path /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids offset 0
May 30 09:56:58 vm2 sanlock[1094]: 2016-05-30 09:56:58+0200 294200
[60645]: leader3 m 12212010 v 30003 ss 512 nh 0 mh 1 oi 2 og 8 lv 0
May 30 09:56:58 vm2 sanlock[1094]: 2016-05-30 09:56:58+0200 294200
[60645]: leader4 sn 4643f652-8014-4951-8a1a-02af41e67d08 rn
1eed8aa9-8fb5-4d27-8d1c-03ebce2c36d4.vm2.intern ts 3786679 cs 1474f033
May 30 09:56:59 vm2 sanlock[1094]: 2016-05-30 09:56:59+0200 294201
[6373]: s9706 add_lockspace fail result -226
May 30 09:57:28 vm2 sanlock[1094]: 2016-05-30 09:57:28+0200 294230
[60806]: 

Re: [ovirt-users] One RHEV Virtual Machine does not Automatically Resume following Compellent SAN Controller Failover

2016-05-30 Thread InterNetX - Juergen Gotteswinter
We see exactly the same, and it does not seem to be vendor dependent.

- Equallogic controller failover -> VMs get paused and maybe unpaused, but
most don't
- Nexenta ZFS iSCSI with RSF1 HA -> same
- FreeBSD ctld iscsi-target + Heartbeat -> same
- CentOS + iscsi-target + Heartbeat -> same

Multipath settings are, where available, modified to match the best
practice supplied by the vendor. On the open source solutions we started
with known working multipath/iSCSI settings, and meanwhile nearly every
possible setting has been tested. Without much success.

To me it looks like oVirt/RHEV is way too sensitive to iSCSI
interruptions, and it feels like gambling what the engine might do to
your VM (or not).

On 11/23/2015 at 8:37 PM, Duckworth, Douglas C wrote:
> Hello --
> 
> Not sure if y'all can help with this issue we've been seeing with RHEV...
> 
> On 11/13/2015, during Code Upgrade of Compellent SAN at our Disaster
> Recovery Site, we Failed Over to Secondary SAN Controller.  Most Virtual
> Machines in our DR Cluster Resumed automatically after Pausing except VM
> "BADVM" on Host "BADHOST."
> 
> In Engine.log you can see that BADVM was sent into "VM_PAUSED_EIO" state
> at 10:47:57:
> 
> "VM BADVM has paused due to storage I/O problem."
> 
> On this Red Hat Enterprise Virtualization Hypervisor 6.6
> (20150512.0.el6ev) Host, two other VMs paused but then automatically
> resumed without System Administrator intervention...
> 
> In our DR Cluster, 22 VMs also resumed automatically...
> 
> None of these Guest VMs are engaged in high I/O as these are DR site VMs
> not currently doing anything.
> 
> We sent this information to Dell.  Their response:
> 
> "The root cause may reside within your virtualization solution, not the
> parent OS (RHEV-Hypervisor disc) or Storage (Dell Compellent.)"
> 
> We are doing this Failover again on Sunday November 29th so we would
> like to know how to mitigate this issue, given we have to manually
> resume paused VMs that don't resume automatically.
> 
> Before we initiated SAN Controller Failover, all iSCSI paths to Targets
> were present on Host tulhv2p03.
> 
> VM logs on Host show in /var/log/libvirt/qemu/badhost.log that Storage
> error was reported:
> 
> block I/O error in device 'drive-virtio-disk0': Input/output error (5)
> block I/O error in device 'drive-virtio-disk0': Input/output error (5)
> block I/O error in device 'drive-virtio-disk0': Input/output error (5)
> block I/O error in device 'drive-virtio-disk0': Input/output error (5)
> 
> All disks used by this Guest VM are provided by single Storage Domain
> COM_3TB4_DR with serial "270."  In syslog we do see that all paths for
> that Storage Domain Failed:
> 
> Nov 13 16:47:40 multipathd: 36000d310005caf000270: remaining
> active paths: 0
> 
> Though these recovered later:
> 
> Nov 13 16:59:17 multipathd: 36000d310005caf000270: sdbg -
> tur checker reports path is up
> Nov 13 16:59:17 multipathd: 36000d310005caf000270: remaining
> active paths: 8
> 
> Does anyone have an idea of why the VM would fail to automatically
> resume if the iSCSI paths used by its Storage Domain recovered?
> 
> Thanks
> Doug
> 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Recommended bonding modes?

2016-05-23 Thread InterNetX - Juergen Gotteswinter
Generally, avoid mode 0 (round robin) or you will face problems sooner
or later: RR will generate out-of-order TCP segments.

Go for mode 4 / LACP wherever possible.
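
For illustration, on an EL6/EL7 host a LACP bond boils down to ifcfg files
roughly like the following (just a sketch; interface names are placeholders,
the switch ports must be configured for LACP, and oVirt normally writes these
files for you via "Setup Host Networks"):

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
BONDING_OPTS="mode=4 miimon=100"
BOOTPROTO=none
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-em1   (same for the second slave, em2)
DEVICE=em1
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes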

On 5/22/2016 at 9:33 PM, Alan Murrell wrote:
> Hello,
> 
> I am wondering what the recommended bonding modes are for the different
> types of networks:
> 
>   - Management/Display
>   - Guest VM Traffic
>   - Storage (NFS)
> 
> For the purposes of this post, assume mode 4 is not available.
> 
> For Management/Display, I am thinking mode 1 (active-backup) is adequate
> since there generally isn't a lot of traffic being pushed through.
> 
> Storage, I like mode 0 (round robin) due to some performance results I
> have seen, though I understand this has to be setup in "custom
> bonding".  Mode 6 seems like it would do well here as well.
> 
> Guest VM traffic is the one I am not sure about.  There could be a lot
> of traffic going back and forth, so I would want all the interfaces in
> the bond active to balance the load.  Of the supported bonding modes,
> either 2 or 5 would seem to be the options.
> 
> I am just wondering what others' thoughts and experiences are.
> 
> Thanks! :-)
> 
> Regards,
> 
> Alan
> 
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Shared storage

2016-03-21 Thread InterNetX - Juergen Gotteswinter
Nope, that's direct-attached SAS. You want/need iSCSI or NFS.

On 21.03.2016 at 10:50, Maton, Brett wrote:
> Hello list,
> 
>   I was wondering if any one could tell me if a Dell MD1400 Direct
> Attached Storage unit is suitable for shared storage between ovirt hosts?
> 
>   I really don't want to buy something that isn't going to work :)
> 
> Regards,
> Brett
> 
> 
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
> 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Delete snapshot with status illegal - live merge not possible

2015-08-28 Thread InterNetX - Juergen Gotteswinter
Got exactly the same issue, with all the nice side effects like performance
degradation. Until now I was not able to fix this, or to fool the engine somehow
into showing the image as OK again and giving me a second chance to drop the
snapshot.

In some cases this procedure helped (needs a second storage domain):

- live migrate the image to a different storage domain (check which combinations
are supported; iSCSI -> NFS domain seems unsupported, iSCSI -> iSCSI works)
- the snapshot went into the OK state, and in ~50% of cases I was able to drop the
snapshot then. Space had been reclaimed, so it seems this worked.

Another workaround is to export the image onto an NFS export domain; there
you can tell the engine not to export snapshots. After re-importing, everything
is fine.

The snapshot feature (live, at least) should currently be avoided altogether;
it is simply not reliable enough.

Your way works, too. I already did that, even though it was a pain to figure out
where to find what. This symlinking mess between /rhev, /dev and /var/lib/libvirt
is really awesome. Not.
 
 
 Jan Siml js...@plusline.net wrote on 28 August 2015 at 12:56:


 Hello,

 if no one has an idea how to correct the Disk/Snapshot paths in Engine
 database, I see only one possible way to solve the issue:

 Stop the VM and copy image/meta files target storage to source storage
 (the one where Engine thinks the files are located). Start the VM.

 Any concerns regarding this procedure? But I still hope that someone
 from oVirt team can give an advice how to correct the database entries.
 If necessary I would open a bug in Bugzilla.

 Kind regards

 Jan Siml

  after a failed live storage migration (cause unknown) we have a
  snapshot which is undeletable due to its status 'illegal' (as seen
  in storage/snapshot tab). I have already found some bugs [1],[2],[3]
  regarding this issue, but no way how to solve the issue within oVirt
   3.5.3.
 
  I have attached the relevant engine.log snippet. Is there any way to
  do a live merge (and therefore delete the snapshot)?
 
  [1] https://bugzilla.redhat.com/show_bug.cgi?id=1213157
  [2] https://bugzilla.redhat.com/show_bug.cgi?id=1247377 links to [3]
  [3] https://bugzilla.redhat.com/show_bug.cgi?id=1247379 (no access)
 
  some additional informations. I have checked the images on both storages
  and verified the disk paths with virsh's dumpxml.
 
  a) The images and snapshots are on both storages.
  b) The images on source storage aren't used. (modification time)
  c) The images on target storage are used. (modification time)
  d) virsh -r dumpxml tells me disk images are located on _target_ storage.
  e) Admin interface tells me, that images and snapshot are located on
  _source_ storage, which isn't true, see b), c) and d).
 
  What can we do, to solve this issue? Is this to be corrected in database
  only?
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Delete snapshot with status illegal - live merge not possible

2015-08-28 Thread InterNetX - Juergen Gotteswinter

 Jan Siml js...@plusline.net wrote on 28 August 2015 at 15:15:


 Hello Juergen,

  got exactly the same issue, with all nice side effects like performance
  degradation. Until now i was not able to fix this, or to fool the engine
  somehow that it whould show the image as ok again and give me a 2nd
  chance to drop the snapshot.
  in some cases this procedure helped (needs 2nd storage domain)
  - image live migration to a different storage domain (check which
  combinations are supported, iscsi - nfs domain seems unsupported. iscsi
  - iscsi works)
  - snapshot went into ok state, and in ~50% i was able to drop the
  snapshot than. space had been reclaimed, so seems like this worked

 okay, seems interesting. But I'm afraid of not knowing which image files
 Engine uses when live migration is demanded. If Engine uses the ones
 which are actually used and updates the database afterwards -- fine. But
 if the images are used that are referenced in Engine database, we will
 take a journey into the past.
 
Knocking on wood. So far no problems, and I have used this approach easily 50+ times.

In cases where the live merge failed, offline merging worked in another 50%;
those which failed offline as well went back to the illegal snapshot state.


  other workaround is through exporting the image onto a nfs export
  domain, here you can tell the engine to not export snapshots. after
  re-importing everything is fine
  the snapshot feature (live at least) should be avoided at all
  currently simply not reliable enaugh.
  your way works, too. already did that, even it was a pita to figure out
  where to find what. this symlinking mess between /rhev /dev and
  /var/lib/libvirt is really awesome. not.
   Jan Siml js...@plusline.net hat am 28. August 2015 um 12:56
  geschrieben:
  
  
   Hello,
  
   if no one has an idea how to correct the Disk/Snapshot paths in Engine
   database, I see only one possible way to solve the issue:
  
   Stop the VM and copy image/meta files target storage to source storage
   (the one where Engine thinks the files are located). Start the VM.
  
   Any concerns regarding this procedure? But I still hope that someone
   from oVirt team can give an advice how to correct the database entries.
   If necessary I would open a bug in Bugzilla.
  
   Kind regards
  
   Jan Siml
  
after a failed live storage migration (cause unknown) we have a
snapshot which is undeletable due to its status 'illegal' (as seen
in storage/snapshot tab). I have already found some bugs [1],[2],[3]
regarding this issue, but no way how to solve the issue within oVirt
 3.5.3.
   
I have attached the relevant engine.log snippet. Is there any way to
do a live merge (and therefore delete the snapshot)?
   
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1213157
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1247377 links to [3]
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1247379 (no access)
   
some additional informations. I have checked the images on both
  storages
and verified the disk paths with virsh's dumpxml.
   
a) The images and snapshots are on both storages.
b) The images on source storage aren't used. (modification time)
c) The images on target storage are used. (modification time)
d) virsh -r dumpxml tells me disk images are located on _target_
  storage.
e) Admin interface tells me, that images and snapshot are located on
_source_ storage, which isn't true, see b), c) and d).
   
What can we do, to solve this issue? Is this to be corrected in
  database
only?

 Kind regards

 Jan Siml___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Delete snapshot with status illegal - live merge not possible

2015-08-28 Thread InterNetX - Juergen Gotteswinter

 Jan Siml js...@plusline.net wrote on 28 August 2015 at 16:47:


 Hello,

got exactly the same issue, with all nice side effects like performance
degradation. Until now i was not able to fix this, or to fool the
  engine
somehow that it whould show the image as ok again and give me a 2nd
chance to drop the snapshot.
in some cases this procedure helped (needs 2nd storage domain)
- image live migration to a different storage domain (check which
combinations are supported, iscsi - nfs domain seems unsupported.
  iscsi
- iscsi works)
- snapshot went into ok state, and in ~50% i was able to drop the
snapshot than. space had been reclaimed, so seems like this worked
  
   okay, seems interesting. But I'm afraid of not knowing which image files
   Engine uses when live migration is demanded. If Engine uses the ones
   which are actually used and updates the database afterwards -- fine. But
   if the images are used that are referenced in Engine database, we will
   take a journey into the past.
  knocking on wood. so far no problems, and i used this way for sure 50
  times +

 This doesn't work. Engine creates the snapshots on wrong storage (old)
 and this process fails, cause the VM (qemu process) uses the images on
 other storage (new).
 
Sounds like there are some other problems in your case - wrong DB entries for the
image -> snapshot relation? I didn't investigate further on the VMs where this
process failed; I went straight ahead and exported them.


  in cases where the live merge failed, offline merging worked in another
  50%. those which fail offline, too went back to illegal snap state

 I fear offline merge would cause data corruption. Because if I shut down
 the VM, the information in Engine database is still wrong. Engine thinks
 image files and snapshots are on old storage. But VM has written to the
 equal named image files on new storage. And offline merge might use the
 old files on old storage.
 
Then your initial plan is an alternative. Do you use thin or raw images, and on what
kind of storage domain? But like I said, manual processing is a pain due to the
symlink mess.


other workaround is through exporting the image onto a nfs export
domain, here you can tell the engine to not export snapshots. after
re-importing everything is fine

 Same issue as with offline merge.

 Meanwhile I think, we need to shut down the VM, copy the image files
 from one storage (qemu has used before) to the other storage (the one
 Engine expects) and pray while starting the VM again.

the snapshot feature (live at least) should be avoided at all
currently simply not reliable enaugh.
your way works, too. already did that, even it was a pita to figure out
where to find what. this symlinking mess between /rhev /dev and
/var/lib/libvirt is really awesome. not.
 Jan Siml js...@plusline.net hat am 28. August 2015 um 12:56
geschrieben:


 Hello,

 if no one has an idea how to correct the Disk/Snapshot paths in
  Engine
 database, I see only one possible way to solve the issue:

 Stop the VM and copy image/meta files target storage to source
  storage
 (the one where Engine thinks the files are located). Start the VM.

 Any concerns regarding this procedure? But I still hope that someone
 from oVirt team can give an advice how to correct the database
  entries.
 If necessary I would open a bug in Bugzilla.

 Kind regards

 Jan Siml

  after a failed live storage migration (cause unknown) we have a
  snapshot which is undeletable due to its status 'illegal' (as seen
  in storage/snapshot tab). I have already found some bugs
  [1],[2],[3]
  regarding this issue, but no way how to solve the issue within
  oVirt
   3.5.3.
 
  I have attached the relevant engine.log snippet. Is there any
  way to
  do a live merge (and therefore delete the snapshot)?
 
  [1] https://bugzilla.redhat.com/show_bug.cgi?id=1213157
  [2] https://bugzilla.redhat.com/show_bug.cgi?id=1247377 links
  to [3]
  [3] https://bugzilla.redhat.com/show_bug.cgi?id=1247379 (no
  access)
 
  some additional informations. I have checked the images on both
storages
  and verified the disk paths with virsh's dumpxml.
 
  a) The images and snapshots are on both storages.
  b) The images on source storage aren't used. (modification time)
  c) The images on target storage are used. (modification time)
  d) virsh -r dumpxml tells me disk images are located on _target_
storage.
  e) Admin interface tells me, that images and snapshot are
  located on
  _source_ storage, which isn't true, see b), c) and d).
 
  What can we do, to solve this issue? Is this to be corrected in
database
  only?

 Kind regards

 Jan Siml___
Users mailing list
Users@ovirt.org

Re: [ovirt-users] Delete snapshot with status illegal - live merge not possible

2015-08-28 Thread InterNetX - Juergen Gotteswinter

 Jan Siml js...@plusline.net wrote on 28 August 2015 at 19:52:


 Hello,

 got exactly the same issue, with all nice side effects like
  performance
 degradation. Until now i was not able to fix this, or to fool the
   engine
 somehow that it whould show the image as ok again and give me a 2nd
 chance to drop the snapshot.
 in some cases this procedure helped (needs 2nd storage domain)
 - image live migration to a different storage domain (check which
 combinations are supported, iscsi - nfs domain seems unsupported.
   iscsi
 - iscsi works)
 - snapshot went into ok state, and in ~50% i was able to drop the
 snapshot than. space had been reclaimed, so seems like this worked
   
okay, seems interesting. But I'm afraid of not knowing which image
  files
Engine uses when live migration is demanded. If Engine uses the ones
which are actually used and updates the database afterwards --
  fine. But
if the images are used that are referenced in Engine database, we will
take a journey into the past.
   knocking on wood. so far no problems, and i used this way for sure 50
   times +
 
  This doesn't work. Engine creates the snapshots on wrong storage (old)
  and this process fails, cause the VM (qemu process) uses the images on
  other storage (new).
 
  sounds like there are some other problems in your case, wrong db entries
  image - snapshot? i didnt investigate further in the vm which failed
  this process, i directly went further and exported them

 Yes, engine thinks image and snapshot are on storage a, but qemu process
 uses equal named images on storage b.

 It seems to me, that first live storage migration was successful on qemu
 level, but engine hasn't updated the database entries.

 Seems to be a possible solution to correct the database entries, but I'm
 not familar with the oVirt schema and won't even try it without an
 advice from oVirt developers.
 
   in cases where the live merge failed, offline merging worked in another
   50%. those which fail offline, too went back to illegal snap state
 
  I fear offline merge would cause data corruption. Because if I shut down
  the VM, the information in Engine database is still wrong. Engine thinks
  image files and snapshots are on old storage. But VM has written to the
  equal named image files on new storage. And offline merge might use the
  old files on old storage.
 
  than your initial plan is an alternative. you use thin or raw on what
  kind of storage domain? but like said, manually processing is a pita due
  to the symlink mess.

 We are using raw images which are thin provisioned on NFS based storage
 domains. On storage b I can see an qcow formatted image file which qemu
 uses and the original (raw) image which is now backing file.

 
Might sound a little curious, but IMHO this is the best setup for your plan.
Thin on iSCSI is a totally different story... LVM volumes which get extended on
demand (which fails with default settings during heavy writes and causes the VM to
pause); additionally, oVirt writes the qcow images raw onto those LVs. Since
you can get your hands directly on the image files, this would be my preferred
workaround. But maybe one of the oVirt devs has a better idea/solution?
 
 other workaround is through exporting the image onto a nfs export
 domain, here you can tell the engine to not export snapshots. after
 re-importing everything is fine
 
  Same issue as with offline merge.
 
  Meanwhile I think, we need to shut down the VM, copy the image files
  from one storage (qemu has used before) to the other storage (the one
  Engine expects) and pray while starting the VM again.

 the snapshot feature (live at least) should be avoided at all
 currently simply not reliable enaugh.
 your way works, too. already did that, even it was a pita to
  figure out
 where to find what. this symlinking mess between /rhev /dev and
 /var/lib/libvirt is really awesome. not.
  Jan Siml js...@plusline.net hat am 28. August 2015 um 12:56
 geschrieben:
 
 
  Hello,
 
  if no one has an idea how to correct the Disk/Snapshot paths in
   Engine
  database, I see only one possible way to solve the issue:
 
  Stop the VM and copy image/meta files target storage to source
   storage
  (the one where Engine thinks the files are located). Start the VM.
 
  Any concerns regarding this procedure? But I still hope that
  someone
  from oVirt team can give an advice how to correct the database
   entries.
  If necessary I would open a bug in Bugzilla.
 
  Kind regards
 
  Jan Siml
 
   after a failed live storage migration (cause unknown) we have a
   snapshot which is undeletable due to its status 'illegal'
  (as seen
   in storage/snapshot tab). I have already found some bugs
   [1],[2],[3]
   regarding this issue, but no way how to solve the issue within
  

Re: [ovirt-users] vm status

2015-08-25 Thread InterNetX - Juergen Gotteswinter
If the VM is really down, you will find this helpful:

[root@portal dbscripts]# ./unlock_entity.sh -h
Usage: ./unlock_entity.sh [options] [ENTITIES]

-h            - This help text.
-v            - Turn on verbosity (WARNING: lots of output)
-l LOGFILE    - The logfile for capturing output          (def. )
-s HOST       - The database servername for the database  (def. localhost)
-p PORT       - The database port for the database        (def. 5432)
-u USER       - The username for the database             (def. engine)
-d DATABASE   - The database name                         (def. engine)
-t TYPE       - The object type {vm | template | disk | snapshot}
-r            - Recursive, unlocks all disks under the selected vm/template.
-q            - Query db and display a list of the locked entites.
ENTITIES      - The list of object names in case of vm/template,
                UUIDs in case of a disk

NOTE: This utility access the database and should have the
  corresponding credentals.

  In case that a password is used to access the database PGPASSWORD
  or PGPASSFILE should be set.

Example:
$ PGPASSWORD=xx ./unlock_entity.sh -t disk -q

[root@portal dbscripts]# pwd
/usr/share/ovirt-engine/dbscripts
[root@portal dbscripts]#
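
For example, to first list the locked entities and then clear a stuck VM
(the VM name and the DB password here are just placeholders):

cd /usr/share/ovirt-engine/dbscripts
PGPASSWORD=engine_db_password ./unlock_entity.sh -t vm -q
PGPASSWORD=engine_db_password ./unlock_entity.sh -t vm -r stuck-vm-name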


On 25.08.2015 at 07:33, Michael Wagenknecht wrote:
 Hi all,
 how can I mark a VM as shut down when the status of the VM is ??
 There is no qemu-kvm process for this VM on the node.
 But I can't set the node to maintenance mode, because there is a very
 important VM running.
 
 Best Regards,
 Michael
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Problem with Mac Spoof Filter

2015-07-16 Thread InterNetX - Juergen Gotteswinter
Hi,

It seems like the setting EnableMACAntiSpoofingFilterRules only applies to
the main IP of a VM; additional IP addresses on alias interfaces (eth0:x)
are not included in the generated ebtables ruleset.

Is there any workaround / setting / whatever to allow more than one IP
without completely disabling this filter?
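
For reference, the generated rules can at least be inspected on the host the
VM runs on, roughly like this (assuming the filter is implemented as a libvirt
nwfilter, which seems to be the case with vdsm):

virsh -r nwfilter-list        # look for the no-mac-spoofing filter
ebtables -t nat -L            # per-VM chains generated from that filter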

Thanks,

Juergen
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Problem with VM Export

2015-06-02 Thread InterNetX - Juergen Gotteswinter
Hi,

when trying to export a VM to the NFS export domain, the process fails
almost immediately. VDSM logs this:

-- snip --
::Request was made in '/usr/share/vdsm/storage/sp.py' line '1549' at
'moveImage'
0a3c909d-0737-492e-a47c-bc0ab5e1a603::DEBUG::2015-06-02
20:36:38,190::resourceManager::542::Storage.ResourceManager::(registerResource)
Trying to register resource
'8f5f59f9-b3d5-4e13-9c56-b4d33475b277_imageNS.244747b4-1b3d-4a9c-8fd9-3a914e6f2bc3'
for lock type 'shared'
0a3c909d-0737-492e-a47c-bc0ab5e1a603::DEBUG::2015-06-02
20:36:38,191::lvm::428::Storage.OperationMutex::(_reloadlvs) Operation
'lvm reload operation' got the operation mutex
0a3c909d-0737-492e-a47c-bc0ab5e1a603::DEBUG::2015-06-02
20:36:38,192::lvm::291::Storage.Misc.excCmd::(cmd) /usr/bin/sudo -n
/sbin/lvm lvs --config ' devices { preferred_names = [^/dev/mapper/]
ignore_suspended_devices=1 write_cache_state=0
disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [
'\''a|/dev/mapper/3600144f0db35bc65534be67e0001|'\'', '\''r|.*|'\''
] }  global {  locking_type=1  prioritise_write_locks=1
wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50
retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|'
--ignoreskippedcluster -o
uuid,name,vg_name,attr,size,seg_start_pe,devices,tags
8f5f59f9-b3d5-4e13-9c56-b4d33475b277 (cwd None)
0a3c909d-0737-492e-a47c-bc0ab5e1a603::DEBUG::2015-06-02
20:36:38,480::lvm::291::Storage.Misc.excCmd::(cmd) SUCCESS: err = '
WARNING: lvmetad is running but disabled. Restart lvmetad before
enabling it!\n'; rc = 0
0a3c909d-0737-492e-a47c-bc0ab5e1a603::DEBUG::2015-06-02
20:36:38,519::lvm::463::Storage.LVM::(_reloadlvs) lvs reloaded
0a3c909d-0737-492e-a47c-bc0ab5e1a603::DEBUG::2015-06-02
20:36:38,519::lvm::463::Storage.OperationMutex::(_reloadlvs) Operation
'lvm reload operation' released the operation mutex
0a3c909d-0737-492e-a47c-bc0ab5e1a603::ERROR::2015-06-02
20:36:38,520::blockVolume::429::Storage.Volume::(validateImagePath)
Unexpected error
Traceback (most recent call last):
  File /usr/share/vdsm/storage/blockVolume.py, line 427, in
validateImagePath
os.mkdir(imageDir, 0o755)
OSError: [Errno 17] File exists:
'/rhev/data-center/cfc84aa8-8ec4-4e13-8104-370ea5b9d432/8f5f59f9-b3d5-4e13-9c56-b4d33475b277/images/244747b4-1b3d-4a9c-8fd9-3a914e6f2bc3'
0a3c909d-0737-492e-a47c-bc0ab5e1a603::WARNING::2015-06-02
20:36:38,521::resourceManager::591::Storage.ResourceManager::(registerResource)
Resource factory failed to create resource
'8f5f59f9-b3d5-4e13-9c56-b4d33475b277_imageNS.244747b4-1b3d-4a9c-8fd9-3a914e6f2bc3'.
Canceling request.
Traceback (most recent call last):
  File /usr/share/vdsm/storage/resourceManager.py, line 587, in
registerResource
obj = namespaceObj.factory.createResource(name, lockType)
  File /usr/share/vdsm/storage/resourceFactories.py, line 193, in
createResource
lockType)
  File /usr/share/vdsm/storage/resourceFactories.py, line 122, in
__getResourceCandidatesList
imgUUID=resourceName)
  File /usr/share/vdsm/storage/image.py, line 185, in getChain
srcVol = volclass(self.repoPath, sdUUID, imgUUID, uuidlist[0])
  File /usr/share/vdsm/storage/blockVolume.py, line 80, in __init__
volume.Volume.__init__(self, repoPath, sdUUID, imgUUID, volUUID)
  File /usr/share/vdsm/storage/volume.py, line 144, in __init__
self.validate()
  File /usr/share/vdsm/storage/blockVolume.py, line 89, in validate
volume.Volume.validate(self)
  File /usr/share/vdsm/storage/volume.py, line 156, in validate
self.validateImagePath()
  File /usr/share/vdsm/storage/blockVolume.py, line 430, in
validateImagePath
raise se.ImagePathError(imageDir)
ImagePathError: Image path does not exist or cannot be accessed/created:
('/rhev/data-center/cfc84aa8-8ec4-4e13-8104-370ea5b9d432/8f5f59f9-b3d5-4e13-9c56-b4d33475b277/images/244747b4-1b3d-4a9c-8fd9-3a914e6f2bc3',)
0a3c909d-0737-492e-a47c-bc0ab5e1a603::DEBUG::2015-06-02
20:36:38,522::resourceManager::210::Storage.ResourceManager.Request::(cancel)
ResName=`8f5f59f9-b3d5-4e13-9c56-b4d33475b277_imageNS.244747b4-1b3d-4a9c-8fd9-3a914e6f2bc3`ReqID=`6f861566-b1c9-45c7-9181-452b0bc014d0`::Canceled
request
0a3c909d-0737-492e-a47c-bc0ab5e1a603::WARNING::2015-06-02
20:36:38,523::resourceManager::203::Storage.ResourceManager.Request::(cancel)
ResName=`8f5f59f9-b3d5-4e13-9c56-b4d33475b277_imageNS.244747b4-1b3d-4a9c-8fd9-3a914e6f2bc3`ReqID=`6f861566-b1c9-45c7-9181-452b0bc014d0`::Tried
to cancel a processed request
0a3c909d-0737-492e-a47c-bc0ab5e1a603::ERROR::2015-06-02
20:36:38,523::task::866::Storage.TaskManager.Task::(_setError)
Task=`0a3c909d-0737-492e-a47c-bc0ab5e1a603`::Unexpected error
Traceback (most recent call last):
  File /usr/share/vdsm/storage/task.py, line 873, in _run
return fn(*args, **kargs)
  File /usr/share/vdsm/storage/task.py, line 334, in run
return self.cmd(*self.argslist, **self.argsdict)
  File /usr/share/vdsm/storage/securable.py, line 77, in wrapper
return method(self, *args, **kwargs)
  File 

Re: [ovirt-users] metadata not found

2015-04-23 Thread InterNetX - Juergen Gotteswinter
We see this, too. It appeared (if I didn't miss something) either with
EL7 + oVirt 3.5, or with oVirt 3.5 regardless of whether it is EL6 or EL7.
It doesn't seem to have any further impact so far.

On 22.04.2015 at 17:58, Kapetanakis Giannis wrote:
 Hi,
 
 Any idea what this means?
 vdsm.log:
 Thread-65::DEBUG::2015-04-22
 18:57:32,138::libvirtconnection::143::root::(wrapper) Unknown
 libvirterror: ecode: 80 edom: 20 level: 2 message: metadata not found:
 Requested metadata element is not present
 
 I don't know if it's related with this messages from syslog:
 Apr 22 18:36:23 v2 vdsm vm.Vm WARNING
 vmId=`045680b9-06fd-40d9-b98a-92ce527b734f`::Unknown type found, device:
 '{'device': 'unix', 'alias': 'channel0', 'type': 'channel', 'address':
 {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '1'}}'
 found
 Apr 22 18:36:23 v2 vdsm vm.Vm WARNING
 vmId=`045680b9-06fd-40d9-b98a-92ce527b734f`::Unknown type found, device:
 '{'device': 'unix', 'alias': 'channel1', 'type': 'channel', 'address':
 {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '2'}}'
 found
 Apr 22 18:36:23 v2 vdsm vm.Vm WARNING
 vmId=`045680b9-06fd-40d9-b98a-92ce527b734f`::Unknown type found, device:
 '{'device': 'spicevmc', 'alias': 'channel2', 'type': 'channel',
 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial',
 'port': '3'}}' found
 Apr 22 18:36:24 v2 vdsm vm.Vm ERROR
 vmId=`045680b9-06fd-40d9-b98a-92ce527b734f`::Alias not found for device
 type graphics during migration at destination host
 
 regards,
 
 Giannis
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] storage issue's with oVirt 3.5.1 + Nexenta NFS

2015-04-22 Thread InterNetX - Juergen Gotteswinter
I expect you are aware of the fact that you only get the write
performance of a single disk in that configuration? I would drop that
pool layout, drop the spare drives and go for a pool of mirrors.
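
Just as a sketch of the layout I mean (device names are placeholders, and
creating the new pool of course wipes those disks):

zpool create newpool \
  mirror c0tAAAAd0 c0tBBBBd0 \
  mirror c0tCCCCd0 c0tDDDDd0 \
  mirror c0tEEEEd0 c0tFFFFd0 \
  log c0tSSDLOGd0 \
  cache c0tSSDCACHEd0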

On 22.04.2015 at 11:39, Maikel vd Mosselaar wrote:
   pool: z2pool
  state: ONLINE
  scan: scrub canceled on Sun Apr 12 16:33:38 2015
 config:
 
 NAME   STATE READ WRITE CKSUM
 z2pool ONLINE   0 0 0
   raidz1-0 ONLINE   0 0 0
 c0t5000C5004172A87Bd0  ONLINE   0 0 0
 c0t5000C50041A59027d0  ONLINE   0 0 0
 c0t5000C50041A592AFd0  ONLINE   0 0 0
 c0t5000C50041A660D7d0  ONLINE   0 0 0
 c0t5000C50041A69223d0  ONLINE   0 0 0
 c0t5000C50041A6ADF3d0  ONLINE   0 0 0
 logs
   c0t5001517BB2845595d0ONLINE   0 0 0
 cache
   c0t5001517BB2847892d0ONLINE   0 0 0
 spares
   c0t5000C50041A6B737d0AVAIL
   c0t5000C50041AC3F07d0AVAIL
   c0t5000C50041AD48DBd0AVAIL
   c0t5000C50041ADD727d0AVAIL
 
 errors: No known data errors
 
 
 On 04/22/2015 11:17 AM, Karli Sjöberg wrote:
 On Wed, 2015-04-22 at 11:12 +0200, Maikel vd Mosselaar wrote:
 Our pool is configured as Z1 with ZIL (normal SSD), the sync parameter
 is on the default setting (standard) so sync is on.
 # zpool status ?

 /K

 When the issue happens oVirt event viewer shows indeed latency warnings.
 Not always but most of the time this will be followed by an i/o storage
 error linked to random VMs and they will be paused when that happens.

 All the nodes use mode 4 bonding. The interfaces on the nodes don't show
 any drops or errors, i checked 2 of the VMs that got paused the last
 time it happened they have dropped packets on their interfaces.

 We don't have a subscription with nexenta (anymore).

 On 04/21/2015 04:41 PM, InterNetX - Juergen Gotteswinter wrote:
 Am 21.04.2015 um 16:19 schrieb Maikel vd Mosselaar:
 Hi Juergen,

 The load on the nodes rises far over 200 during the event. Load on
 the
 nexenta stays normal and nothing strange in the logging.
 ZFS + NFS could be still the root of this. Your Pool Configuration is
 RaidzX or Mirror, with or without ZIL? The sync Parameter of your ZFS
 Subvolume which gets exported is kept default on standard ?

 http://christopher-technicalmusings.blogspot.de/2010/09/zfs-and-nfs-performance-with-zil.html


 Since Ovirt acts very sensible about Storage Latency (throws VM into
 unresponsive or unknown state) it might be worth a try to do zfs set
 sync=disabled pool/volume to see if this changes things. But be aware
 that this makes the NFS Export vuln. against dataloss in case of
 powerloss etc, comparable to async NFS in Linux.

 If disabling the sync setting helps, and you dont use a seperate ZIL
 Flash Drive yet - this whould be very likely help to get rid of this.

 Also, if you run a subscribed Version of Nexenta it might be helpful to
 involve them.

 Do you see any messages about high latency in the Ovirt Events Panel?

 For our storage interfaces on our nodes we use bonding in mode 4
 (802.3ad) 2x 1Gb. The nexenta has 4x 1Gb bond in mode 4 also.
 This should be fine, as long as no Node uses Mode0 / Round Robin which
 whould lead to out of order TCP Packets. The Interfaces themself dont
 show any Drops or Errors - on the VM Hosts as well as on the Switch
 itself?

 Jumbo Frames?

 Kind regards,

 Maikel


 On 04/21/2015 02:51 PM, InterNetX - Juergen Gotteswinter wrote:
 Hi,

 how about Load, Latency, strange dmesg messages on the Nexenta ?
 You are
 using bonded Gbit Networking? If yes, which mode?

 Cheers,

 Juergen

 Am 20.04.2015 um 14:25 schrieb Maikel vd Mosselaar:
 Hi,

 We are running ovirt 3.5.1 with 3 nodes and seperate engine.

 All on CentOS 6.6:
 3 x nodes
 1 x engine

 1 x storage nexenta with NFS

 For multiple weeks we are experiencing issues of our nodes that
 cannot
 access the storage at random moments (atleast thats what the nodes
 think).

 When the nodes are complaining about a unavailable storage then
 the load
 rises up to +200 on all three nodes, this causes that all running
 VMs
 are unaccessible. During this process oVirt event viewer shows
 some i/o
 storage error messages, when this happens random VMs get paused
 and will
 not be resumed anymore (this almost happens every time but not
 all the
 VMs get paused).

 During the event we tested the accessibility from the nodes to the
 storage and it looks like it is working normal, at least we can do a
 normal
 ls on the storage without any delay of showing the contents.

 We tried multiple things that we thought it causes this issue but
 nothing worked so far.
 * rebooting storage / nodes / engine.
 * disabling offsite rsync backups.
 * moved the biggest VMs with highest load

Re: [ovirt-users] storage issue's with oVirt 3.5.1 + Nexenta NFS

2015-04-22 Thread InterNetX - Juergen Gotteswinter
You have got 4 spare disks, and can take one out of your raidz pool to create a
temporary pool existing in parallel. Then zfs send/receive to migrate the data;
this shouldn't take much time if you are not using huge drives?
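
The migration itself would then be a snapshot plus send/receive, roughly like
this (pool, dataset and snapshot names are placeholders):

zfs snapshot -r z2pool/nfs@move1
zfs send -R z2pool/nfs@move1 | zfs receive -F temppool/nfs
# shortly before the cutover, one small incremental pass to catch up:
zfs snapshot -r z2pool/nfs@move2
zfs send -R -i @move1 z2pool/nfs@move2 | zfs receive -F temppool/nfs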

On 22.04.2015 at 11:54, Maikel vd Mosselaar wrote:
 Yes we are aware of that, problem is it's running production so not very
 easy to change the pool.
 
 On 04/22/2015 11:48 AM, InterNetX - Juergen Gotteswinter wrote:
 i expect that you are aware of the fact that you only get the write
 performance of a single disk in that configuration? i whould drop that
 pool configuration, drop the spare drives and go for a mirror pool.

 Am 22.04.2015 um 11:39 schrieb Maikel vd Mosselaar:
pool: z2pool
   state: ONLINE
   scan: scrub canceled on Sun Apr 12 16:33:38 2015
 config:

  NAME   STATE READ WRITE CKSUM
  z2pool ONLINE   0 0 0
raidz1-0 ONLINE   0 0 0
  c0t5000C5004172A87Bd0  ONLINE   0 0 0
  c0t5000C50041A59027d0  ONLINE   0 0 0
  c0t5000C50041A592AFd0  ONLINE   0 0 0
  c0t5000C50041A660D7d0  ONLINE   0 0 0
  c0t5000C50041A69223d0  ONLINE   0 0 0
  c0t5000C50041A6ADF3d0  ONLINE   0 0 0
  logs
c0t5001517BB2845595d0ONLINE   0 0 0
  cache
c0t5001517BB2847892d0ONLINE   0 0 0
  spares
c0t5000C50041A6B737d0AVAIL
c0t5000C50041AC3F07d0AVAIL
c0t5000C50041AD48DBd0AVAIL
c0t5000C50041ADD727d0AVAIL

 errors: No known data errors


 On 04/22/2015 11:17 AM, Karli Sjöberg wrote:
 On Wed, 2015-04-22 at 11:12 +0200, Maikel vd Mosselaar wrote:
 Our pool is configured as Z1 with ZIL (normal SSD), the sync parameter
 is on the default setting (standard) so sync is on.
 # zpool status ?

 /K

 When the issue happens oVirt event viewer shows indeed latency
 warnings.
 Not always but most of the time this will be followed by an i/o
 storage
 error linked to random VMs and they will be paused when that happens.

 All the nodes use mode 4 bonding. The interfaces on the nodes don't
 show
 any drops or errors, i checked 2 of the VMs that got paused the last
 time it happened they have dropped packets on their interfaces.

 We don't have a subscription with nexenta (anymore).

 On 04/21/2015 04:41 PM, InterNetX - Juergen Gotteswinter wrote:
 Am 21.04.2015 um 16:19 schrieb Maikel vd Mosselaar:
 Hi Juergen,

 The load on the nodes rises far over 200 during the event. Load on
 the
 nexenta stays normal and nothing strange in the logging.
 ZFS + NFS could be still the root of this. Your Pool Configuration is
 RaidzX or Mirror, with or without ZIL? The sync Parameter of your ZFS
 Subvolume which gets exported is kept default on standard ?

 http://christopher-technicalmusings.blogspot.de/2010/09/zfs-and-nfs-performance-with-zil.html



 Since Ovirt acts very sensible about Storage Latency (throws VM into
 unresponsive or unknown state) it might be worth a try to do zfs set
 sync=disabled pool/volume to see if this changes things. But be
 aware
 that this makes the NFS Export vuln. against dataloss in case of
 powerloss etc, comparable to async NFS in Linux.

 If disabling the sync setting helps, and you dont use a seperate ZIL
 Flash Drive yet - this whould be very likely help to get rid of
 this.

 Also, if you run a subscribed Version of Nexenta it might be
 helpful to
 involve them.

 Do you see any messages about high latency in the Ovirt Events Panel?

 For our storage interfaces on our nodes we use bonding in mode 4
 (802.3ad) 2x 1Gb. The nexenta has 4x 1Gb bond in mode 4 also.
 This should be fine, as long as no Node uses Mode0 / Round Robin
 which
 whould lead to out of order TCP Packets. The Interfaces themself dont
 show any Drops or Errors - on the VM Hosts as well as on the Switch
 itself?

 Jumbo Frames?

 Kind regards,

 Maikel


 On 04/21/2015 02:51 PM, InterNetX - Juergen Gotteswinter wrote:
 Hi,

 how about Load, Latency, strange dmesg messages on the Nexenta ?
 You are
 using bonded Gbit Networking? If yes, which mode?

 Cheers,

 Juergen

 Am 20.04.2015 um 14:25 schrieb Maikel vd Mosselaar:
 Hi,

 We are running ovirt 3.5.1 with 3 nodes and seperate engine.

 All on CentOS 6.6:
 3 x nodes
 1 x engine

 1 x storage nexenta with NFS

 For multiple weeks we are experiencing issues of our nodes that
 cannot
 access the storage at random moments (atleast thats what the nodes
 think).

 When the nodes are complaining about a unavailable storage then
 the load
 rises up to +200 on all three nodes, this causes that all running
 VMs
 are unaccessible. During this process oVirt event viewer shows
 some i/o
 storage error messages, when this happens random VMs get paused
 and will
 not be resumed anymore (this almost happens every time

Re: [ovirt-users] storage issue's with oVirt 3.5.1 + Nexenta NFS

2015-04-22 Thread InterNetX - Juergen Gotteswinter
On 22.04.2015 at 11:12, Maikel vd Mosselaar wrote:
 
 Our pool is configured as Z1 with ZIL (normal SSD), the sync parameter
 is on the default setting (standard) so sync is on.

For testing, I would give zfs set sync=disabled pool/vol a shot, but as
I already said, that is nothing you should keep in production.
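
I.e. something like the following, only for a test window (the dataset name
is a placeholder):

zfs get sync z2pool/nfs           # should show "standard" right now
zfs set sync=disabled z2pool/nfs
# ...observe whether the pauses / latency warnings go away...
zfs set sync=standard z2pool/nfs  # switch back afterwards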

Something I have seen in the past, too: the filer saturated the maximum number of
lockd/NFS processes (the defaults are quite low; don't worry about pushing the
NFS server threads up to 512+, and the same goes for lockd).

To get your current values:

sharectl get nfs

For example, one of my filers, which is hammered pretty heavily through NFS most
of the time, uses these settings:

servers=1024
lockd_listen_backlog=32
lockd_servers=1024
lockd_retransmit_timeout=5
grace_period=90
server_versmin=2
server_versmax=3
client_versmin=2
client_versmax=4
server_delegation=on
nfsmapid_domain=
max_connections=-1
protocol=ALL
listen_backlog=32
device=
mountd_listen_backlog=64
mountd_max_threads=16



To change them, use sharectl, or put the settings into /etc/system:


set rpcmod:clnt_max_conns = 8
set rpcmod:maxdupreqs=8192
set rpcmod:cotsmaxdupreqs=8192


set nfs:nfs3_max_threads=1024
set nfs:nfs3_nra=128
set nfs:nfs3_bsize=1048576
set nfs:nfs3_max_transfer_size=1048576

- reboot
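
For the sharectl route, setting the values goes roughly like this (a sketch
assuming a stock illumos sharectl; the numbers are just the ones from above):

sharectl set -p servers=1024 nfs
sharectl set -p lockd_servers=1024 nfs
sharectl set -p lockd_listen_backlog=32 nfs
sharectl get nfs    # verify the new values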

 
 When the issue happens oVirt event viewer shows indeed latency warnings.
 Not always but most of the time this will be followed by an i/o storage
 error linked to random VMs and they will be paused when that happens.
 
 All the nodes use mode 4 bonding. The interfaces on the nodes don't show
 any drops or errors, i checked 2 of the VMs that got paused the last
 time it happened they have dropped packets on their interfaces.
 
 We don't have a subscription with nexenta (anymore).
 
 On 04/21/2015 04:41 PM, InterNetX - Juergen Gotteswinter wrote:
 Am 21.04.2015 um 16:19 schrieb Maikel vd Mosselaar:
 Hi Juergen,

 The load on the nodes rises far over 200 during the event. Load on the
 nexenta stays normal and nothing strange in the logging.
 ZFS + NFS could be still the root of this. Your Pool Configuration is
 RaidzX or Mirror, with or without ZIL? The sync Parameter of your ZFS
 Subvolume which gets exported is kept default on standard ?

 http://christopher-technicalmusings.blogspot.de/2010/09/zfs-and-nfs-performance-with-zil.html


 Since Ovirt acts very sensible about Storage Latency (throws VM into
 unresponsive or unknown state) it might be worth a try to do zfs set
 sync=disabled pool/volume to see if this changes things. But be aware
 that this makes the NFS Export vuln. against dataloss in case of
 powerloss etc, comparable to async NFS in Linux.

 If disabling the sync setting helps, and you dont use a seperate ZIL
 Flash Drive yet - this whould be very likely help to get rid of this.

 Also, if you run a subscribed Version of Nexenta it might be helpful to
 involve them.

 Do you see any messages about high latency in the Ovirt Events Panel?

 For our storage interfaces on our nodes we use bonding in mode 4
 (802.3ad) 2x 1Gb. The nexenta has 4x 1Gb bond in mode 4 also.
 This should be fine, as long as no Node uses Mode0 / Round Robin which
 whould lead to out of order TCP Packets. The Interfaces themself dont
 show any Drops or Errors - on the VM Hosts as well as on the Switch
 itself?

 Jumbo Frames?

 Kind regards,

 Maikel


 On 04/21/2015 02:51 PM, InterNetX - Juergen Gotteswinter wrote:
 Hi,

 how about Load, Latency, strange dmesg messages on the Nexenta ? You
 are
 using bonded Gbit Networking? If yes, which mode?

 Cheers,

 Juergen

 Am 20.04.2015 um 14:25 schrieb Maikel vd Mosselaar:
 Hi,

 We are running ovirt 3.5.1 with 3 nodes and seperate engine.

 All on CentOS 6.6:
 3 x nodes
 1 x engine

 1 x storage nexenta with NFS

 For multiple weeks we are experiencing issues of our nodes that cannot
 access the storage at random moments (atleast thats what the nodes
 think).

 When the nodes are complaining about a unavailable storage then the
 load
 rises up to +200 on all three nodes, this causes that all running VMs
 are unaccessible. During this process oVirt event viewer shows some
 i/o
 storage error messages, when this happens random VMs get paused and
 will
 not be resumed anymore (this almost happens every time but not all the
 VMs get paused).

 During the event we tested the accessibility from the nodes to the
 storage and it looks like it is working normal, at least we can do a
 normal
 ls on the storage without any delay of showing the contents.

 We tried multiple things that we thought it causes this issue but
 nothing worked so far.
 * rebooting storage / nodes / engine.
 * disabling offsite rsync backups.
 * moved the biggest VMs with highest load to different platform
 outside
 of oVirt.
 * checked the wsize and rsize on the nfs mounts, storage and nodes are
 correct according to the NFS troubleshooting page on ovirt.org.

 The environment is running in production so we are not free to test
 everything.

 I can provide log files if needed

Re: [ovirt-users] storage issue's with oVirt 3.5.1 + Nexenta NFS

2015-04-21 Thread InterNetX - Juergen Gotteswinter


On 21.04.2015 at 16:09, Maikel vd Mosselaar wrote:
 Hi Fred,
 
 
 This is one of the nodes from yesterday around 01:00 (20-04-15). The
 issue started around 01:00.
 https://bpaste.net/raw/67542540a106
 
 The VDSM logs are very big so i am unable to paste a bigger part of the
 logfile, i don't know what the maximum allowed attachment size is of the
 mailing list?
 
 dmesg on the one the nodes (despite this message the storage is still
 accessible):
 https://bpaste.net/raw/67da167aa300
 
Flaky network? NFS/lockd processes saturated on the Nexenta?

 
 
 Kind regards,
 
 Maikel
 
 On 04/21/2015 02:32 PM, Fred Rolland wrote:
 Hi,

 Can you please attach VDSM logs ?

 Thanks,

 Fred

 - Original Message -
 From: Maikel vd Mosselaar m.vandemossel...@smoose.nl
 To: users@ovirt.org
 Sent: Monday, April 20, 2015 3:25:38 PM
 Subject: [ovirt-users] storage issue's with oVirt 3.5.1 + Nexenta NFS

 Hi,

 We are running ovirt 3.5.1 with 3 nodes and seperate engine.

 All on CentOS 6.6:
 3 x nodes
 1 x engine

 1 x storage nexenta with NFS

 For multiple weeks we are experiencing issues of our nodes that cannot
 access the storage at random moments (atleast thats what the nodes
 think).

 When the nodes are complaining about a unavailable storage then the load
 rises up to +200 on all three nodes, this causes that all running VMs
 are unaccessible. During this process oVirt event viewer shows some i/o
 storage error messages, when this happens random VMs get paused and will
 not be resumed anymore (this almost happens every time but not all the
 VMs get paused).

 During the event we tested the accessibility from the nodes to the
 storage and it looks like it is working normal, at least we can do a
 normal
 ls on the storage without any delay of showing the contents.

 We tried multiple things that we thought it causes this issue but
 nothing worked so far.
 * rebooting storage / nodes / engine.
 * disabling offsite rsync backups.
 * moved the biggest VMs with highest load to different platform outside
 of oVirt.
 * checked the wsize and rsize on the nfs mounts, storage and nodes are
 correct according to the NFS troubleshooting page on ovirt.org.

 The environment is running in production so we are not free to test
 everything.

 I can provide log files if needed.

 Kind Regards,

 Maikel


 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
 
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] storage issue's with oVirt 3.5.1 + Nexenta NFS

2015-04-21 Thread InterNetX - Juergen Gotteswinter
Hi,

How about load, latency, or strange dmesg messages on the Nexenta? Are you
using bonded Gbit networking? If yes, which mode?

Cheers,

Juergen

On 20.04.2015 at 14:25, Maikel vd Mosselaar wrote:
 Hi,
 
 We are running ovirt 3.5.1 with 3 nodes and seperate engine.
 
 All on CentOS 6.6:
 3 x nodes
 1 x engine
 
 1 x storage nexenta with NFS
 
 For multiple weeks we are experiencing issues of our nodes that cannot
 access the storage at random moments (atleast thats what the nodes think).
 
 When the nodes are complaining about a unavailable storage then the load
 rises up to +200 on all three nodes, this causes that all running VMs
 are unaccessible. During this process oVirt event viewer shows some i/o
 storage error messages, when this happens random VMs get paused and will
 not be resumed anymore (this almost happens every time but not all the
 VMs get paused).
 
 During the event we tested the accessibility from the nodes to the
 storage and it looks like it is working normal, at least we can do a normal
 ls on the storage without any delay of showing the contents.
 
 We tried multiple things that we thought it causes this issue but
 nothing worked so far.
 * rebooting storage / nodes / engine.
 * disabling offsite rsync backups.
 * moved the biggest VMs with highest load to different platform outside
 of oVirt.
 * checked the wsize and rsize on the nfs mounts, storage and nodes are
 correct according to the NFS troubleshooting page on ovirt.org.
 
 The environment is running in production so we are not free to test
 everything.
 
 I can provide log files if needed.
 
 Kind Regards,
 
 Maikel
 
 
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] storage issue's with oVirt 3.5.1 + Nexenta NFS

2015-04-21 Thread InterNetX - Juergen Gotteswinter
On 21.04.2015 at 16:19, Maikel vd Mosselaar wrote:
 Hi Juergen,
 
 The load on the nodes rises far over 200 during the event. Load on the
 nexenta stays normal and nothing strange in the logging.

ZFS + NFS could still be the root of this. Is your pool configuration
raidzX or mirror, with or without a ZIL? Is the sync parameter of the ZFS
subvolume which gets exported kept at its default (standard)?

http://christopher-technicalmusings.blogspot.de/2010/09/zfs-and-nfs-performance-with-zil.html

Since oVirt is very sensitive to storage latency (it throws VMs into the
unresponsive or unknown state), it might be worth a try to do zfs set
sync=disabled pool/volume and see if this changes things. But be aware
that this makes the NFS export vulnerable to data loss in case of a
power failure etc., comparable to async NFS on Linux.

If disabling the sync setting helps, and you don't use a separate ZIL
flash drive yet, adding one would very likely help to get rid of this.

Also, if you run a subscribed version of Nexenta, it might be helpful to
involve them.

Do you see any messages about high latency in the oVirt events panel?

 
 For our storage interfaces on our nodes we use bonding in mode 4
 (802.3ad) 2x 1Gb. The nexenta has 4x 1Gb bond in mode 4 also.

This should be fine, as long as no node uses mode 0 / round robin, which
would lead to out-of-order TCP packets. The interfaces themselves don't
show any drops or errors - on the VM hosts as well as on the switch itself?

Jumbo Frames?

 
 Kind regards,
 
 Maikel
 
 
 On 04/21/2015 02:51 PM, InterNetX - Juergen Gotteswinter wrote:
 Hi,

 how about Load, Latency, strange dmesg messages on the Nexenta ? You are
 using bonded Gbit Networking? If yes, which mode?

 Cheers,

 Juergen

 Am 20.04.2015 um 14:25 schrieb Maikel vd Mosselaar:
 Hi,

 We are running ovirt 3.5.1 with 3 nodes and seperate engine.

 All on CentOS 6.6:
 3 x nodes
 1 x engine

 1 x storage nexenta with NFS

 For multiple weeks we are experiencing issues of our nodes that cannot
 access the storage at random moments (atleast thats what the nodes
 think).

 When the nodes are complaining about a unavailable storage then the load
 rises up to +200 on all three nodes, this causes that all running VMs
 are unaccessible. During this process oVirt event viewer shows some i/o
 storage error messages, when this happens random VMs get paused and will
 not be resumed anymore (this almost happens every time but not all the
 VMs get paused).

 During the event we tested the accessibility from the nodes to the
 storage and it looks like it is working normal, at least we can do a
 normal
 ls on the storage without any delay of showing the contents.

 We tried multiple things that we thought it causes this issue but
 nothing worked so far.
 * rebooting storage / nodes / engine.
 * disabling offsite rsync backups.
 * moved the biggest VMs with highest load to different platform outside
 of oVirt.
 * checked the wsize and rsize on the nfs mounts, storage and nodes are
 correct according to the NFS troubleshooting page on ovirt.org.

 The environment is running in production so we are not free to test
 everything.

 I can provide log files if needed.

 Kind Regards,

 Maikel


 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
 

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] manually merging thin provisioned Ovirt Images

2015-04-17 Thread InterNetX - Juergen Gotteswinter
Hi,

Accidentally, I deleted the wrong live snapshot of a virtual machine. That
wouldn't be that big a deal, since filer snapshots are created every hour.

But... after pulling out the qcow files of the VM, I am confronted with
several files:

- the biggest one, which is most likely the main image, from what I was
able to investigate with qemu-img
- several smaller files, which seem to contain the deltas - so I expect
these files to be the live snapshots of the VM

Could anyone point me in a direction to either:

- push the restored main image file + snapshot files back into oVirt (if
it's hacky, OK..)

or

- find out which snapshot files need to be merged into the main
qcow image to get the latest state?

For option 2, importing a single qcow image is IMHO no big deal
(virt-v2v), but how the heck do I find out in which order / which files
need to be merged?
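
So far my only lead is that every qcow2 layer records its backing file, so the
chain should be readable from the images themselves. Something along these lines
(paths are placeholders; --backing-chain needs a reasonably recent qemu-img,
older builds print the backing file one level at a time with plain qemu-img info):

# walk the chain from a candidate top layer down to the base image
qemu-img info --backing-chain /restore/candidate-top-layer.qcow2

# once the newest top layer is identified, flatten the whole chain into a
# single standalone image that can then be imported (e.g. via virt-v2v):
qemu-img convert -O qcow2 /restore/candidate-top-layer.qcow2 /restore/vm-disk-flat.qcow2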

Thanks!

Juergen
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Poor iSCSI Performance

2015-02-19 Thread InterNetX - Juergen Gotteswinter
Hi,

Have you already captured some iSCSI traffic in both situations and compared them?

MTU mismatch, maybe?
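
E.g. something along these lines on the host (interface name and target IP are
placeholders):

# capture the iSCSI traffic of both setups for comparison in wireshark
tcpdump -i bond0 -s 0 -w /tmp/iscsi-direct.pcap port 3260

# quick jumbo-frame sanity check: 8972 bytes payload + 28 bytes headers = 9000 MTU
ping -M do -s 8972 <target-ip>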


Juergen

On 18.02.2015 at 02:08, John Florian wrote:
 Hi all,
 
 I've been trying to resolve a storage performance issue but have had no
 luck in identifying the exact cause.  I have my storage domain on iSCSI
 and I can get the expected performance (limited by the Gbit Ethernet)
 when running bonnie++ on:
 
   * a regular physical machine configured with the iSCSI initiator
 connected to a dedicated iSCSI test target -- thus oVirt and VM
 technology are completely out of the picture
   * my oVirt host with the initiator connected to that same dedicated
 target -- thus I have an iSCSI connection on the oVirt host but I'm
 not using the iSCSI connection provided by oVirt's storage domain
   * a VM (hosted by oVirt) with the initiator (inside the VM) connected
 to that target -- thus bypassing oVirt's storage domain and the
 virtual disk it provides this VM
 
 However, if I just use a regular virtual disk via oVirt's storage domain
 the performance is much worse.  I've tried both VirtIO and VirtIO-SCSI
 and have found no appreciable difference.
 
 Here's a typical example of the poor performance I get (as tested with
 bonnie++) with the normal virtual disk setup:
 
 # bonnie++ -d . -r 2048 -u root:root
 snip
 Version  1.96   --Sequential Output-- --Sequential Input-
 --Random-
 Concurrency   1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
 --Seeks--
 MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP 
 /sec %CP
 narvi-f21.double 4G   806  91 18507   1 15675   1  3174  56 33175   1
 176.4   3
 Latency 15533us8142ms2440ms 262ms1289ms
 780ms
 Version  1.96   --Sequential Create-- Random
 Create
 narvi-f21.doubledog -Create-- --Read--- -Delete-- -Create-- --Read---
 -Delete--
   files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP 
 /sec %CP
  16 13641  24 + +++ 22702  17 18919  31 + +++
 + +++
 Latency 27724us 247us 292us  71us  30us
 172us
 
 For comparison, here's what I see if I run the same test, same VM, same
 host but this time the file system is mounted from a device obtained
 using iscsi-initiator-utils within the VM, i.e., the 3rd bullet config
 above:
 
 bonnie++ -d . -r 2048 -u root:root
 snip
 Version  1.96   --Sequential Output-- --Sequential Input-
 --Random-
 Concurrency   1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
 --Seeks--
 MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP 
 /sec %CP
 narvi-f21.double 4G  2051  89 103877   4 36286   3  4803  88 88166   4
 163.6   3
 Latency  7724us 191ms 396ms   48734us   73004us   
 1645ms
 Version  1.96   --Sequential Create-- Random
 Create
 narvi-f21.doubledog -Create-- --Read--- -Delete-- -Create-- --Read---
 -Delete--
   files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP 
 /sec %CP
  16  6531  18 + +++ 16388  20  5924  15 + +++
 17906  23
 Latency 15623us  64us  92us1281us  14us
 256us
 
 My host is Fedora 20 running oVirt 3.5 (hosted-engine).  VM is running
 Fedora Server 21.  Tonight I tried updating the host with the Fedora
 virt preview repo and I didn't see any significant change in the
 performance.  Where should I look next?
 
 
 
 
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] RHEV 3.5 problem with connecting rhev-m to nodes

2015-02-19 Thread InterNetX - Juergen Gotteswinter
RHEV or oVirt? If RHEV, I would suggest contacting Red Hat support.

On 19.02.2015 at 11:46, Jakub Bittner wrote:
 Hello,
 
 after restart we had problem with manager to connect to nodes. All nodes
 are down and not reachable. When I try to remove and readd to engine it
 fails too.
 
 10 days ago we changed SSL certs of rhev-m apache to IPA's generated.
 
 We have this in logs:
 
 2015-02-19 11:41:33,491 ERROR
 [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo]
 (DefaultQuartzScheduler_Worker-30) Failure to refresh Vds runtime info:
 org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
 java.io.EOFException: SSL peer shut down incorrectly
 at
 org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.createNetworkException(VdsBrokerCommand.java:126)
 [vdsbroker.jar:]
 at
 org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:101)
 [vdsbroker.jar:]
 at
 org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:56)
 [vdsbroker.jar:]
 at
 org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:31)
 [dal.jar:]
 at
 org.ovirt.engine.core.vdsbroker.VdsManager.refreshCapabilities(VdsManager.java:571)
 [vdsbroker.jar:]
 at
 org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo.refreshVdsRunTimeInfo(VdsUpdateRunTimeInfo.java:648)
 [vdsbroker.jar:]
 at
 org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo.refresh(VdsUpdateRunTimeInfo.java:494)
 [vdsbroker.jar:]
 at
 org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:236)
 [vdsbroker.jar:]
 at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
 [:1.7.0_75]
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 [rt.jar:1.7.0_75]
 at java.lang.reflect.Method.invoke(Method.java:606)
 [rt.jar:1.7.0_75]
 at
 org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:60)
 [scheduler.jar:]
 at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
 [quartz.jar:]
 at
 org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
 [quartz.jar:]
 Caused by: java.io.EOFException: SSL peer shut down incorrectly
 at sun.security.ssl.InputRecord.read(InputRecord.java:482)
 [jsse.jar:1.7.0_75]
 at
 sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:934)
 [jsse.jar:1.7.0_75]
 at
 sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1332)
 [jsse.jar:1.7.0_75]
 at
 sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:709)
 [jsse.jar:1.7.0_75]
 at
 sun.security.ssl.AppOutputStream.write(AppOutputStream.java:122)
 [jsse.jar:1.7.0_75]
 at
 java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
 [rt.jar:1.7.0_75]
 at
 java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
 [rt.jar:1.7.0_75]
 at java.io.FilterOutputStream.flush(FilterOutputStream.java:140)
 [rt.jar:1.7.0_75]
 at java.io.FilterOutputStream.flush(FilterOutputStream.java:140)
 [rt.jar:1.7.0_75]
 at
 org.apache.xmlrpc.client.XmlRpcCommonsTransport$1$1.close(XmlRpcCommonsTransport.java:204)
 [xmlrpc-client.jar:3.1.3]
 at
 org.apache.xmlrpc.client.XmlRpcHttpTransport$ByteArrayReqWriter.write(XmlRpcHttpTransport.java:55)
 [xmlrpc-client.jar:3.1.3]
 at
 org.apache.xmlrpc.client.XmlRpcCommonsTransport$1.writeRequest(XmlRpcCommonsTransport.java:214)
 [xmlrpc-client.jar:3.1.3]
 at
 org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:499)
 [commons-httpclient.jar:]
 at
 org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114)
 [commons-httpclient.jar:]
 at
 org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096)
 [commons-httpclient.jar:]
 at
 org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
 [commons-httpclient.jar:]
 at
 org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
 [commons-httpclient.jar:]
 at
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
 [commons-httpclient.jar:]
 at
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
 [commons-httpclient.jar:]
 at
 org.apache.xmlrpc.client.XmlRpcCommonsTransport.writeRequest(XmlRpcCommonsTransport.java:227)
 [xmlrpc-client.jar:3.1.3]
 at
 org.apache.xmlrpc.client.XmlRpcStreamTransport.sendRequest(XmlRpcStreamTransport.java:151)
 [xmlrpc-client.jar:3.1.3]
 at
 org.apache.xmlrpc.client.XmlRpcHttpTransport.sendRequest(XmlRpcHttpTransport.java:143)
 [xmlrpc-client.jar:3.1.3]
 at
 org.apache.xmlrpc.client.XmlRpcClientWorker.execute(XmlRpcClientWorker.java:56)
 [xmlrpc-client.jar:3.1.3]
 at
 

Re: [ovirt-users] Problem Upgrading 3.4.4 - 3.5

2015-01-20 Thread InterNetX - Juergen Gotteswinter
So, I did it the dirty way and it was successful :)

Thanks to everyone who helped me out, great community!

Cheers,

Juergen

Am 29.12.2014 um 10:28 schrieb InterNetX - Juergen Gotteswinter:
 Hello both of you,
 
 thanks for your detailed explanations and support, still thinking which
 way I will go. Tending to try the dirty way in a lab setup first, to see
 what happens.
 
 Will post updates when I have more :)
 
 Cheers,
 
 Juergen
 
 It seems that somebody had deleted manually the constraint
 fk_event_subscriber_event_notification_methods from your database
 Therefore, the first line that attempts to drop this constraint in
 03_05_0050_event_notification_methods.sql:  ALTER TABLE event_subscriber
 DROP CONSTRAINT fk_event_subscriber_event_notification_methods;
 fails.

 uhm, interesting. Could this be caused by the deinstallation of dwh
 reporting?

 How exactly did you do that?


 very good question, that was a few months ago. I would guess with rpm -e
 before an engine upgrade (if I remember correctly there was one oVirt
 release where dwh was missing for el6).


 Note that partial cleanup is not supported yet [1].

 checking right after that mail :)


 Can you please post all of /var/log/ovirt-engine/setup/* ?

 sure, sending you the DL link in a private mail, since I am not sure if
 I sed'ed out all private things.

 Based on these logs, it seems to me that:

 1. At some point you upgraded to a snapshot of master (then-3.4), installing
 ovirt-engine-3.4.0-0.12.master.20140228075627.el6.

 2. This package had an older version of the script
 dbscripts/upgrade/03_04_0600_event_notification_methods.sql .

 3. Therefore, when you now try to upgrade, engine-setup tries to run the
 newer version, and fails. Why? Because it keeps in the database the checksum
 of every upgrade script it runs, and does not re-run scripts with the same
 checksum. But in your case the checksums are different, so it does try that.
 It fails, because the older version already dropped the table 
 event_notification_methods.
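That bookkeeping lives in the schema_version table used in the update statement
further down, and it can be inspected directly, e.g. (a sketch; the script column
name is an assumption, version and checksum appear in the statement below):

psql -U engine -c "select version, script, checksum from schema_version order by version;" engine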

 How to fix this?

 First, note that upgrades between dev/beta/rc/etc versions are not supported.
 So the official answer is to remove everything and start from scratch. Or, 
 if you
 have good backups of the latest 3.3 version you had, restore to that one and 
 then
 upgrade to 3.4 and then 3.5.

 If you want to try and force an upgrade, you can do the following, but note 
 that
 it might fail elsewhere, or even fail in some future upgrade:

 1. Following a 'git log' of this file, it seems to me that the only change it
 went through between the version you installed and the one in final 3.4, is 
 [1].
 It seems that the relevant part of this change can be done by you by running:

 ALTER TABLE event_subscriber ADD COLUMN notification_method CHARACTER 
 VARYING(32) DEFAULT 'EMAIL' CHECK (notification_method IN ('EMAIL', 
 'SNMP_TRAP'));

 2. After you do that, you can convince engine-setup that you already ran the
 version of the script you now have, by running:

 update schema_version set checksum='feabc7bc7bb7ff749f075be48538c92e' where 
 version='03040600';

 Backup everything before you start.

 No guarantee. Use at your own risk.

 As I said, better remove everything and setup again clean or restore your
 latest backup of a supported version and upgrade from that one.

 Good luck. Please report back :-) Thanks,

 [1] http://gerrit.ovirt.org/25393
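Pieced together, the "dirty way" reported as successful at the top of this mail
boils down to roughly the following. Only a sketch: the database name and the
psql/pg_dump invocation style are copied from commands used elsewhere in this
thread, the backup file name is arbitrary, and the same no-guarantee caveat
applies.

# back up the engine database first
pg_dump -U engine -f engine-before-upgrade.sql engine

# re-add the column the older dev-snapshot script never created
psql -U engine -c "ALTER TABLE event_subscriber ADD COLUMN notification_method CHARACTER VARYING(32) DEFAULT 'EMAIL' CHECK (notification_method IN ('EMAIL', 'SNMP_TRAP'));" engine

# mark the 03040600 script as already applied with the expected checksum
psql -U engine -c "update schema_version set checksum='feabc7bc7bb7ff749f075be48538c92e' where version='03040600';" engine

# then re-run the upgrade
engine-setup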

 
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Problem Upgrading 3.4.4 - 3.5

2014-12-29 Thread InterNetX - Juergen Gotteswinter
Hello both of you,

thanks for your detailed explanations and support, still thinking which
way I will go. Tending to try the dirty way in a lab setup first, to see
what happens.

Will post updates when I have more :)

Cheers,

Juergen

 It seems that somebody had deleted manually the constraint
 fk_event_subscriber_event_notification_methods from your database
 Therefore, the first line that attempts to drop this constraint in
 03_05_0050_event_notification_methods.sql:  ALTER TABLE event_subscriber
 DROP CONSTRAINT fk_event_subscriber_event_notification_methods;
 fails.

 uhm, interesting. Could this be caused by the deinstallation of dwh
 reporting?

 How exactly did you do that?


 very good question, that was a few months ago. I would guess with rpm -e
 before an engine upgrade (if I remember correctly there was one oVirt
 release where dwh was missing for el6).


 Note that partial cleanup is not supported yet [1].

 checking right after that mail :)


 Can you please post all of /var/log/ovirt-engine/setup/* ?

 sure, sending you the DL link in a private mail, since I am not sure if
 I sed'ed out all private things.
 
 Based on these logs, it seems to me that:
 
 1. At some point you upgraded to a snapshot of master (then-3.4), installing
 ovirt-engine-3.4.0-0.12.master.20140228075627.el6.
 
 2. This package had an older version of the script
 dbscripts/upgrade/03_04_0600_event_notification_methods.sql .
 
 3. Therefore, when you now try to upgrade, engine-setup tries to run the
 newer version, and fails. Why? Because it keeps in the database the checksum
 of every upgrade script it runs, and does not re-run scripts with the same
 checksum. But in your case the checksums are different, so it does try that.
 It fails, because the older version already dropped the table 
 event_notification_methods.
 
 How to fix this?
 
 First, note that upgrades between dev/beta/rc/etc versions are not supported.
 So the official answer is to remove everything and start from scratch. Or, 
 if you
 have good backups of the latest 3.3 version you had, restore to that one and 
 then
 upgrade to 3.4 and then 3.5.
 
 If you want to try and force an upgrade, you can do the following, but note 
 that
 it might fail elsewhere, or even fail in some future upgrade:
 
 1. Following a 'git log' of this file, it seems to me that the only change it
 went through between the version you installed and the one in final 3.4, is 
 [1].
 It seems that the relevant part of this change can be done by you by running:
 
 ALTER TABLE event_subscriber ADD COLUMN notification_method CHARACTER 
 VARYING(32) DEFAULT 'EMAIL' CHECK (notification_method IN ('EMAIL', 
 'SNMP_TRAP'));
 
 2. After you do that, you can convince engine-setup that you already ran the
 version of the script you now have, by running:
 
 update schema_version set checksum='feabc7bc7bb7ff749f075be48538c92e' where 
 version='03040600';
 
 Backup everything before you start.
 
 No guarantee. Use at your own risk.
 
 As I said, better remove everything and setup again clean or restore your
 latest backup of a supported version and upgrade from that one.
 
 Good luck. Please report back :-) Thanks,
 
 [1] http://gerrit.ovirt.org/25393
 

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] vdsm noipspoof.py vdsm hook problem

2014-12-29 Thread InterNetX - Juergen Gotteswinter
Hi,

I am trying to get the noipspoof.py hook up and running, which works
fine so far if I only feed it a single IP. When trying to add 2+,
as described in the source (comma separated), the GUI tells me that
this isn't expected / nice and won't let me do it.

I already tried modifying the regex, which made the engine take a
2nd/3rd IP (comma separated), but it seems that something is wrong
with parsing this somewhere else.
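For context, the regex in question is the custom VM property validation that is
registered on the engine side (not in the hook itself), normally via
engine-config. A sketch of what that registration looks like; the property name,
the pattern, the --cver value and the engine restart are assumptions, not taken
from this thread:

# show the currently registered custom VM properties
engine-config -g UserDefinedVMProperties

# widen the pattern so a comma-separated list of IPs passes GUI validation;
# note that -s replaces the whole value, so any existing properties have to be
# repeated, separated by ';'
engine-config -s 'UserDefinedVMProperties=noipspoof=^[0-9.,]+$' --cver=3.5
service ovirt-engine restart

Even with the GUI accepting the value, the hook itself still has to split the
list on the comma, which would match the "somewhere else" parsing problem
described above.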

VDSM throws this:

vdsm vm.Vm ERROR vmId=`4c9cb160-2283-4769-a69c-434e6c992c2b`::The vm
start process failed#012Traceback (most recent call last):#012  File
/usr/share/vdsm/virt/vm.py, line 2266, in _startUnderlyingVm#012
self._run()#012  File /usr/share/vdsm/virt/vm.py, line 3332, in
_run#012domxml = hooks.before_vm_start(self._buildCmdLine(),
self.conf)#012  File /usr/share/vdsm/hooks.py, line 142, in
before_vm_start#012return _runHooksDir(domxml, 'before_vm_start',
vmconf=vmconf)#012  File /usr/share/vdsm/hooks.py, line 110, in
_runHooksDir#012raise HookError()#012HookError


The VM fails to start; the engine tries this on every available host (which,
not surprisingly, fails too).

Anyone got any ideas / patches / hints on how to modify this hook?


Thanks

Juergen
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Problem Upgrading 3.4.4 - 3.5

2014-12-23 Thread InterNetX - Juergen Gotteswinter
Am 23.12.2014 um 02:15 schrieb Eli Mesika:
 
 
 - Original Message -
 From: InterNetX - Juergen Gotteswinter j...@internetx.com
 To: users@ovirt.org
 Sent: Monday, December 22, 2014 2:07:55 PM
 Subject: Re: [ovirt-users] Problem Upgrading 3.4.4 - 3.5

 Hello again,

 It seems that somebody had deleted manually the constraint
 fk_event_subscriber_event_notification_methods from your database
 Therefore, the first line that attempts to drop this constraint in
 03_05_0050_event_notification_methods.sql:  ALTER TABLE event_subscriber
 DROP CONSTRAINT fk_event_subscriber_event_notification_methods;
 fails.

 uhm, interesting. Could this be caused by the deinstallation of dwh reporting?


 Please try to run the following manually and upgrade again

 psql -U engine -c "ALTER TABLE ONLY event_subscriber ADD CONSTRAINT
 fk_event_subscriber_event_notification_methods FOREIGN KEY (method_id)
 REFERENCES event_notification_methods(method_id) ON DELETE CASCADE;" engine


 it just drops ERROR:  relation event_notification_methods does not
 exist
 
 OK
 Lets check what do you have in your DB
 
 Please attach the result of the following 
 
 psql -U engine -c "select table_name from information_schema.tables where
 table_schema = 'public' order by table_name;" engine
 
 Thanks 
 

sure, here we go


 

 Let me know how it is going ...

 Eli


 Thank you already for your help!

 Juergen
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users


  table_name   
---
 action_version_map
 ad_groups
 affinity_group_members
 affinity_groups
 affinity_groups_view
 all_disks
 all_disks_including_snapshots
 async_tasks
 async_tasks_entities
 audit_log
 base_disks
 bookmarks
 business_entity_snapshot
 cluster_policies
 cluster_policy_units
 custom_actions
 desktop_vms
 disk_image_dynamic
 disk_lun_map
 dwh_add_tags_relations_history_view
 dwh_cluster_configuration_history_view
 dwh_datacenter_configuration_history_view
 dwh_datacenter_history_view
 dwh_datacenter_storage_map_history_view
 dwh_disk_vm_map_history_view
 dwh_history_timekeeping
 dwh_host_configuration_full_check_view
 dwh_host_configuration_history_view
 dwh_host_history_view
 dwh_host_interface_configuration_history_view
 dwh_host_interface_history_view
 dwh_osinfo
 dwh_remove_tags_relations_history_view
 dwh_storage_domain_configuration_history_view
 dwh_storage_domain_history_view
 dwh_tags_details_history_view
 dwh_vm_configuration_history_view
 dwh_vm_device_history_view
 dwh_vm_disk_configuration_history_view
 dwh_vm_disks_history_view
 dwh_vm_history_view
 dwh_vm_interface_configuration_history_view
 dwh_vm_interface_history_view
 event_map
 event_notification_hist
 event_subscriber
 gluster_cluster_services
 gluster_hooks
 gluster_server
 gluster_server_hooks
 gluster_server_hooks_view
 gluster_server_services
 gluster_server_services_view
 gluster_services
 gluster_service_types
 gluster_volume_access_protocols
 gluster_volume_bricks
 gluster_volume_bricks_view
 gluster_volume_options
 gluster_volumes
 gluster_volumes_view
 gluster_volume_task_steps
 gluster_volume_transport_types
 images
 images_storage_domain_view
 image_storage_domain_map
 image_types_storage_domain
 image_types_view
 instance_types_storage_domain
 instance_types_view
 internal_permissions_view
 iscsi_bonds
 iscsi_bonds_networks_map
 iscsi_bonds_storage_connections_map
 job
 job_subject_entity
 luns
 lun_storage_server_connection_map
 luns_view
 materialized_views
 network
 network_cluster
 network_cluster_view
 network_qos
 network_vds_view
 network_view
 object_column_white_list
 object_column_white_list_sql
 permissions
 permissions_view
 policy_units
 providers
 quota
 quota_global_view
 quota_limitation
 quota_limitations_view
 quota_storage_view
 quota_vds_group_view
 quota_view
 repo_file_meta_data
 roles
 roles_groups
 schema_version
 server_vms
 snapshots
 step
 storage_domain_dynamic
 storage_domain_file_repos
 storage_domains
 storage_domains_for_search
 storage_domain_static
 storage_domain_static_view
 storage_domains_with_hosts_view
 storage_domains_without_storage_pools
 storage_for_image_view
 storage_pool
 storage_pool_iso_map
 storage_pool_with_storage_domain
 storage_server_connections
 tags
 tags_user_group_map
 tags_user_group_map_view
 tags_user_map
 tags_user_map_view
 tags_vds_map
 tags_vds_map_view
 tags_vm_map
 tags_vm_map_view
 tags_vm_pool_map
 tags_vm_pool_map_view
 user_db_users_permissions_view
 user_disk_permissions_view
 user_disk_permissions_view_base
 user_flat_groups
 user_network_permissions_view
 user_network_permissions_view_base
 user_object_permissions_view
 user_permissions_permissions_view
 users
 users_and_groups_to_vm_pool_map_view
 user_storage_domain_permissions_view
 user_storage_domain_permissions_view_base
 user_storage_pool_permissions_view
 user_storage_pool_permissions_view_base

Re: [ovirt-users] Problem Upgrading 3.4.4 - 3.5

2014-12-23 Thread InterNetX - Juergen Gotteswinter

 It seems that somebody had deleted manually the constraint
 fk_event_subscriber_event_notification_methods from your database
 Therefore, the first line that attempts to drop this constraint in
 03_05_0050_event_notification_methods.sql:  ALTER TABLE event_subscriber
 DROP CONSTRAINT fk_event_subscriber_event_notification_methods;
 fails.

 uhm, interesting. Could this be caused by the deinstallation of dwh reporting?
 
 How exactly did you do that?


very good question, that was a few months ago. I would guess with rpm -e
before an engine upgrade (if I remember correctly there was one oVirt
release where dwh was missing for el6).

 
 Note that partial cleanup is not supported yet [1].

checking right after that mail :)

 
 Can you please post all of /var/log/ovirt-engine/setup/* ?

sure, sending you the DL link in a private mail, since I am not sure if
I sed'ed out all private things.

 
 Thanks!
 
 [1] https://bugzilla.redhat.com/show_bug.cgi?id=1060529
 


 Please try to run the following manually and upgrade again

 psql -U engine -c "ALTER TABLE ONLY event_subscriber ADD CONSTRAINT
 fk_event_subscriber_event_notification_methods FOREIGN KEY (method_id)
 REFERENCES event_notification_methods(method_id) ON DELETE CASCADE;" engine


 it just drops ERROR:  relation event_notification_methods does not
 exist

 OK
 Let's check what you have in your DB

 Please attach the result of the following

 psql -U engine -c "select table_name from information_schema.tables where
 table_schema = 'public' order by table_name;" engine

 Thanks



 Let me know how it is going ...

 Eli


 Thank you already for your help!

 Juergen
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Problem Upgrading 3.4.4 - 3.5

2014-12-22 Thread InterNetX - Juergen Gotteswinter
Hi,

I am currently trying to upgrade an existing 3.4.4 setup (which got
upgraded several times before, starting at 3.3), but this time I run
into an error while upgrading the DB

-- snip --


* QUERY **
ALTER TABLE event_subscriber DROP CONSTRAINT
fk_event_subscriber_event_notification_methods;
**

2014-12-20 00:16:27 DEBUG
otopi.plugins.ovirt_engine_setup.ovirt_engine.db.schema
plugin.executeRaw:803 execute-result:
['/usr/share/ovirt-engine/dbscripts/schema.sh', '-s', 'localhost', '-p',
'5432', '-u', 'engine', '-d', 'engine', '-l',
'/var/log/ovirt-engine/setup/ovirt-engine-setup-20141220001232-3xjymi.log',
'-c', 'apply'], rc=1
2014-12-20 00:16:27 DEBUG
otopi.plugins.ovirt_engine_setup.ovirt_engine.db.schema
plugin.execute:861 execute-output:
['/usr/share/ovirt-engine/dbscripts/schema.sh', '-s', 'localhost', '-p',
'5432', '-u', 'engine', '-d', 'engine', '-l',
'/var/log/ovirt-engine/setup/ovirt-engine-setup-20141220001232-3xjymi.log',
'-c', 'apply'] stdout:
Creating schema engine@localhost:5432/engine
Saving custom users permissions on database objects...
upgrade script detected a change in Config, View or Stored Procedure...
Running upgrade sql script
'/usr/share/ovirt-engine/dbscripts/upgrade/pre_upgrade/_config.sql'...
Running upgrade sql script
'/usr/share/ovirt-engine/dbscripts/upgrade/pre_upgrade/0010_custom.sql'...
Running upgrade sql script
'/usr/share/ovirt-engine/dbscripts/upgrade/pre_upgrade/0020_add_materialized_views_table.sql'...
Running upgrade sql script
'/usr/share/ovirt-engine/dbscripts/upgrade/pre_upgrade/0030_materialized_views_extensions.sql'...
Running upgrade sql script
'/usr/share/ovirt-engine/dbscripts/upgrade/pre_upgrade/0040_extend_installed_by_column.sql'...
Dropping materialized views...
Running upgrade sql script
'/usr/share/ovirt-engine/dbscripts/upgrade/03_05_0010_add_tables_for_gluster_volume_and_brick_details.sql'...
Running upgrade sql script
'/usr/share/ovirt-engine/dbscripts/upgrade/03_05_0020_gluster_refresh_gluster_volume_details-event_map.sql'...
Skipping upgrade script
/usr/share/ovirt-engine/dbscripts/upgrade/03_05_0030_add_ha_columns_to_vds_statistics.sql,
already installed by 03040610
Skipping upgrade script
/usr/share/ovirt-engine/dbscripts/upgrade/03_05_0040_add_ha_maintenance_events.sql,
already installed by 03040620
Running upgrade sql script
'/usr/share/ovirt-engine/dbscripts/upgrade/03_05_0050_event_notification_methods.sql'...

2014-12-20 00:16:27 DEBUG
otopi.plugins.ovirt_engine_setup.ovirt_engine.db.schema
plugin.execute:866 execute-output:
['/usr/share/ovirt-engine/dbscripts/schema.sh', '-s', 'localhost', '-p',
'5432', '-u', 'engine', '-d', 'engine', '-l',
'/var/log/ovirt-engine/setup/ovirt-engine-setup-20141220001232-3xjymi.log',
'-c', 'apply'] stderr:
psql:/usr/share/ovirt-engine/dbscripts/upgrade/03_05_0050_event_notification_methods.sql:2:
ERROR:  constraint fk_event_subscriber_event_notification_methods of
relation event_subscriber does not exist
FATAL: Cannot execute sql command:
--file=/usr/share/ovirt-engine/dbscripts/upgrade/03_05_0050_event_notification_methods.sql

2014-12-20 00:16:27 DEBUG otopi.context context._executeMethod:152
method exception
Traceback (most recent call last):
  File /usr/lib/python2.6/site-packages/otopi/context.py, line 142, in
_executeMethod
method['method']()
  File
/usr/share/ovirt-engine/setup/bin/../plugins/ovirt-engine-setup/ovirt-engine/db/schema.py,
line 291, in _misc
oenginecons.EngineDBEnv.PGPASS_FILE
  File /usr/lib/python2.6/site-packages/otopi/plugin.py, line 871, in
execute
command=args[0],
RuntimeError: Command '/usr/share/ovirt-engine/dbscripts/schema.sh'
failed to execute
2014-12-20 00:16:27 ERROR otopi.context context._executeMethod:161
Failed to execute stage 'Misc configuration': Command
'/usr/share/ovirt-engine/dbscripts/schema.sh' failed to execute
2014-12-20 00:16:27 DEBUG otopi.transaction transaction.abort:131
aborting 'Yum Transaction'

-- snip --

after that, engine-setup starts doing a rollback to 3.4.4, which
works flawlessly.

Anyone got an Idea what is causing this?

Thanks,

Juergen

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Problem Upgrading 3.4.4 - 3.5

2014-12-22 Thread InterNetX - Juergen Gotteswinter
Am 22.12.2014 um 10:41 schrieb Eli Mesika:
 
 
 - Original Message -
 From: InterNetX - Juergen Gotteswinter j...@internetx.com
 To: users@ovirt.org
 Sent: Monday, December 22, 2014 11:08:05 AM
 Subject: [ovirt-users] Problem Upgrading 3.4.4 - 3.5

 Hi,

 I am currently trying to upgrade an existing 3.4.4 setup (which got
 upgraded several times before, starting at 3.3), but this time I run
 into an error while upgrading the DB
 
 Hi
 Can you please attach the following data (this information does not include 
 any customer info, just log of installed scripts)
 
 pg_dump -U engine -f schema-version.sql -t schema_version  engine

sure :)

 

 -- snip --


 * QUERY **
 ALTER TABLE event_subscriber DROP CONSTRAINT
 fk_event_subscriber_event_notification_methods;
 **

 2014-12-20 00:16:27 DEBUG
 otopi.plugins.ovirt_engine_setup.ovirt_engine.db.schema
 plugin.executeRaw:803 execute-result:
 ['/usr/share/ovirt-engine/dbscripts/schema.sh', '-s', 'localhost', '-p',
 '5432', '-u', 'engine', '-d', 'engine', '-l',
 '/var/log/ovirt-engine/setup/ovirt-engine-setup-20141220001232-3xjymi.log',
 '-c', 'apply'], rc=1
 2014-12-20 00:16:27 DEBUG
 otopi.plugins.ovirt_engine_setup.ovirt_engine.db.schema
 plugin.execute:861 execute-output:
 ['/usr/share/ovirt-engine/dbscripts/schema.sh', '-s', 'localhost', '-p',
 '5432', '-u', 'engine', '-d', 'engine', '-l',
 '/var/log/ovirt-engine/setup/ovirt-engine-setup-20141220001232-3xjymi.log',
 '-c', 'apply'] stdout:
 Creating schema engine@localhost:5432/engine
 Saving custom users permissions on database objects...
 upgrade script detected a change in Config, View or Stored Procedure...
 Running upgrade sql script
 '/usr/share/ovirt-engine/dbscripts/upgrade/pre_upgrade/_config.sql'...
 Running upgrade sql script
 '/usr/share/ovirt-engine/dbscripts/upgrade/pre_upgrade/0010_custom.sql'...
 Running upgrade sql script
 '/usr/share/ovirt-engine/dbscripts/upgrade/pre_upgrade/0020_add_materialized_views_table.sql'...
 Running upgrade sql script
 '/usr/share/ovirt-engine/dbscripts/upgrade/pre_upgrade/0030_materialized_views_extensions.sql'...
 Running upgrade sql script
 '/usr/share/ovirt-engine/dbscripts/upgrade/pre_upgrade/0040_extend_installed_by_column.sql'...
 Dropping materialized views...
 Running upgrade sql script
 '/usr/share/ovirt-engine/dbscripts/upgrade/03_05_0010_add_tables_for_gluster_volume_and_brick_details.sql'...
 Running upgrade sql script
 '/usr/share/ovirt-engine/dbscripts/upgrade/03_05_0020_gluster_refresh_gluster_volume_details-event_map.sql'...
 Skipping upgrade script
 /usr/share/ovirt-engine/dbscripts/upgrade/03_05_0030_add_ha_columns_to_vds_statistics.sql,
 already installed by 03040610
 Skipping upgrade script
 /usr/share/ovirt-engine/dbscripts/upgrade/03_05_0040_add_ha_maintenance_events.sql,
 already installed by 03040620
 Running upgrade sql script
 '/usr/share/ovirt-engine/dbscripts/upgrade/03_05_0050_event_notification_methods.sql'...

 2014-12-20 00:16:27 DEBUG
 otopi.plugins.ovirt_engine_setup.ovirt_engine.db.schema
 plugin.execute:866 execute-output:
 ['/usr/share/ovirt-engine/dbscripts/schema.sh', '-s', 'localhost', '-p',
 '5432', '-u', 'engine', '-d', 'engine', '-l',
 '/var/log/ovirt-engine/setup/ovirt-engine-setup-20141220001232-3xjymi.log',
 '-c', 'apply'] stderr:
 psql:/usr/share/ovirt-engine/dbscripts/upgrade/03_05_0050_event_notification_methods.sql:2:
 ERROR:  constraint fk_event_subscriber_event_notification_methods of
 relation event_subscriber does not exist
 FATAL: Cannot execute sql command:
 --file=/usr/share/ovirt-engine/dbscripts/upgrade/03_05_0050_event_notification_methods.sql

 2014-12-20 00:16:27 DEBUG otopi.context context._executeMethod:152
 method exception
 Traceback (most recent call last):
   File /usr/lib/python2.6/site-packages/otopi/context.py, line 142, in
 _executeMethod
 method['method']()
   File
 /usr/share/ovirt-engine/setup/bin/../plugins/ovirt-engine-setup/ovirt-engine/db/schema.py,
 line 291, in _misc
 oenginecons.EngineDBEnv.PGPASS_FILE
   File /usr/lib/python2.6/site-packages/otopi/plugin.py, line 871, in
 execute
 command=args[0],
 RuntimeError: Command '/usr/share/ovirt-engine/dbscripts/schema.sh'
 failed to execute
 2014-12-20 00:16:27 ERROR otopi.context context._executeMethod:161
 Failed to execute stage 'Misc configuration': Command
 '/usr/share/ovirt-engine/dbscripts/schema.sh' failed to execute
 2014-12-20 00:16:27 DEBUG otopi.transaction transaction.abort:131
 aborting 'Yum Transaction'

 -- snip --

 after that, engine-setup starts doing a rollback to 3.4.4, which
 works flawlessly.

 Anyone got an Idea what is causing this?

 Thanks,

 Juergen

 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users


--
-- PostgreSQL database dump
--

SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = off;
SET check_function_bodies = false;
SET

Re: [ovirt-users] Problem Upgrading 3.4.4 - 3.5

2014-12-22 Thread InterNetX - Juergen Gotteswinter
Hello again,

 It seems that somebody had deleted manually the constraint 
 fk_event_subscriber_event_notification_methods from your database 
 Therefore, the first line that attempts to drop this constraint in 
 03_05_0050_event_notification_methods.sql:  ALTER TABLE event_subscriber DROP 
 CONSTRAINT fk_event_subscriber_event_notification_methods;
 fails.

uhm, interesting. Could this be caused by the deinstallation of dwh reporting?

 
 Please try to run the following manually and upgrade again
 
 psql -U engine -c "ALTER TABLE ONLY event_subscriber ADD CONSTRAINT
 fk_event_subscriber_event_notification_methods FOREIGN KEY (method_id)
 REFERENCES event_notification_methods(method_id) ON DELETE CASCADE;" engine
 

it just drops ERROR:  relation event_notification_methods does not
exist

 Let me know how it is going ...
 
 Eli
 

Thank you already for your help!

Juergen
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Used Network resources of host XY [97%] exceeded

2014-12-18 Thread InterNetX - Juergen Gotteswinter
I have seen those messages in setups with 10G networking; when the
network throughput went to ~1 Gbit those messages appeared. According to
the RHN knowledgebase this is a false alert; it seems some versions of
oVirt/RHEV got this hardcoded.
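The 10G false-alert case usually comes down to the link speed the host reports,
since the percentage is calculated against it. A quick check on the affected
host (the interface name is a placeholder):

cat /sys/class/net/em1/speed
ethtool em1 | grep -i speed

If a 10G NIC reports 1000 (or -1/unknown), the computed utilisation percentage
will be misleading.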

Am 18.12.2014 um 09:47 schrieb Lior Vernia:
 Hi Mario,
 
 On 18/12/14 09:57, Ml Ml wrote:
 Hello List,

 one of my oVirt guests seems to lose its network connection. I can
 still see some traffic on the console though. In the logs I get:

 
 Network connection where? To the internet? And what do you mean by
 losing it - is the network traffic choppy, or are you working via a
 SPICE console and it's not responsive?...
 
 2014-Dez-04, 13:09
 Used Network resources of host myhostname-here [97%] exceeded
 defined threshold [95%].

 
 It seems that one of your host's interfaces is heavily utilized - I'd
 postulate that the VM is suffering because there's no capacity for the
 traffic it's sending/receiving. What networks are attached to the most
 utilized interface of the host, and is one of those indeed the network
 used by the VM for its aforementioned network connection?
 

 Is this the reason? Where can i change or disable this feature?

 Thanks,
 Mario
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Remote oVirt Cluster - Latency Sensitivity

2014-11-24 Thread InterNetX - Juergen Gotteswinter


Am 24.11.2014 um 13:16 schrieb s k:
 Hello,
 
 
 I'm building an oVirt Cluster on a remote site and to avoid installing a
 separate oVirt Engine there I'm thinking of using the central oVirt
 Engine. The problem is that the latency between the two sites is up to
 100ms. 

oVirt is very picky about network latency; I wouldn't go with that if
this is something important.
 
 
 Is this an acceptable value for the oVirt Engine to be able to monitor
 them properly? Alternatively, is there a way to configure the Engine to
 accept latency around 100ms?

engine-config -a

I think I have seen something regarding this in there.
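A sketch of what that looks like in practice; vdsTimeout is the key I would
start with, but key names differ between versions, so listing them first is
safer, and the value below is only an example:

# list keys and pick out the timeout/heartbeat related ones
engine-config -l | grep -i -E 'timeout|heartbeat'

engine-config -g vdsTimeout
engine-config -s vdsTimeout=300
service ovirt-engine restart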


 
 
 Thanks,
 
 
 Sokratis
 
 
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Problem Refreshing/Using ovirt-image-repository / 3.5 RC2

2014-09-24 Thread InterNetX - Juergen Gotteswinter
Am 23.09.2014 um 10:11 schrieb Federico Simoncelli:
 - Original Message -
 From: Oved Ourfali ov...@redhat.com
 To: j...@internetx.com, Federico Simoncelli fsimo...@redhat.com
 Cc: users@ovirt.org, Allon Mureinik amure...@redhat.com
 Sent: Tuesday, September 23, 2014 9:56:28 AM
 Subject: Re: [ovirt-users] Problem Refreshing/Using ovirt-image-repository / 
 3.5 RC2

 - Original Message -
 From: InterNetX - Juergen Gotteswinter j...@internetx.com
 To: users@ovirt.org
 Sent: Tuesday, September 23, 2014 10:41:41 AM
 Subject: Re: [ovirt-users] Problem Refreshing/Using ovirt-image-repository
 / 3.5   RC2

 Am 23.09.2014 um 09:32 schrieb Oved Ourfali:


 - Original Message -
 From: InterNetX - Juergen Gotteswinter j...@internetx.com
 To: users@ovirt.org
 Sent: Tuesday, September 23, 2014 10:29:07 AM
 Subject: [ovirt-users] Problem Refreshing/Using ovirt-image-repository /
 3.5   RC2

 Hi,

 when trying to refresh the ovirt glance repository i get a 500 Error
 Message


 Operation Canceled

 Error while executing action: A Request to the Server failed with the
 following Status Code: 500


 engine.log says:

 2014-09-23 09:23:08,960 INFO
 [org.ovirt.engine.core.bll.provider.TestProviderConnectivityCommand]
 (ajp--127.0.0.1-8702-10) [7fffb4bd] Running command:
 TestProviderConnectivityCommand internal: false. Entities affected :
 ID: aaa0----123456789aaa Type: SystemAction group
 CREATE_STORAGE_POOL with role type ADMIN
 2014-09-23 09:23:08,975 INFO
 [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
 (ajp--127.0.0.1-8702-10) [7fffb4bd] Correlation ID: 7fffb4bd, Call
 Stack: null, Custom Event ID: -1, Message: Unrecognized audit log type
 has been used.
 2014-09-23 09:23:20,173 INFO
 [org.ovirt.engine.core.bll.aaa.LogoutUserCommand]
 (ajp--127.0.0.1-8702-11) [712895c3] Running command: LogoutUserCommand
 internal: false.
 2014-09-23 09:23:20,184 INFO
 [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
 (ajp--127.0.0.1-8702-11) [712895c3] Correlation ID: 712895c3, Call
 Stack: null, Custom Event ID: -1, Message: User admin logged out.
 2014-09-23 09:23:20,262 INFO
 [org.ovirt.engine.core.bll.aaa.LoginAdminUserCommand]
 (ajp--127.0.0.1-8702-6) Running command: LoginAdminUserCommand internal:
 false.


 All these message are good... no error here.
 Can you attach the full engine log?


 imho there is nothing else related to this :/ I attached the log
 starting from today. Except for firing up a test VM nothing else has
 happened yet (and several tries at refreshing the image repo).


 I don't see a refresh attempt in the log, but i'm not familiar enough with
 that.
 Federico - can you have a look?
 
 I don't see any reference to glance or error 500 in the logs. My impression
 is that the error 500 is between the ui and the engine... have you tried to
 force-refresh the ovirt webadmin page?
 
 You can try and use the rest-api to check if the listing is working there.
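That REST check would look something like this; the API path, the collection
name and the credentials are from memory / placeholders, not from this thread:

curl -k -u 'admin@internal:PASSWORD' 'https://engine.example.com/api/openstackimageproviders'

A clean listing there, with the webadmin still returning 500, would point at
the UI/engine path Federico mentions.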
 


For the record - problem solved. How? Dunno; after upgrading to RC3
everything is working like a charm.
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Problem Refreshing/Using ovirt-image-repository / 3.5 RC2

2014-09-23 Thread InterNetX - Juergen Gotteswinter
Hi,

When trying to refresh the oVirt Glance repository I get a 500 error message


Operation Canceled

Error while executing action: A Request to the Server failed with the
following Status Code: 500


engine.log says:

2014-09-23 09:23:08,960 INFO
[org.ovirt.engine.core.bll.provider.TestProviderConnectivityCommand]
(ajp--127.0.0.1-8702-10) [7fffb4bd] Running command:
TestProviderConnectivityCommand internal: false. Entities affected :
ID: aaa0----123456789aaa Type: SystemAction group
CREATE_STORAGE_POOL with role type ADMIN
2014-09-23 09:23:08,975 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(ajp--127.0.0.1-8702-10) [7fffb4bd] Correlation ID: 7fffb4bd, Call
Stack: null, Custom Event ID: -1, Message: Unrecognized audit log type
has been used.
2014-09-23 09:23:20,173 INFO
[org.ovirt.engine.core.bll.aaa.LogoutUserCommand]
(ajp--127.0.0.1-8702-11) [712895c3] Running command: LogoutUserCommand
internal: false.
2014-09-23 09:23:20,184 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(ajp--127.0.0.1-8702-11) [712895c3] Correlation ID: 712895c3, Call
Stack: null, Custom Event ID: -1, Message: User admin logged out.
2014-09-23 09:23:20,262 INFO
[org.ovirt.engine.core.bll.aaa.LoginAdminUserCommand]
(ajp--127.0.0.1-8702-6) Running command: LoginAdminUserCommand internal:
false.


Anyone got an idea what is causing this? From the same network location,
with oVirt 3.4, the repository works as expected.

We also tried to start from scratch; after an engine-cleanup the error
still occurs.

Cheers,

Juergen
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] VM paused unexpectedly

2014-09-10 Thread InterNetX - Juergen Gotteswinter


Am 09.09.2014 um 16:38 schrieb Dafna Ron:
 On 09/09/2014 03:21 PM, Frank Wall wrote:
 On Tue, Sep 09, 2014 at 03:09:02PM +0100, Dafna Ron wrote:
 qemu would pause a vm when doing extend on the vm disk and this would
 result in INFO messages on vm's pause.

 looks like this is what you are seeing.
 For the records, I'm using thin provisioned disks here.
 Do you mean an internal qemu task which is triggered to
 extend a thin provisioned disk to the required size?
 yes

To mitigate (or even get completely rid of it) you can try adding
something like this to vdsm.conf in the [irs] section:

volume_utilization_percent = 25
volume_utilization_chunk_mb = 4096
vol_size_sample_interval = 20
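Written out as it would sit in /etc/vdsm/vdsm.conf on the host (the comments
and the vdsmd restart are my additions, not part of the original suggestion):

[irs]
# a lower value makes vdsm extend the thin LV earlier
volume_utilization_percent = 25
# extend in bigger chunks, so the guest is less likely to hit the high-water mark
volume_utilization_chunk_mb = 4096
# check volume usage more often (seconds)
vol_size_sample_interval = 20

# vdsm only reads vdsm.conf at startup
service vdsmd restart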


 
 This process shouldn't permanently pause the VM, right?
 no
 

 Or do you mean something else?


 Regards
 - Frank
 
 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Multiple datacenters design

2014-09-08 Thread InterNetX - Juergen Gotteswinter


Am 08.09.2014 um 15:05 schrieb Jimmy Dorff:
 On 09/08/2014 05:54 AM, Finstrle, Ludek wrote:

 I think about two possibilities:
 1) One central Engine
 - how to manage guests when connection drop between engine and node
 - latency is up to 1 second is it ok even with working connection?

 
 I had a similar design on a Univ. campus and had problems. Ovirt engine
 would have problems with the state of remote hosts. VMs ran OK, but it
 wasn't clean. I would recommend having a fast, reliable network
 between the engine and the hosts.
 
 -Jimmy
 
 

same experience here, network latency is something where ovirt/rhev is
very very picky and goes crazy (fencing nodes, unknown vm states,
unknown host / storage states).

whould advise to do this.


 
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Multiple datacenters design

2014-09-08 Thread InterNetX - Juergen Gotteswinter


Am 08.09.2014 um 15:18 schrieb InterNetX - Juergen Gotteswinter:
 
 
 Am 08.09.2014 um 15:05 schrieb Jimmy Dorff:
 On 09/08/2014 05:54 AM, Finstrle, Ludek wrote:

 I think about two possibilities:
 1) One central Engine
 - how to manage guests when connection drop between engine and node
 - latency is up to 1 second is it ok even with working connection?


 I had a similar design on a Univ. campus and had problems. Ovirt engine
 would have problems with the state of remote hosts. VMs ran OK, but it
 wasn't clean. I would recommend having a fast, reliable network
 between the engine and the hosts.

 -Jimmy


 
 same experience here, network latency is something where ovirt/rhev is
 very very picky and goes crazy (fencing nodes, unknown vm states,
 unknown host / storage states).
 
 whould advise to do this.

whould not, of course. sorry for typo

 

 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users