[ovirt-users] Re: Is the udev settling issue more wide spread? Getting failed 'qemu-img -convert' also while copying disks between data and vmstore domains

2020-08-12 Thread Nir Soffer
On Wed, Aug 12, 2020 at 2:25 AM tho...@hoberg.net wrote:

> While trying to diagnose an issue with a set of VMs that get stopped for
> I/O problems at startup, I try to deal with the fact that their boot disks
> cause this issue, no matter where I connect them. They might have been the
> first disks I ever tried to sparsify and I was afraid that might have
> messed them up. The images are for a nested oVirt deployment and they
> worked just fine, before I shut down those VMs...
>
> So I first tried to hook them up as secondary disks to another VM to have a
> look, but that just caused the other VM to stop at boot.
>
> I also tried downloading, exporting, and plain copying the disks, to no
> avail; OVA exports of the entire VM fail again (the fix is in!).
>
> So to make sure copying disks between volumes *generally* works, I tried
> copying a disk from a working (but stopped) VM from 'vmstore' to 'data' on
> my 3-node HCI farm, but that failed, too!
>
> Plenty of space all around, but all disks are using thin/sparse/VDO on SSD
> underneath.
>
> Before I open a bug, I'd like some feedback: is this a standard QA test,
> is this happening to you, etc.?
>
> I'm still on oVirt 4.3.11 with pack_ova.py patched to wait for udev
> settle.
>
> This is from the engine.log on the hosted-engine:
>
> 2020-08-12 00:04:15,870+02 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (EE-ManagedThreadFactory-engineScheduled-Thread-67) [] EVENT_ID:
> VDS_BROKER_COMMAND_FAILURE(10,802), VDSM gem2 command
> HSMGetAllTasksStatusesVDS failed: low level Image copy failed: ("Command
> ['/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-f',
> 'raw', 
> u'/rhev/data-center/mnt/glusterSD/192.168.0.91:_vmstore/9d1b8774-c5dc-46a8-bfa2-6a6db5851195/images/aca27b96-7215-476f-b793-fb0396543a2e/311f853c-e9cc-4b9e-8a00-5885ec7adf14',
> '-O', 'raw', 
> u'/rhev/data-center/mnt/glusterSD/192.168.0.91:_data/32129b5f-d47c-495b-a282-7eae1079257e/images/f6a08d2a-4ddb-42da-88e6-4f92a38b9c95/e0d00d46-61a1-4d8c-8cb4-2e5f1683d7f5']
> failed with rc=1 out='' err=bytearray(b'qemu-img: error while reading
> sector 131072: Transport endpoint is not connected\\nqemu-img: error while
> reading sector 135168: Transport endpoint is not connected\\nqemu-img:
> error while reading sector 139264: Transport endpoint is not
> connected\\nqemu-img: error while reading sector 143360:
> Transport endpoint is not connected\\nqemu-img: error while reading sector
> 147456: Transport endpoint is not connected\\nqemu-img: error while reading
> sector 151552: Transport endpoint is not connected\\n')",)
>
> and this is from the vdsm.log on the gem2 node:
> Error: Command ['/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T',
> 'none', '-f', 'raw', 
> u'/rhev/data-center/mnt/glusterSD/192.168.0.91:_vmstore/9d1b8774-c5dc-46a8-bfa2-6a6db5851195/images/aca27b96-7215-476f-b793-fb0396543a2e/311f853c-e9cc-4b9e-8a00-5885ec7adf14',
> '-O', 'raw', 
> u'/rhev/data-center/mnt/glusterSD/192.168.0.91:_data/32129b5f-d47c-495b-a282-7eae1079257e/images/f6a08d2a-4ddb-42da-88e6-4f92a38b9c95/e0d00d46-61a1-4d8c-8cb4-2e5f1683d7f5']
> failed with rc=1 out='' err=bytearray(b'qemu-img: error while reading
> sector 131072: Transport endpoint is not connected\nqemu-img: error while
> reading sector 135168: Transport endpoint is not connected\nqemu-img: error
> while reading sector 139264: Transport endpoint is not connected\nqemu-img:
> error while reading sector 143360: Transport endpoint is not
> connected\nqemu-img: error while reading sector 147456: Transport endpoint
> is not connected\nqemu-img: error while reading sector 151552: Transport
> endpoint is not connected\n')
> 2020-08-12 00:03:15,428+0200 ERROR (tasks/7) [storage.Image] Unexpected
> error (image:849)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 837,
> in copyCollapsed
>     raise se.CopyImageError(str(e))
> CopyImageError: low level Image copy failed: ("Command
> ['/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-f',
> 'raw', 
> u'/rhev/data-center/mnt/glusterSD/192.168.0.91:_vmstore/9d1b8774-c5dc-46a8-bfa2-6a6db5851195/images/aca27b96-7215-476f-b793-fb0396543a2e/311f853c-e9cc-4b9e-8a00-5885ec7adf14',
> '-O', 'raw', 
> u'/rhev/data-center/mnt/glusterSD/192.168.0.91:_data/32129b5f-d47c-495b-a282-7eae1079257e/images/f6a08d2a-4ddb-42da-88e6-4f92a38b9c95/e0d00d46-61a1-4d8c-8cb4-2e5f1683d7f5']
> failed with rc=1 out='' err=bytearray(b'qemu-img: error while reading
> sector 131072: Transport endpoint is not connected\\nqemu-img: error while
> reading sector 135168: Transport endpoint is not connected\\nqemu-img:
> error while reading sector 139264: Transport endpoint is not
> connected\\nqemu-img: error while reading sector 143360: Transport endpoint
> is not connected\\nqemu-img: error while reading sector 147456: Transport
> endpoint is not connected\\nqemu-img: error while reading sector 151552:
> Transport endpoint is not connected\\n')",)
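
A minimal way to reproduce the failing reads outside of VDSM, directly on the
host, is a raw read of the source image; a sketch reusing the source path from
the log above (iflag=direct approximates the O_DIRECT reads that qemu-img
performs with '-T none'):

dd if=/rhev/data-center/mnt/glusterSD/192.168.0.91:_vmstore/9d1b8774-c5dc-46a8-bfa2-6a6db5851195/images/aca27b96-7215-476f-b793-fb0396543a2e/311f853c-e9cc-4b9e-8a00-5885ec7adf14 of=/dev/null bs=1M iflag=direct

If that also fails with 'Transport endpoint is not connected', the FUSE client
behind the 'vmstore' mount is dropping, and the Gluster client log
(conventionally named after the mount path, e.g.
/var/log/glusterfs/rhev-data-center-mnt-glusterSD-192.168.0.91:_vmstore.log)
should show why.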

[ovirt-users] Re: Is the udev settling issue more wide spread? Getting failed 'qemu-img -convert' also while copying disks between data and vmstore domains

2020-08-11 Thread Strahil Nikolov via Users
Changing the Gluster log level (for both the volume/client and the bricks) can be done by following 
https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/configuring_the_log_level
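
For example, a sketch using the standard Gluster volume options (substitute
your volume name for VOLNAME, and set the levels back to INFO afterwards,
since DEBUG is very verbose):

gluster volume set VOLNAME diagnostics.brick-log-level DEBUG    # brick side
gluster volume set VOLNAME diagnostics.client-log-level DEBUG   # client/mount side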

Can you test with a fresh (blank template) VM:
- create a test VM
- create snapshot
- delete snapshot
- live migrate

If you have Gluster ACL issues, those will fail; otherwise it's something 
else.

I hit the bug when upgrading from 6.5 to 6.6, so if it is a Gluster issue you 
can downgrade to v6.5 or upgrade to v7.0 (but not 7.1+).
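
A downgrade sketch, assuming a CentOS 7 host using yum and that the older
packages are still available in an enabled repository:

yum downgrade 'glusterfs*-6.5*'    # version spec is illustrative
systemctl restart glusterd         # brick processes may need a separate restart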

Best Regards,
Strahil Nikolov

On 12 August 2020 at 6:45:32 GMT+03:00, tho...@hoberg.net wrote:
>Hi Strahil, no updates, especially since I am stuck on 4.3.11 and they
>tell us it's final.
>Glusterfs is 6.9-1.el7.
>
>Apart from those three VMs and the inability to copy disks the whole
>farm runs fine so far.
>
>Where would I configure the verbosity of logging? Can't find an obvious
>config option.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/CH4DH23FCUY2XIIIE2TEWQLLJMEJTQGO/


[ovirt-users] Re: Is the udev settling issue more wide spread? Getting failed 'qemu-img -convert' also while copying disks between data and vmstore domains

2020-08-11 Thread thomas
Hi Strahil, no updates, especially since I am stuck on 4.3.11 and they tell us 
it's final.
Glusterfs is 6.9-1.el7.

Apart from those three VMs and the inability to copy disks the whole farm runs 
fine so far.

Where would I configure the verbosity of logging? Can't find an obvious config 
option.
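
(A query sketch, assuming the standard Gluster CLI and with VOLNAME standing
in for the volume name; this lists the current log-level settings:

gluster volume get VOLNAME all | grep -i log-level)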
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/FSUDKB6LESGMG7SONNDL45V5KJE4LOSW/


[ovirt-users] Re: Is the udev settling issue more wide spread? Getting failed 'qemu-img -convert' also while copying disks between data and vmstore domains

2020-08-11 Thread Strahil Nikolov via Users
Sounds like the Gluster ACL bug.
Did you recently patch your Gluster?

Did you test oVirt functionality after the Gluster upgrade?
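
A quick way to check what is installed and when it last changed, assuming an
RPM-based host (as the .el7 packages suggest):

rpm -qa | grep -i glusterfs      # installed versions
yum history list 'glusterfs*'    # when they last changed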

Check the brick logs for errors mentioning 'acl' (you might need to increase 
the log level temporarily).
If the brick logs report issues related to ACL, you can downgrade all 
Gluster packages (but you will need to restart the Gluster brick processes).
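
For example, on each host (brick logs conventionally live under
/var/log/glusterfs/bricks/):

grep -i 'acl' /var/log/glusterfs/bricks/*.log | tail -n 20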


Best Regards,
Strahil Nikolov

On 12 August 2020 at 2:23:03 GMT+03:00, tho...@hoberg.net wrote:
>While trying to diagnose an issue with a set of VMs that get stopped
>for I/O problems at startup, I try to deal with the fact that their
>boot disks cause this issue, no matter where I connect them. They might
>have been the first disks I ever tried to sparsify and I was afraid
>that might have messed them up. The images are for a nested oVirt
>deployment and they worked just fine, before I shut down those VMs... 
>
>So I first tried to hook them up as secondary disks to another VM to have
>a look, but that just caused the other VM to stop at boot.
>
>I also tried downloading, exporting, and plain copying the disks, to no
>avail; OVA exports of the entire VM fail again (the fix is in!).
>
>So to make sure copying disks between volumes *generally* works, I tried
>copying a disk from a working (but stopped) VM from 'vmstore' to 'data'
>on my 3-node HCI farm, but that failed, too!
>
>Plenty of space all around, but all disks are using thin/sparse/VDO on
>SSD underneath.
>
>Before I open a bug, I'd like some feedback: is this a standard QA test,
>is this happening to you, etc.?
>
>I'm still on oVirt 4.3.11 with pack_ova.py patched to wait for udev
>settle.
>
>This is from the engine.log on the hosted-engine:
>
>2020-08-12 00:04:15,870+02 ERROR
>[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>(EE-ManagedThreadFactory-engineScheduled-Thread-67) [] EVENT_ID:
>VDS_BROKER_COMMAND_FAILURE(10,802), VDSM gem2 command
>HSMGetAllTasksStatusesVDS failed: low level Image copy failed:
>("Command ['/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T',
>'none', '-f', 'raw',
>u'/rhev/data-center/mnt/glusterSD/192.168.0.91:_vmstore/9d1b8774-c5dc-46a8-bfa2-6a6db5851195/images/aca27b96-7215-476f-b793-fb0396543a2e/311f853c-e9cc-4b9e-8a00-5885ec7adf14',
>'-O', 'raw',
>u'/rhev/data-center/mnt/glusterSD/192.168.0.91:_data/32129b5f-d47c-495b-a282-7eae1079257e/images/f6a08d2a-4ddb-42da-88e6-4f92a38b9c95/e0d00d46-61a1-4d8c-8cb4-2e5f1683d7f5']
>failed with rc=1 out='' err=bytearray(b'qemu-img: error while reading
>sector 131072: Transport endpoint is not connected\\nqemu-img: error
>while reading sector 135168: Transport endpoint is not
>connected\\nqemu-img: error while reading sector 139264: Transport
>endpoint is not connected\\nqemu-img: error while reading sector
>143360: Transport endpoint is not connected\\nqemu-img: error while
>reading sector 147456: Transport endpoint is not connected\\nqemu-img:
>error while reading sector 151552: Transport endpoint is not
>connected\\n')",)
>
>and this is from the vdsm.log on the gem2 node:
>Error: Command ['/usr/bin/qemu-img', 'convert', '-p', '-t', 'none',
>'-T', 'none', '-f', 'raw',
>u'/rhev/data-center/mnt/glusterSD/192.168.0.91:_vmstore/9d1b8774-c5dc-46a8-bfa2-6a6db5851195/images/aca27b96-7215-476f-b793-fb0396543a2e/311f853c-e9cc-4b9e-8a00-5885ec7adf14',
>'-O', 'raw',
>u'/rhev/data-center/mnt/glusterSD/192.168.0.91:_data/32129b5f-d47c-495b-a282-7eae1079257e/images/f6a08d2a-4ddb-42da-88e6-4f92a38b9c95/e0d00d46-61a1-4d8c-8cb4-2e5f1683d7f5']
>failed with rc=1 out='' err=bytearray(b'qemu-img: error while reading
>sector 131072: Transport endpoint is not connected\nqemu-img: error
>while reading sector 135168: Transport endpoint is not
>connected\nqemu-img: error while reading sector 139264: Transport
>endpoint is not connected\nqemu-img: error while reading sector 143360:
>Transport endpoint is not connected\nqemu-img: error while reading
>sector 147456: Transport endpoint is not connected\nqemu-img: error
>while reading sector 151552: Transport endpoint is not connected\n')
>2020-08-12 00:03:15,428+0200 ERROR (tasks/7) [storage.Image] Unexpected
>error (image:849)
>Traceback (most recent call last):
>  File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line
>837, in copyCollapsed
>    raise se.CopyImageError(str(e))
>CopyImageError: low level Image copy failed: ("Command
>['/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none',
>'-f', 'raw',
>u'/rhev/data-center/mnt/glusterSD/192.168.0.91:_vmstore/9d1b8774-c5dc-46a8-bfa2-6a6db5851195/images/aca27b96-7215-476f-b793-fb0396543a2e/311f853c-e9cc-4b9e-8a00-5885ec7adf14',
>'-O', 'raw',
>u'/rhev/data-center/mnt/glusterSD/192.168.0.91:_data/32129b5f-d47c-495b-a282-7eae1079257e/images/f6a08d2a-4ddb-42da-88e6-4f92a38b9c95/e0d00d46-61a1-4d8c-8cb4-2e5f1683d7f5']
>failed with rc=1 out='' err=bytearray(b'qemu-img: error while reading
>sector 131072: Transport endpoint is not connected\\nqemu-img: error
>while reading sector 135168: