[ovirt-users] Re: virt-v2v paused by system after one hour or a bit more

2024-03-21 Thread Nir Soffer
On Thu, Mar 21, 2024 at 7:03 PM Cyril VINH-TUNG  wrote:

> Hello
>
> Here's the technique we use :
> - create manually the vm on ovirt with same disks (same size that original
> but you can choose target type, thin provision or preallocated)
> - on any node, force activating the disks to make them writable at the os
> level (lvm, vgchange...)
> - if the disk type is the same on target and destination, you can use dd
> over netcat to copy the disks
> - if the type is not the same, you might use qemu-img convert over netcat
>

This is very fragile and dangerous, you must really know what you are doing
:-)

If you have already imported the disks from the other system, uploading them
to any storage domain, in any supported image format and with any allocation
policy, is much easier and faster using ovirt-img.

See https://www.ovirt.org/media/ovirt-img-v8.pdf
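
For example, something like this (a sketch based on ovirt-img upload-disk;
the engine URL, credentials, paths and storage domain name are placeholders -
see the PDF above for the format and allocation options):

    ovirt-img upload-disk --engine-url https://engine.example.com \
        --username admin@internal \
        --password-file /path/to/password-file \
        --cafile ca.pem \
        --storage-domain mydomain \
        /path/to/imported-disk.qcow2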


>
> If you have physical access to the node, you might use a flat backup
>
> Another workaround is to backup/restore the vm with a backup tool that
> works both with vmware and ovirt... I would say vprotect or vinchin
>

This should also work, but when importing VMs from VMware, it is not enough
to copy the disk data as is - you want to modify it to remove the VMware bits
and add the oVirt bits. virt-v2v does all of this for you.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/6MASYRHUZPJCRX45G3TO73OURT5LFWCB/


[ovirt-users] Re: virt-v2v paused by system after one hour or a bit more

2024-03-21 Thread Nir Soffer
On Thu, Mar 21, 2024 at 12:44 PM Claus Serbe via Users 
wrote:

> Hi,
>
> I am migrating some vmware VM's from an NFS Storage via rhv-upload in
> virt-v2v, what is working good.
>
> But now I try to move some bigger VM's with several disks and sadly after
> a while (I would guess around an hour) the Ovirt-engine shows me "Paused by
> system" instead of transfering, so when the next disk should be imported,
> it will fail
>
> In the ovirt-engine.log I see the following lines for the remaining 4
> disks.
>
> 2024-03-21 06:14:06,815-04 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-35)
> [f61b3906-804d-470f-8524-6507081fbdec] EVENT_ID:
> UPLOAD_IMAGE_PAUSED_BY_SYSTEM_TIMEOUT(1,071), Upload was paused by system.
> Reason: timeout due to transfer inactivity.
> 2024-03-21 06:14:17,915-04 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-14)
> [aef8e312-d811-4a39-b5fb-342157209bce] EVENT_ID:
> UPLOAD_IMAGE_PAUSED_BY_SYSTEM_TIMEOUT(1,071), Upload was paused by system.
> Reason: timeout due to transfer inactivity.
> 2024-03-21 06:14:24,959-04 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-85)
> [860b012d-78a4-49f8-a875-52f4299c8298] EVENT_ID:
> UPLOAD_IMAGE_PAUSED_BY_SYSTEM_TIMEOUT(1,071), Upload was paused by system.
> Reason: timeout due to transfer inactivity.
> 2024-03-21 06:14:46,099-04 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-65)
> [f93869ee-2ecb-4f54-b3e9-b12259637b0b] EVENT_ID:
> UPLOAD_IMAGE_PAUSED_BY_SYSTEM_TIMEOUT(1,071), Upload was paused by system.
> Reason: timeout due to transfer inactivity.
>
>
> There are 2 strange things.
>
> 1. When I start virt-v2v it will create all 6 disks and set them to
> transferring, but virt-v2v will import one after the other, what leads to
> kind of unused/timing out transferring tickets.
>

This is an incorrect usage of the ovirt-imageio API in virt-v2v; please report
it here:
https://github.com/libguestfs/virt-v2v/issues

The right way to use the API is:
1. Start the transfer
2. Upload the data
3. End the transfer

It does not matter if you create all the disks at the start of the operation,
but starting a transfer must be done right before you upload the data.
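
For reference, a minimal sketch of that lifecycle with the Python SDK
(assuming the ovirt-engine-sdk4 and ovirt-imageio-client packages; the engine
URL, credentials, disk ID and image path are placeholders):

    import time
    import ovirtsdk4 as sdk
    import ovirtsdk4.types as types
    from ovirt_imageio import client

    connection = sdk.Connection(
        url="https://engine.example.com/ovirt-engine/api",
        username="admin@internal",
        password="password",
        ca_file="ca.pem",
    )
    transfers_service = connection.system_service().image_transfers_service()

    # 1. Start the transfer - only when you are ready to send data.
    transfer = transfers_service.add(
        types.ImageTransfer(
            disk=types.Disk(id="DISK-UUID"),
            direction=types.ImageTransferDirection.UPLOAD,
        )
    )
    transfer_service = transfers_service.image_transfer_service(transfer.id)

    # Wait until the transfer leaves the INITIALIZING phase.
    while transfer.phase == types.ImageTransferPhase.INITIALIZING:
        time.sleep(1)
        transfer = transfer_service.get()

    # 2. Upload the data while the transfer is active.
    client.upload("/path/to/disk.qcow2", transfer.transfer_url, "ca.pem")

    # 3. End the transfer right after the data is uploaded.
    transfer_service.finalize()
    connection.close()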


> 2. When I copy the disk images to a local disk before, it works. Maybe
> just because of faster transfer speeds.
>
> Is there a possibility to transfer parallel or maybe extend the timeout?
>

Sure, you can upload in parallel, but I'm not sure about virt-v2v - it will
probably have issues importing in parallel from other systems (e.g. VMware
would not allow this).

We tested uploading 10 images of 100 GiB each in parallel using the ovirt-img
tool, with each upload using 4 connections (40 upload connections in total).

You may be able to extend the timeout, but this is not recommended, since your
system will not clean up quickly after a bad client disconnects uncleanly
without ending the transfer. Unfortunately I don't remember which timeout
should be modified on the engine side; maybe Arik or Albert can help with this.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/2TUOTBUYA6GEZYJK3W4EC3PSDJLYXHK3/


[ovirt-users] Re: Create Vm without Storage Domain

2024-03-20 Thread Nir Soffer
On Wed, Mar 20, 2024 at 12:06 PM Shafi Mohammed 
wrote:

> Hi Guys ,
>
> Thanks for the Info .
>
> I am trying to migrate my data from local to Storage domain . But it
> requires twice the effort in terms of space and time to copy the data to
> the local file system and then upload the disk to it .
>
> I created a storage domain and tried to expose it to the NBD server to
> write the data from the source Vm disk . But I'm facing an nbd
> permission issue  . Even a copy or write is restricted .
>
> Actual Command
> qemu-nbd -f qcow2 /rhev/data-center/mnt/192.168.108.27:
> _storage/d62b04f8-973f-4168-9e69-1f334a4968b6/images/cd6b9fc4-ef8a-4f40-ac8d-f18d355223d0/d2a57e6f-029e-46cc-85f8-ce151d027dcb
>
>
>
> *qemu-nbd: Failed to blk_new_open
> '/rhev/data-center/mnt/192.168.108.27:_storage/d62b04f8-973f-4168-9e69-1f334a4968b6/images/cd6b9fc4-ef8a-4f40-ac8d-f18d355223d0/d2a57e6f-029e-46cc-85f8-ce151d027dcb':
> Could not open
> '/rhev/data-center/mnt/192.168.108.27:_storage/d62b04f8-973f-4168-9e69-1f334a4968b6/images/cd6b9fc4-ef8a-4f40-ac8d-f18d355223d0/d2a57e6f-029e-46cc-85f8-ce151d027dcb':
> Permission denied*
> Please suggest me if Ovirt has any API  to open a disk from a
> storage domain and write data to it
>

You are trying to reinvent ovirt-img. Try it:
https://www.ovirt.org/media/ovirt-img-v8.pdf
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/BK2RAMNASXIYOBKXSXG7JLN7ZPL2SI4R/


[ovirt-users] Re: How to obtain vm snapshots status

2023-10-03 Thread Nir Soffer
On Tue, Sep 26, 2023 at 9:07 PM anton.alymov--- via Users 
wrote:

> Hi! I use ovirt rest api to start vm, backup vm and then remove vm.
> I start vm, wait for vmstatus up, then  start backup, wait for starting,
> finalize, wait for succeeded, wait for disk unlock. Looks like backup is
> finished here from my side.Because ovirt repost succeed status and unlocks
> disk. But if i try shutdown and remove vm ovirt will throw error  Cannot
> remove VM. The VM is performing an operation on a Snapshot. Please wait for
> the operation to finish, and try again.
> Ok, ovirt is right here, I see from web interface that operation hasn't
> finished yet. How can I obtain correct status where vm can be removed? I
> also tried to get info about vm snapshots but all of them had Status: ok
>

This sounds similar to the ovirt-stress delete-snapshot and backup tests.

Please check here how to use the oVirt Python SDK to create/delete/backup
and wait for events:
https://github.com/ovirt/ovirt-stress
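
As a rough illustration of the event-based waiting (assuming ovirt-engine-sdk4;
the engine details are placeholders and the matching condition below is only a
stand-in - use the exact event codes from the ovirt-stress code above):

    import time
    import ovirtsdk4 as sdk

    connection = sdk.Connection(
        url="https://engine.example.com/ovirt-engine/api",
        username="admin@internal",
        password="password",
        ca_file="ca.pem",
    )
    events_service = connection.system_service().events_service()

    # Remember the id of the latest event before finalizing the backup.
    last_id = int(events_service.list(max=1)[0].id)

    # ... finalize the backup here ...

    # Poll new events until one reports that the snapshot operation finished,
    # then it is safe to shut down and remove the VM.
    done = False
    while not done:
        time.sleep(5)
        for event in events_service.list(from_=last_id):
            last_id = max(last_id, int(event.id))
            # Stand-in condition - match the real event code instead.
            desc = (event.description or "").lower()
            if "snapshot" in desc and "completed" in desc:
                done = True
    connection.close()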

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/B2SFL5DIHW7HEDCZWJXGDFRN7HC5LIGR/


[ovirt-users] Re: python sdk4 ovirt 4.5.5.0 master

2023-07-19 Thread Nir Soffer
On Mon, Jul 17, 2023 at 6:29 PM Jorge Visentini
 wrote:
>
> Hi.
>
> I am testing oVirt 4.5.5-0.master.20230712143502.git07e865d650.el8.
>
> I missed the python scripts to download and upload discs and images... Will 
> it still be possible to use them or should I consider using Ansible?

See 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/EUSHCZXXZD4HHBQ64OEVXN5FE7SJT7R6/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/MSYHQ6V7DZAZAAHZ5FCCEJVJYMF2QOKG/


[ovirt-users] Re: ImageIO Performance

2023-02-11 Thread Nir Soffer
On Thu, Feb 9, 2023 at 7:03 PM Nir Soffer  wrote:
>
> On Mon, Feb 6, 2023 at 10:00 AM Jean-Louis Dupond via Users
>  wrote:
> >
> > Hi All,
> >
> > We backup our VM's with a custom script based on the
> > https://github.com/oVirt/python-ovirt-engine-sdk4/blob/main/examples/backup_vm.py
> > example.
> > This works fine, but we start to see scaling issues.
> >
> > On VM's where there are a lot of dirty blocks,
>
> We need to see the list of extents returned by the server.
>
> The easiest way would be to enable debug logs - it will be even slower,
> but we will see these logs showing all extents:
>
> log.debug("Copying %s", ext)
> log.debug("Zeroing %s", ext)
> log.debug("Skipping %s", ext)
>
> It will also show other info that can help to understand why it is slow.
>
> > the transfer goes really
> > slow (sometimes only 20MiB/sec).
>
> Seems much slower than expected
>
> > At the same time we see that ovirt-imageio process sometimes uses 100%
> > CPU
>
> This is possible, it shows that you do a lot of requests.
>
> > (its single threaded?).
>
> It uses thread per connection model. When used with backup_vm.py or other
> examples using the ovirt_imageio.client it usually use 4 connections
> per transfer
> so there will be 4 threads on the server size serving the data.
>
> Please share debug log of a slow backup, and info about the backup image 
> storage
> for example, is this local file system or NFS?

I opened https://github.com/oVirt/ovirt-imageio/issues/175 to make debugging
such issues easier.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/HP2OZ3E3HAZIE2OCUKXSZIVEDYKQO2DY/


[ovirt-users] Re: ImageIO Performance

2023-02-09 Thread Nir Soffer
On Thu, Feb 9, 2023 at 7:03 PM Nir Soffer  wrote:
>
> On Mon, Feb 6, 2023 at 10:00 AM Jean-Louis Dupond via Users
>  wrote:
> The easiest way would be to enable debug logs - it will be even slower,
> but we will see these logs showing all extents:

Using the --debug option

Run backup_vm.py with --help to see all options.
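
For example, a full backup with debug logging might look like this (a sketch;
the configuration name and VM UUID are placeholders - check --help for the
exact options in your version):

    ./backup_vm.py -c myengine --debug full VM-UUID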
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/HPJVPFQD32Q55ZALOLK6AQ6PFXNGLGMX/


[ovirt-users] Re: ImageIO Performance

2023-02-09 Thread Nir Soffer
On Mon, Feb 6, 2023 at 10:00 AM Jean-Louis Dupond via Users
 wrote:
>
> Hi All,
>
> We backup our VM's with a custom script based on the
> https://github.com/oVirt/python-ovirt-engine-sdk4/blob/main/examples/backup_vm.py
> example.
> This works fine, but we start to see scaling issues.
>
> On VM's where there are a lot of dirty blocks,

We need to see the list of extents returned by the server.

The easiest way would be to enable debug logs - it will be even slower,
but we will see these logs showing all extents:

log.debug("Copying %s", ext)
log.debug("Zeroing %s", ext)
log.debug("Skipping %s", ext)

It will also show other info that can help to understand why it is slow.

> the transfer goes really
> slow (sometimes only 20MiB/sec).

Seems much slower than expected

> At the same time we see that ovirt-imageio process sometimes uses 100%
> CPU

This is possible, it shows that you do a lot of requests.

> (its single threaded?).

It uses a thread-per-connection model. When used with backup_vm.py or other
examples using the ovirt_imageio.client, it usually uses 4 connections per
transfer, so there will be 4 threads on the server side serving the data.

Please share a debug log of a slow backup, and info about the backup image
storage - for example, is this a local file system or NFS?

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/HE6ZKEPEHOYDCIHLUFAW4MDQVXYO2TA6/


[ovirt-users] Re: How does ovirt handle disks across multiple iscsi LUNs

2022-12-04 Thread Nir Soffer
On Sun, Nov 27, 2022 at 9:11 PM  wrote:

> A possibly obvious question I can't find the answer to anywhere—how does
> ovirt allocate VM disk images when a storage domain has multiple LUNs? Are
> these allocated one per LUN, so if e.g. a LUN runs out of space the disks
> on that LUN (only) will be unable to write? Or are these distributed across
> LUNs, so if a LUN fails due to storage failure etc the entire storage
> domain can be affected?
>

A storage domain is exactly one LVM Volume Group (VG). Disks are created from
volumes, which are LVM Logical Volumes (LVs). Each time you create a snapshot,
oVirt creates a new volume, so a disk may have one or more LVs in the VG.

The volumes may be extended as more space is needed. Up to 4.5, oVirt
extended the volumes in chunks of 1 GiB. Since 4.5, it uses chunks of 2.5 GiB.

So every disk may contain multiple chunks of different sizes, and these may be
allocated anywhere in the VG logical space, so they may be on any PV.

To understand how the chunks are allocated, you can inspect each LV
like this:

# lvdisplay -m --devicesfile= bafd0f16-9aba-4f9f-ba90-46d3b8a29157/51de2d8b-b67e-4a91-bc68-a2c922bc7398
  --- Logical volume ---
  LV Path                /dev/bafd0f16-9aba-4f9f-ba90-46d3b8a29157/51de2d8b-b67e-4a91-bc68-a2c922bc7398
  LV Name                51de2d8b-b67e-4a91-bc68-a2c922bc7398
  VG Name                bafd0f16-9aba-4f9f-ba90-46d3b8a29157
  LV UUID                W0FAvX-EUDc-v7QR-4A2A-3aSX-yGC5-2jREeV
  LV Write Access        read/write
  LV Creation host, time host4, 2022-12-04 01:43:17 +0200
  LV Status              NOT available
  LV Size                200.00 GiB
  Current LE             1600
  Segments               3
  Allocation             inherit
  Read ahead sectors     auto

  --- Segments ---
  Logical extents 0 to 796:
Type linear
Physical volume /dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-02
Physical extents 0 to 796

  Logical extents 797 to 1593:
Type linear
Physical volume /dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-03
Physical extents 0 to 796

  Logical extents 1594 to 1599:
Type linear
Physical volume /dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-01
Physical extents 49 to 54

Note that oVirt uses an LVM devices file to prevent unwanted access to volumes
by LVM commands. To disable the devices file temporarily, you can pass an
empty --devicesfile= as shown above.

After extending this disk by 10g:

# lvdisplay -m --devicesfile= bafd0f16-9aba-4f9f-ba90-46d3b8a29157/51de2d8b-b67e-4a91-bc68-a2c922bc7398
  --- Logical volume ---
  LV Path                /dev/bafd0f16-9aba-4f9f-ba90-46d3b8a29157/51de2d8b-b67e-4a91-bc68-a2c922bc7398
  LV Name                51de2d8b-b67e-4a91-bc68-a2c922bc7398
  VG Name                bafd0f16-9aba-4f9f-ba90-46d3b8a29157
  LV UUID                W0FAvX-EUDc-v7QR-4A2A-3aSX-yGC5-2jREeV
  LV Write Access        read/write
  LV Creation host, time host4, 2022-12-04 01:43:17 +0200
  LV Status              NOT available
  LV Size                210.00 GiB
  Current LE             1680
  Segments               7
  Allocation             inherit
  Read ahead sectors     auto

  --- Segments ---
  Logical extents 0 to 796:
Type linear
Physical volume /dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-02
Physical extents 0 to 796

  Logical extents 797 to 1593:
Type linear
Physical volume /dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-03
Physical extents 0 to 796

  Logical extents 1594 to 1613:
Type linear
Physical volume /dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-01
Physical extents 49 to 68

  Logical extents 1614 to 1616:
Type linear
Physical volume /dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-01
Physical extents 244 to 246

  Logical extents 1617 to 1619:
Type linear
Physical volume /dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-01
Physical extents 154 to 156

  Logical extents 1620 to 1648:
Type linear
Physical volume /dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-01
Physical extents 177 to 205

  Logical extents 1649 to 1679:
Type linear
Physical volume /dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-01
Physical extents 531 to 561
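
For a more compact per-segment view of all the LVs in the VG, something like
this should also work (lv_name, seg_start_pe, seg_size and devices are
standard lvs report fields; the VG name is the one used above):

    # lvs -o lv_name,seg_start_pe,seg_size,devices --devicesfile= \
        bafd0f16-9aba-4f9f-ba90-46d3b8a29157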

I hope it helps.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/V5QX3OJ6KYMXGBKIUEPM7ES4L4HG5UJ6/


[ovirt-users] Re: Issue with disk import and disk upload - ovirt 4.5.2

2022-10-04 Thread Nir Soffer
On Mon, Oct 3, 2022 at 4:11 PM Proniakin Marcin 
wrote:

>
> Hello,
>
>
> After upgrading ovirt to version 4.5.2, I've experienced issue with using
> import function in import disk window in storage domain after attaching it
> to data center. Logs in the attachment (#1).
>
>
> Second issue is with uploading disks from storage domain window.
>
> Using example from attachment (#2):
>
> When choosing to upload disk to domain portal-1 from upload function in
> the storage domain disk window, ovirt chooses wrong data-center dev-1
> (dev-1 datacenter has domain dev-1, portal-1 datacenter has domain
> portal-1) and wrong host to upload. Accepting this upload always fails.
>
>
> When choosing to upload disk from storage -> disks window works fine.
>
>
> Issues confirmed on two independent ovirt servers (both with version
> 4.5.2).
>

Please report an issue in
https://github.com/oVirt/ovirt-engine/issues

ovirt 4.5.2 introduced the ovirt-img tool - you can use it to upload images
from the command line instead of the UI. This is much faster and more reliable,
and supports many features, like on-the-fly format conversion.

Example usage:

ovirt-img upload-disk --engine-url https://example.com \
    --username username \
    --password-file /path/to/password-file \
    --cafile ca.pem \
    --storage-domain mydomain \
    /path/to/iso

See ovirt-img upload-disk --help for more options.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/JKM5JGCNGZYRK2WT34KDZZGRHCLSHEED/


[ovirt-users] Re: Veeam Backup for RHV (oVirt)

2022-07-27 Thread Nir Soffer
On Wed, Jul 27, 2022 at 2:56 PM  wrote:
>
> Hi!
> Not really sure if this is right place to ask, but..
>
> I am trying to use Veeam Backup for Red Hat Virtualization on oVirt 4.5.1.
> I have been using it on version 4.4.10.7 and it works ok there.
>
> On Veeam Release page it says that supported OS is RHV 4.4SP1 (oVirt 4.5).
>
> When i try to do backup, this is what i get from Veeam backup.
> No errors in vdsm.log and engine.log..
>
> 2022-07-27 08:08:44.153 00039 [19545] INFO | [LoggingEventsManager_39]: 
> Add to storage LoggingEvent [id: 34248685-a193-4df5-8ff2-838f738e211c, Type: 
> BackupStartedNew]
> 2022-07-27 08:08:44.168 00039 [19545] INFO | [TaskManager_39]: Create and 
> run new async task. [call method:'RunBackupChain']
> 2022-07-27 08:08:44.168 00039 [19545] INFO | [AsyncTask_39]: New AsynTask 
> created [id:'aafa22ac-ff2e-4647-becb-dca88e3eb67f', description:'', 
> type:'BACKUP_VM_POLICY']
> 2022-07-27 08:08:44.168 00039 [19545] INFO | [AsyncTask_39]: Prepare 
> AsynTask to run [id:'aafa22ac-ff2e-4647-becb-dca88e3eb67f']
> 2022-07-27 08:08:44.176 00031 [19545] INFO | [BackupPolicy_31]: Refresh 
> VMs for policy '6e090c98-d44b-4785-acb4-82a627da5d9b'
> 2022-07-27 08:08:44.176 00031 [19545] INFO | [BackupPolicy_31]: Begin 
> updating list of active VMs for policy '6e090c98-d44b-4785-acb4-82a627da5d9b' 
> [forceRefresh = True]
> 2022-07-27 08:08:44.176 00031 [19545] INFO | [RhevCluster_31]: Test 
> connection to cluster [IP: engine.example.org, Port: 443, User: 
> admin@ovirt@internalsso]
> 2022-07-27 08:08:44.189 00039 [19545] INFO | [TaskManager_39]: AsyncTask 
> registered. [id:'aafa22ac-ff2e-4647-becb-dca88e3eb67f']
> 2022-07-27 08:08:44.371 00031 [19545] INFO | [RhevCluster_31]: Test 
> connection to cluster success. Status: Success. Message:
> 2022-07-27 08:08:44.556 00031 [19545] INFO | [BackupPolicyManager_31]: 
> Refreshing the policies data...
> 2022-07-27 08:08:44.556 00031 [19545] INFO | [BackupPolicy_31]: Begin 
> updating list of active VMs for policy '6e090c98-d44b-4785-acb4-82a627da5d9b' 
> [forceRefresh = False]
> 2022-07-27 08:08:44.556 00031 [19545] INFO | [BackupPolicy_31]: List of 
> active VMs updated for policy '6e090c98-d44b-4785-acb4-82a627da5d9b' 
> [forceRefresh = False]. Number of active VMs '1'
> 2022-07-27 08:08:44.556 00031 [19545] INFO | [BackupPolicyManager_31]: 
> Policies data has been refreshed.
> 2022-07-27 08:08:44.556 00031 [19545] INFO | [BackupPolicy_31]: List of 
> active VMs updated for policy '6e090c98-d44b-4785-acb4-82a627da5d9b' 
> [forceRefresh = True]. Number of active VMs '1'
> 2022-07-27 08:08:44.556 00031 [19545] INFO | [BackupPolicy_31]: Found the 
> '1' VMs to backup in policy '6e090c98-d44b-4785-acb4-82a627da5d9b'
> 2022-07-27 08:08:44.564 00031 [19545] INFO | [BackupPolicy_31]: * 
> Parallel policy runner has started * for policy [Name:'test5', ID: 
> '6e090c98-d44b-4785-acb4-82a627da5d9b'
> 2022-07-27 08:08:44.564 00031 [19545] INFO | [VeeamBackupServer_31]: Test 
> connection to backup server [IP: 'veeambr.example.org', Port: '10006', User: 
> 'rhvproxy']
> 2022-07-27 08:08:44.931 00031 [19545] INFO | [VeeamBackupServer_31]: Test 
> connection to backup server [IP: 'veeambr.example.org', Port: '10006']. 
> Connection status: ConnectionSuccess. Version: 11.0.1.1261
> 2022-07-27 08:08:45.423 00031 [19545] INFO | [BackupPolicy_31]: 
> Successfully called CreateVeeamPolicySession for job [UID: 
> '6e090c98-d44b-4785-acb4-82a627da5d9b'], session [UID: 
> 'aafa22ac-ff2e-4647-becb-dca88e3eb67f']
> 2022-07-27 08:08:45.820 00031 [19545] INFO | [BackupPolicy_31]: 
> Successfully called RetainPolicyVms for job [UID: 
> '6e090c98-d44b-4785-acb4-82a627da5d9b'] with VMs: 
> 50513a65-6ccc-479b-9b61-032e0961b016
> 2022-07-27 08:08:45.820 00031 [19545] INFO | [BackupPolicy_31]: Start 
> calculating maxPointsCount
> 2022-07-27 08:08:45.820 00031 [19545] INFO | [BackupPolicy_31]: End 
> calculating maxPointsCount. Result = 7
> 2022-07-27 08:08:45.820 00031 [19545] INFO | [BackupPolicy_31]: Starting 
> validate repository schedule. Repository [UID: 
> '237e41d6-7c67-4a1f-80bf-d7c73c481209', MaxPointsCount: '7', 
> IsPeriodicFullRequired: 'False']
> 2022-07-27 08:08:46.595 00031 [19545] INFO | [BackupPolicy_31]: End 
> validate repository schedule. Result: [IsScheduleValid: 'True', ErrorMessage: 
> '']
> 2022-07-27 08:08:46.597 00031 [19545] INFO | [SessionManager_31]: Start 
> registering a new session[Id: 'b6f3f0e1-7aab-41cb-b0e7-10f5b2ed6708']
> 2022-07-27 08:08:46.639 00031 [19545] INFO | [SessionManager_31]: Session 
> registered. [Id:'b6f3f0e1-7aab-41cb-b0e7-10f5b2ed6708']
> 2022-07-27 08:08:46.639 00031 [19545] INFO | [BackupPolicy_31]: Backup VM 
> [id:'50513a65-6ccc-479b-9b61-032e0961b016'] starting...
> 2022-07-27 08:08:46.639 00031 [19545] INFO | [BackupPolicy_31]: 
> RetentionMergeDisabled: false
> 2022-07-27 08:08:46.640 00031 [19545] 

[ovirt-users] Re: Cannot Enable incremental backup

2022-07-06 Thread Nir Soffer
On Thu, Jul 7, 2022 at 12:40 AM Jonas  wrote:

> Hello all
>
> I'm trying to create incremental backups for my VMs on a testing cluster
> and am using the functions from
> https://gitlab.com/nirs/ovirt-stress/-/blob/master/backup/backup.py.
>
Note that the VM configuration is not backed up, so restoring
requires creating a new VM with the restored disks.

So far it works well, but on some disks it is not possible to enable
> incremental backups even when the VM is powered off (see screenshot below).
> Does anyone know why this might be the case and how to activate it? I think
> I already checked the docs and didn't find anything but feel free to nudge
> me in the right direction.
>
It looks like you are trying to enable incremental backup *after* the disk was
created, which is not possible if the disk is using raw format. I guess the
disk is on file-based storage (NFS, Gluster), and a thin volume on file-based
storage uses raw format by default.

The way to fix this is to convert the disk to qcow2 format - this feature
is available since ovirt 4.5.0, but it works only via the API/SDK.

Here is example code for converting the disk format:
https://github.com/oVirt/python-ovirt-engine-sdk4/blob/7c17fe326fc1b67ba581d3c244ec019508f3ac25/examples/convert_disk.py

The example is not finished and is still in review, but it should show
how to use the API.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/W5YKJUXOHDMWUDZ4EHOPIRCHZS6MIB5G/


[ovirt-users] Re: Import KVM VMs on individual iSCSI luns

2022-07-02 Thread Nir Soffer
On Sat, Jul 2, 2022 at 9:40 AM  wrote:
>
> Greetings,
>
> Is it possible with oVirt to  import existing VMs where the underlying 
> storage is on raw iSCSI luns and to keep them on those luns?
>
> The historical scenario is that we have Virtual farms in multiple sites 
> managed by an ancient Orchestration tool that does not support modern OS's as 
> the hypervisor.
> - In each site, there are clusters of hypervisors/Hosts  that have visibility 
>  to the same iSCSI luns.
> - Each VM has it's own set of iscsi luns that are totally dedicated to that VM
> - Each VM is using LVM to manage the disk
> - Each Host has LVM filtering configured to NOT manage the VM's iscsi luns
> - The VMs can be live migrated from any Hypervisor within the cluster to any 
> other  Hypervisor in that same cluster
>
> We are attempting to bring this existing environment into oVirt without  
> replacing the storage model.
> Is there any documentation that will serve as a guide for this scenario?
>
> In a lab environment, we have successfully
> - Added 2 hypervisors (hosts) and oVirt can see their  VMs as 
> external-ovtest1 and external-ovtest2
> - Removed the LVM filtering on the hosts

This should not be needed. The lvm filter ensures that the host can manage
only the disks used by the host itself (for example the boot disk). Other
disks (e.g. your LUNs) are not managed by the host, but they are managed by
oVirt.

> - Created a storage domain that is able to see the iscsi luns, but we have 
> not yet  done the 'add' of each lun

Don't create a storage domain, since you want to use the LUNs directly.
Adding the LUNs to the storage domain can destroy the data on the LUN.

> Is it possible to import these luns as raw block devices without LVM being 
> layered on top of them?

Yes, this is called Direct LUN in oVirt.

> Is it required  to actually import the luns into a storage domain, or can the 
> VM's still be imported if all luns are visible on all hosts in the cluster?

There is no way to import the VM as is, but you can recreate the VM
with the same LUNs.

> In the grand scheme of things, are we trying to do something that is not 
> possible with  oVirt?
> If it is possible, we would greatly appreciate tips, pointers, links to docs 
> etc that will help us migrate this environment to oVirt.

You can do this:

1. Connect to the storage server with the relevant LUNs. The LUNs used by
   the VM should be visible in engine UI (New storage domain dialog).
2. Create a new VM using the same configuration you had in the original VM
3. Attach the right LUNs to the VM (using Direct LUN)
4. In the VM, make sure you use the right disks - the best way is to use:

/dev/disk/by-id/{virtio}-{serial}

where {serial} is the disk UUID seen in the engine UI for the direct LUN.
{virtio} is correct if you connect the disks using virtio; if you use
virtio-scsi the prefix will be different.

You may also need to install extra components on the boot disk, like
qemu-guest-agent or virtio drivers.

Note that oVirt does not use the LUNs directly, but the multipath device on
top of the SCSI device. This should be transparent, but be prepared to see
/dev/mapper/{wwid} instead of /dev/sdXXX.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/NW5HEA7YWS5OPPYOZLUQHQQEB4MKZC7U/


[ovirt-users] Re: Failed to Start services ovirt-imageio

2022-06-26 Thread Nir Soffer
On Fri, Jun 24, 2022 at 1:33 AM АБИОЛА А. Э  wrote:
>
> Hello Nir,

Hi АБИОЛА, you forgot to reply to users@ovirt.org. Please always reply
to the mailing list so the discussion is public.

>   I am so grateful for your reply and helpful information. I have 
> successfully deployed the Ovirt engine and it is up and running fine, but now 
> I am having another issue uploading .ISO files to disk. I tried to upload 
> .iso files but the status was "Paused by the system" and was waiting 15 hours 
> for the status to change but still no changes, i tried to delete the .ISO 
> file but i can't remove at all, its status was "Finalizing cleaning up" i 
> waited 20 hours for the status to change but with no success. Kindly guard me 
> through the process to fix this errors, so I can upload .ISO files to the 
> Disk to launch the VM successfully. Please see below picture for the error.

When the system pauses an upload, you can cancel the upload from the
upload menu:

Storage > Disks > Upload > Cancel

This will delete the new disk and end the image transfer.

To understand why the system paused the transfer, please share more info:

- Which oVirt version are you running?
- Did you add the engine CA certificate to the browser?

You can check if the browser is configured correctly by opening an
upload dialog:

Storage > Disks > Upload > Start

and clicking the "Test Connection" button. If this fails, you need to
add the engine
CA to the browser. When "Test Connection" is successful, try to upload again.

If the upload fails again, please share logs showing the timeframe of
this error:
- engine log from /var/log/ovirt-engine/engine.log
- vdsm log from the host performing the upload (see the Events tab in
the engine UI) from /var/log/vdsm/vdsm.log
- ovirt-imageio log from the host performing the upload from
/var/log/ovirt-imageio/daemon.log
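
For example, to extract the relevant time window (a sketch - adjust the
date/hour pattern to when the upload failed; engine.log lives on the engine
machine, the other two on the host performing the upload):

    grep '2022-06-24 10:' /var/log/ovirt-engine/engine.log
    grep '2022-06-24 10:' /var/log/vdsm/vdsm.log
    grep '2022-06-24 10:' /var/log/ovirt-imageio/daemon.log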

Nir

>
> I will be glad to read from you soon.
>
> Appreciated
>
>
> On Tue, Jun 21, 2022 at 3:19 PM Nir Soffer  wrote:
>>
>> On Tue, Jun 21, 2022 at 8:18 AM АБИОЛА А. Э  wrote:
>> >
>> > Hello Sir,
>> > I am new to Ovirt and I tried to deploy it 3weeks into my oracle linux 
>> > with no success.
>> > I got the following error messages
>> > Please how can i fix this error to successfully deploy it.
>> > I will be glad to read from you soon.
>> > Appreciated
>> > AAE.
>>
>> Which oVirt version is this?
>>
>> You can try to check:
>>
>> systemctl status ovirt-imageio
>>
>> It usually show the latest logs which may help to understand why the
>> service could not start.
>>
>> What do you have in /var/log/ovirt-imageio/daemon.log?
>>
>> Nir
>>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/YPGFFMB7H4UONQMLU3QIB4CR2XPVG7YS/


[ovirt-users] Re: Failed to Start services ovirt-imageio

2022-06-21 Thread Nir Soffer
On Tue, Jun 21, 2022 at 8:18 AM АБИОЛА А. Э  wrote:
>
> Hello Sir,
> I am new to Ovirt and I tried to deploy it 3weeks into my oracle linux with 
> no success.
> I got the following error messages
> Please how can i fix this error to successfully deploy it.
> I will be glad to read from you soon.
> Appreciated
> AAE.

Which oVirt version is this?

You can try to check:

systemctl status ovirt-imageio

It usually shows the latest log lines, which may help to understand why the
service could not start.

What do you have in /var/log/ovirt-imageio/daemon.log?

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/IAVOIVROVX45Z27JA66RG25MFPVNHI6H/


[ovirt-users] Re: HA VM Lease failure with full data storage domain

2022-06-02 Thread Nir Soffer
On Thu, Jun 2, 2022 at 10:33 PM Patrick Hibbs  wrote:
>
> Here's the ausearch results from that host. Looks like more than one
> issue. (openvswitch is also in there.)

I did not see anything related to the issues you reported, and SELinux
is likely not related. However, there are unexpected denials that
may be harmless but should not appear in the report.

I think filing a separate bug for the 2 kinds of denials there makes
sense; someone should check and fix either the SELinux policy or
the program trying to do things it should not.

I think this should be reported for qemu-kvm in Bugzilla:

time->Thu Jun  2 10:33:38 2022
type=PROCTITLE msg=audit(1654180418.940:5119):
proctitle=2F7573722F6C6962657865632F71656D752D6B766D002D6E616D650067756573743D57656253657276696365735F486F6E6F6B612C64656275672D746872656164733D6F6E002D53002D6F626A656374007B22716F6D2D74797065223A22736563726574222C226964223A226D61737465724B657930222C22666F726D617422
type=SYSCALL msg=audit(1654180418.940:5119): arch=c03e syscall=257
success=no exit=-13 a0=ff9c a1=5647b7ffd910 a2=0 a3=0 items=0
ppid=1 pid=3639 auid=4294967295 uid=107 gid=107 euid=107 suid=107
fsuid=107 egid=107 sgid=107 fsgid=107 tty=(none) ses=4294967295
comm="qemu-kvm" exe="/usr/libexec/qemu-kvm"
subj=system_u:system_r:svirt_t:s0:c9,c704 key=(null)
type=AVC msg=audit(1654180418.940:5119): avc:  denied  { search } for
pid=3639 comm="qemu-kvm" name="1055" dev="proc" ino=28142
scontext=system_u:system_r:svirt_t:s0:c9,c704
tcontext=system_u:system_r:sanlock_t:s0-s0:c0.c1023 tclass=dir
permissive=0

I'm not sure where this should be reported, maybe kernel?

type=SYSCALL msg=audit(1651812155.891:50): arch=c03e syscall=175
success=yes exit=0 a0=55bcab394ed0 a1=51494 a2=55bca960b8b6
a3=55bcaab64010 items=0 ppid=1274 pid=1282 auid=4294967295 uid=0 gid=0
euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295
comm="modprobe" exe="/usr/bin/kmod"
subj=system_u:system_r:openvswitch_load_module_t:s0 key=(null)
type=AVC msg=audit(1651812155.891:50): avc:  denied  { search } for
pid=1282 comm="modprobe" name="events" dev="tracefs" ino=2060
scontext=system_u:system_r:openvswitch_load_module_t:s0
tcontext=system_u:object_r:tracefs_t:s0 tclass=dir permissive=0
type=AVC msg=audit(1651812155.891:50): avc:  denied  { search } for
pid=1282 comm="modprobe" name="events" dev="tracefs" ino=2060
scontext=system_u:system_r:openvswitch_load_module_t:s0
tcontext=system_u:object_r:tracefs_t:s0 tclass=dir permissive=0

> I'll see about opening the bug. Should I file it on oVirt's github or
> the RedHat bugzilla?

Bugzilla is still the preferred place, but you can use GitHub if you like;
we will look at it in either place.

Nir

> -Patrick Hibbs
>
> On Thu, 2022-06-02 at 22:08 +0300, Nir Soffer wrote:
> > On Thu, Jun 2, 2022 at 9:52 PM Patrick Hibbs 
> > wrote:
> > >
> > > The attached logs are from the cluster hosts that were running the
> > > HA
> > > VMs during the failures.
> > >
> > > I've finally got all of my HA VMs up again. The last one didn't
> > > start
> > > again until after I freed up more space in the storage domain than
> > > what
> > > was originally available when the VM was running previously. (It
> > > now
> > > has over 150GB of free space. Which should be more than enough, but
> > > it
> > > didn't boot with 140GB avaiable)
> > >
> > > SideNote:
> > > I just found this in the logs on the original host that the HA VMs
> > > were
> > > running on:
> > >
> > > ---snip---
> > > Jun 02 10:33:29 ryuki.codenet sanlock[1054]: 2022-06-02 10:33:29
> > > 674607
> > > [1054]: s1 check_our_lease warning 71 last_success 674536
> > >   # semanage
> > > fcontext -a -t virt_image_t '1055'
> > >   *  Plugin
> > > catchall (2.13 confidence) suggests   **
> > >   Then you
> > > should
> > > report this as a bug.
> > >   You can
> > > generate
> > > a local policy module to allow this access.
> > >   Do
> >
> > Not clear what is the selinux issue. If you run:
> >
> > ausearch -m avc
> >
> > It should be more clear.
> >
> > > Jun 02 10:33:45 ryuki.codenet sanlock[1054]: 2022-06-02 10:33:45
> > > 674623
> > > [1054]: s1 kill 3441 sig 15 count 8
> >

[ovirt-users] Re: HA VM Lease failure with full data storage domain

2022-06-02 Thread Nir Soffer
On Thu, Jun 2, 2022 at 9:52 PM Patrick Hibbs  wrote:
>
> The attached logs are from the cluster hosts that were running the HA
> VMs during the failures.
>
> I've finally got all of my HA VMs up again. The last one didn't start
> again until after I freed up more space in the storage domain than what
> was originally available when the VM was running previously. (It now
> has over 150GB of free space. Which should be more than enough, but it
> didn't boot with 140GB avaiable)
>
> SideNote:
> I just found this in the logs on the original host that the HA VMs were
> running on:
>
> ---snip---
> Jun 02 10:33:29 ryuki.codenet sanlock[1054]: 2022-06-02 10:33:29 674607
> [1054]: s1 check_our_lease warning 71 last_success 674536
>   # semanage
> fcontext -a -t virt_image_t '1055'
>   *  Plugin
> catchall (2.13 confidence) suggests   **
>   Then you should
> report this as a bug.
>   You can generate
> a local policy module to allow this access.
>   Do

It's not clear what the SELinux issue is. If you run:

ausearch -m avc

the denials should be clearer.

> Jun 02 10:33:45 ryuki.codenet sanlock[1054]: 2022-06-02 10:33:45 674623
> [1054]: s1 kill 3441 sig 15 count 8
> Jun 02 10:33:45 ryuki.codenet sanlock[1054]: 2022-06-02 10:33:45 674623
> [1054]: s1 kill 4337 sig 15 count 8
> Jun 02 10:33:46 ryuki.codenet sanlock[1054]: 2022-06-02 10:33:46 674624
> [1054]: s1 kill 3206 sig 15 count 9

This means that the host could not access the storage for 80 seconds, and the
leases expired. When leases expire, sanlock must kill the processes holding the
leases. Here we see that sanlock sent SIGTERM to 3 processes.

If these are VMs, they will pause and libvirt will release the lease.
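
To see the current state of the lockspaces and leases on the host, you can run
(a standard sanlock command; how the output maps to your VMs depends on your
storage domain IDs):

    sanlock client status

and check /var/log/sanlock.log for renewal errors around the same time.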

I can check the log deeper next week.

Nir

> Jun 02 10:33:47 ryuki.codenet kernel: ovirtmgmt: port 4(vnet2) entered
> disabled state
> ---snip---
>
> That looks like some SELinux failure.
>
> -Patrick Hibbs
>
> On Thu, 2022-06-02 at 19:44 +0300, Nir Soffer wrote:
> > On Thu, Jun 2, 2022 at 7:14 PM Patrick Hibbs 
> > wrote:
> > >
> > > OK, so the data storage domain on a cluster filled up to the point
> > > that
> > > the OS refused to allocate any more space.
> > >
> > > This happened because I tried to create a new prealloc'd disk from
> > > the
> > > Admin WebUI. The disk creation claims to be completed successfully,
> > > I've not tried to use that disk yet, but due to a timeout with the
> > > storage domain in question the engine began trying to fence all of
> > > the
> > > HA VMs.
> > > The fencing failed for all of the HA VMs leaving them in a powered
> > > off
> > > state. Despite all of the HA VMs being up at the time, so no
> > > reallocation of the leases should have been necessary.
> >
> > Leases are not reallocated during fencing, not sure why you expect
> > this to happen.
> >
> > > Attempting to
> > > restart them manually from the Admin WebUI failed. With the
> > > original
> > > host they were running on complaining about "no space left on
> > > device",
> > > and the other hosts claiming that the original host still held the
> > > VM
> > > lease.
> >
> > No space left on device may be an unfortunate error from sanlock,
> > meaning that there is no locksapce. This means the host has trouble
> > adding the lockspace, or it did not complete yet.
> >
> > > After cleaning up some old snapshots, the HA VMs would still not
> > > boot.
> > > Toggling the High Availability setting for each one and allowing
> > > the
> > > lease to be removed from the storage domain was required to get the
> > > VMs
> > > to start again.
> >
> > If  you know that the VM is not running, disabling the lease
> > temporarily is
> > a good way to workaround the issue.
> >
> > > Re-enabling the High Availability setting there after
> > > fixed the lease issue. But now some, not all, of the HA VMs are
> > > still
> > > throwing "no space left on device" errors when attempting to start
> > > them. The others are working just fine even with their HA lease
> > > enabled.
> >
> > All erros come from same host(s) or some vms cannot start while
> > others can on the same host?
> >
> > > My qu

[ovirt-users] Re: HA VM Lease failure with full data storage domain

2022-06-02 Thread Nir Soffer
On Thu, Jun 2, 2022 at 7:44 PM Nir Soffer  wrote:
>
> On Thu, Jun 2, 2022 at 7:14 PM Patrick Hibbs  wrote:
> >
> > OK, so the data storage domain on a cluster filled up to the point that
> > the OS refused to allocate any more space.
> >
> > This happened because I tried to create a new prealloc'd disk from the
> > Admin WebUI. The disk creation claims to be completed successfully,
> > I've not tried to use that disk yet, but due to a timeout with the
> > storage domain in question the engine began trying to fence all of the
> > HA VMs.
> > The fencing failed for all of the HA VMs leaving them in a powered off
> > state. Despite all of the HA VMs being up at the time, so no
> > reallocation of the leases should have been necessary.
>
> Leases are not reallocated during fencing, not sure why you expect
> this to happen.
>
> > Attempting to
> > restart them manually from the Admin WebUI failed. With the original
> > host they were running on complaining about "no space left on device",
> > and the other hosts claiming that the original host still held the VM
> > lease.
>
> No space left on device may be an unfortunate error from sanlock,
> meaning that there is no locksapce. This means the host has trouble
> adding the lockspace, or it did not complete yet.
>
> > After cleaning up some old snapshots, the HA VMs would still not boot.
> > Toggling the High Availability setting for each one and allowing the
> > lease to be removed from the storage domain was required to get the VMs
> > to start again.
>
> If  you know that the VM is not running, disabling the lease temporarily is
> a good way to workaround the issue.
>
> > Re-enabling the High Availability setting there after
> > fixed the lease issue. But now some, not all, of the HA VMs are still
> > throwing "no space left on device" errors when attempting to start
> > them. The others are working just fine even with their HA lease
> > enabled.
>
> All erros come from same host(s) or some vms cannot start while
> others can on the same host?
>
> > My questions are:
> >
> > 1. Why does oVirt claim to have a constantly allocated HA VM lease on
> > the storage domain when it's clearly only done while the VM is running?
>
> Leases are allocated when a VM is created. This allocated a the lease space
> (1MiB) in the external leases special volume, and bind it to the VM ID.
>
> When VM starts, it acquires the lease for its VM ID. If sanlock is not 
> connected
> to the lockspace on this host, this may fail with the confusing
> "No space left on device" error.
>
> > 2. Why does oVirt deallocate the HA VM lease when performing a fencing
> > operation?
>
> It does not. oVirt does not actually "fence" the VM. If the host running the 
> VM
> cannot access storage and update the lease, the host lose all leases on that
> storage. The result is pausing all the VM holding a lease on that storage.
>
> oVirt will try to start the VM on another host, which will try to
> acquire the lease
> again on the new host. If enough time passed since the original host lost
> access to storage, the lease can be acquired on the new host. If not, this
> will happen in the next retrie(s).
>
> If the original host did not lose access to storage, and it is still
> updating the
> lease you cannot acquire the lease from another host. This protect the VM
> from split-brain that will corrupt the vm disk.
>
> > 3. Why can't oVirt clear the old HA VM lease when the VM is down and
> > the storage pool has space available? (How much space is even needed?
> > The leases section of the storage domain in the Admin WebUI doesn't
> > contain any useful info beyond the fact that a lease should exist for a
> > VM even when it's off.)
>
> Acquiring the lease is possible only if the lease is not held on another host.
>
> oVirt does not support acquiring a held lease by killing the process holding
> the lease on another host, but sanlock provides such capability.
>
> > 4. Is there a better way to force start a HA VM when the lease is old
> > and the VM is powered off?
>
> If the original VM is powered off for enough time (2-3 minutes), the lease
> expires and starting the VM on another host should succeed.
>
> > 5. Should I file a bug on the whole HA VM failing to reacquire a lease
> > on a full storage pool?
>
> The external lease volume is not fully allocated. If you use thin provisioned
> storage, and the there is really no storage space, it is possible that 
> creating
> a new lease will fail, but starting and stopping VM that have leases should 
&

[ovirt-users] Re: HA VM Lease failure with full data storage domain

2022-06-02 Thread Nir Soffer
On Thu, Jun 2, 2022 at 7:14 PM Patrick Hibbs  wrote:
>
> OK, so the data storage domain on a cluster filled up to the point that
> the OS refused to allocate any more space.
>
> This happened because I tried to create a new prealloc'd disk from the
> Admin WebUI. The disk creation claims to be completed successfully,
> I've not tried to use that disk yet, but due to a timeout with the
> storage domain in question the engine began trying to fence all of the
> HA VMs.
> The fencing failed for all of the HA VMs leaving them in a powered off
> state. Despite all of the HA VMs being up at the time, so no
> reallocation of the leases should have been necessary.

Leases are not reallocated during fencing, not sure why you expect
this to happen.

> Attempting to
> restart them manually from the Admin WebUI failed. With the original
> host they were running on complaining about "no space left on device",
> and the other hosts claiming that the original host still held the VM
> lease.

No space left on device may be an unfortunate error from sanlock,
meaning that there is no lockspace. This means the host has trouble
adding the lockspace, or adding it has not completed yet.

> After cleaning up some old snapshots, the HA VMs would still not boot.
> Toggling the High Availability setting for each one and allowing the
> lease to be removed from the storage domain was required to get the VMs
> to start again.

If you know that the VM is not running, disabling the lease temporarily is
a good way to work around the issue.

> Re-enabling the High Availability setting there after
> fixed the lease issue. But now some, not all, of the HA VMs are still
> throwing "no space left on device" errors when attempting to start
> them. The others are working just fine even with their HA lease
> enabled.

Do all the errors come from the same host(s), or can some VMs not start while
others can on the same host?

> My questions are:
>
> 1. Why does oVirt claim to have a constantly allocated HA VM lease on
> the storage domain when it's clearly only done while the VM is running?

Leases are allocated when a VM is created. This allocates the lease space
(1 MiB) in the external leases special volume and binds it to the VM ID.

When the VM starts, it acquires the lease for its VM ID. If sanlock is not
connected to the lockspace on this host, this may fail with the confusing
"No space left on device" error.

> 2. Why does oVirt deallocate the HA VM lease when performing a fencing
> operation?

It does not. oVirt does not actually "fence" the VM. If the host running the VM
cannot access storage and update the lease, the host loses all leases on that
storage. The result is pausing all the VMs holding a lease on that storage.

oVirt will try to start the VM on another host, which will try to acquire the
lease again on the new host. If enough time passed since the original host lost
access to storage, the lease can be acquired on the new host. If not, this
will happen in the next retry (or retries).

If the original host did not lose access to storage, and it is still updating
the lease, you cannot acquire the lease from another host. This protects the VM
from a split-brain that would corrupt the VM disk.

> 3. Why can't oVirt clear the old HA VM lease when the VM is down and
> the storage pool has space available? (How much space is even needed?
> The leases section of the storage domain in the Admin WebUI doesn't
> contain any useful info beyond the fact that a lease should exist for a
> VM even when it's off.)

Acquiring the lease is possible only if the lease is not held by another host.

oVirt does not support acquiring a held lease by killing the process holding
the lease on another host, but sanlock provides such a capability.

> 4. Is there a better way to force start a HA VM when the lease is old
> and the VM is powered off?

If the original VM is powered off for enough time (2-3 minutes), the lease
expires and starting the VM on another host should succeed.

> 5. Should I file a bug on the whole HA VM failing to reacquire a lease
> on a full storage pool?

The external lease volume is not fully allocated. If you use thin-provisioned
storage, and there is really no storage space, it is possible that creating
a new lease will fail, but starting and stopping VMs that have leases should
not be affected. But if you reach the point where you don't have enough
storage space, you have much bigger trouble and should fix it urgently.

Do you really have an issue with available space? What does the engine report
about the storage domain? What does the underlying storage report?

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/OYA5KFPJIZXUGDK6CZFO6BWHY7ZDT7OR/


[ovirt-users] Re: storage high latency, sanlock errors, cluster instability

2022-05-29 Thread Nir Soffer
On Sun, May 29, 2022 at 9:03 PM Jonathan Baecker  wrote:
>
> Am 29.05.22 um 19:24 schrieb Nir Soffer:
>
> On Sun, May 29, 2022 at 7:50 PM Jonathan Baecker  wrote:
>
> Hello everybody,
>
> we run a 3 node self hosted cluster with GlusterFS. I had a lot of problem 
> upgrading ovirt from 4.4.10 to 4.5.0.2 and now we have cluster instability.
>
> First I will write down the problems I had with upgrading, so you get a 
> bigger picture:
>
> engine update when fine
> But nodes I could not update because of wrong version of imgbase, so I did a 
> manual update to 4.5.0.1 and later to 4.5.0.2. First time after updating it 
> was still booting into 4.4.10, so I did a reinstall.
> Then after second reboot I ended up in the emergency mode. After a long 
> searching I figure out that lvm.conf using use_devicesfile now but there it 
> uses the wrong filters. So I comment out this and add the old filters back. 
> This procedure I have done on all 3 nodes.
>
> When use_devicesfile (default in 4.5) is enabled, lvm filter is not
> used. During installation
> the old lvm filter is removed.
>
> Can you share more info on why it does not work for you?
>
> The problem was, that the node could not mount the gluster volumes anymore 
> and ended up in emergency mode.
>
> - output of lsblk
>
> NAME   MAJ:MIN RM   SIZE 
> RO TYPE  MOUNTPOINT
> sda  8:00   1.8T  
> 0 disk
> `-XA1920LE10063_HKS028AV   253:00   1.8T  
> 0 mpath
>   |-gluster_vg_sda-gluster_thinpool_gluster_vg_sda_tmeta   253:16   0 9G  
> 0 lvm
>   | `-gluster_vg_sda-gluster_thinpool_gluster_vg_sda-tpool 253:18   0   1.7T  
> 0 lvm
>   |   |-gluster_vg_sda-gluster_thinpool_gluster_vg_sda 253:19   0   1.7T  
> 1 lvm
>   |   |-gluster_vg_sda-gluster_lv_data 253:20   0   100G  
> 0 lvm   /gluster_bricks/data
>   |   `-gluster_vg_sda-gluster_lv_vmstore  253:21   0   1.6T  
> 0 lvm   /gluster_bricks/vmstore
>   `-gluster_vg_sda-gluster_thinpool_gluster_vg_sda_tdata   253:17   0   1.7T  
> 0 lvm
> `-gluster_vg_sda-gluster_thinpool_gluster_vg_sda-tpool 253:18   0   1.7T  
> 0 lvm
>   |-gluster_vg_sda-gluster_thinpool_gluster_vg_sda 253:19   0   1.7T  
> 1 lvm
>   |-gluster_vg_sda-gluster_lv_data 253:20   0   100G  
> 0 lvm   /gluster_bricks/data
>   `-gluster_vg_sda-gluster_lv_vmstore  253:21   0   1.6T  
> 0 lvm   /gluster_bricks/vmstore
> sr0 11:01  1024M  
> 0 rom
> nvme0n1259:00 238.5G  
> 0 disk
> |-nvme0n1p1259:10 1G  
> 0 part  /boot
> |-nvme0n1p2259:20   134G  
> 0 part
> | |-onn-pool00_tmeta   253:10 1G  
> 0 lvm
> | | `-onn-pool00-tpool 253:3087G  
> 0 lvm
> | |   |-onn-ovirt--node--ng--4.5.0.2--0.20220513.0+1   253:4050G  
> 0 lvm   /
> | |   |-onn-pool00 253:7087G  
> 1 lvm
> | |   |-onn-home   253:80 1G  
> 0 lvm   /home
> | |   |-onn-tmp253:90 1G  
> 0 lvm   /tmp
> | |   |-onn-var253:10   015G  
> 0 lvm   /var
> | |   |-onn-var_crash  253:11   010G  
> 0 lvm   /var/crash
> | |   |-onn-var_log253:12   0 8G  
> 0 lvm   /var/log
> | |   |-onn-var_log_audit  253:13   0 2G  
> 0 lvm   /var/log/audit
> | |   |-onn-ovirt--node--ng--4.5.0.1--0.20220511.0+1   253:14   050G  
> 0 lvm
> | |   `-onn-var_tmp253:15   010G  
> 0 lvm   /var/tmp
> | |-onn-pool00_tdata   253:2087G  
> 0 lvm
> | | `-onn-pool00-tpool 253:3087G  
> 0 lvm
> | |   |-onn-ovirt--node--ng--4.5.0.2--0.20220513.0+1   253:4050G  
> 0 lvm   /
> | |   |-onn-pool00 253:7087G  
> 1 lvm
> | |   |-onn-home   253:80 1G  
> 0 lvm   /home
> | |   |-onn-tmp253:90 1G  
> 0 lvm   /tmp
> | |   |-onn-var

[ovirt-users] Re: storage high latency, sanlock errors, cluster instability

2022-05-29 Thread Nir Soffer
On Sun, May 29, 2022 at 7:50 PM Jonathan Baecker  wrote:
>
> Hello everybody,
>
> we run a 3 node self hosted cluster with GlusterFS. I had a lot of problem 
> upgrading ovirt from 4.4.10 to 4.5.0.2 and now we have cluster instability.
>
> First I will write down the problems I had with upgrading, so you get a 
> bigger picture:
>
> engine update went fine
> But I could not update the nodes because of a wrong version of imgbase, so I did a
> manual update to 4.5.0.1 and later to 4.5.0.2. The first time after updating it
> was still booting into 4.4.10, so I did a reinstall.
> Then after the second reboot I ended up in emergency mode. After a long
> search I figured out that lvm.conf now uses use_devicesfile, but with the
> wrong filters. So I commented this out and added the old filters back.
> I did this procedure on all 3 nodes.

When use_devicesfile (default in 4.5) is enabled, lvm filter is not
used. During installation
the old lvm filter is removed.

Can you share more info on why it does not work for you?
- output of lsblk
- The old lvm filter used, and why it was needed
- output of vdsm-tool config-lvm-filter

If using lvm devices does not work for you, you can enable the lvm
filter in vdsm configuration
by adding a drop-in file:

$ cat /etc/vdsm/vdsm.conf.d/99-local.conf
[lvm]
config_method = filter

And run:

vdsm-tool config-lvm-filter

to configure the lvm filter in the best way for vdsm. If this does not create
the right filter we would like to know why, but in general you should use
lvm devices since it avoids the trouble of maintaining the filter and dealing
with upgrades and user edited lvm filter.

If you disable use_devicesfile, the next vdsm upgrade will enable it
again unless you change the configuration.

Also, even if you disable use_devicesfile in lvm.conf, vdsm still uses
--devices instead of a filter when running lvm commands, and lvm commands
run by vdsm ignore your lvm filter since the --devices option overrides
the system settings.
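To see what the host will actually use, you can check both mechanisms (a
quick check; lvmdevices is available with the lvm2 version shipped in 4.5):

# list the entries in the LVM devices file (/etc/lvm/devices/system.devices)
lvmdevices

# analyze the host and show the recommended lvm configuration for vdsm
vdsm-tool config-lvm-filter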

...
> I noticed some unsynced volume warnings, but because I had this in the past too,
> after upgrading, I thought they would disappear after some time. The next day
> they were still there, so I decided to put the nodes into maintenance mode
> again and restart the glusterd service. After some time the sync warnings
> were gone.

Not clear what these warnings are - I guess Gluster warnings?

> So now the actual problem:
>
> Since this time the cluster is unstable. I get different errors and warning, 
> like:
>
> VM [name] is not responding
> out of nothing HA VM gets migrated
> VM migration can fail
> VM backup with snapshoting and export take very long

How do you back up the vms? Do you use a backup application? How is it
configured?

> VMs are getting very slow some times
> Storage domain vmstore experienced a high latency of 9.14251
> ovs|1|db_ctl_base|ERR|no key "dpdk-init" in Open_vSwitch record "." 
> column other_config
> 489279 [1064359]: s8 renewal error -202 delta_length 10 last_success 489249
> 444853 [2243175]: s27 delta_renew read timeout 10 sec offset 0 
> /rhev/data-center/mnt/glusterSD/onode1.example.org:_vmstore/3cf83851-1cc8-4f97-8960-08a60b9e25db/dom_md/ids
> 471099 [2243175]: s27 delta_renew read timeout 10 sec offset 0 
> /rhev/data-center/mnt/glusterSD/onode1.example.org:_vmstore/3cf83851-1cc8-4f97-8960-08a60b9e25db/dom_md/ids
> many of: 424035 [2243175]: s27 delta_renew long write time XX sec

All these issues tell us that your storage is not working correctly.

sanlock.log is full of renewal errors from May:

$ grep 2022-05- sanlock.log | wc -l
4844

$ grep 2022-05- sanlock.log | grep 'renewal error' | wc -l
631

But there is a lot of trouble in earlier months too:

$ grep 2022-04- sanlock.log | wc -l
844
$ grep 2022-04- sanlock.log | grep 'renewal error' | wc -l
29

$ grep 2022-03- sanlock.log | wc -l
1609
$ grep 2022-03- sanlock.log | grep 'renewal error' | wc -l
483

$ grep 2022-02- sanlock.log | wc -l
826
$ grep 2022-02- sanlock.log | grep 'renewal error' | wc -l
242

Here sanlock log looks healthy:

$ grep 2022-01- sanlock.log | wc -l
3
$ grep 2022-01- sanlock.log | grep 'renewal error' | wc -l
0

$ grep 2021-12- sanlock.log | wc -l
48
$ grep 2021-12- sanlock.log | grep 'renewal error' | wc -l
0
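For reference, a small loop like this produces the same per-month summary
(assuming the default log location /var/log/sanlock.log):

for m in 2021-12 2022-01 2022-02 2022-03 2022-04 2022-05; do
    total=$(grep -c "$m-" /var/log/sanlock.log)
    errors=$(grep "$m-" /var/log/sanlock.log | grep -c 'renewal error')
    echo "$m: $total lines, $errors renewal errors"
done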

vdsm log shows that 2 domains are not accessible:

$ grep ERROR vdsm.log
2022-05-29 15:07:19,048+0200 ERROR (check/loop) [storage.monitor]
Error checking path
/rhev/data-center/mnt/glusterSD/onode1.example.org:_data/de5f4123-0fac-4238-abcf-a329c142bd47/dom_md/metadata
(monitor:511)
2022-05-29 16:33:59,049+0200 ERROR (check/loop) [storage.monitor]
Error checking path
/rhev/data-center/mnt/glusterSD/onode1.example.org:_data/de5f4123-0fac-4238-abcf-a329c142bd47/dom_md/metadata
(monitor:511)
2022-05-29 16:34:39,049+0200 ERROR (check/loop) [storage.monitor]
Error checking path
/rhev/data-center/mnt/glusterSD/onode1.example.org:_data/de5f4123-0fac-4238-abcf-a329c142bd47/dom_md/metadata
(monitor:511)
2022-05-29 

[ovirt-users] Re: On oVirt 4.4 Can not import VM from Export domain from ovirt-4.3 nor DataDomain from ovirt-4.3

2022-05-23 Thread Nir Soffer
On Mon, May 23, 2022 at 3:31 PM  wrote:
>
> HI
>
> Thank you for fast response.
>
> In the mean time I have discovered what was the problem in my case.
>
> The problem was that the export domain and data domain from oVirt 4.3 had OVF
> where the InstanceID tag is used (ID in caps) instead of the expected
> InstanceId.
>
> oVirt 4.4 expected the InstanceId tag, which wasn't used in this case, so the
> engine assumed that the OVF files were corrupted.
>
> Fix for me was simple on Export Domain I swapped InstanceID with InstanceId.
> bash# for i in `find . -name "*.ovf"` ; do sudo sed -i 
> 's/InstanceID/InstanceId/g' $i ; done ;
>
> But I could not fix the data domain since I didn't want to dive into the OVF_STORE
> disk. I am guessing that there is a tool for editing OVF_STORE disks without
> damaging the domain?!

The OVF_STORE disks contain a single tar file at offset 0. You can
extract the tar from the volume using:

   tar xf /path/to/ovf_store/volume

On file storage this is easy - you can modify the contents of the OVF
files in the tar and write the modified tar back to the volume, but you
must update the size of the tar in the OVF store metadata file.

For example:

# grep DESCRIPTION
/rhev/data-center/mnt/alpine\:_01/81738a7a-7ca6-43b8-b9d8-1866a1f81f83/images/0b0dd3b2-71a2-4c48-ad83-cea1dc900818/35dd9951-
DESCRIPTION={"Updated":true,"Size":23040,"Last Updated":"Sun Apr 24
15:46:27 IDT 2022","Storage
Domains":[{"uuid":"81738a7a-7ca6-43b8-b9d8-1866a1f81f83"}],"Disk
Description":"OVF_STORE"}

You need to keep "Size":23040 correct, since engine uses it to read
the tar from storage.
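For file storage the whole round trip could look roughly like this - only a
sketch: the volume path and tar member names are assumptions, the new tar must
not be larger than the volume, and the domain should not be in use meanwhile:

# path of the OVF_STORE volume on a file storage domain (example path)
vol=/rhev/data-center/mnt/server:_export/sd-id/images/ovf-img-id/ovf-vol-id

# extract the tar stored at offset 0 of the volume
mkdir /tmp/ovf && cd /tmp/ovf
tar xf "$vol"

# edit the OVF files, e.g. the InstanceID/InstanceId fix from this thread
sed -i 's/InstanceID/InstanceId/g' *.ovf

# repack and note the new size - it must replace the "Size" value in the
# volume metadata (.meta) file next to the volume
tar cf /tmp/new.tar *
ls -l /tmp/new.tar

# write the tar back to offset 0 of the volume
dd if=/tmp/new.tar of="$vol" conv=notrunc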

On block storage updating the metadata is much harder, so I would not
go this way.

If the issue is code expecting "InstanceId" but the actual key is
"InstanceID", the right place to fix this is in the code, accepting
either "InstanceId" or "InstanceID".

In general this sounds like a bug, so you should file a bug for the
component reading the
OVF (vdsm?).

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/FNCU4ZZEY5YR2HH2SIFEEFHTS5FNKM6E/


[ovirt-users] Re: oVirt / Vinchin Backup Application-Level Consistency

2022-05-19 Thread Nir Soffer
On Thu, May 19, 2022 at 11:25 AM Andrei Verovski  wrote:
>
> Hi,
>
> I’m currently testing oVirt with Vinchin backup, and for application-level 
> consistency need to make snapshot with “quiesce” option.
> What need to be done in order to activate this feature?
>
> > Guest Quiesce for Application-Level Consistency in Windows/Linux via Guest 
> > Agent: Done. Available in oVirt.
>
> Running oVirt version 4.4.7.6

The short version is that you should always get a consistent backup,
but it depends on the guest.

oVirt does not use the “quiesce” option, but it uses virDomainFSFreeze[1] and
virDomainFSThaw[2] to get the same result.

I think that Vinchin supports both snapshot based and incremental backup.

In snapshot based backup, oVirt defaults to freezing the guest file systems
during snapshot creation, so you should get a consistent backup.

In incremental backup before 4.5, oVirt also freezes the file systems when
entering backup mode, so you should get a consistent backup.

In incremental backup in 4.5 and later, oVirt creates a temporary snapshot
for every backup, and it freezes the file systems during the snapshot, so
you should get a consistent backup.

During incremental backup, the application can use the require_consistency[3]
flag to fail the backup if freezing the file systems fails.
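A rough sketch of what the REST call can look like - engine URL, credentials
and the disk id are placeholders, and the element names should be verified
against the API model linked below [3]:

curl -k -u admin@internal:password \
    -H 'Content-Type: application/xml' \
    -d '<backup>
          <require_consistency>true</require_consistency>
          <disks>
            <disk id="DISK-UUID"/>
          </disks>
        </backup>' \
    https://engine.example.com/ovirt-engine/api/vms/VM-UUID/backups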

Note that in all cases, getting a consistent backup requires running
qemu-guest-agent in the guest, and proper guest configuration
if you need to do something special when fsFreeze or fsThaw are called.

[1] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainFSFreeze
[2] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainFSThaw
[3] 
http://ovirt.github.io/ovirt-engine-api-model/master/#services/vm_backups/methods/add

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/PAV67J3YI3BPQQTXYQRZ5VQPKIJQQEVZ/


[ovirt-users] Re: upgrade python3

2022-05-16 Thread Nir Soffer
On Mon, May 16, 2022 at 11:09 AM  wrote:
>
> Hi,
> the support should be determined by Red Hat support; the python3 packages
> have the same lifespan as rhel8, so 2029. I also have a couple of python3.8
> packages on my ovirt-engine from the 3.8 module, that one is supported until
> May 2023. So I don't think this is something that needs to be addressed right
> now.
> https://access.redhat.com/support/policy/updates/rhel-app-streams-life-cycle

Correct, this is a bug in the tool reporting a security issue.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/Z4A2XA63SEQXPXAIMSBRQDCJ3DY4UQLP/


[ovirt-users] Re: Unable to import ovirt vm ova into aws?

2022-05-15 Thread Nir Soffer
On Fri, May 13, 2022 at 5:43 PM rickey john  wrote:
>
> I am trying to import a ubuntu 18 os ovirt vm ova template.
> For this i am creating task with below command aws ec2 import-image --region 
> ap-south-1 --description "Ovirt VM" --license-type BYOL --disk-containers 
> "file://containers.json" aws ec2 describe-import-image-tasks --region 
> ap-south-1 --import-task-ids import-ami-0755c8cd52d08ac88
>
> But unfortunately it is failing with "StatusMessage": "ClientError: No valid 
> partitions. Not a valid volume." error.
>
> can someone please guide the steps to export and  import ovirt vm ova into 
> aws ec2 instance?

oVirt OVA disks use the qcow2 format. Does the AWS tool support this format?

You can try to extract the OVA file (which is a tar file):

tar xf ovirt.ova

And then convert the disks to raw format:

cd extracted-ova
qemu-img convert -f qcow2 -O raw disk1.qcow2 disk1.raw

And try to import the extracted OVA using the raw images.

This will not be very efficient but it will help to debug this issue.
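If the import still fails, it may help to check what the extracted disk
actually contains (file names as in the example above; virt-filesystems
requires libguestfs-tools):

qemu-img info disk1.qcow2                     # confirm the source image format
virt-filesystems -a disk1.raw --all --long    # list the partitions AWS should detect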

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/V5HNZ5ZTI4322ORADQXOTK7DNTWYXXDO/


[ovirt-users] Re: Upload Cerfiticate issue

2022-05-15 Thread Nir Soffer
On Sat, May 14, 2022 at 12:53 AM  wrote:
>
> I’ve tried your suggestion and continue to get the same results.

It will be more helpful if you describe in detail what you tried
and what the results were.

> Based on continuing investigation I’ve found on the Red Hat Knowledge base a 
> resolution to this issue, the following link references the solution: 
> https://access.redhat.com/solutions/354255.

This URL does not exist.

> However, I’ve run across another issue, since the creation of a new host 
> within ovirt

Based on the output of "ovirt-imageio --show-config" you have an all-in-one
setup, where the engine host is also added to the engine as a hypervisor. This
setup is not supported (although it works), but adding more hosts to this kind
of setup will not work with image transfer, and it is a really bad idea to have
multiple hosts and run the engine on one of them.

For example, the engine can stop a host using the host power management API.
If this is the host running the engine, you have no way to start your engine
unless you have access to the host's power management console.

If you have more than one host, your engine should not run on any of the hosts,
and you must enable the imageio proxy (this is the default):

engine-config -s ImageTransferProxyEnabled=true

And restart engine:

systemctl restart ovirt-engine

> I’ve not been able to access the internet or reach the host/server remotely.  
>   Therefore, I’m unable to try the solution provide via the Red Hat Knowledge 
> Base.
>
> I’ve reviewed the kernel routing table displayed below:
>
>
> ip route show
> default via 20.10.20.1 dev eno5 proto static metric 100
> default via 20.10.20.1 dev eno6 proto static metric 101
> default via 20.10.20.1 dev eno7 proto static metric 102
> default via 20.10.20.1 dev ovirtmgmt proto static metric 425
> 20.10.20.0/24 dev eno5 proto kernel scope link src 20.10.20.65 metric 100
> 20.10.20.0/24 dev eno6 proto kernel scope link src 20.10.20.66 metric 101
> 20.10.20.0/24 dev eno7 proto kernel scope link src 20.10.20.67 metric 102
> 20.10.20.0/24 dev ovirtmgmt proto kernel scope link src 20.10.20.68 metric 425
>
> Is it normal behavior for the host to sever all connection when a “Host” 
> machine is added to ovirt?   Is there a solution to this issue?  I’ve 
> recognize the risks of having the host exposed to the internet, how would I 
> keep the OS/RHEL 8.6 & ovirt current?

It is hard to tell what's going on when we don't know which
hosts you have and what their IP addresses are.

Please confirm that you access the engine using https:// and that
when you access the host your browser reports a secure connection
without warnings (meaning that the engine CA certificate was added to
the browser).

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/22WGHAU3NHVB256S6UYT2F7G7VQZWAMG/


[ovirt-users] Re: Host cannot connect to storage domains

2022-05-12 Thread Nir Soffer
On Wed, May 11, 2022 at 3:33 PM Ritesh Chikatwar  wrote:
>
> Sorry compile-time error in this
>
> Use this:
>
> if el.find('stripeCount') is not None:
>     value['stripeCount'] = el.find('stripeCount').text
> else:
>     value['stripeCount'] = 1
>

Fixed in ovirt 4.5.1, see:
https://github.com/oVirt/vdsm/pull/172

As a workaround, you can apply this change locally:

diff --git a/lib/vdsm/gluster/cli.py b/lib/vdsm/gluster/cli.py
index 69154a18e..7c8e954ab 100644
--- a/lib/vdsm/gluster/cli.py
+++ b/lib/vdsm/gluster/cli.py
@@ -426,7 +426,7 @@ def _parseVolumeInfo(tree):
 value["volumeStatus"] = VolumeStatus.OFFLINE
 value['brickCount'] = el.find('brickCount').text
 value['distCount'] = el.find('distCount').text
-value['stripeCount'] = el.find('stripeCount').text
+value['stripeCount'] = el.findtext('stripeCount', '1')
 value['replicaCount'] = el.find('replicaCount').text
 value['disperseCount'] = el.find('disperseCount').text
 value['redundancyCount'] = el.find('redundancyCount').text
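After changing /usr/lib/python3.6/site-packages/vdsm/gluster/cli.py on a host,
restart the daemons so the change is picked up (as also noted earlier in this
thread):

systemctl restart supervdsmd vdsmd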

Nir

>
>
> On Wed, May 11, 2022 at 11:07 AM Ritesh Chikatwar  wrote:
>>
>> and once you have done the changes Restart VDSM and SuperVDSM, then your 
>> host should be able to connect
>>
>> On Wed, May 11, 2022 at 10:33 AM Ritesh Chikatwar  
>> wrote:
>>>
>>> Hey Jose,
>>>
>>>
>>> If still have a setup can you try replacing with this
>>>
>>> if el.find('stripeCount'):
>>>     value['stripeCount'] = el.find('stripeCount').text
>>> else:
>>>     value['stripeCount'] = '1'
>>>
>>> can you try replacing with this
>>>
>>> On Wed, Apr 27, 2022 at 9:48 PM José Ferradeira via Users  
>>> wrote:

 It did not work

 Thanks

 
 De: "Abe E" 
 Para: users@ovirt.org
 Enviadas: Quarta-feira, 27 De Abril de 2022 15:58:01
 Assunto: [ovirt-users] Re: Host cannot connect to storage domains

 I think you're running into that bug, someone mentioned the following 
 which seemed to work for my nodes that complained of not being able to 
 connect to the storage pool.

 The following fix worked for me, i.e. replacing the following line in
 /usr/lib/python3.6/site-packages/vdsm/gluster/cli.py


 Replace: value['stripeCount'] =el.find('stripeCount').text

 With: if (el.find('stripeCount')): value['stripeCount'] = 
 el.find('stripeCount').text

 Restart VSMD and SuperVSMD and then your host should be able to connect if 
 you have the same issue
 ___
 Users mailing list -- users@ovirt.org
 To unsubscribe send an email to users-le...@ovirt.org
 Privacy Statement: https://www.ovirt.org/privacy-policy.html
 oVirt Code of Conduct: 
 https://www.ovirt.org/community/about/community-guidelines/
 List Archives: 
 https://lists.ovirt.org/archives/list/users@ovirt.org/message/TWTFZ4VHKSEABMEZYMDUJI2PUYA24XMU/
 ___
 Users mailing list -- users@ovirt.org
 To unsubscribe send an email to users-le...@ovirt.org
 Privacy Statement: https://www.ovirt.org/privacy-policy.html
 oVirt Code of Conduct: 
 https://www.ovirt.org/community/about/community-guidelines/
 List Archives: 
 https://lists.ovirt.org/archives/list/users@ovirt.org/message/22NF5BKNFPDVS3OGITBIM3XVFZJVCO2H/
>
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/MJZJQ43T3OPUZSNM6PGZWD6MJ3OG3UF5/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/AVMA3FAVYKRJCSSJUDB7J2CWIWIXIVKO/


[ovirt-users] Re: Upload Cerfiticate issue

2022-05-11 Thread Nir Soffer
On Wed, May 11, 2022 at 9:58 PM  wrote:
>
> I checked the network configuration on both the Client & Server I found 
> network proxy turned off.  However, during the installation of ovirt there is 
> a question regarding proxy.  The question is as follows:
>
> Configure WebSocket Proxy on this machine? (Yes, No) [Yes]:
>
> I took the default above could this my issue?

No, the web socket proxy is not related.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/DJVNWQPBJL6Z2U3GCUVWBRD2DE3UOC6P/


[ovirt-users] Re: Upload Cerfiticate issue

2022-05-11 Thread Nir Soffer
On Wed, May 11, 2022 at 6:42 PM  wrote:
>
> I started to investigate based on your question regarding a secure 
> connection.  From that investigation this what I’ve found:
>
> When viewing he certificate the AIA section shows the following:
>
> Authority Info (AIA)
> Location: 
> http://ovirtdl380gen10.cscd.net:80/ovirt-engine/services/pki-resource?resource=ca-certificate&format=X509-PEM-CA
>
> Method: CA Issuers
>
> It appears that the certificate is being issue/released on port 80, could 
> this be the reason no connection can be established with the “ovirt imageio” 
> service; since the service is looking for a connection on a secured port such 
> as 443?
>
> How can or what should be done to correct this.  If this is the issue I 
> suspect that I need to have a certificate that is from port 443 or some other 
> secured connection.

So you are accessing the engine via http:?

I don't think this can work for image upload. We support only https.

Access engine at:

 https://ovirtdl380gen10.cscd.net/

You should get a secure connection - if not, download the certificate
and install it, make sure the proxy is disabled, and upload should work.

Trying your engine address:
https://ovirtdl380gen10.cscd.net/

I get an unrelated site (24th Judicial District Community ...). You may
need to fix the web server setup so the engine can be accessed using https.

Also, having the engine accessible on the web is not a good idea; it is
better to make it available only inside a closed network.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/7KNWU2G6PPLSYMRQYC43JMGUEXWPMGBI/


[ovirt-users] Re: Upload Cerfiticate issue

2022-05-11 Thread Nir Soffer
On Wed, May 11, 2022 at 1:37 AM  wrote:
>
> I also started to receive the error message below after making the suggested 
> changes:
>
> VDSM ovirtdl380gen10 command HSMGetAllTasksStatusesVDS failed: Not SPM: ()

This is not related. You may have other issue in this setup.

>
> On my browser it indicates that there is no tracking, verified by my domain, 
> You have granted this website additional permission.

Do you have secure connection or not?

Upload will not work if you don't have a secure connection.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/6SFIKPGK4C7MTSLD7LXEJTAZUCCIX3EJ/


[ovirt-users] Re: Upload Cerfiticate issue

2022-05-11 Thread Nir Soffer
On Wed, May 11, 2022 at 1:29 AM  wrote:
>
> Made the suggestion that you made earlier I continued to get the same results.
> Sharing the files/things you requested below:
>
> ovirt-imageio --show-config
...
> "control": {
> "port": 54324,
> "prefer_ipv4": true,
> "remove_timeout": 60,
> "socket": "/run/ovirt-imageio/sock",
> "transport": "unix"
> },

So your ovirt-imageio service is configured for vdsm. This confirms
that your engine host is also used as a hypervisor
(the deprecated all-in-one configuration).

...
> },
> "remote": {
> "host": "::",
> "port": 54322
> },

Since the ovirt-imageio service listens on port 54322, the engine
UI should not try to connect to the proxy (port 54323). This
is done by disabling the proxy as I explained in the previous email.

> "tls": {
> "ca_file": "/etc/pki/vdsm/certs/cacert.pem",
> "cert_file": "/etc/pki/vdsm/certs/vdsmcert.pem",
> "enable": true,
> "enable_tls1_1": false,
> "key_file": "/etc/pki/vdsm/keys/vdsmkey.pem"
> }
> }

Using the vdsm PKI works for an all-in-one setup.

Did you run engine-config as root?
Did you restart ovirt-engine after disabling the proxy?

Just to be sure - do this and share the output:

sudo engine-config -s ImageTransferProxyEnabled=false
sudo engine-config -g ImageTransferProxyEnabled
sudo systemctl restart ovirt-engine

engine-config should report:

ImageTransferProxyEnabled: false version: general

After the engine is restarted, try the "Test connection" again from
the upload UI.
If it works, upload should also work. If not, we will have to dig deeper.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/GUT5IAEQAFCJCF6Y47YLJ6KA2ZHA4KZ5/


[ovirt-users] Re: Upload Cerfiticate issue

2022-05-10 Thread Nir Soffer
On Mon, May 9, 2022 at 10:40 PM  wrote:
>
> I’m trying to upload an ISO image in ovirt 4.4.10, It’s been a huge challenge 
> to accomplish this.  I read several post regarding this issue, I really don’t 
> have s clear understanding of solution to this issue.  My experience has not 
> been very fruitful at all.

Sorry to hear this.

> When I try to perform the upload using the web GUI I get the following 
> message in the status column: “Paused by System“.  I’ve been reading for 
> roughly three weeks trying to understand and resolve the issue.  There is a 
> tremendous amount of discussion centered around changing certificate file 
> located in the directory “etcpki/ovirt-engine”, however it not clear at all 
> what files need to change.
>
> My installation is an out-of-box installation with any certificates beginning 
> generated as part of the install process, I’ve imported the certificate that 
> was generated into my browser/Firefox 91.9.0.   Based on what I’ve been 
> reading the solution to my problems is that the certificate does not match 
> the certificate defined in the “imageio-service”, my question is why because 
> it was generated as part of the installation?

If your system is an out-of-the-box installation and you are using the
default self-signed engine CA, there should be no mismatch.

> What files in the “/etc/pki/ovirt-engine” must be changed to get things 
> working.  Further should  or do I copy the certificate saved from the GUI to 
> files under “/etc/pki/ovirt-engine” directory?

You don't have to change anything to use the defaults.

> I feel like I’m so close after six month of reading and re-installs, what do 
> I do next?

There is not enough info in your mail about what you tried to do, or any
logs showing what the system did. To make sure you installed the certificate
properly, this is the way to import the engine CA certificate:

1. Download the certificate from:

https://myengine.example.com/ovirt-engine/services/pki-resource?resource=ca-certificate&format=X509-PEM-CA

   (or fetch it from the command line - see the sketch after these steps)

2. In Firefox settings, search "Certificates" and click "View certificates"

3. Click "Authorities" and "Import..."

Select the certificate file, and enable the first checkbox for
trusting web sites.

4. Reopen the browser to activate the certificate
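As an alternative to step 1, the certificate can also be fetched from the
command line (a sketch - replace the engine address with your own; -k is
needed only because the CA is not trusted yet):

curl -k -o engine-ca.pem 'https://myengine.example.com/ovirt-engine/services/pki-resource?resource=ca-certificate&format=X509-PEM-CA'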

To test that the certificate works, open the "Disks" tab, and click
"Upload > Start".

Click "Test connection" - you should see a green message about
successful connection
to ovirt-imageio.

Let's continue when you reach this point.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/7ZZHYVTZDAYLOLEJM3SSM7BGQ4AKXCX3/


[ovirt-users] Re: How do I automate VM backups?

2022-05-08 Thread Nir Soffer
On Fri, May 6, 2022 at 4:24 AM David White via Users  wrote:
>
> So perhaps qemu-img isn't the best way to do this.
> I was hoping I could write a bash script or something to take a snapshot. Is 
> that possible, or is there a better way?
>
> I was looking at https://github.com/wefixit-AT/oVirtBackup tonight, but 
> haven't been able to get it to work as of yet.
>
> When I run it, it recognizes the backup storage domain, says that it has 
> started taking a snapshot, but then immediately says the snapshot was created 
> (at which point everything else fails):
>
> 2022-05-05 21:05:35,453: Start backup for: my-vm-name.example.com
> 2022-05-05 21:05:35,554: The storage domain SpinningData is in state active
> 2022-05-05 21:05:35,732: Snapshot creation started ...
> 2022-05-05 21:05:35,732: Snapshot created
> 2022-05-05 21:05:40,773: !!! No snapshot found !!!
> 2022-05-05 21:05:40,773: All backups done
> 2022-05-05 21:05:40,773: Backup failured for:
> 2022-05-05 21:05:40,773:   my-vm-name.example.com
> 2022-05-05 21:05:40,773: Some errors occured during the backup, please check 
> the log file
>
> The README says something about the python-sdk. How do I install that? I 
> don't see that anywhere.
>
> [root@phys1 oVirtBackup-master]# yum info ovirt-engine-sdk-python
> Updating Subscription Management repositories.
> Last metadata expiration check: 1:14:19 ago on Thu 05 May 2022 07:58:16 PM 
> EDT.
> Error: No matching Packages to list
> [root@phys1 oVirtBackup-master]# yum whatprovides ovirt-engine-sdk-python
> Updating Subscription Management repositories.
> Last metadata expiration check: 1:14:31 ago on Thu 05 May 2022 07:58:16 PM 
> EDT.
> Error: No Matches found

Maybe the backup solution you tried works only with CentOS 7 and python 2.7?

For CentOS Stream 8, the package name is python3-ovirt-engine-sdk4.

> Is there a better way to run automated backups than this approach and/or 
> using qemu-img?

The best way to back up is to use one of the backup applications supporting
the incremental backup API with oVirt 4.4+.

If you want to develop your own solution, you can start with the backup_vm.py
example in the python sdk.

If you install python3-ovirt-engine-sdk4 on a CentOS Stream 8 host
you have backup_vm.py script at:

/usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py

This script can create full and incremental backups for VM disks.

To use this script (and other scripts like upload_disk.py, download_disk.py)
you need to create a ovirt.conf file:

$ cat ~/.config/ovirt.conf
[myengine]
engine_url = https://myengine.mydomain
username = admin@internal
password = mypassword
cafile = /etc/pki/vdsm/certs/cacert.pem

You can create a full backup using:

 /usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py \
 -c myengine \
 full \
 --backup-dir /backups/vm-id \
 vm-id

This creates a file like:

/backups/vm-id/{timestamp}.{checkpoint-id}.{disk-id}.full.qcow2

This file contains the contents of the disk disk-id at the time the backup was
started, including data from all snapshots.

The next backup can be an incremental backup, using the checkpoint id of
the previous backup:

 /usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py \
-c myengine \
incremental \
--backup-dir /backups/vm-id \
--from-checkpoint-uuid checkpoint-id-1 \
vm-id

This creates a file like:

/backups/vm-id/{timestamp}.{checkpoint-id}.{disk-id}.incremental.qcow2

Using the previous backup as a backing file:

/backups/vm-id/{timestamp}.checkpoint-id-1.{disk-id}.full.qcow2

The next backup will use this incremental backup as a backing file. This
creates a chain of qcow2 images that can be used to restore the VM disks
using any image in the chain.
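For example, to inspect a chain and flatten the most recent backup into a
standalone image for restore (the file names follow the pattern above and are
placeholders):

qemu-img info --backing-chain \
    /backups/vm-id/{timestamp}.{checkpoint-id}.{disk-id}.incremental.qcow2

# qemu-img convert follows the backing chain, producing one standalone image
qemu-img convert -O qcow2 \
    /backups/vm-id/{timestamp}.{checkpoint-id}.{disk-id}.incremental.qcow2 \
    /restore/{disk-id}.qcow2

The flattened image can then be uploaded back with the upload_disk.py example
mentioned above.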

Note that the script does not back up the VM configuration. To restore the VM
with the same configuration it had at the time of the backup, you need to get
the VM configuration and store it, and use it when restoring the VM.
Unfortunately we don't have example code for this yet.
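Until then, a minimal workaround is to save the VM resource from the REST API
next to the backup files - just a sketch, the engine URL, credentials and CA
file are placeholders, and this XML is not a full OVF:

curl -s --cacert ca.pem -u admin@internal:mypassword \
    "https://myengine.mydomain/ovirt-engine/api/vms/vm-id" \
    > /backups/vm-id/$(date +%Y%m%d-%H%M%S).vm.xml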

You may also use the backup library in the ovirt-stress project. It is a simple
library with the same features as the backup_vm.py script, with additional
features like verifying backups with checksums.

https://gitlab.com/nirs/ovirt-stress/-/blob/master/backup/backup.py

You can see how this library is used in the stress test:

   https://gitlab.com/nirs/ovirt-stress/-/blob/master/backup/test.py#L130

> Sent with ProtonMail secure email.
>
> --- Original Message ---
> On Wednesday, May 4th, 2022 at 1:27 PM, David White via Users 
>  wrote:
>
> I've recently been working with the qemu-img commands for some work that has 
> nothing to do with oVirt or anything inside an oVirt environment.
>
> But learning and using these commands have given me an idea for automating 
> backups.
>
> I believe that the following is true, but to confirm, would the qemu-img 
> commands be 

[ovirt-users] Re: Issue upgrading 4.4 to 4.5 Gluster HCG

2022-04-28 Thread Nir Soffer
On Tue, Apr 26, 2022 at 12:47 PM Alessandro De Salvo
 wrote:
>
> Hi,
>
> the error with XML and gluster is the same I reported with a possible fix in 
> vdsm in another thread.
>
> The following fix worked for me, i.e. replacing the following line in 
> /usr/lib/python3.6/site-packages/vdsm/gluster/cli.py
>
> 429c429
> < if (el.find('stripeCount')): value['stripeCount'] = 
> el.find('stripeCount').text
>
> ---
> > value['stripeCount'] = el.find('stripeCount').text
>
> In this way, after restarting vdsmd and supervdsmd, I was able to connect to 
> gluster 10 volumes. I can file a bug if someone could please point me where 
> to file it :-)

Someone already filed a bug:
https://github.com/oVirt/vdsm/issues/155

You can send a pull request with this fix:
https://github.com/oVirt/vdsm/pulls

Nir

>
> Cheers,
>
>
> Alessandro
>
>
> Il 26/04/22 10:55, Sandro Bonazzola ha scritto:
>
> @Gobinda Das can you please have a look?
>
> Il giorno mar 26 apr 2022 alle ore 06:47 Abe E  ha 
> scritto:
>>
>> Hey All,
>>
>> I am having an issue upgrading from 4.4 to 4.5.
>> My setup
>> 3 Node Gluster (Cluster 1) + 3 Node Cluster (Cluster 2)
>>
>> If i recall the process correctly, the process I did last week:
>>
>> On all my Nodes:
>> dnf install -y centos-release-ovirt45 --enablerepo=extras
>>
>> On Ovirt Engine:
>> dnf install -y centos-release-ovirt45
>> dnf update -y --nobest
>> engine-setup
>>
>> Once the engine was upgraded successfully I ran the upgrade from the GUI on 
>> the Cluster 2 Nodes one by one although when they came back, they complained 
>> of "Host failed to attach one of the Storage Domains attached to it." which 
>> is the "hosted_storage", "data" (gluster).
>>
>> I thought maybe its due to the fact that 4.5 brings an update to the 
>> glusterfs version, so I decided to upgrade Node 3 in my Gluster Cluster and 
>> it booted to emergency mode after the install "succeeded".
>>
>> I feel like I did something wrong, aside from my bravery of upgrading so 
>> much before realizing somethings not right.
>>
>> My VDSM Logs from one of the nodes that fails to connect to storage (FYI I 
>> have 2 Networks, one for Mgmt and 1 for storage that are up):
>>
>> [root@ovirt-4 ~]# tail -f /var/log/vdsm/vdsm.log
>> 2022-04-25 22:41:31,584-0600 INFO  (jsonrpc/3) [vdsm.api] FINISH repoStats 
>> return={} from=:::172.17.117.80,38712, 
>> task_id=8370855e-dea6-4168-870a-d6235d9044e9 (api:54)
>> 2022-04-25 22:41:31,584-0600 INFO  (jsonrpc/3) [vdsm.api] START 
>> multipath_health() from=:::172.17.117.80,38712, 
>> task_id=14eb199a-7fbf-4638-a6bf-a384dfbb9d2c (api:48)
>> 2022-04-25 22:41:31,584-0600 INFO  (jsonrpc/3) [vdsm.api] FINISH 
>> multipath_health return={} from=:::172.17.117.80,38712, 
>> task_id=14eb199a-7fbf-4638-a6bf-a384dfbb9d2c (api:54)
>> 2022-04-25 22:41:31,602-0600 INFO  (periodic/1) [vdsm.api] START 
>> repoStats(domains=()) from=internal, 
>> task_id=08a5c00b-1f66-493f-a408-d4006ddaa959 (api:48)
>> 2022-04-25 22:41:31,603-0600 INFO  (periodic/1) [vdsm.api] FINISH repoStats 
>> return={} from=internal, task_id=08a5c00b-1f66-493f-a408-d4006ddaa959 
>> (api:54)
>> 2022-04-25 22:41:31,606-0600 INFO  (jsonrpc/3) [api.host] FINISH getStats 
>> return={'status': {'code': 0, 'message': 'Done'}, 'info': (suppressed)} 
>> from=:::172.17.117.80,38712 (api:54)
>> 2022-04-25 22:41:35,393-0600 INFO  (jsonrpc/5) [api.host] START 
>> getAllVmStats() from=:::172.17.117.80,38712 (api:48)
>> 2022-04-25 22:41:35,393-0600 INFO  (jsonrpc/5) [api.host] FINISH 
>> getAllVmStats return={'status': {'code': 0, 'message': 'Done'}, 'statsList': 
>> (suppressed)} from=:::172.17.117.80,38712 (api:54)
>> 2022-04-25 22:41:39,366-0600 INFO  (jsonrpc/2) [api.host] START 
>> getAllVmStats() from=::1,53634 (api:48)
>> 2022-04-25 22:41:39,366-0600 INFO  (jsonrpc/2) [api.host] FINISH 
>> getAllVmStats return={'status': {'code': 0, 'message': 'Done'}, 'statsList': 
>> (suppressed)} from=::1,53634 (api:54)
>> 2022-04-25 22:41:46,530-0600 INFO  (jsonrpc/1) [api.host] START getStats() 
>> from=:::172.17.117.80,38712 (api:48)
>> 2022-04-25 22:41:46,568-0600 INFO  (jsonrpc/1) [vdsm.api] START 
>> repoStats(domains=()) from=:::172.17.117.80,38712, 
>> task_id=30404767-9761-4f8c-884a-5561dd0d82fe (api:48)
>> 2022-04-25 22:41:46,568-0600 INFO  (jsonrpc/1) [vdsm.api] FINISH repoStats 
>> return={} from=:::172.17.117.80,38712, 
>> task_id=30404767-9761-4f8c-884a-5561dd0d82fe (api:54)
>> 2022-04-25 22:41:46,569-0600 INFO  (jsonrpc/1) [vdsm.api] START 
>> multipath_health() from=:::172.17.117.80,38712, 
>> task_id=8dbfa47f-e1b7-408c-a060-8d45012f0b90 (api:48)
>> 2022-04-25 22:41:46,569-0600 INFO  (jsonrpc/1) [vdsm.api] FINISH 
>> multipath_health return={} from=:::172.17.117.80,38712, 
>> task_id=8dbfa47f-e1b7-408c-a060-8d45012f0b90 (api:54)
>> 2022-04-25 22:41:46,574-0600 INFO  (jsonrpc/1) [api.host] FINISH getStats 
>> return={'status': {'code': 0, 'message': 'Done'}, 'info': (suppressed)} 
>> 

[ovirt-users] Re: convert disk image to thin-provisioned - Ovirt 4.1

2022-04-27 Thread Nir Soffer
On Wed, Apr 27, 2022 at 1:02 PM Mohamed Roushdy
 wrote:
>
> Hello,
>
> I’ve researched a bit on how to convert a disk image from pre-allocated to 
> thin in Ovirt 4.1, but nothing worked. Is there a way to achieve this please?

Yes!

1. Install ovirt 4.5.0
2.  Use the new convert disk feature

With oVirt 4.1 you can do this manually. What kind of storage are you using?
(NFS, iSCSI?)

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/M4RXWOKSUGKWHUQBRX7MF4ANPMASDCJB/


[ovirt-users] Re: Host cannot connect to storage domains

2022-04-27 Thread Nir Soffer
On Wed, Apr 27, 2022 at 3:54 PM José Ferradeira via Users
 wrote:
>
> After upgrade to 4.5 host cannot be activated because cannot connect to data 
> domain.
>
> I have a data domain in NFS (master) and a GlusterFS. It complains about the 
> Gluster domain:
> The error message for connection node1-teste.acloud.pt:/data1 returned by 
> VDSM was: XML error
>
> # rpm -qa|grep glusterfs*
> glusterfs-10.1-1.el8s.x86_64
> glusterfs-selinux-2.0.1-1.el8s.noarch
> glusterfs-client-xlators-10.1-1.el8s.x86_64
> glusterfs-events-10.1-1.el8s.x86_64
> libglusterfs0-10.1-1.el8s.x86_64
> glusterfs-fuse-10.1-1.el8s.x86_64
> glusterfs-server-10.1-1.el8s.x86_64
> glusterfs-cli-10.1-1.el8s.x86_64
> glusterfs-geo-replication-10.1-1.el8s.x86_64
>
>
> engine log:
>
> 2022-04-27 13:35:16,118+01 ERROR 
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-66) [e
> be79c6] EVENT_ID: VDS_STORAGES_CONNECTION_FAILED(188), Failed to connect Host 
> NODE1 to the Storage Domains DATA1.
> 2022-04-27 13:35:16,169+01 ERROR 
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-66) [e
> be79c6] EVENT_ID: STORAGE_DOMAIN_ERROR(996), The error message for connection 
> node1-teste.acloud.pt:/data1 returned by VDSM was: XML error
> 2022-04-27 13:35:16,170+01 ERROR 
> [org.ovirt.engine.core.bll.storage.connection.FileStorageHelper] 
> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-66) 
> [ebe79c6
> ] The connection with details 'node1-teste.acloud.pt:/data1' failed because 
> of error code '4106' and error message is: xml error
>
>
>
> vdsm log:
> 2022-04-27 13:40:07,125+0100 ERROR (jsonrpc/4) [storage.storageServer] Could 
> not connect to storage server (storageServer:92)
> Traceback (most recent call last):
>  File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 
> 90, in connect_all
>con.connect()
>  File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 
> 233, in connect
>self.validate()
>  File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 
> 365, in validate
>if not self.volinfo:
>  File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 
> 352, in volinfo
>self._volinfo = self._get_gluster_volinfo()
>  File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 
> 405, in _get_gluster_volinfo
>self._volfileserver)
>  File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 56, 
> in __call__
>return callMethod()
>  File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 54, 
> in 
>**kwargs)
>  File "", line 2, in glusterVolumeInfo
>  File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in 
> _callmethod
>raise convert_to_error(kind, result)
> vdsm.gluster.exception.GlusterXmlErrorException: XML error: rc=0 out=() 
> err=[b'\n  0\n  0\n   />\n  \n\
> n  \ndata1\n
> d7eb2c38-2707-4774-9873-a7303d024669\n1\n   
>  Started\n apshotCount>0\n2\n
> 2\n1\n
> 0
> \n0\n
> 0\n0\n
> Distribute\n0 sport>\n\n   uuid="08c7ba5f-9aca-49c5-abfd-8a3e42dd8c0b">node1-teste.acloud.pt:/home/brick1node1-teste.acloud.pt:/home/brick10
> 8c7ba5f-9aca-49c5-abfd-8a3e42dd8c0b0\n
>uuid="08c7ba5f-9aca-49c5-abfd-8a3e42dd8c0b">node1-teste.acloud.pt:/brick2nod
> e1-teste.acloud.pt:/brick208c7ba5f-9aca-49c5-abfd-8a3e42dd8c0b0\n
> \n23\n
> \n  \nnfs.disable\n 
>on\n  \n  \n   
>  transport.addre
> ss-family\ninet\n  \n  
> \nstorage.fips-mode-rchecksum\n  
>   on\n
>\n  \n
> storage.owner-uid\n36\n  
> \n  \nstorag
> e.owner-gid\n36\n  \n  
> \ncluster.min-free-disk\n
> 5%\n
>  \n  \n
> performance.quick-read\noff\n 
>  \n  \nperfor
> mance.read-ahead\noff\n  
> \n  \n
> performance.io-cache\noff\n
>\n  \n
> performance.low-prio-threads\n32\n
>   \n  \n<
> name>network.remote-dio\nenable\n  
> \n  \ncluster.eager-lock\n  
>   enable<
> /value>\n  \n  \n
> cluster.quorum-type\nauto\n  
> \n  \n
>   cluster.server-quorum-type\n
> server\n  \n  \n
> cluster.data-self-heal-algorithm\n
>full\n  \n  \n 
>cluster.locking-scheme\n
> granular\n  

[ovirt-users] Re: No Host listed when trying to create a Storage Domain

2022-04-20 Thread Nir Soffer
On Wed, Apr 20, 2022 at 5:42 PM  wrote:
>
> I've created a new "Data Center" in this case.  When I do select the default 
> "Data Center" I get the same results, are you aware of a work around for this 
> issue?  I will file/create a engine UI bug report for this issue.

Maybe you did not add a cluster with some hosts to the new data center,
or the host is still installing?

The normal flow is:
1. Create data center
2. Create cluster in the new data center
3. Add at least one host to the data center
4. Wait until host is up
5. Add first storage domain

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/7LZ6CNZX5F2TQY4XPWZ5T4T2ZKDWOZFN/


[ovirt-users] Re: No Host listed when trying to create a Storage Domain

2022-04-20 Thread Nir Soffer
On Wed, Apr 20, 2022 at 4:51 PM  wrote:
>
> I'm trying to create a new storage domain, I've successfully created a new 
> "Data Center" and Cluster with no issues during the process.  However, when I 
> try to created a new storage domain the pull down menu bar is blank.  
> Therefor, I'm unable to create a new storage domain.
>
> What might I have missed or not configure properly to prevent the menu bar 
> from getting populated?

You missed the fact that the selected Data Center is "Default", and
you don't have
any hosts in the default data center.

Please file an engine UI bug about this. If there are no hosts in the
Default data center,
it should not be disabled since there is no way to create storage without hosts.

The selected data center must have at least one host.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/7JFIMWAM2Q26CO577UGRVQ544ZVXP2HF/


[ovirt-users] Re: vdsm hook after node upgrade

2022-04-12 Thread Nir Soffer
On Tue, Apr 12, 2022 at 5:06 PM Nathanaël Blanchet  wrote:
> I've upgraded my hosts from 4.4.9 to 4.4.10 and none of my vdsm hooks
> are present anymore... I believed those additional personal data were
> persistent across updates...

If you think this is a bug, please file a vdsm bug for this:
https://github.com/oVirt/vdsm/issues

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/KPAKXHNIAAKX2X6W7GTUARYNFSJU52CG/


[ovirt-users] Re: ovirt-dr generate

2022-04-12 Thread Nir Soffer
On Mon, Apr 11, 2022 at 1:39 PM Colin Coe  wrote:
>
> Hi all
>
> I'm trying to run ovirt-dr generate but its failing:
> /usr/share/ansible/collections/ansible_collections/redhat/rhv/roles/disaster_recovery/files/ovirt-dr
>  generate
> Log file: '/tmp/ovirt-dr-164967324.log'
> [Generate Mapping File] Connection to setup has failed. Please check your 
> credentials:
>  URL: https://server.fqdn/ovirt-engine/api
>  user: admin@internal
>  CA file: ./ca.pem

ca.pem is likely the engine's self-signed certificate...

> [Generate Mapping File] Failed to generate var file.
>
> When I examine the log file:
> 2022-04-11 18:34:03,332 INFO Start generate variable mapping file for oVirt 
> ansible disaster recovery
> 2022-04-11 18:34:03,333 INFO Site address: 
> https://server.fqdn/ovirt-engine/api
> username: admin@internal
> password: ***
> ca file location: ./ca.pem
> output file location: ./disaster_recovery_vars.yml
> ansible play location: ./dr_play.yml
> 2022-04-11 18:34:03,343 ERROR Connection to setup has failed. Please check 
> your credentials:
>  URL: https://server.fqdn/ovirt-engine/api
>  user: admin@internal
>  CA file: ./ca.pem
> 2022-04-11 18:34:03,343 ERROR Error: Error while sending HTTP request: (60, 
> 'SSL certificate problem: unable to get local issuer certificate')
> 2022-04-11 18:34:03,343 ERROR Failed to generate var file.
>
> My suspicion is that the script doesn't like third party certs.
>
> Has anyone got this working with third party certs?  If so, what did you need 
> to do?

But you are using a 3rd party certificate, so you need to use the
right certificate.

Depending on the code, an empty ca_file can work, or you need to point it to the
actual ca file installed in the system.
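One way to check which CA file actually validates the engine certificate
before running the tool (server.fqdn as in the log above):

# should print "Verify return code: 0 (ok)" for the right CA file
openssl s_client -connect server.fqdn:443 -CAfile ./ca.pem </dev/null | grep 'Verify return'
openssl s_client -connect server.fqdn:443 -CAfile /etc/pki/tls/certs/ca-bundle.crt </dev/null | grep 'Verify return'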

I think Didi can help with this.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ZN22BIHYLNL2P3WTLXFVZH4PKNQPXR6D/


[ovirt-users] Re: How to list all snapshots?

2022-04-04 Thread Nir Soffer
On Mon, Apr 4, 2022 at 9:05 PM  wrote:
>
> Hello everyone!
>
> First, I would like to thank everyone involved in this wonderful project. I 
> leave here my sincere thanks!
>
> Does anyone know if it is possible to list all snapshots automatically? It 
> can be by ansible, python, shell... any way that helps to list them all 
> without having to enter Domain by Domain.

I'm not sure what you mean by "all" snapshots? All snapshots of a vm?

You can try the API in a browser, for example this lists all the
snapshots of one vm:


https://engine.local/ovirt-engine/api/vms/4a964ea0-c9f8-48d4-8fc1-aa8eee04c7c7/snapshots
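If you really mean all snapshots of all vms, a rough shell loop over the same
API works too - engine URL, credentials and CA file are placeholders, and
xmllint comes from libxml2:

ENGINE=https://engine.local/ovirt-engine/api
AUTH='admin@internal:password'

for vm_id in $(curl -s --cacert ca.pem -u "$AUTH" "$ENGINE/vms" \
        | xmllint --xpath '//vm/@id' - | grep -oE '[0-9a-f-]{36}'); do
    echo "== vm $vm_id =="
    curl -s --cacert ca.pem -u "$AUTH" "$ENGINE/vms/$vm_id/snapshots" \
        | grep -o '<description>[^<]*</description>'
done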

If you want an easier way, the python sdk can help, see:
https://github.com/oVirt/python-ovirt-engine-sdk4/blob/main/examples/list_vm_snapshots.py

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/2OGKWJDIKF4LKYJWVIIHNXTSXI3EL5HO/


[ovirt-users] Re: info about removal of LVM structures before removing LUNs

2022-03-31 Thread Nir Soffer
On Thu, Mar 31, 2022 at 6:03 PM Gianluca Cecchi
 wrote:
>
> On Thu, Mar 31, 2022 at 4:45 PM Nir Soffer  wrote:
>>
>>
>>
>> Regarding removing the vg on other nodes - you don't need to do anything.
>> On the host, the vg is hidden since you use lvm filter. Vdsm can see the
>> vg since vdsm uses lvm filter with all the luns on the system. Vdsm will
>> see the change the next time it runs pvs, vgs, or lvs.
>>
>> Nir
>>
> Ok, thank you very much
> So I will:
> . remove LVM structures on one node (probably I'll use the SPM host, but as 
> you said it shouldn't matter)
> . remove multipath devices and paths on both hosts (hope the second host 
> doesn't complain about LVM presence, because actually it is hidden by 
> filter...)
> . have the SAN mgmt guys unpresent LUN from both hosts
> . rescan SAN from inside oVirt (to verify LUN not detected any more and at 
> the same time all expected LUNs/paths ok)
>
> I should have also the second host updated in regard of LVM structures... 
> correct?

The right order is:

1. Make sure the vg does not have any active lv on any host, since you removed
it in the past without formatting, and some lvs may have been activated
by mistake since that time.

   vgchange -an --config 'devices { filter = ["a|.*|" ] }' vg-name

2. Remove the vg on one of the hosts
(assuming you don't need the data)

vgremove -f --config 'devices { filter = ["a|.*|" ] }' vg-name

If you don't plan to use this vg with lvm, you can remove the pvs

3. Have the SAN mgmt guys unpresent LUN from both hosts

   This should be done before removing the multipath devices, otherwise
   scsi rescan initiated by vdsm may discover the devices again and recreate
   the multipath devices.

4. Remove the multipath devices and the scsi devices related to these luns

   To verify you can use lsblk on the hosts, the devices will disappear.

   If you want to make sure the luns were unzoned, doing a rescan is a
   good idea. It can be done by opening the "new domain" or "manage domain"
   dialog in the ovirt UI, or by running:

   vdsm-client Host getDeviceList checkStatus=''
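Putting step 4 together for one device, roughly (the wwid is the one from
your mail, sdX/sdY are placeholders for the paths shown by multipath -l):

mp=360002ac0013e0001894c

# list the scsi paths backing the multipath device, then remove the map
multipath -l "$mp"
multipath -f "$mp"

# flush and delete each scsi path listed above
for sd in sdX sdY; do
    blockdev --flushbufs /dev/$sd
    echo 1 > /sys/block/$sd/device/delete
done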

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/UVJWIXX54W3G5F5BPCYFI4UPUO2KFZCP/


[ovirt-users] Re: info about removal of LVM structures before removing LUNs

2022-03-31 Thread Nir Soffer
On Thu, Mar 31, 2022 at 3:13 PM Gianluca Cecchi
 wrote:
>
> On Thu, Mar 31, 2022 at 1:30 PM Nir Soffer  wrote:
>>
>>
>>
>> Removing a storage domain requires moving the storage domain to maintainance
>> and detaching it. In this state oVirt does not use the domain so it is
>> safe to remove
>> the lvs and vg on any host in the cluster.
>>
>> But if you remove the storage domain in engine with:
>>
>> [x] Format Domain, i.e. Storage Content will be lost!
>>
>> vdsm will remove all the lvs and the vg for you.
>>
>> If you forgot to format the domain when removing it, removing manually
>> is fine.
>>
>> Nir
>>
>
> Thanks for answering, Nir.
> In fact I think I didn't select to format the domain and so the LVM structure 
> remained in place (I did it some time ago...)
> When you write "vdsm will remove all the lvs and the vg for you", how does 
> vdsm act and work in this case and how does it coordinate the nodes' view of 
> LVM structures so that they are consistent, with no cluster LVM in place?

oVirt has its own clustered lvm solution, using sanlock.

In oVirt only the SPM host creates, extends, deletes, or changes tags of
logical volumes. Other hosts only consume the logical volumes by activating
them for running vms or performing storage operations.

> I presume it is lvmlockd using sanlock as external lock manager,

lvmlockd is not involved. When oVirt was created, lvmlockd supported
only dlm, which does not scale for the oVirt use case. So oVirt uses sanlock
directly to manage cluster locks.

> but how can I run LVM commands mimicking what vdsm probably does?
> Or is it automagic and I need only to run the LVM commands above without 
> worrying about it?

There is no magic, but you don't need to mimic what vdsm is doing.

> When I manually remove LVs, VG and PV on the first node, what to do on other 
> nodes? Simply a
> vgscan --config 'devices { filter = ["a|.*|" ] }'

Don't run this on ovirt hosts, the host should not scan all vgs without
a filter.

> or what?

When you remove a storage domain in engine, even without formatting it, no
host is using the logical volumes. Vdsm on all hosts can see the vg, but
never activates the logical volumes.

You can remove the vg on any host, since you are the only user of this vg.
Vdsm on other hosts can see the vg, but since it does not use the vg, it is
not affected.

The vg metadata is stored on one pv. When you remove a vg, lvm clears
the metadata on this pv. Other pvs cannot be affected by this change.
The only risk is trying to modify the same vg from multiple hosts at the
same time, which can corrupt the vg metadata.

Regarding removing the vg on other nodes - you don't need to do anything.
On the host, the vg is hidden since you use lvm filter. Vdsm can see the
vg since vdsm uses lvm filter with all the luns on the system. Vdsm will
see the change the next time it runs pvs, vgs, or lvs.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/6J5FBLE7ULDDH33LCVQVJMPWLU4T3UQF/


[ovirt-users] Re: info about removal of LVM structures before removing LUNs

2022-03-31 Thread Nir Soffer
On Thu, Mar 31, 2022 at 1:35 PM Gianluca Cecchi
 wrote:
>
> Hello,
> I'm going to hot remove some LUNS that were used as storage domains from a 
> 4.4.7 environment.
> I have already removed them for oVirt.
> I think I would use the remove_mpath_device.yml playbook if I find it... it 
> seems it should be in examples dir inside ovirt ansible collections, but 
> there is not...
> Anyway I'm aware of the corresponding manual steps of (I think version 8 
> doesn't differ from 7 in this):
>
> . get disks name comprising the multipath device to remove
>
> . remove multipath device
> multipath -f "{{ lun }}"
>
> . flush I/O
> blockdev --flushbufs {{ item }}
> for every disk that was comprised in the multipath device
>
> . remove disks
> echo 1 > /sys/block/{{ item }}/device/delete
> for every disk that was comprised in the multipath device
>
> My main doubt is related to the LVM structure that I can see is yet present 
> on the multipath devices.
>
> Eg for a multipath device 360002ac0013e0001894c:
> # pvs --config 'devices { filter = ["a|.*|" ] }' | grep 
> 360002ac0013e0001894c
>   /dev/mapper/360002ac0013e0001894c 
> a7f5cf77-5640-4d2d-8f6d-abf663431d01 lvm2 a--<4.00t <675.88g
>
> # lvs --config 'devices { filter = ["a|.*|" ] }' 
> a7f5cf77-5640-4d2d-8f6d-abf663431d01
>   LV   VG   
> Attr   LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   067dd3d0-db3b-4fd0-9130-c616c699dbb4 a7f5cf77-5640-4d2d-8f6d-abf663431d01 
> -wi--- 900.00g
>   1682612b-fcbb-4226-a821-3d90621c0dc3 a7f5cf77-5640-4d2d-8f6d-abf663431d01 
> -wi---  55.00g
>   3b863da5-2492-4c07-b4f8-0e8ac943803b a7f5cf77-5640-4d2d-8f6d-abf663431d01 
> -wi--- 128.00m
>   47586b40-b5c0-4a65-a7dc-23ddffbc64c7 a7f5cf77-5640-4d2d-8f6d-abf663431d01 
> -wi---  35.00g
>   7a5878fb-d70d-4bb5-b637-53934d234ba9 a7f5cf77-5640-4d2d-8f6d-abf663431d01 
> -wi--- 570.00g
>   94852fc8-5208-4da1-a429-b97b0c82a538 a7f5cf77-5640-4d2d-8f6d-abf663431d01 
> -wi---  55.00g
>   a2edcd76-b9d7-4559-9c4f-a6941aaab956 a7f5cf77-5640-4d2d-8f6d-abf663431d01 
> -wi--- 128.00m
>   de08d92d-611f-445c-b2d4-836e33935fcf a7f5cf77-5640-4d2d-8f6d-abf663431d01 
> -wi--- 300.00g
>   de54928d-2727-46fc-81de-9de2ce002bee a7f5cf77-5640-4d2d-8f6d-abf663431d01 
> -wi---   1.17t
>   f9f4d24d-5f2b-4ec3-b7e3-1c50a7c45525 a7f5cf77-5640-4d2d-8f6d-abf663431d01 
> -wi--- 300.00g
>   ids  a7f5cf77-5640-4d2d-8f6d-abf663431d01 
> -wi--- 128.00m
>   inboxa7f5cf77-5640-4d2d-8f6d-abf663431d01 
> -wi--- 128.00m
>   leases   a7f5cf77-5640-4d2d-8f6d-abf663431d01 
> -wi---   2.00g
>   master   a7f5cf77-5640-4d2d-8f6d-abf663431d01 
> -wi---   1.00g
>   metadata a7f5cf77-5640-4d2d-8f6d-abf663431d01 
> -wi--- 128.00m
>   outbox   a7f5cf77-5640-4d2d-8f6d-abf663431d01 
> -wi--- 128.00m
>   xleases  a7f5cf77-5640-4d2d-8f6d-abf663431d01 
> -wi---   1.00g
>
> So the question is:
> would it be better to execute something like
> lvremove for every LV lv_name
> lvremove --config 'devices { filter = ["a|.*|" ] }' 
> a7f5cf77-5640-4d2d-8f6d-abf663431d01/lv_name
>
> vgremove
> vgremove --config 'devices { filter = ["a|.*|" ] }' 
> a7f5cf77-5640-4d2d-8f6d-abf663431d01
>
> pvremove
> pvremove --config 'devices { filter = ["a|.*|" ] }' 
> /dev/mapper/360002ac0013e0001894c
>
> and then proceed with the steps above or nothing at all as the OS itself 
> doesn't "see" the LVMs and it is only an oVirt view that is already "clean"?
> Also because LVM is not cluster aware, so after doing that on one node, I 
> would have the problem about LVM rescan on other nodes

Removing a storage domain requires moving the storage domain to maintenance
and detaching it. In this state oVirt does not use the domain, so it is safe
to remove the lvs and vg on any host in the cluster.

But if you remove the storage domain in engine with:

[x] Format Domain, i.e. Storage Content will be lost!

vdsm will remove all the lvs and the vg for you.

If you forgot to format the domain when removing it, removing manually
is fine.
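
For reference, the manual cleanup could look like this - a sketch based on
the commands you listed, using the vg name and multipath device from your
example (sdX stands for each path device of the multipath device):

    # remove the storage domain vg and its lvs
    vgremove -f --config 'devices { filter = ["a|.*|" ] }' a7f5cf77-5640-4d2d-8f6d-abf663431d01

    # clear the pv label
    pvremove --config 'devices { filter = ["a|.*|" ] }' /dev/mapper/360002ac0013e0001894c

    # then remove the multipath device and the underlying disks
    multipath -f 360002ac0013e0001894c
    blockdev --flushbufs /dev/sdX
    echo 1 > /sys/block/sdX/device/delete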

Nir


[ovirt-users] Re: Python Unsupported Version Detection (ovirt Manager 4.4.10)

2022-03-31 Thread Nir Soffer

On 31 March 2022 at 8:54, michael...@hactlsolutions.com wrote:
> 
> Hi,
> We have installed oVirt manager on CentOS Stream 8 and are running a security 
> scan with Tenable Nessus (plugin ID 148367).
> 
> When I try to remove python3.6, it removes many dependency packages 
> related to oVirt.
> How can I fix this vulnerability (shown below)?

There is no vulnerability to fix. oVirt uses the platform python, which is python 
3.6. This version is supported for the entire life cycle of CentOS Stream 8 and is 
the same version used on RHEL.

You should report a bug in the tool reporting this python version as EOL.
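
If it helps with the report, you can show that this is the distribution's
platform python and not a separately installed interpreter (assuming a
standard installation):

    rpm -qf /usr/libexec/platform-python
    /usr/libexec/platform-python --version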

Nir

> 
> Python Unsupported Version Detection
> Plugin Output: 
> The following Python installation is unsupported :
> 
>  Path  : /
>  Port  : 35357
>  Installed version : 3.6.8
>  Latest version: 3.10
>  Support dates : 2021-12-23 (end of life)
> 
> Regards,
> Michael Li


[ovirt-users] Re: No bootable device

2022-03-28 Thread Nir Soffer
On Mon, Mar 28, 2022 at 11:01 AM  wrote:
>
> Hi Nir,
>
> On 2022-03-27 10:23, Nir Soffer wrote:
> > On Wed, Mar 23, 2022 at 3:09 PM  wrote:
> >> We're running oVirt 4.4.8.6. We have uploaded a qcow2 image
> >> (metasploit
> >> v.3, FWIW)
> >
> > Is it Metasploitable3-0.1.4.ova from the github releases page?
> > https://github.com/brimstone/metasploitable3/releases
> >
>
> Actually, the disk has been shared with us by one of our professors. It
> has been provided in qcow2, vmdk and raw formats, still the result was
> the same. I don't actually know which exact version is it, I just know
> the version is "3".
>
> > If not, can you share the image? It will help if we can reproduce this
> > problem
> > locally with the same image you are using.
>
> I will provide the link off-list because it belongs to the professor.
> >
> >> using the GUI (Storage -> Disks -> Upload -> Start). The
> >> image is in qcow2 format.
> >
> > Did you convert the vmdk file from the ova to qcow2?
>
> Yes, I also tried these steps with the same result.
>
> >
> >> No options on the right side were checked. The
> >> upload went smoothly, so we now tried to attach the disk to a VM.
> >>
> >> To do that, we opened the VM -> Disks -> Attach and selected the disk.
> >> As interface, VirtIO-iSCSI was chosen, and the disk was marked as OS,
> >> so
> >> the "bootable" checkbox was selected.
> >>
> >> The VM was later powered on, but when accessing the console the
> >> message
> >> "No bootable device." appears. We're pretty sure this is a bootable
> >> image, because it was tested on other virtualization infrastructure
> >> and
> >> it boots well. We also tried to upload the image in RAW format but the
> >> result is the same.
> >>
> >> What are we missing here? Is anything else needed to do so the disk is
> >> bootable?
> >
> > It sounds like you converted an image from another virtualization
> > system (virtualbox) to qcow2 format, which may not be good enough to run
> > the virtual machine.
> >
> > oVirt supports importing OVA, but based on the UI, it supports only OVA
> > created
> > by oVirt.
> >
> > You can try virt-v2v - this is an example command, you need
> > to fill in the {} parts:
> >
> > virt-v2v \
> > -i ova {path-to-ova-file} \
> > -o rhv-upload \
> > -oc https://{engine-address}/ovirt-engine/api \
> > -op {engine-password-file} \
> > -on {vm-name} \
> > -os {storrage-domain-name} \
> > -of qcow2 \
> > -oo rhv-cafile={engine-ca-file} \
> > -oo rhv-cluster={cluster-name}
> >
> > I tried to import the Metasploitable3-0.1.4.ova, and virt-v2 fails
> > with this error:
> >
> > virt-v2v: error: inspection could not detect the source guest (or
> > physical machine).
> >
> > attached virt-v2v log.
> >
>
> Actually, the professor also provided the OVA from which he extracted
> the disk files and the import process in oVirt worked with no issues. I
> can now boot the VM, not sure what difference made the OVA but now it
> works.

Great that you solved this issue.

For the benefit of the community, can you explain how you imported the OVA?


[ovirt-users] Re: No bootable device

2022-03-27 Thread Nir Soffer
On Sun, Mar 27, 2022 at 9:09 PM Richard W.M. Jones  wrote:
>
>
> On Sun, Mar 27, 2022 at 01:18:43PM +0300, Arik Hadas wrote:
> > That information message is incorrect; both OVAs that are created by
> > oVirt/RHV and OVAs that are created by VMware are supported. It could
> > work for OVAs that are VMware-compatible though.
>
> "VMware-compatible" is doing a bit of work there.  Virt-v2v only
> supports (and more importantly _tests_) OVAs produced by VMware.
> Anything claiming to be "VMware-compatible" might or might not work.
>
> I'm on holiday at the moment but I can have a look at the OVA itself
> when I get back if someone posts a link.

The v2v log was from this image:
https://github.com/brimstone/metasploitable3/releases/download/0.1.4/Metasploitable3-0.1.4.ova


[ovirt-users] Re: No bootable device

2022-03-27 Thread Nir Soffer
On Wed, Mar 23, 2022 at 3:09 PM  wrote:
> We're running oVirt 4.4.8.6. We have uploaded a qcow2 image (metasploit
> v.3, FWIW)

Is it Metasploitable3-0.1.4.ova from the github releases page?
https://github.com/brimstone/metasploitable3/releases

If not, can you share the image? It will help if we can reproduce this problem
locally with the same image you are using.

> using the GUI (Storage -> Disks -> Upload -> Start). The
> image is in qcow2 format.

Did you convert the vmdk file from the ova to qcow2?

> No options on the right side were checked. The
> upload went smoothly, so we now tried to attach the disk to a VM.
>
> To do that, we opened the VM -> Disks -> Attach and selected the disk.
> As interface, VirtIO-iSCSI was chosen, and the disk was marked as OS, so
> the "bootable" checkbox was selected.
>
> The VM was later powered on, but when accessing the console the message
> "No bootable device." appears. We're pretty sure this is a bootable
> image, because it was tested on other virtualization infrastructure and
> it boots well. We also tried to upload the image in RAW format but the
> result is the same.
>
> What are we missing here? Is anything else needed to do so the disk is
> bootable?

It sounds like you converted an image from another virtualization
system (virtualbox) to qcow2 format, which may not be good enough to run
the virtual machine.

oVirt supports importing OVA, but based on the UI, it supports only OVA created
by oVirt.

You can try virt-v2v - this is an example command, you need
to fill in the {} parts:

virt-v2v \
-i ova {path-to-ova-file} \
-o rhv-upload \
-oc https://{engine-address}/ovirt-engine/api \
-op {engine-password-file} \
-on {vm-name} \
-os {storage-domain-name} \
-of qcow2 \
-oo rhv-cafile={engine-ca-file} \
-oo rhv-cluster={cluster-name}

I tried to import the Metasploitable3-0.1.4.ova, and virt-v2v fails
with this error:

virt-v2v: error: inspection could not detect the source guest (or
physical machine).

attached virt-v2v log.

Nir


v2v.log.xz
Description: application/xz


[ovirt-users] Re: VDSM Issue after Upgrade of Node in HCI

2022-03-23 Thread Nir Soffer
On Wed, Mar 23, 2022 at 6:04 PM Abe E  wrote:

> After running : yum reinstall ovirt-node-ng-image-update
> It re-installed the ovirt node and I was able to start VDSM again aswell
> as the ovirt-ha-broker an ovirt-ha-agent.
>
> I was still unable to activate the 2nd Node in the engine so I tried to
> re-install with engine deploy and it was able to complete past the previous
> VDSM issue it had.
>
> Thank You for your help in regards to the LVM issues I was having, noted
> for future reference!
>

Great that you managed to recover, but if reinstalling fixed the issue, it
means that there is some issue with the node upgrade.

Sandro, do you think we need a bug for this?


[ovirt-users] Re: VDSM Issue after Upgrade of Node in HCI

2022-03-22 Thread Nir Soffer
On Tue, Mar 22, 2022 at 8:14 PM Abe E  wrote:

> Apologies, here it is
> [root@ovirt-2 ~]# vdsm-tool config-lvm-filter
> Analyzing host...
> Found these mounted logical volumes on this host:
>
>   logical volume:  /dev/mapper/gluster_vg_sda4-gluster_lv_data
>   mountpoint:  /gluster_bricks/data
>   devices:
>  /dev/disk/by-id/lvm-pv-uuid-DxNDT5-3NH3-I1YJ-0ajl-ah6W-M7Kf-h5uZKU
>
>   logical volume:  /dev/mapper/gluster_vg_sda4-gluster_lv_engine
>   mountpoint:  /gluster_bricks/engine
>   devices:
>  /dev/disk/by-id/lvm-pv-uuid-DxNDT5-3NH3-I1YJ-0ajl-ah6W-M7Kf-h5uZKU
>
>   logical volume:  /dev/mapper/onn-home
>   mountpoint:  /home
>   devices:
>  /dev/disk/by-id/lvm-pv-uuid-Yepp1J-dsfN-jLh7-xCxm-G7QC-nbaL-6rT2KY
>
>   logical volume:
> /dev/mapper/onn-ovirt--node--ng--4.4.10.1--0.20220202.0+1
>   mountpoint:  /
>   devices:
>  /dev/disk/by-id/lvm-pv-uuid-Yepp1J-dsfN-jLh7-xCxm-G7QC-nbaL-6rT2KY
>
>   logical volume:  /dev/mapper/onn-swap
>   mountpoint:  [SWAP]
>   devices:
>  /dev/disk/by-id/lvm-pv-uuid-Yepp1J-dsfN-jLh7-xCxm-G7QC-nbaL-6rT2KY
>
>   logical volume:  /dev/mapper/onn-tmp
>   mountpoint:  /tmp
>   devices:
>  /dev/disk/by-id/lvm-pv-uuid-Yepp1J-dsfN-jLh7-xCxm-G7QC-nbaL-6rT2KY
>
>   logical volume:  /dev/mapper/onn-var
>   mountpoint:  /var
>   devices:
>  /dev/disk/by-id/lvm-pv-uuid-Yepp1J-dsfN-jLh7-xCxm-G7QC-nbaL-6rT2KY
>
>   logical volume:  /dev/mapper/onn-var_crash
>   mountpoint:  /var/crash
>   devices:
>  /dev/disk/by-id/lvm-pv-uuid-Yepp1J-dsfN-jLh7-xCxm-G7QC-nbaL-6rT2KY
>
>   logical volume:  /dev/mapper/onn-var_log
>   mountpoint:  /var/log
>   devices:
>  /dev/disk/by-id/lvm-pv-uuid-Yepp1J-dsfN-jLh7-xCxm-G7QC-nbaL-6rT2KY
>
>   logical volume:  /dev/mapper/onn-var_log_audit
>   mountpoint:  /var/log/audit
>   devices:
>  /dev/disk/by-id/lvm-pv-uuid-Yepp1J-dsfN-jLh7-xCxm-G7QC-nbaL-6rT2KY
>
> This is the recommended LVM filter for this host:
>
>   filter = [
> "a|^/dev/disk/by-id/lvm-pv-uuid-DxNDT5-3NH3-I1YJ-0ajl-ah6W-M7Kf-h5uZKU$|",
> "a|^/dev/disk/by-id/lvm-pv-uuid-Yepp1J-dsfN-jLh7-xCxm-G7QC-nbaL-6rT2KY$|",
> "r|.*|" ]
>
> This filter allows LVM to access the local devices used by the
> hypervisor, but not shared storage owned by Vdsm. If you add a new
> device to the volume group, you will need to edit the filter manually.
>
> This is the current LVM filter:
>
>   filter = [
> "a|^/dev/disk/by-id/lvm-pv-uuid-3QbgiW-WaOV-ejW9-rs5R-akfW-sUZb-AXm8Pq$|",
> "a|^/dev/sda|", "r|.*|" ]
>
> To use the recommended filter we need to add multipath
> blacklist in /etc/multipath/conf.d/vdsm_blacklist.conf:
>
>   blacklist {
>   wwid "364cd98f06762ec0029afc17a03e0cf6a"
>   }
>
>
> WARNING: The current LVM filter does not match the recommended filter,
> Vdsm cannot configure the filter automatically.
>
> Please edit /etc/lvm/lvm.conf and set the 'filter' option in the
> 'devices' section to the recommended value.
>
> Make sure /etc/multipath/conf.d/vdsm_blacklist.conf is set with the
> recommended 'blacklist' section.
>
> It is recommended to reboot to verify the new configuration.
>
> After configuring the LVM to the recommended
>
> I adjusted to the recommended filter although it is still returning the
> same results when i run the vdsm-tool config-lvm-filter command. Instead I
> did as you mentioned, I commented out my current filter and ran the
> vdsm-tool config-lvm-filter and it configured successfully and I rebooted
> the node.
>
> Now on boot it is returning the following which looks alot better.
> Analyzing host...
> LVM filter is already configured for Vdsm
>

Good, we solved the storage issue.


> Now my error on re-install is Host ovirt-2... installation failed. Task
> Configure host for vdsm failed to execute. That was just a re-install and
> this host currently has and the log returns this output, let me know if
> you'd like more from it but this is where it errors out it seems:
>
> "start_line" : 215,
> "end_line" : 216,
> "runner_ident" : "ddb84e00-aa0a-11ec-98dc-00163e6f31f1",
> "event" : "runner_on_failed",
> "pid" : 83339,
> "created" : "2022-03-22T18:09:08.381022",
> "parent_uuid" : "00163e6f-31f1-a3fb-8e1d-0201",
> "event_data" : {
>   "playbook" : "ovirt-host-deploy.yml",
>   "playbook_uuid" : "2e84fbd4-8368-463e-82e7-3f457ae702d4",
>   "play" : "all",
>   "play_uuid" : "00163e6f-31f1-a3fb-8e1d-000b",
>   "play_pattern" : "all",
>   "task" : "Configure host for vdsm",
>   "task_uuid" : "00163e6f-31f1-a3fb-8e1d-0201",
>   "task_action" : "command",
>   "task_args" : "",
>   "task_path" :
> "/usr/share/ovirt-engine/ansible-runner-service-project/project/roles/ovirt-host-deploy-vdsm/tasks/configure.yml:27",
>   "role" : "ovirt-host-deploy-vdsm",
>   "host" : "ovirt-2..com",
>   "remote_addr" : "ovirt-2..com",
>   "res" : {
> "msg" : "non-zero return code",
> "cmd" : [ "vdsm-tool", "configure", "--force" ],
> 

[ovirt-users] Re: VDSM Issue after Upgrade of Node in HCI

2022-03-22 Thread Nir Soffer
On Tue, Mar 22, 2022 at 7:17 PM Nir Soffer  wrote:
>
> On Tue, Mar 22, 2022 at 6:57 PM Abe E  wrote:
> >
> > Yes it throws the following:
> >
> > This is the recommended LVM filter for this host:
> >
> >   filter = [
"a|^/dev/disk/by-id/lvm-pv-uuid-DxNDT5-3NH3-I1YJ-0ajl-ah6W-M7Kf-h5uZKU$|",
"a|^/dev/disk/by-id/lvm-pv-uuid-Yepp1J-dsfN-jLh7-xCxm-G7QC-nbaL-6rT2KY$|",
"r|.*|" ]
>
> This is not complete output - did you strip the lines explaining why
> we need this
> filter?
>
> > This filter allows LVM to access the local devices used by the
> > hypervisor, but not shared storage owned by Vdsm. If you add a new
> > device to the volume group, you will need to edit the filter manually.
> >
> > This is the current LVM filter:
> >
> >   filter = [
"a|^/dev/disk/by-id/lvm-pv-uuid-3QbgiW-WaOV-ejW9-rs5R-akfW-sUZb-AXm8Pq$|",
"a|^/dev/sda|", "r|.*|" ]
>
> So the issue is that you likely have a stale lvm filter for a device
> which is not
> used by the host.
>
> >
> > To use the recommended filter we need to add multipath
> > blacklist in /etc/multipath/conf.d/vdsm_blacklist.conf:
> >
> >   blacklist {
> >   wwid "364cd98f06762ec0029afc17a03e0cf6a"
> >   }
> >
> >
> > WARNING: The current LVM filter does not match the recommended filter,
> > Vdsm cannot configure the filter automatically.
> >
> > Please edit /etc/lvm/lvm.conf and set the 'filter' option in the
> > 'devices' section to the recommended value.
> >
> > Make sure /etc/multipath/conf.d/vdsm_blacklist.conf is set with the
> > recommended 'blacklist' section.
> >
> > It is recommended to reboot to verify the new configuration.
> >
> >
> >
> >
> > I updated my entry to the following (Blacklist is already configured
from before):
> >   filter = [
"a|^/dev/disk/by-id/lvm-pv-uuid-DxNDT5-3NH3-I1YJ-0ajl-ah6W-M7Kf-h5uZKU$|","a|^/dev/disk/by-id/lvm-pv-uuid-Yepp1J-dsfN-jLh7-xCxm-G7QC-nbaL-6rT2KY$|","a|^/dev/sda|","r|.*|"
]
> >
> >
> > although then it threw this error
> >
> > [root@ovirt-2 ~]# vdsm-tool config-lvm-filter
> > Analyzing host...
> > Parse error at byte 106979 (line 2372): unexpected token
> >   Failed to load config file /etc/lvm/lvm.conf
> > Traceback (most recent call last):
> >   File "/usr/bin/vdsm-tool", line 209, in main
> > return tool_command[cmd]["command"](*args)
> >   File
"/usr/lib/python3.6/site-packages/vdsm/tool/config_lvm_filter.py", line 65,
in main
> > mounts = lvmfilter.find_lvm_mounts()
> >   File "/usr/lib/python3.6/site-packages/vdsm/storage/lvmfilter.py",
line 170, in find_lvm_mounts
> > vg_name, tags = vg_info(name)
> >   File "/usr/lib/python3.6/site-packages/vdsm/storage/lvmfilter.py",
line 467, in vg_info
> > lv_path
> >   File "/usr/lib/python3.6/site-packages/vdsm/storage/lvmfilter.py",
line 566, in _run
> > out = subprocess.check_output(args)
> >   File "/usr/lib64/python3.6/subprocess.py", line 356, in check_output
> > **kwargs).stdout
> >   File "/usr/lib64/python3.6/subprocess.py", line 438, in run
> > output=stdout, stderr=stderr)
> > subprocess.CalledProcessError: Command '['/usr/sbin/lvm', 'lvs',
'--noheadings', '--readonly', '--config', 'devices {filter=["a|.*|"ed
non-zero exit status 4.
>
>
> I'm not sure if this error comes from the code configuring lvm filter,
> or from lvm.
>
> The best way to handle this depends on why you have lvm filter that
> vdsm-tool cannot handle.
>
> If you know why the lvm filter is set to the current value, and you
> know that the system actually
> need all the devices in the filter, you can keep the current lvm filter.
>
> If you don't know why the curent lvm filter is set to this value, you
> can remove the lvm filter
> from lvm.conf, and run "vdsm-tool config-lvm-filter" to let the tool
> configure the default filter.
>
> In general, the lvm filter allows the host to access the devices
> needed by the host, for
> example the root file system.
>
> If you are not sure what are the required devices, please share the
> the *complete* output
> of running "vdsm-tool config-lvm-filter", with lvm.conf that does not
> include any filter.

Example of running config-lvm-filter on RHEL 8.6 host with oVirt 4.5:

# vdsm-tool config-lvm-filter
Analyzing host...
Found these mounted logical volumes on this host:

  logical volume:  /dev/mapper/rhel-root
  mountpoint:  /
  dev

[ovirt-users] Re: VDSM Issue after Upgrade of Node in HCI

2022-03-22 Thread Nir Soffer
On Tue, Mar 22, 2022 at 6:57 PM Abe E  wrote:
>
> Yes it throws the following:
>
> This is the recommended LVM filter for this host:
>
>   filter = [ 
> "a|^/dev/disk/by-id/lvm-pv-uuid-DxNDT5-3NH3-I1YJ-0ajl-ah6W-M7Kf-h5uZKU$|", 
> "a|^/dev/disk/by-id/lvm-pv-uuid-Yepp1J-dsfN-jLh7-xCxm-G7QC-nbaL-6rT2KY$|", 
> "r|.*|" ]

This is not complete output - did you strip the lines explaining why
we need this
filter?

> This filter allows LVM to access the local devices used by the
> hypervisor, but not shared storage owned by Vdsm. If you add a new
> device to the volume group, you will need to edit the filter manually.
>
> This is the current LVM filter:
>
>   filter = [ 
> "a|^/dev/disk/by-id/lvm-pv-uuid-3QbgiW-WaOV-ejW9-rs5R-akfW-sUZb-AXm8Pq$|", 
> "a|^/dev/sda|", "r|.*|" ]

So the issue is that you likely have a stale lvm filter for a device
which is not
used by the host.

>
> To use the recommended filter we need to add multipath
> blacklist in /etc/multipath/conf.d/vdsm_blacklist.conf:
>
>   blacklist {
>   wwid "364cd98f06762ec0029afc17a03e0cf6a"
>   }
>
>
> WARNING: The current LVM filter does not match the recommended filter,
> Vdsm cannot configure the filter automatically.
>
> Please edit /etc/lvm/lvm.conf and set the 'filter' option in the
> 'devices' section to the recommended value.
>
> Make sure /etc/multipath/conf.d/vdsm_blacklist.conf is set with the
> recommended 'blacklist' section.
>
> It is recommended to reboot to verify the new configuration.
>
>
>
>
> I updated my entry to the following (Blacklist is already configured from 
> before):
>   filter = [ 
> "a|^/dev/disk/by-id/lvm-pv-uuid-DxNDT5-3NH3-I1YJ-0ajl-ah6W-M7Kf-h5uZKU$|","a|^/dev/disk/by-id/lvm-pv-uuid-Yepp1J-dsfN-jLh7-xCxm-G7QC-nbaL-6rT2KY$|","a|^/dev/sda|","r|.*|"
>  ]
>
>
> although then it threw this error
>
> [root@ovirt-2 ~]# vdsm-tool config-lvm-filter
> Analyzing host...
> Parse error at byte 106979 (line 2372): unexpected token
>   Failed to load config file /etc/lvm/lvm.conf
> Traceback (most recent call last):
>   File "/usr/bin/vdsm-tool", line 209, in main
> return tool_command[cmd]["command"](*args)
>   File "/usr/lib/python3.6/site-packages/vdsm/tool/config_lvm_filter.py", 
> line 65, in main
> mounts = lvmfilter.find_lvm_mounts()
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/lvmfilter.py", line 
> 170, in find_lvm_mounts
> vg_name, tags = vg_info(name)
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/lvmfilter.py", line 
> 467, in vg_info
> lv_path
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/lvmfilter.py", line 
> 566, in _run
> out = subprocess.check_output(args)
>   File "/usr/lib64/python3.6/subprocess.py", line 356, in check_output
> **kwargs).stdout
>   File "/usr/lib64/python3.6/subprocess.py", line 438, in run
> output=stdout, stderr=stderr)
> subprocess.CalledProcessError: Command '['/usr/sbin/lvm', 'lvs', 
> '--noheadings', '--readonly', '--config', 'devices {filter=["a|.*|"ed 
> non-zero exit status 4.


I'm not sure if this error comes from the code configuring lvm filter,
or from lvm.

The best way to handle this depends on why you have an lvm filter that
vdsm-tool cannot handle.

If you know why the lvm filter is set to the current value, and you know
that the system actually needs all the devices in the filter, you can keep
the current lvm filter.

If you don't know why the current lvm filter is set to this value, you can
remove the lvm filter from lvm.conf, and run "vdsm-tool config-lvm-filter"
to let the tool configure the default filter.

In general, the lvm filter allows the host to access the devices
needed by the host, for
example the root file system.

If you are not sure what the required devices are, please share the *complete*
output of running "vdsm-tool config-lvm-filter", with an lvm.conf that does not
include any filter.
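
If you go with the second option, a minimal sequence would be something like
this (back up lvm.conf first, and do it while the host is in maintenance):

    cp /etc/lvm/lvm.conf /etc/lvm/lvm.conf.backup
    # comment out the 'filter = ...' line in the devices section of lvm.conf
    vdsm-tool config-lvm-filter
    reboot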

Nir


[ovirt-users] Re: VDSM Issue after Upgrade of Node in HCI

2022-03-22 Thread Nir Soffer
On Tue, Mar 22, 2022 at 6:09 PM Abe E  wrote:
>
> Interestingly enough I am able to re-install ovirt from the engine to a 
> certain point.
> I ran a re-install and it failed asking me to run vdsm-tool config-lvm-filter
> Error: Installing Host ovirt-2... Check for LVM filter configuration error: 
> Cannot configure LVM filter on host, please run: vdsm-tool config-lvm-filter.

Did you try to run it?

Please share the complete output of running:

   vdsm-tool config-lvm-filter

Nir


[ovirt-users] Re: querying which LUNs are associated to a specific VM disks

2022-03-18 Thread Nir Soffer
On Fri, Mar 18, 2022 at 10:13 AM Sandro Bonazzola 
wrote:

> I got a question on oVirt Itala Telegram group about how to get which LUNs
> are used by the disks attached to a specific VMs.
> This information doesn't seem to be exposed in API or within the engine DB.
> Has anybody ever tried something like this?
>

We don't expose this, but you can find it using lvm.

For example for disk id c5401e6c-9c56-4ddf-b57a-efde3f8b0494

# lvs -o vg_name,lv_name,devices --devicesfile='' --select 'lv_tags = {IU_c5401e6c-9c56-4ddf-b57a-efde3f8b0494}'
  VG                                   LV                                   Devices
  aecec81f-d464-4a35-9a91-6acf2ca4938c dea573e4-734c-405c-9c2c-590dac63122c /dev/mapper/36001405351b21217d814266b5354d710(141)

141 is the first extent used by the disk on the device
/dev/mapper/36001405351b21217d814266b5354d710.

A disk with snapshots can have many logical volumes. Each logical
volume can use one or more luns in the storage domain.

The example works with oVirt 4.5, using lvmdevices. For older versions
using lvm filter you can use:

--config 'devices { filter = ["a|.*|" ] }'

This info is not static, lvm may move data around, so we cannot keep it
in engine db. Getting the info is pretty cheap, one lvs command can
return the info for all disks in a storage domain.
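
For example, to get the mapping for all disks in a storage domain at once
(the vg name is the storage domain uuid; this is the 4.5 form, on older
versions replace --devicesfile='' with the --config option above):

    lvs -o vg_name,lv_name,lv_tags,devices --devicesfile='' aecec81f-d464-4a35-9a91-6acf2ca4938c

The IU_ tag on each lv is the id of the disk the lv belongs to.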

Nir


[ovirt-users] Re: Import an snapshot of an iSCSI Domain

2022-03-06 Thread Nir Soffer
On Fri, Mar 4, 2022 at 8:28 AM Vinícius Ferrão via Users
 wrote:
>
> Hi again, I don’t know if it will be possible to import the storage domain 
> due to conflicts with the UUID of the LVM devices. I’ve tried to issue a 
> vgimportclone to chance the UUIDs and import the volume but it still does not 
> shows up on oVirt.

LVM can change VG/PV UUIDs and names, but the storage domain metadata kept in
the VG tags and in the volume metadata area contains the old VG and PV names
and UUIDs, so it is unlikely to work.

The system is designed so that if the original PVs are bad, you can disconnect
them, connect a backup of the PVs, and import the storage domain again into
the system.

Can you explain in more details what you are trying to do?

Nir

> I don’t know how to mount the iSCSI volume to recover the data. The data is 
> there but it’s extremely difficult to get it.
>
> Any ideias?
>
> Thanks.
>
>
> > On 3 Mar 2022, at 20:56, Vinícius Ferrão  wrote:
> >
> > I think I’ve found the root cause, and it’s the LVM inside the iSCSI volume:
> >
> > [root@rhvh5 ~]# pvscan
> >  WARNING: Not using device /dev/mapper/36589cfc00db9cf56949c63d338ef 
> > for PV fTIrnd-gnz2-dI8i-DesK-vIqs-E1BK-mvxtha.
> >  WARNING: PV fTIrnd-gnz2-dI8i-DesK-vIqs-E1BK-mvxtha prefers device 
> > /dev/mapper/36589cfc006f6c96763988802912b because device is used by LV.
> >  PV /dev/mapper/36589cfc006f6c96763988802912bVG 
> > 9377d243-2c18-4620-995f-5fc680e7b4f3   lvm2 [<10.00 TiB / 7.83 TiB free]
> >  PV /dev/mapper/36589cfc00a1b985d3908c07e41adVG 
> > 650b0003-7eec-4fa5-85ea-c019f6408248   lvm2 [199.62 GiB / <123.88 GiB free]
> >  PV /dev/mapper/3600605b00805d8a01c2180fd0d8d8dad3   VG rhvh_rhvh5  
> >lvm2 [<277.27 GiB / 54.55 GiB free]
> >  Total: 3 [<10.47 TiB] / in use: 3 [<10.47 TiB] / in no VG: 0 [0   ]
> >
> > The device that’s not being using is the snapshot. There’s a way to change 
> > the ID of the device so I can import the data domain?
> >
> > Thanks.
> >
> >> On 3 Mar 2022, at 20:21, Vinícius Ferrão via Users  wrote:
> >>
> >> Hello,
> >>
> >> I need to import an old snapshot of my Data domain but oVirt does not find 
> >> the snapshot version when importing on the web interface.
> >>
> >> To be clear, I’ve mounted a snapshot on my storage, and exported it on 
> >> iSCSI. I was expecting that I could be able to import it on the engine.
> >>
> >> On the web interface this Import Pre-Configured Domain finds the relative 
> >> IQN but it does not show up as a target.
> >>
> >> Any ideas?
> >>
> >>
> >
>


[ovirt-users] Re: Unable to upload ISO Image in ovirt 4.4.10

2022-03-06 Thread Nir Soffer
On Sun, Mar 6, 2022 at 11:42 PM Patrick Hibbs  wrote:
>
> I set up a new ovirt test instance on a clean Rocky Linux 8.5 server
> with a custom apache cert about two weeks ago.

Do you have a single server used both for running ovirt-engine and as a
hypervisor? This requires special configuration. If engine is not running on
the hypervisor, for example if engine runs in a VM or on another host,
everything should work out of the box.

> Uploading a test image
> via the browser didn't work until I changed the .truststore file.

.truststore file where?

> I should also point out that I also had to set the cert in apache's
> config. Simply replacing the symlink in the cert directory didn't work
> as it wasn't pointing to it at all. (Instead it was pointing at some
> snakeoil cert generated by apache.) Granted, the apache issue is
> specific to Rocky, but the imageio service is definitely in ovirt's
> full control.
>
> If the imageio service is supposed to work out of the box with a custom
> certificate, there's something amiss.

These are the defaults:

$ ovirt-imageio --show-config | jq '.tls'
{
  "ca_file": "/etc/pki/ovirt-engine/apache-ca.pem",
  "cert_file": "/etc/pki/ovirt-engine/certs/apache.cer",
  "enable": true,
  "enable_tls1_1": false,
  "key_file": "/etc/pki/ovirt-engine/keys/apache.key.nopass"
}

The ovirt-imageio service uses apache's PKI files by default.
If these symlinks point to the right files, everything should work
out of the box.

If you change the apache PKI files, you need to modify ovirt-imageio
configuration by adding a drop-in configuration file with the right
configuration:

$ cat /etc/ovirt-imageio/conf.d/99-local.conf
[tls]
key_file = /path/to/keyfile
cert_file = /path/to/certfile
ca_file = /path/to/cafile
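
After adding the drop-in file, restart the service and verify that it picked
up the new files (the paths shown will be whatever you configured):

    systemctl restart ovirt-imageio
    ovirt-imageio --show-config | jq '.tls'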

Note: the following configuration *must not* change:

$ ovirt-imageio --show-config | jq '.backend_http'
{
  "buffer_size": 8388608,
  "ca_file": "/etc/pki/ovirt-engine/ca.pem"
}

This CA file is used to access the hosts, which are managed by
ovirt-engine self signed CA, and cannot be replaced.

> WARNING: Small rant follows:
>
> Yes, I could have changed a config file instead of changing
> .truststore, but it's just another way to achieve the same result. (And
> the one I discovered back in ovirt 3.x.) It doesn't make the process
> any easier, if anything it's just another option to check if something
> goes wrong. Instead of checking only .truststore, Now we have to check
> .truststore, and any number of extra config files for a redirect
> statement, and the load ordering of those config files, *and* whether
> or not those redirect statements point to a valid cert or not. Instead
> of having just one place to troubleshoot, now there's at least four.
> The config file change also doesn't make it any easier to perform those
> changes. You still need to manually make these changes via ssh on the
> engine host. Why would I want to advise changing a config file, and
> risk that much of an additional mess to deal with in support, when I
> can tell them one specific file to fix that has none of these extras to
> deal with? Personally, I would choose the option with less chance for
> human error.

It is clear that you think we can have a better way to configure the system,
but it is not clear what the issue is and what the better solution would be.

Can you explain in more detail what the problem is with the documented
solution for using custom PKI files, and what the easier way to do this is?

If we have an easier way, it should be documented.

Nir

> On Sun, 2022-03-06 at 21:54 +0200, Nir Soffer wrote:
> > On Sun, Mar 6, 2022 at 9:42 PM  wrote:
> > >
> > > I don't have the file "ovirt-imageio-proxy" on my system, is there
> > > another file that I should be looking at?  Once I locate the
> > > correct file what content in the file needs to change?
> > >
> > > I'm using  the latest release of "Firefox/91.6.0" as my browser,
> > > and i import the "Engine CA" after the fact.  However, after the
> > > import I tried again and got the same results.
> >
> > In oVirt 4.4 the ovirt-imageio-proxy service was replaced with the
> > ovirt-imageio service.
> >
> > The built-in configuration should work with the default (self signed)
> > CA and with custom
> > CA without any configuration change.
> >
> > Is this all-in-one installation, when ovirt-engine is installed on
> > the
> > single hypervisor,
> > and the same host is added later as an hypervisor?
> >
> > To make sure you configured the browser correctly, please open the
> > "upload" dialog
> > and click the "Test

[ovirt-users] Re: Unable to upload ISO Image in ovirt 4.4.10

2022-03-06 Thread Nir Soffer
On Sat, Mar 5, 2022 at 4:25 PM Patrick Hibbs  wrote:
>
> That's typically one of three issues:
>
> 1. You've replaced the certificate used by apache, but haven't updated
> the configuration for the ovirt-imageio-proxy.

This should never be needed in 4.4.

> 2. You're using an older web browser. (Pale Moon, Waterfox, older
> versions of Chrome, etc.)
>
> 3. There's an issue that causes the initial transfers (upload or
> download) to fail once or twice every ~24 hours.

We had an issue where uploads from the browser failed randomly:
https://bugzilla.redhat.com/1977276

Fix is available since 4.4.9, so it should not happen in 4.4.10.

> If it's the first issue, a quick fix is to simply add your new
> certificate's CA cert to the hidden /etc/pki/ovirt-engine/.truststore
> file (It's a java keystore.) then restart the engine and imageio-proxy.

In 4.4, you need to restart the ovirt-imageio service to pick up the new CA
used to communicate with the browser.

Nir


[ovirt-users] Re: Unable to upload ISO Image in ovirt 4.4.10

2022-03-06 Thread Nir Soffer
On Sun, Mar 6, 2022 at 9:42 PM  wrote:
>
> I don't have the file "ovirt-imageio-proxy" on my system, is there another 
> file that I should be looking at?  Once I locate the correct file what 
> content in the file needs to change?
>
> I'm using  the latest release of "Firefox/91.6.0" as my browser,  and i 
> import the "Engine CA" after the fact.  However, after the import I tried 
> again and got the same results.

In oVirt 4.4 the ovirt-imageio-proxy service was replaced with the
ovirt-imageio service.

The built-in configuration should work with the default (self-signed) CA and
with a custom CA without any configuration change.

Is this an all-in-one installation, where ovirt-engine is installed on a
single host and the same host is later added as a hypervisor?

To make sure you configured the browser correctly, please open the "upload"
dialog and click the "Test connection" button. If testing the connection
works, the browser can communicate with the ovirt-imageio service and your
system is ready for uploads from the browser.

Nir


[ovirt-users] Re: Correct method to upload ISOs

2022-03-02 Thread Nir Soffer
On Wed, Mar 2, 2022 at 5:53 PM nroach44--- via Users  wrote:
>
> Via the WebUI.
>
> Disks > Upload > Select iso locally, select normal data repo etc > Go

Sounds like a bug in the javascript code detecting ISO type, or maybe
this is not an ISO file but a qcow2 image.

What does this show:

   qemu-img info my.iso

If this shows a raw image, this may be an ISO.
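
For a real ISO I would expect qemu-img to report the raw format, something
like this (the sizes here are just an example):

    image: my.iso
    file format: raw
    virtual size: 1 GiB (1073741824 bytes)
    disk size: 1 GiB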

In this case I would like to test this ISO image. If it is not public,
can you share the first 64K of the file?

You can do this:

dd if=my.iso bs=64K count=1 of=head.iso
gzip head.iso

Nir


[ovirt-users] Re: Correct method to upload ISOs

2022-03-02 Thread Nir Soffer
On Wed, Mar 2, 2022 at 4:11 PM nroach44--- via Users  wrote:
>
> As ISO domains are deprecated, what's the correct/modern procedure to upload 
> ISOs to install / boot from?
>
> When I upload them to a data domain, they get converted into QCOW2 images (as 
> confirmed by qemu-img info), but attached like ISOs to the qemu process. This 
> means the VMs won't boot, until I manually overwrite the disk image on the 
> datastore with the ISO directly (which works fine).
>
> My cluster started on 4.4 just after release, and is fully updated, if that 
> changes things.

How do you upload?

Please share the exact command used.

Nir


[ovirt-users] Re: Deleting Snapshot failed

2022-03-02 Thread Nir Soffer
On Sat, Feb 26, 2022 at 8:56 PM Jonathan Baecker  wrote:
>
> Hello everybody,
>
> last night my backup script was not able to finish the backup from a VM in 
> the last step of deleting the snapshot. And now I also can not delete this 
> snapshot by hand, the message says:
>
> VDSM onode2 command MergeVDS failed: Drive image file could not be found: 
> {'driveSpec': {'poolID': 'c9baa5d4-3543-11eb-9c0c-00163e33f845', 'volumeID': 
> '024e1844-c19b-40d8-a2ac-cb4ea6ec34e6', 'imageID': 
> 'ad23c0db-1838-4f1f-811b-2b213d3a11cd', 'domainID': 
> '3cf83851-1cc8-4f97-8960-08a60b9e25db'}, 'job': 
> '96c7003f-e111-4270-b922-d9b215aaaea2', 'reason': 'Cannot find drive'}
>
> The full log you found in the attachment.
>
> Any idea?

Which oVirt version?

Nir


[ovirt-users] Re: oVirt + TrueNAS: Unable to create iSCSI domain - I am missing something obvious

2022-03-02 Thread Nir Soffer
On Wed, Mar 2, 2022 at 3:01 PM David Johnson 
wrote:

> Good morning folks, and thank you in advance.
>
> I am working on migrating my oVirt backing store from NFS to iSCSI.
>
> *oVirt Environment:*
>
> oVirt Open Virtualization Manager
> Software Version:4.4.4.7-1.el8
>
> *TrueNAS environment:*
>
> FreeBSD truenas.local 12.2-RELEASE-p11 75566f060d4(HEAD) TRUENAS amd64
>
>
> The iSCSI share is on a TrueNAS server, exposed to user VDSM and group 36.
>
> oVirt sees the targeted share, but is unable to make use of it.
>
> The latest issue is "Error while executing action New SAN Storage Domain:
> Volume Group block size error, please check your Volume Group
> configuration, Supported block size is 512 bytes."
>
> As near as I can tell, oVirt does not support any block size other than
> 512 bytes, while TrueNAS's smallest OOB block size is 4k.
>

This is correct, oVirt does not support 4k block storage.


>
> I know that oVirt on TrueNAS is a common configuration, so I expect I am
> missing something really obvious here, probably a TrueNAS configuration
> needed to make TrueNAS work with 512 byte blocks.
>
> Any advice would be helpful.
>

You can use NFS exported by TrueNAS. With NFS the underlying block size is
hidden
since direct I/O on NFS does not perform direct I/O on the server.

Another way is to use Managed Block Storage (MBS) - if there is a Cinder
driver that can manage your storage server, you can use MBS disks with any
block size. The block size limit comes from the traditional lvm based storage
domain code. When using MBS, you use one LUN per disk, and qemu does not have
any issue working with such LUNs.

Check with TrueNAS if they support emulating a 512-byte block size or have
another way to support clients that do not support 4k storage.

Nir


[ovirt-users] Re: [=EXTERNAL=] Re: help using nvme/tcp storage with cinderlib and Managed Block Storage

2022-02-23 Thread Nir Soffer
On Wed, Feb 23, 2022 at 4:20 PM Muli Ben-Yehuda  wrote:
>
> Thanks, Nir and Benny (nice to run into you again, Nir!). I'm a neophyte in 
> ovirt and vdsm... What's the simplest way to set up a development 
> environment? Is it possible to set up a "standalone" vdsm environment to hack 
> support for nvme/tcp or do I need "full ovirt" to make it work?

It should be possible to install vdsm on a single host or vm, use the vdsm
API to bring the host to the right state, and then attach devices and run
vms. But I don't know anyone that can pull this off, since simulating what
engine is doing is hard.

So the best way is to set up at least one host and engine host using the
latest 4.5 rpms, and continue from there. Once you have a host, building
vdsm on the host and upgrading the rpms is pretty easy.

My preferred setup is to create vms using virt-manager for hosts, engine
and storage and run all the vms on my laptop.

Note that you must have some traditional storage (NFS/iSCSI) to bring up
the system even if you plan to use only managed block storage (MBS).
Unfortunately, when we added MBS support we did not have time to fix the huge
technical debt, so you still need a master storage domain using one of the
traditional legacy options.

To build a setup, you can use:

- engine vm: 6g ram, 2 cpus, centos stream 8
- hosts vm: 4g ram, 2 cpus, centos stream 8
  you can start with one host and add more hosts later if you want to
test migration.
- storage vm: 2g ram, 2 cpus, any os you like, I use alpine since it
takes very little
  memory and its NFS server is fast.

See vdsm README for instructions how to setup a host:
https://github.com/oVirt/vdsm#manual-installation

For engine host you can follow:
https://ovirt.org/documentation/installing_ovirt_as_a_self-hosted_engine_using_the_command_line/#Enabling_the_Red_Hat_Virtualization_Manager_Repositories_install_RHVM

And after that this should work:

dnf install ovirt-engine
engine-setup

Accepting all the defaults should work.

When you have engine running, you can add a new host with
the ip address or dns name of your host(s) vm, and engine will
do everything for you. Note that you must install the ovirt-release-master
rpm on the host before you add it to engine.
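
For example (I am writing the URL from memory - please verify the exact
package location on resources.ovirt.org before using it):

    dnf install https://resources.ovirt.org/pub/yum-repo/ovirt-release-master.rpm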

Nir

>
> Cheers,
> Muli
> --
> Muli Ben-Yehuda
> Co-Founder and Chief Scientist @ http://www.lightbitslabs.com
> LightOS: The Special Storage Sauce For Your Cloud
>
>
> On Wed, Feb 23, 2022 at 4:16 PM Nir Soffer  wrote:
>>
>> On Wed, Feb 23, 2022 at 2:48 PM Benny Zlotnik  wrote:
>> >
>> > So I started looking in the logs and tried to follow along with the
>> > code, but things didn't make sense and then I saw it's ovirt 4.3 which
>> > makes things more complicated :)
>> > Unfortunately because GUID is sent in the metadata the volume is
>> > treated as a vdsm managed volume[2] for the udev rule generation and
>> > it prepends the /dev/mapper prefix to an empty string as a result.
>> > I don't have the vdsm logs, so I am not sure where exactly this fails,
>> > but if it's after [4] it may be possible to workaround it with a vdsm
>> > hook
>> >
>> > In 4.4.6 we moved the udev rule triggering the volume mapping phase,
>> > before starting the VM. But it could still not work because we check
>> > the driver_volume_type in[1], and I saw it's "driver_volume_type":
>> > "lightos" for lightbits
>> > In theory it looks like it wouldn't take much to add support for your
>> > driver in a future release (as it's pretty late for 4.5)
>>
>> Adding support for nvme/tcp in 4.3 is probably not feasible, but we will
>> be happy to accept patches for 4.5.
>>
>> To debug such issues vdsm log is the best place to check. We should see
>> the connection info passed to vdsm, and we have pretty simple code using
>> it with os_brick to attach the device to the system and setting up the udev
>> rule (which may need some tweaks).
>>
>> Nir
>>
>> > [1] 
>> > https://github.com/oVirt/vdsm/blob/500c035903dd35180d71c97791e0ce4356fb77ad/lib/vdsm/storage/managedvolume.py#L110
>> >
>> > (4.3)
>> > [2] 
>> > https://github.com/oVirt/vdsm/blob/b42d4a816b538e00ea4955576a5fe762367be787/lib/vdsm/clientIF.py#L451
>> > [3] 
>> > https://github.com/oVirt/vdsm/blob/b42d4a816b538e00ea4955576a5fe762367be787/lib/vdsm/storage/hsm.py#L3141
>> > [4] 
>> > https://github.com/oVirt/vdsm/blob/b42d4a816b538e00ea4955576a5fe762367be787/lib/vdsm/virt/vm.py#L3835
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Wed, Feb 23, 2022 at 12:44 PM Muli Ben-Yehuda  
>> > wrote:
>

[ovirt-users] Re: [=EXTERNAL=] Re: help using nvme/tcp storage with cinderlib and Managed Block Storage

2022-02-23 Thread Nir Soffer
On Wed, Feb 23, 2022 at 2:48 PM Benny Zlotnik  wrote:
>
> So I started looking in the logs and tried to follow along with the
> code, but things didn't make sense and then I saw it's ovirt 4.3 which
> makes things more complicated :)
> Unfortunately because GUID is sent in the metadata the volume is
> treated as a vdsm managed volume[2] for the udev rule generation and
> it prepends the /dev/mapper prefix to an empty string as a result.
> I don't have the vdsm logs, so I am not sure where exactly this fails,
> but if it's after [4] it may be possible to workaround it with a vdsm
> hook
>
> In 4.4.6 we moved the udev rule triggering the volume mapping phase,
> before starting the VM. But it could still not work because we check
> the driver_volume_type in[1], and I saw it's "driver_volume_type":
> "lightos" for lightbits
> In theory it looks like it wouldn't take much to add support for your
> driver in a future release (as it's pretty late for 4.5)

Adding support for nvme/tcp in 4.3 is probably not feasible, but we will
be happy to accept patches for 4.5.

To debug such issues the vdsm log is the best place to check. We should see
the connection info passed to vdsm, and we have pretty simple code that uses
it with os_brick to attach the device to the system and set up the udev
rule (which may need some tweaks).
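
For example, something like this on the host should show what connection info
vdsm received and how it prepared the device (this is just a grep over the
standard vdsm log location; adjust the patterns as needed):

    grep -iE 'managedvolume|appropriateDevice' /var/log/vdsm/vdsm.log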

Nir

> [1] 
> https://github.com/oVirt/vdsm/blob/500c035903dd35180d71c97791e0ce4356fb77ad/lib/vdsm/storage/managedvolume.py#L110
>
> (4.3)
> [2] 
> https://github.com/oVirt/vdsm/blob/b42d4a816b538e00ea4955576a5fe762367be787/lib/vdsm/clientIF.py#L451
> [3] 
> https://github.com/oVirt/vdsm/blob/b42d4a816b538e00ea4955576a5fe762367be787/lib/vdsm/storage/hsm.py#L3141
> [4] 
> https://github.com/oVirt/vdsm/blob/b42d4a816b538e00ea4955576a5fe762367be787/lib/vdsm/virt/vm.py#L3835
>
>
>
>
>
>
>
>
> On Wed, Feb 23, 2022 at 12:44 PM Muli Ben-Yehuda  
> wrote:
> >
> > Certainly, thanks for your help!
> > I put cinderlib and engine.log here: 
> > http://www.mulix.org/misc/ovirt-logs-20220223123641.tar.gz
> > If you grep for 'mulivm1' you will see for example:
> >
> > 2022-02-22 04:31:04,473-05 ERROR 
> > [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default 
> > task-10) [36d8a122] Command 'HotPlugDiskVDSCommand(HostName = client1, 
> > HotPlugDiskVDSParameters:{hostId='fc5c2860-36b1-4213-843f-10ca7b35556c', 
> > vmId='e13f73a0-8e20-4ec3-837f-aeacc082c7aa', 
> > diskId='d1e1286b-38cc-4d56-9d4e-f331ffbe830f', addressMap='[bus=0, 
> > controller=0, unit=2, type=drive, target=0]'})' execution failed: 
> > VDSGenericException: VDSErrorException: Failed to HotPlugDiskVDS, error = 
> > Failed to bind /dev/mapper/ on to /var/run/libvirt/qemu/21-mulivm1.mapper.: 
> > Not a directory, code = 45
> >
> > Please let me know what other information will be useful and I will prove.
> >
> > Cheers,
> > Muli
> >
> > On Wed, Feb 23, 2022 at 11:14 AM Benny Zlotnik  wrote:
> >>
> >> Hi,
> >>
> >> We haven't tested this, and we do not have any code to handle nvme/tcp
> >> drivers, only iscsi and rbd. Given the path seen in the logs
> >> '/dev/mapper', it looks like it might require code changes to support
> >> this.
> >> Can you share cinderlib[1] and engine logs to see what is returned by
> >> the driver? I may be able to estimate what would be required (it's
> >> possible that it would be enough to just change the handling of the
> >> path in the engine)
> >>
> >> [1] /var/log/ovirt-engine/cinderlib/cinderlib//log
> >>
> >> On Wed, Feb 23, 2022 at 10:54 AM  wrote:
> >> >
> >> > Hi everyone,
> >> >
> >> > We are trying to set up ovirt (4.3.10 at the moment, customer 
> >> > preference) to use Lightbits (https://www.lightbitslabs.com) storage via 
> >> > our openstack cinder driver with cinderlib. The cinderlib and cinder 
> >> > driver bits are working fine but when ovirt tries to attach the device 
> >> > to a VM we get the following error:
> >> >
> >> > libvirt:  error : cannot create file 
> >> > '/var/run/libvirt/qemu/18-mulivm1.dev/mapper/': Is a directory
> >> >
> >> > We get the same error regardless of whether I try to run the VM or try 
> >> > to attach the device while it is running. The error appears to come from 
> >> > vdsm which passes /dev/mapper as the prefered device?
> >> >
> >> > 2022-02-22 09:50:11,848-0500 INFO  (vm/3ae7dcf4) [vdsm.api] FINISH 
> >> > appropriateDevice return={'path': '/dev/mapper/', 'truesize': 
> >> > '53687091200', 'apparentsize': '53687091200'} from=internal, 
> >> > task_id=77f40c4e-733d-4d82-b418-aaeb6b912d39 (api:54)
> >> > 2022-02-22 09:50:11,849-0500 INFO  (vm/3ae7dcf4) [vds] prepared volume 
> >> > path: /dev/mapper/ (clientIF:510)
> >> >
> >> > Suggestions for how to debug this further? Is this a known issue? Did 
> >> > anyone get nvme/tcp storage working with ovirt and/or vdsm?
> >> >
> >> > Thanks,
> >> > Muli
> >> >

[ovirt-users] Re: Random reboots

2022-02-17 Thread Nir Soffer
On Thu, Feb 17, 2022 at 11:58 AM Nir Soffer  wrote:
>
> On Thu, Feb 17, 2022 at 11:20 AM Pablo Olivera  wrote:
> >
> > Hi Nir,
> >
> >
> > Thank you very much for your detailed explanations.
> >
> > The pid 6398 looks like it's HostedEngine:
> >
> > audit/audit.log:type=VIRT_CONTROL msg=audit(1644587639.935:7895): pid=3629 
> > uid=0 auid=4294967295 ses=4294967295 
> > subj=system_u:system_r:virtd_t:s0-s0:c0.c1023 msg='virt=kvm op=start 
> > reason=booted vm="HostedEngine" uuid=37a75c8e-50a2-4abd-a887-8a62a75814cc 
> > vm-pid=6398 exe="/usr/sbin/libvirtd" hostname=? addr=? terminal=? 
> > res=success'UID="root" AUID="unset"
> >
> > So, I understand that SanLock has problems with the storage (it loses 
> > connection with NFS storage). The watchdog begins to check connectivity 
> > with the MV and after the established time, the order to
> > reboot the machine.
> >
> > I don't know if I can somehow increase these timeouts, or try to make 
> > sanlock force the reconnection or renewal with the storage and in this way 
> > try to avoid host reboots for this reason.
>
> You can do one of these:
> 1. Use lower timeouts on the NFS mount, so the NFS mount times out at the
> same time the sanlock lease times out.
> 2. Use a larger sanlock timeout, so the sanlock lease times out when the NFS
> server times out.
> 3. Both 1 and 2
>
> The problem is that NFS timeouts are not predictable. In the past we used:
> "timeo=600,retrans=6" which can lead to 21 minutes timeout, but practically
> we saw up to a 30 minutes timeout.
>
> In 
> https://github.com/oVirt/vdsm/commit/672a98bbf3e55d1077669f06c37305185fbdc289
> we changed this to the recommended setting:
> "timeo=100,retrans=3"
>
> Which according to the docs, should fail in 60 seconds if all retries
> fail. But practically we
> saw up to 270 seconds timeout with this setting, which does not play
> well with sanlock.
>
> We assumed that the timeout value should not be less than sanlock io timeout
> (10 seconds) but I'm not sure this assumption is correct.
>
> You can set a smaller timeout value in the engine storage domain
> "Custom Connection Parameters":
> - Retransmissions - mapped to "retrans" mount option
> - Timeout (deciseconds) - mapped to "timeo" mount option
>
> For example:
> Retransmissions: 3
> Timeout: 5

Correction:

Timeout: 50 (5 seconds, 50 deciseconds)
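
To verify what the host actually uses after changing these settings, you can
check the NFS mount options on the host and look for the timeo= and retrans=
values (on a standard setup the domain is mounted under /rhev/data-center/mnt/):

    grep nfs /proc/mounts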

>
> Theoretically this will behave like this:
>
> 00:00   retry 1 (5 seconds timeout)
> 00:10   retry 2 (10 seconds timeout)
> 00:30   retry 3 (15 seconds timeout)
> 00:45   request fail
>
> But based on what we see with the defaults, this is likely to take more time.
> If it fails before 140 seconds, the VM will be killed and the host
> will not reboot.
>
> The other way is to increase sanlock timeout, in vdsm configuration.
> note that changing sanlock timeout requires also changing other
> settings (e.g. spm:watchdog_interval).
>
> Add this file on all hosts:
>
> $ cat /etc/vdsm/vdsm.conf.d/99-local.conf
> [spm]
>
> # If enabled, monitor the SPM lease status and panic if the lease
> # status is not expected. The SPM host will lose the SPM role, and
> # engine will select a new SPM host. (default true)
> # watchdog_enable = true
>
> # Watchdog check interval in seconds. The recommended value is
> # sanlock:io_timeout * 2. (default 20)
> watchdog_interval = 40
>
> [sanlock]
>
> # I/O timeout in seconds. All sanlock timeouts are computed based on
> # this value. Using larger timeout will make VMs more resilient to
> # short storage outage, but increase VM failover time and the time to
> # acquire a host id. For more info on sanlock timeouts please check
> # sanlock source:
> # https://pagure.io/sanlock/raw/master/f/src/timeouts.h. If your
> # storage requires larger timeouts, you can increase the value to 15
> # or 20 seconds. If you change this you need to update also multipath
> # no_path_retry. For more info on configuring multipath please check
> # /etc/multipath.conf. oVirt is tested only with the default value (10
> # seconds)
> io_timeout = 20
>
>
> You can check https://github.com/oVirt/vdsm/blob/master/doc/io-timeouts.md
> to learn more about sanlock timeouts.
>
> Alternatively, you can make a small change in NFS timeout and small change in
> sanlock timeout to make them work better together.
>
> All this is of course to handle the case when the NFS server is not 
> accessible,
> but this is something that should not happen in a healthy cluster. You need
> to check why the server was not accessible and fix this problem.
>
> Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/5MSXZ6PCKQFTMCC3KIFJJWZJXAKCPIAP/


[ovirt-users] Re: Random reboots

2022-02-17 Thread Nir Soffer
On Thu, Feb 17, 2022 at 11:20 AM Pablo Olivera  wrote:
>
> Hi Nir,
>
>
> Thank you very much for your detailed explanations.
>
> The pid 6398 looks like it's HostedEngine:
>
> audit/audit.log:type=VIRT_CONTROL msg=audit(1644587639.935:7895): pid=3629 
> uid=0 auid=4294967295 ses=4294967295 
> subj=system_u:system_r:virtd_t:s0-s0:c0.c1023 msg='virt=kvm op=start 
> reason=booted vm="HostedEngine" uuid=37a75c8e-50a2-4abd-a887-8a62a75814cc 
> vm-pid=6398 exe="/usr/sbin/libvirtd" hostname=? addr=? terminal=? 
> res=success'UID="root" AUID="unset"
>
> So, I understand that SanLock has problems with the storage (it loses 
> connection with the NFS storage). The watchdog begins to check connectivity with 
> the VM and, after the established time, issues the order to
> reboot the machine.
>
> I don't know if I can somehow increase these timeouts, or try to make sanlock 
> force the reconnection or renewal with the storage and in this way try to 
> avoid host reboots for this reason.

You can do one of these:
1. Use lower timeouts on the NFS mount, so the NFS request fails at about
   the same time the sanlock lease times out.
2. Use a larger sanlock timeout, so the sanlock lease times out when the
   NFS request times out.
3. Both 1 and 2

The problem is that NFS timeouts are not predictable. In the past we used:
"timeo=600,retrans=6" which can lead to 21 minutes timeout, but practically
we saw up to a 30 minutes timeout.

In https://github.com/oVirt/vdsm/commit/672a98bbf3e55d1077669f06c37305185fbdc289
we changed this to the recommended setting:
"timeo=100,retrans=3"

Which, according to the docs, should fail in 60 seconds if all retries
fail. But in practice we saw up to a 270 second timeout with this setting,
which does not play well with sanlock.

We assumed that the timeout value should not be less than sanlock io timeout
(10 seconds) but I'm not sure this assumption is correct.

You can set a smaller timeout value in the engine storage domain
"custom connection parameters":
- Retransmissions - mapped to "retrans" mount option
- Timeout (deciseconds) - mapped to "timeo" mount option

For example:
Retransmissions: 3
Timeout: 5

Theoretically this will behave like this:

00:00   retry 1 (5 seconds timeout)
00:10   retry 2 (10 seconds timeout)
00:30   retry 3 (15 seconds timeout)
00:45   request fail

But based on what we see with the defaults, this is likely to take more time.
If it fails before 140 seconds, the VM will be killed and the host
will not reboot.

The other way is to increase the sanlock timeout in the vdsm configuration.
Note that changing the sanlock timeout also requires changing other
settings (e.g. spm:watchdog_interval).

Add this file on all hosts:

$ cat /etc/vdsm/vdsm.conf.d/99-local.conf
[spm]

# If enabled, monitor the SPM lease status and panic if the lease
# status is not expected. The SPM host will lose the SPM role, and
# engine will select a new SPM host. (default true)
# watchdog_enable = true

# Watchdog check interval in seconds. The recommended value is
# sanlock:io_timeout * 2. (default 20)
watchdog_interval = 40

[sanlock]

# I/O timeout in seconds. All sanlock timeouts are computed based on
# this value. Using larger timeout will make VMs more resilient to
# short storage outage, but increase VM failover time and the time to
# acquire a host id. For more info on sanlock timeouts please check
# sanlock source:
# https://pagure.io/sanlock/raw/master/f/src/timeouts.h. If your
# storage requires larger timeouts, you can increase the value to 15
# or 20 seconds. If you change this you need to update also multipath
# no_path_retry. For more info on configuring multipath please check
# /etc/multipath.conf. oVirt is tested only with the default value (10
# seconds)
io_timeout = 20


You can check https://github.com/oVirt/vdsm/blob/master/doc/io-timeouts.md
to learn more about sanlock timeouts.
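
As a rough illustration (the authoritative numbers are in sanlock's
timeouts.h, so treat this as an approximation): the lease renewal failure
time scales with io_timeout, and the watchdog fires about 60 seconds after
that.

   # default io_timeout=10 -> lease renewal fails after ~80 seconds,
   # which is the "check_our_lease failed 80" message seen in such logs
   io_timeout=20
   echo "lease renewal fails after ~$((8 * io_timeout)) seconds"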

Alternatively, you can make a small change in NFS timeout and small change in
sanlock timeout to make them work better together.

All this is of course to handle the case when the NFS server is not accessible,
but this is something that should not happen in a healthy cluster. You need
to check why the server was not accessible and fix this problem.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/XRJXOF3CSDKBKN3ZH3BWAKCWCZ3XETC2/


[ovirt-users] Re: Random reboots

2022-02-16 Thread Nir Soffer
On Wed, Feb 16, 2022 at 9:18 PM Nir Soffer  wrote:
>
> On Wed, Feb 16, 2022 at 5:12 PM Nir Soffer  wrote:
> >
> > On Wed, Feb 16, 2022 at 10:10 AM Pablo Olivera  wrote:
> > >
> > > Hi community,
> > >
> > > We're dealing with an issue as we occasionally have random reboots on
> > > any of our hosts.
> > > We're using ovirt 4.4.3 in production with about 60 VM distributed over
> > > 5 hosts. We've a virtualized engine and a DRBD storage mounted by NFS.
> > > The infrastructure is interconnected by a Cisco 9000 switch.
> > > The last random reboot was yesterday February 14th at 03:03 PM (in the
> > > log it appears as: 15:03 due to our time configuration) of the host:
> > > 'nodo1'.
> > > At the moment of the reboot we detected in the log of the switch a
> > > link-down in the port where the host is connected.
> > > I attach log of the engine and host 'nodo1' in case you can help us to
> > > find the cause of these random reboots.
> >
> >
> > According to messages:
> >
> > 1. Sanlock could not renew the lease for 80 seconds:
> >
> > Feb 14 15:03:06 nodo1 sanlock[2017]: 2022-02-14 15:03:06 1655257
> > [2017]: s1 check_our_lease failed 80
> >
> >
> > 2. In this case sanlock must terminate the processes holding a lease
> >   on that storage - I guess that pid 6398 is vdsm.
> >
> > Feb 14 15:03:06 nodo1 sanlock[2017]: 2022-02-14 15:03:06 1655257
> > [2017]: s1 kill 6398 sig 15 count 1
> > Feb 14 15:03:06 nodo1 sanlock[2017]: 2022-02-14 15:03:06 1655258
> > [2017]: s1 kill 6398 sig 15 count 2
>
> pid 6398 is not vdsm:
>
> Feb 14 15:02:51 nodo1 vdsm[4338]
>
> The fact that we see "sig 15" means sanlock is trying to send SIGTERM.
> If pid 6398 is a VM (hosted engine vm?) we would expect to see:
>
> > [2017]: s1 kill 6398 sig 100 count 1
>
> Exactly once - which means run the killpath program registered by libvirt,
> which will terminate the vm.

I reproduced this issue locally - we never use the killpath program, because we
don't configure libvirt on_lockfailure in the domain xml.

So we get the default behavior, which is sanlock terminating the vm.

>
> So my guess is that this is not a VM, so the only other option is hosted
> engine broker, using a lease on the whiteboard.
>
> > ...
> > Feb 14 15:03:36 nodo1 sanlock[2017]: 2022-02-14 15:03:36 1655288
> > [2017]: s1 kill 6398 sig 15 count 32
> >
> > 3. Terminating pid 6398 stopped here, and we see:
> >
> > Feb 14 15:03:36 nodo1 wdmd[2033]: test failed rem 19 now 1655288 ping
> > 1655237 close 1655247 renewal 1655177 expire 1655257 client 2017
> > sanlock_a5c35d19-4c34-4571-ac77-1b10de484426:1
>
> According to David, this means we have 19 more attempts to kill the process
> holding the lease.
>
> >
> > 4. So it looks like wdmd rebooted the host.
> >
> > Feb 14 15:08:09 nodo1 kernel: Linux version
> > 4.18.0-193.28.1.el8_2.x86_64 (mockbu...@kbuilder.bsys.centos.org) (gcc
> > version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)) #1 SMP Thu Oct 22
> > 00:20:22 UTC 2020
> >
> >
> > This is strange, since sanlock should try to kill pid 6398 40 times,
> > and then switch
> > to SIGKILL. The watchdog should not reboot the host before sanlock
> > finishes the attempt to kill the processes.
> >
> > David, do you think this is expected? do we have any issue in sanlock?
>
> I discussed it with David (sanlock author). What we see here may be truncated
> logs when a host is rebooted by the watchdog. The last time logs were synced
> to storage was probably Feb 14 15:03:36. Any message written after that was
> lost in the host page cache.
>
>
> > It is possible that sanlock will not be able to terminate a process if
> > the process is blocked on inaccessible storage. This seems to be the
> > case here.
> >
> > In vdsm log we see that storage is indeed inaccessible:
> >
> > 2022-02-14 15:03:03,149+0100 WARN  (check/loop) [storage.check]
> > Checker 
> > '/rhev/data-center/mnt/newstoragedrbd.andromeda.com:_var_nfsshare_data/a5c35d19-4c34-4571-ac77-1b10de484426/dom_md/metadata'
> > is blocked for 60.00 seconds (check:282)
> >
> > But we don't see any termination request - so this host is not the SPM.
> >
> > I guess this host was running the hosted engine vm, which uses a storage 
> > lease.
> > If you lose access to storage, sanlock will kill the hosted engine vm,
> > so the system
> > can start it elsewhere. If the hosted engine vm is stuck on storage, sanlock
> > cannot kill it and it will reboot the host.

[ovirt-users] Re: Random reboots

2022-02-16 Thread Nir Soffer
On Wed, Feb 16, 2022 at 5:12 PM Nir Soffer  wrote:
>
> On Wed, Feb 16, 2022 at 10:10 AM Pablo Olivera  wrote:
> >
> > Hi community,
> >
> > We're dealing with an issue as we occasionally have random reboots on
> > any of our hosts.
> > We're using ovirt 4.4.3 in production with about 60 VM distributed over
> > 5 hosts. We've a virtualized engine and a DRBD storage mounted by NFS.
> > The infrastructure is interconnected by a Cisco 9000 switch.
> > The last random reboot was yesterday February 14th at 03:03 PM (in the
> > log it appears as: 15:03 due to our time configuration) of the host:
> > 'nodo1'.
> > At the moment of the reboot we detected in the log of the switch a
> > link-down in the port where the host is connected.
> > I attach log of the engine and host 'nodo1' in case you can help us to
> > find the cause of these random reboots.
>
>
> According to messages:
>
> 1. Sanlock could not renew the lease for 80 seconds:
>
> Feb 14 15:03:06 nodo1 sanlock[2017]: 2022-02-14 15:03:06 1655257
> [2017]: s1 check_our_lease failed 80
>
>
> 2. In this case sanlock must terminate the processes holding a lease
>   on that storage - I guess that pid 6398 is vdsm.
>
> Feb 14 15:03:06 nodo1 sanlock[2017]: 2022-02-14 15:03:06 1655257
> [2017]: s1 kill 6398 sig 15 count 1
> Feb 14 15:03:06 nodo1 sanlock[2017]: 2022-02-14 15:03:06 1655258
> [2017]: s1 kill 6398 sig 15 count 2

pid 6398 is not vdsm:

Feb 14 15:02:51 nodo1 vdsm[4338]

The fact that we see "sig 15" means sanlock is trying to send SIGTERM.
If pid 6398 is a VM (hosted engine vm?) we would expect to see:

> [2017]: s1 kill 6398 sig 100 count 1

Exactly once - which means run the killpath program registered by libvirt,
which will terminate the vm.

So my guess is that this is not a VM, so the only other option is hosted
engine broker, using a lease on the whiteboard.

> ...
> Feb 14 15:03:36 nodo1 sanlock[2017]: 2022-02-14 15:03:36 1655288
> [2017]: s1 kill 6398 sig 15 count 32
>
> 3. Terminating pid 6398 stopped here, and we see:
>
> Feb 14 15:03:36 nodo1 wdmd[2033]: test failed rem 19 now 1655288 ping
> 1655237 close 1655247 renewal 1655177 expire 1655257 client 2017
> sanlock_a5c35d19-4c34-4571-ac77-1b10de484426:1

According to David, this means we have 19 more attempts to kill the process
holding the lease.

>
> 4. So it looks like wdmd rebooted the host.
>
> Feb 14 15:08:09 nodo1 kernel: Linux version
> 4.18.0-193.28.1.el8_2.x86_64 (mockbu...@kbuilder.bsys.centos.org) (gcc
> version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)) #1 SMP Thu Oct 22
> 00:20:22 UTC 2020
>
>
> This is strange, since sanlock should try to kill pid 6398 40 times,
> and then switch
> to SIGKILL. The watchdog should not reboot the host before sanlock
> finishes the attempt to kill the processes.
>
> David, do you think this is expected? do we have any issue in sanlock?

I discussed it with David (sanlock author). What we see here may be truncated
logs when a host is rebooted by the watchdog. The last time logs were synced
to storage was probably Feb 14 15:03:36. Any message written after that was
lost in the host page cache.


> It is possible that sanlock will not be able to terminate a process if
> the process is blocked on inaccessible storage. This seems to be the
> case here.
>
> In vdsm log we see that storage is indeed inaccessible:
>
> 2022-02-14 15:03:03,149+0100 WARN  (check/loop) [storage.check]
> Checker 
> '/rhev/data-center/mnt/newstoragedrbd.andromeda.com:_var_nfsshare_data/a5c35d19-4c34-4571-ac77-1b10de484426/dom_md/metadata'
> is blocked for 60.00 seconds (check:282)
>
> But we don't see any termination request - so this host is not the SPM.
>
> I guess this host was running the hosted engine vm, which uses a storage 
> lease.
> If you lose access to storage, sanlock will kill the hosted engine vm,
> so the system
> can start it elsewhere. If the hosted engine vm is stuck on storage, sanlock
> cannot kill it and it will reboot the host.

Pablo, can you locate the process with pid 6398?

Looking in the hosted engine logs and other logs on the system may reveal what
this process was. When we find the process, we can check the source to understand
why it was not terminating - likely blocked on the inaccessible NFS server.
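
A quick way to locate it is grepping the logs for the pid, for example
(assuming the default log locations):

   # grep 'vm-pid=6398' /var/log/audit/audit.log*
   # grep -lw 6398 /var/log/ovirt-hosted-engine-ha/*.log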

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/6XIQP4G2Y3WQYU4C2Q5TQNIOZKQ3U5TR/


[ovirt-users] Re: Random reboots

2022-02-16 Thread Nir Soffer
On Wed, Feb 16, 2022 at 10:10 AM Pablo Olivera  wrote:
>
> Hi community,
>
> We're dealing with an issue as we occasionally have random reboots on
> any of our hosts.
> We're using ovirt 4.4.3 in production with about 60 VM distributed over
> 5 hosts. We've a virtualized engine and a DRBD storage mounted by NFS.
> The infrastructure is interconnected by a Cisco 9000 switch.
> The last random reboot was yesterday February 14th at 03:03 PM (in the
> log it appears as: 15:03 due to our time configuration) of the host:
> 'nodo1'.
> At the moment of the reboot we detected in the log of the switch a
> link-down in the port where the host is connected.
> I attach log of the engine and host 'nodo1' in case you can help us to
> find the cause of these random reboots.


According to messages:

1. Sanlock could not renew the lease for 80 seconds:

Feb 14 15:03:06 nodo1 sanlock[2017]: 2022-02-14 15:03:06 1655257
[2017]: s1 check_our_lease failed 80


2. In this case sanlock must terminate the processes holding a lease
   on that storage - I guess that pid 6398 is vdsm.

Feb 14 15:03:06 nodo1 sanlock[2017]: 2022-02-14 15:03:06 1655257
[2017]: s1 kill 6398 sig 15 count 1
Feb 14 15:03:06 nodo1 sanlock[2017]: 2022-02-14 15:03:06 1655258
[2017]: s1 kill 6398 sig 15 count 2
...
Feb 14 15:03:36 nodo1 sanlock[2017]: 2022-02-14 15:03:36 1655288
[2017]: s1 kill 6398 sig 15 count 32

3. Terminating pid 6398 stopped here, and we see:

Feb 14 15:03:36 nodo1 wdmd[2033]: test failed rem 19 now 1655288 ping
1655237 close 1655247 renewal 1655177 expire 1655257 client 2017
sanlock_a5c35d19-4c34-4571-ac77-1b10de484426:1

4. So it looks like wdmd rebooted the host.

Feb 14 15:08:09 nodo1 kernel: Linux version
4.18.0-193.28.1.el8_2.x86_64 (mockbu...@kbuilder.bsys.centos.org) (gcc
version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)) #1 SMP Thu Oct 22
00:20:22 UTC 2020


This is strange, since sanlock should try to kill pid 6398 40 times,
and then switch
to SIGKILL. The watchdog should not reboot the host before sanlock
finishes the attempt to kill the processes.

David, do you think this is expected? do we have any issue in sanlock?


It is possible that sanlock will not be able to terminate a process if
the process is blocked on inaccessible storage. This seems to be the
case here.

In vdsm log we see that storage is indeed inaccessible:

2022-02-14 15:03:03,149+0100 WARN  (check/loop) [storage.check]
Checker 
'/rhev/data-center/mnt/newstoragedrbd.andromeda.com:_var_nfsshare_data/a5c35d19-4c34-4571-ac77-1b10de484426/dom_md/metadata'
is blocked for 60.00 seconds (check:282)

But we don't see any termination request - so this host is not the SPM.

I guess this host was running the hosted engine vm, which uses a storage lease.
If you lose access to storage, sanlock will kill the hosted engine vm,
so the system
can start it elsewhere. If the hosted engine vm is stuck on storage, sanlock
cannot kill it and it will reboot the host.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ZFJZGLGL5QVEDBRQOXOO7C6Y2Q5TK5S3/


[ovirt-users] Re: How I'd like to contribute

2022-02-14 Thread Nir Soffer
On Sat, Feb 12, 2022 at 9:21 AM Glen Jarvis  wrote:

>
> > I would like to know why "vdsm-tool config-lvm-filter" makes
> > you pull your hair out.
> >
> > It was designed to help people configure their system
> > correctly without pulling their hair out trying to
> > understand how lvm filter works, and avoid the many
> > wrong ways it can be used.
>
> Honestly, it is just from a lack of understanding of what is happening
> when it does break (as it is for us when doing a hypervisor install on a
> fresh box). Normally, I can just dig through the stack and figure out what
> is happening. There are enough areas where I don't have some foundations
> that I'm not able to do so here (yet).
>
> For example, I don't quite understand iSCSI, Multipath, etc enough to know
> what is breaking, and why.
>
> I'm accustomed to being able to dig deeper at each level until I can
> generally see what is breaking.
>
> This confusion will go away as I get a bit more experience and have a
> "map" of what is going on on this level.
>
> FWIW, I started reading the Libvirt book previously mentioned and it's
> pretty straight forward. It looks like I just need to get through this
> enough to get a good foothold so that I can figure out how to either fix
> things when they break or point out properly why something is breaking (so
> I can get the appropriate help).
>
> >
> > Do you have some specific areas you would like to improve?
> >
>
> Actually, yes :)  In:
> 1. Learn libvirt more (in progress)

> 2. Review the source for vdsm-tool to have a better sense of what is going on
> 3. Setup an iSCSI system so  that I can make luns, build a system, play
> with storage pools this way
>

You may find vdsm/contrib/target tool useful:
https://github.com/oVirt/vdsm/blob/master/contrib/target

This tool makes it simple to create or delete a new iSCSI target for
development purposes.

The best way to add an iSCSI server for development is to create a new VM
(virt-manager is the easiest way to do this), install targetcli, and copy
vdsm/contrib/target to the vm.

Then to create a new target you can run:

# ./target create mytarget

Creating target
  target_name:   mytarget
  target_iqn:iqn.2003-01.org.alpine.mytarget
  target_dir:/target/mytarget
  lun_count: 10
  lun_size:  100 GiB
  cache: False
  exists:False

Create target? [N/y]:


You may find the tool source interesting - it explains why we configure
the target in a certain way.
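
From the host side you can then check the target with plain iscsiadm
(hypothetical portal address; the iqn matches the tool output above):

   # iscsiadm -m discovery -t sendtargets -p 192.168.122.30
   # iscsiadm -m node -T iqn.2003-01.org.alpine.mytarget -p 192.168.122.30 --login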

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/57ASPEVSTKHWBDYEVAKMSZVFYU6XORML/


[ovirt-users] Re: oVirt 4.3 - Fibre Channel Data Domain as ISO dump.

2022-02-14 Thread Nir Soffer
On Mon, Feb 14, 2022 at 4:31 PM Angus Clarke  wrote:
>
> Hello
>
> RE: oVirt 4.3 - Fibre Channel Data Domain as ISO dump.
>
> Thanks for letting me join the list.

Welcome!

> I added a fibre channel Data domain with a view to using it as an ISO dump 
> however I cannot mount CDs to VMs with this error:
>
> "Error while executing action Change CD: Drive image file could not be found"
>
> A bit of reading suggests there is no way around this when using fibre 
> channel data domains - is this the case?

This works since 4.4.6, but was broken in 4.3. You should upgrade to
4.4 at this point.

> I guess I could present the LUN to a VM and run NFS from there as an 
> alternative option.

Yes, NFS works for ISOs on a data domain in 4.3.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/EOOGT64SSEMYXDSZAY2SYPTRWX7X4M3H/


[ovirt-users] Re: ERROR (mailbox-spm) [storage.MailBox.SpmMailMonitor] mailbox 65 checksum failed, not clearing mailbox, clearing new mail

2022-02-14 Thread Nir Soffer
On Mon, Feb 14, 2022 at 10:51 AM Petr Kyselák  wrote:
>
> Hi,
> I see a lot of errors in vdsm.log
>
> 2022-02-14 08:42:52,086+0100 ERROR (mailbox-spm) 
> [storage.MailBox.SpmMailMonitor] mailbox 65 checksum failed, not clearing 
> mailbox, clearing new mail (data=b'\xff\xff\xff\xff\  \x00\x00', 
> checksum=, expected=b'\xbfG\x00\x00') 
> (mailbox:602)
> 2022-02-14 08:42:52,087+0100 ERROR (mailbox-spm) 
> [storage.MailBox.SpmMailMonitor] mailbox 66 checksum failed, not clearing 
> mailbox, clearing new mail (data=b'\x00\x00\x00\x00\  \xff\xff', 
> checksum=, expected=b'\x04\xf0\x0b\x00') 
> (mailbox:602)

This can be a real checksum error, meaning random failure on storage,
but is more likely a race in ovirt itself. We had a lot of these in the past and
I think we fixed them, but it is possible that we have more due to the way
this code works.

> We are running latest ovirt engine and hosts:
> Hosts: ovirt-node-ng-installer-4.4.10-2022020214.el8.iso
> engine: ovirt-engine-4.4.10.6-1.el8.noarch
>
> We have 3 hosts and 8 iSCSI domains. I found a similar issue from 2018 
> https://lists.ovirt.org/archives/list/users@ovirt.org/thread/FJ6KIEOXEEFFZSJOT2ZF4TRKQ5NCP4OQ/#L7WD2FY25XJCNMB3YMTA4ASKMZGKCDZM
> I am not sure how to determine which mailbox I should try to "clean". Can 
> anybody help me please?

You don't need to do anything, the mailbox was already cleaned up.
This message means that the SPM found a bad checksum and dropped the
messages in the mailbox.

Processes that sent mail to the SPM will resend dropped mail in 2-3 seconds,
so the issue should recover automatically.

I would monitor your logs to check if this is a common issue or a one-time
incident. If this error is repeating, please file a vdsm bug and attach the complete
log since this host was started.
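
For example, something like this gives a quick idea of how often it happens
(assuming the default vdsm log location):

   $ grep -c 'checksum failed' /var/log/vdsm/vdsm.log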

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/UJSPPHF4GGFT2JG3QAMWVIVI6XE4LY24/


[ovirt-users] Re: How I'd like to contribute

2022-02-11 Thread Nir Soffer
On Fri, Feb 11, 2022 at 7:57 AM Glen Jarvis via Users  wrote:
>
> I am researching how to contribute to the oVirt community. I started here:
>
> https://www.ovirt.org/community/
>
> And, I immediately saw to sign up for this email address and...send us an 
> email saying how you would like to contribute. Visit our mailing lists page 
> for other oVirt mailing lists to sign up for.
>
>
> My answers are: I want to be useful (and give more than I take). I can answer 
> questions on mailing lists, help troubleshoot, write ovirt.ovirt Ansible 
> collections, roles and custom modules. I am a seasoned Python programmer.
>
> My background:
>  - Python programmer for 10+ years
>  - I write custom Ansible modules, roles, playbook, etc.
>  - Previous DBA for Informix (highly certified but who has heard of informix 
> anymore). Postgres and Informix are cousins (both offspring from Ingres)
>   - I have *some* rudimentary knowledge of Virtualization. However, I'm far 
> from an expert
>   - One of my favorite OS's is Qubes (an OS of virtual machines really)
>   - I do a lot of technical training (writing materials and facilitating 
> classes)
>   - I work in an SRE / SysAd role at a large company that puts music in 
> peoples ears (I'm hoping to move some of this from less SysAd and more SRE 
> when with some of this oVirt stuff we're working on).
>
> My intermediate skills
>- I have bought a book on libvirt. But, it's still on my backlog. It feels 
> that I'm always sucking an ocean through a straw so I have to pick and choose 
> what I read next
> - My second favorite OS is Ubuntu as my main desktop (Qubes on separate 
> computer for more secure stuff -- like crypto)
> - I'm just starting to use Virtual Machine Manager to run other OSs on 
> Ubuntu
>
> My Oh-I-have-No-Idea skills:
>- things like luns, iSCSI and the `vdsm-tool config-lvm-filter` are making 
> me pull my hair out.

I would like to know why "vdsm-tool config-lvm-filter" makes you pull
your hair out.

It was designed to help people configure their system correctly without
pulling their hair out trying to understand how lvm filter works, and avoid
the many wrong ways it can be used.

> This is the reason I was frustrated enough to say "Let me join this community 
> so I can learn more how this architecture works."
>- I work for a large company that has a RedHat support contract. We use 
> RHV. I just wrote up this long descriptive case of the problem, uploaded 
> sosreports, added as much detail as I could. But, it's crickets. If I knew 
> more I could debug what was happening more myself.
>
>
> How did I do for an introduction?

Great!

Do you have some specific areas you would like to improve?

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/GIT5MDRSCNRU5HOCOGUCUVLSRCOXKTUA/


[ovirt-users] Re: hosted engine deployment (v4.4.10) - TASK Check engine VM health - fatal FAILED

2022-02-09 Thread Nir Soffer
On Wed, Feb 9, 2022 at 5:06 PM Gilboa Davara  wrote:
>
>
> On Wed, Feb 9, 2022 at 3:35 PM Nir Soffer  wrote:
>>
>> On Wed, Feb 9, 2022 at 12:47 PM Gilboa Davara  wrote:
>> >
>> > On Wed, Feb 9, 2022 at 1:05 AM Strahil Nikolov  
>> > wrote:
>> >>
>> >> Or just add an exclude in /etc/dnf/dnf.conf
>> >
>> >
>> > I personally added an exclusion to 
>> > /etc/yum.repos.d/CentOS-Stream-AppStream.repo
>> > exclude=qemu*
>> > It allows ovirt-4.4* repos to push a new qemu release, without letting 
>> > CentOS stream break things...
>>
>> But new libvirt versions may require a newer qemu version, and oVirt itself
>> may require a new libvirt version.
>>
>> These kind of excludes are fragile and need constant maintenance.
>>
>> Nir
>
>
> The previous poster proposed a global qemu exclusion.
> I propose a partial qemu exclusion (on centos-streams only), with the 
> assumption that ovirt-required qemu will be pushed directly via the ovirt 
> repo.
> In both cases, this is a temporary measure needed to avoid using the broken 
> qemu pushed by streams.
> In both cases libvirt update from appstreams will get blocked - assuming it 
> requires the broken qemu release.
>
> Do you advise we simply pass --exclude=qemu* every time we run dnf? I would 
> imagine it's far more dangerous and will block libvirt update just as well.

I don't have a better solution, I just wanted to warn about these excludes.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/A46XCFC7PYWPAF7EDS2LU4SDYP34J3XW/


[ovirt-users] Re: hosted engine deployment (v4.4.10) - TASK Check engine VM health - fatal FAILED

2022-02-09 Thread Nir Soffer
On Wed, Feb 9, 2022 at 12:47 PM Gilboa Davara  wrote:
>
> On Wed, Feb 9, 2022 at 1:05 AM Strahil Nikolov  wrote:
>>
>> Or just add an exclude in /etc/dnf/dnf.conf
>
>
> I personally added an exclusion to 
> /etc/yum.repos.d/CentOS-Stream-AppStream.repo
> exclude=qemu*
> It allows ovirt-4.4* repos to push a new qemu release, without letting CentOS 
> stream break things...

But new libvirt versions may require a newer qemu version, and oVirt itself
may require a new libvirt version.

These kinds of excludes are fragile and need constant maintenance.
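
If you do use them, it is worth checking periodically what the exclude is
holding back, for example (just a sketch with plain dnf):

   $ dnf check-update --disableexcludes=all 'qemu*' 'libvirt*'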

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/GUCYLOMBEETBHGP7GPPEAWYSYSRSP2LC/


[ovirt-users] Re: RHGS and RHV closing down: could you please put that on the home page?

2022-02-07 Thread Nir Soffer
On Mon, Feb 7, 2022 at 3:04 PM Sandro Bonazzola  wrote:

>
>
> Il giorno lun 7 feb 2022 alle ore 09:28 Thomas Hoberg 
> ha scritto:
>
>> Sandro, I am ever so glad you're fighting on, buon coraggio!
>>
>
> Thanks :-)
>
>
>>
>> Yes, please write a blog post on how oVirt could develop without a
>> commercial downstream product that pays your salaries.
>>
>
> I have no magic recipe but I know oVirt is used in several universities
> with computer science departments. If just 1 student for each of them would
> contribute 1 patch per semester that would help keep oVirt alive even
> without any downstream company backing it.
> And there are also people in this list like @Jean-Louis Dupond
>  who are contributing fixes, latest is here
> https://github.com/oVirt/ovirt-engine/pull/59  .
> I don't want to write a book on how an opensource project can be healthy,
> I believe there are already out there :-) .
> It would indeed help if some company or foundation would show up and get
> engaged with the project but this is not strictly needed for an open source
> project to be alive.
>
>
>> Ideally you'd add a perspective for current HCI users, many of whom
>> chose this approach because a fault-tolerant SAN or NAS wasn't available.
>>
>
> I'll let the storage team to answer here
>

The oVirt storage team never worked on HCI and we don't plan to work on
it in the future. HCI was designed and maintained by Gluster folks. Our
contribution for HCI was adding 4k support, enabling usage of VDO.

Improving on the HCI side is unlikely to come from Red Hat, but nothing
blocks other companies or contributors from working on this.

Our focus for 4.5 is Managed Block Storage and incremental backup.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/5JBTH3JKW23ZRKDTPLBNTIIF3PMFKZ3L/


[ovirt-users] Re: Unclear how to add local (POSIX) storage

2022-02-04 Thread Nir Soffer
On Fri, Feb 4, 2022 at 3:18 PM Richard W.M. Jones  wrote:
>
> On Fri, Feb 04, 2022 at 03:09:02PM +0200, Nir Soffer wrote:
> > On Fri, Feb 4, 2022 at 11:16 AM Richard W.M. Jones  
> > wrote:
> > >
> > > On Fri, Feb 04, 2022 at 08:42:08AM +, Richard W.M. Jones wrote:
> > > > On Thu, Feb 03, 2022 at 06:31:52PM +0200, Nir Soffer wrote:
> > > > > This is expected on oVirt, our multipath configuration is 
> > > > > intentionally grabbing
> > > > > any device that multipath can work with, even if the device only has 
> > > > > one path.
> > > > > The motivation is to be able to configure a system when only one path 
> > > > > is
> > > > > available (maybe you have an hba/network/server issue), and once the 
> > > > > other
> > > > > paths are available the system will use them transparently.
> > > > >
> > > > > To avoid this issue with local devices, you need to blacklist the 
> > > > > device.
> > > > >
> > > > > Add this file:
> > > > >
> > > > > $ cat /etc/multipath/conf.d/local.conf
> > > > > blacklist {
> > > > > wwid "QEMU HARDDISK"
> > > > > }
> > > >
> > > > Thanks - for the mailing list record the syntax that worked for me is:
> > > >
> > > > # cat /etc/multipath/conf.d/local.conf
> > > > blacklist {
> > > > wwid ".*QEMU_HARDDISK.*"
> > > > }
> > > >
> > > > > Configuring NFS on some other machine is easy.
> > > > >
> > > > > I'm using another VM for this, so I can easily test negative flows 
> > > > > like stopping
> > > > > or restarting the NFS server while it is being used by vms or storage
> > > > > operations.
> > > > > I'm using 2G alpine vm for this, it works fine even with 1G memory.
> > > >
> > > > I think I can get local storage working now (I had it working before).
> > >
> > > Well finally it fails with:
> > >
> > > 2022-02-04 09:14:55,779Z ERROR 
> > > [org.ovirt.engine.core.bll.storage.domain.AddPosixFsStorageDomainCommand] 
> > > (default task-2) [25a32edf] Command 
> > > 'org.ovirt.engine.core.bll.storage.domain.AddPosixFsStorageDomainCommand' 
> > > failed: EngineException: 
> > > org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: 
> > > VDSGenericException: VDSErrorException: Failed to CreateStorageDomainVDS, 
> > > error = Could not initialize cluster lock: (), code = 701 (Failed with 
> > > error unexpected and code 16)
> >
> > The error "Could not initialize cluster lock" comes from vdsm. Usually
> > engine log is
> > not the best way to debug such failures. This is only the starting
> > point and you need to
> > go to the host and check vdsm and supervdsm logs in /var/log/vdsm/.
>
> I can't really see anything relevant in supervdsm.log, it's all fairly
> neutral debug messages.
>
> > Since this error
> > comes from sanlock, we also may have useful info in /var/log/sanlock.log.
>
> Interesting:
>
> 2022-02-04 13:15:27 16723 [826]: open error -13 EACCES: no permission to open 
> /rhev/data-center/mnt/_dev_sdb1/13a731d2-e1d2-4998-9b02-ac46899e3159/dom_md/ids
> 2022-02-04 13:15:27 16723 [826]: check that daemon user sanlock 179 group 
> sanlock 179 has access to disk or file.
>
> I think it's quite likely that the sanlock daemon does not have access
> here, since (see below) I choown'd the root of the xfs filesystem to
> 36:36 (otherwise vdsm complains).
>
> > Can you share instructions on how to reproduce this issue?
>
> I have one engine and one node (both happen to be VMs, but I don't
> believe that is relevant here).  It's running Version 4.4.10.6-1.el8.
>
> I added a second disk to the node, and disabled multipath as
> previously discussed.  The second disk is /dev/sdb1.  I formatted it
> as xfs and chowned the root of the filesystem to 36:36.
>
> In the admin portal, Storage -> Domains -> New domain
>
> Storage type: Posix compliant fs
>
> Name: ovirt-data
>
> Path: /dev/sdb1
>
> VFS type: xfs
>
> Hit OK ->
> Error while executing action AddPosixFsStorageDomain: Unexpected exception

Avihai, are we testing this configuration?

I'm not sure this is useful for real users - you need to have the same
device available on multiple hosts, backed by a shared file system like GFS2 [1].

[1] https://en.wikipedia.org/wiki/GFS2

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/WYY3GGAQ2JHG5Z5EEURQ5AZ5JP3AE32R/


[ovirt-users] Re: Unclear how to add local (POSIX) storage

2022-02-04 Thread Nir Soffer
On Fri, Feb 4, 2022 at 3:49 PM Richard W.M. Jones  wrote:
>
> On Fri, Feb 04, 2022 at 03:47:11PM +0200, Nir Soffer wrote:
> > Can be fixed with:
> >
> > $ sudo chcon -R -t nfs_t mnt
>
> Yes that did work, thanks.

Warning: this configuration is a trap - if you add another host
to this system,
the system will try to mount the same device (/dev/sdb1) on the new
host. Since the
other host does not have the same disk, the mount will fail, and the
other host will
be deactivated, since it cannot access all storage.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/7BZBMO5TXNKLBXVGKAUHBGRJAS7VWUHS/


[ovirt-users] Re: Unclear how to add local (POSIX) storage

2022-02-04 Thread Nir Soffer
On Fri, Feb 4, 2022 at 3:49 PM Richard W.M. Jones  wrote:
>
> On Fri, Feb 04, 2022 at 03:47:11PM +0200, Nir Soffer wrote:
> > Can be fixed with:
> >
> > $ sudo chcon -R -t nfs_t mnt
>
> Yes that did work, thanks.
>
> Is this still a bug?

For NFS this works out of the box - I don't think you need to
relabel anything manually.

As a user I would expect that mounting a posix file system will also work
out of the box. For example vdsm can relabel the mount point when creating
a storage domain.

Also I'm not sure nfs_t is the right label to use; we'd better discuss this
with the selinux folks.

So I think this issue is worth a bug; in the worst case we will close it, and the
bug will also document the workaround.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/WUXD2DQ6DDCVOK2HNMEOHLJNRWXKWOE2/


[ovirt-users] Re: Unclear how to add local (POSIX) storage

2022-02-04 Thread Nir Soffer
On Fri, Feb 4, 2022 at 3:18 PM Richard W.M. Jones  wrote:
>
> On Fri, Feb 04, 2022 at 03:09:02PM +0200, Nir Soffer wrote:
> > On Fri, Feb 4, 2022 at 11:16 AM Richard W.M. Jones  
> > wrote:
> > >
> > > On Fri, Feb 04, 2022 at 08:42:08AM +, Richard W.M. Jones wrote:
> > > > On Thu, Feb 03, 2022 at 06:31:52PM +0200, Nir Soffer wrote:
> > > > > This is expected on oVirt, our multipath configuration is 
> > > > > intentionally grabbing
> > > > > any device that multipath can work with, even if the device only has 
> > > > > one path.
> > > > > The motivation is to be able to configure a system when only one path 
> > > > > is
> > > > > available (maybe you have an hba/network/server issue), and once the 
> > > > > other
> > > > > paths are available the system will use them transparently.
> > > > >
> > > > > To avoid this issue with local devices, you need to blacklist the 
> > > > > device.
> > > > >
> > > > > Add this file:
> > > > >
> > > > > $ cat /etc/multipath/conf.d/local.conf
> > > > > blacklist {
> > > > > wwid "QEMU HARDDISK"
> > > > > }
> > > >
> > > > Thanks - for the mailing list record the syntax that worked for me is:
> > > >
> > > > # cat /etc/multipath/conf.d/local.conf
> > > > blacklist {
> > > > wwid ".*QEMU_HARDDISK.*"
> > > > }
> > > >
> > > > > Configuring NFS on some other machine is easy.
> > > > >
> > > > > I'm using another VM for this, so I can easily test negative flows 
> > > > > like stopping
> > > > > or restarting the NFS server while it is being used by vms or storage
> > > > > operations.
> > > > > I'm using 2G alpine vm for this, it works fine even with 1G memory.
> > > >
> > > > I think I can get local storage working now (I had it working before).
> > >
> > > Well finally it fails with:
> > >
> > > 2022-02-04 09:14:55,779Z ERROR 
> > > [org.ovirt.engine.core.bll.storage.domain.AddPosixFsStorageDomainCommand] 
> > > (default task-2) [25a32edf] Command 
> > > 'org.ovirt.engine.core.bll.storage.domain.AddPosixFsStorageDomainCommand' 
> > > failed: EngineException: 
> > > org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: 
> > > VDSGenericException: VDSErrorException: Failed to CreateStorageDomainVDS, 
> > > error = Could not initialize cluster lock: (), code = 701 (Failed with 
> > > error unexpected and code 16)
> >
> > The error "Could not initialize cluster lock" comes from vdsm. Usually
> > engine log is
> > not the best way to debug such failures. This is only the starting
> > point and you need to
> > go to the host and check vdsm and supervdsm logs in /var/log/vdsm/.
>
> I can't really see anything relevant in supervdsm.log, it's all fairly
> neutral debug messages.
>
> > Since this error
> > comes from sanlock, we also may have useful info in /var/log/sanlock.log.
>
> Interesting:
>
> 2022-02-04 13:15:27 16723 [826]: open error -13 EACCES: no permission to open 
> /rhev/data-center/mnt/_dev_sdb1/13a731d2-e1d2-4998-9b02-ac46899e3159/dom_md/ids
> 2022-02-04 13:15:27 16723 [826]: check that daemon user sanlock 179 group 
> sanlock 179 has access to disk or file.

The issue is selinux:

NFS domain:

$ ls -lhZ 
/rhev/data-center/mnt/alpine\:_01/e9467633-ee31-4e15-b3f8-3812b374c764/dom_md/
total 2.3M
-rw-rw. 1 vdsm kvm system_u:object_r:nfs_t:s0 1.0M Feb  4 15:32 ids
-rw-rw. 1 vdsm kvm system_u:object_r:nfs_t:s0  16M Jan 20 23:53 inbox
-rw-rw. 1 vdsm kvm system_u:object_r:nfs_t:s0 2.0M Jan 20 23:54 leases
-rw-r--r--. 1 vdsm kvm system_u:object_r:nfs_t:s0  354 Jan 20 23:54 metadata
-rw-rw. 1 vdsm kvm system_u:object_r:nfs_t:s0  16M Jan 20 23:53 outbox
-rw-rw. 1 vdsm kvm system_u:object_r:nfs_t:s0 1.3M Jan 20 23:53 xleases

The posix domain mount (mounted manually):

$ ls -lhZ mnt/689c22c4-e264-4873-aa75-1aa4970d4366/dom_md/
total 252K
-rw-rw. 1 vdsm kvm system_u:object_r:unlabeled_t:s00 Feb  4 15:23 ids
-rw-rw. 1 vdsm kvm system_u:object_r:unlabeled_t:s0  16M Feb  4 15:23 inbox
-rw-rw. 1 vdsm kvm system_u:object_r:unlabeled_t:s00 Feb  4 15:23 leases
-rw-r--r--. 1 vdsm kvm system_u:object_r:unlabeled_t:s0  316 Feb  4
15:23 metadata
-rw-rw. 1 vdsm kvm system_u:object_r:unlabeled_t:s0  16M Feb  4 15:23 outbox
-rw-rw. 1 vdsm kvm system_u:object_r:

[ovirt-users] Re: Unclear how to add local (POSIX) storage

2022-02-04 Thread Nir Soffer
On Fri, Feb 4, 2022 at 3:18 PM Richard W.M. Jones  wrote:
>
> On Fri, Feb 04, 2022 at 03:09:02PM +0200, Nir Soffer wrote:
> > On Fri, Feb 4, 2022 at 11:16 AM Richard W.M. Jones  
> > wrote:
> > >
> > > On Fri, Feb 04, 2022 at 08:42:08AM +, Richard W.M. Jones wrote:
> > > > On Thu, Feb 03, 2022 at 06:31:52PM +0200, Nir Soffer wrote:
> > > > > This is expected on oVirt, our multipath configuration is 
> > > > > intentionally grabbing
> > > > > any device that multipath can work with, even if the device only has 
> > > > > one path.
> > > > > The motivation is to be able to configure a system when only one path 
> > > > > is
> > > > > available (maybe you have an hba/network/server issue), and once the 
> > > > > other
> > > > > paths are available the system will use them transparently.
> > > > >
> > > > > To avoid this issue with local devices, you need to blacklist the 
> > > > > device.
> > > > >
> > > > > Add this file:
> > > > >
> > > > > $ cat /etc/multipath/conf.d/local.conf
> > > > > blacklist {
> > > > > wwid "QEMU HARDDISK"
> > > > > }
> > > >
> > > > Thanks - for the mailing list record the syntax that worked for me is:
> > > >
> > > > # cat /etc/multipath/conf.d/local.conf
> > > > blacklist {
> > > > wwid ".*QEMU_HARDDISK.*"
> > > > }
> > > >
> > > > > Configuring NFS on some other machine is easy.
> > > > >
> > > > > I'm using another VM for this, so I can easily test negative flows 
> > > > > like stopping
> > > > > or restarting the NFS server while it is being used by vms or storage
> > > > > operations.
> > > > > I'm using 2G alpine vm for this, it works fine even with 1G memory.
> > > >
> > > > I think I can get local storage working now (I had it working before).
> > >
> > > Well finally it fails with:
> > >
> > > 2022-02-04 09:14:55,779Z ERROR 
> > > [org.ovirt.engine.core.bll.storage.domain.AddPosixFsStorageDomainCommand] 
> > > (default task-2) [25a32edf] Command 
> > > 'org.ovirt.engine.core.bll.storage.domain.AddPosixFsStorageDomainCommand' 
> > > failed: EngineException: 
> > > org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: 
> > > VDSGenericException: VDSErrorException: Failed to CreateStorageDomainVDS, 
> > > error = Could not initialize cluster lock: (), code = 701 (Failed with 
> > > error unexpected and code 16)
> >
> > The error "Could not initialize cluster lock" comes from vdsm. Usually
> > engine log is
> > not the best way to debug such failures. This is only the starting
> > point and you need to
> > go to the host and check vdsm and supervdsm logs in /var/log/vdsm/.
>
> I can't really see anything relevant in supervdsm.log, it's all fairly
> neutral debug messages.
>
> > Since this error
> > comes from sanlock, we also may have useful info in /var/log/sanlock.log.
>
> Interesting:
>
> 2022-02-04 13:15:27 16723 [826]: open error -13 EACCES: no permission to open 
> /rhev/data-center/mnt/_dev_sdb1/13a731d2-e1d2-4998-9b02-ac46899e3159/dom_md/ids
> 2022-02-04 13:15:27 16723 [826]: check that daemon user sanlock 179 group 
> sanlock 179 has access to disk or file.
>
> I think it's quite likely that the sanlock daemon does not have access
> here, since (see below) I choown'd the root of the xfs filesystem to
> 36:36 (otherwise vdsm complains).
>
> > Can you share instructions on how to reproduce this issue?
>
> I have one engine and one node (both happen to be VMs, but I don't
> believe that is relevant here).  It's running Version 4.4.10.6-1.el8.
>
> I added a second disk to the node, and disabled multipath as
> previously discussed.  The second disk is /dev/sdb1.  I formatted it
> as xfs and chowned the root of the filesystem to 36:36.

Looks right, forgetting to change ownership is a common mistake.
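
For reference, a minimal sketch of that preparation on the host (the device
name and mount point are only examples):

   # mkfs.xfs /dev/sdb1
   # mount /dev/sdb1 /mnt
   # chown 36:36 /mnt
   # umount /mnt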

> In the admin portal, Storage -> Domains -> New domain
>
> Storage type: Posix compliant fs
>
> Name: ovirt-data
>
> Path: /dev/sdb1
>
> VFS type: xfs
>
> Hit OK ->
> Error while executing action AddPosixFsStorageDomain: Unexpected exception

I reproduced this with vdsm-4.50.0.5-1.el8.x86_64 on RHEL 8.6 nightly.

Can you file an oVirt/vdsm bug for this?
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/TXPJHOTW4FN7FNLC3WHZP3H5DRKPGOWD/


[ovirt-users] Re: Unclear how to add local (POSIX) storage

2022-02-04 Thread Nir Soffer
On Fri, Feb 4, 2022 at 11:16 AM Richard W.M. Jones  wrote:
>
> On Fri, Feb 04, 2022 at 08:42:08AM +, Richard W.M. Jones wrote:
> > On Thu, Feb 03, 2022 at 06:31:52PM +0200, Nir Soffer wrote:
> > > This is expected on oVirt, our multipath configuration is intentionally 
> > > grabbing
> > > any device that multipath can work with, even if the device only has one 
> > > path.
> > > The motivation is to be able to configure a system when only one path is
> > > available (maybe you have an hba/network/server issue), and once the other
> > > paths are available the system will use them transparently.
> > >
> > > To avoid this issue with local devices, you need to blacklist the device.
> > >
> > > Add this file:
> > >
> > > $ cat /etc/multipath/conf.d/local.conf
> > > blacklist {
> > > wwid "QEMU HARDDISK"
> > > }
> >
> > Thanks - for the mailing list record the syntax that worked for me is:
> >
> > # cat /etc/multipath/conf.d/local.conf
> > blacklist {
> > wwid ".*QEMU_HARDDISK.*"
> > }
> >
> > > Configuring NFS on some other machine is easy.
> > >
> > > I'm using another VM for this, so I can easily test negative flows like 
> > > stopping
> > > or restarting the NFS server while it is being used by vms or storage
> > > operations.
> > > I'm using 2G alpine vm for this, it works fine even with 1G memory.
> >
> > I think I can get local storage working now (I had it working before).
>
> Well finally it fails with:
>
> 2022-02-04 09:14:55,779Z ERROR 
> [org.ovirt.engine.core.bll.storage.domain.AddPosixFsStorageDomainCommand] 
> (default task-2) [25a32edf] Command 
> 'org.ovirt.engine.core.bll.storage.domain.AddPosixFsStorageDomainCommand' 
> failed: EngineException: 
> org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: 
> VDSGenericException: VDSErrorException: Failed to CreateStorageDomainVDS, 
> error = Could not initialize cluster lock: (), code = 701 (Failed with error 
> unexpected and code 16)

The error "Could not initialize cluster lock" comes from vdsm. Usually
engine log is
not the best way to debug such failures. This is only the starting
point and you need to
go to the host and check vdsm and supervdsm logs in /var/log/vdsm/.
Since this error
comes from sanlock, we also may have useful info in /var/log/sanlock.log.

Can you share instructions on how to reproduce this issue?

> I think this feature (local storage) no longer works.

This is not local storage; local storage is a storage domain using a local
directory on a host. It works only when creating a local data center -
basically each host has its own data center.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/SO6SUNKEIGGTJSOD4QQ7DSJ5667C45TZ/


[ovirt-users] Re: Unclear how to add local (POSIX) storage

2022-02-03 Thread Nir Soffer
On Thu, Feb 3, 2022 at 3:42 PM Richard W.M. Jones  wrote:
>
> On Thu, Feb 03, 2022 at 03:07:20PM +0200, Nir Soffer wrote:
> > On Thu, Feb 3, 2022 at 2:30 PM Richard W.M. Jones  wrote:
> > >
> > >
> > > I'm following the instructions here:
> > > https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.2/html/administration_guide/sect-preparing_and_adding_posix_compliant_file_system_storage
> > >
> > > I've also added an extra virtual disk to my host node which appears as
> > > /dev/sdb.  Although the disk is partitioned, /dev/sdb1 is not created.
> > > Is udev broken in oVirt node?
> > >
> > > I cannot see anywhere in the dialog where you specify the name of the
> > > device (eg. "/dev/sdb1").  So how's it supposed to work?
> > >
> > > It doesn't work, giving an information-free error message:
> > >
> > >   Error while executing action Add Storage Connection: Problem while 
> > > trying to mount target
> >
> > You can find more info on the failure in:
> > /var/log/vdsm/supervdsmd.log
>
> vdsm.storage.mount.MountError: Command ['/usr/bin/mount', '-t', 'xfs', 
> '/srv', '/rhev/data-center/mnt/_srv'] failed with rc=32 out=b'' err=b'mount: 
> /rhev/data-center/mnt/_srv: /srv is not a block device.\n'
>
> I suppose it expects the name of the block device (ie. /dev/sdb)
> rather than the mount point there.
>
> It also turns out the new device has been "captured" by multipathd:
>
> # multipath -ll
> 0QEMU_QEMU_HARDDISK_drive-scsi0-0-0-1 dm-0 QEMU,QEMU HARDDISK
> size=100G features='1 queue_if_no_path' hwhandler='0' wp=rw
> `-+- policy='service-time 0' prio=1 status=active
>   `- 0:0:0:1 sdb 8:16 active ready running
>
> I've so far not found a way to disable multipathd effectively.  Even
> stopping and disabling the service and rebooting doesn't help so I
> guess something starts it up.

This is expected on oVirt: our multipath configuration is intentionally grabbing
any device that multipath can work with, even if the device only has one path.
The motivation is to be able to configure a system when only one path is
available (maybe you have an hba/network/server issue), and once the other
paths are available the system will use them transparently.

To avoid this issue with local devices, you need to blacklist the device.

Add this file:

$ cat /etc/multipath/conf.d/local.conf
blacklist {
wwid "QEMU HARDDISK"
}

And run (as root):

   multipathd reconfigure

At this point lsblk will show the expected /dev/sdb1 and multipath
will never use this device again.
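
You can verify with something like (a sketch):

   # multipath -ll      # the QEMU disk should no longer be listed
   # lsblk /dev/sdb     # sdb1 should now appear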

Adding a serial to the device in libvirt xml will make it easier to blacklist.

>
> > Posix compliant is basically NFS without some mount options:
> > https://github.com/oVirt/vdsm/blob/878407297cb7dc892110ae5d6b0403ca97249247/lib/vdsm/storage/storageServer.py#L174
> >
> > Using a local device on a host is a less tested path, I'm not sure QE is testing
> > this (Avihai, please correct me if you do).
> >
> > If you have multiple hosts, this will break if the local device does not 
> > have
> > the same name on all hosts (so using /dev/sdb1 is very fragile). If you have
> > one host it can be fine.
> >
> > Any reason to add a device to the vm, instead of using an NFS server?
> >
> > I guess that your purpose is testing virt-v2v with oVirt, so you want to 
> > test
> > a common configuration; NFS is very common for oVirt users.
>
> I don't have an NFS server to use for this.

Configuring NFS on some other machine is easy.

I'm using another VM for this, so I can easily test negative flows like stopping
or restarting the NFS server while it is being used by vms or storage
operations.
I'm using a 2G alpine vm for this; it works fine even with 1G memory.
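
A minimal export setup on such a VM looks roughly like this (paths are only
an example; vdsm expects the export to be owned by 36:36):

   # mkdir -p /export/data
   # chown 36:36 /export/data
   # echo '/export/data *(rw,sync,no_subtree_check)' >> /etc/exports
   # exportfs -ra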

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/EHXYNOUEI5ETO57ZNLKBJY76S5IVYFO2/


[ovirt-users] Re: Unclear how to add local (POSIX) storage

2022-02-03 Thread Nir Soffer
On Thu, Feb 3, 2022 at 2:30 PM Richard W.M. Jones  wrote:
>
>
> I'm following the instructions here:
> https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.2/html/administration_guide/sect-preparing_and_adding_posix_compliant_file_system_storage
>
> I've also added an extra virtual disk to my host node which appears as
> /dev/sdb.  Although the disk is partitioned, /dev/sdb1 is not created.
> Is udev broken in oVirt node?
>
> I cannot see anywhere in the dialog where you specify the name of the
> device (eg. "/dev/sdb1").  So how's it supposed to work?
>
> It doesn't work, giving an information-free error message:
>
>   Error while executing action Add Storage Connection: Problem while trying 
> to mount target

You can find more info on the failure in:
/var/log/vdsm/supervdsmd.log

Posix compliant is basically NFS without some mount options:
https://github.com/oVirt/vdsm/blob/878407297cb7dc892110ae5d6b0403ca97249247/lib/vdsm/storage/storageServer.py#L174

Using a local device on a host is a less tested path; I'm not sure QE is testing
this (Avihai, please correct me if you do).

If you have multiple hosts, this will break if the local device does not have
the same name on all hosts (so using /dev/sdb1 is very fragile). If you have
one host it can be fine.

Any reason to add a device to the vm, instead of using an NFS server?

I guess that your purpose is testing virt-v2v with oVirt, so you want to test
a common configuration; NFS is very common for oVirt users.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/NPJMTKRFVAXPDSJCAFJSYWUE4NS4AS6S/


[ovirt-users] Re: Ovirt host error [Cannot configure LVM filter]

2022-02-01 Thread Nir Soffer
On Tue, Feb 1, 2022 at 5:21 PM Ayansh Rocks
 wrote:
>
>
> [root@iondelsvr14 ~]# vdsm-tool config-lvm-filter
> Analyzing host...
> Found these mounted logical volumes on this host:

No mounted lvm logical volumes...

>
> This is the recommended LVM filter for this host:
>
>   filter = [ "r|.*|" ]

So this filter is correct - lvm should not access any device on this host.

>
> This filter allows LVM to access the local devices used by the
> hypervisor, but not shared storage owned by Vdsm. If you add a new
> device to the volume group, you will need to edit the filter manually.
>
> To use the recommended filter we need to add multipath
> blacklist in /etc/multipath/conf.d/vdsm_blacklist.conf:
>
>   blacklist {
>   wwid "362cea7f0c4c75d0028def9c015471c79"
>   }

Is this host booting from a multipath device?

>
>
> Configure host? [yes,NO]

Does it fail? How?

> [root@iondelsvr14 ~]# lsblk
> NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
> sda  8:00 446.6G  0 disk
> ├─sda1   8:10   4.7G  0 part /boot
> ├─sda2   8:20 139.7G  0 part /
> ├─sda3   8:30  46.6G  0 part /home
> ├─sda4   8:40 1K  0 part
> ├─sda5   8:50  46.6G  0 part /opt
> ├─sda6   8:60  18.6G  0 part /tmp
> └─sda7   8:70  29.8G  0 part [SWAP]

So this host is not using LVM; this should be fine, but it is probably less tested.

> [root@iondelsvr14 ~]# multipath -ll

It is strange that you don't have any multipath device, but
"vdsm-tool config-lvm-filter"
found the wwid 362cea7f0c4c75d0028def9c015471c79 and wants to blacklist it.
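
To see which device this wwid belongs to, something like this should work
(a hedged example):

    lsblk -o NAME,SIZE,TYPE,WWN
    /usr/lib/udev/scsi_id --whitelisted --device=/dev/sda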

> [root@iondelsvr14 ~]#
>
>
> On Tue, Feb 1, 2022 at 7:55 PM Nir Soffer  wrote:
>>
>> On Tue, Feb 1, 2022 at 4:17 PM Ayansh Rocks
>>  wrote:
>> >
>> > Hello All,
>> >
>> > I was adding few new hosts into the ovirt cluster but a couple of them 
>> > gave this error.
>> >
>> > Ovirt Release - 4.4.10
>> >
>> > Installing Host iondelsvr14.iontrading.com. Check for LVM filter 
>> > configuration error: Cannot configure LVM filter on host, please run: 
>> > vdsm-tool config-lvm-filter.
>> >
>> > However, few of them are able to configure it.
>>
>> Please share the output of
>>
>> vdsm-tool config-lvm-filter
>>
>> On the hosts that fail to configure the filter.
>>
>> Also share output of:
>>
>> - lsblk
>> - multiapth -ll
>>
>> Nir
>>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/Z4JH77OGFJAGP4AVENMFJYPWH57HSQ2P/


[ovirt-users] Re: Ovirt host error [Cannot configure LVM filter]

2022-02-01 Thread Nir Soffer
On Tue, Feb 1, 2022 at 4:17 PM Ayansh Rocks
 wrote:
>
> Hello All,
>
> I was adding few new hosts into the ovirt cluster but a couple of them gave 
> this error.
>
> Ovirt Release - 4.4.10
>
> Installing Host iondelsvr14.iontrading.com. Check for LVM filter 
> configuration error: Cannot configure LVM filter on host, please run: 
> vdsm-tool config-lvm-filter.
>
> However, few of them are able to configure it.

Please share the output of

vdsm-tool config-lvm-filter

on the hosts that fail to configure the filter.

Also share output of:

- lsblk
- multipath -ll

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/QECQAVURUQ2PBESYLC4HFLKSAY4T4JLC/


[ovirt-users] Re: What happened to oVirt engine-setup?

2022-01-26 Thread Nir Soffer
On Wed, Jan 26, 2022 at 8:32 AM Yedidyah Bar David  wrote:

> > > Do we recommend RHEL or CentOS as the initial OS?
> >
> > For 4.4.10 your best option is RHEL 8.5.
>
> At least in theory. Do you do this yourself? Heard of others doing it?

I've been running oVirt on RHEL for years for development.

This is basically RHV without the branding and support.
Why do you think this is an unused configuration?

> I personally only use, for development, either CentOS Stream 8 + oVirt,
> or RHEL+RHV. Over the years we did get reports of people using RHEL+oVirt,
> and also fixed a few bugs around this. But if not many people do, we'll
> not hear about the problems, if any, so won't fix them...
>
> >
> > I'm not sure about the status of the RHEL clones (e.g. Rocky). In the past
>
> We did get here a few reports about this. I didn't try this myself.
> Generally speaking, it should work, with current master (4.5), and if
> not, should be rather easy to patch/fix. Not sure about 4.4.
>
> > they were missing the advanced virtualization packages, and could be
> > installed using Centos advanced virtuatation packages which probably are not
> > available now (EOL).
>
> Not sure why you think so. Isn't that exactly the same thing people
> are using when using CentOS? What's the difference? Indeed, if people
> want to use Rocky/Alma/etc and insist on not using anything from CentOS,
> they'll have to rebuild these manually, unless there is already a
> relevant repo with these rebuilt on Rocky/Alma/etc., which I am not
> aware of.

The difference is consuming the advanced virtualization packages from the CentOS
repos, which may disappear since CentOS Linux is EOL. I don't think we can
recommend such a deployment to anyone.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/FSUWEMPSWNVSRNFYT73JWZFCCENZ7NGM/


[ovirt-users] Re: What happened to oVirt engine-setup?

2022-01-25 Thread Nir Soffer
On Tue, Jan 25, 2022 at 7:21 PM Richard W.M. Jones  wrote:
>
> On Tue, Jan 25, 2022 at 06:37:24PM +0200, Nir Soffer wrote:
> > On Tue, Jan 25, 2022 at 5:46 PM Richard W.M. Jones  
> > wrote:
> > >
> > > A while back I had oVirt 4.4.7 installed which I used for testing.
> > > For some reason that installation has died in some way, so I'm trying
> > > to install a fresh new oVirt 4.4.10.
> > >
> > > Last time I installed ovirt, it was very easy - I provisioned a couple
> > > of machines, ran engine-setup in one, answered a few questions and
> > > after a few minutes the engine was installed.
> > >
> > > Somehow this has changed and now it's really far more complicated,
> > > involving some ansible things and wanting to create VMs and ssh
> > > everywhere.
> > >
> > > Can I go back to the old/easy way of installing oVirt engine?  And if
> > > so, what happened to the instructions for that?
> >
> > engine-setup still works, maybe you can give move details on what went 
> > wrong?
>
> So I managed to dnf install /usr/bin/engine-setup.  When I ran it, it
> wanted to connect to an external PostgreSQL server.  I'm pretty sure
> that never happened last time.
>
> TBH I'm also going to erase everything and start again because I've
> been round several loops here already.
>
> Do we recommend RHEL or CentOS as the initial OS?

For 4.4.10 your best option is RHEL 8.5.

I'm not sure about the status of the RHEL clones (e.g. Rocky). In the past
they were missing the advanced virtualization packages, and could be
installed using the CentOS advanced virtualization packages, which are probably
not available now (EOL).

If you cannot use RHEL, CentOS Stream 8 is the only option. This is the
version we test on in oVirt system tests.

I don't know about any issue with CentOS Stream 8 for the engine host, but for
hosts you are stuck with qemu-6.0, since qemu-6.1 is broken, and 6.2 is
not available.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/S4BYUHWLBWVPFPEUX6HBZPAA3OMW5V57/


[ovirt-users] Re: What happened to oVirt engine-setup?

2022-01-25 Thread Nir Soffer
On Tue, Jan 25, 2022 at 5:46 PM Richard W.M. Jones  wrote:
>
> A while back I had oVirt 4.4.7 installed which I used for testing.
> For some reason that installation has died in some way, so I'm trying
> to install a fresh new oVirt 4.4.10.
>
> Last time I installed ovirt, it was very easy - I provisioned a couple
> of machines, ran engine-setup in one, answered a few questions and
> after a few minutes the engine was installed.
>
> Somehow this has changed and now it's really far more complicated,
> involving some ansible things and wanting to create VMs and ssh
> everywhere.
>
> Can I go back to the old/easy way of installing oVirt engine?  And if
> so, what happened to the instructions for that?

engine-setup still works, maybe you can give more details on what went wrong?

When working with the current development version, sometimes upgrading
the engine fails and requires manual steps, or dropping it and reinstalling. But
since you use a stable version it should always work.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/3FFANMC5T2W5RN5Q5DOX72GF4Z4FPXIP/


[ovirt-users] Re: Error while removing snapshot: Unable to get volume info

2022-01-10 Thread Nir Soffer
On Mon, Jan 10, 2022 at 5:22 PM Francesco Lorenzini via Users <
users@ovirt.org> wrote:

> My problem should be the same as the one filed here:
> https://bugzilla.redhat.com/show_bug.cgi?id=1948599
>
> So, if I'm correct I must edit DB entries to fix the situations. Although
> I don't like to operate directly the DB, I'll try that and let you know if
> I resolve it.
>

It looks like the volume on the vdsm side was already removed, so when the
engine tries to merge, the merge fails.

This is an engine bug - it should handle this case and remove the illegal
snapshot in the db. But since it does not, you have to do this manually.

Please file an engine bug for this issue.


>
> In the meanwhile, if anyone has any tips or suggestion that doesn't
> involve editing the DB, much appreciate it.
>

I don't think there is another way.

Nir


>
> Regards,
> Francesco
>
> Il 10/01/2022 10:33, francesco--- via Users ha scritto:
>
> Hi all,
>
> I'm trying to remove a snapshot from a HA VM in a setup with glusterfs (2 
> nodes C8 stream oVirt 4.4 + 1 arbiter C8). The error that appears in the vdsm 
> log of the host is:
>
> 2022-01-10 09:33:03,003+0100 ERROR (jsonrpc/4) [api] FINISH merge error=Merge 
> failed: {'top': '441354e7-c234-4079-b494-53fa99cdce6f', 'base': 
> 'fdf38f20-3416-4d75-a159-2a341b1ed637', 'job': 
> '50206e3a-8018-4ea8-b191-e4bc859ae0c7', 'reason': 'Unable to get volume info 
> for domain 574a3cd1-5617-4742-8de9-4732be4f27e0 volume 
> 441354e7-c234-4079-b494-53fa99cdce6f'} (api:131)
> Traceback (most recent call last):
>   File "/usr/lib/python3.6/site-packages/vdsm/virt/livemerge.py", line 285, 
> in merge
> drive.domainID, drive.poolID, drive.imageID, job.top)
>   File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 5988, in 
> getVolumeInfo
> (domainID, volumeID))
> vdsm.virt.errors.StorageUnavailableError: Unable to get volume info for 
> domain 574a3cd1-5617-4742-8de9-4732be4f27e0 volume 
> 441354e7-c234-4079-b494-53fa99cdce6f
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 124, in 
> method
> ret = func(*args, **kwargs)
>   File "/usr/lib/python3.6/site-packages/vdsm/API.py", line 776, in merge
> drive, baseVolUUID, topVolUUID, bandwidth, jobUUID)
>   File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 5833, in merge
> driveSpec, baseVolUUID, topVolUUID, bandwidth, jobUUID)
>   File "/usr/lib/python3.6/site-packages/vdsm/virt/livemerge.py", line 288, 
> in merge
> str(e), top=top, base=job.base, job=job_id)
>
> The volume list in the host differs from the engine one:
>
> HOST:
>
> vdsm-tool dump-volume-chains 574a3cd1-5617-4742-8de9-4732be4f27e0 | grep -A10 
> 0b995271-e7f3-41b3-aff7-b5ad7942c10d
>image:0b995271-e7f3-41b3-aff7-b5ad7942c10d
>
>  - fdf38f20-3416-4d75-a159-2a341b1ed637
>status: OK, voltype: INTERNAL, format: COW, legality: LEGAL, 
> type: SPARSE, capacity: 53687091200, truesize: 44255387648
>
>  - 10df3adb-38f4-41d1-be84-b8b5b86e92cc
>status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: 
> SPARSE, capacity: 53687091200, truesize: 7335407616
>
> ls -1 0b995271-e7f3-41b3-aff7-b5ad7942c10d
> 10df3adb-38f4-41d1-be84-b8b5b86e92cc
> 10df3adb-38f4-41d1-be84-b8b5b86e92cc.lease
> 10df3adb-38f4-41d1-be84-b8b5b86e92cc.meta
> fdf38f20-3416-4d75-a159-2a341b1ed637
> fdf38f20-3416-4d75-a159-2a341b1ed637.lease
> fdf38f20-3416-4d75-a159-2a341b1ed637.meta
>
>
> ENGINE:
>
> engine=# select * from images where 
> image_group_id='0b995271-e7f3-41b3-aff7-b5ad7942c10d';
> -[ RECORD 1 ]-+-
> image_guid| 10df3adb-38f4-41d1-be84-b8b5b86e92cc
> creation_date | 2022-01-07 11:23:43+01
> size  | 53687091200
> it_guid   | ----
> parentid  | 441354e7-c234-4079-b494-53fa99cdce6f
> imagestatus   | 1
> lastmodified  | 2022-01-07 11:23:39.951+01
> vm_snapshot_id| bd2291a4-8018-4874-a400-8d044a95347d
> volume_type   | 2
> volume_format | 4
> image_group_id| 0b995271-e7f3-41b3-aff7-b5ad7942c10d
> _create_date  | 2022-01-07 11:23:41.448463+01
> _update_date  | 2022-01-07 11:24:10.414777+01
> active| t
> volume_classification | 0
> qcow_compat   | 2
> -[ RECORD 2 ]-+-
> image_guid| 441354e7-c234-4079-b494-53fa99cdce6f
> creation_date | 2021-12-15 07:16:31.647+01
> size  | 53687091200
> it_guid   | ----
> parentid  | fdf38f20-3416-4d75-a159-2a341b1ed637
> imagestatus   | 1
> lastmodified  | 2022-01-07 11:23:41.448+01
> vm_snapshot_id| 2d610958-59e3-4685-b209-139b4266012f
> volume_type   | 2
> 

[ovirt-users] Re: using stop_reason as a vdsm hook trigger into the UI

2021-12-20 Thread Nir Soffer
On Mon, Dec 20, 2021 at 9:59 PM Nathanaël Blanchet  wrote:

Adding the devel list since the question is more about extending oVirt
...
> The idea is to use the stop_reason element into the vm xml definition. But 
> after hours, I realized that this element is writed to the vm definition file 
> only after the VM has been destroyed.

So you want to run the clean hook only if stop reason == "clean"?

I think the way to integrate hooks is to define a custom property
in the vm, and check if the property was defined in the hook.

For example, this is how the localdisk hook is triggered:

def main():
backend = os.environ.get('localdisk')
if backend is None:
return
if backend not in [BACKEND_LVM, BACKEND_LVM_THIN]:
hooking.log("localdisk-hook: unsupported backend: %r" % backend)
return
...

The hook runs only if the environment variable "localdisk" is defined
and configured properly.

vdsm defines the custom properties as environment variables.

On the engine side, you need to add a user defined property:

 engine-config -s UserDefinedVMProperties='localdisk=^(lvm|lvmthin)$'

And configure a custom property with one of the allowed values, like:

localdisk=lvm

See vdsm_hooks/localdisk/README for more info.
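
For your case, something like this on the engine side could work (a hedged
example; "cleanup" is a hypothetical property name, and note that -s replaces
the whole UserDefinedVMProperties value, so include any properties you already
use):

    engine-config -s UserDefinedVMProperties='cleanup=^(true|false)$'
    systemctl restart ovirt-engine

Then the hook can check os.environ.get('cleanup'), like the localdisk example
above checks the "localdisk" property.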

If you want to control the cleanup by adding a "clean" stop reason only when
needed, this will not help, and a vdsm hook is probably not the right way
to integrate this.

If your intent is to clean a VM on some special events, but you want to
integrate this in the engine, maybe you should write an engine UI plugin?

The plugin can show the running vms, and provide a clean button that will
shut down the vm and run your custom code.

But maybe you don't need to integrate this in the engine at all; a simple
script using the oVirt engine API/SDK to shut down the VM and run the cleanup
code may be enough.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/V2RYRKKGEPK7PASKYDLD6WZ5F2G6P4KH/


[ovirt-users] Re: vdsm-client delete_checkpoints

2021-12-20 Thread Nir Soffer
On Mon, Dec 20, 2021 at 7:42 PM Tommaso - Shellrent via Users
 wrote:
>
> Hi, someone can give to use us an exemple of the command
>
> vdsm-client VM delete_checkpoints
>
> ?
>
> we have tried a lot of combinations like:
>
> vdsm-client VM delete_checkpoints vmID="ce5d0251-e971-4d89-be1b-4bc28283614c" 
> checkpoint_ids=["e0c56289-bfb3-4a91-9d33-737881972116"]

Why do you need to use this with vdsm?

You should use the oVirt engine API or SDK. We have an example script here:
https://github.com/oVirt/python-ovirt-engine-sdk4/blob/main/examples/remove_checkpoint.py

$ ./remove_checkpoint.py -h
usage: remove_checkpoint.py [-h] -c CONFIG [--debug] [--logfile
LOGFILE] vm_uuid checkpoint_uuid

Remove VM checkpoint

positional arguments:
  vm_uuid   VM UUID for removing checkpoint.
  checkpoint_uuid   The removed checkpoint UUID.

options:
  -h, --helpshow this help message and exit
  -c CONFIG, --config CONFIG
Use engine connection details from [CONFIG]
section in ~/.config/ovirt.conf.
  --debug   Log debug level messages to logfile.
  --logfile LOGFILE Log file name (default example.log).

Regardless, if you really need to use vdsm client, here is how to use
the client correctly.

$ vdsm-client VM delete_checkpoints -h
usage: vdsm-client VM delete_checkpoints [-h] [arg=value [arg=value ...]]

positional arguments:
  arg=value   vmID: A UUID of the VM
  checkpoint_ids: List of checkpoints ids to delete,
ordered from oldest to newest.


  JSON representation:
  {
  "vmID": {
  "UUID": "UUID"
  },
  "checkpoint_ids": [
  "string",
  {}
  ]
  }

optional arguments:
  -h, --help  show this help message and exit

The vdsm-client is a tool for developers and support, not for users, so we did
not invest in creating an easy to use command line interface. Instead we made
sure that this tool needs zero maintenance - we never have to change it when
we add or change APIs, because it simply gets a json from the user and
passes it to vdsm as is.

So when you see the JSON representation, you can build the json request
like this:

$ cat args.json
{
"vmID": "6e95d38f-d9b8-4955-878c-da6d631d0ab2",
"checkpoint_ids": ["b8f3f8e0-660e-49e5-bbb0-58a87ed15b13"]
}

You need to run the command with the -f flag:

$ sudo vdsm-client -f args.json VM delete_checkpoints
{
"checkpoint_ids": [
"b8f3f8e0-660e-49e5-bbb0-58a87ed15b13"
]
}

If you need to automate this, it will be easier to write a python script using
the vdsm client library directly:

$ sudo python3
Python 3.6.8 (default, Oct 15 2021, 10:57:33)
[GCC 8.5.0 20210514 (Red Hat 8.5.0-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from vdsm import client
>>> c = client.connect("localhost")
>>> c.VM.delete_checkpoints(vmID="6e95d38f-d9b8-4955-878c-da6d631d0ab2", 
>>> checkpoint_ids=["b8f3f8e0-660e-49e5-bbb0-58a87ed15b13"])
{'checkpoint_ids': ['b8f3f8e0-660e-49e5-bbb0-58a87ed15b13']}


Like vdsm-client, the library is meant for oVirt developers, and the vdsm API
is a private implementation detail of oVirt, so you should use the public
oVirt engine API instead.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/3FCONBNYTX3KHBGGJ5RSZFZCGDSBQWEM/


[ovirt-users] Re: new host addition, Cannot find master domain

2021-12-08 Thread Nir Soffer
On Tue, Dec 7, 2021 at 10:04 AM david  wrote:
>
> hello
> i'm adding a new host into ovirt
> after the installation procces is finished I got an error: "VDSM kvm6 command 
> ConnectStoragePoolVDS failed: Cannot find master domain".
> on the vdsm server in /dev/disk/by-path/ i see the block device but 
> device-mapper not mappying it
> then I found the master's lun in the blacklist: 
> /etc/multipath/conf.d/vdsm_blacklist.conf
> why vdsm host put the lun wwn in blacklist

When adding a host, we run:

vdsm-tool config-lvm-filter

This command put the LUN in the blacklist since it seems to be a local
disk used by the host. Did you have active logical volumes from this LUN
mounted on the host while the host was added to engine?

After you fix the issue (see Vojta reply), please run again:

   vdsm-tool config-lvm-filter

The command may suggest changing the LVM filter and blacklisting the device.
Do not confirm; instead, share the output.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/NEO73WJYGVE7ZZDHOTFV4F32ZBXS3ZCQ/


[ovirt-users] Re: Attaching LVM logical volume to VM

2021-11-28 Thread Nir Soffer
On Sun, Nov 28, 2021 at 2:06 PM  wrote:
>
> Thanks for your reply. Greatly appreciated.
>
> > The oVirt way is to create a block storage domain, and create a raw
> > preallocated disk.
> > This will attach a logical volume to the VM.
>
> Problem is that I already have an existing LV that was used before.
> But I guess if I mount that LV on the host I could copy all the data over 
> through scp or similar to the VM with the newly created LV (which requires me 
> to have double the disk space).
>
> Do I understand correctly that a designated VG is used and LVs are created in 
> it and those are exposed as iSCSI volumes and mounted as such in the guest OS?
>

No. It works like this:

1. You configure an iSCSI server (e.g. using targetcli, see the sketch after
   this list) and export one or more LUNs
2. You create an iSCSI storage domain in oVirt, connecting to your iSCSI target
3. You select the LUN(s) that should be part of this storage domain
4. oVirt creates a VG with the specified LUN(s)
5. oVirt creates special logical volumes for management (~5G)
6. You create a new disk in oVirt on this storage domain; to use the same raw
   LV as you used before, use "allocation policy: preallocated"
7. oVirt creates a logical volume in the VG
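
A rough sketch of step 1 with targetcli (hedged; the backing device, target
IQN, and initiator IQN are made up, and the initiator IQN of each host can be
found in /etc/iscsi/initiatorname.iscsi):

    targetcli /backstores/block create name=disk1 dev=/dev/vg0/lv1
    targetcli /iscsi create iqn.2021-11.com.example:target1
    targetcli /iscsi/iqn.2021-11.com.example:target1/tpg1/luns create /backstores/block/disk1
    targetcli /iscsi/iqn.2021-11.com.example:target1/tpg1/acls create iqn.1994-05.com.redhat:host1
    targetcli saveconfig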

So if you want to use existing LV content, you need to create the disk
in oVirt, and then
you can copy the logical volume from the original LV to the new disk:

You need to find the logical volume created for the VM disk, which can be done using:

   lvs -o lv_name,tags storage-domain-id | grep disk-id

Then you need to activate both LVs, and copy the image:

   lvchange -ay old-vg/old-lv
   lvchange -ay storage-domain-id/lv-name
   qemu-img convert -p -f raw -O raw -t none -T none -n -W \
       /dev/old-vg/old-lv /dev/storage-domain-id/lv-name
   lvchange -an old-vg/old-lv
   lvchange -an storage-domain-id/lv-name

> The disk space is already allocated to VolumeGroups so I'd have to fiddle to 
> be able to get an extra VG for that storage domain unless I add a disk.

Note that the iSCSI storage domain must be accessible to all hosts in
the data center.
If not you will not be able to activate all the hosts.

Keeping the storage on one of the hypervisors is possible but not recommended.

> I prefer to keep it simple. Perhaps I should create an NFS share and mount 
> that in the guest but then again I'm not that fond of putting it on an NFS 
> share.
>
> >
> > The difference is that the logical volume is served by iSCSI or FC
> > storage server,
> > and you can migrate the VM between hosts.
> >
> > If you really want to use local host storage, the best way is to
> > create a local DC
> > that will include only this host, and use raw preallocated file on the
> > local storage
> > domain.
>
> That would kind of defeat the purpose for me. With the setup I had I could 
> directly mount the LV on the host.
> Perhaps I misunderstand what you're saying here.

oVirt does not support attaching LVs or a local file directly to VMs.

You can also continue to use libvirt and qemu to manage this VM. oVirt will see
it as an external VM running on the host, and it should not try to manage it.

The feature you need (attaching a logical volume to a VM) does not exist, so you
need to either use what the system provides, or manage the VM outside of the
system.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/VZGX6VBATSBPIHLGVI5LCTAFIXYMMHJ5/


[ovirt-users] Re: Attaching LVM logical volume to VM

2021-11-28 Thread Nir Soffer
On Sun, Nov 28, 2021 at 12:37 PM  wrote:
>
> Hi,
>
> I've just migrated from Libvirt/kvm to oVirt.
> In libvirt I had a VM that had an LVM logical volume that was attached to a 
> guest as a disk.
>
> However in oVirt I can't immediately find such a capability. I understand 
> that this would "pin" my VM to this host but that's perfectly fine.

Attaching a local logical volume is not supported.

> Any pointers how this can be done?

The oVirt way is to create a block storage domain, and create a raw preallocated
disk. This will attach a logical volume to the VM.

The difference is that the logical volume is served by an iSCSI or FC storage
server, and you can migrate the VM between hosts.

If you really want to use local host storage, the best way is to create a local
DC that will include only this host, and use a raw preallocated file on the
local storage domain. The performance of LVM, or XFS on LVM, is almost the same,
and some operations like fallocate(PUNCH_HOLE) can be 100 times faster on XFS.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/GS4OMA2ELI35K5RQ2CB6ZZH4DNLFB532/


[ovirt-users] Re: ILLEGAL volume delete via vdsm-client

2021-11-17 Thread Nir Soffer
On Wed, Nov 17, 2021 at 12:14 PM Francesco Lorenzini
 wrote:
>
> It worked now, my bad.
>
> I mistakenly thought that the Storage Pool ID was the ID of the SPM Host and 
> not the one listed under /rhev/data-center/.
>
> I used the correct SPUUID and everything worked correctly:
>
> vdsm-client Volume delete storagepoolID=609ff8db-09c5-435b-b2e5-023d57003138 
> volumeID=5cb3fe58-3e01-4d32-bc7c-5907a4f858a8 
> storagedomainID=e25db7d0-060a-4046-94b5-235f38097cd8 imageID=4d
> 79c1da-34f0-44e3-8b92-c4bcb8524d83 force=true postZero=False
> "d85d0a83-fec0-4472-87fd-e61c8c3e0608"
>
> Removed the task ID d85d0a83-fec0-4472-87fd-e61c8c3e0608 by  vdsm-client Task 
> clear taskID=d85d0a83-fec0-4472-87fd-e61c8c3e0608
>
> Sorry for the waste of time and thank you.

No problem, we are happy to help.

I think the error message could easily be improved - this is the relevant code:

@classmethod
def getPool(cls, spUUID):
if cls._pool.is_connected() and cls._pool.spUUID == spUUID:
return cls._pool

# Calling when pool is not connected or with wrong pool id is client
# error.
raise exception.expected(se.StoragePoolUnknown(spUUID))

We should really report different errors for unconnected pool and
incorrect pool id,
for example:

Incorrect pool id 'c0e7a0c5-8048-4f30-af08-cbd17d797e3b', connected to
pool '609ff8db-09c5-435b-b2e5-023d57003138'

If you think this is useful please file a bug to improve this.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/QMEQQB5KA2YBK4RH6J4FPJ4FM76KA45R/


[ovirt-users] Re: ILLEGAL volume delete via vdsm-client

2021-11-17 Thread Nir Soffer
On Wed, Nov 17, 2021 at 10:32 AM Francesco Lorenzini <
france...@shellrent.com> wrote:

> Hi Nir,
>
> sadly in the vdsm log there aren't more useful info:
>
> 2021-11-17 09:27:59,419+0100 INFO  (Reactor thread)
> [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:56392
> (protocoldetector:61)
> 2021-11-17 09:27:59,426+0100 WARN  (Reactor thread) [vds.dispatcher]
> unhandled write event (betterAsyncore:184)
> 2021-11-17 09:27:59,427+0100 INFO  (Reactor thread)
> [ProtocolDetector.Detector] Detected protocol stomp from ::1:56392
> (protocoldetector:125)
> 2021-11-17 09:27:59,427+0100 INFO  (Reactor thread) [Broker.StompAdapter]
> Processing CONNECT request (stompserver:95)
> 2021-11-17 09:27:59,427+0100 INFO  (JsonRpc (StompReactor))
> [Broker.StompAdapter] Subscribe command received (stompserver:124)
> 2021-11-17 09:27:59,448+0100 INFO  (jsonrpc/0) [vdsm.api] START
> deleteVolume(sdUUID='e25db7d0-060a-4046-94b5-235f38097cd8',
> spUUID='c0e7a0c5-8048-4f30-af08-cbd17d797e3b',
> imgUUID='4d79c1da-34f0-44e3-8b92-c4bcb8524d83',
> volumes=['5cb3fe58-3e01-4d32-bc7c-5907a4f858a8'], postZero='False',
> force='true', discard=False) from=::1,56392,
> task_id=33647125-f8b5-494e-a8e8-17a6fc8aad4d (api:48)
> 2021-11-17 09:27:59,448+0100 INFO  (jsonrpc/0) [vdsm.api] FINISH
> deleteVolume error=Unknown pool id, pool not connected:
> ('c0e7a0c5-8048-4f30-af08-cbd17d797e3b',) from=::1,56392,
> task_id=33647125-f8b5-494e-a8e8-17a6fc8aad4d (api:52)
> 2021-11-17 09:27:59,448+0100 INFO  (jsonrpc/0) [storage.TaskManager.Task]
> (Task='33647125-f8b5-494e-a8e8-17a6fc8aad4d') aborting: Task is aborted:
> "value=Unknown pool id, pool not connected:
> ('c0e7a0c5-8048-4f30-af08-cbd17d797e3b',) abortedcode=309" (task:1182)
> 2021-11-17 09:27:59,448+0100 INFO  (jsonrpc/0) [storage.Dispatcher] FINISH
> deleteVolume error=Unknown pool id, pool not connected:
> ('c0e7a0c5-8048-4f30-af08-cbd17d797e3b',) (dispatcher:81)
> 2021-11-17 09:27:59,448+0100 INFO  (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC
> call Volume.delete failed (error 309) in 0.00 seconds (__init__:312)
>
>
> I know the vdsm-client is maintained by the infra team, should I open a
> new thread there?
>

No, this is not a vdsm-client issue.

Can you share a complete vdsm log from this host?

The only possible reasons this can fail are:
- the pool id is incorrect (do you have multiple DCs?)
- the host is not connected to storage (is the host UP in engine?)

Correction for my previous reply - the fact that Volume.getInfo() works does
not prove that the pool is correct or that the host is connected to storage.
This call does not check the pool id and does not depend on the host being
active.

If the host is connected to the pool, you should see:

# ls -lh /rhev/data-center/c0e7a0c5-8048-4f30-af08-cbd17d797e3b/mastersd/
total 8.0K
drwxr-xr-x.  2 vdsm kvm  103 Jun  4 01:49 dom_md
drwxr-xr-x. 67 vdsm kvm 4.0K Nov 16 16:37 images
drwxr-xr-x.  5 vdsm kvm 4.0K Oct 10 13:33 master

Francesco
>
> Il 16/11/2021 18:19, Nir Soffer ha scritto:
>
> On Tue, Nov 16, 2021 at 4:07 PM francesco--- via Users  
>  wrote:
>
> Hi all,
>
> I'm trying to delete via vdsm-client toolan illegal volume that is not listed 
> in the engine database. The volume ID is 5cb3fe58-3e01-4d32-bc7c-5907a4f858a8:
>
> [root@ovirthost ~]# vdsm-tool dump-volume-chains 
> e25db7d0-060a-4046-94b5-235f38097cd8
>
> Images volume chains (base volume first)
>
>image:4d79c1da-34f0-44e3-8b92-c4bcb8524d83
>
>  Error: more than one volume pointing to the same parent volume 
> e.g: (_BLANK_UUID<-a), (a<-b), (a<-c)
>
>  Unordered volumes and children:
>
>  - ---- <- 
> 5aad30c7-96f0-433d-95c8-2317e5f80045
>status: OK, voltype: INTERNAL, format: COW, legality: LEGAL, 
> type: SPARSE, capacity: 214748364800, truesize: 165493616640
>
>  - 5aad30c7-96f0-433d-95c8-2317e5f80045 <- 
> 5cb3fe58-3e01-4d32-bc7c-5907a4f858a8
>status: OK, voltype: LEAF, format: COW, legality: ILLEGAL, 
> type: SPARSE, capacity: 214748364800, truesize: 8759619584
>
>  - 5aad30c7-96f0-433d-95c8-2317e5f80045 <- 
> 674e85d8-519a-461f-9dd6-aca44798e088
>status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: 
> SPARSE, capacity: 214748364800, truesize: 200704
>
> With the command vdsm-client Volume getInfo I can retrieve the info about the 
> volume 5cb3fe58-3e01-4d32-bc7c-5907a4f858a8:
>
>  vdsm-client Volume getInfo 
> storagepoolID=c0e7a0c5-8048-4f30-af08-cbd17d797e3b 
> volumeID=5cb3fe58-3e01-4d32-bc7c-5907a4f858a8 
> storagedomainID=e25db7d0-060a-4046-94b5-235f38097cd8 
> imageID=4d79c1da-34f0-44e3-8b92-c4bcb8524d83
> {
> &

[ovirt-users] Re: ILLEGAL volume delete via vdsm-client

2021-11-16 Thread Nir Soffer
On Tue, Nov 16, 2021 at 4:07 PM francesco--- via Users  wrote:
>
> Hi all,
>
> I'm trying to delete via vdsm-client toolan illegal volume that is not listed 
> in the engine database. The volume ID is 5cb3fe58-3e01-4d32-bc7c-5907a4f858a8:
>
> [root@ovirthost ~]# vdsm-tool dump-volume-chains 
> e25db7d0-060a-4046-94b5-235f38097cd8
>
> Images volume chains (base volume first)
>
>image:4d79c1da-34f0-44e3-8b92-c4bcb8524d83
>
>  Error: more than one volume pointing to the same parent volume 
> e.g: (_BLANK_UUID<-a), (a<-b), (a<-c)
>
>  Unordered volumes and children:
>
>  - ---- <- 
> 5aad30c7-96f0-433d-95c8-2317e5f80045
>status: OK, voltype: INTERNAL, format: COW, legality: LEGAL, 
> type: SPARSE, capacity: 214748364800, truesize: 165493616640
>
>  - 5aad30c7-96f0-433d-95c8-2317e5f80045 <- 
> 5cb3fe58-3e01-4d32-bc7c-5907a4f858a8
>status: OK, voltype: LEAF, format: COW, legality: ILLEGAL, 
> type: SPARSE, capacity: 214748364800, truesize: 8759619584
>
>  - 5aad30c7-96f0-433d-95c8-2317e5f80045 <- 
> 674e85d8-519a-461f-9dd6-aca44798e088
>status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: 
> SPARSE, capacity: 214748364800, truesize: 200704
>
> With the command vdsm-client Volume getInfo I can retrieve the info about the 
> volume 5cb3fe58-3e01-4d32-bc7c-5907a4f858a8:
>
>  vdsm-client Volume getInfo 
> storagepoolID=c0e7a0c5-8048-4f30-af08-cbd17d797e3b 
> volumeID=5cb3fe58-3e01-4d32-bc7c-5907a4f858a8 
> storagedomainID=e25db7d0-060a-4046-94b5-235f38097cd8 
> imageID=4d79c1da-34f0-44e3-8b92-c4bcb8524d83
> {
> "apparentsize": "8759676160",
> "capacity": "214748364800",
> "children": [],
> "ctime": "1634958924",
> "description": "",
> "disktype": "DATA",
> "domain": "e25db7d0-060a-4046-94b5-235f38097cd8",
> "format": "COW",
> "generation": 0,
> "image": "4d79c1da-34f0-44e3-8b92-c4bcb8524d83",
> "lease": {
> "offset": 0,
> "owners": [],
> "path": 
> "/rhev/data-center/mnt/ovirthost.com:_data/e25db7d0-060a-4046-94b5-235f38097cd8/images/4d79c1da-34f0-44e3-8b92-c4bcb8524d83/5cb3fe58-3e01-4d32-bc7c-5907a4f858a8.lease",
> "version": null
> },
> "legality": "ILLEGAL",
> "mtime": "0",
> "parent": "5aad30c7-96f0-433d-95c8-2317e5f80045",
> "pool": "",
> "status": "ILLEGAL",
> "truesize": "8759619584",
> "type": "SPARSE",
> "uuid": "5cb3fe58-3e01-4d32-bc7c-5907a4f858a8",
> "voltype": "LEAF"
> }
>
> I can't remove it due to the following error:
>
> vdsm-client Volume delete storagepoolID=c0e7a0c5-8048-4f30-af08-cbd17d797e3b 
> volumeID=5cb3fe58-3e01-4d32-bc7c-5907a4f858a8 
> storagedomainID=e25db7d0-060a-4046-94b5-235f38097cd8 
> imageID=4d79c1da-34f0-44e3-8b92-c4bcb8524d83 force=true
> vdsm-client: Command Volume.delete with args {'storagepoolID': 
> 'c0e7a0c5-8048-4f30-af08-cbd17d797e3b', 'volumeID': 
> '5cb3fe58-3e01-4d32-bc7c-5907a4f858a8', 'storagedomainID': 
> 'e25db7d0-060a-4046-94b5-235f38097cd8', 'imageID': 
> '4d79c1da-34f0-44e3-8b92-c4bcb8524d83', 'force': 'true'} failed:
> (code=309, message=Unknown pool id, pool not connected: 
> ('c0e7a0c5-8048-4f30-af08-cbd17d797e3b',))

If the pool id works in Volume getInfo, it should be correct.

>
> I'm performing the operation directly on the SPM. I searched for a while but 
> I didn't find anything usefull. Any tips or doc that I missed?

What you do looks right. Did you look in vdsm.log? I'm sure there is
more info there
about the error.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/4GKETEMAYDMHF2KU6T6FFDSX7KAN6EXV/


[ovirt-users] Re: cloning a VM or creating a template speed is so so slow

2021-11-14 Thread Nir Soffer
On Thu, Nov 11, 2021 at 4:33 AM Pascal D  wrote:
>
> I have been trying to figure out why cloning a VM and creating a template 
> from ovirt is so slow. I am using ovirt 4.3.10 over NFS. My NFS server is 
> running NFS 4 over RAID10 with SSD disks over a 10G network and 9000 MTU
>
> Therocially I should be writing a 50GB file in around 1m30s
> a direct copy from the SPM host server of an image to another image on the 
> same host takes 6m34s
> a cloning from ovirt takes around 29m
>
> So quite a big difference. Therefore I started investigating and found that 
> ovirt launches a qemu-img process with no source and target cache. Therefore 
> thinking that could be the issue, I change the cache mode to writeback and 
> was able to run the exact command in 8m14s. Over 3 times faster. I haven't 
> tried yet other parameters line -o preallocation=metadata

-o preallocation=metadata may work for files, we don't use it since it is not
compatible with block storage (requires allocation of the entire
volume upfront).

> but was wondering why no cache was selected and how to change it to use cache 
> writeback

We don't use the host page cache. There are several issues:

- reading stale data after another host changes an image on shared storage;
  this should probably not happen with NFS.

- writing to the page cache pollutes it with data that is unlikely to be
  needed, since VMs also do not use the page cache (for other reasons), so
  the copy may reclaim memory that should be used by your VMs.

- the kernel likes to buffer huge amounts of data and flush too much data at
  the same time. This causes delays in accessing storage during flushing,
  which may break sanlock leases that must have access to storage to update
  the storage leases.

We improved copy performance a few years ago using the -W option, allowing
concurrent writes. This can speed up copying to block storage (iSCSI/FC)
up to 6 times [1].

When we tested this with NFS, we did not see a big improvement, so we did not
enable it. It is also recommended to use -W only for raw preallocated disks,
since it may cause fragmentation otherwise.

You can try to change this in vdsm/storage/sd.py:

    def recommends_unordered_writes(self, format):
        """
        Return True if unordered writes are recommended for copying an image
        using format to this storage domain.

        Unordered writes improve copy performance but are recommended only for
        preallocated devices and raw format.
        """
        return format == sc.RAW_FORMAT and not self.supportsSparseness

This allows -W only on raw preallocated disks. So it will not be used for
raw-sparse (NFS thin) or qcow2-sparse (snapshots on NFS), or for
qcow2 on block storage.

We use unordered writes for any disk in ovirt-imageio, and other tools
like nbdcopy
also always enable unordered writes, so maybe we should enable it in all cases.

To enable unordered writes for any volume, change this to:

    def recommends_unordered_writes(self, format):
        """
        Allow unordered writes on any storage, in any format.
        """
        return True

If you want to always enable this only for file storage (NFS, GlusterFS,
LocalFS, posix) add this method in vdsm/storage/nfsSD.py:

class FileStorageDomainManifest(sd.StorageDomainManifest):
    ...
    def recommends_unordered_writes(self, format):
        """
        Override StorageDomainManifest to also allow unordered writes on
        qcow2 and raw sparse images.
        """
        return True

Please report how it works for you.

If this gives good results, file a bug to enable the option.

I think we can enable this based on vdsm configuration, so it will be
easy to disable
the option if it causes trouble with some storage domain types or image formats.

> command launched by ovirt:
>  /usr/bin/qemu-img convert -p -t none -T none -f qcow2 
> /rhev/data-center/mnt/nas1.bfit:_home_VMS/8e6bea49-9c62-4e31-a3c9-0be09c2fcdbf/images/21f438fb-0c0e-4bdc-abb3-64a7e033cff6/c256a972-4328-4833-984d-fa8e62f76be8
>  -O qcow2 -o compat=1.1 
> /rhev/data-center/mnt/nas1.bfit:_home_VMS/8e6bea49-9c62-4e31-a3c9-0be09c2fcdbf/images/5a90515c-066d-43fb-9313-5c7742f68146/ed6dc60d-1d6f-48b6-aa6e-0e7fb1ad96b9

With the change suggested, this command will become:

/usr/bin/qemu-img convert -p -t none -T none -f qcow2
/rhev/data-center/mnt/nas1.bfit:_home_VMS/8e6bea49-9c62-4e31-a3c9-0be09c2fcdbf/images/21f438fb-0c0e-4bdc-abb3-64a7e033cff6/c256a972-4328-4833-984d-fa8e62f76be8
-O qcow2 -o compat=1.1 -W
/rhev/data-center/mnt/nas1.bfit:_home_VMS/8e6bea49-9c62-4e31-a3c9-0be09c2fcdbf/images/5a90515c-066d-43fb-9313-5c7742f68146/ed6dc60d-1d6f-48b6-aa6e-0e7fb1ad96b9

You can run this command in the shell, without modifying vdsm, to see how it
affects performance.
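
For example, comparing the two variants with time (SRC and DST standing for
the source and destination paths above):

    time qemu-img convert -p -t none -T none -f qcow2 SRC -O qcow2 -o compat=1.1 DST
    time qemu-img convert -p -t none -T none -f qcow2 SRC -O qcow2 -o compat=1.1 -W DST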

[1] https://bugzilla.redhat.com/1511891#c57

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: 

[ovirt-users] Re: Upgraded to oVirt 4.4.9, still have vdsmd memory leak

2021-11-14 Thread Nir Soffer
On Wed, Nov 10, 2021 at 4:46 PM Chris Adams  wrote:
>
> I have seen vdsmd leak memory for years (I've been running oVirt since
> version 3.5), but never been able to nail it down.  I've upgraded a
> cluster to oVirt 4.4.9 (reloading the hosts with CentOS 8-stream), and I
> still see it happen.  One host in the cluster, which has been up 8 days,
> has vdsmd with 4.3 GB resident memory.  On a couple of other hosts, it's
> around half a gigabyte.

Can you share vdsm logs from the time vdsm started?

We have these logs:

2021-11-14 15:16:32,956+0200 DEBUG (health) [health] Checking health (health:93)
2021-11-14 15:16:32,977+0200 DEBUG (health) [health] Collected 5001
objects (health:101)
2021-11-14 15:16:32,977+0200 DEBUG (health) [health] user=2.46%,
sys=0.74%, rss=108068 kB (-376), threads=47 (health:126)
2021-11-14 15:16:32,977+0200 INFO  (health) [health] LVM cache hit
ratio: 97.64% (hits: 5431 misses: 131) (health:131)

They may provide useful info on the leak.

You need to enable DEBUG logs for root logger in /etc/vdsm/logger.conf:

[logger_root]
level=DEBUG
handlers=syslog,logthread
propagate=0

and restart vdsmd service.
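
Once debug logging is enabled, you can follow the memory trend with something
like this (assuming the default log path /var/log/vdsm/vdsm.log):

    systemctl restart vdsmd
    grep 'rss=' /var/log/vdsm/vdsm.log | tail -20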

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/JDA34CQF5FTHVFTRXF4OGKEFJIKJL3NL/


  1   2   3   4   5   6   7   8   9   10   >