[ovirt-users] Unable to install OVirt 4.4.6 on IvyBridge

2021-11-14 Thread clin . yolo
Hi, I'm attempting to build an oVirt node on a server. The host is running RHEL 8.4, 
and I'm installing oVirt 4.4.6 per the instructions at 
https://www.ovirt.org/documentation/installing_ovirt_as_a_self-hosted_engine_using_the_command_line/index.html.
 I'm receiving the following error:

[ INFO  ] The host has been set in non_operational status, deployment errors:
    code 156: Host redacted.com moved to Non-Operational state as host CPU type is not supported in this cluster compatibility version or is not supported at all,
    code 9000: Failed to verify Power Management configuration for Host redacted.com.,
[ INFO  ] skipping: [localhost]
[ INFO  ] You can now connect to https://redacted.com:6900/ovirt-engine/ and check the status of this host and eventually remediate it, please continue only when the host is listed as 'up'

Now, I note that initially lm_sensors was unable to detect CPU temperatures; 
that was subsequently resolved (I don't recall how), but the issue still 
remains.

The CPU is a Xeon E5-2630 v2 (IvyBridge). I cannot find any definitive CPU 
support matrix, so I'm unsure whether this processor is no longer supported.

Within engine-logs-2021-11-13T12:22:10Z/log/ovirt-engine/engine.log, there were 
occurrences of `IvyBridge-IBRS`. From what I can tell, the Spectre/Meltdown 
bugs have been mitigated on this host:

```
egrep -e "model|cpu family|stepping|microcode" /proc/cpuinfo | sort | uniq
cpu family  : 6
microcode   : 0x42e
model   : 62
model name  : Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz
stepping: 4
```
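
For reference, the kernel's own report of the mitigation status, and the CPU 
flags vdsm passes to the engine, can be inspected as follows (assuming the 
standard sysfs vulnerabilities interface in RHEL 8 kernels, and that 
vdsm-client is installed, as on any oVirt host):

```
# Kernel-reported mitigation status, one file per vulnerability
grep . /sys/devices/system/cpu/vulnerabilities/*

# CPU model and flags as vdsm reports them to the engine
vdsm-client Host getCapabilities | grep -E 'cpuModel|cpuFlags'
```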

In the installation documents, IvyBridge seems supported, but the evidence 
above suggests it may not be. Can anyone advise on whether this processor is 
_actually_ supported, and if so, how I can remediate the issue? Thanks!


[ovirt-users] Re: cloning a VM or creating a template speed is so so slow

2021-11-14 Thread Nir Soffer
On Thu, Nov 11, 2021 at 4:33 AM Pascal D  wrote:
>
> I have been trying to figure out why cloning a VM and creating a template 
> from ovirt is so slow. I am using ovirt 4.3.10 over NFS. My NFS server is 
> running NFS 4 over RAID10 with SSD disks over a 10G network and 9000 MTU
>
> Theoretically I should be writing a 50GB file in around 1m30s.
> A direct copy from the SPM host of an image to another image on the same 
> host takes 6m34s.
> Cloning from oVirt takes around 29m.
>
> So quite a big difference. Therefore I started investigating and found that 
> oVirt launches a qemu-img process with no source and target cache. Thinking 
> that could be the issue, I changed the cache mode to writeback and was able 
> to run the exact command in 8m14s, over 3 times faster. I haven't yet tried 
> other parameters like -o preallocation=metadata

-o preallocation=metadata may work for files, but we don't use it since it is
not compatible with block storage (it requires allocating the entire volume
upfront).
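
For reference, a sketch of the variant being discussed, with src.qcow2 and
dst.qcow2 as placeholder paths (preallocation=metadata writes all qcow2
metadata upfront, so the copy does not have to grow the metadata as it goes):

    qemu-img convert -p -t none -T none -f qcow2 src.qcow2 \
        -O qcow2 -o compat=1.1,preallocation=metadata dst.qcow2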

> but was wondering why no cache was selected and how to change it to use cache 
> writeback

We don't use the host page cache. There are several issues:

- Reading stale data after another host changes an image on shared storage;
  this should probably not happen with NFS.

- Writing through the page cache pollutes it with data that is unlikely to be
  needed, since VMs also do not use the page cache (for other reasons), so a
  copy may reclaim memory that should be used by your VMs.

- The kernel likes to buffer huge amounts of data and then flush too much data
  at the same time. This causes delays in accessing storage during flushing,
  which may break sanlock leases: sanlock must be able to access storage to
  renew its leases.

We improved copy performance a few years ago using the -W option, allowing
concurrent writes. This can speed up copies to block storage (iSCSI/FC) up to
6 times [1].

When we tested this with NFS we did not see a big improvement, so we did not
enable it there. It is also recommended to use -W only for raw preallocated
disks, since otherwise it may cause fragmentation.

You can try to change this in vdsm/storage/sd.py:

    def recommends_unordered_writes(self, format):
        """
        Return True if unordered writes are recommended for copying an image
        using format to this storage domain.

        Unordered writes improve copy performance but are recommended only
        for preallocated devices and raw format.
        """
        return format == sc.RAW_FORMAT and not self.supportsSparseness

This allows -W only on raw preallocated disks, so it will not be used for
raw-sparse (NFS thin) or qcow2-sparse (snapshots on NFS), or for qcow2 on
block storage.

We use unordered writes for any disk in ovirt-imageio, and other tools like
nbdcopy also always enable unordered writes, so maybe we should enable it in
all cases.

To enable unordered writes for any volume, change this to:

    def recommends_unordered_writes(self, format):
        """
        Allow unordered writes on any storage, in any format.
        """
        return True

If you want to enable this only for file storage (NFS, GlusterFS, LocalFS,
POSIX), add this method in vdsm/storage/nfsSD.py:

    class FileStorageDomainManifest(sd.StorageDomainManifest):
        ...
        def recommends_unordered_writes(self, format):
            """
            Override StorageDomainManifest to also allow unordered writes
            on qcow2 and raw sparse images.
            """
            return True

Please report how it works for you.

If this gives good results, please file a bug to enable the option.

I think we can enable this based on vdsm configuration, so it will be easy to
disable the option if it causes trouble with some storage domain types or
image formats.
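
For illustration only, a hypothetical vdsm.conf drop-in; the option name below
is made up and does not exist in current vdsm:

    # /etc/vdsm/vdsm.conf.d/99-local.conf (hypothetical)
    [irs]
    # Made-up option: allow qemu-img convert -W for all copy operations.
    enable_unordered_writes = true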

> command launched by ovirt:
>  /usr/bin/qemu-img convert -p -t none -T none -f qcow2 
> /rhev/data-center/mnt/nas1.bfit:_home_VMS/8e6bea49-9c62-4e31-a3c9-0be09c2fcdbf/images/21f438fb-0c0e-4bdc-abb3-64a7e033cff6/c256a972-4328-4833-984d-fa8e62f76be8
>  -O qcow2 -o compat=1.1 
> /rhev/data-center/mnt/nas1.bfit:_home_VMS/8e6bea49-9c62-4e31-a3c9-0be09c2fcdbf/images/5a90515c-066d-43fb-9313-5c7742f68146/ed6dc60d-1d6f-48b6-aa6e-0e7fb1ad96b9

With the change suggested, this command will become:

/usr/bin/qemu-img convert -p -t none -T none -f qcow2
/rhev/data-center/mnt/nas1.bfit:_home_VMS/8e6bea49-9c62-4e31-a3c9-0be09c2fcdbf/images/21f438fb-0c0e-4bdc-abb3-64a7e033cff6/c256a972-4328-4833-984d-fa8e62f76be8
-O qcow2 -o compat=1.1 -W
/rhev/data-center/mnt/nas1.bfit:_home_VMS/8e6bea49-9c62-4e31-a3c9-0be09c2fcdbf/images/5a90515c-066d-43fb-9313-5c7742f68146/ed6dc60d-1d6f-48b6-aa6e-0e7fb1ad96b9

You can run this command in the shell, without modifying vdsm, to test how -W
affects performance.
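
For example, a minimal A/B comparison, where src.qcow2 and the destination
names are placeholders for images on your NFS domain:

    # Baseline: ordered writes, as vdsm runs it today
    time qemu-img convert -p -t none -T none -f qcow2 src.qcow2 \
        -O qcow2 -o compat=1.1 dst-ordered.qcow2

    # Same copy with unordered (concurrent) writes
    time qemu-img convert -p -t none -T none -f qcow2 src.qcow2 \
        -O qcow2 -o compat=1.1 -W dst-unordered.qcow2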

[1] https://bugzilla.redhat.com/1511891#c57

Nir

[ovirt-users] Re: cloning a VM or creating a template speed is so so slow

2021-11-14 Thread Pascal DeMilly
Anyone from Red Hat have any feedback on this? A 3x speed gain in cloning
or templating a VM makes a big difference IMO.

On Wed, Nov 10, 2021, 6:41 PM Pascal D  wrote:

> -o  preallocation=metadata brings it down to 7m40s


[ovirt-users] Re: Upgraded to oVirt 4.4.9, still have vdsmd memory leak

2021-11-14 Thread Nir Soffer
On Wed, Nov 10, 2021 at 4:46 PM Chris Adams  wrote:
>
> I have seen vdsmd leak memory for years (I've been running oVirt since
> version 3.5), but never been able to nail it down.  I've upgraded a
> cluster to oVirt 4.4.9 (reloading the hosts with CentOS 8-stream), and I
> still see it happen.  One host in the cluster, which has been up 8 days,
> has vdsmd with 4.3 GB resident memory.  On a couple of other hosts, it's
> around half a gigabyte.

Can you share vdsm logs from the time vdsm started?

We have these logs:

2021-11-14 15:16:32,956+0200 DEBUG (health) [health] Checking health (health:93)
2021-11-14 15:16:32,977+0200 DEBUG (health) [health] Collected 5001 objects (health:101)
2021-11-14 15:16:32,977+0200 DEBUG (health) [health] user=2.46%, sys=0.74%, rss=108068 kB (-376), threads=47 (health:126)
2021-11-14 15:16:32,977+0200 INFO  (health) [health] LVM cache hit ratio: 97.64% (hits: 5431 misses: 131) (health:131)

They may provide useful info on the leak.

You need to enable DEBUG logs for the root logger in /etc/vdsm/logger.conf:

[logger_root]
level=DEBUG
handlers=syslog,logthread
propagate=0

and restart the vdsmd service.
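
For example, assuming the default log location, you can then track the rss
values reported by the health monitor to see whether resident memory grows
steadily:

    systemctl restart vdsmd

    # Later: extract the resident-size samples from the health reports
    grep -o 'rss=[0-9]* kB' /var/log/vdsm/vdsm.log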

Nir