[ovirt-users] VMConsole: Certificate invalid: name is not a listed principal

2021-04-01 Thread Stefan Seifried
Hi,

I recently tried out the vm console feature on an ovirt eval environment 
running 4.4.4.7-1.el8:
1.) pasted the pub-key of my local user into the web interface (User->Options)
2.) connected via ssh like so
ssh -p  sevm -l ovirt-vmconsole
3.) got the list of all the VMs :)
4.) chose a VM with a virtio serial console enabled
5.) Certificate invalid :(
---
Please, enter the id of the Serial Console you want to connect to.
To disconnect from a Serial Console, enter the sequence: <~><.>
SELECT> 24
Certificate invalid: name is not a listed principal
Host key verification failed.
Connection to sevm closed.
---

I guess something's wrong in "/etc/pki/ovirt-vmconsole". Is there any additional 
information I can get out of the logs or the console proxy (e.g. the "name" 
which is not a listed principal)? 
To be honest I have never really worked with SSH certificates, so I fear I will 
do something stupid if I try to fix this head-on.
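
In case it helps anyone answer: ssh-keygen can dump an SSH certificate, 
including its "Principals:" list, and running the client with -vvv shows which 
certificate is offered and why it is rejected. Roughly like this (the exact 
*cert.pub file names under /etc/pki/ovirt-vmconsole are a guess on my part, and 
<proxy port> is whatever port the console proxy listens on):

# on the proxy/engine host: locate the certificates and show their principals
find /etc/pki/ovirt-vmconsole -name '*cert*'
ssh-keygen -L -f /etc/pki/ovirt-vmconsole/<one-of-the-cert.pub-files>

# on the client: verbose output shows the certificate/principal check
ssh -vvv -p <proxy port> sevm -l ovirt-vmconsole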

So any advice or help on this issue is appreciated.

Thanks,
Stefan
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ENHIQ3O5JDA7URHXLQCHGRICSFYFL3Y5/


[ovirt-users] Standard operating procedure for a node failure on HCI required

2021-04-01 Thread Thomas Hoberg
oVirt may have started as a vSphere 'look-alike', but it graduated to a Nutanix 
'clone', at least in terms of marketing.

IMHO that means the 3-node hyperconverged default oVirt setup (2 replicas and 1 
arbiter) deserves special love in terms of documenting failure scenarios. 

3-node HCI is supposed to defend you against long-term effects of any single 
point of failure. There is no protection against the loss of dynamic 
state/session data, but state-free services should recover or resume: that's 
what it's all about.

Sadly, what I find missing in the oVirt and Gluster documentation is an SOP 
(standard operating procedure) to follow after a late-night/early-morning 
on-call wakeup when one of those three HCI nodes has failed... whether 
dramatically or via a 'brown-out', e.g. where only the storage part was 
actually lost.

My impression is that the oVirt and Gluster teams are barely talking, but in 
HCI that's fatal.

And I sure can't find those recovery procedures, not even in the commercial RH 
documents.

So please, either add them or show me where I missed them.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/QZFFH2U2RM2R3POGHXUZ3MLI4FB4BVLL/


[ovirt-users] Re: 4.4 -> 4.3

2021-04-01 Thread Thomas Hoberg
I personally consider the fact that you gave up on 4.3/CentOS 7 before CentOS 8 
could even be remotely relied on to run "a free open-source virtualization 
solution for your entire enterprise" a rather violent breach of trust.

I understand Red Hat's motivation with Python 2/3 etc., but users just don't. 
Please try for a minute to view this from a user's perspective.

With CentOS 7 supported until 2024, we naturally expect the added value on top 
via oVirt to persist just as long.

And with CentOS 8 support lasting until the end of this year, oVirt 4.4 can't 
be considered "Petrus" or a rock to build on.

Most of us run oVirt simply because we are most interested in the VMs it runs 
(tenants paying rent).

We're not interested in having to keep oVirt itself stable and prevent it from 
failing after every update to this house of cards.

And yes, by now I am sorry to have chosen oVirt at all, finding that 4.3 was 
abandoned before 4.4 or the CentOS 8 beneath it was even stable, and long before 
the base OS ran out of support.

To the users out there oVirt is a platform, a tool - not an end in itself.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/JHSFTNGNOMYNCE4H2CF55DXIXGMCASMK/


[ovirt-users] Re: Power failure makes cluster and hosted engine unusable

2021-04-01 Thread Thomas Hoberg
I am glad you got it done!

I find that oVirt resembles an adventure game (with all its huge emotional 
rewards, once you prevail) more than a streamlined machine that just works 
every time you push a button.

The latter are boring, sure, but they're really what I am looking for when the 
mission is to run infrastructure.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/RCISGDFBMKJQASAXMAD7SFUDMWBQTS26/


[ovirt-users] Re: Power failure makes cluster and hosted engine unusable

2021-04-01 Thread Seann G. Clark via Users
Following up on this, I was able to recover everything, with only minor (and 
easy to fix) data loss.

The old hosted engine refused to come up, even after a few hours of sitting. 
That is when I dug into the issue and found the agent service reporting that 
the image didn't exist ("no such file or directory"). It seems that was just 
one aspect of the storage being impacted by the unexpected outage.

Regarding the memory issue, I was only getting it on one host; I was able to 
install and recover on another host in my cluster without hitting it.

The broken host has these versions of the Ansible and engine setup packages:
ansible-2.9.18-1.el7.noarch
ovirt-ansible-hosted-engine-setup-1.0.32-1.el7.noarch
ovirt-ansible-engine-setup-1.1.9-1.el7.noarch
ovirt-hosted-engine-setup-2.3.13-1.el7.noarch

The one that works is:
ansible-2.8.3-1.el7.noarch
ovirt-ansible-hosted-engine-setup-1.0.26-1.el7.noarch
ovirt-ansible-engine-setup-1.1.9-1.el7.noarch
ovirt-hosted-engine-setup-2.3.11-1.el7.noarch
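
(For reference, I pulled those version numbers with a plain rpm query; the same 
command should work on any host:)

rpm -qa | grep -E 'ansible|hosted-engine-setup|engine-setup' | sort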

All of the SANLOCK issues I saw before were remediated by the new deployment 
and recovery of the cluster as well.

Regards,
Seann

From: Roman Bednar
Sent: Thursday, April 01, 2021 6:07 AM
To: Thomas Hoberg 
Cc: users@ovirt.org
Subject: [ovirt-users] Re: Power failure makes cluster and hosted engine 
unusable

Hi Thomas,

Thanks for looking into this; the problem really is somewhere around this tasks 
file. However, I just tried faking the memory values directly inside the tasks 
file to something way higher and everything looks fine. I think the problem 
lies in how the output of "free -m" is registered at the beginning of this 
file. There are also debug tasks which print the registered values from the 
shell commands; we could take a closer look there and see whether the output 
(stdout mainly) looks normal.

This part of the output that Seann provided seems particularly strange:
Available memory ( {'failed': False, 'changed': False, 'ansible_facts': {u'max_mem': u'180746'}}MB )

Normally it should just show the exact value/string; here we're most likely 
getting a Python dictionary instead. I'd check whether the latest version of 
Ansible is installed and, if an update was available, see whether this can 
still be reproduced afterwards.

If the issue persists please provide full log of the ansible run (ideally with 
-).


-Roman

On Wed, Mar 31, 2021 at 9:19 PM Thomas Hoberg  wrote:
Roman, I believe the bug is in 
/usr/share/ansible/roles/ovirt.hosted_engine_setup/tasks/pre_checks/validate_memory_size.yml

  - name: Set Max memory
    set_fact:
      max_mem: "{{ free_mem.stdout|int + cached_mem.stdout|int - he_reserved_memory_MB + he_avail_memory_grace_MB }}"


If these lines are casting the result of `free -m` into 'int', that seems to 
fail at bigger RAM sizes.

I wound up having to delete all the available memory checks from that file to 
have the wizard progress on a machine with 512GB of RAM.
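
For what it's worth, the same arithmetic can be reproduced outside of Ansible 
to see whether the raw numbers themselves are sane; this is just a sketch, and 
the reserved/grace values below are placeholders rather than the playbook's 
actual defaults:

# columns 4 and 6 of the "Mem:" row of `free -m` are free and buff/cache, in MB
free_mem=$(free -m | awk '/^Mem:/ {print $4}')
cached_mem=$(free -m | awk '/^Mem:/ {print $6}')

# placeholder values for he_reserved_memory_MB / he_avail_memory_grace_MB
reserved=512
grace=200

echo "max_mem = $(( free_mem + cached_mem - reserved + grace )) MB"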
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to 
users-le...@ovirt.org
Privacy Statement: 
https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/CARDJXYUPFUFJT2VE2UNXELL2PSUZSPS/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
htt

[ovirt-users] [ANN] oVirt 4.4.6 Second Release Candidate is now available for testing

2021-04-01 Thread Lev Veyde
oVirt 4.4.6 Second Release Candidate is now available for testing

The oVirt Project is pleased to announce the availability of oVirt 4.4.6
Second Release Candidate for testing, as of April 1st, 2021.

This update is the sixth in a series of stabilization updates to the 4.4
series.

How to prevent hosts entering emergency mode after upgrade from oVirt 4.4.1

Note: Upgrading from 4.4.2 GA or later should not require re-doing these
steps, if already performed while upgrading from 4.4.1 to 4.4.2 GA. These
are only required to be done once.

Due to Bug 1837864 - "Host enter emergency mode after upgrading to latest 
build":

If you have your root file system on a multipath device on your hosts, you 
should be aware that after upgrading from 4.4.1 to 4.4.6 your host may enter 
emergency mode.

In order to prevent this, be sure to upgrade oVirt Engine first, then on
your hosts (a rough shell sketch of these steps follows the list):

   1. Remove the current lvm filter while still on 4.4.1, or in emergency
      mode (if rebooted).
   2. Reboot.
   3. Upgrade to 4.4.6 (redeploy in case of already being on 4.4.6).
   4. Run vdsm-tool config-lvm-filter to confirm there is a new filter in
      place.
   5. Only if not using oVirt Node: run "dracut --force --add multipath" to
      rebuild the initramfs with the correct filter configuration.
   6. Reboot.
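
A rough shell sketch of those six steps on a single host follows; it is only an 
illustration (the lvm.conf edit is just one way to drop the old filter line, 
"dnf upgrade" stands in for however you normally upgrade the host, and the 
dracut step is skipped on oVirt Node):

# 1. remove the current lvm filter (while still on 4.4.1, or from emergency mode)
vi /etc/lvm/lvm.conf        # delete or comment out the old "filter = ..." line

# 2. reboot
reboot

# 3. upgrade the host to 4.4.6
dnf upgrade

# 4. confirm a new filter is put in place
vdsm-tool config-lvm-filter

# 5. only if NOT using oVirt Node: rebuild the initramfs with multipath
dracut --force --add multipath

# 6. reboot again
reboot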

Documentation

   - If you want to try oVirt as quickly as possible, follow the instructions
     on the Download page.
   - For complete installation, administration, and usage instructions, see
     the oVirt Documentation.
   - For upgrading from a previous version, see the oVirt Upgrade Guide.
   - For a general overview of oVirt, see About oVirt.

Important notes before you try it

Please note this is a pre-release build.

The oVirt Project makes no guarantees as to its suitability or usefulness.

This pre-release must not be used in production.

Installation instructions

For installation instructions and additional information please refer to:

https://ovirt.org/documentation/

This release is available now on x86_64 architecture for:

* Red Hat Enterprise Linux 8.3 or newer

* CentOS Linux (or similar) 8.3 or newer

This release supports Hypervisor Hosts on x86_64 and ppc64le architectures
for:

* Red Hat Enterprise Linux 8.3 or newer

* CentOS Linux (or similar) 8.3 or newer

* oVirt Node 4.4 based on CentOS Linux 8.3 (available for x86_64 only)

See the release notes [1] for installation instructions and a list of new
features and bugs fixed.

Notes:

- oVirt Appliance is already available for CentOS Linux 8

- oVirt Node NG is already available for CentOS Linux 8

- We found a few issues while testing on CentOS Stream so we are still
basing oVirt 4.4.6 Node and Appliance on CentOS Linux.

Additional Resources:

* Read more about the oVirt 4.4.6 release highlights:
http://www.ovirt.org/release/4.4.6/

* Get more oVirt project updates on Twitter: https://twitter.com/ovirt

* Check out the latest project news on the oVirt blog:
http://www.ovirt.org/blog/


[1] http://www.ovirt.org/release/4.4.6/

[2] http://resources.ovirt.org/pub/ovirt-4.4-pre/iso/

-- 

Lev Veyde

Senior Software Engineer, RHCE | RHCVA | MCITP

Red Hat Israel



l...@redhat.com | lve...@redhat.com

TRIED. TESTED. TRUSTED. 
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/2MRJ4UJPNBOP6S3MX6ZDS6B5R3RQOADN/


[ovirt-users] New Setup - hosts - are not synchronized with their Logical Network configuration: ovirtmgmt.

2021-04-01 Thread morgan cox
Hi

After installing new hosts they show as status 'up'; however, I get a warning 
such as:

Host ng2-ovirt-kvm4's following network(s) are not synchronized with their 
Logical Network configuration: ovirtmgmt.

Which means ovirtmgmt is not working.

I realise the issue is due to it having no usable network config - it's set to 
DHCP and there is no DHCP server.

The way I have set up the network on the hosts is shown below
(I used the ovirt-node 4.4.4 ISO to install the nodes):

/var/lib/vdsm/persistence/
├── netconf -> /var/lib/vdsm/persistence/netconf.JkcBbtya
└── netconf.JkcBbtya
├── bonds
│   └── bond0
├── devices
└── nets
├── ovirtmgmt
└── td-hv

td-hv is a bridge on the bonded VLAN; it has the network config with the 
correct IP, and the firewall there allows the ports used for oVirt (see config 
below).

Can I make the existing td-hv profile replace ovirtmgmt? (i.e. add the 
ovirtmgmt functionality to the existing network device td-hv)

Or should I take the existing settings from td-hv, add them to ovirtmgmt, and 
remove td-hv?

Config below - any advice would be great. (A rough sketch of what I have in 
mind follows the configs.)

---
/etc/sysconfig/network-scripts/ifcfg-bond0

BONDING_OPTS="mode=active-backup downdelay=0 miimon=100 updelay=0"
TYPE=Bond
BONDING_MASTER=yes
NAME=bond0
UUID=5ab633aa-cd30-4ca8-9109-dbb4541f039b
DEVICE=bond0
ONBOOT=yes
HWADDR=
MACADDR=3C:A8:2A:15:F6:12
MTU=1500
LLDP=no
BRIDGE=ovirtmgmt
---

---
[root@ng2-ovirt-kvm4 mcox]# cat /etc/sysconfig/network-scripts/ifcfg-bond0.1700 
VLAN=yes
TYPE=Vlan
PHYSDEV=bond0
VLAN_ID=1700
REORDER_HDR=yes
GVRP=no
MVRP=no
HWADDR=
NAME=bond0.1700
UUID=e2b318e8-83af-4573-a637-fe20326f2c1a
DEVICE=bond0.1700
ONBOOT=yes
MTU=1500
LLDP=no
BRIDGE=td-hv
---

---
[root@ng2-ovirt-kvm4 mcox]# cat /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt 
STP=no
TYPE=Bridge
HWADDR=
MTU=1500
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=dhcp
DEFROUTE=yes
DHCP_CLIENT_ID=mac
IPV4_DHCP_TIMEOUT=2147483647
IPV4_FAILURE_FATAL=no
IPV6_DISABLED=yes
IPV6INIT=no
NAME=ovirtmgmt
UUID=202586b0-ebe7-48ae-b316-e2cfd2dc4cc8
DEVICE=ovirtmgmt
ONBOOT=yes
AUTOCONNECT_SLAVES=yes
LLDP=no
---
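
Just to make the question more concrete, this is roughly what I would try if 
the answer is simply "give ovirtmgmt a static IP" (addresses made up, and I 
realise vdsm-managed connections are probably better changed through the 
engine's "Setup Host Networks" dialog so they don't get overwritten):

---
nmcli connection modify ovirtmgmt ipv4.method manual \
    ipv4.addresses 192.0.2.10/24 ipv4.gateway 192.0.2.1 ipv4.dns 192.0.2.53
nmcli connection up ovirtmgmt
---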





___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/LH4ZIX7JVXQIBZHNL6OSCQXWJMB36BAL/


[ovirt-users] Re: Locked disks

2021-04-01 Thread Michal Skrivanek


> On 31. 3. 2021, at 16:29, Giulio Casella  wrote:
> 
> FYI: after upgrading to ovirt 4.4.5 (both manager and ovirt nodes) the
> issue seems fixed (or at least it hasn't happened in about a week; with
> 4.4.4 I had the problem every couple of days).

Likely thanks to https://bugzilla.redhat.com/show_bug.cgi?id=1796124
There were a bunch of other related changes in 4.4.5.
It's probably not entirely solved yet, but it's good to see it already helped.
There will hopefully be further improvements in 4.4.6, and also on the 
storware/vProtect side.

> 
> Regards,
> gc
> 
> 
> On 04/02/2021 09:08, Giulio Casella wrote:
>> I've not been able to reproduce it; if it happens again I'll submit a bugzilla.
>> 
>> Thank you.
>> 
>> Regards,
>> gc
>> 
>> 
>> On 03/02/2021 17:49, Shani Leviim wrote:
>>> In such a case, the disks shouldn't remain locked - sounds like a bug.
>>> This one requires a deeper look.
>>> If you're able to reproduce it again, please open a bug in Bugzilla
>>> (https://bugzilla.redhat.com) with engine
>>> and vdsm logs,
>>> so we'll be able to investigate it.
>>> 
>>> Regards,
>>> Shani Leviim
>>> 
>>> 
>>> On Wed, Feb 3, 2021 at 5:39 PM Giulio Casella  wrote:
>>> 
>>>Hi,
>>>I tried unlock_entity.sh, and it solved the issue. So far so good.
>>> 
>>>But it's still unclear why disks were locked.
>>> 
>>>Let me make an hypothesis: in ovirt 4.3 a failure in snapshot removal
>>>would lead to a snapshot in illegal status. No problem, you can remove
>>>again and the situation is fixed.
>>>In ovirt 4.4 a failure in snapshot removal leaves the whole disk in a
>>>locked state (maybe a bug?), preventing any further action.
>>> 
>>>Does it make sense?
>>> 
>>> 
>>>On 03/02/2021 12:25, Giulio Casella wrote:
 Hi Shani,
 no tasks listed in UI, and now "taskcleaner.sh -o" reports no task (just
 before I gave "taskcleaner.sh -r").
 But disks are still locked, and "unlock_entity.sh -q -t all -c"
 (accordingly) reports only two disks' UUIDs (with their VMs' UUIDs).
 
 Time to give a chance to unlock_entity.sh?
 
 Regards,
 gc
 
 On 03/02/2021 11:52, Shani Leviim wrote:
> Hi Giulio,
> Before running unlock_entity.sh, let's try to find if there's any task
> in progress.
> Is there any hint on the events in the UI?
> Or try to run [1]:
> ./taskcleaner.sh -o  
> 
> Also, you can verify what entities are locked [2]:
> ./unlock_entity.sh -q -t all -c
> 
> [1]
> https://github.com/oVirt/ovirt-engine/blob/master/packaging/setup/dbutils/taskcleaner.sh
> [2]
> https://github.com/oVirt/ovirt-engine/blob/master/packaging/setup/dbutils/unlock_entity.sh
> 
> Regards,
> Shani Leviim
> 
> 
> On Wed, Feb 3, 2021 at 10:43 AM Giulio Casella  wrote:
> 
>  Since yesterday I found a couple of VMs with locked disks. I don't know the
>  reason; I suspect some interaction made by our backup system (vprotect,
>  snapshot based), even though it has been working for more than a year.
> 
>  I'd give the unlock_entity.sh script a chance, but it reports:
> 
>  CAUTION, this operation may lead to data corruption and should be used
>  with care. Please contact support prior to running this command
> 
>  Do you think I should trust it? Is it safe? The VMs are in production...
> 
>  My manager is 4.4.4.7-1.el8 (CentOS Stream 8), hosts are oVirt Node
>  4.4.4
> 
> 
>  TIA,
>  Giulio
>  ___
>  Users mailing list -- users@ovirt.org
>  To unsubscribe send an email to users-le...@ovirt.org
>  Privacy Statement: https://www.ovirt.org/privacy-policy.html

[ovirt-users] intel_iommu=on kvm-intel.nested=1 deactivates rp_filter kernel option

2021-04-01 Thread Nathanaël Blanchet

Hello,

I have two kinds of hosts:

 * some with default ovirt node 4.4 kernel settings
 * some with custom kernel settings including intel_iommu=on
   kvm-intel.nested=1

I can't open a VM console from the second category of host when binding from 
a different VLAN, because the host is unreachable.


But if I set sysctl -w net.ipv4.conf.all.rp_filter=2, I can bind the 
host and open a VM console.
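
(To keep that setting across reboots I would drop it into a sysctl.d snippet - 
the file name is arbitrary:)

echo 'net.ipv4.conf.all.rp_filter = 2' > /etc/sysctl.d/99-rp-filter.conf
sysctl --system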


I didn't test whether this behaviour is caused by Hostdev Passthrough & 
SR-IOV or by Nested Virtualization.


Is this expected behaviour or a bug?

--
Nathanaël Blanchet

Supervision réseau
SIRE
227 avenue Professeur-Jean-Louis-Viala
34193 MONTPELLIER CEDEX 5   
Tél. 33 (0)4 67 54 84 55
Fax  33 (0)4 67 54 84 14
blanc...@abes.fr

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/IMIAUNX2BTP3Q7MGX3ALMCOGOIN5IY4O/


[ovirt-users] Re: 4.4 -> 4.3

2021-04-01 Thread Michal Skrivanek
First and foremost, that's something we do not support if your VMs live at a 
4.4+ cluster level in your oVirt 4.4 setup.
The upgrade path for VM configuration is one-way only.

If it's just a handful of VMs, it might be way easier to just recreate those 
VMs and move the disks.

I also wonder what the reason is for running 4.3 these days; we really haven't 
developed/patched it for 10 months now.

Thanks,
michal

> On 31. 3. 2021, at 23:08, Thomas Hoberg  wrote:
> 
> Export domain should work, with the usual constraints that you have to 
> detach/attach the whole domain and you'd probably want to test with one or a 
> few pilot VMs first.
> 
> There could be issues with 'base' templates etc. for VMs that were created 
> as new on 4.4: be sure to try every machine type first. Ideally you have 4.4 
> and 4.3 farms side by side, instead of rebasing your hosts on 4.3 and *then* 
> finding issues. Things to watch out for are hardware baselines (those 
> mitigation-enhanced CPU types can be nasty), BIOS types (Q35 vs. all others), 
> etc.
> 
> Personally I see OVA files as something that should be low-risk, minimal 
> functionality that just ought to always work. The oVirt team doesn't seem to 
> share my opinion and mostly views OVA as a VMware->oVirt migration tool.
> 
> I still try to use OVA export/import for critical VMs, because sometimes it 
> means I can at least resurrect them on a stand-alone KVM host (even VMware 
> should work in theory: in practice I've seen both VMware and VirtualBox barf 
> at oVirt generated OVA exports).
> 
> Note that there is an issue with OVA exports from oVirt 4.3: they can result 
> in empty disks due to a race condition that wasn't fixed even with the last 
> 4.3 release. In your case, that shouldn't bite you, as you are moving in the 
> other direction. But should you decide to go forward again, be sure to check 
> your 4.3 OVA exports via 'du -h ', confirming it shows more than a few KB of 
> actual allocation vs. the potentially multi-TB 'sparse' disk full of zeros 
> that 'ls -l' might hint at.
> 
> With oVirt I consider blind faith extremely ill-advised. Everything you 
> haven't tested yourself, several times, after every change of every 
> component, is much more likely to fail than you would ever think befits a 
> product that carries "a free open-source virtualization solution for your 
> entire enterprise" on its home page.
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/GDJSP7UVVAORL3WBPKTRZUNZ7SRJ46E6/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/JBUBR35AM6W5GOTC432DY4JDS6X4ARHM/


[ovirt-users] Re: Power failure makes cluster and hosted engine unusable

2021-04-01 Thread Roman Bednar
Hi Thomas,

Thanks for looking into this; the problem really is somewhere around this
tasks file. However, I just tried faking the memory values directly inside
the tasks file to something way higher and everything looks fine. I think
the problem lies in how the output of "free -m" is registered at the
beginning of this file. There are also debug tasks which print the registered
values from the shell commands; we could take a closer look there and see
whether the output (stdout mainly) looks normal.

This part of the output that Seann provided seems particularly strange:
Available memory ( {'failed': False, 'changed': False, 'ansible_facts': {u'max_mem': u'180746'}}MB )

Normally it should just show the exact value/string; here we're most likely
getting a Python dictionary instead. I'd check whether the latest version of
Ansible is installed and, if an update was available, see whether this can
still be reproduced afterwards.

If the issue persists please provide full log of the ansible run (ideally
with -).


-Roman

On Wed, Mar 31, 2021 at 9:19 PM Thomas Hoberg  wrote:

> Roman, I believe the bug is in
> /usr/share/ansible/roles/ovirt.hosted_engine_setup/tasks/pre_checks/validate_memory_size.yml
>
>   - name: Set Max memory
>     set_fact:
>       max_mem: "{{ free_mem.stdout|int + cached_mem.stdout|int - he_reserved_memory_MB + he_avail_memory_grace_MB }}"
>
>
> If these lines are casting the result of `free -m` into 'int', that seems
> to fail at bigger RAM sizes.
>
> I wound up having to delete all the available memory checks from that file
> to have the wizard progress on a machine with 512GB of RAM.
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/CARDJXYUPFUFJT2VE2UNXELL2PSUZSPS/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/WTFXXEZDZ6V6RHBYDSGIBZ7B2DAFQHHC/