[ovirt-users] Re: self hosted-engine deploy fails on network

2021-02-03 Thread marcel d'heureuse
Hi,

I have learned to deploy a self-hosted engine directly on the physical
interfaces.

Later, you can use the hosted engine to move it onto the different bonds or
VLANs.

This has worked fine for me around 20 times.

Best regards,
marcel


On 3 February 2021 at 19:56:47 CET, Nardus Geldenhuys wrote:
>Hi oVirt land
>
>Hope you are well. Running into this issue, I hope you can help.
>
>CentOS 7, fully updated.
>oVirt 4.3, latest packages.
>
>My network config:
>
>[root@mob-r1-d-ovirt-aa-1-01 ~]# ip a
>1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
>   link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>   inet 127.0.0.1/8 scope host lo
>  valid_lft forever preferred_lft forever
>   inet6 ::1/128 scope host
>  valid_lft forever preferred_lft forever
>2: ens1f0:  mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
>   link/ether 00:90:fa:c2:d2:48 brd ff:ff:ff:ff:ff:ff
>3: ens1f1:  mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
>   link/ether 00:90:fa:c2:d2:48 brd ff:ff:ff:ff:ff:ff
>4: enp11s0f0:  mtu 1500 qdisc mq state DOWN group default qlen 1000
>   link/ether 00:90:fa:c2:d2:50 brd ff:ff:ff:ff:ff:ff
>5: enp11s0f1:  mtu 1500 qdisc mq state DOWN group default qlen 1000
>   link/ether 00:90:fa:c2:d2:54 brd ff:ff:ff:ff:ff:ff
>21: bond0:  mtu 1500 qdisc noqueue state UP group default qlen 1000
>   link/ether 00:90:fa:c2:d2:48 brd ff:ff:ff:ff:ff:ff
>   inet6 fe80::290:faff:fec2:d248/64 scope link
>  valid_lft forever preferred_lft forever
>22: bond0.1131@bond0:  mtu 1500 qdisc noqueue state UP group default qlen 1000
>   link/ether 00:90:fa:c2:d2:48 brd ff:ff:ff:ff:ff:ff
>   inet 172.18.206.184/23 brd 172.18.207.255 scope global bond0.1131
>  valid_lft forever preferred_lft forever
>   inet6 fe80::290:faff:fec2:d248/64 scope link
>  valid_lft forever preferred_lft forever
>
>[root@mob-r1-d-ovirt-aa-1-01 network-scripts]# cat ifcfg-bond0
>BONDING_OPTS='mode=1 miimon=100'
>TYPE=Bond
>BONDING_MASTER=yes
>PROXY_METHOD=none
>BROWSER_ONLY=no
>IPV6INIT=no
>NAME=bond0
>UUID=c11ef6ef-794f-4683-a068-d6338e5c19b6
>DEVICE=bond0
>ONBOOT=yes
>[root@mob-r1-d-ovirt-aa-1-01 network-scripts]# cat ifcfg-bond0.1131
>DEVICE=bond0.1131
>VLAN=yes
>ONBOOT=yes
>MTU=1500
>IPADDR=172.18.206.184
>NETMASK=255.255.254.0
>GATEWAY=172.18.206.1
>BOOTPROTO=none
>MTU=1500
>DEFROUTE=yes
>NM_CONTROLLED=no
>IPV6INIT=yes
>DNS1=172.20.150.10
>DNS2=172.20.150.11
>
>I get the following error:
>
>[ INFO  ] TASK [ovirt.hosted_engine_setup : Generate output list]
>[ INFO  ] ok: [localhost]
>[ INFO  ] TASK [ovirt.hosted_engine_setup : Validate selected bridge interface if management bridge does not exists]
>[ INFO  ] skipping: [localhost]
> Please indicate a nic to set ovirtmgmt bridge on: (bond0, bond0.1131) [bond0.1131]:
> Please specify which way the network connectivity should be checked (ping, dns, tcp, none) [dns]:
>..
>..
>..
>..
>..
>[ INFO  ] ok: [localhost]
>[ INFO  ] TASK [ovirt.hosted_engine_setup : Validate selected bridge interface if management bridge does not exists]
>[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The selected network interface is not valid"}
>[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
>[ INFO  ] Stage: Clean up
>
>And if I create the ifcfg-ovirtmgmt as a bridge it fails later.
>
>What is the correct network setup for my bond configuration to do a
>self hosted-engine setup?
>
>Regards
>
>Nar
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/MRBIY4SJIANR5KE65OIQHNFSJHI7XD2S/


[ovirt-users] self hosted-engine deploy fails on network

2021-02-03 Thread Nardus Geldenhuys
Hi oVirt land

Hope you are well. Running into this issue, I hope you can help.

CentOS 7, fully updated.
oVirt 4.3, latest packages.

My network config:

[root@mob-r1-d-ovirt-aa-1-01 ~]# ip a
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
   link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
   inet 127.0.0.1/8 scope host lo
  valid_lft forever preferred_lft forever
   inet6 ::1/128 scope host
  valid_lft forever preferred_lft forever
2: ens1f0:  mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
   link/ether 00:90:fa:c2:d2:48 brd ff:ff:ff:ff:ff:ff
3: ens1f1:  mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
   link/ether 00:90:fa:c2:d2:48 brd ff:ff:ff:ff:ff:ff
4: enp11s0f0:  mtu 1500 qdisc mq state DOWN group default qlen 1000
   link/ether 00:90:fa:c2:d2:50 brd ff:ff:ff:ff:ff:ff
5: enp11s0f1:  mtu 1500 qdisc mq state DOWN group default qlen 1000
   link/ether 00:90:fa:c2:d2:54 brd ff:ff:ff:ff:ff:ff
21: bond0:  mtu 1500 qdisc noqueue state UP group default qlen 1000
   link/ether 00:90:fa:c2:d2:48 brd ff:ff:ff:ff:ff:ff
   inet6 fe80::290:faff:fec2:d248/64 scope link
  valid_lft forever preferred_lft forever
22: bond0.1131@bond0:  mtu 1500 qdisc noqueue state UP group default qlen 1000
   link/ether 00:90:fa:c2:d2:48 brd ff:ff:ff:ff:ff:ff
   inet 172.18.206.184/23 brd 172.18.207.255 scope global bond0.1131
  valid_lft forever preferred_lft forever
   inet6 fe80::290:faff:fec2:d248/64 scope link
  valid_lft forever preferred_lft forever

[root@mob-r1-d-ovirt-aa-1-01 network-scripts]# cat ifcfg-bond0
BONDING_OPTS='mode=1 miimon=100'
TYPE=Bond
BONDING_MASTER=yes
PROXY_METHOD=none
BROWSER_ONLY=no
IPV6INIT=no
NAME=bond0
UUID=c11ef6ef-794f-4683-a068-d6338e5c19b6
DEVICE=bond0
ONBOOT=yes
[root@mob-r1-d-ovirt-aa-1-01 network-scripts]# cat ifcfg-bond0.1131
DEVICE=bond0.1131
VLAN=yes
ONBOOT=yes
MTU=1500
IPADDR=172.18.206.184
NETMASK=255.255.254.0
GATEWAY=172.18.206.1
BOOTPROTO=none
MTU=1500
DEFROUTE=yes
NM_CONTROLLED=no
IPV6INIT=yes
DNS1=172.20.150.10
DNS2=172.20.150.11

I get the following error:

[ INFO  ] TASK [ovirt.hosted_engine_setup : Generate output list]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Validate selected bridge interface if management bridge does not exists]
[ INFO  ] skipping: [localhost]
 Please indicate a nic to set ovirtmgmt bridge on: (bond0, bond0.1131) [bond0.1131]:
 Please specify which way the network connectivity should be checked (ping, dns, tcp, none) [dns]:
..
..
..
..
..
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Validate selected bridge interface if management bridge does not exists]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The selected network interface is not valid"}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO  ] Stage: Clean up

And if I create the ifcfg-ovirtmgmt as a bridge it fails later.

What is the correct network setup for my bond configuration to do a
self hosted-engine setup?

Regards

Nar
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/NRU4CI6GLGL77RDY7O7EGZOD67TJOPJ3/


[ovirt-users] Re: Locked disks

2021-02-03 Thread Shani Leviim
In such a case, the disks shouldn't remain locked - sounds like a bug.
This one requires a deeper look.
If you're able to reproduce it again, please open a bug in Bugzilla (
https://bugzilla.redhat.com) with engine and vdsm logs,
so we'll be able to investigate it.
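
For anyone gathering those logs, here is a minimal sketch that packs whichever of the given log files are readable on the current machine into one tarball to attach to the bug. The function name bundle_logs and the /tmp output path are made up for illustration. /var/log/ovirt-engine/engine.log (on the engine machine) and /var/log/vdsm/vdsm.log (on each host) are the usual default locations, but please check them against your own setup.

```shell
#!/usr/bin/env bash
# Pack whichever of the given log files exist and are readable into one tarball.
# Usage: bundle_logs OUTPUT.tar.gz LOGFILE [LOGFILE...]
bundle_logs() {
    local out="$1"; shift
    local f existing=()
    for f in "$@"; do
        if [ -r "$f" ]; then
            existing+=("$f")
        fi
    done
    if [ "${#existing[@]}" -eq 0 ]; then
        echo "no readable logs found" >&2
        return 1
    fi
    tar -czf "$out" "${existing[@]}"
}

# engine.log lives on the engine machine and vdsm.log on each host, so run this
# once per machine; paths that do not exist there are simply skipped.
# bundle_logs /tmp/ovirt-logs.tar.gz /var/log/ovirt-engine/engine.log /var/log/vdsm/vdsm.log
```

Rotated logs covering the time of the failed snapshot removal may be needed as well; they can be passed as extra arguments.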


*Regards,*

*Shani Leviim*


On Wed, Feb 3, 2021 at 5:39 PM Giulio Casella  wrote:

> Hi,
> I tried unlock_entity.sh, and it solved the issue. So far so good.
>
> But it's still unclear why disks were locked.
>
> Let me make a hypothesis: in oVirt 4.3, a failure in snapshot removal
> would leave the snapshot in illegal status. No problem: you can remove it
> again and the situation is fixed.
> In oVirt 4.4, a failure in snapshot removal leaves the whole disk in
> locked state (maybe a bug?), preventing any further action.
>
> Does it make sense?
>
>
> On 03/02/2021 12:25, Giulio Casella wrote:
> > Hi Shani,
> > no tasks are listed in the UI, and "taskcleaner.sh -o" now reports no
> > tasks (just before that I ran "taskcleaner.sh -r").
> > But the disks are still locked, and "unlock_entity.sh -q -t all -c"
> > (accordingly) reports only two disk UUIDs (with their VMs' UUIDs).
> >
> > Time to give unlock_entity.sh a chance?
> >
> > Regards,
> > gc
> >
> > On 03/02/2021 11:52, Shani Leviim wrote:
> >> Hi Giulio,
> >> Before running unlock_entity.sh, let's try to find if there's any task
> >> in progress.
> >> Is there any hint on the events in the UI?
> >> Or try to run [1]:
> >> ./taskcleaner.sh -o
> >>
> >> Also, you can verify what entities are locked [2]:
> >> ./unlock_entity.sh -q -t all -c
> >>
> >> [1]
> >> https://github.com/oVirt/ovirt-engine/blob/master/packaging/setup/dbutils/taskcleaner.sh
> >> [2]
> >> https://github.com/oVirt/ovirt-engine/blob/master/packaging/setup/dbutils/unlock_entity.sh
> >>
> >> *Regards,*
> >> *Shani Leviim*
> >>
> >>
> >> On Wed, Feb 3, 2021 at 10:43 AM Giulio Casella wrote:
> >>
> >> Since yesterday I have found a couple of VMs with locked disks. I don't
> >> know the reason; I suspect some interaction with our backup system
> >> (vprotect, snapshot based), even though it has been working for more
> >> than a year.
> >>
> >> I'd give the unlock_entity.sh script a chance, but it reports:
> >>
> >> CAUTION, this operation may lead to data corruption and should be used
> >> with care. Please contact support prior to running this command
> >>
> >> Do you think I should trust it? Is it safe? The VMs are in production...
> >>
> >> My manager is 4.4.4.7-1.el8 (CentOS Stream 8), hosts are oVirt Node
> >> 4.4.4
> >>
> >>
> >> TIA,
> >> Giulio
> >> List Archives:
> >> https://lists.ovirt.org/archives/list/users@ovirt.org/message/M4HYMDMHOKC5DCHDP6CFLM4RWJQNN7R4/
> >>
> > List Archives:
> > https://lists.ovirt.org/archives/list/users@ovirt.org/message/FEXMZKZFYCWUOVZXZ3C3XZ7VBVYKFJGH/
> >
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/T3LAFCCPZC5BO33PJAZA7EMHCWUKYH74/


[ovirt-users] Re: Changes between Node NG 4.4.3 and 4.4.4

2021-02-03 Thread Shantur Rathore
A bit more on this.

I figured out that 4.4.3 is based on CentOS 8.2 and 4.4.4 is based on
CentOS 8.3.
It looks like the newer kernel is affected by the
https://bugzilla.kernel.org/show_bug.cgi?id=207489 issue.

I'm not sure whether this has been addressed in the latest 4.4.5-Pre release.

Regards
Shantur

On Wed, Feb 3, 2021 at 12:17 PM Shantur Rathore  wrote:

> Hi,
>
> I have a node 4.4.3 install and it works perfectly for PCI passthrough.
> After I upgraded to Node NG 4.4.4 the PCI passthrough leads to server
> resets. There are no logs in the kernel messages just before reset apart
> from vfio-pci enabling the device.
> So, I assume there is some change related to the kernel or vfio-pci driver
> in the newer 4.4.4 version.
>
> How / Where can I see the differences between Node NG releases?
>
> Thanks,
> Shantur
>
>
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ZSV3T4ONNKG5LTGY34XB65JKN4OQK56R/


[ovirt-users] Single Node 4.3 to 4.4 help

2021-02-03 Thread Wesley Stewart
I have read through many posts and I think this process seems fairly simple.

https://www.ovirt.org/documentation/upgrade_guide/#SHE_Upgrading_from_4-3

But I just wanted to see if anyone had any gotchas.  I am thinking of
using one of:

   - RHEL8 (using the developer program, probably the best bet at the moment)
   - oVirt Node (is oVirt Node being deprecated since it is based on
   CentOS?)
   - Rocky Linux/AlmaLinux/Clear Linux

From my understanding I should:

   - Enter global maintenance
   - Make a full engine backup
   - Reinstall to a supported OS
   - Deploy ovirt engine using backup.
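
The steps above can be sketched roughly as follows. This is a dry-run outline under stated assumptions, not a tested procedure: it only prints the commands unless DRY_RUN=0 is set, the backup paths are placeholders, and the commands belong on different machines (engine-backup on the engine VM, the hosted-engine commands on the host). The helper names run and upgrade_plan are made up for illustration, and the upgrade guide linked above remains the reference.

```shell
#!/usr/bin/env bash
# Dry-run by default: each command is printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"
BACKUP_FILE="${BACKUP_FILE:-/root/engine-backup.tar.gz}"   # placeholder path
BACKUP_LOG="${BACKUP_LOG:-/root/engine-backup.log}"        # placeholder path

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

upgrade_plan() {
    # 1. On the 4.3 host: enter global maintenance.
    run hosted-engine --set-maintenance --mode=global
    # 2. On the engine VM: make a full engine backup and copy it somewhere safe.
    run engine-backup --mode=backup --scope=all --file="$BACKUP_FILE" --log="$BACKUP_LOG"
    # 3. Reinstalling the host is a manual step and is not scripted here.
    echo "# manual: reinstall the host with a supported OS, then copy $BACKUP_FILE back"
    # 4. On the reinstalled host: deploy the engine from the backup.
    run hosted-engine --deploy --restore-from-file="$BACKUP_FILE"
}

upgrade_plan
```

Run as is, it just prints the plan, which makes it easy to compare against the guide before touching anything.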

Looking forward to trying out Ovirt 4.4!
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/FFDZPAJRBQ5B336JGPBPX2S2HASOT5PI/


[ovirt-users] Re: Locked disks

2021-02-03 Thread Giulio Casella
Hi,
I tried unlock_entity.sh, and it solved the issue. So far so good.

But it's still unclear why disks were locked.

Let me make a hypothesis: in oVirt 4.3, a failure in snapshot removal
would leave the snapshot in illegal status. No problem: you can remove it
again and the situation is fixed.
In oVirt 4.4, a failure in snapshot removal leaves the whole disk in
locked state (maybe a bug?), preventing any further action.

Does it make sense?


On 03/02/2021 12:25, Giulio Casella wrote:
> Hi Shani,
> no tasks are listed in the UI, and "taskcleaner.sh -o" now reports no
> tasks (just before that I ran "taskcleaner.sh -r").
> But the disks are still locked, and "unlock_entity.sh -q -t all -c"
> (accordingly) reports only two disk UUIDs (with their VMs' UUIDs).
> 
> Time to give unlock_entity.sh a chance?
> 
> Regards,
> gc
> 
> On 03/02/2021 11:52, Shani Leviim wrote:
>> Hi Giulio,
>> Before running unlock_entity.sh, let's try to find if there's any task
>> in progress.
>> Is there any hint on the events in the UI?
>> Or try to run [1]:
>> ./taskcleaner.sh -o  
>>
>> Also, you can verify what entities are locked [2]:
>> ./unlock_entity.sh -q -t all -c
>>
>> [1]
>> https://github.com/oVirt/ovirt-engine/blob/master/packaging/setup/dbutils/taskcleaner.sh
>> [2]
>> https://github.com/oVirt/ovirt-engine/blob/master/packaging/setup/dbutils/unlock_entity.sh
>>
>> *Regards,*
>> *Shani Leviim*
>>
>>
>> On Wed, Feb 3, 2021 at 10:43 AM Giulio Casella wrote:
>>
>> Since yesterday I have found a couple of VMs with locked disks. I don't
>> know the reason; I suspect some interaction with our backup system
>> (vprotect, snapshot based), even though it has been working for more
>> than a year.
>>
>> I'd give the unlock_entity.sh script a chance, but it reports:
>>
>> CAUTION, this operation may lead to data corruption and should be used
>> with care. Please contact support prior to running this command
>>
>> Do you think I should trust it? Is it safe? The VMs are in production...
>>
>> My manager is 4.4.4.7-1.el8 (CentOS Stream 8), hosts are oVirt Node
>> 4.4.4
>>
>>
>> TIA,
>> Giulio
>> List Archives:
>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/M4HYMDMHOKC5DCHDP6CFLM4RWJQNN7R4/
>>
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/FEXMZKZFYCWUOVZXZ3C3XZ7VBVYKFJGH/
> 

List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/3DF3R5TKES43OJ2X4UF5SU25XLSFX5JG/


[ovirt-users] Re: Recovery from power outage

2021-02-03 Thread Yedidyah Bar David
On Wed, Feb 3, 2021 at 4:52 PM Roderick Mooi  wrote:
>
> Thanks,
>
> > I didn't check, but am pretty certain that it's not related to the
> > engine db. Do you see such duplicates there as well (using the web ui
> > or sql against it)? If so, fix these first. If no other means, put the
> > host to maintenance and reinstall with the correct name.
>
> Not seeing duplicates in the web UI, only in the --vm-status. Can you please 
> assist me with the sql commands or reference to the database schema + where 
> to check? I'd like to check that first before doing anything too drastic.

/usr/share/ovirt-engine/dbscripts/engine-psql.sh -c 'select * from vds'

>
> Note: it only duplicated the hostname after I changed the host_id, before 
> that it had the correct hostname but duplicate host_id.
>
> PS I have a recent backup of the database (just before which I could restore 
> if you think that'll do the trick without breaking anything?
>
>
> On 2021/02/03 16:33, Yedidyah Bar David wrote:
> > On Wed, Feb 3, 2021 at 4:21 PM Roderick Mooi  wrote:
> >>
> >> Hi,
> >>
> >>> Any idea how this happened?
> >>
> >> Somehow related to the power being "pulled" at the wrong time?
> >>
> >>> Perhaps this is a backup done by emacs?
> >>
> >> Not sure what does it but I'm glad it did ;)
> >>
> >>> Please compare it to your other hosts. It should be (mostly?)
> >>> identical, but make sure that host_id= is unique per host. It should
> >>> match the spm host id for this host in the engine database.
> >>
> >> I had to restore one of my hosts (host 1) manually due to a cleanup during my 
> >> re-deploy attempts. I managed to do this successfully by copying the 
> >> missing files from another host (host 2) but the first time the host ID 
> >> matched one of the other hosts (which made at least hosted-engine 
> >> --vm-status unhappy) [I hadn't seen your email yet :(]. I subsequently 
> >> corrected the host_id and rebooted the guilty host. Things mostly seem to 
> >> be working now except that in hosted-engine --vm-status my first two hosts 
> >> (the one I copied the .conf from as well as the one I copied it to 
> >> [without changing the ID :O]) now have the same hostname :-/ I'm assuming 
> >> there's a mismatch in the engine database - where/how do I fix that?
> >>
> >
> > I didn't check, but am pretty certain that it's not related to the
> > engine db. Do you see such duplicates there as well (using the web ui
> > or sql against it)? If so, fix these first. If no other means, put the
> > host to maintenance and reinstall with the correct name.
> >
> > If it's just the shared storage, you can try the following. Carefully.
> > Didn't try myself. Try on a test system first.
> >
> > 1. Set global maintenance
> >
> > 2. Stop ovirt-ha-agent, ovirt-ha-broker, perhaps also vdsmd, supervdsmd
> >
> > 3. hosted-engine --clean_metadata --host-id=1
> >
> > - Perhaps even pass --force-cleanup, not sure when it's needed
> >
> > - Repeat for other IDs as needed
> >
> > 4. Start ovirt-ha-agent (I think this should start all the others, but
> > make sure)
> >
> > 5. Wait a bit. I am pretty certain that they should recreate their
> > entries in the shared storage and eventually --vm-status should look
> > ok.
> >
> > 6. Exit global maintenance
> >
> > Good luck,
> >
> >> Appreciated! (and happy cos our cluster is almost back to normal :) )
> >>
> >> On 2021/02/03 11:30, Yedidyah Bar David wrote:
> >>> On Wed, Feb 3, 2021 at 11:12 AM Roderick Mooi  
> >>> wrote:
> 
> >>>> Hello and thanks for assisting!
> >>>>
> >>>> I think I may have found the problem :)
> >>>>
> >>>> /etc/ovirt-hosted-engine/hosted-engine.conf
> >>>>
> >>>> is blank.
> >>>>
> >>>> But I do have hosted-engine.conf~
> >>>
> >>> Any idea how this happened?
> >>>
> >>> Perhaps this is a backup done by emacs?
> >>>
> >>>>
> >>>> Can I cp this to restore the original?
> >>>
> >>> Please compare it to your other hosts. It should be (mostly?)
> >>> identical, but make sure that host_id= is unique per host. It should
> >>> match the spm host id for this host in the engine database.
> >>>
> >>>>
> >>>> Anything else I need to do?
> >>>
> >>> Not sure, but better find the root cause to make sure no other damage was 
> >>> done.
> >>>
> >>> Good luck,
> >>>
> >>>>
> >>>> Appreciated
> >>>>
> >>>>
> >>>> On 2021/02/02 11:37, Strahil Nikolov wrote:
> >>>>> Usually,
> >>>>>
> >>>>> I would start with checking the output of the 
> >>>>> /var/log/ovirt-hosted-engine-ha/{broker,agent}.log
> >>>>>
> >>>>> I'm typing it on my phone, so the path could have a typo.
> >>>>>
> >>>>> Check if the following services (also typed by memory, might have to 
> >>>>> remove the 'd') are running:
> >>>>> - sanlock
> >>>>> - supervdsmd
> >>>>> - vdsmd
> >>>>>
> >>>>>
> >>>>> Sometimes, some of my VGs (gluster) are not activated, so if you run 
> >>>>> hyperconverged -> you can 'vgchange -ay'.
> >>>>>
> >>>>> Best Regards,
> >>>>> Strahil Nikolov
> >>>>>
> >>>>>
> >>>>> Sent from Yahoo Mail on Android 
> >>>>>

[ovirt-users] Re: Recovery from power outage

2021-02-03 Thread Roderick Mooi

Thanks,


> I didn't check, but am pretty certain that it's not related to the
> engine db. Do you see such duplicates there as well (using the web ui
> or sql against it)? If so, fix these first. If no other means, put the
> host to maintenance and reinstall with the correct name.


Not seeing duplicates in the web UI, only in the --vm-status. Can you please 
assist me with the sql commands or reference to the database schema + where to 
check? I'd like to check that first before doing anything too drastic.

Note: it only duplicated the hostname after I changed the host_id, before that 
it had the correct hostname but duplicate host_id.

PS I have a recent backup of the database (just before which I could restore if 
you think that'll do the trick without breaking anything?


On 2021/02/03 16:33, Yedidyah Bar David wrote:

On Wed, Feb 3, 2021 at 4:21 PM Roderick Mooi  wrote:


Hi,


Any idea how this happened?


Somehow related to the power being "pulled" at the wrong time?


Perhaps this is a backup done by emacs?


Not sure what does it but I'm glad it did ;)


Please compare it to your other hosts. It should be (mostly?)
identical, but make sure that host_id= is unique per host. It should
match the spm host id for this host in the engine database.


I had to restore one of my hosts (host 1) manually due to a cleanup during my 
re-deploy attempts. I managed to do this successfully by copying the missing 
files from another host (host 2) but the first time the host ID matched one of 
the other hosts (which made at least hosted-engine --vm-status unhappy) [I 
hadn't seen your email yet :(]. I subsequently corrected the host_id and 
rebooted the guilty host. Things mostly seem to be working now except that in 
hosted-engine --vm-status my first two hosts (the one I copied the .conf from 
as well as the one I copied it to [without changing the ID :O]) now have the 
same hostname :-/ I'm assuming there's a mismatch in the engine database - 
where/how do I fix that?



I didn't check, but am pretty certain that it's not related to the
engine db. Do you see such duplicates there as well (using the web ui
or sql against it)? If so, fix these first. If no other means, put the
host to maintenance and reinstall with the correct name.

If it's just the shared storage, you can try the following. Carefully.
Didn't try myself. Try on a test system first.

1. Set global maintenance

2. Stop ovirt-ha-agent, ovirt-ha-broker, perhaps also vdsmd, supervdsmd

3. hosted-engine --clean_metadata --host-id=1

- Perhaps even pass --force-cleanup, not sure when it's needed

- Repeat for other IDs as needed

4. Start ovirt-ha-agent (I think this should start all the others, but
make sure)

5. Wait a bit. I am pretty certain that they should recreate their
entries in the shared storage and eventually --vm-status should look
ok.

6. Exit global maintenance

Good luck,


Appreciated! (and happy cos our cluster is almost back to normal :) )

On 2021/02/03 11:30, Yedidyah Bar David wrote:

On Wed, Feb 3, 2021 at 11:12 AM Roderick Mooi  wrote:


Hello and thanks for assisting!

I think I may have found the problem :)

/etc/ovirt-hosted-engine/hosted-engine.conf

is blank.

But I do have hosted-engine.conf~


Any idea how this happened?

Perhaps this is a backup done by emacs?



Can I cp this to restore the original?


Please compare it to your other hosts. It should be (mostly?)
identical, but make sure that host_id= is unique per host. It should
match the spm host id for this host in the engine database.



Anything else I need to do?


Not sure, but better find the root cause to make sure no other damage was done.

Good luck,



Appreciated


On 2021/02/02 11:37, Strahil Nikolov wrote:

Usually,

I would start with checking the output of the 
/var/log/ovirt-hosted-engine-ha/{broker,agent}.log

I'm typing it on my phone, so the path could have a typo.

Check if the following services (also typed by memory, might have to remove the 
'd') are running:
- sanlock
- supervdsmd
- vdsmd


Sometimes, some of my VGs (gluster) are not activated, so if you run 
hyperconverged -> you can 'vgchange -ay'.
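
A minimal sketch of the service check described above, assuming the unit names given in the message (sanlock, supervdsmd, vdsmd) are correct on your version. The function name inactive_units and the SYSTEMCTL override are made up for illustration; only "systemctl is-active --quiet" is relied on.

```shell
#!/usr/bin/env bash
# SYSTEMCTL can be overridden (for example in tests); the real systemctl is the default.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Print the name of every given unit that is not currently active.
inactive_units() {
    local unit
    for unit in "$@"; do
        # "is-active --quiet" exits 0 only when the unit is active.
        if ! "$SYSTEMCTL" is-active --quiet "$unit"; then
            echo "$unit"
        fi
    done
}

# Example, using the unit names from the message above:
# inactive_units sanlock supervdsmd vdsmd
# Then look at the end of the HA logs mentioned there:
# tail -n 50 /var/log/ovirt-hosted-engine-ha/agent.log /var/log/ovirt-hosted-engine-ha/broker.log
```

An empty result means all listed units are active; anything printed is worth checking with "systemctl status" before going further.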

Best Regards,
Strahil Nikolov


Sent from Yahoo Mail on Android 


  On Tue, Feb 2, 2021 at 11:28, Roderick Mooi
   wrote:
  Hi!

  We had a power outage and all our servers (oVirt hosts) went down. When 
they started up neither the hosted-engine nor VMs were started.

  hosted-engine --vm-status
  says:
  You must run deploy first

  I tried running deploy with various options but ultimately get stuck at:

  The Host ID is already known. Is this a re-deployment on an additional 
host that was previously set up (Yes, No)[Yes]?
  ...
  [ ERROR ] Failed to execute stage 'Closing up': 

  OR

  The specified storage location already contains a data 

[ovirt-users] Re: Recovery from power outage

2021-02-03 Thread Yedidyah Bar David
On Wed, Feb 3, 2021 at 4:21 PM Roderick Mooi  wrote:
>
> Hi,
>
> > Any idea how this happened?
>
> Somehow related to the power being "pulled" at the wrong time?
>
> > Perhaps this is a backup done by emacs?
>
> Not sure what does it but I'm glad it did ;)
>
> > Please compare it to your other hosts. It should be (mostly?)
> > identical, but make sure that host_id= is unique per host. It should
> > match the spm host id for this host in the engine database.
>
> I had to restore one of my hosts (host 1) manually due to a cleanup during my 
> re-deploy attempts. I managed to do this successfully by copying the missing 
> files from another host (host 2) but the first time the host ID matched one 
> of the other hosts (which made at least hosted-engine --vm-status unhappy) [I 
> hadn't seen your email yet :(]. I subsequently corrected the host_id and 
> rebooted the guilty host. Things mostly seem to be working now except that in 
> hosted-engine --vm-status my first two hosts (the one I copied the .conf from 
> as well as the one I copied it to [without changing the ID :O]) now have the 
> same hostname :-/ I'm assuming there's a mismatch in the engine database - 
> where/how do I fix that?
>

I didn't check, but am pretty certain that it's not related to the
engine db. Do you see such duplicates there as well (using the web ui
or sql against it)? If so, fix these first. If no other means, put the
host to maintenance and reinstall with the correct name.

If it's just the shared storage, you can try the following. Carefully.
Didn't try myself. Try on a test system first.

1. Set global maintenance

2. Stop ovirt-ha-agent, ovirt-ha-broker, perhaps also vdsmd, supervdsmd

3. hosted-engine --clean_metadata --host-id=1

- Perhaps even pass --force-cleanup, not sure when it's needed

- Repeat for other IDs as needed

4. Start ovirt-ha-agent (I think this should start all the others, but
make sure)

5. Wait a bit. I am pretty certain that they should recreate their
entries in the shared storage and eventually --vm-status should look
ok.

6. Exit global maintenance
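
As a rough, untested sketch of the steps above: the script below only prints the commands unless DRY_RUN=0 is set. The helper names run and clean_ha_metadata are made up for illustration. The cleanup option is spelled as written in step 3; I believe the documented spelling may be --clean-metadata, so please confirm with "hosted-engine --help" first. Stopping the services is shown for the local host only, and waiting plus leaving maintenance are kept manual on purpose.

```shell
#!/usr/bin/env bash
# Dry-run by default: each command is printed instead of executed.
# Set DRY_RUN=0 to really run them (on a test system first).
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: clean_ha_metadata HOST_ID [HOST_ID...]
clean_ha_metadata() {
    local id
    # 1. Set global maintenance.
    run hosted-engine --set-maintenance --mode=global
    # 2. Stop the HA services (vdsmd and supervdsmd are left alone in this sketch).
    run systemctl stop ovirt-ha-agent ovirt-ha-broker
    # 3. Clean the metadata for each host ID given as an argument.
    #    Option spelled as in the steps above; verify it with "hosted-engine --help".
    for id in "$@"; do
        run hosted-engine --clean_metadata --host-id="$id"
    done
    # 4. Start the agent again.
    run systemctl start ovirt-ha-agent
    # 5. and 6. are deliberately manual.
    echo "# manual: once 'hosted-engine --vm-status' looks ok, run:"
    echo "#   hosted-engine --set-maintenance --mode=none"
}

clean_ha_metadata 1
```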

Good luck,

> Appreciated! (and happy cos our cluster is almost back to normal :) )
>
> On 2021/02/03 11:30, Yedidyah Bar David wrote:
> > On Wed, Feb 3, 2021 at 11:12 AM Roderick Mooi  wrote:
> >>
> >> Hello and thanks for assisting!
> >>
> >> I think I may have found the problem :)
> >>
> >> /etc/ovirt-hosted-engine/hosted-engine.conf
> >>
> >> is blank.
> >>
> >> But I do have hosted-engine.conf~
> >
> > Any idea how this happened?
> >
> > Perhaps this is a backup done by emacs?
> >
> >>
> >> Can I cp this to restore the original?
> >
> > Please compare it to your other hosts. It should be (mostly?)
> > identical, but make sure that host_id= is unique per host. It should
> > match the spm host id for this host in the engine database.
> >
> >>
> >> Anything else I need to do?
> >
> > Not sure, but better find the root cause to make sure no other damage was 
> > done.
> >
> > Good luck,
> >
> >>
> >> Appreciated
> >>
> >>
> >> On 2021/02/02 11:37, Strahil Nikolov wrote:
> >>> Usually,
> >>>
> >>> I would start with checking the output of the 
> >>> /var/log/ovirt-hosted-engine-ha/{broker,agent}.log
> >>>
> >>> I'm typing it on my phone, so the path could have a typo.
> >>>
> >>> Check if the following services (also typed by memory, might have to 
> >>> remove the 'd') are running:
> >>> - sanlock
> >>> - supervdsmd
> >>> - vdsmd
> >>>
> >>>
> >>> Sometimes, some of my VGs (gluster) are not activated, so if you run 
> >>> hyperconverged -> you can 'vgchange -ay'.
> >>>
> >>> Best Regards,
> >>> Strahil Nikolov
> >>>
> >>>
> >>> Sent from Yahoo Mail on Android 
> >>> 
> >>>
> >>>  On Tue, Feb 2, 2021 at 11:28, Roderick Mooi
> >>>   wrote:
> >>>  Hi!
> >>>
> >>>  We had a power outage and all our servers (oVirt hosts) went down. 
> >>> When they started up neither the hosted-engine nor VMs were started.
> >>>
> >>>  hosted-engine --vm-status
> >>>  says:
> >>>  You must run deploy first
> >>>
> >>>  I tried running deploy with various options but ultimately get stuck 
> >>> at:
> >>>
> >>>  The Host ID is already known. Is this a re-deployment on an 
> >>> additional host that was previously set up (Yes, No)[Yes]?
> >>>  ...
> >>>  [ ERROR ] Failed to execute stage 'Closing up':  >>> [Errno 113] No route to host>
> >>>
> >>>  OR
> >>>
> >>>  The specified storage location already contains a data domain. Is 
> >>> this an additional host setup (Yes, No)[Yes]? No
> >>>  [ ERROR ] Re-deploying the engine VM over a previously (partially) 
> >>> deployed system is not supported. Please clean up the storage device or 
> >>> select a different one and retry.
> >>>
> >>>  NOTES:
> >>>  1. This is oVirt v3.6 (legacy install, I know...)
> >>> 

[ovirt-users] Re: Recovery from power outage

2021-02-03 Thread Roderick Mooi

Hi,


> Any idea how this happened?

Somehow related to the power being "pulled" at the wrong time?

> Perhaps this is a backup done by emacs?

Not sure what does it but I'm glad it did ;)

> Please compare it to your other hosts. It should be (mostly?)
> identical, but make sure that host_id= is unique per host. It should
> match the spm host id for this host in the engine database.


I had to restore one of my hosts (host 1) manually due to a cleanup during my
re-deploy attempts. I managed to do this by copying the missing files from
another host (host 2), but the first time the host ID matched that of one of
the other hosts (which made at least hosted-engine --vm-status unhappy) [I
hadn't seen your email yet :(]. I subsequently corrected the host_id and
rebooted the guilty host. Things mostly seem to be working now, except that in
hosted-engine --vm-status my first two hosts (the one I copied the .conf from
as well as the one I copied it to [without changing the ID :O]) now show the
same hostname :-/ I'm assuming there's a mismatch in the engine database -
where/how do I fix that?

Appreciated! (and happy cos our cluster is almost back to normal :) )

On 2021/02/03 11:30, Yedidyah Bar David wrote:

> On Wed, Feb 3, 2021 at 11:12 AM Roderick Mooi wrote:
> >
> > Hello and thanks for assisting!
> >
> > I think I may have found the problem :)
> >
> > /etc/ovirt-hosted-engine/hosted-engine.conf
> >
> > is blank.
> >
> > But I do have hosted-engine.conf~
>
> Any idea how this happened?
>
> Perhaps this is a backup done by emacs?
>
> >
> > Can I cp this to restore the original?
>
> Please compare it to your other hosts. It should be (mostly?)
> identical, but make sure that host_id= is unique per host. It should
> match the spm host id for this host in the engine database.
>
> >
> > Anything else I need to do?
>
> Not sure, but better find the root cause to make sure no other damage was done.
>
> Good luck,
>
> >
> > Appreciated


On 2021/02/02 11:37, Strahil Nikolov wrote:

Usually,

I would start with checking the output of the 
/var/log/ovirt-hosted-engine-ha/{broker,agent}.log

I'm typing it on my phone, so the path could have a typo.

Check if the following services (also typed by memory, might have to remove the 
'd') are running:
- sanlock
- supervdsmd
- vdsmd


Sometimes, some of my VGs (gluster) are not activated, so if you run 
hyperconverged -> you can 'vgchange -ay'.
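
A minimal sketch of the service check suggested above, assuming a systemd-based host where `systemctl is-active` is available (an oVirt 3.6 host on an older init system would need `service <name> status` instead). The helper names `is_active` and `inactive_services` are illustrative and not part of oVirt; the `run` parameter exists only so the check can be exercised without touching real services.

```python
import subprocess

# Service names as listed in the message above.
SERVICES = ("sanlock", "supervdsmd", "vdsmd")


def is_active(unit, run=subprocess.run):
    """Return True when `systemctl is-active --quiet UNIT` exits with status 0."""
    result = run(["systemctl", "is-active", "--quiet", unit], check=False)
    return result.returncode == 0


def inactive_services(units=SERVICES, run=subprocess.run):
    """Return the units that systemd does not report as active, in input order."""
    return [unit for unit in units if not is_active(unit, run=run)]
```

Calling `inactive_services()` on a host returns the subset of the three services that are not reported active; an empty list means all three are running, and the broker and agent logs are then the next place to look.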

Best Regards,
Strahil Nikolov




 On Tue, Feb 2, 2021 at 11:28, Roderick Mooi
  wrote:
 Hi!

 We had a power outage and all our servers (oVirt hosts) went down. When 
they started up neither the hosted-engine nor VMs were started.

 hosted-engine --vm-status
 says:
 You must run deploy first

 I tried running deploy with various options but ultimately get stuck at:

 The Host ID is already known. Is this a re-deployment on an additional 
host that was previously set up (Yes, No)[Yes]?
 ...
 [ ERROR ] Failed to execute stage 'Closing up': [Errno 113] No route to host

 OR

 The specified storage location already contains a data domain. Is this an 
additional host setup (Yes, No)[Yes]? No
 [ ERROR ] Re-deploying the engine VM over a previously (partially) 
deployed system is not supported. Please clean up the storage device or select 
a different one and retry.

 NOTES:
 1. This is oVirt v3.6 (legacy install, I know...)
 2. We do have daily engine backups (.bak files) [till the day the power 
failed]

 Any advice/assistance appreciated.

 Thanks!

 Roderick
 ___
 Users mailing list -- users@ovirt.org 
 To unsubscribe send an email to users-le...@ovirt.org 

 Privacy Statement: https://www.ovirt.org/privacy-policy.html 

 oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/ 

 List Archives:
 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/73VDY7KLYBKCUXOUU4YTS4ZFGXN2ZX2U/
 




[ovirt-users] Re: Ovirt VLAN Primer

2021-02-03 Thread David Johnson
Thank you.

I got so buried in the mechanics that I lost sight of the purpose of the
tagging. The tagged network should not be able to ping the untagged - that
was the whole purpose of the exercise.

The real problem is that the untagged network is unable to see its gateway
to the internet, which may be something as simple as configuring the
gateway on the router (not an ovirt problem). I was caught up chasing a red
herring by trying to ping the physical network.
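
As a rough illustration of why the gateway matters here: the three addresses quoted further down in this thread (192.168.2.18/24, 10.210.100.18/24 and 10.210.10.18/24) sit in three different /24 networks, so even apart from VLAN separation, traffic between them has to pass through a router. The sketch below only checks subnet membership with Python's standard `ipaddress` module; the labels and the `same_subnet` helper are made up for the example.

```python
import ipaddress


def same_subnet(a, b):
    """True when two interface addresses (CIDR notation) share one network."""
    return ipaddress.ip_interface(a).network == ipaddress.ip_interface(b).network


# Addresses as quoted in the thread; the labels are only for this sketch.
addresses = {
    "host address 1": "192.168.2.18/24",
    "host address 2": "10.210.100.18/24",
    "VLAN 10 address": "10.210.10.18/24",
}

names = list(addresses)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        linked = same_subnet(addresses[first], addresses[second])
        verdict = "same subnet" if linked else "needs a router"
        print(f"{first} <-> {second}: {verdict}")
```

Every pair is reported as needing a router, which is consistent with the gateway configuration being the thing to check rather than the oVirt bridge.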



On Wed, Feb 3, 2021, 12:26 AM Ales Musil  wrote:

>
>
> On Tue, Feb 2, 2021 at 8:07 PM Dan Yasny  wrote:
>
>>
>>
>> On Tue, Feb 2, 2021 at 2:00 PM David Johnson <
>> djohn...@maxistechnology.com> wrote:
>>
>>> Ah ... so if I connected one of the other ethernet ports to the tagged
>>> traffic (second physical network for tagged traffic), it should work as I
>>> expect?
>>>
>>
>> Yes, if there are no untagged networks attached
>>
>
> Mixing untagged and tagged is not a good practice from a security point of
> view but it should work.
> There might be 2 things blocking traffic to/from the VM. First, please make
> sure that the network does not have "Port Isolation" enabled.
> The second thing might be network filters; they can be disabled in the
> corresponding vNIC profile, and rebooting the VM or unplugging and
> re-plugging the VM interface will make the change effective.
>
> Regards,
> Ales
>
>
>>
>>
>>> Regards,
>>> David Johnson
>>> Director of Development, Maxis Technology
>>> 844.696.2947 ext 702 (o)  |  479.531.3590 (c)
>>> djohn...@maxistechnology.com
>>>
>>>
>>> On Tue, Feb 2, 2021 at 12:56 PM Dan Yasny  wrote:
>>>
>>>> You're trying to mix tagged and untagged traffic. That, iirc, isn't
>>>> supported for security reasons (the untagged network can see all the tagged
>>>> traffic). You can put multiple tagged networks on the same NIC though.
>>>>
>>>> Please check with the ovirt folks though, it's been a while since I
>>>> last checked the state of things
>>>>
>>>> On Tue, Feb 2, 2021 at 1:51 PM David Johnson <
>>>> djohn...@maxistechnology.com> wrote:
>>>>
>>>>> I have a physical network ovirtmgmt, and a logical network 10-non-prod
>>>>> with the vlan tag of 10 and the network label of 10.
>>>>>
>>>>> The physical and vlan have both been dragged to the enp0 NIC on the
>>>>> host.
>>>>>
>>>>> What I understand from this is that the bridge has been there all
>>>>> along, but, since I can't ping the host no traffic is crossing it.
>>>>>
>>>>> Host IPs: 192.168.2.18/24 and 10.210.100.18/24
>>>>> VLAN IP on host: 10.210.10.18/24
>>>>>
>>>>>
>>>>> Regards,
>>>>>
>>>>> David Johnson
>>>>>
>>>>> On Tue, Feb 2, 2021 at 12:44 PM Dan Yasny wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Feb 2, 2021 at 1:38 PM David Johnson <
>>>>>> djohn...@maxistechnology.com> wrote:
>>>>>>
>>>>>>> Thanks, this is a step closer, but the details are still very
>>>>>>> sketchy.
>>>>>>>
>>>>>>> Following the instructions at
>>>>>>> https://www.ovirt.org/documentation/administration_guide/#appe-Custom_Network_Properties
>>>>>>> :
>>>>>>>
>>>>>>> If I understand the instructions correctly:
>>>>>>>
>>>>>>>    1. Open the host in the Ovirt UI
>>>>>>>    2. Go to the Network tab
>>>>>>>    3. Select the NIC I want to bridge to
>>>>>>>    4. Click "Setup Host Networks"
>>>>>>>    5. Click the pencil icon on the (host? VLAN?) network
>>>>>>>    6. Choose the Custom Properties tab
>>>>>>>    7. In the Custom Properties (Please Select a key), choose
>>>>>>>    "bridge_opts"
>>>>>>>    8. At this point, there is no way to add the keys it looks
>>>>>>>    like it needs ???   Total loss ???
>>>>>>>
>>>>>>>
>>>>>> You need to create a logical network first. Do you have any of those?
>>>>>> Logical networks are where you may add VLAN tags.
>>>>>>
>>>>>> In the hosts' network setup window you simply drag the logical
>>>>>> network to the NIC or bond and save. The VLAN tag and bridge will be
>>>>>> created accordingly on the host
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> David Johnson
>>>>>>> Director of Development, Maxis Technology
>>>>>>> 844.696.2947 ext 702 (o)  |  479.531.3590 (c)
>>>>>>> djohn...@maxistechnology.com
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Feb 2, 2021 at 9:24 AM Dan Yasny wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Feb 2, 2021 at 10:20 AM David Johnson <
>>>>>>>> djohn...@maxistechnology.com> wrote:
>>>>>>>>
>>>>>>>>> This is great ... I am missing the bridge (at least).
>>>>>>>>>
>>>>>>>>> Does the bridge reside on the host or the VM?  Is it created in
>>>>>>>>> the Ovirt UI, or in the VM operating system?
>>>>>>>>>
>>>>>>>>
>>>>>>>> On 

[ovirt-users] Changes between Node NG 4.4.3 and 4.4.4

2021-02-03 Thread Shantur Rathore
Hi,

I have a node 4.4.3 install and it works perfectly for PCI passthrough.
After I upgraded to Node NG 4.4.4 the PCI passthrough leads to server
resets. There are no logs in the kernel messages just before reset apart
from vfio-pci enabling the device.
So, I assume there is some change related to the kernel or vfio-pci driver
in the newer 4.4.4 version.

How / Where can I see the differences between Node NG releases?

Thanks,
Shantur
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/XNOQPOWUBAYTXB3L6QOSGBG43BZ5PCR5/


[ovirt-users] NodeNG persistence for custom vdsm hooks and firmware

2021-02-03 Thread Shantur Rathore
Hi all,

I have NodeNG 4.4.4 installed and want to know what is the best way of
persisting custom vdsm hooks and some firmware binaries across updates.
I tried to update to 4.4.5-pre and lost my hooks and firmware.

Thanks,
Shantur
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/WPY5JP4BIUWH5QAR7M7L7MLY4XI2VMWH/


[ovirt-users] Re: Locked disks

2021-02-03 Thread Giulio Casella
Hi Shani,
no tasks are listed in the UI, and "taskcleaner.sh -o" now reports no tasks
(I ran "taskcleaner.sh -r" just before).
But the disks are still locked, and "unlock_entity.sh -q -t all -c"
(accordingly) reports only the two disks' UUIDs (with their VMs' UUIDs).

Time to give unlock_entity.sh a chance?

Regards,
gc

On 03/02/2021 11:52, Shani Leviim wrote:
> Hi Giulio,
> Before running unlock_entity.sh, let's try to find if there's any task
> in progress.
> Is there any hint on the events in the UI?
> Or try to run [1]:
> ./taskcleaner.sh -o  
> 
> Also, you can verify what entities are locked [2]:
> ./unlock_entity.sh -q -t all -c
> 
> [1]
> https://github.com/oVirt/ovirt-engine/blob/master/packaging/setup/dbutils/taskcleaner.sh
> 
> [2]
> https://github.com/oVirt/ovirt-engine/blob/master/packaging/setup/dbutils/unlock_entity.sh
> 
> 
> Regards,
> Shani Leviim
>
>
> On Wed, Feb 3, 2021 at 10:43 AM Giulio Casella wrote:
> 
> Since yesterday I have found a couple of VMs with locked disks. I don't know
> the reason; I suspect some interaction with our backup system (vprotect,
> snapshot based), even though it has been working for more than a year.
> 
> I'd give the unlock_entity.sh script a chance, but it reports:
> 
> CAUTION, this operation may lead to data corruption and should be used
> with care. Please contact support prior to running this command
> 
> Do you think I should trust it? Is it safe? VMs are in production...
> 
> My manager is 4.4.4.7-1.el8 (CentOS stream 8), hosts are oVirt Node
> 4.4.4
> 
> 
> TIA,
> Giulio
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/FEXMZKZFYCWUOVZXZ3C3XZ7VBVYKFJGH/


[ovirt-users] Re: Locked disks

2021-02-03 Thread Shani Leviim
Hi Giulio,
Before running unlock_entity.sh, let's try to find if there's any task in
progress.
Is there any hint on the events in the UI?
Or try to run [1]:
./taskcleaner.sh -o

Also, you can verify what entities are locked [2]:
./unlock_entity.sh -q -t all -c

[1]
https://github.com/oVirt/ovirt-engine/blob/master/packaging/setup/dbutils/taskcleaner.sh
[2]
https://github.com/oVirt/ovirt-engine/blob/master/packaging/setup/dbutils/unlock_entity.sh


*Regards,*

*Shani Leviim*


On Wed, Feb 3, 2021 at 10:43 AM Giulio Casella  wrote:

> Since yesterday I have found a couple of VMs with locked disks. I don't know
> the reason; I suspect some interaction with our backup system (vprotect,
> snapshot based), even though it has been working for more than a year.
>
> I'd give the unlock_entity.sh script a chance, but it reports:
>
> CAUTION, this operation may lead to data corruption and should be used
> with care. Please contact support prior to running this command
>
> Do you think I should trust it? Is it safe? VMs are in production...
>
> My manager is 4.4.4.7-1.el8 (CentOS stream 8), hosts are oVirt Node 4.4.4
>
>
> TIA,
> Giulio
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/NEQA5ORDQJIGUZG2VRJ4THU2HJKCYWPA/


[ovirt-users] Re: Recovery from power outage

2021-02-03 Thread Yedidyah Bar David
On Wed, Feb 3, 2021 at 11:12 AM Roderick Mooi  wrote:
>
> Hello and thanks for assisting!
>
> I think I may have found the problem :)
>
> /etc/ovirt-hosted-engine/hosted-engine.conf
>
> is blank.
>
> But I do have hosted-engine.conf~

Any idea how this happened?

Perhaps this is a backup done by emacs?

>
> Can I cp this to restore the original?

Please compare it to your other hosts. It should be (mostly?)
identical, but make sure that host_id= is unique per host. It should
match the spm host id for this host in the engine database.
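
A minimal sketch of the comparison described above, assuming the file consists of plain `key=value` lines and that its contents from each host have already been collected into strings. Only `host_id` comes from this thread; the host names, the `example_key` entry and the helper functions are hypothetical.

```python
from collections import defaultdict


def parse_conf(text):
    """Parse key=value lines, skipping blanks, comments and malformed lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def duplicate_host_ids(confs):
    """Return {host_id: [hosts]} for every host_id used by more than one host."""
    owners = defaultdict(list)
    for host, text in confs.items():
        host_id = parse_conf(text).get("host_id")
        if host_id is not None:
            owners[host_id].append(host)
    return {hid: sorted(hosts) for hid, hosts in owners.items() if len(hosts) > 1}


def differing_keys(text_a, text_b, ignore=("host_id",)):
    """List keys whose values differ between two files, ignoring per-host keys."""
    a, b = parse_conf(text_a), parse_conf(text_b)
    return sorted(k for k in set(a) | set(b) if k not in ignore and a.get(k) != b.get(k))


# Hypothetical file contents; a real file has many more keys.
confs = {
    "host1": "host_id=2\nexample_key=shared\n",
    "host2": "host_id=2\nexample_key=shared\n",
    "host3": "host_id=3\nexample_key=shared\n",
}
print(duplicate_host_ids(confs))
print(differing_keys(confs["host1"], confs["host3"]))
```

With these sample strings the first call flags host1 and host2 as sharing host_id 2, and the second reports no other differences. Whether each host_id also matches the spm host id stored in the engine database still has to be checked on the engine side.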

>
> Anything else I need to do?

Not sure, but better find the root cause to make sure no other damage was done.

Good luck,

>
> Appreciated
>
>
> On 2021/02/02 11:37, Strahil Nikolov wrote:
> > Usually,
> >
> > I would start with checking the output of the 
> > /var/log/ovirt-hosted-engine-ha/{broker,agent}.log
> >
> > I'm typing it on my phone, so the path could have a typo.
> >
> > Check if the following services (also typed by memory, might have to remove 
> > the 'd') are running:
> > - sanlock
> > - supervdsmd
> > - vdsmd
> >
> >
> > Sometimes, some of my VGs (gluster) are not activated, so if you run 
> > hyperconverged -> you can 'vgchange -ay'.
> >
> > Best Regards,
> > Strahil Nikolov
> >
> >
> >
> > On Tue, Feb 2, 2021 at 11:28, Roderick Mooi
> >  wrote:
> > Hi!
> >
> > We had a power outage and all our servers (oVirt hosts) went down. When 
> > they started up neither the hosted-engine nor VMs were started.
> >
> > hosted-engine --vm-status
> > says:
> > You must run deploy first
> >
> > I tried running deploy with various options but ultimately get stuck at:
> >
> > The Host ID is already known. Is this a re-deployment on an additional 
> > host that was previously set up (Yes, No)[Yes]?
> > ...
> > [ ERROR ] Failed to execute stage 'Closing up': [Errno 113] No route to host
> >
> > OR
> >
> > The specified storage location already contains a data domain. Is this 
> > an additional host setup (Yes, No)[Yes]? No
> > [ ERROR ] Re-deploying the engine VM over a previously (partially) 
> > deployed system is not supported. Please clean up the storage device or 
> > select a different one and retry.
> >
> > NOTES:
> > 1. This is oVirt v3.6 (legacy install, I know...)
> > 2. We do have daily engine backups (.bak files) [till the day the power 
> > failed]
> >
> > Any advice/assistance appreciated.
> >
> > Thanks!
> >
> > Roderick



-- 
Didi
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/7R34CA2UJ6SMM7JYSXPDV4TDQAN6HNOP/


[ovirt-users] Re: Recovery from power outage

2021-02-03 Thread Roderick Mooi

Hello and thanks for assisting!

I think I may have found the problem :)

/etc/ovirt-hosted-engine/hosted-engine.conf

is blank.

But I do have hosted-engine.conf~

Can I cp this to restore the original?

Anything else I need to do?

Appreciated


On 2021/02/02 11:37, Strahil Nikolov wrote:

Usually,

I would start with checking the output of the 
/var/log/ovirt-hosted-engine-ha/{broker,agent}.log

I'm typing it on my phone, so the path could have a typo.

Check if the following services (also typed by memory, might have to remove the 
'd') are running:
- sanlock
- supervdsmd
- vdsmd


Sometimes, some of my VGs (gluster) are not activated, so if you run 
hyperconverged -> you can 'vgchange -ay'.

Best Regards,
Strahil Nikolov




On Tue, Feb 2, 2021 at 11:28, Roderick Mooi
 wrote:
Hi!

We had a power outage and all our servers (oVirt hosts) went down. When 
they started up neither the hosted-engine nor VMs were started.

hosted-engine --vm-status
says:
You must run deploy first

I tried running deploy with various options but ultimately get stuck at:

The Host ID is already known. Is this a re-deployment on an additional host 
that was previously set up (Yes, No)[Yes]?
...
[ ERROR ] Failed to execute stage 'Closing up': [Errno 113] No route to host

OR

The specified storage location already contains a data domain. Is this an 
additional host setup (Yes, No)[Yes]? No
[ ERROR ] Re-deploying the engine VM over a previously (partially) deployed 
system is not supported. Please clean up the storage device or select a 
different one and retry.

NOTES:
1. This is oVirt v3.6 (legacy install, I know...)
2. We do have daily engine backups (.bak files) [till the day the power 
failed]

Any advice/assistance appreciated.

Thanks!

Roderick
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/HTWNERBX42JNOMONSCG6BL2MCIQZDW7C/


[ovirt-users] Locked disks

2021-02-03 Thread Giulio Casella
Since yesterday I have found a couple of VMs with locked disks. I don't know
the reason; I suspect some interaction with our backup system (vprotect,
snapshot based), even though it has been working for more than a year.

I'd give the unlock_entity.sh script a chance, but it reports:

CAUTION, this operation may lead to data corruption and should be used
with care. Please contact support prior to running this command

Do you think I should trust it? Is it safe? VMs are in production...

My manager is 4.4.4.7-1.el8 (CentOS stream 8), hosts are oVirt Node 4.4.4


TIA,
Giulio
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/M4HYMDMHOKC5DCHDP6CFLM4RWJQNN7R4/