[ovirt-users] Disconnected Server has closed the connection.

2020-09-14 Thread info
It seems that the installation is all done, but I have a problem: it takes very 
long to open the web pages, plus it disconnects all the time. It is impossible 
to do anything.

I can ping the hostname, as I set up a sub-domain for it. To be honest, I am new 
to this and it took me days to get to this point. I think there are some issues 
with my network settings.

If there are any oVirt experts who can check my installation and give me 
advice on how to improve it, it would be greatly appreciated.

I followed "Installing oVirt as a self-hosted engine using the Cockpit web 
interface".
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives:


[ovirt-users] Re: hosted engine migration

2020-09-14 Thread ddqlo
I have tried. It does not work.
My host network:

Will this help?











On 2020-09-09 22:52:25, "Strahil Nikolov" wrote:
>I think that you can try to set one of the HE hosts into maintenance and then 
>use UI to 'reinstall'. Don't forget to mark the host as a HE host also (some 
>dropdown in the UI wizard).
>
>Best Regards,
>Strahil Nikolov
>
>
>
>
>
>
>On Tuesday, 8 September 2020, 10:24:00 GMT+3, Yedidyah Bar David 
> wrote: 
>
>
>
>
>
>On Tue, Sep 8, 2020 at 4:33 AM ddqlo  wrote:
>> my hosts cpu:Intel Haswell-noTSX Family
>> cluster cpu:Intel Haswell-noTSX Family
>> HostedEngine vm cpu:Intel Haswell-noTSX Family
>> 
>> When I tried to put the host in maintenance in web UI, I got an error:
>> 
>> 
>
>Adding Arik. Arik - any idea what else to test?
> 
>> 
>> 
>> When I typed the command, I got this:
>> [root@node22 ~]# hosted-engine --set-maintenance --mode=local
>> Unable to enter local maintenance mode: the engine VM is running on the 
>> current host, please migrate it before entering local maintenance mode.
>
>Sorry, I wasn't aware that this was disabled since 4.3.5, about a year ago:
>
>https://gerrit.ovirt.org/#/q/Ia06b9bc6e65a7937e6d6462c001b59572369fe66,n,z
>
>So you'll have to first fix migration on engine level.
>
>Best regards,
>
> 
>> 
>> 
>> 
>> 
>> At 2020-09-07 12:52:01, "Yedidyah Bar David"  wrote:
>>>On Mon, Sep 7, 2020 at 4:13 AM ddqlo  wrote:

 I have found some engine logs:

 2020-09-07 09:00:45,428+08 INFO  
 [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-6) 
 [29259482-1515-4c10-8458-59354a0953ac] Candidate host 'node22' 
 ('585b374b-4c82-4f5c-aad7-196d9f5d5625') was filtered out by 
 'VAR__FILTERTYPE__INTERNAL' filter 'CPU' (correlation id: null)

 2020-09-07 09:00:45,428+08 INFO  
 [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-6) 
 [29259482-1515-4c10-8458-59354a0953ac] Candidate host 'node28' 
 ('a678a15d-19e6-46f2-80bf-c3181197a0a6') was filtered out by 
 'VAR__FILTERTYPE__INTERNAL' filter 'CPU' (correlation id: null)


 It seems that both of the two hosts were filtered out.
>>>
>>>So please check cpu conf of the hosts, cluster, VM, etc.
>>>
>>>One more thing you can try is to force putting the host in maintenance
>>>- this will require migrating the engine VM.
>>>
>>>If the engine refuses to do that, because it can't migrate the VM due
>>>to above issue, you can try instead:
>>>
>>>hosted-engine --set-maintenance --mode=local
>>>
>>>I think this overrides the engine and will force a migration. Didn't
>>>try recently.
>>>
>>>Best regards,
>>>




 On 2020-09-07 07:50:55, "ddqlo" wrote:

 I could not find any logs because the migration button is disabled in the 
 web UI. It seems that the engine migration operation is prevented at 
 first. Any other ideas? Thanks!







 On 2020-09-01 00:06:19, "Strahil Nikolov" wrote:
 >I'm running oVirt 4.3.10 and I can migrate my Engine from node to node.
 >I had one similar issue , but powering off and on the HE has fixed it.
 >
 >You have to check the vdsm log on the source and on destination in order 
 >to figure out what is going on.
 >Also you might consider checking the libvirt logs on the destination.
 >
 >Best Regards,
 >Strahil Nikolov
 >
 >
 >
 >
 >
 >
 >On Monday, 31 August 2020, 10:47:22 GMT+3, ddqlo 
 > wrote:
 >
 >
 >
 >
 >
 >Thanks! The scores of all nodes are not '0'. I find that someone has 
 >already asked a question like this. It seems that  this feature has been 
 >disabled in 4.3. I am not sure if it is enabled in 4.4.
 >
 >
 >On 2020-08-29 02:27:03, "Strahil Nikolov" wrote:
 >>Have you checked under a shell the output of 'hosted-engine --vm-status' 
 >>. Check the Score of the hosts. Maybe there is a node with score of '0' ?
 >>
 >>Best Regards,
 >>Strahil Nikolov
 >>
 >>
 >>
 >>
 >>
 >>
 >>On Tuesday, 25 August 2020, 13:46:18 GMT+3, 董青龙 
 >>wrote:
 >>
 >>
 >>
 >>
 >>
 >>Hi all,
 >>I have an ovirt4.3.10.4 environment of 2 hosts. Normal vms in 
 >> this environment could be migrated, but the hosted engine vm could not 
 >> be migrated. Anyone can help? Thanks a lot!
 >>
 >>hosts status:
 >>
 >>normal vm migration:
 >>
 >>hosted engine vm migration:
 >>
 >>
 >>
 >>
 >>___
 >>Users mailing list -- users@ovirt.org
 >>To unsubscribe send an email to users-le...@ovirt.org
 >>Privacy Statement: https://www.ovirt.org/privacy-policy.html
 >>oVirt Code of Conduct: 
 >>https://www.ovirt.org/community/about/community-guidelines/
 >>List Archives: 
 

[ovirt-users] Re: hosted engine migration

2020-09-14 Thread ddqlo
--== Host node28 (id: 1) status ==--




conf_on_shared_storage : True

Status up-to-date  : True

Hostname   : node28

Host ID: 1

Engine status  : {"reason": "vm not running on this host", 
"health": "bad", "vm": "down_unexpected", "detail": "unknown"}

Score  : 1800

stopped: False

Local maintenance  : False

crc32  : 4ac6105b

local_conf_timestamp   : 1794597

Host timestamp : 1794597

Extra metadata (valid at timestamp):

metadata_parse_version=1

metadata_feature_version=1

timestamp=1794597 (Tue Sep 15 09:47:17 2020)

host-id=1

score=1800

vm_conf_refresh_time=1794597 (Tue Sep 15 09:47:17 2020)

conf_on_shared_storage=True

maintenance=False

state=EngineDown

stopped=False







--== Host node22 (id: 2) status ==--




conf_on_shared_storage : True

Status up-to-date  : True

Hostname   : node22

Host ID: 2

Engine status  : {"health": "good", "vm": "up", "detail": 
"Up"}

Score  : 1800

stopped: False

Local maintenance  : False

crc32  : ffc41893

local_conf_timestamp   : 1877876

Host timestamp : 1877876

Extra metadata (valid at timestamp):

metadata_parse_version=1

metadata_feature_version=1

timestamp=1877876 (Tue Sep 15 09:47:13 2020)

host-id=2

score=1800

vm_conf_refresh_time=1877876 (Tue Sep 15 09:47:13 2020)

conf_on_shared_storage=True

maintenance=False

state=EngineUp

stopped=False

















On 2020-09-09 01:32:55, "Strahil Nikolov" wrote:
>What is the output of 'hosted-engine --vm-status' on the node where the 
>HostedEngine is running ?
>
>
>Best Regards,
>Strahil Nikolov
>
>
>
>
>
>
>On Monday, 7 September 2020, 03:53:13 GMT+3, ddqlo 
>wrote: 
>
>
>
>
>
>I could not find any logs because the migration button is disabled in the web 
>UI. It seems that the engine migration operation is prevented at first. Any 
>other ideas? Thanks!
>
>
>
>
>
>
>
>On 2020-09-01 00:06:19, "Strahil Nikolov" wrote:
>>I'm running oVirt 4.3.10 and I can migrate my Engine from node to node.
>>I had one similar issue , but powering off and on the HE has fixed it.
>>
>>You have to check the vdsm log on the source and on destination in order to 
>>figure out what is going on.
>>Also you might consider checking the libvirt logs on the destination.
>>
>>Best Regards,
>>Strahil Nikolov
>>
>>
>>
>>
>>
>>
>>On Monday, 31 August 2020, 10:47:22 GMT+3, ddqlo 
>>wrote: 
>>
>>
>>
>>
>>
>>Thanks! The scores of all nodes are not '0'. I find that someone has already 
>>asked a question like this. It seems that  this feature has been disabled in 
>>4.3. I am not sure if it is enabled in 4.4.
>>
>>
>>On 2020-08-29 02:27:03, "Strahil Nikolov" wrote:
>>>Have you checked under a shell the output of 'hosted-engine --vm-status' . 
>>>Check the Score of the hosts. Maybe there is a node with score of '0' ?
>>>
>>>Best Regards,
>>>Strahil Nikolov
>>>
>>>
>>>
>>>
>>>
>>>
>>>On Tuesday, 25 August 2020, 13:46:18 GMT+3, 董青龙 
>>>wrote: 
>>>
>>>
>>>
>>>
>>>
>>>Hi all,
>>>I have an ovirt4.3.10.4 environment of 2 hosts. Normal vms in this 
>>> environment could be migrated, but the hosted engine vm could not be 
>>> migrated. Anyone can help? Thanks a lot!
>>>
>>>hosts status:
>>>
>>>normal vm migration:
>>>
>>>hosted engine vm migration:
>>>
>>>
>>>
>>> 
>>>___
>>>Users mailing list -- users@ovirt.org
>>>To unsubscribe send an email to users-le...@ovirt.org
>>>Privacy Statement: https://www.ovirt.org/privacy-policy.html
>>>oVirt Code of Conduct: 
>>>https://www.ovirt.org/community/about/community-guidelines/
>>>List Archives: 
>>>https://lists.ovirt.org/archives/list/users@ovirt.org/message/ZXHE2AJX42HNHOMYHTDCUUIU3VQTQTLF/
>>
>>
>>
>>
>> 
>>___
>>Users mailing list -- users@ovirt.org
>>To unsubscribe send an email to users-le...@ovirt.org
>>Privacy Statement: https://www.ovirt.org/privacy-policy.html
>>oVirt Code of Conduct: 
>>https://www.ovirt.org/community/about/community-guidelines/
>>List Archives: 
>>https://lists.ovirt.org/archives/list/users@ovirt.org/message/IAYLFLC6K42OUPZSZU3P3ZYAU66LGSCD/
>
>
>
>
> 
>___
>Users mailing list -- users@ovirt.org
>To unsubscribe send an email to users-le...@ovirt.org
>Privacy Statement: https://www.ovirt.org/privacy-policy.html
>oVirt Code of Conduct: 
>https://www.ovirt.org/community/about/community-guidelines/

[ovirt-users] Re: hosted engine migration

2020-09-14 Thread ddqlo
differences:

[diff of the virsh capabilities outputs; the XML was mangled by the archive. The 
recoverable fragments are a host UUID (a15b30fd-2de2-4bea-922d-d0de2ee3b76a) and 
two differing numeric values, 32904772 and 8226193.]


On 2020-09-09 01:35:59, "Strahil Nikolov" wrote:
>You can use the following:
>
>vim ~/.bashrc
>
>alias virsh='virsh -c 
>qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf'
>
>
>source ~/.bashrc
>
>#Show host capabilities
>virsh capabilities
>
>Now repeat on the other nodes. Compare the CPU from the 3 outputs.
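
A quick way to narrow that comparison to just the CPU section is a sketch like the 
following (assumes xmllint from libxml2 is installed; run it on each node and diff 
the resulting files, node names here taken from this thread):

virsh capabilities | xmllint --xpath '/capabilities/host/cpu' - > /tmp/cpu-$(hostname).xml
# copy the files to one node, then:
diff /tmp/cpu-node22.xml /tmp/cpu-node28.xml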
>
>
>Best Regards,
>Strahil Nikolov
>
>
>
>
>
>
>On Monday, 7 September 2020, 04:11:52 GMT+3, ddqlo 
>wrote: 
>
>
>
>
>
>I have found some engine logs:
>
>2020-09-07 09:00:45,428+08 INFO  
>[org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-6) 
>[29259482-1515-4c10-8458-59354a0953ac] Candidate host 'node22' 
>('585b374b-4c82-4f5c-aad7-196d9f5d5625') was filtered out by 
>'VAR__FILTERTYPE__INTERNAL' filter 'CPU' (correlation id: null)
>2020-09-07 09:00:45,428+08 INFO  
>[org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-6) 
>[29259482-1515-4c10-8458-59354a0953ac] Candidate host 'node28' 
>('a678a15d-19e6-46f2-80bf-c3181197a0a6') was filtered out by 
>'VAR__FILTERTYPE__INTERNAL' filter 'CPU' (correlation id: null)
>
>It seems that both of the two hosts were filtered out.
>
>
>
>
>
>On 2020-09-07 07:50:55, "ddqlo" wrote:
>> I could not find any logs because the migration button is disabled in the 
>> web UI. It seems that the engine migration operation is prevented at first. 
>> Any other ideas? Thanks!
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> On 2020-09-01 00:06:19, "Strahil Nikolov" wrote:
>>>I'm running oVirt 4.3.10 and I can migrate my Engine from node to node.
>>>I had one similar issue , but powering off and on the HE has fixed it.
>>>
>>>You have to check the vdsm log on the source and on destination in order to 
>>>figure out what is going on.
>>>Also you might consider checking the libvirt logs on the destination.
>>>
>>>Best Regards,
>>>Strahil Nikolov
>>>
>>>
>>>
>>>
>>>
>>>
>>>On Monday, 31 August 2020, 10:47:22 GMT+3, ddqlo 
>>>wrote: 
>>>
>>>
>>>
>>>
>>>
>>>Thanks! The scores of all nodes are not '0'. I find that someone has already 
>>>asked a question like this. It seems that  this feature has been disabled in 
>>>4.3. I am not sure if it is enabled in 4.4.
>>>
>>>
>>>On 2020-08-29 02:27:03, "Strahil Nikolov" wrote:
Have you checked under a shell the output of 'hosted-engine --vm-status' . 
Check the Score of the hosts. Maybe there is a node with score of '0' ?

Best Regards,
Strahil Nikolov






On Tuesday, 25 August 2020, 13:46:18 GMT+3, 董青龙 
wrote: 





Hi all,
I have an ovirt4.3.10.4 environment of 2 hosts. Normal vms in this 
 environment could be migrated, but the hosted engine vm could not be 
 migrated. Anyone can help? Thanks a lot!

hosts status:

normal vm migration:

hosted engine vm migration:



 
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ZXHE2AJX42HNHOMYHTDCUUIU3VQTQTLF/
>>>
>>>
>>>
>>>
>>> 
>>>___
>>>Users mailing list -- users@ovirt.org
>>>To unsubscribe send an email to users-le...@ovirt.org
>>>Privacy Statement: https://www.ovirt.org/privacy-policy.html
>>>oVirt Code of Conduct: 
>>>https://www.ovirt.org/community/about/community-guidelines/
>>>List Archives: 
>>>https://lists.ovirt.org/archives/list/users@ovirt.org/message/IAYLFLC6K42OUPZSZU3P3ZYAU66LGSCD/
>> 
>> 
>> 
>> 
>>  
>
>
>
> 
>___
>Users mailing list -- users@ovirt.org
>To unsubscribe send an email to users-le...@ovirt.org
>Privacy Statement: https://www.ovirt.org/privacy-policy.html
>oVirt Code of Conduct: 
>https://www.ovirt.org/community/about/community-guidelines/
>List Archives: 
>https://lists.ovirt.org/archives/list/users@ovirt.org/message/HRCQR7Y6AMUW6HAVINQVSIRB6B6WGXMN/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives:


[ovirt-users] Re: Multiple GPU Passthrough with NVLink (Invalid I/O region)

2020-09-14 Thread Arman Khalatyan
Any progress on this GPU question?
In our setup we have Supermicro boards with Intel Xeon Gold 6146 + 2 T4.
We add an extra line in /etc/default/grub:
"rd.driver.blacklist=nouveau nouveau.modeset=0 pci-stub.ids=xxx:xxx
intel_iommu=on"
It would be interesting to know whether the NVLink was the showstopper.
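
For reference, a minimal sketch of how such a line is typically wired in on an
EL-family host (paths are the stock GRUB ones; the pci-stub IDs stay whatever
lspci -nn reports for the cards):

# /etc/default/grub
GRUB_CMDLINE_LINUX="... rd.driver.blacklist=nouveau nouveau.modeset=0 pci-stub.ids=xxxx:xxxx intel_iommu=on"
# regenerate the config and reboot
grub2-mkconfig -o /boot/grub2/grub.cfg   # or /boot/efi/EFI/centos/grub.cfg on UEFI installs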



Arman Khalatyan wrote on Sat., 5 Sept. 2020, 00:38:

> same here ☺️, on Monday will check them.
>
> Michael Jones wrote on Fri., 4 Sept. 2020, 22:01:
>
>> Yes, passthrough. I think for vGPU you have to pay NVIDIA for a driver
>> upgrade; I've not tried that and don't know the price, and didn't find it
>> easy to get info on it last time I tried.
>>
>> I have used it in both legacy and UEFI boot machines; I don't know the
>> chipsets off the top of my head, will look on Monday.
>>
>>
>> On Fri, 4 Sep 2020, 20:56 Vinícius Ferrão, 
>> wrote:
>>
>>> Thanks Michael and Arman.
>>>
>>> To make things clear, you guys are using Passthrough, right? It’s not
>>> vGPU. The 4x GPUs are added on the “Host Devices” tab of the VM.
>>> What I’m trying to achieve is add the 4x V100 directly to one specific
>>> VM.
>>>
>>> And finally can you guys confirm which BIOS type is being used in your
>>> machines? I’m with Q35 Chipset with UEFI BIOS. I haven’t tested it with
>>> legacy, perhaps I’ll give it a try.
>>>
>>> Thanks again.
>>>
>>> On 4 Sep 2020, at 14:09, Michael Jones  wrote:
>>>
>>> Also use multiple t4, also p4, titans, no issues but never used the
>>> nvlink
>>>
>>> On Fri, 4 Sep 2020, 16:02 Arman Khalatyan,  wrote:
>>>
 hi,
 with the 2xT4 we haven't seen any trouble. we have no nvlink there.

 did u try to disable the nvlink?



 Vinícius Ferrão via Users wrote on Fri., 4 Sept.
 2020, 08:39:

> Hello, here we go again.
>
> I’m trying to passthrough 4x NVIDIA Tesla V100 GPUs (with NVLink) to a
> single VM; but things aren’t that good. Only one GPU shows up on the VM.
> lspci is able to show the GPUs, but three of them are unusable:
>
> 08:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2
> 16GB] (rev a1)
> 09:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2
> 16GB] (rev a1)
> 0a:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2
> 16GB] (rev a1)
> 0b:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2
> 16GB] (rev a1)
>
> There are some errors on dmesg, regarding a misconfigured BIOS:
>
> [   27.295972] nvidia: loading out-of-tree module taints kernel.
> [   27.295980] nvidia: module license 'NVIDIA' taints kernel.
> [   27.295981] Disabling lock debugging due to kernel taint
> [   27.304180] nvidia: module verification failed: signature and/or
> required key missing - tainting kernel
> [   27.364244] nvidia-nvlink: Nvlink Core is being initialized, major
> device number 241
> [   27.579261] nvidia :09:00.0: enabling device ( -> 0002)
> [   27.579560] NVRM: This PCI I/O region assigned to your NVIDIA
> device is invalid:
>NVRM: BAR1 is 0M @ 0x0 (PCI::09:00.0)
> [   27.579560] NVRM: The system BIOS may have misconfigured your GPU.
> [   27.579566] nvidia: probe of :09:00.0 failed with error -1
> [   27.580727] NVRM: This PCI I/O region assigned to your NVIDIA
> device is invalid:
>NVRM: BAR0 is 0M @ 0x0 (PCI::0a:00.0)
> [   27.580729] NVRM: The system BIOS may have misconfigured your GPU.
> [   27.580734] nvidia: probe of :0a:00.0 failed with error -1
> [   27.581299] NVRM: This PCI I/O region assigned to your NVIDIA
> device is invalid:
>NVRM: BAR0 is 0M @ 0x0 (PCI::0b:00.0)
> [   27.581300] NVRM: The system BIOS may have misconfigured your GPU.
> [   27.581305] nvidia: probe of :0b:00.0 failed with error -1
> [   27.581333] NVRM: The NVIDIA probe routine failed for 3 device(s).
> [   27.581334] NVRM: loading NVIDIA UNIX x86_64 Kernel Module
> 450.51.06  Sun Jul 19 20:02:54 UTC 2020
> [   27.649128] nvidia-modeset: Loading NVIDIA Kernel Mode Setting
> Driver for UNIX platforms  450.51.06  Sun Jul 19 20:06:42 UTC 2020
>
> The host is Secure Intel Skylake (x86_64). VM is running with Q35
> Chipset with UEFI (pc-q35-rhel8.2.0)
>
> I’ve tried to change the I/O mapping options on the host, tried with
> 56TB and 12TB without success. Same results. Didn’t tried with 512GB since
> the machine have 768GB of system RAM.
>
> Tried blacklisting the nouveau on the host, nothing.
> Installed NVIDIA drivers on the host, nothing.
>
> In the host I can use the 4x V100, but inside a single VM it’s
> impossible.
>
> Any suggestions?
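
For anyone debugging similar BAR errors, the regions the guest firmware actually
assigned can be checked from inside the VM with plain lspci (slot addresses as
listed above); a 16 GB-class GPU needs large 64-bit BARs, so the guest's PCI MMIO
window has to be big enough:

lspci -vvv -s 09:00.0 | grep -i region
# healthy output shows non-zero 64-bit prefetchable regions; zero-sized regions
# match the NVRM "BAR is 0M @ 0x0" errors quoted above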
>
>
>
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: 

[ovirt-users] Re: Testing ovirt 4.4.1 Nested KVM on Skylake-client (core i5) does not work

2020-09-14 Thread wodel youchi
Hi,

I didn't use "host-passthrough" because:
1 - Testing oVirt worked for me until this new 4.4 version.
2 - "host-passthrough" is not listed as an option when using virt-manager.

Regards.

On Mon., 14 Sept. 2020 at 16:21, Strahil Nikolov
wrote:

> Why don't you use 'host-passthrough' cpu type ?
>
> Best Regards,
> Strahil Nikolov
>
>
>
>
>
>
> On Sunday, 13 September 2020, 20:31:44 GMT+3, wodel youchi <
> wodel.you...@gmail.com> wrote:
>
>
>
>
>
> Hi,
>
> I've been using my core i5 6500 (skylake-client) for some time now to test
> oVirt on my machine.
> However this is no longer the case.
>
> I am using Fedora 32 as my base system with nested-kvm enabled, when I try
> to install oVirt 4.4 as HCI single node, I get an error in the last phase
> which consists of copying the VM-Manager to the engine volume and boot it.
> It is the boot that causes the problem, I get an error about the CPU :
> the CPU is incompatible with host CPU: Host CPU does not provide required
> features: mpx
>
> This is the CPU part from virsh domcapabilities on my physical machine
> [virsh domcapabilities <cpu> section; the XML was mangled by the archive. The
> recoverable content: host-model is Skylake-Client-IBRS (vendor Intel) with
> required features including vmx, hypervisor, tsc_adjust, md-clear, ssbd,
> xsaves, ibpb, amd-ssbd and skip-l1dfl-vmentry; among the custom models,
> Skylake-Client-IBRS and Skylake-Client are usable while Skylake-Server,
> Icelake-Server/Client and Cascadelake-Server are not.]
>
> Here is the lscpu of my physical machine
> # lscpu
> Architecture:                    x86_64
> CPU op-mode(s):                  32-bit, 64-bit
> Byte Order:                      Little Endian
> Address sizes:                   39 bits physical, 48 bits virtual
> CPU(s):                          4
> On-line CPU(s) list:             0-3
> Thread(s) per core:              1
> Core(s) per socket:              4
> Socket(s):                       1
> NUMA node(s):                    1
> Vendor ID:                       GenuineIntel
> CPU family:                      6
> Model:                           94
> Model name:                      Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz
> Stepping:                        3
> CPU MHz:                         954.588
> CPU max MHz:                     3600.
> CPU min MHz:                     800.
> BogoMIPS:                        6399.96
> Virtualization:                  VT-x
> L1d cache:                       128 KiB
> L1i cache:                       128 KiB
> L2 cache:                        1 MiB
> L3 cache:                        6 MiB
> NUMA node0 CPU(s):               0-3
> Vulnerability Itlb multihit:     KVM: Mitigation: Split huge pages
> Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional cache flushes, SMT disabled
> Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT disabled
> Vulnerability Meltdown:          Mitigation; PTI
> Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
> Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
> Vulnerability Spectre v2:        Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP disabled, RSB filling
> Vulnerability Srbds:             Vulnerable: No microcode
> Vulnerability Tsx async abort:   Mitigation; Clear CPU buffers; SMT disabled
> Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16
>

[ovirt-users] Re: New oVirt Install - Host Engine Deployment Fails

2020-09-14 Thread Michael Blanton

Thanks for the quick response.

Ansible task reports them as Xeon 5130.
According to Intel ARK these fall in the Woodcrest family, which is 
older than Nehalem.


Obviously the CPUs support virtualization.
I also confirmed the required extensions from the oVirt documents.

# grep -E 'svm|vmx' /proc/cpuinfo | grep n
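
A fuller form of that check, using only standard tools (shown as an illustrative
sketch, not from the original mail):

grep -c -E 'svm|vmx' /proc/cpuinfo      # count of logical CPUs advertising VT-x/AMD-V
lscpu | grep -i virtualization          # should report VT-x on these Xeons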

Question for my lab:
So is this a situation where "Woodcrest" is simply not in the dictionary?
Is there a way to manually add that or "force" it, just to get the 
engine to deploy? That way I can kick the tires on oVirt while I 
consider an upgrade to my lab systems, knowing ahead of time that it is 
a "hack" and unsupported.


Question for product:
If this is an unsupported CPU, shouldn't the installer/Hosted Engine 
Deployment flag that at the beginning of the process, not 45 minutes 
later when trying to move the already created VM to shared storage?


Thanks again



On 9/14/2020 12:45 PM, Edward Berger wrote:
What is the CPU?  I'm asking because you said it was old servers, and at 
some point oVirt started filtering out old CPU types which were no 
longer supported under windows.   There was also the case where if a 
certain bios option wasn't enabled (AES?) a westmere(supported) reported 
as an older model(unsupported).



On Mon, Sep 14, 2020 at 12:20 PM > wrote:


I am attempting a new oVirt install. I have two nodes installed
(with oVirt Node 4.4). I have NFS shared storage for the hosted engine.
Both nodes are Dell quad core Xeon CPUs with 32GB of RAM. Both have
been hypervisors before, XCP-NG and Proxmox. However I'm very
interested to learn oVirt now.

The hosted engine deployment (through cockpit) fails during the
"Finish" stage.
I do see the initial files created on the NFS storage.

[ INFO ] TASK [ovirt.hosted_engine_setup : Convert CPU model name]
[ ERROR ] fatal: [localhost]: FAILED! => {"msg": "The task includes
an option with an undefined variable. The error was: 'dict object'
has no attribute ''\n\nThe error appears to be in

'/usr/share/ansible/roles/ovirt.hosted_engine_setup/tasks/create_target_vm/01_create_target_hosted_engine_vm.yml':
line 105, column 16, but may\nbe elsewhere in the file depending on
the exact syntax problem.\n\nThe offending line appears to be:\n\n#
- debug: var=server_cpu_dict\n ^ here\n\nThere appears to be both
'k=v' shorthand syntax and YAML in this task. Only one syntax may be
used.\n"}

2020-09-13 17:39:56,507+ ERROR ansible failed {
     "ansible_host": "localhost",
     "ansible_playbook":
"/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml",
     "ansible_result": {
         "_ansible_no_log": false,
         "msg": "The task includes an option with an undefined
variable. The error was: 'dict object' has no attribute ''
\n\nThe error appears to be in

'/usr/share/ansible/roles/ovirt.hosted_engine_setup/tasks/create_target_vm/01_create_targ
et_hosted_engine_vm.yml': line 105, column 16, but may\nbe elsewhere
in the file depending on the exact syntax problem.\
n\nThe offending line appears to be:\n\n#  - debug:
var=server_cpu_dict\n               ^ here\n\nThere appears to be bo
th 'k=v' shorthand syntax and YAML in this task. Only one syntax may
be used.\n"
     },
     "ansible_task": "Convert CPU model name",
     "ansible_type": "task",
     "status": "FAILED",
     "task_duration": 1
}

I can see the host engine is created and running locally on the node.
I can event SSH into the HostedEngineLocal instance.

[root@ovirt-node01]# virsh --readonly list
  Id   Name                State
---
  1    HostedEngineLocal   running


Looking at the "Convert CPU model name" task:

https://github.com/oVirt/ovirt-ansible-hosted-engine-setup/blob/master/tasks/create_target_vm/01_create_target_hosted_engine_vm.yml



set_fact:
       cluster_cpu_model: "{{ server_cpu_dict[cluster_cpu.type] }}"

server_cpu_dict is good, I can find that in the logs, cluster_cpu is
undefined.
But this is normal correct? The Cluster CPU type is "undefined"
until the first host is added to the cluster.
The error makes it seems that server_cpu_dict and not
cluster_cpu.type is the problem.
I'm not sure this is really the problem, but that is the only 
undefined variable I can find.


Any advice or recommendation is appreciated
-Thanks in advance
___
Users mailing list -- users@ovirt.org 
To unsubscribe send an email to users-le...@ovirt.org

Privacy Statement: https://www.ovirt.org/privacy-policy.html

[ovirt-users] Re: New oVirt Install - Host Engine Deployment Fails

2020-09-14 Thread Edward Berger
What is the CPU? I'm asking because you said these are old servers, and at
some point oVirt started filtering out old CPU types which were no longer
supported under Windows. There was also a case where, if a certain BIOS
option wasn't enabled (AES?), a Westmere (supported) reported as an older
model (unsupported).


On Mon, Sep 14, 2020 at 12:20 PM  wrote:

> I am attempting a new oVirt install. I have two nodes installed (with
> oVirt Node 4.4). I have NFS shared storage for the hosted engine.
> Both nodes are Dell quad core Xeon CPUs with 32GB of RAM. Both have been
> hypervisors before, XCP-NG and Proxmox. However I'm very interested to
> learn oVirt now.
>
> The hosted engine deployment (through cockpit) fails during the "Finish"
> stage.
> I do see the initial files created on the NFS storage.
>
> [ INFO ] TASK [ovirt.hosted_engine_setup : Convert CPU model name]
> [ ERROR ] fatal: [localhost]: FAILED! => {"msg": "The task includes an
> option with an undefined variable. The error was: 'dict object' has no
> attribute ''\n\nThe error appears to be in
> '/usr/share/ansible/roles/ovirt.hosted_engine_setup/tasks/create_target_vm/01_create_target_hosted_engine_vm.yml':
> line 105, column 16, but may\nbe elsewhere in the file depending on the
> exact syntax problem.\n\nThe offending line appears to be:\n\n# - debug:
> var=server_cpu_dict\n ^ here\n\nThere appears to be both 'k=v' shorthand
> syntax and YAML in this task. Only one syntax may be used.\n"}
>
> 2020-09-13 17:39:56,507+ ERROR ansible failed {
> "ansible_host": "localhost",
> "ansible_playbook":
> "/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml",
> "ansible_result": {
> "_ansible_no_log": false,
> "msg": "The task includes an option with an undefined variable.
> The error was: 'dict object' has no attribute ''
> \n\nThe error appears to be in
> '/usr/share/ansible/roles/ovirt.hosted_engine_setup/tasks/create_target_vm/01_create_targ
> et_hosted_engine_vm.yml': line 105, column 16, but may\nbe elsewhere in
> the file depending on the exact syntax problem.\
> n\nThe offending line appears to be:\n\n#  - debug: var=server_cpu_dict\n
>  ^ here\n\nThere appears to be bo
> th 'k=v' shorthand syntax and YAML in this task. Only one syntax may be
> used.\n"
> },
> "ansible_task": "Convert CPU model name",
> "ansible_type": "task",
> "status": "FAILED",
> "task_duration": 1
> }
>
> I can see the host engine is created and running locally on the node.
> I can event SSH into the HostedEngineLocal instance.
>
> [root@ovirt-node01]# virsh --readonly list
>  Id   NameState
> ---
>  1HostedEngineLocal   running
>
>
> Looking at the "Convert CPU model name" task:
>
> https://github.com/oVirt/ovirt-ansible-hosted-engine-setup/blob/master/tasks/create_target_vm/01_create_target_hosted_engine_vm.yml
>
> set_fact:
>   cluster_cpu_model: "{{ server_cpu_dict[cluster_cpu.type] }}"
>
> server_cpu_dict is good, I can find that in the logs, cluster_cpu is
> undefined.
> But this is normal correct? The Cluster CPU type is "undefined" until the
> first host is added to the cluster.
> The error makes it seems that server_cpu_dict and not cluster_cpu.type is
> the problem.
> I'm not sure this is really the problem, but that is the only  undefined
> variable I can find.
>
> Any advice or recommendation is appreciated
> -Thanks in advance
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives:


[ovirt-users] Re: Newbie question on network

2020-09-14 Thread Edward Berger
For others having issues with VM network routing...

virbr0 is usually installed by default on CentOS, etc., to facilitate
container networking via NAT.
If I'm not planning on running any containers, I usually yum remove the
associated packages
and reboot to make sure networking is OK.

Sometimes the ipv4 autoconfig stuff also gets in the way of desired routing,
and I add NOZEROCONF=yes to /etc/sysconfig/network to disable it.

When I've seen the !X in traceroutes it has usually been a firewall config
issue on the remote host I'm tracing to.
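
A minimal sketch of the same cleanup using libvirt's own tooling instead of
removing packages (assumes virbr0 comes from the stock 'default' libvirt network):

virsh net-destroy default               # removes virbr0 immediately
virsh net-autostart default --disable   # keeps it from coming back on reboot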



On Mon, Sep 14, 2020 at 11:50 AM Valerio Luccio 
wrote:

> Thanks to all that gave me hints/suggestions, especially to Edward that
> pointed me to the right setup page.
>
> I was able to set up correctly my internal switch network, but I'm still
> having issues with the virtual network that connects to the university
> backbone. The route on the VM is setup just like on the host and the
> engine. I also tested traceroute and found out something puzzling:
>
>1. traceroute on the host and the engine for any node on the
>university network returns the answer in milliseconds.
>2. traceroute on the VM for a node on my internal switch returns the
>answer in milliseconds with the flag.
>3. traceroute on the VM for the host or the engine returns answer
>after more than 10 seconds with the flag !X
>4. traceroute on the VM for other hosts on the network returns the
>flag !H (host not reachable).
>
> According to the traceroute man pages the !X flag indicates "communication
> administratively prohibited". I temporarily turned off the firewall on the
> host and the engine and that removed the !X flag from 3, but did not speed
> it up and did not solve 4.
>
> Anyone have any clue what I should try next ?
>
> Thanks in advance,
> --
> As a result of Coronavirus-related precautions, NYU and the Center for
> Brain Imaging operations will be managed remotely until further notice.
> All telephone calls and e-mail correspondence are being monitored remotely
> during our normal business hours of 9am-5pm, Monday through Friday.
>
> For MRI scanner-related emergency, please contact: Keith Sanzenbach at
> keith.sanzenb...@nyu.edu and/or Pablo Velasco at pablo.vela...@nyu.edu
> For computer/hardware/software emergency, please contact: Valerio Luccio
> at valerio.luc...@nyu.edu
> For TMS/EEG-related emergency, please contact: Chrysa Papadaniil at
> chr...@nyu.edu
> For CBI-related administrative emergency, please contact: Jennifer Mangan
> at jennifer.man...@nyu.edu
>
> Valerio Luccio (212) 998-8736
> Center for Brain Imaging 4 Washington Place, Room 158
> New York University New York, NY 10003
>
> "In an open world, who needs windows or gates ?"
>
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives:


[ovirt-users] New oVirt Install - Host Engine Deployment Fails

2020-09-14 Thread mblanton
I am attempting a new oVirt install. I have two nodes installed (with oVirt 
Node 4.4). I have NFS shared storage for the hosted engine.
Both nodes are Dell quad core Xeon CPUs with 32GB of RAM. Both have been 
hypervisors before, running XCP-NG and Proxmox. However, I'm very interested in learning 
oVirt now.

The hosted engine deployment (through cockpit) fails during the "Finish" stage.
I do see the initial files created on the NFS storage.

[ INFO ] TASK [ovirt.hosted_engine_setup : Convert CPU model name]
[ ERROR ] fatal: [localhost]: FAILED! => {"msg": "The task includes an option 
with an undefined variable. The error was: 'dict object' has no attribute 
''\n\nThe error appears to be in 
'/usr/share/ansible/roles/ovirt.hosted_engine_setup/tasks/create_target_vm/01_create_target_hosted_engine_vm.yml':
 line 105, column 16, but may\nbe elsewhere in the file depending on the exact 
syntax problem.\n\nThe offending line appears to be:\n\n# - debug: 
var=server_cpu_dict\n ^ here\n\nThere appears to be both 'k=v' shorthand syntax 
and YAML in this task. Only one syntax may be used.\n"}

2020-09-13 17:39:56,507+ ERROR ansible failed {
"ansible_host": "localhost",
"ansible_playbook": 
"/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml",
"ansible_result": {
"_ansible_no_log": false,
"msg": "The task includes an option with an undefined variable. The 
error was: 'dict object' has no attribute ''
\n\nThe error appears to be in 
'/usr/share/ansible/roles/ovirt.hosted_engine_setup/tasks/create_target_vm/01_create_targ
et_hosted_engine_vm.yml': line 105, column 16, but may\nbe elsewhere in the 
file depending on the exact syntax problem.\
n\nThe offending line appears to be:\n\n#  - debug: var=server_cpu_dict\n   
^ here\n\nThere appears to be bo
th 'k=v' shorthand syntax and YAML in this task. Only one syntax may be used.\n"
},
"ansible_task": "Convert CPU model name",
"ansible_type": "task",
"status": "FAILED",
"task_duration": 1
}

I can see the hosted engine is created and running locally on the node.
I can even SSH into the HostedEngineLocal instance.

[root@ovirt-node01]# virsh --readonly list
 Id   NameState
---
 1HostedEngineLocal   running


Looking at the "Convert CPU model name" task:
https://github.com/oVirt/ovirt-ansible-hosted-engine-setup/blob/master/tasks/create_target_vm/01_create_target_hosted_engine_vm.yml

set_fact:
  cluster_cpu_model: "{{ server_cpu_dict[cluster_cpu.type] }}"

server_cpu_dict is good, I can find that in the logs; cluster_cpu is undefined. 
But this is normal, correct? The Cluster CPU type is "undefined" until the first 
host is added to the cluster.
The error makes it seem that server_cpu_dict, and not cluster_cpu.type, is the 
problem.
I'm not sure this is really the problem, but that is the only undefined 
variable I can find.
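
One way to see what the failed run actually had in scope is to grep the setup
logs (the paths below are the usual hosted-engine deployment locations; adjust
if yours differ):

grep -h server_cpu_dict /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-ansible-create_target_vm-*.log | tail -n 2
grep -h cluster_cpu /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-ansible-*.log | tail -n 5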

Any advice or recommendation is appreciated
-Thanks in advance
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives:


[ovirt-users] Re: Newbie question on network

2020-09-14 Thread Valerio Luccio
Thanks to all that gave me hints/suggestions, especially to Edward that 
pointed me to the right setup page.


I was able to set up correctly my internal switch network, but I'm still 
having issues with the virtual network that connects to the university 
backbone. The route on the VM is setup just like on the host and the 
engine. I also tested traceroute and found out something puzzling:


1. traceroute on the host and the engine for any node on the university
   network returns the answer in milliseconds.
2. traceroute on the VM for a node on my internal switch returns the
   answer in milliseconds with the flag.
3. traceroute on the VM for the host or the engine returns answer after
   more than 10 seconds with the flag !X
4. traceroute on the VM for other hosts on the network returns the flag
   !H (host not reachable).

According to the traceroute man pages the !X flag indicates 
"communication administratively prohibited". I temporarily turned off 
the firewall on the host and the engine and that removed the !X flag 
from 3, but did not speed it up and did not solve 4.
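
In case it helps narrow down the !X result, the rules actually applied on the
host can be listed with standard firewalld commands, for example:

firewall-cmd --get-active-zones
firewall-cmd --zone=public --list-all    # repeat for whichever zone holds the VM-facing interface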


Anyone have any clue what I should try next ?

Thanks in advance,

--
As a result of Coronavirus-related precautions, NYU and the Center for 
Brain Imaging operations will be managed remotely until further notice.
All telephone calls and e-mail correspondence are being monitored 
remotely during our normal business hours of 9am-5pm, Monday through 
Friday.
For MRI scanner-related emergency, please contact: Keith Sanzenbach at 
keith.sanzenb...@nyu.edu and/or Pablo Velasco at pablo.vela...@nyu.edu
For computer/hardware/software emergency, please contact: Valerio Luccio 
at valerio.luc...@nyu.edu
For TMS/EEG-related emergency, please contact: Chrysa Papadaniil at 
chr...@nyu.edu
For CBI-related administrative emergency, please contact: Jennifer 
Mangan at jennifer.man...@nyu.edu


Valerio Luccio  (212) 998-8736
Center for Brain Imaging4 Washington Place, Room 158
New York University New York, NY 10003

   "In an open world, who needs windows or gates ?"

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives:


[ovirt-users] Re: Testing ovirt 4.4.1 Nested KVM on Skylake-client (core i5) does not work

2020-09-14 Thread Strahil Nikolov via Users
Why don't you use 'host-passthrough' cpu type ?

Best Regards,
Strahil Nikolov






On Sunday, 13 September 2020, 20:31:44 GMT+3, wodel youchi 
 wrote: 





Hi,

I've been using my core i5 6500 (skylake-client) for some time now to test 
oVirt on my machine.
However this is no longer the case.

I am using Fedora 32 as my base system with nested KVM enabled. When I try to 
install oVirt 4.4 as a single-node HCI setup, I get an error in the last phase, which 
consists of copying the VM-Manager to the engine volume and booting it.
It is the boot that causes the problem; I get an error about the CPU:
the CPU is incompatible with host CPU: Host CPU does not provide required 
features: mpx

This is the CPU part from virsh domcapabilities on my physical machine
[virsh domcapabilities <cpu> section; the XML was mangled by the archive. What
remains readable: host-model Skylake-Client-IBRS (vendor Intel) plus the list of
custom CPU models known to QEMU, from qemu64/qemu32 down to 486, including the
Skylake-Client, Skylake-Server, Icelake, Cascadelake, Broadwell, Haswell, EPYC
and Opteron families.]

Here is the lscpu of my physical machine
# lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   39 bits physical, 48 bits virtual
CPU(s):                          4
On-line CPU(s) list:             0-3
Thread(s) per core:              1
Core(s) per socket:              4
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           94
Model name:                      Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz
Stepping:                        3
CPU MHz:                         954.588
CPU max MHz:                     3600.
CPU min MHz:                     800.
BogoMIPS:                        6399.96
Virtualization:                  VT-x
L1d cache:                       128 KiB
L1i cache:                       128 KiB
L2 cache:                        1 MiB
L3 cache:                        6 MiB
NUMA node0 CPU(s):               0-3
Vulnerability Itlb multihit:     KVM: Mitigation: Split huge pages
Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional cache flushes, SMT disabled
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT disabled
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP disabled, RSB filling
Vulnerability Srbds:             Vulnerable: No microcode
Vulnerability Tsx async abort:   Mitigation; Clear CPU buffers; SMT disabled
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d



Here is the CPU part from virsh dumpxml of my ovirt hypervisor
[virsh dumpxml <cpu> section of the oVirt hypervisor VM; the XML was mangled by
the archive. What remains readable: model Skylake-Client-IBRS, vendor Intel,
plus a list of required feature elements.]

Here is the lscpu of my oVirt hypervisor
[root@node1 ~]# lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
CPU(s):                          4
On-line CPU(s) list:             0-3
Thread(s) per core:
[ovirt-users] Re: Gluster quorum issue on 3-node HCI with extra 5-nodes as compute and storage nodes

2020-09-14 Thread Thomas Hoberg

Am 14.09.2020 um 15:23 schrieb tho...@hoberg.net:

Sorry two times now:
1. It is a duplicate post, because the delay for posts to show up on the 
web site is ever longer (as I am responding via mail, the first post is 
still not shown...)


2. It seems to have been a wild goose chase: The gluster daemon from 
group B did eventually regain quorum (or returned to its senses) some 
time later... the error message is pretty scary and IMHO somewhat 
misleading, but...


With oVirt one must learn to be patient, evidently all that self-healing 
built-in depends on state machines turning their cogs and gears, not on 
admins pushing for things to happen... sorry!

Yes, I've also posted this on the Gluster Slack. But I am using Gluster mostly 
because it's part of oVirt HCI, so don't just send me away, please!

Problem: GlusterD refusing to start due to quorum issues for volumes where it 
isn’t contributing any brick

(I've had this before on a different farm, but there it was transitory. Now I 
have it in a more observable manner, that's why I open a new topic)

In a test farm with recycled servers, I started running Gluster via oVirt 
3node-HCI, because I got 3 machines originally.
They were set up as group A in a 2:1 (replica:arbiter) oVirt HCI setup with 
'engine', 'vmstore' and 'data' volumes, one brick on each node.

I then got another five machines with hardware specs that were rather different 
to group A, so I set those up as group B to mostly act as compute nodes, but 
also to provide extra storage, mostly to be used externally as GlusterFS 
shares. It took a bit of fiddling with Ansible but I got these 5 nodes to serve 
two more Gluster volumes 'tape' and 'scratch' using dispersed bricks (4 
disperse:1 redundancy), RAID5 in my mind.

The two groups are in one Gluster, not because they serve bricks to the same 
volumes, but because oVirt doesn't like nodes to be in different Glusters (or 
actually, to already be in a Gluster when you add them as host node). But the 
two groups provide bricks to distinct volumes, there is no overlap.

After setup things have been running fine for weeks, but now I needed to 
restart a machine from group B, which has ‘tape’ and ‘scratch’ bricks, but none 
from original oVirt ‘engine’, ‘vmstore’ and ‘data’ in group A. Yet the gluster 
daemon refuses to start, citing a loss of quorum for these three volumes, even 
if it has no bricks in them… which makes no sense to me.

I am afraid the source of the issue is concept issues: I clearly don't really 
understand some design assumptions of Gluster.
And I'm afraid the design assumptions of Gluster and of oVirt (even with HCI), 
are not as related as one might assume from the marketing materials on the 
oVirt home-page.

But most of all I'd like to know: How do I fix this now?

I can't heal 'tape' and 'scratch', which are growing ever more apart while the 
glusterd on this machine in group B refuses to come online for lack of a quorum 
on volumes where it is not contributing bricks.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives:



<>___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives:


[ovirt-users] oVirt HCI issue: GlusterD refusing to start due to quorum issues for volumes where it isn’t contributing any brick

2020-09-14 Thread thomas
Sorry if it's a duplicate: I got an error on posting... And yes, I posted it on the 
Gluster Slack first, but I am using Gluster only because the marketing on oVirt 
HCI worked so well...

I got 3 recycled servers for an oVirt test environment first and set those up 
as 3-node HCI using defaults mostly, 2 replica + 1 arbiter, 'engine', 'vmstore' 
and 'data' volumes with single bricks for each node. I call these group A.

Then I got another set of five machines, let's call them group B, with somewhat 
different hardware characteristics than group A, but nicely similar between 
themselves. I wanted to add these to the farm as compute nodes but also use 
their storage as general GlusterFS storage for a wider use.

Group B machines were added as hosts and set up to run hosted-engine, but they 
do not contribute bricks to the normal oVirt volumes 'engine', 'vmstore' or 
'data'. With some Ansible trickery I managed to set up two dispersed volumes (4 
data: 1 redundancy) on group B 'scratch' and 'tape', mostly for external 
GlusterFS use. oVirt picked them up automagically, so I guess they could also 
be used with VMs.

I expect to get more machines and adding them one-by-one to dispersed volumes 
with a fine balance between capacity and redundancy made me so enthusiastic 
about oVirt HCI in the first place...

After some weeks of fine operation I had to restart a machine from group B for 
maintenance. When it came back up, GlusterD refuses to come online, because it 
doesn't have "quorum for volumes 'engine', 'vmstore' and 'data'"

It's a small surprise it doesn't *have* quorum, what's a bigger surprise is 
that it *asks* for quorum in a volume where it's not contributing any bricks. 
What's worse is that it then refuses to start serving its bricks for 'scratch' 
and 'tape', which are now growing apart without any chance of healing.

How do I fix this?

Is this a bug (my interpretation) or do I fundamentlly misunderstand how 
Gluster as a hyper scale out file system is supposed to work with potentially 
thousands of hosts contributing each dozens of bricks to each of hundreds of 
volumes in a single name space?
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives:


[ovirt-users] Gluster quorum issue on 3-node HCI with extra 5-nodes as compute and storage nodes

2020-09-14 Thread thomas
Yes, I've also posted this on the Gluster Slack. But I am using Gluster mostly 
because it's part of oVirt HCI, so don't just send me away, please!

Problem: GlusterD refusing to start due to quorum issues for volumes where it 
isn’t contributing any brick

(I've had this before on a different farm, but there it was transitory. Now I 
have it in a more observable manner, that's why I open a new topic)

In a test farm with recycled servers, I started running Gluster via oVirt 
3node-HCI, because I got 3 machines originally.
They were set up as group A in a 2:1 (replica:arbiter) oVirt HCI setup with 
'engine', 'vmstore' and 'data' volumes, one brick on each node.

I then got another five machines with hardware specs that were rather different 
to group A, so I set those up as group B to mostly act as compute nodes, but 
also to provide extra storage, mostly to be used externally as GlusterFS 
shares. It took a bit of fiddling with Ansible but I got these 5 nodes to serve 
two more Gluster volumes 'tape' and 'scratch' using dispersed bricks (4 
disperse:1 redundancy), RAID5 in my mind.

The two groups are in one Gluster, not because they serve bricks to the same 
volumes, but because oVirt doesn't like nodes to be in different Glusters (or 
actually, to already be in a Gluster when you add them as host node). But the 
two groups provide bricks to distinct volumes, there is no overlap.

After setup things have been running fine for weeks, but now I needed to 
restart a machine from group B, which has ‘tape’ and ‘scratch’ bricks, but none 
from original oVirt ‘engine’, ‘vmstore’ and ‘data’ in group A. Yet the gluster 
daemon refuses to start, citing a loss of quorum for these three volumes, even 
if it has no bricks in them… which makes no sense to me.

I am afraid the source of the issue is concept issues: I clearly don't really 
understand some design assumptions of Gluster.
And I'm afraid the design assumptions of Gluster and of oVirt (even with HCI), 
are not as related as one might assume from the marketing materials on the 
oVirt home-page.

But most of all I'd like to know: How do I fix this now?

I can't heal 'tape' and 'scratch', which are growing ever more apart while the 
glusterd on this machine in group B refuses to come online for lack of a quorum 
on volumes where it is not contributing bricks.
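
For what it's worth, the quorum settings glusterd is enforcing can be read back
with the standard CLI; a sketch (volume names as above, options are stock
Gluster ones):

gluster peer status
gluster volume get all cluster.server-quorum-ratio
gluster volume get scratch cluster.server-quorum-type
# server quorum is a pool-wide peer count, which is why the check can name
# volumes this node serves no bricks for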
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives:


[ovirt-users] Re: Probably dns problem: Internal JSON-RPC error

2020-09-14 Thread jan.kleefeld
Hi, thanks for your support so far.
I disabled the IPv6 auto config and chose “ipv6 link local only” in the nmtui 
network manager. I discovered that there is then only one IPv4 DNS entry in 
/etc/resolv.conf (because the IPv6 DHCP server is no longer asked). This 
setting solved my issue and everything works just fine.
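
For the record, roughly the same change can be made non-interactively with nmcli
(the connection name is a placeholder; use the one that owns eno1):

nmcli connection modify eno1 ipv6.method link-local
nmcli connection up eno1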

 

Here is your requested supervdsm snippet:

https://jan-home.de/public/ovirt/supervdsm.log

 

Greetings from Germany,

Jan

 

From: Dominik Holler 
Sent: Monday, 14 September 2020 10:23
To: jan.kleef...@jan-home.de
Cc: users ; Ales Musil ; Yedidyah Bar David 

Subject: Re: [ovirt-users] Probably dns problem: Internal JSON-RPC error

 

 

 

On Sun, Sep 13, 2020 at 11:35 AM Yedidyah Bar David wrote:

On Sun, Sep 13, 2020 at 11:27 AM jan.kleef...@jan-home.de wrote:
>
> I have a clean installed CentOS 8.2 2004 on my server. The self hosted engine 
> deploy (ovirt-4.4) aborts with the following message:
>
> [ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The host 
> has been set in non_operational status, deployment errors: code 505: Host 
> red.colors.ovirt.local installation failed. Failed to configure management 
> network on the host., code 1120: Failed to configure management network on 
> host red.colors.ovirt.local due to setup networks failure., code 9000: Failed 
> to verify Power Management configuration for Host red.colors.ovirt.local., 
> code 10802: VDSM red.colors.ovirt.local command HostSetupNetworksVDS failed: 
> Internal JSON-RPC error:
> {'reason': '
> desired
> ===
> ---
> dns-resolver:
>  search: []
>  server:
>  - 192.168.2.150
>  - fe80::1%eno1
>
>  current
>  ===
>  ---
>  dns-resolver:
>  search: []
>  server:
>  - 192.168.2.150
>
>  difference
>  ==
>  --- desired
>  +++ current
>  @@ -3,4 +3,3 @@
>  search: []
>  server:
>  - 192.168.2.150
>  - - fe80::1%eno1
>
>  '}, fix accordingly and re-deploy."}
>
> ~# cat /etc/resolv.conf
>  # Generated by NetworkManager
>  search colors.ovirt.local
>  nameserver 192.168.2.150
>  nameserver fe80::1%eno1
>
> I am confused, because the probalby missing line is present. Is there maybe 
> another config file, where this last line could be missing?
> Maybe I can force somehow the installer, to reload the resolv conf, so it can 
> fetch ne new line?

Can you please check/share other relevant logs (and perhaps other
parts of this one)? Perhaps upload somewhere and share a link, or open
a bug in bugzilla and attach there.

Adding Dominik.

Perhaps we fail to parse the line 'nameserver fe80::1%eno1'?

 

Yes, IPv6 link local nameserver might not work for static IP addresses, but 
might work for dynamic.

Also mixing IPv4 and IPv6 nameservers might confuse lower layers.

 

Can you share at least the line containing

"call setupNetworks with "

and

"desired state"

of supervdsm.log?

 

 

 

Thanks and best regards,
-- 
Didi

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives:


[ovirt-users] What is the purpose of memory deflation in oVirt memory ballooning?

2020-09-14 Thread pub . virtualization
Hi, guys

Why does momd (the ballooning manager in oVirt) explicitly deflate the balloon 
when the host gets plenty of memory?

As far as I know, momd supports memory ballooning with the setMemory API to 
inflate/deflate the balloon in the guest, and I've just checked the memory 
change in the guest after inflating the balloon.

As expected, memory (total, free, available) in the guest was reduced just 
after inflating the balloon, but it was "automatically" restored to its initial 
value after a few seconds.
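
For what it's worth, one way to watch the balloon from the host side (outside 
of mom) is something like this - just a sketch, "my-guest-vm" is a placeholder 
for the actual domain name:

# "actual" is the current balloon target, "available"/"usable" are reported by the guest
virsh dommemstat my-guest-vm
# lower the balloon target (inflate the balloon) of a running guest; size is in KiB
virsh setmem my-guest-vm 2097152 --live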

So I'm wondering why explicit deflation is additionally required, even though 
the memory is restored automatically after a few seconds.

Thanks.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives:


[ovirt-users] Re: Testing ovirt 4.4.1 Nested KVM on Skylake-client (core i5) does not work

2020-09-14 Thread Nir Soffer
On Mon, Sep 14, 2020 at 8:42 AM Yedidyah Bar David  wrote:
>
> On Mon, Sep 14, 2020 at 12:28 AM wodel youchi  wrote:
> >
> > Hi,
> >
> > Thanks for the help, I think I found the solution using this link : 
> > https://www.berrange.com/posts/2018/06/29/cpu-model-configuration-for-qemu-kvm-on-x86-hosts/
> >
> > When executing virsh dumpxml on my oVirt hypervisor I saw that the mpx 
> > flag was disabled, so I edited the XML file of the hypervisor VM and did 
> > this: I added the already enabled features and enabled mpx with them. I 
> > stopped/started my hypervisor VM and voilà, the nested VM-Manager booted 
> > successfully.
> >
> >
> > (the CPU feature XML snippet was stripped by the list archive)
> Thanks for the report!
>
> Would you like to open a bug about this?
>
> A possible fix is probably to pass relevant options to the
> virt-install command in ovirt-ansible-hosted-engine-setup.
> Either always - no idea what the implications are - or
> optionally, or even allow the user to pass arbitrary options.

I don't think we need to make such a change on our side. This seems like a
hard-to-reproduce libvirt bug.

The strange thing is that after playing with the XML generated by
virt-manager, using

[x] Copy host CPU configuration

Creating this XML:

  (cpu XML stripped by the list archive; it specified model Skylake-Client-IBRS,
  vendor Intel, and a set of feature elements)

Or using this XML in virt-manager:

  (cpu XML stripped by the list archive)

Both work with these cluster CPU Types:

- Secure Intel Skylake Client Family
- Intel Skylake Client Family
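
Since the archive stripped the XML snippets in this thread, here is roughly the
kind of <cpu> element being discussed - a sketch only, assuming host-model mode
with the mpx feature explicitly required, as in the fix described earlier (edit
your own domain with virsh edit <vm-name>):

  <cpu mode='host-model'>
    <!-- require mpx on top of whatever host-model already enables -->
    <feature policy='require' name='mpx'/>
  </cpu>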

I think the best place to discuss this is libvirt-users mailing list:
https://www.redhat.com/mailman/listinfo/libvirt-users

Nir

> Thanks and best regards,
>
> >
> >
> > Regards.
> >
> > On Sun, Sep 13, 2020 at 19:47, Nir Soffer  wrote:
> >>
> >> On Sun, Sep 13, 2020 at 8:32 PM wodel youchi  
> >> wrote:
> >> >
> >> > Hi,
> >> >
> >> > I've been using my Core i5 6500 (Skylake-client) for some time now to 
> >> > test oVirt on my machine.
> >> > However, this no longer works.
> >> >
> >> > I am using Fedora 32 as my base system with nested KVM enabled. When I 
> >> > try to install oVirt 4.4 as an HCI single node, I get an error in the 
> >> > last phase, which consists of copying the VM-Manager to the engine 
> >> > volume and booting it.
> >> > It is the boot that causes the problem; I get an error about the CPU:
> >> > the CPU is incompatible with host CPU: Host CPU does not provide 
> >> > required features: mpx
> >> >
> >> > This is the CPU part from virsh domcapabilities on my physical machine
> >> > (host-model section stripped by the list archive: model Skylake-Client-IBRS,
> >> > vendor Intel, plus the usable feature flags)
> >> >
> >> >  qemu64
> >> >  qemu32
> >> >  phenom
> >> >  pentium3
> >> >  pentium2
> >> >  pentium
> >> >  n270
> >> >  kvm64
> >> >  kvm32
> >> >  coreduo
> >> >  core2duo
> >> >  athlon
> >> >  Westmere-IBRS
> >> >  Westmere
> >> >  Skylake-Server-IBRS
> >> >  Skylake-Server
> >> >  Skylake-Client-IBRS
> >> >  Skylake-Client
> >> >  SandyBridge-IBRS
> >> >  SandyBridge
> >> >  Penryn
> >> >  Opteron_G5
> >> >  Opteron_G4
> >> >  Opteron_G3
> >> >  Opteron_G2
> >> >  Opteron_G1
> >> >  Nehalem-IBRS
> >> >  Nehalem
> >> >  IvyBridge-IBRS
> >> >  IvyBridge
> >> >  Icelake-Server
> >> >  Icelake-Client
> >> >  Haswell-noTSX-IBRS
> >> >  Haswell-noTSX
> >> >  Haswell-IBRS
> >> >  Haswell
> >> >  EPYC-IBPB
> >> >  EPYC
> >> >  Dhyana
> >> >  Conroe
> >> >  Cascadelake-Server
> >> >  Broadwell-noTSX-IBRS
> >> >  Broadwell-noTSX
> >> >  Broadwell-IBRS
> >> >  Broadwell
> >> >  486
> >> >
> >> >  
> >> >
> >> > Here is the lscpu of my physical machine
> >> > # lscpu
> >> > Architecture:x86_64
> >> > CPU op-mode(s):  32-bit, 64-bit
> >> > Byte Order:  Little Endian
> >> > Address sizes:   39 bits physical, 48 bits virtual
> >> > CPU(s):  4
> >> > On-line CPU(s) list: 0-3
> >> > Thread(s) per core:  1
> >> > Core(s) per socket:  4
> >> > Socket(s):   1
> >> > NUMA node(s):1
> >> > Vendor ID:   GenuineIntel
> >> > CPU family:  6
> >> > Model:   94
> >> > Model name:  Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz
> >> > Stepping:3
> >> > CPU MHz: 954.588
> >> > CPU max MHz: 3600.
> >> 

[ovirt-users] Re: Enable a cluster node to run the hosted engine

2020-09-14 Thread Yedidyah Bar David
On Mon, Sep 14, 2020 at 11:18 AM  wrote:
>
> Hi there,
>
> currently my team is evaluating oVirt and we're also testing several failure 
> scenarios, backups and so on.
> One scenario was:
> - hyperconverged oVirt cluster with 3 nodes
> - self-hosted engine
> - simulate the break down of one of the nodes by power off
> - to replace it make a clean install of a new node and reintegrate it in the 
> cluster

How exactly did you do that?

>
> Actually everything worked out fine. The newly installed node and related 
> bricks (vmstore, data, engine) were added to the existing Gluster storage and 
> it was added to the oVirt cluster (as host).
>
> But there's one remaining problem: The new host doesn't have the grey crown, 
> which means it's unable to run the hosted engine. How can I achieve that?
> I also found out that the ovirt-ha-agent and ovirt-ha-broker services aren't 
> started/enabled on that node. The reason is that 
> /etc/ovirt-hosted-engine/hosted-engine.conf doesn't exist. I guess this is 
> not only a problem concerning the hosted engine, but also for HA VMs.

When you add a host to the engine, one of the options in the dialog is
to deploy it as a hosted-engine.
If you don't, you won't get this crown, nor these services, nor its
status in 'hosted-engine --vm-status'.

If you didn't, perhaps try to move to maintenance and reinstall,
adding this option.
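
After the reinstall completes, something along these lines on the new node
should confirm it worked (a sketch; exact output will vary):

systemctl status ovirt-ha-agent ovirt-ha-broker
hosted-engine --vm-status
ls -l /etc/ovirt-hosted-engine/hosted-engine.conf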

If you did choose it, that's perhaps a bug - please check/share
relevant logs (e.g. in /var/log/ovirt-engine, including host-deploy/).

Best regards,

>
> Thank you for any advice and greetings,
> Marcus
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:



-- 
Didi
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives:


[ovirt-users] Re: Probably dns problem: Internal JSON-RPC error

2020-09-14 Thread Dominik Holler
On Sun, Sep 13, 2020 at 11:35 AM Yedidyah Bar David  wrote:

> On Sun, Sep 13, 2020 at 11:27 AM  wrote:
> >
> > I have a clean installed CentOS 8.2 2004 on my server. The self hosted
> engine deploy (ovirt-4.4) aborts with the following message:
> >
> > [ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The
> host has been set in non_operational status, deployment errors: code 505:
> Host red.colors.ovirt.local installation failed. Failed to configure
> management network on the host., code 1120: Failed to configure management
> network on host red.colors.ovirt.local due to setup networks failure., code
> 9000: Failed to verify Power Management configuration for Host
> red.colors.ovirt.local., code 10802: VDSM red.colors.ovirt.local command
> HostSetupNetworksVDS failed: Internal JSON-RPC error:
> > {'reason': '
> > desired
> > ===
> > ---
> > dns-resolver:
> >  search: []
> >  server:
> >  - 192.168.2.150
> >  - fe80::1%eno1
> >
> >  current
> >  ===
> >  ---
> >  dns-resolver:
> >  search: []
> >  server:
> >  - 192.168.2.150
> >
> >  difference
> >  ==
> >  --- desired
> >  +++ current
> >  @@ -3,4 +3,3 @@
> >  search: []
> >  server:
> >  - 192.168.2.150
> >  - - fe80::1%eno1
> >
> >  '}, fix accordingly and re-deploy."}
> >
> > ~# cat /etc/resolv.conf
> >  # Generated by NetworkManager
> >  search colors.ovirt.local
> >  nameserver 192.168.2.150
> >  nameserver fe80::1%eno1
> >
> > I am confused, because the probably missing line is present. Is there
> maybe another config file where this last line could be missing?
> > Maybe I can somehow force the installer to reload resolv.conf, so
> it can pick up the new line?
>
> Can you please check/share other relevant logs (and perhaps other
> parts of this one)? Perhaps upload somewhere and share a link, or open
> a bug in bugzilla and attach there.
>
> Adding Dominik.
>
> Perhaps we fail to parse the line 'nameserver fe80::1%eno1'?
>
>
Yes, an IPv6 link-local nameserver might not work for static IP addresses, but
might work for dynamic ones.
Also, mixing IPv4 and IPv6 nameservers might confuse the lower layers.

Can you share at least the line containing
"call setupNetworks with "
and
"desired state"
of supervdsm.log?
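
Something like this should pull the relevant lines out of the log (just a
sketch, assuming the default log location):

grep -n 'call setupNetworks with' /var/log/vdsm/supervdsm.log
grep -n -A 20 'desired state' /var/log/vdsm/supervdsm.log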




> Thanks and best regards,
> --
> Didi
>
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives:


[ovirt-users] Enable a cluster node to run the hosted engine

2020-09-14 Thread rap
Hi there,

currently my team is evaluating oVirt and we're also testing several failure 
scenarios, backups and so on.
One scenario was:
- hyperconverged oVirt cluster with 3 nodes
- self-hosted engine
- simulate the break down of one of the nodes by power off
- to replace it make a clean install of a new node and reintegrate it in the 
cluster

Actually everything worked out fine. The newly installed node and related bricks 
(vmstore, data, engine) were added to the existing Gluster storage and it was 
added to the oVirt cluster (as host).

But there's one remaining problem: The new host doesn't have the grey crown, 
which means it's unable to run the hosted engine. How can I achieve that?
I also found out that the ovirt-ha-agent and ovirt-ha-broker services aren't 
started/enabled on that node. The reason is that 
/etc/ovirt-hosted-engine/hosted-engine.conf doesn't exist. I guess this is not 
only a problem concerning the hosted engine, but also for HA VMs.

Thank you for any advice and greetings,
Marcus
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives:


[ovirt-users] Re: OVN Geneve tunnels not been established

2020-09-14 Thread Konstantinos Betsis
Hi Dominik

When these commands are used on the ovirt-engine host the output is the one
depicted in your email.
For your reference see also below:

[root@ath01-ovirt01 certs]# ovn-nbctl get-ssl
Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer
CA Certificate: /etc/pki/ovirt-engine/ca.pem
Bootstrap: false
[root@ath01-ovirt01 certs]# ovn-nbctl get-connection
ptcp:6641
[root@ath01-ovirt01 certs]# ovn-sbctl get-ssl
Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer
CA Certificate: /etc/pki/ovirt-engine/ca.pem
Bootstrap: false
[root@ath01-ovirt01 certs]# ovn-sbctl get-connection
read-write role="" ptcp:6642
[root@ath01-ovirt01 certs]# ls -l /etc/pki/ovirt-engine/keys/ovn-*
-rw-r-. 1 root hugetlbfs 1828 Jun 25 11:08
/etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
-rw---. 1 root root  2893 Jun 25 11:08
/etc/pki/ovirt-engine/keys/ovn-ndb.p12
-rw-r-. 1 root hugetlbfs 1828 Jun 25 11:08
/etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
-rw---. 1 root root  2893 Jun 25 11:08
/etc/pki/ovirt-engine/keys/ovn-sdb.p12

When I try the above commands on the node hosts, the following happens:
ovn-nbctl get-ssl / get-connection
ovn-nbctl: unix:/var/run/openvswitch/ovnnb_db.sock: database connection
failed (No such file or directory)
The above, I believe, is expected, since no northbound connections should be
established from the host nodes.

ovn-sbctl get-ssl / get-connection
The output hangs until I terminate it.

For the requested logs the below are found in the ovsdb-server-sb.log

2020-09-14T07:18:38.187Z|219636|reconnect|WARN|tcp:DC02-host01:33146:
connection dropped (Protocol error)
2020-09-14T07:18:41.946Z|219637|reconnect|WARN|tcp:DC01-host01:51188:
connection dropped (Protocol error)
2020-09-14T07:18:43.033Z|219638|reconnect|WARN|tcp:DC01-host02:37044:
connection dropped (Protocol error)
2020-09-14T07:18:46.198Z|219639|reconnect|WARN|tcp:DC02-host01:33148:
connection dropped (Protocol error)
2020-09-14T07:18:50.069Z|219640|jsonrpc|WARN|Dropped 4 log messages in last
12 seconds (most recently, 4 seconds ago) due to excessive rate
2020-09-14T07:18:50.069Z|219641|jsonrpc|WARN|tcp:DC01-host01:51190: error
parsing stream: line 0, column 0, byte 0: invalid character U+0016
2020-09-14T07:18:50.069Z|219642|jsonrpc|WARN|Dropped 4 log messages in last
12 seconds (most recently, 4 seconds ago) due to excessive rate
2020-09-14T07:18:50.069Z|219643|jsonrpc|WARN|tcp:DC01-host01:51190:
received SSL data on JSON-RPC channel
2020-09-14T07:18:50.070Z|219644|reconnect|WARN|tcp:DC01-host01:51190:
connection dropped (Protocol error)
2020-09-14T07:18:51.147Z|219645|reconnect|WARN|tcp:DC01-host02:37046:
connection dropped (Protocol error)
2020-09-14T07:18:54.209Z|219646|reconnect|WARN|tcp:DC02-host01:33150:
connection dropped (Protocol error)
2020-09-14T07:18:58.192Z|219647|reconnect|WARN|tcp:DC01-host01:51192:
connection dropped (Protocol error)
2020-09-14T07:18:59.262Z|219648|jsonrpc|WARN|Dropped 3 log messages in last
8 seconds (most recently, 1 seconds ago) due to excessive rate
2020-09-14T07:18:59.262Z|219649|jsonrpc|WARN|tcp:DC01-host02:37048: error
parsing stream: line 0, column 0, byte 0: invalid character U+0016
2020-09-14T07:18:59.263Z|219650|jsonrpc|WARN|Dropped 3 log messages in last
8 seconds (most recently, 1 seconds ago) due to excessive rate
2020-09-14T07:18:59.263Z|219651|jsonrpc|WARN|tcp:DC01-host02:37048:
received SSL data on JSON-RPC channel
2020-09-14T07:18:59.263Z|219652|reconnect|WARN|tcp:DC01-host02:37048:
connection dropped (Protocol error)
2020-09-14T07:19:02.220Z|219653|reconnect|WARN|tcp:DC02-host01:33152:
connection dropped (Protocol error)
2020-09-14T07:19:06.316Z|219654|reconnect|WARN|tcp:DC01-host01:51194:
connection dropped (Protocol error)
2020-09-14T07:19:07.386Z|219655|reconnect|WARN|tcp:DC01-host02:37050:
connection dropped (Protocol error)
2020-09-14T07:19:10.232Z|219656|reconnect|WARN|tcp:DC02-host01:33154:
connection dropped (Protocol error)
2020-09-14T07:19:14.439Z|219657|jsonrpc|WARN|Dropped 4 log messages in last
12 seconds (most recently, 4 seconds ago) due to excessive rate
2020-09-14T07:19:14.439Z|219658|jsonrpc|WARN|tcp:DC01-host01:51196: error
parsing stream: line 0, column 0, byte 0: invalid character U+0016
2020-09-14T07:19:14.439Z|219659|jsonrpc|WARN|Dropped 4 log messages in last
12 seconds (most recently, 4 seconds ago) due to excessive rate
2020-09-14T07:19:14.439Z|219660|jsonrpc|WARN|tcp:DC01-host01:51196:
received SSL data on JSON-RPC channel
2020-09-14T07:19:14.440Z|219661|reconnect|WARN|tcp:DC01-host01:51196:
connection dropped (Protocol error)
2020-09-14T07:19:15.505Z|219662|reconnect|WARN|tcp:DC01-host02:37052:
connection dropped (Protocol error)


How can we fix these SSL errors?
I thought vdsm did the certificate provisioning on the host nodes so that they
can communicate with the engine host node.
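
From the log it looks like SSL clients are hitting a plain-TCP listener. I
guess I could compare the two sides with something like this (a sketch only,
not sure it is the recommended fix):

# engine side: the southbound DB currently listens on plain TCP
ovn-sbctl get-connection
# switching it to an SSL listener might let the hosts' SSL connections in
ovn-sbctl set-connection pssl:6642
# host side: check which remote and encap IP vdsm-tool ovn-config set up
ovs-vsctl get Open_vSwitch . external_ids:ovn-remote
ovs-vsctl get Open_vSwitch . external_ids:ovn-encap-ip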

On Fri, Sep 11, 2020 at 6:39 PM Dominik Holler  wrote:

>