[ovirt-users] Re: Suggestion to switch to nightly

2023-04-27 Thread Ryan Bullock
I was afraid that might be the case.

Appreciate the response.

Regards,

Ryan Bullock

On Thu, Apr 27, 2023 at 7:24 AM Sandro Bonazzola 
wrote:

>
>
> Il giorno gio 27 apr 2023 alle ore 16:16 Ryan Bullock 
> ha scritto:
>
>> Hey Sandro,
>>
>> Does this apply to oVirt Node as well?
>>
>
> oVirt Node may be a bit more complicated as switching to nightly means
> you'll have a new node every day.
> Given the development effort on oVirt Node (not) happening these days I
> would not encourage using oVirt Node.
>
>
>
>>
>>
>> Thanks,
>>
>> Ryan Bullock
>>
>> On Fri, Apr 14, 2023 at 3:04 AM Sandro Bonazzola 
>> wrote:
>>
>>> Hi,
>>>
>>> As you probably noticed there were no regular releases after oVirt 4.5.4
>>> <https://ovirt.org/release/4.5.4/> in December 2022.
>>>
>>> Despite the calls to action to the community and to the companies
>>> involved with oVirt, there has been no uptake of the leadership of the
>>> oVirt project yet.
>>>
>>> The developers at Red Hat still dedicating time to the project are now
>>> facing the fact that they lack the time to do formal releases, even though
>>> they keep fixing platform regressions like the recent ones caused by the new
>>> Ansible changes. That makes a nightly snapshot setup a more stable
>>> environment than oVirt 4.5.4.
>>>
>>> For this reason, we would like to suggest that the user community enable
>>> the nightly repositories for oVirt by following the procedure at:
>>> https://www.ovirt.org/develop/dev-process/install-nightly-snapshot.html
>>>
>>> This will ensure that the latest fixes for the platform regressions will
>>> be promptly available.
>>>
>>> Regards,
>>> --
>>>
>>> Sandro Bonazzola
>>>
>>> MANAGER, SOFTWARE ENGINEERING - Red Hat In-Vehicle Operating System
>>>
>>> Red Hat EMEA <https://www.redhat.com/>
>>>
>>> sbona...@redhat.com
>>> <https://www.redhat.com/>
>>>
>>> *Red Hat respects your work life balance. Therefore there is no need to
>>> answer this email out of your office hours.*
>>>
>>>
>>> ___
>>> Users mailing list -- users@ovirt.org
>>> To unsubscribe send an email to users-le...@ovirt.org
>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>>> oVirt Code of Conduct:
>>> https://www.ovirt.org/community/about/community-guidelines/
>>> List Archives:
>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/DMCC5QCHL6ECXN674JOLABH36U2LVJLJ/
>>>
>>
>
> --
>
> Sandro Bonazzola
>
> MANAGER, SOFTWARE ENGINEERING - Red Hat In-Vehicle Operating System
>
> Red Hat EMEA <https://www.redhat.com/>
>
> sbona...@redhat.com
> <https://www.redhat.com/>
>
> *Red Hat respects your work life balance. Therefore there is no need to
> answer this email out of your office hours.*
>
>
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/M25VHTXHSQGZC2MYWOH7GI6NTO2M2F7P/


[ovirt-users] Re: Suggestion to switch to nightly

2023-04-27 Thread Ryan Bullock
Hey Sandro,

Does this apply to oVirt Node as well?

Thanks,

Ryan Bullock

On Fri, Apr 14, 2023 at 3:04 AM Sandro Bonazzola 
wrote:

> Hi,
>
> As you probably noticed there were no regular releases after oVirt 4.5.4
> <https://ovirt.org/release/4.5.4/> in December 2022.
>
> Despite the calls to action to the community and to the companies involved
> with oVirt, there has been no uptake of the leadership of the oVirt project
> yet.
>
> The developers at Red Hat still dedicating time to the project are now
> facing the fact that they lack the time to do formal releases, even though
> they keep fixing platform regressions like the recent ones caused by the new
> Ansible changes. That makes a nightly snapshot setup a more stable
> environment than oVirt 4.5.4.
>
> For this reason, we would like to suggest that the user community enable
> the nightly repositories for oVirt by following the procedure at:
> https://www.ovirt.org/develop/dev-process/install-nightly-snapshot.html
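
(For anyone who doesn't want to click through: at the time of writing, that
procedure boils down to roughly the two commands below, run as root; the copr
project and chroot names are from memory, so double-check them against the
linked page:

    dnf copr enable ovirt/ovirt-master-snapshot centos-stream-9
    dnf install ovirt-release-master

Use the chroot matching your OS, e.g. centos-stream-8 on EL8 machines.)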
>
> This will ensure that the latest fixes for the platform regressions will
> be promptly available.
>
> Regards,
> --
>
> Sandro Bonazzola
>
> MANAGER, SOFTWARE ENGINEERING - Red Hat In-Vehicle Operating System
>
> Red Hat EMEA <https://www.redhat.com/>
>
> sbona...@redhat.com
> <https://www.redhat.com/>
>
> *Red Hat respects your work life balance. Therefore there is no need to
> answer this email out of your office hours.*
>
>
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/DMCC5QCHL6ECXN674JOLABH36U2LVJLJ/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/YDQQSOGJXFECCLO6ZR746275ABTCQWW5/


[ovirt-users] Re: Unable to upgrade cluster level to 4.7 for the hosted engine (only)

2022-05-19 Thread Ryan Bullock
Re-ran the timezone fix (/usr/share/ovirt-engine/dbscripts/engine-psql.sh
-c "update vm_static SET time_zone='Etc/GMT' where
vm_name='HostedEngine';") to be sure, then did a "hosted-engine
--vm-shutdown" and a "hosted-engine --vm-start" and it finally let me
update the cluster level.
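
For anyone searching later, the full sequence was roughly the following (the
--vm-status check is just my own sanity step, not part of the original fix):

    /usr/share/ovirt-engine/dbscripts/engine-psql.sh -c \
      "update vm_static SET time_zone='Etc/GMT' where vm_name='HostedEngine';"
    hosted-engine --vm-shutdown
    hosted-engine --vm-status   # wait until the engine VM is reported down
    hosted-engine --vm-start

After that the cluster level change went through.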

Thanks!

On Thu, May 19, 2022 at 1:30 AM Strahil Nikolov 
wrote:

> Have you tried the windows style (a.k.a restart the Engine) ?
> Best Regards,
> Strahil Nikolov
>
> On Wed, May 18, 2022 at 22:57, Ryan Bullock
>  wrote:
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/OYEZNZD3ZFD4TLIPGVUZWXGEVVGHUKR2/
>
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/5KN5RCOQUUGQUJZATZQO2HDLIO6UKGV7/


[ovirt-users] Re: Unable to upgrade cluster level to 4.7 for the hosted engine (only)

2022-05-18 Thread Ryan Bullock
Has anyone had any success with solving this issue? I'm running into it as
well after upgrading to 4.5. I'm unable to change any settings on the
hosted engine, with everything reporting settings as locked.

Regards,

Ryan Bullock

On Tue, May 10, 2022 at 3:00 AM lists--- via Users  wrote:

> Hello,
>
> I upgraded my engine and nodes to 4.5 a few days ago and am now planning to
> upgrade the cluster compatibility level from 4.6 to 4.7. First I tried
> doing this from the cluster settings, but it fails because the hosted-engine
> settings are locked. So I tried it by hand but again got the locked error;
> I found I can't change any values on the hosted engine. Changing the
> compatibility level on all other VMs worked fine and they are on 4.7 now.
>
> I read about the timezone issue in 4.4.8, so I checked the timezone of my
> hosted engine; it is filled with "Standard: (GMTZ) Greenwich Standard
> Time". To be sure, I just did a
> "/usr/share/ovirt-engine/dbscripts/engine-psql.sh -c "update vm_static SET
> time_zone='Etc/GMT' where vm_name='HostedEngine';"" and it changed the
> timezone, but settings are still locked and I am unable to change the
> compatibility level.
>
> Any idea how to solve this?
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/ENA2IU7N62YFMYOOQJ6NA7JSIF74ZFJ6/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/OYEZNZD3ZFD4TLIPGVUZWXGEVVGHUKR2/


[ovirt-users] Re: ovirt-4.4 mirrors failing

2022-02-01 Thread Ryan Bullock
I am getting update failures for ovirt-node as well because of this.

Is there any kind of workaround, or do I need to manually edit
ovirt-4.4-dependencies.repo on every host to point to the vault?
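
(My guess at what that manual edit would look like on each host, purely as a
sketch since I haven't checked the exact contents of the repo file: in
/etc/yum.repos.d/ovirt-4.4-dependencies.repo, comment out the mirrorlist= line
of each failing section and add a baseurl pointing at the vault, e.g.

    #mirrorlist=...
    baseurl=https://vault.centos.org/8.5.2111/<repo-path>/$basearch/os/

followed by a dnf clean all.)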

Thanks.

Regards,

Ryan Bullock

On Tue, Feb 1, 2022 at 5:19 AM Sandro Bonazzola  wrote:

>
>
> Il giorno mar 1 feb 2022 alle ore 13:45 Ayansh Rocks <
> shashank123rast...@gmail.com> ha scritto:
>
>> Thanks Lev, I hope it will work on Alma.
>>
>
> It passes repository closure on Alma but we lack the resources to actively
> test it there.
>
>
>
>>
>> Regards
>> Ayansh Rocks
>>
>> On Tue, Feb 1, 2022 at 4:01 PM Lev Veyde  wrote:
>>
>>> Hi,
>>>
>>> We just released a new version of the ovirt-release package that
>>> includes this temporary vault fix (4.4.10.1).
>>>
>>> Thanks in advance,
>>>
>>> On Tue, Feb 1, 2022 at 9:31 AM Sandro Bonazzola 
>>> wrote:
>>>
>>>>
>>>>
>>>> Il giorno lun 31 gen 2022 alle ore 21:17 Thomas Hoberg <
>>>> tho...@hoberg.net> ha scritto:
>>>>
>>>>> > Hi Emilio,
>>>>> >
>>>>> > Yes, looks like the patch that should fix this issue is already here:
>>>>> > https://github.com/oVirt/ovirt-release/pull/93 , but indeed it
>>>>> still hasn't
>>>>> > been reviewed and merged yet.
>>>>>
>>>>
>>>> Hi, the patch has not been merged yet because the OpsTools repo for
>>>> CentOS Stream has not yet been populated by the OpsTools SIG.
>>>> I contacted the chair of the SIG last week but he was on PTO and is
>>>> returning only this week.
>>>> As a temporary solution you can redirect the repositories to the vault:
>>>> https://vault.centos.org/8.5.2111/
>>>>
>>>>
>>>>> >
>>>>> > I hope that we'll have a fixed version very soon, but meanwhile you
>>>>> can try
>>>>> > to simply apply the changes manually in your *testing* env.
>>>>>
>>>>> So I did, but I can't help wondering: how well will code tested
>>>>> against "stream" work on RHEL, Alma, Rocky, Liberty, VzLinux?
>>>>> How well will an engine evidently built on "stream" work with hosts
>>>>> based on RHEL etc.?
>>>>> Shouldn't you in fact switch the engine to RHEL etc., too?
>>>>>
>>>>>
>>>>> >
>>>>> > Thanks in advance,
>>>>> >
>>>>> > On Mon, Jan 31, 2022 at 8:05 PM Emilio Del Plato wrote:
>>>>> ___
>>>>> Users mailing list -- users@ovirt.org
>>>>> To unsubscribe send an email to users-le...@ovirt.org
>>>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>>>>> oVirt Code of Conduct:
>>>>> https://www.ovirt.org/community/about/community-guidelines/
>>>>> List Archives:
>>>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/MZTD4JSDLSVQ7XBSRCQF4PFRPHJYCVQT/
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Sandro Bonazzola
>>>>
>>>> MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
>>>>
>>>> Red Hat EMEA <https://www.redhat.com/>
>>>>
>>>> sbona...@redhat.com
>>>> <https://www.redhat.com/>
>>>>
>>>> *Red Hat respects your work life balance. Therefore there is no need to
>>>> answer this email out of your office hours.*
>>>>
>>>>
>>>> ___
>>>> Users mailing list -- users@ovirt.org
>>>> To unsubscribe send an email to users-le...@ovirt.org
>>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>>>> oVirt Code of Conduct:
>>>> https://www.ovirt.org/community/about/community-guidelines/
>>>> List Archives:
>>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/D7KAW7Q45PJYWSHWBKBGTSNQYOUDESPQ/
>>>>
>>>
>>>
>>> --
>>>
>>> Lev Veyde
>>>
>>> Senior Software Engineer, RHCE | RHCVA | MCITP
>>>
>>> Red Hat Israel
>>>
>>> <https://www.redhat.com>
>>>
>>> l...@redhat.com | lve...@redhat.com
>>> <https://red.ht/sig>
>>> TRIED. TESTED. TRUSTED. <https://redhat.com/trusted>
>>>
>>
>
> --
>
> Sandro Bonazzola
>
> MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
>
> Red Hat EMEA <https://www.redhat.com/>
>
> sbona...@redhat.com
> <https://www.redhat.com/>
>
> *Red Hat respects your work life balance. Therefore there is no need to
> answer this email out of your office hours.*
>
>
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/TLQY36AZGZZTZKG2OZDJOF6CTYX4F6Z2/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/TSLID6S66IU5NOFGGVYMHLSKI6LWDX5T/


[ovirt-users] Re: [ovirt-announce] oVirt 4.4.6 is now generally available

2021-05-06 Thread Ryan Bullock
We upgraded from 4.4.5 to 4.4.6, but the Hosted Engine seems to stay on
CentOS Linux. Should we be converting the hosted engine to CentOS Stream?

Is it safe to follow the procedure at https://www.centos.org/centos-stream/
for this?
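
(For context, as far as I can tell the conversion described on that page is
just the following, run inside the engine VM; listing it here mainly so it is
clear what we would be signing up for:

    dnf swap centos-linux-repos centos-stream-repos
    dnf distro-sync

What I am unsure about is whether doing that to the appliance-based hosted
engine is supported.)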

Thanks,

Regards,

Ryan Bullock.

On Tue, May 4, 2021 at 8:46 AM Lev Veyde  wrote:

> The oVirt project is excited to announce the general availability of oVirt
> 4.4.6 , as of May 4th, 2021.
>
> This release unleashes an altogether more powerful and flexible open
> source virtualization solution that encompasses hundreds of individual
> changes and a wide range of enhancements across the engine, storage,
> network, user interface, and analytics, as compared to oVirt 4.3.
> Important notes before you install / upgrade
>
> Please note that oVirt 4.4 only supports clusters and data centers with
> compatibility version 4.2 and above. If clusters or data centers are
> running with an older compatibility version, you need to upgrade them to at
> least 4.2 (4.3 is recommended).
>
> Please note that in RHEL 8 / CentOS 8 several devices that worked on EL7
> are no longer supported.
>
> For example, the megaraid_sas driver is removed. If you use Enterprise
> Linux 8 hosts you can try to provide the necessary drivers for the
> deprecated hardware using the DUD method (See the users’ mailing list
> thread on this at
> https://lists.ovirt.org/archives/list/users@ovirt.org/thread/NDSVUZSESOXEFJNPHOXUH4HOOWRIRSB4/
> )
>
> Rebase on CentOS Stream
>
> Starting with oVirt 4.4.6 both oVirt Node and oVirt Engine Appliance are
> based on CentOS Stream.
>
> You can still install oVirt 4.4.6 on Red Hat Enterprise Linux 8.3,  CentOS
> Linux 8.3 or equivalent but in order to use cluster level 4.6 you’ll have
> to wait for 8.4 to be available.
>
> Please note that existing oVirt Nodes updating to 4.4.6 will automatically
> be based on CentOS Stream.
> Documentation
>
> - If you want to try oVirt as quickly as possible, follow the
>   instructions on the Download <https://ovirt.org/download/> page.
> - For complete installation, administration, and usage instructions, see
>   the oVirt Documentation <https://ovirt.org/documentation/>.
> - For upgrading from a previous version, see the oVirt Upgrade Guide
>   <https://ovirt.org/documentation/upgrade_guide/>.
> - For a general overview of oVirt, see About oVirt
>   <https://ovirt.org/community/about.html>.
>
> What’s new in oVirt 4.4.6 Release?
>
> This update is the sixth in a series of stabilization updates to the 4.4
> series.
>
> This release is available now on x86_64 architecture for:
>
> - Red Hat Enterprise Linux 8.3
> - CentOS Linux (or similar) 8.3
> - CentOS Stream 8
>
>
> This release supports Hypervisor Hosts on x86_64 and ppc64le architectures
> for:
>
> - Red Hat Enterprise Linux 8.3
> - CentOS Linux (or similar) 8.3
> - oVirt Node NG (based on CentOS Stream 8)
> - CentOS Stream
>
>
>
> oVirt Node and Appliance have been updated, including:
>
> - oVirt 4.4.6: https://www.ovirt.org/release/4.4.6/
> - CentOS Stream 8
> - Ansible 2.9.20:
>   https://github.com/ansible/ansible/blob/stable-2.9/changelogs/CHANGELOG-v2.9.rst#v2-9-20
> - Advanced Virtualization 8.4
>
>
>
> See the release notes [1] for installation instructions and a list of new
> features and bugs fixed.
>
> Notes:
>
> - oVirt Appliance is already available for CentOS Stream 8
> - oVirt Node NG is already available for CentOS Stream 8
>
>
> Additional resources:
>
> - Read more about the oVirt 4.4.6 release highlights:
>   https://www.ovirt.org/release/4.4.6/
> - Get more oVirt project updates on Twitter: https://twitter.com/ovirt
> - Check out the latest project news on the oVirt blog:
>   https://blogs.ovirt.org/
>
>
> [1] https://www.ovirt.org/release/4.4.6/
>
> [2] https://resources.ovirt.org/pub/ovirt-4.4/iso/
>
> --
>
> Lev Veyde
>
> Senior Software Engineer, RHCE | RHCVA | MCITP
>
> Red Hat Israel
>
> <https://www.redhat.com>
>
> l...@redhat.com | lve...@redhat.com
> <https://red.ht/sig>
> TRIED. TESTED. TRUSTED. <https://redhat.com/trusted>
> ___
> Announce mailing list -- annou...@ovirt.org
> To unsubscribe send an email to announce-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy

[ovirt-users] Re: Issues deploying 4.4 with HE on new EPYC hosts

2020-05-26 Thread Ryan Bullock
Does 8.1 enable AVIC by default?

May be related to this:

https://bugzilla.redhat.com/show_bug.cgi?id=1694170

Can try disabling avic in the kvm module on the hosts and see if that
allows them to activate.
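
A sketch of what I mean (option name per the kvm_amd module parameters; verify
against your kernel before relying on it):

    echo 'options kvm_amd avic=0' > /etc/modprobe.d/kvm_amd-avic.conf
    # with no VMs running on the host:
    modprobe -r kvm_amd && modprobe kvm_amd
    cat /sys/module/kvm_amd/parameters/avic   # should now report 0 (or N)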

Regards,

Ryan

On Tue, May 26, 2020 at 9:11 AM Mark R  wrote:

> Hello all,
>
> I have some EPYC servers that are not yet in production, so I wanted to go
> ahead and move them off of 4.3 (which was working) to 4.4. I flattened and
> reinstalled the hosts with CentOS 8.1 Minimal and installed all updates.
> Some very simple networking, just a bond and two iSCSI interfaces. After
> adding the oVirt 4.4 repo and installing the requirements, I run
> 'hosted-engine --deploy' and proceed through the setup. Everything looks as
> though it is going nicely and the local HE starts and runs perfectly. After
> copying the HE disks out to storage, the system tries to start it there but
> is using a different CPU definition and it's impossible to start it. At
> this point I'm stuck but hoping someone knows the fix, because this is as
> vanilla a deployment as I could attempt and it appears EPYC CPUs are a
> no-go right now with 4.4.
>
> When the HostedEngineLocal VM is running, the CPU definition is:
>   
> EPYC-IBPB
> AMD
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
>   
>
> Once the HostedEngine VM is defined and trying to start, the CPU
> definition is simply:
>
>   
> EPYC
> 
> 
> 
> 
>   
> 
>   
>
> On attempts to start it, the host is logging this error:  "CPU is
> incompatible with host CPU: Host CPU does not provide required features:
> virt-ssbd".
>
> So, the HostedEngineLocal VM works because it has a requirement set for
> 'amd-ssbd' instead of 'virt-ssbd', and a VM requiring 'virt-ssbd' can't run
> on EPYC CPUs with CentOS 8.1.  As mentioned, the HostedEngine ran fine on
> oVirt 4.3 with CentOS 7.8, and on 4.3 the cpu definition also required
> 'virt-ssbd', so I can only imagine that perhaps, because of the more
> recent 4.x kernel, I now need HE to require 'amd-ssbd' instead?
>
> Any clues to help with this? I can completely wipe/reconfigure the hosts
> as needed so I'm willing to try whatever so that I can move forward with a
> 4.4 deployment.
>
> Thanks!
> Mark
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/KZHDCDE6JYADDMFSZD6AXYBP6SPV4TGA/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/JLN66DAVMAPOH53WEU4EE5UBMPKBZGQI/


[ovirt-users] Re: AMD EPYC 4.3 upgrade 'CPU type is not supported in this cluster compatibility version or is not supported at all'

2019-02-17 Thread Ryan Bullock
Hey Steven,

Including just the cpuFlags, since the output is pretty verbose. Let me
know if you need anything else from the output.

Without avic=1 (Works Fine):
"cpuFlags":
"fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,mmx,fxsr,sse,sse2,ht,syscall,nx,mmxext,fxsr_opt,pdpe1gb,rdtscp,lm,constant_tsc,art,rep_good,nopl,nonstop_tsc,extd_apicid,amd_dcm,aperfmperf,eagerfpu,pni,pclmulqdq,monitor,ssse3,fma,cx16,sse4_1,sse4_2,movbe,popcnt,aes,xsave,avx,f16c,rdrand,lahf_lm,cmp_legacy,svm,extapic,cr8_legacy,abm,sse4a,misalignsse,3dnowprefetch,osvw,skinit,wdt,tce,topoext,perfctr_core,perfctr_nb,bpext,perfctr_l2,cpb,hw_pstate,sme,retpoline_amd,ssbd,ibpb,vmmcall,fsgsbase,bmi1,avx2,smep,bmi2,rdseed,adx,smap,clflushopt,sha_ni,xsaveopt,xsavec,xgetbv1,clzero,irperf,xsaveerptr,arat,npt,lbrv,svm_lock,nrip_save,tsc_scale,vmcb_clean,flushbyasid,decodeassists,pausefilter,pfthreshold,avic,v_vmsave_vmload,vgif,overflow_recov,succor,smca,model_Opteron_G3,model_Opteron_G2,model_kvm32,model_kvm64,model_Westmere,model_Nehalem,model_Conroe,model_EPYC-IBPB,model_Opteron_G1,model_SandyBridge,model_qemu32,model_Penryn,model_pentium2,model_486,model_qemu64,model_cpu64-rhel6,model_EPYC,model_pentium,model_pentium3"

With avic=1 (Problem Configuration):
"cpuFlags":
"fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,mmx,fxsr,sse,sse2,ht,syscall,nx,mmxext,fxsr_opt,pdpe1gb,rdtscp,lm,constant_tsc,art,rep_good,nopl,nonstop_tsc,extd_apicid,amd_dcm,aperfmperf,eagerfpu,pni,pclmulqdq,monitor,ssse3,fma,cx16,sse4_1,sse4_2,movbe,popcnt,aes,xsave,avx,f16c,rdrand,lahf_lm,cmp_legacy,svm,extapic,cr8_legacy,abm,sse4a,misalignsse,3dnowprefetch,osvw,skinit,wdt,tce,topoext,perfctr_core,perfctr_nb,bpext,perfctr_l2,cpb,hw_pstate,sme,retpoline_amd,ssbd,ibpb,vmmcall,fsgsbase,bmi1,avx2,smep,bmi2,rdseed,adx,smap,clflushopt,sha_ni,xsaveopt,xsavec,xgetbv1,clzero,irperf,xsaveerptr,arat,npt,lbrv,svm_lock,nrip_save,tsc_scale,vmcb_clean,flushbyasid,decodeassists,pausefilter,pfthreshold,avic,v_vmsave_vmload,vgif,overflow_recov,succor,smca"

Flags stay the same, but with avic=1 no models are shown as supported.
Also, I opened this bug https://bugzilla.redhat.com/show_bug.cgi?id=1675030
regarding the avic=1 setting seemingly requiring the x2apic flag.
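
(If anyone else wants to pull just that field out of the otherwise verbose
output, something like this works, assuming the pretty-printed JSON keeps the
value on one line:

    vdsm-client Host getCapabilities | grep -o '"cpuFlags": "[^"]*"'
)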

-Ryan

On Sun, Feb 17, 2019 at 5:22 AM Steven Rosenberg 
wrote:

> Dear Ryan Bullock,
>
> I am currently looking at this issue:
>
>
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/4Y4X7UGDEYSB5JK45TLDERNM7IMTHIYY/
>
> We would like more information concerning the CPU Flags (even though you
> included them in your engine log dump above).
>
> Could you run the following command on the same host running: AMD
> EPYC-IBPB
>
> vdsm-client Host getCapabilities
>
> Please send me the output, especially the CPU Flags.
>
> Thank you in advance for your help.
>
> With Best Regards.
>
> Steven Rosenberg.
>
>
>
> On Thu, Feb 7, 2019 at 6:35 PM Ryan Bullock  wrote:
>
>> That would explain it.
>>
>> Would removing the host and then reinstalling it under a new 4.3 cluster
>> work without having to set the entire old cluster into maintenance to
>> change the cpu? Then I could just restart VM's into the new cluster as we
>> transition to minimize downtime.
>>
>> Thanks for the info!
>>
>> Ryan
>>
>> On Thu, Feb 7, 2019 at 7:56 AM Greg Sheremeta 
>> wrote:
>>
>>> AMD EPYC IBPB is deprecated in 4.3.
>>> The deprecated CPUs (cpus variable, that entire list) are:
>>>
>>> https://gerrit.ovirt.org/#/c/95310/7/frontend/webadmin/modules/webadmin/src/main/java/org/ovirt/engine/ui/webadmin/widget/table/column/ClusterAdditionalStatusColumn.java
>>>
>>> So, *-IBRS [IBRS-SSBD is still ok], Epyc IBPB, Conroe, Penryn, and
>>> Opteron G1-3. If you have those, you need to change it to a supported type
>>> while it's in 4.2 still.
>>>
>>> Greg
>>>
>>> On Thu, Feb 7, 2019 at 1:11 AM Ryan Bullock  wrote:
>>>
>>>> We just updated our engine to 4.3, but when I tried to update one of
>>>> our AMD EPYC hosts it could not activate with the error:
>>>>
>>>> Host vmc2h2 moved to Non-Operational state as host CPU type is not
>>>> supported in this cluster compatibility version or is not supported at all.
>>>>
>> Relevant (I think) parts from the engine log:
>>>>
>>>> (EE-ManagedThreadFactory-engineScheduled-Thread-82) [ee51a70] Could not
>>>> find server cpu for server 'vmc2h2' (745a14c6-9d31-48a4-9566-914647d83f53),
>>>> flags:
>>>> 'fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,mmx,fxsr,

[ovirt-users] Re: Ovirt Cluster completely unstable

2019-02-15 Thread Ryan Bullock
Just to add, after we updated to 4.3 our gluster just went south.
Thankfully gluster is only secondary storage for us, and our primary
storage is an ISCSI SAN.  We migrated everything over to the SAN that we
could, but a few VM's got corrupted by gluster (data was gone). Right now
we just have gluster off and set to maintenance because the connectivity
issues were causing our main cluster to continuously migrate VMs.

Looking at the gluster hosts themselves I noticed that heal info would
often report one brick down, even if ovirt didn't. Checking the status of
glusterd would also show health checks failed.

Feb  6 15:36:37 vmc3h1 glusterfs-virtstore[17036]: [2019-02-06
23:36:37.937041] M [MSGID: 113075]
[posix-helpers.c:1957:posix_health_check_thread_proc] 0-VirstStore-posix:
health-check failed, going down
Feb  6 15:36:37 vmc3h1 glusterfs-virtstore[17036]: [2019-02-06
23:36:37.937561] M [MSGID: 113075]
[posix-helpers.c:1975:posix_health_check_thread_proc] 0-VirstStore-posix:
still alive! -> SIGTERM

I think the health-check is failing (maybe erroneously), which is then
killing the brick. When this happens it just causes a continuous cycle of
brick up-down and healing, and in turn connectivity issues.
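
For reference, the checks I was doing were basically the following (volume
name taken from the log lines above, adjust to yours; the syslog path is an
assumption based on how the bricks log here):

    gluster volume status VirstStore        # shows which brick processes are down
    gluster volume heal VirstStore info     # lists bricks/entries still needing heal
    grep 'health-check failed' /var/log/messages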

This is our second time running into issues with gluster, so I think we are
going to sideline it for a while.

-Ryan

On Thu, Feb 14, 2019 at 12:47 PM Darryl Scott 
wrote:

> I do believe something went wrong after fully updating everything last
> Friday.  I updated all the ovirt compute nodes on Friday and gluster/engine
> on Saturday.  I have been experiencing these issues ever since.  I have
> pored over engine.log and it seems to be a connection-to-storage issue.
>
>
> --
> *From:* Jayme 
> *Sent:* Thursday, February 14, 2019 1:52:59 AM
> *To:* Darryl Scott
> *Cc:* users
> *Subject:* Re: [ovirt-users] Ovirt Cluster completely unstable
>
> I have a three-node HCI gluster which was previously running 4.2 with zero
> problems.  I just upgraded it yesterday.  I ran into a few bugs right away
> with the upgrade process, but aside from that I also discovered other users
> with severe GlusterFS problems since the upgrade to the new GlusterFS version.
> It is less than 24 hours since I upgraded my cluster and I just got a notice
> that one of my GlusterFS bricks is offline.  There does appear to be a very
> real and serious issue here with the latest updates.
>
>
> On Wed, Feb 13, 2019 at 7:26 PM  wrote:
>
> I'm abandoning my production ovirt cluster due to instability.   I have a
> 7 host cluster running about 300 vms and have been for over a year.  It has
> become unstable over the past three days.  I have random hosts, both
> compute and storage, disconnecting, AND many vms disconnecting and becoming
> unusable.
>
> The 7 hosts are 4 compute hosts running oVirt 4.2.8 and three glusterfs hosts
> running 3.12.5.  I submitted a bugzilla bug and they immediately assigned
> it to the storage people but have not responded with any meaningful
> information.  I have submitted several logs.
>
> I have found some discussion on problems with instability with gluster
> 3.12.5.  I would be willing to upgrade my gluster to a more stable version
> if that's the culprit.  I installed gluster using the ovirt gui and this is
> the version the ovirt gui installed.
>
> Is there an ovirt health monitor available?  Where should I be looking to
> get a resolution to the problems I'm facing?
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/BL4M3JQA3IEXCQUY4IGQXOAALRUQ7TVB/
>
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/IMUKFFANNJXLKXNVGMMJ6Y7MOLW2CQE3/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/MVHTTLRZRN4XJA6BXCQQ5NIVPH7SPJCU/


[ovirt-users] Re: AMD EPYC 4.3 upgrade 'CPU type is not supported in this cluster compatibility version or is not supported at all'

2019-02-11 Thread Ryan Bullock
So this looks like it is a bug in qemu/libvirtd

We had avic=1 set for kvm_amd, but when this is set the qemu capabilities
cache showed all EPYC/AMD variants as unusable, with a blocker for the missing
'x2apic' flag. My guess is that it should probably be looking for the avic flag
instead. My other guess is that avic doesn't actually get enabled at all
when it is turned on.

Even though I had disabled avic earlier in testing, libvirt did not pick up
the capabilities change until I cleared its cache.
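
Concretely, on the host that was refusing to activate:

    rm /var/cache/libvirt/qemu/capabilities/*.xml
    systemctl restart libvirtd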

I'm also assuming that some of the verification code changes in ovirt from
4.2 to 4.3, or a libvirt update, exposed this.

Apparently qemu will just drop unsupported flags when starting a VM, which
is why this was working before.

Mystery solved.

Regards,

Ryan

On Sun, Feb 10, 2019 at 8:54 AM Greg Sheremeta  wrote:

> Thanks, Ryan.
> I opened https://bugzilla.redhat.com/show_bug.cgi?id=1674265 to track
> this.
>
> Greg
>
> On Sat, Feb 9, 2019 at 5:50 PM Ryan Bullock  wrote:
>
>> Got a host activated!
>>
>> 1. Update host to 4.3
>> 2. rm /var/cache/libvirt/qemu/capabilities/*.xml
>> 3. systemctl restart libvirtd
>> 4. Activate host
>>
>> Seems like some kind of stuck state going from 4.2 -> 4.3
>>
>> Hope this helps someone else.
>>
>> On Sat, Feb 9, 2019 at 1:12 PM Ryan Bullock  wrote:
>>
>>> I tried that too, but it still complains about an unsupported CPU in the
>>> new cluster. Even if I leave the cluster level at 4.2, if I update the host
>>> to 4.3 it can't activate under a 4.2 cluster.
>>> Makes me think something changed in how it verifies the CPU support and
>>> for some reason it is not liking my EPYC systems.
>>>
>>> On Sat, Feb 9, 2019 at 10:18 AM Juhani Rautiainen <
>>> juhani.rautiai...@gmail.com> wrote:
>>>
>>>> On Sat, Feb 9, 2019 at 7:43 PM Ryan Bullock  wrote:
>>>> >
>>>> > So I tried making a new cluster with a 4.2 compatibility level and
>>>> moving one of my EPYC hosts into it. I then updated the host to 4.3 and
>>>> switched the cluster version 4.3 + set cluster cpu to the new AMD EPYC IBPD
>>>> SSBD (also tried plain AMD EPYC). It still fails to make the host
>>>> operational complaining that 'CPU type is not supported in this cluster
>>>> compatibility version or is not supported at all'.
>>>> >
>>>> When I did this with Epyc I made a new cluster with the 4.3 level and Epyc
>>>> CPU, and then moved the nodes to it. Maybe try that? I also had to
>>>> move a couple of VM's to the new cluster because the old cluster couldn't
>>>> upgrade with those. When the nodes and the couple of problem VM's were in
>>>> the new cluster I could upgrade the old cluster to the new level.
>>>>
>>>> -Juhani
>>>>
>>>
>
> --
>
> GREG SHEREMETA
>
> SENIOR SOFTWARE ENGINEER - TEAM LEAD - RHV UX
>
> Red Hat NA
>
> <https://www.redhat.com/>
>
> gsher...@redhat.comIRC: gshereme
> <https://red.ht/sig>
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/EOERAVPFSCYOAMPIGEDCKI7W4CHNX7UM/


[ovirt-users] Re: AMD EPYC 4.3 upgrade 'CPU type is not supported in this cluster compatibility version or is not supported at all'

2019-02-09 Thread Ryan Bullock
So I tried making a new cluster with a 4.2 compatibility level and moving
one of my EPYC hosts into it. I then updated the host to 4.3, switched
the cluster version to 4.3, and set the cluster CPU to the new AMD EPYC IBPB
SSBD (also tried plain AMD EPYC). It still fails to make the host operational,
complaining that 'CPU type is not supported in this cluster compatibility
version or is not supported at all'.

I tried a few iterations of updating, moving, activating, reinstalling,
etc, but none of them seem to work.

The hosts are running CentOS Linux release 7.6.1810 (Core), all packages
are up to date.

I checked my CPU flags, and I can't see anything missing.

cat /proc/cpuinfo | head -n 26
processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 23
model   : 1
model name  : AMD EPYC 7551P 32-Core Processor
stepping: 2
microcode   : 0x8001227
cpu MHz : 2000.000
cache size  : 512 KB
physical id : 0
siblings: 64
core id : 0
cpu cores   : 32
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall *nx* mmxext fxsr_opt
pdpe1gb rdtscp lm constant_tsc art rep_good nopl nonstop_tsc extd_apicid
amd_dcm aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1
sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy *svm*
extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce
topoext perfctr_core perfctr_nb bpext perfctr_l2 cpb hw_pstate sme
retpoline_amd *ssbd ibpb* vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx
smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero irperf xsaveerptr
arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid
decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif
overflow_recov succor smca
bogomips: 3992.39
TLB size: 2560 4K pages
clflush size: 64
cache_alignment : 64
address sizes   : 43 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]

I just can't seem to figure out why 4.3 does not like these EPYC systems.
We have Skylake and Sandybridge clusters that are full on 4.3 now with the
SSBD variant CPUs.

I'm at a loss as to what to try next. The only thing I can think of is to
reinstall the host OS or try ovirt-node, but I would like to avoid that if
I can.

Thank you for all the help so far.

Regards,

Ryan

On Fri, Feb 8, 2019 at 9:07 AM Ryan Bullock  wrote:

> This procedure worked for our HE, which is on Skylake.
>
> I think I have a process that should work for moving our EPYC clusters to
> 4.3. If it works this weekend I will post it for others.
>
> Ryan
>
> On Thu, Feb 7, 2019 at 12:06 PM Simone Tiraboschi 
> wrote:
>
>>
>>
>> On Thu, Feb 7, 2019 at 7:15 PM Juhani Rautiainen <
>> juhani.rautiai...@gmail.com> wrote:
>>
>>>
>>>
>>> On Thu, Feb 7, 2019 at 6:52 PM Simone Tiraboschi 
>>> wrote:
>>>
>>>>
>>>>
>>>> For an hosted-engine cluster we have a manual workaround procedure
>>>> documented here:
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1672859#c1
>>>>
>>>>
>>> I managed to upgrade my Epyc cluster with those steps. I made a new
>>> cluster with the Epyc CPU type and the cluster already at the 4.3 level.
>>> Starting the engine in the new cluster complained about not finding a vm
>>> with that uuid, but it still started the engine fine. When all nodes were
>>> in the new cluster I still couldn't upgrade the old cluster because the
>>> engine was complaining that a couple of VM's couldn't be upgraded
>>> (something to do with custom level). I moved them to the new cluster too.
>>> I just had to change their networks to management for the move. After that
>>> I could upgrade the old cluster to Epyc and the 4.3 level. Then I just
>>> moved the VM's and nodes back (same steps but backwards). After that you
>>> can remove the extra cluster and raise the datacenter to the 4.3 level.
>>>
>>> -Juhani
>>>
>>
>>
>> Thanks for the report!
>> We definitely have to figure out a better upgrade flow when a cluster
>> CPU change is required/advised.
>>
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/Z4NOJFYIOOBGAOP3WXV5WHKRQS22TXH5/


[ovirt-users] Re: AMD EPYC 4.3 upgrade 'CPU type is not supported in this cluster compatibility version or is not supported at all'

2019-02-08 Thread Ryan Bullock
This procedure worked for our HE, which is on Skylake.

I think I have a process that should work for moving our EPYC clusters to
4.3. If it works this weekend I will post it for others.

Ryan

On Thu, Feb 7, 2019 at 12:06 PM Simone Tiraboschi 
wrote:

>
>
> On Thu, Feb 7, 2019 at 7:15 PM Juhani Rautiainen <
> juhani.rautiai...@gmail.com> wrote:
>
>>
>>
>> On Thu, Feb 7, 2019 at 6:52 PM Simone Tiraboschi 
>> wrote:
>>
>>>
>>>
>>> For an hosted-engine cluster we have a manual workaround procedure
>>> documented here:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1672859#c1
>>>
>>>
>> I managed to upgrade my Epyc cluster with those steps. I made a new cluster
>> with the Epyc CPU type and the cluster already at the 4.3 level. Starting
>> the engine in the new cluster complained about not finding a vm with that
>> uuid, but it still started the engine fine. When all nodes were in the new
>> cluster I still couldn't upgrade the old cluster because the engine was
>> complaining that a couple of VM's couldn't be upgraded (something to do
>> with custom level). I moved them to the new cluster too. I just had to
>> change their networks to management for the move. After that I could
>> upgrade the old cluster to Epyc and the 4.3 level. Then I just moved the
>> VM's and nodes back (same steps but backwards). After that you can remove
>> the extra cluster and raise the datacenter to the 4.3 level.
>>
>> -Juhani
>>
>
>
> Thanks for the report!
> We definitely have to figure out a better upgrade flow when a cluster
> CPU change is required/advised.
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/E5OSFFKNUM4NA2TVL2FUXJCV5PKD6V4B/


[ovirt-users] Re: AMD EPYC 4.3 upgrade 'CPU type is not supported in this cluster compatibility version or is not supported at all'

2019-02-07 Thread Ryan Bullock
That would explain it.

Would removing the host and then reinstalling it under a new 4.3 cluster
work without having to set the entire old cluster into maintenance to
change the cpu? Then I could just restart VM's into the new cluster as we
transition to minimize downtime.

Thanks for the info!

Ryan

On Thu, Feb 7, 2019 at 7:56 AM Greg Sheremeta  wrote:

> AMD EPYC IBPB is deprecated in 4.3.
> The deprecated CPUs (cpus variable, that entire list) are:
>
> https://gerrit.ovirt.org/#/c/95310/7/frontend/webadmin/modules/webadmin/src/main/java/org/ovirt/engine/ui/webadmin/widget/table/column/ClusterAdditionalStatusColumn.java
>
> So, *-IBRS [IBRS-SSBD is still ok], Epyc IBPB, Conroe, Penryn, and
> Opteron G1-3. If you have those, you need to change it to a supported type
> while it's in 4.2 still.
>
> Greg
>
> On Thu, Feb 7, 2019 at 1:11 AM Ryan Bullock  wrote:
>
>> We just updated our engine to 4.3, but when I tried to update one of our
>> AMD EPYC hosts it could not activate with the error:
>>
>> Host vmc2h2 moved to Non-Operational state as host CPU type is not
>> supported in this cluster compatibility version or is not supported at all.
>>
>> Relevant (I think) parts from the engine log:
>>
>> (EE-ManagedThreadFactory-engineScheduled-Thread-82) [ee51a70] Could not
>> find server cpu for server 'vmc2h2' (745a14c6-9d31-48a4-9566-914647d83f53),
>> flags:
>> 'fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,mmx,fxsr,sse,sse2,ht,syscall,nx,mmxext,fxsr_opt,pdpe1gb,rdtscp,lm,constant_tsc,art,rep_good,nopl,nonstop_tsc,extd_apicid,amd_dcm,aperfmperf,eagerfpu,pni,pclmulqdq,monitor,ssse3,fma,cx16,sse4_1,sse4_2,movbe,popcnt,aes,xsave,avx,f16c,rdrand,lahf_lm,cmp_legacy,svm,extapic,cr8_legacy,abm,sse4a,misalignsse,3dnowprefetch,osvw,skinit,wdt,tce,topoext,perfctr_core,perfctr_nb,bpext,perfctr_l2,cpb,hw_pstate,sme,retpoline_amd,ssbd,ibpb,vmmcall,fsgsbase,bmi1,avx2,smep,bmi2,rdseed,adx,smap,clflushopt,sha_ni,xsaveopt,xsavec,xgetbv1,clzero,irperf,xsaveerptr,arat,npt,lbrv,svm_lock,nrip_save,tsc_scale,vmcb_clean,flushbyasid,decodeassists,pausefilter,pfthreshold,avic,v_vmsave_vmload,vgif,overflow_recov,succor,smca'
>> 2019-02-06 17:23:58,527-08 INFO
>> [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
>> (EE-ManagedThreadFactory-engineScheduled-Thread-82) [7f6d4f0d] START,
>> SetVdsStatusVDSCommand(HostName = vmc2h2,
>> SetVdsStatusVDSCommandParameters:{hostId='745a14c6-9d31-48a4-9566-914647d83f53',
>> status='NonOperational',
>> nonOperationalReason='CPU_TYPE_INCOMPATIBLE_WITH_CLUSTER'
>>
>>
>> From virsh -r capabilities:
>>
>> 
>>   x86_64
>>   EPYC-IBPB
>>   AMD
>>   
>>   
>>   
>>   
>>   
>>   
>>   
>>   
>>   
>>   
>>   
>>   
>>   
>>   
>>   
>>   
>>   
>> 
>>
>> I also tried creating a new 4.3 cluster, set to the AMD EPYC IBPB SSBD
>> and moving the host into it, but it failed to move it into that cluster
>> with a similar error about an unsupported CPU (for some reason it also made
>> me clear the additional kernel options as well, we use 1gb hugepages). I
>> have not yet tried removing the host entirely and adding it as part of
>> creating the new cluster.
>>
>> We have been/are using a database change to update the 4.2 cluster level
>> to include EPYC support with the following entries (can post the whole
>> query if needed):
>> 7:AMD EPYC:svm,nx,model_EPYC:EPYC:x86_64; 8:AMD EPYC
>> IBPB:svm,nx,ibpb,model_EPYC:EPYC-IBPB:x86_64
>>
>> We have been running 4.2 with this for awhile. We did apply the same
>> changes after the 4.3 update, but only for the 4.2 cluster level. We only
>> used the AMD EPYC IBPB model.
>>
>> Reverting the host back to 4.2 allows it to activate and run normally.
>>
>> Anyone have any ideas as to why it can't seem to find the cpu type?
>>
>> Thanks,
>>
>> Ryan Bullock
>> ___
>> Users mailing list -- users@ovirt.org
>> To unsubscribe send an email to users-le...@ovirt.org
>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>> oVirt Code of Conduct:
>> https://www.ovirt.org/community/about/community-guidelines/
>> List Archives:
>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/4Y4X7UGDEYSB5JK45TLDERNM7IMTHIYY/
>>
>
>
> --
>
> GREG SHEREMETA
>
> SENIOR SOFTWARE ENGINEER - TEAM LEAD - RHV UX
>
> Red Hat NA
>
> <https://www.redhat.com/>
>
> gsher...@redhat.comIRC: gshereme
> <https://red.ht/sig>
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/FMDXG35JGXXEA3IKDWKFRS5OICZIXQYL/


[ovirt-users] AMD EPYC 4.3 upgrade 'CPU type is not supported in this cluster compatibility version or is not supported at all'

2019-02-06 Thread Ryan Bullock
We just updated our engine to 4.3, but when I tried to update one of our
AMD EPYC hosts it could not activate with the error:

Host vmc2h2 moved to Non-Operational state as host CPU type is not
supported in this cluster compatibility version or is not supported at all.

Relevant (I think) parts from the engine log:

(EE-ManagedThreadFactory-engineScheduled-Thread-82) [ee51a70] Could not
find server cpu for server 'vmc2h2' (745a14c6-9d31-48a4-9566-914647d83f53),
flags:
'fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,mmx,fxsr,sse,sse2,ht,syscall,nx,mmxext,fxsr_opt,pdpe1gb,rdtscp,lm,constant_tsc,art,rep_good,nopl,nonstop_tsc,extd_apicid,amd_dcm,aperfmperf,eagerfpu,pni,pclmulqdq,monitor,ssse3,fma,cx16,sse4_1,sse4_2,movbe,popcnt,aes,xsave,avx,f16c,rdrand,lahf_lm,cmp_legacy,svm,extapic,cr8_legacy,abm,sse4a,misalignsse,3dnowprefetch,osvw,skinit,wdt,tce,topoext,perfctr_core,perfctr_nb,bpext,perfctr_l2,cpb,hw_pstate,sme,retpoline_amd,ssbd,ibpb,vmmcall,fsgsbase,bmi1,avx2,smep,bmi2,rdseed,adx,smap,clflushopt,sha_ni,xsaveopt,xsavec,xgetbv1,clzero,irperf,xsaveerptr,arat,npt,lbrv,svm_lock,nrip_save,tsc_scale,vmcb_clean,flushbyasid,decodeassists,pausefilter,pfthreshold,avic,v_vmsave_vmload,vgif,overflow_recov,succor,smca'
2019-02-06 17:23:58,527-08 INFO
[org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
(EE-ManagedThreadFactory-engineScheduled-Thread-82) [7f6d4f0d] START,
SetVdsStatusVDSCommand(HostName = vmc2h2,
SetVdsStatusVDSCommandParameters:{hostId='745a14c6-9d31-48a4-9566-914647d83f53',
status='NonOperational',
nonOperationalReason='CPU_TYPE_INCOMPATIBLE_WITH_CLUSTER'


From virsh -r capabilities:


  x86_64
  EPYC-IBPB
  AMD
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  


I also tried creating a new 4.3 cluster, set to the AMD EPYC IBPB SSBD and
moving the host into it, but it failed to move it into that cluster with a
similar error about an unsupported CPU (for some reason it also made me
clear the additional kernel options as well, we use 1gb hugepages). I have
not yet tried removing the host entirely and adding it as part of creating
the new cluster.
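
(The additional kernel options in question are the 1 GB hugepage ones, i.e.
something along the lines of default_hugepagesz=1G hugepagesz=1G
hugepages=<count> on the host kernel command line.)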

We have been/are using a database change to update the 4.2 cluster level to
include EPYC support with the following entries (can post the whole query
if needed):
7:AMD EPYC:svm,nx,model_EPYC:EPYC:x86_64; 8:AMD EPYC
IBPB:svm,nx,ibpb,model_EPYC:EPYC-IBPB:x86_64
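
(For reference, the database change is to the ServerCPUList value for version
4.2 in the vdc_options table; to see the current value before touching
anything, something like the following should work, with column names from
memory, so verify first:

    /usr/share/ovirt-engine/dbscripts/engine-psql.sh -c \
      "SELECT option_value FROM vdc_options WHERE option_name = 'ServerCPUList' AND version = '4.2';"
)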

We have been running 4.2 with this for a while. We did apply the same
changes after the 4.3 update, but only for the 4.2 cluster level. We only
used the AMD EPYC IBPB model.

Reverting the host back to 4.2 allows it to activate and run normally.

Anyone have any ideas as to why it can't seem to find the cpu type?

Thanks,

Ryan Bullock
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/4Y4X7UGDEYSB5JK45TLDERNM7IMTHIYY/


[ovirt-users] Re: Direct LUNs and VM Pauses

2018-07-31 Thread Ryan Bullock
Sorry for the slow reply, was out sick end of last week.

Thank you Nir! You have been very helpful in getting a grasp on this issue.

I have gone ahead and opened an RFE for resuming on a Direct LUN:

https://bugzilla.redhat.com/show_bug.cgi?id=1610459

Thanks again!

Regards,

Ryan

On Tue, Jul 24, 2018 at 12:30 PM, Nir Soffer  wrote:

> On Tue, Jul 24, 2018 at 8:30 PM Ryan Bullock  wrote:
> ...
>
>> Vdsm does monitor multipath events for all LUNs, but they are used only
>>> for reporting purposes, see:
>>> https://ovirt.org/develop/release-management/features/storage/multipath-events/
>>>
>>> We could use the events for resuming vms using the multipath devices that
>>> became available. This functionality will be even more important in the
>>> next version
>>> since we plan to move to LUN per disk model.
>>>
>>>
>>
>> I will look at doing this. At the very least I feel that
>> differences/limitations between storage back-ends/methods should be
>> documented. Just so users don't run into any surprises.
>>
>
> You can file a bug for documenting this issue.
>
> ...
>
>> My other question is, how can I keep my VMs with Direct LUNs from pausing
>>>> during short outages? Can I put configurations in my multipath.conf for
>>>> just the wwids of my Direct LUNs to increase the ‘no_path_retry’ to prevent
>>>> the VMs from pausing in the first place? I know in general you don’t want
>>>> to increase the ‘no_path_retry’ because it can cause timeout issues with
>>>> VDSM and SPM operations (LVM changes, etc). But in the case of a Direct LUN
>>>> would it cause any problems?
>>>>
>>>
>>> You can add a drop-in multipath configuration that will change
>>> no_path_retry for specific device, or multiapth.
>>>
>>> Increasing no_path_retry will cause larger delays when vdsm try to
>>> access the LUNs via lvm commands, but the delay should be only on
>>> the first access when a LUN is not available.
>>>
>>>
>> Would that increased delay cause any sort of issues for Ovirt (e.g.
>> thinking a node is offline/unresponsive) if set globally in multipath.conf?
>> Since a Direct LUN doesn't use LVM, would this even be a consideration if
>> the increased delay was limited to the Direct LUN only?
>>
>
> Vdsm scans all LUNs to discover oVirt volumes, so it will be affected by
> multipath configuration applied only for direct LUNs.
>
> Increasing no_path_retry for any LUN will increase the chance of delaying some
> vdsm flows accessing LUNs (e.g. updating the lvm cache, scsi rescan, listing
> devices). But the delay happens once, when the multipath device loses all
> paths. The benefit is a smaller chance that a VM will pause or restart
> because of a short outage.
>
> Nir
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/VCLQX46CLZOXWR3A7NXUOAFVZICUVCDH/


[ovirt-users] Re: Direct LUNs and VM Pauses

2018-07-24 Thread Ryan Bullock
On Tue, Jul 24, 2018 at 5:51 AM, Nir Soffer  wrote:

> On Mon, Jul 23, 2018 at 9:35 PM Ryan Bullock  wrote:
>
>> Hello All,
>>
>> We recently stood up a new Ovirt install backed by an ISCSI SAN and it
>> has been working great, but there are a few quirks I am trying to iron out.
>>
>> We have run into an issue where when we fail-over our SAN (for
>> maintenance, or otherwise) any VM with a Direct LUN gets paused and doesn’t
>> resume. VMs without a direct LUN never paused.
>>
>
> I guess the other VMs did get paused, but they were resumed
> automatically by the system, so from your point of view, they did
> not "pause".
>
> You can check vdsm log if the other vms did pause and resume. I'm not
> sure engine UI reports all pause and resume events.
>
>

Ah, OK. That would make sense. I had checked the events via the UI and it
didn't show any pauses, but I had not checked the actual VDSM logs on the
hosts. Unfortunately my logs for the period have rolled off. I had
noticed this behaviour during our first firmware upgrade on our SAN about a
month ago. Since VM leases allowed us to maintain HA, I just put it in my
list of things to follow up on. Going forward I will make sure to double
check the VDSM logs to see what is happening in the background.
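
(Most likely just something along the lines of the following on each host,
scoped to the time of the failover:

    grep -iE 'pause|resume' /var/log/vdsm/vdsm.log
)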

> Digging through posts on this list and reading some bug reports, it seems
>> like this a known quirk with how Ovirt handles Direct LUNs (it doesn't
>> monitor the LUNs and so it wont resume the VM).
>>
>
> Right.
>
> Can you file a bug for supporting this?
>
> Vdsm does monitor multipath events for all LUNs, but they are used only
> for reporting purposes, see:
> https://ovirt.org/develop/release-management/features/storage/multipath-events/
>
> We could use the events for resuming vms using the multipath devices that
> became available. This functionality will be even more important in the
> next version
> since we plan to move to LUN per disk model.
>
>

I will look at doing this. At the very least I feel that
differences/limitations between storage back-ends/methods should be
documented. Just so users don't run into any surprises.

> To get the VMs to automatically restart I have attached VM leases to them
>> and that seems to work fine, not as nice as a pause and resume, but it
>> minimizes downtime.
>>
>
> Cool!
>
>
>> What I’m trying to understand is why the VMs with Direct LUNs paused, and
>> ones without didn’t. My only speculation is that since the Non-Direct is
>> using LVM on top of ISCSI, that LVM is adding its own layer of timeouts
>> that cause it to mask the outage?
>>
>
> I don't know about additional retry mechanism in the data-path for LVM
> based disks. I think we use the same multipath failover behavior.
>
>
>> My other question is, how can I keep my VMs with Direct LUNs from pausing
>> during short outages? Can I put configurations in my multipath.conf for
>> just the wwids of my Direct LUNs to increase the ‘no_path_retry’ to prevent
>> the VMs from pausing in the first place? I know in general you don’t want
>> to increase the ‘no_path_retry’ because it can cause timeout issues with
>> VDSM and SPM operations (LVM changes, etc). But in the case of a Direct LUN
>> would it cause any problems?
>>
>
> You can add a drop-in multipath configuration that will change
> no_path_retry for specific device, or multiapth.
>
> Increasing no_path_retry will cause larger delays when vdsm try to
> access the LUNs via lvm commands, but the delay should be only on
> the first access when a LUN is not available.
>
>
Would that increased delay cause any sort of issues for Ovirt (e.g.
thinking a node is offline/unresponsive) if set globally in multipath.conf?
Since a Direct LUN doesn't use LVM, would this even be a consideration if
the increased delay was limited to the Direct LUN only?

Here is an example drop-in file:
>
> # cat /etc/multipath/conf.d/my.conf
> devices {
> device {
> vendor "my-vendor"
> product "my-product"
> # based on 5 seconds monitor interval, queue I/O for
> # 60 seconds when no path is available, before failing.
> no_path_retry 12
> }
> }
>
> multipaths {
> multipath {
> wwid "my-wwid"
> no_path_retry 12
> }
> }
>
>
Yep, this was my plan.

See "man multipath.conf" for more info.
>
> Nir
>

Thanks,

Ryan
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/NXZHSYHIDVO4W2CCVU6G7SBFLELC7APV/


[ovirt-users] Direct LUNs and VM Pauses

2018-07-23 Thread Ryan Bullock
Hello All,

We recently stood up a new Ovirt install backed by an ISCSI SAN and it has
been working great, but there are a few quirks I am trying to iron out.

We have run into an issue where when we fail-over our SAN (for maintenance,
or otherwise) any VM with a Direct LUN gets paused and doesn’t resume. VMs
without a direct LUN never paused. Digging through posts on this list and
reading some bug reports, it seems like this a known quirk with how Ovirt
handles Direct LUNs (it doesn't monitor the LUNs and so it wont resume the
VM). To get the VMs to automatically restart I have attached VM leases to
them and that seems to work fine, not as nice as a pause and resume, but it
minimizes downtime.

What I’m trying to understand is why the VMs with Direct LUNs paused, and
ones without didn’t. My only speculation is that since the Non-Direct is
using LVM on top of ISCSI, that LVM is adding its own layer of timeouts
that cause it to mask the outage?

My other question is, how can I keep my VMs with Direct LUNs from pausing
during short outages? Can I put configurations in my multipath.conf for
just the wwids of my Direct LUNs to increase the ‘no_path_retry’ to prevent
the VMs from pausing in the first place? I know in general you don’t want
to increase the ‘no_path_retry’ because it can cause timeout issues with
VDSM and SPM operations (LVM changes, etc). But in the case of a Direct LUN
would it cause any problems?


Thank you,

Ryan
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/NZKW5T4TJX2VP6M3DEEKNDQTUAZMRRSX/