[ovirt-users] Re: Upgraded to oVirt 4.4.9, still have vdsmd memory leak

2021-11-12 Thread David Malcolm
On Fri, 2021-11-12 at 09:54 +0100, Sandro Bonazzola wrote:
> On Fri, Nov 12, 2021 at 09:50, Sandro Bonazzola <sbona...@redhat.com> wrote:
> 
> > 
> > 
> > On Fri, Nov 12, 2021 at 09:47, Sandro Bonazzola <sbona...@redhat.com> wrote:
> > 
> > > 
> > > 
> > > On Wed, Nov 10, 2021 at 15:45, Chris Adams wrote:
> > > 
> > > > I have seen vdsmd leak memory for years (I've been running oVirt
> > > > since version 3.5), but never been able to nail it down.  I've
> > > > upgraded a cluster to oVirt 4.4.9 (reloading the hosts with CentOS
> > > > 8-stream), and I still see it happen.  One host in the cluster,
> > > > which has been up 8 days, has vdsmd with 4.3 GB resident memory.
> > > > On a couple of other hosts, it's around half a gigabyte.
> > > >
> > > > In the past, it seemed more likely to happen on the hosted engine
> > > > hosts and/or the SPM host... but the host with the 4.3 GB vdsmd is
> > > > not either of those.
> > > >
> > > > I'm not sure what I do that would make my setup "special" compared
> > > > to others; I loaded a pretty minimal install of CentOS 8-stream,
> > > > with the only extra thing being I add the core parts of the Dell
> > > > PowerEdge OpenManage tools (so I can get remote SNMP hardware
> > > > monitoring).
> > > >
> > > > When I run "pmap $(pidof -x vdsmd)", the bulk of the RAM use is a
> > > > single anonymous block (which I'm guessing is just the python
> > > > general memory allocator).
> > > >
> > > > I thought maybe the switch to CentOS 8 and python 3 might clear
> > > > something up, but obviously not.  Any ideas?
> > > > 
> > > 
> > > I guess we still have the reproducibility issue (
> > > https://lists.ovirt.org/archives/list/de...@ovirt.org/thread/KO5SEPAZMLBWSBS6OJZ73YVPLHIAFOLV/
> > > ).
> > > But maybe in the meanwhile there's a new way to track things
> > > down. +Marcin
> > > Sobczyk  ?
> > > 
> > > 
> > > 
> > Perhaps https://docs.python.org/3.6/library/tracemalloc.html ?
> > 
> 
> +David Malcolm  I saw your slides on python memory leak debugging,
> maybe you can give some suggestions here.

I haven't worked on Python itself in > 8 years, so my knowledge is out-
of-date here.

Adding in Victor Stinner, who has worked on the CPython memory
allocators more recently, and, in particular, implemented the
tracemalloc library linked to above.

Dave
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/GAASNRR7WWCVJTKFKZIJM6MYLAI6VPGU/


[ovirt-users] Re: how to forcibly remove dead node from Gluster 3 replica 1 arbiter setup?

2021-11-12 Thread Strahil Nikolov via Users
As I mentioned on Slack, the safest approach is to:
1. Reduce the volume to replica 1 (there is no need to keep the arbiter until 
resynchronization):
gluster volume remove-brick VOLUME replica 1 \
  beclovkvma02.bec.net:/data/brick2/brick2 \
  beclovkvma03.bec.net:/data/brick1/brick2 \
  beclovkvma02.bec.net:/data/brick3/brick3 \
  beclovkvma03.bec.net:/data/brick1/brick3 \
  beclovkvma02.bec.net:/data/brick4/brick4 \
  beclovkvma03.bec.net:/data/brick1/brick4 \
  beclovkvma02.bec.net:/data/brick5/brick5 \
  beclovkvma03.bec.net:/data/brick1/brick5 \
  beclovkvma02.bec.net:/data/brick6/brick6 \
  beclovkvma03.bec.net:/data/brick1/brick6 \
  beclovkvma02.bec.net:/data/brick7/brick7 \
  beclovkvma03.bec.net:/data/brick1/brick7 \
  beclovkvma02.bec.net:/data/brick8/brick8 \
  beclovkvma03.bec.net:/data/brick1/brick8 force
Note: I might have missed a brick, so verify that you are selecting all bricks 
for the arbiter and beclovkvma02
2. Remove the broken node: gluster peer detach beclovkvma02.bec.net force
3. Add the freshly installed host: gluster peer probe beclovkvma04.bec.net
4. Unmount all bricks on the arbiter, then reformat them: mkfs.xfs -f -i size=512 
/path/to/each/arbiter/brick/LV
5. Check if fstab is using UUIDs and, if yes, update it with the /dev/VG/LV paths 
or with the new UUIDs (blkid should help)
6. Mount all bricks on the arbiter - no errors should be reported: mount -a
7. Unmount, reformat and remount all bricks on beclovkvma04.bec.net. Don't 
forget to check the fstab; 'mount -a' is your first friend
8. Re-add the bricks to the volume. Order is important (first 04, then arbiter, 
04, arbiter...):
gluster volume add-brick VOLUME replica 3 arbiter 1 \
  beclovkvma04.bec.net:/data/brick2/brick2 \
  beclovkvma03.bec.net:/data/brick1/brick2 \
  beclovkvma04.bec.net:/data/brick3/brick3 \
  beclovkvma03.bec.net:/data/brick1/brick3 \
  beclovkvma04.bec.net:/data/brick4/brick4 \
  beclovkvma03.bec.net:/data/brick1/brick4 \
  beclovkvma04.bec.net:/data/brick5/brick5 \
  beclovkvma03.bec.net:/data/brick1/brick5 \
  beclovkvma04.bec.net:/data/brick6/brick6 \
  beclovkvma03.bec.net:/data/brick1/brick6 \
  beclovkvma04.bec.net:/data/brick7/brick7 \
  beclovkvma03.bec.net:/data/brick1/brick7 \
  beclovkvma04.bec.net:/data/brick8/brick8 \
  beclovkvma03.bec.net:/data/brick1/brick8
9. Trigger the full heal: gluster volume heal VOLUME full
10. If your bricks are highly performant and you need to speed up the healing, 
you can increase these volume settings (see the sketch below):
- cluster.shd-max-threads
- cluster.shd-wait-qlength
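For illustration only, a rough sketch of that tuning plus a way to watch the heal 
progress (VOLUME and the values are placeholders, not recommendations):

gluster volume heal VOLUME info summary                 # watch healing progress
gluster volume set VOLUME cluster.shd-max-threads 4
gluster volume set VOLUME cluster.shd-wait-qlength 2048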

Best Regards,
Strahil Nikolov

  On Fri, Nov 12, 2021 at 8:21, dhanaraj.ramesh--- via Users wrote:
I wanted to remove beclovkvma02.bec.net as the node was dead. Now I have 
reinstalled this node and am trying to add it as a 4th node - 
beclovkvma04.bec.net - however, since the system UUID is the same, I'm not able 
to add the node in oVirt Gluster.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/XXY6FD7G6PUYKEBQ6ZORVYZI4L6NSRFW/
  
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/QJONICSZUUAXJJAMBJQYMJRCM73S4VKG/


[ovirt-users] Re: Upgraded to oVirt 4.4.9, still have vdsmd memory leak

2021-11-12 Thread Sandro Bonazzola
On Fri, Nov 12, 2021 at 09:50, Sandro Bonazzola <sbona...@redhat.com> wrote:

>
>
> On Fri, Nov 12, 2021 at 09:47, Sandro Bonazzola <sbona...@redhat.com> wrote:
>
>>
>>
>> On Wed, Nov 10, 2021 at 15:45, Chris Adams wrote:
>>
>>> I have seen vdsmd leak memory for years (I've been running oVirt since
>>> version 3.5), but never been able to nail it down.  I've upgraded a
>>> cluster to oVirt 4.4.9 (reloading the hosts with CentOS 8-stream), and I
>>> still see it happen.  One host in the cluster, which has been up 8 days,
>>> has vdsmd with 4.3 GB resident memory.  On a couple of other hosts, it's
>>> around half a gigabyte.
>>>
>>> In the past, it seemed more likely to happen on the hosted engine hosts
>>> and/or the SPM host... but the host with the 4.3 GB vdsmd is not either
>>> of those.
>>>
>>> I'm not sure what I do that would make my setup "special" compared to
>>> others; I loaded a pretty minimal install of CentOS 8-stream, with the
>>> only extra thing being I add the core parts of the Dell PowerEdge
>>> OpenManage tools (so I can get remote SNMP hardware monitoring).
>>>
>>> When I run "pmap $(pidof -x vdsmd)", the bulk of the RAM use is a single
>>> anonymous block (which I'm guessing is just the python general memory
>>> allocator).
>>>
>>> I thought maybe the switch to CentOS 8 and python 3 might clear
>>> something up, but obviously not.  Any ideas?
>>>
>>
>> I guess we still have the reproducibility issue (
>> https://lists.ovirt.org/archives/list/de...@ovirt.org/thread/KO5SEPAZMLBWSBS6OJZ73YVPLHIAFOLV/
>> ).
>> But maybe in the meanwhile there's a new way to track things down. +Marcin
>> Sobczyk  ?
>>
>>
>>
> Perhaps https://docs.python.org/3.6/library/tracemalloc.html ?
>

+David Malcolm  I saw your slides on python memory
leak debugging, maybe you can give some suggestions here.


>
>
>
>>
>>
>>> --
>>> Chris Adams 
>>> ___
>>> Users mailing list -- users@ovirt.org
>>> To unsubscribe send an email to users-le...@ovirt.org
>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>>> oVirt Code of Conduct:
>>> https://www.ovirt.org/community/about/community-guidelines/
>>> List Archives:
>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/3PTE35WMIVGLV2W47YVQUHCVOI6LGIPM/
>>>
>>
>>
>> --
>>
>> Sandro Bonazzola
>>
>> MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
>>
>> Red Hat EMEA 
>>
>> sbona...@redhat.com
>> 
>>
>> *Red Hat respects your work life balance. Therefore there is no need to
>> answer this email out of your office hours.*
>>
>>
>>
>
> --
>
> Sandro Bonazzola
>
> MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
>
> Red Hat EMEA 
>
> sbona...@redhat.com
> 
>
> *Red Hat respects your work life balance. Therefore there is no need to
> answer this email out of your office hours.*
>
>
>

-- 

Sandro Bonazzola

MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV

Red Hat EMEA 

sbona...@redhat.com


*Red Hat respects your work life balance. Therefore there is no need to
answer this email out of your office hours.*
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/IZIL7DCUJW6IXNKDH32YGPRRQY5Q5HXC/


[ovirt-users] Re: Upgraded to oVirt 4.4.9, still have vdsmd memory leak

2021-11-12 Thread Sandro Bonazzola
On Fri, Nov 12, 2021 at 09:47, Sandro Bonazzola <sbona...@redhat.com> wrote:

>
>
> On Wed, Nov 10, 2021 at 15:45, Chris Adams wrote:
>
>> I have seen vdsmd leak memory for years (I've been running oVirt since
>> version 3.5), but never been able to nail it down.  I've upgraded a
>> cluster to oVirt 4.4.9 (reloading the hosts with CentOS 8-stream), and I
>> still see it happen.  One host in the cluster, which has been up 8 days,
>> has vdsmd with 4.3 GB resident memory.  On a couple of other hosts, it's
>> around half a gigabyte.
>>
>> In the past, it seemed more likely to happen on the hosted engine hosts
>> and/or the SPM host... but the host with the 4.3 GB vdsmd is not either
>> of those.
>>
>> I'm not sure what I do that would make my setup "special" compared to
>> others; I loaded a pretty minimal install of CentOS 8-stream, with the
>> only extra thing being I add the core parts of the Dell PowerEdge
>> OpenManage tools (so I can get remote SNMP hardware monitoring).
>>
>> When I run "pmap $(pidof -x vdsmd)", the bulk of the RAM use is a single
>> anonymous block (which I'm guessing is just the python general memory
>> allocator).
>>
>> I thought maybe the switch to CentOS 8 and python 3 might clear
>> something up, but obviously not.  Any ideas?
>>
>
> I guess we still have the reproducibility issue (
> https://lists.ovirt.org/archives/list/de...@ovirt.org/thread/KO5SEPAZMLBWSBS6OJZ73YVPLHIAFOLV/
> ).
> But maybe in the meanwhile there's a new way to track things down. +Marcin
> Sobczyk  ?
>
>
>
Perhaps https://docs.python.org/3.6/library/tracemalloc.html ?
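Just for reference, a minimal standalone sketch of the snapshot/diff pattern
tracemalloc provides (the allocation below is only a stand-in for a leak;
actually wiring this into vdsm would need its own patch):

python3 -X tracemalloc=25 -c '
import tracemalloc
before = tracemalloc.take_snapshot()            # baseline
leak = [bytes(1024) for _ in range(10000)]      # stand-in for a leaking structure
after = tracemalloc.take_snapshot()             # second snapshot, taken later
for stat in after.compare_to(before, "lineno")[:5]:
    print(stat)                                 # top allocation growth by source line
'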



>
>
>> --
>> Chris Adams 
>> ___
>> Users mailing list -- users@ovirt.org
>> To unsubscribe send an email to users-le...@ovirt.org
>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>> oVirt Code of Conduct:
>> https://www.ovirt.org/community/about/community-guidelines/
>> List Archives:
>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/3PTE35WMIVGLV2W47YVQUHCVOI6LGIPM/
>>
>
>
> --
>
> Sandro Bonazzola
>
> MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
>
> Red Hat EMEA 
>
> sbona...@redhat.com
> 
>
> *Red Hat respects your work life balance. Therefore there is no need to
> answer this email out of your office hours.*
>
>
>

-- 

Sandro Bonazzola

MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV

Red Hat EMEA 

sbona...@redhat.com


*Red Hat respects your work life balance. Therefore there is no need to
answer this email out of your office hours.*
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/PD5T57TRMEUZEX7YMR7A6XNV5NSGCMFB/


[ovirt-users] Re: Upgraded to oVirt 4.4.9, still have vdsmd memory leak

2021-11-12 Thread Sandro Bonazzola
On Wed, Nov 10, 2021 at 15:45, Chris Adams wrote:

> I have seen vdsmd leak memory for years (I've been running oVirt since
> version 3.5), but never been able to nail it down.  I've upgraded a
> cluster to oVirt 4.4.9 (reloading the hosts with CentOS 8-stream), and I
> still see it happen.  One host in the cluster, which has been up 8 days,
> has vdsmd with 4.3 GB resident memory.  On a couple of other hosts, it's
> around half a gigabyte.
>
> In the past, it seemed more likely to happen on the hosted engine hosts
> and/or the SPM host... but the host with the 4.3 GB vdsmd is not either
> of those.
>
> I'm not sure what I do that would make my setup "special" compared to
> others; I loaded a pretty minimal install of CentOS 8-stream, with the
> only extra thing being I add the core parts of the Dell PowerEdge
> OpenManage tools (so I can get remote SNMP hardware monitoring).
>
> When I run "pmap $(pidof -x vdsmd)", the bulk of the RAM use is a single
> anonymous block (which I'm guessing is just the python general memory
> allocator).
>
> I thought maybe the switch to CentOS 8 and python 3 might clear
> something up, but obviously not.  Any ideas?
>

I guess we still have the reproducibility issue (
https://lists.ovirt.org/archives/list/de...@ovirt.org/thread/KO5SEPAZMLBWSBS6OJZ73YVPLHIAFOLV/
).
But maybe in the meanwhile there's a new way to track things down. +Marcin
Sobczyk  ?
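In the meantime, as a rough first data point, something along these lines could
at least confirm which mapping keeps growing over time (just a sketch; in
"pmap -x" output the third column is RSS):

pmap -x $(pidof -x vdsmd) | sort -n -k3 | tail -n 5   # largest mappings by RSS
# re-run after a few hours and compare; the "total kB" summary line sorts last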




> --
> Chris Adams 
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/3PTE35WMIVGLV2W47YVQUHCVOI6LGIPM/
>


-- 

Sandro Bonazzola

MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV

Red Hat EMEA 

sbona...@redhat.com


*Red Hat respects your work life balance. Therefore there is no need to
answer this email out of your office hours.*
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/YWFMABWWTTSBN3HCJFFGTICS7V2WQD3G/


[ovirt-users] Re: upgrade dependency issues

2021-11-12 Thread Sandro Bonazzola
On Thu, Nov 11, 2021 at 23:19, David White <dmwhite...@protonmail.com> wrote:

> Hi team,
> I saw that RHEL 8.5 was released yesterday, so I just put one of my hosts
> that doesn't have local gluster storage into maintenance mode and again
> attempted an update.
>
> The update again failed through the oVirt web UI, and a `yum update` from
> the command line again failed with the same issues as I documented in this
> email thread earlier.
> So I ran `yum update --disablerepo ovirt\*` successfully, then I tried a
> standard yum update again.
>
> I'm still getting the same problems. Is there still an issue with
> upgrading to the latest version of oVirt, even when on RHEL 8.5?
> I'm not going to paste the full stdout here, because there's a lot, but
> here's all of the "Problems".
>
> Should I try a --no-best here?
>
> Problem 1: cannot install the best update candidate for package
> vdsm-4.40.80.6-1.el8.x86_64
> Problem 2: package vdsm-gluster-4.40.90.4-1.el8.x86_64 requires vdsm =
> 4.40.90.4-1.el8, but none of the providers can be installed
> Problem 3: package ovirt-host-dependencies-4.4.9-2.el8.x86_64 requires
> vdsm >= 4.40.90, but none of the providers can be installed
> Problem 4: package ovirt-host-4.4.9-2.el8.x86_64 requires
> ovirt-host-dependencies = 4.4.9-2.el8, but none of the providers can be
> installed
> Problem 5: package ovirt-provider-ovn-driver-1.2.34-1.el8.noarch requires
> vdsm, but none of the providers can be installed
> Problem 6: package ovirt-hosted-engine-ha-2.4.9-1.el8.noarch requires vdsm
> >= 4.40.0, but none of the providers can be installed
> Problem 7: problem with installed package vdsm-4.40.80.6-1.el8.x86_64
> Problem 8: problem with installed package
> vdsm-gluster-4.40.80.6-1.el8.x86_64
> Problem 9: problem with installed package
> ovirt-provider-ovn-driver-1.2.34-1.el8.noarch
> Problem 10: problem with installed package
> ovirt-hosted-engine-ha-2.4.8-1.el8.noarch
> Problem 11: package ovirt-hosted-engine-setup-2.5.4-2.el8.noarch requires
> ovirt-hosted-engine-ha >= 2.4, but none of the providers can be installed
> Problem 12: problem with installed package
> ovirt-host-dependencies-4.4.8-1.el8.x86_64
>

There's something weird here: "package
vdsm-gluster-4.40.90.4-1.el8.x86_64 requires vdsm = 4.40.90.4-1.el8, but
none of the providers can be installed", yet there's no explanation of why
vdsm-4.40.90.4 couldn't be installed.

What worries me is the "problem with installed package" list.

Can you please run again with "--debugsolver" ?
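Something like this should do it (a sketch from memory; the solver dump should
end up in a ./debugdata directory under wherever you run it):

dnf update --debugsolver        # dumps the dependency-solver data into ./debugdata/
dnf update --nobest --assumeno  # preview what a non-best upgrade would pick, without applying it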








>
> I tried rebooting the system, but am getting the same errors, even on the
> RHEL 8.5 kernel.
>
> As an aside, I'm having a really frustrating time with my network
> configurations somehow getting somewhat reset on every system reboot, and
> it always takes me a while to get full network connectivity up and running
> again. I'm running into that issue yet again, but I don't want to derail
> the topic here... I can send another email if necessary.
>
> Sent with ProtonMail  Secure Email.
>
> ‐‐‐ Original Message ‐‐‐
> On Tuesday, October 26th, 2021 at 9:37 AM, Sandro Bonazzola <
> sbona...@redhat.com> wrote:
>
>
>
> On Tue, Oct 26, 2021 at 15:32, Gianluca Cecchi <gianluca.cec...@gmail.com> wrote:
>
>> On Tue, Oct 26, 2021 at 12:12 PM Sandro Bonazzola 
>> wrote:
>>
>>> Thanks for the report, my team is looking into the dependency failures.
>>> oVirt 4.4.9 has been developed on CentOS Stream 8 and some dependencies
>>> are not yet available on RHEL 8.4 and derivatives.
>>>
>>
>> Ok, fair enough that you only test on CentOS Stream 8, but at least I think
>> you should change what you are going to write in the next release notes,
>> putting only what was actually tested.
>>
>> For 4.4.9 there was:
>>
>> "
>>
>> This release is available now on x86_64 architecture for:
>>
>>    - Red Hat Enterprise Linux 8.4
>>    - CentOS Linux (or similar) 8.4
>>    - CentOS Stream 8
>>
>> This release supports Hypervisor Hosts on x86_64 and ppc64le
>> architectures for:
>>
>>    - Red Hat Enterprise Linux 8.4
>>    - CentOS Linux (or similar) 8.4
>>    - oVirt Node NG (based on CentOS Stream 8)
>>    - CentOS Stream 8
>>
>> "
>> So one understands that at least installation/upgrade from 4.4.8 to 4.4.9
>> has been validated when the hosts are on CentOS 8.4 or RHEL 8.4, which are
>> currently the latest released 8.4 levels, while it seems both fail right
>> now, correct?
>>
>> Gianluca
>>
>
> It fails right now, correct.
>
>
>
> --
>
> Sandro Bonazzola
>
> MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
>
> Red Hat EMEA 
>
> sbona...@redhat.com
> 
>
> *Red Hat respects your work life balance. Therefore there is no need to
> answer this email out of your office hours.*
>
>
>
>

-- 

Sandro Bonazzola

MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV

Red Hat EMEA