It looks like this is really the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1667007, so it's not directly an issue in neutron but in openvswitch. I will mark it as Invalid for neutron, but feel free to change that if it turns out to be a different issue.
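For anyone hitting this: one quick way to see whether your containers carry an affected openvswitch build is to query the package versions and compare them against the fixed builds listed in the Bugzilla entry above. A minimal sketch (the container and package names assume CentOS-based kolla binary images and may differ in your deployment):

-----------------------------------------------
# Query the openvswitch packages shipped in the agent container
# (container/package names are assumptions; adjust to your images)
docker exec neutron_openvswitch_agent rpm -q openvswitch python-openvswitch

# Source-type kolla images may install the Python bindings via pip instead
docker exec neutron_openvswitch_agent pip list 2>/dev/null | grep -i '^ovs'
-----------------------------------------------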
** Tags added: ovs

** Bug watch added: Red Hat Bugzilla #1667007
   https://bugzilla.redhat.com/show_bug.cgi?id=1667007

** Changed in: neutron
       Status: New => Invalid

** Changed in: neutron
   Importance: Undecided => High

https://bugs.launchpad.net/bugs/1823818

Title:
  Memory leak in some neutron agents

Status in kolla:
  Confirmed
Status in neutron:
  Invalid

Bug description:
  We have an OpenStack deployment using the Rocky release. We have seen
  a memory leak in some neutron agents twice in our environment since it
  was first deployed this January. Below are some of the commands we ran
  to identify the issue, with their corresponding output.

  This was on one of the compute nodes:
  -----------------------------------------------
  [root@c1s4 ~]# ps aux --sort -rss|head -n1
  USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
  42435 48229 3.5 73.1 98841060 96323252 pts/13 S+ 2018 1881:25 /usr/bin/python2 /usr/bin/neutron-openvswitch-agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ml2_conf.ini
  -----------------------------------------------

  And this was on one of the controller nodes:
  -----------------------------------------------
  [root@r1 neutron]# ps aux --sort -rss|head
  USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
  42435 30940 3.1 48.6 68596320 64144784 pts/37 S+ Jan08 588:26 /usr/bin/python2 /usr/bin/neutron-lbaasv2-agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/lbaas_agent.ini --config-file /etc/neutron/neutron_lbaas.conf
  42435 20902 2.8 26.1 36055484 34408952 pts/35 S+ Jan08 525:12 /usr/bin/python2 /usr/bin/neutron-dhcp-agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/dhcp_agent.ini
  42434 34199 7.1 6.0 39420516 8033480 pts/11 Sl+ 2018 3620:08 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql/ --plugin-dir=/usr/lib64/mysql/plugin --wsrep_provider=/usr/lib64/galera/libgalera_smm.so --wsrep_on=ON --log-error=/var/log/kolla/mariadb/mariadb.log --pid-file=/var/lib/mysql/mariadb.pid --port=3306 --wsrep_start_position=0809f452-0251-11e9-8e60-6ad108d9be7b:0
  42435 8327 2.6 2.2 3546004 3001772 pts/10 S+ Jan17 152:04 /usr/bin/python2 /usr/bin/neutron-l3-agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/neutron_vpnaas.conf --config-file /etc/neutron/l3_agent.ini --config-file /etc/neutron/fwaas_driver.ini
  42435 40171 2.6 2.1 3893480 2840852 pts/19 S+ Jan16 190:54 /usr/bin/python2 /usr/bin/neutron-openvswitch-agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ml2_conf.ini
  root 42430 3.1 0.3 4412216 495492 pts/29 SLl+ Jan16 231:20 /usr/sbin/ovs-vswitchd unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --log-file=/var/log/kolla/openvswitch/ovs-vswitchd.log
  ---------------------------------------------

  When it happened, we saw a lot of 'OSError: [Errno 12] Cannot allocate
  memory' errors in various neutron-* logs, because there was no free
  memory left. However, we do not yet know what triggered the memory
  leak.
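  Before the nodes ran out of memory, sampling agent RSS over time would
  have made the leak visible much earlier. A minimal sketch of such a
  sampling loop (the agent list and output path are illustrative only):

  ---------------------------------------------
  #!/bin/bash
  # Sample the total RSS (in KiB) of each neutron agent once a minute;
  # a slow leak shows up as a steadily growing value over days.
  while true; do
      for agent in neutron-openvswitch-agent neutron-dhcp-agent \
                   neutron-l3-agent neutron-lbaasv2-agent; do
          pids=$(pgrep -d, -f "$agent")
          [ -n "$pids" ] || continue
          rss=$(ps -o rss= -p "$pids" | awk '{s+=$1} END {print s}')
          echo "$(date -Is) $agent $rss" >> /var/log/agent-rss.log
      done
      sleep 60
  done
  ---------------------------------------------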
  Here is our globals.yml:
  ---------------------------------------------
  [root@r1 kolla]# cat globals.yml |grep -v "^#"|tr -s "\n"
  ---
  openstack_release: "rocky"
  kolla_internal_vip_address: "172.21.69.22"
  enable_barbican: "yes"
  enable_ceph: "yes"
  enable_ceph_mds: "yes"
  enable_ceph_rgw: "yes"
  enable_cinder: "yes"
  enable_neutron_lbaas: "yes"
  enable_neutron_fwaas: "yes"
  enable_neutron_agent_ha: "yes"
  enable_ceph_rgw_keystone: "yes"
  ceph_pool_pg_num: 16
  ceph_pool_pgp_num: 16
  ceph_osd_store_type: "xfs"
  glance_backend_ceph: "yes"
  glance_backend_file: "no"
  glance_enable_rolling_upgrade: "no"
  ironic_dnsmasq_dhcp_range:
  tempest_image_id:
  tempest_flavor_ref_id:
  tempest_public_network_id:
  tempest_floating_network_name:
  -----------------------------------------------

  I did some searching and found that this OVS bug appears highly
  related: https://bugzilla.redhat.com/show_bug.cgi?id=1667007

  I am not sure whether the fix has been included in the latest Rocky
  kolla images; one way to check is sketched after this message.

  Best regards,
  Lei

To manage notifications about this bug go to:
https://bugs.launchpad.net/kolla/+bug/1823818/+subscriptions
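One way to check whether a given Rocky image already ships the fixed packages, without deploying it, is to query the package versions directly from the image and compare them against the fixed builds referenced in the Bugzilla entry. A sketch, assuming CentOS binary images (the image name, tag, and package names are assumptions and may differ for your registry and install type):

-----------------------------------------------
# Pull the image and list the openvswitch packages it ships
# (image name and tag are illustrative)
docker run --rm kolla/centos-binary-neutron-openvswitch-agent:rocky \
    rpm -q openvswitch python-openvswitch
-----------------------------------------------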

