Hi all,
Just in case someone else runs into this problem, we wanted to post an update,
since we've solved most of it.
Long story short, when neutron's get_security_groups API is hit with an
admin context, it attempts to get all security groups. Since we have so many
security groups, this effectively causes neutron-server to hang. We did three
things to mitigate or fix this:
1. If neutron.db.securitygroups_db.SecurityGroupDbMixin#get_security_groups
is called in a way that we know will cause it to hang, we fail fast and return
an error. This allows "normal" calls to that method to complete without
issue. The obvious downside is that the caller gets an error, but the caller
would previously have gotten a timeout, so this isn't any worse and
neutron-server no longer hangs. We don't intend to upstream this, as it is a
bit of a hack.
2. In
neutron.db.securitygroups_db.SecurityGroupDbMixin#_get_security_groups_on_port
(which is called when creating a port), we ensured that get_security_groups is
called with a proper tenant_id filter. It wasn't before, and because this
method gets called with an admin context from nova-scheduler, it would attempt
to get all security groups, which it doesn't need.
3. We found that this commit (which isn't in a maintenance release yet)
fixed one of the problem areas:
* https://github.com/openstack/nova/commit/19fdaa225abd007a13cd38c742e27c5ee620186c
* https://review.openstack.org/#/c/30048/
* We cherry-picked that commit and are now applying it as a patch via Anvil.
It has already been backported to stable/havana, so once it gets into a
maintenance release, we'll be able to remove the patch.
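The email doesn't say exactly which calls the fail-fast check in #1 rejects. Assuming the trigger is "admin context with no tenant_id filter" (the case that walks every tenant's groups), a minimal sketch of such a guard might look like this. `Context`, `QueryTooBroad`, `is_too_broad`, and `get_security_groups_guarded` are hypothetical names for illustration, not Neutron APIs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Context:
    # Hypothetical stand-in for a request context.
    tenant_id: str
    is_admin: bool = False


class QueryTooBroad(Exception):
    """Hypothetical error returned to the caller instead of running the query."""


def is_too_broad(context: Context, filters: Optional[Dict[str, List[str]]]) -> bool:
    # Assumed trigger: an admin context with no tenant_id filter, i.e. a lookup
    # that would iterate over every tenant's security groups.
    return context.is_admin and not (filters or {}).get("tenant_id")


def get_security_groups_guarded(context: Context,
                                filters: Optional[Dict[str, List[str]]] = None) -> List[dict]:
    if is_too_broad(context, filters):
        # Fail fast: the caller gets an error now instead of a timeout later.
        raise QueryTooBroad("refusing to list security groups across all tenants")
    return []  # placeholder for the normal, tenant-scoped query


admin = Context("admin", is_admin=True)
print(is_too_broad(admin, None))                         # True
print(is_too_broad(admin, {"tenant_id": ["tenant-7"]}))  # False
print(is_too_broad(Context("tenant-7"), None))           # False
```

As noted in #1, this trades a hang for an explicit error; the exact condition to reject would need to match whatever call pattern actually causes the hang in a given deployment.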
We think the issue fixed in #2 still exists as an upstream bug in master. We'll
investigate further and submit a bug and patch if someone else hasn't already
addressed it.
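The scoping fix in #2 can be illustrated in isolation. This is a hypothetical in-memory model, not Neutron's code: `get_security_groups` below only mimics the assumed behavior that admin contexts are unscoped unless a filter narrows them, and `security_groups_for_port` is a stand-in for `_get_security_groups_on_port`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Context:
    tenant_id: str
    is_admin: bool = False


@dataclass
class SecurityGroup:
    id: str
    tenant_id: str
    name: str


def get_security_groups(context: Context, groups: List[SecurityGroup],
                        filters: Optional[Dict[str, List[str]]] = None) -> List[SecurityGroup]:
    """Hypothetical lookup: admin contexts see every tenant unless a filter narrows them."""
    result = groups if context.is_admin else [
        g for g in groups if g.tenant_id == context.tenant_id]
    for key, allowed in (filters or {}).items():
        result = [g for g in result if getattr(g, key) in allowed]
    return result


def security_groups_for_port(context: Context, groups: List[SecurityGroup],
                             port: dict) -> List[str]:
    """Hypothetical analogue of _get_security_groups_on_port with an explicit tenant filter."""
    # Scope to the port owner's tenant even when the caller holds an admin context.
    scoped = get_security_groups(context, groups,
                                 filters={"tenant_id": [port["tenant_id"]]})
    # With no groups requested, fall back to the tenant's "default" group.
    wanted = set(port.get("security_groups") or ["default"])
    return [g.id for g in scoped if g.id in wanted or g.name in wanted]


# 3 tenants x 3 groups; group 0 in each tenant is named "default".
groups = [SecurityGroup(f"sg-{t}-{i}", f"tenant-{t}", "default" if i == 0 else f"sg{i}")
          for t in range(3) for i in range(3)]
admin = Context("admin", is_admin=True)
print(security_groups_for_port(admin, groups, {"tenant_id": "tenant-1"}))  # ['sg-1-0']
```

Without the tenant_id filter, the same admin lookup for "default" would match every tenant's default group (here `sg-0-0`, `sg-1-0`, and `sg-2-0`), which is the shape of the problem described in the quoted report when creating a port with the default security group.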
/Craig J
From: Mike Dorman <[email protected]>
Date: Wednesday, February 5, 2014 5:36 PM
To: "[email protected]" <[email protected]>
Subject: [Openstack] [neutron] neutron-server iterating over all security groups, not just those in the project
We're seeing an issue where neutron-server (Havana) iterates over all security
groups (with an individual SELECT query for each), rather than just the
security groups in the tenant. We can trigger this by creating a port using
the default security group. If we specify no security groups, or a specific
security group, it works fine.
We have ~1000 tenants with 10 security groups each in this environment, so
this ultimately results in ~10k SQL queries, which tanks neutron-server for a
few minutes. Note that all the tenants are in the same network.
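The numbers above (~1000 tenants x 10 groups, one SELECT per group) describe a classic N+1 query pattern. The sketch below reproduces the query count with an in-memory SQLite database. The schema, table names, and the `load_rules` and `CountingDB` helpers are hypothetical simplifications, not Neutron's actual models or queries; the point is only how tenant scoping changes the number of SELECTs.

```python
import sqlite3
from typing import Dict, List, Optional, Tuple


class CountingDB:
    """Thin wrapper that counts every SELECT issued through it."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.queries = 0

    def query(self, sql: str, params: Tuple = ()) -> List[tuple]:
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def build_db(n_tenants: int = 1000, groups_per_tenant: int = 10) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE securitygroups (id TEXT PRIMARY KEY, tenant_id TEXT)")
    conn.execute("CREATE TABLE securitygrouprules "
                 "(id INTEGER PRIMARY KEY, security_group_id TEXT)")
    conn.execute("CREATE INDEX ix_rules_sg ON securitygrouprules (security_group_id)")
    groups = [(f"sg-{t}-{i}", f"tenant-{t}")
              for t in range(n_tenants) for i in range(groups_per_tenant)]
    conn.executemany("INSERT INTO securitygroups VALUES (?, ?)", groups)
    # One rule per group is enough to show the query count.
    conn.executemany("INSERT INTO securitygrouprules (security_group_id) VALUES (?)",
                     [(sg_id,) for sg_id, _ in groups])
    conn.commit()
    return conn


def load_rules(db: CountingDB, tenant_id: Optional[str] = None) -> Dict[str, List[tuple]]:
    """N+1 pattern: one SELECT for the groups, then one SELECT per group."""
    if tenant_id is None:
        groups = db.query("SELECT id FROM securitygroups")
    else:
        groups = db.query("SELECT id FROM securitygroups WHERE tenant_id = ?", (tenant_id,))
    return {sg_id: db.query("SELECT id FROM securitygrouprules WHERE security_group_id = ?",
                            (sg_id,))
            for (sg_id,) in groups}


conn = build_db()
unscoped, scoped = CountingDB(conn), CountingDB(conn)
load_rules(unscoped)
load_rules(scoped, tenant_id="tenant-7")
print(unscoped.queries, scoped.queries)  # 10001 11
```

Scoping the lookup to a single tenant drops the count from 10,001 to 11 without changing the per-group loop itself.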
We're still trying to track down where in the code this is happening, but I've
been able to trace the SQL queries up to the point where the iteration starts:
http://pastebin.com/ZkP5idkJ
You can see where the first two queries get the groups/rules just for the
specific tenant. But then after that, it's the same queries, but for
groups/rules in all tenants.
We will continue looking into it to see what we can find, but any suggestions
or ideas would be appreciated.
Thanks,
Mike
_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack