Public bug reported:
This was observed during tests on an environment with several controllers:
when routers with gateways and subnets are created at a high rate, port
creation for a router gateway sometimes fails with DBDeadlock. In several
cases that I investigated, the deadlock happened when a router port was
created in parallel with DHCP port creation on other servers. In general,
this is simultaneous port creation. Port creation involves locking rows in
the 'ports' and 'binding' tables via the get_locked_port_and_binding() ml2 db
method, which essentially does:
    port = (session.query(models_v2.Port).
            enable_eagerloads(False).
            filter_by(id=port_id).
            with_lockmode('update').
            one())
    binding = (session.query(models.PortBinding).
               enable_eagerloads(False).
               filter_by(port_id=port_id).
               with_lockmode('update').
               one())
Locks are also taken during IP allocation for the port.
I'm not sure exactly how this leads to a deadlock. It probably happens due to
the specifics of Galera working in active-active mode: it throws deadlock
errors when it fails to validate a change with other members of the cluster.
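If the Galera explanation above is right, one common way to cope with such
errors is to retry the whole operation when a deadlock is reported. The
following is only a minimal sketch of that idea, not the neutron code: the
DBDeadlock class here is a local stand-in for the real exception, and
retry_on_deadlock() and create_port() are hypothetical names made up for
illustration.

```python
import functools
import random
import time


class DBDeadlock(Exception):
    """Local stand-in for the DBDeadlock error seen in the tracebacks."""


def retry_on_deadlock(max_retries=5, base_delay=0.01):
    """Hypothetical decorator: re-run the wrapped call when it raises DBDeadlock."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except DBDeadlock:
                    if attempt == max_retries:
                        # Out of retries: let the caller see the error.
                        raise
                    # Randomized exponential backoff, so that competing
                    # writers are less likely to collide again.
                    time.sleep(base_delay * (2 ** attempt) * random.random())
        return wrapper
    return decorator


# Simulated operation: fails with a deadlock twice, then succeeds.
calls = {"count": 0}


@retry_on_deadlock(max_retries=3, base_delay=0.001)
def create_port():
    calls["count"] += 1
    if calls["count"] < 3:
        raise DBDeadlock("simulated deadlock")
    return "port-created"


result = create_port()
```

For this to be safe, the retried function has to cover the whole transaction,
so that each attempt starts from a clean state.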
Examples of tracebacks:
http://paste.openstack.org/show/399624/
http://paste.openstack.org/show/405057/
** Affects: neutron
Importance: Undecided
Assignee: Oleg Bondarev (obondarev)
Status: New
** Tags: db ml2
https://bugs.launchpad.net/bugs/1479738
Title:
DB deadlocks on simultaneous port creation
Status in neutron:
New