On Mon, Feb 16, 2015, at 07:50 PM, Mike Bayer wrote: > hi all - > > I’ve been researching this cinder issue > https://bugs.launchpad.net/cinder/+bug/1417018 and I’ve found something > concerning. > > Basically there’s a race condition here which is occurring because a > single MySQL connection is shared between two green threads. This occurs > while Cinder is doing its startup work in cinder/volume/manager.py -> > init_host(), and at the same time a request comes in from a separate > service call that seems to be part of the startup. > > The log output at http://paste.openstack.org/show/175571/ shows this > happening. I can break it down: > > > 1. A big query for volumes occurs as part of > "self.db.volume_get_all_by_host(ctxt, self.host)”. To reproduce the > error more regularly I’ve placed it into a loop of 100 calls. We can see > that thread id is 68089648 MySQL connection is 3a9c5a0. > > 2015-02-16 19:32:47.236 INFO sqlalchemy.engine.base.Engine > [req-ed3c0248-6ee5-4063-80b5-77c5c9a23c81 None None] tid: 68089648, > connection: <_mysql.connection open to '127.0.0.1' at 3a9c5a0>, stmt > SELECT volumes.created_at AS > 2015-02-16 19:32:47.237 INFO sqlalchemy.engine.base.Engine > [req-ed3c0248-6ee5-4063-80b5-77c5c9a23c81 None None] > ('localhost.localdomain@ceph', 'localhost.localdomain@ceph#%’) > > 2. A “ping” query comes in related to a different API call - different > thread ID, *same* connection > > 2015-02-16 19:32:47.276 INFO sqlalchemy.engine.base.Engine > [req-600ef638-cb45-4a34-a3ab-6d22d83cfd00 None None] tid: 68081456, > connection: <_mysql.connection open to '127.0.0.1' at 3a9c5a0>, stmt > SELECT 1 > 2015-02-16 19:32:47.279 INFO sqlalchemy.engine.base.Engine > [req-600ef638-cb45-4a34-a3ab-6d22d83cfd00 None None] () > > 3. The first statement is still in the middle of invocation, so we get a > failure, either a mismatch of the statement to the cursor, or a MySQL > lost connection (stack trace begins) > > Traceback (most recent call last): > File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 457, > in fire_timers > … more stack trace > > 4. another thread id, *same* connection. > > 2015-02-16 19:32:47.290 INFO sqlalchemy.engine.base.Engine > [req-f980de7c-151d-4fed-b45e-d12b133859a6 None None] tid: 61238160, > connection: <_mysql.connection open to '127.0.0.1' at 3a9c5a0>, stmt > SELECT 1 > > rows = [process(row, None) for row in fetch] > File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/loading.py", > line 363, in _instance > tuple([row[column] for column in pk_cols]) > File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/result.py", > line 331, in _key_fallback > expression._string_or_unprintable(key)) > NoSuchColumnError: "Could not locate column in row for column > 'volumes.id'" > 2015-02-16 19:32:47.293 ERROR cinder.openstack.common.threadgroup [-] > "Could not locate column in row for column 'volumes.id’" > > > So I’m not really sure what’s going on here. Cinder seems to have some > openstack greenlet code of its own in > cinder/openstack/common/threadgroup.py, I don’t know the purpose of this > code. SQLAlchemy’s connection pool has been tested a lot with eventlet > / gevent and this has never been reported before. This is a very > severe race and I’d think that this would be happening all over the > place.
The threadgroup module is from the Oslo incubator, so if you need to review the git history you'll want to look at that copy. Doug > > Current status is that I’m continuing to try to determine why this is > happening here, and seemingly nowhere else. > > > > > __________________________________________________________________________ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: > openstack-dev-requ...@lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev __________________________________________________________________________ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev