On 09/14/2016 11:08 AM, Mike Bayer wrote:


On 09/14/2016 09:15 AM, Sean Dague wrote:
I noticed the following issues happening quite often now in the
opportunistic db tests for nova -
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22sqlalchemy.exc.ResourceClosedError%5C%22



It looks like some race has been introduced where the various db
connections are not fully isolated from each other like they used to be.
The testing magic for this is buried pretty deep in oslo.db.

that error message occurs when a statement that is expected to return
rows, such as a SELECT, fails to provide a cursor.description attribute.
In the MySQL world this is typically a driver-level bug and corresponds
to mis-handled failure modes on the MySQL connection.
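
Roughly, the check behind the error looks something like the sketch below (illustrative only - the class and function names are stand-ins, not SQLAlchemy's actual internals):

    # Illustrative sketch only - not SQLAlchemy's actual code.  It shows the
    # kind of check behind the error: a statement that should return rows
    # yields a cursor whose .description is None.

    class ResourceClosedError(Exception):
        """Stand-in for sqlalchemy.exc.ResourceClosedError."""

    def fetch_rows(cursor, statement):
        cursor.execute(statement)
        if cursor.description is None:
            # The DBAPI driver returned no column metadata for a statement
            # that is expected to return rows - typically a driver-level
            # failure on the MySQL side rather than a SQL error.
            raise ResourceClosedError("This result object does not return rows.")
        return cursor.fetchall()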

By "various DB connections are not fully isolated from each other" are
you suggesting that a single in-Python connection object itself is being
shared among multiple greenlets?   I'm not aware of a change in oslo.db
that would be a relationship to such an effect.

So, I think by "fully isolated from each other" what you really mean is "operations upon a connection are not fully isolated from the subsequent use of that connection", since that's what I see in the logs. A connection is attempting to be used during teardown to drop tables, however it's in this essentially broken state from a PyMySQL perspective, which would indicate something has gone wrong with this (pooled) connection in the preceding test that could not be detected or reverted once the connection was returned to the pool.

From Roman's observation, it looks like a likely source of this corruption is a timeout that is interrupting the PyMySQL connection mid-operation.  In the preceding stack trace, PyMySQL encounters a raise as it attempts to call "self._sock.recv_into(b)", and it seems like some combination of eventlet's response to signals and the fixtures.Timeout() fixture is the cause of this interruption.  As an additional wart, something else is getting involved and turning it into an IndexError; I'm not sure what that part is yet, though I can imagine it might be SQLAlchemy mis-interpreting what it expects to be a PyMySQL exception class, since we normally look inside exception.args[0] to get the MySQL error code.  With a blank exception like fixtures.TimeoutException, .args is the empty tuple.
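
To make that concrete, here is a small standalone illustration of how a bare exception plus the usual args[0] lookup turns into an IndexError (the mysql_error_code() helper is hypothetical; the real lookup would live inside SQLAlchemy's MySQL dialect):

    # Hypothetical illustration - this just mirrors the args[0] pattern,
    # it is not the actual dialect code.

    class TimeoutException(Exception):
        """Stand-in for fixtures.TimeoutException - raised with no args."""

    def mysql_error_code(exc):
        # PyMySQL errors normally look like
        # OperationalError(2013, 'Lost connection to MySQL server ...'),
        # so args[0] is the MySQL error code.
        return exc.args[0]

    try:
        raise TimeoutException()
    except TimeoutException as exc:
        try:
            mysql_error_code(exc)
        except IndexError:
            # A blank exception has args == (), so the args[0] lookup blows
            # up here, masking the original timeout with an IndexError.
            print("args is empty:", exc.args)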

The PyMySQL connection is now in an invalid state and unable to perform a SELECT statement correctly, but the connection is not invalidated and is instead returned to the connection pool in a broken state. So the subsequent teardown, if it uses this same connection (which is likely), fails because the connection has been interrupted in the middle of its work and not given the chance to clean up.

It seems like the use of the fixtures.Timeout() fixture here is not organized to work with a database operation in progress, especially an eventlet-monkeypatched PyMySQL.  Ideally, when something like a timeout due to a signal handler occurs, the entire connection pool should be disposed of (quickest way: engine.dispose()), or at the very least (and much more targeted), the connection that's involved should be invalidated from the pool, e.g. connection.invalidate().
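
A rough sketch of what I mean, assuming the timeout surfaces as fixtures.TimeoutException around the database work (the connection URL is a placeholder; the important calls are connection.invalidate() and engine.dispose()):

    # Rough sketch: never let a connection that was interrupted mid-operation
    # go back into the pool.  The URL is a placeholder and the exception type
    # caught here is an assumption about how the timeout surfaces.

    import fixtures
    import sqlalchemy

    engine = sqlalchemy.create_engine("mysql+pymysql://user:pw@localhost/test")

    conn = engine.connect()
    try:
        conn.execute(sqlalchemy.text("SELECT 1"))
    except fixtures.TimeoutException:
        # Targeted: throw away just this connection; the pool opens a fresh
        # one on the next checkout instead of handing back a broken one.
        conn.invalidate()
        # Heavy-handed but quickest: drop every pooled connection.
        # engine.dispose()
        raise
    finally:
        conn.close()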

The change to the environment here would be that this timeout is happening at all; the reason for that is not yet known.  If oslo.db's version is involved in this error, I would guess that its involvement relates to this timeout condition being triggered, and not anything to do with connection provisioning.

Oslo.db 4.13.3 did hit the scene about the time this showed up.  So I
think we need to strongly consider blocking it and revisiting these
issues post-Newton.

    -Sean


