I've uploaded the logs here: https://raw.githubusercontent.com/jiajunsu/test_logs/master/nc_5679.log
The ConnectionRecord is 0x70cdb10 and the tid (coroutine id) is 130446032. At 2018-05-14T11:11:39.970 we got the log "jjs rec already in pool checkin", but since I had deleted the `return` from the patch, the problem happened again.

On Monday, May 14, 2018 at 10:48:49 AM UTC+8, Mike Bayer wrote:
>
> as well as the log that shows this:
>
> +        if self in pool._pool.queue:
> +            pool.logger.debug("jjs rec already in pool checkin, skip %r", self)
> +            return
>
> actually happening.
>
>
> On Sun, May 13, 2018 at 10:42 PM, Mike Bayer <mik...@zzzcomputing.com> wrote:
> > On Sun, May 13, 2018 at 10:35 PM, <jiajun...@gmail.com> wrote:
> >> "jjs" is short for my name and "cjr" is my colleague's (JinRong Cai). We
> >> work together and add logs in the same environment, prefixing our names
> >> to get a keyword for log filtering.
> >>
> >> The doubled ConnectionRecords (with None connection) are checked out by
> >> two different coroutines, and record.connection is overwritten by the
> >> second coroutine.
> >> After the record is put back into the QueuePool again, there are two
> >> identical records with the same connection in the pool, which causes the
> >> next two coroutines to conflict and deadlock in psycopg2 as above.
> >
> > OK, I need the detailed step by step that shows how the two coroutines
> > are interacting such that they both are returning the same connection.
> > The nova log above seems to show just one coroutine, I assume, since
> > there is just one nova request ID, if I understand correctly.
> >
> >>
> >> The psycopg2 logs have been pasted at
> >> https://github.com/psycopg/psycopg2/issues/703
> >
> > The above issue says the bug has been found in SQLAlchemy, but there is
> > no bug report yet and I've not yet been shown how the behavior here is
> > possible. I posted a straightforward example above; can you
> > please show me where the two coroutines get mixed up?
> >
> >>
> >> On Friday, May 11, 2018 at 3:05:46 AM UTC+8, Mike Bayer wrote:
> >>>
> >>> I tried to follow your log lines, which is difficult because I don't
> >>> know where those log statements actually are and I don't know what
> >>> acronyms like "jjs" and "cjr" mean. But it does look like it's all in
> >>> one thread (so not async GC) and all in one Nova request (so perhaps
> >>> just one greenlet involved, unless other greenlets are affecting this
> >>> without being displayed in this log snippet). I do see the
> >>> connection record doubled at the end. So here is the illustration of
> >>> those steps in isolation; the connection record is not doubled. Can
> >>> you alter this program to show this condition occurring?
> >>>
> >>> from sqlalchemy import create_engine
> >>>
> >>> e = create_engine("postgresql://scott:tiger@localhost/test",
> >>>                   echo_pool='debug')
> >>> e.connect().close()
> >>>
> >>> assert len(e.pool._pool.queue) == 1
> >>>
> >>> c1 = e.connect()
> >>>
> >>> assert len(e.pool._pool.queue) == 0
> >>>
> >>> c1.engine.pool._invalidate(c1._Connection__connection, None)
> >>> c1.invalidate(e)
> >>>
> >>> # connection record is back
> >>> assert len(e.pool._pool.queue) == 1
> >>>
> >>> # do a reconnect
> >>> c1.connection
> >>>
> >>> # uses the record again
> >>> assert len(e.pool._pool.queue) == 0
> >>>
> >>> # close the Connection
> >>> c1.close()
> >>>
> >>> # record is back
> >>> assert len(e.pool._pool.queue) == 1
> >>>
> >>> # etc.
> >>> c1 = e.connect()
> >>> assert len(e.pool._pool.queue) == 0
> >>>
> >>> c1.close()
> >>> assert len(e.pool._pool.queue) == 1
> >>>
> >>>
> >>> On Thu, May 10, 2018 at 12:25 PM, Mike Bayer <mik...@zzzcomputing.com> wrote:
> >>> > Here's a print statement you might want to try:
> >>> >
> >>> > diff --git a/lib/sqlalchemy/pool.py b/lib/sqlalchemy/pool.py
> >>> > index 89a4cea7c..452db4883 100644
> >>> > --- a/lib/sqlalchemy/pool.py
> >>> > +++ b/lib/sqlalchemy/pool.py
> >>> > @@ -689,6 +689,19 @@ def _finalize_fairy(connection, connection_record,
> >>> >      been garbage collected.
> >>> >
> >>> >      """
> >>> > +    print(
> >>> > +        "FINALIZE FAIRY: GREENLET ID: %s, DB CONN: %s, FAIRY: %s, REF: %s "
> >>> > +        "CONNECTION_REC: %s CONNECTION_REC REF %s" % (
> >>> > +            "put greenlet id here",
> >>> > +            connection,
> >>> > +            fairy,
> >>> > +            ref,
> >>> > +            connection_record,
> >>> > +            connection_record.fairy_ref
> >>> > +            if connection_record is not None else "n/a"
> >>> > +        )
> >>> > +    )
> >>> > +
> >>> >      _refs.discard(connection_record)
> >>> >
> >>> >      if ref is not None:
> >>> >
> >>> >
> >>> > On Thu, May 10, 2018 at 12:21 PM, Mike Bayer <mik...@zzzcomputing.com> wrote:
> >>> >> On Thu, May 10, 2018 at 7:23 AM, <jiajun...@gmail.com> wrote:
> >>> >>> Hi Mike,
> >>> >>>
> >>> >>> I added more logs and finally found the reason, and a method to avoid
> >>> >>> the deadlock between coroutines.
> >>> >>>
> >>> >>> In summary, the deadlock happened because two different coroutines use
> >>> >>> the same ConnectionRecord with the same connection.
> >>> >>> It happens under QueuePool, when the DB connections are reset by the
> >>> >>> DB server (not related to the backend driver).
> >>> >>>
> >>> >>> 1. A coroutine gets a ConnectionRecord into a ConnectionFairy, and a
> >>> >>> weakref to the ConnectionFairy is created, with callback function
> >>> >>> `_finalize_fairy`.
> >>> >>> 2. When the connection gets disconnected, it receives an exception
> >>> >>> (psycopg2.OperationalError) and goes to
> >>> >>> Connection._handle_dbapi_exception.
> >>> >>> 3. In `_handle_dbapi_exception`'s finally block, it calls both
> >>> >>> `ConnectionFairy.invalidate` and `self.invalidate`.
> >>> >>> 4. In `ConnectionFairy.invalidate`, `_ConnectionRecord._checkin` is
> >>> >>> called, which puts the ConnectionRecord back into the QueuePool.
> >>> >>> 5. In `self.invalidate`, `del self._root.__connection` is called,
> >>> >>> which **del**s the ConnectionFairy.
> >>> >>> 6. The **del** operation makes the weakref to the ConnectionFairy
> >>> >>> dead, so the callback `_finalize_fairy` is executed.
> >>> >>> 7. In `_finalize_fairy`, another `_ConnectionRecord._checkin` is
> >>> >>> called, putting the ConnectionRecord into the QueuePool again.
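
Steps 5 and 6 hinge on CPython weakref semantics: when the last strong reference to the fairy is deleted, the weakref dies and its callback runs immediately. A minimal sketch of just that mechanism, with stand-in names rather than SQLAlchemy's actual classes:

    import weakref

    class Fairy:
        """Stand-in for _ConnectionFairy; any weakref-able object works."""

    def finalize(ref):
        # In the pool this role is played by _finalize_fairy, which performs
        # a checkin; here we only show *when* the callback runs.
        print("weakref callback fired for", ref)

    fairy = Fairy()
    ref = weakref.ref(fairy, finalize)

    del fairy  # last strong reference dropped -> callback fires at once (CPython)
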
> >>> >>
> >>> >> Here's what does not make sense. You are claiming that a simple
> >>> >> invalidation will result in the same ConnectionRecord being checked in
> >>> >> twice. This is obviously not the case; what is special in your
> >>> >> situation that makes this happen?
> >>> >>
> >>> >> If I follow the steps you refer to, in step 4, when
> >>> >> ConnectionFairy._checkin is called, that calls _finalize_fairy
> >>> >> directly, which then calls ConnectionRecord.checkin(). Within
> >>> >> ConnectionRecord.checkin(), ConnectionRecord.fairy_ref is set to None.
> >>> >> Then back in ConnectionFairy._checkin(), it also sets
> >>> >> self._connection_record = None.
> >>> >>
> >>> >> If we come back into _finalize_fairy a *second* time then, as a result
> >>> >> of the "del" which occurs in _handle_error(), and the
> >>> >> connection_record is present, the function will exit immediately,
> >>> >> because:
> >>> >>
> >>> >>     if connection_record.fairy_ref is not ref:
> >>> >>         return
> >>> >>
> >>> >> So the situation as given is not possible without concurrent access to
> >>> >> the ConnectionFairy being introduced externally to the pool.
> >>> >>
> >>> >> There is exactly one thing I can think of that is both unique to the
> >>> >> psycopg2 style of asynchronous connection, not the PyMySQL form of it,
> >>> >> that might be important here, and it's that I think psycopg2's
> >>> >> connection.close() method will defer to another greenlet. That means
> >>> >> when we're inside of ConnectionRecord.invalidate(), and we call
> >>> >> self.__close(), that could be deferring to another greenlet. But
> >>> >> this all happens before the connection is returned to the pool, so I
> >>> >> still don't know what is happening.
> >>> >>
> >>> >>>
> >>> >>> The detailed logs are below. To prove my conclusion is right, I tried
> >>> >>> to avoid the scenario with this patch, and it worked.
> >>> >>>
> >>> >>>     def checkin(self):
> >>> >>>         self.fairy_ref = None
> >>> >>>         connection = self.connection
> >>> >>>         pool = self.__pool
> >>> >>>
> >>> >>>         while self.finalize_callback:
> >>> >>>             finalizer = self.finalize_callback.pop()
> >>> >>>             finalizer(connection)
> >>> >>> +       if self in pool._pool.queue:
> >>> >>> +           pool.logger.debug("jjs rec already in pool checkin, skip %r", self)
> >>> >>> +           return
> >>> >>
> >>> >> As mentioned above, it's not possible to reach this point without
> >>> >> concurrent access to the pooled connection, which means this fix could
> >>> >> fail also (with at least threaded-style concurrency, if not greenlet
> >>> >> style). The original source of the greenlet issue must still be
> >>> >> identified.
> >>> >>
> >>> >> I would need a self-contained proof of concept that demonstrates this
> >>> >> issue occurring. The log output is not that helpful, because it at
> >>> >> least would need to write out the identifier of the current greenlet,
> >>> >> since it is very likely that greenlets have something to do with this
> >>> >> issue.
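
A minimal sketch of how that identifier could be captured, assuming eventlet provides the green threads (as it does in nova-conductor); current_greenlet_id is a hypothetical helper, not SQLAlchemy API:

    import eventlet.greenthread

    def current_greenlet_id():
        # eventlet.greenthread.getcurrent() returns the currently running
        # greenlet; its id() distinguishes one coroutine from another in logs
        return id(eventlet.greenthread.getcurrent())

    # e.g. substitute this for "put greenlet id here" in the print
    # statement shown in the diff above
    print("GREENLET ID: %s" % current_greenlet_id())
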
> >>> >>>
> >>> >>>         if pool.dispatch.checkin:
> >>> >>>             pool.dispatch.checkin(connection, self)
> >>> >>>         pool._return_conn(self)
> >>> >>>
> >>> >>> Tracing the ConnectionRecord 0x88fc5d0:
> >>> >>>
> >>> >>> 2018-05-10T14:02:17.927+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Created new connection
> >>> >>> <connection object at 0x81f69b0; closed: 0>
> >>> >>> 2018-05-10T14:02:17.928+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs get conn success
> >>> >>> <connection object at 0x81f69b0; closed: 0>, rec.connection <connection
> >>> >>> object at 0x81f69b0; closed: 0>, pool queue
> >>> >>> deque([<sqlalchemy.pool._ConnectionRecord object at 0x7a04810>,
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x79fdad0>])
> >>> >>> 2018-05-10T14:02:17.929+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs checkout fairy
> >>> >>> <sqlalchemy.pool._ConnectionFairy object at 0x329c0d0> rec
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0> pool queue
> >>> >>> deque([<sqlalchemy.pool._ConnectionRecord object at 0x7a04810>,
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x79fdad0>])
> >>> >>> 2018-05-10T14:02:17.929+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Connection <connection
> >>> >>> object at 0x81f69b0; closed: 0> checked out from pool, status Pool
> >>> >>> size: 10 Connections in pool: 2 Current Overflow: 0 Max Overflow: 40
> >>> >>> Current Checked out connections: 8, queue
> >>> >>> deque([<sqlalchemy.pool._ConnectionRecord object at 0x7a04810>,
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x79fdad0>])
> >>> >>> 2018-05-10T14:02:17.930+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs
> >>> >>> _ConnectionFairy._checkout conn <connection object at 0x81f69b0;
> >>> >>> closed: 0>, rec <sqlalchemy.pool._ConnectionRecord object at
> >>> >>> 0x88fc5d0>, rec.conn <connection object at 0x81f69b0; closed: 0>
> >>> >>> 2018-05-10T14:02:49.716+08:00 localhost nova-conductor ERROR [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:122 error]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] cjr error conn
> >>> >>> <sqlalchemy.engine.base.Connection object at 0x329c150>, err
> >>> >>> connection pointer is NULL
> >>> >>> 2018-05-10T14:02:49.717+08:00 localhost nova-conductor ERROR [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:122 error]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] cjr enter newraise
> >>> >>> 2018-05-10T14:02:49.717+08:00 localhost nova-conductor ERROR [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:122 error]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] cjr enter finally
> >>> >>> >> Connection._handle_dbapi_exception --> finally -->
> >>> >>> >>   **self.engine.pool._invalidate** & **self.invalidate**
> >>> >>> >> self.engine.pool._invalidate --> Pool._invalidate
> >>> >>> >>   --> ConnectionFairy.invalidate --> _ConnectionRecord.invalidate
> >>> >>> >>   & **self._checkin()**
> >>> >>> >> self.invalidate() --> **del self._root.__connection**
> >>> >>> 2018-05-10T14:02:49.718+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs connectionFairy invalidate
> >>> >>> 2018-05-10T14:02:49.718+08:00 localhost nova-conductor INFO [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:109 info]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Invalidate connection
> >>> >>> <connection object at 0x81f69b0; closed: 2> (reason:
> >>> >>> OperationalError: connection pointer is NULL)
> >>> >>> >> _ConnectionRecord.invalidate(e) -->
> >>> >>> >>   self.__pool.dispatch.invalidate(self.connection, self, e) &
> >>> >>> >>   self.__close() & self.connection = None
> >>> >>> >> self.__close() --> self.finalize_callback.clear() &
> >>> >>> >>   self.__pool._close_connection(self.connection)
> >>> >>> 2018-05-10T14:02:49.719+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Closing connection
> >>> >>> <connection object at 0x81f69b0; closed: 2>
> >>> >>> >> Pool._close_connection(connection) -->
> >>> >>> >>   self._dialect.do_close(connection) -->
> >>> >>> >>   _ConnDialect.do_close(dbapi_connection) -->
> >>> >>> >>   dbapi_connection.close() FIN
> >>> >>> 2018-05-10T14:02:49.719+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs
> >>> >>> _ConnectionFairy._checkin conn None rec
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0> rec.conn None
> >>> >>> >> _ConnectionFairy._checkin --> _finalize_fairy(conn, conn_rec, pool,
> >>> >>> >>   None, echo, fairy) & self.conn = None & self.conn_rec = None
> >>> >>> 2018-05-10T14:02:49.720+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs _finalize_fairy args
> >>> >>> conn None, rec.conn None, rec
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>,
> >>> >>> ref None, fairy <sqlalchemy.pool._ConnectionFairy object at 0x329c0d0>
> >>> >>> >> _finalize_fairy --> connection_record.checkin()
> >>> >>> 2018-05-10T14:02:49.720+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs going to
> >>> >>> connection_record.checkin
> >>> >>> 2018-05-10T14:02:49.721+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs before conn checkin
> >>> >>> None, pool queue deque([<sqlalchemy.pool._ConnectionRecord object at
> >>> >>> 0x7524590>, <sqlalchemy.pool._ConnectionRecord object at 0x7a2b990>,
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0xa856b10>,
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x7ebee90>]),
> >>> >>> finalize_callback deque([])
> >>> >>> 2018-05-10T14:02:49.721+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs after conn checkin,
> >>> >>> pool queue deque([<sqlalchemy.pool._ConnectionRecord object at
> >>> >>> 0x7524590>, <sqlalchemy.pool._ConnectionRecord object at 0x7a2b990>,
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0xa856b10>,
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x7ebee90>,
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>])
> >>> >>> 2018-05-10T14:02:49.723+08:00 localhost nova-conductor ERROR [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:122 error]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Exception during reset or
> >>> >>> similar
> >>> >>> 2018-05-10T14:02:49.723+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs going to
> >>> >>> connection_record.invalidate
> >>> >>> 2018-05-10T14:02:49.724+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs going to
> >>> >>> connection_record.checkin
> >>> >>> 2018-05-10T14:02:49.724+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs before conn checkin
> >>> >>> None, pool queue deque([<sqlalchemy.pool._ConnectionRecord object at
> >>> >>> 0x7524590>, <sqlalchemy.pool._ConnectionRecord object at 0x7a2b990>,
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0xa856b10>,
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x7ebee90>,
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>]),
> >>> >>> finalize_callback deque([])
> >>> >>> 2018-05-10T14:02:49.732+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs after conn checkin,
> >>> >>> pool queue deque([<sqlalchemy.pool._ConnectionRecord object at
> >>> >>> 0x7524590>, <sqlalchemy.pool._ConnectionRecord object at 0x7a2b990>,
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0xa856b10>,
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x7ebee90>,
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>,
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>])
> >>> >>>
> >>> >>> On Tuesday, May 8, 2018 at 11:06:04 PM UTC+8, jiajun...@gmail.com wrote:
> >>> >>>>
> >>> >>>> I found why we could get the same connection twice from the pool...
> >>> >>>>
> >>> >>>> Because we put it into the pool twice.
> >>> >>>>
> >>> >>>> Any suggestions for finding out why the same connection is put into
> >>> >>>> the pool by different coroutines?
> >>> >>>>
> >>> >>>> 2018-05-08T22:39:17.448+08:00 localhost nova-conductor DEBUG
> >>> >>>> [pid:15755] [MainThread] [tid:187954992] [log.py:104 debug]
> >>> >>>> [req-7c2199c0-d9a6-4131-8358-2aebcf5a4ce1] jjs conn checkin
> >>> >>>> <connection object at 0x111d9740; closed: 0>
> >>> >>>> 2018-05-08T22:39:17.502+08:00 localhost nova-conductor DEBUG
> >>> >>>> [pid:15755] [MainThread] [tid:178645264] [log.py:104 debug]
> >>> >>>> [req-7ab83c30-91bd-4278-b694-85a71035992c] jjs conn checkin
> >>> >>>> <connection object at 0x111d9740; closed: 0>
> >>> >>>> ...
> >>> >>>> 2018-05-08T22:39:17.992+08:00 localhost nova-conductor DEBUG
> >>> >>>> [pid:15755] [MainThread] [tid:309091856] [log.py:104 debug]
> >>> >>>> [req-dee5dd2d-866f-408e-999b-b415ea7ce953] Connection <connection
> >>> >>>> object at 0x111d9740; closed: 0> checked out from pool
> >>> >>>> 2018-05-08T22:39:18.039+08:00 localhost nova-conductor DEBUG
> >>> >>>> [pid:15755] [MainThread] [tid:178645104] [log.py:104 debug]
> >>> >>>> [req-ac8017d7-c97e-44ca-92d2-99f180858e14] Connection <connection
> >>> >>>> object at 0x111d9740; closed: 0> checked out from pool
> >>> >>>>
> >>> >>>>
> >>> >>>> On Tuesday, May 8, 2018 at 10:18:52 PM UTC+8, jiajun...@gmail.com wrote:
> >>> >>>>>
> >>> >>>>> I added logs in sqlalchemy/pool.py.
> >>> >>>>>
> >>> >>>>> QueuePool.checkout:
> >>> >>>>>
> >>> >>>>>     @classmethod
> >>> >>>>>     def checkout(cls, pool):
> >>> >>>>>         rec = pool._do_get()
> >>> >>>>>         try:
> >>> >>>>>             dbapi_connection = rec.get_connection()
> >>> >>>>>             pool.logger.debug("jjs get conn success %s",
> >>> >>>>>                               dbapi_connection)
> >>> >>>>>         except:
> >>> >>>>>             pool.logger.debug("jjs get error in get_connection")
> >>> >>>>>             with util.safe_reraise():
> >>> >>>>>                 rec.checkin()
> >>> >>>>>         echo = pool._should_log_debug()
> >>> >>>>>         fairy = _ConnectionFairy(dbapi_connection, rec, echo)
> >>> >>>>>         rec.fairy_ref = weakref.ref(
> >>> >>>>>             fairy,
> >>> >>>>>             lambda ref: _finalize_fairy and
> >>> >>>>>             _finalize_fairy(
> >>> >>>>>                 dbapi_connection,
> >>> >>>>>                 rec, pool, ref, echo)
> >>> >>>>>         )
> >>> >>>>>         _refs.add(rec)
> >>> >>>>>         if echo:
> >>> >>>>>             pool.logger.debug("jjs conn %s, pool status %s",
> >>> >>>>>                               dbapi_connection, pool.status())
> >>> >>>>>             pool.logger.debug("Connection %r checked out from pool",
> >>> >>>>>                               dbapi_connection)
> >>> >>>>>         return fairy
> >>> >>>>>
> >>> >>>>> Then I got the logs below:
> >>> >>>>>
> >>> >>>>> coroutine A:
> >>> >>>>>
> >>> >>>>> 2018-05-08T21:39:59.055+08:00 localhost nova-conductor INFO
> >>> >>>>> [pid:55528] [MainThread] [tid:197123760] [log.py:109 info]
> >>> >>>>> [req-ba1ab897-c80a-4bc4-922a-39044d669da1] jjs Pool wait is False,
> >>> >>>>> timeout 30
> >>> >>>>> 2018-05-08T21:39:59.078+08:00 localhost nova-conductor DEBUG
> >>> >>>>> [pid:55528] [MainThread] [tid:197123760] [log.py:104 debug]
> >>> >>>>> [req-ba1ab897-c80a-4bc4-922a-39044d669da1] Created new connection
> >>> >>>>> <connection object at 0x903f8c0; closed: 0>
> >>> >>>>> 2018-05-08T21:39:59.079+08:00 localhost nova-conductor DEBUG
> >>> >>>>> [pid:55528] [MainThread] [tid:197123760] [log.py:104 debug]
> >>> >>>>> [req-ba1ab897-c80a-4bc4-922a-39044d669da1] jjs get conn success
> >>> >>>>> <connection object at 0x903f8c0; closed: 0>
> >>> >>>>> 2018-05-08T21:39:59.080+08:00 localhost nova-conductor DEBUG
> >>> >>>>> [pid:55528] [MainThread] [tid:197123760] [log.py:104 debug]
> >>> >>>>> [req-ba1ab897-c80a-4bc4-922a-39044d669da1] jjs conn <connection
> >>> >>>>> object at 0x903f8c0; closed: 0>, pool status Pool size: 10
> >>> >>>>> Connections in pool: 1 Current Overflow: 36 Max Overflow: 40
> >>> >>>>> Current Checked out connections: 45
> >>> >>>>> 2018-05-08T21:39:59.080+08:00 localhost nova-conductor DEBUG
> >>> >>>>> [pid:55528] [MainThread] [tid:197123760] [log.py:104 debug]
> >>> >>>>> [req-ba1ab897-c80a-4bc4-922a-39044d669da1] Connection <connection
> >>> >>>>> object at 0x903f8c0; closed: 0> checked out from pool
> >>> >>>>>
> >>> >>>>> coroutine B:
> >>> >>>>>
> >>> >>>>> 2018-05-08T21:39:59.212+08:00 localhost nova-conductor INFO
> >>> >>>>> [pid:55528] [MainThread] [tid:151519312] [log.py:109 info]
> >>> >>>>> [req-a78075b5-c55c-40c4-86d0-65957e89cb56] jjs Pool wait is False,
> >>> >>>>> timeout 30
> >>> >>>>> 2018-05-08T21:39:59.213+08:00 localhost nova-conductor DEBUG
> >>> >>>>> [pid:55528] [MainThread] [tid:151519312] [log.py:104 debug]
> >>> >>>>> [req-a78075b5-c55c-40c4-86d0-65957e89cb56] jjs get conn success
> >>> >>>>> <connection object at 0x903f8c0; closed: 0>
> >>> >>>>> 2018-05-08T21:39:59.213+08:00 localhost nova-conductor DEBUG
> >>> >>>>> [pid:55528] [MainThread] [tid:151519312] [log.py:104 debug]
> >>> >>>>> [req-a78075b5-c55c-40c4-86d0-65957e89cb56] jjs conn <connection
> >>> >>>>> object at 0x903f8c0; closed: 0>, pool status Pool size: 10
> >>> >>>>> Connections in pool: 0 Current Overflow: 36 Max Overflow: 40
> >>> >>>>> Current Checked out connections: 46
> >>> >>>>> 2018-05-08T21:39:59.214+08:00 localhost nova-conductor DEBUG
> >>> >>>>> [pid:55528] [MainThread] [tid:151519312] [log.py:104 debug]
> >>> >>>>> [req-a78075b5-c55c-40c4-86d0-65957e89cb56] Connection <connection
> >>> >>>>> object at 0x903f8c0; closed: 0> checked out from pool
> >>> >>>>>
> >>> >>>>> It seems that when we create_connection in QueuePool._do_get, we may
> >>> >>>>> not be holding the lock _pool.mutex (~Queue.mutex).
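
The interleaving above is consistent with how eventlet behaves: a greenthread switches out at any blocking I/O point, so two greenthreads can each be partway through checkout before either finishes. A minimal sketch of that switching behavior, assuming eventlet (the worker function is hypothetical, not pool code):

    import eventlet

    def checkout(name):
        print(name, "entered checkout")
        # Any blocking I/O (e.g. creating a DB connection) yields control
        # to another greenthread; eventlet.sleep(0) stands in for that.
        eventlet.sleep(0)
        print(name, "finished checkout")

    pool = eventlet.GreenPool()
    pool.spawn(checkout, "coroutine A")
    pool.spawn(checkout, "coroutine B")
    pool.waitall()
    # Output interleaves: both greenthreads enter checkout before either finishes.
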
> >>> >>>>>
> >>> >>>>> On Monday, May 7, 2018 at 8:41:14 PM UTC+8, Mike Bayer wrote:
> >>> >>>>>>
> >>> >>>>>> On Mon, May 7, 2018, 7:01 AM <jiajun...@gmail.com> wrote:
> >>> >>>>>>>
> >>> >>>>>>> We added a coroutine_id in psycopg2, and found that two coroutines
> >>> >>>>>>> use the same connection before the pthread_mutex_lock is released.
> >>> >>>>>>>
> >>> >>>>>>> Maybe something in the connection pool goes wrong?
> >>> >>>>>>> [pid] [coroutine_id] msg
> >>> >>>>>>
> >>> >>>>>> This seems unlikely. Openstack uses the SQLAlchemy connection pool
> >>> >>>>>> with eventlet patching in conjunction with the MySQL drivers, and
> >>> >>>>>> nothing like that happens. The biggest issue SQLAlchemy had with
> >>> >>>>>> greenlets is that a timed-out greenlet would throw GreenletExit and
> >>> >>>>>> cause a dirty connection to go back into the pool; that was fixed
> >>> >>>>>> in 1.2.
> >>> >>>>>>
> >>> >>>>>>>
> >>> >>>>>>> [49174] [0xa5db730]before PyObject_CallFunctionObjArgs conn 0x94122f0, cb 0x23ad320
> >>> >>>>>>> [49174] [0xa5db730]conn_poll: status = 2, conn 0x94122f0
> >>> >>>>>>> [49174] [0xa5db730]conn_poll: async_status = ASYNC_WRITE 0x94122f0
> >>> >>>>>>> [49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS conn 0x94122f0, async_cursor 0x881ac00
> >>> >>>>>>> [49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS conn 0x94122f0, async_cursor 0x881ac00
> >>> >>>>>>> [49174] [0xa5db4b0]pq_abort: enter pgconn = 0x94122f0, autocommit = 0, status = 2
> >>> >>>>>>> [49174] [0xa5db4b0]before lock pgconn = 0x94122f0, owner 49174 __lock 1
> >>> >>>>>>>
> >>> >>>>>>> Below are the logs grepped by coroutine id:
> >>> >>>>>>>
> >>> >>>>>>> [49174] [0xa5db730]finish send query, before psyco_wait, conn 0x94122f0
> >>> >>>>>>> [49174] [0xa5db730]before have_wait_callback conn 0x94122f0
> >>> >>>>>>> [49174] [0xa5db730]before PyObject_CallFunctionObjArgs conn 0x94122f0, cb 0x23ad320
> >>> >>>>>>> [49174] [0xa5db730]conn_poll: status = 2, conn 0x94122f0
> >>> >>>>>>> [49174] [0xa5db730]conn_poll: async_status = ASYNC_WRITE 0x94122f0
> >>> >>>>>>> [49174] [0xa5db730]conn_poll: poll writing
> >>> >>>>>>> [49174] [0xa5db730]conn_poll: async_status -> ASYNC_READ
> >>> >>>>>>>
> >>> >>>>>>> -----
> >>> >>>>>>>
> >>> >>>>>>> [49174] [0xa5db4b0]psyco_conn_cursor: new unnamed cursor for connection at 0x8de2d30
> >>> >>>>>>> [49174] [0xa5db4b0]cursor_setup: init cursor object at 0xa6c2650
> >>> >>>>>>> [49174] [0xa5db4b0]cursor_setup: parameters: name = (null), conn = 0x8de2d30
> >>> >>>>>>> [49174] [0xa5db4b0]cursor_setup: good cursor object at 0xa6c2650, refcnt = 1
> >>> >>>>>>> [49174] [0xa5db4b0]psyco_conn_cursor: new cursor at 0xa6c2650: refcnt = 1
> >>> >>>>>>> [49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS conn 0x94122f0, async_cursor 0x881ac00
> >>> >>>>>>> [49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS conn 0x94122f0, async_cursor 0x881ac00
> >>> >>>>>>> [49174] [0xa5db4b0]pq_abort: enter pgconn = 0x94122f0, autocommit = 0, status = 2
> >>> >>>>>>> [49174] [0xa5db4b0]before lock pgconn = 0x94122f0, owner 49174 __lock 1
> >>> >>>>>>>
> >>> >>>>>>>
> >>> >>>>>>> On Saturday, April 28, 2018 at 4:07:34 PM UTC+8, jiajun...@gmail.com wrote:
> >>> >>>>>>>>
> >>> >>>>>>>> We reproduced this problem and added logs in psycopg2, found a
> >>> >>>>>>>> confusing thing, and reported it to psycopg2.
> >>> >>>>>>>>
> >>> >>>>>>>> It seems the sqlalchemy & psycopg2 stack flow is:
> >>> >>>>>>>> 1. from sqlalchemy to psycopg2: pq_execute/pq_commit
> >>> >>>>>>>> 2. from psycopg2 to sqlalchemy: PyWeakref_NewRef(conn)
> >>> >>>>>>>> 3. get an exception in sqlalchemy, and do_rollback
> >>> >>>>>>>> 4. from sqlalchemy to psycopg2: pq_abort
> >>> >>>>>>>> 5. psycopg2 gets a pthread_mutex_lock deadlock
> >>> >>>>>>>>
> >>> >>>>>>>> What we did to reproduce it: stop the master pg-server and promote
> >>> >>>>>>>> the slave pg-server to master, moving the FIP from the old master
> >>> >>>>>>>> to the slave. At the same time, let nova-conductor issue quite a
> >>> >>>>>>>> lot of DB query requests.
> >>> >>>>>>>>
> >>> >>>>>>>> [1] https://github.com/psycopg/psycopg2/issues/703
> >>> >>>>>>>>
> >>> >>>>>>>> On Monday, April 23, 2018 at 9:45:04 PM UTC+8, Mike Bayer wrote:
> >>> >>>>>>>>>
> >>> >>>>>>>>> On Mon, Apr 23, 2018 at 9:03 AM, <jiajun...@gmail.com> wrote:
> >>> >>>>>>>>> > Sorry for replying on this topic.
> >>> >>>>>>>>> >
> >>> >>>>>>>>> > We recently got the same problem in our production environment.
> >>> >>>>>>>>> >
> >>> >>>>>>>>> > I found a patch in another lib [1]; they added conn.close() in
> >>> >>>>>>>>> > the psycopg2.ProgrammingError exception handler.
> >>> >>>>>>>>> >
> >>> >>>>>>>>> > Shall we do the same in [2]?
> >>> >>>>>>>>>
> >>> >>>>>>>>> SQLAlchemy does things much more carefully than that; we parse
> >>> >>>>>>>>> the error message for specific ones that correspond to
> >>> >>>>>>>>> "connection is no longer usable". We call these "is_disconnect",
> >>> >>>>>>>>> but it can be any invalidating condition.
> >>> >>>>>>>>>
> >>> >>>>>>>>> You can make these yourself, and they can also be made to be
> >>> >>>>>>>>> part of oslo.db, using the handle_error event:
> >>> >>>>>>>>>
> >>> >>>>>>>>> http://docs.sqlalchemy.org/en/latest/core/events.html?highlight=handle_error#sqlalchemy.events.ConnectionEvents.handle_error
> >>> >>>>>>>>>
> >>> >>>>>>>>> within oslo.db you would want to propose a change here:
> >>> >>>>>>>>>
> >>> >>>>>>>>> https://github.com/openstack/oslo.db/blob/master/oslo_db/sqlalchemy/exc_filters.py#L387
> >>> >>>>>>>>>
> >>> >>>>>>>>> > [1] https://github.com/aio-libs/aiopg/pull/415/files?diff=split
> >>> >>>>>>>>> >
> >>> >>>>>>>>> > [2] https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/engine/base.py#L1289
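
A minimal sketch of the handle_error approach suggested above, treating the "connection pointer is NULL" message seen earlier in this thread as a disconnect condition; the string match is illustrative only, not an official oslo.db filter:

    from sqlalchemy import create_engine, event

    engine = create_engine("postgresql://scott:tiger@localhost/test")

    @event.listens_for(engine, "handle_error")
    def mark_disconnect(context):
        # context is an ExceptionContext; setting is_disconnect tells the
        # pool to treat this otherwise-unrecognized error as invalidating
        if "connection pointer is NULL" in str(context.original_exception):
            context.is_disconnect = True
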
> >>> >>>>>>>>> >
> >>> >>>>>>>>> > On Monday, November 13, 2017 at 10:44:31 AM UTC+8, JinRong Cai wrote:
> >>> >>>>>>>>> >>
> >>> >>>>>>>>> >> Hi Michael,
> >>> >>>>>>>>> >>
> >>> >>>>>>>>> >> I am using openstack with postgresql, where the sqlalchemy and
> >>> >>>>>>>>> >> oslo_db modules are used. And there are some problems after my
> >>> >>>>>>>>> >> pg database switched over.
> >>> >>>>>>>>> >>
> >>> >>>>>>>>> >> Here is my switchover process:
> >>> >>>>>>>>> >> 1. nova-conductor (python application) is running with DB
> >>> >>>>>>>>> >> connection strings pointing to the VIP, which is in the
> >>> >>>>>>>>> >> primary site (A) of pg.
> >>> >>>>>>>>> >> 2. switch the VIP from the primary (A) to the new primary (B).
> >>> >>>>>>>>> >> 3. switch over pg: shut down the primary (A), promote the
> >>> >>>>>>>>> >> standby (B) to the new primary.
> >>> >>>>>>>>> >> 4. nova-conductor is running during the whole process.
> >>> >>>>>>>>> >>
> >>> >>>>>>>>> >> After some seconds, I found some nova-conductor processes hung
> >>> >>>>>>>>> >> with status futex_wait_queue_me, and the status of the query in
> >>> >>>>>>>>> >> the DB was "idle in transaction": the transaction was not
> >>> >>>>>>>>> >> committed or rolled back!
> >>> >>>>>>>>> >> I think disconnection is handled in oslo_db, which sends a
> >>> >>>>>>>>> >> ping (SELECT 1) to the DB.
> >>> >>>>>>>>> >>
> >>> >>>>>>>>> >> If the DB is switched over, the connections in the pool would
> >>> >>>>>>>>> >> be marked invalid, and reconnect after the next checkout.
> >>> >>>>>>>>> >>
> >>> >>>>>>>>> >> ### error messages from nova-conductor
> >>> >>>>>>>>> >> localhost nova-conductor ERROR [pid:36365] [MainThread]
> >>> >>>>>>>>> >> [tid:122397712] [exc_filters.py:330 _raise_for_remaining_DBAPIError]
> >>> >>>>>>>>> >> [req-2bd8a290-e17b-4178-80a6-4b36d5793d85] DBAPIError exception
> >>> >>>>>>>>> >> wrapped from (psycopg2.ProgrammingError) execute cannot be used
> >>> >>>>>>>>> >> while an asynchronous query is underway [SQL: 'SELECT 1']
> >>> >>>>>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters Traceback (most recent call last):
> >>> >>>>>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters   File
> >>> >>>>>>>>> >> "/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
> >>> >>>>>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters     context)
> >>> >>>>>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters   File
> >>> >>>>>>>>> >> "/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
> >>> >>>>>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters     cursor.execute(statement, parameters)
> >>> >>>>>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters ProgrammingError: execute
> >>> >>>>>>>>> >> cannot be used while an asynchronous query is underway
> >>> >>>>>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters
> >>> >>>>>>>>> >> localhost nova-conductor ERROR [pid:36365] [MainThread]
> >>> >>>>>>>>> >> [tid:122397712] [log.py:122 error]
> >>> >>>>>>>>> >> [req-2bd8a290-e17b-4178-80a6-4b36d5793d85] Error closing cursor
> >>> >>>>>>>>> >> 36365 ERROR sqlalchemy.pool.QueuePool Traceback (most recent call last):
> >>> >>>>>>>>> >> 36365 ERROR sqlalchemy.pool.QueuePool   File
> >>> >>>>>>>>> >> "/python2.7/site-packages/sqlalchemy/engine/base.py", line 1226, in _safe_close_cursor
> >>> >>>>>>>>> >> 36365 ERROR sqlalchemy.pool.QueuePool     cursor.close()
> >>> >>>>>>>>> >> 36365 ERROR sqlalchemy.pool.QueuePool ProgrammingError: close cannot be
> >>> >>>>>>>>> >> used while an asynchronous query is underway
> >>> >>>>>>>>> >> 36365 ERROR sqlalchemy.pool.QueuePool
> >>> >>>>>>>>> >>
> >>> >>>>>>>>> >> ### ps status of nova-conductor
> >>> >>>>>>>>> >> POD6-Mongodb03:/var/log/uvp-getosstat/statistics20171106101500log # cat /proc/33316/stack
> >>> >>>>>>>>> >> [<ffffffff810e4c24>] futex_wait_queue_me+0xc4/0x120
> >>> >>>>>>>>> >> [<ffffffff810e5799>] futex_wait+0x179/0x280
> >>> >>>>>>>>> >> [<ffffffff810e782e>] do_futex+0xfe/0x5b0
> >>> >>>>>>>>> >> [<ffffffff810e7d60>] SyS_futex+0x80/0x180
> >>> >>>>>>>>> >> [<ffffffff81654e09>] system_call_fastpath+0x16/0x1b
> >>> >>>>>>>>> >> [<ffffffffffffffff>] 0xffffffffffffffff
> >>> >>>>>>>>> >>
> >>> >>>>>>>>> >> ### stack of the nova-conductor process
> >>> >>>>>>>>> >> POD6-Mongodb03:/tmp # pstack 33316
> >>> >>>>>>>>> >> #0 0x00002b8449e35f4d in __lll_lock_wait () from /lib64/libpthread.so.0
> >>> >>>>>>>>> >> #1 0x00002b8449e31d02 in _L_lock_791 () from /lib64/libpthread.so.0
> >>> >>>>>>>>> >> #2 0x00002b8449e31c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
> >>> >>>>>>>>> >> #3 0x00002b84554c44ab in pq_abort () from /python2.7/site-packages/psycopg2/_psycopg.so
> >>> >>>>>>>>> >> #4 0x00002b84554c955e in psyco_conn_rollback () from /python2.7/site-packages/psycopg2/_psycopg.so
> >>> >>>>>>>>> >> #5 0x00002b8449b42b50 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
> >>> >>>>>>>>> >> #6 0x00002b8449b42ad0 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
> >>> >>>>>>>>> >>
> >>> >>>>>>>>> >> psycopg2 was trying to close the cursor and to take the mutex
> >>> >>>>>>>>> >> lock (pthread_mutex_lock), but it seems the cursor was being
> >>> >>>>>>>>> >> used by another session.
> >>> >>>>>>>>> >>
> >>> >>>>>>>>> >> Questions:
> >>> >>>>>>>>> >>
> >>> >>>>>>>>> >> 1. What does the error "ProgrammingError: close cannot be used
> >>> >>>>>>>>> >> while an asynchronous query is underway" mean?
> >>> >>>>>>>>> >> AFAIK, it is raised by psycopg2 and means an asynchronous query
> >>> >>>>>>>>> >> was already executing on the connection.
> >>> >>>>>>>>> >> But I think sqlalchemy is green-thread safe, since it is patched
> >>> >>>>>>>>> >> by eventlet; see details in eventlet/support/psycopg2_patcher.py.
> >>> >>>>>>>>> >> And we can see different green thread numbers in the log, e.g.:
> >>> >>>>>>>>> >> [pid:36365] [MainThread] [tid:122397712]
> >>> >>>>>>>>> >> [pid:36365] [MainThread] [tid:122397815]
> >>> >>>>>>>>> >> So, I guess the connection pool within one process is safe.
> >>> >>>>>>>>> >>
> >>> >>>>>>>>> >> 2. nova-conductor is a multi-threaded python client, which forks
> >>> >>>>>>>>> >> several child processes.
> >>> >>>>>>>>> >> ps -elf | grep -i nova-conductor
> >>> >>>>>>>>> >> 30878 1 pool_s /usr/bin/nova-conductor
> >>> >>>>>>>>> >> 36364 1 ep_pol /usr/bin/nova-conductor
> >>> >>>>>>>>> >> 36365 1 futex_ /usr/bin/nova-conductor
> >>> >>>>>>>>> >> 36366 1 ep_pol /usr/bin/nova-conductor
> >>> >>>>>>>>> >> 36367 1 ep_pol /usr/bin/nova-conductor
> >>> >>>>>>>>> >> 36368 1 ep_pol /usr/bin/nova-conductor
> >>> >>>>>>>>> >>
> >>> >>>>>>>>> >> If nova-conductor is started with only one child, the problem
> >>> >>>>>>>>> >> does not happen.
> >>> >>>>>>>>> >> Does this mean the connection/engine CANNOT be shared among
> >>> >>>>>>>>> >> these child processes?
> >>> >>>>>>>>> >>
> >>> >>>>>>>>> >> Thanks.
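
On question 2: SQLAlchemy's general guidance is that an engine's pooled connections must not be shared across a fork boundary. A minimal sketch of the usual pattern when the engine exists before the fork, assuming os.fork()-style workers (the connection URL is illustrative):

    import os
    from sqlalchemy import create_engine

    engine = create_engine("postgresql://scott:tiger@localhost/test")

    pid = os.fork()
    if pid == 0:
        # child process: drop the pooled connections inherited from the
        # parent so the child opens fresh ones instead of sharing the
        # parent's socket descriptors
        engine.dispose()
        # ... child continues; the engine now creates its own connections ...
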