I've uploaded the logs here: https://raw.githubusercontent.com/jiajunsu/test_logs/master/nc_5679.log
The ConnectionRecord is 0x70cdb10 and the tid (coroutine id) is 130446032. At 2018-05-14T11:11:39.970 we got the log "jjs rec already in pool checkin", but since I had deleted the `return` from the patch, the problem happened again.

On Monday, May 14, 2018 at 10:48:49 AM UTC+8, Mike Bayer wrote:
>
> as well as the log that shows this:
>
> +        if self in pool._pool.queue:
> +            pool.logger.debug("jjs rec already in pool checkin, skip %r", self)
> +            return
>
> actually happening.
>
>
> On Sun, May 13, 2018 at 10:42 PM, Mike Bayer <mik...@zzzcomputing.com> wrote:
> > On Sun, May 13, 2018 at 10:35 PM, <jiajun...@gmail.com> wrote:
> >> "jjs" is short for my name and "cjr" is my colleague's (JinRong Cai). We
> >> work together and add logs in the same environment, prefixing our names
> >> to get a keyword for log filtering.
> >>
> >> The doubled ConnectionRecords (with None connection) are checked out by
> >> two different coroutines, and record.connection is overwritten by the
> >> second coroutine.
> >> After the record is put back into the QueuePool again, there are two
> >> identical records with the same connection in the pool, which causes the
> >> next two coroutines to conflict and deadlock in psycopg2 as above.
> >
> > OK, I need the detailed step by step that shows how the two coroutines
> > are interacting such that they both are returning the same connection.
> > The nova log above seems to show just one coroutine, I assume, since
> > there is just one nova request ID, if I understand correctly.
> >
> >>
> >> The psycopg2 logs have been pasted at
> >> https://github.com/psycopg/psycopg2/issues/703
> >
> > The above issue says the bug has been found in SQLAlchemy, but there is
> > no bug report yet and I've not yet been shown how the behavior here is
> > possible. I posted a straightforward example above; can you
> > please show me where the two coroutines get mixed up?
> >
> >>
> >> On Friday, May 11, 2018 at 3:05:46 AM UTC+8, Mike Bayer wrote:
> >>>
> >>> I tried to follow your log lines, which is difficult because I don't
> >>> know where those log statements actually are and I don't know what
> >>> acronyms like "jjs" and "cjr" mean. But it does look like it's all in
> >>> one thread (so not async GC) and all in one Nova request (so perhaps
> >>> just one greenlet involved, unless other greenlets are affecting this
> >>> without being displayed in this log snippet). I do see the
> >>> connection record doubled at the end. So here is the illustration of
> >>> those steps in isolation; the connection record is not doubled. Can
> >>> you alter this program to show this condition occurring?
> >>>
> >>> from sqlalchemy import create_engine
> >>>
> >>> e = create_engine("postgresql://scott:tiger@localhost/test",
> >>>                   echo_pool='debug')
> >>> e.connect().close()
> >>>
> >>> assert len(e.pool._pool.queue) == 1
> >>>
> >>> c1 = e.connect()
> >>>
> >>> assert len(e.pool._pool.queue) == 0
> >>>
> >>> c1.engine.pool._invalidate(c1._Connection__connection, None)
> >>> c1.invalidate(e)
> >>>
> >>> # connection record is back
> >>> assert len(e.pool._pool.queue) == 1
> >>>
> >>> # do a reconnect
> >>> c1.connection
> >>>
> >>> # uses the record again
> >>> assert len(e.pool._pool.queue) == 0
> >>>
> >>> # close the Connection
> >>> c1.close()
> >>>
> >>> # record is back
> >>> assert len(e.pool._pool.queue) == 1
> >>>
> >>> # etc.
> >>> c1 = e.connect()
> >>> assert len(e.pool._pool.queue) == 0
> >>>
> >>> c1.close()
> >>> assert len(e.pool._pool.queue) == 1
> >>>
> >>>
> >>> On Thu, May 10, 2018 at 12:25 PM, Mike Bayer <mik...@zzzcomputing.com> wrote:
> >>> > Here's a print statement you might want to try:
> >>> >
> >>> > diff --git a/lib/sqlalchemy/pool.py b/lib/sqlalchemy/pool.py
> >>> > index 89a4cea7c..452db4883 100644
> >>> > --- a/lib/sqlalchemy/pool.py
> >>> > +++ b/lib/sqlalchemy/pool.py
> >>> > @@ -689,6 +689,19 @@ def _finalize_fairy(connection, connection_record,
> >>> >      been garbage collected.
> >>> >
> >>> >      """
> >>> > +    print(
> >>> > +        "FINALIZE FAIRY: GREENLET ID: %s, DB CONN: %s, FAIRY: %s, REF: %s "
> >>> > +        "CONNECTION_REC: %s CONNECTION_REC REF %s" % (
> >>> > +            "put greenlet id here",
> >>> > +            connection,
> >>> > +            fairy,
> >>> > +            ref,
> >>> > +            connection_record,
> >>> > +            connection_record.fairy_ref
> >>> > +            if connection_record is not None else "n/a"
> >>> > +        )
> >>> > +    )
> >>> > +
> >>> >      _refs.discard(connection_record)
> >>> >
> >>> >      if ref is not None:
> >>> >
> >>> >
> >>> > On Thu, May 10, 2018 at 12:21 PM, Mike Bayer <mik...@zzzcomputing.com> wrote:
> >>> >> On Thu, May 10, 2018 at 7:23 AM, <jiajun...@gmail.com> wrote:
> >>> >>> Hi Mike,
> >>> >>>
> >>> >>> I added more logs and finally found the reason, and a method to avoid
> >>> >>> the deadlock between coroutines.
> >>> >>>
> >>> >>> In summary, the deadlock happened because two different coroutines use
> >>> >>> the same ConnectionRecord with the same connection.
> >>> >>> It happens under QueuePool, when the DB connections are reset by the
> >>> >>> DB server (not related to the backend driver).
> >>> >>>
> >>> >>> 1. A coroutine gets a ConnectionRecord into a ConnectionFairy, and a
> >>> >>> weakref to the ConnectionFairy is created, with callback function
> >>> >>> `_finalize_fairy`.
> >>> >>> 2. When the connection gets disconnected, it receives an exception
> >>> >>> (psycopg2.OperationalError) and goes to
> >>> >>> Connection._handle_dbapi_exception.
> >>> >>> 3. In `_handle_dbapi_exception`'s finally block, it calls both
> >>> >>> `ConnectionFairy.invalidate` and `self.invalidate`.
> >>> >>> 4. In `ConnectionFairy.invalidate`, `_ConnectionRecord._checkin` is
> >>> >>> called, which puts the ConnectionRecord back into the QueuePool.
> >>> >>> 5. In `self.invalidate`, `del self._root.__connection` is called,
> >>> >>> which **del**s the ConnectionFairy.
> >>> >>> 6. The **del** operation makes the weakref to the ConnectionFairy
> >>> >>> dead, so the callback `_finalize_fairy` is executed.
> >>> >>> 7. In `_finalize_fairy`, another `_ConnectionRecord._checkin` is
> >>> >>> called, putting the ConnectionRecord into the QueuePool again.
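
Steps 5 and 6 hinge on CPython weakref semantics: when the last strong reference to the fairy is deleted, the weakref dies and its callback runs immediately. A minimal sketch of just that mechanism, with stand-in names rather than SQLAlchemy's actual classes:

    import weakref

    class Fairy:
        """Stand-in for _ConnectionFairy; any weakref-able object works."""

    def finalize(ref):
        # In the pool this role is played by _finalize_fairy, which performs
        # a checkin; here we only show *when* the callback runs.
        print("weakref callback fired for", ref)

    fairy = Fairy()
    ref = weakref.ref(fairy, finalize)

    del fairy  # last strong reference dropped -> callback fires at once (CPython)
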
> >>> >>
> >>> >> Here's what does not make sense. You are claiming that a simple
> >>> >> invalidation will result in the same ConnectionRecord being checked in
> >>> >> twice. This is obviously not the case; what is special in your
> >>> >> situation that makes this happen?
> >>> >>
> >>> >> If I follow the steps you refer to, in step 4, when
> >>> >> ConnectionFairy._checkin is called, that calls _finalize_fairy
> >>> >> directly, which then calls ConnectionRecord.checkin(). Within
> >>> >> ConnectionRecord.checkin(), ConnectionRecord.fairy_ref is set to None.
> >>> >> Then back in ConnectionFairy._checkin(), it also sets
> >>> >> self._connection_record = None.
> >>> >>
> >>> >> If we come back into _finalize_fairy a *second* time then, as a result
> >>> >> of the "del" which occurs in _handle_error(), and the
> >>> >> connection_record is present, the function will exit immediately,
> >>> >> because:
> >>> >>
> >>> >>     if connection_record.fairy_ref is not ref:
> >>> >>         return
> >>> >>
> >>> >> So the situation as given is not possible without concurrent access to
> >>> >> the ConnectionFairy being introduced externally to the pool.
> >>> >>
> >>> >> There is exactly one thing I can think of that is both unique to the
> >>> >> psycopg2 style of asynchronous connection, not the PyMySQL form of it,
> >>> >> that might be important here, and it's that I think psycopg2's
> >>> >> connection.close() method will defer to another greenlet. That means
> >>> >> when we're inside of ConnectionRecord.invalidate(), and we call
> >>> >> self.__close(), that could be deferring to another greenlet. But
> >>> >> this all happens before the connection is returned to the pool, so I
> >>> >> still don't know what is happening.
> >>> >>
> >>> >>>
> >>> >>> The detailed logs are below. To prove my conclusion is right, I tried
> >>> >>> to avoid the scenario with this patch, and it worked.
> >>> >>>
> >>> >>>     def checkin(self):
> >>> >>>         self.fairy_ref = None
> >>> >>>         connection = self.connection
> >>> >>>         pool = self.__pool
> >>> >>>
> >>> >>>         while self.finalize_callback:
> >>> >>>             finalizer = self.finalize_callback.pop()
> >>> >>>             finalizer(connection)
> >>> >>> +       if self in pool._pool.queue:
> >>> >>> +           pool.logger.debug("jjs rec already in pool checkin, skip %r", self)
> >>> >>> +           return
> >>> >>
> >>> >> As mentioned above, it's not possible to reach this point without
> >>> >> concurrent access to the pooled connection, which means this fix could
> >>> >> fail also (with at least threaded-style concurrency, if not greenlet
> >>> >> style). The original source of the greenlet issue must still be
> >>> >> identified.
> >>> >>
> >>> >> I would need a self-contained proof of concept that demonstrates this
> >>> >> issue occurring. The log output is not that helpful, because it at
> >>> >> least would need to write out the identifier of the current greenlet,
> >>> >> since it is very likely that greenlets have something to do with this
> >>> >> issue.
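
A minimal sketch of how that identifier could be captured, assuming eventlet provides the green threads (as it does in nova-conductor); current_greenlet_id is a hypothetical helper, not SQLAlchemy API:

    import eventlet.greenthread

    def current_greenlet_id():
        # eventlet.greenthread.getcurrent() returns the currently running
        # greenlet; its id() distinguishes one coroutine from another in logs
        return id(eventlet.greenthread.getcurrent())

    # e.g. substitute this for "put greenlet id here" in the print
    # statement shown in the diff above
    print("GREENLET ID: %s" % current_greenlet_id())
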
> >>> >>>
> >>> >>>         if pool.dispatch.checkin:
> >>> >>>             pool.dispatch.checkin(connection, self)
> >>> >>>         pool._return_conn(self)
> >>> >>>
> >>> >>> Tracing the ConnectionRecord 0x88fc5d0:
> >>> >>>
> >>> >>> 2018-05-10T14:02:17.927+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Created new connection
> >>> >>> <connection object at 0x81f69b0; closed: 0>
> >>> >>> 2018-05-10T14:02:17.928+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs get conn success
> >>> >>> <connection object at 0x81f69b0; closed: 0>, rec.connection <connection
> >>> >>> object at 0x81f69b0; closed: 0>, pool queue
> >>> >>> deque([<sqlalchemy.pool._ConnectionRecord object at 0x7a04810>,
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x79fdad0>])
> >>> >>> 2018-05-10T14:02:17.929+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs checkout fairy
> >>> >>> <sqlalchemy.pool._ConnectionFairy object at 0x329c0d0> rec
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0> pool queue
> >>> >>> deque([<sqlalchemy.pool._ConnectionRecord object at 0x7a04810>,
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x79fdad0>])
> >>> >>> 2018-05-10T14:02:17.929+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Connection <connection
> >>> >>> object at 0x81f69b0; closed: 0> checked out from pool, status Pool
> >>> >>> size: 10 Connections in pool: 2 Current Overflow: 0 Max Overflow: 40
> >>> >>> Current Checked out connections: 8, queue
> >>> >>> deque([<sqlalchemy.pool._ConnectionRecord object at 0x7a04810>,
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x79fdad0>])
> >>> >>> 2018-05-10T14:02:17.930+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs
> >>> >>> _ConnectionFairy._checkout conn <connection object at 0x81f69b0;
> >>> >>> closed: 0>, rec <sqlalchemy.pool._ConnectionRecord object at
> >>> >>> 0x88fc5d0>, rec.conn <connection object at 0x81f69b0; closed: 0>
> >>> >>> 2018-05-10T14:02:49.716+08:00 localhost nova-conductor ERROR [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:122 error]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] cjr error conn
> >>> >>> <sqlalchemy.engine.base.Connection object at 0x329c150>, err
> >>> >>> connection pointer is NULL
> >>> >>> 2018-05-10T14:02:49.717+08:00 localhost nova-conductor ERROR [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:122 error]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] cjr enter newraise
> >>> >>> 2018-05-10T14:02:49.717+08:00 localhost nova-conductor ERROR [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:122 error]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] cjr enter finally
> >>> >>> >> Connection._handle_dbapi_exception --> finally -->
> >>> >>> >>   **self.engine.pool._invalidate** & **self.invalidate**
> >>> >>> >> self.engine.pool._invalidate --> Pool._invalidate
> >>> >>> >>   --> ConnectionFairy.invalidate --> _ConnectionRecord.invalidate
> >>> >>> >>   & **self._checkin()**
> >>> >>> >> self.invalidate() --> **del self._root.__connection**
> >>> >>> 2018-05-10T14:02:49.718+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs connectionFairy invalidate
> >>> >>> 2018-05-10T14:02:49.718+08:00 localhost nova-conductor INFO [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:109 info]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Invalidate connection
> >>> >>> <connection object at 0x81f69b0; closed: 2> (reason:
> >>> >>> OperationalError: connection pointer is NULL)
> >>> >>> >> _ConnectionRecord.invalidate(e) -->
> >>> >>> >>   self.__pool.dispatch.invalidate(self.connection, self, e) &
> >>> >>> >>   self.__close() & self.connection = None
> >>> >>> >> self.__close() --> self.finalize_callback.clear() &
> >>> >>> >>   self.__pool._close_connection(self.connection)
> >>> >>> 2018-05-10T14:02:49.719+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Closing connection
> >>> >>> <connection object at 0x81f69b0; closed: 2>
> >>> >>> >> Pool._close_connection(connection) -->
> >>> >>> >>   self._dialect.do_close(connection) -->
> >>> >>> >>   _ConnDialect.do_close(dbapi_connection) -->
> >>> >>> >>   dbapi_connection.close() FIN
> >>> >>> 2018-05-10T14:02:49.719+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs
> >>> >>> _ConnectionFairy._checkin conn None rec
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0> rec.conn None
> >>> >>> >> _ConnectionFairy._checkin --> _finalize_fairy(conn, conn_rec, pool,
> >>> >>> >>   None, echo, fairy) & self.conn = None & self.conn_rec = None
> >>> >>> 2018-05-10T14:02:49.720+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs _finalize_fairy args
> >>> >>> conn None, rec.conn None, rec
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>,
> >>> >>> ref None, fairy <sqlalchemy.pool._ConnectionFairy object at 0x329c0d0>
> >>> >>> >> _finalize_fairy --> connection_record.checkin()
> >>> >>> 2018-05-10T14:02:49.720+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs going to
> >>> >>> connection_record.checkin
> >>> >>> 2018-05-10T14:02:49.721+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs before conn checkin
> >>> >>> None, pool queue deque([<sqlalchemy.pool._ConnectionRecord object at
> >>> >>> 0x7524590>, <sqlalchemy.pool._ConnectionRecord object at 0x7a2b990>,
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0xa856b10>,
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x7ebee90>]),
> >>> >>> finalize_callback deque([])
> >>> >>> 2018-05-10T14:02:49.721+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs after conn checkin,
> >>> >>> pool queue deque([<sqlalchemy.pool._ConnectionRecord object at
> >>> >>> 0x7524590>, <sqlalchemy.pool._ConnectionRecord object at 0x7a2b990>,
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0xa856b10>,
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x7ebee90>,
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>])
> >>> >>> 2018-05-10T14:02:49.723+08:00 localhost nova-conductor ERROR [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:122 error]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Exception during reset or
> >>> >>> similar
> >>> >>> 2018-05-10T14:02:49.723+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs going to
> >>> >>> connection_record.invalidate
> >>> >>> 2018-05-10T14:02:49.724+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs going to
> >>> >>> connection_record.checkin
> >>> >>> 2018-05-10T14:02:49.724+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs before conn checkin
> >>> >>> None, pool queue deque([<sqlalchemy.pool._ConnectionRecord object at
> >>> >>> 0x7524590>, <sqlalchemy.pool._ConnectionRecord object at 0x7a2b990>,
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0xa856b10>,
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x7ebee90>,
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>]),
> >>> >>> finalize_callback deque([])
> >>> >>> 2018-05-10T14:02:49.732+08:00 localhost nova-conductor DEBUG [pid:4528]
> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs after conn checkin,
> >>> >>> pool queue deque([<sqlalchemy.pool._ConnectionRecord object at
> >>> >>> 0x7524590>, <sqlalchemy.pool._ConnectionRecord object at 0x7a2b990>,
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0xa856b10>,
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x7ebee90>,
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>,
> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>])
> >>> >>>
> >>> >>> On Tuesday, May 8, 2018 at 11:06:04 PM UTC+8, jiajun...@gmail.com wrote:
> >>> >>>>
> >>> >>>> I found why we could get the same connection twice from the pool...
> >>> >>>>
> >>> >>>> Because we put it into the pool twice.
> >>> >>>>
> >>> >>>> Any suggestions for finding out why the same connection is put into
> >>> >>>> the pool by different coroutines?
> >>> >>>>
> >>> >>>> 2018-05-08T22:39:17.448+08:00 localhost nova-conductor DEBUG
> >>> >>>> [pid:15755] [MainThread] [tid:187954992] [log.py:104 debug]
> >>> >>>> [req-7c2199c0-d9a6-4131-8358-2aebcf5a4ce1] jjs conn checkin
> >>> >>>> <connection object at 0x111d9740; closed: 0>
> >>> >>>> 2018-05-08T22:39:17.502+08:00 localhost nova-conductor DEBUG
> >>> >>>> [pid:15755] [MainThread] [tid:178645264] [log.py:104 debug]
> >>> >>>> [req-7ab83c30-91bd-4278-b694-85a71035992c] jjs conn checkin
> >>> >>>> <connection object at 0x111d9740; closed: 0>
> >>> >>>> ...
> >>> >>>> 2018-05-08T22:39:17.992+08:00 localhost nova-conductor DEBUG
> >>> >>>> [pid:15755] [MainThread] [tid:309091856] [log.py:104 debug]
> >>> >>>> [req-dee5dd2d-866f-408e-999b-b415ea7ce953] Connection <connection
> >>> >>>> object at 0x111d9740; closed: 0> checked out from pool
> >>> >>>> 2018-05-08T22:39:18.039+08:00 localhost nova-conductor DEBUG
> >>> >>>> [pid:15755] [MainThread] [tid:178645104] [log.py:104 debug]
> >>> >>>> [req-ac8017d7-c97e-44ca-92d2-99f180858e14] Connection <connection
> >>> >>>> object at 0x111d9740; closed: 0> checked out from pool
> >>> >>>>
> >>> >>>>
> >>> >>>> On Tuesday, May 8, 2018 at 10:18:52 PM UTC+8, jiajun...@gmail.com wrote:
> >>> >>>>>
> >>> >>>>> I added logs in sqlalchemy/pool.py.
> >>> >>>>>
> >>> >>>>> QueuePool.checkout:
> >>> >>>>>
> >>> >>>>>     @classmethod
> >>> >>>>>     def checkout(cls, pool):
> >>> >>>>>         rec = pool._do_get()
> >>> >>>>>         try:
> >>> >>>>>             dbapi_connection = rec.get_connection()
> >>> >>>>>             pool.logger.debug("jjs get conn success %s",
> >>> >>>>>                               dbapi_connection)
> >>> >>>>>         except:
> >>> >>>>>             pool.logger.debug("jjs get error in get_connection")
> >>> >>>>>             with util.safe_reraise():
> >>> >>>>>                 rec.checkin()
> >>> >>>>>         echo = pool._should_log_debug()
> >>> >>>>>         fairy = _ConnectionFairy(dbapi_connection, rec, echo)
> >>> >>>>>         rec.fairy_ref = weakref.ref(
> >>> >>>>>             fairy,
> >>> >>>>>             lambda ref: _finalize_fairy and
> >>> >>>>>             _finalize_fairy(
> >>> >>>>>                 dbapi_connection,
> >>> >>>>>                 rec, pool, ref, echo)
> >>> >>>>>         )
> >>> >>>>>         _refs.add(rec)
> >>> >>>>>         if echo:
> >>> >>>>>             pool.logger.debug("jjs conn %s, pool status %s",
> >>> >>>>>                               dbapi_connection, pool.status())
> >>> >>>>>             pool.logger.debug("Connection %r checked out from pool",
> >>> >>>>>                               dbapi_connection)
> >>> >>>>>         return fairy
> >>> >>>>>
> >>> >>>>> Then I got the logs below:
> >>> >>>>>
> >>> >>>>> coroutine A:
> >>> >>>>>
> >>> >>>>> 2018-05-08T21:39:59.055+08:00 localhost nova-conductor INFO
> >>> >>>>> [pid:55528] [MainThread] [tid:197123760] [log.py:109 info]
> >>> >>>>> [req-ba1ab897-c80a-4bc4-922a-39044d669da1] jjs Pool wait is False,
> >>> >>>>> timeout 30
> >>> >>>>> 2018-05-08T21:39:59.078+08:00 localhost nova-conductor DEBUG
> >>> >>>>> [pid:55528] [MainThread] [tid:197123760] [log.py:104 debug]
> >>> >>>>> [req-ba1ab897-c80a-4bc4-922a-39044d669da1] Created new connection
> >>> >>>>> <connection object at 0x903f8c0; closed: 0>
> >>> >>>>> 2018-05-08T21:39:59.079+08:00 localhost nova-conductor DEBUG
> >>> >>>>> [pid:55528] [MainThread] [tid:197123760] [log.py:104 debug]
> >>> >>>>> [req-ba1ab897-c80a-4bc4-922a-39044d669da1] jjs get conn success
> >>> >>>>> <connection object at 0x903f8c0; closed: 0>
> >>> >>>>> 2018-05-08T21:39:59.080+08:00 localhost nova-conductor DEBUG
> >>> >>>>> [pid:55528] [MainThread] [tid:197123760] [log.py:104 debug]
> >>> >>>>> [req-ba1ab897-c80a-4bc4-922a-39044d669da1] jjs conn <connection
> >>> >>>>> object at 0x903f8c0; closed: 0>, pool status Pool size: 10
> >>> >>>>> Connections in pool: 1 Current Overflow: 36 Max Overflow: 40
> >>> >>>>> Current Checked out connections: 45
> >>> >>>>> 2018-05-08T21:39:59.080+08:00 localhost nova-conductor DEBUG
> >>> >>>>> [pid:55528] [MainThread] [tid:197123760] [log.py:104 debug]
> >>> >>>>> [req-ba1ab897-c80a-4bc4-922a-39044d669da1] Connection <connection
> >>> >>>>> object at 0x903f8c0; closed: 0> checked out from pool
> >>> >>>>>
> >>> >>>>> coroutine B:
> >>> >>>>>
> >>> >>>>> 2018-05-08T21:39:59.212+08:00 localhost nova-conductor INFO
> >>> >>>>> [pid:55528] [MainThread] [tid:151519312] [log.py:109 info]
> >>> >>>>> [req-a78075b5-c55c-40c4-86d0-65957e89cb56] jjs Pool wait is False,
> >>> >>>>> timeout 30
> >>> >>>>> 2018-05-08T21:39:59.213+08:00 localhost nova-conductor DEBUG
> >>> >>>>> [pid:55528] [MainThread] [tid:151519312] [log.py:104 debug]
> >>> >>>>> [req-a78075b5-c55c-40c4-86d0-65957e89cb56] jjs get conn success
> >>> >>>>> <connection object at 0x903f8c0; closed: 0>
> >>> >>>>> 2018-05-08T21:39:59.213+08:00 localhost nova-conductor DEBUG
> >>> >>>>> [pid:55528] [MainThread] [tid:151519312] [log.py:104 debug]
> >>> >>>>> [req-a78075b5-c55c-40c4-86d0-65957e89cb56] jjs conn <connection
> >>> >>>>> object at 0x903f8c0; closed: 0>, pool status Pool size: 10
> >>> >>>>> Connections in pool: 0 Current Overflow: 36 Max Overflow: 40
> >>> >>>>> Current Checked out connections: 46
> >>> >>>>> 2018-05-08T21:39:59.214+08:00 localhost nova-conductor DEBUG
> >>> >>>>> [pid:55528] [MainThread] [tid:151519312] [log.py:104 debug]
> >>> >>>>> [req-a78075b5-c55c-40c4-86d0-65957e89cb56] Connection <connection
> >>> >>>>> object at 0x903f8c0; closed: 0> checked out from pool
> >>> >>>>>
> >>> >>>>> It seems that when we create_connection in QueuePool._do_get, we may
> >>> >>>>> not be holding the lock _pool.mutex (~Queue.mutex).
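
The interleaving above is consistent with how eventlet behaves: a greenthread switches out at any blocking I/O point, so two greenthreads can each be partway through checkout before either finishes. A minimal sketch of that switching behavior, assuming eventlet (the worker function is hypothetical, not pool code):

    import eventlet

    def checkout(name):
        print(name, "entered checkout")
        # Any blocking I/O (e.g. creating a DB connection) yields control
        # to another greenthread; eventlet.sleep(0) stands in for that.
        eventlet.sleep(0)
        print(name, "finished checkout")

    pool = eventlet.GreenPool()
    pool.spawn(checkout, "coroutine A")
    pool.spawn(checkout, "coroutine B")
    pool.waitall()
    # Output interleaves: both greenthreads enter checkout before either finishes.
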
> >>> >>>>>
> >>> >>>>> On Monday, May 7, 2018 at 8:41:14 PM UTC+8, Mike Bayer wrote:
> >>> >>>>>>
> >>> >>>>>> On Mon, May 7, 2018, 7:01 AM <jiajun...@gmail.com> wrote:
> >>> >>>>>>>
> >>> >>>>>>> We added a coroutine_id in psycopg2, and found that two coroutines
> >>> >>>>>>> use the same connection before the pthread_mutex_lock is released.
> >>> >>>>>>>
> >>> >>>>>>> Maybe something in the connection pool goes wrong?
> >>> >>>>>>> [pid] [coroutine_id] msg
> >>> >>>>>>
> >>> >>>>>> This seems unlikely. Openstack uses the SQLAlchemy connection pool
> >>> >>>>>> with eventlet patching in conjunction with the MySQL drivers, and
> >>> >>>>>> nothing like that happens. The biggest issue SQLAlchemy had with
> >>> >>>>>> greenlets is that a timed-out greenlet would throw GreenletExit and
> >>> >>>>>> cause a dirty connection to go back into the pool; that was fixed
> >>> >>>>>> in 1.2.
> >>> >>>>>>
> >>> >>>>>>>
> >>> >>>>>>> [49174] [0xa5db730]before PyObject_CallFunctionObjArgs conn 0x94122f0, cb 0x23ad320
> >>> >>>>>>> [49174] [0xa5db730]conn_poll: status = 2, conn 0x94122f0
> >>> >>>>>>> [49174] [0xa5db730]conn_poll: async_status = ASYNC_WRITE 0x94122f0
> >>> >>>>>>> [49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS conn 0x94122f0, async_cursor 0x881ac00
> >>> >>>>>>> [49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS conn 0x94122f0, async_cursor 0x881ac00
> >>> >>>>>>> [49174] [0xa5db4b0]pq_abort: enter pgconn = 0x94122f0, autocommit = 0, status = 2
> >>> >>>>>>> [49174] [0xa5db4b0]before lock pgconn = 0x94122f0, owner 49174 __lock 1
> >>> >>>>>>>
> >>> >>>>>>> Below are the logs grepped by coroutine id:
> >>> >>>>>>>
> >>> >>>>>>> [49174] [0xa5db730]finish send query, before psyco_wait, conn 0x94122f0
> >>> >>>>>>> [49174] [0xa5db730]before have_wait_callback conn 0x94122f0
> >>> >>>>>>> [49174] [0xa5db730]before PyObject_CallFunctionObjArgs conn 0x94122f0, cb 0x23ad320
> >>> >>>>>>> [49174] [0xa5db730]conn_poll: status = 2, conn 0x94122f0
> >>> >>>>>>> [49174] [0xa5db730]conn_poll: async_status = ASYNC_WRITE 0x94122f0
> >>> >>>>>>> [49174] [0xa5db730]conn_poll: poll writing
> >>> >>>>>>> [49174] [0xa5db730]conn_poll: async_status -> ASYNC_READ
> >>> >>>>>>>
> >>> >>>>>>> -----
> >>> >>>>>>>
> >>> >>>>>>> [49174] [0xa5db4b0]psyco_conn_cursor: new unnamed cursor for connection at 0x8de2d30
> >>> >>>>>>> [49174] [0xa5db4b0]cursor_setup: init cursor object at 0xa6c2650
> >>> >>>>>>> [49174] [0xa5db4b0]cursor_setup: parameters: name = (null), conn = 0x8de2d30
> >>> >>>>>>> [49174] [0xa5db4b0]cursor_setup: good cursor object at 0xa6c2650, refcnt = 1
> >>> >>>>>>> [49174] [0xa5db4b0]psyco_conn_cursor: new cursor at 0xa6c2650: refcnt = 1
> >>> >>>>>>> [49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS conn 0x94122f0, async_cursor 0x881ac00
> >>> >>>>>>> [49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS conn 0x94122f0, async_cursor 0x881ac00
> >>> >>>>>>> [49174] [0xa5db4b0]pq_abort: enter pgconn = 0x94122f0, autocommit = 0, status = 2
> >>> >>>>>>> [49174] [0xa5db4b0]before lock pgconn = 0x94122f0, owner 49174 __lock 1
> >>> >>>>>>>
> >>> >>>>>>>
> >>> >>>>>>> On Saturday, April 28, 2018 at 4:07:34 PM UTC+8, jiajun...@gmail.com wrote:
> >>> >>>>>>>>
> >>> >>>>>>>> We reproduced this problem and added logs in psycopg2, found a
> >>> >>>>>>>> confusing thing, and reported it to psycopg2.
> >>> >>>>>>>>
> >>> >>>>>>>> It seems the sqlalchemy & psycopg2 stack flow is:
> >>> >>>>>>>> 1. from sqlalchemy to psycopg2: pq_execute/pq_commit
> >>> >>>>>>>> 2. from psycopg2 to sqlalchemy: PyWeakref_NewRef(conn)
> >>> >>>>>>>> 3. get an exception in sqlalchemy, and do_rollback
> >>> >>>>>>>> 4. from sqlalchemy to psycopg2: pq_abort
> >>> >>>>>>>> 5. psycopg2 gets a pthread_mutex_lock deadlock
> >>> >>>>>>>>
> >>> >>>>>>>> What we did to reproduce it: stop the master pg-server and promote
> >>> >>>>>>>> the slave pg-server to master, moving the FIP from the old master
> >>> >>>>>>>> to the slave. At the same time, let nova-conductor issue quite a
> >>> >>>>>>>> lot of DB query requests.
> >>> >>>>>>>>
> >>> >>>>>>>> [1] https://github.com/psycopg/psycopg2/issues/703
> >>> >>>>>>>>
> >>> >>>>>>>> On Monday, April 23, 2018 at 9:45:04 PM UTC+8, Mike Bayer wrote:
> >>> >>>>>>>>>
> >>> >>>>>>>>> On Mon, Apr 23, 2018 at 9:03 AM, <jiajun...@gmail.com> wrote:
> >>> >>>>>>>>> > Sorry for replying on this topic.
> >>> >>>>>>>>> >
> >>> >>>>>>>>> > We recently got the same problem in our production environment.
> >>> >>>>>>>>> >
> >>> >>>>>>>>> > I found a patch in another lib [1]; they added conn.close() in
> >>> >>>>>>>>> > the psycopg2.ProgrammingError exception handler.
> >>> >>>>>>>>> >
> >>> >>>>>>>>> > Shall we do the same in [2]?
> >>> >>>>>>>>>
> >>> >>>>>>>>> SQLAlchemy does things much more carefully than that; we parse
> >>> >>>>>>>>> the error message for specific ones that correspond to
> >>> >>>>>>>>> "connection is no longer usable". We call these "is_disconnect",
> >>> >>>>>>>>> but it can be any invalidating condition.
> >>> >>>>>>>>>
> >>> >>>>>>>>> You can make these yourself, and they can also be made to be
> >>> >>>>>>>>> part of oslo.db, using the handle_error event:
> >>> >>>>>>>>>
> >>> >>>>>>>>> http://docs.sqlalchemy.org/en/latest/core/events.html?highlight=handle_error#sqlalchemy.events.ConnectionEvents.handle_error
> >>> >>>>>>>>>
> >>> >>>>>>>>> within oslo.db you would want to propose a change here:
> >>> >>>>>>>>>
> >>> >>>>>>>>> https://github.com/openstack/oslo.db/blob/master/oslo_db/sqlalchemy/exc_filters.py#L387
> >>> >>>>>>>>>
> >>> >>>>>>>>> > [1] https://github.com/aio-libs/aiopg/pull/415/files?diff=split
> >>> >>>>>>>>> >
> >>> >>>>>>>>> > [2] https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/engine/base.py#L1289
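
A minimal sketch of the handle_error approach suggested above, treating the "connection pointer is NULL" message seen earlier in this thread as a disconnect condition; the string match is illustrative only, not an official oslo.db filter:

    from sqlalchemy import create_engine, event

    engine = create_engine("postgresql://scott:tiger@localhost/test")

    @event.listens_for(engine, "handle_error")
    def mark_disconnect(context):
        # context is an ExceptionContext; setting is_disconnect tells the
        # pool to treat this otherwise-unrecognized error as invalidating
        if "connection pointer is NULL" in str(context.original_exception):
            context.is_disconnect = True
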
> >>> >>>>>>>>> >
> >>> >>>>>>>>> > On Monday, November 13, 2017 at 10:44:31 AM UTC+8, JinRong Cai wrote:
> >>> >>>>>>>>> >>
> >>> >>>>>>>>> >> Hi Michael,
> >>> >>>>>>>>> >>
> >>> >>>>>>>>> >> I am using openstack with postgresql, where the sqlalchemy and
> >>> >>>>>>>>> >> oslo_db modules are used. And there are some problems after my
> >>> >>>>>>>>> >> pg database switched over.
> >>> >>>>>>>>> >>
> >>> >>>>>>>>> >> Here is my switchover process:
> >>> >>>>>>>>> >> 1. nova-conductor (python application) is running with DB
> >>> >>>>>>>>> >> connection strings pointing to the VIP, which is in the
> >>> >>>>>>>>> >> primary site (A) of pg.
> >>> >>>>>>>>> >> 2. switch the VIP from the primary (A) to the new primary (B).
> >>> >>>>>>>>> >> 3. switch over pg: shut down the primary (A), promote the
> >>> >>>>>>>>> >> standby (B) to the new primary.
> >>> >>>>>>>>> >> 4. nova-conductor is running during the whole process.
> >>> >>>>>>>>> >>
> >>> >>>>>>>>> >> After some seconds, I found some nova-conductor processes hung
> >>> >>>>>>>>> >> with status futex_wait_queue_me, and the status of the query in
> >>> >>>>>>>>> >> the DB was "idle in transaction": the transaction was not
> >>> >>>>>>>>> >> committed or rolled back!
> >>> >>>>>>>>> >> I think disconnection is handled in oslo_db, which sends a
> >>> >>>>>>>>> >> ping (SELECT 1) to the DB.
> >>> >>>>>>>>> >>
> >>> >>>>>>>>> >> If the DB is switched over, the connections in the pool would
> >>> >>>>>>>>> >> be marked invalid, and reconnect after the next checkout.
> >>> >>>>>>>>> >>
> >>> >>>>>>>>> >> ### error messages from nova-conductor
> >>> >>>>>>>>> >> localhost nova-conductor ERROR [pid:36365] [MainThread]
> >>> >>>>>>>>> >> [tid:122397712] [exc_filters.py:330 _raise_for_remaining_DBAPIError]
> >>> >>>>>>>>> >> [req-2bd8a290-e17b-4178-80a6-4b36d5793d85] DBAPIError exception
> >>> >>>>>>>>> >> wrapped from (psycopg2.ProgrammingError) execute cannot be used
> >>> >>>>>>>>> >> while an asynchronous query is underway [SQL: 'SELECT 1']
> >>> >>>>>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters Traceback (most recent call last):
> >>> >>>>>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters   File
> >>> >>>>>>>>> >> "/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
> >>> >>>>>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters     context)
> >>> >>>>>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters   File
> >>> >>>>>>>>> >> "/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
> >>> >>>>>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters     cursor.execute(statement, parameters)
> >>> >>>>>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters ProgrammingError: execute
> >>> >>>>>>>>> >> cannot be used while an asynchronous query is underway
> >>> >>>>>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters
> >>> >>>>>>>>> >> localhost nova-conductor ERROR [pid:36365] [MainThread]
> >>> >>>>>>>>> >> [tid:122397712] [log.py:122 error]
> >>> >>>>>>>>> >> [req-2bd8a290-e17b-4178-80a6-4b36d5793d85] Error closing cursor
> >>> >>>>>>>>> >> 36365 ERROR sqlalchemy.pool.QueuePool Traceback (most recent call last):
> >>> >>>>>>>>> >> 36365 ERROR sqlalchemy.pool.QueuePool   File
> >>> >>>>>>>>> >> "/python2.7/site-packages/sqlalchemy/engine/base.py", line 1226, in _safe_close_cursor
> >>> >>>>>>>>> >> 36365 ERROR sqlalchemy.pool.QueuePool     cursor.close()
> >>> >>>>>>>>> >> 36365 ERROR sqlalchemy.pool.QueuePool ProgrammingError: close cannot be
> >>> >>>>>>>>> >> used while an asynchronous query is underway
> >>> >>>>>>>>> >> 36365 ERROR sqlalchemy.pool.QueuePool
> >>> >>>>>>>>> >>
> >>> >>>>>>>>> >> ### ps status of nova-conductor
> >>> >>>>>>>>> >> POD6-Mongodb03:/var/log/uvp-getosstat/statistics20171106101500log # cat /proc/33316/stack
> >>> >>>>>>>>> >> [<ffffffff810e4c24>] futex_wait_queue_me+0xc4/0x120
> >>> >>>>>>>>> >> [<ffffffff810e5799>] futex_wait+0x179/0x280
> >>> >>>>>>>>> >> [<ffffffff810e782e>] do_futex+0xfe/0x5b0
> >>> >>>>>>>>> >> [<ffffffff810e7d60>] SyS_futex+0x80/0x180
> >>> >>>>>>>>> >> [<ffffffff81654e09>] system_call_fastpath+0x16/0x1b
> >>> >>>>>>>>> >> [<ffffffffffffffff>] 0xffffffffffffffff
> >>> >>>>>>>>> >>
> >>> >>>>>>>>> >> ### stack of the nova-conductor process
> >>> >>>>>>>>> >> POD6-Mongodb03:/tmp # pstack 33316
> >>> >>>>>>>>> >> #0 0x00002b8449e35f4d in __lll_lock_wait () from /lib64/libpthread.so.0
> >>> >>>>>>>>> >> #1 0x00002b8449e31d02 in _L_lock_791 () from /lib64/libpthread.so.0
> >>> >>>>>>>>> >> #2 0x00002b8449e31c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
> >>> >>>>>>>>> >> #3 0x00002b84554c44ab in pq_abort () from /python2.7/site-packages/psycopg2/_psycopg.so
> >>> >>>>>>>>> >> #4 0x00002b84554c955e in psyco_conn_rollback () from /python2.7/site-packages/psycopg2/_psycopg.so
> >>> >>>>>>>>> >> #5 0x00002b8449b42b50 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
> >>> >>>>>>>>> >> #6 0x00002b8449b42ad0 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
> >>> >>>>>>>>> >>
> >>> >>>>>>>>> >> psycopg2 was trying to close the cursor and to take the mutex
> >>> >>>>>>>>> >> lock (pthread_mutex_lock), but it seems the cursor was being
> >>> >>>>>>>>> >> used by another session.
> >>> >>>>>>>>> >>
> >>> >>>>>>>>> >> Questions:
> >>> >>>>>>>>> >>
> >>> >>>>>>>>> >> 1. What does the error "ProgrammingError: close cannot be used
> >>> >>>>>>>>> >> while an asynchronous query is underway" mean?
> >>> >>>>>>>>> >> AFAIK, it is raised by psycopg2 and means an asynchronous query
> >>> >>>>>>>>> >> was already executing on the connection.
> >>> >>>>>>>>> >> But I think sqlalchemy is green-thread safe, since it is patched
> >>> >>>>>>>>> >> by eventlet; see details in eventlet/support/psycopg2_patcher.py.
> >>> >>>>>>>>> >> And we can see different green thread numbers in the log, e.g.:
> >>> >>>>>>>>> >> [pid:36365] [MainThread] [tid:122397712]
> >>> >>>>>>>>> >> [pid:36365] [MainThread] [tid:122397815]
> >>> >>>>>>>>> >> So, I guess the connection pool within one process is safe.
> >>> >>>>>>>>> >>
> >>> >>>>>>>>> >> 2. nova-conductor is a multi-threaded python client, which forks
> >>> >>>>>>>>> >> several child processes.
> >>> >>>>>>>>> >> ps -elf | grep -i nova-conductor
> >>> >>>>>>>>> >> 30878 1 pool_s /usr/bin/nova-conductor
> >>> >>>>>>>>> >> 36364 1 ep_pol /usr/bin/nova-conductor
> >>> >>>>>>>>> >> 36365 1 futex_ /usr/bin/nova-conductor
> >>> >>>>>>>>> >> 36366 1 ep_pol /usr/bin/nova-conductor
> >>> >>>>>>>>> >> 36367 1 ep_pol /usr/bin/nova-conductor
> >>> >>>>>>>>> >> 36368 1 ep_pol /usr/bin/nova-conductor
> >>> >>>>>>>>> >>
> >>> >>>>>>>>> >> If nova-conductor is started with only one child, the problem
> >>> >>>>>>>>> >> does not happen.
> >>> >>>>>>>>> >> Does this mean the connection/engine CANNOT be shared among
> >>> >>>>>>>>> >> these child processes?
> >>> >>>>>>>>> >>
> >>> >>>>>>>>> >> Thanks.
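
On question 2: SQLAlchemy's general guidance is that an engine's pooled connections must not be shared across a fork boundary. A minimal sketch of the usual pattern when the engine exists before the fork, assuming os.fork()-style workers (the connection URL is illustrative):

    import os
    from sqlalchemy import create_engine

    engine = create_engine("postgresql://scott:tiger@localhost/test")

    pid = os.fork()
    if pid == 0:
        # child process: drop the pooled connections inherited from the
        # parent so the child opens fresh ones instead of sharing the
        # parent's socket descriptors
        engine.dispose()
        # ... child continues; the engine now creates its own connections ...
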