OK, I also need the exact pool.py you are using, including all these logging statements and any other adjustments that were made.
On Mon, May 14, 2018 at 8:02 AM, <[email protected]> wrote:
> I've uploaded the logs here:
> https://raw.githubusercontent.com/jiajunsu/test_logs/master/nc_5679.log
>
> The ConnectionRecord is 0x70cdb10, tid (coroutine id) is 130446032
>
> At 2018-05-14T11:11:39.970 we got the log "jjs rec already in pool
> checkin", but I had deleted the `return` from the code, so the problem
> happened again.
>
> On Monday, May 14, 2018 at 10:48:49 AM UTC+8, Mike Bayer wrote:
>>
>> as well as the log that shows this:
>>
>> +        if self in pool._pool.queue:
>> +            pool.logger.debug("jjs rec already in pool checkin, skip %r", self)
>> +            return
>>
>> actually happening.
>>
>> On Sun, May 13, 2018 at 10:42 PM, Mike Bayer <[email protected]> wrote:
>> > On Sun, May 13, 2018 at 10:35 PM, <[email protected]> wrote:
>> >> "jjs" is short for my name and "cjr" is my colleague's (JinRong Cai).
>> >> We work together and add logs in the same environment, prefixing our
>> >> names to get a keyword for log filtering.
>> >>
>> >> The doubled ConnectionRecords (with None connection) are obtained by
>> >> two different coroutines, and the variable record.connection is
>> >> rewritten by the second coroutine.
>> >> After the record is put back into the QueuePool again, there are two
>> >> copies of the same record with the same connection in the pool, which
>> >> makes the next two coroutines conflict and deadlock in psycopg2 as above.
>> >
>> > OK, I need the detailed step-by-step that shows how the two coroutines
>> > are interacting such that they both are returning the same connection.
>> > The nova log above seems to show just one coroutine, I assume, since
>> > there is just one nova request ID, if I understand correctly.
>> >
>> >> The psycopg2 logs have been pasted at
>> >> https://github.com/psycopg/psycopg2/issues/703
>> >
>> > The above issue says the bug has been found in SQLAlchemy, but there is
>> > no bug report yet and I've not yet been shown how the behavior here is
>> > possible. I posted a straightforward example above; can you
>> > please show me where the two coroutines get mixed up?
>> >
>> >> On Friday, May 11, 2018 at 3:05:46 AM UTC+8, Mike Bayer wrote:
>> >>>
>> >>> I tried to follow your log lines, which is difficult because I don't
>> >>> know where those log statements actually are and I don't know what
>> >>> acronyms like "jjs" and "cjr" mean. But it does look like it's all in
>> >>> one thread (so not async GC) and all in one Nova request (so perhaps
>> >>> just one greenlet involved, unless other greenlets are affecting this
>> >>> without being displayed in this log snippet). I do see the
>> >>> connection record doubled at the end. So here is an illustration of
>> >>> those steps in isolation; the connection record is not doubled. Can
>> >>> you alter this program to show this condition occurring?
>> >>>
>> >>> from sqlalchemy import create_engine
>> >>>
>> >>> e = create_engine("postgresql://scott:tiger@localhost/test",
>> >>>                   echo_pool='debug')
>> >>> e.connect().close()
>> >>>
>> >>> assert len(e.pool._pool.queue) == 1
>> >>>
>> >>> c1 = e.connect()
>> >>>
>> >>> assert len(e.pool._pool.queue) == 0
>> >>>
>> >>> c1.engine.pool._invalidate(c1._Connection__connection, None)
>> >>> c1.invalidate(e)
>> >>>
>> >>> # connection record is back
>> >>> assert len(e.pool._pool.queue) == 1
>> >>>
>> >>> # do a reconnect
>> >>> c1.connection
>> >>>
>> >>> # uses the record again
>> >>> assert len(e.pool._pool.queue) == 0
>> >>>
>> >>> # close the Connection
>> >>> c1.close()
>> >>>
>> >>> # record is back
>> >>> assert len(e.pool._pool.queue) == 1
>> >>>
>> >>> # etc.
>> >>> c1 = e.connect()
>> >>> assert len(e.pool._pool.queue) == 0
>> >>>
>> >>> c1.close()
>> >>> assert len(e.pool._pool.queue) == 1
>> >>>
>> >>> On Thu, May 10, 2018 at 12:25 PM, Mike Bayer <[email protected]> wrote:
>> >>> > Here's a print statement you might want to try:
>> >>> >
>> >>> > diff --git a/lib/sqlalchemy/pool.py b/lib/sqlalchemy/pool.py
>> >>> > index 89a4cea7c..452db4883 100644
>> >>> > --- a/lib/sqlalchemy/pool.py
>> >>> > +++ b/lib/sqlalchemy/pool.py
>> >>> > @@ -689,6 +689,19 @@ def _finalize_fairy(connection, connection_record,
>> >>> >          been garbage collected.
>> >>> >
>> >>> >          """
>> >>> > +    print(
>> >>> > +        "FINALIZE FAIRY: GREENLET ID: %s, DB CONN: %s, FAIRY: %s, REF: %s "
>> >>> > +        "CONNECTION_REC: %s CONNECTION_REC REF %s" % (
>> >>> > +            "put greenlet id here",
>> >>> > +            connection,
>> >>> > +            fairy,
>> >>> > +            ref,
>> >>> > +            connection_record,
>> >>> > +            connection_record.fairy_ref
>> >>> > +            if connection_record is not None else "n/a"
>> >>> > +        )
>> >>> > +    )
>> >>> > +
>> >>> >      _refs.discard(connection_record)
>> >>> >
>> >>> >      if ref is not None:
>> >>> >
>> >>> > On Thu, May 10, 2018 at 12:21 PM, Mike Bayer <[email protected]> wrote:
>> >>> >> On Thu, May 10, 2018 at 7:23 AM, <[email protected]> wrote:
>> >>> >>> Hi Mike,
>> >>> >>>
>> >>> >>> I added more logs and finally found the reason, and a method to
>> >>> >>> avoid the deadlock between coroutines.
>> >>> >>>
>> >>> >>> In summary, the deadlock happened because two different coroutines
>> >>> >>> used the same ConnectionRecord with the same connection.
>> >>> >>> It happens under QueuePool, when the DB connections are reset by
>> >>> >>> the DB server (not related to the backend driver).
>> >>> >>>
>> >>> >>> 1. A coroutine gets a ConnectionRecord into a ConnectionFairy, and
>> >>> >>> applies a weakref to the ConnectionFairy, with callback function
>> >>> >>> `_finalize_fairy`.
>> >>> >>> 2. When the connection gets disconnected, it receives an
>> >>> >>> exception (psycopg2.OperationalError), and goes to
>> >>> >>> Connection._handle_dbapi_exception.
>> >>> >>> 3. In `_handle_dbapi_exception`'s `finally`, it calls both
>> >>> >>> `ConnectionFairy.invalidate` and `self.invalidate`.
>> >>> >>> 4. In `ConnectionFairy.invalidate`, `_ConnectionRecord._checkin` is
>> >>> >>> called, which puts the ConnectionRecord back into the QueuePool.
>> >>> >>> 5. In `self.invalidate`, `del self._root.__connection` is called,
>> >>> >>> which **del**s the ConnectionFairy.
>> >>> >>> 6. The **del** operation makes the weakref to the ConnectionFairy
>> >>> >>> die, and the callback `_finalize_fairy` executes.
>> >>> >>> 7. In `_finalize_fairy`, another `_ConnectionRecord._checkin` is
>> >>> >>> called, which puts the ConnectionRecord into the QueuePool again.
>> >>> >>
>> >>> >> Here's what does not make sense. You are claiming that a simple
>> >>> >> invalidation will result in the same ConnectionRecord being checked
>> >>> >> in twice. This is obviously not the case, so what is special in your
>> >>> >> situation that makes this happen?
>> >>> >>
>> >>> >> If I follow the steps you refer to, in step 4, when
>> >>> >> ConnectionFairy._checkin is called, that calls _finalize_fairy
>> >>> >> directly, which then calls ConnectionRecord.checkin(). Within
>> >>> >> ConnectionRecord.checkin(), ConnectionRecord.fairy_ref is set to
>> >>> >> None. Then back in ConnectionFairy._checkin(), it also sets
>> >>> >> self._connection_record = None.
>> >>> >>
>> >>> >> If we come back into _finalize_fairy a *second* time then, as a
>> >>> >> result of the "del" which occurs in _handle_error(), and the
>> >>> >> connection_record is present, the function will exit immediately,
>> >>> >> because:
>> >>> >>
>> >>> >>     if connection_record.fairy_ref is not ref:
>> >>> >>         return
>> >>> >>
>> >>> >> So the situation as given is not possible without concurrent access
>> >>> >> to the ConnectionFairy being introduced externally to the pool.
>> >>> >>
>> >>> >> There is exactly one thing I can think of that is both unique to the
>> >>> >> psycopg2-style of asynchronous connection, not the PyMySQL form of
>> >>> >> it, that might be important here, and it's that I think psycopg2's
>> >>> >> connection.close() method will defer to another greenlet. That means
>> >>> >> when we're inside of ConnectionRecord.invalidate(), and we call
>> >>> >> self.__close(), that could be deferring to another greenlet. But
>> >>> >> this all happens before the connection is returned to the pool, so I
>> >>> >> still don't know what is happening.
>> >>> >>
>> >>> >>> The detailed logs are below. To prove my conclusion is right, I
>> >>> >>> tried to avoid the scenario with this patch, and it worked.
>> >>> >>>
>> >>> >>>     def checkin(self):
>> >>> >>>         self.fairy_ref = None
>> >>> >>>         connection = self.connection
>> >>> >>>         pool = self.__pool
>> >>> >>>
>> >>> >>>         while self.finalize_callback:
>> >>> >>>             finalizer = self.finalize_callback.pop()
>> >>> >>>             finalizer(connection)
>> >>> >>> +       if self in pool._pool.queue:
>> >>> >>> +           pool.logger.debug("jjs rec already in pool checkin, skip %r", self)
>> >>> >>> +           return
>> >>> >>
>> >>> >> as mentioned above, it's not possible to reach this point without
>> >>> >> concurrent access to the pooled connection, which means this fix
>> >>> >> could fail also (with at least threaded-style concurrency, not
>> >>> >> greenlet style). The original source of the greenlet issue must
>> >>> >> still be identified.
>> >>> >>
>> >>> >> I would need a self-contained proof of concept that demonstrates
>> >>> >> this issue occurring. The log output is not that helpful because it
>> >>> >> at least would need to write out the identifier of the current
>> >>> >> greenlet, since it is very likely that greenlets have something to
>> >>> >> do with this issue.
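[For reference, a minimal sketch of one way to stamp each pool log line with the current greenlet identifier, assuming eventlet is in use as elsewhere in this thread; the filter class and the greenlet_id field name are illustrative, not SQLAlchemy or eventlet API:]

    import logging

    from eventlet import greenthread

    class GreenletIdFilter(logging.Filter):
        # attach id() of the current greenlet to every record passing through
        def filter(self, record):
            record.greenlet_id = id(greenthread.getcurrent())
            return True

    handler = logging.StreamHandler()
    handler.addFilter(GreenletIdFilter())
    handler.setFormatter(logging.Formatter(
        "%(asctime)s [greenlet:%(greenlet_id)s] %(name)s %(message)s"))

    pool_logger = logging.getLogger("sqlalchemy.pool")
    pool_logger.addHandler(handler)
    pool_logger.setLevel(logging.DEBUG)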
>> >>> >>>
>> >>> >>>         if pool.dispatch.checkin:
>> >>> >>>             pool.dispatch.checkin(connection, self)
>> >>> >>>         pool._return_conn(self)
>> >>> >>>
>> >>> >>> Trace the ConnectionRecord 0x88fc5d0:
>> >>> >>>
>> >>> >>> 2018-05-10T14:02:17.927+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Created new connection <connection object at 0x81f69b0; closed: 0>
>> >>> >>> 2018-05-10T14:02:17.928+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs get conn success <connection object at 0x81f69b0; closed: 0>, rec.connection <connection object at 0x81f69b0; closed: 0>, pool queue deque([<sqlalchemy.pool._ConnectionRecord object at 0x7a04810>, <sqlalchemy.pool._ConnectionRecord object at 0x79fdad0>])
>> >>> >>> 2018-05-10T14:02:17.929+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs checkout fairy <sqlalchemy.pool._ConnectionFairy object at 0x329c0d0> rec <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0> pool queue deque([<sqlalchemy.pool._ConnectionRecord object at 0x7a04810>, <sqlalchemy.pool._ConnectionRecord object at 0x79fdad0>])
>> >>> >>> 2018-05-10T14:02:17.929+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Connection <connection object at 0x81f69b0; closed: 0> checked out from pool, status Pool size: 10 Connections in pool: 2 Current Overflow: 0 Max Overflow: 40 Current Checked out connections: 8, queue deque([<sqlalchemy.pool._ConnectionRecord object at 0x7a04810>, <sqlalchemy.pool._ConnectionRecord object at 0x79fdad0>])
>> >>> >>> 2018-05-10T14:02:17.930+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs _ConnectionFairy._checkout conn <connection object at 0x81f69b0; closed: 0>, rec <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>, rec.conn <connection object at 0x81f69b0; closed: 0>
>> >>> >>> 2018-05-10T14:02:49.716+08:00 localhost nova-conductor ERROR [pid:4528] [MainThread] [tid:122769072] [log.py:122 error] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] cjr error conn <sqlalchemy.engine.base.Connection object at 0x329c150>, err connection pointer is NULL
>> >>> >>> 2018-05-10T14:02:49.717+08:00 localhost nova-conductor ERROR [pid:4528] [MainThread] [tid:122769072] [log.py:122 error] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] cjr enter newraise
>> >>> >>> 2018-05-10T14:02:49.717+08:00 localhost nova-conductor ERROR [pid:4528] [MainThread] [tid:122769072] [log.py:122 error] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] cjr enter finally
>> >>> >>> >>Connection._handle_dbapi_exception --> finally --> **self.engine.pool._invalidate** & **self.invalidate**
>> >>> >>> >>self.engine.pool._invalidate --> Pool._invalidate --> ConnectionFairy.invalidate --> _ConnectionRecord.invalidate & **self._checkin()**
>> >>> >>> >>self.invalidate() --> **del self._root.__connection**
>> >>> >>> 2018-05-10T14:02:49.718+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs connectionFairy invalidate
>> >>> >>> 2018-05-10T14:02:49.718+08:00 localhost nova-conductor INFO [pid:4528] [MainThread] [tid:122769072] [log.py:109 info] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Invalidate connection <connection object at 0x81f69b0; closed: 2> (reason: OperationalError:connection pointer is NULL
>> >>> >>> >>_ConnectionRecord.invalidate(e) --> self.__pool.dispatch.invalidate(self.connection, self, e) & self.__close() & self.connection = None
>> >>> >>> >>self.__close() --> self.finalize_callback.clear() & self.__pool._close_connection(self.connection)
>> >>> >>> 2018-05-10T14:02:49.719+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Closing connection <connection object at 0x81f69b0; closed: 2>
>> >>> >>> >>Pool._close_connection(connection) --> self._dialect.do_close(connection) --> _ConnDialect.do_close(dbapi_connection) --> dbapi_connection.close() FIN
>> >>> >>> 2018-05-10T14:02:49.719+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs _ConnectionFairy._checkin conn None rec <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0> rec.conn None
>> >>> >>> >>_ConnectionFairy._checkin --> _finalize_fairy(conn, conn_rec, pool, None, echo, fairy) & self.conn = None & self.conn_rec = None
>> >>> >>> 2018-05-10T14:02:49.720+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs _finalize_fairy args conn None, rec.conn None, rec <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>, ref None, fairy <sqlalchemy.pool._ConnectionFairy object at 0x329c0d0>
>> >>> >>> >>_finalize_fairy --> connection_record.checkin()
>> >>> >>> 2018-05-10T14:02:49.720+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs going to connection_record.checkin
>> >>> >>> 2018-05-10T14:02:49.721+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs before conn checkin None, pool queue deque([<sqlalchemy.pool._ConnectionRecord object at 0x7524590>, <sqlalchemy.pool._ConnectionRecord object at 0x7a2b990>, <sqlalchemy.pool._ConnectionRecord object at 0xa856b10>, <sqlalchemy.pool._ConnectionRecord object at 0x7ebee90>]), finalize_callback deque([])
>> >>> >>> 2018-05-10T14:02:49.721+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs after conn checkin, pool queue deque([<sqlalchemy.pool._ConnectionRecord object at 0x7524590>, <sqlalchemy.pool._ConnectionRecord object at 0x7a2b990>, <sqlalchemy.pool._ConnectionRecord object at 0xa856b10>, <sqlalchemy.pool._ConnectionRecord object at 0x7ebee90>, <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>])
>> >>> >>> 2018-05-10T14:02:49.723+08:00 localhost nova-conductor ERROR [pid:4528] [MainThread] [tid:122769072] [log.py:122 error] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Exception during reset or similar
>> >>> >>> 2018-05-10T14:02:49.723+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs going to connection_record.invalidate
>> >>> >>> 2018-05-10T14:02:49.724+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs going to connection_record.checkin
>> >>> >>> 2018-05-10T14:02:49.724+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs before conn checkin None, pool queue deque([<sqlalchemy.pool._ConnectionRecord object at 0x7524590>, <sqlalchemy.pool._ConnectionRecord object at 0x7a2b990>, <sqlalchemy.pool._ConnectionRecord object at 0xa856b10>, <sqlalchemy.pool._ConnectionRecord object at 0x7ebee90>, <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>]), finalize_callback deque([])
>> >>> >>> 2018-05-10T14:02:49.732+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs after conn checkin, pool queue deque([<sqlalchemy.pool._ConnectionRecord object at 0x7524590>, <sqlalchemy.pool._ConnectionRecord object at 0x7a2b990>, <sqlalchemy.pool._ConnectionRecord object at 0xa856b10>, <sqlalchemy.pool._ConnectionRecord object at 0x7ebee90>, <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>, <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>])
>> >>> >>>
>> >>> >>> On Tuesday, May 8, 2018 at 11:06:04 PM UTC+8, [email protected] wrote:
>> >>> >>>>
>> >>> >>>> I found why we could get the same connection twice from the pool...
>> >>> >>>>
>> >>> >>>> Because we put it into the pool twice.
>> >>> >>>>
>> >>> >>>> Any suggestions for finding why the same connection was put into
>> >>> >>>> the pool by different coroutines?
>> >>> >>>>
>> >>> >>>> 2018-05-08T22:39:17.448+08:00 localhost nova-conductor DEBUG [pid:15755] [MainThread] [tid:187954992] [log.py:104 debug] [req-7c2199c0-d9a6-4131-8358-2aebcf5a4ce1] jjs conn checkin <connection object at 0x111d9740; closed: 0>
>> >>> >>>> 2018-05-08T22:39:17.502+08:00 localhost nova-conductor DEBUG [pid:15755] [MainThread] [tid:178645264] [log.py:104 debug] [req-7ab83c30-91bd-4278-b694-85a71035992c] jjs conn checkin <connection object at 0x111d9740; closed: 0>
>> >>> >>>> ...
>> >>> >>>> 2018-05-08T22:39:17.992+08:00 localhost nova-conductor DEBUG [pid:15755] [MainThread] [tid:309091856] [log.py:104 debug] [req-dee5dd2d-866f-408e-999b-b415ea7ce953] Connection <connection object at 0x111d9740; closed: 0> checked out from pool
>> >>> >>>> 2018-05-08T22:39:18.039+08:00 localhost nova-conductor DEBUG [pid:15755] [MainThread] [tid:178645104] [log.py:104 debug] [req-ac8017d7-c97e-44ca-92d2-99f180858e14] Connection <connection object at 0x111d9740; closed: 0> checked out from pool
>> >>> >>>>
>> >>> >>>> On Tuesday, May 8, 2018 at 10:18:52 PM UTC+8, [email protected] wrote:
>> >>> >>>>>
>> >>> >>>>> I added logs in sqlalchemy/pool.py
>> >>> >>>>>
>> >>> >>>>> QueuePool.checkout
>> >>> >>>>>
>> >>> >>>>>     @classmethod
>> >>> >>>>>     def checkout(cls, pool):
>> >>> >>>>>         rec = pool._do_get()
>> >>> >>>>>         try:
>> >>> >>>>>             dbapi_connection = rec.get_connection()
>> >>> >>>>>             pool.logger.debug("jjs get conn success %s", dbapi_connection)
>> >>> >>>>>         except:
>> >>> >>>>>             pool.logger.debug("jjs get error in get_connection")
>> >>> >>>>>             with util.safe_reraise():
>> >>> >>>>>                 rec.checkin()
>> >>> >>>>>         echo = pool._should_log_debug()
>> >>> >>>>>         fairy = _ConnectionFairy(dbapi_connection, rec, echo)
>> >>> >>>>>         rec.fairy_ref = weakref.ref(
>> >>> >>>>>             fairy,
>> >>> >>>>>             lambda ref: _finalize_fairy and
>> >>> >>>>>             _finalize_fairy(
>> >>> >>>>>                 dbapi_connection,
>> >>> >>>>>                 rec, pool, ref, echo)
>> >>> >>>>>         )
>> >>> >>>>>         _refs.add(rec)
>> >>> >>>>>         if echo:
>> >>> >>>>>             pool.logger.debug("jjs conn %s, pool status %s",
>> >>> >>>>>                               dbapi_connection, pool.status())
>> >>> >>>>>             pool.logger.debug("Connection %r checked out from pool",
>> >>> >>>>>                               dbapi_connection)
>> >>> >>>>>         return fairy
>> >>> >>>>>
>> >>> >>>>> Then I got the logs below:
>> >>> >>>>>
>> >>> >>>>> coroutine A:
>> >>> >>>>>
>> >>> >>>>> 2018-05-08T21:39:59.055+08:00 localhost nova-conductor INFO [pid:55528] [MainThread] [tid:197123760] [log.py:109 info] [req-ba1ab897-c80a-4bc4-922a-39044d669da1] jjs Pool wait is False, timeout 30
>> >>> >>>>> 2018-05-08T21:39:59.078+08:00 localhost nova-conductor DEBUG [pid:55528] [MainThread] [tid:197123760] [log.py:104 debug] [req-ba1ab897-c80a-4bc4-922a-39044d669da1] Created new connection <connection object at 0x903f8c0; closed: 0>
>> >>> >>>>> 2018-05-08T21:39:59.079+08:00 localhost nova-conductor DEBUG [pid:55528] [MainThread] [tid:197123760] [log.py:104 debug] [req-ba1ab897-c80a-4bc4-922a-39044d669da1] jjs get conn success <connection object at 0x903f8c0; closed: 0>
>> >>> >>>>> 2018-05-08T21:39:59.080+08:00 localhost nova-conductor DEBUG [pid:55528] [MainThread] [tid:197123760] [log.py:104 debug] [req-ba1ab897-c80a-4bc4-922a-39044d669da1] jjs conn <connection object at 0x903f8c0; closed: 0>, pool status Pool size: 10 Connections in pool: 1 Current Overflow: 36 Max Overflow: 40 Current Checked out connections: 45
>> >>> >>>>> 2018-05-08T21:39:59.080+08:00 localhost nova-conductor DEBUG [pid:55528] [MainThread] [tid:197123760] [log.py:104 debug] [req-ba1ab897-c80a-4bc4-922a-39044d669da1] Connection <connection object at 0x903f8c0; closed: 0> checked out from pool
>> >>> >>>>>
>> >>> >>>>> coroutine B:
>> >>> >>>>>
>> >>> >>>>> 2018-05-08T21:39:59.212+08:00 localhost nova-conductor INFO [pid:55528] [MainThread] [tid:151519312] [log.py:109 info] [req-a78075b5-c55c-40c4-86d0-65957e89cb56] jjs Pool wait is False, timeout 30
>> >>> >>>>> 2018-05-08T21:39:59.213+08:00 localhost nova-conductor DEBUG [pid:55528] [MainThread] [tid:151519312] [log.py:104 debug] [req-a78075b5-c55c-40c4-86d0-65957e89cb56] jjs get conn success <connection object at 0x903f8c0; closed: 0>
>> >>> >>>>> 2018-05-08T21:39:59.213+08:00 localhost nova-conductor DEBUG [pid:55528] [MainThread] [tid:151519312] [log.py:104 debug] [req-a78075b5-c55c-40c4-86d0-65957e89cb56] jjs conn <connection object at 0x903f8c0; closed: 0>, pool status Pool size: 10 Connections in pool: 0 Current Overflow: 36 Max Overflow: 40 Current Checked out connections: 46
>> >>> >>>>> 2018-05-08T21:39:59.214+08:00 localhost nova-conductor DEBUG [pid:55528] [MainThread] [tid:151519312] [log.py:104 debug] [req-a78075b5-c55c-40c4-86d0-65957e89cb56] Connection <connection object at 0x903f8c0; closed: 0> checked out from pool
>> >>> >>>>>
>> >>> >>>>> It seems that when we create_connection in QueuePool._do_get, we
>> >>> >>>>> may not be holding the lock _pool.mutex (i.e. Queue.mutex).
>> >>> >>>>>
>> >>> >>>>> On Monday, May 7, 2018 at 8:41:14 PM UTC+8, Mike Bayer wrote:
>> >>> >>>>>>
>> >>> >>>>>> On Mon, May 7, 2018, 7:01 AM <[email protected]> wrote:
>> >>> >>>>>>>
>> >>> >>>>>>> We added coroutine_id in psycopg2, and found that two coroutines
>> >>> >>>>>>> use the same connection before the pthread_mutex_lock is released.
>> >>> >>>>>>>
>> >>> >>>>>>> Maybe something in the connection pool goes wrong?
>> >>> >>>>>>> [pid] [coroutine_id] msg
>> >>> >>>>>>
>> >>> >>>>>> This seems unlikely. Openstack uses the SQLAlchemy connection
>> >>> >>>>>> pool with eventlet patching in conjunction with the MySQL drivers
>> >>> >>>>>> and nothing like that happens. The biggest issue SQLAlchemy had
>> >>> >>>>>> with greenlets is that a timed-out greenlet would throw
>> >>> >>>>>> GreenletExit and cause a dirty connection to go back into the
>> >>> >>>>>> pool; that was fixed in 1.2.
>> >>> >>>>>>
>> >>> >>>>>>> [49174] [0xa5db730]before PyObject_CallFunctionObjArgs conn 0x94122f0, cb 0x23ad320
>> >>> >>>>>>> [49174] [0xa5db730]conn_poll: status = 2, conn 0x94122f0
>> >>> >>>>>>> [49174] [0xa5db730]conn_poll: async_status = ASYNC_WRITE 0x94122f0
>> >>> >>>>>>> [49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS conn 0x94122f0, async_cursor 0x881ac00
>> >>> >>>>>>> [49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS conn 0x94122f0, async_cursor 0x881ac00
>> >>> >>>>>>> [49174] [0xa5db4b0]pq_abort: enter pgconn = 0x94122f0, autocommit = 0, status = 2
>> >>> >>>>>>> [49174] [0xa5db4b0]before lock pgconn = 0x94122f0, owner 49174 __lock 1
>> >>> >>>>>>>
>> >>> >>>>>>> Below are the logs grepped by coroutine id:
>> >>> >>>>>>>
>> >>> >>>>>>> [49174] [0xa5db730]finish send query, before psyco_wait, conn 0x94122f0
>> >>> >>>>>>> [49174] [0xa5db730]before have_wait_callback conn 0x94122f0
>> >>> >>>>>>> [49174] [0xa5db730]before PyObject_CallFunctionObjArgs conn 0x94122f0, cb 0x23ad320
>> >>> >>>>>>> [49174] [0xa5db730]conn_poll: status = 2, conn 0x94122f0
>> >>> >>>>>>> [49174] [0xa5db730]conn_poll: async_status = ASYNC_WRITE 0x94122f0
>> >>> >>>>>>> [49174] [0xa5db730]conn_poll: poll writing
>> >>> >>>>>>> [49174] [0xa5db730]conn_poll: async_status -> ASYNC_READ
>> >>> >>>>>>>
>> >>> >>>>>>> -----
>> >>> >>>>>>>
>> >>> >>>>>>> [49174] [0xa5db4b0]psyco_conn_cursor: new unnamed cursor for connection at 0x8de2d30
>> >>> >>>>>>> [49174] [0xa5db4b0]cursor_setup: init cursor object at 0xa6c2650
>> >>> >>>>>>> [49174] [0xa5db4b0]cursor_setup: parameters: name = (null), conn = 0x8de2d30
>> >>> >>>>>>> [49174] [0xa5db4b0]cursor_setup: good cursor object at 0xa6c2650, refcnt = 1
>> >>> >>>>>>> [49174] [0xa5db4b0]psyco_conn_cursor: new cursor at 0xa6c2650: refcnt = 1
>> >>> >>>>>>> [49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS conn 0x94122f0, async_cursor 0x881ac00
>> >>> >>>>>>> [49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS conn 0x94122f0, async_cursor 0x881ac00
>> >>> >>>>>>> [49174] [0xa5db4b0]pq_abort: enter pgconn = 0x94122f0, autocommit = 0, status = 2
>> >>> >>>>>>> [49174] [0xa5db4b0]before lock pgconn = 0x94122f0, owner 49174 __lock 1
>> >>> >>>>>>>
>> >>> >>>>>>> On Saturday, April 28, 2018 at 4:07:34 PM UTC+8, [email protected] wrote:
>> >>> >>>>>>>>
>> >>> >>>>>>>> We reproduced this problem, added logs in psycopg2, found a
>> >>> >>>>>>>> confusing thing and reported it to psycopg2.
>> >>> >>>>>>>>
>> >>> >>>>>>>> It seems the sqlalchemy & psycopg2 call flow is:
>> >>> >>>>>>>> 1. from sqlalchemy to psycopg: pq_execute/pq_commit
>> >>> >>>>>>>> 2. from psycopg to sqlalchemy: PyWeakref_NewRef(conn)
>> >>> >>>>>>>> 3. get an exception in sqlalchemy, and do_rollback
>> >>> >>>>>>>> 4. from sqlalchemy to psycopg: pq_abort
>> >>> >>>>>>>> 5. psycopg hits a pthread_mutex_lock deadlock
>> >>> >>>>>>>>
>> >>> >>>>>>>> What we did to reproduce it: stop the master pg-server and
>> >>> >>>>>>>> promote the slave pg-server to master, moving the FIP from the
>> >>> >>>>>>>> old master to the slave. At the same time, let nova-conductor do
>> >>> >>>>>>>> quite a lot of db query requests.
>> >>> >>>>>>>>
>> >>> >>>>>>>> [1] https://github.com/psycopg/psycopg2/issues/703
>> >>> >>>>>>>>
>> >>> >>>>>>>> On Monday, April 23, 2018 at 9:45:04 PM UTC+8, Mike Bayer wrote:
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> On Mon, Apr 23, 2018 at 9:03 AM, <[email protected]> wrote:
>> >>> >>>>>>>>> > Sorry to revive this topic.
>> >>> >>>>>>>>> >
>> >>> >>>>>>>>> > We recently hit the same problem in our production environment.
>> >>> >>>>>>>>> >
>> >>> >>>>>>>>> > I found a patch in another lib [1], where they added
>> >>> >>>>>>>>> > conn.close() in the psycopg2.ProgrammingError exception
>> >>> >>>>>>>>> > handler.
>> >>> >>>>>>>>> >
>> >>> >>>>>>>>> > Shall we do the same in [2]?
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> SQLAlchemy does things much more carefully than that: we parse
>> >>> >>>>>>>>> the error message for specific ones that correspond to
>> >>> >>>>>>>>> "connection is no longer usable". We call these "is_disconnect",
>> >>> >>>>>>>>> but it can be any invalidating condition.
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> You can make these yourself, and they can also be made to be
>> >>> >>>>>>>>> part of oslo.db, using the handle_error event:
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> http://docs.sqlalchemy.org/en/latest/core/events.html?highlight=handle_error#sqlalchemy.events.ConnectionEvents.handle_error
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> within oslo.db you would want to propose a change here:
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> https://github.com/openstack/oslo.db/blob/master/oslo_db/sqlalchemy/exc_filters.py#L387
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> >
>> >>> >>>>>>>>> > [1] https://github.com/aio-libs/aiopg/pull/415/files?diff=split
>> >>> >>>>>>>>> >
>> >>> >>>>>>>>> > [2] https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/engine/base.py#L1289
>> >>> >>>>>>>>> >
>> >>> >>>>>>>>> > On Monday, November 13, 2017 at 10:44:31 AM UTC+8, JinRong Cai wrote:
>> >>> >>>>>>>>> >>
>> >>> >>>>>>>>> >> Hi Michael,
>> >>> >>>>>>>>> >>
>> >>> >>>>>>>>> >> I am using openstack with postgresql, where the sqlalchemy
>> >>> >>>>>>>>> >> and oslo_db modules are used, and there are some problems
>> >>> >>>>>>>>> >> after my pg database was switched over.
>> >>> >>>>>>>>> >>
>> >>> >>>>>>>>> >> Here is my switchover process:
>> >>> >>>>>>>>> >> 1. nova-conductor (python application) is running with DB
>> >>> >>>>>>>>> >> connection strings pointing to the VIP, which is in the
>> >>> >>>>>>>>> >> primary site (A) of pg.
>> >>> >>>>>>>>> >> 2. switch the VIP from the primary (A) to the new primary (B).
>> >>> >>>>>>>>> >> 3. switch over pg: shut down the primary (A), promote the
>> >>> >>>>>>>>> >> standby (B) to the new primary.
>> >>> >>>>>>>>> >> 4. nova-conductor keeps running through the whole process.
>> >>> >>>>>>>>> >>
>> >>> >>>>>>>>> >> After some seconds, I found some nova-conductor processes
>> >>> >>>>>>>>> >> hung with status futex_wait_queue_me, and the status of the
>> >>> >>>>>>>>> >> query in the DB was "idle in transaction"; the transaction
>> >>> >>>>>>>>> >> was not committed or rolled back!
>> >>> >>>>>>>>> >> I think disconnection is handled in oslo_db, which sends a
>> >>> >>>>>>>>> >> ping (SELECT 1) to the DB.
>> >>> >>>>>>>>> >>
>> >>> >>>>>>>>> >> If the DB is switched over, the connections in the pool are
>> >>> >>>>>>>>> >> marked invalid and reconnect after the next checkout.
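[A minimal sketch of the handle_error approach Mike describes above; the event hook and the ExceptionContext.is_disconnect attribute are SQLAlchemy's documented API, while the specific error-string match is only an assumed filter for the psycopg2 error seen in this thread:]

    from sqlalchemy import create_engine, event

    engine = create_engine("postgresql+psycopg2://scott:tiger@localhost/test")

    @event.listens_for(engine, "handle_error")
    def _mark_null_pointer_as_disconnect(context):
        # treat psycopg2's "connection pointer is NULL" as an invalidating
        # condition, so the pool discards the connection instead of reusing it
        if "connection pointer is NULL" in str(context.original_exception):
            context.is_disconnect = True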
>> >>> >>>>>>>>> >>
>> >>> >>>>>>>>> >> ###error messages from nova-conductor
>> >>> >>>>>>>>> >> localhost nova-conductor ERROR [pid:36365] [MainThread] [tid:122397712] [exc_filters.py:330 _raise_for_remaining_DBAPIError] [req-2bd8a290-e17b-4178-80a6-4b36d5793d85] DBAPIError exception wrapped from (psycopg2.ProgrammingError) execute cannot be used while an asynchronous query is underway [SQL: 'SELECT 1']
>> >>> >>>>>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters Traceback (most recent call last):
>> >>> >>>>>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters   File "/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
>> >>> >>>>>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters     context)
>> >>> >>>>>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters   File "/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
>> >>> >>>>>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters     cursor.execute(statement, parameters)
>> >>> >>>>>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters ProgrammingError: execute cannot be used while an asynchronous query is underway
>> >>> >>>>>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters
>> >>> >>>>>>>>> >> localhost nova-conductor ERROR [pid:36365] [MainThread] [tid:122397712] [log.py:122 error] [req-2bd8a290-e17b-4178-80a6-4b36d5793d85] Error closing cursor
>> >>> >>>>>>>>> >> 36365 ERROR sqlalchemy.pool.QueuePool Traceback (most recent call last):
>> >>> >>>>>>>>> >> 36365 ERROR sqlalchemy.pool.QueuePool   File "/python2.7/site-packages/sqlalchemy/engine/base.py", line 1226, in _safe_close_cursor
>> >>> >>>>>>>>> >> 36365 ERROR sqlalchemy.pool.QueuePool     cursor.close()
>> >>> >>>>>>>>> >> 36365 ERROR sqlalchemy.pool.QueuePool ProgrammingError: close cannot be used while an asynchronous query is underway
>> >>> >>>>>>>>> >> 36365 ERROR sqlalchemy.pool.QueuePool
>> >>> >>>>>>>>> >>
>> >>> >>>>>>>>> >> ###ps status of nova-conductor
>> >>> >>>>>>>>> >>
>> >>> >>>>>>>>> >> POD6-Mongodb03:/var/log/uvp-getosstat/statistics20171106101500log # cat /proc/33316/stack
>> >>> >>>>>>>>> >> [<ffffffff810e4c24>] futex_wait_queue_me+0xc4/0x120
>> >>> >>>>>>>>> >> [<ffffffff810e5799>] futex_wait+0x179/0x280
>> >>> >>>>>>>>> >> [<ffffffff810e782e>] do_futex+0xfe/0x5b0
>> >>> >>>>>>>>> >> [<ffffffff810e7d60>] SyS_futex+0x80/0x180
>> >>> >>>>>>>>> >> [<ffffffff81654e09>] system_call_fastpath+0x16/0x1b
>> >>> >>>>>>>>> >> [<ffffffffffffffff>] 0xffffffffffffffff
>> >>> >>>>>>>>> >>
>> >>> >>>>>>>>> >> ### stack of the nova-conductor process
>> >>> >>>>>>>>> >> POD6-Mongodb03:/tmp # pstack 33316
>> >>> >>>>>>>>> >> #0 0x00002b8449e35f4d in __lll_lock_wait () from /lib64/libpthread.so.0
>> >>> >>>>>>>>> >> #1 0x00002b8449e31d02 in _L_lock_791 () from /lib64/libpthread.so.0
>> >>> >>>>>>>>> >> #2 0x00002b8449e31c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
>> >>> >>>>>>>>> >> #3 0x00002b84554c44ab in pq_abort () from /python2.7/site-packages/psycopg2/_psycopg.so
>> >>> >>>>>>>>> >> #4 0x00002b84554c955e in psyco_conn_rollback () from /python2.7/site-packages/psycopg2/_psycopg.so
>> >>> >>>>>>>>> >> #5 0x00002b8449b42b50 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
>> >>> >>>>>>>>> >> #6 0x00002b8449b42ad0 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
>> >>> >>>>>>>>> >>
>> >>> >>>>>>>>> >> psycopg2 was trying to close the cursor and acquire the mutex
>> >>> >>>>>>>>> >> via pthread_mutex_lock, but it seems the cursor was being used
>> >>> >>>>>>>>> >> by another session.
>> >>> >>>>>>>>> >>
>> >>> >>>>>>>>> >> Questions:
>> >>> >>>>>>>>> >>
>> >>> >>>>>>>>> >> 1. What does the error "ProgrammingError: close cannot be used
>> >>> >>>>>>>>> >> while an asynchronous query is underway" mean?
>> >>> >>>>>>>>> >> AFAIK, this is raised by psycopg2, which means an asynchronous
>> >>> >>>>>>>>> >> query was already underway on the connection.
>> >>> >>>>>>>>> >> But I think sqlalchemy is thread safe here, since it is patched
>> >>> >>>>>>>>> >> by eventlet; see details in eventlet/support/psycopg2_patcher.py.
>> >>> >>>>>>>>> >> And we can see different green thread numbers in the log, e.g.:
>> >>> >>>>>>>>> >> [pid:36365] [MainThread] [tid:122397712]
>> >>> >>>>>>>>> >> [pid:36365] [MainThread] [tid:122397815]
>> >>> >>>>>>>>> >> So, I guess the connection pool within one process is safe.
>> >>> >>>>>>>>> >>
>> >>> >>>>>>>>> >> 2. nova-conductor is a multi-threaded python client which forks
>> >>> >>>>>>>>> >> several child processes.
>> >>> >>>>>>>>> >> ps -elf | grep -i nova-conductor
>> >>> >>>>>>>>> >> 30878 1 pool_s /usr/bin/nova-conductor
>> >>> >>>>>>>>> >> 36364 1 ep_pol /usr/bin/nova-conductor
>> >>> >>>>>>>>> >> 36365 1 futex_ /usr/bin/nova-conductor
>> >>> >>>>>>>>> >> 36366 1 ep_pol /usr/bin/nova-conductor
>> >>> >>>>>>>>> >> 36367 1 ep_pol /usr/bin/nova-conductor
>> >>> >>>>>>>>> >> 36368 1 ep_pol /usr/bin/nova-conductor
>> >>> >>>>>>>>> >>
>> >>> >>>>>>>>> >> If nova-conductor is started with only one child, the problem
>> >>> >>>>>>>>> >> does not happen.
>> >>> >>>>>>>>> >> Does this mean the connection/engine CANNOT be shared among
>> >>> >>>>>>>>> >> these child processes?
>> >>> >>>>>>>>> >>
>> >>> >>>>>>>>> >> Thanks.
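[On question 2: SQLAlchemy's pooling documentation advises that an Engine's pooled connections must not travel across an os.fork(); each child should start with a fresh pool. A minimal sketch of the usual pattern, with the connection URL assumed for illustration:]

    import os

    from sqlalchemy import create_engine

    engine = create_engine("postgresql+psycopg2://scott:tiger@localhost/test")

    pid = os.fork()
    if pid == 0:
        # child process: discard any connections inherited from the parent,
        # so this process lazily builds its own, un-shared connection pool
        engine.dispose()

    with engine.connect() as conn:
        conn.execute("SELECT 1")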
