can you add these lines to your logging, ensure that the check for
"already in pool" is present, and produce new logs?

        try:
            fairy = fairy or _ConnectionFairy(
                connection, connection_record, echo)
            assert fairy.connection is connection
            pool.logger.debug(
                "ABOUT TO RESET CONNECTION %s CONNECTION REC %s FAIRY %s",
                connection, connection_record, fairy)
            fairy._reset(pool)
            pool.logger.debug(
                "COMPLETED RESET CONNECTION %s CONNECTION REC %s FAIRY %s",
                connection, connection_record, fairy)

            # Immediately close detached instances
            if not connection_record:
                pool._close_connection(connection)
        except BaseException as e:
            pool.logger.debug(
                "EXCEPTION RAISED IN RESET CONNECTION %s CONNECTION REC %s "
                "FAIRY %s: %s", connection, connection_record, fairy, e)
            pool.logger.error(
                "Exception during reset or similar", exc_info=True)
            if connection_record:
                pool.logger.debug("jjs going to connection_record.invalidate")
                connection_record.invalidate(e=e)
            if not isinstance(e, Exception):
                pool.logger.debug("jjs e %s is not Exception, raise again", e)
                raise
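
By "the check for 'already in pool'" above I mean the guard you patched into
_ConnectionRecord.checkin earlier in this thread; a rough sketch of where it
sits, based on that patch (the log message text is yours, the placement is
what I'm assuming):

    def checkin(self):
        self.fairy_ref = None
        connection = self.connection
        pool = self.__pool

        while self.finalize_callback:
            finalizer = self.finalize_callback.pop()
            finalizer(connection)

        # skip the checkin entirely if this record is already in the queue
        if self in pool._pool.queue:
            pool.logger.debug("jjs rec already in pool checkin, skip %r", self)
            return

        if pool.dispatch.checkin:
            pool.dispatch.checkin(connection, self)
        pool._return_conn(self)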

my current theory is that a greenlet is being blocked in _reset(),
then waking up later but is being corrupted at the greenlet level in
some way.
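
To test that theory it would also help to have the current greenlet id in
those same debug lines; a minimal sketch (the _greenlet_id helper is just
illustrative, it produces the same id(greenlet.getcurrent()) value you
already log as the "tid"):

    import greenlet

    def _greenlet_id():
        # same value as the "tid" field in your logs
        return id(greenlet.getcurrent())

    pool.logger.debug(
        "ABOUT TO RESET CONNECTION %s CONNECTION REC %s FAIRY %s GREENLET %s",
        connection, connection_record, fairy, _greenlet_id())

That way we can see whether "ABOUT TO RESET", "COMPLETED RESET", and the
exception path are all emitted from the same greenlet.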



On Tue, May 15, 2018 at 12:15 PM, Mike Bayer <[email protected]> wrote:
>
>
> On Tue, May 15, 2018, 3:26 AM <[email protected]> wrote:
>>
>> We got the log "jjs _finalize_fairy args" just once, but the code in the
>> function _finalize_fairy seems to be called twice. It confuses me too.
>
>
> It's because that code originated elsewhere and got blocked, then it picks
> up again but the callstack is somehow corrupted (or logging isn't telling
> the truth).
>
> Can you please give me:
>
> Exact version of eventlet in use
> Exact version of psycopg2 and psyco-green
> Exact python interpreter version
> Exact version of SQLAlchemy
> Exact version of Oslo.db
>
> Thanks
>
>
>
>>
>> BTW, I tried to add code[1] to avoid double checkin in _finalize_fairy but
>> failed.
>> That means the code in pool.py#L670-692 was called without L646-668.
>>
>> [1] https://github.com/jiajunsu/test_logs/blob/master/pool.py#L659
>>
>> On Tuesday, May 15, 2018 at 12:41:17 PM UTC+8, Mike Bayer wrote:
>>>
>>> s
>>>
>>>
>>>
>>> On Mon, May 14, 2018 at 9:38 PM,  <[email protected]> wrote:
>>> > The tid is the coroutine id; we get it by id(greenlet.getcurrent()) in
>>> > each coroutine (green thread).
>>> >
>>> > We run nova-conductor with monkey_patch, the same as [1]:
>>> > eventlet.monkey_patch(os=False)
>>> >
>>> > Our nova-conductor is running with multiple workers.
>>> >
>>> > It's confusing why _finalize_fairy is called twice but only logs once at
>>> > the entrance; that's why I guess it's called by the weakref's callback.
>>>
>>> For 130446032 which is the tid you mentioned, _finalize_fairy is only
>>> called once in your log:
>>>
>>> $ cat nc_5679.log | grep -e "130446032.*jjs _finalize"
>>> 2018-05-14T11:11:39.955+08:00 localhost nova-conductor DEBUG
>>> [pid:5679] [MainThread] [tid:130446032] [log.py:104 debug]
>>> [req-841bb432-7cd2-4fa3-9def-559a34b05a21] jjs _finalize_fairy args
>>> conn None, rec.conn None, rec, <sqlalchemy.pool._ConnectionRecord
>>> object at 0x70cdb10>. ref None, fairy
>>> <sqlalchemy.pool._ConnectionFairy object at 0x8a87390> debug
>>>
>>> Also, the log shows that no _finalize_fairy is ever called due to
>>> "del" - this would show the weakref "ref" as non-None:
>>>
>>> $ cat nc_5679.log | grep -e "FINALIZE FAIRY.*REF:.*" | grep -v "REF:
>>> None"
>>> $
>>>
>>> I would assume that by "finalize fairy" called twice, you're referring to
>>> the fact that you see the "exception during reset or similar" message.
>>> But AFAICT that message is still impossible to emit without the function
>>> being entered, and a corresponding "FINALIZE FAIRY" log line preceding it,
>>> unless something is going wrong with the logging, or the interpreter is
>>> somehow not functioning correctly (I wonder if eventlet has had any such
>>> issues).   I'm struggling to come up with any other possibility.
>>>
>>> One weird thing I'm trying to figure out is that I see a connection
>>> being returned to the pool normally, but then nothing is logged about
>>> it after that:
>>>
>>> 2018-05-14T11:11:21.183+08:00 localhost nova-conductor DEBUG
>>> [pid:5679] [MainThread] [tid:130446352] [log.py:104 debug]
>>> [req-19a57f61-bc64-4575-8253-00b5bfe62a74] Connection <connection
>>> object at 0x7f419c0; closed: 0> being returned to pool,
>>> connection_record.connection <connection object at 0x7f419c0; closed:
>>> 0> debug
>>>
>>>
>>> The next log message from tid:130446352 should be "jjs conn reset".
>>> But that never happens.  The next time we see that tid is:
>>>
>>> 2018-05-14T11:11:21.549+08:00 localhost nova-conductor DEBUG
>>> [pid:5679] [MainThread] [tid:130446352] [log.py:104 debug]
>>> [req-0ef61146-5cb6-4f46-9788-406b6d376843] jjs
>>> _ConnectionFairy._checkout fariy None debug
>>> 2018-05-14T11:11:21.549+08:00 localhost nova-conductor DEBUG
>>> [pid:5679] [MainThread] [tid:130446352] [log.py:104 debug]
>>> [req-0ef61146-5cb6-4f46-9788-406b6d376843] jjs before pool._do_get
>>> pool queue deque([<sqlalchemy.pool._ConnectionRecord object at
>>> 0x7777e10>, <sqlalchemy.pool._ConnectionRecord object at 0x75fb7d0>,
>>> <sqlalchemy.pool._ConnectionRecord object at 0x778d810>,
>>> <sqlalchemy.pool._ConnectionRecord object at 0x7600610>,
>>> <sqlalchemy.pool._ConnectionRecord object at 0x6ea7e10>,
>>> <sqlalchemy.pool._ConnectionRecord object at 0x7798050>,
>>> <sqlalchemy.pool._ConnectionRecord object at 0x796d110>,
>>> <sqlalchemy.pool._ConnectionRecord object at 0x7798d50>,
>>> <sqlalchemy.pool._ConnectionRecord object at 0x75f5710>]) debug
>>>
>>> that is, it's checking out a new connection?  If it were blocking on
>>> trying to reset the connection, we wouldn't see that tid again.  I
>>> wonder if the greenlet was interrupted or something.    I'm finding
>>> various impossibilities like this going on and am not able to come up
>>> with any theory other than that the interpreter is somehow not
>>> functioning correctly, or the logging here is somehow incomplete.
>>>
>>> I notice you're on a very old version of SQLAlchemy (three years old).
>>>     You really should be on at least the 1.1 series because for
>>> greenlets you need to have:
>>>
>>> http://docs.sqlalchemy.org/en/latest/changelog/migration_11.html#change-3803.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> >
>>> > [1]https://github.com/openstack/nova/blob/master/nova/cmd/__init__.py
>>> >
>>> >
>>> > On Monday, May 14, 2018 at 11:59:43 PM UTC+8, Mike Bayer wrote:
>>> >>
>>> >> I notice in this log file there are multiple "tid", which I assume to
>>> >> be thread identifiers ?   E.g. OS-level threads?    Are you using
>>> >> eventlet or not?    Are you monkeypatching SQLAlchemy and then using
>>> >> multiple OS-level threads at the same time? Because that likely
>>> >> doesn't work.
>>> >>
>>> >> Particularly, the log here illustrates an impossible codepath.
>>> >>
>>> >> We enter finalize_fairy with connection is None:
>>> >>
>>> >> 2018-05-14T11:11:39.961+08:00 localhost nova-conductor DEBUG
>>> >> [pid:5679] [MainThread] [tid:130446352] [log.py:104 debug]
>>> >> [req-74ef6531-2d92-40ea-a524-f54e11edab06] FINALIZE FAIRY: DB CONN:
>>> >> None, FAIRY: <sqlalchemy.pool._ConnectionFairy object at 0x87dd3d0>,
>>> >> REF: None  CONNECTION_REC: <sqlalchemy.pool._ConnectionRecord object
>>> >> at 0x776fe90> CONNECTION_REC REF <weakref at 0x7c5cdb8; to
>>> >> '_ConnectionFairy' at 0x87dd3d0> debug
>>> >>
>>> >> this seems to make it through to the checkin at the bottom.
>>> >>
>>> >> Then, we see this:
>>> >>
>>> >> 2018-05-14T11:11:39.968+08:00 localhost nova-conductor ERROR
>>> >> [pid:5679] [MainThread] [tid:130446032] [log.py:122 error]
>>> >> [req-841bb432-7cd2-4fa3-9def-559a34b05a21] Exception during reset or
>>> >> similar
>>> >>
>>> >> The only place that message is generated is *in* finalize_fairy,
>>> >> *before* the checkin occurs, and *only* if _finalize_fairy was called
>>> >> *with* a connection that is non-None.    There is no other call to
>>> >> _finalize_fairy with that request identifier, therefore I see no
>>> >> evidence of where this "exception during reset" message can be
>>> >> generated from - there would at least need to be an additional log
>>> >> line with "FINALIZE FAIRY" and the connection non-None for this to be
>>> >> reached.    I see no way that the pool.py shipped with SQLAlchemy can
>>> >> possibly generate this series of steps.
>>> >>
>>> >> This is an extremely intricate issue and I still have only the vaguest
>>> >> notion of how you are running SQLAlchemy / Nova.
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> On Mon, May 14, 2018 at 11:10 AM, Mike Bayer <[email protected]>
>>> >> wrote:
>>> >> > OK I need the exact pool.py you are using as well that includes all
>>> >> > these logging statements and any other adjustments that were made.
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Mon, May 14, 2018 at 8:02 AM,  <[email protected]> wrote:
>>> >> >> I've uploaded the logs here:
>>> >> >>
>>> >> >> https://raw.githubusercontent.com/jiajunsu/test_logs/master/nc_5679.log
>>> >> >>
>>> >> >> The ConnectionRecord is 0x70cdb10, tid(coroutine id) is 130446032
>>> >> >>
>>> >> >> At 2018-05-14T11:11:39.970, we got the log "jjs rec already in pool
>>> >> >> checkin", but I had deleted the `return` from the code, so the
>>> >> >> problem happened again.
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> On Monday, May 14, 2018 at 10:48:49 AM UTC+8, Mike Bayer wrote:
>>> >> >>>
>>> >> >>> as well as the log that shows this:
>>> >> >>>
>>> >> >>> +    if self in pool._pool.queue:
>>> >> >>> +        pool.logger.debug("jjs rec already in pool checkin, skip
>>> >> >>> %r",
>>> >> >>> self)
>>> >> >>> +        return
>>> >> >>>
>>> >> >>>
>>> >> >>> actually happening.
>>> >> >>>
>>> >> >>>
>>> >> >>>
>>> >> >>>
>>> >> >>> On Sun, May 13, 2018 at 10:42 PM, Mike Bayer
>>> >> >>> <[email protected]>
>>> >> >>> wrote:
>>> >> >>> > On Sun, May 13, 2018 at 10:35 PM,  <[email protected]> wrote:
>>> >> >>> >> "jjs" is short for my name and "cjr" is my colleague's(JinRong
>>> >> >>> >> Cai). We
>>> >> >>> >> work
>>> >> >>> >> together and add logs in the same environment, add prefix of
>>> >> >>> >> name
>>> >> >>> >> to
>>> >> >>> >> get a
>>> >> >>> >> keyword for log filter.
>>> >> >>> >>
>>> >> >>> >> The doubled ConnectionRecords (with None connection) are obtained
>>> >> >>> >> by two different coroutines, and the variable record.connection is
>>> >> >>> >> rewritten by the second coroutine.
>>> >> >>> >> After the record is put back into the QueuePool again, there are
>>> >> >>> >> two identical records with the same connection in the pool, which
>>> >> >>> >> causes the next two coroutines to conflict and deadlock in
>>> >> >>> >> psycopg2 as above.
>>> >> >>> >
>>> >> >>> > OK, I need the detailed step by step that shows how the two
>>> >> >>> > coroutines
>>> >> >>> > are interacting such that they both are returning the same
>>> >> >>> > connection.
>>> >> >>> >   The nova log above seems to show just one coroutine I assume
>>> >> >>> > since
>>> >> >>> > there is just one nova request ID, if I understand correctly.
>>> >> >>> >
>>> >> >>> >>
>>> >> >>> >> The logs of psycopg2 has been pasted at
>>> >> >>> >> https://github.com/psycopg/psycopg2/issues/703
>>> >> >>> >
>>> >> >>> > The above issue says the bug has been found in SQLAlchemy but
>>> >> >>> > there
>>> >> >>> > is
>>> >> >>> > no bug report yet and I've not yet been shown how the behavior
>>> >> >>> > here
>>> >> >>> > is
>>> >> >>> > possible.      I posted a straightforward example above, can you
>>> >> >>> > please show me where the two coroutines get mixed up ?
>>> >> >>> >
>>> >> >>> >
>>> >> >>> >
>>> >> >>> >
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >> On Friday, May 11, 2018 at 3:05:46 AM UTC+8, Mike Bayer wrote:
>>> >> >>> >>>
>>> >> >>> >>> I tried to follow your log lines, which is difficult because I
>>> >> >>> >>> don't know where those log statements actually are and I don't
>>> >> >>> >>> know what acronyms like "jjs", "cjr" mean.   But it does look
>>> >> >>> >>> like it's all in one thread (so not async GC) and all in one Nova
>>> >> >>> >>> request (so perhaps just one greenlet involved, unless other
>>> >> >>> >>> greenlets are affecting this without being displayed in this log
>>> >> >>> >>> snippet).   I do see the connection record doubled at the end.
>>> >> >>> >>> So here is an illustration of those steps in isolation; the
>>> >> >>> >>> connection record is not doubled.   Can you alter this program to
>>> >> >>> >>> show this condition occurring?
>>> >> >>> >>>
>>> >> >>> >>> from sqlalchemy import create_engine
>>> >> >>> >>>
>>> >> >>> >>> e = create_engine("postgresql://scott:tiger@localhost/test",
>>> >> >>> >>> echo_pool='debug')
>>> >> >>> >>> e.connect().close()
>>> >> >>> >>>
>>> >> >>> >>> assert len(e.pool._pool.queue) == 1
>>> >> >>> >>>
>>> >> >>> >>> c1 = e.connect()
>>> >> >>> >>>
>>> >> >>> >>> assert len(e.pool._pool.queue) == 0
>>> >> >>> >>>
>>> >> >>> >>> c1.engine.pool._invalidate(c1._Connection__connection, None)
>>> >> >>> >>> c1.invalidate(e)
>>> >> >>> >>>
>>> >> >>> >>> # connection record is back
>>> >> >>> >>> assert len(e.pool._pool.queue) == 1
>>> >> >>> >>>
>>> >> >>> >>> # do a reconnect
>>> >> >>> >>> c1.connection
>>> >> >>> >>>
>>> >> >>> >>> # uses the record again
>>> >> >>> >>> assert len(e.pool._pool.queue) == 0
>>> >> >>> >>>
>>> >> >>> >>> # close the Connection
>>> >> >>> >>> c1.close()
>>> >> >>> >>>
>>> >> >>> >>> # record is back
>>> >> >>> >>> assert len(e.pool._pool.queue) == 1
>>> >> >>> >>>
>>> >> >>> >>> # etc.
>>> >> >>> >>> c1 = e.connect()
>>> >> >>> >>> assert len(e.pool._pool.queue) == 0
>>> >> >>> >>>
>>> >> >>> >>> c1.close()
>>> >> >>> >>> assert len(e.pool._pool.queue) == 1
>>> >> >>> >>>
>>> >> >>> >>>
>>> >> >>> >>>
>>> >> >>> >>>
>>> >> >>> >>> On Thu, May 10, 2018 at 12:25 PM, Mike Bayer
>>> >> >>> >>> <[email protected]>
>>> >> >>> >>> wrote:
>>> >> >>> >>> > Here's a print statement you might want to try:
>>> >> >>> >>> >
>>> >> >>> >>> > diff --git a/lib/sqlalchemy/pool.py b/lib/sqlalchemy/pool.py
>>> >> >>> >>> > index 89a4cea7c..452db4883 100644
>>> >> >>> >>> > --- a/lib/sqlalchemy/pool.py
>>> >> >>> >>> > +++ b/lib/sqlalchemy/pool.py
>>> >> >>> >>> > @@ -689,6 +689,19 @@ def _finalize_fairy(connection,
>>> >> >>> >>> > connection_record,
>>> >> >>> >>> >      been garbage collected.
>>> >> >>> >>> >
>>> >> >>> >>> >      """
>>> >> >>> >>> > +    print(
>>> >> >>> >>> > +        "FINALIZE FAIRY:  GREENLET ID: %s, DB CONN: %s,
>>> >> >>> >>> > FAIRY:
>>> >> >>> >>> > %s,
>>> >> >>> >>> > REF:
>>> >> >>> >>> > %s  "
>>> >> >>> >>> > +        "CONNECTION_REC: %s CONNECTION_REC REF %s" % (
>>> >> >>> >>> > +            "put greenlet id here",
>>> >> >>> >>> > +            connection,
>>> >> >>> >>> > +            fairy,
>>> >> >>> >>> > +            ref,
>>> >> >>> >>> > +            connection_record,
>>> >> >>> >>> > +            connection_record.fairy_ref
>>> >> >>> >>> > +            if connection_record is not None else "n/a"
>>> >> >>> >>> > +        )
>>> >> >>> >>> > +    )
>>> >> >>> >>> > +
>>> >> >>> >>> >      _refs.discard(connection_record)
>>> >> >>> >>> >
>>> >> >>> >>> >      if ref is not None:
>>> >> >>> >>> >
>>> >> >>> >>> >
>>> >> >>> >>> >
>>> >> >>> >>> >
>>> >> >>> >>> >
>>> >> >>> >>> > On Thu, May 10, 2018 at 12:21 PM, Mike Bayer
>>> >> >>> >>> > <[email protected]>
>>> >> >>> >>> > wrote:
>>> >> >>> >>> >> On Thu, May 10, 2018 at 7:23 AM,  <[email protected]>
>>> >> >>> >>> >> wrote:
>>> >> >>> >>> >>> Hi Mike,
>>> >> >>> >>> >>>
>>> >> >>> >>> >>> I added more logs and finally found the reason and a method
>>> >> >>> >>> >>> to avoid the deadlock between coroutines.
>>> >> >>> >>> >>>
>>> >> >>> >>> >>> In summary, the deadlock happened because two different
>>> >> >>> >>> >>> coroutines use the same ConnectionRecord with the same
>>> >> >>> >>> >>> connection.
>>> >> >>> >>> >>> It happens under QueuePool, when the DB connections are reset
>>> >> >>> >>> >>> by the DB server (not related to the backend driver).
>>> >> >>> >>> >>>
>>> >> >>> >>> >>> 1. A coroutine gets a ConnectionRecord into a ConnectionFairy,
>>> >> >>> >>> >>> and creates a weakref to the ConnectionFairy, with callback
>>> >> >>> >>> >>> function `_finalize_fairy`.
>>> >> >>> >>> >>> 2. When the connection gets disconnected, it receives an
>>> >> >>> >>> >>> exception (psycopg2.OperationalError) and goes to
>>> >> >>> >>> >>> Connection._handle_dbapi_exception.
>>> >> >>> >>> >>> 3. In the `finally` block of `_handle_dbapi_exception`, it
>>> >> >>> >>> >>> calls both `ConnectionFairy.invalidate` and `self.invalidate`.
>>> >> >>> >>> >>> 4. In `ConnectionFairy.invalidate`, `_ConnectionRecord._checkin`
>>> >> >>> >>> >>> is called, which puts the ConnectionRecord back into the
>>> >> >>> >>> >>> QueuePool.
>>> >> >>> >>> >>> 5. In `self.invalidate`, `del self._root.__connection` is
>>> >> >>> >>> >>> called, which **del**s the ConnectionFairy.
>>> >> >>> >>> >>> 6. The **del** operation causes the weakref to the
>>> >> >>> >>> >>> ConnectionFairy to die, and the callback `_finalize_fairy`
>>> >> >>> >>> >>> is executed.
>>> >> >>> >>> >>> 7. In `_finalize_fairy`, another `_ConnectionRecord._checkin`
>>> >> >>> >>> >>> is called, putting the ConnectionRecord into the QueuePool
>>> >> >>> >>> >>> again.
>>> >> >>> >>> >>
>>> >> >>> >> Here's what does not make sense.  You are claiming that a simple
>>> >> >>> >> invalidation will result in the same ConnectionRecord being
>>> >> >>> >> checked in twice.   This is obviously not the case; what is
>>> >> >>> >> special in your situation that makes this happen?
>>> >> >>> >>> >>
>>> >> >>> >>> >> If I follow the steps you refer towards, in step 4, when
>>> >> >>> >>> >> ConnectionFairy._checkin is called, that calls
>>> >> >>> >>> >> finalize_fairy
>>> >> >>> >>> >> directly, which then calls ConnectionRecord.checkin().
>>> >> >>> >>> >> Within
>>> >> >>> >>> >> ConnectionRecord.checkin(), ConnectionRecord.fairy_ref is
>>> >> >>> >>> >> set
>>> >> >>> >>> >> to
>>> >> >>> >>> >> None.
>>> >> >>> >>> >>    Then back in ConnectionFairy._checkin(), it also sets
>>> >> >>> >>> >> self._connection_record = None.
>>> >> >>> >>> >>
>>> >> >>> >>> >> If we come back into finalize_fairy a *second* time then,
>>> >> >>> >>> >> as a
>>> >> >>> >>> >> result
>>> >> >>> >>> >> of the "del" which occurs in _handle_error(), and the
>>> >> >>> >>> >> connection_record is present, the function will exit
>>> >> >>> >>> >> immediately,
>>> >> >>> >>> >> because:
>>> >> >>> >>> >>
>>> >> >>> >>> >>        if connection_record.fairy_ref is not ref:
>>> >> >>> >>> >>             return
>>> >> >>> >>> >>
>>> >> >>> >>> >> So the situation as given is not possible without
>>> >> >>> >>> >> concurrent
>>> >> >>> >>> >> access
>>> >> >>> >>> >> to
>>> >> >>> >>> >> the ConnectionFairy being introduced externally to the
>>> >> >>> >>> >> pool.
>>> >> >>> >>> >>
>>> >> >>> >>> >> There is exactly one thing I can think of that is both
>>> >> >>> >>> >> unique
>>> >> >>> >>> >> to
>>> >> >>> >>> >> the
>>> >> >>> >>> >> psycopg2-style of asynchronous connection, not the PyMySQL
>>> >> >>> >>> >> form
>>> >> >>> >>> >> of
>>> >> >>> >>> >> it,
>>> >> >>> >>> >> that might be important here, and it's that I think
>>> >> >>> >>> >> psycopg2's
>>> >> >>> >>> >> connection.close() method will defer to another greenlet.
>>> >> >>> >>> >> That
>>> >> >>> >>> >> means
>>> >> >>> >>> >> when we're inside of ConnectionRecord.invalidate(), and we
>>> >> >>> >>> >> call
>>> >> >>> >>> >> self.__close(), that could be deferring to another
>>> >> >>> >>> >> greenlet.
>>> >> >>> >>> >> But
>>> >> >>> >>> >> this all happens before the connection is returned to the
>>> >> >>> >>> >> pool,
>>> >> >>> >>> >> so
>>> >> >>> >>> >> I
>>> >> >>> >>> >> still don't know what is happening.
>>> >> >>> >>> >>
>>> >> >>> >>> >>>
>>> >> >>> >>> >>> The detailed logs are below. To prove my conclusion is right,
>>> >> >>> >>> >>> I tried to avoid the scenario with this patch, and it worked.
>>> >> >>> >>> >>>
>>> >> >>> >>> >>> def checkin(self):
>>> >> >>> >>> >>>     self.fairy_ref = None
>>> >> >>> >>> >>>     connection = self.connection
>>> >> >>> >>> >>>     pool = self.__pool
>>> >> >>> >>> >>>
>>> >> >>> >>> >>>     while self.finalize_callback:
>>> >> >>> >>> >>>         finalizer = self.finalize_callback.pop()
>>> >> >>> >>> >>>         finalizer(connection)
>>> >> >>> >>> >>> +    if self in pool._pool.queue:
>>> >> >>> >>> >>> +        pool.logger.debug("jjs rec already in pool
>>> >> >>> >>> >>> checkin,
>>> >> >>> >>> >>> skip
>>> >> >>> >>> >>> %r",
>>> >> >>> >>> >>> self)
>>> >> >>> >>> >>> +        return
>>> >> >>> >>> >>
>>> >> >>> >>> >> as mentioned above it's not possible to reach this point
>>> >> >>> >>> >> without
>>> >> >>> >>> >> concurrent access to the pooled connection, which means
>>> >> >>> >>> >> this
>>> >> >>> >>> >> fix
>>> >> >>> >>> >> could
>>> >> >>> >>> >> fail also (with at least threaded-style concurrency, not
>>> >> >>> >>> >> greenlet
>>> >> >>> >>> >> style).   The original source of the greenlet issue must
>>> >> >>> >>> >> still
>>> >> >>> >>> >> be
>>> >> >>> >>> >> identified.
>>> >> >>> >>> >>
>>> >> >>> >>> >> I would need a self-contained proof of concept that
>>> >> >>> >>> >> demonstrates
>>> >> >>> >>> >> this
>>> >> >>> >>> >> issue occurring.  The log output is not that helpful
>>> >> >>> >>> >> because it
>>> >> >>> >>> >> at
>>> >> >>> >>> >> least would need to write out the identifier of the current
>>> >> >>> >>> >> greenlet,
>>> >> >>> >>> >> since it is very likely that greenlets have something to do
>>> >> >>> >>> >> with
>>> >> >>> >>> >> this
>>> >> >>> >>> >> issue.
>>> >> >>> >>> >>
>>> >> >>> >>> >>
>>> >> >>> >>> >>
>>> >> >>> >>> >>>
>>> >> >>> >>> >>>     if pool.dispatch.checkin:
>>> >> >>> >>> >>>         pool.dispatch.checkin(connection, self)
>>> >> >>> >>> >>>     pool._return_conn(self)
>>> >> >>> >>> >>>
>>> >> >>> >>> >>> Trace the ConnectionRecord 0x88fc5d0
>>> >> >>> >>> >>>
>>> >> >>> >>> >>> 2018-05-10T14:02:17.927+08:00 localhost nova-conductor
>>> >> >>> >>> >>> DEBUG
>>> >> >>> >>> >>> [pid:4528]
>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Created new
>>> >> >>> >>> >>> connection
>>> >> >>> >>> >>> <connection object at 0x81f69b0; closed: 0>
>>> >> >>> >>> >>> 2018-05-10T14:02:17.928+08:00 localhost nova-conductor
>>> >> >>> >>> >>> DEBUG
>>> >> >>> >>> >>> [pid:4528]
>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs get conn
>>> >> >>> >>> >>> success
>>> >> >>> >>> >>> <connection
>>> >> >>> >>> >>> object at 0x81f69b0; closed: 0>, rec.connection
>>> >> >>> >>> >>> <connection
>>> >> >>> >>> >>> object
>>> >> >>> >>> >>> at
>>> >> >>> >>> >>> 0x81f69b0; closed: 0>, pool queue
>>> >> >>> >>> >>> deque([<sqlalchemy.pool._ConnectionRecord
>>> >> >>> >>> >>> object at 0x7a04810>, <sqlalchemy.pool._ConnectionRecord
>>> >> >>> >>> >>> object at
>>> >> >>> >>> >>> 0x79fdad0>])
>>> >> >>> >>> >>> 2018-05-10T14:02:17.929+08:00 localhost nova-conductor
>>> >> >>> >>> >>> DEBUG
>>> >> >>> >>> >>> [pid:4528]
>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs checkout
>>> >> >>> >>> >>> fairy
>>> >> >>> >>> >>> <sqlalchemy.pool._ConnectionFairy object at 0x329c0d0> rec
>>> >> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>
>>> >> >>> >>> >>> pool
>>> >> >>> >>> >>> queue
>>> >> >>> >>> >>> deque([<sqlalchemy.pool._ConnectionRecord object at
>>> >> >>> >>> >>> 0x7a04810>,
>>> >> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x79fdad0>])
>>> >> >>> >>> >>> 2018-05-10T14:02:17.929+08:00 localhost nova-conductor
>>> >> >>> >>> >>> DEBUG
>>> >> >>> >>> >>> [pid:4528]
>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Connection
>>> >> >>> >>> >>> <connection
>>> >> >>> >>> >>> object at
>>> >> >>> >>> >>> 0x81f69b0; closed: 0> checked out from pool, status Pool
>>> >> >>> >>> >>> size:
>>> >> >>> >>> >>> 10
>>> >> >>> >>> >>> Connections in pool: 2 Current Overflow: 0 Max Overflow:
>>> >> >>> >>> >>> 40
>>> >> >>> >>> >>> Current
>>> >> >>> >>> >>> Checked
>>> >> >>> >>> >>> out connections: 8, queue
>>> >> >>> >>> >>> deque([<sqlalchemy.pool._ConnectionRecord
>>> >> >>> >>> >>> object
>>> >> >>> >>> >>> at 0x7a04810>, <sqlalchemy.pool._ConnectionRecord object
>>> >> >>> >>> >>> at
>>> >> >>> >>> >>> 0x79fdad0>])
>>> >> >>> >>> >>> 2018-05-10T14:02:17.930+08:00 localhost nova-conductor
>>> >> >>> >>> >>> DEBUG
>>> >> >>> >>> >>> [pid:4528]
>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs
>>> >> >>> >>> >>> _ConnectionFairy._checkout
>>> >> >>> >>> >>> conn <connection object at 0x81f69b0; closed: 0>, rec
>>> >> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>,
>>> >> >>> >>> >>> rec.conn
>>> >> >>> >>> >>> <connection object at 0x81f69b0; closed: 0>
>>> >> >>> >>> >>> 2018-05-10T14:02:49.716+08:00 localhost nova-conductor
>>> >> >>> >>> >>> ERROR
>>> >> >>> >>> >>> [pid:4528]
>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:122 error]
>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] cjr error conn
>>> >> >>> >>> >>> <sqlalchemy.engine.base.Connection object at 0x329c150>,
>>> >> >>> >>> >>> err
>>> >> >>> >>> >>> connection
>>> >> >>> >>> >>> pointer is NULL
>>> >> >>> >>> >>> 2018-05-10T14:02:49.717+08:00 localhost nova-conductor
>>> >> >>> >>> >>> ERROR
>>> >> >>> >>> >>> [pid:4528]
>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:122 error]
>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] cjr enter
>>> >> >>> >>> >>> newraise
>>> >> >>> >>> >>> 2018-05-10T14:02:49.717+08:00 localhost nova-conductor
>>> >> >>> >>> >>> ERROR
>>> >> >>> >>> >>> [pid:4528]
>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:122 error]
>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] cjr enter
>>> >> >>> >>> >>> finally
>>> >> >>> >>> >>>>>Connection._handle_dbapi_exception --> finally -->
>>> >> >>> >>> >>>>> **self.engine.pool._invalidate** & **self.invalidate**
>>> >> >>> >>> >>>>>self.engine.pool._invalidate --> Pool._invalidate -->
>>> >> >>> >>> >>>>> ConnectionFairy.invalidate -->
>>> >> >>> >>> >>>>> _ConnectionRecord.invalidate
>>> >> >>> >>> >>>>> &
>>> >> >>> >>> >>>>> **self._checkin()**
>>> >> >>> >>> >>>>>self.invalidate() --> **del self._root.__connection**
>>> >> >>> >>> >>> 2018-05-10T14:02:49.718+08:00 localhost nova-conductor
>>> >> >>> >>> >>> DEBUG
>>> >> >>> >>> >>> [pid:4528]
>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs
>>> >> >>> >>> >>> connectionFairy
>>> >> >>> >>> >>> invalidate
>>> >> >>> >>> >>> 2018-05-10T14:02:49.718+08:00 localhost nova-conductor
>>> >> >>> >>> >>> INFO
>>> >> >>> >>> >>> [pid:4528]
>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:109 info]
>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Invalidate
>>> >> >>> >>> >>> connection
>>> >> >>> >>> >>> <connection
>>> >> >>> >>> >>> object at 0x81f69b0; closed: 2> (reason:
>>> >> >>> >>> >>> OperationalError:connection
>>> >> >>> >>> >>> pointer
>>> >> >>> >>> >>> is NULL
>>> >> >>> >>> >>>>>_ConnectionRecord.invalidate(e)  -->
>>> >> >>> >>> >>>>> self.__pool.dispatch.invalidate(self.connection, self,
>>> >> >>> >>> >>>>> e) &
>>> >> >>> >>> >>>>> self.__close() &
>>> >> >>> >>> >>>>> self.connection = None
>>> >> >>> >>> >>>>>self.__close() --> self.finalize_callback.clear() &
>>> >> >>> >>> >>>>> self.__pool._close_connection(self.connection)
>>> >> >>> >>> >>> 2018-05-10T14:02:49.719+08:00 localhost nova-conductor
>>> >> >>> >>> >>> DEBUG
>>> >> >>> >>> >>> [pid:4528]
>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Closing
>>> >> >>> >>> >>> connection
>>> >> >>> >>> >>> <connection
>>> >> >>> >>> >>> object at 0x81f69b0; closed: 2>
>>> >> >>> >>> >>>>>Pool._close_connection(connection) -->
>>> >> >>> >>> >>>>> self._dialect.do_close(connection)
>>> >> >>> >>> >>>>> --> _ConnDialect.do_close(dbapi_connection) -->
>>> >> >>> >>> >>>>> dbapi_connection.close() FIN
>>> >> >>> >>> >>> 2018-05-10T14:02:49.719+08:00 localhost nova-conductor
>>> >> >>> >>> >>> DEBUG
>>> >> >>> >>> >>> [pid:4528]
>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs
>>> >> >>> >>> >>> _ConnectionFairy._checkin
>>> >> >>> >>> >>> conn None rec <sqlalchemy.pool._ConnectionRecord object at
>>> >> >>> >>> >>> 0x88fc5d0>
>>> >> >>> >>> >>> rec.conn None
>>> >> >>> >>> >>>>>_ConnectionFairy._checkin --> _finalize_fairy(conn,
>>> >> >>> >>> >>>>> conn_rec,
>>> >> >>> >>> >>>>> pool,
>>> >> >>> >>> >>>>> None,
>>> >> >>> >>> >>>>> echo, fairy) & self.conn = None & self.conn_rec = None
>>> >> >>> >>> >>> 2018-05-10T14:02:49.720+08:00 localhost nova-conductor
>>> >> >>> >>> >>> DEBUG
>>> >> >>> >>> >>> [pid:4528]
>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs
>>> >> >>> >>> >>> _finalize_fairy
>>> >> >>> >>> >>> args
>>> >> >>> >>> >>> conn
>>> >> >>> >>> >>> None, rec.conn None, rec,
>>> >> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord
>>> >> >>> >>> >>> object at
>>> >> >>> >>> >>> 0x88fc5d0>. ref None, fairy
>>> >> >>> >>> >>> <sqlalchemy.pool._ConnectionFairy
>>> >> >>> >>> >>> object
>>> >> >>> >>> >>> at
>>> >> >>> >>> >>> 0x329c0d0>
>>> >> >>> >>> >>>>>_finalize_fairy --> connection_record.checkin()
>>> >> >>> >>> >>> 2018-05-10T14:02:49.720+08:00 localhost nova-conductor
>>> >> >>> >>> >>> DEBUG
>>> >> >>> >>> >>> [pid:4528]
>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs going to
>>> >> >>> >>> >>> connection_record.checkin
>>> >> >>> >>> >>> 2018-05-10T14:02:49.721+08:00 localhost nova-conductor
>>> >> >>> >>> >>> DEBUG
>>> >> >>> >>> >>> [pid:4528]
>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs before conn
>>> >> >>> >>> >>> checkin
>>> >> >>> >>> >>> None,
>>> >> >>> >>> >>> pool queue deque([<sqlalchemy.pool._ConnectionRecord
>>> >> >>> >>> >>> object at
>>> >> >>> >>> >>> 0x7524590>,
>>> >> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x7a2b990>,
>>> >> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0xa856b10>,
>>> >> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x7ebee90>]),
>>> >> >>> >>> >>> finalize_callback
>>> >> >>> >>> >>> deque([])
>>> >> >>> >>> >>> 2018-05-10T14:02:49.721+08:00 localhost nova-conductor
>>> >> >>> >>> >>> DEBUG
>>> >> >>> >>> >>> [pid:4528]
>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs after conn
>>> >> >>> >>> >>> checkin,
>>> >> >>> >>> >>> pool
>>> >> >>> >>> >>> queue deque([<sqlalchemy.pool._ConnectionRecord object at
>>> >> >>> >>> >>> 0x7524590>,
>>> >> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x7a2b990>,
>>> >> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0xa856b10>,
>>> >> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x7ebee90>,
>>> >> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>])
>>> >> >>> >>> >>> 2018-05-10T14:02:49.723+08:00 localhost nova-conductor
>>> >> >>> >>> >>> ERROR
>>> >> >>> >>> >>> [pid:4528]
>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:122 error]
>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Exception
>>> >> >>> >>> >>> during
>>> >> >>> >>> >>> reset
>>> >> >>> >>> >>> or
>>> >> >>> >>> >>> similar
>>> >> >>> >>> >>> 2018-05-10T14:02:49.723+08:00 localhost nova-conductor
>>> >> >>> >>> >>> DEBUG
>>> >> >>> >>> >>> [pid:4528]
>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs going to
>>> >> >>> >>> >>> connection_record.invalidate
>>> >> >>> >>> >>> 2018-05-10T14:02:49.724+08:00 localhost nova-conductor
>>> >> >>> >>> >>> DEBUG
>>> >> >>> >>> >>> [pid:4528]
>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs going to
>>> >> >>> >>> >>> connection_record.checkin
>>> >> >>> >>> >>> 2018-05-10T14:02:49.724+08:00 localhost nova-conductor
>>> >> >>> >>> >>> DEBUG
>>> >> >>> >>> >>> [pid:4528]
>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs before conn
>>> >> >>> >>> >>> checkin
>>> >> >>> >>> >>> None,
>>> >> >>> >>> >>> pool queue deque([<sqlalchemy.pool._ConnectionRecord
>>> >> >>> >>> >>> object at
>>> >> >>> >>> >>> 0x7524590>,
>>> >> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x7a2b990>,
>>> >> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0xa856b10>,
>>> >> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x7ebee90>,
>>> >> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>]),
>>> >> >>> >>> >>> finalize_callback
>>> >> >>> >>> >>> deque([])
>>> >> >>> >>> >>> 2018-05-10T14:02:49.732+08:00 localhost nova-conductor
>>> >> >>> >>> >>> DEBUG
>>> >> >>> >>> >>> [pid:4528]
>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs after conn
>>> >> >>> >>> >>> checkin,
>>> >> >>> >>> >>> pool
>>> >> >>> >>> >>> queue deque([<sqlalchemy.pool._ConnectionRecord object at
>>> >> >>> >>> >>> 0x7524590>,
>>> >> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x7a2b990>,
>>> >> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0xa856b10>,
>>> >> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x7ebee90>,
>>> >> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>,
>>> >> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>])
>>> >> >>> >>> >>>
>>> >> >>> >>> >>>
>>> >> >>> >>> >>>
>>> >> >>> >>> >>>
>>> >> >>> >>> >>>
>>> >> >>> >>> >>>
>>> >> >>> >>> >>> On Tuesday, May 8, 2018 at 11:06:04 PM UTC+8, [email protected] wrote:
>>> >> >>> >>> >>>>
>>> >> >>> >>> >>>> I found why we could get the same connection twice from the
>>> >> >>> >>> >>>> pool...
>>> >> >>> >>> >>>>
>>> >> >>> >>> >>>> Because we put it into the pool twice.
>>> >> >>> >>> >>>>
>>> >> >>> >>> >>>> Any suggestions for finding out why the same connection is
>>> >> >>> >>> >>>> put into the pool from different coroutines?
>>> >> >>> >>> >>>>
>>> >> >>> >>> >>>>
>>> >> >>> >>> >>>> 2018-05-08T22:39:17.448+08:00 localhost nova-conductor
>>> >> >>> >>> >>>> DEBUG
>>> >> >>> >>> >>>> [pid:15755]
>>> >> >>> >>> >>>> [MainThread] [tid:187954992] [log.py:104 debug]
>>> >> >>> >>> >>>> [req-7c2199c0-d9a6-4131-8358-2aebcf5a4ce1] jjs conn
>>> >> >>> >>> >>>> checkin
>>> >> >>> >>> >>>> <connection
>>> >> >>> >>> >>>> object at 0x111d9740; closed: 0>
>>> >> >>> >>> >>>> 2018-05-08T22:39:17.502+08:00 localhost nova-conductor
>>> >> >>> >>> >>>> DEBUG
>>> >> >>> >>> >>>> [pid:15755]
>>> >> >>> >>> >>>> [MainThread] [tid:178645264] [log.py:104 debug]
>>> >> >>> >>> >>>> [req-7ab83c30-91bd-4278-b694-85a71035992c] jjs conn
>>> >> >>> >>> >>>> checkin
>>> >> >>> >>> >>>> <connection
>>> >> >>> >>> >>>> object at 0x111d9740; closed: 0>
>>> >> >>> >>> >>>> ...
>>> >> >>> >>> >>>> 2018-05-08T22:39:17.992+08:00 localhost nova-conductor
>>> >> >>> >>> >>>> DEBUG
>>> >> >>> >>> >>>> [pid:15755]
>>> >> >>> >>> >>>> [MainThread] [tid:309091856] [log.py:104 debug]
>>> >> >>> >>> >>>> [req-dee5dd2d-866f-408e-999b-b415ea7ce953] Connection
>>> >> >>> >>> >>>> <connection
>>> >> >>> >>> >>>> object at
>>> >> >>> >>> >>>> 0x111d9740; closed: 0> checked out from pool
>>> >> >>> >>> >>>> 2018-05-08T22:39:18.039+08:00 localhost nova-conductor
>>> >> >>> >>> >>>> DEBUG
>>> >> >>> >>> >>>> [pid:15755]
>>> >> >>> >>> >>>> [MainThread] [tid:178645104] [log.py:104 debug]
>>> >> >>> >>> >>>> [req-ac8017d7-c97e-44ca-92d2-99f180858e14] Connection
>>> >> >>> >>> >>>> <connection
>>> >> >>> >>> >>>> object at
>>> >> >>> >>> >>>> 0x111d9740; closed: 0> checked out from pool
>>> >> >>> >>> >>>>
>>> >> >>> >>> >>>>
>>> >> >>> >>> >>>> On Tuesday, May 8, 2018 at 10:18:52 PM UTC+8, [email protected] wrote:
>>> >> >>> >>> >>>>>
>>> >> >>> >>> >>>>> I added logs in sqlalchemy/pool.py
>>> >> >>> >>> >>>>>
>>> >> >>> >>> >>>>> QueuePool.checkout
>>> >> >>> >>> >>>>>
>>> >> >>> >>> >>>>> @classmethod
>>> >> >>> >>> >>>>> def checkout(cls, pool):
>>> >> >>> >>> >>>>>     rec = pool._do_get()
>>> >> >>> >>> >>>>>     try:
>>> >> >>> >>> >>>>>         dbapi_connection = rec.get_connection()
>>> >> >>> >>> >>>>>         pool.logger.debug("jjs get conn success %s",
>>> >> >>> >>> >>>>>                           dbapi_connection)
>>> >> >>> >>> >>>>>     except:
>>> >> >>> >>> >>>>>         pool.logger.debug("jjs get error in get_connection")
>>> >> >>> >>> >>>>>         with util.safe_reraise():
>>> >> >>> >>> >>>>>             rec.checkin()
>>> >> >>> >>> >>>>>     echo = pool._should_log_debug()
>>> >> >>> >>> >>>>>     fairy = _ConnectionFairy(dbapi_connection, rec, echo)
>>> >> >>> >>> >>>>>     rec.fairy_ref = weakref.ref(
>>> >> >>> >>> >>>>>         fairy,
>>> >> >>> >>> >>>>>         lambda ref: _finalize_fairy and
>>> >> >>> >>> >>>>>         _finalize_fairy(
>>> >> >>> >>> >>>>>             dbapi_connection,
>>> >> >>> >>> >>>>>             rec, pool, ref, echo)
>>> >> >>> >>> >>>>>     )
>>> >> >>> >>> >>>>>     _refs.add(rec)
>>> >> >>> >>> >>>>>     if echo:
>>> >> >>> >>> >>>>>         pool.logger.debug("jjs conn %s, pool status %s",
>>> >> >>> >>> >>>>>                           dbapi_connection, pool.status())
>>> >> >>> >>> >>>>>         pool.logger.debug("Connection %r checked out from pool",
>>> >> >>> >>> >>>>>                           dbapi_connection)
>>> >> >>> >>> >>>>>     return fairy
>>> >> >>> >>> >>>>>
>>> >> >>> >>> >>>>> Then I got logs below:
>>> >> >>> >>> >>>>>
>>> >> >>> >>> >>>>> coroutine A:
>>> >> >>> >>> >>>>>
>>> >> >>> >>> >>>>> 2018-05-08T21:39:59.055+08:00 localhost nova-conductor
>>> >> >>> >>> >>>>> INFO
>>> >> >>> >>> >>>>> [pid:55528]
>>> >> >>> >>> >>>>> [MainThread] [tid:197123760] [log.py:109 info]
>>> >> >>> >>> >>>>> [req-ba1ab897-c80a-4bc4-922a-39044d669da1] jjs Pool wait
>>> >> >>> >>> >>>>> is
>>> >> >>> >>> >>>>> False,
>>> >> >>> >>> >>>>> timeout
>>> >> >>> >>> >>>>> 30
>>> >> >>> >>> >>>>> 2018-05-08T21:39:59.078+08:00 localhost nova-conductor
>>> >> >>> >>> >>>>> DEBUG
>>> >> >>> >>> >>>>> [pid:55528]
>>> >> >>> >>> >>>>> [MainThread] [tid:197123760] [log.py:104 debug]
>>> >> >>> >>> >>>>> [req-ba1ab897-c80a-4bc4-922a-39044d669da1] Created new
>>> >> >>> >>> >>>>> connection
>>> >> >>> >>> >>>>> <connection object at 0x903f8c0; closed: 0>
>>> >> >>> >>> >>>>> 2018-05-08T21:39:59.079+08:00 localhost nova-conductor
>>> >> >>> >>> >>>>> DEBUG
>>> >> >>> >>> >>>>> [pid:55528]
>>> >> >>> >>> >>>>> [MainThread] [tid:197123760] [log.py:104 debug]
>>> >> >>> >>> >>>>> [req-ba1ab897-c80a-4bc4-922a-39044d669da1] jjs get conn
>>> >> >>> >>> >>>>> success
>>> >> >>> >>> >>>>> <connection
>>> >> >>> >>> >>>>> object at 0x903f8c0; closed: 0>
>>> >> >>> >>> >>>>> 2018-05-08T21:39:59.080+08:00 localhost nova-conductor
>>> >> >>> >>> >>>>> DEBUG
>>> >> >>> >>> >>>>> [pid:55528]
>>> >> >>> >>> >>>>> [MainThread] [tid:197123760] [log.py:104 debug]
>>> >> >>> >>> >>>>> [req-ba1ab897-c80a-4bc4-922a-39044d669da1] jjs conn
>>> >> >>> >>> >>>>> <connection
>>> >> >>> >>> >>>>> object at
>>> >> >>> >>> >>>>> 0x903f8c0;  closed: 0>, pool status Pool size: 10
>>> >> >>> >>> >>>>> Connections
>>> >> >>> >>> >>>>> in
>>> >> >>> >>> >>>>> pool: 1
>>> >> >>> >>> >>>>> Current Overflow: 36 Max Overflow: 40 Current Checked
>>> >> >>> >>> >>>>> out
>>> >> >>> >>> >>>>> connections: 45
>>> >> >>> >>> >>>>> 2018-05-08T21:39:59.080+08:00 localhost nova-conductor
>>> >> >>> >>> >>>>> DEBUG
>>> >> >>> >>> >>>>> [pid:55528]
>>> >> >>> >>> >>>>> [MainThread] [tid:197123760] [log.py:104 debug]
>>> >> >>> >>> >>>>> [req-ba1ab897-c80a-4bc4-922a-39044d669da1] Connection
>>> >> >>> >>> >>>>> <connection
>>> >> >>> >>> >>>>> object at
>>> >> >>> >>> >>>>> 0x903f8c0; closed: 0> checked out from pool
>>> >> >>> >>> >>>>>
>>> >> >>> >>> >>>>> coroutine B:
>>> >> >>> >>> >>>>>
>>> >> >>> >>> >>>>> 2018-05-08T21:39:59.212+08:00 localhost nova-conductor
>>> >> >>> >>> >>>>> INFO
>>> >> >>> >>> >>>>> [pid:55528]
>>> >> >>> >>> >>>>> [MainThread] [tid:151519312] [log.py:109 info]
>>> >> >>> >>> >>>>> [req-a78075b5-c55c-40c4-86d0-65957e89cb56] jjs Pool wait
>>> >> >>> >>> >>>>> is
>>> >> >>> >>> >>>>> False,
>>> >> >>> >>> >>>>> timeout
>>> >> >>> >>> >>>>> 30
>>> >> >>> >>> >>>>> 2018-05-08T21:39:59.213+08:00 localhost nova-conductor
>>> >> >>> >>> >>>>> DEBUG
>>> >> >>> >>> >>>>> [pid:55528]
>>> >> >>> >>> >>>>> [MainThread] [tid:151519312] [log.py:104 debug]
>>> >> >>> >>> >>>>> [req-a78075b5-c55c-40c4-86d0-65957e89cb56] jjs get conn
>>> >> >>> >>> >>>>> success
>>> >> >>> >>> >>>>> <connection
>>> >> >>> >>> >>>>> object at 0x903f8c0;  closed: 0>
>>> >> >>> >>> >>>>> 2018-05-08T21:39:59.213+08:00 localhost nova-conductor
>>> >> >>> >>> >>>>> DEBUG
>>> >> >>> >>> >>>>> [pid:55528]
>>> >> >>> >>> >>>>> [MainThread] [tid:151519312] [log.py:104 debug]
>>> >> >>> >>> >>>>> [req-a78075b5-c55c-40c4-86d0-65957e89cb56] jjs conn
>>> >> >>> >>> >>>>> <connection
>>> >> >>> >>> >>>>> object at
>>> >> >>> >>> >>>>> 0x903f8c0; closed: 0>, pool status Pool size: 10
>>> >> >>> >>> >>>>> Connections in
>>> >> >>> >>> >>>>> pool: 0
>>> >> >>> >>> >>>>> Current Overflow: 36 Max Overflow: 40 Current Checked
>>> >> >>> >>> >>>>> out
>>> >> >>> >>> >>>>> connections: 46
>>> >> >>> >>> >>>>> 2018-05-08T21:39:59.214+08:00 localhost nova-conductor
>>> >> >>> >>> >>>>> DEBUG
>>> >> >>> >>> >>>>> [pid:55528]
>>> >> >>> >>> >>>>> [MainThread] [tid:151519312] [log.py:104 debug]
>>> >> >>> >>> >>>>> [req-a78075b5-c55c-40c4-86d0-65957e89cb56] Connection
>>> >> >>> >>> >>>>> <connection
>>> >> >>> >>> >>>>> object at
>>> >> >>> >>> >>>>> 0x903f8c0; closed: 0> checked out from pool
>>> >> >>> >>> >>>>>
>>> >> >>> >>> >>>>>
>>> >> >>> >>> >>>>> It seems that when we create_connection in
>>> >> >>> >>> >>>>> QueuePool._do_get, we may not acquire the lock
>>> >> >>> >>> >>>>> _pool.mutex (~Queue.mutex).
>>> >> >>> >>> >>>>>
>>> >> >>> >>> >>>>>
>>> >> >>> >>> >>>>>
>>> >> >>> >>> >>>>> On Monday, May 7, 2018 at 8:41:14 PM UTC+8, Mike Bayer wrote:
>>> >> >>> >>> >>>>>>
>>> >> >>> >>> >>>>>>
>>> >> >>> >>> >>>>>>
>>> >> >>> >>> >>>>>> On Mon, May 7, 2018, 7:01 AM <[email protected]>
>>> >> >>> >>> >>>>>> wrote:
>>> >> >>> >>> >>>>>>>
>>> >> >>> >>> >>>>>>> We added coroutine_id in psycopg2, and found that two
>>> >> >>> >>> >>>>>>> coroutines use the same connection before the
>>> >> >>> >>> >>>>>>> pthread_mutex_lock is released.
>>> >> >>> >>> >>>>>>>
>>> >> >>> >>> >>>>>>> Maybe something in the connection pool goes wrong?
>>> >> >>> >>> >>>>>>> [pid] [coroutine_id] msg
>>> >> >>> >>> >>>>>>
>>> >> >>> >>> >>>>>>
>>> >> >>> >>> >>>>>>
>>> >> >>> >>> >>>>>> This seems unlikely.   Openstack uses the SQLAlchemy
>>> >> >>> >>> >>>>>> connection pool with eventlet patching in conjunction with
>>> >> >>> >>> >>>>>> the MySQL drivers and nothing like that happens.   The
>>> >> >>> >>> >>>>>> biggest issue SQLAlchemy had with greenlets is that a
>>> >> >>> >>> >>>>>> timed out greenlet would throw GreenletExit and cause a
>>> >> >>> >>> >>>>>> dirty connection to go back into the pool; that was fixed
>>> >> >>> >>> >>>>>> in 1.2.
>>> >> >>> >>> >>>>>>
>>> >> >>> >>> >>>>>>>
>>> >> >>> >>> >>>>>>> [49174] [0xa5db730]before PyObject_CallFunctionObjArgs
>>> >> >>> >>> >>>>>>> conn
>>> >> >>> >>> >>>>>>> 0x94122f0,
>>> >> >>> >>> >>>>>>> cb 0x23ad320
>>> >> >>> >>> >>>>>>>
>>> >> >>> >>> >>>>>>> [49174] [0xa5db730]conn_poll: status = 2, conn
>>> >> >>> >>> >>>>>>> 0x94122f0
>>> >> >>> >>> >>>>>>>
>>> >> >>> >>> >>>>>>> [49174] [0xa5db730]conn_poll: async_status =
>>> >> >>> >>> >>>>>>> ASYNC_WRITE
>>> >> >>> >>> >>>>>>> 0x94122f0
>>> >> >>> >>> >>>>>>>
>>> >> >>> >>> >>>>>>> [49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS
>>> >> >>> >>> >>>>>>> conn
>>> >> >>> >>> >>>>>>> 0x94122f0,
>>> >> >>> >>> >>>>>>> async_cursor 0x881ac00
>>> >> >>> >>> >>>>>>>
>>> >> >>> >>> >>>>>>> [49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS
>>> >> >>> >>> >>>>>>> conn
>>> >> >>> >>> >>>>>>> 0x94122f0,
>>> >> >>> >>> >>>>>>> async_cursor 0x881ac00
>>> >> >>> >>> >>>>>>>
>>> >> >>> >>> >>>>>>> [49174] [0xa5db4b0]pq_abort: enter pgconn = 0x94122f0,
>>> >> >>> >>> >>>>>>> autocommit
>>> >> >>> >>> >>>>>>> = 0,
>>> >> >>> >>> >>>>>>> status = 2
>>> >> >>> >>> >>>>>>>
>>> >> >>> >>> >>>>>>> [49174] [0xa5db4b0]before lock pgconn = 0x94122f0,
>>> >> >>> >>> >>>>>>> owner
>>> >> >>> >>> >>>>>>> 49174
>>> >> >>> >>> >>>>>>> __lock 1
>>> >> >>> >>> >>>>>>>
>>> >> >>> >>> >>>>>>>
>>> >> >>> >>> >>>>>>> Below is the logs grep by coroutine id:
>>> >> >>> >>> >>>>>>>
>>> >> >>> >>> >>>>>>> [49174] [0xa5db730]finish send query, before
>>> >> >>> >>> >>>>>>> psyco_wait,
>>> >> >>> >>> >>>>>>> conn
>>> >> >>> >>> >>>>>>> 0x94122f0
>>> >> >>> >>> >>>>>>>
>>> >> >>> >>> >>>>>>> [49174] [0xa5db730]before have_wait_callback con
>>
