Good news: I reproduced it. Even better news for you: vindication! It has nothing to do with greenlets, threads, or concurrency at all (though it also doesn't have to do with the "del" or the weakref; this is much simpler).
This has to do with a special rule in the pool when a so-called "reset agent" is involved, which cleans up a pooled connection while taking into account that the enclosing Connection had a transaction in progress. That introduces the _handle_error() -> invalidate() routine *inside* the normal _finalize_fairy routine, and that is why we see _finalize_fairy twice. The _finalize_fairy that seems to start in the middle of the operation is called first; the _finalize_fairy we see failing is inside of it. So they aren't sibling operations, they are nested.

https://bitbucket.org/zzzeek/sqlalchemy/issues/4252/double-checkin-due-to-handle_error-called

On Tue, May 15, 2018 at 1:26 PM, Mike Bayer <[email protected]> wrote:
> can you add these lines to your logging, ensure that the check for
> "already in pool" is present, and produce new logs?
>
>     try:
>         fairy = fairy or _ConnectionFairy(
>             connection, connection_record, echo)
>         assert fairy.connection is connection
>         pool.logger.debug(
>             "ABOUT TO RESET CONNECTION %s CONNECTION REC %s FAIRY %s",
>             connection, connection_record, fairy)
>         fairy._reset(pool)
>         pool.logger.debug(
>             "COMPLETED RESET CONNECTION %s CONNECTION REC %s FAIRY %s",
>             connection, connection_record, fairy)
>
>         # Immediately close detached instances
>         if not connection_record:
>             pool._close_connection(connection)
>     except BaseException as e:
>         pool.logger.debug(
>             "EXCEPTION RAISED IN RESET CONNECTION %s CONNECTION REC %s "
>             "FAIRY %s: %s", connection, connection_record, fairy, e)
>         pool.logger.error(
>             "Exception during reset or similar", exc_info=True)
>         if connection_record:
>             pool.logger.debug("jjs going to connection_record.invalidate")
>             connection_record.invalidate(e=e)
>         if not isinstance(e, Exception):
>             pool.logger.debug("jjs e %s is not Exception, raise again", e)
>             raise
>
> my current theory is that a greenlet is being blocked in _reset(),
> then waking up later but is being corrupted at the greenlet level in
> some way.
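To make the nesting concrete, here is a minimal sketch (hypothetical names, not SQLAlchemy's actual pool code) of the shape described above: a checkin routine whose reset step fails, and whose error handler itself triggers another checkin, so the second entry into the routine happens *inside* the first rather than alongside it:

```python
# Hypothetical sketch of the nested-checkin shape: a checkin routine runs
# a reset step; when the reset fails, its error handler invalidates the
# connection, which itself checks the record back in -- re-entering the
# same routine before the first call has returned.
calls = []

def checkin(record, depth=0):
    calls.append(("enter", depth))
    try:
        reset(record, depth)
    except RuntimeError:
        # the invalidate path performs its own checkin: the second,
        # nested entry seen in the logs
        if depth == 0:
            checkin(record, depth + 1)
    calls.append(("exit", depth))

def reset(record, depth):
    # stand-in for the reset agent failing mid-transaction
    if depth == 0:
        raise RuntimeError("reset failed mid-transaction")

checkin({})
# calls shows the two checkins are nested, not siblings:
# enter 0 -> enter 1 -> exit 1 -> exit 0
```

This matches the log pattern in the thread: the outer call appears to "start in the middle," because what it logs next comes from the inner call.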
>
> On Tue, May 15, 2018 at 12:15 PM, Mike Bayer <[email protected]> wrote:
>>
>> On Tue, May 15, 2018, 3:26 AM <[email protected]> wrote:
>>>
>>> We got the log "jjs _finalize_fairy args" just once, but the code in
>>> function _finalize_fairy seems to be called twice. It also confused me.
>>
>> It's because that code originated elsewhere and got blocked; then it picks
>> up again, but the call stack is somehow corrupted (or the logging isn't
>> telling the truth).
>>
>> Can you please give me:
>>
>> Exact version of eventlet in use
>> Exact version of psycopg2 and psycogreen
>> Exact Python interpreter version
>> Exact version of SQLAlchemy
>> Exact version of oslo.db
>>
>> Thanks
>>
>>> BTW, I tried to add code[1] to avoid the double checkin in
>>> _finalize_fairy, but it failed.
>>> That means the code in pool.py#L670-692 was called without L646-668.
>>>
>>> [1] https://github.com/jiajunsu/test_logs/blob/master/pool.py#L659
>>>
>>> On Tuesday, May 15, 2018 at 12:41:17 PM UTC+8, Mike Bayer wrote:
>>>>
>>>> On Mon, May 14, 2018 at 9:38 PM, <[email protected]> wrote:
>>>> > The tid is a coroutine id; we get it by id(greenlet.getcurrent()) in
>>>> > each coroutine (green thread).
>>>> >
>>>> > We run nova-conductor with monkey_patch, the same as [1]:
>>>> > eventlet.monkey_patch(os=False)
>>>> >
>>>> > Our nova-conductor is running with multiple workers.
>>>> >
>>>> > It's confusing why _finalize_fairy is called twice but only logs once
>>>> > at the entrance; that's why I guess it's called by the weakref's
>>>> > callback.
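The tid-style correlation id described above can be produced with a small helper; this is a sketch (the fallback to the OS thread ident is an assumption for environments where greenlet is not installed):

```python
# Sketch of the "tid" correlation id used in these logs: the id() of the
# current greenlet, so every log line emitted by one green thread can be
# grepped together. Falls back to the thread ident without greenlet.
import threading

try:
    import greenlet

    def current_tid():
        return id(greenlet.getcurrent())
except ImportError:
    def current_tid():
        return threading.get_ident()

# e.g. prefix every log line with "[tid:%s]" % current_tid()
```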
>>>>
>>>> For 130446032, which is the tid you mentioned, _finalize_fairy is only
>>>> called once in your log:
>>>>
>>>> $ cat nc_5679.log | grep -e "130446032.*jjs _finalize"
>>>> 2018-05-14T11:11:39.955+08:00 localhost nova-conductor DEBUG
>>>> [pid:5679] [MainThread] [tid:130446032] [log.py:104 debug]
>>>> [req-841bb432-7cd2-4fa3-9def-559a34b05a21] jjs _finalize_fairy args
>>>> conn None, rec.conn None, rec, <sqlalchemy.pool._ConnectionRecord
>>>> object at 0x70cdb10>. ref None, fairy
>>>> <sqlalchemy.pool._ConnectionFairy object at 0x8a87390> debug
>>>>
>>>> Also, the log shows that no _finalize_fairy is ever called due to
>>>> "del" - this would show the weakref "ref" as non-None:
>>>>
>>>> $ cat nc_5679.log | grep -e "FINALIZE FAIRY.*REF:.*" | grep -v "REF: None"
>>>> $
>>>>
>>>> I would assume that by "_finalize_fairy called twice," you're referring
>>>> to the fact that you see the "exception during reset or similar"
>>>> message. But that message is still, AFAICT, impossible to emit without
>>>> the function being entered and a corresponding "FINALIZE FAIRY" log
>>>> line preceding it, unless something is going wrong with the logging or
>>>> the interpreter is somehow not functioning correctly (I wonder if
>>>> eventlet ever had any such issues). I'm struggling to come up with any
>>>> other possibility.
>>>>
>>>> One weird thing I'm trying to figure out is that I see a connection
>>>> being returned to the pool normally, but then nothing is logged about
>>>> it after that:
>>>>
>>>> 2018-05-14T11:11:21.183+08:00 localhost nova-conductor DEBUG
>>>> [pid:5679] [MainThread] [tid:130446352] [log.py:104 debug]
>>>> [req-19a57f61-bc64-4575-8253-00b5bfe62a74] Connection <connection
>>>> object at 0x7f419c0; closed: 0> being returned to pool,
>>>> connection_record.connection <connection object at 0x7f419c0; closed:
>>>> 0> debug
>>>>
>>>> The next log message from tid:130446352 should be "jjs conn reset".
>>>> But that never happens.
>>>> The next we see of that tid is:
>>>>
>>>> 2018-05-14T11:11:21.549+08:00 localhost nova-conductor DEBUG
>>>> [pid:5679] [MainThread] [tid:130446352] [log.py:104 debug]
>>>> [req-0ef61146-5cb6-4f46-9788-406b6d376843] jjs
>>>> _ConnectionFairy._checkout fariy None debug
>>>> 2018-05-14T11:11:21.549+08:00 localhost nova-conductor DEBUG
>>>> [pid:5679] [MainThread] [tid:130446352] [log.py:104 debug]
>>>> [req-0ef61146-5cb6-4f46-9788-406b6d376843] jjs before pool._do_get
>>>> pool queue deque([<sqlalchemy.pool._ConnectionRecord object at
>>>> 0x7777e10>, <sqlalchemy.pool._ConnectionRecord object at 0x75fb7d0>,
>>>> <sqlalchemy.pool._ConnectionRecord object at 0x778d810>,
>>>> <sqlalchemy.pool._ConnectionRecord object at 0x7600610>,
>>>> <sqlalchemy.pool._ConnectionRecord object at 0x6ea7e10>,
>>>> <sqlalchemy.pool._ConnectionRecord object at 0x7798050>,
>>>> <sqlalchemy.pool._ConnectionRecord object at 0x796d110>,
>>>> <sqlalchemy.pool._ConnectionRecord object at 0x7798d50>,
>>>> <sqlalchemy.pool._ConnectionRecord object at 0x75f5710>]) debug
>>>>
>>>> That is, it's checking out a new connection? If it were blocking on
>>>> trying to reset the connection, we wouldn't see that tid again. I
>>>> wonder if the greenlet was interrupted or something. I'm finding
>>>> various impossibilities like this going on and am not able to come up
>>>> with any theory other than that the interpreter is somehow not
>>>> functioning correctly, or the logging here is somehow incomplete.
>>>>
>>>> I notice you're on a very old version of SQLAlchemy (three years old).
>>>> You really should be on at least the 1.1 series, because for greenlets
>>>> you need to have:
>>>>
>>>> http://docs.sqlalchemy.org/en/latest/changelog/migration_11.html#change-3803
>>>> > [1] https://github.com/openstack/nova/blob/master/nova/cmd/__init__.py
>>>> >
>>>> > On Monday, May 14, 2018 at 11:59:43 PM UTC+8, Mike Bayer wrote:
>>>> >>
>>>> >> I notice in this log file there are multiple "tid" values, which I
>>>> >> assume to be thread identifiers, e.g. OS-level threads? Are you
>>>> >> using eventlet or not? Are you monkeypatching SQLAlchemy and then
>>>> >> using multiple OS-level threads at the same time? Because that
>>>> >> likely doesn't work.
>>>> >>
>>>> >> In particular, the log here illustrates an impossible code path.
>>>> >>
>>>> >> We enter _finalize_fairy with connection as None:
>>>> >>
>>>> >> 2018-05-14T11:11:39.961+08:00 localhost nova-conductor DEBUG
>>>> >> [pid:5679] [MainThread] [tid:130446352] [log.py:104 debug]
>>>> >> [req-74ef6531-2d92-40ea-a524-f54e11edab06] FINALIZE FAIRY: DB CONN:
>>>> >> None, FAIRY: <sqlalchemy.pool._ConnectionFairy object at 0x87dd3d0>,
>>>> >> REF: None CONNECTION_REC: <sqlalchemy.pool._ConnectionRecord object
>>>> >> at 0x776fe90> CONNECTION_REC REF <weakref at 0x7c5cdb8; to
>>>> >> '_ConnectionFairy' at 0x87dd3d0> debug
>>>> >>
>>>> >> This seems to make it through to the checkin at the bottom.
>>>> >>
>>>> >> Then, we see this:
>>>> >>
>>>> >> 2018-05-14T11:11:39.968+08:00 localhost nova-conductor ERROR
>>>> >> [pid:5679] [MainThread] [tid:130446032] [log.py:122 error]
>>>> >> [req-841bb432-7cd2-4fa3-9def-559a34b05a21] Exception during reset or
>>>> >> similar
>>>> >>
>>>> >> The only place that message is generated is *in* _finalize_fairy,
>>>> >> *before* the checkin occurs, and *only* if _finalize_fairy was
>>>> >> called *with* a connection that is non-None.
>>>> >> There is no other call to _finalize_fairy with that request
>>>> >> identifier, therefore I see no evidence of where this "exception
>>>> >> during reset" message can be generated from - there would at least
>>>> >> need to be an additional log line with "FINALIZE FAIRY" and the
>>>> >> connection non-None for this to be reached. I see no way that the
>>>> >> pool.py shipped with SQLAlchemy can possibly generate this series of
>>>> >> steps.
>>>> >>
>>>> >> This is an extremely intricate issue and I still have only the
>>>> >> vaguest notion of how you are running SQLAlchemy / Nova.
>>>> >>
>>>> >> On Mon, May 14, 2018 at 11:10 AM, Mike Bayer <[email protected]> wrote:
>>>> >> > OK, I need the exact pool.py you are using as well, including all
>>>> >> > these logging statements and any other adjustments that were made.
>>>> >> >
>>>> >> > On Mon, May 14, 2018 at 8:02 AM, <[email protected]> wrote:
>>>> >> >> I've uploaded the logs here:
>>>> >> >>
>>>> >> >> https://raw.githubusercontent.com/jiajunsu/test_logs/master/nc_5679.log
>>>> >> >>
>>>> >> >> The ConnectionRecord is 0x70cdb10, tid (coroutine id) is 130446032.
>>>> >> >>
>>>> >> >> At 2018-05-14T11:11:39.970, we got the log "jjs rec already in
>>>> >> >> pool checkin", but I had just deleted `return` in the code, so
>>>> >> >> the problem happened again.
>>>> >> >>
>>>> >> >> On Monday, May 14, 2018 at 10:48:49 AM UTC+8, Mike Bayer wrote:
>>>> >> >>>
>>>> >> >>> as well as the log that shows this:
>>>> >> >>>
>>>> >> >>> +        if self in pool._pool.queue:
>>>> >> >>> +            pool.logger.debug(
>>>> >> >>> +                "jjs rec already in pool checkin, skip %r", self)
>>>> >> >>> +            return
>>>> >> >>>
>>>> >> >>> actually happening.
>>>> >> >>> On Sun, May 13, 2018 at 10:42 PM, Mike Bayer <[email protected]> wrote:
>>>> >> >>> > On Sun, May 13, 2018 at 10:35 PM, <[email protected]> wrote:
>>>> >> >>> >> "jjs" is short for my name and "cjr" is my colleague's
>>>> >> >>> >> (JinRong Cai). We work together and add logs in the same
>>>> >> >>> >> environment, adding a name prefix as a keyword for log
>>>> >> >>> >> filtering.
>>>> >> >>> >>
>>>> >> >>> >> The doubled ConnectionRecords (with None connection) are
>>>> >> >>> >> gotten by two different coroutines, and the variable
>>>> >> >>> >> record.connection is rewritten by the second coroutine.
>>>> >> >>> >> After the record is put back into the QueuePool again, there
>>>> >> >>> >> are two identical records with the same connection in the
>>>> >> >>> >> pool, which will cause the next two coroutines to conflict
>>>> >> >>> >> and deadlock in psycopg2 as above.
>>>> >> >>> >
>>>> >> >>> > OK, I need the detailed step-by-step that shows how the two
>>>> >> >>> > coroutines are interacting such that they both are returning
>>>> >> >>> > the same connection. The nova log above seems to show just one
>>>> >> >>> > coroutine, I assume, since there is just one nova request ID,
>>>> >> >>> > if I understand correctly.
>>>> >> >>> >
>>>> >> >>> >> The logs of psycopg2 have been pasted at
>>>> >> >>> >> https://github.com/psycopg/psycopg2/issues/703
>>>> >> >>> >
>>>> >> >>> > The above issue says the bug has been found in SQLAlchemy, but
>>>> >> >>> > there is no bug report yet, and I've not yet been shown how
>>>> >> >>> > the behavior here is possible.
>>>> >> >>> > I posted a straightforward example above; can you please show
>>>> >> >>> > me where the two coroutines get mixed up?
>>>> >> >>> >
>>>> >> >>> >> On Friday, May 11, 2018 at 3:05:46 AM UTC+8, Mike Bayer wrote:
>>>> >> >>> >>>
>>>> >> >>> >>> I tried to follow your log lines, which is difficult because
>>>> >> >>> >>> I don't know where those log statements actually are and I
>>>> >> >>> >>> don't know what acronyms like "jjs" and "cjr" mean. But it
>>>> >> >>> >>> does look like it's all in one thread (so not async GC) and
>>>> >> >>> >>> all in one Nova request (so perhaps just one greenlet
>>>> >> >>> >>> involved, unless other greenlets are affecting this without
>>>> >> >>> >>> being displayed in this log snippet). I do see the
>>>> >> >>> >>> connection record doubled at the end. So here is the
>>>> >> >>> >>> illustration of those steps in isolation; the connection
>>>> >> >>> >>> record is not doubled. Can you alter this program to show
>>>> >> >>> >>> this condition occurring?
>>>> >> >>> >>> from sqlalchemy import create_engine
>>>> >> >>> >>>
>>>> >> >>> >>> e = create_engine("postgresql://scott:tiger@localhost/test",
>>>> >> >>> >>>                   echo_pool='debug')
>>>> >> >>> >>> e.connect().close()
>>>> >> >>> >>>
>>>> >> >>> >>> assert len(e.pool._pool.queue) == 1
>>>> >> >>> >>>
>>>> >> >>> >>> c1 = e.connect()
>>>> >> >>> >>>
>>>> >> >>> >>> assert len(e.pool._pool.queue) == 0
>>>> >> >>> >>>
>>>> >> >>> >>> c1.engine.pool._invalidate(c1._Connection__connection, None)
>>>> >> >>> >>> c1.invalidate(e)
>>>> >> >>> >>>
>>>> >> >>> >>> # connection record is back
>>>> >> >>> >>> assert len(e.pool._pool.queue) == 1
>>>> >> >>> >>>
>>>> >> >>> >>> # do a reconnect
>>>> >> >>> >>> c1.connection
>>>> >> >>> >>>
>>>> >> >>> >>> # uses the record again
>>>> >> >>> >>> assert len(e.pool._pool.queue) == 0
>>>> >> >>> >>>
>>>> >> >>> >>> # close the Connection
>>>> >> >>> >>> c1.close()
>>>> >> >>> >>>
>>>> >> >>> >>> # record is back
>>>> >> >>> >>> assert len(e.pool._pool.queue) == 1
>>>> >> >>> >>>
>>>> >> >>> >>> # etc.
>>>> >> >>> >>> c1 = e.connect()
>>>> >> >>> >>> assert len(e.pool._pool.queue) == 0
>>>> >> >>> >>>
>>>> >> >>> >>> c1.close()
>>>> >> >>> >>> assert len(e.pool._pool.queue) == 1
>>>> >> >>> >>>
>>>> >> >>> >>> On Thu, May 10, 2018 at 12:25 PM, Mike Bayer <[email protected]> wrote:
>>>> >> >>> >>> > Here's a print statement you might want to try:
>>>> >> >>> >>> >
>>>> >> >>> >>> > diff --git a/lib/sqlalchemy/pool.py b/lib/sqlalchemy/pool.py
>>>> >> >>> >>> > index 89a4cea7c..452db4883 100644
>>>> >> >>> >>> > --- a/lib/sqlalchemy/pool.py
>>>> >> >>> >>> > +++ b/lib/sqlalchemy/pool.py
>>>> >> >>> >>> > @@ -689,6 +689,19 @@ def _finalize_fairy(connection, connection_record,
>>>> >> >>> >>> >      been garbage collected.
>>>> >> >>> >>> >
>>>> >> >>> >>> >      """
>>>> >> >>> >>> > +    print(
>>>> >> >>> >>> > +        "FINALIZE FAIRY: GREENLET ID: %s, DB CONN: %s, FAIRY: %s, REF: %s "
>>>> >> >>> >>> > +        "CONNECTION_REC: %s CONNECTION_REC REF %s" % (
>>>> >> >>> >>> > +            "put greenlet id here",
>>>> >> >>> >>> > +            connection,
>>>> >> >>> >>> > +            fairy,
>>>> >> >>> >>> > +            ref,
>>>> >> >>> >>> > +            connection_record,
>>>> >> >>> >>> > +            connection_record.fairy_ref
>>>> >> >>> >>> > +            if connection_record is not None else "n/a"
>>>> >> >>> >>> > +        )
>>>> >> >>> >>> > +    )
>>>> >> >>> >>> > +
>>>> >> >>> >>> >      _refs.discard(connection_record)
>>>> >> >>> >>> >
>>>> >> >>> >>> >      if ref is not None:
>>>> >> >>> >>> >
>>>> >> >>> >>> > On Thu, May 10, 2018 at 12:21 PM, Mike Bayer <[email protected]> wrote:
>>>> >> >>> >>> >> On Thu, May 10, 2018 at 7:23 AM, <[email protected]> wrote:
>>>> >> >>> >>> >>> Hi Mike,
>>>> >> >>> >>> >>>
>>>> >> >>> >>> >>> I added more logs and finally found the reason and a
>>>> >> >>> >>> >>> method to avoid the deadlock between coroutines.
>>>> >> >>> >>> >>>
>>>> >> >>> >>> >>> In summary, the deadlock happened because two different
>>>> >> >>> >>> >>> coroutines used the same ConnectionRecord with the same
>>>> >> >>> >>> >>> connection.
>>>> >> >>> >>> >>> It happens under QueuePool, when the DB connections are
>>>> >> >>> >>> >>> reset by the DB server (not related to the backend driver).
>>>> >> >>> >>> >>>
>>>> >> >>> >>> >>> 1. A coroutine gets a ConnectionRecord into a
>>>> >> >>> >>> >>> ConnectionFairy, and creates a weakref to the
>>>> >> >>> >>> >>> ConnectionFairy with the callback function `_finalize_fairy`.
>>>> >> >>> >>> >>> 2. When the connection gets disconnected, it receives an
>>>> >> >>> >>> >>> exception (psycopg2.OperationalError) and goes to
>>>> >> >>> >>> >>> Connection._handle_dbapi_exception.
>>>> >> >>> >>> >>> 3. In `_handle_dbapi_exception`'s finally block, it calls
>>>> >> >>> >>> >>> both `ConnectionFairy.invalidate` and `self.invalidate`.
>>>> >> >>> >>> >>> 4. In `ConnectionFairy.invalidate`,
>>>> >> >>> >>> >>> `_ConnectionRecord._checkin` is called, putting the
>>>> >> >>> >>> >>> ConnectionRecord back into the QueuePool.
>>>> >> >>> >>> >>> 5. In `self.invalidate`, `del self._root.__connection` is
>>>> >> >>> >>> >>> called, which **del**s the ConnectionFairy.
>>>> >> >>> >>> >>> 6. The **del** operation makes the weakref to the
>>>> >> >>> >>> >>> ConnectionFairy dead, and the callback `_finalize_fairy`
>>>> >> >>> >>> >>> is executed.
>>>> >> >>> >>> >>> 7. In `_finalize_fairy`, another
>>>> >> >>> >>> >>> `_ConnectionRecord._checkin` is called, putting the
>>>> >> >>> >>> >>> ConnectionRecord into the QueuePool again.
>>>> >> >>> >>> >>
>>>> >> >>> >>> >> Here's what does not make sense. You are claiming that a
>>>> >> >>> >>> >> simple invalidation will result in the same
>>>> >> >>> >>> >> ConnectionRecord being checked in twice. This is obviously
>>>> >> >>> >>> >> not the case; what is special in your situation that makes
>>>> >> >>> >>> >> this happen?
>>>> >> >>> >>> >>
>>>> >> >>> >>> >> If I follow the steps you refer to: in step 4, when
>>>> >> >>> >>> >> ConnectionFairy._checkin is called, that calls
>>>> >> >>> >>> >> _finalize_fairy directly, which then calls
>>>> >> >>> >>> >> ConnectionRecord.checkin(). Within
>>>> >> >>> >>> >> ConnectionRecord.checkin(), ConnectionRecord.fairy_ref is
>>>> >> >>> >>> >> set to None. Then back in ConnectionFairy._checkin(), it
>>>> >> >>> >>> >> also sets self._connection_record = None.
>>>> >> >>> >>> >>
>>>> >> >>> >>> >> If we come back into _finalize_fairy a *second* time then,
>>>> >> >>> >>> >> as a result of the "del" which occurs in _handle_error(),
>>>> >> >>> >>> >> and the connection_record is present, the function will
>>>> >> >>> >>> >> exit immediately, because:
>>>> >> >>> >>> >>
>>>> >> >>> >>> >>     if connection_record.fairy_ref is not ref:
>>>> >> >>> >>> >>         return
>>>> >> >>> >>> >>
>>>> >> >>> >>> >> So the situation as given is not possible without
>>>> >> >>> >>> >> concurrent access to the ConnectionFairy being introduced
>>>> >> >>> >>> >> externally to the pool.
>>>> >> >>> >>> >>
>>>> >> >>> >>> >> There is exactly one thing I can think of that is unique
>>>> >> >>> >>> >> to the psycopg2 style of asynchronous connection, not the
>>>> >> >>> >>> >> PyMySQL form of it, that might be important here, and it's
>>>> >> >>> >>> >> that I think psycopg2's connection.close() method will
>>>> >> >>> >>> >> defer to another greenlet.
>>>> >> >>> >>> >> That means when we're inside of
>>>> >> >>> >>> >> ConnectionRecord.invalidate() and we call self.__close(),
>>>> >> >>> >>> >> that could be deferring to another greenlet. But this all
>>>> >> >>> >>> >> happens before the connection is returned to the pool, so
>>>> >> >>> >>> >> I still don't know what is happening.
>>>> >> >>> >>> >>
>>>> >> >>> >>> >>> The detailed logs are below. To prove my result is right,
>>>> >> >>> >>> >>> I tried to avoid the scenario with this patch, and it
>>>> >> >>> >>> >>> worked.
>>>> >> >>> >>> >>>
>>>> >> >>> >>> >>>     def checkin(self):
>>>> >> >>> >>> >>>         self.fairy_ref = None
>>>> >> >>> >>> >>>         connection = self.connection
>>>> >> >>> >>> >>>         pool = self.__pool
>>>> >> >>> >>> >>>
>>>> >> >>> >>> >>>         while self.finalize_callback:
>>>> >> >>> >>> >>>             finalizer = self.finalize_callback.pop()
>>>> >> >>> >>> >>>             finalizer(connection)
>>>> >> >>> >>> >>> +        if self in pool._pool.queue:
>>>> >> >>> >>> >>> +            pool.logger.debug(
>>>> >> >>> >>> >>> +                "jjs rec already in pool checkin, skip %r", self)
>>>> >> >>> >>> >>> +            return
>>>> >> >>> >>> >>
>>>> >> >>> >>> >> As mentioned above, it's not possible to reach this point
>>>> >> >>> >>> >> without concurrent access to the pooled connection, which
>>>> >> >>> >>> >> means this fix could fail also (with at least
>>>> >> >>> >>> >> threaded-style concurrency, not greenlet style). The
>>>> >> >>> >>> >> original source of the greenlet issue must still be
>>>> >> >>> >>> >> identified.
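The fairy_ref guard quoted above can be sketched in isolation to show why a stale weakref callback normally cannot produce a second checkin (simplified stand-ins for illustration, not the shipped pool.py):

```python
import weakref

class Record:
    """Simplified stand-in for _ConnectionRecord."""
    def __init__(self):
        self.fairy_ref = None
        self.checkins = 0

def finalize(record, ref):
    # guard from the shipped pool: if the stored ref no longer matches
    # the ref handed to the weakref callback, the callback is stale and
    # the function exits without checking the record in again
    if ref is not None and record.fairy_ref is not ref:
        return
    record.fairy_ref = None
    record.checkins += 1

class Fairy:
    """Simplified stand-in for _ConnectionFairy."""

rec = Record()
fairy = Fairy()
ref = weakref.ref(fairy)
rec.fairy_ref = ref

finalize(rec, None)  # direct checkin path: clears fairy_ref
finalize(rec, ref)   # later weakref callback: guard makes it a no-op
assert rec.checkins == 1
```

This is why the double checkin "should be" impossible via the "del" path alone, and why the reset-agent re-entry described at the top of the thread was needed to explain it.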
>>>> >> >>> >>> >> >>>> >> >>> >>> >> I would need a self-contained proof of concept that >>>> >> >>> >>> >> demonstrates >>>> >> >>> >>> >> this >>>> >> >>> >>> >> issue occurring. The log output is not that helpful >>>> >> >>> >>> >> because it >>>> >> >>> >>> >> at >>>> >> >>> >>> >> least would need to write out the identifier of the current >>>> >> >>> >>> >> greenlet, >>>> >> >>> >>> >> since it is very likely that greenlets have something to do >>>> >> >>> >>> >> with >>>> >> >>> >>> >> this >>>> >> >>> >>> >> issue. >>>> >> >>> >>> >> >>>> >> >>> >>> >> >>>> >> >>> >>> >> >>>> >> >>> >>> >>> >>>> >> >>> >>> >>> if pool.dispatch.checkin: >>>> >> >>> >>> >>> pool.dispatch.checkin(connection, self) >>>> >> >>> >>> >>> pool._return_conn(self) >>>> >> >>> >>> >>> >>>> >> >>> >>> >>> Trace the ConnectionRecord 0x88fc5d0 >>>> >> >>> >>> >>> >>>> >> >>> >>> >>> 2018-05-10T14:02:17.927+08:00 localhost nova-conductor >>>> >> >>> >>> >>> DEBUG >>>> >> >>> >>> >>> [pid:4528] >>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug] >>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Created new >>>> >> >>> >>> >>> connection >>>> >> >>> >>> >>> <connection object at 0x81f69b0; closed: 0> >>>> >> >>> >>> >>> 2018-05-10T14:02:17.928+08:00 localhost nova-conductor >>>> >> >>> >>> >>> DEBUG >>>> >> >>> >>> >>> [pid:4528] >>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug] >>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs get conn >>>> >> >>> >>> >>> success >>>> >> >>> >>> >>> <connection >>>> >> >>> >>> >>> object at 0x81f69b0; closed: 0>, rec.connection >>>> >> >>> >>> >>> <connection >>>> >> >>> >>> >>> object >>>> >> >>> >>> >>> at >>>> >> >>> >>> >>> 0x81f69b0; closed: 0>, pool queue >>>> >> >>> >>> >>> deque([<sqlalchemy.pool._ConnectionRecord >>>> >> >>> >>> >>> object at 0x7a04810>, <sqlalchemy.pool._ConnectionRecord >>>> >> >>> >>> >>> object at >>>> >> >>> >>> >>> 0x79fdad0>]) >>>> >> >>> >>> >>> 
2018-05-10T14:02:17.929+08:00 localhost nova-conductor >>>> >> >>> >>> >>> DEBUG >>>> >> >>> >>> >>> [pid:4528] >>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug] >>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs checkout >>>> >> >>> >>> >>> fairy >>>> >> >>> >>> >>> <sqlalchemy.pool._ConnectionFairy object at 0x329c0d0> rec >>>> >> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0> >>>> >> >>> >>> >>> pool >>>> >> >>> >>> >>> queue >>>> >> >>> >>> >>> deque([<sqlalchemy.pool._ConnectionRecord object at >>>> >> >>> >>> >>> 0x7a04810>, >>>> >> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x79fdad0>]) >>>> >> >>> >>> >>> 2018-05-10T14:02:17.929+08:00 localhost nova-conductor >>>> >> >>> >>> >>> DEBUG >>>> >> >>> >>> >>> [pid:4528] >>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug] >>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Connection >>>> >> >>> >>> >>> <connection >>>> >> >>> >>> >>> object at >>>> >> >>> >>> >>> 0x81f69b0; closed: 0> checked out from pool, status Pool >>>> >> >>> >>> >>> size: >>>> >> >>> >>> >>> 10 >>>> >> >>> >>> >>> Connections in pool: 2 Current Overflow: 0 Max Overflow: >>>> >> >>> >>> >>> 40 >>>> >> >>> >>> >>> Current >>>> >> >>> >>> >>> Checked >>>> >> >>> >>> >>> out connections: 8, queue >>>> >> >>> >>> >>> deque([<sqlalchemy.pool._ConnectionRecord >>>> >> >>> >>> >>> object >>>> >> >>> >>> >>> at 0x7a04810>, <sqlalchemy.pool._ConnectionRecord object >>>> >> >>> >>> >>> at >>>> >> >>> >>> >>> 0x79fdad0>]) >>>> >> >>> >>> >>> 2018-05-10T14:02:17.930+08:00 localhost nova-conductor >>>> >> >>> >>> >>> DEBUG >>>> >> >>> >>> >>> [pid:4528] >>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug] >>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs >>>> >> >>> >>> >>> _ConnectionFairy._checkout >>>> >> >>> >>> >>> conn <connection object at 0x81f69b0; closed: 0>, rec >>>> >> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord 
object at 0x88fc5d0>, >>>> >> >>> >>> >>> rec.conn >>>> >> >>> >>> >>> <connection object at 0x81f69b0; closed: 0> >>>> >> >>> >>> >>> 2018-05-10T14:02:49.716+08:00 localhost nova-conductor >>>> >> >>> >>> >>> ERROR >>>> >> >>> >>> >>> [pid:4528] >>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:122 error] >>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] cjr error conn >>>> >> >>> >>> >>> <sqlalchemy.engine.base.Connection object at 0x329c150>, >>>> >> >>> >>> >>> err >>>> >> >>> >>> >>> connection >>>> >> >>> >>> >>> pointer is NULL >>>> >> >>> >>> >>> 2018-05-10T14:02:49.717+08:00 localhost nova-conductor >>>> >> >>> >>> >>> ERROR >>>> >> >>> >>> >>> [pid:4528] >>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:122 error] >>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] cjr enter >>>> >> >>> >>> >>> newraise >>>> >> >>> >>> >>> 2018-05-10T14:02:49.717+08:00 localhost nova-conductor >>>> >> >>> >>> >>> ERROR >>>> >> >>> >>> >>> [pid:4528] >>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:122 error] >>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] cjr enter >>>> >> >>> >>> >>> finally >>>> >> >>> >>> >>>>>Connection._handle_dbapi_exception --> finally --> >>>> >> >>> >>> >>>>> **self.engine.pool._invalidate** & **self.invalidate** >>>> >> >>> >>> >>>>>self.engine.pool._invalidate --> Pool._invalidate --> >>>> >> >>> >>> >>>>> ConnectionFairy.invalidate --> >>>> >> >>> >>> >>>>> _ConnectionRecord.invalidate >>>> >> >>> >>> >>>>> & >>>> >> >>> >>> >>>>> **self._checkin()** >>>> >> >>> >>> >>>>>self.invalidate() --> **del self._root.__connection** >>>> >> >>> >>> >>> 2018-05-10T14:02:49.718+08:00 localhost nova-conductor >>>> >> >>> >>> >>> DEBUG >>>> >> >>> >>> >>> [pid:4528] >>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug] >>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs >>>> >> >>> >>> >>> connectionFairy >>>> >> >>> >>> >>> invalidate >>>> >> >>> >>> >>> 
2018-05-10T14:02:49.718+08:00 localhost nova-conductor >>>> >> >>> >>> >>> INFO >>>> >> >>> >>> >>> [pid:4528] >>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:109 info] >>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Invalidate >>>> >> >>> >>> >>> connection >>>> >> >>> >>> >>> <connection >>>> >> >>> >>> >>> object at 0x81f69b0; closed: 2> (reason: >>>> >> >>> >>> >>> OperationalError:connection >>>> >> >>> >>> >>> pointer >>>> >> >>> >>> >>> is NULL >>>> >> >>> >>> >>>>>_ConnectionRecord.invalidate(e) --> >>>> >> >>> >>> >>>>> self.__pool.dispatch.invalidate(self.connection, self, >>>> >> >>> >>> >>>>> e) & >>>> >> >>> >>> >>>>> self.__close() & >>>> >> >>> >>> >>>>> self.connection = None >>>> >> >>> >>> >>>>>self.__close() --> self.finalize_callback.clear() & >>>> >> >>> >>> >>>>> self.__pool._close_connection(self.connection) >>>> >> >>> >>> >>> 2018-05-10T14:02:49.719+08:00 localhost nova-conductor >>>> >> >>> >>> >>> DEBUG >>>> >> >>> >>> >>> [pid:4528] >>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug] >>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Closing >>>> >> >>> >>> >>> connection >>>> >> >>> >>> >>> <connection >>>> >> >>> >>> >>> object at 0x81f69b0; closed: 2> >>>> >> >>> >>> >>>>>Pool._close_connection(connection) --> >>>> >> >>> >>> >>>>> self._dialect.do_close(connection) >>>> >> >>> >>> >>>>> --> _ConnDialect.do_close(dbapi_connection) --> >>>> >> >>> >>> >>>>> dbapi_connection.close() FIN >>>> >> >>> >>> >>> 2018-05-10T14:02:49.719+08:00 localhost nova-conductor >>>> >> >>> >>> >>> DEBUG >>>> >> >>> >>> >>> [pid:4528] >>>> >> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug] >>>> >> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs >>>> >> >>> >>> >>> _ConnectionFairy._checkin >>>> >> >>> >>> >>> conn None rec <sqlalchemy.pool._ConnectionRecord object at >>>> >> >>> >>> >>> 0x88fc5d0> >>>> >> >>> >>> >>> rec.conn None >>>> >> >>> >>> >>>>>_ConnectionFairy._checkin --> 
_finalize_fairy(conn, conn_rec, pool, None, echo, fairy) & self.conn = None & self.conn_rec = None

2018-05-10T14:02:49.720+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs _finalize_fairy args conn None, rec.conn None, rec <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>, ref None, fairy <sqlalchemy.pool._ConnectionFairy object at 0x329c0d0>

_finalize_fairy --> connection_record.checkin()

2018-05-10T14:02:49.720+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs going to connection_record.checkin
2018-05-10T14:02:49.721+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs before conn checkin None, pool queue deque([<sqlalchemy.pool._ConnectionRecord object at 0x7524590>, <sqlalchemy.pool._ConnectionRecord object at 0x7a2b990>, <sqlalchemy.pool._ConnectionRecord object at 0xa856b10>, <sqlalchemy.pool._ConnectionRecord object at 0x7ebee90>]), finalize_callback deque([])
2018-05-10T14:02:49.721+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs after conn checkin, pool queue deque([<sqlalchemy.pool._ConnectionRecord object at 0x7524590>, <sqlalchemy.pool._ConnectionRecord object at 0x7a2b990>, <sqlalchemy.pool._ConnectionRecord object at 0xa856b10>, <sqlalchemy.pool._ConnectionRecord object at 0x7ebee90>, <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>])
2018-05-10T14:02:49.723+08:00 localhost nova-conductor ERROR [pid:4528] [MainThread] [tid:122769072] [log.py:122 error] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Exception during reset or similar
2018-05-10T14:02:49.723+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs going to connection_record.invalidate
2018-05-10T14:02:49.724+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs going to connection_record.checkin
2018-05-10T14:02:49.724+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs before conn checkin None, pool queue deque([<sqlalchemy.pool._ConnectionRecord object at 0x7524590>, <sqlalchemy.pool._ConnectionRecord object at 0x7a2b990>, <sqlalchemy.pool._ConnectionRecord object at 0xa856b10>, <sqlalchemy.pool._ConnectionRecord object at 0x7ebee90>, <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>]), finalize_callback deque([])
2018-05-10T14:02:49.732+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs after conn checkin, pool queue deque([<sqlalchemy.pool._ConnectionRecord object at 0x7524590>, <sqlalchemy.pool._ConnectionRecord object at 0x7a2b990>, <sqlalchemy.pool._ConnectionRecord object at 0xa856b10>, <sqlalchemy.pool._ConnectionRecord object at 0x7ebee90>, <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>, <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>])

Note that after the second checkin, the record at 0x88fc5d0 appears in the pool queue twice.

On Tuesday, May 8, 2018 at 23:06:04 UTC+8, [email protected] wrote:

I found why we could get the same connection twice from the pool...

Because we put it into the pool twice.
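The double checkin in the trace above can be reproduced in miniature. The sketch below is a simplified stand-in, not SQLAlchemy's actual QueuePool: a pool whose checkin has no duplicate guard ends up with the same record in its queue twice, so two later checkouts (e.g. two coroutines) receive the same connection record.

```python
from collections import deque

class Record:
    """Stand-in for sqlalchemy.pool._ConnectionRecord."""
    def __init__(self, name):
        self.name = name

class MiniPool:
    """Simplified pool: a deque of records with no duplicate guard."""
    def __init__(self):
        self.queue = deque()

    def checkin(self, rec):
        # Nothing prevents the same record from being appended twice.
        self.queue.append(rec)

    def checkout(self):
        return self.queue.popleft()

pool = MiniPool()
rec = Record("0x88fc5d0")

# _finalize_fairy checks the record in once...
pool.checkin(rec)
# ...then the reset-failure path (invalidate + checkin) checks it in again.
pool.checkin(rec)

a = pool.checkout()   # first caller's checkout
b = pool.checkout()   # second caller's checkout
assert a is b         # both callers now share one connection record
```

This mirrors the deque contents in the logs: one checkin appends 0x88fc5d0, the nested error path appends it again, and two subsequent checkouts both pop the same record.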
Any suggestion for finding why the same connection was put into the pool twice, from different coroutines?

2018-05-08T22:39:17.448+08:00 localhost nova-conductor DEBUG [pid:15755] [MainThread] [tid:187954992] [log.py:104 debug] [req-7c2199c0-d9a6-4131-8358-2aebcf5a4ce1] jjs conn checkin <connection object at 0x111d9740; closed: 0>
2018-05-08T22:39:17.502+08:00 localhost nova-conductor DEBUG [pid:15755] [MainThread] [tid:178645264] [log.py:104 debug] [req-7ab83c30-91bd-4278-b694-85a71035992c] jjs conn checkin <connection object at 0x111d9740; closed: 0>
...
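One cheap way to catch the moment of the duplicate checkin, along the lines of the "already in pool" check suggested earlier in the thread, is an identity check before appending. This is a hypothetical debugging sketch, not SQLAlchemy's real pool code:

```python
from collections import deque

class GuardedPool:
    """Debug-only sketch: refuses to check the same record in twice.

    `queue` stands in for QueuePool's underlying queue; the identity
    scan is the debugging aid, everything else is simplified.
    """
    def __init__(self):
        self.queue = deque()

    def checkin(self, rec):
        # Fail loudly at the second checkin instead of silently handing
        # the same connection to two coroutines at checkout time.
        if any(r is rec for r in self.queue):
            raise AssertionError(
                "record %r is already in the pool - double checkin" % rec)
        self.queue.append(rec)

pool = GuardedPool()
rec = object()
pool.checkin(rec)
try:
    pool.checkin(rec)          # the bug's second checkin
except AssertionError as exc:
    print("caught:", exc)      # caught at the source, not at checkout
```

A linear scan like this is too slow for production, but as a temporary patch it turns a confusing "two coroutines share one connection" symptom into a stack trace at the exact point of the double checkin.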
2018-05-08T22:39:17.992+08:00 localhost nova-conductor DEBUG [pid:15755] [MainThread] [tid:309091856] [log.py:104 debug] [req-dee5dd2d-866f-408e-999b-b415ea7ce953] Connection <connection object at 0x111d9740; closed: 0> checked out from pool
2018-05-08T22:39:18.039+08:00 localhost nova-conductor DEBUG [pid:15755] [MainThread] [tid:178645104] [log.py:104 debug] [req-ac8017d7-c97e-44ca-92d2-99f180858e14] Connection <connection object at 0x111d9740; closed: 0> checked out from pool

On Tuesday, May 8, 2018 at 22:18:52 UTC+8, [email protected] wrote:

I added logs in sqlalchemy/pool.py, in QueuePool.checkout:

    @classmethod
    def checkout(cls, pool):
        rec = pool._do_get()
        try:
            dbapi_connection = rec.get_connection()
            pool.logger.debug("jjs get conn success %s", dbapi_connection)
        except:
            pool.logger.debug("jjs get error in get_connection")
            with util.safe_reraise():
                rec.checkin()
        echo = pool._should_log_debug()
        fairy = _ConnectionFairy(dbapi_connection, rec, echo)
        rec.fairy_ref = weakref.ref(
            fairy,
            lambda ref: _finalize_fairy and _finalize_fairy(
                dbapi_connection, rec, pool, ref, echo)
        )
        _refs.add(rec)
        if echo:
            pool.logger.debug("jjs conn %s, pool status %s",
                              dbapi_connection, pool.status())
            pool.logger.debug("Connection %r checked out from pool",
                              dbapi_connection)
        return fairy

Then I got the logs below:

coroutine A:

2018-05-08T21:39:59.055+08:00 localhost nova-conductor INFO [pid:55528] [MainThread] [tid:197123760] [log.py:109 info] [req-ba1ab897-c80a-4bc4-922a-39044d669da1] jjs Pool wait is False, timeout 30
2018-05-08T21:39:59.078+08:00 localhost nova-conductor DEBUG [pid:55528] [MainThread] [tid:197123760] [log.py:104 debug] [req-ba1ab897-c80a-4bc4-922a-39044d669da1] Created new connection <connection object at 0x903f8c0; closed: 0>
2018-05-08T21:39:59.079+08:00 localhost nova-conductor DEBUG [pid:55528] [MainThread] [tid:197123760] [log.py:104 debug] [req-ba1ab897-c80a-4bc4-922a-39044d669da1] jjs get conn success <connection object at 0x903f8c0; closed: 0>
2018-05-08T21:39:59.080+08:00 localhost nova-conductor DEBUG [pid:55528] [MainThread] [tid:197123760] [log.py:104 debug] [req-ba1ab897-c80a-4bc4-922a-39044d669da1] jjs conn <connection object at 0x903f8c0; closed: 0>, pool status Pool size: 10  Connections in pool: 1  Current Overflow: 36  Max Overflow: 40  Current Checked out connections: 45
2018-05-08T21:39:59.080+08:00 localhost nova-conductor DEBUG [pid:55528] [MainThread] [tid:197123760] [log.py:104 debug] [req-ba1ab897-c80a-4bc4-922a-39044d669da1] Connection <connection object at 0x903f8c0; closed: 0> checked out from pool

coroutine B:

2018-05-08T21:39:59.212+08:00 localhost nova-conductor INFO [pid:55528] [MainThread] [tid:151519312] [log.py:109 info] [req-a78075b5-c55c-40c4-86d0-65957e89cb56] jjs Pool wait is False, timeout 30
2018-05-08T21:39:59.213+08:00 localhost nova-conductor DEBUG [pid:55528] [MainThread] [tid:151519312] [log.py:104 debug] [req-a78075b5-c55c-40c4-86d0-65957e89cb56] jjs get conn success <connection object at 0x903f8c0; closed: 0>
2018-05-08T21:39:59.213+08:00 localhost nova-conductor DEBUG [pid:55528] [MainThread] [tid:151519312] [log.py:104 debug] [req-a78075b5-c55c-40c4-86d0-65957e89cb56] jjs conn <connection object at 0x903f8c0; closed: 0>, pool status Pool size: 10  Connections in pool: 0  Current Overflow: 36  Max Overflow: 40  Current Checked out connections: 46
2018-05-08T21:39:59.214+08:00 localhost nova-conductor DEBUG [pid:55528] [MainThread] [tid:151519312] [log.py:104 debug] [req-a78075b5-c55c-40c4-86d0-65957e89cb56] Connection <connection object at 0x903f8c0; closed: 0> checked out from pool

It seems that when we create_connection in QueuePool._do_get, we may not be holding the lock _pool.mutex (~Queue.mutex).

On Monday, May 7, 2018 at 20:41:14 UTC+8, Mike Bayer wrote:

On Mon, May 7, 2018, 7:01 AM <[email protected]> wrote:

We added coroutine_id in psycopg2, and found that two coroutines use the same connection before the pthread_mutex_lock is released.

Maybe something in the connection pool goes wrong?
[pid] [coroutine_id] msg

This seems unlikely. Openstack uses the SQLAlchemy connection pool with eventlet patching in conjunction with the MySQL drivers and nothing like that happens. The biggest issue SQLAlchemy had with greenlets is that a timed-out greenlet would throw GreenletExit and cause a dirty connection to go back into the pool; that was fixed in 1.2.

[49174] [0xa5db730]before PyObject_CallFunctionObjArgs conn 0x94122f0, cb 0x23ad320
[49174] [0xa5db730]conn_poll: status = 2, conn 0x94122f0
[49174] [0xa5db730]conn_poll: async_status = ASYNC_WRITE 0x94122f0
[49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS conn 0x94122f0, async_cursor 0x881ac00
[49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS conn 0x94122f0, async_cursor 0x881ac00
[49174] [0xa5db4b0]pq_abort: enter pgconn = 0x94122f0, autocommit = 0, status = 2
[49174] [0xa5db4b0]before lock pgconn = 0x94122f0, owner 49174 __lock 1

Below are the logs grepped by coroutine id:

[49174] [0xa5db730]finish send query, before psyco_wait, conn 0x94122f0
[49174] [0xa5db730]before have_wait_callback con

-- 
SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.
