OK, I also need the exact pool.py you are using, including all these logging statements and any other adjustments that were made.
On Mon, May 14, 2018 at 8:02 AM, <[email protected]> wrote:
> I've uploaded the logs here:
> https://raw.githubusercontent.com/jiajunsu/test_logs/master/nc_5679.log
>
> The ConnectionRecord is 0x70cdb10, tid (coroutine id) is 130446032
>
> At 2018-05-14T11:11:39.970 we got the log "jjs rec already in pool
> checkin", but I had deleted the `return` from the code, so the problem
> happened again.
>
> On Monday, May 14, 2018 at 10:48:49 AM UTC+8, Mike Bayer wrote:
>>
>> as well as the log that shows this:
>>
>> +        if self in pool._pool.queue:
>> +            pool.logger.debug("jjs rec already in pool checkin, skip %r", self)
>> +            return
>>
>> actually happening.
>>
>> On Sun, May 13, 2018 at 10:42 PM, Mike Bayer <[email protected]> wrote:
>> > On Sun, May 13, 2018 at 10:35 PM, <[email protected]> wrote:
>> >> "jjs" is short for my name and "cjr" is my colleague's (JinRong Cai).
>> >> We work together and add logs in the same environment, prefixing our
>> >> names to get a keyword for log filtering.
>> >>
>> >> The doubled ConnectionRecords (with None connection) are obtained by
>> >> two different coroutines, and the variable record.connection is
>> >> rewritten by the second coroutine.
>> >> After the record is put back into the QueuePool again, there are two
>> >> copies of the same record with the same connection in the pool, which
>> >> makes the next two coroutines conflict and deadlock in psycopg2 as above.
>> >
>> > OK, I need the detailed step-by-step that shows how the two coroutines
>> > are interacting such that they both are returning the same connection.
>> > The nova log above seems to show just one coroutine, I assume, since
>> > there is just one nova request ID, if I understand correctly.
>> >
>> >> The psycopg2 logs have been pasted at
>> >> https://github.com/psycopg/psycopg2/issues/703
>> >
>> > The above issue says the bug has been found in SQLAlchemy, but there is
>> > no bug report yet and I've not yet been shown how the behavior here is
>> > possible. I posted a straightforward example above; can you
>> > please show me where the two coroutines get mixed up?
>> >
>> >> On Friday, May 11, 2018 at 3:05:46 AM UTC+8, Mike Bayer wrote:
>> >>>
>> >>> I tried to follow your log lines, which is difficult because I don't
>> >>> know where those log statements actually are and I don't know what
>> >>> acronyms like "jjs" and "cjr" mean. But it does look like it's all in
>> >>> one thread (so not async GC) and all in one Nova request (so perhaps
>> >>> just one greenlet involved, unless other greenlets are affecting this
>> >>> without being displayed in this log snippet). I do see the
>> >>> connection record doubled at the end. So here is an illustration of
>> >>> those steps in isolation; the connection record is not doubled. Can
>> >>> you alter this program to show this condition occurring?
>> >>>
>> >>> from sqlalchemy import create_engine
>> >>>
>> >>> e = create_engine("postgresql://scott:tiger@localhost/test",
>> >>>                   echo_pool='debug')
>> >>> e.connect().close()
>> >>>
>> >>> assert len(e.pool._pool.queue) == 1
>> >>>
>> >>> c1 = e.connect()
>> >>>
>> >>> assert len(e.pool._pool.queue) == 0
>> >>>
>> >>> c1.engine.pool._invalidate(c1._Connection__connection, None)
>> >>> c1.invalidate(e)
>> >>>
>> >>> # connection record is back
>> >>> assert len(e.pool._pool.queue) == 1
>> >>>
>> >>> # do a reconnect
>> >>> c1.connection
>> >>>
>> >>> # uses the record again
>> >>> assert len(e.pool._pool.queue) == 0
>> >>>
>> >>> # close the Connection
>> >>> c1.close()
>> >>>
>> >>> # record is back
>> >>> assert len(e.pool._pool.queue) == 1
>> >>>
>> >>> # etc.
>> >>> c1 = e.connect()
>> >>> assert len(e.pool._pool.queue) == 0
>> >>>
>> >>> c1.close()
>> >>> assert len(e.pool._pool.queue) == 1
>> >>>
>> >>> On Thu, May 10, 2018 at 12:25 PM, Mike Bayer <[email protected]> wrote:
>> >>> > Here's a print statement you might want to try:
>> >>> >
>> >>> > diff --git a/lib/sqlalchemy/pool.py b/lib/sqlalchemy/pool.py
>> >>> > index 89a4cea7c..452db4883 100644
>> >>> > --- a/lib/sqlalchemy/pool.py
>> >>> > +++ b/lib/sqlalchemy/pool.py
>> >>> > @@ -689,6 +689,19 @@ def _finalize_fairy(connection, connection_record,
>> >>> >          been garbage collected.
>> >>> >
>> >>> >          """
>> >>> > +    print(
>> >>> > +        "FINALIZE FAIRY: GREENLET ID: %s, DB CONN: %s, FAIRY: %s, REF: %s "
>> >>> > +        "CONNECTION_REC: %s CONNECTION_REC REF %s" % (
>> >>> > +            "put greenlet id here",
>> >>> > +            connection,
>> >>> > +            fairy,
>> >>> > +            ref,
>> >>> > +            connection_record,
>> >>> > +            connection_record.fairy_ref
>> >>> > +            if connection_record is not None else "n/a"
>> >>> > +        )
>> >>> > +    )
>> >>> > +
>> >>> >      _refs.discard(connection_record)
>> >>> >
>> >>> >      if ref is not None:
>> >>> >
>> >>> > On Thu, May 10, 2018 at 12:21 PM, Mike Bayer <[email protected]> wrote:
>> >>> >> On Thu, May 10, 2018 at 7:23 AM, <[email protected]> wrote:
>> >>> >>> Hi Mike,
>> >>> >>>
>> >>> >>> I added more logs and finally found the reason, and a method to
>> >>> >>> avoid the deadlock between coroutines.
>> >>> >>>
>> >>> >>> In summary, the deadlock happened because two different coroutines
>> >>> >>> used the same ConnectionRecord with the same connection.
>> >>> >>> It happens under QueuePool, when the DB connections are reset by
>> >>> >>> the DB server (not related to the backend driver).
>> >>> >>>
>> >>> >>> 1. A coroutine gets a ConnectionRecord into a ConnectionFairy, and
>> >>> >>> applies a weakref to the ConnectionFairy, with callback function
>> >>> >>> `_finalize_fairy`.
>> >>> >>> 2. When the connection gets disconnected, it receives an
>> >>> >>> exception (psycopg2.OperationalError), and goes to
>> >>> >>> Connection._handle_dbapi_exception.
>> >>> >>> 3. In `_handle_dbapi_exception`'s `finally`, it calls both
>> >>> >>> `ConnectionFairy.invalidate` and `self.invalidate`.
>> >>> >>> 4. In `ConnectionFairy.invalidate`, `_ConnectionRecord._checkin` is
>> >>> >>> called, which puts the ConnectionRecord back into the QueuePool.
>> >>> >>> 5. In `self.invalidate`, `del self._root.__connection` is called,
>> >>> >>> which **del**s the ConnectionFairy.
>> >>> >>> 6. The **del** operation makes the weakref to the ConnectionFairy
>> >>> >>> die, and the callback `_finalize_fairy` executes.
>> >>> >>> 7. In `_finalize_fairy`, another `_ConnectionRecord._checkin` is
>> >>> >>> called, which puts the ConnectionRecord into the QueuePool again.
>> >>> >>
>> >>> >> Here's what does not make sense. You are claiming that a simple
>> >>> >> invalidation will result in the same ConnectionRecord being checked
>> >>> >> in twice. This is obviously not the case, so what is special in your
>> >>> >> situation that makes this happen?
>> >>> >>
>> >>> >> If I follow the steps you refer to, in step 4, when
>> >>> >> ConnectionFairy._checkin is called, that calls _finalize_fairy
>> >>> >> directly, which then calls ConnectionRecord.checkin(). Within
>> >>> >> ConnectionRecord.checkin(), ConnectionRecord.fairy_ref is set to
>> >>> >> None. Then back in ConnectionFairy._checkin(), it also sets
>> >>> >> self._connection_record = None.
>> >>> >>
>> >>> >> If we come back into _finalize_fairy a *second* time then, as a
>> >>> >> result of the "del" which occurs in _handle_error(), and the
>> >>> >> connection_record is present, the function will exit immediately,
>> >>> >> because:
>> >>> >>
>> >>> >>     if connection_record.fairy_ref is not ref:
>> >>> >>         return
>> >>> >>
>> >>> >> So the situation as given is not possible without concurrent access
>> >>> >> to the ConnectionFairy being introduced externally to the pool.
>> >>> >>
>> >>> >> There is exactly one thing I can think of that is both unique to the
>> >>> >> psycopg2-style of asynchronous connection, not the PyMySQL form of
>> >>> >> it, that might be important here, and it's that I think psycopg2's
>> >>> >> connection.close() method will defer to another greenlet. That means
>> >>> >> when we're inside of ConnectionRecord.invalidate(), and we call
>> >>> >> self.__close(), that could be deferring to another greenlet. But
>> >>> >> this all happens before the connection is returned to the pool, so I
>> >>> >> still don't know what is happening.
>> >>> >>
>> >>> >>> The detailed logs are below. To prove my conclusion is right, I
>> >>> >>> tried to avoid the scenario with this patch, and it worked.
>> >>> >>>
>> >>> >>>     def checkin(self):
>> >>> >>>         self.fairy_ref = None
>> >>> >>>         connection = self.connection
>> >>> >>>         pool = self.__pool
>> >>> >>>
>> >>> >>>         while self.finalize_callback:
>> >>> >>>             finalizer = self.finalize_callback.pop()
>> >>> >>>             finalizer(connection)
>> >>> >>> +       if self in pool._pool.queue:
>> >>> >>> +           pool.logger.debug("jjs rec already in pool checkin, skip %r", self)
>> >>> >>> +           return
>> >>> >>
>> >>> >> as mentioned above, it's not possible to reach this point without
>> >>> >> concurrent access to the pooled connection, which means this fix
>> >>> >> could fail also (with at least threaded-style concurrency, not
>> >>> >> greenlet style). The original source of the greenlet issue must
>> >>> >> still be identified.
>> >>> >>
>> >>> >> I would need a self-contained proof of concept that demonstrates
>> >>> >> this issue occurring. The log output is not that helpful because it
>> >>> >> at least would need to write out the identifier of the current
>> >>> >> greenlet, since it is very likely that greenlets have something to
>> >>> >> do with this issue.
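[For reference, a minimal sketch of one way to stamp each pool log line with the current greenlet identifier, assuming eventlet is in use as elsewhere in this thread; the filter class and the greenlet_id field name are illustrative, not SQLAlchemy or eventlet API:]

    import logging

    from eventlet import greenthread

    class GreenletIdFilter(logging.Filter):
        # attach id() of the current greenlet to every record passing through
        def filter(self, record):
            record.greenlet_id = id(greenthread.getcurrent())
            return True

    handler = logging.StreamHandler()
    handler.addFilter(GreenletIdFilter())
    handler.setFormatter(logging.Formatter(
        "%(asctime)s [greenlet:%(greenlet_id)s] %(name)s %(message)s"))

    pool_logger = logging.getLogger("sqlalchemy.pool")
    pool_logger.addHandler(handler)
    pool_logger.setLevel(logging.DEBUG)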
>> >>> >>>
>> >>> >>>         if pool.dispatch.checkin:
>> >>> >>>             pool.dispatch.checkin(connection, self)
>> >>> >>>         pool._return_conn(self)
>> >>> >>>
>> >>> >>> Trace the ConnectionRecord 0x88fc5d0:
>> >>> >>>
>> >>> >>> 2018-05-10T14:02:17.927+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Created new connection <connection object at 0x81f69b0; closed: 0>
>> >>> >>> 2018-05-10T14:02:17.928+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs get conn success <connection object at 0x81f69b0; closed: 0>, rec.connection <connection object at 0x81f69b0; closed: 0>, pool queue deque([<sqlalchemy.pool._ConnectionRecord object at 0x7a04810>, <sqlalchemy.pool._ConnectionRecord object at 0x79fdad0>])
>> >>> >>> 2018-05-10T14:02:17.929+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs checkout fairy <sqlalchemy.pool._ConnectionFairy object at 0x329c0d0> rec <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0> pool queue deque([<sqlalchemy.pool._ConnectionRecord object at 0x7a04810>, <sqlalchemy.pool._ConnectionRecord object at 0x79fdad0>])
>> >>> >>> 2018-05-10T14:02:17.929+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Connection <connection object at 0x81f69b0; closed: 0> checked out from pool, status Pool size: 10 Connections in pool: 2 Current Overflow: 0 Max Overflow: 40 Current Checked out connections: 8, queue deque([<sqlalchemy.pool._ConnectionRecord object at 0x7a04810>, <sqlalchemy.pool._ConnectionRecord object at 0x79fdad0>])
>> >>> >>> 2018-05-10T14:02:17.930+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs _ConnectionFairy._checkout conn <connection object at 0x81f69b0; closed: 0>, rec <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>, rec.conn <connection object at 0x81f69b0; closed: 0>
>> >>> >>> 2018-05-10T14:02:49.716+08:00 localhost nova-conductor ERROR [pid:4528] [MainThread] [tid:122769072] [log.py:122 error] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] cjr error conn <sqlalchemy.engine.base.Connection object at 0x329c150>, err connection pointer is NULL
>> >>> >>> 2018-05-10T14:02:49.717+08:00 localhost nova-conductor ERROR [pid:4528] [MainThread] [tid:122769072] [log.py:122 error] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] cjr enter newraise
>> >>> >>> 2018-05-10T14:02:49.717+08:00 localhost nova-conductor ERROR [pid:4528] [MainThread] [tid:122769072] [log.py:122 error] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] cjr enter finally
>> >>> >>> >>Connection._handle_dbapi_exception --> finally --> **self.engine.pool._invalidate** & **self.invalidate**
>> >>> >>> >>self.engine.pool._invalidate --> Pool._invalidate --> ConnectionFairy.invalidate --> _ConnectionRecord.invalidate & **self._checkin()**
>> >>> >>> >>self.invalidate() --> **del self._root.__connection**
>> >>> >>> 2018-05-10T14:02:49.718+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs connectionFairy invalidate
>> >>> >>> 2018-05-10T14:02:49.718+08:00 localhost nova-conductor INFO [pid:4528] [MainThread] [tid:122769072] [log.py:109 info] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Invalidate connection <connection object at 0x81f69b0; closed: 2> (reason: OperationalError:connection pointer is NULL
>> >>> >>> >>_ConnectionRecord.invalidate(e) --> self.__pool.dispatch.invalidate(self.connection, self, e) & self.__close() & self.connection = None
>> >>> >>> >>self.__close() --> self.finalize_callback.clear() & self.__pool._close_connection(self.connection)
>> >>> >>> 2018-05-10T14:02:49.719+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Closing connection <connection object at 0x81f69b0; closed: 2>
>> >>> >>> >>Pool._close_connection(connection) --> self._dialect.do_close(connection) --> _ConnDialect.do_close(dbapi_connection) --> dbapi_connection.close() FIN
>> >>> >>> 2018-05-10T14:02:49.719+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs _ConnectionFairy._checkin conn None rec <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0> rec.conn None
>> >>> >>> >>_ConnectionFairy._checkin --> _finalize_fairy(conn, conn_rec, pool, None, echo, fairy) & self.conn = None & self.conn_rec = None
>> >>> >>> 2018-05-10T14:02:49.720+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs _finalize_fairy args conn None, rec.conn None, rec <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>, ref None, fairy <sqlalchemy.pool._ConnectionFairy object at 0x329c0d0>
>> >>> >>> >>_finalize_fairy --> connection_record.checkin()
>> >>> >>> 2018-05-10T14:02:49.720+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs going to connection_record.checkin
>> >>> >>> 2018-05-10T14:02:49.721+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs before conn checkin None, pool queue deque([<sqlalchemy.pool._ConnectionRecord object at 0x7524590>, <sqlalchemy.pool._ConnectionRecord object at 0x7a2b990>, <sqlalchemy.pool._ConnectionRecord object at 0xa856b10>, <sqlalchemy.pool._ConnectionRecord object at 0x7ebee90>]), finalize_callback deque([])
>> >>> >>> 2018-05-10T14:02:49.721+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs after conn checkin, pool queue deque([<sqlalchemy.pool._ConnectionRecord object at 0x7524590>, <sqlalchemy.pool._ConnectionRecord object at 0x7a2b990>, <sqlalchemy.pool._ConnectionRecord object at 0xa856b10>, <sqlalchemy.pool._ConnectionRecord object at 0x7ebee90>, <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>])
>> >>> >>> 2018-05-10T14:02:49.723+08:00 localhost nova-conductor ERROR [pid:4528] [MainThread] [tid:122769072] [log.py:122 error] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Exception during reset or similar
>> >>> >>> 2018-05-10T14:02:49.723+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs going to connection_record.invalidate
>> >>> >>> 2018-05-10T14:02:49.724+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs going to connection_record.checkin
>> >>> >>> 2018-05-10T14:02:49.724+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs before conn checkin None, pool queue deque([<sqlalchemy.pool._ConnectionRecord object at 0x7524590>, <sqlalchemy.pool._ConnectionRecord object at 0x7a2b990>, <sqlalchemy.pool._ConnectionRecord object at 0xa856b10>, <sqlalchemy.pool._ConnectionRecord object at 0x7ebee90>, <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>]), finalize_callback deque([])
>> >>> >>> 2018-05-10T14:02:49.732+08:00 localhost nova-conductor DEBUG [pid:4528] [MainThread] [tid:122769072] [log.py:104 debug] [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs after conn checkin, pool queue deque([<sqlalchemy.pool._ConnectionRecord object at 0x7524590>, <sqlalchemy.pool._ConnectionRecord object at 0x7a2b990>, <sqlalchemy.pool._ConnectionRecord object at 0xa856b10>, <sqlalchemy.pool._ConnectionRecord object at 0x7ebee90>, <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>, <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>])
>> >>> >>>
>> >>> >>> On Tuesday, May 8, 2018 at 11:06:04 PM UTC+8, [email protected] wrote:
>> >>> >>>>
>> >>> >>>> I found why we could get the same connection twice from the pool...
>> >>> >>>>
>> >>> >>>> Because we put it into the pool twice.
>> >>> >>>>
>> >>> >>>> Any suggestions for finding why the same connection was put into
>> >>> >>>> the pool by different coroutines?
>> >>> >>>>
>> >>> >>>> 2018-05-08T22:39:17.448+08:00 localhost nova-conductor DEBUG [pid:15755] [MainThread] [tid:187954992] [log.py:104 debug] [req-7c2199c0-d9a6-4131-8358-2aebcf5a4ce1] jjs conn checkin <connection object at 0x111d9740; closed: 0>
>> >>> >>>> 2018-05-08T22:39:17.502+08:00 localhost nova-conductor DEBUG [pid:15755] [MainThread] [tid:178645264] [log.py:104 debug] [req-7ab83c30-91bd-4278-b694-85a71035992c] jjs conn checkin <connection object at 0x111d9740; closed: 0>
>> >>> >>>> ...
>> >>> >>>> 2018-05-08T22:39:17.992+08:00 localhost nova-conductor DEBUG [pid:15755] [MainThread] [tid:309091856] [log.py:104 debug] [req-dee5dd2d-866f-408e-999b-b415ea7ce953] Connection <connection object at 0x111d9740; closed: 0> checked out from pool
>> >>> >>>> 2018-05-08T22:39:18.039+08:00 localhost nova-conductor DEBUG [pid:15755] [MainThread] [tid:178645104] [log.py:104 debug] [req-ac8017d7-c97e-44ca-92d2-99f180858e14] Connection <connection object at 0x111d9740; closed: 0> checked out from pool
>> >>> >>>>
>> >>> >>>> On Tuesday, May 8, 2018 at 10:18:52 PM UTC+8, [email protected] wrote:
>> >>> >>>>>
>> >>> >>>>> I added logs in sqlalchemy/pool.py
>> >>> >>>>>
>> >>> >>>>> QueuePool.checkout
>> >>> >>>>>
>> >>> >>>>>     @classmethod
>> >>> >>>>>     def checkout(cls, pool):
>> >>> >>>>>         rec = pool._do_get()
>> >>> >>>>>         try:
>> >>> >>>>>             dbapi_connection = rec.get_connection()
>> >>> >>>>>             pool.logger.debug("jjs get conn success %s", dbapi_connection)
>> >>> >>>>>         except:
>> >>> >>>>>             pool.logger.debug("jjs get error in get_connection")
>> >>> >>>>>             with util.safe_reraise():
>> >>> >>>>>                 rec.checkin()
>> >>> >>>>>         echo = pool._should_log_debug()
>> >>> >>>>>         fairy = _ConnectionFairy(dbapi_connection, rec, echo)
>> >>> >>>>>         rec.fairy_ref = weakref.ref(
>> >>> >>>>>             fairy,
>> >>> >>>>>             lambda ref: _finalize_fairy and
>> >>> >>>>>             _finalize_fairy(
>> >>> >>>>>                 dbapi_connection,
>> >>> >>>>>                 rec, pool, ref, echo)
>> >>> >>>>>         )
>> >>> >>>>>         _refs.add(rec)
>> >>> >>>>>         if echo:
>> >>> >>>>>             pool.logger.debug("jjs conn %s, pool status %s",
>> >>> >>>>>                               dbapi_connection, pool.status())
>> >>> >>>>>             pool.logger.debug("Connection %r checked out from pool",
>> >>> >>>>>                               dbapi_connection)
>> >>> >>>>>         return fairy
>> >>> >>>>>
>> >>> >>>>> Then I got the logs below:
>> >>> >>>>>
>> >>> >>>>> coroutine A:
>> >>> >>>>>
>> >>> >>>>> 2018-05-08T21:39:59.055+08:00 localhost nova-conductor INFO [pid:55528] [MainThread] [tid:197123760] [log.py:109 info] [req-ba1ab897-c80a-4bc4-922a-39044d669da1] jjs Pool wait is False, timeout 30
>> >>> >>>>> 2018-05-08T21:39:59.078+08:00 localhost nova-conductor DEBUG [pid:55528] [MainThread] [tid:197123760] [log.py:104 debug] [req-ba1ab897-c80a-4bc4-922a-39044d669da1] Created new connection <connection object at 0x903f8c0; closed: 0>
>> >>> >>>>> 2018-05-08T21:39:59.079+08:00 localhost nova-conductor DEBUG [pid:55528] [MainThread] [tid:197123760] [log.py:104 debug] [req-ba1ab897-c80a-4bc4-922a-39044d669da1] jjs get conn success <connection object at 0x903f8c0; closed: 0>
>> >>> >>>>> 2018-05-08T21:39:59.080+08:00 localhost nova-conductor DEBUG [pid:55528] [MainThread] [tid:197123760] [log.py:104 debug] [req-ba1ab897-c80a-4bc4-922a-39044d669da1] jjs conn <connection object at 0x903f8c0; closed: 0>, pool status Pool size: 10 Connections in pool: 1 Current Overflow: 36 Max Overflow: 40 Current Checked out connections: 45
>> >>> >>>>> 2018-05-08T21:39:59.080+08:00 localhost nova-conductor DEBUG [pid:55528] [MainThread] [tid:197123760] [log.py:104 debug] [req-ba1ab897-c80a-4bc4-922a-39044d669da1] Connection <connection object at 0x903f8c0; closed: 0> checked out from pool
>> >>> >>>>>
>> >>> >>>>> coroutine B:
>> >>> >>>>>
>> >>> >>>>> 2018-05-08T21:39:59.212+08:00 localhost nova-conductor INFO [pid:55528] [MainThread] [tid:151519312] [log.py:109 info] [req-a78075b5-c55c-40c4-86d0-65957e89cb56] jjs Pool wait is False, timeout 30
>> >>> >>>>> 2018-05-08T21:39:59.213+08:00 localhost nova-conductor DEBUG [pid:55528] [MainThread] [tid:151519312] [log.py:104 debug] [req-a78075b5-c55c-40c4-86d0-65957e89cb56] jjs get conn success <connection object at 0x903f8c0; closed: 0>
>> >>> >>>>> 2018-05-08T21:39:59.213+08:00 localhost nova-conductor DEBUG [pid:55528] [MainThread] [tid:151519312] [log.py:104 debug] [req-a78075b5-c55c-40c4-86d0-65957e89cb56] jjs conn <connection object at 0x903f8c0; closed: 0>, pool status Pool size: 10 Connections in pool: 0 Current Overflow: 36 Max Overflow: 40 Current Checked out connections: 46
>> >>> >>>>> 2018-05-08T21:39:59.214+08:00 localhost nova-conductor DEBUG [pid:55528] [MainThread] [tid:151519312] [log.py:104 debug] [req-a78075b5-c55c-40c4-86d0-65957e89cb56] Connection <connection object at 0x903f8c0; closed: 0> checked out from pool
>> >>> >>>>>
>> >>> >>>>> It seems that when we create_connection in QueuePool._do_get, we
>> >>> >>>>> may not be holding the lock _pool.mutex (i.e. Queue.mutex).
>> >>> >>>>>
>> >>> >>>>> On Monday, May 7, 2018 at 8:41:14 PM UTC+8, Mike Bayer wrote:
>> >>> >>>>>>
>> >>> >>>>>> On Mon, May 7, 2018, 7:01 AM <[email protected]> wrote:
>> >>> >>>>>>>
>> >>> >>>>>>> We added coroutine_id in psycopg2, and found that two coroutines
>> >>> >>>>>>> use the same connection before the pthread_mutex_lock is released.
>> >>> >>>>>>>
>> >>> >>>>>>> Maybe something in the connection pool goes wrong?
>> >>> >>>>>>> [pid] [coroutine_id] msg
>> >>> >>>>>>
>> >>> >>>>>> This seems unlikely. Openstack uses the SQLAlchemy connection
>> >>> >>>>>> pool with eventlet patching in conjunction with the MySQL drivers
>> >>> >>>>>> and nothing like that happens. The biggest issue SQLAlchemy had
>> >>> >>>>>> with greenlets is that a timed-out greenlet would throw
>> >>> >>>>>> GreenletExit and cause a dirty connection to go back into the
>> >>> >>>>>> pool; that was fixed in 1.2.
>> >>> >>>>>>
>> >>> >>>>>>> [49174] [0xa5db730]before PyObject_CallFunctionObjArgs conn 0x94122f0, cb 0x23ad320
>> >>> >>>>>>> [49174] [0xa5db730]conn_poll: status = 2, conn 0x94122f0
>> >>> >>>>>>> [49174] [0xa5db730]conn_poll: async_status = ASYNC_WRITE 0x94122f0
>> >>> >>>>>>> [49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS conn 0x94122f0, async_cursor 0x881ac00
>> >>> >>>>>>> [49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS conn 0x94122f0, async_cursor 0x881ac00
>> >>> >>>>>>> [49174] [0xa5db4b0]pq_abort: enter pgconn = 0x94122f0, autocommit = 0, status = 2
>> >>> >>>>>>> [49174] [0xa5db4b0]before lock pgconn = 0x94122f0, owner 49174 __lock 1
>> >>> >>>>>>>
>> >>> >>>>>>> Below are the logs grepped by coroutine id:
>> >>> >>>>>>>
>> >>> >>>>>>> [49174] [0xa5db730]finish send query, before psyco_wait, conn 0x94122f0
>> >>> >>>>>>> [49174] [0xa5db730]before have_wait_callback conn 0x94122f0
>> >>> >>>>>>> [49174] [0xa5db730]before PyObject_CallFunctionObjArgs conn 0x94122f0, cb 0x23ad320
>> >>> >>>>>>> [49174] [0xa5db730]conn_poll: status = 2, conn 0x94122f0
>> >>> >>>>>>> [49174] [0xa5db730]conn_poll: async_status = ASYNC_WRITE 0x94122f0
>> >>> >>>>>>> [49174] [0xa5db730]conn_poll: poll writing
>> >>> >>>>>>> [49174] [0xa5db730]conn_poll: async_status -> ASYNC_READ
>> >>> >>>>>>>
>> >>> >>>>>>> -----
>> >>> >>>>>>>
>> >>> >>>>>>> [49174] [0xa5db4b0]psyco_conn_cursor: new unnamed cursor for connection at 0x8de2d30
>> >>> >>>>>>> [49174] [0xa5db4b0]cursor_setup: init cursor object at 0xa6c2650
>> >>> >>>>>>> [49174] [0xa5db4b0]cursor_setup: parameters: name = (null), conn = 0x8de2d30
>> >>> >>>>>>> [49174] [0xa5db4b0]cursor_setup: good cursor object at 0xa6c2650, refcnt = 1
>> >>> >>>>>>> [49174] [0xa5db4b0]psyco_conn_cursor: new cursor at 0xa6c2650: refcnt = 1
>> >>> >>>>>>> [49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS conn 0x94122f0, async_cursor 0x881ac00
>> >>> >>>>>>> [49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS conn 0x94122f0, async_cursor 0x881ac00
>> >>> >>>>>>> [49174] [0xa5db4b0]pq_abort: enter pgconn = 0x94122f0, autocommit = 0, status = 2
>> >>> >>>>>>> [49174] [0xa5db4b0]before lock pgconn = 0x94122f0, owner 49174 __lock 1
>> >>> >>>>>>>
>> >>> >>>>>>> On Saturday, April 28, 2018 at 4:07:34 PM UTC+8, [email protected] wrote:
>> >>> >>>>>>>>
>> >>> >>>>>>>> We reproduced this problem, added logs in psycopg2, found a
>> >>> >>>>>>>> confusing thing and reported it to psycopg2.
>> >>> >>>>>>>>
>> >>> >>>>>>>> It seems the sqlalchemy & psycopg2 call flow is:
>> >>> >>>>>>>> 1. from sqlalchemy to psycopg: pq_execute/pq_commit
>> >>> >>>>>>>> 2. from psycopg to sqlalchemy: PyWeakref_NewRef(conn)
>> >>> >>>>>>>> 3. get an exception in sqlalchemy, and do_rollback
>> >>> >>>>>>>> 4. from sqlalchemy to psycopg: pq_abort
>> >>> >>>>>>>> 5. psycopg hits a pthread_mutex_lock deadlock
>> >>> >>>>>>>>
>> >>> >>>>>>>> What we did to reproduce it: stop the master pg-server and
>> >>> >>>>>>>> promote the slave pg-server to master, moving the FIP from the
>> >>> >>>>>>>> old master to the slave. At the same time, let nova-conductor do
>> >>> >>>>>>>> quite a lot of db query requests.
>> >>> >>>>>>>>
>> >>> >>>>>>>> [1] https://github.com/psycopg/psycopg2/issues/703
>> >>> >>>>>>>>
>> >>> >>>>>>>> On Monday, April 23, 2018 at 9:45:04 PM UTC+8, Mike Bayer wrote:
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> On Mon, Apr 23, 2018 at 9:03 AM, <[email protected]> wrote:
>> >>> >>>>>>>>> > Sorry to revive this topic.
>> >>> >>>>>>>>> >
>> >>> >>>>>>>>> > We recently hit the same problem in our production environment.
>> >>> >>>>>>>>> >
>> >>> >>>>>>>>> > I found a patch in another lib [1], where they added
>> >>> >>>>>>>>> > conn.close() in the psycopg2.ProgrammingError exception
>> >>> >>>>>>>>> > handler.
>> >>> >>>>>>>>> >
>> >>> >>>>>>>>> > Shall we do the same in [2]?
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> SQLAlchemy does things much more carefully than that: we parse
>> >>> >>>>>>>>> the error message for specific ones that correspond to
>> >>> >>>>>>>>> "connection is no longer usable". We call these "is_disconnect",
>> >>> >>>>>>>>> but it can be any invalidating condition.
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> You can make these yourself, and they can also be made to be
>> >>> >>>>>>>>> part of oslo.db, using the handle_error event:
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> http://docs.sqlalchemy.org/en/latest/core/events.html?highlight=handle_error#sqlalchemy.events.ConnectionEvents.handle_error
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> within oslo.db you would want to propose a change here:
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> https://github.com/openstack/oslo.db/blob/master/oslo_db/sqlalchemy/exc_filters.py#L387
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> >
>> >>> >>>>>>>>> > [1] https://github.com/aio-libs/aiopg/pull/415/files?diff=split
>> >>> >>>>>>>>> >
>> >>> >>>>>>>>> > [2] https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/engine/base.py#L1289
>> >>> >>>>>>>>> >
>> >>> >>>>>>>>> > On Monday, November 13, 2017 at 10:44:31 AM UTC+8, JinRong Cai wrote:
>> >>> >>>>>>>>> >>
>> >>> >>>>>>>>> >> Hi Michael,
>> >>> >>>>>>>>> >>
>> >>> >>>>>>>>> >> I am using openstack with postgresql, where the sqlalchemy
>> >>> >>>>>>>>> >> and oslo_db modules are used, and there are some problems
>> >>> >>>>>>>>> >> after my pg database was switched over.
>> >>> >>>>>>>>> >>
>> >>> >>>>>>>>> >> Here is my switchover process:
>> >>> >>>>>>>>> >> 1. nova-conductor (python application) is running with DB
>> >>> >>>>>>>>> >> connection strings pointing to the VIP, which is in the
>> >>> >>>>>>>>> >> primary site (A) of pg.
>> >>> >>>>>>>>> >> 2. switch the VIP from the primary (A) to the new primary (B).
>> >>> >>>>>>>>> >> 3. switch over pg: shut down the primary (A), promote the
>> >>> >>>>>>>>> >> standby (B) to the new primary.
>> >>> >>>>>>>>> >> 4. nova-conductor keeps running through the whole process.
>> >>> >>>>>>>>> >>
>> >>> >>>>>>>>> >> After some seconds, I found some nova-conductor processes
>> >>> >>>>>>>>> >> hung with status futex_wait_queue_me, and the status of the
>> >>> >>>>>>>>> >> query in the DB was "idle in transaction"; the transaction
>> >>> >>>>>>>>> >> was not committed or rolled back!
>> >>> >>>>>>>>> >> I think disconnection is handled in oslo_db, which sends a
>> >>> >>>>>>>>> >> ping (SELECT 1) to the DB.
>> >>> >>>>>>>>> >>
>> >>> >>>>>>>>> >> If the DB is switched over, the connections in the pool are
>> >>> >>>>>>>>> >> marked invalid and reconnect after the next checkout.
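[A minimal sketch of the handle_error approach Mike describes above; the event hook and the ExceptionContext.is_disconnect attribute are SQLAlchemy's documented API, while the specific error-string match is only an assumed filter for the psycopg2 error seen in this thread:]

    from sqlalchemy import create_engine, event

    engine = create_engine("postgresql+psycopg2://scott:tiger@localhost/test")

    @event.listens_for(engine, "handle_error")
    def _mark_null_pointer_as_disconnect(context):
        # treat psycopg2's "connection pointer is NULL" as an invalidating
        # condition, so the pool discards the connection instead of reusing it
        if "connection pointer is NULL" in str(context.original_exception):
            context.is_disconnect = True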
>> >>> >>>>>>>>> >>
>> >>> >>>>>>>>> >> ###error messages from nova-conductor
>> >>> >>>>>>>>> >> localhost nova-conductor ERROR [pid:36365] [MainThread] [tid:122397712] [exc_filters.py:330 _raise_for_remaining_DBAPIError] [req-2bd8a290-e17b-4178-80a6-4b36d5793d85] DBAPIError exception wrapped from (psycopg2.ProgrammingError) execute cannot be used while an asynchronous query is underway [SQL: 'SELECT 1']
>> >>> >>>>>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters Traceback (most recent call last):
>> >>> >>>>>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters   File "/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
>> >>> >>>>>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters     context)
>> >>> >>>>>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters   File "/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
>> >>> >>>>>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters     cursor.execute(statement, parameters)
>> >>> >>>>>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters ProgrammingError: execute cannot be used while an asynchronous query is underway
>> >>> >>>>>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters
>> >>> >>>>>>>>> >> localhost nova-conductor ERROR [pid:36365] [MainThread] [tid:122397712] [log.py:122 error] [req-2bd8a290-e17b-4178-80a6-4b36d5793d85] Error closing cursor
>> >>> >>>>>>>>> >> 36365 ERROR sqlalchemy.pool.QueuePool Traceback (most recent call last):
>> >>> >>>>>>>>> >> 36365 ERROR sqlalchemy.pool.QueuePool   File "/python2.7/site-packages/sqlalchemy/engine/base.py", line 1226, in _safe_close_cursor
>> >>> >>>>>>>>> >> 36365 ERROR sqlalchemy.pool.QueuePool     cursor.close()
>> >>> >>>>>>>>> >> 36365 ERROR sqlalchemy.pool.QueuePool ProgrammingError: close cannot be used while an asynchronous query is underway
>> >>> >>>>>>>>> >> 36365 ERROR sqlalchemy.pool.QueuePool
>> >>> >>>>>>>>> >>
>> >>> >>>>>>>>> >> ###ps status of nova-conductor
>> >>> >>>>>>>>> >>
>> >>> >>>>>>>>> >> POD6-Mongodb03:/var/log/uvp-getosstat/statistics20171106101500log # cat /proc/33316/stack
>> >>> >>>>>>>>> >> [<ffffffff810e4c24>] futex_wait_queue_me+0xc4/0x120
>> >>> >>>>>>>>> >> [<ffffffff810e5799>] futex_wait+0x179/0x280
>> >>> >>>>>>>>> >> [<ffffffff810e782e>] do_futex+0xfe/0x5b0
>> >>> >>>>>>>>> >> [<ffffffff810e7d60>] SyS_futex+0x80/0x180
>> >>> >>>>>>>>> >> [<ffffffff81654e09>] system_call_fastpath+0x16/0x1b
>> >>> >>>>>>>>> >> [<ffffffffffffffff>] 0xffffffffffffffff
>> >>> >>>>>>>>> >>
>> >>> >>>>>>>>> >> ### stack of the nova-conductor process
>> >>> >>>>>>>>> >> POD6-Mongodb03:/tmp # pstack 33316
>> >>> >>>>>>>>> >> #0 0x00002b8449e35f4d in __lll_lock_wait () from /lib64/libpthread.so.0
>> >>> >>>>>>>>> >> #1 0x00002b8449e31d02 in _L_lock_791 () from /lib64/libpthread.so.0
>> >>> >>>>>>>>> >> #2 0x00002b8449e31c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
>> >>> >>>>>>>>> >> #3 0x00002b84554c44ab in pq_abort () from /python2.7/site-packages/psycopg2/_psycopg.so
>> >>> >>>>>>>>> >> #4 0x00002b84554c955e in psyco_conn_rollback () from /python2.7/site-packages/psycopg2/_psycopg.so
>> >>> >>>>>>>>> >> #5 0x00002b8449b42b50 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
>> >>> >>>>>>>>> >> #6 0x00002b8449b42ad0 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
>> >>> >>>>>>>>> >>
>> >>> >>>>>>>>> >> psycopg2 was trying to close the cursor and acquire the mutex
>> >>> >>>>>>>>> >> via pthread_mutex_lock, but it seems the cursor was being used
>> >>> >>>>>>>>> >> by another session.
>> >>> >>>>>>>>> >>
>> >>> >>>>>>>>> >> Questions:
>> >>> >>>>>>>>> >>
>> >>> >>>>>>>>> >> 1. What does the error "ProgrammingError: close cannot be used
>> >>> >>>>>>>>> >> while an asynchronous query is underway" mean?
>> >>> >>>>>>>>> >> AFAIK, this is raised by psycopg2, which means an asynchronous
>> >>> >>>>>>>>> >> query was already underway on the connection.
>> >>> >>>>>>>>> >> But I think sqlalchemy is thread safe here, since it is patched
>> >>> >>>>>>>>> >> by eventlet; see details in eventlet/support/psycopg2_patcher.py.
>> >>> >>>>>>>>> >> And we can see different green thread numbers in the log, e.g.:
>> >>> >>>>>>>>> >> [pid:36365] [MainThread] [tid:122397712]
>> >>> >>>>>>>>> >> [pid:36365] [MainThread] [tid:122397815]
>> >>> >>>>>>>>> >> So, I guess the connection pool within one process is safe.
>> >>> >>>>>>>>> >>
>> >>> >>>>>>>>> >> 2. nova-conductor is a multi-threaded python client which forks
>> >>> >>>>>>>>> >> several child processes.
>> >>> >>>>>>>>> >> ps -elf | grep -i nova-conductor
>> >>> >>>>>>>>> >> 30878 1 pool_s /usr/bin/nova-conductor
>> >>> >>>>>>>>> >> 36364 1 ep_pol /usr/bin/nova-conductor
>> >>> >>>>>>>>> >> 36365 1 futex_ /usr/bin/nova-conductor
>> >>> >>>>>>>>> >> 36366 1 ep_pol /usr/bin/nova-conductor
>> >>> >>>>>>>>> >> 36367 1 ep_pol /usr/bin/nova-conductor
>> >>> >>>>>>>>> >> 36368 1 ep_pol /usr/bin/nova-conductor
>> >>> >>>>>>>>> >>
>> >>> >>>>>>>>> >> If nova-conductor is started with only one child, the problem
>> >>> >>>>>>>>> >> does not happen.
>> >>> >>>>>>>>> >> Does this mean the connection/engine CANNOT be shared among
>> >>> >>>>>>>>> >> these child processes?
>> >>> >>>>>>>>> >>
>> >>> >>>>>>>>> >> Thanks.
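[On question 2: SQLAlchemy's pooling documentation advises that an Engine's pooled connections must not travel across an os.fork(); each child should start with a fresh pool. A minimal sketch of the usual pattern, with the connection URL assumed for illustration:]

    import os

    from sqlalchemy import create_engine

    engine = create_engine("postgresql+psycopg2://scott:tiger@localhost/test")

    pid = os.fork()
    if pid == 0:
        # child process: discard any connections inherited from the parent,
        # so this process lazily builds its own, un-shared connection pool
        engine.dispose()

    with engine.connect() as conn:
        conn.execute("SELECT 1")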
