I added logs in sqlalchemy/pool.py, in _ConnectionRecord.checkout (the
checkout path used by QueuePool):
@classmethod
def checkout(cls, pool):
    rec = pool._do_get()
    try:
        dbapi_connection = rec.get_connection()
        pool.logger.debug("jjs get conn success %s", dbapi_connection)
    except:
        pool.logger.debug("jjs get error in get_connection")
        with util.safe_reraise():
            rec.checkin()
    echo = pool._should_log_debug()
    fairy = _ConnectionFairy(dbapi_connection, rec, echo)
    rec.fairy_ref = weakref.ref(
        fairy,
        lambda ref: _finalize_fairy and
        _finalize_fairy(
            dbapi_connection,
            rec, pool, ref, echo)
    )
    _refs.add(rec)
    if echo:
        pool.logger.debug("jjs conn %s, pool status %s",
                          dbapi_connection, pool.status())
        pool.logger.debug("Connection %r checked out from pool",
                          dbapi_connection)
    return fairy
Then I got the logs below:
coroutine A:
2018-05-08T21:39:59.055+08:00 localhost nova-conductor INFO [pid:55528]
[MainThread] [tid:197123760] [log.py:109 info]
[req-ba1ab897-c80a-4bc4-922a-39044d669da1] jjs Pool wait is False, timeout
30
2018-05-08T21:39:59.078+08:00 localhost nova-conductor DEBUG [pid:55528]
[MainThread] [tid:197123760] [log.py:104 debug]
[req-ba1ab897-c80a-4bc4-922a-39044d669da1] Created new connection
<connection object at 0x903f8c0; closed: 0>
2018-05-08T21:39:59.079+08:00 localhost nova-conductor DEBUG [pid:55528]
[MainThread] [tid:197123760] [log.py:104 debug]
[req-ba1ab897-c80a-4bc4-922a-39044d669da1] jjs get conn success <connection
object at 0x903f8c0; closed: 0>
2018-05-08T21:39:59.080+08:00 localhost nova-conductor DEBUG [pid:55528]
[MainThread] [tid:197123760] [log.py:104 debug]
[req-ba1ab897-c80a-4bc4-922a-39044d669da1] jjs conn <connection object at
0x903f8c0; closed: 0>, pool status Pool size: 10 Connections in pool: 1
Current Overflow: 36 Max Overflow: 40 Current Checked out connections: 45
2018-05-08T21:39:59.080+08:00 localhost nova-conductor DEBUG [pid:55528]
[MainThread] [tid:197123760] [log.py:104 debug]
[req-ba1ab897-c80a-4bc4-922a-39044d669da1] Connection <connection object at
0x903f8c0; closed: 0> checked out from pool
coroutine B:
2018-05-08T21:39:59.212+08:00 localhost nova-conductor INFO [pid:55528]
[MainThread] [tid:151519312] [log.py:109 info]
[req-a78075b5-c55c-40c4-86d0-65957e89cb56] jjs Pool wait is False, timeout
30
2018-05-08T21:39:59.213+08:00 localhost nova-conductor DEBUG [pid:55528]
[MainThread] [tid:151519312] [log.py:104 debug]
[req-a78075b5-c55c-40c4-86d0-65957e89cb56] jjs get conn success <connection
object at 0x903f8c0; closed: 0>
2018-05-08T21:39:59.213+08:00 localhost nova-conductor DEBUG [pid:55528]
[MainThread] [tid:151519312] [log.py:104 debug]
[req-a78075b5-c55c-40c4-86d0-65957e89cb56] jjs conn <connection object at
0x903f8c0; closed: 0>, pool status Pool size: 10 Connections in pool: 0
Current Overflow: 36 Max Overflow: 40 Current Checked out connections: 46
2018-05-08T21:39:59.214+08:00 localhost nova-conductor DEBUG [pid:55528]
[MainThread] [tid:151519312] [log.py:104 debug]
[req-a78075b5-c55c-40c4-86d0-65957e89cb56] Connection <connection object at
0x903f8c0; closed: 0> checked out from pool
It seems that when we create a connection in QueuePool._do_get, we may not
be holding the lock _pool.mutex (i.e. Queue.mutex).
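
For reference, a simplified sketch of QueuePool._do_get (paraphrased from
SQLAlchemy 1.x pool.py, not the verbatim source) shows that the new
connection is created only after Queue.get() has returned and released
_pool.mutex, so an eventlet-patched DBAPI connect can switch to another
coroutine at exactly that point:

def _do_get(self):  # simplified paraphrase, not the verbatim source
    try:
        wait = self._max_overflow > -1 and \
            self._overflow >= self._max_overflow
        # Queue.get() acquires and releases _pool.mutex internally
        return self._pool.get(wait, self._timeout)
    except sqla_queue.Empty:
        if self._inc_overflow():
            try:
                # runs OUTSIDE _pool.mutex; with eventlet patching the
                # DBAPI connect here can yield to another coroutine
                return self._create_connection()
            except:
                with util.safe_reraise():
                    self._dec_overflow()
        else:
            return self._do_get()
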
On Monday, May 7, 2018 at 8:41:14 PM UTC+8, Mike Bayer wrote:
>
>
>
> On Mon, May 7, 2018, 7:01 AM <[email protected]> wrote:
>
>> We added a coroutine_id to the psycopg2 logs, and found that two
>> coroutines use the same connection before the pthread_mutex_lock is
>> released.
>>
>> Maybe something is going wrong in the connection pool?
>> [pid] [coroutine_id] msg
>>
>
>
> This seems unlikely. Openstack uses the SQLAlchemy connection pool with
> eventlet patching in conjunction with the MySQL drivers, and nothing like
> that happens. The biggest issue SQLAlchemy had with greenlets was that a
> timed-out greenlet would throw GreenletExit and cause a dirty connection
> to go back into the pool; that was fixed in 1.2.
>
>
>> [49174] [0xa5db730]before PyObject_CallFunctionObjArgs conn 0x94122f0, cb
>> 0x23ad320
>>
>> [49174] [0xa5db730]conn_poll: status = 2, conn 0x94122f0
>>
>> [49174] [0xa5db730]conn_poll: async_status = ASYNC_WRITE 0x94122f0
>>
>> [49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS conn 0x94122f0,
>> async_cursor 0x881ac00
>>
>> [49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS conn 0x94122f0,
>> async_cursor 0x881ac00
>>
>> [49174] [0xa5db4b0]pq_abort: enter pgconn = 0x94122f0, autocommit = 0,
>> status = 2
>>
>> [49174] [0xa5db4b0]before lock pgconn = 0x94122f0, owner 49174 __lock 1
>>
>> Below are the logs grepped by coroutine id:
>>
>> [49174] [0xa5db730]finish send query, before psyco_wait, conn 0x94122f0
>>
>> [49174] [0xa5db730]before have_wait_callback conn 0x94122f0
>>
>> [49174] [0xa5db730]before PyObject_CallFunctionObjArgs conn 0x94122f0, cb
>> 0x23ad320
>>
>> [49174] [0xa5db730]conn_poll: status = 2, conn 0x94122f0
>>
>> [49174] [0xa5db730]conn_poll: async_status = ASYNC_WRITE 0x94122f0
>>
>> [49174] [0xa5db730]conn_poll: poll writing
>>
>> [49174] [0xa5db730]conn_poll: async_status -> ASYNC_READ
>>
>> -----
>>
>> [49174] [0xa5db4b0]psyco_conn_cursor: new unnamed cursor for connection
>> at 0x8de2d30
>>
>> [49174] [0xa5db4b0]cursor_setup: init cursor object at 0xa6c2650
>>
>> [49174] [0xa5db4b0]cursor_setup: parameters: name = (null), conn =
>> 0x8de2d30
>>
>> [49174] [0xa5db4b0]cursor_setup: good cursor object at 0xa6c2650, refcnt
>> = 1
>>
>> [49174] [0xa5db4b0]psyco_conn_cursor: new cursor at 0xa6c2650: refcnt = 1
>>
>> [49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS conn 0x94122f0,
>> async_cursor 0x881ac00
>>
>> [49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS conn 0x94122f0,
>> async_cursor 0x881ac00
>>
>> [49174] [0xa5db4b0]pq_abort: enter pgconn = 0x94122f0, autocommit = 0,
>> status = 2
>>
>> [49174] [0xa5db4b0]before lock pgconn = 0x94122f0, owner 49174 __lock 1
>>
>> On Saturday, April 28, 2018 at 4:07:34 PM UTC+8, [email protected] wrote:
>>>
>>> We reproduced this problem, added logs in psycopg2, found something
>>> confusing, and reported it to psycopg2 [1].
>>>
>>> It seems the sqlalchemy & psycopg2 call flow is:
>>> 1. from sqlalchemy to psycopg2: pq_execute/pq_commit
>>> 2. from psycopg2 to sqlalchemy: PyWeakref_NewRef(conn)
>>> 3. sqlalchemy gets an exception, and does do_rollback
>>> 4. from sqlalchemy to psycopg2: pq_abort
>>> 5. psycopg2 deadlocks on pthread_mutex_lock
>>>
>>> What we did to reproduce it: stop the master pg-server and promote the
>>> slave pg-server to master, moving the FIP from the old master to the
>>> slave. At the same time, have nova-conductor issue quite a lot of DB
>>> query requests.
>>>
>>> [1] https://github.com/psycopg/psycopg2/issues/703
>>>
>>> On Monday, April 23, 2018 at 9:45:04 PM UTC+8, Mike Bayer wrote:
>>>>
>>>> On Mon, Apr 23, 2018 at 9:03 AM, <[email protected]> wrote:
>>>> > Sorry for replying on this topic.
>>>> >
>>>> > We recently get the same problem in our production environment.
>>>> >
>>>> > I found a patch in another lib [1], where they added conn.close() on
>>>> > the exception psycopg2.ProgrammingError.
>>>> >
>>>> > Shall we do the same in [2]?
>>>>
>>>> SQLAlchemy does things much more carefully than that: we parse the
>>>> error message for specific ones that correspond to "connection is no
>>>> longer usable". We call these "is_disconnect", but it can be any
>>>> invalidating condition.
>>>>
>>>> You can make these yourself, and they can also be made to be part of
>>>> oslo.db, using the handle_error event:
>>>>
>>>>
>>>> http://docs.sqlalchemy.org/en/latest/core/events.html?highlight=handle_error#sqlalchemy.events.ConnectionEvents.handle_error
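>>>>
>>>> For example, a minimal sketch (the engine URL and the matched error
>>>> string are illustrative only, not the actual oslo.db filter):
>>>>
>>>> from sqlalchemy import create_engine, event
>>>>
>>>> engine = create_engine("postgresql+psycopg2://user:pass@host/db")
>>>>
>>>> @event.listens_for(engine, "handle_error")
>>>> def _mark_disconnect(context):
>>>>     # flag the condition as invalidating so the pool discards the
>>>>     # connection instead of handing it out again
>>>>     if "asynchronous query is underway" in str(context.original_exception):
>>>>         context.is_disconnect = True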
>>>>
>>>>
>>>> within oslo.db you would want to propose a change here:
>>>>
>>>>
>>>> https://github.com/openstack/oslo.db/blob/master/oslo_db/sqlalchemy/exc_filters.py#L387
>>>>
>>>> >
>>>> > [1] https://github.com/aio-libs/aiopg/pull/415/files?diff=split
>>>> >
>>>> > [2]
>>>> >
>>>> https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/engine/base.py#L1289
>>>>
>>>> >
>>>> >
>>>> > On Monday, November 13, 2017 at 10:44:31 AM UTC+8, JinRong Cai wrote:
>>>> >>
>>>> >> Hi Michael,
>>>> >>
>>>> >> I am using OpenStack with PostgreSQL, where the sqlalchemy and
>>>> >> oslo_db modules are used.
>>>> >> And there are some problems after my pg database was switched over.
>>>> >>
>>>> >> Here is my switchover process:
>>>> >> 1. nova-conductor (a Python application) is running with DB connection
>>>> >> strings pointing to the VIP, which is at the primary site (A) of pg.
>>>> >> 2. switch the VIP from the primary (A) to the new primary (B).
>>>> >> 3. switch over pg: shut down the primary (A), promote the standby (B)
>>>> >> to new primary.
>>>> >> 4. nova-conductor keeps running throughout the process.
>>>> >>
>>>> >> After some seconds, I found some nova-conductor processes hung with
>>>> >> status futex_wait_queue_me, and the status of the query in the DB was
>>>> >> "idle in transaction"; the transaction was not committed or rolled back!
>>>> >> I think disconnection is handled in oslo_db, which will send a
>>>> >> ping (SELECT 1) to the DB.
>>>> >>
>>>> >> If the DB was switched over, the connections in the pool would be
>>>> >> marked invalid, and would reconnect after the next checkout.
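>>>> >>
>>>> >> (My understanding is based on the pessimistic "ping" recipe from the
>>>> >> SQLAlchemy docs, roughly as below; some_engine stands in for the
>>>> >> engine oslo.db creates:)
>>>> >>
>>>> >> from sqlalchemy import exc, event, select
>>>> >>
>>>> >> @event.listens_for(some_engine, "engine_connect")
>>>> >> def ping_connection(connection, branch):
>>>> >>     if branch:
>>>> >>         # branched connections share the parent's DBAPI connection
>>>> >>         return
>>>> >>     try:
>>>> >>         connection.scalar(select([1]))
>>>> >>     except exc.DBAPIError as err:
>>>> >>         if err.connection_invalidated:
>>>> >>             # the pool invalidated the connection; retrying the ping
>>>> >>             # transparently establishes a new DBAPI connection
>>>> >>             connection.scalar(select([1]))
>>>> >>         else:
>>>> >>             raise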
>>>> >>
>>>> >> ### error messages from nova-conductor
>>>> >> localhost nova-conductor ERROR [pid:36365] [MainThread] [tid:122397712] [exc_filters.py:330 _raise_for_remaining_DBAPIError] [req-2bd8a290-e17b-4178-80a6-4b36d5793d85] DBAPIError exception wrapped from (psycopg2.ProgrammingError) execute cannot be used while an asynchronous query is underway [SQL: 'SELECT 1']
>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters Traceback (most recent call last):
>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters   File "/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters     context)
>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters   File "/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters     cursor.execute(statement, parameters)
>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters ProgrammingError: execute cannot be used while an asynchronous query is underway
>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters
>>>> >> localhost nova-conductor ERROR [pid:36365] [MainThread] [tid:122397712] [log.py:122 error] [req-2bd8a290-e17b-4178-80a6-4b36d5793d85] Error closing cursor
>>>> >> 36365 ERROR sqlalchemy.pool.QueuePool Traceback (most recent call last):
>>>> >> 36365 ERROR sqlalchemy.pool.QueuePool   File "/python2.7/site-packages/sqlalchemy/engine/base.py", line 1226, in _safe_close_cursor
>>>> >> 36365 ERROR sqlalchemy.pool.QueuePool     cursor.close()
>>>> >> 36365 ERROR sqlalchemy.pool.QueuePool ProgrammingError: close cannot be used while an asynchronous query is underway
>>>> >> 36365 ERROR sqlalchemy.pool.QueuePool
>>>> >>
>>>> >> ### ps status of nova-conductor
>>>> >> POD6-Mongodb03:/var/log/uvp-getosstat/statistics20171106101500log # cat /proc/33316/stack
>>>> >> [<ffffffff810e4c24>] futex_wait_queue_me+0xc4/0x120
>>>> >> [<ffffffff810e5799>] futex_wait+0x179/0x280
>>>> >> [<ffffffff810e782e>] do_futex+0xfe/0x5b0
>>>> >> [<ffffffff810e7d60>] SyS_futex+0x80/0x180
>>>> >> [<ffffffff81654e09>] system_call_fastpath+0x16/0x1b
>>>> >> [<ffffffffffffffff>] 0xffffffffffffffff
>>>> >>
>>>> >> ### stack of the nova-conductor process
>>>> >> POD6-Mongodb03:/tmp # pstack 33316
>>>> >> #0  0x00002b8449e35f4d in __lll_lock_wait () from /lib64/libpthread.so.0
>>>> >> #1  0x00002b8449e31d02 in _L_lock_791 () from /lib64/libpthread.so.0
>>>> >> #2  0x00002b8449e31c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
>>>> >> #3  0x00002b84554c44ab in pq_abort () from /python2.7/site-packages/psycopg2/_psycopg.so
>>>> >> #4  0x00002b84554c955e in psyco_conn_rollback () from /python2.7/site-packages/psycopg2/_psycopg.so
>>>> >> #5  0x00002b8449b42b50 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
>>>> >> #6  0x00002b8449b42ad0 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
>>>> >>
>>>> >> psycopg2 was trying to close the cursor and acquire the mutex via
>>>> >> pthread_mutex_lock, but it seems the cursor was being used by another
>>>> >> session.
>>>> >>
>>>> >>
>>>> >> Questions:
>>>> >>
>>>> >> 1. What does the error "ProgrammingError: close cannot be used while
>>>> >> an asynchronous query is underway" mean?
>>>> >> AFAIK, this is raised by psycopg2, and means an asynchronous query was
>>>> >> already being executed on the same connection.
>>>> >> But I think sqlalchemy is green-thread safe here, since it is patched
>>>> >> by eventlet; see eventlet/support/psycopg2_patcher.py (a short sketch
>>>> >> of how that patching is enabled follows below).
>>>> >> And we can see different green thread numbers in the log, such as:
>>>> >> [pid:36365] [MainThread] [tid:122397712]
>>>> >> [pid:36365] [MainThread] [tid:122397815]
>>>> >> So, I guess the connection pool in one process is safe.
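>>>> >>
>>>> >> (A short sketch of how that patching is enabled; make_psycopg_green()
>>>> >> installs a psycopg2 wait callback so blocking libpq waits yield to
>>>> >> the eventlet hub:)
>>>> >>
>>>> >> import eventlet
>>>> >> eventlet.monkey_patch()
>>>> >>
>>>> >> # installs psycopg2.extensions.set_wait_callback(...) under the hood
>>>> >> from eventlet.support.psycopg2_patcher import make_psycopg_green
>>>> >> make_psycopg_green()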
>>>> >>
>>>> >> 2. nova-conductor is a multi-threaded Python service, which forks
>>>> >> several child processes.
>>>> >> ps -elf|grep -i nova-conductor
>>>> >> 30878 1 pool_s /usr/bin/nova-conductor
>>>> >> 36364 1 ep_pol /usr/bin/nova-conductor
>>>> >> 36365 1 futex_ /usr/bin/nova-conductor
>>>> >> 36366 1 ep_pol /usr/bin/nova-conductor
>>>> >> 36367 1 ep_pol /usr/bin/nova-conductor
>>>> >> 36368 1 ep_pol /usr/bin/nova-conductor
>>>> >>
>>>> >> If nova-conductor is started with only one child, the problem does
>>>> >> not happen.
>>>> >> Does this mean the connection/engine CANNOT be shared among these
>>>> >> child processes?
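>>>> >>
>>>> >> (If I understand the SQLAlchemy docs correctly, a pooled engine must
>>>> >> not be carried across os.fork(); each child has to build its own
>>>> >> pool, roughly like this sketch with an illustrative URL:)
>>>> >>
>>>> >> import os
>>>> >> from sqlalchemy import create_engine
>>>> >>
>>>> >> engine = create_engine("postgresql+psycopg2://user:pass@host/db")
>>>> >>
>>>> >> pid = os.fork()
>>>> >> if pid == 0:
>>>> >>     # child: discard connections inherited from the parent so this
>>>> >>     # process opens its own sockets instead of sharing them
>>>> >>     engine.dispose()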
>>>> >>
>>>> >> Thanks.
>>>> >