Hi Michael ,
I am using openstack with postgresql which sqlalchemy and oslo_db module were used. And there are some problems after my pg database switched over. Here is my switch over process: 1. nova-conductor(python application) is running with DB connection strings point to vip , which is in primary site(A) of pg. 2. switch VIP from primary(A) to new primary(B) 3. switch over pg: shutdown primary(A), promopt standby(B) to new primary. 4. nova-conductor is running in the whole process. After some seconds, I found some nova-conductor processes are hang with status futex_wait_queue_me, and the status of the query in DB is "idle in transaction", the transaction was not commited or rollbacked! I think disconnection was handled in the oslo_db, which will send a ping(select 1) to DB. If DB was switchd over, the connection in the pool would be set with status invalid, and reconnect after next check out. ###error messages from nova-conductor localhost nova-conductor ERROR [pid:36365] [MainThread] [tid:122397712] [exc_filters.py:330 _raise_for_remaining_DBAPIError] [req-2bd8a290-e17b-4178-80a6-4b36d5793d85] DBAPIError exception wrapped from (psycopg2.ProgrammingError) execute cannot be used while an asynchronous query is underway [SQL: 'SELECT 1'] 36365 ERROR oslo_db.sqlalchemy.exc_filters Traceback (most recent call last): 36365 ERROR oslo_db.sqlalchemy.exc_filters File "/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context 36365 ERROR oslo_db.sqlalchemy.exc_filters context) 36365 ERROR oslo_db.sqlalchemy.exc_filters File "/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute 36365 ERROR oslo_db.sqlalchemy.exc_filters cursor.execute(statement, parameters) 36365 ERROR oslo_db.sqlalchemy.exc_filters ProgrammingError: execute cannot be used while an asynchronous query is underway 36365 ERROR oslo_db.sqlalchemy.exc_filters localhost nova-conductor ERROR [pid:36365] [MainThread] [tid:122397712] [log.py:122 error] [req-2bd8a290-e17b-4178-80a6-4b36d5793d85] Error closing cursor 36365 ERROR sqlalchemy.pool.QueuePool Traceback (most recent call last): 36365 ERROR sqlalchemy.pool.QueuePool File "/python2.7/site-packages/sqlalchemy/engine/base.py", line 1226, in _safe_close_cursor 36365 ERROR sqlalchemy.pool.QueuePool cursor.close() 36365 ERROR sqlalchemy.pool.QueuePool ProgrammingError: close cannot be used while an asynchronous query is underway 36365 ERROR sqlalchemy.pool.QueuePool ###ps status of nova-conductor POD6-Mongodb03:/var/log/uvp-getosstat/statistics20171106101500log # cat /proc/33316/stack [<ffffffff810e4c24>] futex_wait_queue_me+0xc4/0x120 [<ffffffff810e5799>] futex_wait+0x179/0x280 [<ffffffff810e782e>] do_futex+0xfe/0x5b0 [<ffffffff810e7d60>] SyS_futex+0x80/0x180 [<ffffffff81654e09>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff ### stack of the nova-conductor process POD6-Mongodb03:/tmp # pstack 33316 #0 0x00002b8449e35f4d in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00002b8449e31d02 in _L_lock_791 () from /lib64/libpthread.so.0 #2 0x00002b8449e31c08 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00002b84554c44ab in pq_abort () from /python2.7/site-packages/psycopg2/_psycopg.so #4 0x00002b84554c955e in psyco_conn_rollback () from /python2.7/site-packages/psycopg2/_psycopg.so #5 0x00002b8449b42b50 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0 #6 0x00002b8449b42ad0 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0 The psycopg2 was trying to close the cursor, and try to get the mutex lock "pthread_mutex_lock", but it seems that the cursor was used by other session. Questions: 1. What the error "ProgrammingError: close cannot be used while an asynchronous query is underway" mean? AFAIK, these caused by psycopg2, which means a asynchronous query was executed in one connection. But the I think the sqlalchemy was thread safe since it was patched by eventlet, see details in eventlet/support/psycopg2_patcher.py And we can see different green thread number in the log, as: [pid:36365] [MainThread] [tid:122397712] [pid:36365] [MainThread] [tid:122397815] So, I guess the connection pool in one process is safe. 2. The nova-conductor was a multi-thread python client, which forked several child process. ps -elf|grep -i nova-conductor 30878 1 pool_s /usr/bin/nova-conductor 36364 1 ep_pol /usr/bin/nova-conductor 36365 1 futex_ /usr/bin/nova-conductor 36366 1 ep_pol /usr/bin/nova-conductor 36367 1 ep_pol /usr/bin/nova-conductor 36368 1 ep_pol /usr/bin/nova-conductor If the nova-conductor was started with only one child, the problem was not happen. Does this mean the connection/engine CAN NOT shared in these child processes? Thanks. -- SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper http://www.sqlalchemy.org/ To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description. --- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at https://groups.google.com/group/sqlalchemy. For more options, visit https://groups.google.com/d/optout.