[sqlalchemy] long transaction after database switched over

JinRong Cai Sun, 12 Nov 2017 18:44:42 -0800


Hi  Michael ,


I am using openstack with postgresql which sqlalchemy and oslo_db module 
were used.
And there are some problems after my pg database switched over.
Here is my switch over process:
1. nova-conductor(python application) is running with DB connection strings 
point to vip , which is in primary site(A) of pg.
2. switch VIP from primary(A) to new primary(B)
3. switch over pg: shutdown primary(A), promopt standby(B) to new primary.
4. nova-conductor is running in the whole process.

After some seconds, I found some nova-conductor processes are hang with 
status futex_wait_queue_me, and the status of the query in DB is "idle in 
transaction", the transaction was not commited or rollbacked!
I think disconnection was handled in the oslo_db, which will send a 
ping(select 1) to DB.

If DB was switchd over, the connection in the pool would be set with status 
invalid, and reconnect after next check out.

###error messages from nova-conductor
localhost nova-conductor ERROR [pid:36365] [MainThread] [tid:122397712] 
[exc_filters.py:330 _raise_for_remaining_DBAPIError] 
[req-2bd8a290-e17b-4178-80a6-4b36d5793d85] DBAPIError exception wrapped 
from (psycopg2.ProgrammingError) execute cannot be used while an 
asynchronous query is underway [SQL: 'SELECT 1']
 36365 ERROR oslo_db.sqlalchemy.exc_filters Traceback (most recent call 
last):
  36365 ERROR oslo_db.sqlalchemy.exc_filters   File 
"/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in 
_execute_context
  36365 ERROR oslo_db.sqlalchemy.exc_filters     context)
  36365 ERROR oslo_db.sqlalchemy.exc_filters   File 
"/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in 
do_execute
  36365 ERROR oslo_db.sqlalchemy.exc_filters     cursor.execute(statement, 
parameters)
  36365 ERROR oslo_db.sqlalchemy.exc_filters ProgrammingError: execute 
cannot be used while an asynchronous query is underway
  36365 ERROR oslo_db.sqlalchemy.exc_filters
 localhost nova-conductor ERROR [pid:36365] [MainThread] [tid:122397712] 
[log.py:122 error] [req-2bd8a290-e17b-4178-80a6-4b36d5793d85] Error closing 
cursor
  36365 ERROR sqlalchemy.pool.QueuePool Traceback (most recent call last):
  36365 ERROR sqlalchemy.pool.QueuePool   File 
"/python2.7/site-packages/sqlalchemy/engine/base.py", line 1226, in 
_safe_close_cursor
  36365 ERROR sqlalchemy.pool.QueuePool     cursor.close()
  36365 ERROR sqlalchemy.pool.QueuePool ProgrammingError: close cannot be 
used while an asynchronous query is underway
  36365 ERROR sqlalchemy.pool.QueuePool

###ps status of nova-conductor
POD6-Mongodb03:/var/log/uvp-getosstat/statistics20171106101500log # cat 
/proc/33316/stack 
[<ffffffff810e4c24>] futex_wait_queue_me+0xc4/0x120
[<ffffffff810e5799>] futex_wait+0x179/0x280
[<ffffffff810e782e>] do_futex+0xfe/0x5b0
[<ffffffff810e7d60>] SyS_futex+0x80/0x180
[<ffffffff81654e09>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

### stack of the nova-conductor process
POD6-Mongodb03:/tmp # pstack 33316
#0  0x00002b8449e35f4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00002b8449e31d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00002b8449e31c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00002b84554c44ab in pq_abort () from 
/python2.7/site-packages/psycopg2/_psycopg.so
#4  0x00002b84554c955e in psyco_conn_rollback () from 
/python2.7/site-packages/psycopg2/_psycopg.so
#5  0x00002b8449b42b50 in PyEval_EvalFrameEx () from 
/lib64/libpython2.7.so.1.0
#6  0x00002b8449b42ad0 in PyEval_EvalFrameEx () from 
/lib64/libpython2.7.so.1.0
The psycopg2 was trying to close the cursor, and try to get the mutex lock 
"pthread_mutex_lock", but it seems that the cursor was used by other 
session.


Questions:

1. What the error "ProgrammingError: close cannot be used while an 
asynchronous query is underway" mean?
AFAIK, these caused by psycopg2, which means a asynchronous query was 
executed in one connection.
But the I think the sqlalchemy was thread safe since it was patched by 
eventlet, see details in  eventlet/support/psycopg2_patcher.py
And we can see different green thread number in the log, as:
[pid:36365] [MainThread] [tid:122397712]
[pid:36365] [MainThread] [tid:122397815]
So, I guess the connection pool in one process is safe.

2. The nova-conductor was a multi-thread python client, which forked 
several child process.
ps -elf|grep -i nova-conductor
30878  1 pool_s /usr/bin/nova-conductor
36364  1 ep_pol /usr/bin/nova-conductor
36365  1 futex_ /usr/bin/nova-conductor
36366  1 ep_pol /usr/bin/nova-conductor
36367  1 ep_pol /usr/bin/nova-conductor
36368  1 ep_pol /usr/bin/nova-conductor

If the nova-conductor was started with only one child, the problem was not 
happen.
Does this mean the connection/engine CAN NOT shared in these child 
processes? 

Thanks.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

[sqlalchemy] long transaction after database switched over

Reply via email to