Re: [sqlalchemy] Re: long transaction after database switched over

Mike Bayer Mon, 23 Apr 2018 06:45:29 -0700

On Mon, Apr 23, 2018 at 9:03 AM,  <[email protected]> wrote:
> Sorry for reply on this topic.
>
> We recently get the same problem in our production environment.
>
> I found a patch in other lib [1], and they added conn.close() in exception
> psycopg2.ProgrammingError.
>
> Shall we do the same in [2] ?


SQLAlchemy does things much more carefully than that, we parse the
error message for specific ones that correspond to "connection is no
longer usable", we call these "is_disconnect" but it can be any
invalidating condition.

You can make these yourself, and they can also be made to be part of
oslo.db, using the handle_error event:

http://docs.sqlalchemy.org/en/latest/core/events.html?highlight=handle_error#sqlalchemy.events.ConnectionEvents.handle_error

within oslo.db you would want to propose a change here:

https://github.com/openstack/oslo.db/blob/master/oslo_db/sqlalchemy/exc_filters.py#L387





>
> [1] https://github.com/aio-libs/aiopg/pull/415/files?diff=split
>
> [2]
> https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/engine/base.py#L1289
>
>
> 在 2017年11月13日星期一 UTC+8上午10:44:31，JinRong Cai写道：
>>
>> Hi  Michael ,
>>
>> I am using openstack with postgresql which sqlalchemy and oslo_db module
>> were used.
>> And there are some problems after my pg database switched over.
>>
>> Here is my switch over process:
>> 1. nova-conductor(python application) is running with DB connection
>> strings point to vip , which is in primary site(A) of pg.
>> 2. switch VIP from primary(A) to new primary(B)
>> 3. switch over pg: shutdown primary(A), promopt standby(B) to new primary.
>> 4. nova-conductor is running in the whole process.
>>
>> After some seconds, I found some nova-conductor processes are hang with
>> status futex_wait_queue_me, and the status of the query in DB is "idle in
>> transaction", the transaction was not commited or rollbacked!
>> I think disconnection was handled in the oslo_db, which will send a
>> ping(select 1) to DB.
>>
>> If DB was switchd over, the connection in the pool would be set with
>> status invalid, and reconnect after next check out.
>>
>> ###error messages from nova-conductor
>> localhost nova-conductor ERROR [pid:36365] [MainThread] [tid:122397712]
>> [exc_filters.py:330 _raise_for_remaining_DBAPIError]
>> [req-2bd8a290-e17b-4178-80a6-4b36d5793d85] DBAPIError exception wrapped from
>> (psycopg2.ProgrammingError) execute cannot be used while an asynchronous
>> query is underway [SQL: 'SELECT 1']
>>  36365 ERROR oslo_db.sqlalchemy.exc_filters Traceback (most recent call
>> last):
>>   36365 ERROR oslo_db.sqlalchemy.exc_filters   File
>> "/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in
>> _execute_context
>>   36365 ERROR oslo_db.sqlalchemy.exc_filters     context)
>>   36365 ERROR oslo_db.sqlalchemy.exc_filters   File
>> "/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in
>> do_execute
>>   36365 ERROR oslo_db.sqlalchemy.exc_filters     cursor.execute(statement,
>> parameters)
>>   36365 ERROR oslo_db.sqlalchemy.exc_filters ProgrammingError: execute
>> cannot be used while an asynchronous query is underway
>>   36365 ERROR oslo_db.sqlalchemy.exc_filters
>>  localhost nova-conductor ERROR [pid:36365] [MainThread] [tid:122397712]
>> [log.py:122 error] [req-2bd8a290-e17b-4178-80a6-4b36d5793d85] Error closing
>> cursor
>>   36365 ERROR sqlalchemy.pool.QueuePool Traceback (most recent call last):
>>   36365 ERROR sqlalchemy.pool.QueuePool   File
>> "/python2.7/site-packages/sqlalchemy/engine/base.py", line 1226, in
>> _safe_close_cursor
>>   36365 ERROR sqlalchemy.pool.QueuePool     cursor.close()
>>   36365 ERROR sqlalchemy.pool.QueuePool ProgrammingError: close cannot be
>> used while an asynchronous query is underway
>>   36365 ERROR sqlalchemy.pool.QueuePool
>>
>> ###ps status of nova-conductor
>> POD6-Mongodb03:/var/log/uvp-getosstat/statistics20171106101500log # cat
>> /proc/33316/stack
>> [<ffffffff810e4c24>] futex_wait_queue_me+0xc4/0x120
>> [<ffffffff810e5799>] futex_wait+0x179/0x280
>> [<ffffffff810e782e>] do_futex+0xfe/0x5b0
>> [<ffffffff810e7d60>] SyS_futex+0x80/0x180
>> [<ffffffff81654e09>] system_call_fastpath+0x16/0x1b
>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> ### stack of the nova-conductor process
>> POD6-Mongodb03:/tmp # pstack 33316
>> #0  0x00002b8449e35f4d in __lll_lock_wait () from /lib64/libpthread.so.0
>> #1  0x00002b8449e31d02 in _L_lock_791 () from /lib64/libpthread.so.0
>> #2  0x00002b8449e31c08 in pthread_mutex_lock () from
>> /lib64/libpthread.so.0
>> #3  0x00002b84554c44ab in pq_abort () from
>> /python2.7/site-packages/psycopg2/_psycopg.so
>> #4  0x00002b84554c955e in psyco_conn_rollback () from
>> /python2.7/site-packages/psycopg2/_psycopg.so
>> #5  0x00002b8449b42b50 in PyEval_EvalFrameEx () from
>> /lib64/libpython2.7.so.1.0
>> #6  0x00002b8449b42ad0 in PyEval_EvalFrameEx () from
>> /lib64/libpython2.7.so.1.0
>>
>> The psycopg2 was trying to close the cursor, and try to get the mutex lock
>> "pthread_mutex_lock", but it seems that the cursor was used by other
>> session.
>>
>>
>> Questions:
>>
>> 1. What the error "ProgrammingError: close cannot be used while an
>> asynchronous query is underway" mean?
>> AFAIK, these caused by psycopg2, which means a asynchronous query was
>> executed in one connection.
>> But the I think the sqlalchemy was thread safe since it was patched by
>> eventlet, see details in  eventlet/support/psycopg2_patcher.py
>> And we can see different green thread number in the log, as:
>> [pid:36365] [MainThread] [tid:122397712]
>> [pid:36365] [MainThread] [tid:122397815]
>> So, I guess the connection pool in one process is safe.
>>
>> 2. The nova-conductor was a multi-thread python client, which forked
>> several child process.
>> ps -elf|grep -i nova-conductor
>> 30878  1 pool_s /usr/bin/nova-conductor
>> 36364  1 ep_pol /usr/bin/nova-conductor
>> 36365  1 futex_ /usr/bin/nova-conductor
>> 36366  1 ep_pol /usr/bin/nova-conductor
>> 36367  1 ep_pol /usr/bin/nova-conductor
>> 36368  1 ep_pol /usr/bin/nova-conductor
>>
>> If the nova-conductor was started with only one child, the problem was not
>> happen.
>> Does this mean the connection/engine CAN NOT shared in these child
>> processes?
>>
>> Thanks.
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
> ---
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Re: long transaction after database switched over

Reply via email to