This patch worked.
https://gerrit.sqlalchemy.org/#/c/zzzeek/sqlalchemy/+/747/
But I didn't get the log "Double checkin attempted"; maybe the code that
actually did the work is:

    if connection_record and connection_record.fairy_ref is not None:
        connection_record.checkin()
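For context, a runnable toy paraphrase of the two guards (my own sketch, not
the actual SQLAlchemy source): with the fairy_ref check in the caller,
checkin() never sees fairy_ref already cleared, which would explain why the
"Double checkin attempted" warning never appears.

    import warnings

    class Record(object):
        def __init__(self):
            # stands in for the weakref to the _ConnectionFairy
            self.fairy_ref = object()

        def checkin(self):
            if self.fairy_ref is None:
                warnings.warn("Double checkin attempted on %r" % self)
                return
            self.fairy_ref = None
            print("checked in %r" % self)

    def finalize(connection_record):
        # the guard from the patch: skip checkin once fairy_ref is gone
        if connection_record and connection_record.fairy_ref is not None:
            connection_record.checkin()

    rec = Record()
    finalize(rec)  # first call: checks in and clears fairy_ref
    finalize(rec)  # second call: the guard skips, so no warning is emitted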
On Wednesday, May 16, 2018 ...
Thanks a lot.
I'll test this patch in our environment.
On Wednesday, May 16, 2018 at 2:58:56 AM UTC+8, Mike Bayer wrote:
good news, I reproduced it.
Even better news for you, vindication! nothing to do with greenlets,
threads, concurrency, or anything (though it doesn't have to do with
the "del" or the weakref, this is much simpler).
This has to do with a special rule in the pool when there is a
so-called "reset ...

can you add these lines to your logging, ensure that the check for
"already in pool" is present, and produce new logs?
    try:
        fairy = fairy or _ConnectionFairy(
            connection, connection_record, echo)
        assert fairy.connection is connection
On Tue, May 15, 2018, 3:26 AM wrote:
> We got the log "jjs _finalize_fairy args" just once, but the code in
> function _finalize_fairy seems to be called twice. It also confused me.
>
It's because that code originated elsewhere and got blocked; then it picks
up again, but ...
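For illustration, a self-contained sketch of the general pattern by which a
cleanup function like _finalize_fairy can run twice (a toy of the pattern,
not the pool.py code, and not necessarily the cause here): it is reachable
both as an explicit call on the checkin path and as a weakref callback when
the fairy is garbage collected.

    import gc
    import weakref

    class Fairy(object):
        pass

    def finalize(ref=None):
        # ref is non-None only when invoked as a weakref callback
        print("finalize called", "(weakref callback)" if ref else "(explicit)")

    fairy = Fairy()
    ref = weakref.ref(fairy, finalize)  # GC path: fires when fairy is collected
    finalize()                          # explicit path, e.g. a normal checkin
    del fairy
    gc.collect()                        # triggers the weakref callback as well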
We got the log "jjs _finalize_fairy args" just once, but the code in
function _finalize_fairy seems to be called twice. It also confused me.
BTW, I tried to add code [1] to avoid the double checkin in _finalize_fairy,
but it failed.
That means the code at pool.py#L670-692 was called without L646-668.
[1]
On Mon, May 14, 2018 at 9:38 PM, wrote:
The exc_info of tid 130446032 "Exception during reset or similar" is:

sqlalchemy.pool.QueuePool Traceback (most recent call last):
sqlalchemy.pool.QueuePool   File "sqlalchemy/pool.py", line 679, in _finalize_fairy
sqlalchemy.pool.QueuePool     fairy._reset(pool)
sqlalchemy.pool.QueuePool   File ...
The tid is the coroutine_id; we get it by id(greenlet.getcurrent()) in each
coroutine (green thread).
We run nova-conductor with monkey_patch, the same as [1]:
eventlet.monkey_patch(os=False)
Our nova-conductor is running with multiple workers.
It's confusing why _finalize_fairy is called twice.
I put the files where we added logs here, including engine/base.py and
pool.py:
https://github.com/jiajunsu/test_logs
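For clarity, roughly how those tid values come about (a minimal sketch,
assuming eventlet and greenlet are installed; nova-style monkey patching
plus id(greenlet.getcurrent()) as the coroutine id):

    import eventlet
    eventlet.monkey_patch(os=False)  # the same call nova uses

    import greenlet

    def worker(n):
        # each green thread reports its own coroutine id, like the tid above
        print("worker %d in greenlet 0x%x" % (n, id(greenlet.getcurrent())))

    pool = eventlet.GreenPool()
    for i in range(3):
        pool.spawn(worker, i)
    pool.waitall()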
On Monday, May 14, 2018 at 11:10:37 PM UTC+8, Mike Bayer wrote:

OK I need the exact pool.py you are using as well that includes all
these logging statements and any other adjustments that were made.

I notice in this log file there are multiple "tid", which I assume to
be thread identifiers? E.g., OS-level threads? Are you using
eventlet or not? Are you monkeypatching SQLAlchemy and then using
multiple OS-level threads at the same time? Because that likely
doesn't work.

Particularly, ...
On Mon, May 14, 2018 at 8:02 AM, wrote:
I've uploaded the logs
here: https://raw.githubusercontent.com/jiajunsu/test_logs/master/nc_5679.log
The ConnectionRecord is 0x70cdb10, and the tid (coroutine id) is 130446032.
At 2018-05-14T11:11:39.970, we got the log "jjs rec already in pool
checkin", but I had just deleted the `return` in the code, so the ...
OK. I'll reproduce it and upload all logs of the process.
On Monday, May 14, 2018 at 10:48:49 AM UTC+8, Mike Bayer wrote:
as well as the log that shows this:
+        if self in pool._pool.queue:
+            pool.logger.debug("jjs rec already in pool checkin, skip %r", self)
+            return
actually happening.
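As a toy model of what that guard checks (my sketch, not the SQLAlchemy
implementation; pool._pool.queue is the pool's underlying deque):

    from collections import deque

    class TinyPool(object):
        def __init__(self):
            self.queue = deque()  # stands in for pool._pool.queue

        def checkin(self, rec):
            if rec in self.queue:  # the "already in pool" guard
                print("jjs rec already in pool checkin, skip %r" % rec)
                return
            self.queue.append(rec)

    p = TinyPool()
    p.checkin("rec-1")
    p.checkin("rec-1")  # the second checkin is skipped by the guard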
On Sun, May 13, 2018 at 10:42 PM, Mike Bayer wrote:
On Sun, May 13, 2018 at 10:35 PM, wrote:
"jjs" is short for my name and "cjr" is my colleague's (JinRong Cai). We
work together and add logs in the same environment; we prefix our names to
get a keyword for filtering the logs.
The doubled ConnectionRecords (with None connection) are obtained by two
different coroutines, and the variable ...
if that connection error is not handled in "is_disconnect" then this
wouldn't invalidate the pool.
However, we've had lots of problems moving VIPs around with Galera. I
would recommend using HAProxy or PgBouncer, which are both designed to
mitigate this issue. As well as that, run Nova under ...
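As an aside, when a driver error is not classified as a disconnect,
SQLAlchemy's handle_error event hook lets you set is_disconnect yourself so
the pool gets invalidated; a minimal sketch (the DSN and the matched error
string are placeholders, and psycopg2 is assumed installed):

    from sqlalchemy import create_engine, event

    # placeholder DSN, not the real environment's
    engine = create_engine("postgresql+psycopg2://nova:secret@db-vip/nova")

    @event.listens_for(engine, "handle_error")
    def force_disconnect(context):
        # if the dialect didn't already classify this error as a disconnect,
        # flag it ourselves so the pool's connections get invalidated
        if "server closed the connection" in str(context.original_exception):
            context.is_disconnect = True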
The scenario we use to reproduce the problem is:
1. stop the PG master
2. move the IP of PG from the master node to the slave node, and send an ARP announcement
3. fail over the PG slave to master
That shuts all DB connections to the PG master, and in SQLAlchemy we get an
exception (client-side view sketched below):
ERROR sqlalchemy.pool.QueuePool ...
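For what it's worth, the client-side view of step 3 looks roughly like this
(a sketch with placeholder host/credentials; requires psycopg2 and a
reachable PostgreSQL):

    import psycopg2

    conn = psycopg2.connect(host="db-vip", dbname="nova",
                            user="nova", password="secret")  # placeholders
    cur = conn.cursor()
    # ... the failover (steps 1-3 above) happens here ...
    try:
        cur.execute("SELECT 1")
    except psycopg2.OperationalError as exc:
        # typically "server closed the connection unexpectedly"
        print("connection severed by failover:", exc)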
I tried to follow your log lines, which is difficult because I don't
know where those log statements actually are and I don't know what
acronyms like "jjs" and "cjr" mean. But it does look like it's all in
one thread (so not async GC) and all in one Nova request (so perhaps
just one greenlet ...
Here's a print statement you might want to try:
diff --git a/lib/sqlalchemy/pool.py b/lib/sqlalchemy/pool.py
index 89a4cea7c..452db4883 100644
--- a/lib/sqlalchemy/pool.py
+++ b/lib/sqlalchemy/pool.py
@@ -689,6 +689,19 @@ def _finalize_fairy(connection, connection_record,
been garbage ...
On Thu, May 10, 2018 at 7:23 AM, wrote:
Hi Mike,
I added more logs and finally found the reason, and a method to avoid the
deadlock between coroutines.
In summary, the deadlock happened because two different coroutines used the
same ConnectionRecord with the same connection.
It happens under QueuePool, when the DB connections are reset by ...
I found why we could get the same connection twice from the pool...
Because we put it into the pool twice.
Any suggestions for finding why it puts the same connection into the pool
from different coroutines?
2018-05-08T22:39:17.448+08:00 localhost nova-conductor DEBUG [pid:15755]
[MainThread]
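To make the suspected failure mode concrete, a toy illustration (a plain
queue.Queue, not SQLAlchemy code) of why putting the same record in twice
hands one connection to two consumers:

    from queue import Queue

    class Record(object):
        def __init__(self, name):
            self.connection = name

    pool = Queue()
    rec = Record("conn-1")
    pool.put(rec)
    pool.put(rec)   # the accidental double checkin

    a = pool.get()
    b = pool.get()
    assert a is b   # two coroutines now share the same connection
    print(a.connection, b.connection)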
It's not a problem with locking. I tried taking _pool.mutex in
create_connection, but it didn't help.
Maybe we create the connection, put it into the pool, but use it without
getting it out?
On Tuesday, May 8, 2018 at 10:18:52 PM UTC+8, jiajun...@gmail.com wrote:
I added logs in sqlalchemy/pool.py, in QueuePool.checkout:

    @classmethod
    def checkout(cls, pool):
        rec = pool._do_get()
        try:
            dbapi_connection = rec.get_connection()
            pool.logger.debug("jjs get conn success %s", dbapi_connection)
        except:
            pool.logger.debug("jjs get ...
just to note, there's definitely a bug somewhere, and if the bug is
found there's a 90% chance it involves code that I wrote at some point
:) (though likely some assumption that the PG/async stuff does not
meet), but this is part of oslo / openstack. I work on oslo /
openstack, but that's my ...
On Mon, May 7, 2018, 7:01 AM wrote:
> We added coroutine_id in psycopg2, and found that two coroutines use the
> same connection before the pthread_mutex_lock is released.
>
> Maybe something in the connection pool goes wrong?
> [pid] [coroutine_id] msg
>
This seems ...
We added coroutine_id in psycopg2, and found that two coroutines use the
same connection before the pthread_mutex_lock is released.
Maybe something in the connection pool goes wrong?
[pid] [coroutine_id] msg
[49174] [0xa5db730]before PyObject_CallFunctionObjArgs conn 0x94122f0, cb
0x23ad320
We reproduced this problem and added logs in psycopg2, found a confusing
thing, and reported it to psycopg2.
It seems the sqlalchemy & psycopg2 call flow is:
1. from sqlalchemy to psycopg2: pq_execute/pq_commit
2. from psycopg2 to sqlalchemy: PyWeakref_NewRef(conn)
3. get an exception in sqlalchemy, and ...
On Mon, Apr 23, 2018 at 9:03 AM, wrote:
> Sorry to reply on this topic.
>
> We recently hit the same problem in our production environment.
>
> I found a patch in another lib [1], and they added conn.close() in the
> psycopg2.ProgrammingError exception handler.
>
> Shall we do the ...