Hi all,
Our production system is experiencing very strange behavior. At peak
load, some of our Python processes get stuck at:

<built-in method __enter__ of _thread.lock object at remote 0x7f8e021156c0>

which corresponds to `with self._exec_once_mutex:` in sqlalchemy/event/attr.py.
Here is the full stack trace
<https://gist.github.com/pingzh/8c98afa5421f03dc82af7de8910d4a31>
that I captured via gdb with the Python extension.
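For reference, the trace was collected roughly like this (assuming the CPython
gdb helpers, i.e. libpython.py / the python-dbg tooling, are loaded; the exact
package name varies by distro, and <pid> is the stuck process's pid):

    gdb -p <pid>
    (gdb) info threads            # list all OS threads in the process
    (gdb) py-bt                   # Python-level stack of the current thread
    (gdb) thread apply all py-bt  # Python-level stacks for every thread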
Here is some high-level information about the production setup:

Python: 3.7.9
SQLAlchemy: 1.3.6
PyMySQL: 1.0.2
Connection string: mysql+pymysql://HOST:PORT/DB?connect_timeout=30&charset=utf8
Docker container where the Python processes run: Ubuntu 18.04.6
VM host where Docker runs: amzn2
When creating the engine, we use NullPool as the poolclass:

engine_args['poolclass'] = NullPool
engine = create_engine(SQL_ALCHEMY_CONN, **engine_args)

# setup_event_handlers is defined here:
# https://github.com/apache/airflow/blob/1.10.4/airflow/utils/sqlalchemy.py#L42-L136
setup_event_handlers(engine)

Session = scoped_session(
    sessionmaker(autocommit=False,
                 autoflush=False,
                 bind=engine,
                 expire_on_commit=False))
There are 5 processes running in the container; please see this link
<https://gist.github.com/pingzh/8c98afa5421f03dc82af7de8910d4a31?permalink_comment_id=4451451#gistcomment-4451451>
for the Python process model running in the Docker container. The stuck Python
process is created by its parent process (which is also Python) via
`subprocess.Popen`; right after `subprocess.Popen` returns, the parent creates
a thread to read the log output of the subprocess (the same link above has more
details). A rough sketch of that setup is below.
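To illustrate, the parent-side pattern is roughly the following (a minimal
sketch only; task_cmd and the log handling are placeholders, not the actual
Airflow code):

    import subprocess
    import threading

    def _read_logs(proc):
        # continuously drain the child's stdout so the pipe never fills up
        for line in iter(proc.stdout.readline, b''):
            print(line.decode(errors="replace"), end="")  # stand-in for the real log handling

    # placeholder command; in production this launches another Python worker
    task_cmd = ["python", "-c", "print('hello from the child')"]

    proc = subprocess.Popen(task_cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, close_fds=True)

    # right after Popen, a thread is started to read the subprocess's log output
    log_thread = threading.Thread(target=_read_logs, args=(proc,), daemon=True)
    log_thread.start()

    proc.wait()
    log_thread.join()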
Reading the source code behind that stack trace
<https://github.com/sqlalchemy/sqlalchemy/blob/rel_1_3_6/lib/sqlalchemy/event/attr.py#L277>,
it looks like it tries to grab a lock, but I am not sure why it would fail to
acquire the lock given that only one process uses the session.
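If I understand it correctly, that exec-once path is essentially a
double-checked lock around one-time initialization; a simplified sketch of the
pattern (my paraphrase, not the actual SQLAlchemy 1.3.6 source) would be:

    import threading

    class OnceListener:
        def __init__(self):
            self._exec_once = False
            self._exec_once_mutex = threading.Lock()

        def exec_once(self, *args, **kw):
            # double-checked lock: only the first caller runs the handlers,
            # everyone else waits on the mutex and then returns immediately
            if not self._exec_once:
                with self._exec_once_mutex:  # <- where our processes appear to be stuck
                    if not self._exec_once:
                        try:
                            self._run_handlers(*args, **kw)
                        finally:
                            self._exec_once = True

        def _run_handlers(self, *args, **kw):
            pass  # stand-in for invoking the registered event listeners

As I read it, a caller can only hang there if the mutex is already held and
never released, which is what I cannot explain in our setup.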
At peak load, there are around 200 Docker containers running on a
single amzn2 host. From the database-side metrics, we do see some
connections being dropped at the peak. I am not sure whether this could
cause that threading.Lock issue, but the two correlate. Or is it possible
that the event handlers
<https://github.com/apache/airflow/blob/1.10.4/airflow/utils/sqlalchemy.py#L42-L136>
registered on the engine and the session query in the stack trace try to grab
this lock at the same time?
Potential related issues that I have checked:
- Race condition in dialect initialization
<https://github.com/sqlalchemy/sqlalchemy/issues/6337>
- A possible race condition between dialect initialization and query
execution <https://github.com/sqlalchemy/sqlalchemy/issues/6337>
but I could not figure out why it happens in our system. We also tried to
reproduce it in our stress-test environment, but without success.
Could you please share some insights? Any suggestions would be greatly
appreciated.
Thanks,
Ping