Joy Gao created AIRFLOW-1642:
--------------------------------

             Summary: An Alembic script not using scoped session causing 
deadlock
                 Key: AIRFLOW-1642
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1642
             Project: Apache Airflow
          Issue Type: Bug
            Reporter: Joy Gao
            Priority: Minor


The bug I'm about to describe is a more of an obscure edge case, however I 
think it's something still worth fixing.

After upgrading to airflow 1.9, while running `airflow resetdb` on my local 
machine (with mysql), I encountered a deadlock on the final alembic revision 
_d2ae31099d61 Increase text size for MySQL (not relevant for other DBs' text 
types)_.

The deadlock turned out to be caused by another earlier session that was 
created and left open in revision _cc1e65623dc7 add max tries column to task 
instance_. Notably the code below:

{code}
sessionmaker = sa.orm.sessionmaker()
session = sessionmaker(bind=connection)
dagbag = DagBag(settings.DAGS_FOLDER)
{code}

The session created here was not a `scoped_session`, so when the DAGs were 
being parsed in line 3 above, one of the DAG files makes a direct call to the 
class method `Variable.get()` to acquire an env variable, which makes a db 
query to the `variable` table, but raised a KeyError as the env variable was 
non-existent, thus holding the lock to the `variable` table as a result of that 
exception.

Later on, the latter alembic script `_cc1e65623dc7` needs to alter the 
`Variable` table. Instead of creating its own Session object, it attempts to 
reuse the same one as above. And because of the exception, it waits 
indefinitely to acquire the lock on that table. 

So the DAG file itself could have avoided the KeyError by providing a default 
value when calling Variable.get(). However I think it would be a good idea to 
avoid using unscoped sessions in general, as an exception could potentially 
occur in the future elsewhere.  The easiest fix is replacing *session = 
sessionmaker(bind=connection)* with *session = settings.Session()*, which is 
scoped. However, making a change on a migration script is going to make folks 
anxious.

If anyone have any thoughts on this, let me know! Thanks :)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to