[ https://issues.apache.org/jira/browse/AIRFLOW-405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Imberman closed AIRFLOW-405.
-----------------------------------
    Resolution: Auto Closed

> MySQL backend connection management bug (related to PythonOperator)
> -------------------------------------------------------------------
>
>                 Key: AIRFLOW-405
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-405
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: operators
>    Affects Versions: 1.7.1.2
>            Reporter: Jiasi Zeng
>            Priority: Major
>
> # Environment setup
> We are running Airflow 1.7.1.2 with MySQL 5.6. The MySQL `wait_timeout` is
> set to 300 seconds, which means the server closes idle connections after 300
> seconds of inactivity. To reflect this, we set SQLAlchemy's `pool_recycle` in
> `airflow.cfg` to 290 seconds, which should force Airflow/SQLAlchemy to
> recycle (discard and reopen) connections older than 290 seconds. Airflow
> should therefore never try to use an already-dead connection.
>
> # Symptom
> When running a PythonOperator that takes more than 300 seconds to execute,
> the task finishes executing the Python callable but then fails with this
> error:
> https://gist.github.com/garthcn/cd7bcdec12748406506f2b0710655c8b
> It seems that after the Python callable finishes, Airflow tries to push its
> return value to XCom. However, the SQL connection has already gone away,
> while Airflow/SQLAlchemy still think it is alive.
>
> # Hypothesis
> I did some investigation and think that it might be caused by not calling
> `session.commit()` or `session.close()` for the DB operations before the XCom
> push. As far as I know, in SQLAlchemy, if you don't close a connection and
> leave it in the checked-out state, the connection pool won't recycle it, so
> SQLAlchemy will try to use it again after more than 300 seconds (the MySQL
> `wait_timeout` in our case). This results in a "MySQL server has gone away"
> error.
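The hypothesis about checked-out connections can be illustrated with a small, self-contained model. This is a sketch, not SQLAlchemy's pool or Airflow's code: `FakeConnection`, `RecyclingPool`, `held_session` and `released_session` are hypothetical names, and the 400-second callable duration is an assumed value. The model assumes, as SQLAlchemy's `pool_recycle` does, that a connection's age is compared with the recycle limit only when it is checked out of the pool, while the server drops a connection once it has been idle longer than `wait_timeout`.

```python
import itertools

# Hypothetical model of the reported setup; not SQLAlchemy or Airflow code.
WAIT_TIMEOUT = 300      # MySQL wait_timeout from the report (seconds)
POOL_RECYCLE = 290      # pool_recycle from the report (seconds)
CALLABLE_SECONDS = 400  # assumed duration of the long PythonOperator callable


class FakeConnection:
    """A server-side connection that dies after WAIT_TIMEOUT seconds idle."""

    _ids = itertools.count(1)

    def __init__(self, now):
        self.id = next(self._ids)
        self.created_at = now
        self.last_used = now

    def execute(self, now):
        # The server closes connections idle for longer than wait_timeout.
        if now - self.last_used > WAIT_TIMEOUT:
            raise ConnectionError("MySQL server has gone away")
        self.last_used = now


class RecyclingPool:
    """Toy pool that, like pool_recycle, checks a connection's age only at checkout."""

    def __init__(self, recycle):
        self.recycle = recycle
        self.idle = []

    def checkout(self, now):
        while self.idle:
            conn = self.idle.pop()
            if now - conn.created_at <= self.recycle:
                return conn
            # Older than the recycle limit: discard it and try the next one.
        return FakeConnection(now)

    def checkin(self, conn):
        self.idle.append(conn)


def held_session(pool):
    """The session keeps its connection checked out across the callable."""
    conn = pool.checkout(now=0)
    conn.execute(now=0)  # e.g. loading task state before the callable runs
    try:
        # The callable runs for CALLABLE_SECONDS; the XCom push then reuses
        # the same connection, which the pool never had a chance to recycle.
        conn.execute(now=CALLABLE_SECONDS)
        return "ok"
    except ConnectionError as exc:
        return str(exc)
    finally:
        pool.checkin(conn)


def released_session(pool):
    """The session commits/closes (returning its connection) before the callable."""
    conn = pool.checkout(now=0)
    conn.execute(now=0)
    pool.checkin(conn)
    # After the callable, a fresh checkout replaces the stale connection.
    conn = pool.checkout(now=CALLABLE_SECONDS)
    try:
        conn.execute(now=CALLABLE_SECONDS)
        return "ok"
    except ConnectionError as exc:
        return str(exc)
    finally:
        pool.checkin(conn)


print(held_session(RecyclingPool(POOL_RECYCLE)))      # MySQL server has gone away
print(released_session(RecyclingPool(POOL_RECYCLE)))  # ok
```

In this model a 290-second recycle limit only helps in `released_session`, where the connection goes back to the pool before the long callable and is replaced at the next checkout. In `held_session` the same connection stays checked out for the whole callable, so the recycle check never runs.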
> It seems that the Airflow codebase uses the `@provide_session` decorator to
> manage session open/close, and my hunch is that some functions are either not
> using it or misusing it.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
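One way to keep a connection from staying checked out across a long callable is to scope each session to a single function call, committing and closing it before returning. The sketch below is a simplified decorator in the spirit of Airflow's `provide_session`, not its actual implementation; `with_session`, `session_factory`, `RecordingSession` and `load_task_state` are hypothetical names, and `RecordingSession` stands in for a real SQLAlchemy session so the example runs without a database.

```python
import functools


def with_session(session_factory):
    """Hypothetical decorator factory, loosely modeled on Airflow's provide_session.

    If the caller passes no `session` keyword argument, a new session is
    created, committed on success, rolled back on error, and always closed
    before returning. In this sketch `session` must be passed by keyword.
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            session = kwargs.pop("session", None)
            if session is not None:
                # The caller owns this session; leave commit/close to the caller.
                return func(*args, session=session, **kwargs)
            session = session_factory()
            try:
                result = func(*args, session=session, **kwargs)
                session.commit()
                return result
            except Exception:
                session.rollback()
                raise
            finally:
                # Closing releases the connection back to the pool, so it is
                # not held checked out while other (long) work runs.
                session.close()

        return wrapper

    return decorator


class RecordingSession:
    """Stand-in for a SQLAlchemy session that records the calls it receives."""

    def __init__(self):
        self.events = []

    def commit(self):
        self.events.append("commit")

    def rollback(self):
        self.events.append("rollback")

    def close(self):
        self.events.append("close")


created_sessions = []


def session_factory():
    # Hypothetical factory; a real one would be a SQLAlchemy sessionmaker.
    session = RecordingSession()
    created_sessions.append(session)
    return session


@with_session(session_factory)
def load_task_state(task_id, session):
    # Hypothetical DB operation run before the long callable.
    session.events.append("query " + task_id)
    return task_id.upper()
```

A caller that passes its own `session` keeps ownership of it, so the decorator neither commits nor closes it; only sessions the decorator creates are closed, and closing them is what returns their connections to the pool before any long-running work starts.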