[jira] [Commented] (AIRFLOW-1667) Remote log handlers don't upload logs on task finish
[ https://issues.apache.org/jira/browse/AIRFLOW-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16195759#comment-16195759 ] Allison Wang commented on AIRFLOW-1667: --- Great, I didn't realize the closed flag was removed in the other PR. > Remote log handlers don't upload logs on task finish > > > Key: AIRFLOW-1667 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1667 > Project: Apache Airflow > Issue Type: Bug > Components: logging >Affects Versions: 1.9.0, 1.10.0 >Reporter: Arthur Vigil > > AIRFLOW-1385 revised logging for configurability, but the provided remote log > handlers (S3TaskHandler and GCSTaskHandler) only upload on close (flush is > left at the default implementation provided by `logging.FileHandler`). A > handler will be closed on process exit by `logging.shutdown()`, but depending > on the Executor used worker processes may not regularly shut down, and can > very likely persist between tasks. This means during normal execution log > files are never uploaded. > Need to find a way to flush remote log handlers in a timely manner, but > without hitting the target resources unnecessarily. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
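The "flush in a timely manner, but without hitting the target resources unnecessarily" requirement could be sketched as a handler whose flush() uploads at most once per interval and whose close() always uploads. This is a hypothetical sketch, not Airflow's S3TaskHandler/GCSTaskHandler API; the class name, the upload_fn callable, and min_interval are all assumptions for illustration:

```python
import logging
import time


class ThrottledRemoteHandler(logging.FileHandler):
    """Hypothetical sketch: upload the local log file on flush(), but at
    most once every `min_interval` seconds, so frequent flushes do not
    hammer the remote store (S3/GCS) the way a per-record upload would."""

    def __init__(self, filename, upload_fn, min_interval=60.0):
        super().__init__(filename)
        self._upload_fn = upload_fn        # callable(local_path) -> None
        self._min_interval = min_interval
        self._last_upload = None           # monotonic time of last upload

    def flush(self):
        super().flush()                    # always flush the local file
        now = time.monotonic()
        if self._last_upload is None or now - self._last_upload >= self._min_interval:
            self._upload_fn(self.baseFilename)
            self._last_upload = now

    def close(self):
        # Upload unconditionally on close so the final lines are never lost,
        # guarding against the double-close that logging.shutdown() can cause.
        if self.stream:
            super().flush()
            self._upload_fn(self.baseFilename)
        super().close()
```

This still leaves close() as a safety net, but no longer depends on it ever being called during normal execution.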
[jira] [Comment Edited] (AIRFLOW-1667) Remote log handlers don't upload logs
[ https://issues.apache.org/jira/browse/AIRFLOW-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16195581#comment-16195581 ] Allison Wang edited comment on AIRFLOW-1667 at 10/7/17 5:25 AM: I agree that we shouldn't rely on the logging module's close to upload the log since we have no control over when it's called. Instead of calling close, we could explicitly add a post_task_run method in the handler that handles any additional cleanup/operations upon task completion. This change only requires modifying a small amount of current code. I am not exactly sure how to upload the log to remote storage like S3/GCS periodically during task execution, but it's possible to use a log collector (e.g. Filebeat) to ship the log to centralized storage (e.g. ElasticSearch) in real time. was (Author: allisonwang): I agree that we shouldn't rely on the logging module's close to upload the log since we have no control over when it's called. Instead of calling close, we could explicitly invoke a post_task_run method in handlers that handles any additional cleanup/operations upon task completion. This change only requires modifying a small amount of current code. I am not exactly sure how to upload the log to remote storage like S3/GCS periodically during task execution, but it's possible to use a log collector (e.g. Filebeat) to ship the log to centralized storage (e.g. ElasticSearch) in real time. > Remote log handlers don't upload logs > - > > Key: AIRFLOW-1667 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1667 > Project: Apache Airflow > Issue Type: Bug > Components: logging >Affects Versions: 1.9.0, 1.10.0 >Reporter: Arthur Vigil > > AIRFLOW-1385 revised logging for configurability, but the provided remote log > handlers (S3TaskHandler and GCSTaskHandler) only upload on close (flush is > left at the default implementation provided by `logging.FileHandler`). 
A > handler will be closed on process exit by `logging.shutdown()`, but depending > on the Executor used worker processes may not regularly shut down, and can > very likely persist between tasks. This means during normal execution log > files are never uploaded. > Need to find a way to flush remote log handlers in a timely manner, but > without hitting the target resources unnecessarily. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (AIRFLOW-1667) Remote log handlers don't upload logs
[ https://issues.apache.org/jira/browse/AIRFLOW-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16195581#comment-16195581 ] Allison Wang edited comment on AIRFLOW-1667 at 10/7/17 5:24 AM: I agree that we shouldn't rely on the logging module's close to upload the log since we have no control over when it's called. Instead of calling close, we could explicitly invoke a post_task_run method in handlers that handles any additional cleanup/operations upon task completion. This change only requires modifying a small amount of current code. I am not exactly sure how to upload the log to remote storage like S3/GCS periodically during task execution, but it's possible to use a log collector (e.g. Filebeat) to ship the log to centralized storage (e.g. ElasticSearch) in real time. was (Author: allisonwang): I agree that we shouldn't rely on the logging module's close to upload the log since we have no control over when it's called. Instead of calling close, we could explicitly invoke a post_task_run method that handles any additional cleanup/operations upon task completion. This change only requires modifying a small amount of current code. I am not exactly sure how to upload the log to remote storage like S3/GCS periodically during task execution, but it's possible to use a log collector (e.g. Filebeat) to ship the log to centralized storage (e.g. ElasticSearch) in real time. > Remote log handlers don't upload logs > - > > Key: AIRFLOW-1667 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1667 > Project: Apache Airflow > Issue Type: Bug > Components: logging >Affects Versions: 1.9.0, 1.10.0 >Reporter: Arthur Vigil > > AIRFLOW-1385 revised logging for configurability, but the provided remote log > handlers (S3TaskHandler and GCSTaskHandler) only upload on close (flush is > left at the default implementation provided by `logging.FileHandler`). 
A > handler will be closed on process exit by `logging.shutdown()`, but depending > on the Executor used worker processes may not regularly shut down, and can > very likely persist between tasks. This means during normal execution log > files are never uploaded. > Need to find a way to flush remote log handlers in a timely manner, but > without hitting the target resources unnecessarily. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (AIRFLOW-1667) Remote log handlers don't upload logs
[ https://issues.apache.org/jira/browse/AIRFLOW-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16195581#comment-16195581 ] Allison Wang commented on AIRFLOW-1667: --- I agree that we shouldn't rely on the logging module's close to upload the log since we have no control over when it's called. Instead of calling close, we could explicitly invoke a post_task_run method that handles any additional cleanup/operations upon task completion. This change only requires modifying a small amount of current code. I am not exactly sure how to upload the log to remote storage like S3/GCS periodically during task execution, but it's possible to use a log collector (e.g. Filebeat) to ship the log to centralized storage (e.g. ElasticSearch) in real time. > Remote log handlers don't upload logs > - > > Key: AIRFLOW-1667 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1667 > Project: Apache Airflow > Issue Type: Bug > Components: logging >Affects Versions: 1.9.0, 1.10.0 >Reporter: Arthur Vigil > > AIRFLOW-1385 revised logging for configurability, but the provided remote log > handlers (S3TaskHandler and GCSTaskHandler) only upload on close (flush is > left at the default implementation provided by `logging.FileHandler`). A > handler will be closed on process exit by `logging.shutdown()`, but depending > on the Executor used worker processes may not regularly shut down, and can > very likely persist between tasks. This means during normal execution log > files are never uploaded. > Need to find a way to flush remote log handlers in a timely manner, but > without hitting the target resources unnecessarily. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
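The explicit post_task_run hook proposed in the comment could look roughly like this. The method name comes from the comment itself; the surrounding handler class and the worker-side call site are assumptions sketched for illustration, not Airflow code:

```python
import logging


class RemoteTaskHandler(logging.FileHandler):
    """Sketch of a handler exposing an explicit end-of-task hook, so the
    remote upload does not depend on logging.shutdown() ever running."""

    def __init__(self, filename, upload_fn):
        super().__init__(filename)
        self._upload_fn = upload_fn   # callable(local_path) -> None

    def post_task_run(self):
        """Called by the worker once the task instance finishes."""
        self.flush()
        self._upload_fn(self.baseFilename)


def finish_task(logger):
    """Hypothetical call site: after a task completes, invoke the hook on
    any handler that defines it, leaving ordinary handlers untouched."""
    for handler in logger.handlers:
        hook = getattr(handler, "post_task_run", None)
        if callable(hook):
            hook()
```

Because finish_task only duck-types on the hook, it requires no changes to handlers that don't care about task boundaries, which matches the "small amount of current code" claim.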
[jira] [Resolved] (AIRFLOW-1385) Make Airflow task logging configurable
[ https://issues.apache.org/jira/browse/AIRFLOW-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang resolved AIRFLOW-1385. --- Resolution: Fixed > Make Airflow task logging configurable > -- > > Key: AIRFLOW-1385 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1385 > Project: Apache Airflow > Issue Type: Sub-task >Reporter: Allison Wang >Assignee: Allison Wang > > Make Airflow task logging support custom loggers and handlers. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-1325) Airflow streaming log backed by ElasticSearch
[ https://issues.apache.org/jira/browse/AIRFLOW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated AIRFLOW-1325: -- Summary: Airflow streaming log backed by ElasticSearch (was: Enable ElasticSearch for Airflow Logs) > Airflow streaming log backed by ElasticSearch > - > > Key: AIRFLOW-1325 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1325 > Project: Apache Airflow > Issue Type: Improvement > Components: logging >Reporter: Allison Wang >Assignee: Allison Wang > > Add Elasticsearch log handler and reader for querying logs in ES -- This message was sent by Atlassian JIRA (v6.4.14#64029)
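A reader for ES-backed streaming logs would essentially filter on the task instance's metadata and page through hits by a monotonically increasing offset. This sketch only builds the Elasticsearch query body; the document field names (dag_id, task_id, execution_date, offset) are assumptions about the indexing schema, not a documented Airflow contract:

```python
def build_log_query(dag_id, task_id, execution_date, after_offset=0):
    """Build an Elasticsearch query body that fetches the log lines of a
    single task instance, in write order, resuming after a known offset."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"dag_id": dag_id}},
                    {"term": {"task_id": task_id}},
                    {"term": {"execution_date": execution_date}},
                    {"range": {"offset": {"gt": after_offset}}},
                ]
            }
        },
        # Sort by offset so lines come back in the order they were written,
        # regardless of ingestion order.
        "sort": [{"offset": {"order": "asc"}}],
    }
```

A client would pass this body to an Elasticsearch search call against the log index and stream new lines by remembering the last offset it has seen.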
[jira] [Updated] (AIRFLOW-1325) Enable ElasticSearch for Airflow Logs
[ https://issues.apache.org/jira/browse/AIRFLOW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated AIRFLOW-1325: -- Summary: Enable ElasticSearch for Airflow Logs (was: Airflow Log Backed By ElasticSearch) > Enable ElasticSearch for Airflow Logs > - > > Key: AIRFLOW-1325 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1325 > Project: Apache Airflow > Issue Type: Improvement > Components: logging >Reporter: Allison Wang >Assignee: Allison Wang > > Add Elasticsearch log handler and reader for querying logs in ES -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-1325) Airflow Log Backed By ElasticSearch
[ https://issues.apache.org/jira/browse/AIRFLOW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated AIRFLOW-1325: -- Description: Add Elasticsearch log handler and reader for querying logs in ES was:Add Elasticsearch logging backend. > Airflow Log Backed By ElasticSearch > --- > > Key: AIRFLOW-1325 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1325 > Project: Apache Airflow > Issue Type: Improvement > Components: logging >Reporter: Allison Wang >Assignee: Allison Wang > > Add Elasticsearch log handler and reader for querying logs in ES -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-1325) Airflow Log Backed By ElasticSearch
[ https://issues.apache.org/jira/browse/AIRFLOW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated AIRFLOW-1325: -- Summary: Airflow Log Backed By ElasticSearch (was: Airflow Streaming Log Backed By ElasticSearch) > Airflow Log Backed By ElasticSearch > --- > > Key: AIRFLOW-1325 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1325 > Project: Apache Airflow > Issue Type: Improvement > Components: logging >Reporter: Allison Wang >Assignee: Allison Wang > > Add Elasticsearch logging backend. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-1325) Airflow Streaming Log Backed By ElasticSearch
[ https://issues.apache.org/jira/browse/AIRFLOW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated AIRFLOW-1325: -- Description: Add Elasticsearch logging backend. (was: Currently, Airflow uses S3/GCS as the log storage backend. Workers, when executing the task, flushes logs into local files. When tasks are completed, those log files will be uploaded to the remote storage system like S3 or GCS. This approach makes log streaming and analysis difficult. Also when worker servers are down while executing the task, the entire task log will be lost until worker servers are recovered. It's also considered a bad practice for airflow webserver to communicate directly with worker servers. This change adds functionality to use customized logging backend. Users are able to configure logging backend that supports streaming logs and more advanced queries. Currently, Elasticsearch logging backend is implemented. This feature will also be backward compatible. It will direct users to the old logging flow if logging_backend_url is not set. A new UI will be created to support above features and old page won't be modified.) > Airflow Streaming Log Backed By ElasticSearch > - > > Key: AIRFLOW-1325 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1325 > Project: Apache Airflow > Issue Type: Improvement > Components: logging >Reporter: Allison Wang >Assignee: Allison Wang > > Add Elasticsearch logging backend. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (AIRFLOW-1443) Update Airflow configuration documentation
[ https://issues.apache.org/jira/browse/AIRFLOW-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang closed AIRFLOW-1443. - Resolution: Fixed > Update Airflow configuration documentation > -- > > Key: AIRFLOW-1443 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1443 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Trivial > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (AIRFLOW-1332) Split logs based on try_number
[ https://issues.apache.org/jira/browse/AIRFLOW-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang closed AIRFLOW-1332. - Resolution: Fixed > Split logs based on try_number > -- > > Key: AIRFLOW-1332 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1332 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Minor > Labels: transitional > > Split airflow logs based on the current try_number. It also adds a {{.log}} suffix > to log files. The new log directory will be in this format: > {{dag_id/task_id/execution_date_iso/try_number.log}} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
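The layout described above is straightforward to express as a path template; a minimal sketch (the helper name is illustrative, not Airflow's actual function):

```python
import os


def task_log_path(base, dag_id, task_id, execution_date_iso, try_number):
    """Build the per-attempt log path in the
    dag_id/task_id/execution_date_iso/try_number.log layout."""
    return os.path.join(base, dag_id, task_id, execution_date_iso,
                        "{}.log".format(try_number))
```

Keying the filename on try_number means each retry writes its own file instead of appending to (or clobbering) the previous attempt's log.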
[jira] [Closed] (AIRFLOW-1485) Get configuration throws exceptions when key does not exist
[ https://issues.apache.org/jira/browse/AIRFLOW-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang closed AIRFLOW-1485. - Resolution: Fixed > Get configuration throws exceptions when key does not exist > --- > > Key: AIRFLOW-1485 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1485 > Project: Apache Airflow > Issue Type: Bug >Reporter: Allison Wang > > Airflow configuration get method throws exceptions when the key is not > defined in airflow.cfg. This behavior makes adding new, optional > configuration difficult given people already have their own airflow.cfg. We > should probably have another method to return an empty string instead of > throwing exceptions when there is no such key in airflow config. > {code} > Traceback (most recent call last): > File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1817, in > wsgi_app > response = self.full_dispatch_request() > File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1477, in > full_dispatch_request > rv = self.handle_user_exception(e) > File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1381, in > handle_user_exception > reraise(exc_type, exc_value, tb) > File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1475, in > full_dispatch_request > rv = self.dispatch_request() > File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1461, in > dispatch_request > return self.view_functions[rule.endpoint](**req.view_args) > File "/usr/local/lib/python2.7/dist-packages/flask_admin/base.py", line 68, > in inner > return self._run_view(f, *args, **kwargs) > File "/usr/local/lib/python2.7/dist-packages/flask_admin/base.py", line > 367, in _run_view > return fn(self, *args, **kwargs) > File "/usr/local/lib/python2.7/dist-packages/flask_login.py", line 755, in > decorated_view > return func(*args, **kwargs) > File "/usr/local/lib/python2.7/dist-packages/airflow/www/utils.py", line > 125, in wrapper > return f(*args, **kwargs) > 
File "/usr/local/lib/python2.7/dist-packages/airflow/www/views.py", line > 873, in log > if conf.get('core', 'logging_backend_url'): > File "/usr/local/lib/python2.7/dist-packages/airflow/configuration.py", > line 802, in get > return conf.get(section, key, **kwargs) > File "/usr/local/lib/python2.7/dist-packages/airflow/configuration.py", > line 615, in get > "in config".format(**locals())) > AirflowConfigException: section/key [core/logging_backend_url] not found in > config > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
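The "return an empty string instead of throwing" idea maps naturally onto a fallback-style getter. A sketch using the stdlib ConfigParser; the function name get_with_default is an assumption for illustration, not Airflow's API:

```python
from configparser import ConfigParser, NoOptionError, NoSectionError


def get_with_default(conf, section, key, default=""):
    """Return conf[section][key], or `default` when the section or key is
    absent, so new optional settings don't raise on existing airflow.cfg
    files that predate them."""
    try:
        return conf.get(section, key)
    except (NoSectionError, NoOptionError):
        return default
```

With such a helper, the check at www/views.py in the traceback above could test the value for truthiness instead of catching AirflowConfigException.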
[jira] [Created] (AIRFLOW-1485) Get configuration throws exceptions when key does not exist
Allison Wang created AIRFLOW-1485: - Summary: Get configuration throws exceptions when key does not exist Key: AIRFLOW-1485 URL: https://issues.apache.org/jira/browse/AIRFLOW-1485 Project: Apache Airflow Issue Type: Bug Reporter: Allison Wang Airflow configuration get method throws exceptions when the key is not defined in airflow.cfg. This behavior makes adding new, optional configuration difficult given people already have their own airflow.cfg. We should probably have another method to return an empty string instead of throwing exceptions when there is no such key in airflow config. {code} Traceback (most recent call last): File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1817, in wsgi_app response = self.full_dispatch_request() File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1477, in full_dispatch_request rv = self.handle_user_exception(e) File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1381, in handle_user_exception reraise(exc_type, exc_value, tb) File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1475, in full_dispatch_request rv = self.dispatch_request() File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1461, in dispatch_request return self.view_functions[rule.endpoint](**req.view_args) File "/usr/local/lib/python2.7/dist-packages/flask_admin/base.py", line 68, in inner return self._run_view(f, *args, **kwargs) File "/usr/local/lib/python2.7/dist-packages/flask_admin/base.py", line 367, in _run_view return fn(self, *args, **kwargs) File "/usr/local/lib/python2.7/dist-packages/flask_login.py", line 755, in decorated_view return func(*args, **kwargs) File "/usr/local/lib/python2.7/dist-packages/airflow/www/utils.py", line 125, in wrapper return f(*args, **kwargs) File "/usr/local/lib/python2.7/dist-packages/airflow/www/views.py", line 873, in log if conf.get('core', 'logging_backend_url'): File "/usr/local/lib/python2.7/dist-packages/airflow/configuration.py", line 802, in get 
return conf.get(section, key, **kwargs) File "/usr/local/lib/python2.7/dist-packages/airflow/configuration.py", line 615, in get "in config".format(**locals())) AirflowConfigException: section/key [core/logging_backend_url] not found in config {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (AIRFLOW-1452) "airflow initdb" stuck forever on upgrade
[ https://issues.apache.org/jira/browse/AIRFLOW-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111697#comment-16111697 ] Allison Wang edited comment on AIRFLOW-1452 at 8/2/17 8:57 PM: --- Then there must be locks in the database when you run {{airflow initdb}}. I am not familiar with MSSQL, but the SQL in the posted error message is {{UPDATE alembic_version SET version_num='cc1e65623dc7' WHERE alembic_version.version_num = '127d2bf2dfa7'}} This error comes from updating alembic_version, not from any particular operation related to adding the max_tries column. Please look into what exactly causes this error in MSSQL: {{[Microsoft][ODBC Driver 13 for SQL Server]TCP Provider: Error code 0x2746 (10054)}} Please make sure there is no lock before and during the migration. MSSQL is not an officially supported DB. This migration script is tested against MySQL, Postgres and SQLite. We recommend using MySQL and Postgres as we can provide more support for issues with these databases. was (Author: allisonwang): Then there must be locks in the database when you run {{airflow initdb}}. I am not familiar with MSSQL, but the SQL in the posted error message is {{UPDATE alembic_version SET version_num='cc1e65623dc7' WHERE alembic_version.version_num = '127d2bf2dfa7'}} This error comes from updating alembic_version, not from any particular operation related to adding the max_tries column. Please look into what exactly causes this error in MSSQL: {{[Microsoft][ODBC Driver 13 for SQL Server]TCP Provider: Error code 0x2746 (10054)}} Please make sure there is no lock before and during the migration. This migration script is tested against MySQL, Postgres and SQLite. We recommend using MySQL and Postgres as we can provide more support for issues with these databases. 
> "airflow initdb" stuck forever on upgrade > - > > Key: AIRFLOW-1452 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1452 > Project: Apache Airflow > Issue Type: Bug > Components: db >Reporter: Pavel Martynov > Attachments: docker-compose.yml, Dockerfile, run-initdb.sh > > > I install airflow from the current master branch > (426b6a65f6ec142449893e36fcd677941bdad879 when I write this issue) and run > "airflow initdb" against MS SQL and it stuck forever with that output: > {noformat} > [2017-07-25 07:30:12,458] {db.py:307} INFO - Creating tables > INFO [alembic.runtime.migration] Context impl MSSQLImpl. > INFO [alembic.runtime.migration] Will assume transactional DDL. > INFO [alembic.runtime.migration] Running upgrade -> e3a246e0dc1, current > schema > INFO [alembic.runtime.migration] Running upgrade e3a246e0dc1 -> > 1507a7289a2f, create is_encrypted > INFO [alembic.runtime.migration] Running upgrade 1507a7289a2f -> > 13eb55f81627, maintain history for compatibility with earlier migrations > INFO [alembic.runtime.migration] Running upgrade 13eb55f81627 -> > 338e90f54d61, More logging into task_isntance > INFO [alembic.runtime.migration] Running upgrade 338e90f54d61 -> > 52d714495f0, job_id indices > INFO [alembic.runtime.migration] Running upgrade 52d714495f0 -> > 502898887f84, Adding extra to Log > INFO [alembic.runtime.migration] Running upgrade 502898887f84 -> > 1b38cef5b76e, add dagrun > INFO [alembic.runtime.migration] Running upgrade 1b38cef5b76e -> > 2e541a1dcfed, task_duration > INFO [alembic.runtime.migration] Running upgrade 2e541a1dcfed -> > 40e67319e3a9, dagrun_config > INFO [alembic.runtime.migration] Running upgrade 40e67319e3a9 -> > 561833c1c74b, add password column to user > INFO [alembic.runtime.migration] Running upgrade 561833c1c74b -> 4446e08588, > dagrun start end > INFO [alembic.runtime.migration] Running upgrade 4446e08588 -> bbc73705a13e, > Add notification_sent column to sla_miss > INFO [alembic.runtime.migration] Running upgrade 
bbc73705a13e -> > bba5a7cfc896, Add a column to track the encryption state of the 'Extra' field > in connection > INFO [alembic.runtime.migration] Running upgrade bba5a7cfc896 -> > 1968acfc09e3, add is_encrypted column to variable table > INFO [alembic.runtime.migration] Running upgrade 1968acfc09e3 -> > 2e82aab8ef20, rename user table > INFO [alembic.runtime.migration] Running upgrade 2e82aab8ef20 -> > 211e584da130, add TI state index > INFO [alembic.runtime.migration] Running upgrade 211e584da130 -> > 64de9cddf6c9, add task fails journal table > INFO [alembic.runtime.migration] Running upgrade 64de9cddf6c9 -> > f2ca10b85618, add dag_stats table > INFO [alembic.runtime.migration] Running upgrade f2ca10b85618 -> > 4addfa1236f1, Add fractional seconds to mysql tables > INFO [alembic.runtime.migration] Running upgrade 4addfa1236f1 -> > 8504051e801b, xcom dag task indices > INFO [alembic.runtime.migration] Running upgrade 8504051e801b -> >
[jira] [Commented] (AIRFLOW-1452) "airflow initdb" stuck forever on upgrade
[ https://issues.apache.org/jira/browse/AIRFLOW-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106261#comment-16106261 ] Allison Wang commented on AIRFLOW-1452: --- Hi Pavel. This migration indeed takes a long time, assuming you do not have a MySQL database connection anywhere else. We profiled the migration on 1M rows and it takes about an hour. This is because the new column populates its value from existing rows and does an UPDATE query for each task_instance row. If the process is taking more than an hour, it means the database still holds locks on some task_instance table rows. Please make sure to disconnect everything before upgrading. We are aware of this slow migration and will address it ASAP. In the meantime, I highly suggest using an older version of Airflow rather than the master branch. There is work in progress to refactor and improve Airflow logging on the master branch. Please let us know if you have more questions or concerns. > "airflow initdb" stuck forever on upgrade > - > > Key: AIRFLOW-1452 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1452 > Project: Apache Airflow > Issue Type: Bug > Components: db >Reporter: Pavel Martynov > Attachments: docker-compose.yml, Dockerfile, run-initdb.sh > > > I install airflow from the current master branch > (426b6a65f6ec142449893e36fcd677941bdad879 when I write this issue) and run > "airflow initdb" against MS SQL and it stuck forever with that output: > {noformat} > [2017-07-25 07:30:12,458] {db.py:307} INFO - Creating tables > INFO [alembic.runtime.migration] Context impl MSSQLImpl. > INFO [alembic.runtime.migration] Will assume transactional DDL. 
> INFO [alembic.runtime.migration] Running upgrade -> e3a246e0dc1, current > schema > INFO [alembic.runtime.migration] Running upgrade e3a246e0dc1 -> > 1507a7289a2f, create is_encrypted > INFO [alembic.runtime.migration] Running upgrade 1507a7289a2f -> > 13eb55f81627, maintain history for compatibility with earlier migrations > INFO [alembic.runtime.migration] Running upgrade 13eb55f81627 -> > 338e90f54d61, More logging into task_isntance > INFO [alembic.runtime.migration] Running upgrade 338e90f54d61 -> > 52d714495f0, job_id indices > INFO [alembic.runtime.migration] Running upgrade 52d714495f0 -> > 502898887f84, Adding extra to Log > INFO [alembic.runtime.migration] Running upgrade 502898887f84 -> > 1b38cef5b76e, add dagrun > INFO [alembic.runtime.migration] Running upgrade 1b38cef5b76e -> > 2e541a1dcfed, task_duration > INFO [alembic.runtime.migration] Running upgrade 2e541a1dcfed -> > 40e67319e3a9, dagrun_config > INFO [alembic.runtime.migration] Running upgrade 40e67319e3a9 -> > 561833c1c74b, add password column to user > INFO [alembic.runtime.migration] Running upgrade 561833c1c74b -> 4446e08588, > dagrun start end > INFO [alembic.runtime.migration] Running upgrade 4446e08588 -> bbc73705a13e, > Add notification_sent column to sla_miss > INFO [alembic.runtime.migration] Running upgrade bbc73705a13e -> > bba5a7cfc896, Add a column to track the encryption state of the 'Extra' field > in connection > INFO [alembic.runtime.migration] Running upgrade bba5a7cfc896 -> > 1968acfc09e3, add is_encrypted column to variable table > INFO [alembic.runtime.migration] Running upgrade 1968acfc09e3 -> > 2e82aab8ef20, rename user table > INFO [alembic.runtime.migration] Running upgrade 2e82aab8ef20 -> > 211e584da130, add TI state index > INFO [alembic.runtime.migration] Running upgrade 211e584da130 -> > 64de9cddf6c9, add task fails journal table > INFO [alembic.runtime.migration] Running upgrade 64de9cddf6c9 -> > f2ca10b85618, add dag_stats table > INFO 
[alembic.runtime.migration] Running upgrade f2ca10b85618 -> > 4addfa1236f1, Add fractional seconds to mysql tables > INFO [alembic.runtime.migration] Running upgrade 4addfa1236f1 -> > 8504051e801b, xcom dag task indices > INFO [alembic.runtime.migration] Running upgrade 8504051e801b -> > 5e7d17757c7a, add pid field to TaskInstance > INFO [alembic.runtime.migration] Running upgrade 5e7d17757c7a -> > 127d2bf2dfa7, Add dag_id/state index on dag_run table > INFO [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> > cc1e65623dc7, add max tries column to task instance > {noformat} > I reproduce this problem with docker-compose, see files in attachment. > Also, I try this on 1.8.2rc2 and it works fine, looks like problem in > cc1e65623dc7_add_max_tries_column_to_task_instance.py migration. > Some locks occurred, I "killed lock" in MS SQL and got exception: > {noformat} > sqlalchemy.exc.DBAPIError: (pyodbc.Error) ('08S01', '[08S01] [Microsoft][ODBC > Driver 13 for SQL Server]TCP Provider: Error code 0x2746 (10054) > (SQLExecDirectW)') [SQL: u"UPDATE alembic_version SET >
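The slowness described in the comment above (one UPDATE round-trip per task_instance row) can be illustrated against SQLite. The schema is reduced to a stand-in with only the relevant columns, and the backfill value is simplified; the real migration (cc1e65623dc7, via Alembic) differs. The point is the contrast between per-row statements and a single set-based UPDATE:

```python
import sqlite3

# Minimal stand-in for the task_instance table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_instance (id INTEGER PRIMARY KEY, "
             "try_number INTEGER, max_tries INTEGER)")
conn.executemany("INSERT INTO task_instance (try_number, max_tries) "
                 "VALUES (?, NULL)",
                 [(n % 3,) for n in range(1000)])

# Slow pattern (what the comment describes): one UPDATE per row.
# for row_id, tries in list(conn.execute(
#         "SELECT id, try_number FROM task_instance")):
#     conn.execute("UPDATE task_instance SET max_tries = ? WHERE id = ?",
#                  (tries, row_id))

# Set-based alternative: one statement touches every row at once, so the
# database makes a single pass instead of N client round-trips.
conn.execute("UPDATE task_instance SET max_tries = try_number "
             "WHERE max_tries IS NULL")
conn.commit()
```

On a real server the per-row variant also holds row locks far longer, which is exactly the condition under which other open connections make the migration appear stuck.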
[jira] [Updated] (AIRFLOW-1454) Make Airflow logging configurable
[ https://issues.apache.org/jira/browse/AIRFLOW-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated AIRFLOW-1454: -- Issue Type: Improvement (was: Task) > Make Airflow logging configurable > - > > Key: AIRFLOW-1454 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1454 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Allison Wang > > Airflow logging should be configurable. Users can provide custom log > handlers, formatters and loggers to handle log messages in Airflow for the > webserver, scheduler, and worker. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-1454) Make Airflow logging configurable
[ https://issues.apache.org/jira/browse/AIRFLOW-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated AIRFLOW-1454: -- Issue Type: Task (was: Improvement) > Make Airflow logging configurable > - > > Key: AIRFLOW-1454 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1454 > Project: Apache Airflow > Issue Type: Task >Reporter: Allison Wang > > Airflow logging should be configurable. Users can provide custom log > handlers, formatters and loggers to handle log messages in Airflow for the > webserver, scheduler, and worker. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (AIRFLOW-1454) Make Airflow logging configurable
[ https://issues.apache.org/jira/browse/AIRFLOW-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang reassigned AIRFLOW-1454: - Assignee: (was: Allison Wang) > Make Airflow logging configurable > - > > Key: AIRFLOW-1454 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1454 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Allison Wang > > Airflow logging should be configurable. Users can provide custom log > handlers, formatters and loggers to handle log messages in Airflow for the > webserver, scheduler, and worker. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (AIRFLOW-1457) Unify Airflow logging setup
Allison Wang created AIRFLOW-1457: - Summary: Unify Airflow logging setup Key: AIRFLOW-1457 URL: https://issues.apache.org/jira/browse/AIRFLOW-1457 Project: Apache Airflow Issue Type: Sub-task Reporter: Allison Wang Logging is set up in multiple places inside Airflow, including {{setting.py:configure_logging}}, {{cli.py:setup_logging}}, etc. This task is to unify Airflow logging setup in setting.py and use {{dictConfig}} to configure all logging settings including the webserver and the scheduler. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
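Unifying on {{dictConfig}} means one settings dict applied once at startup instead of scattered setup calls. A minimal sketch, with illustrative logger/handler names rather than Airflow's final layout:

```python
import logging
import logging.config

# Single source of truth for logging, applied once at process start.
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "airflow": {"format": "[%(asctime)s] {%(filename)s:%(lineno)d} "
                              "%(levelname)s - %(message)s"},
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler",
                    "formatter": "airflow"},
    },
    "loggers": {
        "airflow": {"handlers": ["console"], "level": "INFO",
                    "propagate": False},
    },
}

logging.config.dictConfig(LOGGING_CONFIG)
```

Every component (webserver, scheduler, workers) then calls logging.getLogger("airflow.<component>") and inherits the same configuration, instead of each module wiring up its own handlers.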
[jira] [Updated] (AIRFLOW-1455) Move logging related configs out of airflow.cfg
[ https://issues.apache.org/jira/browse/AIRFLOW-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated AIRFLOW-1455: -- Description: All logging related configurations including `LOG_BASE_FOLDER`, `REMOTE_LOG_BASE_FOLDER`, `LOG_LEVEL` and other `LOG_FORMAT` should be placed inside `default_airflow_logging`. This task also includes refactoring all occurrence of those variables and make them handler specific rather than global. (was: All logging related configruations including `LOG_BASE_FOLDER`, `REMOTE_LOG_BASE_FOLDER`, `LOG_LEVEL` and other `LOG_FORMAT` should be placed inside `default_airflow_logging`. This task also includes refactoring all occurrence of those variables and make them handler specific rather than global. ) > Move logging related configs out of airflow.cfg > --- > > Key: AIRFLOW-1455 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1455 > Project: Apache Airflow > Issue Type: Sub-task >Reporter: Allison Wang > > All logging related configurations including `LOG_BASE_FOLDER`, > `REMOTE_LOG_BASE_FOLDER`, `LOG_LEVEL` and other `LOG_FORMAT` should be placed > inside `default_airflow_logging`. This task also includes refactoring all > occurrence of those variables and make them handler specific rather than > global. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-1455) Move logging related configs out of airflow.cfg
[ https://issues.apache.org/jira/browse/AIRFLOW-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated AIRFLOW-1455: -- Description: All logging related configurations including {{LOG_BASE_FOLDER}}, {{REMOTE_LOG_BASE_FOLDER}}, {{LOG_LEVEL}} and {{LOG_FORMAT}} should be placed inside {{default_airflow_logging}}. This task also includes refactoring all occurrence of those variables and make them handler specific rather than global. (was: All logging related configurations including {{LOG_BASE_FOLDER}}, {{REMOTE_LOG_BASE_FOLDER}}, {{LOG_LEVEL}} and other {{LOG_FORMAT}} should be placed inside {{default_airflow_logging}}. This task also includes refactoring all occurrence of those variables and make them handler specific rather than global. ) > Move logging related configs out of airflow.cfg > --- > > Key: AIRFLOW-1455 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1455 > Project: Apache Airflow > Issue Type: Sub-task >Reporter: Allison Wang > > All logging related configurations including {{LOG_BASE_FOLDER}}, > {{REMOTE_LOG_BASE_FOLDER}}, {{LOG_LEVEL}} and {{LOG_FORMAT}} should be placed > inside {{default_airflow_logging}}. This task also includes refactoring all > occurrence of those variables and make them handler specific rather than > global. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-1455) Move logging related configs out of airflow.cfg
[ https://issues.apache.org/jira/browse/AIRFLOW-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated AIRFLOW-1455: -- Description: All logging related configurations including {{LOG_BASE_FOLDER}}, {{REMOTE_LOG_BASE_FOLDER}}, {{LOG_LEVEL}} and other {{LOG_FORMAT}} should be placed inside {{default_airflow_logging}}. This task also includes refactoring all occurrence of those variables and make them handler specific rather than global. (was: All logging related configurations including ``LOG_BASE_FOLDER``, `REMOTE_LOG_BASE_FOLDER`, `LOG_LEVEL` and other `LOG_FORMAT` should be placed inside `default_airflow_logging`. This task also includes refactoring all occurrence of those variables and make them handler specific rather than global. ) > Move logging related configs out of airflow.cfg > --- > > Key: AIRFLOW-1455 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1455 > Project: Apache Airflow > Issue Type: Sub-task >Reporter: Allison Wang > > All logging related configurations including {{LOG_BASE_FOLDER}}, > {{REMOTE_LOG_BASE_FOLDER}}, {{LOG_LEVEL}} and other {{LOG_FORMAT}} should be > placed inside {{default_airflow_logging}}. This task also includes > refactoring all occurrence of those variables and make them handler specific > rather than global. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-1455) Move logging related configs out of airflow.cfg
[ https://issues.apache.org/jira/browse/AIRFLOW-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated AIRFLOW-1455: -- Description: All logging related configurations including ``LOG_BASE_FOLDER``, `REMOTE_LOG_BASE_FOLDER`, `LOG_LEVEL` and other `LOG_FORMAT` should be placed inside `default_airflow_logging`. This task also includes refactoring all occurrence of those variables and make them handler specific rather than global. (was: All logging related configurations including `LOG_BASE_FOLDER`, `REMOTE_LOG_BASE_FOLDER`, `LOG_LEVEL` and other `LOG_FORMAT` should be placed inside `default_airflow_logging`. This task also includes refactoring all occurrence of those variables and make them handler specific rather than global. ) > Move logging related configs out of airflow.cfg > --- > > Key: AIRFLOW-1455 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1455 > Project: Apache Airflow > Issue Type: Sub-task >Reporter: Allison Wang > > All logging related configurations including ``LOG_BASE_FOLDER``, > `REMOTE_LOG_BASE_FOLDER`, `LOG_LEVEL` and other `LOG_FORMAT` should be placed > inside `default_airflow_logging`. This task also includes refactoring all > occurrence of those variables and make them handler specific rather than > global. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-1455) Move logging related configs out of airflow.cfg
[ https://issues.apache.org/jira/browse/AIRFLOW-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated AIRFLOW-1455: -- Summary: Move logging related configs out of airflow.cfg (was: Move logging related config out of airflow.cfg) > Move logging related configs out of airflow.cfg > --- > > Key: AIRFLOW-1455 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1455 > Project: Apache Airflow > Issue Type: Sub-task >Reporter: Allison Wang > > All logging related configurations including `LOG_BASE_FOLDER`, > `REMOTE_LOG_BASE_FOLDER`, `LOG_LEVEL` and other `LOG_FORMAT` should be placed > inside `default_airflow_logging`. This task also includes refactoring all > occurrences of those variables and making them handler specific rather than > global. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (AIRFLOW-1455) Move logging related config out of airflow.cfg
Allison Wang created AIRFLOW-1455: - Summary: Move logging related config out of airflow.cfg Key: AIRFLOW-1455 URL: https://issues.apache.org/jira/browse/AIRFLOW-1455 Project: Apache Airflow Issue Type: Sub-task Reporter: Allison Wang All logging related configurations including `LOG_BASE_FOLDER`, `REMOTE_LOG_BASE_FOLDER`, `LOG_LEVEL` and other `LOG_FORMAT` should be placed inside `default_airflow_logging`. This task also includes refactoring all occurrences of those variables and making them handler specific rather than global. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
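A rough sketch of what a handler-specific `default_airflow_logging` dict could look like once the `LOG_*` globals move out of airflow.cfg. Every key, path, and format string here is an assumption for illustration; the point is that each handler carries its own folder and format instead of reading shared globals.

```python
# Hypothetical sketch: logging settings expressed per handler inside a
# single Python dict rather than as global airflow.cfg variables.
default_airflow_logging = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "task": {"format": "[%(asctime)s] %(levelname)s - %(message)s"},
    },
    "handlers": {
        "file.task": {
            "class": "logging.FileHandler",
            "formatter": "task",
            # Handler-specific location, replacing a global LOG_BASE_FOLDER.
            "filename": "/tmp/airflow/task.log",
            "delay": True,  # don't open the file until the first record
        },
    },
    "loggers": {
        "airflow.task": {"handlers": ["file.task"], "level": "INFO"},
    },
}
```

Passing this dict to `logging.config.dictConfig` would then be the single point where log level, format, and destination are decided for each handler.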
[jira] [Updated] (AIRFLOW-1454) Make Airflow logging configurable
[ https://issues.apache.org/jira/browse/AIRFLOW-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated AIRFLOW-1454: -- Description: Airflow logging should be configurable. Users can provide custom log handlers, formatters and loggers to handle log messages in Airflow for the webserver, scheduler, and worker. (was: Airflow logging should be configurable. Users can provide custom log handlers, formatters and loggers to handle log messages in Airflow for the webserver, scheduler and worker. ) > Make Airflow logging configurable > - > > Key: AIRFLOW-1454 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1454 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Allison Wang >Assignee: Allison Wang > > Airflow logging should be configurable. Users can provide custom log > handlers, formatters and loggers to handle log messages in Airflow for the > webserver, scheduler, and worker. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-1454) Make Airflow logging configurable
[ https://issues.apache.org/jira/browse/AIRFLOW-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated AIRFLOW-1454: -- Description: Airflow logging should be configurable. Users can provide custom log handlers, formatters and loggers to handle log messages in Airflow for the webserver, scheduler and worker. (was: Airflow logging should be configurable. Users can provide custom log handlers, formatters and loggers to handle log messages in Airflow for the webserver, scheduler and workers. ) > Make Airflow logging configurable > - > > Key: AIRFLOW-1454 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1454 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Allison Wang >Assignee: Allison Wang > > Airflow logging should be configurable. Users can provide custom log > handlers, formatters and loggers to handle log messages in Airflow for the > webserver, scheduler and worker. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
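The kind of user-provided handler this issue would allow can be sketched as below. The class name and its list-buffering behavior are invented for the example; it only shows that any `logging.Handler` subclass plus a formatter can be plugged into a logger.

```python
import logging

# Illustrative custom handler a user might supply once logging is
# configurable; it collects formatted records instead of writing to disk.
class InMemoryHandler(logging.Handler):
    """Stores formatted log records in a list."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(self.format(record))

handler = InMemoryHandler()
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))

logger = logging.getLogger("airflow.example")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.info("hello")
# handler.records now holds the formatted record "INFO:hello"
```

The same pattern covers custom formatters and loggers: the user registers their classes in the logging configuration, and the webserver, scheduler, or worker picks them up without code changes.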
[jira] [Updated] (AIRFLOW-1385) Make Airflow task logging configurable
[ https://issues.apache.org/jira/browse/AIRFLOW-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated AIRFLOW-1385: -- Description: Make Airflow task logging support custom loggers and handlers. > Make Airflow task logging configurable > -- > > Key: AIRFLOW-1385 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1385 > Project: Apache Airflow > Issue Type: Sub-task >Reporter: Allison Wang >Assignee: Allison Wang > > Make Airflow task logging support custom loggers and handlers. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-1385) Make Airflow task logging configurable
[ https://issues.apache.org/jira/browse/AIRFLOW-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated AIRFLOW-1385: -- Summary: Make Airflow task logging configurable (was: Refactor Airflow task logging) > Make Airflow task logging configurable > -- > > Key: AIRFLOW-1385 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1385 > Project: Apache Airflow > Issue Type: Sub-task >Reporter: Allison Wang >Assignee: Allison Wang > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-1385) Refactor Airflow task logging
[ https://issues.apache.org/jira/browse/AIRFLOW-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated AIRFLOW-1385: -- Description: (was: Unify Airflow logging and add custom logger and handler configuration.) > Refactor Airflow task logging > - > > Key: AIRFLOW-1385 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1385 > Project: Apache Airflow > Issue Type: Sub-task >Reporter: Allison Wang >Assignee: Allison Wang > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-1385) Refactor Airflow task logging
[ https://issues.apache.org/jira/browse/AIRFLOW-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated AIRFLOW-1385: -- Issue Type: Sub-task (was: Improvement) Parent: AIRFLOW-1454 > Refactor Airflow task logging > - > > Key: AIRFLOW-1385 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1385 > Project: Apache Airflow > Issue Type: Sub-task >Reporter: Allison Wang >Assignee: Allison Wang > > Unify Airflow logging and add custom logger and handler configuration. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (AIRFLOW-1454) Make Airflow logging configurable
Allison Wang created AIRFLOW-1454: - Summary: Make Airflow logging configurable Key: AIRFLOW-1454 URL: https://issues.apache.org/jira/browse/AIRFLOW-1454 Project: Apache Airflow Issue Type: Improvement Reporter: Allison Wang Airflow logging should be configurable. Users can provide custom log handlers, formatters and loggers to handle log messages in Airflow -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Work started] (AIRFLOW-1454) Make Airflow logging configurable
[ https://issues.apache.org/jira/browse/AIRFLOW-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on AIRFLOW-1454 started by Allison Wang. - > Make Airflow logging configurable > - > > Key: AIRFLOW-1454 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1454 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Allison Wang >Assignee: Allison Wang > > Airflow logging should be configurable. Users can provide custom log > handlers, formatters and loggers to handle log messages in Airflow -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-1385) Refactor Airflow task logging
[ https://issues.apache.org/jira/browse/AIRFLOW-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated AIRFLOW-1385: -- Summary: Refactor Airflow task logging (was: Make Airflow logging configurable) > Refactor Airflow task logging > - > > Key: AIRFLOW-1385 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1385 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Allison Wang >Assignee: Allison Wang > > Unify Airflow logging and add custom logger and handler configuration. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-1385) Make Airflow logging configurable
[ https://issues.apache.org/jira/browse/AIRFLOW-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated AIRFLOW-1385: -- Summary: Make Airflow logging configurable (was: Create abstraction for Airflow task logging) > Make Airflow logging configurable > - > > Key: AIRFLOW-1385 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1385 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Allison Wang >Assignee: Allison Wang > > Unify Airflow logging and add custom logger and handler configuration. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-1325) Airflow Streaming Log Backed By ElasticSearch
[ https://issues.apache.org/jira/browse/AIRFLOW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated AIRFLOW-1325: -- Summary: Airflow Streaming Log Backed By ElasticSearch (was: Make Airflow Logging Backed By Elasticsearch) > Airflow Streaming Log Backed By ElasticSearch > - > > Key: AIRFLOW-1325 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1325 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Allison Wang >Assignee: Allison Wang > > Currently, Airflow uses S3/GCS as the log storage backend. Workers, when > executing the task, flush logs into local files. When tasks are completed, > those log files will be uploaded to the remote storage system like S3 or GCS. > This approach makes log streaming and analysis difficult. Also, when worker > servers are down while executing the task, the entire task log will be lost > until worker servers are recovered. It's also considered a bad practice for > airflow webserver to communicate directly with worker servers. > This change adds functionality to use customized logging backend. Users are > able to configure logging backend that supports streaming logs and more > advanced queries. Currently, Elasticsearch logging backend is implemented. > This feature will also be backward compatible. It will direct users to the > old logging flow if logging_backend_url is not set. A new UI will be created > to support above features and old page won't be modified. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-1325) Make Airflow Logging Backed By Elasticsearch
[ https://issues.apache.org/jira/browse/AIRFLOW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated AIRFLOW-1325: -- Description: Currently, Airflow uses S3/GCS as the log storage backend. Workers, when executing the task, flush logs into local files. When tasks are completed, those log files will be uploaded to the remote storage system like S3 or GCS. This approach makes log streaming and analysis difficult. Also, when worker servers are down while executing the task, the entire task log will be lost until worker servers are recovered. It's also considered a bad practice for airflow webserver to communicate directly with worker servers. This change adds functionality to use customized logging backend. Users are able to configure logging backend that supports streaming logs and more advanced queries. Currently, Elasticsearch logging backend is implemented. This feature will also be backward compatible. It will direct users to the old logging flow if logging_backend_url is not set. A new UI will be created to support above features and old page won't be modified. was: Currently, Airflow uses S3/GCS as the log storage backend. Workers, when executing the task, flush logs into local files. When tasks are completed, those log files will be uploaded to the remote storage system like S3 or GCS. This approach makes log streaming and analysis difficult. Also, when worker servers are down while executing the task, the entire task log will be lost until worker servers are recovered. It's also considered a bad practice for airflow webserver to communicate directly with worker servers. This change adds functionality to use customized logging backend. Users are able to configure logging backend that supports streaming logs and more advanced queries. Currently, Elasticsearch logging backend is implemented. Having Elasticsearch as logging backend enables the development of more advanced logging related features. 
Those are features that will be implemented in the future: - Streaming logs without refreshing the page - Separate logs by attempts - Filter logs with excluded phrases This feature will also be backward compatible. It will direct users to the old logging flow if logging_backend_url is not set. A new UI will be created to support above features and old page won't be modified. > Make Airflow Logging Backed By Elasticsearch > > > Key: AIRFLOW-1325 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1325 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Allison Wang >Assignee: Allison Wang > > Currently, Airflow uses S3/GCS as the log storage backend. Workers, when > executing the task, flush logs into local files. When tasks are completed, > those log files will be uploaded to the remote storage system like S3 or GCS. > This approach makes log streaming and analysis difficult. Also, when worker > servers are down while executing the task, the entire task log will be lost > until worker servers are recovered. It's also considered a bad practice for > airflow webserver to communicate directly with worker servers. > This change adds functionality to use customized logging backend. Users are > able to configure logging backend that supports streaming logs and more > advanced queries. Currently, Elasticsearch logging backend is implemented. > This feature will also be backward compatible. It will direct users to the > old logging flow if logging_backend_url is not set. A new UI will be created > to support above features and old page won't be modified. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
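As a rough illustration of the per-attempt lookup an Elasticsearch-backed log viewer could issue, the helper below builds a filtered search body. The field names (dag_id, task_id, execution_date, try_number, offset) and the query shape are assumptions for the sketch, not a documented Airflow log schema.

```python
# Hypothetical query builder for fetching one task attempt's log lines
# from an Elasticsearch index, ordered by an assumed "offset" field.
def build_task_log_query(dag_id, task_id, execution_date, try_number):
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"dag_id": dag_id}},
                    {"term": {"task_id": task_id}},
                    {"term": {"execution_date": execution_date}},
                    {"term": {"try_number": try_number}},
                ]
            }
        },
        "sort": [{"offset": {"order": "asc"}}],
    }

query = build_task_log_query("example_dag", "print_date", "2017-06-26T00:00:00", 1)
```

Because the webserver would query the search index rather than the worker's filesystem, logs stay available even while a worker host is down, which addresses the failure mode described above.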
[jira] [Created] (AIRFLOW-1443) Update Airflow configuration documentation
Allison Wang created AIRFLOW-1443: - Summary: Update Airflow configuration documentation Key: AIRFLOW-1443 URL: https://issues.apache.org/jira/browse/AIRFLOW-1443 Project: Apache Airflow Issue Type: Improvement Reporter: Allison Wang Assignee: Allison Wang Priority: Trivial -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-1385) Create abstraction for Airflow task logging
[ https://issues.apache.org/jira/browse/AIRFLOW-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated AIRFLOW-1385: -- Description: Unify Airflow logging and add custom logger and handler configuration. > Create abstraction for Airflow task logging > --- > > Key: AIRFLOW-1385 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1385 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Allison Wang >Assignee: Allison Wang > > Unify Airflow logging and add custom logger and handler configuration. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (AIRFLOW-1366) Add max_tries to task instance
[ https://issues.apache.org/jira/browse/AIRFLOW-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang closed AIRFLOW-1366. - > Add max_tries to task instance > -- > > Key: AIRFLOW-1366 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1366 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Allison Wang >Assignee: Allison Wang > > Right now Airflow deletes the task instance when a user clears it. We have no > way of keeping track of how many times a task instance gets run, whether by the > user or by itself. So instead of deleting the task instance record, we should > keep the task instance and make try_number monotonically increasing for every > task instance attempt. max_tries is introduced as an upper bound for automatic > retries by the task itself. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (AIRFLOW-1366) Add max_tries to task instance
[ https://issues.apache.org/jira/browse/AIRFLOW-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang resolved AIRFLOW-1366. --- Resolution: Done > Add max_tries to task instance > -- > > Key: AIRFLOW-1366 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1366 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Allison Wang >Assignee: Allison Wang > > Right now Airflow deletes the task instance when a user clears it. We have no > way of keeping track of how many times a task instance gets run, whether by the > user or by itself. So instead of deleting the task instance record, we should > keep the task instance and make try_number monotonically increasing for every > task instance attempt. max_tries is introduced as an upper bound for automatic > retries by the task itself. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
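The bookkeeping described in this issue can be modeled with a toy sketch: `try_number` increases monotonically across every attempt (and survives a user clearing the task), while `max_tries` bounds only the task's own automatic retries. The class and method names below mirror the issue's vocabulary but the logic is illustrative, not Airflow's implementation.

```python
# Toy model of monotonic try_number plus a max_tries retry bound.
class TaskInstance:
    def __init__(self, max_tries):
        self.try_number = 0   # never reset, even when a user clears the task
        self.max_tries = max_tries

    def run_attempt(self):
        """Record one attempt, whether triggered by the user or a retry."""
        self.try_number += 1

    def can_auto_retry(self):
        """The task may retry itself only while under its max_tries bound."""
        return self.try_number < self.max_tries

ti = TaskInstance(max_tries=3)
ti.run_attempt()  # attempt 1 fails
ti.run_attempt()  # attempt 2 fails; automatic retries still allowed
```

Keeping the record instead of deleting it means the history of attempts is queryable, and log storage can key files on the ever-increasing `try_number`.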
[jira] [Created] (AIRFLOW-1385) Create abstraction for Airflow worker log handler
Allison Wang created AIRFLOW-1385: - Summary: Create abstraction for Airflow worker log handler Key: AIRFLOW-1385 URL: https://issues.apache.org/jira/browse/AIRFLOW-1385 Project: Apache Airflow Issue Type: Improvement Reporter: Allison Wang -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (AIRFLOW-1366) Add max_tries to task instance
Allison Wang created AIRFLOW-1366: - Summary: Add max_tries to task instance Key: AIRFLOW-1366 URL: https://issues.apache.org/jira/browse/AIRFLOW-1366 Project: Apache Airflow Issue Type: Improvement Reporter: Allison Wang Assignee: Allison Wang Right now Airflow deletes the task instance when a user clears it. We have no universal way of keeping track of how many times a task instance has been attempted, whether by the user or by itself. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-1332) Split logs based on try_number
[ https://issues.apache.org/jira/browse/AIRFLOW-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated AIRFLOW-1332: -- Description: Split airflow logs based on current try_number. It also adds a `.log` suffix to log files. The new log directory will be in this format: `dag_id/task_id/execution_date_iso/try_number.log` was:Adding attempt number to separate logs for each task run. > Split logs based on try_number > -- > > Key: AIRFLOW-1332 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1332 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Minor > > Split airflow logs based on current try_number. It also adds a `.log` suffix to > log files. The new log directory will be in this format: > `dag_id/task_id/execution_date_iso/try_number.log` -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-1332) Split logs based on try_number
[ https://issues.apache.org/jira/browse/AIRFLOW-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated AIRFLOW-1332: -- Description: Split airflow logs based on current try_number. It also adds a {{.log}} suffix to log files. The new log directory will be in this format: {{dag_id/task_id/execution_date_iso/try_number.log}} was: Split airflow logs based on current try_number. It also adds a `.log` suffix to log files. The new log directory will be in this format: `dag_id/task_id/execution_date_iso/try_number.log` > Split logs based on try_number > -- > > Key: AIRFLOW-1332 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1332 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Minor > > Split airflow logs based on current try_number. It also adds a {{.log}} suffix > to log files. The new log directory will be in this format: > {{dag_id/task_id/execution_date_iso/try_number.log}} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
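The per-attempt layout {{dag_id/task_id/execution_date_iso/try_number.log}} described above can be sketched with a small helper; the function name is invented for the example.

```python
# Sketch of the per-attempt log path layout described in AIRFLOW-1332.
def task_log_path(dag_id, task_id, execution_date_iso, try_number):
    """Return the relative log path for one attempt of one task run."""
    return f"{dag_id}/{task_id}/{execution_date_iso}/{try_number}.log"

path = task_log_path("example_dag", "print_date", "2017-06-26T00:00:00", 1)
# → "example_dag/print_date/2017-06-26T00:00:00/1.log"
```

Because the attempt number is part of the filename, a retry writes to a fresh file instead of appending to or overwriting the previous attempt's log.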
[jira] [Commented] (AIRFLOW-1325) Make Airflow Logging Backed By Elasticsearch
[ https://issues.apache.org/jira/browse/AIRFLOW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061491#comment-16061491 ] Allison Wang commented on AIRFLOW-1325: --- Yes, airflow will only use ES if the user configures the logging_backend_url and S3/GCS won't be removed :) > Make Airflow Logging Backed By Elasticsearch > > > Key: AIRFLOW-1325 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1325 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Allison Wang >Assignee: Allison Wang > > Currently, Airflow uses S3/GCS as the log storage backend. Workers, when > executing the task, flush logs into local files. When tasks are completed, > those log files will be uploaded to the remote storage system like S3 or GCS. > This approach makes log streaming and analysis difficult. Also, when worker > servers are down while executing the task, the entire task log will be lost > until worker servers are recovered. It's also considered a bad practice for > airflow webserver to communicate directly with worker servers. > This change adds functionality to use customized logging backend. Users are > able to configure logging backend that supports streaming logs and more > advanced queries. Currently, Elasticsearch logging backend is implemented. > Having Elasticsearch as logging backend enables the development of more > advanced logging related features. Those are features that will be > implemented in the future: > - Streaming logs without refreshing the page > - Separate logs by attempts > - Filter logs with excluded phrases > This feature will also be backward compatible. It will direct users to the > old logging flow if logging_backend_url is not set. A new UI will be created > to support above features and old page won't be modified. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-1325) Make Airflow Logging Backed By Elasticsearch
[ https://issues.apache.org/jira/browse/AIRFLOW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated AIRFLOW-1325: -- Description: Currently, Airflow uses S3/GCS as the log storage backend. Workers, when executing the task, flush logs into local files. When tasks are completed, those log files will be uploaded to the remote storage system like S3 or GCS. This approach makes log streaming and analysis difficult. Also, when worker servers are down while executing the task, the entire task log will be lost until worker servers are recovered. It's also considered a bad practice for airflow webserver to communicate directly with worker servers. This PR adds functionality to use customized logging backend. Users are able to configure logging backend that supports streaming logs and more advanced queries. Currently, Elasticsearch logging backend is implemented. Having Elasticsearch as logging backend enables the development of more advanced logging related features. Those are features that will be implemented in the future: - Streaming logs without refreshing the page - Separate logs by attempts - Filter logs with excluded phrases This feature will also be backward compatible. It will direct users to the old logging flow if logging_backend_url is not set. A new UI will be created to support above features and old page won't be modified. was: Currently, Airflow uses S3/GCS as the log storage backend. Workers, when executing the task, flush logs into local files. When tasks are completed, those log files will be uploaded to the remote storage system like S3 or GCS. This approach makes log streaming and analysis difficult. Also, when worker servers are down while executing the task, the entire task log will be lost until worker servers are recovered. It's also considered a bad practice for airflow webserver to communicate directly with worker servers. This PR adds functionality to use customized logging backend. 
Users are able to configure logging backend that supports streaming logs and more advanced queries. Currently, Elasticsearch logging backend is implemented. Having Elasticsearch as logging backend enables the development of more advanced logging related features. Those are features that will be implemented in the future: Streaming logs without refreshing the page Separate logs by attempts Filter logs with excluded phrases This feature will also be backward compatible. It will direct users to the old logging flow if logging_backend_url is not set. A new UI will be created to support above features and old page won't be modified. > Make Airflow Logging Backed By Elasticsearch > > > Key: AIRFLOW-1325 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1325 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Allison Wang >Assignee: Allison Wang > > Currently, Airflow uses S3/GCS as the log storage backend. Workers, when > executing the task, flush logs into local files. When tasks are completed, > those log files will be uploaded to the remote storage system like S3 or GCS. > This approach makes log streaming and analysis difficult. Also, when worker > servers are down while executing the task, the entire task log will be lost > until worker servers are recovered. It's also considered a bad practice for > airflow webserver to communicate directly with worker servers. > This PR adds functionality to use customized logging backend. Users are able > to configure logging backend that supports streaming logs and more advanced > queries. Currently, Elasticsearch logging backend is implemented. > Having Elasticsearch as logging backend enables the development of more > advanced logging related features. Those are features that will be > implemented in the future: > - Streaming logs without refreshing the page > - Separate logs by attempts > - Filter logs with excluded phrases > This feature will also be backward compatible. 
It will direct users to the > old logging flow if logging_backend_url is not set. A new UI will be created > to support above features and old page won't be modified. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-1325) Make Airflow Logging Backed By Elasticsearch
[ https://issues.apache.org/jira/browse/AIRFLOW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated AIRFLOW-1325: -- Description: Currently, Airflow uses S3/GCS as the log storage backend. Workers, when executing the task, flush logs into local files. When tasks are completed, those log files will be uploaded to the remote storage system like S3 or GCS. This approach makes log streaming and analysis difficult. Also, when worker servers are down while executing the task, the entire task log will be lost until worker servers are recovered. It's also considered a bad practice for airflow webserver to communicate directly with worker servers. This change adds functionality to use customized logging backend. Users are able to configure logging backend that supports streaming logs and more advanced queries. Currently, Elasticsearch logging backend is implemented. Having Elasticsearch as logging backend enables the development of more advanced logging related features. Those are features that will be implemented in the future: - Streaming logs without refreshing the page - Separate logs by attempts - Filter logs with excluded phrases This feature will also be backward compatible. It will direct users to the old logging flow if logging_backend_url is not set. A new UI will be created to support above features and old page won't be modified. was: Currently, Airflow uses S3/GCS as the log storage backend. Workers, when executing the task, flush logs into local files. When tasks are completed, those log files will be uploaded to the remote storage system like S3 or GCS. This approach makes log streaming and analysis difficult. Also, when worker servers are down while executing the task, the entire task log will be lost until worker servers are recovered. It's also considered a bad practice for airflow webserver to communicate directly with worker servers. This PR adds functionality to use customized logging backend. 
Users are able to configure logging backend that supports streaming logs and more advanced queries. Currently, Elasticsearch logging backend is implemented. Having Elasticsearch as logging backend enables the development of more advanced logging related features. Those are features that will be implemented in the future: - Streaming logs without refresh the page - Separate logs by attempts - Filter log with excluded phrases This feature will also be backward compatible. It will direct users to the old logging flow if logging_backend_url is not set. A new UI will be created to support above features and old page won't be modified. > Make Airflow Logging Backed By Elasticsearch > > > Key: AIRFLOW-1325 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1325 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Allison Wang >Assignee: Allison Wang > > Currently, Airflow uses S3/GCS as the log storage backend. Workers, when > executing the task, flushes logs into local files. When tasks are completed, > those log files will be uploaded to the remote storage system like S3 or GCS. > This approach makes log streaming and analysis difficult. Also when worker > servers are down while executing the task, the entire task log will be lost > until worker servers are recovered. It's also considered a bad practice for > airflow webserver to communicate directly with worker servers. > This change adds functionality to use customized logging backend. Users are > able to configure logging backend that supports streaming logs and more > advanced queries. Currently, Elasticsearch logging backend is implemented. > Having Elasticsearch as logging backend enables the development of more > advanced logging related features. Those are features that will be > implemented in the future: > - Streaming logs without refresh the page > - Separate logs by attempts > - Filter log with excluded phrases > This feature will also be backward compatible. 
It will direct users to the > old logging flow if logging_backend_url is not set. A new UI will be created > to support above features and old page won't be modified. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
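The backward-compatibility rule described above (fall back to the old file/S3/GCS flow whenever `logging_backend_url` is unset) can be sketched as follows. This is an illustrative sketch only: the function name and handler names are assumptions chosen to mirror the behavior described in the issue, not Airflow's actual API.

```python
# Illustrative sketch, not Airflow's real implementation: the setting
# `logging_backend_url` and the handler names below are hypothetical.

def select_task_log_handler(conf: dict) -> str:
    """Pick a task-log handler name based on configuration.

    If `logging_backend_url` is unset (or empty), keep the old flow:
    logs are written locally, then uploaded to S3/GCS on completion.
    Otherwise, use the new streaming backend (e.g. Elasticsearch).
    """
    backend_url = conf.get("logging_backend_url")
    if not backend_url:
        return "file_task_handler"  # backward-compatible default
    return "elasticsearch_task_handler"  # streaming-capable backend
```

With this shape, existing deployments that never set `logging_backend_url` keep their current logging behavior unchanged, which is the backward-compatibility guarantee the description promises.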
[jira] [Updated] (AIRFLOW-1325) Make Airflow Logging Backed By Elasticsearch
[ https://issues.apache.org/jira/browse/AIRFLOW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated AIRFLOW-1325: -- Description: Currently, Airflow uses S3/GCS as the log storage backend. Workers, when executing a task, flush logs to local files. When tasks complete, those log files are uploaded to a remote storage system like S3 or GCS. This approach makes log streaming and analysis difficult. Also, if a worker server goes down while executing a task, the entire task log is lost until the server recovers. It's also considered bad practice for the Airflow webserver to communicate directly with worker servers. This PR adds the ability to use a customized logging backend. Users can configure a logging backend that supports streaming logs and more advanced queries. Currently, an Elasticsearch logging backend is implemented. Having Elasticsearch as the logging backend enables the development of more advanced logging-related features. Features to be implemented in the future: - Streaming logs without refreshing the page - Separating logs by attempt - Filtering logs by excluded phrases This feature is also backward compatible: users are directed to the old logging flow if logging_backend_url is not set. A new UI will be created to support the above features; the old page won't be modified. was: Move logging to Elasticsearch and also make it backward compatible. This feature is the first step toward making Airflow logging more readable. > Make Airflow Logging Backed By Elasticsearch > > > Key: AIRFLOW-1325 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1325 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Allison Wang >Assignee: Allison Wang > > Currently, Airflow uses S3/GCS as the log storage backend. Workers, when > executing a task, flush logs to local files. When tasks complete, > those log files are uploaded to a remote storage system like S3 or GCS.
> This approach makes log streaming and analysis difficult. Also, if a worker > server goes down while executing a task, the entire task log is lost > until the server recovers. It's also considered bad practice for the > Airflow webserver to communicate directly with worker servers. > This PR adds the ability to use a customized logging backend. Users can > configure a logging backend that supports streaming logs and more advanced > queries. Currently, an Elasticsearch logging backend is implemented. > Having Elasticsearch as the logging backend enables the development of more > advanced logging-related features. Features to be > implemented in the future: > - Streaming logs without refreshing the page > - Separating logs by attempt > - Filtering logs by excluded phrases > This feature is also backward compatible. Users are directed to the > old logging flow if logging_backend_url is not set. A new UI will be created > to support the above features; the old page won't be modified. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (AIRFLOW-1332) Add attempt column to task instance
Allison Wang created AIRFLOW-1332: - Summary: Add attempt column to task instance Key: AIRFLOW-1332 URL: https://issues.apache.org/jira/browse/AIRFLOW-1332 Project: Apache Airflow Issue Type: Improvement Reporter: Allison Wang Assignee: _matthewHawthorne Priority: Minor Adding an attempt number to separate the logs for each task run. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
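The per-attempt separation this issue asks for can be sketched as a log-path template keyed by attempt number. The function name and path layout below are hypothetical, meant only to illustrate how an attempt column lets each run's log be addressed separately; they are not Airflow's actual filename template.

```python
# Hypothetical sketch: a per-attempt log path. The layout is illustrative
# and does not match Airflow's real log filename template.

def attempt_log_path(dag_id: str, task_id: str,
                     execution_date: str, attempt: int) -> str:
    """Build a log path that keeps each task attempt's log separate."""
    return f"{dag_id}/{task_id}/{execution_date}/attempt={attempt}.log"
```

Because the attempt number is part of the key, a retried task writes to a fresh path instead of overwriting or appending to the previous attempt's log.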
[jira] [Work started] (AIRFLOW-1332) Add attempt column to task instance
[ https://issues.apache.org/jira/browse/AIRFLOW-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on AIRFLOW-1332 started by Allison Wang. - > Add attempt column to task instance > --- > > Key: AIRFLOW-1332 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1332 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Minor > > Adding attempt number to separate logs for each task run. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-1325) Make Airflow Logging Backed By Elasticsearch
[ https://issues.apache.org/jira/browse/AIRFLOW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated AIRFLOW-1325: -- Description: Move logging to Elasticsearch and also make it backward compatible. This feature is the first step to make Airflow logging more readable. was:Move logging to Elasticsearch and also make it backward compatible. > Make Airflow Logging Backed By Elasticsearch > > > Key: AIRFLOW-1325 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1325 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Allison Wang >Assignee: Allison Wang > > Move logging to Elasticsearch and also make it backward compatible. > This feature is the first step to make Airflow logging more readable. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Work started] (AIRFLOW-1325) Make Airflow Logging Backed By Elasticsearch
[ https://issues.apache.org/jira/browse/AIRFLOW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on AIRFLOW-1325 started by Allison Wang. - > Make Airflow Logging Backed By Elasticsearch > > > Key: AIRFLOW-1325 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1325 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Allison Wang >Assignee: Allison Wang > > Move logging to Elasticsearch and also make it backward compatible. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (AIRFLOW-1325) Make Airflow Logging Backed By Elasticsearch
Allison Wang created AIRFLOW-1325: - Summary: Make Airflow Logging Backed By Elasticsearch Key: AIRFLOW-1325 URL: https://issues.apache.org/jira/browse/AIRFLOW-1325 Project: Apache Airflow Issue Type: Improvement Reporter: Allison Wang Assignee: Allison Wang Move logging to Elasticsearch and also make it backward compatible. -- This message was sent by Atlassian JIRA (v6.4.14#64029)