[jira] [Commented] (AIRFLOW-1667) Remote log handlers don't upload logs on task finish

2017-10-07 Thread Allison Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16195759#comment-16195759
 ] 

Allison Wang commented on AIRFLOW-1667:
---

Great, I didn't realize the closed flag was removed in the other PR. 

> Remote log handlers don't upload logs on task finish
> 
>
> Key: AIRFLOW-1667
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1667
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Arthur Vigil
>
> AIRFLOW-1385 revised logging for configurability, but the provided remote log 
> handlers (S3TaskHandler and GCSTaskHandler) only upload on close (flush is 
> left at the default implementation provided by `logging.FileHandler`). A 
> handler will be closed on process exit by `logging.shutdown()`, but depending 
> on the Executor used, worker processes may not regularly shut down and can 
> very likely persist between tasks. This means that during normal execution, 
> log files are never uploaded.
> We need to find a way to flush remote log handlers in a timely manner, but 
> without hitting the target resources unnecessarily.
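To make the failure mode concrete, here is a minimal, hypothetical Python sketch of the
lifecycle described above; it illustrates the default logging behaviour, not Airflow's
actual handler code.

{code}
import logging

# Illustrative lifecycle: in a long-lived worker process the handler's flush()
# (inherited from logging.FileHandler) only flushes the local file buffer, while
# a close-time remote upload would normally run only when logging.shutdown()
# is called at interpreter exit.
log = logging.getLogger("airflow.task")
log.addHandler(logging.FileHandler("/tmp/task.log"))
log.warning("task output ...")      # written to the local file by the handler

for handler in log.handlers:
    handler.flush()                 # local flush only; nothing is uploaded

logging.shutdown()                  # closes all handlers; only at this point would a
                                    # close-time upload run, and a persistent worker
                                    # may never reach it between tasks
{code}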



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (AIRFLOW-1667) Remote log handlers don't upload logs

2017-10-06 Thread Allison Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16195581#comment-16195581
 ] 

Allison Wang edited comment on AIRFLOW-1667 at 10/7/17 5:25 AM:


I agree that we shouldn't rely on the logging module's close to upload the log, 
since we have no control over when it's called. Instead of calling close, we 
could explicitly add a post_task_run method in the handler that handles any 
additional cleanup/operations upon task completion. This change only requires 
modifying a small amount of the current code. I am not exactly sure how to 
upload the log to remote storage like S3/GCS periodically during task execution, 
but it's possible to use a log collector (e.g. Filebeat) to ship the log to a 
centralized store (e.g. Elasticsearch) in real time. 
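A minimal sketch of this idea, assuming a hypothetical post_task_run hook that the task
runner would call explicitly when the task instance completes; the method name,
constructor arguments, and upload callable are assumptions for illustration, not an
existing Airflow API.

{code}
import logging

class RemoteTaskHandler(logging.FileHandler):
    """Sketch of the proposal: upload on an explicit per-task hook instead of close()."""

    def __init__(self, filename, upload_fn):
        super(RemoteTaskHandler, self).__init__(filename)
        self.upload_fn = upload_fn  # hypothetical callable that pushes to S3/GCS

    def post_task_run(self):
        # Called explicitly by the task runner after the task instance finishes,
        # so the upload no longer depends on the worker process ever shutting down.
        self.flush()
        with open(self.baseFilename) as log_file:
            self.upload_fn(self.baseFilename, log_file.read())
{code}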


was (Author: allisonwang):
I agree that we shouldn't rely on the logging module's close to upload the log, 
since we have no control over when it's called. Instead of calling close, we 
could explicitly invoke a post_task_run method in handlers that handles any 
additional cleanup/operations upon task completion. This change only requires 
modifying a small amount of the current code. I am not exactly sure how to 
upload the log to remote storage like S3/GCS periodically during task execution, 
but it's possible to use a log collector (e.g. Filebeat) to ship the log to a 
centralized store (e.g. Elasticsearch) in real time. 

> Remote log handlers don't upload logs
> -
>
> Key: AIRFLOW-1667
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1667
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Arthur Vigil
>
> AIRFLOW-1385 revised logging for configurability, but the provided remote log 
> handlers (S3TaskHandler and GCSTaskHandler) only upload on close (flush is 
> left at the default implementation provided by `logging.FileHandler`). A 
> handler will be closed on process exit by `logging.shutdown()`, but depending 
> on the Executor used, worker processes may not regularly shut down and can 
> very likely persist between tasks. This means that during normal execution, 
> log files are never uploaded.
> We need to find a way to flush remote log handlers in a timely manner, but 
> without hitting the target resources unnecessarily.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (AIRFLOW-1667) Remote log handlers don't upload logs

2017-10-06 Thread Allison Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16195581#comment-16195581
 ] 

Allison Wang edited comment on AIRFLOW-1667 at 10/7/17 5:24 AM:


I agree that we shouldn't rely on the logging module's close to upload the log, 
since we have no control over when it's called. Instead of calling close, we 
could explicitly invoke a post_task_run method in handlers that handles any 
additional cleanup/operations upon task completion. This change only requires 
modifying a small amount of the current code. I am not exactly sure how to 
upload the log to remote storage like S3/GCS periodically during task execution, 
but it's possible to use a log collector (e.g. Filebeat) to ship the log to a 
centralized store (e.g. Elasticsearch) in real time. 


was (Author: allisonwang):
I agree that we shouldn't rely on the logging module's close to upload the log, 
since we have no control over when it's called. Instead of calling close, we 
could explicitly invoke a post_task_run method that handles any additional 
cleanup/operations upon task completion. This change only requires modifying a 
small amount of the current code. I am not exactly sure how to upload the log to 
remote storage like S3/GCS periodically during task execution, but it's possible 
to use a log collector (e.g. Filebeat) to ship the log to a centralized store 
(e.g. Elasticsearch) in real time. 

> Remote log handlers don't upload logs
> -
>
> Key: AIRFLOW-1667
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1667
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Arthur Vigil
>
> AIRFLOW-1385 revised logging for configurability, but the provided remote log 
> handlers (S3TaskHandler and GCSTaskHandler) only upload on close (flush is 
> left at the default implementation provided by `logging.FileHandler`). A 
> handler will be closed on process exit by `logging.shutdown()`, but depending 
> on the Executor used, worker processes may not regularly shut down and can 
> very likely persist between tasks. This means that during normal execution, 
> log files are never uploaded.
> We need to find a way to flush remote log handlers in a timely manner, but 
> without hitting the target resources unnecessarily.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AIRFLOW-1667) Remote log handlers don't upload logs

2017-10-06 Thread Allison Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16195581#comment-16195581
 ] 

Allison Wang commented on AIRFLOW-1667:
---

I agree that we shouldn't rely on the logging module's close to upload the log, 
since we have no control over when it's called. Instead of calling close, we 
could explicitly invoke a post_task_run method that handles any additional 
cleanup/operations upon task completion. This change only requires modifying a 
small amount of the current code. I am not exactly sure how to upload the log to 
remote storage like S3/GCS periodically during task execution, but it's possible 
to use a log collector (e.g. Filebeat) to ship the log to a centralized store 
(e.g. Elasticsearch) in real time. 

> Remote log handlers don't upload logs
> -
>
> Key: AIRFLOW-1667
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1667
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Arthur Vigil
>
> AIRFLOW-1385 revised logging for configurability, but the provided remote log 
> handlers (S3TaskHandler and GCSTaskHandler) only upload on close (flush is 
> left at the default implementation provided by `logging.FileHandler`). A 
> handler will be closed on process exit by `logging.shutdown()`, but depending 
> on the Executor used, worker processes may not regularly shut down and can 
> very likely persist between tasks. This means that during normal execution, 
> log files are never uploaded.
> We need to find a way to flush remote log handlers in a timely manner, but 
> without hitting the target resources unnecessarily.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1385) Make Airflow task logging configurable

2017-08-11 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang resolved AIRFLOW-1385.
---
Resolution: Fixed

> Make Airflow task logging configurable
> --
>
> Key: AIRFLOW-1385
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1385
> Project: Apache Airflow
>  Issue Type: Sub-task
>Reporter: Allison Wang
>Assignee: Allison Wang
>
> Make Airflow task logging support custom loggers and handlers.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1325) Airflow streaming log backed by ElasticSearch

2017-08-11 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated AIRFLOW-1325:
--
Summary: Airflow streaming log backed by ElasticSearch  (was: Enable 
ElasticSearch for Airflow Logs)

> Airflow streaming log backed by ElasticSearch
> -
>
> Key: AIRFLOW-1325
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1325
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: logging
>Reporter: Allison Wang
>Assignee: Allison Wang
>
> Add Elasticsearch log handler and reader for querying logs in ES
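As a rough illustration of the reader side, here is a hedged sketch using the
elasticsearch-py client; the index name, field names, and document layout are
assumptions made for the example, not the handler's actual schema.

{code}
from elasticsearch import Elasticsearch

def read_task_log(es_host, dag_id, task_id, execution_date_iso):
    """Sketch: fetch the log lines of one task instance from ES, oldest first."""
    es = Elasticsearch([es_host])
    query = {
        "query": {"bool": {"must": [
            {"term": {"dag_id": dag_id}},
            {"term": {"task_id": task_id}},
            {"term": {"execution_date": execution_date_iso}},
        ]}},
        "sort": [{"offset": {"order": "asc"}}],  # assumed field recording line order
        "size": 1000,
    }
    result = es.search(index="airflow-logs", body=query)
    return [hit["_source"]["message"] for hit in result["hits"]["hits"]]
{code}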



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1325) Enable ElasticSearch for Airflow Logs

2017-08-11 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated AIRFLOW-1325:
--
Summary: Enable ElasticSearch for Airflow Logs  (was: Airflow Log Backed By 
ElasticSearch)

> Enable ElasticSearch for Airflow Logs
> -
>
> Key: AIRFLOW-1325
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1325
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: logging
>Reporter: Allison Wang
>Assignee: Allison Wang
>
> Add Elasticsearch log handler and reader for querying logs in ES



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1325) Airflow Log Backed By ElasticSearch

2017-08-11 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated AIRFLOW-1325:
--
Description: 
Add Elasticsearch log handler and reader for querying logs in ES


  was:Add Elasticsearch logging backend.


> Airflow Log Backed By ElasticSearch
> ---
>
> Key: AIRFLOW-1325
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1325
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: logging
>Reporter: Allison Wang
>Assignee: Allison Wang
>
> Add Elasticsearch log handler and reader for querying logs in ES



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1325) Airflow Log Backed By ElasticSearch

2017-08-11 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated AIRFLOW-1325:
--
Summary: Airflow Log Backed By ElasticSearch  (was: Airflow Streaming Log 
Backed By ElasticSearch)

> Airflow Log Backed By ElasticSearch
> ---
>
> Key: AIRFLOW-1325
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1325
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: logging
>Reporter: Allison Wang
>Assignee: Allison Wang
>
> Add Elasticsearch logging backend.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1325) Airflow Streaming Log Backed By ElasticSearch

2017-08-11 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated AIRFLOW-1325:
--
Description: Add Elasticsearch logging backend.  (was: Currently, Airflow 
uses S3/GCS as the log storage backend. Workers, when executing a task, 
flush logs into local files. When tasks are completed, those log files will 
be uploaded to a remote storage system like S3 or GCS. This approach makes 
log streaming and analysis difficult. Also, when worker servers go down while 
executing a task, the entire task log will be lost until the worker servers are 
recovered. It's also considered bad practice for the airflow webserver to 
communicate directly with worker servers.

This change adds functionality to use a customized logging backend. Users are 
able to configure a logging backend that supports streaming logs and more 
advanced queries. Currently, an Elasticsearch logging backend is implemented.

This feature will also be backward compatible. It will direct users to the old 
logging flow if logging_backend_url is not set. A new UI will be created to 
support the above features, and the old page won't be modified.)

> Airflow Streaming Log Backed By ElasticSearch
> -
>
> Key: AIRFLOW-1325
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1325
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: logging
>Reporter: Allison Wang
>Assignee: Allison Wang
>
> Add Elasticsearch logging backend.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (AIRFLOW-1443) Update Airflow configuration documentation

2017-08-11 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang closed AIRFLOW-1443.
-
Resolution: Fixed

> Update Airflow configuration documentation
> --
>
> Key: AIRFLOW-1443
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1443
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Trivial
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (AIRFLOW-1332) Split logs based on try_number

2017-08-11 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang closed AIRFLOW-1332.
-
Resolution: Fixed

> Split logs based on try_number
> --
>
> Key: AIRFLOW-1332
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1332
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Minor
>  Labels: transitional
>
> Split airflow logs based on the current try_number. It also adds a {{.log}} 
> suffix to log files. The new log path will be in this format:
> {{dag_id/task_id/execution_date_iso/try_number.log}}
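For reference, a tiny sketch of how the per-attempt path could be assembled; the helper
below is illustrative only, not the code from the change.

{code}
import os

def task_log_path(base_log_folder, dag_id, task_id, execution_date, try_number):
    """Builds dag_id/task_id/execution_date_iso/try_number.log under the base folder."""
    return os.path.join(
        base_log_folder, dag_id, task_id,
        execution_date.isoformat(), "{}.log".format(try_number),
    )

# e.g. task_log_path("/logs", "my_dag", "my_task", datetime.datetime(2017, 6, 27), 1)
#      -> "/logs/my_dag/my_task/2017-06-27T00:00:00/1.log"
{code}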



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (AIRFLOW-1485) Get configuration throws exceptions when key does not exist

2017-08-03 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang closed AIRFLOW-1485.
-
Resolution: Fixed

> Get configuration throws exceptions when key does not exist
> ---
>
> Key: AIRFLOW-1485
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1485
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Allison Wang
>
> The Airflow configuration get method throws exceptions when the key is not 
> defined in airflow.cfg. This behavior makes adding new, optional 
> configuration difficult, given that people already have their own airflow.cfg. 
> We should probably have another method that returns an empty string instead of 
> throwing an exception when there is no such key in the airflow config. 
> {code}
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1817, in 
> wsgi_app
> response = self.full_dispatch_request()
>   File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1477, in 
> full_dispatch_request
> rv = self.handle_user_exception(e)
>   File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1381, in 
> handle_user_exception
> reraise(exc_type, exc_value, tb)
>   File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1475, in 
> full_dispatch_request
> rv = self.dispatch_request()
>   File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1461, in 
> dispatch_request
> return self.view_functions[rule.endpoint](**req.view_args)
>   File "/usr/local/lib/python2.7/dist-packages/flask_admin/base.py", line 68, 
> in inner
> return self._run_view(f, *args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/flask_admin/base.py", line 
> 367, in _run_view
> return fn(self, *args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/flask_login.py", line 755, in 
> decorated_view
> return func(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/www/utils.py", line 
> 125, in wrapper
> return f(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/www/views.py", line 
> 873, in log
> if conf.get('core', 'logging_backend_url'):
>   File "/usr/local/lib/python2.7/dist-packages/airflow/configuration.py", 
> line 802, in get
> return conf.get(section, key, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/configuration.py", 
> line 615, in get
> "in config".format(**locals()))
> AirflowConfigException: section/key [core/logging_backend_url] not found in 
> config
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1485) Get configuration throws exceptions when key does not exist

2017-08-03 Thread Allison Wang (JIRA)
Allison Wang created AIRFLOW-1485:
-

 Summary: Get configuration throws exceptions when key does not 
exist
 Key: AIRFLOW-1485
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1485
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Allison Wang


The Airflow configuration get method throws exceptions when the key is not 
defined in airflow.cfg. This behavior makes adding new, optional configuration 
difficult, given that people already have their own airflow.cfg. We should 
probably have another method that returns an empty string instead of throwing 
an exception when there is no such key in the airflow config. 
{code}
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1817, in 
wsgi_app
response = self.full_dispatch_request()
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1477, in 
full_dispatch_request
rv = self.handle_user_exception(e)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1381, in 
handle_user_exception
reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1475, in 
full_dispatch_request
rv = self.dispatch_request()
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1461, in 
dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python2.7/dist-packages/flask_admin/base.py", line 68, 
in inner
return self._run_view(f, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/flask_admin/base.py", line 367, 
in _run_view
return fn(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/flask_login.py", line 755, in 
decorated_view
return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/airflow/www/utils.py", line 125, 
in wrapper
return f(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/airflow/www/views.py", line 873, 
in log
if conf.get('core', 'logging_backend_url'):
  File "/usr/local/lib/python2.7/dist-packages/airflow/configuration.py", line 
802, in get
return conf.get(section, key, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/airflow/configuration.py", line 
615, in get
"in config".format(**locals()))
AirflowConfigException: section/key [core/logging_backend_url] not found in 
config
{code}
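A minimal sketch of the kind of non-throwing accessor suggested above; the wrapper name
is hypothetical, and the AirflowConfigException import path is assumed.

{code}
from airflow import configuration as conf
from airflow.exceptions import AirflowConfigException  # import path assumed

def conf_get_or_default(section, key, default=''):
    """Return a default instead of raising when section/key is missing from airflow.cfg."""
    try:
        return conf.get(section, key)
    except AirflowConfigException:
        return default

# The view in the traceback above could then do:
# if conf_get_or_default('core', 'logging_backend_url'):
#     ...
{code}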




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (AIRFLOW-1452) "airflow initdb" stuck forever on upgrade

2017-08-02 Thread Allison Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111697#comment-16111697
 ] 

Allison Wang edited comment on AIRFLOW-1452 at 8/2/17 8:57 PM:
---

Then there must be locks in the database when you run {{airflow initdb}}. I am 
not familiar with MSSQL, but the SQL in the posted error message is 
{{UPDATE alembic_version SET version_num='cc1e65623dc7' WHERE 
alembic_version.version_num = '127d2bf2dfa7'}}
This error comes from updating alembic_version, not from any particular 
operation related to adding the max_tries column. Please look into what exactly 
causes this error in MSSQL: {{[Microsoft][ODBC Driver 13 for SQL Server]TCP 
Provider: Error code 0x2746 (10054)}}

Please make sure there are no locks before and during the migration. MSSQL is 
not an officially supported DB. This migration script is tested against MySQL, 
Postgres and SQLite. We recommend using MySQL or Postgres, as we can provide 
more support for issues with these databases. 




was (Author: allisonwang):
Then there must be locks in the database when you run {{airflow initdb}}. I am 
not familiar with MSSQL, but the SQL in the posted error message is 
{{UPDATE alembic_version SET version_num='cc1e65623dc7' WHERE 
alembic_version.version_num = '127d2bf2dfa7'}}
This error comes from updating alembic_version, not from any particular 
operation related to adding the max_tries column. Please look into what exactly 
causes this error in MSSQL: {{[Microsoft][ODBC Driver 13 for SQL Server]TCP 
Provider: Error code 0x2746 (10054)}}

Please make sure there are no locks before and during the migration. This 
migration script is tested against MySQL, Postgres and SQLite. We recommend 
using MySQL or Postgres, as we can provide more support for issues with these 
databases. 



> "airflow initdb" stuck forever on upgrade
> -
>
> Key: AIRFLOW-1452
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1452
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: db
>Reporter: Pavel Martynov
> Attachments: docker-compose.yml, Dockerfile, run-initdb.sh
>
>
> I installed airflow from the current master branch 
> (426b6a65f6ec142449893e36fcd677941bdad879 at the time of writing this issue), ran 
> "airflow initdb" against MS SQL, and it got stuck forever with this output:
> {noformat}
> [2017-07-25 07:30:12,458] {db.py:307} INFO - Creating tables
> INFO  [alembic.runtime.migration] Context impl MSSQLImpl.
> INFO  [alembic.runtime.migration] Will assume transactional DDL.
> INFO  [alembic.runtime.migration] Running upgrade  -> e3a246e0dc1, current 
> schema
> INFO  [alembic.runtime.migration] Running upgrade e3a246e0dc1 -> 
> 1507a7289a2f, create is_encrypted
> INFO  [alembic.runtime.migration] Running upgrade 1507a7289a2f -> 
> 13eb55f81627, maintain history for compatibility with earlier migrations
> INFO  [alembic.runtime.migration] Running upgrade 13eb55f81627 -> 
> 338e90f54d61, More logging into task_isntance
> INFO  [alembic.runtime.migration] Running upgrade 338e90f54d61 -> 
> 52d714495f0, job_id indices
> INFO  [alembic.runtime.migration] Running upgrade 52d714495f0 -> 
> 502898887f84, Adding extra to Log
> INFO  [alembic.runtime.migration] Running upgrade 502898887f84 -> 
> 1b38cef5b76e, add dagrun
> INFO  [alembic.runtime.migration] Running upgrade 1b38cef5b76e -> 
> 2e541a1dcfed, task_duration
> INFO  [alembic.runtime.migration] Running upgrade 2e541a1dcfed -> 
> 40e67319e3a9, dagrun_config
> INFO  [alembic.runtime.migration] Running upgrade 40e67319e3a9 -> 
> 561833c1c74b, add password column to user
> INFO  [alembic.runtime.migration] Running upgrade 561833c1c74b -> 4446e08588, 
> dagrun start end
> INFO  [alembic.runtime.migration] Running upgrade 4446e08588 -> bbc73705a13e, 
> Add notification_sent column to sla_miss
> INFO  [alembic.runtime.migration] Running upgrade bbc73705a13e -> 
> bba5a7cfc896, Add a column to track the encryption state of the 'Extra' field 
> in connection
> INFO  [alembic.runtime.migration] Running upgrade bba5a7cfc896 -> 
> 1968acfc09e3, add is_encrypted column to variable table
> INFO  [alembic.runtime.migration] Running upgrade 1968acfc09e3 -> 
> 2e82aab8ef20, rename user table
> INFO  [alembic.runtime.migration] Running upgrade 2e82aab8ef20 -> 
> 211e584da130, add TI state index
> INFO  [alembic.runtime.migration] Running upgrade 211e584da130 -> 
> 64de9cddf6c9, add task fails journal table
> INFO  [alembic.runtime.migration] Running upgrade 64de9cddf6c9 -> 
> f2ca10b85618, add dag_stats table
> INFO  [alembic.runtime.migration] Running upgrade f2ca10b85618 -> 
> 4addfa1236f1, Add fractional seconds to mysql tables
> INFO  [alembic.runtime.migration] Running upgrade 4addfa1236f1 -> 
> 8504051e801b, xcom dag task indices
> INFO  [alembic.runtime.migration] Running upgrade 8504051e801b -> 
> 

[jira] [Commented] (AIRFLOW-1452) "airflow initdb" stuck forever on upgrade

2017-07-29 Thread Allison Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106261#comment-16106261
 ] 

Allison Wang commented on AIRFLOW-1452:
---

Hi Pavel. This migration indeed takes a long time, assuming you do not have a 
MySQL database connection open anywhere else. We profiled the migration on 1M 
rows and it takes about an hour. This is because the new column populates its 
value from existing rows and issues an UPDATE query for each task_instance 
row. 

If the process is taking more than an hour, it means the database still holds 
locks on some task_instance table rows. Please make sure to disconnect 
everything before upgrading. 

We are aware of this slow migration and will address it ASAP. In the meantime, I 
highly suggest using an older released version of Airflow rather than the master 
branch; there is ongoing work to refactor and improve Airflow logging on the 
master branch. 

Please let us know if you have more questions or concerns. 
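For readers wondering why the migration scales so badly, here is an illustrative Alembic
sketch of the per-row backfill pattern described above; it is not the actual
cc1e65623dc7 migration, and the column default and backfill value are placeholders.

{code}
import sqlalchemy as sa
from alembic import op

def upgrade():
    op.add_column('task_instance',
                  sa.Column('max_tries', sa.Integer(), server_default='-1'))
    conn = op.get_bind()
    rows = conn.execute(sa.text(
        "SELECT dag_id, task_id, execution_date, try_number FROM task_instance"))
    for dag_id, task_id, execution_date, try_number in rows:
        # One UPDATE per task_instance row: on ~1M rows this is what pushes the
        # migration towards an hour, and any row lock held elsewhere blocks it.
        conn.execute(
            sa.text("UPDATE task_instance SET max_tries = :m "
                    "WHERE dag_id = :d AND task_id = :t AND execution_date = :e"),
            m=try_number, d=dag_id, t=task_id, e=execution_date,
        )
{code}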

> "airflow initdb" stuck forever on upgrade
> -
>
> Key: AIRFLOW-1452
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1452
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: db
>Reporter: Pavel Martynov
> Attachments: docker-compose.yml, Dockerfile, run-initdb.sh
>
>
> I installed airflow from the current master branch 
> (426b6a65f6ec142449893e36fcd677941bdad879 at the time of writing this issue), ran 
> "airflow initdb" against MS SQL, and it got stuck forever with this output:
> {noformat}
> [2017-07-25 07:30:12,458] {db.py:307} INFO - Creating tables
> INFO  [alembic.runtime.migration] Context impl MSSQLImpl.
> INFO  [alembic.runtime.migration] Will assume transactional DDL.
> INFO  [alembic.runtime.migration] Running upgrade  -> e3a246e0dc1, current 
> schema
> INFO  [alembic.runtime.migration] Running upgrade e3a246e0dc1 -> 
> 1507a7289a2f, create is_encrypted
> INFO  [alembic.runtime.migration] Running upgrade 1507a7289a2f -> 
> 13eb55f81627, maintain history for compatibility with earlier migrations
> INFO  [alembic.runtime.migration] Running upgrade 13eb55f81627 -> 
> 338e90f54d61, More logging into task_isntance
> INFO  [alembic.runtime.migration] Running upgrade 338e90f54d61 -> 
> 52d714495f0, job_id indices
> INFO  [alembic.runtime.migration] Running upgrade 52d714495f0 -> 
> 502898887f84, Adding extra to Log
> INFO  [alembic.runtime.migration] Running upgrade 502898887f84 -> 
> 1b38cef5b76e, add dagrun
> INFO  [alembic.runtime.migration] Running upgrade 1b38cef5b76e -> 
> 2e541a1dcfed, task_duration
> INFO  [alembic.runtime.migration] Running upgrade 2e541a1dcfed -> 
> 40e67319e3a9, dagrun_config
> INFO  [alembic.runtime.migration] Running upgrade 40e67319e3a9 -> 
> 561833c1c74b, add password column to user
> INFO  [alembic.runtime.migration] Running upgrade 561833c1c74b -> 4446e08588, 
> dagrun start end
> INFO  [alembic.runtime.migration] Running upgrade 4446e08588 -> bbc73705a13e, 
> Add notification_sent column to sla_miss
> INFO  [alembic.runtime.migration] Running upgrade bbc73705a13e -> 
> bba5a7cfc896, Add a column to track the encryption state of the 'Extra' field 
> in connection
> INFO  [alembic.runtime.migration] Running upgrade bba5a7cfc896 -> 
> 1968acfc09e3, add is_encrypted column to variable table
> INFO  [alembic.runtime.migration] Running upgrade 1968acfc09e3 -> 
> 2e82aab8ef20, rename user table
> INFO  [alembic.runtime.migration] Running upgrade 2e82aab8ef20 -> 
> 211e584da130, add TI state index
> INFO  [alembic.runtime.migration] Running upgrade 211e584da130 -> 
> 64de9cddf6c9, add task fails journal table
> INFO  [alembic.runtime.migration] Running upgrade 64de9cddf6c9 -> 
> f2ca10b85618, add dag_stats table
> INFO  [alembic.runtime.migration] Running upgrade f2ca10b85618 -> 
> 4addfa1236f1, Add fractional seconds to mysql tables
> INFO  [alembic.runtime.migration] Running upgrade 4addfa1236f1 -> 
> 8504051e801b, xcom dag task indices
> INFO  [alembic.runtime.migration] Running upgrade 8504051e801b -> 
> 5e7d17757c7a, add pid field to TaskInstance
> INFO  [alembic.runtime.migration] Running upgrade 5e7d17757c7a -> 
> 127d2bf2dfa7, Add dag_id/state index on dag_run table
> INFO  [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> 
> cc1e65623dc7, add max tries column to task instance
> {noformat}
> I reproduced this problem with docker-compose; see the files in the attachment.
> Also, I tried this on 1.8.2rc2 and it works fine, so the problem looks like it 
> is in the cc1e65623dc7_add_max_tries_column_to_task_instance.py migration.
> Some locks occurred; I "killed the lock" in MS SQL and got an exception:
> {noformat}
> sqlalchemy.exc.DBAPIError: (pyodbc.Error) ('08S01', '[08S01] [Microsoft][ODBC 
> Driver 13 for SQL Server]TCP Provider: Error code 0x2746 (10054) 
> (SQLExecDirectW)') [SQL: u"UPDATE alembic_version SET 
> 

[jira] [Updated] (AIRFLOW-1454) Make Airflow logging configurable

2017-07-25 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated AIRFLOW-1454:
--
Issue Type: Improvement  (was: Task)

> Make Airflow logging configurable
> -
>
> Key: AIRFLOW-1454
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1454
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Allison Wang
>
> Airflow logging should be configurable. Users can provide custom log 
> handlers, formatters and loggers to handle log messages in Airflow for the 
> webserver, scheduler, and worker. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1454) Make Airflow logging configurable

2017-07-25 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated AIRFLOW-1454:
--
Issue Type: Task  (was: Improvement)

> Make Airflow logging configurable
> -
>
> Key: AIRFLOW-1454
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1454
> Project: Apache Airflow
>  Issue Type: Task
>Reporter: Allison Wang
>
> Airflow logging should be configurable. Users can provide custom log 
> handlers, formatters and loggers to handle log messages in Airflow for the 
> webserver, scheduler, and worker. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (AIRFLOW-1454) Make Airflow logging configurable

2017-07-25 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang reassigned AIRFLOW-1454:
-

Assignee: (was: Allison Wang)

> Make Airflow logging configurable
> -
>
> Key: AIRFLOW-1454
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1454
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Allison Wang
>
> Airflow logging should be configurable. Users can provide custom log 
> handlers, formatters and loggers to handle log messages in Airflow for the 
> webserver, scheduler, and worker. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1457) Unify Airflow logging setup

2017-07-25 Thread Allison Wang (JIRA)
Allison Wang created AIRFLOW-1457:
-

 Summary: Unify Airflow logging setup
 Key: AIRFLOW-1457
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1457
 Project: Apache Airflow
  Issue Type: Sub-task
Reporter: Allison Wang


Logging is set up in multiple places inside Airflow, including 
{{settings.py:configure_logging}}, {{cli.py:setup_logging}}, etc. This task is 
to unify the Airflow logging setup in settings.py and use {{dictConfig}} to 
configure all logging settings, including those for the webserver and the 
scheduler.
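A minimal dictConfig sketch of what a unified setup could look like; the handler names,
formatter string, and logger layout below are placeholders, not Airflow's shipped
defaults.

{code}
from logging.config import dictConfig

DEFAULT_LOGGING_CONFIG = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'airflow': {
            'format': '[%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s',
        },
    },
    'handlers': {
        'console': {'class': 'logging.StreamHandler', 'formatter': 'airflow'},
    },
    'loggers': {
        'airflow': {'handlers': ['console'], 'level': 'INFO'},
    },
}

# One call at process startup (webserver, scheduler, worker) would replace the
# scattered setup in settings.py and cli.py:
dictConfig(DEFAULT_LOGGING_CONFIG)
{code}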



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1455) Move logging related configs out of airflow.cfg

2017-07-25 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated AIRFLOW-1455:
--
Description: All logging related configurations including 
`LOG_BASE_FOLDER`, `REMOTE_LOG_BASE_FOLDER`, `LOG_LEVEL` and other `LOG_FORMAT` 
should be placed inside `default_airflow_logging`. This task also includes 
refactoring all occurrences of those variables and making them handler-specific 
rather than global.   (was: All logging related configruations including 
`LOG_BASE_FOLDER`, `REMOTE_LOG_BASE_FOLDER`, `LOG_LEVEL` and other `LOG_FORMAT` 
should be placed inside `default_airflow_logging`. This task also includes 
refactoring all occurrences of those variables and making them handler-specific 
rather than global. )

> Move logging related configs out of airflow.cfg
> ---
>
> Key: AIRFLOW-1455
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1455
> Project: Apache Airflow
>  Issue Type: Sub-task
>Reporter: Allison Wang
>
> All logging related configurations including `LOG_BASE_FOLDER`, 
> `REMOTE_LOG_BASE_FOLDER`, `LOG_LEVEL` and other `LOG_FORMAT` should be placed 
> inside `default_airflow_logging`. This task also includes refactoring all 
> occurrences of those variables and making them handler-specific rather than 
> global. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1455) Move logging related configs out of airflow.cfg

2017-07-25 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated AIRFLOW-1455:
--
Description: All logging related configurations including 
{{LOG_BASE_FOLDER}}, {{REMOTE_LOG_BASE_FOLDER}}, {{LOG_LEVEL}} and 
{{LOG_FORMAT}} should be placed inside {{default_airflow_logging}}. This task 
also includes refactoring all occurrences of those variables and making them 
handler-specific rather than global.   (was: All logging related configurations 
including {{LOG_BASE_FOLDER}}, {{REMOTE_LOG_BASE_FOLDER}}, {{LOG_LEVEL}} and 
other {{LOG_FORMAT}} should be placed inside {{default_airflow_logging}}. This 
task also includes refactoring all occurrences of those variables and making 
them handler-specific rather than global. )

> Move logging related configs out of airflow.cfg
> ---
>
> Key: AIRFLOW-1455
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1455
> Project: Apache Airflow
>  Issue Type: Sub-task
>Reporter: Allison Wang
>
> All logging related configurations including {{LOG_BASE_FOLDER}}, 
> {{REMOTE_LOG_BASE_FOLDER}}, {{LOG_LEVEL}} and {{LOG_FORMAT}} should be placed 
> inside {{default_airflow_logging}}. This task also includes refactoring all 
> occurrences of those variables and making them handler-specific rather than 
> global. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1455) Move logging related configs out of airflow.cfg

2017-07-25 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated AIRFLOW-1455:
--
Description: All logging related configurations including 
{{LOG_BASE_FOLDER}}, {{REMOTE_LOG_BASE_FOLDER}}, {{LOG_LEVEL}} and other 
{{LOG_FORMAT}} should be placed inside {{default_airflow_logging}}. This task 
also includes refactoring all occurrences of those variables and making them 
handler-specific rather than global.   (was: All logging related configurations 
including ``LOG_BASE_FOLDER``, `REMOTE_LOG_BASE_FOLDER`, `LOG_LEVEL` and other 
`LOG_FORMAT` should be placed inside `default_airflow_logging`. This task also 
includes refactoring all occurrences of those variables and making them 
handler-specific rather than global. )

> Move logging related configs out of airflow.cfg
> ---
>
> Key: AIRFLOW-1455
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1455
> Project: Apache Airflow
>  Issue Type: Sub-task
>Reporter: Allison Wang
>
> All logging related configurations including {{LOG_BASE_FOLDER}}, 
> {{REMOTE_LOG_BASE_FOLDER}}, {{LOG_LEVEL}} and other {{LOG_FORMAT}} should be 
> placed inside {{default_airflow_logging}}. This task also includes 
> refactoring all occurrences of those variables and making them handler-specific 
> rather than global. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1455) Move logging related configs out of airflow.cfg

2017-07-25 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated AIRFLOW-1455:
--
Description: All logging related configurations including 
``LOG_BASE_FOLDER``, `REMOTE_LOG_BASE_FOLDER`, `LOG_LEVEL` and other 
`LOG_FORMAT` should be placed inside `default_airflow_logging`. This task also 
includes refactoring all occurrences of those variables and making them 
handler-specific rather than global.   (was: All logging related configurations 
including `LOG_BASE_FOLDER`, `REMOTE_LOG_BASE_FOLDER`, `LOG_LEVEL` and other 
`LOG_FORMAT` should be placed inside `default_airflow_logging`. This task also 
includes refactoring all occurrences of those variables and making them 
handler-specific rather than global. )

> Move logging related configs out of airflow.cfg
> ---
>
> Key: AIRFLOW-1455
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1455
> Project: Apache Airflow
>  Issue Type: Sub-task
>Reporter: Allison Wang
>
> All logging related configurations including ``LOG_BASE_FOLDER``, 
> `REMOTE_LOG_BASE_FOLDER`, `LOG_LEVEL` and other `LOG_FORMAT` should be placed 
> inside `default_airflow_logging`. This task also includes refactoring all 
> occurrences of those variables and making them handler-specific rather than 
> global. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1455) Move logging related configs out of airflow.cfg

2017-07-25 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated AIRFLOW-1455:
--
Summary: Move logging related configs out of airflow.cfg  (was: Move 
logging related config out of airflow.cfg)

> Move logging related configs out of airflow.cfg
> ---
>
> Key: AIRFLOW-1455
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1455
> Project: Apache Airflow
>  Issue Type: Sub-task
>Reporter: Allison Wang
>
> All logging related configruations including `LOG_BASE_FOLDER`, 
> `REMOTE_LOG_BASE_FOLDER`, `LOG_LEVEL` and other `LOG_FORMAT` should be placed 
> inside `default_airflow_logging`. This task also includes refactoring all 
> occurrences of those variables and making them handler-specific rather than 
> global. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1455) Move logging related config out of airflow.cfg

2017-07-25 Thread Allison Wang (JIRA)
Allison Wang created AIRFLOW-1455:
-

 Summary: Move logging related config out of airflow.cfg
 Key: AIRFLOW-1455
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1455
 Project: Apache Airflow
  Issue Type: Sub-task
Reporter: Allison Wang


All logging related configruations including `LOG_BASE_FOLDER`, 
`REMOTE_LOG_BASE_FOLDER`, `LOG_LEVEL` and other `LOG_FORMAT` should be placed 
inside `default_airflow_logging`. This task also includes refactoring all 
occurrences of those variables and making them handler-specific rather than 
global. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1454) Make Airflow logging configurable

2017-07-25 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated AIRFLOW-1454:
--
Description: Airflow logging should be configurable. Users can provide 
custom log handlers, formatters and loggers to handle log messages in Airflow 
for the webserver, scheduler, and worker.   (was: Airflow logging should be 
configurable. Users can provide custom log handlers, formatters and loggers to 
handle log messages in Airflow for the webserver, scheduler and worker. )

> Make Airflow logging configurable
> -
>
> Key: AIRFLOW-1454
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1454
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Allison Wang
>Assignee: Allison Wang
>
> Airflow logging should be configurable. Users can provide custom log 
> handlers, formatters and loggers to handle log messages in Airflow for the 
> webserver, scheduler, and worker. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1454) Make Airflow logging configurable

2017-07-25 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated AIRFLOW-1454:
--
Description: Airflow logging should be configurable. Users can provide 
custom log handlers, formatters and loggers to handle log messages in Airflow 
for the webserver, scheduler and worker.   (was: Airflow logging should be 
configurable. Users can provide custom log handlers, formatters and loggers to 
handle log messages in Airflow for the webserver, scheduler and workers. )

> Make Airflow logging configurable
> -
>
> Key: AIRFLOW-1454
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1454
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Allison Wang
>Assignee: Allison Wang
>
> Airflow logging should be configurable. Users can provide custom log 
> handlers, formatters and loggers to handle log messages in Airflow for the 
> webserver, scheduler and worker. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1385) Make Airflow task logging configurable

2017-07-25 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated AIRFLOW-1385:
--
Description: Make Airflow task logging support custom loggers and handlers.

> Make Airflow task logging configurable
> --
>
> Key: AIRFLOW-1385
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1385
> Project: Apache Airflow
>  Issue Type: Sub-task
>Reporter: Allison Wang
>Assignee: Allison Wang
>
> Make Airflow task logging support custom loggers and handlers.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1385) Make Airflow task logging configurable

2017-07-25 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated AIRFLOW-1385:
--
Summary: Make Airflow task logging configurable  (was: Refactor Airflow 
task logging)

> Make Airflow task logging configurable
> --
>
> Key: AIRFLOW-1385
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1385
> Project: Apache Airflow
>  Issue Type: Sub-task
>Reporter: Allison Wang
>Assignee: Allison Wang
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1385) Refactor Airflow task logging

2017-07-25 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated AIRFLOW-1385:
--
Description: (was: Unify Airflow logging and add custom logger and handler 
configuration.)

> Refactor Airflow task logging
> -
>
> Key: AIRFLOW-1385
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1385
> Project: Apache Airflow
>  Issue Type: Sub-task
>Reporter: Allison Wang
>Assignee: Allison Wang
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1385) Refactor Airflow task logging

2017-07-25 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated AIRFLOW-1385:
--
Issue Type: Sub-task  (was: Improvement)
Parent: AIRFLOW-1454

> Refactor Airflow task logging
> -
>
> Key: AIRFLOW-1385
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1385
> Project: Apache Airflow
>  Issue Type: Sub-task
>Reporter: Allison Wang
>Assignee: Allison Wang
>
> Unify Airflow logging and add custom logger and handler configuration.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1454) Make Airflow logging configurable

2017-07-25 Thread Allison Wang (JIRA)
Allison Wang created AIRFLOW-1454:
-

 Summary: Make Airflow logging configurable
 Key: AIRFLOW-1454
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1454
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Allison Wang


Airflow logging should be configurable. Users can provide custom log handlers, 
formatters and loggers to handle log messages in Airflow



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work started] (AIRFLOW-1454) Make Airflow logging configurable

2017-07-25 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-1454 started by Allison Wang.
-
> Make Airflow logging configurable
> -
>
> Key: AIRFLOW-1454
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1454
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Allison Wang
>Assignee: Allison Wang
>
> Airflow logging should be configurable. Users can provide custom log 
> handlers, formatters and loggers to handle log messages in Airflow



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1385) Refactor Airflow task logging

2017-07-25 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated AIRFLOW-1385:
--
Summary: Refactor Airflow task logging  (was: Make Airflow logging 
configurable)

> Refactor Airflow task logging
> -
>
> Key: AIRFLOW-1385
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1385
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Allison Wang
>Assignee: Allison Wang
>
> Unify Airflow logging and add custom logger and handler configuration.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1385) Make Airflow logging configurable

2017-07-25 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated AIRFLOW-1385:
--
Summary: Make Airflow logging configurable  (was: Create abstraction for 
Airflow task logging)

> Make Airflow logging configurable
> -
>
> Key: AIRFLOW-1385
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1385
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Allison Wang
>Assignee: Allison Wang
>
> Unify Airflow logging and add custom logger and handler configuration.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1325) Airflow Streaming Log Backed By ElasticSearch

2017-07-21 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated AIRFLOW-1325:
--
Summary: Airflow Streaming Log Backed By ElasticSearch  (was: Make Airflow 
Logging Backed By Elasticsearch)

> Airflow Streaming Log Backed By ElasticSearch
> -
>
> Key: AIRFLOW-1325
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1325
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Allison Wang
>Assignee: Allison Wang
>
> Currently, Airflow uses S3/GCS as the log storage backend. Workers, when 
> executing a task, flush logs into local files. When tasks are completed, 
> those log files will be uploaded to a remote storage system like S3 or GCS. 
> This approach makes log streaming and analysis difficult. Also, when worker 
> servers go down while executing a task, the entire task log will be lost 
> until the worker servers are recovered. It's also considered bad practice for 
> the airflow webserver to communicate directly with worker servers.
> This change adds functionality to use a customized logging backend. Users are 
> able to configure a logging backend that supports streaming logs and more 
> advanced queries. Currently, an Elasticsearch logging backend is implemented.
> This feature will also be backward compatible. It will direct users to the 
> old logging flow if logging_backend_url is not set. A new UI will be created 
> to support the above features, and the old page won't be modified.
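A hedged sketch of the backward-compatible dispatch mentioned in the description; the
helper names are placeholders, not the webserver's actual view code.

{code}
def get_task_logs(conf, dag_id, task_id, execution_date):
    """Sketch: use the streaming backend when configured, otherwise the old flow."""
    try:
        backend_url = conf.get('core', 'logging_backend_url')
    except Exception:  # key not set in airflow.cfg -> fall back (see AIRFLOW-1485)
        backend_url = None

    if backend_url:
        return read_logs_from_streaming_backend(backend_url, dag_id, task_id, execution_date)
    return read_logs_from_local_or_remote_files(dag_id, task_id, execution_date)

def read_logs_from_streaming_backend(url, dag_id, task_id, execution_date):
    raise NotImplementedError  # e.g. an Elasticsearch reader

def read_logs_from_local_or_remote_files(dag_id, task_id, execution_date):
    raise NotImplementedError  # existing local-file / S3 / GCS flow
{code}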



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1325) Make Airflow Logging Backed By Elasticsearch

2017-07-21 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated AIRFLOW-1325:
--
Description: 
Currently, Airflow uses S3/GCS as the log storage backend. Workers, when 
executing a task, flush logs into local files. When tasks are completed, 
those log files will be uploaded to a remote storage system like S3 or GCS. 
This approach makes log streaming and analysis difficult. Also, when worker 
servers go down while executing a task, the entire task log will be lost 
until the worker servers are recovered. It's also considered bad practice for 
the airflow webserver to communicate directly with worker servers.

This change adds functionality to use a customized logging backend. Users are 
able to configure a logging backend that supports streaming logs and more 
advanced queries. Currently, an Elasticsearch logging backend is implemented.

This feature will also be backward compatible. It will direct users to the old 
logging flow if logging_backend_url is not set. A new UI will be created to 
support the above features, and the old page won't be modified.

  was:
Currently, Airflow uses S3/GCS as the log storage backend. Workers, when 
executing a task, flush logs into local files. When tasks are completed, 
those log files will be uploaded to a remote storage system like S3 or GCS. 
This approach makes log streaming and analysis difficult. Also, when worker 
servers go down while executing a task, the entire task log will be lost 
until the worker servers are recovered. It's also considered bad practice for 
the airflow webserver to communicate directly with worker servers.

This change adds functionality to use a customized logging backend. Users are 
able to configure a logging backend that supports streaming logs and more 
advanced queries. Currently, an Elasticsearch logging backend is implemented.

Having Elasticsearch as the logging backend enables the development of more 
advanced logging-related features. These features will be implemented in the 
future:
- Streaming logs without refreshing the page
- Separating logs by attempt
- Filtering logs with excluded phrases

This feature will also be backward compatible. It will direct users to the old 
logging flow if logging_backend_url is not set. A new UI will be created to 
support the above features, and the old page won't be modified.


> Make Airflow Logging Backed By Elasticsearch
> 
>
> Key: AIRFLOW-1325
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1325
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Allison Wang
>Assignee: Allison Wang
>
> Currently, Airflow uses S3/GCS as the log storage backend. Workers, when 
> executing a task, flush logs into local files. When tasks are completed, 
> those log files will be uploaded to a remote storage system like S3 or GCS. 
> This approach makes log streaming and analysis difficult. Also, when worker 
> servers go down while executing a task, the entire task log will be lost 
> until the worker servers are recovered. It's also considered bad practice for 
> the airflow webserver to communicate directly with worker servers.
> This change adds functionality to use a customized logging backend. Users are 
> able to configure a logging backend that supports streaming logs and more 
> advanced queries. Currently, an Elasticsearch logging backend is implemented.
> This feature will also be backward compatible. It will direct users to the 
> old logging flow if logging_backend_url is not set. A new UI will be created 
> to support the above features, and the old page won't be modified.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1443) Update Airflow configuration documentation

2017-07-21 Thread Allison Wang (JIRA)
Allison Wang created AIRFLOW-1443:
-

 Summary: Update Airflow configuration documentation
 Key: AIRFLOW-1443
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1443
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Allison Wang
Assignee: Allison Wang
Priority: Trivial






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1385) Create abstraction for Airflow task logging

2017-07-19 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated AIRFLOW-1385:
--
Description: Unify Airflow logging and add custom logger and handler 
configuration.

> Create abstraction for Airflow task logging
> ---
>
> Key: AIRFLOW-1385
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1385
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Allison Wang
>Assignee: Allison Wang
>
> Unify Airflow logging and add custom logger and handler configuration.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (AIRFLOW-1366) Add max_tries to task instance

2017-07-11 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang closed AIRFLOW-1366.
-

> Add max_tries to task instance
> --
>
> Key: AIRFLOW-1366
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1366
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Allison Wang
>Assignee: Allison Wang
>
> Right now Airflow deletes the task instance when the user clears it. We have 
> no way of keeping track of how many times a task instance gets run, either by 
> the user or by itself. So instead of deleting the task instance record, we 
> should keep the task instance and make try_number monotonically increasing for 
> every task instance attempt. max_tries is introduced as an upper bound for 
> tasks retrying themselves.  
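As a hedged illustration of the retry bound this introduces: the field names mirror the
description (try_number, max_tries), but the check itself is an assumption for the
example, not the TaskInstance code.

{code}
def eligible_for_self_retry(try_number, max_tries):
    """A task may keep retrying itself only while its attempt count is within max_tries."""
    return try_number <= max_tries

# Clearing a task no longer deletes the row; instead the bound can be raised so
# try_number keeps increasing monotonically across attempts, e.g.:
#     max_tries = try_number + task.retries   # hypothetical
{code}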



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1366) Add max_tries to task instance

2017-07-10 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang resolved AIRFLOW-1366.
---
Resolution: Done

> Add max_tries to task instance
> --
>
> Key: AIRFLOW-1366
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1366
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Allison Wang
>Assignee: Allison Wang
>
> Right now Airflow deletes the task instance when the user clears it. We have 
> no way of keeping track of how many times a task instance gets run, either by 
> the user or by itself. So instead of deleting the task instance record, we 
> should keep the task instance and make try_number monotonically increasing for 
> every task instance attempt. max_tries is introduced as an upper bound for 
> tasks retrying themselves.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1385) Create abstraction for Airflow worker log handler

2017-07-06 Thread Allison Wang (JIRA)
Allison Wang created AIRFLOW-1385:
-

 Summary: Create abstraction for Airflow worker log handler
 Key: AIRFLOW-1385
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1385
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Allison Wang






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1366) Add max_tries to task instance

2017-06-30 Thread Allison Wang (JIRA)
Allison Wang created AIRFLOW-1366:
-

 Summary: Add max_tries to task instance
 Key: AIRFLOW-1366
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1366
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Allison Wang
Assignee: Allison Wang


Right now Airflow deletes the task instance when a user clears it. We have no 
universal way of keeping track of how many times a task instance has been 
attempted, whether triggered by the user or by the task itself. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1332) Split logs based on try_number

2017-06-27 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated AIRFLOW-1332:
--
Description: 
Split Airflow logs based on the current try_number. This also adds a `.log` 
suffix to log files. The new log path will use this format:
`dag_id/task_id/execution_date_iso/try_number.log`

  was:Adding an attempt number to separate logs for each task run. 


> Split logs based on try_number
> --
>
> Key: AIRFLOW-1332
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1332
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Minor
>
> Split Airflow logs based on the current try_number. This also adds a `.log` 
> suffix to log files. The new log path will use this format:
> `dag_id/task_id/execution_date_iso/try_number.log`
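
For illustration, the per-attempt log path format above could be built roughly 
as in the sketch below; the helper name and base directory are hypothetical.

{code:python}
import os
from datetime import datetime


def task_log_path(base_dir, dag_id, task_id, execution_date, try_number):
    """Build dag_id/task_id/execution_date_iso/try_number.log under base_dir."""
    return os.path.join(
        base_dir,
        dag_id,
        task_id,
        execution_date.isoformat(),
        "{}.log".format(try_number),
    )


print(task_log_path("/var/log/airflow", "example_dag", "extract",
                    datetime(2017, 6, 27), 1))
# /var/log/airflow/example_dag/extract/2017-06-27T00:00:00/1.log
{code}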



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1332) Split logs based on try_number

2017-06-27 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated AIRFLOW-1332:
--
Description: 
Split Airflow logs based on the current try_number. This also adds a {{.log}} 
suffix to log files. The new log path will use this format:
{{dag_id/task_id/execution_date_iso/try_number.log}}

  was:
Split Airflow logs based on the current try_number. This also adds a `.log` 
suffix to log files. The new log path will use this format:
`dag_id/task_id/execution_date_iso/try_number.log`


> Split logs based on try_number
> --
>
> Key: AIRFLOW-1332
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1332
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Minor
>
> Split Airflow logs based on the current try_number. This also adds a {{.log}} 
> suffix to log files. The new log path will use this format:
> {{dag_id/task_id/execution_date_iso/try_number.log}}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AIRFLOW-1325) Make Airflow Logging Backed By Elasticsearch

2017-06-23 Thread Allison Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061491#comment-16061491
 ] 

Allison Wang commented on AIRFLOW-1325:
---

Yes, Airflow will only use ES if the user configures logging_backend_url, and 
S3/GCS won't be removed :) 

> Make Airflow Logging Backed By Elasticsearch
> 
>
> Key: AIRFLOW-1325
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1325
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Allison Wang
>Assignee: Allison Wang
>
> Currently, Airflow uses S3/GCS as the log storage backend. Workers flush logs 
> into local files while executing a task. When tasks are completed, those log 
> files are uploaded to a remote storage system such as S3 or GCS. This approach 
> makes log streaming and analysis difficult. Also, if a worker server goes down 
> while executing a task, the entire task log is lost until the worker server is 
> recovered. It's also considered bad practice for the Airflow webserver to 
> communicate directly with worker servers.
> This change adds functionality to use a customized logging backend. Users are 
> able to configure a logging backend that supports streaming logs and more 
> advanced queries. Currently, an Elasticsearch logging backend is implemented.
> Having Elasticsearch as the logging backend enables the development of more 
> advanced logging-related features. These features will be implemented in the 
> future:
> - Streaming logs without refreshing the page
> - Separating logs by attempt
> - Filtering logs with excluded phrases
> This feature will also be backward compatible: users are directed to the old 
> logging flow if logging_backend_url is not set. A new UI will be created to 
> support the above features, and the old page won't be modified.
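
To make the idea concrete, reading a task's log from an Elasticsearch backend 
might look roughly like the sketch below. The index name, log_id format, and 
field names are assumptions for illustration, not the schema this change uses.

{code:python}
from elasticsearch import Elasticsearch  # pip install elasticsearch


def read_task_log(es_host, dag_id, task_id, execution_date, try_number):
    es = Elasticsearch([es_host])
    # Hypothetical identifier joining the task attempt's coordinates.
    log_id = "{}-{}-{}-{}".format(dag_id, task_id, execution_date, try_number)
    resp = es.search(
        index="airflow-logs",
        body={
            "query": {"match": {"log_id": log_id}},
            "sort": [{"offset": {"order": "asc"}}],
        },
    )
    return [hit["_source"]["message"] for hit in resp["hits"]["hits"]]
{code}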



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1325) Make Airflow Logging Backed By Elasticsearch

2017-06-21 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated AIRFLOW-1325:
--
Description: 
Currently, Airflow uses S3/GCS as the log storage backend. Workers flush logs 
into local files while executing a task. When tasks are completed, those log 
files are uploaded to a remote storage system such as S3 or GCS. This approach 
makes log streaming and analysis difficult. Also, if a worker server goes down 
while executing a task, the entire task log is lost until the worker server is 
recovered. It's also considered bad practice for the Airflow webserver to 
communicate directly with worker servers.

This PR adds functionality to use a customized logging backend. Users are able 
to configure a logging backend that supports streaming logs and more advanced 
queries. Currently, an Elasticsearch logging backend is implemented.

Having Elasticsearch as the logging backend enables the development of more 
advanced logging-related features. These features will be implemented in the 
future:
- Streaming logs without refreshing the page
- Separating logs by attempt
- Filtering logs with excluded phrases

This feature will also be backward compatible: users are directed to the old 
logging flow if logging_backend_url is not set. A new UI will be created to 
support the above features, and the old page won't be modified.

  was:
Currently, Airflow uses S3/GCS as the log storage backend. Workers flush logs 
into local files while executing a task. When tasks are completed, those log 
files are uploaded to a remote storage system such as S3 or GCS. This approach 
makes log streaming and analysis difficult. Also, if a worker server goes down 
while executing a task, the entire task log is lost until the worker server is 
recovered. It's also considered bad practice for the Airflow webserver to 
communicate directly with worker servers.

This PR adds functionality to use a customized logging backend. Users are able 
to configure a logging backend that supports streaming logs and more advanced 
queries. Currently, an Elasticsearch logging backend is implemented.

Having Elasticsearch as the logging backend enables the development of more 
advanced logging-related features. These features will be implemented in the 
future:

Streaming logs without refreshing the page
Separating logs by attempt
Filtering logs with excluded phrases
This feature will also be backward compatible: users are directed to the old 
logging flow if logging_backend_url is not set. A new UI will be created to 
support the above features, and the old page won't be modified.


> Make Airflow Logging Backed By Elasticsearch
> 
>
> Key: AIRFLOW-1325
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1325
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Allison Wang
>Assignee: Allison Wang
>
> Currently, Airflow uses S3/GCS as the log storage backend. Workers flush logs 
> into local files while executing a task. When tasks are completed, those log 
> files are uploaded to a remote storage system such as S3 or GCS. This approach 
> makes log streaming and analysis difficult. Also, if a worker server goes down 
> while executing a task, the entire task log is lost until the worker server is 
> recovered. It's also considered bad practice for the Airflow webserver to 
> communicate directly with worker servers.
> This change adds functionality to use a customized logging backend. Users are 
> able to configure a logging backend that supports streaming logs and more 
> advanced queries. Currently, an Elasticsearch logging backend is implemented.
> Having Elasticsearch as the logging backend enables the development of more 
> advanced logging-related features. These features will be implemented in the 
> future:
> - Streaming logs without refreshing the page
> - Separating logs by attempt
> - Filtering logs with excluded phrases
> This feature will also be backward compatible: users are directed to the old 
> logging flow if logging_backend_url is not set. A new UI will be created to 
> support the above features, and the old page won't be modified.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1325) Make Airflow Logging Backed By Elasticsearch

2017-06-21 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated AIRFLOW-1325:
--
Description: 
Currently, Airflow uses S3/GCS as the log storage backend. Workers flush logs 
into local files while executing a task. When tasks are completed, those log 
files are uploaded to a remote storage system such as S3 or GCS. This approach 
makes log streaming and analysis difficult. Also, if a worker server goes down 
while executing a task, the entire task log is lost until the worker server is 
recovered. It's also considered bad practice for the Airflow webserver to 
communicate directly with worker servers.

This change adds functionality to use a customized logging backend. Users are 
able to configure a logging backend that supports streaming logs and more 
advanced queries. Currently, an Elasticsearch logging backend is implemented.

Having Elasticsearch as the logging backend enables the development of more 
advanced logging-related features. These features will be implemented in the 
future:
- Streaming logs without refreshing the page
- Separating logs by attempt
- Filtering logs with excluded phrases

This feature will also be backward compatible: users are directed to the old 
logging flow if logging_backend_url is not set. A new UI will be created to 
support the above features, and the old page won't be modified.

  was:
Currently, Airflow uses S3/GCS as the log storage backend. Workers flush logs 
into local files while executing a task. When tasks are completed, those log 
files are uploaded to a remote storage system such as S3 or GCS. This approach 
makes log streaming and analysis difficult. Also, if a worker server goes down 
while executing a task, the entire task log is lost until the worker server is 
recovered. It's also considered bad practice for the Airflow webserver to 
communicate directly with worker servers.

This PR adds functionality to use a customized logging backend. Users are able 
to configure a logging backend that supports streaming logs and more advanced 
queries. Currently, an Elasticsearch logging backend is implemented.

Having Elasticsearch as the logging backend enables the development of more 
advanced logging-related features. These features will be implemented in the 
future:
- Streaming logs without refreshing the page
- Separating logs by attempt
- Filtering logs with excluded phrases

This feature will also be backward compatible: users are directed to the old 
logging flow if logging_backend_url is not set. A new UI will be created to 
support the above features, and the old page won't be modified.


> Make Airflow Logging Backed By Elasticsearch
> 
>
> Key: AIRFLOW-1325
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1325
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Allison Wang
>Assignee: Allison Wang
>
> Currently, Airflow uses S3/GCS as the log storage backend. Workers flush logs 
> into local files while executing a task. When tasks are completed, those log 
> files are uploaded to a remote storage system such as S3 or GCS. This approach 
> makes log streaming and analysis difficult. Also, if a worker server goes down 
> while executing a task, the entire task log is lost until the worker server is 
> recovered. It's also considered bad practice for the Airflow webserver to 
> communicate directly with worker servers.
> This change adds functionality to use a customized logging backend. Users are 
> able to configure a logging backend that supports streaming logs and more 
> advanced queries. Currently, an Elasticsearch logging backend is implemented.
> Having Elasticsearch as the logging backend enables the development of more 
> advanced logging-related features. These features will be implemented in the 
> future:
> - Streaming logs without refreshing the page
> - Separating logs by attempt
> - Filtering logs with excluded phrases
> This feature will also be backward compatible: users are directed to the old 
> logging flow if logging_backend_url is not set. A new UI will be created to 
> support the above features, and the old page won't be modified.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1325) Make Airflow Logging Backed By Elasticsearch

2017-06-21 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated AIRFLOW-1325:
--
Description: 
Currently, Airflow uses S3/GCS as the log storage backend. Workers flush logs 
into local files while executing a task. When tasks are completed, those log 
files are uploaded to a remote storage system such as S3 or GCS. This approach 
makes log streaming and analysis difficult. Also, if a worker server goes down 
while executing a task, the entire task log is lost until the worker server is 
recovered. It's also considered bad practice for the Airflow webserver to 
communicate directly with worker servers.

This PR adds functionality to use a customized logging backend. Users are able 
to configure a logging backend that supports streaming logs and more advanced 
queries. Currently, an Elasticsearch logging backend is implemented.

Having Elasticsearch as the logging backend enables the development of more 
advanced logging-related features. These features will be implemented in the 
future:

Streaming logs without refreshing the page
Separating logs by attempt
Filtering logs with excluded phrases
This feature will also be backward compatible: users are directed to the old 
logging flow if logging_backend_url is not set. A new UI will be created to 
support the above features, and the old page won't be modified.

  was:
Move logging to Elasticsearch and also make it backward compatible.
This feature is the first step toward making Airflow logging more readable. 


> Make Airflow Logging Backed By Elasticsearch
> 
>
> Key: AIRFLOW-1325
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1325
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Allison Wang
>Assignee: Allison Wang
>
> Currently, Airflow uses S3/GCS as the log storage backend. Workers flush logs 
> into local files while executing a task. When tasks are completed, those log 
> files are uploaded to a remote storage system such as S3 or GCS. This approach 
> makes log streaming and analysis difficult. Also, if a worker server goes down 
> while executing a task, the entire task log is lost until the worker server is 
> recovered. It's also considered bad practice for the Airflow webserver to 
> communicate directly with worker servers.
> This PR adds functionality to use a customized logging backend. Users are able 
> to configure a logging backend that supports streaming logs and more advanced 
> queries. Currently, an Elasticsearch logging backend is implemented.
> Having Elasticsearch as the logging backend enables the development of more 
> advanced logging-related features. These features will be implemented in the 
> future:
> Streaming logs without refreshing the page
> Separating logs by attempt
> Filtering logs with excluded phrases
> This feature will also be backward compatible: users are directed to the old 
> logging flow if logging_backend_url is not set. A new UI will be created to 
> support the above features, and the old page won't be modified.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1332) Add attempt column to task instance

2017-06-21 Thread Allison Wang (JIRA)
Allison Wang created AIRFLOW-1332:
-

 Summary: Add attempt column to task instance
 Key: AIRFLOW-1332
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1332
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Allison Wang
Assignee: _matthewHawthorne
Priority: Minor


Adding an attempt number to separate logs for each task run. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work started] (AIRFLOW-1332) Add attempt column to task instance

2017-06-21 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-1332 started by Allison Wang.
-
> Add attempt column to task instance
> ---
>
> Key: AIRFLOW-1332
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1332
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Minor
>
> Adding an attempt number to separate logs for each task run. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1325) Make Airflow Logging Backed By Elasticsearch

2017-06-19 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated AIRFLOW-1325:
--
Description: 
Move logging to Elasticsearch and also make it backward compatible.
This feature is the first step toward making Airflow logging more readable. 

  was:Move logging to Elasticsearch and also make it backward compatible.


> Make Airflow Logging Backed By Elasticsearch
> 
>
> Key: AIRFLOW-1325
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1325
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Allison Wang
>Assignee: Allison Wang
>
> Move logging to Elasticsearch and also make it backward compatible.
> This feature is the first step toward making Airflow logging more readable. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work started] (AIRFLOW-1325) Make Airflow Logging Backed By Elasticsearch

2017-06-19 Thread Allison Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-1325 started by Allison Wang.
-
> Make Airflow Logging Backed By Elasticsearch
> 
>
> Key: AIRFLOW-1325
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1325
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Allison Wang
>Assignee: Allison Wang
>
> Move logging to Elasticsearch and also make it backward compatible.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1325) Make Airflow Logging Backed By Elasticsearch

2017-06-19 Thread Allison Wang (JIRA)
Allison Wang created AIRFLOW-1325:
-

 Summary: Make Airflow Logging Backed By Elasticsearch
 Key: AIRFLOW-1325
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1325
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Allison Wang
Assignee: Allison Wang


Move logging to Elasticsearch and also make it backward compatible.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)