[jira] [Commented] (AIRFLOW-3551) Improve BashOperator Test Coverage

2018-12-21 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726770#comment-16726770
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3551:


pre_exec is not specific to the BashOperator, but part of the BaseOperator. 
Testing it would be good though :)
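
A minimal sketch of what such a test could look like (assuming the 1.10-era
module paths; the expected behaviour being that execute() returns the last
line of stdout when xcom_push=True):

{code}
import unittest
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator


class TestBashOperatorXComPush(unittest.TestCase):
    def test_xcom_push_returns_last_stdout_line(self):
        dag = DAG('test_bash_xcom', start_date=datetime(2018, 1, 1))
        task = BashOperator(
            task_id='echo',
            bash_command='echo hello',
            xcom_push=True,  # push the last line of stdout as the return value
            dag=dag,
        )
        # execute() returns the last line written to stdout, which the
        # TaskInstance then pushes to XCom
        self.assertEqual(task.execute(context={}), 'hello')
{code}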

> Improve BashOperator Test Coverage
> --
>
> Key: AIRFLOW-3551
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3551
> Project: Apache Airflow
>  Issue Type: Test
>Reporter: Felix Uellendall
>Assignee: Felix Uellendall
>Priority: Minor
>
> The current tests for the `BashOperator` are not covering
> * pre_exec
> * xcom_push_flag



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-549) Scheduler child logs are created out of normal location

2018-12-21 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-549.
---
Resolution: Fixed

The logging config was massively reworked around 1.9.0 so I'm saying this will 
not be an issue anymore.

> Scheduler child logs are created out of normal location
> ---
>
> Key: AIRFLOW-549
> URL: https://issues.apache.org/jira/browse/AIRFLOW-549
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Bolke de Bruin
>Assignee: Paul Yang
>Priority: Major
>
> The new scheduler has child processes logging to their own log files. The 
> location of the log files is set outside of the CLI-configurable locations, 
> making it inconsistent with other log configurations in airflow. In 
> addition, the log files are by default created in /tmp, which is a 
> non-standard location for log files. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3545) Can't use Prometheus or other pull based instrumentation systems to monitor Tasks launched on Kubernetes

2018-12-21 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726763#comment-16726763
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3545:


Right now this is not possible, and due to the heavy use of (sub)processes by 
Airflow this isn't a trivial change.

I'd suggest taking a look at https://github.com/prometheus/statsd_exporter
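
For example, a minimal airflow.cfg sketch pointing Airflow's push-based
StatsD metrics at a statsd_exporter listening on its default StatsD port
(host and prefix values here are assumptions for illustration):

{code}
[scheduler]
statsd_on = True
statsd_host = localhost
statsd_port = 9125
statsd_prefix = airflow
{code}

statsd_exporter then re-exposes the metrics on an HTTP /metrics endpoint that
Prometheus can scrape.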

> Can't use Prometheus or other pull based instrumentation systems to monitor 
> Tasks launched on Kubernetes
> 
>
> Key: AIRFLOW-3545
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3545
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Aditya Vishwakarma
>Priority: Major
>
> Prometheus, which is a common way to instrument services on Kubernetes, uses 
> a pull-based mechanism to fetch metrics. This involves a service exposing a 
> `/metrics` endpoint, which is scraped every 30 secs by Prometheus to collect 
> metrics.
> This requires a port to be specified in the generated Pod config, something 
> like the below.
> {code:java}
> # Sample pod spec (abridged): the container declares a metrics port
> apiVersion: batch/v1
> kind: Job
> metadata:
>   name: batch-job
> spec:
>   template:
>     spec:
>       containers:
>       - name: batch-job
>         ports:
>         - name: metrics
>           containerPort: 9091  # port to fetch metrics from
>           protocol: TCP
> {code}
> Currently KubernetesPodOperator doesn't have any options to open ports like 
> this.
> Is it possible to have an option to do this?
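
For illustration only, a hypothetical sketch of what such an option could
look like on the operator (no such ports parameter existed at the time of
this issue; names and values are made up):

{code}
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

scrape_job = KubernetesPodOperator(
    task_id='batch-job',
    name='batch-job',
    namespace='default',
    image='my-batch-image:latest',  # illustrative image
    # hypothetical option: container ports to declare in the generated pod spec
    ports=[{'name': 'metrics', 'containerPort': 9091, 'protocol': 'TCP'}],
)
{code}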



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-3246) Make hmsclient import optional

2018-12-19 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-3246.

   Resolution: Fixed
Fix Version/s: 1.10.2

> Make hmsclient import optional
> --
>
> Key: AIRFLOW-3246
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3246
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hive_hooks
>Affects Versions: 1.10.0
>Reporter: Gavrilov Seva
>Priority: Minor
> Fix For: 1.10.2
>
>
> Currently hmsclient is imported globally in hive_hooks.py, which is 
> inconsistent with the general style in this file: hive dependencies are 
> imported during the runtime. For example thrift components are imported 
> inside the {{get_metastore_client}} method, but hmsclient also imports thrift 
> components, so it forces you to have them installed.
> I moved the import in this PR: 
> https://github.com/apache/incubator-airflow/pull/4080
> To give you a bit more information on why i even bother to do such a change, 
> we are having problems with the new hive dependencies of airflow 1.10, 
> particularly new version of pyhive. I described the problem 
> [here|https://github.com/dropbox/PyHive/issues/240], seems like a combination 
> of docker environment with newest versions of these libraries. We opted to 
> rollback HiveServer2 hook to use the old dependencies, among them 
> {{thrift==0.9.3}}, and hmsclient requires newer version of thrift. If you by 
> chance have any clue on how we can diagnose our problem, please let me know.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3532) Apache Airflow > 1.8 doesn't work with Celery 4.x and doesn't work with Celery 3.x using transports other than amqp

2018-12-19 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724817#comment-16724817
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3532:


So you have HAProxy running on each server, proxying redis and postgres 
traffic from localhost:6400 to redis-host:6400 etc.? If you have problems 
again I'd try going direct, without HAProxy, at least to isolate possible 
sources of problems.

> Apache Airflow > 1.8 doesn't work with Celery 4.x and doesn't work with 
> Celery 3.x using transports other than amqp
> ---
>
> Key: AIRFLOW-3532
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3532
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: celery
>Affects Versions: 1.9.0, 1.10.0, 1.10.1
>Reporter: Valeriy
>Priority: Major
>  Labels: celery
>
> I needed Airflow > 1.8 with all the necessary fixes in the cluster 
> configuration.
> I'm using Airflow 1.10.0/1.10.1 + Celery 4.2.0/4.2.1 and have problems with 
> running DAGs. After some time, all DAGs get stuck in the queued state. After 
> restarting the worker, the problem is solved. I tried for a long time to find 
> solutions to this problem (logs in DEBUG mode showed nothing) and found a 
> number of discussions, one of them: 
> [https://stackoverflow.com/questions/43524457/airflow-tasks-queued-but-not-running]
> *{color:#d04437}As a result, we conclude that Airflow does not work with 
> Celery 4.x!{color}* The code is not adapted to Celery 4.x.
> I decided to try Celery 3.x and damn, I got a WARNING:
> {code:java}
> [2018-12-17 15:43:11,136: WARNING/MainProcess] 
> /home/hadoop/youla_airflow/lib/python3.6/site-packages/celery/apps/worker.py:161:
>  CDeprecationWarning:
> Starting from version 3.2 Celery will refuse to accept pickle by default.
> The pickle serializer is a security concern as it may give attackers
> the ability to execute any command.  It's important to secure
> your broker from unauthorized access when using pickle, so we think
> that enabling pickle should require a deliberate action and not be
> the default choice.
> If you depend on pickle then you should set a setting to disable this
> warning and to be sure that everything will continue working
> when you upgrade to Celery 3.2::
>     CELERY_ACCEPT_CONTENT = ['pickle', 'json', 'msgpack', 'yaml']
> You must only enable the serializers that you will actually use.
>   warnings.warn(CDeprecationWarning(W_PICKLE_DEPRECATED))
>  
>  -------------- celery@myserver v3.1.26.post2 (Cipater)
> ---- **** -----
> --- * ***  * -- Linux-3.10.0-862.14.4.el7.x86_64-x86_64-with-centos-7.5.1804-Core
> -- * - **** ---
> - ** ---------- [config]
> - ** ---------- .> app:         airflow.executors.celery_executor:0x7f0093b86470
> - ** ---------- .> transport:   amqp://guest:**@localhost:5672//
> - ** ---------- .> results:     disabled://
> - *** --- * --- .> concurrency: 16 (prefork)
> -- ******* ----
> --- ***** ----- [queues]
>  -------------- .> default          exchange=default(direct) key=default
> {code}
> Airflow > 1.8 with Celery 3.x flatly refuses to use a transport other than 
> amqp. This was already written about here: 
> [http://mail-archives.apache.org/mod_mbox/airflow-commits/201801.mbox/%3cjira.13129586.1515519138000.610058.1515519180...@atlassian.jira%3E]
> My Airflow config:
> {code:java}
> [celery]
> celery_app_name = airflow.executors.celery_executor
> worker_concurrency = 16
> worker_log_server_port = 8793
> broker_url = redis://localhost:6400/0
> result_backend = db+postgres://airflow:pass@localhost:5434/airflow
> flower_host = 0.0.0.0
> flower_url_prefix =
> flower_port = 
> default_queue = default
> celery_config_options = 
> airflow.config_templates.default_celery.DEFAULT_CELERY_CONFIG
> {code}
> How do I run Airflow > 1.8 with Celery as a Redis broker? Is that possible?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2428) Add AutoScalingRole key to emr_hook

2018-12-18 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-2428.

   Resolution: Fixed
Fix Version/s: (was: 1.10.0)

> Add AutoScalingRole key to emr_hook
> ---
>
> Key: AIRFLOW-2428
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2428
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Reporter: Kyle Hamlin
>Priority: Minor
>
> Need to be able to pass the `AutoScalingRole` param to the `run_job_flow` 
> method for EMR autoscaling to work.
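
For reference, a sketch of how this would be used once supported, with the
key passed through job_flow_overrides following boto3's run_job_flow request
syntax (role names here are illustrative):

{code}
from airflow.contrib.operators.emr_create_job_flow_operator import (
    EmrCreateJobFlowOperator,
)

create_cluster = EmrCreateJobFlowOperator(
    task_id='create_cluster',
    job_flow_overrides={
        'Name': 'my-cluster',                              # illustrative
        'AutoScalingRole': 'EMR_AutoScaling_DefaultRole',  # the key this issue adds
        'ServiceRole': 'EMR_DefaultRole',
        'JobFlowRole': 'EMR_EC2_DefaultRole',
    },
)
{code}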



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3535) Airflow should collect display names, not Firstname / Lastname

2018-12-17 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16723225#comment-16723225
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3535:


https://github.com/dpgaspar/Flask-AppBuilder/blob/d95f0ed934272629ee44ad3241646fa7ba09cdf8/flask_appbuilder/console.py#L102
 doesn't look promising :(

But we could overload the meaning of one of the name columns and change the 
templates etc. so that we just use one column? Maybe?

> Airflow should collect display names, not Firstname / Lastname
> --
>
> Key: AIRFLOW-3535
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3535
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: database, ui
>Affects Versions: 1.10.1
>Reporter: James Meickle
>Priority: Minor
>
> We use Google OAuth to provision our Airflow accounts. This creates "user 
> names" of "google_12345", with the corresponding email address. The first and 
> last name of the user are pulled into the corresponding Airflow fields.
> In general, though, First Name / Last Name is not considered a good pattern 
> for user systems unless they are actually critical to handle business logic. 
> Further reading on problems that can cause here: 
> https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/
> We should condense these fields into a totally freeform "Display Name", and 
> use that more consistently in the UI. For example, in AIRFLOW-3442, an 
> internal username is displayed rather than a display name. (In the case of an 
> audit log, the right value is probably: `Display Name (internal_name)`.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3535) Airflow should collect display names, not Firstname / Lastname

2018-12-17 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16723222#comment-16723222
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3535:


100% agreed.

We'll need to check if Flask-AppBuilder supports this though :(

> Airflow should collect display names, not Firstname / Lastname
> --
>
> Key: AIRFLOW-3535
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3535
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: database, ui
>Affects Versions: 1.10.1
>Reporter: James Meickle
>Priority: Minor
>
> We use Google OAuth to provision our Airflow accounts. This creates "user 
> names" of "google_12345", with the corresponding email address. The first and 
> last name of the user are pulled into the corresponding Airflow fields.
> In general, though, First Name / Last Name is not considered a good pattern 
> for user systems unless they are actually critical to handle business logic. 
> Further reading on problems that can cause here: 
> https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/
> We should condense these fields into a totally freeform "Display Name", and 
> use that more consistently in the UI. For example, in AIRFLOW-3442, an 
> internal username is displayed rather than a display name. (In the case of an 
> audit log, the right value is probably: `Display Name (internal_name)`.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3532) Apache Airflow > 1.8 doesn't work with Celery 4.x and doesn't work with Celery 3.x using transports other than amqp

2018-12-17 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16723176#comment-16723176
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3532:


I can categorically say that Airflow 1.8.2, Airflow 1.9.0 and Airflow 1.10.1 
work with Celery 4 on Redis as a transport, as that is how I run Airflow.

Are you running the scheduler and the workers on the same node? If not, your 
broker_url and result_backend will be incorrect.

You have config for Flower - what does Flower show for the number of active 
nodes?

What is shown in the output when you run {{airflow worker}}?

Finally, on a worker node, what do the following commands (run in the same 
environment/as the Airflow user etc.) show:

{code}
celery -A airflow.executors.celery_executor inspect ping
celery -A airflow.executors.celery_executor inspect active
celery -A airflow.executors.celery_executor inspect report
celery -A airflow.executors.celery_executor inspect stats

{code}

Also run the same commands from the scheduler - you should get similar or 
identical output.

> Apache Airflow > 1.8 doesn't work with Celery 4.x and doesn't work with 
> Celery 3.x using transports other than amqp
> ---
>
> Key: AIRFLOW-3532
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3532
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: celery
>Affects Versions: 1.9.0, 1.10.0, 1.10.1
>Reporter: Valeriy
>Priority: Major
>  Labels: celery
>
> I needed Airflow > 1.8 with all the necessary fixes in the cluster 
> configuration.
> I'm using Airflow 1.10.0/1.10.1 + Celery 4.2.0/4.2.1 and have problems with 
> running DAGs. After some time, all DAGs get stuck in the queued state. After 
> restarting the worker, the problem is solved. I tried for a long time to find 
> solutions to this problem (logs in DEBUG mode showed nothing) and found a 
> number of discussions, one of them: 
> [https://stackoverflow.com/questions/43524457/airflow-tasks-queued-but-not-running]
> *{color:#d04437}As a result, we conclude that Airflow does not work with 
> Celery 4.x!{color}* The code is not adapted to Celery 4.x.
> I decided to try Celery 3.x and damn, I got a WARNING:
> {code:java}
> [2018-12-17 15:43:11,136: WARNING/MainProcess] 
> /home/hadoop/youla_airflow/lib/python3.6/site-packages/celery/apps/worker.py:161:
>  CDeprecationWarning:
> Starting from version 3.2 Celery will refuse to accept pickle by default.
> The pickle serializer is a security concern as it may give attackers
> the ability to execute any command.  It's important to secure
> your broker from unauthorized access when using pickle, so we think
> that enabling pickle should require a deliberate action and not be
> the default choice.
> If you depend on pickle then you should set a setting to disable this
> warning and to be sure that everything will continue working
> when you upgrade to Celery 3.2::
>     CELERY_ACCEPT_CONTENT = ['pickle', 'json', 'msgpack', 'yaml']
> You must only enable the serializers that you will actually use.
>   warnings.warn(CDeprecationWarning(W_PICKLE_DEPRECATED))
>  
>  -------------- celery@myserver v3.1.26.post2 (Cipater)
> ---- **** -----
> --- * ***  * -- Linux-3.10.0-862.14.4.el7.x86_64-x86_64-with-centos-7.5.1804-Core
> -- * - **** ---
> - ** ---------- [config]
> - ** ---------- .> app:         airflow.executors.celery_executor:0x7f0093b86470
> - ** ---------- .> transport:   amqp://guest:**@localhost:5672//
> - ** ---------- .> results:     disabled://
> - *** --- * --- .> concurrency: 16 (prefork)
> -- ******* ----
> --- ***** ----- [queues]
>  -------------- .> default          exchange=default(direct) key=default
> {code}
> Airflow > 1.8 with Celery 3.x flatly refuses to use a transport other than 
> amqp. This was already written about here: 
> [http://mail-archives.apache.org/mod_mbox/airflow-commits/201801.mbox/%3cjira.13129586.1515519138000.610058.1515519180...@atlassian.jira%3E]
> My Airflow config:
> {code:java}
> [celery]
> celery_app_name = airflow.executors.celery_executor
> worker_concurrency = 16
> worker_log_server_port = 8793
> broker_url = redis://localhost:6400/0
> result_backend = db+postgres://airflow:pass@localhost:5434/airflow
> flower_host = 0.0.0.0
> flower_url_prefix =
> flower_port = 
> default_queue = default
> celery_config_options = 
> airflow.config_templates.default_celery.DEFAULT_CELERY_CONFIG
> {code}
> How do I run Airflow > 1.8 with Celery as a Redis broker? Is that possible?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-698) dag_run "scheduled" property should be its own DB column

2018-12-17 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-698.
---
Resolution: Fixed

This has since been changed - the scheduler will now pick up any dag run 
_unless_ its run_id starts with {{backfill_}}
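
For example, a manually triggered run created like this (the run_id value is
illustrative) will now be picked up by the scheduler:

{code}
airflow trigger_dag my_dag --run_id manual__2018-12-17T00:00:00
{code}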

> dag_run "scheduled" property should be its own DB column
> -
>
> Key: AIRFLOW-698
> URL: https://issues.apache.org/jira/browse/AIRFLOW-698
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Dan Davydov
>Priority: Major
>  Labels: beginner, starter
>
> The airflow scheduler only executes dag_runs that have a run_id starting 
> with "scheduled__". This can be very confusing, especially when manually 
> creating a dagrun and forgetting the "scheduled__" prefix. The "scheduled" 
> part should be pulled into a separate column so that it is very clear in the 
> UI that a user is creating a dag run that isn't scheduled.
> cc [~maxime.beauche...@apache.org]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2633) Retry loop on AWSBatchOperator won't quit

2018-12-17 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-2633.

   Resolution: Duplicate
Fix Version/s: (was: 2.0.0)

> Retry loop on AWSBatchOperator won't quit
> -
>
> Key: AIRFLOW-2633
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2633
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Affects Versions: 2.0.0
>Reporter: Sebastian Schwartz
>Assignee: Sebastian Schwartz
>Priority: Major
>  Labels: patch, pull-request-available
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The exponential backoff retry loop that is a fallback strategy for the 
> AWSBatchOperator when polling for job success does not quit until the 
> maximum number of retries is reached, due to a control flow error. This is a 
> simple one-line fix. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3001) Accumulative tis slow allocation of new schedule

2018-12-15 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-3001:
---
Fix Version/s: (was: 2.0.0)
   1.10.2

> Accumulative tis slow allocation of new schedule
> 
>
> Key: AIRFLOW-3001
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3001
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 1.10.0
>Reporter: Jason Kim
>Assignee: Jason Kim
>Priority: Major
> Fix For: 1.10.2
>
>
> I have created a very long-term schedule with a short interval (2-3 years at 
> a 10-minute interval), so the dag grows bigger and bigger as scheduling goes 
> on.
> Finally, at a critical point (I don't know exactly when it is), the 
> allocation of new task_instances gets slow and then almost stops.
> I found that at this point many slow-query logs had occurred (I was using 
> mysql as the meta repository), with queries like this:
> "SELECT * FROM task_instance WHERE dag_id = 'some_dag_id' AND execution_date 
> = '2018-09-01 00:00:00'"
> I could resolve this issue by adding a new index consisting of dag_id and 
> execution_date, so I want the 1.10 branch to be modified to create the 
> task_instance table with that index.
> Thanks.
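
A sketch of the composite index described above, written as an Alembic
migration (the index name is illustrative):

{code}
from alembic import op


def upgrade():
    # composite index covering the slow dag_id + execution_date lookups
    op.create_index('ti_dag_date', 'task_instance',
                    ['dag_id', 'execution_date'], unique=False)


def downgrade():
    op.drop_index('ti_dag_date', table_name='task_instance')
{code}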



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2747) Explicit re-schedule of sensors

2018-12-15 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-2747:
---
Fix Version/s: (was: 2.0.0)
   1.10.2

> Explicit re-schedule of sensors
> ---
>
> Key: AIRFLOW-2747
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2747
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core, operators
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Stefan Seelmann
>Assignee: Stefan Seelmann
>Priority: Major
> Fix For: 1.10.2
>
> Attachments: Screenshot_2018-07-12_14-10-24.png, 
> Screenshot_2018-09-16_20-09-28.png, Screenshot_2018-09-16_20-19-23.png, 
> google_apis-23_r01.zip
>
>
> By default sensors block a worker and just sleep between pokes. This is very 
> inefficient, especially when there are many long-running sensors.
> There is a hacky workaround: setting a small timeout value and a high retry 
> number. But that has drawbacks:
>  * Errors raised by sensors are hidden and the sensor retries too often
>  * The sensor is retried in a fixed time interval (with optional exponential 
> backoff)
>  * There are many attempts and many log files are generated
>  I'd like to propose an explicit reschedule mechanism:
>  * A new "reschedule" flag for sensors, if set to True it will raise an 
> AirflowRescheduleException that causes a reschedule.
>  * AirflowRescheduleException contains the (earliest) re-schedule date.
>  * Reschedule requests are recorded in new `task_reschedule` table and 
> visualized in the Gantt view.
>  * A new TI dependency that checks if a sensor task is ready to be 
> re-scheduled.
> Advantages:
>  * This change is backward compatible. Existing sensors behave like before. 
> But it's possible to set the "reschedule" flag.
>  * The poke_interval, timeout, and soft_fail parameters are still respected 
> and used to calculate the next schedule time.
>  * Custom sensor implementations can even define the next sensible schedule 
> date by raising AirflowRescheduleException themselves.
>  * Existing TimeSensor and TimeDeltaSensor can also be changed to be 
> rescheduled when the time is reached.
>  * This mechanism can also be used by non-sensor operators (but then the new 
> ReadyToRescheduleDep has to be added to deps or BaseOperator).
> Design decisions and caveats:
>  * When handling AirflowRescheduleException the `try_number` is decremented. 
> That means that subsequent runs use the same try number and write to the same 
> log file.
>  * Sensor TI dependency check now depends on `task_reschedule` table. However 
> only the BaseSensorOperator includes the new ReadyToRescheduleDep for now.
> Open questions and TODOs:
>  * Should a dedicated state `UP_FOR_RESCHEDULE` be used instead of setting 
> the state back to `NONE`? This would require more changes in scheduler code 
> and especially in the UI, but the state of a task would be more explicit and 
> more transparent to the user.
>  * Add example/test for a non-sensor operator
>  * Document the new feature
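
A usage sketch under the proposed mechanism (shown with the mode='reschedule'
spelling the flag eventually shipped as in 1.10.2; the sensor and path are
illustrative):

{code}
from airflow.contrib.sensors.file_sensor import FileSensor

wait_for_file = FileSensor(
    task_id='wait_for_file',
    filepath='/data/input.csv',  # illustrative path
    poke_interval=300,           # seconds between pokes
    mode='reschedule',           # free the worker slot between pokes
    dag=dag,                     # assumes a dag object in scope
)
{code}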



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-3392) Add index on dag_id in sla_miss table

2018-12-15 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-3392.

   Resolution: Fixed
Fix Version/s: (was: 2.0.0)
   1.10.2

> Add index on dag_id in sla_miss table
> -
>
> Key: AIRFLOW-3392
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3392
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
> Fix For: 1.10.2
>
>
> The select queries on sla_miss table produce a great % of DB traffic and thus 
> made the DB CPU usage unnecessarily high. It would be a low hanging fruit to 
> add an index and reduce the load.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (AIRFLOW-3392) Add index on dag_id in sla_miss table

2018-12-15 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor reopened AIRFLOW-3392:


Changing fix version

> Add index on dag_id in sla_miss table
> -
>
> Key: AIRFLOW-3392
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3392
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
> Fix For: 1.10.2
>
>
> The select queries on sla_miss table produce a great % of DB traffic and thus 
> made the DB CPU usage unnecessarily high. It would be a low hanging fruit to 
> add an index and reduce the load.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (AIRFLOW-2747) Explicit re-schedule of sensors

2018-12-15 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor reopened AIRFLOW-2747:


Some more work is needed on this issue - the logs are wrong and the 
visibility in the UI is now worse.

> Explicit re-schedule of sensors
> ---
>
> Key: AIRFLOW-2747
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2747
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core, operators
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Stefan Seelmann
>Assignee: Stefan Seelmann
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: Screenshot_2018-07-12_14-10-24.png, 
> Screenshot_2018-09-16_20-09-28.png, Screenshot_2018-09-16_20-19-23.png, 
> google_apis-23_r01.zip
>
>
> By default sensors block a worker and just sleep between pokes. This is very 
> inefficient, especially when there are many long-running sensors.
> There is a hacky workaround: setting a small timeout value and a high retry 
> number. But that has drawbacks:
>  * Errors raised by sensors are hidden and the sensor retries too often
>  * The sensor is retried in a fixed time interval (with optional exponential 
> backoff)
>  * There are many attempts and many log files are generated
>  I'd like to propose an explicit reschedule mechanism:
>  * A new "reschedule" flag for sensors, if set to True it will raise an 
> AirflowRescheduleException that causes a reschedule.
>  * AirflowRescheduleException contains the (earliest) re-schedule date.
>  * Reschedule requests are recorded in new `task_reschedule` table and 
> visualized in the Gantt view.
>  * A new TI dependency that checks if a sensor task is ready to be 
> re-scheduled.
> Advantages:
>  * This change is backward compatible. Existing sensors behave like before. 
> But it's possible to set the "reschedule" flag.
>  * The poke_interval, timeout, and soft_fail parameters are still respected 
> and used to calculate the next schedule time.
>  * Custom sensor implementations can even define the next sensible schedule 
> date by raising AirflowRescheduleException themselves.
>  * Existing TimeSensor and TimeDeltaSensor can also be changed to be 
> rescheduled when the time is reached.
>  * This mechanism can also be used by non-sensor operators (but then the new 
> ReadyToRescheduleDep has to be added to deps or BaseOperator).
> Design decisions and caveats:
>  * When handling AirflowRescheduleException the `try_number` is decremented. 
> That means that subsequent runs use the same try number and write to the same 
> log file.
>  * Sensor TI dependency check now depends on `task_reschedule` table. However 
> only the BaseSensorOperator includes the new ReadyToRescheduleDep for now.
> Open questions and TODOs:
>  * Should a dedicated state `UP_FOR_RESCHEDULE` be used instead of setting 
> the state back to `NONE`? This would require more changes in scheduler code 
> and especially in the UI, but the state of a task would be more explicit and 
> more transparent to the user.
>  * Add example/test for a non-sensor operator
>  * Document the new feature



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-1552) Airflow Filter_by_owner not working with password_auth

2018-12-15 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-1552.

Resolution: Fixed

> Airflow Filter_by_owner not working with password_auth
> --
>
> Key: AIRFLOW-1552
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1552
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: configuration
>Affects Versions: 1.8.0
> Environment: CentOS , python 2.7
>Reporter: raghu ram reddy
>Assignee: Thomas Brockmeier
>Priority: Major
> Fix For: 1.10.2
>
>
> Airflow's filter_by_owner parameter is not working with password_auth.
> I created a sample user using the below code from the airflow documentation 
> and enabled password_auth.
> I'm able to login as the user created, but by default this user is a 
> superuser and there is no way to modify it; by default all users created by 
> PasswordUser are superusers.
> {code}
> import airflow
> from airflow import models, settings
> from airflow.contrib.auth.backends.password_auth import PasswordUser
> user = PasswordUser(models.User())
> user.username = 'test1'
> user.password = 'test1'
> user.is_superuser()
> session = settings.Session()
> session.add(user)
> session.commit()
> session.close()
> exit()
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3520) RBAC UI seems to have bug in master branch

2018-12-15 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722193#comment-16722193
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3520:


This seems to bite a few people on master. I wonder if it's worth us putting a 
check somewhere in the webserver start-up code that checks if the expected 
compiled assets exist?
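
A rough sketch of such a check (the manifest path is an assumption based on
the www_rbac webpack layout):

{code}
import os


def check_compiled_assets(www_rbac_dir):
    """Fail fast at webserver start-up if the webpack assets are missing."""
    manifest = os.path.join(www_rbac_dir, 'static', 'dist', 'manifest.json')
    if not os.path.exists(manifest):
        raise SystemExit(
            'RBAC UI assets not found. Compile them (e.g. via '
            '`python setup.py compile_assets`) or install a released package.'
        )
{code}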

> RBAC UI seems to have bug in master branch
> --
>
> Key: AIRFLOW-3520
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3520
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Tao Feng
>Priority: Major
> Attachments: Screen Shot 2018-12-14 at 10.58.07 PM.png
>
>
> !Screen Shot 2018-12-14 at 10.58.07 PM.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-3518) Toposort is very slow

2018-12-15 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-3518.

   Resolution: Fixed
Fix Version/s: 1.10.2

> Toposort is very slow
> -
>
> Key: AIRFLOW-3518
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3518
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: Niels Zeilemaker
>Assignee: Niels Zeilemaker
>Priority: Major
> Fix For: 1.10.2
>
>
> At a client we've discovered that for larger DAGs toposort is very slow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3513) Pakegecloud

2018-12-13 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720642#comment-16720642
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3513:


It is not clear what you are asking for here.

> Pakegecloud
> ---
>
> Key: AIRFLOW-3513
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3513
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: api, authentication, configuration, core, database, 
> Dataflow, db, docker
>Reporter: pakegecloud.atlassian.net
>Priority: Major
>   Original Estimate: 1,311h
>  Remaining Estimate: 1,311h
>
> pakegecloud.atlassian.net



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-3176) Duration tooltip on Tree View of Tasks

2018-12-13 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-3176.

Resolution: Duplicate

> Duration tooltip on Tree View of Tasks
> --
>
> Key: AIRFLOW-3176
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3176
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: ui
>Affects Versions: 1.10.0
>Reporter: Nicolás Kittsteiner
>Priority: Minor
>  Labels: easy-fix
> Attachments: Screen Shot 2018-10-09 at 13.27.42.png
>
>
> On the Tree View of the UI, hovering over the task squares shows a tooltip 
> with details of the task. The duration field could be in a more friendly 
> format, like HH:MM:SS.
> Thanks :)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-3510) DockerOperator on OSX: Mounts denied. The path /var/folders/mk/xxx is not shared from OS X and is not known to Docker.\r\nYou can configure shared paths from Docker ->

2018-12-13 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-3510.

Resolution: Duplicate

Duplicate of AIRFLOW-1381 which has an abandoned PR - if you want to pick it up 
and open a new PR that would be ace.

> DockerOperator on OSX: Mounts denied. The path /var/folders/mk/xxx is not 
> shared from OS X and is not known to Docker.\r\nYou can configure shared 
> paths from Docker -> Preferences... -> File Sharing.
> ---
>
> Key: AIRFLOW-3510
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3510
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: docker, operators
>Affects Versions: 1.10.1
>Reporter: Nar Kumar Chhantyal
>Assignee: Nar Kumar Chhantyal
>Priority: Major
>  Labels: bug
> Fix For: 1.10.2
>
>
> {{I get this when using DockerOperator on OSX}}
> {code:java}
> Mounts denied: \r\nThe path 
> /var/folders/mk/_n3w1bts11bg3wvy1ln5d7c4k9_mgh/T/airflowtmpj94b7r9v\r\nis not 
> shared from OS X and is not known to Docker.\r\nYou can configure shared 
> paths from Docker -> Preferences... -> File Sharing.\r\nSee 
> https://docs.docker.com/docker-for-mac/osxfs/#namespaces for more info.\r\n.'
> {code}
> {{This is a well-known issue with Docker for Mac: 
> [https://stackoverflow.com/questions/45122459/docker-mounts-denied-the-paths-are-not-shared-from-os-x-and-are-not-known]}}
> {{The solution mentioned there doesn't work because Airflow always creates a 
> directory with a cryptic name like:}}
> {code:java}
> var/folders/mk/_n3w1bts11bg3wvy1ln5d7c4k9_mgh/T/airflowtmpj94b7r9v{code}
> A solution could be to pass a directory name to TemporaryDirectory. I will 
> send a patch later.
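
A sketch of the proposed change (host_tmp_dir is the parameter name the
eventual fix used; shown here as an assumption):

{code}
from tempfile import TemporaryDirectory

# Create the host-side temp dir under a path that is shared with Docker for
# Mac (listed in Docker -> Preferences... -> File Sharing), e.g. /tmp.
host_tmp_dir = '/tmp'
with TemporaryDirectory(prefix='airflowtmp', dir=host_tmp_dir) as tmp_dir:
    pass  # bind-mount tmp_dir into the container as before
{code}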



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3223) RBAC with GitHub Authentication

2018-12-13 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720011#comment-16720011
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3223:


FAB uses Flask-OpenID for OAuth, and that should be workable for GitHub. I 
don't know the exact config you'd need, but it feels like it should be doable 
without needing any code changes.

Useful docs:
https://flask-appbuilder.readthedocs.io/en/latest/security.html#authentication-openid
https://pythonhosted.org/Flask-OpenID/
https://help.github.com/articles/authorizing-oauth-apps/
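
As a starting point, a hedged sketch of what the OAuth route might look like
in webserver_config.py (keys follow the flask-oauthlib-era FAB convention;
the client id/secret values are placeholders):

{code}
from flask_appbuilder.security.manager import AUTH_OAUTH

AUTH_TYPE = AUTH_OAUTH
OAUTH_PROVIDERS = [{
    'name': 'github',
    'token_key': 'access_token',
    'icon': 'fa-github',
    'remote_app': {
        'consumer_key': 'GITHUB_CLIENT_ID',         # placeholder
        'consumer_secret': 'GITHUB_CLIENT_SECRET',  # placeholder
        'base_url': 'https://api.github.com/',
        'request_token_params': {'scope': 'read:user'},
        'access_token_url': 'https://github.com/login/oauth/access_token',
        'authorize_url': 'https://github.com/login/oauth/authorize',
    },
}]
{code}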

> RBAC with  GitHub Authentication
> 
>
> Key: AIRFLOW-3223
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3223
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: authentication
>Affects Versions: 1.10.0
>Reporter: Vikram Fugro
>Assignee: Sai Phanindhra
>Priority: Major
>
> With airflow 1.10 released with RBAC support, I was wondering how I 
> configure GitHub Auth with airflow's RBAC. In that case, I believe we don't 
> have to create any users using airflow. Are there any notes on this?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-3210) Changing defaults types in BigQuery Hook break BigQuery operator

2018-12-12 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-3210.

   Resolution: Fixed
Fix Version/s: 2.0.0

> Changing defaults types in BigQuery Hook break BigQuery operator
> 
>
> Key: AIRFLOW-3210
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3210
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib, gcp
>Reporter: Sergei Guschin
>Priority: Major
> Fix For: 2.0.0
>
>
> Changes in the BigQuery hook break the BigQuery operator's run_query() and 
> all DAGs which accommodate the current types (Boolean or value):
> [BigQuery operator 
> set|https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/bigquery_operator.py#L115-L121]:
> destination_dataset_table=False,
> udf_config=False,
> [New BigQuery hook 
> expects|https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/bigquery_hook.py#L645-L650]:
> (udf_config, 'userDefinedFunctionResources', None, list),
> (destination_dataset_table, 'destinationTable', None, dict),



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-352) filter_by_owner is not working when use ldap authentication

2018-12-12 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-352.
---
Resolution: Duplicate

> filter_by_owner is not working when use ldap authentication
> ---
>
> Key: AIRFLOW-352
> URL: https://issues.apache.org/jira/browse/AIRFLOW-352
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib, security, webserver
>Affects Versions: 1.7.1.3
> Environment: ubuntu 14.04 LTS ,  ldap without encryption 
>Reporter: peter pang
>Priority: Major
>  Labels: security
>
> I set airflow.cfg as follows:
> {noformat}
> [webserver]
> filter_by_owner = True
> authenticate = TRUE
> auth_backend = airflow.contrib.auth.backends.ldap_auth
> [ldap]
> uri = ldap://xx.xx.xx.xx
> user_filter = objectClass=*
> user_name_attr = uid
> superuser_filter = 
> memberOf=CN=airflow-super-users,OU=Groups,OU=RWC,OU=US,OU=NORAM,DC=example,DC=com
> data_profiler_filter = 
> memberOf=CN=airflow-data-profilers,OU=Groups,OU=RWC,OU=US,OU=NORAM,DC=example,DC=com
> bind_user = cn=admin,dc=example,dc=com
> bind_password = secret
> basedn = dc=example,dc=com
> cacert = /etc/ca/ldap_ca.crt
> search_scope=SUBTREE
> {noformat}
> then I ran the web UI, and I can log in with the superuser and the 
> data_profiler user. But after logging in with the data profiler user and 
> entering that user's home view, there are no dags listed with the same dag 
> owner. It seems the filter_by_owner setting is not working.
> Debugging into views.py --> class HomeView(AdminIndexView):
> {color:red}current_user.username{color} always gets {color:red}"None"{color}. 
> It seems we can't get the username directly.
> So, continuing to debug into ldap_auth.py --> class LdapUser(models.User):
> I added a method to return the username:
> {code}
> def get_username(self):
>     return self.user.username
> {code}
> then back in views.py, replacing 'current_user.username' with 
> {color:red}'current_user.get_username()'{color}, the user filter works now!
> I don't know exactly why, but the modification works...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-353) when multiple tasks removed update state fails

2018-12-12 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-353.
---
   Resolution: Fixed
Fix Version/s: 1.8.0

> when multiple tasks removed update state fails
> --
>
> Key: AIRFLOW-353
> URL: https://issues.apache.org/jira/browse/AIRFLOW-353
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core
>Reporter: Yiqing Jin
>Assignee: Yiqing Jin
>Priority: Major
> Fix For: 1.8.0
>
>
> if multiple tasks gets removed during dag run update_state may not work 
> properly



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-691) Add SSH keepalive option to ssh_hook

2018-12-12 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-691.
---
   Resolution: Fixed
Fix Version/s: 1.8.0

> Add SSH keepalive option to ssh_hook
> 
>
> Key: AIRFLOW-691
> URL: https://issues.apache.org/jira/browse/AIRFLOW-691
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib, hooks
>Reporter: Daniel van der Ende
>Assignee: Daniel van der Ende
>Priority: Minor
> Fix For: 1.8.0
>
>
> In situations with long running commands that are executed via the SSH_hook, 
> it is necessary to set the SSH keep alive option, with a corresponding 
> interval at which to ensure the connection stays alive.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3419) S3_hook.select_key is broken on Python3

2018-12-12 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719217#comment-16719217
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3419:


My memory is that Python2 doesn't have a distinction between bytes and str, so 
I don't see how this can affect Python2?
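
A minimal sketch of the decode-based fix proposed below; using b''.join and
decoding once keeps the behaviour identical on Python 2, where str is bytes:

{code}
# join the raw byte payloads first, then decode once
return b''.join(
    event['Records']['Payload']
    for event in response['Payload']
    if 'Records' in event
).decode('utf-8')
{code}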

> S3_hook.select_key is broken on Python3
> ---
>
> Key: AIRFLOW-3419
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3419
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: boto3, hooks
>Affects Versions: 1.10.1
>Reporter: Maria Rebelka
>Priority: Major
>
> Hello,
> Using select_key throws an error:
> {quote}text = S3Hook('aws_conn').select_key(key='my_key',
>                                      bucket_name='my_bucket',
>                                      expression='SELECT * FROM S3Object s',
>                                      expression_type='SQL',
>                                      input_serialization={'JSON': {'Type': 
> 'DOCUMENT'}},
>                                      output_serialization={'JSON': {}}){quote}
> Traceback (most recent call last):
> {quote}   File "db.py", line 31, in 
> output_serialization={'JSON': {}})
>   File "/usr/local/lib/python3.4/site-packages/airflow/hooks/S3_hook.py", 
> line 262, in select_key
> for event in response['Payload']
> TypeError: sequence item 0: expected str instance, bytes found{quote}
> Seems that the problem is in this line:
> S3_hook.py, line 262: return ''.join(event['Records']['Payload'])
> which probably should be: return 
> ''.join(event['Records']['Payload'].decode('utf-8'))
> From example in Amazon blog:
> https://aws.amazon.com/blogs/aws/s3-glacier-select/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (AIRFLOW-2217) Add Slack Webhook Hook/Operator

2018-12-12 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor reopened AIRFLOW-2217:


Re-opening to correct Fix Version

> Add Slack Webhook Hook/Operator
> ---
>
> Key: AIRFLOW-2217
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2217
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: contrib, hooks, operators
>Reporter: Daniel van der Ende
>Assignee: Daniel van der Ende
>Priority: Minor
> Fix For: 2.0.0
>
>
> Slack offers several ways to interact with it. Airflow currently has support 
> for the full Slack API with the [Slack 
> hook|https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/slack_hook.py]
>  This, however, can be a bit heavy-handed for simple posting of messages. 
> Slack also offers the possibility of using an [Incoming 
> webhook|https://api.slack.com/incoming-webhooks]
> It would be nice to have a hook in Airflow to use the incoming webhook API 
> offered by Slack. A lot of use cases for integrating Slack in Airflow are 
> oriented on posting error or success messages to a Slack channel based on the 
> outcome of a task instance. The Webhook API is perfect for this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-1236) Slack Operator uses deprecated API, and should use Connection

2018-12-12 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor closed AIRFLOW-1236.
--
Resolution: Duplicate

> Slack Operator uses deprecated API, and should use Connection
> -
>
> Key: AIRFLOW-1236
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1236
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Erik Forsberg
>Priority: Major
>
> The SlackAPIPostOperator requires legacy Slack API tokens, and hardcodes said 
> token into the DAG instead of using a Connection.
> Provide an operator that uses the Incoming Webhook API instead, and stores 
> the webhook URL in a Connection.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2217) Add Slack Webhook Hook/Operator

2018-12-12 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-2217.

   Resolution: Fixed
Fix Version/s: 1.10.0

> Add Slack Webhook Hook/Operator
> ---
>
> Key: AIRFLOW-2217
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2217
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: contrib, hooks, operators
>Reporter: Daniel van der Ende
>Assignee: Daniel van der Ende
>Priority: Minor
> Fix For: 2.0.0, 1.10.0
>
>
> Slack offers several ways to interact with it. Airflow currently has support 
> for the full Slack API with the [Slack 
> hook|https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/slack_hook.py]
>  This, however, can be a bit heavy-handed for simple posting of messages. 
> Slack also offers the possibility of using an [Incoming 
> webhook|https://api.slack.com/incoming-webhooks]
> It would be nice to have a hook in Airflow to use the incoming webhook API 
> offered by Slack. A lot of use cases for integrating Slack in Airflow are 
> oriented on posting error or success messages to a Slack channel based on the 
> outcome of a task instance. The Webhook API is perfect for this.
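
A usage sketch of the resulting operator (1.10-era contrib path; the
connection id and message are illustrative):

{code}
from airflow.contrib.operators.slack_webhook_operator import SlackWebhookOperator

notify_slack = SlackWebhookOperator(
    task_id='notify_slack',
    http_conn_id='slack_webhook',  # connection whose host holds the webhook URL
    message='Task finished :tada:',
    dag=dag,                       # assumes a dag object in scope
)
{code}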



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2805) Display user's local timezone and DAG's timezone on UI

2018-12-11 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-2805.

   Resolution: Fixed
Fix Version/s: 2.0.0

> Display user's local timezone and DAG's timezone on UI
> --
>
> Key: AIRFLOW-2805
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2805
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Verdan Mahmood
>Assignee: Verdan Mahmood
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: Screen Shot 2018-08-02 at 1.08.53 PM.png
>
>
> The UI currently only displays the UTC timezone, which is also not in a 
> human-readable form in all places. 
> Make all the date-times human readable. 
> Also, we need to display the user's local timezone and the DAG's timezone 
> along with UTC. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-1158) Multipart uploads to s3 cut off at nearest division

2018-12-11 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-1158.

Resolution: Fixed

Fixed by a different change that switched over to boto3

> Multipart uploads to s3 cut off at nearest division
> ---
>
> Key: AIRFLOW-1158
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1158
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: aws, hooks
>Reporter: Maksim Pecherskiy
>Assignee: Maksim Pecherskiy
>Priority: Minor
>
> When I try to upload a file of, say, 104MB, using multipart uploads with 
> 10MB chunks, I get 10 chunks of 10MB and that's it. The 4MB left over does 
> not get uploaded.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-532) DB API hook's insert rows sets autocommit non-generically

2018-12-11 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-532.
---
Resolution: Duplicate

> DB API hook's insert rows sets autocommit non-generically
> -
>
> Key: AIRFLOW-532
> URL: https://issues.apache.org/jira/browse/AIRFLOW-532
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks
>Reporter: Luke Rohde
>Assignee: Luke Rohde
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (AIRFLOW-2965) Add CLI command to find the next dag run.

2018-12-11 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor reopened AIRFLOW-2965:


Re-opening to set fix-version

> Add CLI command to find the next dag run.
> -
>
> Key: AIRFLOW-2965
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2965
> Project: Apache Airflow
>  Issue Type: Task
>Affects Versions: 1.10.0
>Reporter: jack
>Assignee: Xiaodong DENG
>Priority: Minor
>
> I have a dag with the following properties:
> {code:java}
> dag = DAG(
>     dag_id='mydag',
>     default_args=args,
>     schedule_interval='0 1 * * *',
>     max_active_runs=1,
>     catchup=False){code}
>  
>  
> This runs great.
> Last run is: 2018-08-26 01:00  (start date is 2018-08-27 01:00)
>  
> Now it's 2018-08-27 17:55 and I decided to change my dag to:
>  
> {code:java}
> dag = DAG(
>     dag_id='mydag',
>     default_args=args,
>     schedule_interval='0 23 * * *',
>     max_active_runs=1,
>     catchup=False){code}
>  
> Now, I have no idea when the next dag run will be.
> Will it be today at 23:00? I can't be sure when the cycle is complete. I'm 
> not even sure that this change will do what I wish.
> I'm sure you guys are experts and you can answer this question, but most of 
> us wouldn't know.
>  
> The scheduler has the knowledge of when the dag is available for running. 
> All I'm asking is to take that knowledge and create a CLI command to which I 
> will give the dag_id and it will tell me the next date/hour at which my dag 
> will be runnable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2965) Add CLI command to find the next dag run.

2018-12-11 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-2965.

   Resolution: Fixed
Fix Version/s: 1.10.2
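
For reference, a sketch of the invocation this resulted in (assuming the
1.10.2 CLI syntax; the dag id is illustrative):

{code}
airflow next_execution my_dag
{code}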

> Add CLI command to find the next dag run.
> -
>
> Key: AIRFLOW-2965
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2965
> Project: Apache Airflow
>  Issue Type: Task
>Affects Versions: 1.10.0
>Reporter: jack
>Assignee: Xiaodong DENG
>Priority: Minor
> Fix For: 1.10.2
>
>
> I have a dag with the following properties:
> {code:java}
> dag = DAG(
>     dag_id='mydag',
>     default_args=args,
>     schedule_interval='0 1 * * *',
>     max_active_runs=1,
>     catchup=False){code}
>  
>  
> This runs great.
> Last run is: 2018-08-26 01:00  (start date is 2018-08-27 01:00)
>  
> Now it's 2018-08-27 17:55 and I decided to change my dag to:
>  
> {code:java}
> dag = DAG(
>     dag_id='mydag',
>     default_args=args,
>     schedule_interval='0 23 * * *',
>     max_active_runs=1,
>     catchup=False){code}
>  
> Now, I have no idea when the next dag run will be.
> Will it be today at 23:00? I can't be sure when the cycle is complete. I'm 
> not even sure that this change will do what I wish.
> I'm sure you guys are experts and you can answer this question, but most of 
> us wouldn't know.
>  
> The scheduler has the knowledge of when the dag is available for running. 
> All I'm asking is to take that knowledge and create a CLI command to which I 
> will give the dag_id and it will tell me the next date/hour at which my dag 
> will be runnable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-610) Configuration parsing order doesn't work properly.

2018-12-10 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-610:
--
Fix Version/s: (was: 2.0.0)
   1.10.0

Was included in 1.10.0

> Configuration parsing order doesn't work properly.
> --
>
> Key: AIRFLOW-610
> URL: https://issues.apache.org/jira/browse/AIRFLOW-610
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Yongjun Park
>Assignee: Yongjun Park
>Priority: Major
> Fix For: 1.10.0
>
>
> I'm testing Airflow using the master branch, precisely 527e3ec.
> The documentation says configuration is evaluated in the following order:
> # environment variable
> # configuration in airflow.cfg
> # command in airflow.cfg
> # default
> However, it can't recognize *_cmd* options.
> I added a *sql_alchemy_conn_cmd* option, but it printed the error below: 
> {code}
> Traceback (most recent call last):
>   File "/usr/local/bin/airflow", line 6, in <module>
>     exec(compile(open(__file__).read(), __file__, 'exec'))
>   File "/usr/local/airflow/airflow/bin/airflow", line 17, in <module>
>     from airflow import configuration
>   File "/home/airflow/airflow/__init__.py", line 30, in <module>
>     from airflow import configuration as conf
>   File "/home/airflow/airflow/configuration.py", line 733, in <module>
>     conf.read(AIRFLOW_CONFIG)
>   File "/home/airflow/airflow/configuration.py", line 585, in read
>     self._validate()
>   File "/home/airflow/airflow/configuration.py", line 497, in _validate
>     self.get('core', 'executor')))
> {code}
> I think it is affected by the DEFAULT configuration. The evaluation order 
> should be changed to match what the documentation says.
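
For example, for the metadata DB connection the documented precedence would
check these sources in order (a sketch; the AIRFLOW__SECTION__KEY
environment-variable naming is the standard Airflow convention):

{code}
AIRFLOW__CORE__SQL_ALCHEMY_CONN=...       # 1. environment variable
sql_alchemy_conn = ...                    # 2. value in airflow.cfg [core]
sql_alchemy_conn_cmd = /path/to/command   # 3. command in airflow.cfg [core]
# 4. otherwise the built-in default
{code}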



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-3492) Addition of Unique ID required in every Airflow Task Log for debugging purpose

2018-12-10 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-3492.

Resolution: Won't Fix

Airflow supports running with a custom logging config (`logging_config_class` 
in airflow.cfg) - if you want this feature that is where this should go.

> Addition of Unique ID required in every Airflow Task Log for debugging purpose
> --
>
> Key: AIRFLOW-3492
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3492
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: logging
>Reporter: Tanuj Gupta
>Assignee: Tanuj Gupta
>Priority: Minor
>
> Use Case: Currently we are using Airflow as our main orchestrator and all of 
> task logs of a particular DAG are forwarded to Splunk. Due to security 
> reason, Airflow UI is not accessible to all the clients. So, we don't have 
> any way to co-relate all the logs of a particular task in a DAG while 
> searching in Splunk. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-1026) connection string using _cmd in airflow.cfg is broken

2018-12-10 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-1026.

Resolution: Duplicate

Fixed by AIRFLOW-610 which was in 1.10.0

> connection string using _cmd in airflow.cfg is broken
> --
>
> Key: AIRFLOW-1026
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1026
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: configuration
>Affects Versions: 1.8.0
>Reporter: Harish Singh
>Priority: Critical
> Fix For: 1.8.0
>
>
> sql_alchemy_conn_cmd = python ./pipeline/dags/configure.py
> I am expecting configure.py to be invoked.
> But it just throws:
>  "cannot use sqlite with the LocalExecutor"
> The connection string that my script "configure.py" would return is something 
> like this:
> mysql+mysqldb://username:**@mysqlhostname:3306/airflowdbname
> But after debugging, I found that, my script is not getting invoked at all.
> This is my airflow.cfg:
> executor = LocalExecutor
> sql_alchemy_conn_cmd = python ./pipeline/dags/configure.py 
> sql_alchemy_pool_size = 5
> sql_alchemy_pool_recycle = 3600
> I tried not using the script and directly hardcoding the conn_url
> sql_alchemy_conn = 
> mysql+mysqldb://username:**@mysqlhostname:3306/airflowdbname
> It works.
> But there is a regression bug if somebody wants to use "sql_alchemy_conn_cmd".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-1951) kerberos keytab , principal command line argument not getting passed to run function

2018-12-10 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-1951.

Resolution: Duplicate

> kerberos keytab , principal command line argument not getting passed to run 
> function
> 
>
> Key: AIRFLOW-1951
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1951
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli
>Reporter: Sanjay Pillai
>Assignee: Iuliia Volkova
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-914) Refactor BackfillJobTest.test_backfill_examples to not use all examples

2018-12-10 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-914:
--
Description: BackfillJobTest.test_backfill_examples takes over 5 minutes to 
execute on Travis. It should use a whitelist instead to run fewer backfills - 
but enough to ensure we still have coverage.  (was: 
BackfillJobTest.test_backfill_examples takes over 5 minutes to execute on 
Travis. It should use a whitelist instead or not run at all if on Travis.)

> Refactor BackfillJobTest.test_backfill_examples to not use all examples
> ---
>
> Key: AIRFLOW-914
> URL: https://issues.apache.org/jira/browse/AIRFLOW-914
> Project: Apache Airflow
>  Issue Type: Sub-task
>  Components: ci, tests
>Reporter: Bolke de Bruin
>Assignee: Kengo Seki
>Priority: Major
> Fix For: 1.10.0
>
>
> BackfillJobTest.test_backfill_examples takes over 5 minutes to execute on 
> Travis. It should use a whitelist instead to run fewer backfills - but enough 
> to ensure we still have coverage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments

2018-12-09 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-987.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> `airflow kerberos` ignores --keytab and --principal arguments
> -
>
> Key: AIRFLOW-987
> URL: https://issues.apache.org/jira/browse/AIRFLOW-987
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security
>Affects Versions: 1.8.0
> Environment: 1.8-rc5
>Reporter: Ruslan Dautkhanov
>Assignee: Iuliia Volkova
>Priority: Major
>  Labels: easyfix, kerberos, security
> Fix For: 2.0.0
>
>
> No matter which arguments I pass to `airflow kerberos`, 
> it always executes as `kinit -r 3600m -k -t airflow.keytab -c 
> /tmp/airflow_krb5_ccache airflow`
> So it fails with the expected "kinit: Keytab contains no suitable keys for 
> airf...@corp.some.com while getting initial credentials"
> Tried different arguments, -kt and --keytab, here's one of the runs (some 
> lines wrapped for readability):
> {noformat}
> $ airflow kerberos -kt /home/rdautkha/.keytab rdautkha...@corp.some.com
> [2017-03-14 23:50:11,523] {__init__.py:57} INFO - Using executor LocalExecutor
> [2017-03-14 23:50:12,069] {kerberos.py:43} INFO - Reinitting kerberos from 
> keytab: 
> kinit -r 3600m -k -t airflow.keytab -c /tmp/airflow_krb5_ccache airflow
> [2017-03-14 23:50:12,080] {kerberos.py:55} ERROR -
>  Couldn't reinit from keytab! `kinit' exited with 1.
> kinit: Keytab contains no suitable keys for airf...@corp.some.com 
> while getting initial credentials
> {noformat}
> 1.8-rc5



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2229) Scheduler cannot retry abrupt task failures within factory-generated DAGs

2018-12-07 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712555#comment-16712555
 ] 

Ash Berlin-Taylor commented on AIRFLOW-2229:


One thing to point out: if the fileloc in the DB points to a file that does not 
define the DAG, then running tasks from that DAG will also likely fail. This 
is because when a task is run the whole DAG bag is not loaded, just the DAG(s) 
from the defined fileloc.
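
For anyone hitting this, a minimal illustration of the factory pattern in 
question (file and function names hypothetical):

{code:python}
# dags/factory.py -- the file the scheduler actually parses
from dag_definitions import build_dag  # dag_definitions.py holds the DAG code

# The resulting DAG's fileloc points at dag_definitions.py (where the DAG
# code lives), not at this factory file, so re-importing that fileloc when
# a task runs finds no DAG.
dag = build_dag(dag_id="nightly_dataload")
{code}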

> Scheduler cannot retry abrupt task failures within factory-generated DAGs
> -
>
> Key: AIRFLOW-2229
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2229
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.9.0
>Reporter: James Meickle
>Priority: Major
>
> We had an issue where one of our tasks failed without the worker updating 
> state (unclear why, but let's assume it was an OOM), resulting in this series 
> of error messages:
> {noformat}
> Mar 20 14:27:05 airflow-core-i-0fc1f995414837b8b.stg.int.dynoquant.com 
> airflow_scheduler-stdout.log: [2018-03-20 14:27:04,993] {models.py:1595} 
> ERROR - Executor reports task instance %s finished (%s) although the task 
> says its %s. Was the task killed externally?
> Mar 20 14:27:05 airflow-core-i-0fc1f995414837b8b.stg.int.dynoquant.com 
> airflow_scheduler-stdout.log: NoneType
> Mar 20 14:27:05 airflow-core-i-0fc1f995414837b8b.stg.int.dynoquant.com 
> airflow_scheduler-stdout.log: [2018-03-20 14:27:04,994] {jobs.py:1435} ERROR 
> - Cannot load the dag bag to handle failure for <TaskInstance: 
> nightly_dataload.dummy_operator 2018-03-19 00:00:00 [queued]>. Setting task 
> to FAILED without callbacks or retries. Do you have enough resources?
> {noformat}
> Mysterious failures are not unexpected, because we are in the cloud, after 
> all. The concern is the last line: ignoring callbacks and retries, implying 
> that it's a lack of resources. However, the machine was totally underutilized 
> at the time.
> I dug into this code a bit more and as far as I can tell this error is 
> happening in this code path: 
> [https://github.com/apache/incubator-airflow/blob/1.9.0/airflow/jobs.py#L1427]
> {code:python}
> self.log.error(msg)
> try:
>     simple_dag = simple_dag_bag.get_dag(dag_id)
>     dagbag = models.DagBag(simple_dag.full_filepath)
>     dag = dagbag.get_dag(dag_id)
>     ti.task = dag.get_task(task_id)
>     ti.handle_failure(msg)
> except Exception:
>     self.log.error("Cannot load the dag bag to handle failure for %s"
>                    ". Setting task to FAILED without callbacks or "
>                    "retries. Do you have enough resources?", ti)
>     ti.state = State.FAILED
>     session.merge(ti)
>     session.commit()
> {code}
> I am not very familiar with this code, nor do I have time to attach a 
> debugger at the moment, but I think what is happening here is:
>  * I have a factory Python file, which imports and instantiates DAG code from 
> other files.
>  * The scheduler loads the DAGs from the factory file on the filesystem. It 
> gets a fileloc (as represented in the DB) not of the factory file, but of the 
> file it loaded code from.
>  * The scheduler makes a simple DAGBag from the instantiated DAGs.
>  * This line of code uses the simple DAG, which references the original DAG 
> object's fileloc, to create a new DAGBag object.
>  * This DAGBag looks for the original DAG in the fileloc, which is the file 
> containing that DAG's _code_, but is not actually importable by Airflow.
>  * An exception is raised trying to load the DAG from the DAGBag, which found 
> nothing.
>  * Handling of the task failure never occurs.
>  * The over-broad Exception code swallows all of the above occurring.
>  * There's just a generic error message that is not helpful to a system 
> operator.
> If this is the case, at minimum, the try/except should be rewritten to be 
> more graceful and to have a better error message. But I question whether this 
> level of DAGBag abstraction/indirection isn't making this failure case worse 
> than it needs to be; under normal conditions the scheduler is definitely able 
> to find the relevant factory-generated DAGs and execute tasks within them as 
> expected, even with the fileloc set to the code path and not the import path.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3452) Cannot view dags at /home page

2018-12-06 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712050#comment-16712050
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3452:


If you are checking out from Git you will need to follow the steps in 
https://github.com/apache/incubator-airflow/blob/master/CONTRIBUTING.md#setting-up-the-node--npm-javascript-environment-only-for-www_rbac
 to build the assets.

The {{display:none}} is odd, but I suspect something else is overriding that 
later?

> Cannot view dags at /home page
> --
>
> Key: AIRFLOW-3452
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3452
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Jinhui Zhang
>Priority: Blocker
>
> I checked out the latest master branch (commit 
> {{[9dce1f0|https://github.com/apache/incubator-airflow/commit/9dce1f0740f69af0ee86709a1a34a002b245aa3e]}})
>  and restarted my Airflow webserver. But I cannot view any dag at the home 
> page. I inspected the frontend code and found there's a 
> {{style="display:none;"}} on the {{main-content}}, and the source code says 
> so at 
> [https://github.com/apache/incubator-airflow/blob/master/airflow/www_rbac/templates/airflow/dags.html#L31]
>  . Is this a known issue? How should I fix it? 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3449) Airflow DAG parsing logs aren't written when using S3 logging

2018-12-06 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711560#comment-16711560
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3449:


I wonder if something else is going on - as I used and tested with default 
config + S3 logging in at least 1.10.0

> Airflow DAG parsing logs aren't written when using S3 logging
> -
>
> Key: AIRFLOW-3449
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3449
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging, scheduler
>Affects Versions: 1.10.0, 1.10.1
>Reporter: James Meickle
>Priority: Critical
>
> The default Airflow logging class provides some logs to stdout, some 
> to "task" folders, and some to "processor" folders (generated during DAG 
> parsing). The 1.10.0 logging update broke this, but only for users who are 
> also using S3 logging. This is because of this feature in the default logging 
> config file:
> {code:python}
> if REMOTE_LOGGING and REMOTE_BASE_LOG_FOLDER.startswith('s3://'):
> DEFAULT_LOGGING_CONFIG['handlers'].update(REMOTE_HANDLERS['s3'])
> {code}
> That replaces this functioning handlers block:
> {code:python}
> 'task': {
> 'class': 'airflow.utils.log.file_task_handler.FileTaskHandler',
> 'formatter': 'airflow',
> 'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
> 'filename_template': FILENAME_TEMPLATE,
> },
> 'processor': {
> 'class': 
> 'airflow.utils.log.file_processor_handler.FileProcessorHandler',
> 'formatter': 'airflow',
> 'base_log_folder': os.path.expanduser(PROCESSOR_LOG_FOLDER),
> 'filename_template': PROCESSOR_FILENAME_TEMPLATE,
> },
> {code}
> With this non-functioning block:
> {code:python}
> 'task': {
> 'class': 'airflow.utils.log.s3_task_handler.S3TaskHandler',
> 'formatter': 'airflow',
> 'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
> 's3_log_folder': REMOTE_BASE_LOG_FOLDER,
> 'filename_template': FILENAME_TEMPLATE,
> },
> 'processor': {
> 'class': 'airflow.utils.log.s3_task_handler.S3TaskHandler',
> 'formatter': 'airflow',
> 'base_log_folder': os.path.expanduser(PROCESSOR_LOG_FOLDER),
> 's3_log_folder': REMOTE_BASE_LOG_FOLDER,
> 'filename_template': PROCESSOR_FILENAME_TEMPLATE,
> },
> {code}
> The key issue here is that both "task" and "processor" are being given a 
> "S3TaskHandler" class to use for logging. But that is not a generic S3 class; 
> it's actually a subclass of FileTaskHandler! 
> https://github.com/apache/incubator-airflow/blob/1.10.1/airflow/utils/log/s3_task_handler.py#L26
> Since the template vars don't match the template string, the log path 
> evaluates to garbage. The handler then silently fails to log anything at all. 
> It is likely that anyone using a default-like logging config, plus the remote 
> S3 logging feature, stopped getting DAG parsing logs (either locally *or* in 
> S3) as of 1.10.0
> Commenting out the DAG parsing section of the S3 block fixed this on my 
> instance.
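> A sketch of that workaround, assuming the default logging config layout (swap 
> only the 'task' handler to S3 and leave 'processor' on the file-based class):
> {code:python}
> if REMOTE_LOGGING and REMOTE_BASE_LOG_FOLDER.startswith('s3://'):
>     # Replace only the 'task' handler; keep the working file-based
>     # 'processor' handler so DAG-parsing logs are still written.
>     DEFAULT_LOGGING_CONFIG['handlers']['task'] = REMOTE_HANDLERS['s3']['task']
> {code}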



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-1490) In order to get details on exceptions thrown by tasks, the onfailure callback needs an enhancement

2018-12-06 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-1490.

Resolution: Duplicate

AIRFLOW-843 merged and in 1.10.1 release

> In order to get details on exceptions thrown by tasks, the onfailure callback 
> needs an enhancement
> --
>
> Key: AIRFLOW-1490
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1490
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.8.0, 2.0.0
>Reporter: Steen Manniche
>Priority: Major
> Attachments: 
> 0001-AIRFLOW-1490-carry-exceptions-through-to-the-on_fail.patch
>
>
> The code called when an exception is thrown by a task receives information on 
> the exception thrown from the task, but fails to delegate this information to 
> the registered callbacks. 
> https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L1524
>  sends the context to the registered failure callback, but this context does 
> not include the thrown exception.
> The supplied patch proposes a non-API-breaking way of including the exception 
> in the context, in order to provide clients with the full exception type and 
> traceback.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3468) Refactor: Move KnownEventType out of models.py

2018-12-06 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711323#comment-16711323
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3468:


KnownEventType could possibly be deleted - I don't think anything actually uses 
it (anymore)? We should ask someone at AirBnB if they know what the idea was 
behind this model.

> Refactor: Move KnownEventType out of models.py
> --
>
> Key: AIRFLOW-3468
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3468
> Project: Apache Airflow
>  Issue Type: Task
>  Components: models
>Affects Versions: 1.10.1
>Reporter: Fokko Driesprong
>Priority: Major
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3457) Refactor: Move User out of models.py

2018-12-06 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711291#comment-16711291
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3457:


We can delete User I think, as the RBAC code path has its own User model.

> Refactor: Move User out of models.py
> 
>
> Key: AIRFLOW-3457
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3457
> Project: Apache Airflow
>  Issue Type: Task
>  Components: models
>Affects Versions: 1.10.1
>Reporter: Fokko Driesprong
>Priority: Major
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3458) Refactor: Move Connection out of models.py

2018-12-06 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711280#comment-16711280
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3458:


Connection is backed by a DB table - why do we want to move it out?

> Refactor: Move Connection out of models.py
> --
>
> Key: AIRFLOW-3458
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3458
> Project: Apache Airflow
>  Issue Type: Task
>  Components: models
>Affects Versions: 1.10.1
>Reporter: Fokko Driesprong
>Priority: Major
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3445) MariaDB explicit_defaults_for_timestamp = 1 Does not work.

2018-12-05 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710134#comment-16710134
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3445:


Ah I see. I can't quite parse that diff though - what format is it in?

> MariaDB explicit_defaults_for_timestamp = 1 Does not work.
> --
>
> Key: AIRFLOW-3445
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3445
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: database
>Affects Versions: 1.10.1
> Environment: Hosted VM on the Google Cloud Platform, Compute Engine:
> Machine type: n1-standard-2 (2 vCPUs, 7.5 GB memory)
> Operating System  CentOS
>Reporter: Conor Molloy
>Priority: Blocker
> Fix For: 1.10.2
>
>
> Running into an issue when running `airflow upgradedb` going from 1.9 -> 
> 1.10.1:
> {code:java}
> sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (1193, 
> "Unknown system variable 'explicit_defaults_for_timestamp'") [SQL: 'SELECT 
> @@explicit_defaults_for_timestamp']
> {code}
> I saw this link on the airflow website:
> https://airflow.readthedocs.io/en/stable/faq.html#how-to-fix-exception-global-variable-explicit-defaults-for-timestamp-needs-to-be-on-1
> Here it says you can set
> {code:java}
> explicit_defaults_for_timestamp = 1
> {code}
> in the _my.cnf_ file. However, I am using MariaDB, and when I add this to the 
> _my.cnf_ file the {noformat}mariadb.service{noformat} fails to start up. Has 
> anyone else come across this issue?
>  
> The output from
> {code:java}
> SHOW VARIABLES LIKE '%version%'
> {code}
> was
> {code:java}
> +-------------------------+----------------------+
> | Variable_name           | Value                |
> +-------------------------+----------------------+
> | innodb_version          | 5.5.59-MariaDB-38.11 |
> | protocol_version        | 10                   |
> | slave_type_conversions  |                      |
> | version                 | 5.5.60-MariaDB       |
> | version_comment         | MariaDB Server       |
> | version_compile_machine | x86_64               |
> | version_compile_os      | Linux                |
> +-------------------------+----------------------+
> {code}
> MariaDB does not have this variable, as it is a MySQL-only feature:
> https://mariadb.com/kb/en/library/system-variable-differences-between-mariadb-100-and-mysql-56/
> There may need to be a check for MariaDB before upgrading, as mentioned by 
> Ash in this Slack thread: 
> https://apache-airflow.slack.com/archives/CCQB40SQJ/p1543918149008100



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2941) example_http_operator.py Python 3.7 invalid syntax

2018-12-05 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-2941.

Resolution: Duplicate

> example_http_operator.py Python 3.7 invalid syntax
> --
>
> Key: AIRFLOW-2941
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2941
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Jon Davies
>Priority: Major
>
> example_http_operator.py fails on Python 3.7 with:
> {code:java}
> [2018-08-23 08:45:26,827] {models.py:365} ERROR - Failed to import: 
> /usr/local/lib/python3.7/site-packages/airflow/example_dags/example_http_operator.py
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.7/site-packages/airflow/models.py", line 362, 
> in process_file
> m = imp.load_source(mod_name, filepath)
>   File "/usr/local/lib/python3.7/imp.py", line 172, in load_source
> module = _load(spec)
>   File "", line 696, in _load
>   File "", line 677, in _load_unlocked
>   File "", line 728, in exec_module
>   File "", line 219, in _call_with_frames_removed
>   File 
> "/usr/local/lib/python3.7/site-packages/airflow/example_dags/example_http_operator.py",
>  line 27, in 
> from airflow.operators.http_operator import SimpleHttpOperator
>   File 
> "/usr/local/lib/python3.7/site-packages/airflow/operators/http_operator.py", 
> line 21, in 
> from airflow.hooks.http_hook import HttpHook
>   File "/usr/local/lib/python3.7/site-packages/airflow/hooks/http_hook.py", 
> line 23, in 
> import tenacity
>   File "/usr/local/lib/python3.7/site-packages/tenacity/__init__.py", line 352
> from tenacity.async import AsyncRetrying
>   ^
> SyntaxError: invalid syntax
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3445) MariaDB explicit_defaults_for_timestamp = 1 Does not work.

2018-12-05 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710106#comment-16710106
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3445:


Tangentially related I think, as MariaDB doesn't have an 
{{explicit_defaults_for_timestamp}} variable, so the check to enable it (and 
the per-session workaround proposed in AIRFLOW-3036) won't work - we simply 
need to not check on MariaDB and trust it has sensible defaults. I think?
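
Sketch of what I mean, assuming we can detect MariaDB from the server version 
string (names illustrative):

{code:python}
# Only run the explicit_defaults_for_timestamp check/workaround on real MySQL.
version = connection.execute("SELECT VERSION()").scalar()
if "MariaDB" not in version:
    connection.execute("SET SESSION explicit_defaults_for_timestamp = 1")
{code}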

> MariaDB explicit_defaults_for_timestamp = 1 Does not work.
> --
>
> Key: AIRFLOW-3445
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3445
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: database
>Affects Versions: 1.10.1
> Environment: Hosted VM on the Google Cloud Platform, Compute Engine:
> Machine type: n1-standard-2 (2 vCPUs, 7.5 GB memory)
> Operating System  CentOS
>Reporter: Conor Molloy
>Priority: Blocker
> Fix For: 1.10.2
>
>
> Running into an issue when running `airflow upgradedb` going from 1.9 -> 
> 1.10.1:
> {code:java}
> sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (1193, 
> "Unknown system variable 'explicit_defaults_for_timestamp'") [SQL: 'SELECT 
> @@explicit_defaults_for_timestamp']
> {code}
> I saw this link on the airflow website:
> https://airflow.readthedocs.io/en/stable/faq.html#how-to-fix-exception-global-variable-explicit-defaults-for-timestamp-needs-to-be-on-1
> Here it says you can set
> {code:java}
> explicit_defaults_for_timestamp = 1
> {code}
> in the _my.cnf_ file. However, I am using MariaDB, and when I add this to the 
> _my.cnf_ file the {noformat}mariadb.service{noformat} fails to start up. Has 
> anyone else come across this issue?
>  
> The output from
> {code:java}
> SHOW VARIABLES LIKE '%version%'
> {code}
> was
> {code:java}
> +-------------------------+----------------------+
> | Variable_name           | Value                |
> +-------------------------+----------------------+
> | innodb_version          | 5.5.59-MariaDB-38.11 |
> | protocol_version        | 10                   |
> | slave_type_conversions  |                      |
> | version                 | 5.5.60-MariaDB       |
> | version_comment         | MariaDB Server       |
> | version_compile_machine | x86_64               |
> | version_compile_os      | Linux                |
> +-------------------------+----------------------+
> {code}
> MariaDB does not have this variable, as it is a MySQL-only feature:
> https://mariadb.com/kb/en/library/system-variable-differences-between-mariadb-100-and-mysql-56/
> There may need to be a check for MariaDB before upgrading, as mentioned by 
> Ash in this Slack thread: 
> https://apache-airflow.slack.com/archives/CCQB40SQJ/p1543918149008100



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-3448) Syntax error when importing tenacity on python 3.7

2018-12-05 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-3448.

Resolution: Duplicate

> Syntax error when importing tenacity on python 3.7
> --
>
> Key: AIRFLOW-3448
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3448
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks
>Affects Versions: 1.10.1
> Environment: Python 3.7
>Reporter: Tiago Reis
>Assignee: Tiago Reis
>Priority: Major
>  Labels: easyfix
> Fix For: 1.10.2
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Tenacity is used for the retry mechanism in HTTP hooks. With the introduction 
> of {{async}} as a reserved keyword on Python 3.7, Tenacity 4.8.0 is broken 
> with a syntax error from {{from tenacity.async import AsyncRetrying}}. 
> Updating to 4.10.0 will solve this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2837) tenacity 4.8.0 breaks with python3.7

2018-12-05 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-2837.

Resolution: Duplicate

> tenacity 4.8.0 breaks with python3.7
> 
>
> Key: AIRFLOW-2837
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2837
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Adrian Bridgett
>Priority: Minor
>
> Tenacity 4.8.0 (as in setup.py) uses the reserved async keyword.
> Tenacity seems to lack a changelog; 4.12.0 seems to fix the problem, but I 
> don't know what breaking changes may have occurred.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3058) Airflow log & multi-threading

2018-12-05 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710064#comment-16710064
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3058:


Was this fixed by disabling buffered IO from your python script? (for anyone 
else finding this ticket in the future)
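
(For reference, the usual fix is to flush stdout explicitly from the threads, 
or to run the script unbuffered - illustrative snippet:)

{code:python}
print("Thread-1 page 0 finished", flush=True)  # Python 3: flush each print
# or run the whole script unbuffered:
#   python -u my_script.py
{code}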

> Airflow log & multi-threading
> -
>
> Key: AIRFLOW-3058
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3058
> Project: Apache Airflow
>  Issue Type: Task
>Reporter: jack
>Priority: Major
> Attachments: 456.PNG, Sni.PNG
>
>
> The airflow log does not show messages in real time when executing scripts 
> with multi-threading.
>  
> For example:
>  
> The left is the Airflow log time; the right is the actual time of the print 
> in my code. If I were to execute the script without Airflow, the console 
> would show the times on the right.
> !Sni.PNG!
> {code:java}
> 2018-09-13 14:19:17,325] {base_task_runner.py:98} INFO - Subtask: [2018-09-13 
> 14:19:17,289] {bash_operator.py:101} INFO - 2018-09-13 14:14:55.230044 
> Thread: Thread-1 Generate page: #0 run #0 with URL: 
> http://...=2=0=1000
> [2018-09-13 14:19:17,325] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,289] {bash_operator.py:101} INFO - 2018-09-13 
> 14:14:55.231635 Thread: Thread-2 Generate page: #1 run #0 with URL: 
> http://...=2=1000=1000
> [2018-09-13 14:19:17,326] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,289] {bash_operator.py:101} INFO - 2018-09-13 
> 14:14:55.233226 Thread: Thread-3 Generate page: #2 run #0 with URL: 
> http://...=2=2000=1000
> [2018-09-13 14:19:17,326] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,289] {bash_operator.py:101} INFO - 2018-09-13 
> 14:14:55.234020 Thread: Thread-4 Generate page: #3 run #0 with URL: 
> http://...=2=3000=1000
> [2018-09-13 14:19:17,326] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:43.100122 Thread: Thread-1 page 0 finished. Length is 1000
> [2018-09-13 14:19:17,326] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:43.100877 Thread: Thread-1 Generate page: #4 run #0 with URL: 
> http://...=2=4000=1000
> [2018-09-13 14:19:17,326] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:46.254536 Thread: Thread-3 page 2 finished. Length is 1000
> [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:46.255508 Thread: Thread-3 Generate page: #5 run #0 with URL: 
> http://...=2=5000=1000
> [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:51.096360 Thread: Thread-2 page 1 finished. Length is 1000
> [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:51.097269 Thread: Thread-2 Generate page: #6 run #0 with URL: 
> http://...=2=6000=1000
> [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:53.112621 Thread: Thread-4 page 3 finished. Length is 1000
> [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:15:53.113455 Thread: Thread-4 Generate page: #7 run #0 with URL: 
> http://...=2=7000=1000
> [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:16:37.345343 Thread: Thread-3 Generate page: #8 run #0 with URL: 
> http://...=2=8000=1000
> [2018-09-13 14:19:17,328] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,290] {bash_operator.py:101} INFO - 2018-09-13 
> 14:16:37.701201 Thread: Thread-2 Generate page: #9 run #0 with URL: 
> http://...=2=9000=1000
> [2018-09-13 14:19:17,328] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,291] {bash_operator.py:101} INFO - 2018-09-13 
> 14:16:47.283796 Thread: Thread-1 page 4 finished. Length is 1000
> [2018-09-13 14:19:17,328] {base_task_runner.py:98} INFO - Subtask: 
> [2018-09-13 14:19:17,291] {bash_operator.py:101} INFO - 2018-09-13 
> 14:17:27.169359 Thread: Thread-2 page 9 finished. Length is 1000
>  
> {code}
> This never happens when executing regular code; it happens only with 
> multi-threading. I have some other scripts where the airflow print appears 
> after more than 30 minutes.
>  
>  Check this one:
> hours of delay and then printing everything together. These are not real 
> time. The prints in the log have no correlation to the 

[jira] [Commented] (AIRFLOW-3447) Intended usage of ts_nodash macro broken with migration to new time system.

2018-12-05 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710063#comment-16710063
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3447:


[~kaxilnaik] one for 1.10.2? I think reverting the behaviour (by stripping 
off the TZ info) is the right thing to do here?
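
Roughly something like this for the macro - a sketch only, not the final patch:

{code:python}
# Strip the TZ info before formatting so ts_nodash stays alphanumeric
# (drops the trailing "+..." offset from the rendered value).
ts_nodash = (execution_date.replace(tzinfo=None)
             .isoformat()
             .replace('-', '')
             .replace(':', ''))
{code}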

> Intended usage of ts_nodash macro broken with migration to new time system.
> ---
>
> Key: AIRFLOW-3447
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3447
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core
>Reporter: Luka Draksler
>Priority: Minor
>  Labels: easyfix
>
> Migration to timezone-aware times broke the intended usage of the ts_nodash 
> macro.
> ts_nodash is used in certain placeholders to create different names (table 
> names, cluster names...). As such it was alphanumeric only: it contained no 
> characters that could be deemed illegal by various naming restrictions. 
> Migration to the new time system changed that.
> As an example, this is what is returned currently: 
> {{20181205T125657.169324+}}
> before:
> {{20181204T03}}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-3412) Worker pods are not being deleted after termination

2018-12-03 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor closed AIRFLOW-3412.
--
Resolution: Duplicate

> Worker pods are not being deleted after termination
> ---
>
> Key: AIRFLOW-3412
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3412
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: executor, kubernetes
>Affects Versions: 1.10.0
>Reporter: Viktor
>Assignee: Viktor
>Priority: Major
> Fix For: 1.10.2
>
>
> When using KubernetesExecutor multiple pods are spawned for tasks.
> When their job is done they are not deleted automatically even if you specify 
> *delete_worker_pods=true* in the Airflow configuration and RBAC is properly 
> configured to allow the scheduler to delete pods.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-3431) Document how to report security vulnerabilities and issues safely

2018-12-03 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-3431.

   Resolution: Fixed
Fix Version/s: 2.0.0

> Document how to report security vulnerabilities and issues safely
> -
>
> Key: AIRFLOW-3431
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3431
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Ash Berlin-Taylor
>Assignee: Ash Berlin-Taylor
>Priority: Major
> Fix For: 2.0.0
>
>
> Add to our docs how people can report security vulnerabilities in Airflow 
> safely and responsibly. Point QU30 in the maturity docs:
> {quote}The project provides a well-documented channel to report security 
> issues, along with a documented way of responding to them.{quote}
> We will follow the Apache way and use 
> [secur...@apache.org|mailto:secur...@apache.org] for now, but we need to say 
> this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-3367) Test celery with redis broker

2018-12-03 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-3367.

   Resolution: Fixed
Fix Version/s: 2.0.0

> Test celery with redis broker
> -
>
> Key: AIRFLOW-3367
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3367
> Project: Apache Airflow
>  Issue Type: Test
>Reporter: Josh Carp
>Priority: Trivial
> Fix For: 2.0.0
>
>
> Current integration tests use celery with the rabbitmq broker, but not the 
> redis broker. We should test with both brokers to avoid regressions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-3431) Document how to report security vulnerabilities and issues safely

2018-12-01 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor reassigned AIRFLOW-3431:
--

Assignee: Ash Berlin-Taylor

> Document how to report security vulnerabilities and issues safely
> -
>
> Key: AIRFLOW-3431
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3431
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Ash Berlin-Taylor
>Assignee: Ash Berlin-Taylor
>Priority: Major
>
> Add to our docs how people can report security vulnerabilities in Airflow 
> safely and responsibly. Point QU30 in the maturity docs:
> {quote}The project provides a well-documented channel to report security 
> issues, along with a documented way of responding to them.{quote}
> We will follow the Apache way and use 
> [secur...@apache.org|mailto:secur...@apache.org] for now, but we need to say 
> this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-3430) Document how to become a committer

2018-12-01 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor reassigned AIRFLOW-3430:
--

Assignee: Fokko Driesprong  (was: Ash Berlin-Taylor)

> Document how to become a committer
> -
>
> Key: AIRFLOW-3430
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3430
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Ash Berlin-Taylor
>Assignee: Fokko Driesprong
>Priority: Major
>
> Add to our documents what the process is to become a committer (CO50):
> {quote}The way in which contributors can be granted more rights such as 
> commit access or decision power is clearly documented and is the same for all 
> contributors.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3431) Document how to report security vulnerabilities and issues safely

2018-12-01 Thread Ash Berlin-Taylor (JIRA)
Ash Berlin-Taylor created AIRFLOW-3431:
--

 Summary: Document how to report security vulnerabilities and 
issues safely
 Key: AIRFLOW-3431
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3431
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Ash Berlin-Taylor


Add to our docs how people can report security vulnerabilities in Airflow 
safely and responsibly. Point QU30 in the maturity docs:

{quote}The project provides a well-documented channel to report security 
issues, along with a documented way of responding to them.{quote}

We will follow the Apache way and use 
[secur...@apache.org|mailto:secur...@apache.org] for now, but we need to say 
this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3430) Document how to become a committer

2018-12-01 Thread Ash Berlin-Taylor (JIRA)
Ash Berlin-Taylor created AIRFLOW-3430:
--

 Summary: Document how to become a committer
 Key: AIRFLOW-3430
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3430
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Ash Berlin-Taylor
Assignee: Ash Berlin-Taylor


Add to our documents what the process is to become a committer (CO50):
{quote}The way in which contributors can be granted more rights such as commit 
access or decision power is clearly documented and is the same for all 
contributors.
{quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3426) Correct references to Python version tested (3.4 -> 3.5)

2018-11-30 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704945#comment-16704945
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3426:


Where do those classifications even show up?

I think I generally agree, but with one addition: a note in the docs somewhere 
saying something like "tested on Python 3.5, but we make efforts to support 
Python 3.4 through 3.7"?  (I'm not sure where that would go; if there is 
nowhere obvious I would settle for a new Jira to add it somewhere.)

> Correct references to Python version tested (3.4 -> 3.5)
> 
>
> Key: AIRFLOW-3426
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3426
> Project: Apache Airflow
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 1.9.0, 1.10.0, 1.10.1
> Environment: All
>Reporter: Bryant Biggs
>Assignee: Bryant Biggs
>Priority: Trivial
>  Labels: documentation, easyfix, newbie
> Fix For: 1.10.1
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The current CI tests on Travis use Python versions 2.7 and 3.5; however, 
> throughout the documentation there are still references to using/supporting 
> 3.4. To better match what is actually supported, the 3.4 references should be 
> replaced with what is actually being tested: 3.5.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3426) Correct references to Python version tested (3.4 -> 3.5)

2018-11-30 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704779#comment-16704779
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3426:


Not actively tested against isn't the same as not supported, especially as the 
differences between 3.4, 3.5, and 3.6 are very minor, so I'm not sure we want 
to remove 3.4 from that list.

> Correct references to Python version tested (3.4 -> 3.5)
> 
>
> Key: AIRFLOW-3426
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3426
> Project: Apache Airflow
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 1.9.0, 1.10.0, 1.10.1
> Environment: All
>Reporter: Bryant Biggs
>Assignee: Bryant Biggs
>Priority: Trivial
>  Labels: documentation, easyfix, newbie
> Fix For: 1.10.1
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The current CI tests on Travis use Python versions 2.7 and 3.5; however, 
> throughout the documentation there are still references to using/supporting 
> 3.4. To better match what is actually supported, the 3.4 references should be 
> replaced with what is actually being tested: 3.5.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3422) Infinite loops during springtime DST transitions on python 3.6

2018-11-30 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704670#comment-16704670
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3422:


Ohhh - that's because _there is no 2:20AM_ that day, is there.

> Infinite loops during springtime DST transitions on python 3.6
> --
>
> Key: AIRFLOW-3422
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3422
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.10.1
>Reporter: Till Heistermann
>Priority: Major
> Fix For: 1.10.2
>
>
> Automatic DST transitions can cause dags to be stuck in an infinite loop, if 
> they happen to be scheduled in the "skipped" hour during a springtime DST 
> transition. 
> The fix introduced in https://issues.apache.org/jira/browse/AIRFLOW-3277 does 
> not seem to work for python 3.6, only for 3.5 and 2.7.
> Example to reproduce (current master, python 3.6):
> {code:java}
> import pendulum
> from datetime import datetime
> from airflow.utils.timezone import make_aware
> from airflow.models import DAG
> nsw = pendulum.Timezone.load("Australia/Sydney")
> dt = make_aware(datetime(2018, 10, 3, 2, 30), nsw)
> dag = DAG("id", schedule_interval="30 2 * * *", start_date=dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> {code}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AIRFLOW-3422) Infinite loops during springtime DST transitions on python 3.6

2018-11-30 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704665#comment-16704665
 ] 

Ash Berlin-Taylor edited comment on AIRFLOW-3422 at 11/30/18 12:36 PM:
---

Even pendulum doesn't get maths around TZ transitions right:
{code:java}
ipdb> target
DateTime(2018, 10, 6, 2, 20, 0, tzinfo=Timezone('Australia/Sydney'))
ipdb> target + one_day
DateTime(2018, 10, 7, 3, 20, 0, tzinfo=Timezone('Australia/Sydney'))
ipdb> one_day
Duration(days=1)
ipdb> one_day.__class__
<class 'pendulum.duration.Duration'>
ipdb> target.__class__
<class 'pendulum.datetime.DateTime'>
{code}


was (Author: ashb):
Even pendulum doesn't get maths around TZ transitions write:

{code}
ipdb> target
DateTime(2018, 10, 6, 2, 20, 0, tzinfo=Timezone('Australia/Sydney'))
ipdb> target + one_day
DateTime(2018, 10, 7, 3, 20, 0, tzinfo=Timezone('Australia/Sydney'))
ipdb> one_day
Duration(days=1)
ipdb> one_day.__class__
<class 'pendulum.duration.Duration'>
ipdb> target.__class__
<class 'pendulum.datetime.DateTime'>
{code}

> Infinite loops during springtime DST transitions on python 3.6
> --
>
> Key: AIRFLOW-3422
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3422
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.10.1
>Reporter: Till Heistermann
>Priority: Major
> Fix For: 1.10.2
>
>
> Automatic DST transitions can cause dags to be stuck in an infinite loop, if 
> they happen to be scheduled in the "skipped" hour during a springtime DST 
> transition. 
> The fix introduced in https://issues.apache.org/jira/browse/AIRFLOW-3277 does 
> not seem to work for python 3.6, only for 3.5 and 2.7.
> Example to reproduce (current master, python 3.6):
> {code:java}
> import pendulum
> from datetime import datetime
> from airflow.utils.timezone import make_aware
> from airflow.models import DAG
> nsw = pendulum.Timezone.load("Australia/Sydney")
> dt = make_aware(datetime(2018, 10, 3, 2, 30), nsw)
> dag = DAG("id", schedule_interval="30 2 * * *", start_date=dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> {code}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3422) Infinite loops during springtime DST transitions on python 3.6

2018-11-30 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704665#comment-16704665
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3422:


Even pendulum doesn't get maths around TZ transitions right:

{code}
ipdb> target
DateTime(2018, 10, 6, 2, 20, 0, tzinfo=Timezone('Australia/Sydney'))
ipdb> target + one_day
DateTime(2018, 10, 7, 3, 20, 0, tzinfo=Timezone('Australia/Sydney'))
ipdb> one_day
Duration(days=1)
ipdb> one_day.__class__
<class 'pendulum.duration.Duration'>
ipdb> target.__class__
<class 'pendulum.datetime.DateTime'>
{code}

> Infinite loops during springtime DST transitions on python 3.6
> --
>
> Key: AIRFLOW-3422
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3422
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.10.1
>Reporter: Till Heistermann
>Priority: Major
> Fix For: 1.10.2
>
>
> Automatic DST transitions can cause dags to be stuck in an infinite loop, if 
> they happen to be scheduled in the "skipped" hour during a springtime DST 
> transition. 
> The fix introduced in https://issues.apache.org/jira/browse/AIRFLOW-3277 does 
> not seem to work for python 3.6, only for 3.5 and 2.7.
> Example to reproduce (current master, python 3.6):
> {code:java}
> import pendulum
> from datetime import datetime
> from airflow.utils.timezone import make_aware
> from airflow.models import DAG
> nsw = pendulum.Timezone.load("Australia/Sydney")
> dt = make_aware(datetime(2018, 10, 3, 2, 30), nsw)
> dag = DAG("id", schedule_interval="30 2 * * *", start_date=dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> {code}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3422) Infinite loops during springtime DST transitions on python 3.6

2018-11-30 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-3422:
---
Fix Version/s: 1.10.2

> Infinite loops during springtime DST transitions on python 3.6
> --
>
> Key: AIRFLOW-3422
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3422
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.10.1
>Reporter: Till Heistermann
>Priority: Major
> Fix For: 1.10.2
>
>
> Automatic DST transitions can cause dags to be stuck in an infinite loop, if 
> they happen to be scheduled in the "skipped" hour during a springtime DST 
> transition. 
> The fix introduced in https://issues.apache.org/jira/browse/AIRFLOW-3277 does 
> not seem to work for python 3.6, only for 3.5 and 2.7.
> Example to reproduce (current master, python 3.6):
> {code:java}
> import pendulum
> from datetime import datetime
> from airflow.utils.timezone import make_aware
> from airflow.models import DAG
> nsw = pendulum.Timezone.load("Australia/Sydney")
> dt = make_aware(datetime(2018, 10, 3, 2, 30), nsw)
> dag = DAG("id", schedule_interval="30 2 * * *", start_date=dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> {code}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3422) Infinite loops during springtime DST transitions on python 3.6

2018-11-30 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704471#comment-16704471
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3422:


Thanks, and confirmed the behaviour locally on 3.6.

We also try not to use the timezone library, and just use pendulum for TZ 
functions.

I'll take a look and try to get a fix into 1.10.2.
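
My current thinking for the fix is along these lines - do the cron maths on a 
naive local datetime and re-localize with pendulum afterwards (a sketch only, 
with assumed names):

{code:python}
from datetime import datetime
from croniter import croniter
import pendulum

def following_schedule(dttm, cron_expr, tz_name="Australia/Sydney"):
    tz = pendulum.timezone(tz_name)
    naive = dttm.replace(tzinfo=None)  # avoid croniter stepping over DST gaps
    following = croniter(cron_expr, naive).get_next(datetime)
    # convert() re-localizes; times inside the skipped hour roll forward
    return tz.convert(following)
{code}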

> Infinite loops during springtime DST transitions on python 3.6
> --
>
> Key: AIRFLOW-3422
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3422
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.10.1
>Reporter: Till Heistermann
>Priority: Major
> Fix For: 1.10.2
>
>
> Automatic DST transitions can cause dags to be stuck in an infinite loop, if 
> they happen to be scheduled in the "skipped" hour during a springtime DST 
> transition. 
> The fix introduced in https://issues.apache.org/jira/browse/AIRFLOW-3277 does 
> not seem to work for python 3.6, only for 3.5 and 2.7.
> Example to reproduce (current master, python 3.6):
> {code:java}
> import pendulum
> from datetime import datetime
> from airflow.utils.timezone import make_aware
> from airflow.models import DAG
> nsw = pendulum.Timezone.load("Australia/Sydney")
> dt = make_aware(datetime(2018, 10, 3, 2, 30), nsw)
> dag = DAG("id", schedule_interval="30 2 * * *", start_date=dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> {code}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3422) Infinite loops during springtime DST transitions on python 3.6

2018-11-29 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703822#comment-16703822
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3422:


Dang :(

Does it work if a different timezone is used? What version of Pendulum and pytz 
do you have installed please?

> Infinite loops during springtime DST transitions on python 3.6
> --
>
> Key: AIRFLOW-3422
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3422
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.10.1
>Reporter: Till Heistermann
>Priority: Major
>
> Automatic DST transitions can cause dags to be stuck in an infinite loop, if 
> they happen to be scheduled in the "skipped" hour during a springtime DST 
> transition. 
> The fix introduced in https://issues.apache.org/jira/browse/AIRFLOW-3277 does 
> not seem to work for python 3.6, only for 3.5 and 2.7.
> Example to reproduce (current master, python 3.6):
> {code:java}
> import pendulum
> from datetime import datetime
> from airflow.utils.timezone import make_aware
> from airflow.models import DAG
> nsw = pendulum.Timezone.load("Australia/Sydney")
> dt = make_aware(datetime(2018, 10, 3, 2, 30), nsw)
> dag = DAG("id", schedule_interval="30 2 * * *", start_date=dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> dt = dag.following_schedule(dt); print(dt)
> {code}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3419) S3_hook.select_key is broken on Python3

2018-11-29 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-3419:
---
Summary: S3_hook.select_key is broken on Python3  (was: S3_hook.select_key 
is broken)

> S3_hook.select_key is broken on Python3
> ---
>
> Key: AIRFLOW-3419
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3419
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: boto3, hooks
>Affects Versions: 1.10.1
>Reporter: Maria Rebelka
>Priority: Major
>
> Hello,
> Using select_key throws an error:
> {quote}text = S3Hook('aws_conn').select_key(key='my_key',
>                                      bucket_name='my_bucket',
>                                      expression='SELECT * FROM S3Object s',
>                                      expression_type='SQL',
>                                      input_serialization={'JSON': \{'Type': 
> 'DOCUMENT'}},
>                                      output_serialization={'JSON': {}}){quote}
> Traceback (most recent call last):
> {quote}   File "db.py", line 31, in 
> output_serialization={'JSON': {}})
>   File "/usr/local/lib/python3.4/site-packages/airflow/hooks/S3_hook.py", 
> line 262, in select_key
> for event in response['Payload']
> TypeError: sequence item 0: expected str instance, bytes found{quote}
> The problem seems to be in this line:
> S3_hook.py, line 262:  return ''.join(event['Records']['Payload']
> which should probably decode the bytes first:
> return ''.join(event['Records']['Payload'].decode('utf-8')
> From example in Amazon blog:
> https://aws.amazon.com/blogs/aws/s3-glacier-select/
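> A minimal sketch of the suggested fix (helper name is hypothetical; assumes
> the response comes from boto3's select_object_content, whose 'Payload' is an
> event stream where 'Records' events carry bytes):
> {code:python}
> def read_select_payload(response):
>     # Decode each 'Records' payload before joining; skip the
>     # Stats/Progress/End events that carry no row data.
>     return ''.join(
>         event['Records']['Payload'].decode('utf-8')
>         for event in response['Payload']
>         if 'Records' in event
>     )
> {code}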



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3416) CloudSqlQueryOperator with sql proxy does not work with Python 3.x

2018-11-29 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-3416:
---
Fix Version/s: 1.10.2

> CloudSqlQueryOperator with sql proxy does not work with Python 3.x
> --
>
> Key: AIRFLOW-3416
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3416
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Affects Versions: 1.10.1
>Reporter: Jarek Potiuk
>Assignee: Jarek Potiuk
>Priority: Major
> Fix For: 1.10.2
>
>
> There are Python 3.x compatibility issues in CloudSqlQueryOperator. The
> output of the cloud_sql_proxy binary is parsed, and under Python 3 that
> output is bytes rather than str, so several "in" checks raise an exception.
> It needs an explicit decode('utf-8').
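> An illustrative sketch of the failure mode (the command flags and the "Ready"
> marker string are hypothetical, not the operator's actual code):
> {code:python}
> import subprocess
>
> proc = subprocess.Popen(["cloud_sql_proxy", "-version"],
>                         stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
> output, _ = proc.communicate()  # bytes under Python 3
>
> # "Ready" in output  # TypeError: a bytes-like object is required, not 'str'
> if "Ready" in output.decode('utf-8'):  # the explicit decode fixes the check
>     print("proxy is up")
> {code}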



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3415) Imports become null when triggering dagruns in a loop

2018-11-29 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702910#comment-16702910
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3415:


Could you include the stack trace too please?

> Imports become null when triggering dagruns in a loop
> -
>
> Key: AIRFLOW-3415
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3415
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DagRun
>Affects Versions: 1.10.1
> Environment: CentOS 7
>Reporter: Yuri Bendana
>Priority: Minor
>
> When triggering dagruns in a loop, the imported references become None on the
> second iteration.  Here is an example [ gist |
> [https://gist.github.com/ybendana/3bc0791fe00b099be04aca47a8d524c9] ]. For
> the purposes here, you can ignore the DagRunSensor task. On the first 
> iteration the 'sleeper' dag gets triggered but on the second iteration I see a
> {noformat}
> TypeError: 'NoneType' object is not callable{noformat}
> To workaround this, I have to copy the import (in this case trigger_dag) 
> inside the loop.
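> A rough sketch of that workaround (DAG id and run ids are illustrative):
> {code:python}
> from airflow.api.common.experimental.trigger_dag import trigger_dag
>
> for i in range(2):
>     # Workaround: re-import inside the loop, since the module-level
>     # reference becomes None on the second iteration.
>     from airflow.api.common.experimental.trigger_dag import trigger_dag
>     trigger_dag("sleeper", run_id="loop_run_%d" % i)
> {code}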



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3372) Unable to start airflow scheduler

2018-11-28 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702109#comment-16702109
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3372:


The problem is that {{airflow initdb}} has never been run.

I would *very strongly* suggest using a database other than SQLite -
I personally recommend Postgres, but MySQL is supported and used by some users.

> Unable to start airflow scheduler
> -
>
> Key: AIRFLOW-3372
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3372
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: docker, kubernetes, scheduler
>Affects Versions: 1.9.0
> Environment: Kubernetes,docker
>Reporter: MADHANKUMAR C
>Priority: Blocker
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> *I have installed airflow in kubernetes cluster.When i am installing airflow 
> ,i am unable to start the scheduler.The below is the log of scheduler 
> container.*
> [2018-11-20 12:02:40,860] {{__init__.py:51}} INFO - Using executor 
> SequentialExecutor
>  [2018-11-20 12:02:40,973] {{cli_action_loggers.py:69}} ERROR - Failed on 
> pre-execution callback using 
>  Traceback (most recent call last):
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/base.py", 
> line 1182, in _execute_context
>  context)
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/default.py", 
> line 470, in do_execute
>  cursor.execute(statement, parameters)
>  sqlite3.OperationalError: no such table: log
> The above exception was the direct cause of the following exception:
> Traceback (most recent call last):
>  File 
> "/usr/local/lib/python3.5/dist-packages/airflow/utils/cli_action_loggers.py", 
> line 67, in on_pre_execution
>  cb(**kwargs)
>  File 
> "/usr/local/lib/python3.5/dist-packages/airflow/utils/cli_action_loggers.py", 
> line 99, in default_action_log
>  session.commit()
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/orm/session.py", 
> line 927, in commit
>  self.transaction.commit()
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/orm/session.py", 
> line 467, in commit
>  self._prepare_impl()
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/orm/session.py", 
> line 447, in _prepare_impl
>  self.session.flush()
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/orm/session.py", 
> line 2209, in flush
>  self._flush(objects)
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/orm/session.py", 
> line 2329, in _flush
>  transaction.rollback(_capture_exception=True)
>  File 
> "/usr/local/lib/python3.5/dist-packages/sqlalchemy/util/langhelpers.py", line 
> 66, in __exit__
>  compat.reraise(exc_type, exc_value, exc_tb)
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/util/compat.py", 
> line 187, in reraise
>  raise value
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/orm/session.py", 
> line 2293, in _flush
>  flush_context.execute()
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/orm/unitofwork.py", 
> line 389, in execute
>  rec.execute(self)
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/orm/unitofwork.py", 
> line 548, in execute
>  uow
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/orm/persistence.py", 
> line 181, in save_obj
>  mapper, table, insert)
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/orm/persistence.py", 
> line 835, in _emit_insert_statements
>  execute(statement, params)
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/base.py", 
> line 945, in execute
>  return meth(self, multiparams, params)
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/sql/elements.py", 
> line 263, in _execute_on_connection
>  return connection._execute_clauseelement(self, multiparams, params)
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/base.py", 
> line 1053, in _execute_clauseelement
>  compiled_sql, distilled_params
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/base.py", 
> line 1189, in _execute_context
>  context)
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/base.py", 
> line 1402, in _handle_dbapi_exception
>  exc_info
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/util/compat.py", 
> line 203, in raise_from_cause
>  reraise(type(exception), exception, tb=exc_tb, cause=cause)
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/util/compat.py", 
> line 186, in reraise
>  raise value.with_traceback(tb)
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/base.py", 
> line 1182, in _execute_context
>  context)
>  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/default.py", 
> line 470, in do_execute
>  cursor.execute(statement, parameters)
>  sqlalchemy.exc.OperationalError: 

[jira] [Closed] (AIRFLOW-1092) {{execution_date}} not matching '%Y-%m-%d %H:%M:%S.%f'

2018-11-28 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor closed AIRFLOW-1092.
--
Resolution: Information Provided

> {{execution_date}} not matching '%Y-%m-%d %H:%M:%S.%f'
> --
>
> Key: AIRFLOW-1092
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1092
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler, utils
>Affects Versions: 1.8.0
> Environment: CentOS
>Reporter: Xi Wang
>Priority: Critical
>
> Hi there,
> I was trying to use datetime.strptime({{ execution_date }}, '%Y-%m-%d
> %H:%M:%S.%f') to format some file names, but an error was returned saying the
> {{ execution_date }} format does not match the one provided. Could anyone take
> a look?
> Thx!
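> For illustration, strptime is strict about the exact format; a rendered value
> with a 'T' separator and no fractional seconds (the example value below is
> illustrative) will not match '%Y-%m-%d %H:%M:%S.%f':
> {code:python}
> from datetime import datetime
>
> rendered = "2017-04-10T00:00:00"  # illustrative rendered execution_date
>
> # datetime.strptime(rendered, '%Y-%m-%d %H:%M:%S.%f')
> # -> ValueError: time data '2017-04-10T00:00:00' does not match
> #    format '%Y-%m-%d %H:%M:%S.%f'
>
> dt = datetime.strptime(rendered, '%Y-%m-%dT%H:%M:%S')  # matches
> {code}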



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3405) Task instance fail intermittently due to MySQL error

2018-11-28 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16701772#comment-16701772
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3405:


1.10.0 - https://issues.apache.org/jira/browse/AIRFLOW-1559 was the issue I was 
thinking of

> Task instance fail intermittently due to MySQL error
> 
>
> Key: AIRFLOW-3405
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3405
> Project: Apache Airflow
>  Issue Type: Improvement
> Environment: MySQL, Redhat Linux
>Reporter: Yuvaraj
>Priority: Major
>  Labels: performance, usability
>
> DAGs are failing intermittently with the error below.
> OperationalError: (_mysql_exceptions.OperationalError) (1040, 'Too many
> connections')
> [2018-11-25 12:24:16,952] - Heartbeat time limited exceeded!
> We have max_connections set to 2000 in the DB.
> Below are the settings in the cfg:
> sql_alchemy_pool_size = 1980
> sql_alchemy_pool_recycle = 3600
> As per our DBA, the Airflow scheduler keeps opening connections to the
> database; these connections are mostly idle and get reset whenever the
> scheduler restarts, but with max_connections at 2000 and the scheduler holding
> on to 1600 of these, other apps trying to connect might start running out of
> connections.
> How do we remediate these idle connections? What should be the optimal values
> for these configs and for max_connections at the DB? Consider we need to build
> a large environment serving 500+ definitions with 1+ runs per day. Need
> suggestions...
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3405) Task instance fail intermittently due to MySQL error

2018-11-28 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16701769#comment-16701769
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3405:


The first thing I would suggest trying is 1.10.1 if you can - I think there was 
some work done in 1.9.0 or 1.10.0 to reduce the number of connections workers 
used.

> Task instance fail intermittently due to MySQL error
> 
>
> Key: AIRFLOW-3405
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3405
> Project: Apache Airflow
>  Issue Type: Improvement
> Environment: MySQL, Redhat Linux
>Reporter: Yuvaraj
>Priority: Major
>  Labels: performance, usability
>
> DAGs are failing intermittently with the error below.
> OperationalError: (_mysql_exceptions.OperationalError) (1040, 'Too many
> connections')
> [2018-11-25 12:24:16,952] - Heartbeat time limited exceeded!
> We have max_connections set to 2000 in the DB.
> Below are the settings in the cfg:
> sql_alchemy_pool_size = 1980
> sql_alchemy_pool_recycle = 3600
> As per our DBA, the Airflow scheduler keeps opening connections to the
> database; these connections are mostly idle and get reset whenever the
> scheduler restarts, but with max_connections at 2000 and the scheduler holding
> on to 1600 of these, other apps trying to connect might start running out of
> connections.
> How do we remediate these idle connections? What should be the optimal values
> for these configs and for max_connections at the DB? Consider we need to build
> a large environment serving 500+ definitions with 1+ runs per day. Need
> suggestions...
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-3233) Dag deletion in the UI doesn't work

2018-11-28 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-3233.

   Resolution: Fixed
Fix Version/s: 2.0.0
   1.10.2

> Dag deletion in the UI doesn't work
> ---
>
> Key: AIRFLOW-3233
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3233
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Affects Versions: 1.10.0
>Reporter: Dan Davydov
>Assignee: Dan Davydov
>Priority: Major
> Fix For: 1.10.2, 2.0.0
>
>
> DAG deletion in the UI doesn't work: DAGs can only be deleted if they don't
> exist in the DagBag, but if the DAG doesn't exist in the DagBag the deletion
> URL gets passed an empty DAG id.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-3409) Docker build script should build web assets (compile_assets).

2018-11-28 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor closed AIRFLOW-3409.
--
Resolution: Duplicate

> Docker build script should build web assets (compile_assets).
> -
>
> Key: AIRFLOW-3409
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3409
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kevin Pullin
>Priority: Minor
>
> The docker container does not build the web assets, which renders the UI
> unusable when deployed to a local minikube k8s instance.
> A minor patch to the Dockerfile is needed to build these assets:
> -python setup.py sdist -q
> +python setup.py compile_assets sdist -q



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-3384) Allow higher versions of sqlalchemy and jinja

2018-11-28 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-3384.

   Resolution: Fixed
Fix Version/s: 2.0.0
   1.10.2

> Allow higher versions of sqlalchemy and jinja
> -
>
> Key: AIRFLOW-3384
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3384
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: dependencies
>Reporter: Jose Luis Ricon
>Assignee: Jose Luis Ricon
>Priority: Major
> Fix For: 1.10.2, 2.0.0
>
>
> At the moment airflow doesn't allow the installation of sqlalchemy version
> 1.2.11 and jinja2==2.10. Airflow works with both, and there is no reason not
> to allow these higher versions. Projects downstream that are currently forcing
> the installation of said versions, overriding airflow's dependencies, will
> benefit from this, as it will allow version-compatible installations without
> loss of functionality.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3132) Allow to specify auto_remove option for DockerOperator

2018-11-28 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-3132:
---
Fix Version/s: 1.10.2

Marking this for inclusion in 1.10.2 - if there is one (we may just be
releasing 2.0.0 soon anyway).

> Allow to specify auto_remove option for DockerOperator
> --
>
> Key: AIRFLOW-3132
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3132
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Guoqiang Ding
>Assignee: Guoqiang Ding
>Priority: Major
> Fix For: 1.10.2, 2.0.0
>
>
> Sometimes we want to run a Docker container command just once. The Docker API
> client allows specifying the auto_remove option when starting a container.
>  
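> A usage sketch once the option exists (task parameters are illustrative):
> {code:python}
> from airflow.operators.docker_operator import DockerOperator
>
> one_shot = DockerOperator(
>     task_id="one_shot_command",
>     image="alpine:3.8",
>     command="echo hello",
>     auto_remove=True,  # the container is removed once it exits
> )
> {code}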



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Issue Comment Deleted] (AIRFLOW-3351) Airflow webserver intermittently broken

2018-11-27 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-3351:
---
Comment: was deleted

(was: [harik@gmail.com|mailto:harik@gmail.com] is an email id, please 
send invitation.)

> Airflow webserver intermittently broken
> 
>
> Key: AIRFLOW-3351
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3351
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication
>Affects Versions: 1.10.0
>Reporter: Hari Krishna ADDEPALLI LN
>Priority: Blocker
>
> After completing the airflow 1.10.0 integration with anonymous LDAP
> (AIRFLOW-3270), we started to hit "Internal Server Error" with the exception
> stack below. We tried clearing the browser cache; it sometimes works and
> sometimes errors out. Please advise on a resolution to avoid this issue.
>  
> {code:java}
> During handling of the above exception, another exception occurred: Traceback 
> (most recent call last):   File 
> "/usr/local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 135, 
> in handle     self.handle_request(listener, req, client, addr)   File 
> "/usr/local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 176, 
> in handle_request     respiter = self.wsgi(environ, resp.start_response)   
> File "/usr/local/lib/python3.5/site-packages/werkzeug/wsgi.py", line 826, in 
> __call__     return app(environ, start_response)   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1997, in __call__ 
>     return self.wsgi_app(environ, start_response)   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1985, in wsgi_app 
>     response = self.handle_exception(e)   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1547, in 
> handle_exception     return self.finalize_request(handler(e), 
> from_error_handler=True)   File 
> "/usr/local/lib/python3.5/site-packages/airflow/www/views.py", line 716, in 
> show_traceback     info=traceback.format_exc()), 500   File 
> "/usr/local/lib/python3.5/site-packages/flask/templating.py", line 132, in 
> render_template     ctx.app.update_template_context(context)   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 764, in 
> update_template_context     context.update(func())   File 
> "/usr/local/lib/python3.5/site-packages/flask_login.py", line 825, in 
> _user_context_processor     return dict(current_user=_get_user())   File 
> "/usr/local/lib/python3.5/site-packages/flask_login.py", line 794, in 
> _get_user     current_app.login_manager._load_user()   File 
> "/usr/local/lib/python3.5/site-packages/flask_login.py", line 363, in 
> _load_user     return self.reload_user()   File 
> "/usr/local/lib/python3.5/site-packages/flask_login.py", line 325, in 
> reload_user     user = self.user_callback(user_id)   File 
> "/usr/local/lib/python3.5/site-packages/airflow/utils/db.py", line 74, in 
> wrapper     return func(*args, **kwargs)   File 
> "/usr/local/lib/python3.5/site-packages/airflow/contrib/auth/backends/ldap_auth.py",
>  line 278, in load_user     return LdapUser(user)   File "", line 4, 
> in __init__   File 
> "/usr/local/lib/python3.5/site-packages/sqlalchemy/orm/state.py", line 414, 
> in _initialize_instance     manager.dispatch.init_failure(self, args, kwargs) 
>   File 
> "/usr/local/lib/python3.5/site-packages/sqlalchemy/util/langhelpers.py", line 
> 66, in __exit__     compat.reraise(exc_type, exc_value, exc_tb)   File 
> "/usr/local/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 187, 
> in reraise     raise value   File 
> "/usr/local/lib/python3.5/site-packages/sqlalchemy/orm/state.py", line 411, 
> in _initialize_instance     return manager.original_init(*mixed[1:], 
> **kwargs)   File 
> "/usr/local/lib/python3.5/site-packages/airflow/contrib/auth/backends/ldap_auth.py",
>  line 157, in __init__     user.username) AttributeError: 'NoneType' object 
> has no attribute 'username' 127.0.0.1 - - [15/Nov/2018:10:47:31 +] "GET 
> /admin/ HTTP/1.1" 500 0 "-" "-" [2018-11-15 10:47:38,590] ERROR in app: 
> Exception on /favicon.ico [GET] Traceback (most recent call last):   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1612, in 
> full_dispatch_request     rv = self.dispatch_request()   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1590, in 
> dispatch_request     self.raise_routing_exception(req)   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1573, in 
> raise_routing_exception     raise request.routing_exception   File 
> "/usr/local/lib/python3.5/site-packages/flask/ctx.py", line 294, in 
> match_request     self.url_adapter.match(return_rule=True)   File 
> "/usr/local/lib/python3.5/site-packages/werkzeug/routing.py", line 1581, in 
> match  

[jira] [Commented] (AIRFLOW-3351) Airflow webserver intermittently broken

2018-11-27 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700810#comment-16700810
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3351:


Sent.

> Airflow webserver intermittently broken
> 
>
> Key: AIRFLOW-3351
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3351
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication
>Affects Versions: 1.10.0
>Reporter: Hari Krishna ADDEPALLI LN
>Priority: Blocker
>
> After completing the airflow 1.10.0 integration with anonymous LDAP
> (AIRFLOW-3270), we started to hit "Internal Server Error" with the exception
> stack below. We tried clearing the browser cache; it sometimes works and
> sometimes errors out. Please advise on a resolution to avoid this issue.
>  
> {code:java}
> During handling of the above exception, another exception occurred: Traceback 
> (most recent call last):   File 
> "/usr/local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 135, 
> in handle     self.handle_request(listener, req, client, addr)   File 
> "/usr/local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 176, 
> in handle_request     respiter = self.wsgi(environ, resp.start_response)   
> File "/usr/local/lib/python3.5/site-packages/werkzeug/wsgi.py", line 826, in 
> __call__     return app(environ, start_response)   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1997, in __call__ 
>     return self.wsgi_app(environ, start_response)   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1985, in wsgi_app 
>     response = self.handle_exception(e)   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1547, in 
> handle_exception     return self.finalize_request(handler(e), 
> from_error_handler=True)   File 
> "/usr/local/lib/python3.5/site-packages/airflow/www/views.py", line 716, in 
> show_traceback     info=traceback.format_exc()), 500   File 
> "/usr/local/lib/python3.5/site-packages/flask/templating.py", line 132, in 
> render_template     ctx.app.update_template_context(context)   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 764, in 
> update_template_context     context.update(func())   File 
> "/usr/local/lib/python3.5/site-packages/flask_login.py", line 825, in 
> _user_context_processor     return dict(current_user=_get_user())   File 
> "/usr/local/lib/python3.5/site-packages/flask_login.py", line 794, in 
> _get_user     current_app.login_manager._load_user()   File 
> "/usr/local/lib/python3.5/site-packages/flask_login.py", line 363, in 
> _load_user     return self.reload_user()   File 
> "/usr/local/lib/python3.5/site-packages/flask_login.py", line 325, in 
> reload_user     user = self.user_callback(user_id)   File 
> "/usr/local/lib/python3.5/site-packages/airflow/utils/db.py", line 74, in 
> wrapper     return func(*args, **kwargs)   File 
> "/usr/local/lib/python3.5/site-packages/airflow/contrib/auth/backends/ldap_auth.py",
>  line 278, in load_user     return LdapUser(user)   File "", line 4, 
> in __init__   File 
> "/usr/local/lib/python3.5/site-packages/sqlalchemy/orm/state.py", line 414, 
> in _initialize_instance     manager.dispatch.init_failure(self, args, kwargs) 
>   File 
> "/usr/local/lib/python3.5/site-packages/sqlalchemy/util/langhelpers.py", line 
> 66, in __exit__     compat.reraise(exc_type, exc_value, exc_tb)   File 
> "/usr/local/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 187, 
> in reraise     raise value   File 
> "/usr/local/lib/python3.5/site-packages/sqlalchemy/orm/state.py", line 411, 
> in _initialize_instance     return manager.original_init(*mixed[1:], 
> **kwargs)   File 
> "/usr/local/lib/python3.5/site-packages/airflow/contrib/auth/backends/ldap_auth.py",
>  line 157, in __init__     user.username) AttributeError: 'NoneType' object 
> has no attribute 'username' 127.0.0.1 - - [15/Nov/2018:10:47:31 +] "GET 
> /admin/ HTTP/1.1" 500 0 "-" "-" [2018-11-15 10:47:38,590] ERROR in app: 
> Exception on /favicon.ico [GET] Traceback (most recent call last):   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1612, in 
> full_dispatch_request     rv = self.dispatch_request()   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1590, in 
> dispatch_request     self.raise_routing_exception(req)   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1573, in 
> raise_routing_exception     raise request.routing_exception   File 
> "/usr/local/lib/python3.5/site-packages/flask/ctx.py", line 294, in 
> match_request     self.url_adapter.match(return_rule=True)   File 
> "/usr/local/lib/python3.5/site-packages/werkzeug/routing.py", line 1581, in 
> match     raise NotFound() werkzeug.exceptions.NotFound: 404 Not Found: The 
> 

[jira] [Commented] (AIRFLOW-3351) Airflow webserver intermittently broken

2018-11-27 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700801#comment-16700801
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3351:


I will need an email address to send the invite to.

> Airflow webserver intermittently broken
> 
>
> Key: AIRFLOW-3351
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3351
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication
>Affects Versions: 1.10.0
>Reporter: Hari Krishna ADDEPALLI LN
>Priority: Blocker
>
> After completing the airflow 1.10.0 integration with anonymous LDAP
> (AIRFLOW-3270), we started to hit "Internal Server Error" with the exception
> stack below. We tried clearing the browser cache; it sometimes works and
> sometimes errors out. Please advise on a resolution to avoid this issue.
>  
> {code:java}
> During handling of the above exception, another exception occurred: Traceback 
> (most recent call last):   File 
> "/usr/local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 135, 
> in handle     self.handle_request(listener, req, client, addr)   File 
> "/usr/local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 176, 
> in handle_request     respiter = self.wsgi(environ, resp.start_response)   
> File "/usr/local/lib/python3.5/site-packages/werkzeug/wsgi.py", line 826, in 
> __call__     return app(environ, start_response)   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1997, in __call__ 
>     return self.wsgi_app(environ, start_response)   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1985, in wsgi_app 
>     response = self.handle_exception(e)   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1547, in 
> handle_exception     return self.finalize_request(handler(e), 
> from_error_handler=True)   File 
> "/usr/local/lib/python3.5/site-packages/airflow/www/views.py", line 716, in 
> show_traceback     info=traceback.format_exc()), 500   File 
> "/usr/local/lib/python3.5/site-packages/flask/templating.py", line 132, in 
> render_template     ctx.app.update_template_context(context)   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 764, in 
> update_template_context     context.update(func())   File 
> "/usr/local/lib/python3.5/site-packages/flask_login.py", line 825, in 
> _user_context_processor     return dict(current_user=_get_user())   File 
> "/usr/local/lib/python3.5/site-packages/flask_login.py", line 794, in 
> _get_user     current_app.login_manager._load_user()   File 
> "/usr/local/lib/python3.5/site-packages/flask_login.py", line 363, in 
> _load_user     return self.reload_user()   File 
> "/usr/local/lib/python3.5/site-packages/flask_login.py", line 325, in 
> reload_user     user = self.user_callback(user_id)   File 
> "/usr/local/lib/python3.5/site-packages/airflow/utils/db.py", line 74, in 
> wrapper     return func(*args, **kwargs)   File 
> "/usr/local/lib/python3.5/site-packages/airflow/contrib/auth/backends/ldap_auth.py",
>  line 278, in load_user     return LdapUser(user)   File "", line 4, 
> in __init__   File 
> "/usr/local/lib/python3.5/site-packages/sqlalchemy/orm/state.py", line 414, 
> in _initialize_instance     manager.dispatch.init_failure(self, args, kwargs) 
>   File 
> "/usr/local/lib/python3.5/site-packages/sqlalchemy/util/langhelpers.py", line 
> 66, in __exit__     compat.reraise(exc_type, exc_value, exc_tb)   File 
> "/usr/local/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 187, 
> in reraise     raise value   File 
> "/usr/local/lib/python3.5/site-packages/sqlalchemy/orm/state.py", line 411, 
> in _initialize_instance     return manager.original_init(*mixed[1:], 
> **kwargs)   File 
> "/usr/local/lib/python3.5/site-packages/airflow/contrib/auth/backends/ldap_auth.py",
>  line 157, in __init__     user.username) AttributeError: 'NoneType' object 
> has no attribute 'username' 127.0.0.1 - - [15/Nov/2018:10:47:31 +] "GET 
> /admin/ HTTP/1.1" 500 0 "-" "-" [2018-11-15 10:47:38,590] ERROR in app: 
> Exception on /favicon.ico [GET] Traceback (most recent call last):   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1612, in 
> full_dispatch_request     rv = self.dispatch_request()   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1590, in 
> dispatch_request     self.raise_routing_exception(req)   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1573, in 
> raise_routing_exception     raise request.routing_exception   File 
> "/usr/local/lib/python3.5/site-packages/flask/ctx.py", line 294, in 
> match_request     self.url_adapter.match(return_rule=True)   File 
> "/usr/local/lib/python3.5/site-packages/werkzeug/routing.py", line 1581, in 
> match     raise NotFound() 

[jira] [Commented] (AIRFLOW-3351) Airflow webserver intermittently broken

2018-11-27 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700793#comment-16700793
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3351:


You can get an invite sent via http://apache-airflow.slack.com/

> Airflow webserver intermittently broken
> 
>
> Key: AIRFLOW-3351
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3351
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication
>Affects Versions: 1.10.0
>Reporter: Hari Krishna ADDEPALLI LN
>Priority: Blocker
>
> After completing the airflow 1.10.0 integration with anonymous LDAP
> (AIRFLOW-3270), we started to hit "Internal Server Error" with the exception
> stack below. We tried clearing the browser cache; it sometimes works and
> sometimes errors out. Please advise on a resolution to avoid this issue.
>  
> {code:java}
> During handling of the above exception, another exception occurred: Traceback 
> (most recent call last):   File 
> "/usr/local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 135, 
> in handle     self.handle_request(listener, req, client, addr)   File 
> "/usr/local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 176, 
> in handle_request     respiter = self.wsgi(environ, resp.start_response)   
> File "/usr/local/lib/python3.5/site-packages/werkzeug/wsgi.py", line 826, in 
> __call__     return app(environ, start_response)   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1997, in __call__ 
>     return self.wsgi_app(environ, start_response)   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1985, in wsgi_app 
>     response = self.handle_exception(e)   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1547, in 
> handle_exception     return self.finalize_request(handler(e), 
> from_error_handler=True)   File 
> "/usr/local/lib/python3.5/site-packages/airflow/www/views.py", line 716, in 
> show_traceback     info=traceback.format_exc()), 500   File 
> "/usr/local/lib/python3.5/site-packages/flask/templating.py", line 132, in 
> render_template     ctx.app.update_template_context(context)   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 764, in 
> update_template_context     context.update(func())   File 
> "/usr/local/lib/python3.5/site-packages/flask_login.py", line 825, in 
> _user_context_processor     return dict(current_user=_get_user())   File 
> "/usr/local/lib/python3.5/site-packages/flask_login.py", line 794, in 
> _get_user     current_app.login_manager._load_user()   File 
> "/usr/local/lib/python3.5/site-packages/flask_login.py", line 363, in 
> _load_user     return self.reload_user()   File 
> "/usr/local/lib/python3.5/site-packages/flask_login.py", line 325, in 
> reload_user     user = self.user_callback(user_id)   File 
> "/usr/local/lib/python3.5/site-packages/airflow/utils/db.py", line 74, in 
> wrapper     return func(*args, **kwargs)   File 
> "/usr/local/lib/python3.5/site-packages/airflow/contrib/auth/backends/ldap_auth.py",
>  line 278, in load_user     return LdapUser(user)   File "", line 4, 
> in __init__   File 
> "/usr/local/lib/python3.5/site-packages/sqlalchemy/orm/state.py", line 414, 
> in _initialize_instance     manager.dispatch.init_failure(self, args, kwargs) 
>   File 
> "/usr/local/lib/python3.5/site-packages/sqlalchemy/util/langhelpers.py", line 
> 66, in __exit__     compat.reraise(exc_type, exc_value, exc_tb)   File 
> "/usr/local/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 187, 
> in reraise     raise value   File 
> "/usr/local/lib/python3.5/site-packages/sqlalchemy/orm/state.py", line 411, 
> in _initialize_instance     return manager.original_init(*mixed[1:], 
> **kwargs)   File 
> "/usr/local/lib/python3.5/site-packages/airflow/contrib/auth/backends/ldap_auth.py",
>  line 157, in __init__     user.username) AttributeError: 'NoneType' object 
> has no attribute 'username' 127.0.0.1 - - [15/Nov/2018:10:47:31 +] "GET 
> /admin/ HTTP/1.1" 500 0 "-" "-" [2018-11-15 10:47:38,590] ERROR in app: 
> Exception on /favicon.ico [GET] Traceback (most recent call last):   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1612, in 
> full_dispatch_request     rv = self.dispatch_request()   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1590, in 
> dispatch_request     self.raise_routing_exception(req)   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1573, in 
> raise_routing_exception     raise request.routing_exception   File 
> "/usr/local/lib/python3.5/site-packages/flask/ctx.py", line 294, in 
> match_request     self.url_adapter.match(return_rule=True)   File 
> "/usr/local/lib/python3.5/site-packages/werkzeug/routing.py", line 1581, in 
> match     raise 

[jira] [Commented] (AIRFLOW-3351) Airflow webserver intermittently broken

2018-11-27 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700781#comment-16700781
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3351:


This is an open source project that is predominantly volunteer-led, and as
such support is best-effort.

If you need that level of support, grab me on Slack or email (a...@apache.org)
and we can discuss something more formal.

> Airflow webserver intermittently broken
> 
>
> Key: AIRFLOW-3351
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3351
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication
>Affects Versions: 1.10.0
>Reporter: Hari Krishna ADDEPALLI LN
>Priority: Blocker
>
> After completing the airflow 1.10.0 integration with anonymous LDAP
> (AIRFLOW-3270), we started to hit "Internal Server Error" with the exception
> stack below. We tried clearing the browser cache; it sometimes works and
> sometimes errors out. Please advise on a resolution to avoid this issue.
>  
> {code:java}
> During handling of the above exception, another exception occurred: Traceback 
> (most recent call last):   File 
> "/usr/local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 135, 
> in handle     self.handle_request(listener, req, client, addr)   File 
> "/usr/local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 176, 
> in handle_request     respiter = self.wsgi(environ, resp.start_response)   
> File "/usr/local/lib/python3.5/site-packages/werkzeug/wsgi.py", line 826, in 
> __call__     return app(environ, start_response)   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1997, in __call__ 
>     return self.wsgi_app(environ, start_response)   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1985, in wsgi_app 
>     response = self.handle_exception(e)   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1547, in 
> handle_exception     return self.finalize_request(handler(e), 
> from_error_handler=True)   File 
> "/usr/local/lib/python3.5/site-packages/airflow/www/views.py", line 716, in 
> show_traceback     info=traceback.format_exc()), 500   File 
> "/usr/local/lib/python3.5/site-packages/flask/templating.py", line 132, in 
> render_template     ctx.app.update_template_context(context)   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 764, in 
> update_template_context     context.update(func())   File 
> "/usr/local/lib/python3.5/site-packages/flask_login.py", line 825, in 
> _user_context_processor     return dict(current_user=_get_user())   File 
> "/usr/local/lib/python3.5/site-packages/flask_login.py", line 794, in 
> _get_user     current_app.login_manager._load_user()   File 
> "/usr/local/lib/python3.5/site-packages/flask_login.py", line 363, in 
> _load_user     return self.reload_user()   File 
> "/usr/local/lib/python3.5/site-packages/flask_login.py", line 325, in 
> reload_user     user = self.user_callback(user_id)   File 
> "/usr/local/lib/python3.5/site-packages/airflow/utils/db.py", line 74, in 
> wrapper     return func(*args, **kwargs)   File 
> "/usr/local/lib/python3.5/site-packages/airflow/contrib/auth/backends/ldap_auth.py",
>  line 278, in load_user     return LdapUser(user)   File "", line 4, 
> in __init__   File 
> "/usr/local/lib/python3.5/site-packages/sqlalchemy/orm/state.py", line 414, 
> in _initialize_instance     manager.dispatch.init_failure(self, args, kwargs) 
>   File 
> "/usr/local/lib/python3.5/site-packages/sqlalchemy/util/langhelpers.py", line 
> 66, in __exit__     compat.reraise(exc_type, exc_value, exc_tb)   File 
> "/usr/local/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 187, 
> in reraise     raise value   File 
> "/usr/local/lib/python3.5/site-packages/sqlalchemy/orm/state.py", line 411, 
> in _initialize_instance     return manager.original_init(*mixed[1:], 
> **kwargs)   File 
> "/usr/local/lib/python3.5/site-packages/airflow/contrib/auth/backends/ldap_auth.py",
>  line 157, in __init__     user.username) AttributeError: 'NoneType' object 
> has no attribute 'username' 127.0.0.1 - - [15/Nov/2018:10:47:31 +] "GET 
> /admin/ HTTP/1.1" 500 0 "-" "-" [2018-11-15 10:47:38,590] ERROR in app: 
> Exception on /favicon.ico [GET] Traceback (most recent call last):   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1612, in 
> full_dispatch_request     rv = self.dispatch_request()   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1590, in 
> dispatch_request     self.raise_routing_exception(req)   File 
> "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1573, in 
> raise_routing_exception     raise request.routing_exception   File 
> "/usr/local/lib/python3.5/site-packages/flask/ctx.py", line 294, 

[jira] [Commented] (AIRFLOW-3164) verify certificate of LDAP server

2018-11-27 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700551#comment-16700551
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3164:


Because LDAP without TLS transmits every user's password in plain text over the
network, where it could be sniffed.

In future releases (2.0.0 onwards) this version of the webserver is going to
be removed and replaced with Flask-AppBuilder, so login will need changing
anyway.

https://flask-appbuilder.readthedocs.io/en/latest/security.html#authentication-ldap

Thinking about it, you could switch to this new UI already (available since
1.10.0; see
https://github.com/apache/incubator-airflow/blob/master/UPDATING.md#new-webserver-ui-with-role-based-access-control)
 - it may be less work and would future-proof you more.

> verify certificate of LDAP server
> -
>
> Key: AIRFLOW-3164
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3164
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Bolke de Bruin
>Priority: Blocker
> Fix For: 1.10.1
>
>
> Currently we don't verify the certificate of the LDAP server; this can lead
> to security incidents.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3405) Task instance fail intermittently due to MySQL error

2018-11-27 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700466#comment-16700466
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3405:


If you are up for trying the bleeding-edge version of Airflow (warning! there
may be bugs in there!) then this PR [1] may help by reducing the number of pool
slots you need - in theory it should be possible to run with a much smaller
SQLAlchemy pool size.

[1]: https://github.com/apache/incubator-airflow/pull/4234

> Task instance fail intermittently due to MySQL error
> 
>
> Key: AIRFLOW-3405
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3405
> Project: Apache Airflow
>  Issue Type: Improvement
> Environment: MySQL, Redhat Linux
>Reporter: Yuvaraj
>Priority: Major
>  Labels: performance, usability
>
> DAGs are failing intermittently with the error below.
> OperationalError: (_mysql_exceptions.OperationalError) (1040, 'Too many
> connections')
> [2018-11-25 12:24:16,952] - Heartbeat time limited exceeded!
> We have max_connections set to 2000 in the DB.
> Below are the settings in the cfg:
> sql_alchemy_pool_size = 1980
> sql_alchemy_pool_recycle = 3600
> As per our DBA, the Airflow scheduler keeps opening connections to the
> database; these connections are mostly idle and get reset whenever the
> scheduler restarts, but with max_connections at 2000 and the scheduler holding
> on to 1600 of these, other apps trying to connect might start running out of
> connections.
> How do we remediate these idle connections? What should be the optimal values
> for these configs and for max_connections at the DB? Consider we need to build
> a large environment serving 500+ definitions with 1+ runs per day. Need
> suggestions...
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3405) Task instance fail intermittently due to MySQL error

2018-11-27 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700460#comment-16700460
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3405:


That is a huge number of connections and is something we'd want to fix.

What version of Airflow are you running on?

> Task instance fail intermittently due to MySQL error
> 
>
> Key: AIRFLOW-3405
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3405
> Project: Apache Airflow
>  Issue Type: Improvement
> Environment: MySQL, Redhat Linux
>Reporter: Yuvaraj
>Priority: Major
>  Labels: performance, usability
>
> DAGs are failing intermittently with the error below.
> OperationalError: (_mysql_exceptions.OperationalError) (1040, 'Too many
> connections')
> [2018-11-25 12:24:16,952] - Heartbeat time limited exceeded!
> We have max_connections set to 2000 in the DB.
> Below are the settings in the cfg:
> sql_alchemy_pool_size = 1980
> sql_alchemy_pool_recycle = 3600
> As per our DBA, the Airflow scheduler keeps opening connections to the
> database; these connections are mostly idle and get reset whenever the
> scheduler restarts, but with max_connections at 2000 and the scheduler holding
> on to 1600 of these, other apps trying to connect might start running out of
> connections.
> How do we remediate these idle connections? What should be the optimal values
> for these configs and for max_connections at the DB? Consider we need to build
> a large environment serving 500+ definitions with 1+ runs per day. Need
> suggestions...
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3164) verify certificate of LDAP server

2018-11-27 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700458#comment-16700458
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3164:


Airflow doesn't follow SemVer, and doesn't claim to.

I'm sorry this broke your install, but we have decided that sending passwords 
in plain text over the network is bad security practice and security should win 
over not breaking some installs in this case.

If you need help setting up the custom backend let me know and I can run you 
through it.

> verify certificate of LDAP server
> -
>
> Key: AIRFLOW-3164
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3164
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Bolke de Bruin
>Priority: Blocker
> Fix For: 1.10.1
>
>
> Currently we don't verify the certificate of the LDAP server; this can lead
> to security incidents.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-1703) Airflow LocalExecutor crashes after 3 hours of work. Database is locked

2018-11-27 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-1703.

Resolution: Fixed

That was possibly the problem, and we now don't allow SQLite with anything
other than the SequentialExecutor.

> Airflow LocalExecutor crashes after 3 hours of work. Database is locked
> ---
>
> Key: AIRFLOW-1703
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1703
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: db, worker
>Affects Versions: 1.8.0
> Environment: Single CentOS virtual server
>Reporter: Kirill Dubovikov
>Priority: Major
> Attachments: nohup.out
>
>
> Airflow consistently crashes after working for several hours on a single node
> when using a SQLite DB. Our DAG is scheduled to run {{@daily}}. We launch
> airflow using the following commands:
> {code:sh}
> airflow scheduler
> airflow webserver -p 8080
> {code}
> After a while worker and webserver crash with the following error: 
> {{sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) database is 
> locked [SQL: 'SELECT connection.conn_id AS connection_conn_id \nFROM 
> connection GROUP BY connection.conn_id']}}
> I've attached full logs for further investigation



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3164) verify certificate of LDAP server

2018-11-27 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700407#comment-16700407
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3164:


> invalid CA public key file

What are you providing to the cacert config option? It sounds like it is not a 
PEM-encoded certificate.

(Security should not be optional)

> verify certificate of LDAP server
> -
>
> Key: AIRFLOW-3164
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3164
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Bolke de Bruin
>Priority: Blocker
> Fix For: 1.10.1
>
>
> Currently we don't verify the certificate of the LDAP server; this can lead
> to security incidents.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3403) create athena sensor

2018-11-27 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700301#comment-16700301
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3403:


Having a sensor is no bad thing, and we'll review and merge the PR; but if the
operator is not failing when the query fails (sorry, I missed that detail),
that is a bug in the operator that should be fixed too.

> create athena sensor
> 
>
> Key: AIRFLOW-3403
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3403
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: tal
>Assignee: tal
>Priority: Minor
>
> It would be nice to have an Athena sensor to monitor the progress of the query.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3403) create athena sensor

2018-11-27 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700137#comment-16700137
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3403:


The current behaviour of EMR is the outlier -- all the other operators (for BQ,
etc.) run to completion.

> create athena sensor
> 
>
> Key: AIRFLOW-3403
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3403
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: tal
>Assignee: tal
>Priority: Minor
>
> It would be nice to have an Athena sensor to monitor the progress of the query.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3403) create athena sensor

2018-11-27 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700127#comment-16700127
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3403:


Is this to go with the current AthenaOperator? I thought that started a query 
and ran it to completion?

> create athena sensor
> 
>
> Key: AIRFLOW-3403
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3403
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: tal
>Assignee: tal
>Priority: Minor
>
> It would be nice to have an Athena sensor to monitor the progress of the query.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3319) Kubernetes Executor attempts to get the "try_number" from labels but fails

2018-11-26 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-3319:
---
Affects Version/s: 1.10.1
Fix Version/s: 1.10.2

> Kubernetes Executor attempts to get the "try_number" from labels but fails
> --
>
> Key: AIRFLOW-3319
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3319
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: executor, kubernetes
>Affects Versions: 1.10.1
>Reporter: Bo Blanton
>Priority: Major
> Fix For: 1.10.2
>
>
> The {{_labels_to_key}} function attempts to read `try_number` from the
> Kubernetes labels; however, no such label is applied to the pod, resulting in
> an exception. The fix is to modify the pod executor to add this label.
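> A sketch of the shape of the fix (function and attribute names are
> illustrative, not the executor's actual code): make sure the label that
> _labels_to_key reads is present when the pod is built.
> {code:python}
> def add_try_number_label(pod, task_instance):
>     # Hypothetical sketch: add the missing label alongside the existing ones
>     pod.labels["try_number"] = str(task_instance.try_number)
>     return pod
> {code}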



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-1823) API get_task_info is incompatible with manual runs created by UI

2018-11-26 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-1823.

   Resolution: Fixed
Fix Version/s: 1.10.0

> API get_task_info is incompatible with manual runs created by UI
> 
>
> Key: AIRFLOW-1823
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1823
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: api
>Affects Versions: 2.0.0
> Environment: ubuntu
> Airflow 1.9rc02
> commit: 
> https://github.com/apache/incubator-airflow/blob/master/airflow/www/api/experimental/endpoints.py#L126
>Reporter: Jeremy Lewi
>Assignee: Iuliia Volkova
>Priority: Minor
> Fix For: 1.10.0
>
>
> The API method 
> [task_instance_info|https://github.com/apache/incubator-airflow/blob/master/airflow/www/api/experimental/endpoints.py#L126]
>  doesn't work with manual runs created by the UI.
> The UI creates dag runs whose ids carry sub-second precision in the name. An
> example of a run id created by the UI is
> 2017-11-16T20:23:32.045330
> The endpoint for  
> [task_instance_info|https://github.com/apache/incubator-airflow/blob/master/airflow/www/api/experimental/endpoints.py#L126]
>  however assumes the dag run id is of the form '%Y-%m-%dT%H:%M:%S'.
> Runs triggered via the CLI generate run ids with the form expected by the API.
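> For illustration, strptime fails as soon as the run id carries the extra
> fractional-second digits:
> {code:python}
> from datetime import datetime
>
> ui_run_id = "2017-11-16T20:23:32.045330"  # run id created by the UI
>
> # datetime.strptime(ui_run_id, '%Y-%m-%dT%H:%M:%S')
> # -> ValueError: unconverted data remains: .045330
>
> dt = datetime.strptime(ui_run_id, '%Y-%m-%dT%H:%M:%S.%f')  # parses fine
> {code}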



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-3302) Small CSS fixes

2018-11-26 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-3302.

Resolution: Fixed

> Small CSS fixes
> ---
>
> Key: AIRFLOW-3302
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3302
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Sumit Maheshwari
>Assignee: Sumit Maheshwari
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-3118) DAGs not successful on new installation

2018-11-26 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-3118.

Resolution: Duplicate

> DAGs not successful on new installation
> ---
>
> Key: AIRFLOW-3118
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3118
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG
>Affects Versions: 1.10.0
> Environment: Ubuntu 18.04
> Python 3.6
>Reporter: Brylie Christopher Oxley
>Assignee: Huy Nguyen
>Priority: Blocker
> Fix For: 1.10.2
>
> Attachments: Screenshot_20180926_161837.png, 
> image-2018-09-26-12-39-03-094.png
>
>
> When trying out Airflow, on localhost, none of the DAG runs are getting to 
> the 'success' state. They are getting stuck in 'running', or I manually label 
> them as failed:
> !image-2018-09-26-12-39-03-094.png!
> h2. Steps to reproduce
>  # create new conda environment
>  ** conda create -n airflow
>  ** source activate airflow
>  # install airflow
>  ** pip install apache-airflow
>  # initialize Airflow db
>  ** airflow initdb
>  # disable default paused setting in airflow.cfg
>  ** dags_are_paused_at_creation = False
>  # run airflow and airflow scheduler (in separate terminals)
>  ** airflow scheduler
>  ** airflow webserver
>  # unpause example_bash_operator
>  ** airflow unpause example_bash_operator
>  # log in to the Airflow UI
>  # turn on example_bash_operator
>  # click "Trigger DAG" in the `example_bash_operator` row
> h2. Observed result
> The `example_bash_operator` never leaves the "running" state.
> h2. Expected result
> The `example_bash_operator` would quickly enter the "success" state
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

