[jira] [Closed] (AIRFLOW-1204) Scheduler ignores start_date if an earlier successful dag run exists

2017-05-19 Thread Bryan Vanderhoof (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Vanderhoof closed AIRFLOW-1204.
-
Resolution: Not A Bug

> Scheduler ignores start_date if an earlier successful dag run exists
> 
>
> Key: AIRFLOW-1204
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1204
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.8
>Reporter: Bryan Vanderhoof
>
> I have repeatedly run into a problem with the scheduler automatically 
> scheduling DAGs before their start_date if a previous run of the DAG exists 
> in the database.
> For example, if a new DAG has a start_date of 2017-05-01, it will run 
> starting on that date as expected. However, if there's an existing run for 
> execution_date 2017-04-01, the scheduler will automatically generate dag runs 
> for 2017-04-02 through 2017-05-01, completely ignoring the start_date.
> This also happens with backfills. Today, I began backfilling data for January 
> 2017 on a DAG with a start date of 2017-05-01. The backfill began as 
> expected, but as soon as some of the January tasks began completing, the 
> scheduler also created dag runs for every day in February, March, and April.
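
To make the reported dates concrete, the sketch below enumerates the runs
described in the first example. It only models the behavior as reported; the
function name is hypothetical and this is not the scheduler's own code.

```python
from datetime import date, timedelta


def reported_run_dates(last_run, start_date):
    """Daily execution dates from the day after last_run through start_date."""
    days = (start_date - last_run).days
    return [last_run + timedelta(days=n) for n in range(1, days + 1)]


# Existing run on 2017-04-01, DAG start_date of 2017-05-01 (from the report).
runs = reported_run_dates(date(2017, 4, 1), date(2017, 5, 1))
print(runs[0], runs[-1], len(runs))  # 2017-04-02 2017-05-01 30
```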



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AIRFLOW-645) HttpHook ignores https

2017-05-19 Thread John Zeringue (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017541#comment-16017541
 ] 

John Zeringue commented on AIRFLOW-645:
---

I've taken a stab at this 
[here|https://github.com/apache/incubator-airflow/pull/2311]. It's not as 
straightforward as you'd like, because it seems like the schema has been 
encoded in the host historically.

> HttpHook ignores https
> --
>
> Key: AIRFLOW-645
> URL: https://issues.apache.org/jira/browse/AIRFLOW-645
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks
>Affects Versions: Airflow 2.0, Airflow 1.7.1
>Reporter: Ryan Morlok
>
> When loading an https connection from an environment variable, HttpHook 
> leverages BaseHook's get_connection(...) method which will load the string 
> from the environment variable. It will then parse the URI.
> HttpHook will then use the base_url as the connection's host, which always 
> strips off the protocol. It does a useless check to see if the base_url 
> starts with http, and since it doesn't, it always prepends http://, losing 
> the https.
> I think 
> self.base_url = conn.host
> in http_hook.py should be updated to:
> self.base_url = conn.conn_type + conn.host
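
One caveat with the suggested concatenation: if conn_type holds the bare
scheme, joining it directly to the host omits the "://" separator. A minimal
sketch using only the Python 3 standard library follows; split_connection_uri
and build_base_url are hypothetical helpers for illustration, not part of
HttpHook, and the example URI is made up.

```python
from urllib.parse import urlparse


def split_connection_uri(uri):
    """Return (scheme, host, port) parsed from a connection URI."""
    parsed = urlparse(uri)
    return parsed.scheme, parsed.hostname, parsed.port


def build_base_url(scheme, host, port=None):
    """Rebuild a base URL, keeping the scheme instead of defaulting to http."""
    base = "{}://{}".format(scheme or "http", host)
    if port:
        base += ":{}".format(port)
    return base


scheme, host, port = split_connection_uri("https://example.com:8443/api")
print(scheme + host)                       # naive concatenation, no separator
print(build_base_url(scheme, host, port))  # scheme and port preserved
```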





[jira] [Commented] (AIRFLOW-342) exception in 'airflow scheduler' : Connection reset by peer

2017-05-19 Thread Samuel Griek (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017477#comment-16017477
 ] 

Samuel Griek commented on AIRFLOW-342:
--

The same exact issue is happening to me.  But I'm not using docker (not yet 
anyway).  Airflow 1.8.0, Python 2.7.12, RabbitMQ 3.5.7, and Celery 4.0.2

We are evaluating airflow as a possible tool to replace a paid license 
orchestration tool but this is not encouraging.  Can someone help?

Thank you!



>  exception in 'airflow scheduler' : Connection reset by peer
> 
>
> Key: AIRFLOW-342
> URL: https://issues.apache.org/jira/browse/AIRFLOW-342
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: celery, scheduler
>Affects Versions: Airflow 1.7.1.3
> Environment: OS: Red Hat Enterprise Linux Server 7.2 (Maipo)
> Python: 2.7.5
> Airflow: 1.7.1.3
>Reporter: Hila Visan
>Assignee: Hila Visan
>
> 'airflow scheduler' command throws an exception when running it. 
> Despite the exception, the workers run the tasks from the queues as expected.
> Error details:
>  
> [2016-06-30 19:00:10,130] {jobs.py:758} ERROR - [Errno 104] Connection reset by peer
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/airflow/jobs.py", line 755, in _execute
>     executor.heartbeat()
>   File "/usr/lib/python2.7/site-packages/airflow/executors/base_executor.py", line 107, in heartbeat
>     self.sync()
>   File "/usr/lib/python2.7/site-packages/airflow/executors/celery_executor.py", line 74, in sync
>     state = async.state
>   File "/usr/lib/python2.7/site-packages/celery/result.py", line 394, in state
>     return self._get_task_meta()['status']
>   File "/usr/lib/python2.7/site-packages/celery/result.py", line 339, in _get_task_meta
>     return self._maybe_set_cache(self.backend.get_task_meta(self.id))
>   File "/usr/lib/python2.7/site-packages/celery/backends/amqp.py", line 163, in get_task_meta
>     binding.declare()
>   File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 521, in declare
>     self.exchange.declare(nowait)
>   File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 174, in declare
>     nowait=nowait, passive=passive,
>   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 615, in exchange_declare
>     self._send_method((40, 10), args)
>   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method
>     self.channel_id, method_sig, args, content,
>   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method
>     write_frame(1, channel, payload)
>   File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 182, in write_frame
>     frame_type, channel, size, payload, 0xce,
>   File "/usr/lib64/python2.7/socket.py", line 224, in meth
>     return getattr(self._sock,name)(*args)
> error: [Errno 104] Connection reset by peer
> [2016-06-30 19:00:10,131] {jobs.py:759} ERROR - Tachycardia!





[jira] [Work started] (AIRFLOW-1231) Use flask_wtf.CSRFProtect instead of flask_wtf.CsrfProtect to remove deprecation warning

2017-05-19 Thread Stanislav Kudriashev (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-1231 started by Stanislav Kudriashev.
-
> Use flask_wtf.CSRFProtect instead of flask_wtf.CsrfProtect to remove 
> deprecation warning
> 
>
> Key: AIRFLOW-1231
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1231
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core
>Affects Versions: Airflow 1.8
>Reporter: Stanislav Kudriashev
>Assignee: Stanislav Kudriashev
>Priority: Minor
> Fix For: Airflow 1.8
>
>
> Use flask_wtf.CSRFProtect instead of flask_wtf.CsrfProtect.
> {code}
> ...app.py:23: FlaskWTFDeprecationWarning: "flask_wtf.CsrfProtect" has been 
> renamed to "CSRFProtect" and will be removed in 1.0.
>   csrf = CsrfProtect()
> {code}





[jira] [Updated] (AIRFLOW-1231) Use flask_wtf.CSRFProtect instead of flask_wtf.CsrfProtect to remove deprecation warning

2017-05-19 Thread Stanislav Kudriashev (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stanislav Kudriashev updated AIRFLOW-1231:
--
Component/s: core

> Use flask_wtf.CSRFProtect instead of flask_wtf.CsrfProtect to remove 
> deprecation warning
> 
>
> Key: AIRFLOW-1231
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1231
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core
>Affects Versions: Airflow 1.8
>Reporter: Stanislav Kudriashev
>Assignee: Stanislav Kudriashev
>Priority: Minor
> Fix For: Airflow 1.8
>
>
> Use flask_wtf.CSRFProtect instead of flask_wtf.CsrfProtect.
> {code}
> ...app.py:23: FlaskWTFDeprecationWarning: "flask_wtf.CsrfProtect" has been 
> renamed to "CSRFProtect" and will be removed in 1.0.
>   csrf = CsrfProtect()
> {code}





[jira] [Updated] (AIRFLOW-1231) Use flask_wtf.CSRFProtect instead of flask_wtf.CsrfProtect to remove deprecation warning

2017-05-19 Thread Stanislav Kudriashev (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stanislav Kudriashev updated AIRFLOW-1231:
--
Fix Version/s: Airflow 1.8

> Use flask_wtf.CSRFProtect instead of flask_wtf.CsrfProtect to remove 
> deprecation warning
> 
>
> Key: AIRFLOW-1231
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1231
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core
>Affects Versions: Airflow 1.8
>Reporter: Stanislav Kudriashev
>Assignee: Stanislav Kudriashev
>Priority: Minor
> Fix For: Airflow 1.8
>
>
> Use flask_wtf.CSRFProtect instead of flask_wtf.CsrfProtect.
> {code}
> ...app.py:23: FlaskWTFDeprecationWarning: "flask_wtf.CsrfProtect" has been 
> renamed to "CSRFProtect" and will be removed in 1.0.
>   csrf = CsrfProtect()
> {code}





[jira] [Updated] (AIRFLOW-1231) Use flask_wtf.CSRFProtect instead of flask_wtf.CsrfProtect to remove deprecation warning

2017-05-19 Thread Stanislav Kudriashev (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stanislav Kudriashev updated AIRFLOW-1231:
--
Affects Version/s: Airflow 1.8

> Use flask_wtf.CSRFProtect instead of flask_wtf.CsrfProtect to remove 
> deprecation warning
> 
>
> Key: AIRFLOW-1231
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1231
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core
>Affects Versions: Airflow 1.8
>Reporter: Stanislav Kudriashev
>Assignee: Stanislav Kudriashev
>Priority: Minor
> Fix For: Airflow 1.8
>
>
> Use flask_wtf.CSRFProtect instead of flask_wtf.CsrfProtect.
> {code}
> ...app.py:23: FlaskWTFDeprecationWarning: "flask_wtf.CsrfProtect" has been 
> renamed to "CSRFProtect" and will be removed in 1.0.
>   csrf = CsrfProtect()
> {code}





[jira] [Created] (AIRFLOW-1231) Use flask_wtf.CSRFProtect instead of flask_wtf.CsrfProtect to remove deprecation warning

2017-05-19 Thread Stanislav Kudriashev (JIRA)
Stanislav Kudriashev created AIRFLOW-1231:
-

 Summary: Use flask_wtf.CSRFProtect instead of 
flask_wtf.CsrfProtect to remove deprecation warning
 Key: AIRFLOW-1231
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1231
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Stanislav Kudriashev
Assignee: Stanislav Kudriashev
Priority: Minor


Use flask_wtf.CSRFProtect instead of flask_wtf.CsrfProtect.

{code}
...app.py:23: FlaskWTFDeprecationWarning: "flask_wtf.CsrfProtect" has been 
renamed to "CSRFProtect" and will be removed in 1.0.
  csrf = CsrfProtect()
{code}
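
In app.py the change amounts to instantiating CSRFProtect in place of
CsrfProtect. As a rough illustration of why the old name still works but
warns, here is a minimal sketch of the deprecated-alias pattern using only
the standard library. The two classes are stand-ins written for this example
and are not Flask-WTF's implementation; count_deprecation_warnings is a
hypothetical helper.

```python
import warnings


class CSRFProtect(object):
    """Stand-in for the new, preferred class name."""

    def __init__(self, app=None):
        self.app = app


class CsrfProtect(CSRFProtect):
    """Stand-in for the deprecated alias: still works, but warns on use."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            '"CsrfProtect" has been renamed to "CSRFProtect".',
            DeprecationWarning,
            stacklevel=2,
        )
        super(CsrfProtect, self).__init__(*args, **kwargs)


def count_deprecation_warnings(factory):
    """Call factory() and count the DeprecationWarnings it emits."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        factory()
    return sum(1 for w in caught if issubclass(w.category, DeprecationWarning))


print(count_deprecation_warnings(CsrfProtect))  # old name: warns
print(count_deprecation_warnings(CSRFProtect))  # new name: silent
```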





[jira] [Created] (AIRFLOW-1230) Upstream_failed tasks are not executed when the DAG is restarted after failure

2017-05-19 Thread Rostislaw Krassow (JIRA)
Rostislaw Krassow created AIRFLOW-1230:
--

 Summary: Upstream_failed tasks are not executed when the DAG is 
restarted after failure
 Key: AIRFLOW-1230
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1230
 Project: Apache Airflow
  Issue Type: Bug
  Components: DAG, DagRun
Affects Versions: 1.8.1, 1.8.0
 Environment: CentOS release 6.8 (Final)
Python 2.7.10

Reporter: Rostislaw Krassow
 Attachments: DAG_cleared.gif, DAG_failed.gif, 
DAG_partly_successful.gif, new_example_bash_operator.py

The issue is reproducible with Airflow 1.8.0 and 1.8.1.

Steps to reproduce:
1. Use the attached DAG 
[new_example_bash_operator|^new_example_bash_operator.py]. This is a modified 
standard example DAG. The task run_before_loop will fail because it contains an 
error.
2. Execute the DAG:
airflow backfill new_example_bash_operator -s 2017-05-02 -e 2017-05-02

The task run_before_loop fails as expected. The DAG fails. The screenshot of 
the UI is attached. !DAG_failed.gif!
All dependent tasks, such as runme_0, runme_1, and runme_2, go to state 
"upstream_failed".

3. Fix the BashOperator in the task run_before_loop (just put "echo 1" as 
bash_command).

4. Execute the DAG again:
airflow backfill new_example_bash_operator -s 2017-05-02 -e 2017-05-02

Expected behavior:
Restart of the DAG leads to execution of all failed tasks including 
upstream_failed tasks.
Observed behavior:
1. The failed task is not restarted.
2. None of the dependent tasks are restarted.
3. To get the DAG re-executed, its state must be cleared manually:
airflow clear -f -c new_example_bash_operator -s 2017-05-02 -e 2017-05-02  
!DAG_cleared.gif!

After clearing, the same DAG can be restarted:
airflow backfill new_example_bash_operator -s 2017-05-02 -e 2017-05-02

Then the task run_before_loop is executed. All other tasks still remain in 
state "upstream_failed". !DAG_partly_successful.gif!

To get all tasks executed, their state must be cleared explicitly.

Conclusion:
This is a blocker for production use. We run several dozen DAGs with a 
high number of tasks. In the production environment there are always 
failed tasks, so restarting a DAG in such cases must be simple.
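
The expected behavior can be stated as a small selection rule. The sketch
below is only an illustration of that expectation, not Airflow's backfill
logic; tasks_to_rerun and RERUN_STATES are hypothetical names, and the state
mapping mirrors the tasks named in the steps above.

```python
# States that, per the expected behavior, should be picked up on a restart.
RERUN_STATES = {"failed", "upstream_failed"}


def tasks_to_rerun(task_states):
    """Return the task ids whose state should trigger a re-run, sorted."""
    return sorted(
        task_id for task_id, state in task_states.items()
        if state in RERUN_STATES
    )


# Task states after the first backfill, as described in step 2.
after_first_backfill = {
    "run_before_loop": "failed",
    "runme_0": "upstream_failed",
    "runme_1": "upstream_failed",
    "runme_2": "upstream_failed",
}
print(tasks_to_rerun(after_first_backfill))
```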





[jira] [Updated] (AIRFLOW-1229) Make "Run Id" column clickable in Browse -> DAG Runs

2017-05-19 Thread Erik Cederstrand (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Cederstrand updated AIRFLOW-1229:
--
Description: 
I'm triggering a lot of DAGs manually using "airflow trigger_dag my_dag 
--run_id=some_unique_id". I would like to be able in the UI to browse easily to 
this specific DAG run using the "some_unique_id" label. In the graph page of 
the DAG, I need to know the exact execution date, which is inconvenient, and in 
the Browse -> DAG Runs page I can search by "some_unique_id", but the "Run Ids" 
column is not clickable.

The attached patch makes the aforementioned column clickable, so I'm sent 
directly to the graph view for that specific DAG run, not the DAG in general.

  was:
I'm triggering a lot of DAGs manually using "airflow trigger_dag my_dag 
--run_id=some_unique_id". I would like to be able in the UI to browse easily to 
this specific DAG run using the "some_unique_id" label. In the graph page the 
the DAG, I need to know the execution date, and in the Browse -> DAG Runs page 
I can search by "some_unique_id", but the "Run Ids" column is not clickable.

The attached patch makes the aforementioned column clickable, so I'm sent 
directly to the graph view for that specific run, not the DAG in general.


> Make "Run Id" column clickable in Browse -> DAG Runs
> 
>
> Key: AIRFLOW-1229
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1229
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webapp
>Affects Versions: Airflow 1.8
> Environment: Python3.4
>Reporter: Erik Cederstrand
>  Labels: patch
> Attachments: dag_run_link.patch
>
>
> I'm triggering a lot of DAGs manually using "airflow trigger_dag my_dag 
> --run_id=some_unique_id". I would like to be able in the UI to browse easily 
> to this specific DAG run using the "some_unique_id" label. In the graph page 
> of the DAG, I need to know the exact execution date, which is inconvenient, 
> and in the Browse -> DAG Runs page I can search by "some_unique_id", but the 
> "Run Ids" column is not clickable.
> The attached patch makes the aforementioned column clickable, so I'm sent 
> directly to the graph view for that specific DAG run, not the DAG in general.





[jira] [Created] (AIRFLOW-1229) Make "Run Id" column clickable in Browse -> DAG Runs

2017-05-19 Thread Erik Cederstrand (JIRA)
Erik Cederstrand created AIRFLOW-1229:
-

 Summary: Make "Run Id" column clickable in Browse -> DAG Runs
 Key: AIRFLOW-1229
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1229
 Project: Apache Airflow
  Issue Type: Improvement
  Components: webapp
Affects Versions: Airflow 1.8
 Environment: Python3.4
Reporter: Erik Cederstrand
 Attachments: dag_run_link.patch

I'm triggering a lot of DAGs manually using "airflow trigger_dag my_dag 
--run_id=some_unique_id". I would like to be able in the UI to browse easily to 
this specific DAG run using the "some_unique_id" label. In the graph page of 
the DAG, I need to know the execution date, and in the Browse -> DAG Runs page 
I can search by "some_unique_id", but the "Run Ids" column is not clickable.

The attached patch makes the aforementioned column clickable, so I'm sent 
directly to the graph view for that specific run, not the DAG in general.
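
For illustration, a link of the kind described could be built from the DAG id
and execution date of a run. This is a minimal sketch and does not reproduce
the attached patch: the function name, the /admin/airflow/graph path, and the
query parameter names are assumptions, and the execution date is made up.

```python
from urllib.parse import urlencode


def dag_run_graph_url(dag_id, execution_date, base="/admin/airflow/graph"):
    """Build a graph-view URL for one DAG run (path and params are assumed)."""
    # A list of pairs keeps the parameter order stable across Python versions.
    query = urlencode([("dag_id", dag_id), ("execution_date", execution_date)])
    return "{}?{}".format(base, query)


print(dag_run_graph_url("my_dag", "2017-05-19T10:00:00"))
```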





[jira] [Assigned] (AIRFLOW-342) exception in 'airflow scheduler' : Connection reset by peer

2017-05-19 Thread Anonymous (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-342:
-

Assignee: Hila Visan

>  exception in 'airflow scheduler' : Connection reset by peer
> 
>
> Key: AIRFLOW-342
> URL: https://issues.apache.org/jira/browse/AIRFLOW-342
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: celery, scheduler
>Affects Versions: Airflow 1.7.1.3
> Environment: OS: Red Hat Enterprise Linux Server 7.2 (Maipo)
> Python: 2.7.5
> Airflow: 1.7.1.3
>Reporter: Hila Visan
>Assignee: Hila Visan
>
> 'airflow scheduler' command throws an exception when running it. 
> Despite the exception, the workers run the tasks from the queues as expected.
> Error details:
>  
> [2016-06-30 19:00:10,130] {jobs.py:758} ERROR - [Errno 104] Connection reset by peer
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/airflow/jobs.py", line 755, in _execute
>     executor.heartbeat()
>   File "/usr/lib/python2.7/site-packages/airflow/executors/base_executor.py", line 107, in heartbeat
>     self.sync()
>   File "/usr/lib/python2.7/site-packages/airflow/executors/celery_executor.py", line 74, in sync
>     state = async.state
>   File "/usr/lib/python2.7/site-packages/celery/result.py", line 394, in state
>     return self._get_task_meta()['status']
>   File "/usr/lib/python2.7/site-packages/celery/result.py", line 339, in _get_task_meta
>     return self._maybe_set_cache(self.backend.get_task_meta(self.id))
>   File "/usr/lib/python2.7/site-packages/celery/backends/amqp.py", line 163, in get_task_meta
>     binding.declare()
>   File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 521, in declare
>     self.exchange.declare(nowait)
>   File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 174, in declare
>     nowait=nowait, passive=passive,
>   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 615, in exchange_declare
>     self._send_method((40, 10), args)
>   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method
>     self.channel_id, method_sig, args, content,
>   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method
>     write_frame(1, channel, payload)
>   File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 182, in write_frame
>     frame_type, channel, size, payload, 0xce,
>   File "/usr/lib64/python2.7/socket.py", line 224, in meth
>     return getattr(self._sock,name)(*args)
> error: [Errno 104] Connection reset by peer
> [2016-06-30 19:00:10,131] {jobs.py:759} ERROR - Tachycardia!





[jira] [Created] (AIRFLOW-1228) Fix readthedocs timeout

2017-05-19 Thread Maxime Beauchemin (JIRA)
Maxime Beauchemin created AIRFLOW-1228:
--

 Summary: Fix readthedocs timeout
 Key: AIRFLOW-1228
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1228
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Maxime Beauchemin


RTD is a service that builds documentation automatically from the latest master 
and allows serving specific versions of the docs. This will allow each 
Apache release to have its own documentation.


