[jira] [Assigned] (AIRFLOW-840) Python3 encoding issue in Kerberos

2017-03-15 Thread Alexander Bij (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Bij reassigned AIRFLOW-840:
-

Assignee: Alexander Bij  (was: Bolke de Bruin)

> Python3 encoding issue in Kerberos
> --
>
> Key: AIRFLOW-840
> URL: https://issues.apache.org/jira/browse/AIRFLOW-840
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security
>Affects Versions: Airflow 1.8
> Environment: $ python --version
> Python 3.4.3
>Reporter: Erik Cederstrand
>Assignee: Alexander Bij
>  Labels: security
>
> While attempting to configure Kerberos ticket renewal in a Python3 
> environment, I encountered this encoding issue trying to run {{airflow 
> kerberos}}:
> {code:none}
> Traceback (most recent call last):
>   File "/usr/local/bin/airflow", line 15, in <module>
> args.func(args)
>   File "/usr/local/lib/python3.4/dist-packages/airflow/bin/cli.py", line 600, 
> in kerberos
> airflow.security.kerberos.run()
>   File "/usr/local/lib/python3.4/dist-packages/airflow/security/kerberos.py", 
> line 110, in run
> renew_from_kt()
>   File "/usr/local/lib/python3.4/dist-packages/airflow/security/kerberos.py", 
> line 55, in renew_from_kt
> "\n".join(subp.stderr.readlines())
> TypeError: sequence item 0: expected str instance, bytes found
> {code}
> The issue here (ignoring for a moment why {{kinit}} is failing on my machine) 
> is that in Python 3, Popen returns {{bytes}} for stdout/stderr, but both are 
> handled as if they were {{str}}.
> I'm unsure what Airflow's Py2/3 compatibility policy is, but a simple {{from 
> six import PY2}} and an if/else seems like the least intrusive fix. The 
> non-PY2 path would then decode each line, e.g. 
> {{[line.decode(errors='ignore') for line in subp.stderr.readlines()]}}
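A minimal, self-contained sketch of the suggested approach (the command and variable names are illustrative, not Airflow's actual kerberos.py code):

```python
import subprocess
import sys

# Hypothetical sketch of the suggested fix: on Python 3, Popen pipes yield
# bytes, so each stderr line is decoded before joining; PY2 stands in for
# `from six import PY2`. The child command is a stand-in for kinit.
PY2 = sys.version_info[0] == 2

subp = subprocess.Popen(
    [sys.executable, '-c', 'import sys; sys.stderr.write("kinit: error\\n")'],
    stderr=subprocess.PIPE)
subp.wait()

if PY2:
    # Python 2: readlines() already returns a list of str
    stderr_text = "\n".join(subp.stderr.readlines())
else:
    # Python 3: readlines() returns a list of bytes, decode per line
    stderr_text = "\n".join(
        line.decode('utf-8', errors='ignore').rstrip('\n')
        for line in subp.stderr.readlines())

print(stderr_text)
```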



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-993) Dags should modify the start date and end date of tasks when they are added

2017-03-15 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-993:
---
Description: 
When tasks are added to DAGs, the DAG checks if the task has a start_date. If 
it doesn't, the DAG sets it to its own start date. This isn't done for 
end_date, but it should be.

Otherwise, this simple code leads to a surprising failure as the backfill tries 
to run the task every day, even though the DAG clearly has an end date set.

{code}

from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
import datetime

dt = datetime.datetime(2017, 1, 1)

with DAG('test', start_date=dt, end_date=dt) as dag:
    op = DummyOperator(task_id='dummy')

op.run()
{code}

Furthermore, it may make sense for the task start date to always be the later 
of the task start date and the dag start date; similarly for the end date (but 
using the earlier date)


  was:
When tasks are added to DAGs, the DAG checks if the task has a start_date. If 
it doesn't, the DAG sets it to its own start date. This isn't done for 
end_date, but it should be.

Otherwise, this simple code leads to a surprising failure as the backfill tries 
to run the task every day, even though the DAG clearly has an end date set.

{code}

from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
import datetime

dt = datetime.datetime(2017, 1, 1)

Furthermore, it may make sense for the task start date to always be the later 
of the task start date and the dag start date; similarly for the end date (but 
using the earlier date)
with DAG('test', start_date=dt, end_date=dt) as dag:
    op = DummyOperator(task_id='dummy')

op.run()
{code}



> Dags should modify the start date and end date of tasks when they are added
> ---
>
> Key: AIRFLOW-993
> URL: https://issues.apache.org/jira/browse/AIRFLOW-993
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG
>Affects Versions: 1.8.0
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>Priority: Minor
> Fix For: 1.8.1
>
>
> When tasks are added to DAGs, the DAG checks if the task has a start_date. If 
> it doesn't, the DAG sets it to its own start date. This isn't done for 
> end_date, but it should be.
> Otherwise, this simple code leads to a surprising failure as the backfill 
> tries to run the task every day, even though the DAG clearly has an end date 
> set.
> {code}
> from airflow.models import DAG
> from airflow.operators.dummy_operator import DummyOperator
> import datetime
> dt = datetime.datetime(2017, 1, 1)
> with DAG('test', start_date=dt, end_date=dt) as dag:
>     op = DummyOperator(task_id='dummy')
> op.run()
> {code}
> Furthermore, it may make sense for the task start date to always be the later 
> of the task start date and the dag start date; similarly for the end date 
> (but using the earlier date)





[jira] [Updated] (AIRFLOW-993) Dags should modify the start date and end date of tasks when they are added

2017-03-15 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-993:
---
Description: 
When tasks are added to DAGs, the DAG checks if the task has a start_date. If 
it doesn't, the DAG sets it to its own start date. This isn't done for 
end_date, but it should be.

Otherwise, this simple code leads to a surprising failure as the backfill tries 
to run the task every day, even though the DAG clearly has an end date set.

{code}

from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
import datetime

dt = datetime.datetime(2017, 1, 1)

Furthermore, it may make sense for the task start date to always be the later 
of the task start date and the dag start date; similarly for the end date (but 
using the earlier date)
with DAG('test', start_date=dt, end_date=dt) as dag:
    op = DummyOperator(task_id='dummy')

op.run()
{code}


  was:
When tasks are added to DAGs, the DAG checks if the task has a start_date. If 
it doesn't, the DAG sets it to its own start date. This isn't done for 
end_date, but it should be.

Otherwise, this simple code leads to a surprising failure as the backfill tries 
to run the task every day, even though the DAG clearly has an end date set.

{code}

from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
import datetime

dt = datetime.datetime(2017, 1, 1)

with DAG('test', start_date=dt, end_date=dt) as dag:
    op = DummyOperator(task_id='dummy')

op.run()
{code}



> Dags should modify the start date and end date of tasks when they are added
> ---
>
> Key: AIRFLOW-993
> URL: https://issues.apache.org/jira/browse/AIRFLOW-993
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG
>Affects Versions: 1.8.0
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>Priority: Minor
> Fix For: 1.8.1
>
>
> When tasks are added to DAGs, the DAG checks if the task has a start_date. If 
> it doesn't, the DAG sets it to its own start date. This isn't done for 
> end_date, but it should be.
> Otherwise, this simple code leads to a surprising failure as the backfill 
> tries to run the task every day, even though the DAG clearly has an end date 
> set.
> {code}
> from airflow.models import DAG
> from airflow.operators.dummy_operator import DummyOperator
> import datetime
> dt = datetime.datetime(2017, 1, 1)
> Furthermore, it may make sense for the task start date to always be the later 
> of the task start date and the dag start date; similarly for the end date 
> (but using the earlier date)
> with DAG('test', start_date=dt, end_date=dt) as dag:
>     op = DummyOperator(task_id='dummy')
> op.run()
> {code}





[jira] [Updated] (AIRFLOW-993) Dags should modify the start date and end date of tasks when they are added

2017-03-15 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-993:
---
Description: 
When tasks are added to DAGs, the DAG checks if the task has a start_date. If 
it doesn't, the DAG sets it to its own start date. This isn't done for 
end_date, but it should be.

Otherwise, this simple code leads to a surprising failure as the backfill tries 
to run the task every day, even though the DAG clearly has an end date set.

{code}

from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
import datetime

dt = datetime.datetime(2017, 1, 1)

with DAG('test', start_date=dt, end_date=dt) as dag:
    op = DummyOperator(task_id='dummy')

op.run()
{code}


  was:
When tasks are added to DAGs, the DAG checks if the task has a start_date. If 
it doesn't, the DAG sets it to its own start date. This isn't done for 
end_date, but it should be.

Otherwise, this simple code leads to a surprising failure as the backfill tries 
to run the task every day, even though the DAG clearly has an end date set.

{code}

from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
import datetime

dt = datetime.datetime(2017, 1, 1)

with DAG('test', start_date=dt, end_date=dt) as dag:
    op = DummyOperator(task_id='dummy')

op.run()
{code}

Furthermore, it would make sense for the DAG to set the task start_date as the 
later of the task's start date and its own start date; or the earlier for 
end_date.


> Dags should modify the start date and end date of tasks when they are added
> ---
>
> Key: AIRFLOW-993
> URL: https://issues.apache.org/jira/browse/AIRFLOW-993
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG
>Affects Versions: 1.8.0
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>Priority: Minor
> Fix For: 1.8.1
>
>
> When tasks are added to DAGs, the DAG checks if the task has a start_date. If 
> it doesn't, the DAG sets it to its own start date. This isn't done for 
> end_date, but it should be.
> Otherwise, this simple code leads to a surprising failure as the backfill 
> tries to run the task every day, even though the DAG clearly has an end date 
> set.
> {code}
> from airflow.models import DAG
> from airflow.operators.dummy_operator import DummyOperator
> import datetime
> dt = datetime.datetime(2017, 1, 1)
> with DAG('test', start_date=dt, end_date=dt) as dag:
>     op = DummyOperator(task_id='dummy')
> op.run()
> {code}





[jira] [Created] (AIRFLOW-993) Dags should modify the start date and end date of tasks when they are added

2017-03-15 Thread Jeremiah Lowin (JIRA)
Jeremiah Lowin created AIRFLOW-993:
--

 Summary: Dags should modify the start date and end date of tasks 
when they are added
 Key: AIRFLOW-993
 URL: https://issues.apache.org/jira/browse/AIRFLOW-993
 Project: Apache Airflow
  Issue Type: Bug
  Components: DAG
Affects Versions: 1.8.0
Reporter: Jeremiah Lowin
Assignee: Jeremiah Lowin
Priority: Minor
 Fix For: 1.8.1


When tasks are added to DAGs, the DAG checks if the task has a start_date. If 
it doesn't, the DAG sets it to its own start date. This isn't done for 
end_date, but it should be.

Otherwise, this simple code leads to a surprising failure as the backfill tries 
to run the task every day, even though the DAG clearly has an end date set.

{code}

from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
import datetime

dt = datetime.datetime(2017, 1, 1)

with DAG('test', start_date=dt, end_date=dt) as dag:
    op = DummyOperator(task_id='dummy')

op.run()
{code}

Furthermore, it would make sense for the DAG to set the task start_date as the 
later of the task's start date and its own start date; or the earlier for 
end_date.
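The proposed date clamping could be sketched as follows (function and argument names are hypothetical, not Airflow's API; this assumes the DAG defines both dates):

```python
import datetime

# Hypothetical sketch of the proposal above: when a task is added to a DAG,
# fall back to the DAG's dates when the task has none, take the later of
# the two start dates, and the earlier of the two end dates.
def clamp_task_dates(task_start, task_end, dag_start, dag_end):
    start = dag_start if task_start is None else max(task_start, dag_start)
    end = dag_end if task_end is None else min(task_end, dag_end)
    return start, end

dag_start = datetime.datetime(2017, 1, 1)
dag_end = datetime.datetime(2017, 1, 1)

# a task with no dates of its own inherits both DAG dates, so a backfill
# would stop at the DAG's end_date instead of running indefinitely
start, end = clamp_task_dates(None, None, dag_start, dag_end)
```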





[jira] [Commented] (AIRFLOW-992) Skipped tasks do not propagate correctly

2017-03-15 Thread Dan Davydov (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927332#comment-15927332
 ] 

Dan Davydov commented on AIRFLOW-992:
-

This is the expected behavior after the semantics changes made here: 
https://github.com/apache/incubator-airflow/pull/2125 and later 
documented/clarified here: 
https://github.com/apache/incubator-airflow/pull/2151/files

> Skipped tasks do not propagate correctly
> 
>
> Key: AIRFLOW-992
> URL: https://issues.apache.org/jira/browse/AIRFLOW-992
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Dan Davydov
>Priority: Critical
>
> We are seeing skipped tasks not being propagated correctly:
> E.g. 
> A->B
> `-->C
> Task A depends on task B and C
> If B gets skipped and C gets run then:
> Expected:
> A will get skipped
> EDIT: Upon further investigation this was caused by a change in the semantics 
> of ALL_SUCCESS, about which I have the following reservations:
> Intuitively you would expect any task whose dependencies weren't run to be 
> skipped by default; the trigger rule is called ALL_SUCCESS, skipped tasks 
> are not successful ones, and that was also the old behavior in 1.7.3.
> This is going to break some use cases, which could be alright, but I feel 
> the new semantics make less sense than before, so that is a bad reason to 
> break existing use cases.
> I will get started on a PR for a new ALL_SUCCESS_NOT_SKIPPED trigger rule, 
> but again I feel this is hacky; really we should keep the old ALL_SUCCESS 
> (default) behavior and add a new ALL_SUCCESS_OR_SKIPPED trigger rule if 
> desired.
> Actual:
> A gets run
> [~bolke]
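The two readings of ALL_SUCCESS under discussion can be sketched as follows (illustrative helpers, not Airflow's implementation):

```python
SUCCESS, SKIPPED, FAILED = 'success', 'skipped', 'failed'

# Illustrative sketch of the two trigger-rule semantics discussed above
# (these helpers are not Airflow code).
def all_success_strict(upstream_states):
    # 1.7.3-style reading: a skipped upstream is not a success,
    # so the downstream task does not run
    return all(s == SUCCESS for s in upstream_states)

def all_success_or_skipped(upstream_states):
    # the post-change reading / proposed explicit rule: skipped
    # upstreams still allow the downstream task to run
    return all(s in (SUCCESS, SKIPPED) for s in upstream_states)

# B skipped, C succeeded: A runs only under the looser rule
upstreams = [SKIPPED, SUCCESS]
print(all_success_strict(upstreams))       # False
print(all_success_or_skipped(upstreams))   # True
```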





[jira] [Commented] (AIRFLOW-992) Skipped tasks do not propagate correctly

2017-03-15 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927323#comment-15927323
 ] 

Bolke de Bruin commented on AIRFLOW-992:


Do you have anything to reproduce it with? Preferably a test or a DAG.

> Skipped tasks do not propagate correctly
> 
>
> Key: AIRFLOW-992
> URL: https://issues.apache.org/jira/browse/AIRFLOW-992
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Dan Davydov
>Priority: Critical
>
> We are seeing skipped tasks not being propagated correctly:
> E.g. 
> A->B
> `-->C
> Task A depends on task B and C
> If B gets skipped and C gets run then:
> Expected:
> A will get skipped
> Actual:
> A gets run
> [~bolke]





[jira] [Created] (AIRFLOW-992) Skipped tasks do not propagate correctly

2017-03-15 Thread Dan Davydov (JIRA)
Dan Davydov created AIRFLOW-992:
---

 Summary: Skipped tasks do not propagate correctly
 Key: AIRFLOW-992
 URL: https://issues.apache.org/jira/browse/AIRFLOW-992
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Dan Davydov
Priority: Critical


We are seeing skipped tasks not being propagated correctly:

E.g. 
A->B
`-->C

Task A depends on task B and C

If B gets skipped and C gets run then:
Expected:
A will get skipped

Actual:
A gets run

[~bolke]





[jira] [Created] (AIRFLOW-991) Mark_success while a task is running leads to failure state

2017-03-15 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-991:
---

 Summary: Mark_success while a task is running leads to failure 
state
 Key: AIRFLOW-991
 URL: https://issues.apache.org/jira/browse/AIRFLOW-991
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel
Assignee: Alex Guziel








incubator-airflow git commit: [AIRFLOW-989] Do not mark dag run successful if unfinished tasks

2017-03-15 Thread bolke
Repository: incubator-airflow
Updated Branches:
  refs/heads/master cadfae54b -> 3d6095ff5


[AIRFLOW-989] Do not mark dag run successful if unfinished tasks

Dag runs could be marked successful if all root tasks were successful,
even if some tasks did not run yet, i.e. in case of clearing. Now we
consider unfinished_tasks before marking the run successful.

Closes #2154 from bolkedebruin/AIRFLOW-989


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/3d6095ff
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/3d6095ff
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/3d6095ff

Branch: refs/heads/master
Commit: 3d6095ff5cf6eff0444d7e47a2360765f2953daf
Parents: cadfae5
Author: Bolke de Bruin 
Authored: Wed Mar 15 16:39:12 2017 -0700
Committer: Bolke de Bruin 
Committed: Wed Mar 15 16:39:12 2017 -0700

--
 airflow/models.py |  6 +++---
 tests/models.py   | 51 ++
 2 files changed, 54 insertions(+), 3 deletions(-)
--
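In essence, the commit adds an unfinished-tasks guard to the success predicate; a minimal sketch of the before/after logic (helper names are illustrative, not the actual DagRun code):

```python
SUCCESS, SKIPPED = 'success', 'skipped'

# Sketch of the change in this commit: a run may only be marked successful
# when every root task finished in SUCCESS/SKIPPED *and* there are no
# unfinished tasks left (e.g. tasks re-queued by a clear). Illustrative only.
def run_succeeded_old(root_states, unfinished_tasks):
    return all(s in (SUCCESS, SKIPPED) for s in root_states)

def run_succeeded_new(root_states, unfinished_tasks):
    return (not unfinished_tasks and
            all(s in (SUCCESS, SKIPPED) for s in root_states))

# roots already succeeded but two cleared tasks have not rerun yet:
# the old predicate wrongly reports success, the new one does not
print(run_succeeded_old([SUCCESS], ['B', 'C']))  # True
print(run_succeeded_new([SUCCESS], ['B', 'C']))  # False
```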


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3d6095ff/airflow/models.py
--
diff --git a/airflow/models.py b/airflow/models.py
index 27a5670..ad3346a 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -4091,9 +4091,9 @@ class DagRun(Base):
             logging.info('Marking run {} failed'.format(self))
             self.state = State.FAILED
 
-        # if all roots succeeded, the run succeeded
-        elif all(r.state in (State.SUCCESS, State.SKIPPED)
-                 for r in roots):
+        # if all roots succeeded and no unfinished tasks, the run succeeded
+        elif not unfinished_tasks and all(r.state in (State.SUCCESS, State.SKIPPED)
+                                          for r in roots):
             logging.info('Marking run {} successful'.format(self))
             self.state = State.SUCCESS
 

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3d6095ff/tests/models.py
--
diff --git a/tests/models.py b/tests/models.py
index 6fbbf3e..8ce08eb 100644
--- a/tests/models.py
+++ b/tests/models.py
@@ -259,6 +259,57 @@ class DagRunTest(unittest.TestCase):
         updated_dag_state = dag_run.update_state()
         self.assertEqual(State.SUCCESS, updated_dag_state)
 
+    def test_dagrun_success_conditions(self):
+        session = settings.Session()
+
+        dag = DAG(
+            'test_dagrun_success_conditions',
+            start_date=DEFAULT_DATE,
+            default_args={'owner': 'owner1'})
+
+        # A -> B
+        # A -> C -> D
+        # ordered: B, D, C, A or D, B, C, A or D, C, B, A
+        with dag:
+            op1 = DummyOperator(task_id='A')
+            op2 = DummyOperator(task_id='B')
+            op3 = DummyOperator(task_id='C')
+            op4 = DummyOperator(task_id='D')
+            op1.set_upstream([op2, op3])
+            op3.set_upstream(op4)
+
+        dag.clear()
+
+        now = datetime.datetime.now()
+        dr = dag.create_dagrun(run_id='test_dagrun_success_conditions',
+                               state=State.RUNNING,
+                               execution_date=now,
+                               start_date=now)
+
+        # op1 = root
+        ti_op1 = dr.get_task_instance(task_id=op1.task_id)
+        ti_op1.set_state(state=State.SUCCESS, session=session)
+
+        ti_op2 = dr.get_task_instance(task_id=op2.task_id)
+        ti_op3 = dr.get_task_instance(task_id=op3.task_id)
+        ti_op4 = dr.get_task_instance(task_id=op4.task_id)
+
+        # root is successful, but unfinished tasks
+        state = dr.update_state()
+        self.assertEqual(State.RUNNING, state)
+
+        # one has failed, but root is successful
+        ti_op2.set_state(state=State.FAILED, session=session)
+        ti_op3.set_state(state=State.SUCCESS, session=session)
+        ti_op4.set_state(state=State.SUCCESS, session=session)
+        state = dr.update_state()
+        self.assertEqual(State.SUCCESS, state)
+
+        # upstream dependency failed, root has not run
+        ti_op1.set_state(State.NONE, session)
+        state = dr.update_state()
+        self.assertEqual(State.FAILED, state)
+
 
 class DagBagTest(unittest.TestCase):
 



[jira] [Resolved] (AIRFLOW-989) Clear Task Regression

2017-03-15 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-989.

Resolution: Fixed

Issue resolved by pull request #2154
[https://github.com/apache/incubator-airflow/pull/2154]

> Clear Task Regression
> -
>
> Key: AIRFLOW-989
> URL: https://issues.apache.org/jira/browse/AIRFLOW-989
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.8.0
>Reporter: Siddharth Anand
>Assignee: Bolke de Bruin
>Priority: Critical
> Fix For: 1.8.1
>
>
> There is a regression in the current 1.8 rc (e.g. rc5) related to Clear Task 
> behavior. 
> Consider the following test DAG : 
> 1. Code : https://gist.github.com/r39132/b44f7d791e11f882cde28a219df97c29
> 2. Graph : 
> https://www.dropbox.com/s/1e9rfnq6cy4hh45/Screenshot%202017-03-15%2013.48.26.png?dl=0
> The test DAG has 4 dummy tasks chained together as d1->d2->d3->d4. d1 is the 
> first task and d4 is the last task. Prior to 1.8, if any of d1..d4 were 
> cleared individually, the scheduler would pick up and rerun the cleared 
> tasks. In 1.8, unless the last task in a DAG is cleared, none of the tasks 
> in the DAG run are rerun.
> In order for a task that is not the last task in the DAG to be rerun after 
> being cleared, its terminal downstream task needs to be cleared. Another 
> workaround is to use the CLI to rerun the cleared task.
> Here are some screenshots to illustrate the regressed behavior:
> Use Case 1 : Clear d1, d2, d3, and d4 in 4 separate DAG runs. In a 5th 
> separate DAG run, clear the entire DAG Run.
> After Clearing : 
> https://www.dropbox.com/s/mgiwoyaxf5f2pb2/Screenshot%202017-03-15%2010.12.02.png?dl=0
>  
> After the Scheduler Runs : 
> https://www.dropbox.com/s/7btwzydv87v3iz0/Screenshot%202017-03-15%2010.15.16.png?dl=0
> You'll notice that only the DAG runs in which the last task was cleared 
> completed by actually rerunning the cleared tasks. These are shown as the 
> 1st and 5th DAG runs from the left.
> Use Case 2 : Clear d1 and d4 in the same DAG Run
> After Clearing (c.f. 2nd from right DAG run): 
> https://www.dropbox.com/s/2a6by6k28eb7geh/Screenshot%202017-03-15%2013.34.11.png?dl=0
> After the Scheduler Runs : 
> https://www.dropbox.com/s/19cg6qr2oqi1ps7/Screenshot%202017-03-15%2013.34.51.png?dl=0





[jira] [Commented] (AIRFLOW-989) Clear Task Regression

2017-03-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927191#comment-15927191
 ] 

ASF subversion and git services commented on AIRFLOW-989:
-

Commit 15600e42c805b222d6147b60376b56c8e708dcde in incubator-airflow's branch 
refs/heads/v1-8-test from [~bolke]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=15600e4 ]

[AIRFLOW-989] Do not mark dag run successful if unfinished tasks

Dag runs could be marked successful if all root tasks were successful,
even if some tasks did not run yet, i.e. in case of clearing. Now we
consider unfinished_tasks before marking the run successful.

Closes #2154 from bolkedebruin/AIRFLOW-989

(cherry picked from commit 3d6095ff5cf6eff0444d7e47a2360765f2953daf)
Signed-off-by: Bolke de Bruin 


> Clear Task Regression
> -
>
> Key: AIRFLOW-989
> URL: https://issues.apache.org/jira/browse/AIRFLOW-989
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.8.0
>Reporter: Siddharth Anand
>Assignee: Bolke de Bruin
>Priority: Critical
> Fix For: 1.8.1
>
>
> There is a regression in the current 1.8 rc (e.g. rc5) related to Clear Task 
> behavior. 
> Consider the following test DAG : 
> 1. Code : https://gist.github.com/r39132/b44f7d791e11f882cde28a219df97c29
> 2. Graph : 
> https://www.dropbox.com/s/1e9rfnq6cy4hh45/Screenshot%202017-03-15%2013.48.26.png?dl=0
> The test DAG has 4 dummy tasks chained together as d1->d2->d3->d4. d1 is the 
> first task and d4 is the last task. Prior to 1.8, if any of d1..d4 were 
> cleared individually, the scheduler would pick up and rerun the cleared 
> tasks. In 1.8, unless the last task in a DAG is cleared, none of the tasks 
> in the DAG run are rerun.
> In order for a task that is not the last task in the DAG to be rerun after 
> being cleared, its terminal downstream task needs to be cleared. Another 
> workaround is to use the CLI to rerun the cleared task.
> Here are some screenshots to illustrate the regressed behavior:
> Use Case 1 : Clear d1, d2, d3, and d4 in 4 separate DAG runs. In a 5th 
> separate DAG run, clear the entire DAG Run.
> After Clearing : 
> https://www.dropbox.com/s/mgiwoyaxf5f2pb2/Screenshot%202017-03-15%2010.12.02.png?dl=0
>  
> After the Scheduler Runs : 
> https://www.dropbox.com/s/7btwzydv87v3iz0/Screenshot%202017-03-15%2010.15.16.png?dl=0
> You'll notice that only the DAG runs in which the last task was cleared 
> completed by actually rerunning the cleared tasks. These are shown as the 
> 1st and 5th DAG runs from the left.
> Use Case 2 : Clear d1 and d4 in the same DAG Run
> After Clearing (c.f. 2nd from right DAG run): 
> https://www.dropbox.com/s/2a6by6k28eb7geh/Screenshot%202017-03-15%2013.34.11.png?dl=0
> After the Scheduler Runs : 
> https://www.dropbox.com/s/19cg6qr2oqi1ps7/Screenshot%202017-03-15%2013.34.51.png?dl=0





[jira] [Commented] (AIRFLOW-989) Clear Task Regression

2017-03-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927189#comment-15927189
 ] 

ASF subversion and git services commented on AIRFLOW-989:
-

Commit 3d6095ff5cf6eff0444d7e47a2360765f2953daf in incubator-airflow's branch 
refs/heads/master from [~bolke]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=3d6095f ]

[AIRFLOW-989] Do not mark dag run successful if unfinished tasks

Dag runs could be marked successful if all root tasks were successful,
even if some tasks did not run yet, i.e. in case of clearing. Now we
consider unfinished_tasks before marking the run successful.

Closes #2154 from bolkedebruin/AIRFLOW-989


> Clear Task Regression
> -
>
> Key: AIRFLOW-989
> URL: https://issues.apache.org/jira/browse/AIRFLOW-989
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.8.0
>Reporter: Siddharth Anand
>Assignee: Bolke de Bruin
>Priority: Critical
> Fix For: 1.8.1
>
>
> There is a regression in the current 1.8 rc (e.g. rc5) related to Clear Task 
> behavior. 
> Consider the following test DAG : 
> 1. Code : https://gist.github.com/r39132/b44f7d791e11f882cde28a219df97c29
> 2. Graph : 
> https://www.dropbox.com/s/1e9rfnq6cy4hh45/Screenshot%202017-03-15%2013.48.26.png?dl=0
> The test DAG has 4 dummy tasks chained together as d1->d2->d3->d4. d1 is the 
> first task and d4 is the last task. Prior to 1.8, if any of d1..d4 were 
> cleared individually, the scheduler would pick up and rerun the cleared 
> tasks. In 1.8, unless the last task in a DAG is cleared, none of the tasks 
> in the DAG run are rerun.
> In order for a task that is not the last task in the DAG to be rerun after 
> being cleared, its terminal downstream task needs to be cleared. Another 
> workaround is to use the CLI to rerun the cleared task.
> Here are some screenshots to illustrate the regressed behavior:
> Use Case 1 : Clear d1, d2, d3, and d4 in 4 separate DAG runs. In a 5th 
> separate DAG run, clear the entire DAG Run.
> After Clearing : 
> https://www.dropbox.com/s/mgiwoyaxf5f2pb2/Screenshot%202017-03-15%2010.12.02.png?dl=0
>  
> After the Scheduler Runs : 
> https://www.dropbox.com/s/7btwzydv87v3iz0/Screenshot%202017-03-15%2010.15.16.png?dl=0
> You'll notice that only the DAG runs in which the last task was cleared 
> completed by actually rerunning the cleared tasks. These are shown as the 
> 1st and 5th DAG runs from the left.
> Use Case 2 : Clear d1 and d4 in the same DAG Run
> After Clearing (c.f. 2nd from right DAG run): 
> https://www.dropbox.com/s/2a6by6k28eb7geh/Screenshot%202017-03-15%2013.34.11.png?dl=0
> After the Scheduler Runs : 
> https://www.dropbox.com/s/19cg6qr2oqi1ps7/Screenshot%202017-03-15%2013.34.51.png?dl=0





[jira] [Commented] (AIRFLOW-989) Clear Task Regression

2017-03-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927190#comment-15927190
 ] 

ASF subversion and git services commented on AIRFLOW-989:
-

Commit 3d6095ff5cf6eff0444d7e47a2360765f2953daf in incubator-airflow's branch 
refs/heads/master from [~bolke]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=3d6095f ]

[AIRFLOW-989] Do not mark dag run successful if unfinished tasks

Dag runs could be marked successful if all root tasks were successful,
even if some tasks did not run yet, i.e. in case of clearing. Now we
consider unfinished_tasks before marking the run successful.

Closes #2154 from bolkedebruin/AIRFLOW-989


> Clear Task Regression
> -
>
> Key: AIRFLOW-989
> URL: https://issues.apache.org/jira/browse/AIRFLOW-989
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.8.0
>Reporter: Siddharth Anand
>Assignee: Bolke de Bruin
>Priority: Critical
> Fix For: 1.8.1
>
>
> There is a regression in the current 1.8 rc (e.g. rc5) related to Clear Task 
> behavior. 
> Consider the following test DAG : 
> 1. Code : https://gist.github.com/r39132/b44f7d791e11f882cde28a219df97c29
> 2. Graph : 
> https://www.dropbox.com/s/1e9rfnq6cy4hh45/Screenshot%202017-03-15%2013.48.26.png?dl=0
> The test DAG has 4 dummy tasks chained together as d1->d2->d3->d4. d1 is the 
> first task and d4 is the last task. Prior to 1.8, if any of d1..d4 were 
> cleared individually, the scheduler would pick up and rerun the cleared 
> tasks. In 1.8, unless the last task in a DAG is cleared, none of the tasks 
> in the DAG run are rerun.
> In order for a task that is not the last task in the DAG to be rerun after 
> being cleared, its terminal downstream task needs to be cleared. Another 
> workaround is to use the CLI to rerun the cleared task.
> Here are some screenshots to illustrate the regressed behavior:
> Use Case 1 : Clear d1, d2, d3, and d4 in 4 separate DAG runs. In a 5th 
> separate DAG run, clear the entire DAG Run.
> After Clearing : 
> https://www.dropbox.com/s/mgiwoyaxf5f2pb2/Screenshot%202017-03-15%2010.12.02.png?dl=0
>  
> After the Scheduler Runs : 
> https://www.dropbox.com/s/7btwzydv87v3iz0/Screenshot%202017-03-15%2010.15.16.png?dl=0
> You'll notice that only the DAG runs in which the last task was cleared 
> completed by actually rerunning the cleared tasks. These are shown as the 
> 1st and 5th DAG runs from the left.
> Use Case 2 : Clear d1 and d4 in the same DAG Run
> After Clearing (c.f. 2nd from right DAG run): 
> https://www.dropbox.com/s/2a6by6k28eb7geh/Screenshot%202017-03-15%2013.34.11.png?dl=0
> After the Scheduler Runs : 
> https://www.dropbox.com/s/19cg6qr2oqi1ps7/Screenshot%202017-03-15%2013.34.51.png?dl=0





incubator-airflow git commit: [AIRFLOW-989] Do not mark dag run successful if unfinished tasks

2017-03-15 Thread bolke
Repository: incubator-airflow
Updated Branches:
  refs/heads/v1-8-test 3b37cfa1f -> 15600e42c


[AIRFLOW-989] Do not mark dag run successful if unfinished tasks

Dag runs could be marked successful if all root tasks were successful,
even if some tasks did not run yet, i.e. in case of clearing. Now we
consider unfinished_tasks before marking the run successful.

Closes #2154 from bolkedebruin/AIRFLOW-989

(cherry picked from commit 3d6095ff5cf6eff0444d7e47a2360765f2953daf)
Signed-off-by: Bolke de Bruin 


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/15600e42
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/15600e42
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/15600e42

Branch: refs/heads/v1-8-test
Commit: 15600e42c805b222d6147b60376b56c8e708dcde
Parents: 3b37cfa
Author: Bolke de Bruin 
Authored: Wed Mar 15 16:39:12 2017 -0700
Committer: Bolke de Bruin 
Committed: Wed Mar 15 16:39:26 2017 -0700

--
 airflow/models.py |  6 +++---
 tests/models.py   | 51 ++
 2 files changed, 54 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/15600e42/airflow/models.py
--
diff --git a/airflow/models.py b/airflow/models.py
index 7c6590f..42b8a7f 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -4064,9 +4064,9 @@ class DagRun(Base):
             logging.info('Marking run {} failed'.format(self))
             self.state = State.FAILED
 
-        # if all roots succeeded, the run succeeded
-        elif all(r.state in (State.SUCCESS, State.SKIPPED)
-                 for r in roots):
+        # if all roots succeeded and no unfinished tasks, the run succeeded
+        elif not unfinished_tasks and all(r.state in (State.SUCCESS, State.SKIPPED)
+                                          for r in roots):
             logging.info('Marking run {} successful'.format(self))
             self.state = State.SUCCESS
 

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/15600e42/tests/models.py
--
diff --git a/tests/models.py b/tests/models.py
index ffd1f31..1fbb3e6 100644
--- a/tests/models.py
+++ b/tests/models.py
@@ -259,6 +259,57 @@ class DagRunTest(unittest.TestCase):
         updated_dag_state = dag_run.update_state()
         self.assertEqual(State.SUCCESS, updated_dag_state)
 
+    def test_dagrun_success_conditions(self):
+        session = settings.Session()
+
+        dag = DAG(
+            'test_dagrun_success_conditions',
+            start_date=DEFAULT_DATE,
+            default_args={'owner': 'owner1'})
+
+        # A -> B
+        # A -> C -> D
+        # ordered: B, D, C, A or D, B, C, A or D, C, B, A
+        with dag:
+            op1 = DummyOperator(task_id='A')
+            op2 = DummyOperator(task_id='B')
+            op3 = DummyOperator(task_id='C')
+            op4 = DummyOperator(task_id='D')
+            op1.set_upstream([op2, op3])
+            op3.set_upstream(op4)
+
+        dag.clear()
+
+        now = datetime.datetime.now()
+        dr = dag.create_dagrun(run_id='test_dagrun_success_conditions',
+                               state=State.RUNNING,
+                               execution_date=now,
+                               start_date=now)
+
+        # op1 = root
+        ti_op1 = dr.get_task_instance(task_id=op1.task_id)
+        ti_op1.set_state(state=State.SUCCESS, session=session)
+
+        ti_op2 = dr.get_task_instance(task_id=op2.task_id)
+        ti_op3 = dr.get_task_instance(task_id=op3.task_id)
+        ti_op4 = dr.get_task_instance(task_id=op4.task_id)
+
+        # root is successful, but unfinished tasks
+        state = dr.update_state()
+        self.assertEqual(State.RUNNING, state)
+
+        # one has failed, but root is successful
+        ti_op2.set_state(state=State.FAILED, session=session)
+        ti_op3.set_state(state=State.SUCCESS, session=session)
+        ti_op4.set_state(state=State.SUCCESS, session=session)
+        state = dr.update_state()
+        self.assertEqual(State.SUCCESS, state)
+
+        # upstream dependency failed, root has not run
+        ti_op1.set_state(State.NONE, session)
+        state = dr.update_state()
+        self.assertEqual(State.FAILED, state)
+
 
 class DagBagTest(unittest.TestCase):
 



[jira] [Assigned] (AIRFLOW-990) DockerOperator fails when logging unicode string

2017-03-15 Thread Vitor Baptista (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitor Baptista reassigned AIRFLOW-990:
--

Assignee: Vitor Baptista

> DockerOperator fails when logging unicode string
> 
>
> Key: AIRFLOW-990
> URL: https://issues.apache.org/jira/browse/AIRFLOW-990
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: docker
>Affects Versions: Airflow 1.7.1
> Environment: Python 2.7
>Reporter: Vitor Baptista
>Assignee: Vitor Baptista
>
> On line 
> https://github.com/apache/incubator-airflow/blob/1.7.1.3/airflow/operators/docker_operator.py#L164,
>  we're calling:
> {code:title=airflow/operators/docker_operator.py}
> for line in self.cli.logs(container=self.container['Id'], stream=True):
> logging.info("{}".format(line.strip()))
> {code}
> If `self.cli.logs()` returns a string with a unicode character, this raises 
> the UnicodeDecodeError:
> {noformat}
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/logging/__init__.py", line 861, in emit
> msg = self.format(record)
>   File "/usr/lib/python2.7/logging/__init__.py", line 734, in format
> return fmt.format(record)
>   File "/usr/lib/python2.7/logging/__init__.py", line 476, in format
> raise e
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: 
> ordinal not in range(128)
> Logged from file docker_operator.py, line 165
> {noformat}
> A possible fix is to change that line to:
> {code:title=airflow/operators/docker_operator.py}
> for line in self.cli.logs(container=self.container['Id'], stream=True):
> logging.info(line.decode('utf-8').strip())
> {code}.
> This error doesn't happen on Python3. I haven't tested, but reading the code 
> it seems the same error exists on `master` as well.
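The decode proposed above can be made Python 2/3-agnostic with an explicit bytes check before logging. This is a sketch; `decode_log_line` and the sample byte strings are illustrative, not docker-py's API:

```python
import logging

def decode_log_line(line):
    # docker-py's logs(container=..., stream=True) yields bytes; decoding
    # explicitly avoids the implicit ASCII decode that raises on Python 2
    # when the line contains non-ASCII bytes
    if isinstance(line, bytes):
        line = line.decode('utf-8')
    return line.strip()

for raw in [b'plain ascii\n', u'caf\u00e9 \u2713\n'.encode('utf-8')]:
    logging.info(decode_log_line(raw))
```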





[jira] [Commented] (AIRFLOW-990) DockerOperator fails when logging unicode string

2017-03-15 Thread Vitor Baptista (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927126#comment-15927126
 ] 

Vitor Baptista commented on AIRFLOW-990:


Pull request for this issue sent on 
https://github.com/apache/incubator-airflow/pull/2155

> DockerOperator fails when logging unicode string
> 
>
> Key: AIRFLOW-990
> URL: https://issues.apache.org/jira/browse/AIRFLOW-990
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: docker
>Affects Versions: Airflow 1.7.1
> Environment: Python 2.7
>Reporter: Vitor Baptista
>
> On line 
> https://github.com/apache/incubator-airflow/blob/1.7.1.3/airflow/operators/docker_operator.py#L164,
>  we're calling:
> {code:title=airflow/operators/docker_operator.py}
> for line in self.cli.logs(container=self.container['Id'], stream=True):
> logging.info("{}".format(line.strip()))
> {code}
> If `self.cli.logs()` returns a string with a unicode character, this raises 
> the UnicodeDecodeError:
> {noformat}
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/logging/__init__.py", line 861, in emit
> msg = self.format(record)
>   File "/usr/lib/python2.7/logging/__init__.py", line 734, in format
> return fmt.format(record)
>   File "/usr/lib/python2.7/logging/__init__.py", line 476, in format
> raise e
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: 
> ordinal not in range(128)
> Logged from file docker_operator.py, line 165
> {noformat}
> A possible fix is to change that line to:
> {code:title=airflow/operators/docker_operator.py}
> for line in self.cli.logs(container=self.container['Id'], stream=True):
> logging.info(line.decode('utf-8').strip())
> {code}.
> This error doesn't happen on Python3. I haven't tested, but reading the code 
> it seems the same error exists on `master` as well.





[jira] [Updated] (AIRFLOW-990) DockerOperator fails when logging unicode string

2017-03-15 Thread Vitor Baptista (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitor Baptista updated AIRFLOW-990:
---
Description: 
On line 
https://github.com/apache/incubator-airflow/blob/1.7.1.3/airflow/operators/docker_operator.py#L164,
 we're calling:

{code:title=airflow/operators/docker_operator.py}
for line in self.cli.logs(container=self.container['Id'], stream=True):
logging.info("{}".format(line.strip()))
{code}

If `self.cli.logs()` returns a string with a unicode character, this raises the 
UnicodeDecodeError:
{noformat}
Traceback (most recent call last):
  File "/usr/lib/python2.7/logging/__init__.py", line 861, in emit
msg = self.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 734, in format
return fmt.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 476, in format
raise e
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal 
not in range(128)
Logged from file docker_operator.py, line 165
{noformat}

A possible fix is to change that line to:
{code:title=airflow/operators/docker_operator.py}
for line in self.cli.logs(container=self.container['Id'], stream=True):
logging.info(line.decode('utf-8').strip())
{code}.

This error doesn't happen on Python3. I haven't tested, but reading the code it 
seems the same error exists on `master` as well.

  was:
On line 
https://github.com/apache/incubator-airflow/blob/1.7.1.3/airflow/operators/docker_operator.py#L164,
 we're calling:

{code:title=airflow/operators/docker_operator.py}
for line in self.cli.logs(container=self.container['Id'], stream=True):
logging.info("{}".format(line.strip()))
{code}

If `self.cli.logs()` returns a string with a unicode character, this raises the 
UnicodeDecodeError:
{noformat}
Traceback (most recent call last):
  File "/usr/lib/python2.7/logging/__init__.py", line 861, in emit
msg = self.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 734, in format
return fmt.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 476, in format
raise e
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal 
not in range(128)
Logged from file docker_operator.py, line 165
{noformat}

A possible fix is to change that line to:
{code:python}
for line in self.cli.logs(container=self.container['Id'], stream=True):
logging.info(line.decode('utf-8').strip())
{code}.

This error doesn't happen on Python3. I haven't tested, but reading the code it 
seems the same error exists on `master` as well.


> DockerOperator fails when logging unicode string
> 
>
> Key: AIRFLOW-990
> URL: https://issues.apache.org/jira/browse/AIRFLOW-990
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: docker
>Affects Versions: Airflow 1.7.1
> Environment: Python 2.7
>Reporter: Vitor Baptista
>
> On line 
> https://github.com/apache/incubator-airflow/blob/1.7.1.3/airflow/operators/docker_operator.py#L164,
>  we're calling:
> {code:title=airflow/operators/docker_operator.py}
> for line in self.cli.logs(container=self.container['Id'], stream=True):
> logging.info("{}".format(line.strip()))
> {code}
> If `self.cli.logs()` returns a string with a unicode character, this raises 
> the UnicodeDecodeError:
> {noformat}
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/logging/__init__.py", line 861, in emit
> msg = self.format(record)
>   File "/usr/lib/python2.7/logging/__init__.py", line 734, in format
> return fmt.format(record)
>   File "/usr/lib/python2.7/logging/__init__.py", line 476, in format
> raise e
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: 
> ordinal not in range(128)
> Logged from file docker_operator.py, line 165
> {noformat}
> A possible fix is to change that line to:
> {code:title=airflow/operators/docker_operator.py}
> for line in self.cli.logs(container=self.container['Id'], stream=True):
> logging.info(line.decode('utf-8').strip())
> {code}.
> This error doesn't happen on Python3. I haven't tested, but reading the code 
> it seems the same error exists on `master` as well.





[jira] [Updated] (AIRFLOW-990) DockerOperator fails when logging unicode string

2017-03-15 Thread Vitor Baptista (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitor Baptista updated AIRFLOW-990:
---
Description: 
On line 
https://github.com/apache/incubator-airflow/blob/1.7.1.3/airflow/operators/docker_operator.py#L164,
 we're calling:

{code:title=airflow/operators/docker_operator.py}
for line in self.cli.logs(container=self.container['Id'], stream=True):
logging.info("{}".format(line.strip()))
{code}

If `self.cli.logs()` returns a string with a unicode character, this raises the 
UnicodeDecodeError:
{noformat}
Traceback (most recent call last):
  File "/usr/lib/python2.7/logging/__init__.py", line 861, in emit
msg = self.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 734, in format
return fmt.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 476, in format
raise e
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal 
not in range(128)
Logged from file docker_operator.py, line 165
{noformat}

A possible fix is to change that line to:
{code:python}
for line in self.cli.logs(container=self.container['Id'], stream=True):
logging.info(line.decode('utf-8').strip())
{code}.

This error doesn't happen on Python3. I haven't tested, but reading the code it 
seems the same error exists on `master` as well.

  was:
On line 
https://github.com/apache/incubator-airflow/blob/1.7.1.3/airflow/operators/docker_operator.py#L164,
 we're calling:

{code:title=airflow/operators/docker_operator.py}
for line in self.cli.logs(container=self.container['Id'], stream=True):
logging.info("{}".format(line.strip()))
{code}

If `self.cli.logs()` returns a string with a unicode character, this raises the 
UnicodeDecodeError:
{preformat}
Traceback (most recent call last):
  File "/usr/lib/python2.7/logging/__init__.py", line 861, in emit
msg = self.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 734, in format
return fmt.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 476, in format
raise e
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal 
not in range(128)
Logged from file docker_operator.py, line 165
{preformat}

A possible fix is to change that line to 
{code}logging.info(line.decode('utf-8').strip()){code}.

This error doesn't happen on Python3. I haven't tested, but reading the code it 
seems the same error exists on `master` as well.


> DockerOperator fails when logging unicode string
> 
>
> Key: AIRFLOW-990
> URL: https://issues.apache.org/jira/browse/AIRFLOW-990
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: docker
>Affects Versions: Airflow 1.7.1
> Environment: Python 2.7
>Reporter: Vitor Baptista
>
> On line 
> https://github.com/apache/incubator-airflow/blob/1.7.1.3/airflow/operators/docker_operator.py#L164,
>  we're calling:
> {code:title=airflow/operators/docker_operator.py}
> for line in self.cli.logs(container=self.container['Id'], stream=True):
> logging.info("{}".format(line.strip()))
> {code}
> If `self.cli.logs()` returns a string with a unicode character, this raises 
> the UnicodeDecodeError:
> {noformat}
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/logging/__init__.py", line 861, in emit
> msg = self.format(record)
>   File "/usr/lib/python2.7/logging/__init__.py", line 734, in format
> return fmt.format(record)
>   File "/usr/lib/python2.7/logging/__init__.py", line 476, in format
> raise e
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: 
> ordinal not in range(128)
> Logged from file docker_operator.py, line 165
> {noformat}
> A possible fix is to change that line to:
> {code:python}
> for line in self.cli.logs(container=self.container['Id'], stream=True):
> logging.info(line.decode('utf-8').strip())
> {code}.
> This error doesn't happen on Python3. I haven't tested, but reading the code 
> it seems the same error exists on `master` as well.





[jira] [Updated] (AIRFLOW-990) DockerOperator fails when logging unicode string

2017-03-15 Thread Vitor Baptista (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitor Baptista updated AIRFLOW-990:
---
Description: 
On line 
https://github.com/apache/incubator-airflow/blob/1.7.1.3/airflow/operators/docker_operator.py#L164,
 we're calling:

{code:title=airflow/operators/docker_operator.py}
for line in self.cli.logs(container=self.container['Id'], stream=True):
logging.info("{}".format(line.strip()))
{code}

If `self.cli.logs()` returns a string with a unicode character, this raises the 
UnicodeDecodeError:
{preformat}
Traceback (most recent call last):
  File "/usr/lib/python2.7/logging/__init__.py", line 861, in emit
msg = self.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 734, in format
return fmt.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 476, in format
raise e
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal 
not in range(128)
Logged from file docker_operator.py, line 165
{preformat}

A possible fix is to change that line to 
{code}logging.info(line.decode('utf-8').strip()){code}.

This error doesn't happen on Python3. I haven't tested, but reading the code it 
seems the same error exists on `master` as well.

  was:
On line 
https://github.com/apache/incubator-airflow/blob/1.7.1.3/airflow/operators/docker_operator.py#L164,
 we're calling:

```
for line in self.cli.logs(container=self.container['Id'], stream=True):
logging.info("{}".format(line.strip()))
```

If `self.cli.logs()` returns a string with a unicode character, this raises the 
UnicodeDecodeError:
```
Traceback (most recent call last):
  File "/usr/lib/python2.7/logging/__init__.py", line 861, in emit
msg = self.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 734, in format
return fmt.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 476, in format
raise e
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal 
not in range(128)
Logged from file docker_operator.py, line 165
```

A possible fix is to change that line to 
`logging.info(line.decode('utf-8').strip())`.

This error doesn't happen on Python3. I haven't tested, but reading the code it 
seems the same error exists on `master` as well.


> DockerOperator fails when logging unicode string
> 
>
> Key: AIRFLOW-990
> URL: https://issues.apache.org/jira/browse/AIRFLOW-990
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: docker
>Affects Versions: Airflow 1.7.1
> Environment: Python 2.7
>Reporter: Vitor Baptista
>
> On line 
> https://github.com/apache/incubator-airflow/blob/1.7.1.3/airflow/operators/docker_operator.py#L164,
>  we're calling:
> {code:title=airflow/operators/docker_operator.py}
> for line in self.cli.logs(container=self.container['Id'], stream=True):
> logging.info("{}".format(line.strip()))
> {code}
> If `self.cli.logs()` returns a string with a unicode character, this raises 
> the UnicodeDecodeError:
> {preformat}
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/logging/__init__.py", line 861, in emit
> msg = self.format(record)
>   File "/usr/lib/python2.7/logging/__init__.py", line 734, in format
> return fmt.format(record)
>   File "/usr/lib/python2.7/logging/__init__.py", line 476, in format
> raise e
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: 
> ordinal not in range(128)
> Logged from file docker_operator.py, line 165
> {preformat}
> A possible fix is to change that line to 
> {code}logging.info(line.decode('utf-8').strip()){code}.
> This error doesn't happen on Python3. I haven't tested, but reading the code 
> it seems the same error exists on `master` as well.





[jira] [Updated] (AIRFLOW-989) Clear Task Regression

2017-03-15 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated AIRFLOW-989:
---
Fix Version/s: 1.8.1

> Clear Task Regression
> -
>
> Key: AIRFLOW-989
> URL: https://issues.apache.org/jira/browse/AIRFLOW-989
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.8.0
>Reporter: Siddharth Anand
>Assignee: Bolke de Bruin
>Priority: Critical
> Fix For: 1.8.1
>
>
> There is a regression in the current 1.8 rc (e.g. rc5) related to Clear Task 
> behavior. 
> Consider the following test DAG : 
> 1. Code : https://gist.github.com/r39132/b44f7d791e11f882cde28a219df97c29
> 2. Graph : 
> https://www.dropbox.com/s/1e9rfnq6cy4hh45/Screenshot%202017-03-15%2013.48.26.png?dl=0
> The test DAG has 4 dummy tasks chained together as d1->d2->d3->d4. d1 is the 
> first task and d4 is the last task. Prior to 1.8, if any of d1..d4 were 
> cleared individually, the scheduler would pick up and rerun the cleared 
> tasks. In 1.8, unless the last task in a DAG is cleared, none of the tasks 
> in the DAG run are rerun.
> In order for a task that is not the last task in the DAG to be rerun after 
> being cleared, its terminal downstream task needs to be cleared. Another 
> workaround is to use the CLI to rerun the cleared task.
> Here are some screenshots to illustrate the regressed behavior:
> Use Case 1 : Clear d1, d2, d3, and d4 in 4 separate DAG runs. In a 5th 
> separate DAG run, clear the entire DAG Run.
> After Clearing : 
> https://www.dropbox.com/s/mgiwoyaxf5f2pb2/Screenshot%202017-03-15%2010.12.02.png?dl=0
>  
> After the Scheduler Runs : 
> https://www.dropbox.com/s/7btwzydv87v3iz0/Screenshot%202017-03-15%2010.15.16.png?dl=0
> You'll notice that only the DAG runs with the last task cleared completed by 
> actually running cleared tasks. These are shown as the 1st and 5th DAG runs 
> from the left.
> Use Case 2 : Clear d1 and d4 in the same DAG Run
> After Clearing (c.f. 2nd from right DAG run): 
> https://www.dropbox.com/s/2a6by6k28eb7geh/Screenshot%202017-03-15%2013.34.11.png?dl=0
> After the Scheduler Runs : 
> https://www.dropbox.com/s/19cg6qr2oqi1ps7/Screenshot%202017-03-15%2013.34.51.png?dl=0
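The dependency chain in the test DAG can be modeled in plain Python. This is a stand-in for the linked gist (which has the real Airflow DAG); `downstream` and `terminal_tasks` are illustrative names:

```python
# d1 -> d2 -> d3 -> d4, as described in the issue. The regression only reran
# cleared tasks when the terminal task (the one with no downstream
# dependents) was itself among the cleared tasks.
downstream = {'d1': ['d2'], 'd2': ['d3'], 'd3': ['d4'], 'd4': []}

def terminal_tasks(deps):
    # tasks with no downstream dependents; in Airflow 1.8 these are the
    # "roots" that DagRun.update_state inspected when deciding the run state
    return [t for t, ds in deps.items() if not ds]

print(terminal_tasks(downstream))  # ['d4']
```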





[jira] [Updated] (AIRFLOW-989) Clear Task Regression

2017-03-15 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated AIRFLOW-989:
---
Affects Version/s: (was: Airflow 1.8)
   1.8.0

> Clear Task Regression
> -
>
> Key: AIRFLOW-989
> URL: https://issues.apache.org/jira/browse/AIRFLOW-989
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.8.0
>Reporter: Siddharth Anand
>Assignee: Bolke de Bruin
>Priority: Critical
> Fix For: 1.8.1
>
>
> There is a regression in the current 1.8 rc (e.g. rc5) related to Clear Task 
> behavior. 
> Consider the following test DAG : 
> 1. Code : https://gist.github.com/r39132/b44f7d791e11f882cde28a219df97c29
> 2. Graph : 
> https://www.dropbox.com/s/1e9rfnq6cy4hh45/Screenshot%202017-03-15%2013.48.26.png?dl=0
> The test DAG has 4 dummy tasks chained together as d1->d2->d3->d4. d1 is the 
> first task and d4 is the last task. Prior to 1.8, if any of d1..d4 were 
> cleared individually, the scheduler would pick up and rerun the cleared 
> tasks. In 1.8, unless the last task in a DAG is cleared, none of the tasks 
> in the DAG run are rerun.
> In order for a task that is not the last task in the DAG to be rerun after 
> being cleared, its terminal downstream task needs to be cleared. Another 
> workaround is to use the CLI to rerun the cleared task.
> Here are some screenshots to illustrate the regressed behavior:
> Use Case 1 : Clear d1, d2, d3, and d4 in 4 separate DAG runs. In a 5th 
> separate DAG run, clear the entire DAG Run.
> After Clearing : 
> https://www.dropbox.com/s/mgiwoyaxf5f2pb2/Screenshot%202017-03-15%2010.12.02.png?dl=0
>  
> After the Scheduler Runs : 
> https://www.dropbox.com/s/7btwzydv87v3iz0/Screenshot%202017-03-15%2010.15.16.png?dl=0
> You'll notice that only the DAG runs with the last task cleared completed by 
> actually running cleared tasks. These are shown as the 1st and 5th DAG runs 
> from the left.
> Use Case 2 : Clear d1 and d4 in the same DAG Run
> After Clearing (c.f. 2nd from right DAG run): 
> https://www.dropbox.com/s/2a6by6k28eb7geh/Screenshot%202017-03-15%2013.34.11.png?dl=0
> After the Scheduler Runs : 
> https://www.dropbox.com/s/19cg6qr2oqi1ps7/Screenshot%202017-03-15%2013.34.51.png?dl=0





[jira] [Assigned] (AIRFLOW-989) Clear Task Regression

2017-03-15 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin reassigned AIRFLOW-989:
--

Assignee: Bolke de Bruin

> Clear Task Regression
> -
>
> Key: AIRFLOW-989
> URL: https://issues.apache.org/jira/browse/AIRFLOW-989
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core
>Affects Versions: Airflow 1.8
>Reporter: Siddharth Anand
>Assignee: Bolke de Bruin
>Priority: Critical
>
> There is a regression in the current 1.8 rc (e.g. rc5) related to Clear Task 
> behavior. 
> Consider the following test DAG : 
> 1. Code : https://gist.github.com/r39132/b44f7d791e11f882cde28a219df97c29
> 2. Graph : 
> https://www.dropbox.com/s/1e9rfnq6cy4hh45/Screenshot%202017-03-15%2013.48.26.png?dl=0
> The test DAG has 4 dummy tasks chained together as d1->d2->d3->d4. d1 is the 
> first task and d4 is the last task. Prior to 1.8, if any of d1..d4 were 
> cleared individually, the scheduler would pick up and rerun the cleared 
> tasks. In 1.8, unless the last task in a DAG is cleared, none of the tasks 
> in the DAG run are rerun.
> In order for a task that is not the last task in the DAG to be rerun after 
> being cleared, its terminal downstream task needs to be cleared. Another 
> workaround is to use the CLI to rerun the cleared task.
> Here are some screenshots to illustrate the regressed behavior:
> Use Case 1 : Clear d1, d2, d3, and d4 in 4 separate DAG runs. In a 5th 
> separate DAG run, clear the entire DAG Run.
> After Clearing : 
> https://www.dropbox.com/s/mgiwoyaxf5f2pb2/Screenshot%202017-03-15%2010.12.02.png?dl=0
>  
> After the Scheduler Runs : 
> https://www.dropbox.com/s/7btwzydv87v3iz0/Screenshot%202017-03-15%2010.15.16.png?dl=0
> You'll notice that only the DAG runs with the last task cleared completed by 
> actually running cleared tasks. These are shown as the 1st and 5th DAG runs 
> from the left.
> Use Case 2 : Clear d1 and d4 in the same DAG Run
> After Clearing (c.f. 2nd from right DAG run): 
> https://www.dropbox.com/s/2a6by6k28eb7geh/Screenshot%202017-03-15%2013.34.11.png?dl=0
> After the Scheduler Runs : 
> https://www.dropbox.com/s/19cg6qr2oqi1ps7/Screenshot%202017-03-15%2013.34.51.png?dl=0





[jira] [Work started] (AIRFLOW-989) Clear Task Regression

2017-03-15 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-989 started by Bolke de Bruin.
--
> Clear Task Regression
> -
>
> Key: AIRFLOW-989
> URL: https://issues.apache.org/jira/browse/AIRFLOW-989
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core
>Affects Versions: Airflow 1.8
>Reporter: Siddharth Anand
>Assignee: Bolke de Bruin
>Priority: Critical
>
> There is a regression in the current 1.8 rc (e.g. rc5) related to Clear Task 
> behavior. 
> Consider the following test DAG : 
> 1. Code : https://gist.github.com/r39132/b44f7d791e11f882cde28a219df97c29
> 2. Graph : 
> https://www.dropbox.com/s/1e9rfnq6cy4hh45/Screenshot%202017-03-15%2013.48.26.png?dl=0
> The test DAG has 4 dummy tasks chained together as d1->d2->d3->d4. d1 is the 
> first task and d4 is the last task. Prior to 1.8, if any of d1..d4 were 
> cleared individually, the scheduler would pick up and rerun the cleared 
> tasks. In 1.8, unless the last task in a DAG is cleared, none of the tasks 
> in the DAG run are rerun.
> In order for a task that is not the last task in the DAG to be rerun after 
> being cleared, its terminal downstream task needs to be cleared. Another 
> workaround is to use the CLI to rerun the cleared task.
> Here are some screenshots to illustrate the regressed behavior:
> Use Case 1 : Clear d1, d2, d3, and d4 in 4 separate DAG runs. In a 5th 
> separate DAG run, clear the entire DAG Run.
> After Clearing : 
> https://www.dropbox.com/s/mgiwoyaxf5f2pb2/Screenshot%202017-03-15%2010.12.02.png?dl=0
>  
> After the Scheduler Runs : 
> https://www.dropbox.com/s/7btwzydv87v3iz0/Screenshot%202017-03-15%2010.15.16.png?dl=0
> You'll notice that only the DAG runs with the last task cleared completed by 
> actually running cleared tasks. These are shown as the 1st and 5th DAG runs 
> from the left.
> Use Case 2 : Clear d1 and d4 in the same DAG Run
> After Clearing (c.f. 2nd from right DAG run): 
> https://www.dropbox.com/s/2a6by6k28eb7geh/Screenshot%202017-03-15%2013.34.11.png?dl=0
> After the Scheduler Runs : 
> https://www.dropbox.com/s/19cg6qr2oqi1ps7/Screenshot%202017-03-15%2013.34.51.png?dl=0





[jira] [Commented] (AIRFLOW-989) Clear Task Regression

2017-03-15 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927030#comment-15927030
 ] 

Bolke de Bruin commented on AIRFLOW-989:


BTW You could try this (I didn't test it) in DagRun.update_state:

{code}
# if all roots succeeded and there are no unfinished tasks, the run succeeded
elif not unfinished_tasks and all(r.state in (State.SUCCESS, State.SKIPPED)
                                  for r in roots):
    logging.info('Marking run {} successful'.format(self))
    self.state = State.SUCCESS
{code}

> Clear Task Regression
> -
>
> Key: AIRFLOW-989
> URL: https://issues.apache.org/jira/browse/AIRFLOW-989
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core
>Affects Versions: Airflow 1.8
>Reporter: Siddharth Anand
>Priority: Critical
>
> There is a regression in the current 1.8 rc (e.g. rc5) related to Clear Task 
> behavior. 
> Consider the following test DAG : 
> 1. Code : https://gist.github.com/r39132/b44f7d791e11f882cde28a219df97c29
> 2. Graph : 
> https://www.dropbox.com/s/1e9rfnq6cy4hh45/Screenshot%202017-03-15%2013.48.26.png?dl=0
> The test DAG has 4 dummy tasks chained together as d1->d2->d3->d4. d1 is the 
> first task and d4 is the last task. Prior to 1.8, if any of d1..d4 were 
> cleared individually, the scheduler would pick up and rerun the cleared 
> tasks. In 1.8, unless the last task in a DAG is cleared, none of the tasks 
> in the DAG run are rerun.
> In order for a task that is not the last task in the DAG to be rerun after 
> being cleared, its terminal downstream task needs to be cleared. Another 
> workaround is to use the CLI to rerun the cleared task.
> Here are some screenshots to illustrate the regressed behavior:
> Use Case 1 : Clear d1, d2, d3, and d4 in 4 separate DAG runs. In a 5th 
> separate DAG run, clear the entire DAG Run.
> After Clearing : 
> https://www.dropbox.com/s/mgiwoyaxf5f2pb2/Screenshot%202017-03-15%2010.12.02.png?dl=0
>  
> After the Scheduler Runs : 
> https://www.dropbox.com/s/7btwzydv87v3iz0/Screenshot%202017-03-15%2010.15.16.png?dl=0
> You'll notice that only the DAG runs with the last task cleared completed by 
> actually running cleared tasks. These are shown as the 1st and 5th DAG runs 
> from the left.
> Use Case 2 : Clear d1 and d4 in the same DAG Run
> After Clearing (c.f. 2nd from right DAG run): 
> https://www.dropbox.com/s/2a6by6k28eb7geh/Screenshot%202017-03-15%2013.34.11.png?dl=0
> After the Scheduler Runs : 
> https://www.dropbox.com/s/19cg6qr2oqi1ps7/Screenshot%202017-03-15%2013.34.51.png?dl=0





[jira] [Updated] (AIRFLOW-989) Clear Task Regression

2017-03-15 Thread Siddharth Anand (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Anand updated AIRFLOW-989:

Description: 
There is a regression in the current 1.8 rc (e.g. rc5) related to Clear Task 
behavior. 
Consider the following test DAG : 
1. Code : https://gist.github.com/r39132/b44f7d791e11f882cde28a219df97c29
2. Graph : 
https://www.dropbox.com/s/1e9rfnq6cy4hh45/Screenshot%202017-03-15%2013.48.26.png?dl=0

The test DAG has 4 dummy tasks chained together as d1->d2->d3->d4. d1 is the 
first task and d4 is the last task. Prior to 1.8, if any of d1..d4 were cleared 
individually, the scheduler would pick up and rerun the cleared tasks. In 1.8, 
unless the last task in a DAG is cleared, none of the tasks in the DAG run are 
rerun.

In order for a task that is not the last task in the DAG to be rerun after 
being cleared, its terminal downstream task needs to be cleared. Another 
workaround is to use the CLI to rerun the cleared task.

Here are some screenshots to illustrate the regressed behavior:

Use Case 1 : Clear d1, d2, d3, and d4 in 4 separate DAG runs. In a 5th separate 
DAG run, clear the entire DAG Run.
After Clearing : 
https://www.dropbox.com/s/mgiwoyaxf5f2pb2/Screenshot%202017-03-15%2010.12.02.png?dl=0
 

After the Scheduler Runs : 
https://www.dropbox.com/s/7btwzydv87v3iz0/Screenshot%202017-03-15%2010.15.16.png?dl=0

You'll notice that only the DAG runs in which the last task was cleared 
completed, by actually rerunning the cleared tasks. These are shown as the 1st 
and 5th DAG runs from the left.

Use Case 2 : Clear d1 and d4 in the same DAG Run
After Clearing (c.f. 2nd from right DAG run): 
https://www.dropbox.com/s/2a6by6k28eb7geh/Screenshot%202017-03-15%2013.34.11.png?dl=0

After the Scheduler Runs : 
https://www.dropbox.com/s/19cg6qr2oqi1ps7/Screenshot%202017-03-15%2013.34.51.png?dl=0

  was:
There is a regression in the current 1.8 rc (e.g. rc5) related to Clear Task 
behavior. 

Consider the following test DAG : 
https://gist.github.com/r39132/b44f7d791e11f882cde28a219df97c29

The test DAG has 4 dummy tasks chained together as d1->d2->d3->d4. d1 is the 
first task and d4 is the last task.

Prior to 1.8, if any of d1..d4 were cleared individually, the scheduler would 
pick up and rerun the cleared tasks.

In 1.8, unless the last task in a DAG is cleared, none of the tasks in the DAG 
run are rerun.

In order for a task that is not the last task in the DAG to be rerun after 
being cleared, its terminal downstream task needs to be cleared. Another 
workaround is to use the CLI to rerun the cleared task.

Here are some screenshots to illustrate the regressed behavior:

Use Case 1 : Clear d1, d2, d3, and d4 in 4 separate DAG runs. In a 5th separate 
DAG run, clear the entire DAG Run.
After Clearing : 
https://www.dropbox.com/s/mgiwoyaxf5f2pb2/Screenshot%202017-03-15%2010.12.02.png?dl=0
 

After the Scheduler Runs : 
https://www.dropbox.com/s/7btwzydv87v3iz0/Screenshot%202017-03-15%2010.15.16.png?dl=0

You'll notice that only the DAG runs in which the last task was cleared 
completed, by actually rerunning the cleared tasks. These are shown as the 1st 
and 5th DAG runs from the left.

Use Case 2 : Clear d1 and d4 in the same DAG Run
After Clearing (c.f. 2nd from right DAG run): 
https://www.dropbox.com/s/2a6by6k28eb7geh/Screenshot%202017-03-15%2013.34.11.png?dl=0

After the Scheduler Runs : 
https://www.dropbox.com/s/19cg6qr2oqi1ps7/Screenshot%202017-03-15%2013.34.51.png?dl=0


> Clear Task Regression
> -
>
> Key: AIRFLOW-989
> URL: https://issues.apache.org/jira/browse/AIRFLOW-989
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core
>Affects Versions: Airflow 1.8
>Reporter: Siddharth Anand
>Priority: Critical
>
> There is a regression in the current 1.8 rc (e.g. rc5) related to Clear Task 
> behavior. 
> Consider the following test DAG : 
> 1. Code : https://gist.github.com/r39132/b44f7d791e11f882cde28a219df97c29
> 2. Graph : 
> https://www.dropbox.com/s/1e9rfnq6cy4hh45/Screenshot%202017-03-15%2013.48.26.png?dl=0
> The test DAG has 4 dummy tasks chained together as d1->d2->d3->d4. d1 is the 
> first task and d4 is the last task. Prior to 1.8, if any of d1..d4 were 
> cleared individually, the scheduler would pick up and rerun the cleared 
> tasks. In 1.8, unless the last task in a DAG is cleared, none of the tasks 
> in the DAG run are rerun.
> In order for a task that is not the last task in the DAG to be rerun after 
> being cleared, its terminal downstream task needs to be cleared. Another 
> workaround is to use the CLI to rerun the cleared task.
> Here are some screenshots to illustrate the regressed behavior:
> Use Case 1 : Clear d1, d2, d3, and d4 in 4 separate DAG runs. In a 5th 
> separate DAG run, clear the entire DAG Run.
> After Clearing : 
> 

[jira] [Created] (AIRFLOW-989) Clear Task Regression

2017-03-15 Thread Siddharth Anand (JIRA)
Siddharth Anand created AIRFLOW-989:
---

 Summary: Clear Task Regression
 Key: AIRFLOW-989
 URL: https://issues.apache.org/jira/browse/AIRFLOW-989
 Project: Apache Airflow
  Issue Type: Bug
  Components: core
Affects Versions: Airflow 1.8
Reporter: Siddharth Anand
Priority: Critical


There is a regression in the current 1.8 rc (e.g. rc5) related to Clear Task 
behavior. 

Consider the following test DAG : 
https://gist.github.com/r39132/b44f7d791e11f882cde28a219df97c29

The test DAG has 4 dummy tasks chained together as d1->d2->d3->d4. d1 is the 
first task and d4 is the last task.

Prior to 1.8, if any of d1..d4 were cleared individually, the scheduler would 
pick up and rerun the cleared tasks.

In 1.8, unless the last task in a DAG is cleared, none of the tasks in the DAG 
run are rerun.

In order for a task that is not the last task in the DAG to be rerun after 
being cleared, its terminal downstream task needs to be cleared. Another 
workaround is to use the CLI to rerun the cleared task.

Here are some screenshots to illustrate the regressed behavior:

Use Case 1 : Clear d1, d2, d3, and d4 in 4 separate DAG runs. In a 5th separate 
DAG run, clear the entire DAG Run.
After Clearing : 
https://www.dropbox.com/s/mgiwoyaxf5f2pb2/Screenshot%202017-03-15%2010.12.02.png?dl=0
 

After the Scheduler Runs : 
https://www.dropbox.com/s/7btwzydv87v3iz0/Screenshot%202017-03-15%2010.15.16.png?dl=0

You'll notice that only the DAG runs in which the last task was cleared 
completed, by actually rerunning the cleared tasks. These are shown as the 1st 
and 5th DAG runs from the left.

Use Case 2 : Clear d1 and d4 in the same DAG Run
After Clearing (c.f. 2nd from right DAG run): 
https://www.dropbox.com/s/2a6by6k28eb7geh/Screenshot%202017-03-15%2013.34.11.png?dl=0

After the Scheduler Runs : 
https://www.dropbox.com/s/19cg6qr2oqi1ps7/Screenshot%202017-03-15%2013.34.51.png?dl=0
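The chain and the regressed clearing behavior described above can be modeled with a short pure-Python sketch. This is a simplified stand-in for the scheduler's observed behavior, not Airflow code, and `rerun_tasks` is a hypothetical helper:

```python
# Simplified model of the reported behavior (not Airflow internals).
# The test DAG is a chain of four dummy tasks.
chain = ["d1", "d2", "d3", "d4"]

def rerun_tasks(cleared, pre_1_8=True):
    """Return the tasks the scheduler reruns, per the reported behavior.

    Pre-1.8: every cleared task is picked up and rerun.
    1.8 rc:  nothing is rerun unless the last task was cleared.
    """
    if pre_1_8:
        return sorted(cleared)
    # Reported 1.8 regression: reruns only happen when the terminal
    # task (d4) is among the cleared tasks.
    return sorted(cleared) if chain[-1] in cleared else []

# Use Case 1: clearing d2 alone.
assert rerun_tasks({"d2"}, pre_1_8=True) == ["d2"]   # pre-1.8: rerun happens
assert rerun_tasks({"d2"}, pre_1_8=False) == []      # 1.8 rc: task stays stuck

# Workaround described above: also clear the terminal task d4.
assert rerun_tasks({"d2", "d4"}, pre_1_8=False) == ["d2", "d4"]
```

The model captures why clearing the whole DAG run (which includes d4) still works while clearing an upstream task alone does not.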





[jira] [Closed] (AIRFLOW-980) IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint "dag_run_dag_id_key" on sample DAGs

2017-03-15 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin closed AIRFLOW-980.
--
Resolution: Fixed

> IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique 
> constraint "dag_run_dag_id_key" on sample DAGs
> 
>
> Key: AIRFLOW-980
> URL: https://issues.apache.org/jira/browse/AIRFLOW-980
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.7.1.3
> Environment: Local Executor
> postgresql+psycopg2 database backend
>Reporter: Ruslan Dautkhanov
>
> Fresh Airflow install using pip.
> Only sample DAGs are installed.
> LocalExecutor (4 workers).
> Most of the parameters are at defaults.
> Turned On all of the sample DAGs (14 of them).
> After some execution (a lot of DAGs had at least one successful execution), I 
> started seeing the error stack below again and again in the scheduler log.
> {noformat}
> IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique 
> constraint "dag_run_dag_id_key"
>  [SQL: 'INSERT INTO dag_run (dag_id, execution_date, start_date, end_date, 
> state, run_id, external_trigger, conf) VALUES (%(dag_id)s, 
> %(execution_date)s, %(start_date)s, %(end_date)s, %(state)s, %(run_id)s, 
> %(external_trigger)s, %(conf)s) RETURNING dag_run.id'] [parameters: 
> {'end_date': None, 'run_id': u'scheduled__2015-01-01T00:00:00', 
> 'execution_date': datetime.datetime(2015, 1, 1, 0, 0), 'external_trigger': 
> False, 'state': u'running', 'conf': None, 'start_date': 
> datetime.datetime(2017, 3, 14, 11, 12, 29, 646995), 'dag_id': 'example_xcom'}]
> Process Process-152:
> Traceback (most recent call last):
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/multiprocessing/process.py", 
> line 258, in _bootstrap
> self.run()
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/multiprocessing/process.py", 
> line 114, in run
> self._target(*self._args, **self._kwargs)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/jobs.py", 
> line 664, in _do_dags
> dag = dagbag.get_dag(dag.dag_id)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/models.py",
>  line 188, in get_dag
> orm_dag = DagModel.get_current(root_dag_id)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/models.py",
>  line 2320, in get_current
> obj = session.query(cls).filter(cls.dag_id == dag_id).first()
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2634, in first
> ret = list(self[0:1])
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2457, in __getitem__
> return list(res)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2736, in __iter__
> return self._execute_and_instances(context)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2749, in _execute_and_instances
> close_with_result=True)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2740, in _connection_from_session
> **kw)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 893, in connection
> execution_options=execution_options)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 898, in _connection_for_bind
> engine, execution_options)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 313, in _connection_for_bind
> self._assert_active()
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 214, in _assert_active
> % self._rollback_exception
> InvalidRequestError: This Session's transaction has been rolled back due to a 
> previous exception during flush. To begin a new transaction with this 
> Session, first issue Session.rollback(). Original exception was: 
> (psycopg2.IntegrityError) duplicate key value violates unique constraint 
> "dag_run_dag_id_key"
>  [SQL: 'INSERT INTO dag_run (dag_id, execution_date, start_date, end_date, 
> state, run_id, external_trigger, conf) VALUES (%(dag_id)s, 
> %(execution_date)s, %(start_date)s, %(end_date)s, %(state)s, %(run_id)s, 
> %(external_trigger)s, %(conf)s) RETURNING dag_run.id'] [parameters: 
> {'end_date': None, 'run_id': u'scheduled__2015-01-01T00:00:00', 
> 'execution_date': datetime.datetime(2015, 1, 1, 0, 0), 'external_trigger': 
> False, 'state': u'running', 'conf': None, 'start_date': 
> 
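The race behind this error can be illustrated with a simplified model: two scheduler workers both try to insert a dag_run row for the same (dag_id, execution_date), and the unique constraint rejects the second insert. The classes below are plain-Python stand-ins, not Airflow or SQLAlchemy code:

```python
# Simplified model of the duplicate-key race (not Airflow internals).
class IntegrityError(Exception):
    pass

class DagRunTable:
    """Plain-Python stand-in for the dag_run table."""
    def __init__(self):
        self._unique = set()  # models the "dag_run_dag_id_key" constraint

    def insert(self, dag_id, execution_date):
        key = (dag_id, execution_date)
        if key in self._unique:
            raise IntegrityError(
                "duplicate key value violates unique constraint")
        self._unique.add(key)

def create_dag_run(table, dag_id, execution_date):
    # Mitigation sketch: treat a duplicate insert as "another worker
    # already created this run" and continue, instead of leaving the
    # session in a failed state. Real code would roll back the session.
    try:
        table.insert(dag_id, execution_date)
        return True
    except IntegrityError:
        return False  # row already exists; skip

table = DagRunTable()
assert create_dag_run(table, "example_xcom", "2015-01-01T00:00:00") is True
assert create_dag_run(table, "example_xcom", "2015-01-01T00:00:00") is False
```

The follow-on InvalidRequestError in the trace happens because the session is reused without a rollback after the failed flush.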

[jira] [Commented] (AIRFLOW-883) Assigning operator to DAG via bitwise composition does not pickup default args

2017-03-15 Thread Jeremiah Lowin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926575#comment-15926575
 ] 

Jeremiah Lowin commented on AIRFLOW-883:


I'm not totally sure this is a "bug" per se, though it is confusing. 

"default_args" are arguments that are passed to Operators by the parent DAG. 
Critically, that happens when the Operators are created. While bitshift 
operators allow deferred DAG assignment, the Operator in question has already 
been created. The reason the distinction matters is that the Operator's 
__init__ may include logic related to its arguments. If we pass/assign those 
arguments after initialization, the logic won't run. 

However, if we do want to tackle this:
1. The simplest thing would be to walk "default_args" and replace any matching 
Operator attributes that are None.
2. The more proper thing would be to defer Operator initialization until it is 
added to a DAG. This would require a bit of a refactor though.
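Option 1 above could look roughly like the following sketch. The classes are simplified stand-ins for Airflow's DAG and BaseOperator, used only to show the backfill-on-assignment idea:

```python
# Hypothetical sketch of option 1: when an operator is assigned to a
# DAG, backfill any attribute still set to None from default_args.
# Simplified stand-ins, not Airflow's real models.
class Operator:
    def __init__(self, task_id, owner=None, start_date=None):
        self.task_id = task_id
        self.owner = owner
        self.start_date = start_date

class Dag:
    def __init__(self, default_args=None):
        self.default_args = default_args or {}
        self.tasks = []

    def add_task(self, op):
        # Walk default_args; only fill attributes the operator left
        # unset, so explicitly passed arguments always win.
        for key, value in self.default_args.items():
            if getattr(op, key, None) is None:
                setattr(op, key, value)
        self.tasks.append(op)

dag = Dag(default_args={"owner": "airflow", "start_date": "2017-02-01"})
dummy = Operator(task_id="dummy")
dag.add_task(dummy)
assert dummy.owner == "airflow"
assert dummy.start_date == "2017-02-01"
```

As the comment notes, this still skips any __init__ logic that depends on the backfilled arguments, which is why deferred initialization (option 2) is the more proper fix.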




> Assigning operator to DAG via bitwise composition does not pickup default args
> --
>
> Key: AIRFLOW-883
> URL: https://issues.apache.org/jira/browse/AIRFLOW-883
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: models
>Reporter: Daniel Huang
>Assignee: Jeremiah Lowin
>Priority: Minor
>
> This is only the case when the operator does not specify {{dag=dag}} and is 
> not initialized within a DAG's context manager (due to 
> https://github.com/apache/incubator-airflow/blob/fb0c5775cda4f84c07d8d5c0e6277fc387c172e6/airflow/utils/decorators.py#L50)
> Example:
> {code}
> default_args = {
> 'owner': 'airflow', 
> 'start_date': datetime(2017, 2, 1)
> }
> dag = DAG('my_dag', default_args=default_args)
> dummy = DummyOperator(task_id='dummy')
> dag >> dummy
> {code}
> This will raise a {{Task is missing the start_date parameter}}. I _think_ 
> this should probably be allowed because I assume the purpose of supporting 
> {{dag >> op}} was to allow delayed assignment of an operator to a DAG. 
> I believe that to fix this, on assignment we would need to go back through 
> dag.default_args and check whether any of those attrs weren't explicitly set 
> on the task... not the cleanest. 





[jira] [Assigned] (AIRFLOW-883) Assigning operator to DAG via bitwise composition does not pickup default args

2017-03-15 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin reassigned AIRFLOW-883:
--

Assignee: Jeremiah Lowin

> Assigning operator to DAG via bitwise composition does not pickup default args
> --
>
> Key: AIRFLOW-883
> URL: https://issues.apache.org/jira/browse/AIRFLOW-883
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: models
>Reporter: Daniel Huang
>Assignee: Jeremiah Lowin
>Priority: Minor
>
> This is only the case when the operator does not specify {{dag=dag}} and is 
> not initialized within a DAG's context manager (due to 
> https://github.com/apache/incubator-airflow/blob/fb0c5775cda4f84c07d8d5c0e6277fc387c172e6/airflow/utils/decorators.py#L50)
> Example:
> {code}
> default_args = {
> 'owner': 'airflow', 
> 'start_date': datetime(2017, 2, 1)
> }
> dag = DAG('my_dag', default_args=default_args)
> dummy = DummyOperator(task_id='dummy')
> dag >> dummy
> {code}
> This will raise a {{Task is missing the start_date parameter}}. I _think_ 
> this should probably be allowed because I assume the purpose of supporting 
> {{dag >> op}} was to allow delayed assignment of an operator to a DAG. 
> I believe that to fix this, on assignment we would need to go back through 
> dag.default_args and check whether any of those attrs weren't explicitly set 
> on the task... not the cleanest. 





[jira] [Comment Edited] (AIRFLOW-988) SLA Miss Callbacks Are Repeated if Email is Not being Used

2017-03-15 Thread Zachary Lawson (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926392#comment-15926392
 ] 

Zachary Lawson edited comment on AIRFLOW-988 at 3/15/17 3:51 PM:
-

Alternatively, you could just filter to sla_miss records that have 
notification_sent = False given that in the comments it's stated:
{quote}
We consider email or the alert callback as notifications.
{quote}


was (Author: zmjlawson):
Alternatively, you could just filter to sla_miss records that have 
notification_sent = True given that in the comments it's stated:
{quote}
We consider email or the alert callback as notifications.
{quote}

> SLA Miss Callbacks Are Repeated if Email is Not being Used
> --
>
> Key: AIRFLOW-988
> URL: https://issues.apache.org/jira/browse/AIRFLOW-988
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.8
>Reporter: Zachary Lawson
>
> There is an issue in the current v1-8-stable branch. Looking at the jobs.py 
> module, if the system does not have email set up but does have a 
> sla_miss_callback defined in the DAG, that sla_miss_callback is repeated for 
> that job infinitely as long as the airflow scheduler is running. The 
> offending code seems to be in the query to the airflow meta database which 
> filters to sla_miss records that have *either* email_sent or 
> notification_sent as false ([see lines 
> 606-613|https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/jobs.py#L606-L613]),
>  but then executes the sla_miss_callback function regardless of whether 
> notification_sent was true ([see lines 
> 644-648|https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/jobs.py#L644-L648]).
>  A conditional statement should be added prior to executing the 
> sla_miss_callback to verify whether a notification has already been sent, to 
> prevent this.





[jira] [Commented] (AIRFLOW-988) SLA Miss Callbacks Are Repeated if Email is Not being Used

2017-03-15 Thread Zachary Lawson (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926392#comment-15926392
 ] 

Zachary Lawson commented on AIRFLOW-988:


Alternatively, you could just filter to sla_miss records that have 
notification_sent = True given that in the comments it's stated:
{quote}
We consider email or the alert callback as notifications.
{quote}

> SLA Miss Callbacks Are Repeated if Email is Not being Used
> --
>
> Key: AIRFLOW-988
> URL: https://issues.apache.org/jira/browse/AIRFLOW-988
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.8
>Reporter: Zachary Lawson
>
> There is an issue in the current v1-8-stable branch. Looking at the jobs.py 
> module, if the system does not have email set up but does have a 
> sla_miss_callback defined in the DAG, that sla_miss_callback is repeated for 
> that job infinitely as long as the airflow scheduler is running. The 
> offending code seems to be in the query to the airflow meta database which 
> filters to sla_miss records that have *either* email_sent or 
> notification_sent as false ([see lines 
> 606-613|https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/jobs.py#L606-L613]),
>  but then executes the sla_miss_callback function regardless of whether 
> notification_sent was true ([see lines 
> 644-648|https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/jobs.py#L644-L648]).
>  A conditional statement should be added prior to executing the 
> sla_miss_callback to verify whether a notification has already been sent, to 
> prevent this.





[jira] [Updated] (AIRFLOW-988) SLA Miss Callbacks Are Repeated if Email is Not being Used

2017-03-15 Thread Zachary Lawson (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zachary Lawson updated AIRFLOW-988:
---
Summary: SLA Miss Callbacks Are Repeated if Email is Not being Used  (was: 
SLA Misses Are Repeated if Email is Not being Used)

> SLA Miss Callbacks Are Repeated if Email is Not being Used
> --
>
> Key: AIRFLOW-988
> URL: https://issues.apache.org/jira/browse/AIRFLOW-988
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.8
>Reporter: Zachary Lawson
>
> There is an issue in the current v1-8-stable branch. Looking at the jobs.py 
> module, if the system does not have email set up but does have a 
> sla_miss_callback defined in the DAG, that sla_miss_callback is repeated for 
> that job infinitely as long as the airflow scheduler is running. The 
> offending code seems to be in the query to the airflow meta database which 
> filters to sla_miss records that have *either* email_sent or 
> notification_sent as false ([see lines 
> 606-613|https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/jobs.py#L606-L613]),
>  but then executes the sla_miss_callback function regardless of whether 
> notification_sent was true ([see lines 
> 644-648|https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/jobs.py#L644-L648]).
>  A conditional statement should be added prior to executing the 
> sla_miss_callback to verify whether a notification has already been sent, to 
> prevent this.





[jira] [Updated] (AIRFLOW-988) SLA Misses Are Repeated if Email is Not being Used

2017-03-15 Thread Zachary Lawson (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zachary Lawson updated AIRFLOW-988:
---
Description: There is an issue in the current v1-8-stable branch. Looking 
at the jobs.py module, if the system does not have email set up but does have a 
sla_miss_callback defined in the DAG, that sla_miss_callback is repeated for 
that job infinitely as long as the airflow scheduler is running. The offending 
code seems to be in the query to the airflow meta database which filters to 
sla_miss records that have *either* email_sent or notification_sent as false 
([see lines 
606-613|https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/jobs.py#L606-L613]),
 but then executes the sla_miss_callback function regardless of whether 
notification_sent was true ([see lines 
644-648|https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/jobs.py#L644-L648]).
 A conditional statement should be added prior to executing the sla_miss_callback 
to verify whether a notification has already been sent, to prevent this.  (was: There is 
an issue in the current v1-8-stable branch. Looking at the jobs.py module, if 
the system does not have email set up but does have a sla_miss_callback defined 
in the DAG, that sla_miss_callback is repeated for that job infinitely as long 
as the airflow scheduler is running. The offending code seems to be in the 
query to the airflow meta database which filters to sla_miss records that have 
*either* email_sent or notification_sent as false ([see lines 
606-613|https://github.com/apache/incubator-airflow/blob/bb39078a35cf2bceea58d7831d7a2028c8ef849f/airflow/jobs.py#L606-L613]),
 but then executes the sla_miss_callback function regardless of whether 
notification_sent was true ([see lines 
644-648|https://github.com/apache/incubator-airflow/blob/bb39078a35cf2bceea58d7831d7a2028c8ef849f/airflow/jobs.py#L644-L648]).
 A conditional statement should be added prior to executing the sla_miss_callback 
to verify whether a notification has already been sent, to prevent this.)

> SLA Misses Are Repeated if Email is Not being Used
> --
>
> Key: AIRFLOW-988
> URL: https://issues.apache.org/jira/browse/AIRFLOW-988
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.8
>Reporter: Zachary Lawson
>
> There is an issue in the current v1-8-stable branch. Looking at the jobs.py 
> module, if the system does not have email set up but does have a 
> sla_miss_callback defined in the DAG, that sla_miss_callback is repeated for 
> that job infinitely as long as the airflow scheduler is running. The 
> offending code seems to be in the query to the airflow meta database which 
> filters to sla_miss records that have *either* email_sent or 
> notification_sent as false ([see lines 
> 606-613|https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/jobs.py#L606-L613]),
>  but then executes the sla_miss_callback function regardless of whether 
> notification_sent was true ([see lines 
> 644-648|https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/jobs.py#L644-L648]).
>  A conditional statement should be added prior to executing the 
> sla_miss_callback to verify whether a notification has already been sent, to 
> prevent this.





[jira] [Created] (AIRFLOW-988) SLA Misses Are Repeated if Email is Not being Used

2017-03-15 Thread Zachary Lawson (JIRA)
Zachary Lawson created AIRFLOW-988:
--

 Summary: SLA Misses Are Repeated if Email is Not being Used
 Key: AIRFLOW-988
 URL: https://issues.apache.org/jira/browse/AIRFLOW-988
 Project: Apache Airflow
  Issue Type: Bug
Affects Versions: Airflow 1.8
Reporter: Zachary Lawson


There is an issue in the current v1-8-stable branch. Looking at the jobs.py 
module, if the system does not have email set up but does have a 
sla_miss_callback defined in the DAG, that sla_miss_callback is repeated for 
that job infinitely as long as the airflow scheduler is running. The offending 
code seems to be in the query to the airflow meta database which filters to 
sla_miss records that have *either* email_sent or notification_sent as false 
([see lines 
606-613|https://github.com/apache/incubator-airflow/blob/bb39078a35cf2bceea58d7831d7a2028c8ef849f/airflow/jobs.py#L606-L613]),
 but then executes the sla_miss_callback function regardless of whether 
notification_sent was true ([see lines 
644-648|https://github.com/apache/incubator-airflow/blob/bb39078a35cf2bceea58d7831d7a2028c8ef849f/airflow/jobs.py#L644-L648]).
 A conditional statement should be added prior to executing the sla_miss_callback 
to verify whether a notification has already been sent, to prevent this.
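The proposed guard might look like the following simplified sketch. Plain dicts stand in for sla_miss rows, and the function is a hypothetical model of the loop, not Airflow's actual jobs.py code:

```python
# Simplified model of the SLA-miss loop with the proposed guard.
def process_sla_misses(records, sla_miss_callback, send_email):
    """Process sla_miss rows; return task_ids whose callback fired."""
    fired = []
    for rec in records:
        # Existing query condition: either flag still false.
        if rec["email_sent"] and rec["notification_sent"]:
            continue
        if send_email and not rec["email_sent"]:
            rec["email_sent"] = True
            rec["notification_sent"] = True
        # Proposed guard: only fire the callback if no notification
        # has been sent yet. Without it, a system with no email set
        # up repeats the callback on every scheduler pass.
        if not rec["notification_sent"]:
            sla_miss_callback(rec)
            rec["notification_sent"] = True
            fired.append(rec["task_id"])
    return fired

recs = [{"task_id": "t1", "email_sent": False, "notification_sent": False}]
# First scheduler pass: callback fires once and marks the notification.
assert process_sla_misses(recs, lambda r: None, send_email=False) == ["t1"]
# Second pass: with the guard, the callback is not repeated.
assert process_sla_misses(recs, lambda r: None, send_email=False) == []
```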





[jira] [Resolved] (AIRFLOW-903) Add configuration setting for default DAG view.

2017-03-15 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-903.

   Resolution: Fixed
Fix Version/s: (was: Airflow 1.8)
   1.9.0

Issue resolved by pull request #2103
[https://github.com/apache/incubator-airflow/pull/2103]

> Add configuration setting for default DAG view.
> ---
>
> Key: AIRFLOW-903
> URL: https://issues.apache.org/jira/browse/AIRFLOW-903
> Project: Apache Airflow
>  Issue Type: Improvement
>Affects Versions: Airflow 1.8
>Reporter: Jason Kromm
>Assignee: Jason Kromm
>Priority: Minor
> Fix For: 1.9.0
>
>
> The default view when clicking on a DAG used to be graph view; it is now tree 
> view instead.  There should be a configuration setting default_dag_view = 
> ['tree','graph','duration','gantt','landing_times']





[jira] [Commented] (AIRFLOW-903) Add configuration setting for default DAG view.

2017-03-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926086#comment-15926086
 ] 

ASF subversion and git services commented on AIRFLOW-903:
-

Commit cadfae54bc0f8bf01582733135595e1d34b3b3fe in incubator-airflow's branch 
refs/heads/master from [~jakromm]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=cadfae5 ]

[AIRFLOW-903] New configuration setting for the default dag view

Added a new configuration setting for the default
view a dag should display when clicked on the
index page.

Make sure we do lower for jinja url_for function

Closes #2103 from jakromm/master


> Add configuration setting for default DAG view.
> ---
>
> Key: AIRFLOW-903
> URL: https://issues.apache.org/jira/browse/AIRFLOW-903
> Project: Apache Airflow
>  Issue Type: Improvement
>Affects Versions: Airflow 1.8
>Reporter: Jason Kromm
>Assignee: Jason Kromm
>Priority: Minor
> Fix For: 1.9.0
>
>
> The default view when clicking on a DAG used to be graph view; it is now tree 
> view instead.  There should be a configuration setting default_dag_view = 
> ['tree','graph','duration','gantt','landing_times']





incubator-airflow git commit: [AIRFLOW-903] New configuration setting for the default dag view

2017-03-15 Thread jlowin
Repository: incubator-airflow
Updated Branches:
  refs/heads/master b17bd31d1 -> cadfae54b


[AIRFLOW-903] New configuration setting for the default dag view

Added a new configuration setting for the default
view a dag should display when clicked on the
index page.

Make sure we do lower for jinja url_for function

Closes #2103 from jakromm/master


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/cadfae54
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/cadfae54
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/cadfae54

Branch: refs/heads/master
Commit: cadfae54bc0f8bf01582733135595e1d34b3b3fe
Parents: b17bd31
Author: Jason Kromm 
Authored: Wed Mar 15 08:46:31 2017 -0400
Committer: Jeremiah Lowin 
Committed: Wed Mar 15 08:46:31 2017 -0400

--
 airflow/configuration.py                | 5 +++++
 airflow/models.py                       | 4 ++++
 airflow/www/templates/airflow/dags.html | 2 +-
 3 files changed, 10 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/cadfae54/airflow/configuration.py
--
diff --git a/airflow/configuration.py b/airflow/configuration.py
index cfccbe9..fb3c11e 100644
--- a/airflow/configuration.py
+++ b/airflow/configuration.py
@@ -260,6 +260,10 @@ filter_by_owner = False
 # in order to user the ldapgroup mode.
 owner_mode = user
 
+# Default DAG view.  Valid values are:
+# tree, graph, duration, gantt, landing_times
+dag_default_view = tree
+
 # Default DAG orientation. Valid values are:
 # LR (Left->Right), TB (Top->Bottom), RL (Right->Left), BT (Bottom->Top)
 dag_orientation = LR
@@ -481,6 +485,7 @@ base_url = http://localhost:8080
 web_server_host = 0.0.0.0
 web_server_port = 8080
 dag_orientation = LR
+dag_default_view = tree
 log_fetch_timeout_sec = 5
 hide_paused_dags_by_default = False
 

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/cadfae54/airflow/models.py
--
diff --git a/airflow/models.py b/airflow/models.py
index 1244d60..27a5670 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -2632,6 +2632,8 @@ class DAG(BaseDag, LoggingMixin):
 :param sla_miss_callback: specify a function to call when reporting SLA
 timeouts.
 :type sla_miss_callback: types.FunctionType
+:param default_view: Specify DAG default view (tree, graph, duration, 
gantt, landing_times)
+:type default_view: string
 :param orientation: Specify DAG orientation in graph view (LR, TB, RL, BT)
 :type orientation: string
 :param catchup: Perform scheduler catchup (or only run latest)? Defaults 
to True
@@ -2652,6 +2654,7 @@ class DAG(BaseDag, LoggingMixin):
 'core', 'max_active_runs_per_dag'),
 dagrun_timeout=None,
 sla_miss_callback=None,
+default_view=configuration.get('webserver', 
'dag_default_view').lower(),
 orientation=configuration.get('webserver', 'dag_orientation'),
 catchup=configuration.getboolean('scheduler', 
'catchup_by_default'),
 params=None):
@@ -2695,6 +2698,7 @@ class DAG(BaseDag, LoggingMixin):
 self.max_active_runs = max_active_runs
 self.dagrun_timeout = dagrun_timeout
 self.sla_miss_callback = sla_miss_callback
+self.default_view = default_view
 self.orientation = orientation
 self.catchup = catchup
 

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/cadfae54/airflow/www/templates/airflow/dags.html
--
diff --git a/airflow/www/templates/airflow/dags.html 
b/airflow/www/templates/airflow/dags.html
index 379f153..8a5a346 100644
--- a/airflow/www/templates/airflow/dags.html
+++ b/airflow/www/templates/airflow/dags.html
@@ -73,7 +73,7 @@
 
 
 {% if dag_id in webserver_dags %}
-
+
 {{ dag_id }}
 
 {% else %}



[jira] [Commented] (AIRFLOW-979) Add GovTech GDS

2017-03-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926058#comment-15926058
 ] 

ASF subversion and git services commented on AIRFLOW-979:
-

Commit b17bd31d1f6a55eb36d156be3e3fe10bac77466c in incubator-airflow's branch 
refs/heads/master from chrissng
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=b17bd31 ]

[AIRFLOW-979] Add GovTech GDS

Closes #2149 from chrissng/add-govtech-gds


> Add GovTech GDS
> ---
>
> Key: AIRFLOW-979
> URL: https://issues.apache.org/jira/browse/AIRFLOW-979
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: docs
>Reporter: Chris Sng
>Assignee: Chris Sng
>Priority: Trivial
>  Labels: documentation
> Fix For: 1.9.0
>
>
> Add to README.md:
> ```
> 1. [GovTech GDS](https://gds-gov.tech) 
> [[@chrissng](https://github.com/chrissng) & 
> [@datagovsg](https://github.com/datagovsg)]
> ```



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


incubator-airflow git commit: [AIRFLOW-979] Add GovTech GDS

2017-03-15 Thread jlowin
Repository: incubator-airflow
Updated Branches:
  refs/heads/master c44e2009e -> b17bd31d1


[AIRFLOW-979] Add GovTech GDS

Closes #2149 from chrissng/add-govtech-gds


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/b17bd31d
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/b17bd31d
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/b17bd31d

Branch: refs/heads/master
Commit: b17bd31d1f6a55eb36d156be3e3fe10bac77466c
Parents: c44e200
Author: chrissng 
Authored: Wed Mar 15 08:33:33 2017 -0400
Committer: Jeremiah Lowin 
Committed: Wed Mar 15 08:33:43 2017 -0400

--
 README.md | 1 +
 1 file changed, 1 insertion(+)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/b17bd31d/README.md
--
diff --git a/README.md b/README.md
index 2769df9..fb268c9 100644
--- a/README.md
+++ b/README.md
@@ -106,6 +106,7 @@ Currently **officially** using Airflow:
 1. [FreshBooks](https://github.com/freshbooks) 
[[@DinoCow](https://github.com/DinoCow)]
 1. [Gentner Lab](http://github.com/gentnerlab) 
[[@neuromusic](https://github.com/neuromusic)]
 1. [Glassdoor](https://github.com/Glassdoor) 
[[@syvineckruyk](https://github.com/syvineckruyk)]
+1. [GovTech GDS](https://gds-gov.tech) 
[[@chrissng](https://github.com/chrissng) & 
[@datagovsg](https://github.com/datagovsg)]
 1. [Gusto](https://gusto.com) [[@frankhsu](https://github.com/frankhsu)]
 1. [Handshake](https://joinhandshake.com/) 
[[@mhickman](https://github.com/mhickman)]
 1. [Handy](http://www.handy.com/careers/73115?gh_jid=73115_src=o5qcxn) 
[[@marcintustin](https://github.com/marcintustin) / 
[@mtustin-handy](https://github.com/mtustin-handy)]



[jira] [Commented] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments

2017-03-15 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925636#comment-15925636
 ] 

Ruslan Dautkhanov commented on AIRFLOW-987:
---

{quote}
btw you need to set the arguments in the config file; it doesn't accept them 
from the command line this way.
{quote}
yep, see my previous comment. thanks.

> `airflow kerberos` ignores --keytab and --principal arguments
> -
>
> Key: AIRFLOW-987
> URL: https://issues.apache.org/jira/browse/AIRFLOW-987
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security
>Affects Versions: Airflow 1.8
> Environment: 1.8-rc5
>Reporter: Ruslan Dautkhanov
>Assignee: Bolke de Bruin
>  Labels: easyfix, kerberos, security
>
> No matter which arguments I pass to `airflow kerberos`, 
> it always executes as `kinit -r 3600m -k -t airflow.keytab -c 
> /tmp/airflow_krb5_ccache airflow`
> So it fails with the expected "kinit: Keytab contains no suitable keys for 
> airf...@corp.some.com while getting initial credentials"
> Tried different arguments, -kt and --keytab, here's one of the runs (some 
> lines wrapped for readability):
> {noformat}
> $ airflow kerberos -kt /home/rdautkha/.keytab rdautkha...@corp.some.com
> [2017-03-14 23:50:11,523] {__init__.py:57} INFO - Using executor LocalExecutor
> [2017-03-14 23:50:12,069] {kerberos.py:43} INFO - Reinitting kerberos from 
> keytab: 
> kinit -r 3600m -k -t airflow.keytab -c /tmp/airflow_krb5_ccache airflow
> [2017-03-14 23:50:12,080] {kerberos.py:55} ERROR -
>  Couldn't reinit from keytab! `kinit' exited with 1.
> kinit: Keytab contains no suitable keys for airf...@corp.some.com 
> while getting initial credentials
> {noformat}
> 1.8-rc5





[jira] [Commented] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments

2017-03-15 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925635#comment-15925635
 ] 

Bolke de Bruin commented on AIRFLOW-987:


Ah ok, so yes this is an issue, but a fix won't be in 1.8.0

> `airflow kerberos` ignores --keytab and --principal arguments
> -
>
> Key: AIRFLOW-987
> URL: https://issues.apache.org/jira/browse/AIRFLOW-987
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security
>Affects Versions: Airflow 1.8
> Environment: 1.8-rc5
>Reporter: Ruslan Dautkhanov
>Assignee: Bolke de Bruin
>  Labels: easyfix, kerberos, security
>
> No matter which arguments I pass to `airflow kerberos`, 
> it always executes as `kinit -r 3600m -k -t airflow.keytab -c 
> /tmp/airflow_krb5_ccache airflow`
> So it fails with the expected "kinit: Keytab contains no suitable keys for 
> airf...@corp.some.com while getting initial credentials"
> Tried different arguments, -kt and --keytab, here's one of the runs (some 
> lines wrapped for readability):
> {noformat}
> $ airflow kerberos -kt /home/rdautkha/.keytab rdautkha...@corp.some.com
> [2017-03-14 23:50:11,523] {__init__.py:57} INFO - Using executor LocalExecutor
> [2017-03-14 23:50:12,069] {kerberos.py:43} INFO - Reinitting kerberos from 
> keytab: 
> kinit -r 3600m -k -t airflow.keytab -c /tmp/airflow_krb5_ccache airflow
> [2017-03-14 23:50:12,080] {kerberos.py:55} ERROR -
>  Couldn't reinit from keytab! `kinit' exited with 1.
> kinit: Keytab contains no suitable keys for airf...@corp.some.com 
> while getting initial credentials
> {noformat}
> 1.8-rc5





[jira] [Reopened] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments

2017-03-15 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin reopened AIRFLOW-987:


> `airflow kerberos` ignores --keytab and --principal arguments
> -
>
> Key: AIRFLOW-987
> URL: https://issues.apache.org/jira/browse/AIRFLOW-987
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security
>Affects Versions: Airflow 1.8
> Environment: 1.8-rc5
>Reporter: Ruslan Dautkhanov
>Assignee: Bolke de Bruin
>  Labels: easyfix, kerberos, security
>
> No matter which arguments I pass to `airflow kerberos`, 
> it always executes as `kinit -r 3600m -k -t airflow.keytab -c 
> /tmp/airflow_krb5_ccache airflow`
> So it fails with the expected "kinit: Keytab contains no suitable keys for 
> airf...@corp.some.com while getting initial credentials"
> Tried different arguments, -kt and --keytab, here's one of the runs (some 
> lines wrapped for readability):
> {noformat}
> $ airflow kerberos -kt /home/rdautkha/.keytab rdautkha...@corp.some.com
> [2017-03-14 23:50:11,523] {__init__.py:57} INFO - Using executor LocalExecutor
> [2017-03-14 23:50:12,069] {kerberos.py:43} INFO - Reinitting kerberos from 
> keytab: 
> kinit -r 3600m -k -t airflow.keytab -c /tmp/airflow_krb5_ccache airflow
> [2017-03-14 23:50:12,080] {kerberos.py:55} ERROR -
>  Couldn't reinit from keytab! `kinit' exited with 1.
> kinit: Keytab contains no suitable keys for airf...@corp.some.com 
> while getting initial credentials
> {noformat}
> 1.8-rc5





[jira] [Comment Edited] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments

2017-03-15 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925630#comment-15925630
 ] 

Ruslan Dautkhanov edited comment on AIRFLOW-987 at 3/15/17 6:47 AM:


kerberos.py:39 - it always gets principal and keytab from configuration 
(airflow.cfg):
https://github.com/apache/incubator-airflow/blob/master/airflow/security/kerberos.py#L39
 
{code}
"-t", configuration.get('kerberos', 'keytab'),   # specify keytab
"-c", configuration.get('kerberos', 'ccache'),   # specify 
credentials cache
{code}

Notice help for `airflow kerberos`:
{noformat}
$ airflow kerberos -h
[2017-03-15 00:40:12,215] {__init__.py:57} INFO - Using executor LocalExecutor
usage: airflow kerberos [-h] [-kt [KEYTAB]] [--pid [PID]] [-D]
[--stdout STDOUT] [--stderr STDERR] [-l LOG_FILE]
[principal]
{noformat}

One might think that you can provide the principal and keytab as `airflow kerberos` 
arguments - that's not true, and it's a bug.

Although it's not a critical bug, as I was able to make `airflow kerberos` 
work just by adding a kerberos section in airflow.cfg.

`airflow kerberos -h` has to be corrected to reflect that `airflow kerberos` 
doesn't actually accept principal and keytab as arguments.

Thank you.
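A minimal sketch (not the actual Airflow fix) of the direction the comment suggests: let the CLI arguments advertised by `airflow kerberos -h` take precedence, falling back to the `[kerberos]` config section only when they are absent. The `CONFIG` dict is a hypothetical stand-in for airflow.cfg.

```python
# Hedged sketch, not the actual Airflow implementation: CLI args override
# the [kerberos] config section; config values are only fallbacks.
import argparse

CONFIG = {
    ('kerberos', 'principal'): 'airflow',
    ('kerberos', 'keytab'): 'airflow.keytab',
}  # stand-in for airflow.cfg


def config_get(section, key):
    return CONFIG[(section, key)]


def build_kinit_cmd(args):
    # Prefer command-line values; fall back to airflow.cfg when absent.
    principal = args.principal or config_get('kerberos', 'principal')
    keytab = args.keytab or config_get('kerberos', 'keytab')
    return ['kinit', '-r', '3600m', '-k', '-t', keytab, principal]


parser = argparse.ArgumentParser(prog='airflow kerberos')
parser.add_argument('-kt', '--keytab', nargs='?')
parser.add_argument('principal', nargs='?')

cmd = build_kinit_cmd(parser.parse_args(['-kt', '/home/user/.keytab',
                                         'user@REALM']))
print(' '.join(cmd))  # prints: kinit -r 3600m -k -t /home/user/.keytab user@REALM
```

With no arguments, the sketch reproduces the current behavior (`kinit ... -t airflow.keytab ... airflow` from config), which is what the help text currently fails to convey.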


was (Author: tagar):
kerberos.py:39 - it always gets principal and keytab from configuration 
(airflow.cfg):
https://github.com/apache/incubator-airflow/blob/master/airflow/security/kerberos.py#L39
 
{code}
"-t", configuration.get('kerberos', 'keytab'),   # specify keytab
"-c", configuration.get('kerberos', 'ccache'),   # specify 
credentials cache
{code}

Notice help for `airflow kerberos`:
{noformat}
$ airflow kerberos -h
[2017-03-15 00:40:12,215] {__init__.py:57} INFO - Using executor LocalExecutor
usage: airflow kerberos [-h] [-kt [KEYTAB]] [--pid [PID]] [-D]
[--stdout STDOUT] [--stderr STDERR] [-l LOG_FILE]
[principal]
{noformat}

One might think that you can provide the principal and keytab as `airflow kerberos` - 
that's not true, and it's a bug.

Although it's not a critical bug, as I was able to make `airflow kerberos` 
work just by adding a kerberos section in airflow.cfg.

`airflow kerberos -h` has to be corrected to reflect that `airflow kerberos` 
doesn't actually accept principal and keytab as arguments.

Thank you.

> `airflow kerberos` ignores --keytab and --principal arguments
> -
>
> Key: AIRFLOW-987
> URL: https://issues.apache.org/jira/browse/AIRFLOW-987
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security
>Affects Versions: Airflow 1.8
> Environment: 1.8-rc5
>Reporter: Ruslan Dautkhanov
>Assignee: Bolke de Bruin
>  Labels: easyfix, kerberos, security
>
> No matter which arguments I pass to `airflow kerberos`, 
> it always executes as `kinit -r 3600m -k -t airflow.keytab -c 
> /tmp/airflow_krb5_ccache airflow`
> So it fails with the expected "kinit: Keytab contains no suitable keys for 
> airf...@corp.some.com while getting initial credentials"
> Tried different arguments, -kt and --keytab, here's one of the runs (some 
> lines wrapped for readability):
> {noformat}
> $ airflow kerberos -kt /home/rdautkha/.keytab rdautkha...@corp.some.com
> [2017-03-14 23:50:11,523] {__init__.py:57} INFO - Using executor LocalExecutor
> [2017-03-14 23:50:12,069] {kerberos.py:43} INFO - Reinitting kerberos from 
> keytab: 
> kinit -r 3600m -k -t airflow.keytab -c /tmp/airflow_krb5_ccache airflow
> [2017-03-14 23:50:12,080] {kerberos.py:55} ERROR -
>  Couldn't reinit from keytab! `kinit' exited with 1.
> kinit: Keytab contains no suitable keys for airf...@corp.some.com 
> while getting initial credentials
> {noformat}
> 1.8-rc5





[jira] [Commented] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments

2017-03-15 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925634#comment-15925634
 ] 

Bolke de Bruin commented on AIRFLOW-987:


btw you need to set the arguments in the config file; it doesn't accept them 
from the command line this way.

> `airflow kerberos` ignores --keytab and --principal arguments
> -
>
> Key: AIRFLOW-987
> URL: https://issues.apache.org/jira/browse/AIRFLOW-987
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security
>Affects Versions: Airflow 1.8
> Environment: 1.8-rc5
>Reporter: Ruslan Dautkhanov
>Assignee: Bolke de Bruin
>  Labels: easyfix, kerberos, security
>
> No matter which arguments I pass to `airflow kerberos`, 
> it always executes as `kinit -r 3600m -k -t airflow.keytab -c 
> /tmp/airflow_krb5_ccache airflow`
> So it fails with the expected "kinit: Keytab contains no suitable keys for 
> airf...@corp.some.com while getting initial credentials"
> Tried different arguments, -kt and --keytab, here's one of the runs (some 
> lines wrapped for readability):
> {noformat}
> $ airflow kerberos -kt /home/rdautkha/.keytab rdautkha...@corp.some.com
> [2017-03-14 23:50:11,523] {__init__.py:57} INFO - Using executor LocalExecutor
> [2017-03-14 23:50:12,069] {kerberos.py:43} INFO - Reinitting kerberos from 
> keytab: 
> kinit -r 3600m -k -t airflow.keytab -c /tmp/airflow_krb5_ccache airflow
> [2017-03-14 23:50:12,080] {kerberos.py:55} ERROR -
>  Couldn't reinit from keytab! `kinit' exited with 1.
> kinit: Keytab contains no suitable keys for airf...@corp.some.com 
> while getting initial credentials
> {noformat}
> 1.8-rc5





[jira] [Commented] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments

2017-03-15 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925630#comment-15925630
 ] 

Ruslan Dautkhanov commented on AIRFLOW-987:
---

kerberos.py:39 - it always gets principal and keytab from configuration 
(airflow.cfg):
https://github.com/apache/incubator-airflow/blob/master/airflow/security/kerberos.py#L39
 
{code}
"-t", configuration.get('kerberos', 'keytab'),   # specify keytab
"-c", configuration.get('kerberos', 'ccache'),   # specify 
credentials cache
{code}

Notice help for `airflow kerberos`:
{noformat}
$ airflow kerberos -h
[2017-03-15 00:40:12,215] {__init__.py:57} INFO - Using executor LocalExecutor
usage: airflow kerberos [-h] [-kt [KEYTAB]] [--pid [PID]] [-D]
[--stdout STDOUT] [--stderr STDERR] [-l LOG_FILE]
[principal]
{noformat}

One might think that you can provide the principal and keytab as `airflow kerberos` - 
that's not true, and it's a bug.

Although it's not a critical bug, as I was able to make `airflow kerberos` 
work just by adding a kerberos section in airflow.cfg.

`airflow kerberos -h` has to be corrected to reflect that `airflow kerberos` 
doesn't actually accept principal and keytab as arguments.

Thank you.

> `airflow kerberos` ignores --keytab and --principal arguments
> -
>
> Key: AIRFLOW-987
> URL: https://issues.apache.org/jira/browse/AIRFLOW-987
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security
>Affects Versions: Airflow 1.8
> Environment: 1.8-rc5
>Reporter: Ruslan Dautkhanov
>Assignee: Bolke de Bruin
>  Labels: easyfix, kerberos, security
>
> No matter which arguments I pass to `airflow kerberos`, 
> it always executes as `kinit -r 3600m -k -t airflow.keytab -c 
> /tmp/airflow_krb5_ccache airflow`
> So it fails with the expected "kinit: Keytab contains no suitable keys for 
> airf...@corp.some.com while getting initial credentials"
> Tried different arguments, -kt and --keytab, here's one of the runs (some 
> lines wrapped for readability):
> {noformat}
> $ airflow kerberos -kt /home/rdautkha/.keytab rdautkha...@corp.some.com
> [2017-03-14 23:50:11,523] {__init__.py:57} INFO - Using executor LocalExecutor
> [2017-03-14 23:50:12,069] {kerberos.py:43} INFO - Reinitting kerberos from 
> keytab: 
> kinit -r 3600m -k -t airflow.keytab -c /tmp/airflow_krb5_ccache airflow
> [2017-03-14 23:50:12,080] {kerberos.py:55} ERROR -
>  Couldn't reinit from keytab! `kinit' exited with 1.
> kinit: Keytab contains no suitable keys for airf...@corp.some.com 
> while getting initial credentials
> {noformat}
> 1.8-rc5





[jira] [Commented] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments

2017-03-15 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925625#comment-15925625
 ] 

Bolke de Bruin commented on AIRFLOW-987:


I cannot reproduce it, and we are using kerberos in production. The error you 
are showing comes from kerberos, not from airflow, so you need to explain in 
detail why you think this is an airflow bug and not a configuration issue.

> `airflow kerberos` ignores --keytab and --principal arguments
> -
>
> Key: AIRFLOW-987
> URL: https://issues.apache.org/jira/browse/AIRFLOW-987
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security
>Affects Versions: Airflow 1.8
> Environment: 1.8-rc5
>Reporter: Ruslan Dautkhanov
>Assignee: Bolke de Bruin
>  Labels: easyfix, kerberos, security
>
> No matter which arguments I pass to `airflow kerberos`, 
> it always executes as `kinit -r 3600m -k -t airflow.keytab -c 
> /tmp/airflow_krb5_ccache airflow`
> So it fails with the expected "kinit: Keytab contains no suitable keys for 
> airf...@corp.some.com while getting initial credentials"
> Tried different arguments, -kt and --keytab, here's one of the runs (some 
> lines wrapped for readability):
> {noformat}
> $ airflow kerberos -kt /home/rdautkha/.keytab rdautkha...@corp.some.com
> [2017-03-14 23:50:11,523] {__init__.py:57} INFO - Using executor LocalExecutor
> [2017-03-14 23:50:12,069] {kerberos.py:43} INFO - Reinitting kerberos from 
> keytab: 
> kinit -r 3600m -k -t airflow.keytab -c /tmp/airflow_krb5_ccache airflow
> [2017-03-14 23:50:12,080] {kerberos.py:55} ERROR -
>  Couldn't reinit from keytab! `kinit' exited with 1.
> kinit: Keytab contains no suitable keys for airf...@corp.some.com 
> while getting initial credentials
> {noformat}
> 1.8-rc5





[jira] [Comment Edited] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments

2017-03-15 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925608#comment-15925608
 ] 

Ruslan Dautkhanov edited comment on AIRFLOW-987 at 3/15/17 6:24 AM:


I use kinit very often and am familiar with the tool. 

kinit works fine outside of Airflow

{noformat}
$ kinit -kt /home/rdautkha/.keytab rdautkha...@corp.some.com; echo $?
0

rdautkha@pc1udatahgw01 airflow  $ klist | grep "03/15/17"
03/15/17 00:19:38  03/15/17 10:19:40  krbtgt/corp.some@corp.some.com
{noformat}
(I've changed realm)

If you didn't notice, `airflow kerberos` used "airflow" as the principal and 
"airflow.keytab" in the output dump above, no matter which parameters I gave.


was (Author: tagar):
I use kinit very often and am familiar with the tool. 

kinit works fine outside of Airflow

{noformat}
$ kinit -kt /home/rdautkha/.keytab rdautkha...@corp.some.com; echo $?
0

rdautkha@pc1udatahgw01 airflow  $ klist | grep "03/15/17"
03/15/17 00:19:38  03/15/17 10:19:40  krbtgt/corp.epsilon@corp.some.com
{noformat}
(I've changed realm)

If you didn't notice, `airflow kerberos` used "airflow" as the principal and 
"airflow.keytab" in the output dump above, no matter which parameters I gave.

> `airflow kerberos` ignores --keytab and --principal arguments
> -
>
> Key: AIRFLOW-987
> URL: https://issues.apache.org/jira/browse/AIRFLOW-987
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security
>Affects Versions: Airflow 1.8
> Environment: 1.8-rc5
>Reporter: Ruslan Dautkhanov
>Assignee: Bolke de Bruin
>  Labels: easyfix, kerberos, security
>
> No matter which arguments I pass to `airflow kerberos`, 
> it always executes as `kinit -r 3600m -k -t airflow.keytab -c 
> /tmp/airflow_krb5_ccache airflow`
> So it fails with the expected "kinit: Keytab contains no suitable keys for 
> airf...@corp.some.com while getting initial credentials"
> Tried different arguments, -kt and --keytab, here's one of the runs (some 
> lines wrapped for readability):
> {noformat}
> $ airflow kerberos -kt /home/rdautkha/.keytab rdautkha...@corp.some.com
> [2017-03-14 23:50:11,523] {__init__.py:57} INFO - Using executor LocalExecutor
> [2017-03-14 23:50:12,069] {kerberos.py:43} INFO - Reinitting kerberos from 
> keytab: 
> kinit -r 3600m -k -t airflow.keytab -c /tmp/airflow_krb5_ccache airflow
> [2017-03-14 23:50:12,080] {kerberos.py:55} ERROR -
>  Couldn't reinit from keytab! `kinit' exited with 1.
> kinit: Keytab contains no suitable keys for airf...@corp.some.com 
> while getting initial credentials
> {noformat}
> 1.8-rc5





[jira] [Comment Edited] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments

2017-03-15 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925608#comment-15925608
 ] 

Ruslan Dautkhanov edited comment on AIRFLOW-987 at 3/15/17 6:24 AM:


I use kinit very often and am familiar with the tool. 

kinit works fine outside of Airflow

{noformat}
$ kinit -kt /home/rdautkha/.keytab rdautkha...@corp.some.com; echo $?
0

rdautkha@pc1udatahgw01 airflow  $ klist | grep "03/15/17"
03/15/17 00:19:38  03/15/17 10:19:40  krbtgt/corp.epsilon@corp.some.com
{noformat}
(I've changed realm)

If you didn't notice, `airflow kerberos` used "airflow" as the principal and 
"airflow.keytab" in the output dump above, no matter which parameters I gave.


was (Author: tagar):
I use kinit very often and am familiar with the tool. 

kinit works fine outside of Airflow

{noformat}
$ kinit -kt /home/rdautkha/.keytab rdautkha...@corp.some.com; echo $?
0

rdautkha@pc1udatahgw01 airflow  $ klist | grep "03/15/17"
03/15/17 00:19:38  03/15/17 10:19:40  krbtgt/corp.epsilon@corp.some.com
{noformat}
(I've changed realm)

If you didn't notice, `airflow kerberos` used "airflow" as the principal and 
"airflow.keytab" in the output dump above, no matter which parameters I gave.

> `airflow kerberos` ignores --keytab and --principal arguments
> -
>
> Key: AIRFLOW-987
> URL: https://issues.apache.org/jira/browse/AIRFLOW-987
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security
>Affects Versions: Airflow 1.8
> Environment: 1.8-rc5
>Reporter: Ruslan Dautkhanov
>Assignee: Bolke de Bruin
>  Labels: easyfix, kerberos, security
>
> No matter which arguments I pass to `airflow kerberos`, 
> it always executes as `kinit -r 3600m -k -t airflow.keytab -c 
> /tmp/airflow_krb5_ccache airflow`
> So it fails with the expected "kinit: Keytab contains no suitable keys for 
> airf...@corp.some.com while getting initial credentials"
> Tried different arguments, -kt and --keytab, here's one of the runs (some 
> lines wrapped for readability):
> {noformat}
> $ airflow kerberos -kt /home/rdautkha/.keytab rdautkha...@corp.some.com
> [2017-03-14 23:50:11,523] {__init__.py:57} INFO - Using executor LocalExecutor
> [2017-03-14 23:50:12,069] {kerberos.py:43} INFO - Reinitting kerberos from 
> keytab: 
> kinit -r 3600m -k -t airflow.keytab -c /tmp/airflow_krb5_ccache airflow
> [2017-03-14 23:50:12,080] {kerberos.py:55} ERROR -
>  Couldn't reinit from keytab! `kinit' exited with 1.
> kinit: Keytab contains no suitable keys for airf...@corp.some.com 
> while getting initial credentials
> {noformat}
> 1.8-rc5





[jira] [Commented] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments

2017-03-15 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925608#comment-15925608
 ] 

Ruslan Dautkhanov commented on AIRFLOW-987:
---

I use kinit very often and am familiar with the tool. 

kinit works fine outside of Airflow

{noformat}
$ kinit -kt /home/rdautkha/.keytab rdautkha...@corp.some.com; echo $?
0

rdautkha@pc1udatahgw01 airflow  $ klist | grep "03/15/17"
03/15/17 00:19:38  03/15/17 10:19:40  krbtgt/corp.epsilon@corp.some.com
{noformat}
(I've changed realm)

If you didn't notice, `airflow kerberos` used "airflow" as the principal and 
"airflow.keytab" in the output dump above, no matter which parameters I gave.

> `airflow kerberos` ignores --keytab and --principal arguments
> -
>
> Key: AIRFLOW-987
> URL: https://issues.apache.org/jira/browse/AIRFLOW-987
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security
>Affects Versions: Airflow 1.8
> Environment: 1.8-rc5
>Reporter: Ruslan Dautkhanov
>Assignee: Bolke de Bruin
>  Labels: easyfix, kerberos, security
>
> No matter which arguments I pass to `airflow kerberos`, 
> it always executes as `kinit -r 3600m -k -t airflow.keytab -c 
> /tmp/airflow_krb5_ccache airflow`
> So it fails with the expected "kinit: Keytab contains no suitable keys for 
> airf...@corp.some.com while getting initial credentials"
> Tried different arguments, -kt and --keytab, here's one of the runs (some 
> lines wrapped for readability):
> {noformat}
> $ airflow kerberos -kt /home/rdautkha/.keytab rdautkha...@corp.some.com
> [2017-03-14 23:50:11,523] {__init__.py:57} INFO - Using executor LocalExecutor
> [2017-03-14 23:50:12,069] {kerberos.py:43} INFO - Reinitting kerberos from 
> keytab: 
> kinit -r 3600m -k -t airflow.keytab -c /tmp/airflow_krb5_ccache airflow
> [2017-03-14 23:50:12,080] {kerberos.py:55} ERROR -
>  Couldn't reinit from keytab! `kinit' exited with 1.
> kinit: Keytab contains no suitable keys for airf...@corp.some.com 
> while getting initial credentials
> {noformat}
> 1.8-rc5





[jira] [Commented] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments

2017-03-15 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925604#comment-15925604
 ] 

Bolke de Bruin commented on AIRFLOW-987:


This is a Kerberos error. You are specifying invalid credentials. 

> `airflow kerberos` ignores --keytab and --principal arguments
> -
>
> Key: AIRFLOW-987
> URL: https://issues.apache.org/jira/browse/AIRFLOW-987
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security
>Affects Versions: Airflow 1.8
> Environment: 1.8-rc5
>Reporter: Ruslan Dautkhanov
>  Labels: easyfix, kerberos, security
>
> No matter which arguments I pass to `airflow kerberos`, 
> it always executes as `kinit -r 3600m -k -t airflow.keytab -c 
> /tmp/airflow_krb5_ccache airflow`
> So it fails with the expected "kinit: Keytab contains no suitable keys for 
> airf...@corp.some.com while getting initial credentials"
> Tried different arguments, -kt and --keytab, here's one of the runs (some 
> lines wrapped for readability):
> {noformat}
> $ airflow kerberos -kt /home/rdautkha/.keytab rdautkha...@corp.some.com
> [2017-03-14 23:50:11,523] {__init__.py:57} INFO - Using executor LocalExecutor
> [2017-03-14 23:50:12,069] {kerberos.py:43} INFO - Reinitting kerberos from 
> keytab: 
> kinit -r 3600m -k -t airflow.keytab -c /tmp/airflow_krb5_ccache airflow
> [2017-03-14 23:50:12,080] {kerberos.py:55} ERROR -
>  Couldn't reinit from keytab! `kinit' exited with 1.
> kinit: Keytab contains no suitable keys for airf...@corp.some.com 
> while getting initial credentials
> {noformat}
> 1.8-rc5





[jira] [Resolved] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments

2017-03-15 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-987.

Resolution: Not A Bug
  Assignee: Bolke de Bruin

> `airflow kerberos` ignores --keytab and --principal arguments
> -
>
> Key: AIRFLOW-987
> URL: https://issues.apache.org/jira/browse/AIRFLOW-987
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security
>Affects Versions: Airflow 1.8
> Environment: 1.8-rc5
>Reporter: Ruslan Dautkhanov
>Assignee: Bolke de Bruin
>  Labels: easyfix, kerberos, security
>
> No matter which arguments I pass to `airflow kerberos`, 
> it always executes as `kinit -r 3600m -k -t airflow.keytab -c 
> /tmp/airflow_krb5_ccache airflow`
> So it fails with the expected "kinit: Keytab contains no suitable keys for 
> airf...@corp.some.com while getting initial credentials"
> Tried different arguments, -kt and --keytab, here's one of the runs (some 
> lines wrapped for readability):
> {noformat}
> $ airflow kerberos -kt /home/rdautkha/.keytab rdautkha...@corp.some.com
> [2017-03-14 23:50:11,523] {__init__.py:57} INFO - Using executor LocalExecutor
> [2017-03-14 23:50:12,069] {kerberos.py:43} INFO - Reinitting kerberos from 
> keytab: 
> kinit -r 3600m -k -t airflow.keytab -c /tmp/airflow_krb5_ccache airflow
> [2017-03-14 23:50:12,080] {kerberos.py:55} ERROR -
>  Couldn't reinit from keytab! `kinit' exited with 1.
> kinit: Keytab contains no suitable keys for airf...@corp.some.com 
> while getting initial credentials
> {noformat}
> 1.8-rc5


