[jira] [Assigned] (AIRFLOW-840) Python3 encoding issue in Kerberos
[ https://issues.apache.org/jira/browse/AIRFLOW-840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Bij reassigned AIRFLOW-840:
-------------------------------------

    Assignee: Alexander Bij  (was: Bolke de Bruin)

> Python3 encoding issue in Kerberos
> ----------------------------------
>
>                 Key: AIRFLOW-840
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-840
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: security
>    Affects Versions: Airflow 1.8
>         Environment: $ python --version
>                      Python 3.4.3
>            Reporter: Erik Cederstrand
>            Assignee: Alexander Bij
>              Labels: security
>
> While attempting to configure Kerberos ticket renewal in a Python 3 environment, I encountered this encoding issue trying to run {{airflow kerberos}}:
> {code:none}
> Traceback (most recent call last):
>   File "/usr/local/bin/airflow", line 15, in <module>
>     args.func(args)
>   File "/usr/local/lib/python3.4/dist-packages/airflow/bin/cli.py", line 600, in kerberos
>     airflow.security.kerberos.run()
>   File "/usr/local/lib/python3.4/dist-packages/airflow/security/kerberos.py", line 110, in run
>     renew_from_kt()
>   File "/usr/local/lib/python3.4/dist-packages/airflow/security/kerberos.py", line 55, in renew_from_kt
>     "\n".join(subp.stderr.readlines(
> TypeError: sequence item 0: expected str instance, bytes found
> {code}
> The issue here (ignoring for a moment why {{kinit}} is failing on my machine) is that Popen in Python 3 returns {{bytes}} for stdout/stderr, but both are handled as if they were {{str}}.
> I'm unsure what the Py2/3 compat policy is in Airflow, but a simple {{from six import PY2}} and an if/else seems like the least intrusive fix. The non-PY2 path would then add something like {{subp.stdin.readlines().decode(errors='ignore')}}.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
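The reporter's one-liner glosses over one detail: {{readlines()}} returns a list, so {{decode}} has to be applied per line (or to the joined bytes) rather than to the list itself. A minimal standalone sketch of the idea follows; the helper name and structure are illustrative, not the code of Airflow's kerberos module.

```python
import subprocess
import sys

def read_stderr_text(cmd):
    """Run cmd and return its stderr as a single text string on both
    Python 2 and Python 3 (where pipe reads yield bytes)."""
    subp = subprocess.Popen(cmd, stderr=subprocess.PIPE)
    # fine for kinit-sized output; large streams would need communicate()
    subp.wait()
    lines = subp.stderr.readlines()
    if sys.version_info[0] >= 3:
        # decode each bytes line; errors='ignore' mirrors the suggestion above
        lines = [line.decode(errors="ignore") for line in lines]
    return "\n".join(line.strip() for line in lines)
```

On Python 3 the same effect can also be had by opening the pipe in text mode (`universal_newlines=True`), which avoids the explicit decode entirely.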
[jira] [Updated] (AIRFLOW-993) Dags should modify the start date and end date of tasks when they are added
[ https://issues.apache.org/jira/browse/AIRFLOW-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremiah Lowin updated AIRFLOW-993:
-----------------------------------
    Description: 
When tasks are added to DAGs, the DAG checks if the task has a start_date. If it doesn't, the DAG sets it to its own start date. This isn't done for end_date, but it should be. Otherwise, this simple code leads to a surprising failure as the backfill tries to run the task every day, even though the DAG clearly has an end date set.

{code}
from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
import datetime

dt = datetime.datetime(2017, 1, 1)

with DAG('test', start_date=dt, end_date=dt) as dag:
    op = DummyOperator(task_id='dummy')

op.run()
{code}

Furthermore, it may make sense for the task start date to always be the later of the task start date and the dag start date; similarly for the end date (but using the earlier date)

  was:
When tasks are added to DAGs, the DAG checks if the task has a start_date. If it doesn't, the DAG sets it to its own start date. This isn't done for end_date, but it should be. Otherwise, this simple code leads to a surprising failure as the backfill tries to run the task every day, even though the DAG clearly has an end date set.

{code}
from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
import datetime

dt = datetime.datetime(2017, 1, 1)

Furthermore, it may make sense for the task start date to always be the later of the task start date and the dag start date; similarly for the end date (but using the earlier date)

with DAG('test', start_date=dt, end_date=dt) as dag:
    op = DummyOperator(task_id='dummy')

op.run()
{code}

> Dags should modify the start date and end date of tasks when they are added
> ---------------------------------------------------------------------------
>
>                 Key: AIRFLOW-993
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-993
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DAG
>    Affects Versions: 1.8.0
>            Reporter: Jeremiah Lowin
>            Assignee: Jeremiah Lowin
>            Priority: Minor
>             Fix For: 1.8.1
>
> When tasks are added to DAGs, the DAG checks if the task has a start_date. If it doesn't, the DAG sets it to its own start date. This isn't done for end_date, but it should be.
> Otherwise, this simple code leads to a surprising failure as the backfill tries to run the task every day, even though the DAG clearly has an end date set.
> {code}
> from airflow.models import DAG
> from airflow.operators.dummy_operator import DummyOperator
> import datetime
> dt = datetime.datetime(2017, 1, 1)
> with DAG('test', start_date=dt, end_date=dt) as dag:
>     op = DummyOperator(task_id='dummy')
> op.run()
> {code}
> Furthermore, it may make sense for the task start date to always be the later of the task start date and the dag start date; similarly for the end date (but using the earlier date)
[jira] [Updated] (AIRFLOW-993) Dags should modify the start date and end date of tasks when they are added
[ https://issues.apache.org/jira/browse/AIRFLOW-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremiah Lowin updated AIRFLOW-993:
-----------------------------------
    Description: 
When tasks are added to DAGs, the DAG checks if the task has a start_date. If it doesn't, the DAG sets it to its own start date. This isn't done for end_date, but it should be. Otherwise, this simple code leads to a surprising failure as the backfill tries to run the task every day, even though the DAG clearly has an end date set.

{code}
from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
import datetime

dt = datetime.datetime(2017, 1, 1)

Furthermore, it may make sense for the task start date to always be the later of the task start date and the dag start date; similarly for the end date (but using the earlier date)

with DAG('test', start_date=dt, end_date=dt) as dag:
    op = DummyOperator(task_id='dummy')

op.run()
{code}

  was:
When tasks are added to DAGs, the DAG checks if the task has a start_date. If it doesn't, the DAG sets it to its own start date. This isn't done for end_date, but it should be. Otherwise, this simple code leads to a surprising failure as the backfill tries to run the task every day, even though the DAG clearly has an end date set.

{code}
from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
import datetime

dt = datetime.datetime(2017, 1, 1)

with DAG('test', start_date=dt, end_date=dt) as dag:
    op = DummyOperator(task_id='dummy')

op.run()
{code}

> Dags should modify the start date and end date of tasks when they are added
> ---------------------------------------------------------------------------
>
>                 Key: AIRFLOW-993
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-993
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DAG
>    Affects Versions: 1.8.0
>            Reporter: Jeremiah Lowin
>            Assignee: Jeremiah Lowin
>            Priority: Minor
>             Fix For: 1.8.1
>
> When tasks are added to DAGs, the DAG checks if the task has a start_date. If it doesn't, the DAG sets it to its own start date. This isn't done for end_date, but it should be.
> Otherwise, this simple code leads to a surprising failure as the backfill tries to run the task every day, even though the DAG clearly has an end date set.
> {code}
> from airflow.models import DAG
> from airflow.operators.dummy_operator import DummyOperator
> import datetime
> dt = datetime.datetime(2017, 1, 1)
> Furthermore, it may make sense for the task start date to always be the later of the task start date and the dag start date; similarly for the end date (but using the earlier date)
> with DAG('test', start_date=dt, end_date=dt) as dag:
>     op = DummyOperator(task_id='dummy')
> op.run()
> {code}
[jira] [Updated] (AIRFLOW-993) Dags should modify the start date and end date of tasks when they are added
[ https://issues.apache.org/jira/browse/AIRFLOW-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremiah Lowin updated AIRFLOW-993:
-----------------------------------
    Description: 
When tasks are added to DAGs, the DAG checks if the task has a start_date. If it doesn't, the DAG sets it to its own start date. This isn't done for end_date, but it should be. Otherwise, this simple code leads to a surprising failure as the backfill tries to run the task every day, even though the DAG clearly has an end date set.

{code}
from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
import datetime

dt = datetime.datetime(2017, 1, 1)

with DAG('test', start_date=dt, end_date=dt) as dag:
    op = DummyOperator(task_id='dummy')

op.run()
{code}

  was:
When tasks are added to DAGs, the DAG checks if the task has a start_date. If it doesn't, the DAG sets it to its own start date. This isn't done for end_date, but it should be. Otherwise, this simple code leads to a surprising failure as the backfill tries to run the task every day, even though the DAG clearly has an end date set.

{code}
from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
import datetime

dt = datetime.datetime(2017, 1, 1)

with DAG('test', start_date=dt, end_date=dt) as dag:
    op = DummyOperator(task_id='dummy')

op.run()
{code}

Furthermore, it would make sense for the DAG to set the task start_date as the later of the task's start date and its own start date; or the earlier for end_date.

> Dags should modify the start date and end date of tasks when they are added
> ---------------------------------------------------------------------------
>
>                 Key: AIRFLOW-993
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-993
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DAG
>    Affects Versions: 1.8.0
>            Reporter: Jeremiah Lowin
>            Assignee: Jeremiah Lowin
>            Priority: Minor
>             Fix For: 1.8.1
>
> When tasks are added to DAGs, the DAG checks if the task has a start_date. If it doesn't, the DAG sets it to its own start date. This isn't done for end_date, but it should be.
> Otherwise, this simple code leads to a surprising failure as the backfill tries to run the task every day, even though the DAG clearly has an end date set.
> {code}
> from airflow.models import DAG
> from airflow.operators.dummy_operator import DummyOperator
> import datetime
> dt = datetime.datetime(2017, 1, 1)
> with DAG('test', start_date=dt, end_date=dt) as dag:
>     op = DummyOperator(task_id='dummy')
> op.run()
> {code}
[jira] [Created] (AIRFLOW-993) Dags should modify the start date and end date of tasks when they are added
Jeremiah Lowin created AIRFLOW-993:
--------------------------------------

             Summary: Dags should modify the start date and end date of tasks when they are added
                 Key: AIRFLOW-993
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-993
             Project: Apache Airflow
          Issue Type: Bug
          Components: DAG
    Affects Versions: 1.8.0
            Reporter: Jeremiah Lowin
            Assignee: Jeremiah Lowin
            Priority: Minor
             Fix For: 1.8.1


When tasks are added to DAGs, the DAG checks if the task has a start_date. If it doesn't, the DAG sets it to its own start date. This isn't done for end_date, but it should be. Otherwise, this simple code leads to a surprising failure as the backfill tries to run the task every day, even though the DAG clearly has an end date set.

{code}
from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
import datetime

dt = datetime.datetime(2017, 1, 1)

with DAG('test', start_date=dt, end_date=dt) as dag:
    op = DummyOperator(task_id='dummy')

op.run()
{code}

Furthermore, it would make sense for the DAG to set the task start_date as the later of the task's start date and its own start date; or the earlier for end_date.
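The rule proposed in the last paragraph can be sketched in plain Python. This is an illustration of the intended behavior only — the function name is hypothetical, and Airflow's actual task-registration code is more involved:

```python
import datetime

def resolve_task_dates(task_start, task_end, dag_start, dag_end):
    """Inherit missing dates from the DAG, and otherwise clamp the
    task window to the DAG window: the later of the two start dates,
    the earlier of the two end dates."""
    starts = [d for d in (task_start, dag_start) if d is not None]
    ends = [d for d in (task_end, dag_end) if d is not None]
    return (max(starts) if starts else None,
            min(ends) if ends else None)
```

With this rule, the DummyOperator in the example above would inherit `end_date=dt` from the DAG, and the backfill would stop after the single scheduled day.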
[jira] [Commented] (AIRFLOW-992) Skipped tasks do not propagate correctly
[ https://issues.apache.org/jira/browse/AIRFLOW-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927332#comment-15927332 ]

Dan Davydov commented on AIRFLOW-992:
-------------------------------------

This is the expected behavior after the semantics changes made here: https://github.com/apache/incubator-airflow/pull/2125 and later documented/clarified here: https://github.com/apache/incubator-airflow/pull/2151/files

> Skipped tasks do not propagate correctly
> ----------------------------------------
>
>                 Key: AIRFLOW-992
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-992
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: Dan Davydov
>            Priority: Critical
>
> We are seeing skipped tasks not being propagated correctly, e.g.:
> A->B
>  `-->C
> Task A depends on task B and C. If B gets skipped and C gets run then:
> Expected: A will get skipped
> EDIT: Upon further investigation this was caused by a change in the semantics of ALL_SUCCESS, which I have these feelings about:
> Intuitively you would expect to skip any task that has dependencies that weren't run by default, i.e. the trigger rule is called ALL_SUCCESS and skipped tasks are not successful ones, and that was also the old behavior in 1.7.3.
> This is going to break some use cases, which could be alright, but I feel these new semantics make less sense than before, so it's a bad reason to break existing use cases.
> I will get started on a PR for a new ALL_SUCCESS_NOT_SKIPPED trigger rule, but again I feel this is hacky and really we should have the old ALL_SUCCESS (default) and a new ALL_SUCCESS_OR_SKIPPED trigger rule if desired.
> Actual: A gets run
> [~bolke]
[jira] [Commented] (AIRFLOW-992) Skipped tasks do not propagate correctly
[ https://issues.apache.org/jira/browse/AIRFLOW-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927323#comment-15927323 ]

Bolke de Bruin commented on AIRFLOW-992:
----------------------------------------

Do you have anything to reproduce it with? Preferably a test or a dag.

> Skipped tasks do not propagate correctly
> ----------------------------------------
>
>                 Key: AIRFLOW-992
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-992
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: Dan Davydov
>            Priority: Critical
>
> We are seeing skipped tasks not being propagated correctly, e.g.:
> A->B
>  `-->C
> Task A depends on task B and C. If B gets skipped and C gets run then:
> Expected: A will get skipped
> Actual: A gets run
> [~bolke]
[jira] [Created] (AIRFLOW-992) Skipped tasks do not propagate correctly
Dan Davydov created AIRFLOW-992:
-----------------------------------

             Summary: Skipped tasks do not propagate correctly
                 Key: AIRFLOW-992
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-992
             Project: Apache Airflow
          Issue Type: Bug
            Reporter: Dan Davydov
            Priority: Critical


We are seeing skipped tasks not being propagated correctly, e.g.:

A->B
 `-->C

Task A depends on task B and C. If B gets skipped and C gets run then:

Expected: A will get skipped
Actual: A gets run

[~bolke]
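The disagreement in this thread is about what ALL_SUCCESS should mean when an upstream task was skipped. A toy model of the two readings — this is an illustration, not Airflow's actual dependency-evaluation code:

```python
SUCCESS, SKIPPED, FAILED = "success", "skipped", "failed"

def all_success_pre_18(upstream_states):
    """1.7.3-era reading described in the report: a skipped upstream
    is not a success, so the skip propagates downstream."""
    if any(s == SKIPPED for s in upstream_states):
        return "skip"
    return "run" if all(s == SUCCESS for s in upstream_states) else "wait"

def all_success_post_change(upstream_states):
    """Reading after the semantics change referenced in the comments:
    skipped upstreams no longer block ALL_SUCCESS, so the task runs."""
    if all(s in (SUCCESS, SKIPPED) for s in upstream_states):
        return "run"
    return "wait"
```

With B skipped and C successful (the example from the report), the old reading skips A while the new one runs it — exactly the Expected/Actual gap above.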
[jira] [Created] (AIRFLOW-991) Mark_success while a task is running leads to failure state
Alex Guziel created AIRFLOW-991:
-----------------------------------

             Summary: Mark_success while a task is running leads to failure state
                 Key: AIRFLOW-991
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-991
             Project: Apache Airflow
          Issue Type: Bug
            Reporter: Alex Guziel
            Assignee: Alex Guziel
incubator-airflow git commit: [AIRFLOW-989] Do not mark dag run successful if unfinished tasks
Repository: incubator-airflow
Updated Branches:
  refs/heads/master cadfae54b -> 3d6095ff5


[AIRFLOW-989] Do not mark dag run successful if unfinished tasks

Dag runs could be marked successful if all root tasks were successful, even if some tasks did not run yet, ie. in case of clearing. Now we consider unfinished_tasks, before marking successful.

Closes #2154 from bolkedebruin/AIRFLOW-989


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/3d6095ff
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/3d6095ff
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/3d6095ff

Branch: refs/heads/master
Commit: 3d6095ff5cf6eff0444d7e47a2360765f2953daf
Parents: cadfae5
Author: Bolke de Bruin
Authored: Wed Mar 15 16:39:12 2017 -0700
Committer: Bolke de Bruin
Committed: Wed Mar 15 16:39:12 2017 -0700

----------------------------------------------------------------------
 airflow/models.py |  6 +++---
 tests/models.py   | 51 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+), 3 deletions(-)
----------------------------------------------------------------------

diff --git a/airflow/models.py b/airflow/models.py
index 27a5670..ad3346a 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -4091,9 +4091,9 @@ class DagRun(Base):
             logging.info('Marking run {} failed'.format(self))
             self.state = State.FAILED

-        # if all roots succeeded, the run succeeded
-        elif all(r.state in (State.SUCCESS, State.SKIPPED)
-                 for r in roots):
+        # if all roots succeeded and no unfinished tasks, the run succeeded
+        elif not unfinished_tasks and all(r.state in (State.SUCCESS, State.SKIPPED)
+                                          for r in roots):
             logging.info('Marking run {} successful'.format(self))
             self.state = State.SUCCESS

diff --git a/tests/models.py b/tests/models.py
index 6fbbf3e..8ce08eb 100644
--- a/tests/models.py
+++ b/tests/models.py
@@ -259,6 +259,57 @@ class DagRunTest(unittest.TestCase):
         updated_dag_state = dag_run.update_state()
         self.assertEqual(State.SUCCESS, updated_dag_state)

+    def test_dagrun_success_conditions(self):
+        session = settings.Session()
+
+        dag = DAG(
+            'test_dagrun_success_conditions',
+            start_date=DEFAULT_DATE,
+            default_args={'owner': 'owner1'})
+
+        # A -> B
+        # A -> C -> D
+        # ordered: B, D, C, A or D, B, C, A or D, C, B, A
+        with dag:
+            op1 = DummyOperator(task_id='A')
+            op2 = DummyOperator(task_id='B')
+            op3 = DummyOperator(task_id='C')
+            op4 = DummyOperator(task_id='D')
+            op1.set_upstream([op2, op3])
+            op3.set_upstream(op4)
+
+        dag.clear()
+
+        now = datetime.datetime.now()
+        dr = dag.create_dagrun(run_id='test_dagrun_success_conditions',
+                               state=State.RUNNING,
+                               execution_date=now,
+                               start_date=now)
+
+        # op1 = root
+        ti_op1 = dr.get_task_instance(task_id=op1.task_id)
+        ti_op1.set_state(state=State.SUCCESS, session=session)
+
+        ti_op2 = dr.get_task_instance(task_id=op2.task_id)
+        ti_op3 = dr.get_task_instance(task_id=op3.task_id)
+        ti_op4 = dr.get_task_instance(task_id=op4.task_id)
+
+        # root is successful, but unfinished tasks
+        state = dr.update_state()
+        self.assertEqual(State.RUNNING, state)
+
+        # one has failed, but root is successful
+        ti_op2.set_state(state=State.FAILED, session=session)
+        ti_op3.set_state(state=State.SUCCESS, session=session)
+        ti_op4.set_state(state=State.SUCCESS, session=session)
+        state = dr.update_state()
+        self.assertEqual(State.SUCCESS, state)
+
+        # upstream dependency failed, root has not run
+        ti_op1.set_state(State.NONE, session)
+        state = dr.update_state()
+        self.assertEqual(State.FAILED, state)
+
 class DagBagTest(unittest.TestCase):
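Stripped of SQLAlchemy and scheduler context, the patched condition can be restated as a small pure function. This is a simplified sketch of the rule only, not the real DagRun.update_state (which also handles deadlock detection and upstream-failure states):

```python
SUCCESS, SKIPPED, FAILED, RUNNING = "success", "skipped", "failed", "running"

def run_state(root_states, unfinished_count):
    """A run fails when a root task failed; it only succeeds when every
    root is SUCCESS/SKIPPED *and* nothing is left unfinished (the new
    check, covering e.g. cleared tasks that have not re-run yet);
    otherwise it is still running."""
    if any(s == FAILED for s in root_states):
        return FAILED
    if unfinished_count == 0 and all(s in (SUCCESS, SKIPPED) for s in root_states):
        return SUCCESS
    return RUNNING
```

Before the patch, the `unfinished_count` check was missing, so a cleared (unfinished) non-root task could not stop the run from being marked successful.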
[jira] [Resolved] (AIRFLOW-989) Clear Task Regression
[ https://issues.apache.org/jira/browse/AIRFLOW-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bolke de Bruin resolved AIRFLOW-989.
------------------------------------
    Resolution: Fixed

Issue resolved by pull request #2154
[https://github.com/apache/incubator-airflow/pull/2154]

> Clear Task Regression
> ---------------------
>
>                 Key: AIRFLOW-989
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-989
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.8.0
>            Reporter: Siddharth Anand
>            Assignee: Bolke de Bruin
>            Priority: Critical
>             Fix For: 1.8.1
>
> There is a regression in the current 1.8 rc (e.g. rc5) related to Clear Task behavior.
> Consider the following test DAG:
> 1. Code: https://gist.github.com/r39132/b44f7d791e11f882cde28a219df97c29
> 2. Graph: https://www.dropbox.com/s/1e9rfnq6cy4hh45/Screenshot%202017-03-15%2013.48.26.png?dl=0
> The test DAG has 4 dummy tasks chained together as d1->d2->d3->d4. d1 is the first task and d4 is the last task. Prior to 1.8, if any of d1..d4 were cleared individually, the scheduler would pick up and rerun the cleared tasks. In 1.8, unless the last task in a DAG is cleared, none of the tasks in the DAG run are rerun.
> In order for a task that is not the last task in the DAG to be rerun after being cleared, its terminal downstream task needs to be cleared. Another workaround is to use the CLI to rerun the cleared task.
> Here are some screenshots to illustrate the regressed behavior:
> Use Case 1: Clear d1, d2, d3, and d4 in 4 separate DAG runs. In a 5th separate DAG run, clear the entire DAG Run.
> After Clearing: https://www.dropbox.com/s/mgiwoyaxf5f2pb2/Screenshot%202017-03-15%2010.12.02.png?dl=0
> After the Scheduler Runs: https://www.dropbox.com/s/7btwzydv87v3iz0/Screenshot%202017-03-15%2010.15.16.png?dl=0
> You'll notice that only the DAG runs with the last task cleared completed by actually running cleared tasks. These are shown as the 1st and 5th DAG runs from the left.
> Use Case 2: Clear d1 and d4 in the same DAG Run
> After Clearing (c.f. 2nd from right DAG run): https://www.dropbox.com/s/2a6by6k28eb7geh/Screenshot%202017-03-15%2013.34.11.png?dl=0
> After the Scheduler Runs: https://www.dropbox.com/s/19cg6qr2oqi1ps7/Screenshot%202017-03-15%2013.34.51.png?dl=0
[jira] [Commented] (AIRFLOW-989) Clear Task Regression
[ https://issues.apache.org/jira/browse/AIRFLOW-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927191#comment-15927191 ]

ASF subversion and git services commented on AIRFLOW-989:
---------------------------------------------------------

Commit 15600e42c805b222d6147b60376b56c8e708dcde in incubator-airflow's branch refs/heads/v1-8-test from [~bolke]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=15600e4 ]

[AIRFLOW-989] Do not mark dag run successful if unfinished tasks

Dag runs could be marked successful if all root tasks were successful, even if some tasks did not run yet, ie. in case of clearing. Now we consider unfinished_tasks, before marking successful.

Closes #2154 from bolkedebruin/AIRFLOW-989

(cherry picked from commit 3d6095ff5cf6eff0444d7e47a2360765f2953daf)
Signed-off-by: Bolke de Bruin

> Clear Task Regression
> ---------------------
>
>                 Key: AIRFLOW-989
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-989
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.8.0
>            Reporter: Siddharth Anand
>            Assignee: Bolke de Bruin
>            Priority: Critical
>             Fix For: 1.8.1
>
> There is a regression in the current 1.8 rc (e.g. rc5) related to Clear Task behavior.
> Consider the following test DAG:
> 1. Code: https://gist.github.com/r39132/b44f7d791e11f882cde28a219df97c29
> 2. Graph: https://www.dropbox.com/s/1e9rfnq6cy4hh45/Screenshot%202017-03-15%2013.48.26.png?dl=0
> The test DAG has 4 dummy tasks chained together as d1->d2->d3->d4. d1 is the first task and d4 is the last task. Prior to 1.8, if any of d1..d4 were cleared individually, the scheduler would pick up and rerun the cleared tasks. In 1.8, unless the last task in a DAG is cleared, none of the tasks in the DAG run are rerun.
> In order for a task that is not the last task in the DAG to be rerun after being cleared, its terminal downstream task needs to be cleared. Another workaround is to use the CLI to rerun the cleared task.
> Here are some screenshots to illustrate the regressed behavior:
> Use Case 1: Clear d1, d2, d3, and d4 in 4 separate DAG runs. In a 5th separate DAG run, clear the entire DAG Run.
> After Clearing: https://www.dropbox.com/s/mgiwoyaxf5f2pb2/Screenshot%202017-03-15%2010.12.02.png?dl=0
> After the Scheduler Runs: https://www.dropbox.com/s/7btwzydv87v3iz0/Screenshot%202017-03-15%2010.15.16.png?dl=0
> You'll notice that only the DAG runs with the last task cleared completed by actually running cleared tasks. These are shown as the 1st and 5th DAG runs from the left.
> Use Case 2: Clear d1 and d4 in the same DAG Run
> After Clearing (c.f. 2nd from right DAG run): https://www.dropbox.com/s/2a6by6k28eb7geh/Screenshot%202017-03-15%2013.34.11.png?dl=0
> After the Scheduler Runs: https://www.dropbox.com/s/19cg6qr2oqi1ps7/Screenshot%202017-03-15%2013.34.51.png?dl=0
[jira] [Commented] (AIRFLOW-989) Clear Task Regression
[ https://issues.apache.org/jira/browse/AIRFLOW-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927189#comment-15927189 ]

ASF subversion and git services commented on AIRFLOW-989:
---------------------------------------------------------

Commit 3d6095ff5cf6eff0444d7e47a2360765f2953daf in incubator-airflow's branch refs/heads/master from [~bolke]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=3d6095f ]

[AIRFLOW-989] Do not mark dag run successful if unfinished tasks

Dag runs could be marked successful if all root tasks were successful, even if some tasks did not run yet, ie. in case of clearing. Now we consider unfinished_tasks, before marking successful.

Closes #2154 from bolkedebruin/AIRFLOW-989

> Clear Task Regression
> ---------------------
>
>                 Key: AIRFLOW-989
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-989
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.8.0
>            Reporter: Siddharth Anand
>            Assignee: Bolke de Bruin
>            Priority: Critical
>             Fix For: 1.8.1
>
> There is a regression in the current 1.8 rc (e.g. rc5) related to Clear Task behavior.
> Consider the following test DAG:
> 1. Code: https://gist.github.com/r39132/b44f7d791e11f882cde28a219df97c29
> 2. Graph: https://www.dropbox.com/s/1e9rfnq6cy4hh45/Screenshot%202017-03-15%2013.48.26.png?dl=0
> The test DAG has 4 dummy tasks chained together as d1->d2->d3->d4. d1 is the first task and d4 is the last task. Prior to 1.8, if any of d1..d4 were cleared individually, the scheduler would pick up and rerun the cleared tasks. In 1.8, unless the last task in a DAG is cleared, none of the tasks in the DAG run are rerun.
> In order for a task that is not the last task in the DAG to be rerun after being cleared, its terminal downstream task needs to be cleared. Another workaround is to use the CLI to rerun the cleared task.
> Here are some screenshots to illustrate the regressed behavior:
> Use Case 1: Clear d1, d2, d3, and d4 in 4 separate DAG runs. In a 5th separate DAG run, clear the entire DAG Run.
> After Clearing: https://www.dropbox.com/s/mgiwoyaxf5f2pb2/Screenshot%202017-03-15%2010.12.02.png?dl=0
> After the Scheduler Runs: https://www.dropbox.com/s/7btwzydv87v3iz0/Screenshot%202017-03-15%2010.15.16.png?dl=0
> You'll notice that only the DAG runs with the last task cleared completed by actually running cleared tasks. These are shown as the 1st and 5th DAG runs from the left.
> Use Case 2: Clear d1 and d4 in the same DAG Run
> After Clearing (c.f. 2nd from right DAG run): https://www.dropbox.com/s/2a6by6k28eb7geh/Screenshot%202017-03-15%2013.34.11.png?dl=0
> After the Scheduler Runs: https://www.dropbox.com/s/19cg6qr2oqi1ps7/Screenshot%202017-03-15%2013.34.51.png?dl=0
incubator-airflow git commit: [AIRFLOW-989] Do not mark dag run successful if unfinished tasks
Repository: incubator-airflow
Updated Branches:
  refs/heads/v1-8-test 3b37cfa1f -> 15600e42c


[AIRFLOW-989] Do not mark dag run successful if unfinished tasks

Dag runs could be marked successful if all root tasks were successful, even if some tasks did not run yet, ie. in case of clearing. Now we consider unfinished_tasks, before marking successful.

Closes #2154 from bolkedebruin/AIRFLOW-989

(cherry picked from commit 3d6095ff5cf6eff0444d7e47a2360765f2953daf)
Signed-off-by: Bolke de Bruin


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/15600e42
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/15600e42
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/15600e42

Branch: refs/heads/v1-8-test
Commit: 15600e42c805b222d6147b60376b56c8e708dcde
Parents: 3b37cfa
Author: Bolke de Bruin
Authored: Wed Mar 15 16:39:12 2017 -0700
Committer: Bolke de Bruin
Committed: Wed Mar 15 16:39:26 2017 -0700

----------------------------------------------------------------------
 airflow/models.py |  6 +++---
 tests/models.py   | 51 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+), 3 deletions(-)
----------------------------------------------------------------------

diff --git a/airflow/models.py b/airflow/models.py
index 7c6590f..42b8a7f 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -4064,9 +4064,9 @@ class DagRun(Base):
             logging.info('Marking run {} failed'.format(self))
             self.state = State.FAILED

-        # if all roots succeeded, the run succeeded
-        elif all(r.state in (State.SUCCESS, State.SKIPPED)
-                 for r in roots):
+        # if all roots succeeded and no unfinished tasks, the run succeeded
+        elif not unfinished_tasks and all(r.state in (State.SUCCESS, State.SKIPPED)
+                                          for r in roots):
             logging.info('Marking run {} successful'.format(self))
             self.state = State.SUCCESS

diff --git a/tests/models.py b/tests/models.py
index ffd1f31..1fbb3e6 100644
--- a/tests/models.py
+++ b/tests/models.py
@@ -259,6 +259,57 @@ class DagRunTest(unittest.TestCase):
         updated_dag_state = dag_run.update_state()
         self.assertEqual(State.SUCCESS, updated_dag_state)

+    def test_dagrun_success_conditions(self):
+        session = settings.Session()
+
+        dag = DAG(
+            'test_dagrun_success_conditions',
+            start_date=DEFAULT_DATE,
+            default_args={'owner': 'owner1'})
+
+        # A -> B
+        # A -> C -> D
+        # ordered: B, D, C, A or D, B, C, A or D, C, B, A
+        with dag:
+            op1 = DummyOperator(task_id='A')
+            op2 = DummyOperator(task_id='B')
+            op3 = DummyOperator(task_id='C')
+            op4 = DummyOperator(task_id='D')
+            op1.set_upstream([op2, op3])
+            op3.set_upstream(op4)
+
+        dag.clear()
+
+        now = datetime.datetime.now()
+        dr = dag.create_dagrun(run_id='test_dagrun_success_conditions',
+                               state=State.RUNNING,
+                               execution_date=now,
+                               start_date=now)
+
+        # op1 = root
+        ti_op1 = dr.get_task_instance(task_id=op1.task_id)
+        ti_op1.set_state(state=State.SUCCESS, session=session)
+
+        ti_op2 = dr.get_task_instance(task_id=op2.task_id)
+        ti_op3 = dr.get_task_instance(task_id=op3.task_id)
+        ti_op4 = dr.get_task_instance(task_id=op4.task_id)
+
+        # root is successful, but unfinished tasks
+        state = dr.update_state()
+        self.assertEqual(State.RUNNING, state)
+
+        # one has failed, but root is successful
+        ti_op2.set_state(state=State.FAILED, session=session)
+        ti_op3.set_state(state=State.SUCCESS, session=session)
+        ti_op4.set_state(state=State.SUCCESS, session=session)
+        state = dr.update_state()
+        self.assertEqual(State.SUCCESS, state)
+
+        # upstream dependency failed, root has not run
+        ti_op1.set_state(State.NONE, session)
+        state = dr.update_state()
+        self.assertEqual(State.FAILED, state)
+
 class DagBagTest(unittest.TestCase):
[jira] [Assigned] (AIRFLOW-990) DockerOperator fails when logging unicode string
[ https://issues.apache.org/jira/browse/AIRFLOW-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitor Baptista reassigned AIRFLOW-990:
--------------------------------------
Assignee: Vitor Baptista

> DockerOperator fails when logging unicode string
>
> Key: AIRFLOW-990
> URL: https://issues.apache.org/jira/browse/AIRFLOW-990
> Project: Apache Airflow
> Issue Type: Bug
> Components: docker
> Affects Versions: Airflow 1.7.1
> Environment: Python 2.7
> Reporter: Vitor Baptista
> Assignee: Vitor Baptista
>
> On line https://github.com/apache/incubator-airflow/blob/1.7.1.3/airflow/operators/docker_operator.py#L164, we're calling:
> {code:title=airflow/operators/docker_operator.py}
> for line in self.cli.logs(container=self.container['Id'], stream=True):
>     logging.info("{}".format(line.strip()))
> {code}
> If `self.cli.logs()` returns a string with a unicode character, this raises a UnicodeDecodeError:
> {noformat}
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/logging/__init__.py", line 861, in emit
>     msg = self.format(record)
>   File "/usr/lib/python2.7/logging/__init__.py", line 734, in format
>     return fmt.format(record)
>   File "/usr/lib/python2.7/logging/__init__.py", line 476, in format
>     raise e
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)
> Logged from file docker_operator.py, line 165
> {noformat}
> A possible fix is to change that line to:
> {code:title=airflow/operators/docker_operator.py}
> for line in self.cli.logs(container=self.container['Id'], stream=True):
>     logging.info(line.decode('utf-8').strip())
> {code}
> This error doesn't happen on Python 3. I haven't tested, but reading the code it seems the same error exists on `master` as well.

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
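The proposed fix can be exercised outside Airflow with a small sketch. The helper name and wrapper below are illustrative (not Airflow's or docker-py's actual API); the point is simply to decode each bytes chunk to text before it reaches the logging formatter, so multi-byte UTF-8 sequences such as 0xf0... no longer trip an ASCII codec.

```python
import logging

def log_container_output(chunks):
    """Illustrative helper: decode raw Docker log chunks before logging.

    Under Python 2, formatting raw bytes containing e.g. 0xf0 raises
    UnicodeDecodeError in the logging formatter; decoding first avoids it.
    """
    decoded = []
    for line in chunks:
        if isinstance(line, bytes):
            line = line.decode('utf-8')
        line = line.strip()
        logging.info(line)
        decoded.append(line)
    return decoded
```

For bytes that may not be valid UTF-8, `line.decode('utf-8', errors='replace')` is a more defensive variant of the same idea.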
[jira] [Commented] (AIRFLOW-990) DockerOperator fails when logging unicode string
[ https://issues.apache.org/jira/browse/AIRFLOW-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927126#comment-15927126 ] Vitor Baptista commented on AIRFLOW-990:
----------------------------------------
Pull request for this issue sent on https://github.com/apache/incubator-airflow/pull/2155
[jira] [Updated] (AIRFLOW-990) DockerOperator fails when logging unicode string
[ https://issues.apache.org/jira/browse/AIRFLOW-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitor Baptista updated AIRFLOW-990:
-----------------------------------
Description:
On line https://github.com/apache/incubator-airflow/blob/1.7.1.3/airflow/operators/docker_operator.py#L164, we're calling:
{code:title=airflow/operators/docker_operator.py}
for line in self.cli.logs(container=self.container['Id'], stream=True):
    logging.info("{}".format(line.strip()))
{code}
If `self.cli.logs()` returns a string with a unicode character, this raises a UnicodeDecodeError:
{noformat}
Traceback (most recent call last):
  File "/usr/lib/python2.7/logging/__init__.py", line 861, in emit
    msg = self.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 734, in format
    return fmt.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 476, in format
    raise e
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)
Logged from file docker_operator.py, line 165
{noformat}
A possible fix is to change that line to:
{code:title=airflow/operators/docker_operator.py}
for line in self.cli.logs(container=self.container['Id'], stream=True):
    logging.info(line.decode('utf-8').strip())
{code}
This error doesn't happen on Python 3. I haven't tested, but reading the code it seems the same error exists on `master` as well.

was:
On line https://github.com/apache/incubator-airflow/blob/1.7.1.3/airflow/operators/docker_operator.py#L164, we're calling:
{code:title=airflow/operators/docker_operator.py}
for line in self.cli.logs(container=self.container['Id'], stream=True):
    logging.info("{}".format(line.strip()))
{code}
If `self.cli.logs()` returns a string with a unicode character, this raises a UnicodeDecodeError:
{noformat}
Traceback (most recent call last):
  File "/usr/lib/python2.7/logging/__init__.py", line 861, in emit
    msg = self.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 734, in format
    return fmt.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 476, in format
    raise e
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)
Logged from file docker_operator.py, line 165
{noformat}
A possible fix is to change that line to:
{code:python}
for line in self.cli.logs(container=self.container['Id'], stream=True):
    logging.info(line.decode('utf-8').strip())
{code}
This error doesn't happen on Python 3. I haven't tested, but reading the code it seems the same error exists on `master` as well.
[jira] [Updated] (AIRFLOW-990) DockerOperator fails when logging unicode string
[ https://issues.apache.org/jira/browse/AIRFLOW-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitor Baptista updated AIRFLOW-990:
-----------------------------------
Description:
On line https://github.com/apache/incubator-airflow/blob/1.7.1.3/airflow/operators/docker_operator.py#L164, we're calling:
{code:title=airflow/operators/docker_operator.py}
for line in self.cli.logs(container=self.container['Id'], stream=True):
    logging.info("{}".format(line.strip()))
{code}
If `self.cli.logs()` returns a string with a unicode character, this raises a UnicodeDecodeError:
{noformat}
Traceback (most recent call last):
  File "/usr/lib/python2.7/logging/__init__.py", line 861, in emit
    msg = self.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 734, in format
    return fmt.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 476, in format
    raise e
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)
Logged from file docker_operator.py, line 165
{noformat}
A possible fix is to change that line to:
{code:python}
for line in self.cli.logs(container=self.container['Id'], stream=True):
    logging.info(line.decode('utf-8').strip())
{code}
This error doesn't happen on Python 3. I haven't tested, but reading the code it seems the same error exists on `master` as well.

was:
On line https://github.com/apache/incubator-airflow/blob/1.7.1.3/airflow/operators/docker_operator.py#L164, we're calling:
{code:title=airflow/operators/docker_operator.py}
for line in self.cli.logs(container=self.container['Id'], stream=True):
    logging.info("{}".format(line.strip()))
{code}
If `self.cli.logs()` returns a string with a unicode character, this raises a UnicodeDecodeError:
{preformat}
Traceback (most recent call last):
  File "/usr/lib/python2.7/logging/__init__.py", line 861, in emit
    msg = self.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 734, in format
    return fmt.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 476, in format
    raise e
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)
Logged from file docker_operator.py, line 165
{preformat}
A possible fix is to change that line to {code}logging.info(line.decode('utf-8').strip()){code}.
This error doesn't happen on Python 3. I haven't tested, but reading the code it seems the same error exists on `master` as well.
[jira] [Updated] (AIRFLOW-990) DockerOperator fails when logging unicode string
[ https://issues.apache.org/jira/browse/AIRFLOW-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitor Baptista updated AIRFLOW-990:
-----------------------------------
Description:
On line https://github.com/apache/incubator-airflow/blob/1.7.1.3/airflow/operators/docker_operator.py#L164, we're calling:
{code:title=airflow/operators/docker_operator.py}
for line in self.cli.logs(container=self.container['Id'], stream=True):
    logging.info("{}".format(line.strip()))
{code}
If `self.cli.logs()` returns a string with a unicode character, this raises a UnicodeDecodeError:
{preformat}
Traceback (most recent call last):
  File "/usr/lib/python2.7/logging/__init__.py", line 861, in emit
    msg = self.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 734, in format
    return fmt.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 476, in format
    raise e
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)
Logged from file docker_operator.py, line 165
{preformat}
A possible fix is to change that line to {code}logging.info(line.decode('utf-8').strip()){code}.
This error doesn't happen on Python 3. I haven't tested, but reading the code it seems the same error exists on `master` as well.

was:
On line https://github.com/apache/incubator-airflow/blob/1.7.1.3/airflow/operators/docker_operator.py#L164, we're calling:
```
for line in self.cli.logs(container=self.container['Id'], stream=True):
    logging.info("{}".format(line.strip()))
```
If `self.cli.logs()` returns a string with a unicode character, this raises a UnicodeDecodeError:
```
Traceback (most recent call last):
  File "/usr/lib/python2.7/logging/__init__.py", line 861, in emit
    msg = self.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 734, in format
    return fmt.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 476, in format
    raise e
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)
Logged from file docker_operator.py, line 165
```
A possible fix is to change that line to `logging.info(line.decode('utf-8').strip())`.
This error doesn't happen on Python 3. I haven't tested, but reading the code it seems the same error exists on `master` as well.
[jira] [Updated] (AIRFLOW-989) Clear Task Regression
[ https://issues.apache.org/jira/browse/AIRFLOW-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bolke de Bruin updated AIRFLOW-989:
-----------------------------------
Fix Version/s: 1.8.1

> Clear Task Regression
> ---------------------
>
> Key: AIRFLOW-989
> URL: https://issues.apache.org/jira/browse/AIRFLOW-989
> Project: Apache Airflow
> Issue Type: Bug
> Components: core
> Affects Versions: 1.8.0
> Reporter: Siddharth Anand
> Assignee: Bolke de Bruin
> Priority: Critical
> Fix For: 1.8.1
>
> There is a regression in the current 1.8 rc (e.g. rc5) related to Clear Task behavior.
> Consider the following test DAG:
> 1. Code: https://gist.github.com/r39132/b44f7d791e11f882cde28a219df97c29
> 2. Graph: https://www.dropbox.com/s/1e9rfnq6cy4hh45/Screenshot%202017-03-15%2013.48.26.png?dl=0
> The test DAG has 4 dummy tasks chained together as d1->d2->d3->d4. d1 is the first task and d4 is the last task. Prior to 1.8, if any of d1..d4 were cleared individually, the scheduler would pick up and rerun the cleared tasks. In 1.8, unless the last task in a DAG is cleared, none of the tasks in the DAG run are rerun.
> In order for a task that is not the last task in the DAG to be rerun after being cleared, its terminal downstream task needs to be cleared. Another workaround is to use the CLI to rerun the cleared task.
> Here are some screenshots to illustrate the regressed behavior:
> Use Case 1: Clear d1, d2, d3, and d4 in 4 separate DAG runs. In a 5th separate DAG run, clear the entire DAG run.
> After Clearing: https://www.dropbox.com/s/mgiwoyaxf5f2pb2/Screenshot%202017-03-15%2010.12.02.png?dl=0
> After the Scheduler Runs: https://www.dropbox.com/s/7btwzydv87v3iz0/Screenshot%202017-03-15%2010.15.16.png?dl=0
> You'll notice that only the DAG runs in which the last task was cleared completed by actually rerunning the cleared tasks. These are shown as the 1st and 5th DAG runs from the left.
> Use Case 2: Clear d1 and d4 in the same DAG run
> After Clearing (cf. 2nd from right DAG run): https://www.dropbox.com/s/2a6by6k28eb7geh/Screenshot%202017-03-15%2013.34.11.png?dl=0
> After the Scheduler Runs: https://www.dropbox.com/s/19cg6qr2oqi1ps7/Screenshot%202017-03-15%2013.34.51.png?dl=0
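The four-task chain described above can be sketched as follows. This is a hypothetical reconstruction for illustration, not the exact contents of the linked gist; the DAG id, dates, and schedule are assumptions.

```python
from datetime import datetime

from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator

# Four dummy tasks chained d1 -> d2 -> d3 -> d4:
# d1 is the first task, d4 the last ("terminal downstream") task.
with DAG('clear_task_test',
         start_date=datetime(2017, 3, 1),
         schedule_interval='@daily') as dag:
    d1 = DummyOperator(task_id='d1')
    d2 = DummyOperator(task_id='d2')
    d3 = DummyOperator(task_id='d3')
    d4 = DummyOperator(task_id='d4')
    d1.set_downstream(d2)
    d2.set_downstream(d3)
    d3.set_downstream(d4)
```

With a DAG shaped like this, the regression means clearing only d1, d2, or d3 leaves the run untouched; clearing d4 (the terminal task) makes the scheduler rerun the cleared tasks.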
[jira] [Updated] (AIRFLOW-989) Clear Task Regression
[ https://issues.apache.org/jira/browse/AIRFLOW-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bolke de Bruin updated AIRFLOW-989:
-----------------------------------
Affects Version/s: (was: Airflow 1.8) 1.8.0
[jira] [Assigned] (AIRFLOW-989) Clear Task Regression
[ https://issues.apache.org/jira/browse/AIRFLOW-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bolke de Bruin reassigned AIRFLOW-989:
--------------------------------------
Assignee: Bolke de Bruin
[jira] [Work started] (AIRFLOW-989) Clear Task Regression
[ https://issues.apache.org/jira/browse/AIRFLOW-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on AIRFLOW-989 started by Bolke de Bruin.
[jira] [Commented] (AIRFLOW-989) Clear Task Regression
[ https://issues.apache.org/jira/browse/AIRFLOW-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927030#comment-15927030 ] Bolke de Bruin commented on AIRFLOW-989:
----------------------------------------
BTW, you could try this (I didn't test it) in DagRun.update_state:
{code}
# if all roots succeeded and there are no unfinished tasks, the run succeeded
elif not unfinished_tasks and all(r.state in (State.SUCCESS, State.SKIPPED)
                                  for r in roots):
    logging.info('Marking run {} successful'.format(self))
    self.state = State.SUCCESS
{code}
[jira] [Updated] (AIRFLOW-989) Clear Task Regression
[ https://issues.apache.org/jira/browse/AIRFLOW-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Anand updated AIRFLOW-989:
------------------------------------
Description:
There is a regression in the current 1.8 rc (e.g. rc5) related to Clear Task behavior.
Consider the following test DAG:
1. Code: https://gist.github.com/r39132/b44f7d791e11f882cde28a219df97c29
2. Graph: https://www.dropbox.com/s/1e9rfnq6cy4hh45/Screenshot%202017-03-15%2013.48.26.png?dl=0
The test DAG has 4 dummy tasks chained together as d1->d2->d3->d4. d1 is the first task and d4 is the last task. Prior to 1.8, if any of d1..d4 were cleared individually, the scheduler would pick up and rerun the cleared tasks. In 1.8, unless the last task in a DAG is cleared, none of the tasks in the DAG run are rerun.
In order for a task that is not the last task in the DAG to be rerun after being cleared, its terminal downstream task needs to be cleared. Another workaround is to use the CLI to rerun the cleared task.
Here are some screenshots to illustrate the regressed behavior:
Use Case 1: Clear d1, d2, d3, and d4 in 4 separate DAG runs. In a 5th separate DAG run, clear the entire DAG run.
After Clearing: https://www.dropbox.com/s/mgiwoyaxf5f2pb2/Screenshot%202017-03-15%2010.12.02.png?dl=0
After the Scheduler Runs: https://www.dropbox.com/s/7btwzydv87v3iz0/Screenshot%202017-03-15%2010.15.16.png?dl=0
You'll notice that only the DAG runs in which the last task was cleared completed by actually rerunning the cleared tasks. These are shown as the 1st and 5th DAG runs from the left.
Use Case 2: Clear d1 and d4 in the same DAG run
After Clearing (cf. 2nd from right DAG run): https://www.dropbox.com/s/2a6by6k28eb7geh/Screenshot%202017-03-15%2013.34.11.png?dl=0
After the Scheduler Runs: https://www.dropbox.com/s/19cg6qr2oqi1ps7/Screenshot%202017-03-15%2013.34.51.png?dl=0

was:
There is a regression in the current 1.8 rc (e.g. rc5) related to Clear Task behavior.
Consider the following test DAG: https://gist.github.com/r39132/b44f7d791e11f882cde28a219df97c29
The test DAG has 4 dummy tasks chained together as d1->d2->d3->d4. d1 is the first task and d4 is the last task. Prior to 1.8, if any of d1..d4 were cleared individually, the scheduler would pick up and rerun the cleared tasks. In 1.8, unless the last task in a DAG is cleared, none of the tasks in the DAG run are rerun.
In order for a task that is not the last task in the DAG to be rerun after being cleared, its terminal downstream task needs to be cleared. Another workaround is to use the CLI to rerun the cleared task.
Here are some screenshots to illustrate the regressed behavior:
Use Case 1: Clear d1, d2, d3, and d4 in 4 separate DAG runs. In a 5th separate DAG run, clear the entire DAG run.
After Clearing: https://www.dropbox.com/s/mgiwoyaxf5f2pb2/Screenshot%202017-03-15%2010.12.02.png?dl=0
After the Scheduler Runs: https://www.dropbox.com/s/7btwzydv87v3iz0/Screenshot%202017-03-15%2010.15.16.png?dl=0
You'll notice that only the DAG runs in which the last task was cleared completed by actually rerunning the cleared tasks. These are shown as the 1st and 5th DAG runs from the left.
Use Case 2: Clear d1 and d4 in the same DAG run
After Clearing (cf. 2nd from right DAG run): https://www.dropbox.com/s/2a6by6k28eb7geh/Screenshot%202017-03-15%2013.34.11.png?dl=0
After the Scheduler Runs: https://www.dropbox.com/s/19cg6qr2oqi1ps7/Screenshot%202017-03-15%2013.34.51.png?dl=0
[jira] [Created] (AIRFLOW-989) Clear Task Regression
Siddharth Anand created AIRFLOW-989: --- Summary: Clear Task Regression Key: AIRFLOW-989 URL: https://issues.apache.org/jira/browse/AIRFLOW-989 Project: Apache Airflow Issue Type: Bug Components: core Affects Versions: Airflow 1.8 Reporter: Siddharth Anand Priority: Critical There is a regression in the current 1.8 rc (e.g. rc5) related to Clear Task behavior. Consider the following test DAG : https://gist.github.com/r39132/b44f7d791e11f882cde28a219df97c29 The test DAG has 4 dummy tasks chained together as d1->d2->d3->d4. d1 is the first task and d4 is the last task. Prior to 1.8, if any of d1..d4 were cleared individually, the scheduler would pick up and rerun the cleared tasks. In 1.8, unless the last task in a DAG is cleared, none of the tasks in the DAG run are rerun. In order for a task that is not the last task in the DAG to be rerun after being cleared, its terminal downstream task needs to be cleared. Another workaround is to use the CLI to rerun the cleared task. Here are some screenshots to illustrate the regressed behavior: Use Case 1 : Clear d1, d2, d3, and d4 in 4 separate DAG runs. In a 5th separate DAG run, clear the entire DAG Run. After Clearing : https://www.dropbox.com/s/mgiwoyaxf5f2pb2/Screenshot%202017-03-15%2010.12.02.png?dl=0 After the Scheduler Runs : https://www.dropbox.com/s/7btwzydv87v3iz0/Screenshot%202017-03-15%2010.15.16.png?dl=0 You'll notice that only the DAG runs in which the last task was cleared actually reran the cleared tasks. These are shown as the 1st and 5th DAG runs from the left. Use Case 2 : Clear d1 and d4 in the same DAG Run. After Clearing (c.f. 2nd from right DAG run): https://www.dropbox.com/s/2a6by6k28eb7geh/Screenshot%202017-03-15%2013.34.11.png?dl=0 After the Scheduler Runs : https://www.dropbox.com/s/19cg6qr2oqi1ps7/Screenshot%202017-03-15%2013.34.51.png?dl=0 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
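The gist itself is external, but the expected scheduler behavior being described can be sketched in plain Python (a simplified model with assumed names, not Airflow's actual scheduler code): pre-1.8, a cleared task (state None) whose upstream tasks had all succeeded was runnable on its own, regardless of whether the DAG's last task was also cleared.

```python
# Simplified model of the expected (pre-1.8) behavior: a cleared task
# (state None) is runnable as soon as all of its upstream tasks succeeded,
# independent of whether the DAG's *last* task was also cleared.
def runnable_tasks(states, upstream):
    """states: task_id -> 'success' or None (None means cleared);
    upstream: task_id -> list of upstream task_ids."""
    return sorted(
        task for task, state in states.items()
        if state is None
        and all(states[up] == 'success' for up in upstream[task])
    )

# d1 -> d2 -> d3 -> d4, with only d2 cleared: the scheduler should rerun d2.
states = {'d1': 'success', 'd2': None, 'd3': 'success', 'd4': 'success'}
upstream = {'d1': [], 'd2': ['d1'], 'd3': ['d2'], 'd4': ['d3']}
print(runnable_tasks(states, upstream))  # expected: ['d2']
```

Under this model, clearing any single task of the chain makes it runnable again, which is the pre-1.8 behavior the report says regressed.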
[jira] [Closed] (AIRFLOW-980) IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint "dag_run_dag_id_key" on sample DAGs
[ https://issues.apache.org/jira/browse/AIRFLOW-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bolke de Bruin closed AIRFLOW-980. -- Resolution: Fixed > IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique > constraint "dag_run_dag_id_key" on sample DAGs > > > Key: AIRFLOW-980 > URL: https://issues.apache.org/jira/browse/AIRFLOW-980 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: Airflow 1.7.1.3 > Environment: Local Executor > postgresql+psycopg2 database backend >Reporter: Ruslan Dautkhanov > > Fresh Airflow install using pip. > Only sample DAGs are installed. > LocalExecutor (4 workers). > Most of the parameters are at defaults. > Turned On all of the sample DAGs (14 of them). > After some execution (a lot of DAGs had at least one successful execution), > started seeing below error stack again and again .. In scheduler log. > {noformat} > IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique > constraint "dag_run_dag_id_key" > [SQL: 'INSERT INTO dag_run (dag_id, execution_date, start_date, end_date, > state, run_id, external_trigger, conf) VALUES (%(dag_id)s, > %(execution_date)s, %(start_date)s, %(end_date)s, %(state)s, %(run_id)s, > %(external_trigger)s, %(conf)s) RETURNING dag_run.id'] [parameters: > {'end_date': None, 'run_id': u'scheduled__2015-01-01T00:00:00', > 'execution_date': datetime.datetime(2015, 1, 1, 0, 0), 'external_trigger': > False, 'state': u'running', 'conf': None, 'start_date': > datetime.datetime(2017, 3, 14, 11, 12, 29, 646995), 'dag_id': 'example_xcom'}] > Process Process-152: > Traceback (most recent call last): > File > "/opt/cloudera/parcels/Anaconda/lib/python2.7/multiprocessing/process.py", > line 258, in _bootstrap > self.run() > File > "/opt/cloudera/parcels/Anaconda/lib/python2.7/multiprocessing/process.py", > line 114, in run > self._target(*self._args, **self._kwargs) > File > 
"/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/jobs.py", > line 664, in _do_dags > dag = dagbag.get_dag(dag.dag_id) > File > "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/models.py", > line 188, in get_dag > orm_dag = DagModel.get_current(root_dag_id) > File > "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/models.py", > line 2320, in get_current > obj = session.query(cls).filter(cls.dag_id == dag_id).first() > File > "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py", > line 2634, in first > ret = list(self[0:1]) > File > "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py", > line 2457, in __getitem__ > return list(res) > File > "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py", > line 2736, in __iter__ > return self._execute_and_instances(context) > File > "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py", > line 2749, in _execute_and_instances > close_with_result=True) > File > "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py", > line 2740, in _connection_from_session > **kw) > File > "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py", > line 893, in connection > execution_options=execution_options) > File > "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py", > line 898, in _connection_for_bind > engine, execution_options) > File > "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py", > line 313, in _connection_for_bind > self._assert_active() > File > "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py", > line 214, in _assert_active > % self._rollback_exception > InvalidRequestError: This Session's transaction has been rolled back due to a > previous exception during flush. 
To begin a new transaction with this > Session, first issue Session.rollback(). Original exception was: > (psycopg2.IntegrityError) duplicate key value violates unique constraint > "dag_run_dag_id_key" > [SQL: 'INSERT INTO dag_run (dag_id, execution_date, start_date, end_date, > state, run_id, external_trigger, conf) VALUES (%(dag_id)s, > %(execution_date)s, %(start_date)s, %(end_date)s, %(state)s, %(run_id)s, > %(external_trigger)s, %(conf)s) RETURNING dag_run.id'] [parameters: > {'end_date': None, 'run_id': u'scheduled__2015-01-01T00:00:00', > 'execution_date': datetime.datetime(2015, 1, 1, 0, 0), 'external_trigger': > False, 'state': u'running', 'conf': None, 'start_date': >
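Scheduler internals aside, the failure mode is the classic race of blindly inserting a row that violates a unique key. A pure-Python sketch of the get-or-create guard (simplified, names assumed; not Airflow's actual code), with the unique constraint on (dag_id, execution_date) simulated by a dict:

```python
# Look up an existing dag_run before inserting, instead of inserting blindly
# and tripping the IntegrityError shown in the traceback above.
def get_or_create_dag_run(runs, dag_id, execution_date):
    """runs: dict keyed by (dag_id, execution_date), simulating the
    unique constraint. Returns (run, created_flag)."""
    key = (dag_id, execution_date)
    if key in runs:
        return runs[key], False          # already scheduled: reuse it
    run = {'dag_id': dag_id, 'execution_date': execution_date,
           'state': 'running'}
    runs[key] = run
    return run, True
```

In the real scheduler the lookup and insert must happen in one transaction (or the IntegrityError must be caught and the session rolled back, as the InvalidRequestError above demands), since two LocalExecutor workers can still race between the check and the insert.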
[jira] [Commented] (AIRFLOW-883) Assigning operator to DAG via bitwise composition does not pickup default args
[ https://issues.apache.org/jira/browse/AIRFLOW-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926575#comment-15926575 ] Jeremiah Lowin commented on AIRFLOW-883: I'm not totally sure this is a "bug" per se, though it is confusing. "default_args" are arguments that are passed to Operators by the parent DAG. Critically, that happens when the Operators are created. While bitshift operators allow deferred DAG assignment, the Operator in question has already been created. The reason the distinction matters is that the Operator's __init__ may include logic related to its arguments. If we pass/assign those arguments after initialization, the logic won't run. However, if we do want to tackle this: 1. The simplest thing would be to walk "default_args" and replace any matching Operator attributes that are None. 2. The more proper thing would be to defer Operator initialization until it is added to a DAG. This would require a bit of a refactor though. > Assigning operator to DAG via bitwise composition does not pickup default args > -- > > Key: AIRFLOW-883 > URL: https://issues.apache.org/jira/browse/AIRFLOW-883 > Project: Apache Airflow > Issue Type: Bug > Components: models >Reporter: Daniel Huang >Assignee: Jeremiah Lowin >Priority: Minor > > This is only the case when the operator does not specify {{dag=dag}} and is > not initialized within a DAG's context manager (due to > https://github.com/apache/incubator-airflow/blob/fb0c5775cda4f84c07d8d5c0e6277fc387c172e6/airflow/utils/decorators.py#L50) > Example: > {code} > default_args = { > 'owner': 'airflow', > 'start_date': datetime(2017, 2, 1) > } > dag = DAG('my_dag', default_args=default_args) > dummy = DummyOperator(task_id='dummy') > dag >> dummy > {code} > This will raise a {{Task is missing the start_date parameter}}. I _think_ > this should probably be allowed because I assume the purpose of supporting > {{dag >> op}} was to allow delayed assignment of an operator to a DAG. 
> I believe to fix this, on assignment, we would need to go back and go through > dag.default_args to see if any of those attrs weren't explicitly set on > task...not the cleanest. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
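Option (1) from the comment above can be sketched in plain Python (assumed helper name; not Airflow's actual implementation): when a task is assigned to a DAG after construction, walk default_args and fill in only the attributes the operator did not set itself.

```python
# Sketch of "walk default_args and replace any matching Operator attributes
# that are None" -- run when a task is assigned to a DAG after construction.
def apply_default_args(task, default_args):
    for name, value in default_args.items():
        # only fill attributes the operator left unset (None)
        if getattr(task, name, None) is None:
            setattr(task, name, value)
    return task
```

As the comment notes, this runs after Operator.__init__ has already executed, so any argument-dependent logic in the constructor still won't see the filled-in values, which is why option (2), deferring initialization until DAG assignment, is the more proper fix.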
[jira] [Assigned] (AIRFLOW-883) Assigning operator to DAG via bitwise composition does not pickup default args
[ https://issues.apache.org/jira/browse/AIRFLOW-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremiah Lowin reassigned AIRFLOW-883: -- Assignee: Jeremiah Lowin > Assigning operator to DAG via bitwise composition does not pickup default args > -- > > Key: AIRFLOW-883 > URL: https://issues.apache.org/jira/browse/AIRFLOW-883 > Project: Apache Airflow > Issue Type: Bug > Components: models >Reporter: Daniel Huang >Assignee: Jeremiah Lowin >Priority: Minor > > This is only the case when the operator does not specify {{dag=dag}} and is > not initialized within a DAG's context manager (due to > https://github.com/apache/incubator-airflow/blob/fb0c5775cda4f84c07d8d5c0e6277fc387c172e6/airflow/utils/decorators.py#L50) > Example: > {code} > default_args = { > 'owner': 'airflow', > 'start_date': datetime(2017, 2, 1) > } > dag = DAG('my_dag', default_args=default_args) > dummy = DummyOperator(task_id='dummy') > dag >> dummy > {code} > This will raise a {{Task is missing the start_date parameter}}. I _think_ > this should probably be allowed because I assume the purpose of supporting > {{dag >> op}} was to allow delayed assignment of an operator to a DAG. > I believe to fix this, on assignment, we would need to go back and go through > dag.default_args to see if any of those attrs weren't explicitly set on > task...not the cleanest. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (AIRFLOW-988) SLA Miss Callbacks Are Repeated if Email is Not being Used
[ https://issues.apache.org/jira/browse/AIRFLOW-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926392#comment-15926392 ] Zachary Lawson edited comment on AIRFLOW-988 at 3/15/17 3:51 PM: - Alternatively, you could just filter to sla_miss records that have notification_sent = False given that in the comments it's stated: {quote} We consider email or the alert callback as notifications. {quote} was (Author: zmjlawson): Alternatively, you could just filter to sla_miss records that have notification_sent = True given that in the comments it's stated: {quote} We consider email or the alert callback as notifications. {quote} > SLA Miss Callbacks Are Repeated if Email is Not being Used > -- > > Key: AIRFLOW-988 > URL: https://issues.apache.org/jira/browse/AIRFLOW-988 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: Airflow 1.8 >Reporter: Zachary Lawson > > There is an issue in the current v1-8-stable branch. Looking at the jobs.py > module, if the system does not have email set up but does have a > sla_miss_callback defined in the DAG, that sla_miss_callback is repeated for > that job infinitely as long as the airflow scheduler is running. The > offending code seems to be in the query to the airflow meta database which > filters to sla_miss records that have *either* email_sent or > notification_sent as false ([see lines > 606-613|https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/jobs.py#L606-L613]), > but then executes the sla_miss_callback function regardless if > notification_sent was true ([see lines > 644-648|https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/jobs.py#L644-L648]). > A conditional statement should be put prior to executing the > sla_miss_callback to check whether a notification has been sent to prevent > this. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
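The guard being proposed can be sketched in plain Python (simplified, names assumed, with sla_miss records modelled as dicts; not the actual jobs.py code): skip misses that were already notified, and record the callback itself as a notification.

```python
# Only fire the callback for misses that have not been notified yet, and
# record that the callback counts as a notification -- otherwise a scheduler
# with no email configured re-fires the callback on every loop.
def handle_sla_misses(sla_misses, sla_miss_callback):
    fired = []
    for miss in sla_misses:
        if miss.get('notification_sent'):
            continue                      # already notified: skip
        if sla_miss_callback is not None:
            sla_miss_callback(miss)
        miss['notification_sent'] = True  # persist this in the real scheduler
        fired.append(miss['task_id'])
    return fired
```

Run twice over the same records, the second pass fires nothing, which is exactly the behavior the issue asks for.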
[jira] [Commented] (AIRFLOW-988) SLA Miss Callbacks Are Repeated if Email is Not being Used
[ https://issues.apache.org/jira/browse/AIRFLOW-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926392#comment-15926392 ] Zachary Lawson commented on AIRFLOW-988: Alternatively, you could just filter to sla_miss records that have notification_sent = True given that in the comments it's stated: {quote} We consider email or the alert callback as notifications. {quote} > SLA Miss Callbacks Are Repeated if Email is Not being Used > -- > > Key: AIRFLOW-988 > URL: https://issues.apache.org/jira/browse/AIRFLOW-988 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: Airflow 1.8 >Reporter: Zachary Lawson > > There is an issue in the current v1-8-stable branch. Looking at the jobs.py > module, if the system does not have email set up but does have a > sla_miss_callback defined in the DAG, that sla_miss_callback is repeated for > that job infinitely as long as the airflow scheduler is running. The > offending code seems to be in the query to the airflow meta database which > filters to sla_miss records that have *either* email_sent or > notification_sent as false ([see lines > 606-613|https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/jobs.py#L606-L613]), > but then executes the sla_miss_callback function regardless if > notification_sent was true ([see lines > 644-648|https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/jobs.py#L644-L648]). > A conditional statement should be put prior to executing the > sla_miss_callback to check whether a notification has been sent to prevent > this. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (AIRFLOW-988) SLA Miss Callbacks Are Repeated if Email is Not being Used
[ https://issues.apache.org/jira/browse/AIRFLOW-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zachary Lawson updated AIRFLOW-988: --- Summary: SLA Miss Callbacks Are Repeated if Email is Not being Used (was: SLA Misses Are Repeated if Email is Not being Used) > SLA Miss Callbacks Are Repeated if Email is Not being Used > -- > > Key: AIRFLOW-988 > URL: https://issues.apache.org/jira/browse/AIRFLOW-988 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: Airflow 1.8 >Reporter: Zachary Lawson > > There is an issue in the current v1-8-stable branch. Looking at the jobs.py > module, if the system does not have email set up but does have a > sla_miss_callback defined in the DAG, that sla_miss_callback is repeated for > that job infinitely as long as the airflow scheduler is running. The > offending code seems to be in the query to the airflow meta database which > filters to sla_miss records that have *either* email_sent or > notification_sent as false ([see lines > 606-613|https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/jobs.py#L606-L613]), > but then executes the sla_miss_callback function regardless if > notification_sent was true ([see lines > 644-648|https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/jobs.py#L644-L648]). > A conditional statement should be put prior to executing the > sla_miss_callback to check whether a notification has been sent to prevent > this. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (AIRFLOW-988) SLA Misses Are Repeated if Email is Not being Used
[ https://issues.apache.org/jira/browse/AIRFLOW-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zachary Lawson updated AIRFLOW-988: --- Description: There is an issue in the current v1-8-stable branch. Looking at the jobs.py module, if the system does not have email set up but does have a sla_miss_callback defined in the DAG, that sla_miss_callback is repeated for that job infinitely as long as the airflow scheduler is running. The offending code seems to be in the query to the airflow meta database which filters to sla_miss records that have *either* email_sent or notification_sent as false ([see lines 606-613|https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/jobs.py#L606-L613]), but then executes the sla_miss_callback function regardless if notification_sent was true ([see lines 644-648|https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/jobs.py#L644-L648]). A conditional statement should be put prior to executing the sla_miss_callback to check whether a notification has been sent to prevent this. (was: There is an issue in the current v1-8-stable branch. Looking at the jobs.py module, if the system does not have email set up but does have a sla_miss_callback defined in the DAG, that sla_miss_callback is repeated for that job infinitely as long as the airflow scheduler is running. The offending code seems to be in the query to the airflow meta database which filters to sla_miss records that have *either* email_sent or notification_sent as false ([see lines 606-613|https://github.com/apache/incubator-airflow/blob/bb39078a35cf2bceea58d7831d7a2028c8ef849f/airflow/jobs.py#L606-L613]), but then executes the sla_miss_callback function regardless if notification_sent was true ([see lines 644-648|https://github.com/apache/incubator-airflow/blob/bb39078a35cf2bceea58d7831d7a2028c8ef849f/airflow/jobs.py#L644-L648]). 
A conditional statement should be put prior to executing the sla_miss_callback to check whether a notification has been sent to prevent this.) > SLA Misses Are Repeated if Email is Not being Used > -- > > Key: AIRFLOW-988 > URL: https://issues.apache.org/jira/browse/AIRFLOW-988 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: Airflow 1.8 >Reporter: Zachary Lawson > > There is an issue in the current v1-8-stable branch. Looking at the jobs.py > module, if the system does not have email set up but does have a > sla_miss_callback defined in the DAG, that sla_miss_callback is repeated for > that job infinitely as long as the airflow scheduler is running. The > offending code seems to be in the query to the airflow meta database which > filters to sla_miss records that have *either* email_sent or > notification_sent as false ([see lines > 606-613|https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/jobs.py#L606-L613]), > but then executes the sla_miss_callback function regardless if > notification_sent was true ([see lines > 644-648|https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/jobs.py#L644-L648]). > A conditional statement should be put prior to executing the > sla_miss_callback to check whether a notification has been sent to prevent > this. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (AIRFLOW-988) SLA Misses Are Repeated if Email is Not being Used
Zachary Lawson created AIRFLOW-988: -- Summary: SLA Misses Are Repeated if Email is Not being Used Key: AIRFLOW-988 URL: https://issues.apache.org/jira/browse/AIRFLOW-988 Project: Apache Airflow Issue Type: Bug Affects Versions: Airflow 1.8 Reporter: Zachary Lawson There is an issue in the current v1-8-stable branch. Looking at the jobs.py module, if the system does not have email set up but does have a sla_miss_callback defined in the DAG, that sla_miss_callback is repeated for that job infinitely as long as the airflow scheduler is running. The offending code seems to be in the query to the airflow meta database which filters to sla_miss records that have *either* email_sent or notification_sent as false ([see lines 606-613|https://github.com/apache/incubator-airflow/blob/bb39078a35cf2bceea58d7831d7a2028c8ef849f/airflow/jobs.py#L606-L613]), but then executes the sla_miss_callback function regardless if notification_sent was true ([see lines 644-648|https://github.com/apache/incubator-airflow/blob/bb39078a35cf2bceea58d7831d7a2028c8ef849f/airflow/jobs.py#L644-L648]). A conditional statement should be put prior to executing the sla_miss_callback to check whether a notification has been sent to prevent this. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (AIRFLOW-903) Add configuration setting for default DAG view.
[ https://issues.apache.org/jira/browse/AIRFLOW-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremiah Lowin resolved AIRFLOW-903. Resolution: Fixed Fix Version/s: (was: Airflow 1.8) 1.9.0 Issue resolved by pull request #2103 [https://github.com/apache/incubator-airflow/pull/2103] > Add configuration setting for default DAG view. > --- > > Key: AIRFLOW-903 > URL: https://issues.apache.org/jira/browse/AIRFLOW-903 > Project: Apache Airflow > Issue Type: Improvement >Affects Versions: Airflow 1.8 >Reporter: Jason Kromm >Assignee: Jason Kromm >Priority: Minor > Fix For: 1.9.0 > > > The default view when clicking on a DAG used to be graph view; it is now tree > view instead. There should be a configuration setting default_dag_view = > ['tree','graph','duration','gantt','landing_times'] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
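Per the merged change, the option is named dag_default_view and lives in the [webserver] section of airflow.cfg; for example, to restore the old graph default:

```
[webserver]
# one of: tree, graph, duration, gantt, landing_times
dag_default_view = graph
```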
[jira] [Commented] (AIRFLOW-903) Add configuration setting for default DAG view.
[ https://issues.apache.org/jira/browse/AIRFLOW-903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926086#comment-15926086 ] ASF subversion and git services commented on AIRFLOW-903: - Commit cadfae54bc0f8bf01582733135595e1d34b3b3fe in incubator-airflow's branch refs/heads/master from [~jakromm] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=cadfae5 ] [AIRFLOW-903] New configuration setting for the default dag view Added a new configuration setting for the default view a dag should display when clicked on the index page. Make sure we do lower for jinja url_for function Closes #2103 from jakromm/master > Add configuration setting for default DAG view. > --- > > Key: AIRFLOW-903 > URL: https://issues.apache.org/jira/browse/AIRFLOW-903 > Project: Apache Airflow > Issue Type: Improvement >Affects Versions: Airflow 1.8 >Reporter: Jason Kromm >Assignee: Jason Kromm >Priority: Minor > Fix For: 1.9.0 > > > The default view when clicking on a DAG used to be graph view; it is now tree > view instead. There should be a configuration setting default_dag_view = > ['tree','graph','duration','gantt','landing_times'] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
incubator-airflow git commit: [AIRFLOW-903] New configuration setting for the default dag view
Repository: incubator-airflow Updated Branches: refs/heads/master b17bd31d1 -> cadfae54b [AIRFLOW-903] New configuration setting for the default dag view Added a new configuration setting for the default view a dag should display when clicked on the index page. Make sure we do lower for jinja url_for function Closes #2103 from jakromm/master Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/cadfae54 Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/cadfae54 Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/cadfae54 Branch: refs/heads/master Commit: cadfae54bc0f8bf01582733135595e1d34b3b3fe Parents: b17bd31 Author: Jason KrommAuthored: Wed Mar 15 08:46:31 2017 -0400 Committer: Jeremiah Lowin Committed: Wed Mar 15 08:46:31 2017 -0400 -- airflow/configuration.py| 5 + airflow/models.py | 4 airflow/www/templates/airflow/dags.html | 2 +- 3 files changed, 10 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/cadfae54/airflow/configuration.py -- diff --git a/airflow/configuration.py b/airflow/configuration.py index cfccbe9..fb3c11e 100644 --- a/airflow/configuration.py +++ b/airflow/configuration.py @@ -260,6 +260,10 @@ filter_by_owner = False # in order to user the ldapgroup mode. owner_mode = user +# Default DAG view. Valid values are: +# tree, graph, duration, gantt, landing_times +dag_default_view = tree + # Default DAG orientation. 
Valid values are: # LR (Left->Right), TB (Top->Bottom), RL (Right->Left), BT (Bottom->Top) dag_orientation = LR @@ -481,6 +485,7 @@ base_url = http://localhost:8080 web_server_host = 0.0.0.0 web_server_port = 8080 dag_orientation = LR +dag_default_view = tree log_fetch_timeout_sec = 5 hide_paused_dags_by_default = False http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/cadfae54/airflow/models.py -- diff --git a/airflow/models.py b/airflow/models.py index 1244d60..27a5670 100755 --- a/airflow/models.py +++ b/airflow/models.py @@ -2632,6 +2632,8 @@ class DAG(BaseDag, LoggingMixin): :param sla_miss_callback: specify a function to call when reporting SLA timeouts. :type sla_miss_callback: types.FunctionType +:param default_view: Specify DAG default view (tree, graph, duration, gantt, landing_times) +:type default_view: string :param orientation: Specify DAG orientation in graph view (LR, TB, RL, BT) :type orientation: string :param catchup: Perform scheduler catchup (or only run latest)? 
Defaults to True @@ -2652,6 +2654,7 @@ class DAG(BaseDag, LoggingMixin): 'core', 'max_active_runs_per_dag'), dagrun_timeout=None, sla_miss_callback=None, +default_view=configuration.get('webserver', 'dag_default_view').lower(), orientation=configuration.get('webserver', 'dag_orientation'), catchup=configuration.getboolean('scheduler', 'catchup_by_default'), params=None): @@ -2695,6 +2698,7 @@ class DAG(BaseDag, LoggingMixin): self.max_active_runs = max_active_runs self.dagrun_timeout = dagrun_timeout self.sla_miss_callback = sla_miss_callback +self.default_view = default_view self.orientation = orientation self.catchup = catchup http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/cadfae54/airflow/www/templates/airflow/dags.html -- diff --git a/airflow/www/templates/airflow/dags.html b/airflow/www/templates/airflow/dags.html index 379f153..8a5a346 100644 --- a/airflow/www/templates/airflow/dags.html +++ b/airflow/www/templates/airflow/dags.html @@ -73,7 +73,7 @@ {% if dag_id in webserver_dags %} - + {{ dag_id }} {% else %}
[jira] [Commented] (AIRFLOW-979) Add GovTech GDS
[ https://issues.apache.org/jira/browse/AIRFLOW-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926058#comment-15926058 ] ASF subversion and git services commented on AIRFLOW-979: - Commit b17bd31d1f6a55eb36d156be3e3fe10bac77466c in incubator-airflow's branch refs/heads/master from chrissng [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=b17bd31 ] [AIRFLOW-979] Add GovTech GDS Closes #2149 from chrissng/add-govtech-gds > Add GovTech GDS > --- > > Key: AIRFLOW-979 > URL: https://issues.apache.org/jira/browse/AIRFLOW-979 > Project: Apache Airflow > Issue Type: Wish > Components: docs >Reporter: Chris Sng >Assignee: Chris Sng >Priority: Trivial > Labels: documentation > Fix For: 1.9.0 > > > Add to README.md: > ``` > 1. [GovTech GDS](https://gds-gov.tech) > [[@chrissng](https://github.com/chrissng) & > [@datagovsg](https://github.com/datagovsg)] > ``` -- This message was sent by Atlassian JIRA (v6.3.15#6346)
incubator-airflow git commit: [AIRFLOW-979] Add GovTech GDS
Repository: incubator-airflow Updated Branches: refs/heads/master c44e2009e -> b17bd31d1 [AIRFLOW-979] Add GovTech GDS Closes #2149 from chrissng/add-govtech-gds Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/b17bd31d Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/b17bd31d Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/b17bd31d Branch: refs/heads/master Commit: b17bd31d1f6a55eb36d156be3e3fe10bac77466c Parents: c44e200 Author: chrissng Authored: Wed Mar 15 08:33:33 2017 -0400 Committer: Jeremiah Lowin Committed: Wed Mar 15 08:33:43 2017 -0400 -- README.md | 1 + 1 file changed, 1 insertion(+) -- http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/b17bd31d/README.md -- diff --git a/README.md b/README.md index 2769df9..fb268c9 100644 --- a/README.md +++ b/README.md @@ -106,6 +106,7 @@ Currently **officially** using Airflow: 1. [FreshBooks](https://github.com/freshbooks) [[@DinoCow](https://github.com/DinoCow)] 1. [Gentner Lab](http://github.com/gentnerlab) [[@neuromusic](https://github.com/neuromusic)] 1. [Glassdoor](https://github.com/Glassdoor) [[@syvineckruyk](https://github.com/syvineckruyk)] +1. [GovTech GDS](https://gds-gov.tech) [[@chrissng](https://github.com/chrissng) & [@datagovsg](https://github.com/datagovsg)] 1. [Gusto](https://gusto.com) [[@frankhsu](https://github.com/frankhsu)] 1. [Handshake](https://joinhandshake.com/) [[@mhickman](https://github.com/mhickman)] 1. [Handy](http://www.handy.com/careers/73115?gh_jid=73115_src=o5qcxn) [[@marcintustin](https://github.com/marcintustin) / [@mtustin-handy](https://github.com/mtustin-handy)]
[jira] [Commented] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments
[ https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925636#comment-15925636 ] Ruslan Dautkhanov commented on AIRFLOW-987: --- {quote} btw you need to set the arguments in the config file, it doesn't accept them from the command line this way. {quote} yep, see my previous comment. thanks. > `airflow kerberos` ignores --keytab and --principal arguments > - > > Key: AIRFLOW-987 > URL: https://issues.apache.org/jira/browse/AIRFLOW-987 > Project: Apache Airflow > Issue Type: Bug > Components: security >Affects Versions: Airflow 1.8 > Environment: 1.8-rc5 >Reporter: Ruslan Dautkhanov >Assignee: Bolke de Bruin > Labels: easyfix, kerberos, security > > No matter which arguments I pass to `airflow kerberos`, > it always executes as `kinit -r 3600m -k -t airflow.keytab -c > /tmp/airflow_krb5_ccache airflow` > So it fails with the expected "kinit: Keytab contains no suitable keys for > airf...@corp.some.com while getting initial credentials" > Tried different arguments, -kt and --keytab, here's one of the runs (some > lines wrapped for readability): > {noformat} > $ airflow kerberos -kt /home/rdautkha/.keytab rdautkha...@corp.some.com > [2017-03-14 23:50:11,523] {__init__.py:57} INFO - Using executor LocalExecutor > [2017-03-14 23:50:12,069] {kerberos.py:43} INFO - Reinitting kerberos from > keytab: > kinit -r 3600m -k -t airflow.keytab -c /tmp/airflow_krb5_ccache airflow > [2017-03-14 23:50:12,080] {kerberos.py:55} ERROR - > Couldn't reinit from keytab! `kinit' exited with 1. > kinit: Keytab contains no suitable keys for airf...@corp.some.com > while getting initial credentials > {noformat} > 1.8-rc5 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments
[ https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925635#comment-15925635 ] Bolke de Bruin commented on AIRFLOW-987: Ah ok, so yes this is an issue, but a fix won't be in 1.8.0 > `airflow kerberos` ignores --keytab and --principal arguments > - > > Key: AIRFLOW-987 > URL: https://issues.apache.org/jira/browse/AIRFLOW-987 > Project: Apache Airflow > Issue Type: Bug > Components: security >Affects Versions: Airflow 1.8 > Environment: 1.8-rc5 >Reporter: Ruslan Dautkhanov >Assignee: Bolke de Bruin > Labels: easyfix, kerberos, security > > No matter which arguments I pass to `airflow kerberos`, > it always executes as `kinit -r 3600m -k -t airflow.keytab -c > /tmp/airflow_krb5_ccache airflow` > So it fails with the expected "kinit: Keytab contains no suitable keys for > airf...@corp.some.com while getting initial credentials" > Tried different arguments, -kt and --keytab, here's one of the runs (some > lines wrapped for readability): > {noformat} > $ airflow kerberos -kt /home/rdautkha/.keytab rdautkha...@corp.some.com > [2017-03-14 23:50:11,523] {__init__.py:57} INFO - Using executor LocalExecutor > [2017-03-14 23:50:12,069] {kerberos.py:43} INFO - Reinitting kerberos from > keytab: > kinit -r 3600m -k -t airflow.keytab -c /tmp/airflow_krb5_ccache airflow > [2017-03-14 23:50:12,080] {kerberos.py:55} ERROR - > Couldn't reinit from keytab! `kinit' exited with 1. > kinit: Keytab contains no suitable keys for airf...@corp.some.com > while getting initial credentials > {noformat} > 1.8-rc5 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Reopened] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments
[ https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bolke de Bruin reopened AIRFLOW-987:
[jira] [Comment Edited] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments
[ https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925630#comment-15925630 ] Ruslan Dautkhanov edited comment on AIRFLOW-987 at 3/15/17 6:47 AM:
---
kerberos.py:39 - it always gets the principal and keytab from configuration (airflow.cfg):
https://github.com/apache/incubator-airflow/blob/master/airflow/security/kerberos.py#L39
{code}
"-t", configuration.get('kerberos', 'keytab'),   # specify keytab
"-c", configuration.get('kerberos', 'ccache'),   # specify credentials cache
{code}
Notice the help for `airflow kerberos`:
{noformat}
$ airflow kerberos -h
[2017-03-15 00:40:12,215] {__init__.py:57} INFO - Using executor LocalExecutor
usage: airflow kerberos [-h] [-kt [KEYTAB]] [--pid [PID]] [-D]
                        [--stdout STDOUT] [--stderr STDERR] [-l LOG_FILE]
                        [principal]
{noformat}
One might think that principal and keytab can be provided as `airflow kerberos` arguments; that's not true, and it's a bug. It's not a critical bug, though, as I was able to make `airflow kerberos` work just by adding a kerberos section to airflow.cfg. `airflow kerberos -h` has to be corrected to reflect that `airflow kerberos` doesn't actually accept principal and keytab as arguments. Thank you.
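The behaviour described in this comment can be sketched in a few lines: because every part of the kinit command is read from the config file, CLI flags never reach kinit. The sketch below is not Airflow's actual code; it uses the stdlib `configparser` in place of Airflow's `configuration` module, and the option names (`keytab`, `ccache`, `principal`, `reinit_frequency`) follow the kerberos.py snippet quoted in this thread.

```python
# Minimal sketch (not Airflow's actual code) of why CLI arguments are
# ignored: the kinit command line is assembled purely from config values.
import configparser

def build_kinit_cmd(cfg: configparser.ConfigParser) -> list:
    """Assemble the kinit invocation from the [kerberos] config section.

    Mirrors the pattern at kerberos.py#L39: every value comes from the
    config file, so any -kt / principal CLI arguments are never consulted.
    """
    return [
        "kinit",
        "-r", cfg.get("kerberos", "reinit_frequency", fallback="3600m"),
        "-k",                                       # keytab-based (non-interactive) auth
        "-t", cfg.get("kerberos", "keytab"),        # keytab path from config only
        "-c", cfg.get("kerberos", "ccache"),        # credentials cache from config only
        cfg.get("kerberos", "principal"),           # principal from config only
    ]

cfg = configparser.ConfigParser()
cfg.read_string("""
[kerberos]
keytab = airflow.keytab
ccache = /tmp/airflow_krb5_ccache
principal = airflow
""")

# Reproduces the command reported in the logs:
# kinit -r 3600m -k -t airflow.keytab -c /tmp/airflow_krb5_ccache airflow
print(" ".join(build_kinit_cmd(cfg)))
```

With the defaults above, the assembled command matches the one in the reporter's log output exactly, regardless of what was passed on the `airflow kerberos` command line.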
[jira] [Commented] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments
[ https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925634#comment-15925634 ] Bolke de Bruin commented on AIRFLOW-987:
---
btw you need to set the arguments in the config file, it doesn't accept them from the command line this way.
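As a rough illustration of that workaround, a kerberos section in airflow.cfg might look like the following. The option names `keytab`, `ccache`, and `principal` are taken from the kerberos.py snippet quoted in this thread; `reinit_frequency` is an assumption inferred from the `-r 3600m` flag in the logs, and the principal value is a placeholder. Check your Airflow version's configuration reference for the exact keys and defaults.

```ini
; Hypothetical example values; replace the principal and keytab
; path with your own. Key names are assumptions, not verified.
[kerberos]
ccache = /tmp/airflow_krb5_ccache
principal = your_principal@EXAMPLE.COM
reinit_frequency = 3600
keytab = /home/rdautkha/.keytab
```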
[jira] [Commented] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments
[ https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925630#comment-15925630 ] Ruslan Dautkhanov commented on AIRFLOW-987:
---
kerberos.py:39 - it always gets the principal and keytab from configuration (airflow.cfg):
https://github.com/apache/incubator-airflow/blob/master/airflow/security/kerberos.py#L39
{code}
"-t", configuration.get('kerberos', 'keytab'),   # specify keytab
"-c", configuration.get('kerberos', 'ccache'),   # specify credentials cache
{code}
Notice the help for `airflow kerberos`:
{noformat}
$ airflow kerberos -h
[2017-03-15 00:40:12,215] {__init__.py:57} INFO - Using executor LocalExecutor
usage: airflow kerberos [-h] [-kt [KEYTAB]] [--pid [PID]] [-D]
                        [--stdout STDOUT] [--stderr STDERR] [-l LOG_FILE]
                        [principal]
{noformat}
One might think that principal and keytab can be provided as `airflow kerberos` arguments; that's not true, and it's a bug. It's not a critical bug, though, as I was able to make `airflow kerberos` work just by adding a kerberos section to airflow.cfg. `airflow kerberos -h` has to be corrected to reflect that `airflow kerberos` doesn't actually accept principal and keytab as arguments. Thank you.
[jira] [Commented] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments
[ https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925625#comment-15925625 ] Bolke de Bruin commented on AIRFLOW-987:
---
I cannot reproduce it, and we are using Kerberos in production. The error you are showing comes from Kerberos, not from Airflow. So you need to be very elaborate about why you think this is an Airflow bug and not a configuration issue.
[jira] [Comment Edited] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments
[ https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925608#comment-15925608 ] Ruslan Dautkhanov edited comment on AIRFLOW-987 at 3/15/17 6:24 AM:
---
I use kinit very often and am familiar with the tool. kinit works fine outside of Airflow:
{noformat}
$ kinit -kt /home/rdautkha/.keytab rdautkha...@corp.some.com; echo $?
0
rdautkha@pc1udatahgw01 airflow $ klist | grep "03/15/17"
03/15/17 00:19:38  03/15/17 10:19:40  krbtgt/corp.some@corp.some.com
{noformat}
(I've changed the realm.)
In case you didn't notice, `airflow kerberos` used "airflow" as the principal and "airflow.keytab" as the keytab in the output dump above, no matter which parameters I give.
[jira] [Commented] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments
[ https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925608#comment-15925608 ] Ruslan Dautkhanov commented on AIRFLOW-987:
---
I use kinit very often and am familiar with the tool. kinit works fine outside of Airflow:
{noformat}
$ kinit -kt /home/rdautkha/.keytab rdautkha...@corp.some.com; echo $?
0
rdautkha@pc1udatahgw01 airflow $ klist | grep "03/15/17"
03/15/17 00:19:38  03/15/17 10:19:40  krbtgt/corp.epsilon@corp.some.com
{noformat}
(I've changed the realm.)
In case you didn't notice, `airflow kerberos` used "airflow" as the principal and "airflow.keytab" as the keytab in the output dump above, no matter which parameters I give.
[jira] [Commented] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments
[ https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925604#comment-15925604 ] Bolke de Bruin commented on AIRFLOW-987:
---
This is a Kerberos error. You are specifying invalid credentials.
[jira] [Resolved] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments
[ https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bolke de Bruin resolved AIRFLOW-987.
Resolution: Not A Bug
Assignee: Bolke de Bruin