[jira] [Commented] (AIRFLOW-1038) Specify celery serializers explicitly and pin version
[ https://issues.apache.org/jira/browse/AIRFLOW-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954271#comment-15954271 ] ASF subversion and git services commented on AIRFLOW-1038: -- Commit 34ee1dc0373708f7db0a562ac470338c6126d20a in incubator-airflow's branch refs/heads/master from [~saguziel] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=34ee1dc ] [AIRFLOW-1038] Specify celery serialization options explicitly Specify the CELERY_TASK_SERIALIZER and CELERY_RESULT_SERIALIZER as pickle explicitly, and CELERY_EVENT_SERIALIZER as json. > Specify celery serializers explicitly and pin version > - > > Key: AIRFLOW-1038 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1038 > Project: Apache Airflow > Issue Type: Bug >Reporter: Alex Guziel >Assignee: Alex Guziel > Fix For: 1.9.0 > > > Celery 3->4 upgrade changes the default task and result serializer from > pickle to json. Pickle is faster and supports more types > http://docs.celeryproject.org/en/latest/userguide/calling.html > This also causes issues when different versions of celery are running on > different hosts. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (AIRFLOW-1038) Specify celery serializers explicitly and pin version
[ https://issues.apache.org/jira/browse/AIRFLOW-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Wiedmer resolved AIRFLOW-1038. - Resolution: Fixed Fix Version/s: 1.9.0 Issue resolved by pull request #2185 [https://github.com/apache/incubator-airflow/pull/2185] > Specify celery serializers explicitly and pin version > - > > Key: AIRFLOW-1038 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1038 > Project: Apache Airflow > Issue Type: Bug >Reporter: Alex Guziel >Assignee: Alex Guziel > Fix For: 1.9.0 > > > Celery 3->4 upgrade changes the default task and result serializer from > pickle to json. Pickle is faster and supports more types > http://docs.celeryproject.org/en/latest/userguide/calling.html > This also causes issues when different versions of celery are running on > different hosts. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[1/2] incubator-airflow git commit: [AIRFLOW-1038] Specify celery serialization options explicitly
Repository: incubator-airflow Updated Branches: refs/heads/master c64e876bd -> 75addb4a9 [AIRFLOW-1038] Specify celery serialization options explicitly Specify the CELERY_TASK_SERIALIZER and CELERY_RESULT_SERIALIZER as pickle explicitly, and CELERY_EVENT_SERIALIZER as json. Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/34ee1dc0 Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/34ee1dc0 Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/34ee1dc0 Branch: refs/heads/master Commit: 34ee1dc0373708f7db0a562ac470338c6126d20a Parents: b2b9587 Author: Alex GuzielAuthored: Fri Mar 24 11:51:39 2017 -0700 Committer: Alex Guziel Committed: Mon Apr 3 15:33:56 2017 -0700 -- airflow/executors/celery_executor.py | 3 +++ 1 file changed, 3 insertions(+) -- http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/34ee1dc0/airflow/executors/celery_executor.py -- diff --git a/airflow/executors/celery_executor.py b/airflow/executors/celery_executor.py index 04414fb..e0c94c1 100644 --- a/airflow/executors/celery_executor.py +++ b/airflow/executors/celery_executor.py @@ -36,6 +36,9 @@ DEFAULT_QUEUE = configuration.get('celery', 'DEFAULT_QUEUE') class CeleryConfig(object): CELERY_ACCEPT_CONTENT = ['json', 'pickle'] +CELERY_EVENT_SERIALIZER = 'json' +CELERY_RESULT_SERIALIZER = 'pickle' +CELERY_TASK_SERIALIZER = 'pickle' CELERYD_PREFETCH_MULTIPLIER = 1 CELERY_ACKS_LATE = True BROKER_URL = configuration.get('celery', 'BROKER_URL')
[2/2] incubator-airflow git commit: Merge pull request #2185 from saguziel/aguziel-celery-fix
Merge pull request #2185 from saguziel/aguziel-celery-fix Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/75addb4a Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/75addb4a Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/75addb4a Branch: refs/heads/master Commit: 75addb4a9fba57df39d59d26d15831080fb30ef0 Parents: c64e876 34ee1dc Author: Arthur WiedmerAuthored: Mon Apr 3 15:42:16 2017 -0700 Committer: Arthur Wiedmer Committed: Mon Apr 3 15:42:16 2017 -0700 -- airflow/executors/celery_executor.py | 3 +++ 1 file changed, 3 insertions(+) --
[jira] [Created] (AIRFLOW-1064) TaskInstanceModelView is slow
Alex Guziel created AIRFLOW-1064: Summary: TaskInstanceModelView is slow Key: AIRFLOW-1064 URL: https://issues.apache.org/jira/browse/AIRFLOW-1064 Project: Apache Airflow Issue Type: Improvement Reporter: Alex Guziel Assignee: Alex Guziel Due to a bad query (a full table scan), the TaskInstanceModelView is very slow. Adding an index is costly, and job_id is a good approximation. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Closed] (AIRFLOW-1052) dags/test_dag.py can not be imported
[ https://issues.apache.org/jira/browse/AIRFLOW-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kengo Seki closed AIRFLOW-1052. --- Resolution: Duplicate > dags/test_dag.py can not be imported > > > Key: AIRFLOW-1052 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1052 > Project: Apache Airflow > Issue Type: Bug >Reporter: Kengo Seki >Assignee: Kengo Seki >Priority: Minor > > AIRFLOW-863 broke dags/test_dag.py as follows: > {code} > [2017-03-29 14:53:39,268] [13910] {models.py:271} ERROR - Failed to import: > /home/sekikn/incubator-airflow/dags/test_dag.py > Traceback (most recent call last): > File "/home/sekikn/incubator-airflow/airflow/models.py", line 268, in > process_file > m = imp.load_source(mod_name, filepath) > File "/home/sekikn/incubator-airflow/dags/test_dag.py", line 27, in > 'start_date': airflow.utils.dates.days_ago(2) > NameError: name 'airflow' is not defined > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (AIRFLOW-1011) Fix bug in BackfillJob._execute() for SubDAGs
[ https://issues.apache.org/jira/browse/AIRFLOW-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bolke de Bruin updated AIRFLOW-1011: Fix Version/s: 1.8.1 > Fix bug in BackfillJob._execute() for SubDAGs > - > > Key: AIRFLOW-1011 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1011 > Project: Apache Airflow > Issue Type: Bug > Components: backfill, subdag >Affects Versions: 1.8.0 >Reporter: Joe Schmid >Priority: Blocker > Fix For: 1.8.1 > > Attachments: 1-TopLevelDAGTaskInstancesShownCorrectly.png, > 2-ZoomedSubDAG-NoTaskInstances-v1.8.png, > 3-ZoomedSubDAG-TaskInstances-v1.7.1.3.png, subdag_task_instance_logs.txt, > test_subdag.py > > > The attached test SubDAG is not executed when the parent DAG is triggered > manually. Attached is a simple test DAG that exhibits the issue along with > screenshots showing the UI differences between v1.8 and v1.7.1.3. > Note that if the DAG is run via backfill from command line (e.g. "airflow > backfill Test_SubDAG -s 2017-03-18 -e 2017-03-18") the task instances show up > successfully. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (AIRFLOW-1003) DAG status flips erraticly
[ https://issues.apache.org/jira/browse/AIRFLOW-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bolke de Bruin updated AIRFLOW-1003: Fix Version/s: (was: 1.8.1) > DAG status flips erraticly > -- > > Key: AIRFLOW-1003 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1003 > Project: Apache Airflow > Issue Type: Bug > Components: ui >Affects Versions: Airflow 1.8, 1.8.0rc5 >Reporter: Ruslan Dautkhanov > > Created a flow based on sample tutorial > https://airflow.incubator.apache.org/tutorial.html > (just changed dag-id to 'turorial-RD'). > See a 10 seconds video on this behavior: > http://screencast-o-matic.com/watch/cbebrn6kBw > Notice last DAG 'turorial-RD' keeps changing state > (it loses link and all the buttons and icons to show status > of the dag runs / tasks etc), then links and icons show back up. > In fact, during that short 10 seconds period it transitioned > that state 4 times (from "disabled" to a normal DAG). > There were no changes happenning in the system - I was just clicking > refresh in browser from time to time. All DAGs were disabled while > this was happening. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (AIRFLOW-1019) active_dagruns shouldn't include paused DAGs
[ https://issues.apache.org/jira/browse/AIRFLOW-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bolke de Bruin updated AIRFLOW-1019: Issue Type: Improvement (was: Bug) > active_dagruns shouldn't include paused DAGs > > > Key: AIRFLOW-1019 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1019 > Project: Apache Airflow > Issue Type: Improvement > Components: scheduler >Affects Versions: 1.8.0 >Reporter: Dan Davydov >Priority: Critical > Fix For: 1.8.1 > > > Since 1.8.0 Airflow resets orphaned tasks (tasks that are in the DB but not > in the executor's memory). The problem is that Airflow counts dagruns in > paused DAGs as running as long as the dagruns state is running. Instead we > should join against non-paused DAGs everywhere we calculate active dagruns > (e.g. in _process_task_instances in the Scheduler class in jobs.py). If there > are enough paused DAGs it brings the scheduler to a halt especially on > scheduler restarts. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (AIRFLOW-1019) active_dagruns shouldn't include paused DAGs
[ https://issues.apache.org/jira/browse/AIRFLOW-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bolke de Bruin updated AIRFLOW-1019: Affects Version/s: 1.8.0 > active_dagruns shouldn't include paused DAGs > > > Key: AIRFLOW-1019 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1019 > Project: Apache Airflow > Issue Type: Improvement > Components: scheduler >Affects Versions: 1.8.0 >Reporter: Dan Davydov >Priority: Critical > Fix For: 1.8.1 > > > Since 1.8.0 Airflow resets orphaned tasks (tasks that are in the DB but not > in the executor's memory). The problem is that Airflow counts dagruns in > paused DAGs as running as long as the dagruns state is running. Instead we > should join against non-paused DAGs everywhere we calculate active dagruns > (e.g. in _process_task_instances in the Scheduler class in jobs.py). If there > are enough paused DAGs it brings the scheduler to a halt especially on > scheduler restarts. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (AIRFLOW-1019) active_dagruns shouldn't include paused DAGs
[ https://issues.apache.org/jira/browse/AIRFLOW-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bolke de Bruin updated AIRFLOW-1019: Priority: Critical (was: Blocker) > active_dagruns shouldn't include paused DAGs > > > Key: AIRFLOW-1019 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1019 > Project: Apache Airflow > Issue Type: Bug >Reporter: Dan Davydov >Priority: Critical > Fix For: 1.8.1 > > > Since 1.8.0 Airflow resets orphaned tasks (tasks that are in the DB but not > in the executor's memory). The problem is that Airflow counts dagruns in > paused DAGs as running as long as the dagruns state is running. Instead we > should join against non-paused DAGs everywhere we calculate active dagruns > (e.g. in _process_task_instances in the Scheduler class in jobs.py). If there > are enough paused DAGs it brings the scheduler to a halt especially on > scheduler restarts. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (AIRFLOW-1054) Fix broken import on test_dag
[ https://issues.apache.org/jira/browse/AIRFLOW-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954091#comment-15954091 ] ASF subversion and git services commented on AIRFLOW-1054: -- Commit 68b1c982e048878ec9dd658072c147e4341bf2c2 in incubator-airflow's branch refs/heads/v1-8-test from Siddharth Anand [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=68b1c98 ] [AIRFLOW-1054] Fix broken import in test_dag Closes #2201 from r39132/fix_broken_import_on_test_dag (cherry picked from commit c64e876bd50eeb6c9e2600ac9d832c05eb5e9640) > Fix broken import on test_dag > - > > Key: AIRFLOW-1054 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1054 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: Airflow 1.8 >Reporter: Siddharth Anand >Assignee: Siddharth Anand >Priority: Minor > Fix For: 1.8.1 > > > Fixes the issue below > {noformat} > [2017-03-29 18:08:54,355] [30387] {models.py:271} ERROR - Failed to import: > /Users/siddharth/Projects/airflow/dags/test_dag.py > Traceback (most recent call last): > File "/Users/siddharth/Projects/airflow/airflow/models.py", line 268, in > process_file > m = imp.load_source(mod_name, filepath) > File "/Users/siddharth/Projects/airflow/dags/test_dag.py", line 27, in > > 'start_date': airflow.utils.dates.days_ago(2) > NameError: name 'airflow' is not defined > [2017-03-29 18:08:54,406] [30388] {models.py:172} INFO - Filling up the > DagBag from /Users/siddharth/Projects/airflow/dags > [2017-03-29 18:08:54,408] [30388] {models.py:271} ERROR - Failed to import: > /Users/siddharth/Projects/airflow/dags/test_dag.py > Traceback (most recent call last): > File "/Users/siddharth/Projects/airflow/airflow/models.py", line 268, in > process_file > m = imp.load_source(mod_name, filepath) > File "/Users/siddharth/Projects/airflow/dags/test_dag.py", line 27, in > > 'start_date': airflow.utils.dates.days_ago(2) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
incubator-airflow git commit: [AIRFLOW-1054] Fix broken import in test_dag
Repository: incubator-airflow Updated Branches: refs/heads/v1-8-test 5eb33358f -> 68b1c982e [AIRFLOW-1054] Fix broken import in test_dag Closes #2201 from r39132/fix_broken_import_on_test_dag (cherry picked from commit c64e876bd50eeb6c9e2600ac9d832c05eb5e9640) Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/68b1c982 Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/68b1c982 Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/68b1c982 Branch: refs/heads/v1-8-test Commit: 68b1c982e048878ec9dd658072c147e4341bf2c2 Parents: 5eb3335 Author: Siddharth AnandAuthored: Mon Apr 3 13:10:51 2017 -0700 Committer: Chris Riccomini Committed: Mon Apr 3 13:11:42 2017 -0700 -- dags/test_dag.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/68b1c982/dags/test_dag.py -- diff --git a/dags/test_dag.py b/dags/test_dag.py index db0b648..f2a9f6a 100644 --- a/dags/test_dag.py +++ b/dags/test_dag.py @@ -12,6 +12,7 @@ # See the License for the specific language governing permissions and # limitations under the License. +from airflow import utils from airflow import DAG from airflow.operators.dummy_operator import DummyOperator from datetime import datetime, timedelta @@ -24,7 +25,7 @@ DAG_NAME = 'test_dag_v1' default_args = { 'owner': 'airflow', 'depends_on_past': True, -'start_date': airflow.utils.dates.days_ago(2) +'start_date': utils.dates.days_ago(2) } dag = DAG(DAG_NAME, schedule_interval='*/10 * * * *', default_args=default_args)
[jira] [Commented] (AIRFLOW-1054) Fix broken import on test_dag
[ https://issues.apache.org/jira/browse/AIRFLOW-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954089#comment-15954089 ] ASF subversion and git services commented on AIRFLOW-1054: -- Commit c64e876bd50eeb6c9e2600ac9d832c05eb5e9640 in incubator-airflow's branch refs/heads/master from Siddharth Anand [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=c64e876 ] [AIRFLOW-1054] Fix broken import in test_dag Closes #2201 from r39132/fix_broken_import_on_test_dag > Fix broken import on test_dag > - > > Key: AIRFLOW-1054 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1054 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: Airflow 1.8 >Reporter: Siddharth Anand >Assignee: Siddharth Anand >Priority: Minor > Fix For: 1.8.1 > > > Fixes the issue below > {noformat} > [2017-03-29 18:08:54,355] [30387] {models.py:271} ERROR - Failed to import: > /Users/siddharth/Projects/airflow/dags/test_dag.py > Traceback (most recent call last): > File "/Users/siddharth/Projects/airflow/airflow/models.py", line 268, in > process_file > m = imp.load_source(mod_name, filepath) > File "/Users/siddharth/Projects/airflow/dags/test_dag.py", line 27, in > > 'start_date': airflow.utils.dates.days_ago(2) > NameError: name 'airflow' is not defined > [2017-03-29 18:08:54,406] [30388] {models.py:172} INFO - Filling up the > DagBag from /Users/siddharth/Projects/airflow/dags > [2017-03-29 18:08:54,408] [30388] {models.py:271} ERROR - Failed to import: > /Users/siddharth/Projects/airflow/dags/test_dag.py > Traceback (most recent call last): > File "/Users/siddharth/Projects/airflow/airflow/models.py", line 268, in > process_file > m = imp.load_source(mod_name, filepath) > File "/Users/siddharth/Projects/airflow/dags/test_dag.py", line 27, in > > 'start_date': airflow.utils.dates.days_ago(2) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
incubator-airflow git commit: [AIRFLOW-1054] Fix broken import in test_dag
Repository: incubator-airflow Updated Branches: refs/heads/master daa281c03 -> c64e876bd [AIRFLOW-1054] Fix broken import in test_dag Closes #2201 from r39132/fix_broken_import_on_test_dag Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/c64e876b Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/c64e876b Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/c64e876b Branch: refs/heads/master Commit: c64e876bd50eeb6c9e2600ac9d832c05eb5e9640 Parents: daa281c Author: Siddharth AnandAuthored: Mon Apr 3 13:10:51 2017 -0700 Committer: Chris Riccomini Committed: Mon Apr 3 13:10:51 2017 -0700 -- dags/test_dag.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/c64e876b/dags/test_dag.py -- diff --git a/dags/test_dag.py b/dags/test_dag.py index db0b648..f2a9f6a 100644 --- a/dags/test_dag.py +++ b/dags/test_dag.py @@ -12,6 +12,7 @@ # See the License for the specific language governing permissions and # limitations under the License. +from airflow import utils from airflow import DAG from airflow.operators.dummy_operator import DummyOperator from datetime import datetime, timedelta @@ -24,7 +25,7 @@ DAG_NAME = 'test_dag_v1' default_args = { 'owner': 'airflow', 'depends_on_past': True, -'start_date': airflow.utils.dates.days_ago(2) +'start_date': utils.dates.days_ago(2) } dag = DAG(DAG_NAME, schedule_interval='*/10 * * * *', default_args=default_args)
[jira] [Commented] (AIRFLOW-1055) airflow/jobs.py:create_dag_run() exception for @once dag when catchup = False
[ https://issues.apache.org/jira/browse/AIRFLOW-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954080#comment-15954080 ] Bolke de Bruin commented on AIRFLOW-1055: - Sid, I don't think this is in create_dag_run() (the traceback doe not list that function) and it only happens when a DAG with a @once schedule has a sla associated with it. Which is not very common. I don't think this should be a blocker (but nevertheless a fix would be very nice) [~criccomini] > airflow/jobs.py:create_dag_run() exception for @once dag when catchup = False > - > > Key: AIRFLOW-1055 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1055 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: Airflow 1.8 >Reporter: Siddharth Anand >Assignee: Siddharth Anand >Priority: Blocker > Labels: dagrun, once, scheduler, sla > Fix For: 1.8.1 > > > Getting following exception > {noformat} > [2017-03-19 20:16:25,786] {jobs.py:354} DagFileProcessor2638 ERROR - Got an > exception! Propagating... > Traceback (most recent call last): > File > "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/jobs.py", > line 346, in helper > pickle_dags) > File > "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/utils/db.py", > line 53, in wrapper > result = func(*args, **kwargs) > File > "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/jobs.py", > line 1581, in process_file > self._process_dags(dagbag, dags, ti_keys_to_schedule) > File > "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/jobs.py", > line 1175, in _process_dags > self.manage_slas(dag) > File > "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/utils/db.py", > line 53, in wrapper > result = func(*args, **kwargs) > File > "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/jobs.py", > line 595, in manage_slas > while dttm < datetime.now(): > TypeError: can't compare datetime.datetime to NoneType > {noformat} > Exception is in airflow/jobs.py:manage_slas() : > https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/jobs.py#L595 > {code} > ts = datetime.now() > SlaMiss = models.SlaMiss > for ti in max_tis: > task = dag.get_task(ti.task_id) > dttm = ti.execution_date > if task.sla: > dttm = dag.following_schedule(dttm) > >>> while dttm < datetime.now(): <<< here > following_schedule = dag.following_schedule(dttm) > if following_schedule + task.sla < datetime.now(): > session.merge(models.SlaMiss( > task_id=ti.task_id, > {code} > It seems that dag.following_schedule() returns None for @once dag? > Here's how dag is defined: > {code} > import datetime as dt > import airflow > from airflow.models import DAG > from airflow.operators.dummy_operator import DummyOperator > from datetime import datetime, timedelta > def sla_alert_func(dag, task_list, blocking_task_list, slas, blocking_tis): > print('Executing SLA miss callback') > now = datetime.now() > now_to_the_hour = now.replace(hour=now.time().hour, minute=0, second=0, > microsecond=0) > START_DATE = now_to_the_hour + timedelta(hours=-3) > DAG_NAME = 'manage_sla_once_dag' > default_args = { > 'owner': 'sanand', > 'depends_on_past': False, > 'start_date': START_DATE, > 'sla': timedelta(hours=2) > } > dag = DAG( > dag_id = 'manage_sla_once_dag', > default_args = default_args, > catchup = False, > schedule_interval = '@once', > sla_miss_callback = sla_alert_func > ) > task1 = DummyOperator(task_id='task1', dag=dag) > {code} > This issue works if catchup = True. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (AIRFLOW-1007) Jinja sandbox is vulnerable to RCE
[ https://issues.apache.org/jira/browse/AIRFLOW-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Wiedmer resolved AIRFLOW-1007. - Resolution: Fixed Fix Version/s: 1.9.0 Issue resolved by pull request #2184 [https://github.com/apache/incubator-airflow/pull/2184] > Jinja sandbox is vulnerable to RCE > -- > > Key: AIRFLOW-1007 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1007 > Project: Apache Airflow > Issue Type: Bug >Reporter: Alex Guziel >Assignee: Alex Guziel > Fix For: 1.9.0 > > > Right now, the jinja template functionality in chart_data takes arbitrary > strings and executes them. We should use the sandbox functionality to prevent > this. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (AIRFLOW-1007) Jinja sandbox is vulnerable to RCE
[ https://issues.apache.org/jira/browse/AIRFLOW-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954034#comment-15954034 ] ASF subversion and git services commented on AIRFLOW-1007: -- Commit daa281c0364609d6812921123cf47e4118b40484 in incubator-airflow's branch refs/heads/master from [~saguziel] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=daa281c ] [AIRFLOW-1007] Use Jinja sandbox for chart_data endpoint Right now, users can put in arbitrary strings into the chart_data endpoint, and execute arbitrary code using the chart_data endpoint. By using literal_eval and ImmutableSandboxedEnvironment, we can reduce RCE. Right now, users can put in arbitrary strings into the chart_data endpoint, and execute arbitrary code using the chart_data endpoint. By using literal_eval and ImmutableSandboxedEnvironment, we can prevent RCE. Dear Airflow maintainers, Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below! ### JIRA - [x] My PR addresses the following [Airflow JIRA] (https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-1007 ### Description - [x] I changed Jinja to use the ImmutableSandboxedEnvironment, and used literal_eval, to limit the amount of RCE. ### Tests - [x] My PR adds the following unit tests: SecurityTest chart_data tests ### Commits - [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git- commit/)": 1. Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. Body explains "what" and "why", not "how" to: aoen plypaul artwr bolkedebruin Closes #2184 from saguziel/aguziel-jinja-2 > Jinja sandbox is vulnerable to RCE > -- > > Key: AIRFLOW-1007 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1007 > Project: Apache Airflow > Issue Type: Bug >Reporter: Alex Guziel >Assignee: Alex Guziel > Fix For: 1.9.0 > > > Right now, the jinja template functionality in chart_data takes arbitrary > strings and executes them. We should use the sandbox functionality to prevent > this. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
incubator-airflow git commit: [AIRFLOW-1007] Use Jinja sandbox for chart_data endpoint
Repository: incubator-airflow Updated Branches: refs/heads/master b55f41f2c -> daa281c03 [AIRFLOW-1007] Use Jinja sandbox for chart_data endpoint Right now, users can put in arbitrary strings into the chart_data endpoint, and execute arbitrary code using the chart_data endpoint. By using literal_eval and ImmutableSandboxedEnvironment, we can reduce RCE. Right now, users can put in arbitrary strings into the chart_data endpoint, and execute arbitrary code using the chart_data endpoint. By using literal_eval and ImmutableSandboxedEnvironment, we can prevent RCE. Dear Airflow maintainers, Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below! ### JIRA - [x] My PR addresses the following [Airflow JIRA] (https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-1007 ### Description - [x] I changed Jinja to use the ImmutableSandboxedEnvironment, and used literal_eval, to limit the amount of RCE. ### Tests - [x] My PR adds the following unit tests: SecurityTest chart_data tests ### Commits - [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git- commit/)": 1. Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. Body explains "what" and "why", not "how" to: aoen plypaul artwr bolkedebruin Closes #2184 from saguziel/aguziel-jinja-2 Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/daa281c0 Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/daa281c0 Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/daa281c0 Branch: refs/heads/master Commit: daa281c0364609d6812921123cf47e4118b40484 Parents: b55f41f Author: Alex GuzielAuthored: Mon Apr 3 12:16:00 2017 -0700 Committer: Arthur Wiedmer Committed: Mon Apr 3 12:16:00 2017 -0700 -- airflow/www/views.py | 7 --- tests/core.py| 38 ++ 2 files changed, 42 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/daa281c0/airflow/www/views.py -- diff --git a/airflow/www/views.py b/airflow/www/views.py index 0def0a9..a9bab31 100644 --- a/airflow/www/views.py +++ b/airflow/www/views.py @@ -43,7 +43,7 @@ from flask_admin.tools import iterdecode from flask_login import flash from flask._compat import PY2 -import jinja2 +from jinja2.sandbox import ImmutableSandboxedEnvironment import markdown import nvd3 @@ -328,8 +328,9 @@ class Airflow(BaseView): request_dict = {k: request.args.get(k) for k in request.args} args.update(request_dict) args['macros'] = macros -sql = jinja2.Template(chart.sql).render(**args) -label = jinja2.Template(chart.label).render(**args) +sandbox = ImmutableSandboxedEnvironment() +sql = sandbox.from_string(chart.sql).render(**args) +label = sandbox.from_string(chart.label).render(**args) payload['sql_html'] = Markup(highlight( sql, lexers.SqlLexer(), # Lexer call http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/daa281c0/tests/core.py -- diff --git a/tests/core.py b/tests/core.py index bd52d19..997bb42 100644 --- a/tests/core.py +++ b/tests/core.py @@ -63,6 +63,8 @@ from airflow.utils.logging import LoggingMixin from lxml import html from airflow.exceptions import AirflowException from airflow.configuration import AirflowConfigException, run_command +from jinja2.sandbox import SecurityError +from jinja2 import UndefinedError import six @@ -1469,6 +1471,42 @@ class SecurityTests(unittest.TestCase): response = self.app.get("/admin/log", follow_redirects=True) self.assertIn(bleach.clean("alert(123456)"), response.data.decode('UTF-8')) +def test_chart_data_template(self): +"""Protect chart_data from being able to do RCE.""" +session = settings.Session() +Chart = models.Chart +chart1 = Chart( +label='insecure_chart', +conn_id='airflow_db', +chart_type='bar', +sql="SELECT {{ ''.__class__.__mro__[1].__subclasses__() }}"
[jira] [Created] (AIRFLOW-1063) A manually-created DAG run can prevent a scheduled run to be created
Vitor Baptista created AIRFLOW-1063: --- Summary: A manually-created DAG run can prevent a scheduled run to be created Key: AIRFLOW-1063 URL: https://issues.apache.org/jira/browse/AIRFLOW-1063 Project: Apache Airflow Issue Type: Improvement Components: scheduler Affects Versions: Airflow 1.7.1.3 Reporter: Vitor Baptista I manually created a DAG Run with the {{execution_date}} as {{2017-03-01 00:00:00}} on a monthly-recurrent DAG. After a while, I noticed that the scheduled run was never created and checked the scheduler's logs, finding this traceback: {quote} Process Process-475397: Traceback (most recent call last): File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap self.run() File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run self._target(*self._args, **self._kwargs) File "/usr/local/lib/python2.7/dist-packages/airflow/jobs.py", line 664, in _do_dags dag = dagbag.get_dag(dag.dag_id) File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 188, in get_dag orm_dag = DagModel.get_current(root_dag_id) File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 2320, in get_current obj = session.query(cls).filter(cls.dag_id == dag_id).first() File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2690, in first ret = list(self[0:1]) File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2482, in __getitem__ return list(res) File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2790, in __iter__ return self._execute_and_instances(context) File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2811, in _execute_and_instances close_with_result=True) File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2820, in _get_bind_args **kw File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2802, in _connection_from_session conn = self.session.connection(**kw) File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 966, in connection execution_options=execution_options) File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 971, in _connection_for_bind engine, execution_options) File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 382, in _connection_for_bind self._assert_active() File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 276, in _assert_active % self._rollback_exception InvalidRequestError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (psycopg2.IntegrityError) duplicate key value violates unique constraint "dag_run_dag_id_execution_date_key" DETAIL: Key (dag_id, execution_date)=(nct, 2017-03-01 00:00:00) already exists. [SQL: 'INSERT INTO dag_run (dag_id, execution_date, start_date, end_date, state, run_id, external_trigger, conf) VALUES (%(dag_id)s, %(execution_date)s, %(start_date)s, %(end_date)s, %(state)s, %(run_id)s, %(external_trigger)s, %(conf)s) RETURNING dag_run.id'] [parameters: {'end_date': None, 'run_id': u'scheduled__2017-03-01T00:00:00', 'execution_date': datetime.datetime(2017, 3, 1, 0, 0), 'external_trigger': False, 'state': u'running', 'conf': None, 'start_date': dateti me.datetime(2017, 4, 3, 13, 48, 39, 168456), 'dag_id': 'nct'}] {quote} The problem is that the {{dag_runs}} table require the {{(dag_id, execution_date)}} pair to be unique, so the scheduler was stuck in a loop where it tried creating a new scheduled dag run but failed, as I had already created one on the same {{execution_date}}. This was surprising. As a user, I would expect that it would either schedule the run normally, even if there's a manual one on the same date, or maybe it would skip that execution date. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (AIRFLOW-1018) Scheduler DAG processes can not log to stdout
[ https://issues.apache.org/jira/browse/AIRFLOW-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bolke de Bruin updated AIRFLOW-1018: Priority: Critical (was: Blocker) > Scheduler DAG processes can not log to stdout > - > > Key: AIRFLOW-1018 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1018 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: 1.8.0 > Environment: Airflow 1.8.0 >Reporter: Vincent Poulain >Priority: Critical > Fix For: 1.8.1 > > > Each DAG has its own log file for the scheduler and we can specify the > directory with child_process_log_directory param. > Unfortunately we can not change device / by specifying /dev/stdout for > example. That is very useful when we execute Airflow in a container. > When we specify /dev/stdout it raises: > "OSError: [Errno 20] Not a directory: '/dev/stdout/2017-03-19'" -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (AIRFLOW-1062) DagRun#find returns wrong result if external_trigger=False is specified
Kengo Seki created AIRFLOW-1062: --- Summary: DagRun#find returns wrong result if external_trigger=False is specified Key: AIRFLOW-1062 URL: https://issues.apache.org/jira/browse/AIRFLOW-1062 Project: Apache Airflow Issue Type: Bug Components: models Reporter: Kengo Seki Assignee: Kengo Seki Given the following record, {code} sqlite> select id, external_trigger from dag_run; 1|1 sqlite> {code} the following code should return no result, {code} In [1]: from airflow import models In [2]: models.DagRun.find(external_trigger=False) {code} ... but an externally-triggered record is returned erroneously. {code} Out[2]: [] {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)