[jira] [Commented] (AIRFLOW-1038) Specify celery serializers explicitly and pin version

2017-04-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954271#comment-15954271
 ] 

ASF subversion and git services commented on AIRFLOW-1038:
--

Commit 34ee1dc0373708f7db0a562ac470338c6126d20a in incubator-airflow's branch 
refs/heads/master from [~saguziel]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=34ee1dc ]

[AIRFLOW-1038] Specify celery serialization options explicitly

Specify the CELERY_TASK_SERIALIZER and CELERY_RESULT_SERIALIZER as
pickle explicitly, and CELERY_EVENT_SERIALIZER as json.


> Specify celery serializers explicitly and pin version
> -
>
> Key: AIRFLOW-1038
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1038
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Alex Guziel
>Assignee: Alex Guziel
> Fix For: 1.9.0
>
>
> Celery 3->4 upgrade changes the default task and result serializer from 
> pickle to json. Pickle is faster and supports more types 
> http://docs.celeryproject.org/en/latest/userguide/calling.html
> This also causes issues when different versions of celery are running on 
> different hosts.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-1038) Specify celery serializers explicitly and pin version

2017-04-03 Thread Arthur Wiedmer (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arthur Wiedmer resolved AIRFLOW-1038.
-
   Resolution: Fixed
Fix Version/s: 1.9.0

Issue resolved by pull request #2185
[https://github.com/apache/incubator-airflow/pull/2185]

> Specify celery serializers explicitly and pin version
> -
>
> Key: AIRFLOW-1038
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1038
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Alex Guziel
>Assignee: Alex Guziel
> Fix For: 1.9.0
>
>
> Celery 3->4 upgrade changes the default task and result serializer from 
> pickle to json. Pickle is faster and supports more types 
> http://docs.celeryproject.org/en/latest/userguide/calling.html
> This also causes issues when different versions of celery are running on 
> different hosts.





[1/2] incubator-airflow git commit: [AIRFLOW-1038] Specify celery serialization options explicitly

2017-04-03 Thread arthur
Repository: incubator-airflow
Updated Branches:
  refs/heads/master c64e876bd -> 75addb4a9


[AIRFLOW-1038] Specify celery serialization options explicitly

Specify the CELERY_TASK_SERIALIZER and CELERY_RESULT_SERIALIZER as
pickle explicitly, and CELERY_EVENT_SERIALIZER as json.


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/34ee1dc0
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/34ee1dc0
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/34ee1dc0

Branch: refs/heads/master
Commit: 34ee1dc0373708f7db0a562ac470338c6126d20a
Parents: b2b9587
Author: Alex Guziel 
Authored: Fri Mar 24 11:51:39 2017 -0700
Committer: Alex Guziel 
Committed: Mon Apr 3 15:33:56 2017 -0700

--
 airflow/executors/celery_executor.py | 3 +++
 1 file changed, 3 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/34ee1dc0/airflow/executors/celery_executor.py
--
diff --git a/airflow/executors/celery_executor.py b/airflow/executors/celery_executor.py
index 04414fb..e0c94c1 100644
--- a/airflow/executors/celery_executor.py
+++ b/airflow/executors/celery_executor.py
@@ -36,6 +36,9 @@ DEFAULT_QUEUE = configuration.get('celery', 'DEFAULT_QUEUE')
 
 class CeleryConfig(object):
     CELERY_ACCEPT_CONTENT = ['json', 'pickle']
+    CELERY_EVENT_SERIALIZER = 'json'
+    CELERY_RESULT_SERIALIZER = 'pickle'
+    CELERY_TASK_SERIALIZER = 'pickle'
     CELERYD_PREFETCH_MULTIPLIER = 1
     CELERY_ACKS_LATE = True
     BROKER_URL = configuration.get('celery', 'BROKER_URL')
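
The issue's point that pickle supports more types than json can be illustrated with the standard library alone. This is a minimal sketch, not Airflow or Celery code: the helper `roundtrips` and the `payload` dictionary are hypothetical names chosen for illustration, and the example says nothing about how Celery itself invokes its serializers.

```python
import json
import pickle
from datetime import datetime


def roundtrips(serialize, deserialize, value):
    """Return True if value survives a serialize/deserialize round trip."""
    try:
        return deserialize(serialize(value)) == value
    except (TypeError, ValueError):
        return False


# Hypothetical task arguments; not taken from Airflow itself.
payload = {"execution_date": datetime(2017, 4, 3, 15, 33), "tries": (1, 2)}

pickle_ok = roundtrips(pickle.dumps, pickle.loads, payload)
json_ok = roundtrips(json.dumps, json.loads, payload)
```

A datetime value is rejected by `json.dumps` unless a custom encoder is supplied, which is one reason a silent change of the default serializer can break existing tasks. Pickle should only be accepted from trusted producers, since unpickling can execute arbitrary code.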



[2/2] incubator-airflow git commit: Merge pull request #2185 from saguziel/aguziel-celery-fix

2017-04-03 Thread arthur
Merge pull request #2185 from saguziel/aguziel-celery-fix


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/75addb4a
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/75addb4a
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/75addb4a

Branch: refs/heads/master
Commit: 75addb4a9fba57df39d59d26d15831080fb30ef0
Parents: c64e876 34ee1dc
Author: Arthur Wiedmer 
Authored: Mon Apr 3 15:42:16 2017 -0700
Committer: Arthur Wiedmer 
Committed: Mon Apr 3 15:42:16 2017 -0700

--
 airflow/executors/celery_executor.py | 3 +++
 1 file changed, 3 insertions(+)
--




[jira] [Created] (AIRFLOW-1064) TaskInstanceModelView is slow

2017-04-03 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1064:


 Summary: TaskInstanceModelView is slow
 Key: AIRFLOW-1064
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1064
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Alex Guziel
Assignee: Alex Guziel


Due to a bad query (a full table scan), the TaskInstanceModelView is very slow. 
Adding an index is costly, and job_id is a good approximation.





[jira] [Closed] (AIRFLOW-1052) dags/test_dag.py can not be imported

2017-04-03 Thread Kengo Seki (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kengo Seki closed AIRFLOW-1052.
---
Resolution: Duplicate

> dags/test_dag.py can not be imported
> 
>
> Key: AIRFLOW-1052
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1052
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kengo Seki
>Assignee: Kengo Seki
>Priority: Minor
>
> AIRFLOW-863 broke dags/test_dag.py as follows:
> {code}
> [2017-03-29 14:53:39,268] [13910] {models.py:271} ERROR - Failed to import: /home/sekikn/incubator-airflow/dags/test_dag.py
> Traceback (most recent call last):
>   File "/home/sekikn/incubator-airflow/airflow/models.py", line 268, in process_file
>     m = imp.load_source(mod_name, filepath)
>   File "/home/sekikn/incubator-airflow/dags/test_dag.py", line 27, in <module>
>     'start_date': airflow.utils.dates.days_ago(2)
> NameError: name 'airflow' is not defined
> {code}
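
The NameError in the traceback above is consistent with how Python binds imported names: a `from package import name` statement binds only `name`, not `package`. The following sketch reproduces the same behavior with the standard-library `email` package standing in for `airflow`; the variable names `failure` and `stamp` are illustrative only.

```python
from email import utils  # binds only the name `utils`

try:
    email.utils.formatdate(0)  # `email` itself was never bound here
    failure = None
except NameError as exc:
    failure = str(exc)

# Referring to the name that was actually imported works as expected.
stamp = utils.formatdate(0, usegmt=True)
```

Either importing the top-level package explicitly or, as in the AIRFLOW-1054 fix, referring to the imported submodule by its bound name resolves the error.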





[jira] [Updated] (AIRFLOW-1011) Fix bug in BackfillJob._execute() for SubDAGs

2017-04-03 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated AIRFLOW-1011:

Fix Version/s: 1.8.1

> Fix bug in BackfillJob._execute() for SubDAGs
> -
>
> Key: AIRFLOW-1011
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1011
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: backfill, subdag
>Affects Versions: 1.8.0
>Reporter: Joe Schmid
>Priority: Blocker
> Fix For: 1.8.1
>
> Attachments: 1-TopLevelDAGTaskInstancesShownCorrectly.png, 
> 2-ZoomedSubDAG-NoTaskInstances-v1.8.png, 
> 3-ZoomedSubDAG-TaskInstances-v1.7.1.3.png, subdag_task_instance_logs.txt, 
> test_subdag.py
>
>
> The attached test SubDAG is not executed when the parent DAG is triggered 
> manually. Attached is a simple test DAG that exhibits the issue along with 
> screenshots showing the UI differences between v1.8 and v1.7.1.3.
> Note that if the DAG is run via backfill from command line (e.g. "airflow 
> backfill Test_SubDAG -s 2017-03-18 -e 2017-03-18") the task instances show up 
> successfully.





[jira] [Updated] (AIRFLOW-1003) DAG status flips erraticly

2017-04-03 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated AIRFLOW-1003:

Fix Version/s: (was: 1.8.1)

> DAG status flips erraticly
> --
>
> Key: AIRFLOW-1003
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1003
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui
>Affects Versions: Airflow 1.8, 1.8.0rc5
>Reporter: Ruslan Dautkhanov
>
> Created a flow based on the sample tutorial 
> https://airflow.incubator.apache.org/tutorial.html
> (just changed dag-id to 'turorial-RD').
> See a 10-second video of this behavior:
> http://screencast-o-matic.com/watch/cbebrn6kBw
> Notice that the last DAG, 'turorial-RD', keeps changing state 
> (it loses its link and all the buttons and icons that show the status 
> of the DAG runs, tasks, etc.), then the links and icons show back up.
> In fact, during that short 10-second period it changed 
> state 4 times (from "disabled" to a normal DAG).
> There were no changes happening in the system - I was just clicking
> refresh in the browser from time to time. All DAGs were disabled while
> this was happening.





[jira] [Updated] (AIRFLOW-1019) active_dagruns shouldn't include paused DAGs

2017-04-03 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated AIRFLOW-1019:

Issue Type: Improvement  (was: Bug)

> active_dagruns shouldn't include paused DAGs
> 
>
> Key: AIRFLOW-1019
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1019
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 1.8.0
>Reporter: Dan Davydov
>Priority: Critical
> Fix For: 1.8.1
>
>
> Since 1.8.0 Airflow resets orphaned tasks (tasks that are in the DB but not 
> in the executor's memory). The problem is that Airflow counts dagruns in 
> paused DAGs as running as long as the dagruns state is running. Instead we 
> should join against non-paused DAGs everywhere we calculate active dagruns 
> (e.g. in _process_task_instances in the Scheduler class in jobs.py). If there 
> are enough paused DAGs it brings the scheduler to a halt especially on 
> scheduler restarts.
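
The join against non-paused DAGs proposed above can be sketched with an in-memory SQLite database. This is an illustration under stated assumptions, not Airflow's schema or scheduler code: the two-table layout and the column names `is_paused` and `state` are simplified stand-ins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dag (dag_id TEXT PRIMARY KEY, is_paused INTEGER NOT NULL);
    CREATE TABLE dag_run (dag_id TEXT, state TEXT);
    INSERT INTO dag VALUES ('active_dag', 0), ('paused_dag', 1);
    INSERT INTO dag_run VALUES
        ('active_dag', 'running'),
        ('paused_dag', 'running'),
        ('paused_dag', 'running'),
        ('active_dag', 'success');
""")

# Counting by run state alone also counts runs that belong to paused DAGs.
naive_count = conn.execute(
    "SELECT COUNT(*) FROM dag_run WHERE state = 'running'"
).fetchone()[0]

# Joining against non-paused DAGs excludes them from the active count.
active_count = conn.execute(
    """
    SELECT COUNT(*)
    FROM dag_run
    JOIN dag ON dag.dag_id = dag_run.dag_id
    WHERE dag_run.state = 'running' AND dag.is_paused = 0
    """
).fetchone()[0]
```

With the sample rows, the naive query counts three running dagruns while the joined query counts one, which is the difference the issue describes.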





[jira] [Updated] (AIRFLOW-1019) active_dagruns shouldn't include paused DAGs

2017-04-03 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated AIRFLOW-1019:

Affects Version/s: 1.8.0

> active_dagruns shouldn't include paused DAGs
> 
>
> Key: AIRFLOW-1019
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1019
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 1.8.0
>Reporter: Dan Davydov
>Priority: Critical
> Fix For: 1.8.1
>
>
> Since 1.8.0 Airflow resets orphaned tasks (tasks that are in the DB but not 
> in the executor's memory). The problem is that Airflow counts dagruns in 
> paused DAGs as running as long as the dagruns state is running. Instead we 
> should join against non-paused DAGs everywhere we calculate active dagruns 
> (e.g. in _process_task_instances in the Scheduler class in jobs.py). If there 
> are enough paused DAGs it brings the scheduler to a halt especially on 
> scheduler restarts.





[jira] [Updated] (AIRFLOW-1019) active_dagruns shouldn't include paused DAGs

2017-04-03 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated AIRFLOW-1019:

Priority: Critical  (was: Blocker)

> active_dagruns shouldn't include paused DAGs
> 
>
> Key: AIRFLOW-1019
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1019
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Dan Davydov
>Priority: Critical
> Fix For: 1.8.1
>
>
> Since 1.8.0 Airflow resets orphaned tasks (tasks that are in the DB but not 
> in the executor's memory). The problem is that Airflow counts dagruns in 
> paused DAGs as running as long as the dagruns state is running. Instead we 
> should join against non-paused DAGs everywhere we calculate active dagruns 
> (e.g. in _process_task_instances in the Scheduler class in jobs.py). If there 
> are enough paused DAGs it brings the scheduler to a halt especially on 
> scheduler restarts.





[jira] [Commented] (AIRFLOW-1054) Fix broken import on test_dag

2017-04-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954091#comment-15954091
 ] 

ASF subversion and git services commented on AIRFLOW-1054:
--

Commit 68b1c982e048878ec9dd658072c147e4341bf2c2 in incubator-airflow's branch 
refs/heads/v1-8-test from Siddharth Anand
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=68b1c98 ]

[AIRFLOW-1054] Fix broken import in test_dag

Closes #2201 from
r39132/fix_broken_import_on_test_dag

(cherry picked from commit c64e876bd50eeb6c9e2600ac9d832c05eb5e9640)


> Fix broken import on test_dag
> -
>
> Key: AIRFLOW-1054
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1054
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.8
>Reporter: Siddharth Anand
>Assignee: Siddharth Anand
>Priority: Minor
> Fix For: 1.8.1
>
>
> Fixes the issue below
> {noformat}
> [2017-03-29 18:08:54,355] [30387] {models.py:271} ERROR - Failed to import: /Users/siddharth/Projects/airflow/dags/test_dag.py
> Traceback (most recent call last):
>   File "/Users/siddharth/Projects/airflow/airflow/models.py", line 268, in process_file
>     m = imp.load_source(mod_name, filepath)
>   File "/Users/siddharth/Projects/airflow/dags/test_dag.py", line 27, in <module>
>     'start_date': airflow.utils.dates.days_ago(2)
> NameError: name 'airflow' is not defined
> [2017-03-29 18:08:54,406] [30388] {models.py:172} INFO - Filling up the DagBag from /Users/siddharth/Projects/airflow/dags
> [2017-03-29 18:08:54,408] [30388] {models.py:271} ERROR - Failed to import: /Users/siddharth/Projects/airflow/dags/test_dag.py
> Traceback (most recent call last):
>   File "/Users/siddharth/Projects/airflow/airflow/models.py", line 268, in process_file
>     m = imp.load_source(mod_name, filepath)
>   File "/Users/siddharth/Projects/airflow/dags/test_dag.py", line 27, in <module>
>     'start_date': airflow.utils.dates.days_ago(2)
> {noformat}





incubator-airflow git commit: [AIRFLOW-1054] Fix broken import in test_dag

2017-04-03 Thread criccomini
Repository: incubator-airflow
Updated Branches:
  refs/heads/v1-8-test 5eb33358f -> 68b1c982e


[AIRFLOW-1054] Fix broken import in test_dag

Closes #2201 from
r39132/fix_broken_import_on_test_dag

(cherry picked from commit c64e876bd50eeb6c9e2600ac9d832c05eb5e9640)


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/68b1c982
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/68b1c982
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/68b1c982

Branch: refs/heads/v1-8-test
Commit: 68b1c982e048878ec9dd658072c147e4341bf2c2
Parents: 5eb3335
Author: Siddharth Anand 
Authored: Mon Apr 3 13:10:51 2017 -0700
Committer: Chris Riccomini 
Committed: Mon Apr 3 13:11:42 2017 -0700

--
 dags/test_dag.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/68b1c982/dags/test_dag.py
--
diff --git a/dags/test_dag.py b/dags/test_dag.py
index db0b648..f2a9f6a 100644
--- a/dags/test_dag.py
+++ b/dags/test_dag.py
@@ -12,6 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+from airflow import utils
 from airflow import DAG
 from airflow.operators.dummy_operator import DummyOperator
 from datetime import datetime, timedelta
@@ -24,7 +25,7 @@ DAG_NAME = 'test_dag_v1'
 default_args = {
     'owner': 'airflow',
     'depends_on_past': True,
-    'start_date': airflow.utils.dates.days_ago(2)
+    'start_date': utils.dates.days_ago(2)
 }
 dag = DAG(DAG_NAME, schedule_interval='*/10 * * * *', default_args=default_args)
 



[jira] [Commented] (AIRFLOW-1054) Fix broken import on test_dag

2017-04-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954089#comment-15954089
 ] 

ASF subversion and git services commented on AIRFLOW-1054:
--

Commit c64e876bd50eeb6c9e2600ac9d832c05eb5e9640 in incubator-airflow's branch 
refs/heads/master from Siddharth Anand
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=c64e876 ]

[AIRFLOW-1054] Fix broken import in test_dag

Closes #2201 from
r39132/fix_broken_import_on_test_dag


> Fix broken import on test_dag
> -
>
> Key: AIRFLOW-1054
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1054
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.8
>Reporter: Siddharth Anand
>Assignee: Siddharth Anand
>Priority: Minor
> Fix For: 1.8.1
>
>
> Fixes the issue below
> {noformat}
> [2017-03-29 18:08:54,355] [30387] {models.py:271} ERROR - Failed to import: /Users/siddharth/Projects/airflow/dags/test_dag.py
> Traceback (most recent call last):
>   File "/Users/siddharth/Projects/airflow/airflow/models.py", line 268, in process_file
>     m = imp.load_source(mod_name, filepath)
>   File "/Users/siddharth/Projects/airflow/dags/test_dag.py", line 27, in <module>
>     'start_date': airflow.utils.dates.days_ago(2)
> NameError: name 'airflow' is not defined
> [2017-03-29 18:08:54,406] [30388] {models.py:172} INFO - Filling up the DagBag from /Users/siddharth/Projects/airflow/dags
> [2017-03-29 18:08:54,408] [30388] {models.py:271} ERROR - Failed to import: /Users/siddharth/Projects/airflow/dags/test_dag.py
> Traceback (most recent call last):
>   File "/Users/siddharth/Projects/airflow/airflow/models.py", line 268, in process_file
>     m = imp.load_source(mod_name, filepath)
>   File "/Users/siddharth/Projects/airflow/dags/test_dag.py", line 27, in <module>
>     'start_date': airflow.utils.dates.days_ago(2)
> {noformat}





incubator-airflow git commit: [AIRFLOW-1054] Fix broken import in test_dag

2017-04-03 Thread criccomini
Repository: incubator-airflow
Updated Branches:
  refs/heads/master daa281c03 -> c64e876bd


[AIRFLOW-1054] Fix broken import in test_dag

Closes #2201 from
r39132/fix_broken_import_on_test_dag


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/c64e876b
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/c64e876b
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/c64e876b

Branch: refs/heads/master
Commit: c64e876bd50eeb6c9e2600ac9d832c05eb5e9640
Parents: daa281c
Author: Siddharth Anand 
Authored: Mon Apr 3 13:10:51 2017 -0700
Committer: Chris Riccomini 
Committed: Mon Apr 3 13:10:51 2017 -0700

--
 dags/test_dag.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/c64e876b/dags/test_dag.py
--
diff --git a/dags/test_dag.py b/dags/test_dag.py
index db0b648..f2a9f6a 100644
--- a/dags/test_dag.py
+++ b/dags/test_dag.py
@@ -12,6 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+from airflow import utils
 from airflow import DAG
 from airflow.operators.dummy_operator import DummyOperator
 from datetime import datetime, timedelta
@@ -24,7 +25,7 @@ DAG_NAME = 'test_dag_v1'
 default_args = {
     'owner': 'airflow',
     'depends_on_past': True,
-    'start_date': airflow.utils.dates.days_ago(2)
+    'start_date': utils.dates.days_ago(2)
 }
 dag = DAG(DAG_NAME, schedule_interval='*/10 * * * *', default_args=default_args)
 



[jira] [Commented] (AIRFLOW-1055) airflow/jobs.py:create_dag_run() exception for @once dag when catchup = False

2017-04-03 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954080#comment-15954080
 ] 

Bolke de Bruin commented on AIRFLOW-1055:
-

Sid,

I don't think this is in create_dag_run() (the traceback does not list that 
function). It only happens when a DAG with an @once schedule has an SLA 
associated with it, which is not very common.

I don't think this should be a blocker (but a fix would nevertheless be very 
nice).

[~criccomini]

> airflow/jobs.py:create_dag_run() exception for @once dag when catchup = False
> -
>
> Key: AIRFLOW-1055
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1055
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.8
>Reporter: Siddharth Anand
>Assignee: Siddharth Anand
>Priority: Blocker
>  Labels: dagrun, once, scheduler, sla
> Fix For: 1.8.1
>
>
> Getting the following exception:
> {noformat}
> [2017-03-19 20:16:25,786] {jobs.py:354} DagFileProcessor2638 ERROR - Got an exception! Propagating...
> Traceback (most recent call last):
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/jobs.py", line 346, in helper
>     pickle_dags)
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/utils/db.py", line 53, in wrapper
>     result = func(*args, **kwargs)
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/jobs.py", line 1581, in process_file
>     self._process_dags(dagbag, dags, ti_keys_to_schedule)
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/jobs.py", line 1175, in _process_dags
>     self.manage_slas(dag)
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/utils/db.py", line 53, in wrapper
>     result = func(*args, **kwargs)
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/jobs.py", line 595, in manage_slas
>     while dttm < datetime.now():
> TypeError: can't compare datetime.datetime to NoneType
> {noformat}
> Exception is in airflow/jobs.py:manage_slas() :
> https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/jobs.py#L595
> {code}
> ts = datetime.now()
> SlaMiss = models.SlaMiss
> for ti in max_tis:
>     task = dag.get_task(ti.task_id)
>     dttm = ti.execution_date
>     if task.sla:
>         dttm = dag.following_schedule(dttm)
>         while dttm < datetime.now():  # <<< exception raised here
>             following_schedule = dag.following_schedule(dttm)
>             if following_schedule + task.sla < datetime.now():
>                 session.merge(models.SlaMiss(
>                     task_id=ti.task_id,
> {code}
> It seems that dag.following_schedule() returns None for @once dag?
> Here's how dag is defined:
> {code}
> import datetime as dt
> import airflow
> from airflow.models import DAG
> from airflow.operators.dummy_operator import DummyOperator
> from datetime import datetime, timedelta
>
> def sla_alert_func(dag, task_list, blocking_task_list, slas, blocking_tis):
>     print('Executing SLA miss callback')
>
> now = datetime.now()
> now_to_the_hour = now.replace(hour=now.time().hour, minute=0, second=0, microsecond=0)
> START_DATE = now_to_the_hour + timedelta(hours=-3)
> DAG_NAME = 'manage_sla_once_dag'
> default_args = {
>     'owner': 'sanand',
>     'depends_on_past': False,
>     'start_date': START_DATE,
>     'sla': timedelta(hours=2)
> }
> dag = DAG(
>     dag_id = 'manage_sla_once_dag',
>     default_args = default_args,
>     catchup = False,
>     schedule_interval = '@once',
>     sla_miss_callback = sla_alert_func
> )
> task1 = DummyOperator(task_id='task1', dag=dag)
> {code}
> The issue does not occur if catchup = True.
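
If, as the report suspects, the schedule lookup returns None for an @once DAG, the comparison in the loop would need a None guard. The sketch below shows that guard in isolation. It is not the fix that was applied to Airflow: `following_schedule` and `count_sla_periods` are hypothetical stand-ins written for this example, and the None-returning behavior is the reporter's hypothesis.

```python
from datetime import datetime, timedelta


def following_schedule(dttm, interval):
    """Hypothetical stand-in: return None when there is no next run (as with '@once')."""
    if interval is None:
        return None
    return dttm + interval


def count_sla_periods(execution_date, interval, now):
    """Count schedule periods between execution_date and now, tolerating None."""
    periods = 0
    dttm = following_schedule(execution_date, interval)
    # Guard against None before comparing with a datetime.
    while dttm is not None and dttm < now:
        periods += 1
        dttm = following_schedule(dttm, interval)
    return periods


now = datetime(2017, 3, 19, 20, 0)
start = now - timedelta(hours=3)
once = count_sla_periods(start, None, now)
hourly = count_sla_periods(start, timedelta(hours=1), now)
```

Without the `is not None` check, the first loop test would raise the TypeError shown in the traceback when no following schedule exists.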



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-1007) Jinja sandbox is vulnerable to RCE

2017-04-03 Thread Arthur Wiedmer (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arthur Wiedmer resolved AIRFLOW-1007.
-
   Resolution: Fixed
Fix Version/s: 1.9.0

Issue resolved by pull request #2184
[https://github.com/apache/incubator-airflow/pull/2184]

> Jinja sandbox is vulnerable to RCE
> --
>
> Key: AIRFLOW-1007
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1007
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Alex Guziel
>Assignee: Alex Guziel
> Fix For: 1.9.0
>
>
> Right now, the jinja template functionality in chart_data takes arbitrary 
> strings and executes them. We should use the sandbox functionality to prevent 
> this.





[jira] [Commented] (AIRFLOW-1007) Jinja sandbox is vulnerable to RCE

2017-04-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954034#comment-15954034
 ] 

ASF subversion and git services commented on AIRFLOW-1007:
--

Commit daa281c0364609d6812921123cf47e4118b40484 in incubator-airflow's branch 
refs/heads/master from [~saguziel]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=daa281c ]

[AIRFLOW-1007] Use Jinja sandbox for chart_data endpoint

Right now, users can put in arbitrary strings into
the chart_data
endpoint, and execute arbitrary code using the
chart_data endpoint. By
using literal_eval and
ImmutableSandboxedEnvironment, we can reduce RCE.

Right now, users can put in arbitrary strings into
the chart_data
endpoint, and execute arbitrary code using the
chart_data endpoint. By
using literal_eval and
ImmutableSandboxedEnvironment, we can prevent
RCE.

Dear Airflow maintainers,

Please accept this PR. I understand that it will
not be reviewed until I have checked off all the
steps below!

### JIRA
- [x] My PR addresses the following [Airflow JIRA]
(https://issues.apache.org/jira/browse/AIRFLOW/)
issues and references them in the PR title. For
example, "[AIRFLOW-XXX] My Airflow PR"
-
https://issues.apache.org/jira/browse/AIRFLOW-1007

### Description
- [x] I changed Jinja to use the
ImmutableSandboxedEnvironment, and used
literal_eval, to limit the amount of RCE.

### Tests
- [x] My PR adds the following unit tests:
SecurityTest chart_data tests

### Commits
- [x] My commits all reference JIRA issues in
their subject lines, and I have squashed multiple
commits if they address the same issue. In
addition, my commits follow the guidelines from
"[How to write a good git commit
message](http://chris.beams.io/posts/git-
commit/)":
1. Subject is separated from body by a blank line
2. Subject is limited to 50 characters
3. Subject does not end with a period
4. Subject uses the imperative mood ("add", not
"adding")
5. Body wraps at 72 characters
6. Body explains "what" and "why", not "how"

to: aoen plypaul artwr  bolkedebruin

Closes #2184 from saguziel/aguziel-jinja-2
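
The commit message names literal_eval as one of the two safeguards. Assuming this refers to `ast.literal_eval` from the Python standard library, the sketch below shows why it is safer than evaluating arbitrary strings: it accepts only literal structures and rejects anything containing calls or names. The helper `parse_literal` and the sample inputs are hypothetical; how the function is actually used in the changed views is not shown in this message.

```python
import ast


def parse_literal(text):
    """Return the Python literal encoded in text, or None if it is not a plain literal."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None


# A plain dictionary literal is parsed into a Python object.
safe = parse_literal("{'limit': 10, 'tags': ['a', 'b']}")

# An expression containing a function call is rejected rather than executed.
rejected = parse_literal("__import__('os').system('id')")
```

The sandboxed Jinja environment addresses the template side of the same problem; this example covers only the literal-parsing side.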


> Jinja sandbox is vulnerable to RCE
> --
>
> Key: AIRFLOW-1007
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1007
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Alex Guziel
>Assignee: Alex Guziel
> Fix For: 1.9.0
>
>
> Right now, the jinja template functionality in chart_data takes arbitrary 
> strings and executes them. We should use the sandbox functionality to prevent 
> this.





incubator-airflow git commit: [AIRFLOW-1007] Use Jinja sandbox for chart_data endpoint

2017-04-03 Thread arthur
Repository: incubator-airflow
Updated Branches:
  refs/heads/master b55f41f2c -> daa281c03


[AIRFLOW-1007] Use Jinja sandbox for chart_data endpoint

Right now, users can put in arbitrary strings into
the chart_data
endpoint, and execute arbitrary code using the
chart_data endpoint. By
using literal_eval and
ImmutableSandboxedEnvironment, we can reduce RCE.

Right now, users can put in arbitrary strings into
the chart_data
endpoint, and execute arbitrary code using the
chart_data endpoint. By
using literal_eval and
ImmutableSandboxedEnvironment, we can prevent
RCE.

Dear Airflow maintainers,

Please accept this PR. I understand that it will
not be reviewed until I have checked off all the
steps below!

### JIRA
- [x] My PR addresses the following [Airflow JIRA]
(https://issues.apache.org/jira/browse/AIRFLOW/)
issues and references them in the PR title. For
example, "[AIRFLOW-XXX] My Airflow PR"
-
https://issues.apache.org/jira/browse/AIRFLOW-1007

### Description
- [x] I changed Jinja to use the
ImmutableSandboxedEnvironment, and used
literal_eval, to limit the amount of RCE.

### Tests
- [x] My PR adds the following unit tests:
SecurityTest chart_data tests

### Commits
- [x] My commits all reference JIRA issues in
their subject lines, and I have squashed multiple
commits if they address the same issue. In
addition, my commits follow the guidelines from
"[How to write a good git commit
message](http://chris.beams.io/posts/git-
commit/)":
1. Subject is separated from body by a blank line
2. Subject is limited to 50 characters
3. Subject does not end with a period
4. Subject uses the imperative mood ("add", not
"adding")
5. Body wraps at 72 characters
6. Body explains "what" and "why", not "how"

to: aoen plypaul artwr  bolkedebruin

Closes #2184 from saguziel/aguziel-jinja-2


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/daa281c0
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/daa281c0
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/daa281c0

Branch: refs/heads/master
Commit: daa281c0364609d6812921123cf47e4118b40484
Parents: b55f41f
Author: Alex Guziel 
Authored: Mon Apr 3 12:16:00 2017 -0700
Committer: Arthur Wiedmer 
Committed: Mon Apr 3 12:16:00 2017 -0700

--
 airflow/www/views.py |  7 ---
 tests/core.py| 38 ++
 2 files changed, 42 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/daa281c0/airflow/www/views.py
--
diff --git a/airflow/www/views.py b/airflow/www/views.py
index 0def0a9..a9bab31 100644
--- a/airflow/www/views.py
+++ b/airflow/www/views.py
@@ -43,7 +43,7 @@ from flask_admin.tools import iterdecode
 from flask_login import flash
 from flask._compat import PY2
 
-import jinja2
+from jinja2.sandbox import ImmutableSandboxedEnvironment
 import markdown
 import nvd3
 
@@ -328,8 +328,9 @@ class Airflow(BaseView):
 request_dict = {k: request.args.get(k) for k in request.args}
 args.update(request_dict)
 args['macros'] = macros
-sql = jinja2.Template(chart.sql).render(**args)
-label = jinja2.Template(chart.label).render(**args)
+sandbox = ImmutableSandboxedEnvironment()
+sql = sandbox.from_string(chart.sql).render(**args)
+label = sandbox.from_string(chart.label).render(**args)
 payload['sql_html'] = Markup(highlight(
 sql,
 lexers.SqlLexer(),  # Lexer call

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/daa281c0/tests/core.py
--
diff --git a/tests/core.py b/tests/core.py
index bd52d19..997bb42 100644
--- a/tests/core.py
+++ b/tests/core.py
@@ -63,6 +63,8 @@ from airflow.utils.logging import LoggingMixin
 from lxml import html
 from airflow.exceptions import AirflowException
 from airflow.configuration import AirflowConfigException, run_command
+from jinja2.sandbox import SecurityError
+from jinja2 import UndefinedError
 
 import six
 
@@ -1469,6 +1471,42 @@ class SecurityTests(unittest.TestCase):
 response = self.app.get("/admin/log", follow_redirects=True)
 self.assertIn(bleach.clean("alert(123456)"), 
response.data.decode('UTF-8'))
 
+def test_chart_data_template(self):
+"""Protect chart_data from being able to do RCE."""
+session = settings.Session()
+Chart = models.Chart
+chart1 = Chart(
+label='insecure_chart',
+conn_id='airflow_db',
+chart_type='bar',
+sql="SELECT {{ ''.__class__.__mro__[1].__subclasses__() }}"

[jira] [Created] (AIRFLOW-1063) A manually-created DAG run can prevent a scheduled run to be created

2017-04-03 Thread Vitor Baptista (JIRA)
Vitor Baptista created AIRFLOW-1063:
---

 Summary: A manually-created DAG run can prevent a scheduled run to 
be created
 Key: AIRFLOW-1063
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1063
 Project: Apache Airflow
  Issue Type: Improvement
  Components: scheduler
Affects Versions: Airflow 1.7.1.3
Reporter: Vitor Baptista


I manually created a DAG Run with the {{execution_date}} as {{2017-03-01 
00:00:00}} on a monthly-recurrent DAG. After a while, I noticed that the 
scheduled run was never created and checked the scheduler's logs, finding this 
traceback:

{quote}
Process Process-475397:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python2.7/dist-packages/airflow/jobs.py", line 664, in 
_do_dags
dag = dagbag.get_dag(dag.dag_id)
  File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 188, in 
get_dag
orm_dag = DagModel.get_current(root_dag_id)
  File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 2320, 
in get_current
obj = session.query(cls).filter(cls.dag_id == dag_id).first()
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 
2690, in first
ret = list(self[0:1])
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 
2482, in __getitem__
return list(res)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 
2790, in __iter__
return self._execute_and_instances(context)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 
2811, in _execute_and_instances
close_with_result=True)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 
2820, in _get_bind_args
**kw
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 
2802, in _connection_from_session
conn = self.session.connection(**kw)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 
966, in connection
execution_options=execution_options)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 
971, in _connection_for_bind
engine, execution_options)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 
382, in _connection_for_bind
self._assert_active()
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 
276, in _assert_active
% self._rollback_exception
InvalidRequestError: This Session's transaction has been rolled back due to a 
previous exception during flush. To begin a new transaction with this Session, 
first issue Session.rollback(). Original exception was: 
(psycopg2.IntegrityError)
 duplicate key value violates unique constraint 
"dag_run_dag_id_execution_date_key"
DETAIL:  Key (dag_id, execution_date)=(nct, 2017-03-01 00:00:00) already exists.
 [SQL: 'INSERT INTO dag_run (dag_id, execution_date, start_date, end_date, 
state, run_id, external_trigger, conf) VALUES (%(dag_id)s, %(execution_date)s, 
%(start_date)s, %(end_date)s, %(state)s, %(run_id)s, %(external_trigger)s, 
%(conf)s)
 RETURNING dag_run.id'] [parameters: {'end_date': None, 'run_id': 
u'scheduled__2017-03-01T00:00:00', 'execution_date': datetime.datetime(2017, 3, 
1, 0, 0), 'external_trigger': False, 'state': u'running', 'conf': None, 
'start_date': datetime.datetime(2017, 4, 3, 13, 48, 39, 168456), 'dag_id': 
'nct'}]
{quote}

The problem is that the {{dag_run}} table requires the {{(dag_id, 
execution_date)}} pair to be unique, so the scheduler was stuck in a loop: it 
kept trying to create a new scheduled DAG run and failing, because I had 
already created one with the same {{execution_date}}. This was surprising. As 
a user, I would expect it either to schedule the run normally, even if there 
is a manual one on the same date, or to skip that execution date.





[jira] [Updated] (AIRFLOW-1018) Scheduler DAG processes can not log to stdout

2017-04-03 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated AIRFLOW-1018:

Priority: Critical  (was: Blocker)

> Scheduler DAG processes can not log to stdout
> -
>
> Key: AIRFLOW-1018
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1018
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.8.0
> Environment: Airflow 1.8.0
>Reporter: Vincent Poulain
>Priority: Critical
> Fix For: 1.8.1
>
>
> Each DAG has its own scheduler log file, and we can specify the directory 
> with the child_process_log_directory param. 
> Unfortunately, we cannot point it at a device, for example by specifying 
> /dev/stdout. That would be very useful when running Airflow in a container.
> When we specify /dev/stdout, it raises:
> "OSError: [Errno 20] Not a directory: '/dev/stdout/2017-03-19'"
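
The path in the error message suggests that a dated sub-directory is created 
beneath the configured location. As a rough sketch of why that fails, the 
snippet below uses a temporary regular file as a stand-in for /dev/stdout and 
a hypothetical helper, {{try_make_dated_dir}}; it is not Airflow's logging 
code, and it assumes a POSIX system.

```python
import errno
import os
import tempfile


def try_make_dated_dir(base, date_str):
    """Hypothetical helper: create base/date_str.

    Returns None on success, or the errno of the OSError that was raised.
    """
    try:
        os.makedirs(os.path.join(base, date_str))
        return None
    except OSError as exc:
        return exc.errno


with tempfile.TemporaryDirectory() as tmp:
    # A regular file stands in for /dev/stdout: it exists, but it is not a
    # directory, so nothing can be created underneath it.
    file_path = os.path.join(tmp, "not_a_directory")
    with open(file_path, "w"):
        pass

    err_for_file = try_make_dated_dir(file_path, "2017-03-19")
    err_for_dir = try_make_dated_dir(tmp, "2017-03-19")
```

On POSIX, {{err_for_file}} is {{errno.ENOTDIR}} (the "Errno 20" in the report 
on Linux), while the same call against a real directory succeeds.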





[jira] [Created] (AIRFLOW-1062) DagRun#find returns wrong result if external_trigger=False is specified

2017-04-03 Thread Kengo Seki (JIRA)
Kengo Seki created AIRFLOW-1062:
---

 Summary: DagRun#find returns wrong result if 
external_trigger=False is specified
 Key: AIRFLOW-1062
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1062
 Project: Apache Airflow
  Issue Type: Bug
  Components: models
Reporter: Kengo Seki
Assignee: Kengo Seki


Given the following record,

{code}
sqlite> select id, external_trigger from dag_run;
1|1
sqlite>
{code}

the following code should return no result,

{code}
In [1]: from airflow import models

In [2]: models.DagRun.find(external_trigger=False)
{code}

... but an externally-triggered record is returned erroneously.

{code}
Out[2]: []
{code}
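
The report does not say where the fault lies. One plausible cause, assumed 
here purely for illustration, is a truthiness check on the filter argument, 
which makes {{False}} indistinguishable from "no filter given". The functions 
{{find_buggy}} and {{find_fixed}} below are hypothetical and operate on plain 
dicts rather than on the real {{DagRun}} model.

```python
def find_buggy(runs, external_trigger=None):
    """Hypothetical lookup with a truthiness bug."""
    # "if external_trigger:" is False for both None and False, so passing
    # external_trigger=False silently disables the filter.
    if external_trigger:
        return [r for r in runs if r["external_trigger"] == external_trigger]
    return list(runs)


def find_fixed(runs, external_trigger=None):
    """Hypothetical lookup that only skips the filter when it is omitted."""
    if external_trigger is not None:
        return [r for r in runs if r["external_trigger"] == external_trigger]
    return list(runs)


# Mirrors the single record above: id 1, externally triggered.
runs = [{"id": 1, "external_trigger": True}]

buggy_result = find_buggy(runs, external_trigger=False)
fixed_result = find_fixed(runs, external_trigger=False)
```

With this data, {{find_buggy}} returns the externally triggered record even 
though {{external_trigger=False}} was requested, while {{find_fixed}} returns 
an empty list.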


