[jira] [Created] (AIRFLOW-779) Task should fail with specific message if task instance is deleted

2017-01-20 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-779:
---

 Summary: Task should fail with specific message if task instance 
is deleted
 Key: AIRFLOW-779
 URL: https://issues.apache.org/jira/browse/AIRFLOW-779
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel
Assignee: Alex Guziel
Priority: Trivial


Right now, when a task instance is deleted in the DB (as is done from the UI task 
instances page), the task fails because the state field is accessed on None. We 
should handle this case explicitly and give a specific error message.
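
A minimal sketch of the explicit handling proposed here. The names `TaskInstanceDeletedError` and `check_task_instance`, and the `SimpleNamespace` stand-in for a DB row, are hypothetical and not Airflow's actual API; the sketch only shows detecting a missing row before `.state` is read.

```python
from types import SimpleNamespace


class TaskInstanceDeletedError(Exception):
    """Hypothetical error raised when the task instance row no longer exists."""


def check_task_instance(ti):
    # `ti` stands in for the result of a DB lookup; None means the row is gone.
    if ti is None:
        raise TaskInstanceDeletedError(
            "Task instance was deleted from the database while the task was running"
        )
    return ti.state


print(check_task_instance(SimpleNamespace(state="running")))
```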



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (AIRFLOW-799) Workers re-set queue column in task_instance table

2017-01-24 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-799:
---

 Summary: Workers re-set queue column in task_instance table
 Key: AIRFLOW-799
 URL: https://issues.apache.org/jira/browse/AIRFLOW-799
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel
Assignee: Alex Guziel
Priority: Minor


Right now, the scheduler uses the policy file to set the queue field in 
task_instance. Workers, when updating the state, will set the queue according 
to the DAG information, changing it from the result that would be from applying 
the policy file. This reduces auditability.





[jira] [Updated] (AIRFLOW-799) Workers re-set queue column in task_instance table

2017-01-24 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel updated AIRFLOW-799:

Assignee: (was: Alex Guziel)

> Workers re-set queue column in task_instance table
> --
>
> Key: AIRFLOW-799
> URL: https://issues.apache.org/jira/browse/AIRFLOW-799
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Alex Guziel
>Priority: Minor
>
> Right now, the scheduler uses the policy file to set the queue field in 
> task_instance. Workers, when updating the state, will set the queue according 
> to the DAG information, changing it from the result that would be from 
> applying the policy file. This reduces auditability.





[jira] [Created] (AIRFLOW-836) The two endpoints /paused and queryview perform state-changing action over HTTP GET

2017-02-03 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-836:
---

 Summary: The two endpoints /paused and queryview perform 
state-changing action over HTTP GET
 Key: AIRFLOW-836
 URL: https://issues.apache.org/jira/browse/AIRFLOW-836
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel
Assignee: Alex Guziel


These two endpoints change state and allow HTTP GET, allowing CSRF



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-837) Clear task in the UI should set state to None rather than deleting DB row

2017-02-03 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-837:
---

 Summary: Clear task in the UI should set state to None rather than 
deleting DB row
 Key: AIRFLOW-837
 URL: https://issues.apache.org/jira/browse/AIRFLOW-837
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Alex Guziel
Priority: Trivial








[jira] [Created] (AIRFLOW-838) Race condition in LocalTaskJob

2017-02-03 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-838:
---

 Summary: Race condition in LocalTaskJob
 Key: AIRFLOW-838
 URL: https://issues.apache.org/jira/browse/AIRFLOW-838
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel
Priority: Minor


Right now, a LocalTaskJob will terminate if the state is not "running", but only 
if it has previously observed the state as "running". This can lead to a 
situation in which it never terminates even though the state is not "running", 
because the state went from "running" to another state before it could be observed.
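
The race can be illustrated with a small simulation. This is only a sketch of the check as described in this report, not the actual LocalTaskJob code; the function name `terminates` and the state strings are assumptions made for illustration.

```python
def terminates(observed_states):
    """Replay the described check over a sequence of polled states."""
    was_running = False
    for state in observed_states:
        if state == "running":
            was_running = True
        elif was_running:
            # The state left "running" after we had seen it: terminate.
            return True
    return False


# Every transition is observed, so the job notices the change.
print(terminates(["queued", "running", "success"]))
# "running" happened between two polls and was never observed.
print(terminates(["queued", "success", "success"]))
```

The second call shows the problem: the state is no longer "running", yet the check never fires.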





[jira] [Updated] (AIRFLOW-836) The paused endpoint is vulnerable to CSRF

2017-02-08 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel updated AIRFLOW-836:

Description: This endpoint uses GET and is state-changing which bad 
practice, and allows CSRF  (was: These two endpoints change state and allow 
HTTP GET, allowing CSRF)
Summary: The paused endpoint is vulnerable to CSRF  (was: The two 
endpoints /paused and queryview perform state-changing action over HTTP GET)

> The paused endpoint is vulnerable to CSRF
> -
>
> Key: AIRFLOW-836
> URL: https://issues.apache.org/jira/browse/AIRFLOW-836
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Alex Guziel
>Assignee: Alex Guziel
>
> This endpoint uses GET and is state-changing which bad practice, and allows 
> CSRF





[jira] [Updated] (AIRFLOW-836) The paused endpoint is vulnerable to CSRF

2017-02-09 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel updated AIRFLOW-836:

Description: These endpoints use GET and are state-changing which is bad 
practice, and allows CSRF  (was: This endpoint uses GET and is state-changing 
which bad practice, and allows CSRF)

> The paused endpoint is vulnerable to CSRF
> -
>
> Key: AIRFLOW-836
> URL: https://issues.apache.org/jira/browse/AIRFLOW-836
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Alex Guziel
>Assignee: Alex Guziel
>
> These endpoints use GET and are state-changing which is bad practice, and 
> allows CSRF





[jira] [Updated] (AIRFLOW-836) The paused and queryview endpoints are vulnerable to CSRF

2017-02-09 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel updated AIRFLOW-836:

Summary: The paused and queryview endpoints are vulnerable to CSRF  (was: 
The paused endpoint is vulnerable to CSRF)

> The paused and queryview endpoints are vulnerable to CSRF
> -
>
> Key: AIRFLOW-836
> URL: https://issues.apache.org/jira/browse/AIRFLOW-836
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Alex Guziel
>Assignee: Alex Guziel
>
> These endpoints use GET and are state-changing which is bad practice, and 
> allows CSRF





[jira] [Created] (AIRFLOW-857) Use unittest.assert instead of assert

2017-02-09 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-857:
---

 Summary: Use unittest.assert instead of assert
 Key: AIRFLOW-857
 URL: https://issues.apache.org/jira/browse/AIRFLOW-857
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Alex Guziel
Assignee: Alex Guziel
Priority: Minor


Right now, unit tests do something like
`assert x == y` 
which gives less descriptive output in case of failure than
`assertEqual(x, y)`
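
A small comparison of the two failure messages, using only the standard `unittest` module. The values of `x` and `y` are made up for illustration.

```python
import unittest

case = unittest.TestCase()
x, y = [1, 2, 3], [1, 2, 4]

try:
    assert x == y
except AssertionError as exc:
    # A bare assert typically carries no details about the operands.
    plain_message = str(exc)

try:
    case.assertEqual(x, y)
except AssertionError as exc:
    # assertEqual reports both lists and the first differing element.
    rich_message = str(exc)

print(repr(plain_message))
print(rich_message)
```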





[jira] [Closed] (AIRFLOW-696) Monitor queue lengths in CeleryExecutor

2017-02-09 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel closed AIRFLOW-696.
---
Resolution: Won't Fix

> Monitor queue lengths in CeleryExecutor
> ---
>
> Key: AIRFLOW-696
> URL: https://issues.apache.org/jira/browse/AIRFLOW-696
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: celery
>Reporter: Alex Guziel
>Assignee: Alex Guziel
>
> Monitor queue lengths for CeleryExecutor. This will make it easier to see how 
> much of the cluster is being used.





[jira] [Work started] (AIRFLOW-836) The paused and queryview endpoints are vulnerable to CSRF

2017-02-09 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-836 started by Alex Guziel.
---
> The paused and queryview endpoints are vulnerable to CSRF
> -
>
> Key: AIRFLOW-836
> URL: https://issues.apache.org/jira/browse/AIRFLOW-836
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Alex Guziel
>Assignee: Alex Guziel
>
> These endpoints use GET and are state-changing which is bad practice, and 
> allows CSRF





[jira] [Work started] (AIRFLOW-857) Use unittest.assert instead of assert

2017-02-09 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-857 started by Alex Guziel.
---
> Use unittest.assert instead of assert
> -
>
> Key: AIRFLOW-857
> URL: https://issues.apache.org/jira/browse/AIRFLOW-857
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Alex Guziel
>Assignee: Alex Guziel
>Priority: Minor
>
> Right now, unit tests do something like
> `assert x == y` 
> which gives less descriptive output in case of failure than
> `assertEqual(x, y)`





[jira] [Resolved] (AIRFLOW-679) Stop concurrent task instances from running due to race conditions

2017-02-09 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel resolved AIRFLOW-679.
-
Resolution: Fixed

> Stop concurrent task instances from running due to race conditions
> --
>
> Key: AIRFLOW-679
> URL: https://issues.apache.org/jira/browse/AIRFLOW-679
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core
>Reporter: Alex Guziel
>Assignee: Alex Guziel
>Priority: Minor
>
> Right now, multiple copies of the same task instance can run if someone 
> clicks on the UI multiple times. To fix this, I propose two things:
> 1) record hostname and pid in TaskInstance table, then when heartbeating, 
> only continue running if it matches





[jira] [Created] (AIRFLOW-861) Pickle_info endpoint is unauthenticated

2017-02-10 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-861:
---

 Summary: Pickle_info endpoint is unauthenticated
 Key: AIRFLOW-861
 URL: https://issues.apache.org/jira/browse/AIRFLOW-861
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel
Assignee: Alex Guziel


Right now the admin/airflow/pickle_info is unauthenticated, allowing anyone to 
see the list of dags





[jira] [Work started] (AIRFLOW-861) Pickle_info endpoint is unauthenticated

2017-02-10 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-861 started by Alex Guziel.
---
> Pickle_info endpoint is unauthenticated
> ---
>
> Key: AIRFLOW-861
> URL: https://issues.apache.org/jira/browse/AIRFLOW-861
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Alex Guziel
>Assignee: Alex Guziel
>
> Right now the admin/airflow/pickle_info is unauthenticated, allowing anyone 
> to see the list of dags





[jira] [Resolved] (AIRFLOW-857) Use unittest.assert instead of assert

2017-02-10 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel resolved AIRFLOW-857.
-
Resolution: Fixed

> Use unittest.assert instead of assert
> -
>
> Key: AIRFLOW-857
> URL: https://issues.apache.org/jira/browse/AIRFLOW-857
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Alex Guziel
>Assignee: Alex Guziel
>Priority: Minor
>
> Right now, unit tests do something like
> `assert x == y` 
> which gives less descriptive output in case of failure than
> `assertEqual(x, y)`





[jira] [Created] (AIRFLOW-900) Double run job should not terminate the existing running job

2017-02-23 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-900:
---

 Summary: Double run job should not terminate the existing running 
job
 Key: AIRFLOW-900
 URL: https://issues.apache.org/jira/browse/AIRFLOW-900
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel


Right now, jobs seem to get run a second time an hour after they start and, due 
to the current logic, both runs get killed. Since we can't isolate the cause, we 
should improve the logic here to kill only the new job.





[jira] [Created] (AIRFLOW-924) (Named)HivePartitionSensor broken if hook attr not set

2017-02-28 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-924:
---

 Summary: (Named)HivePartitionSensor broken if hook attr not set
 Key: AIRFLOW-924
 URL: https://issues.apache.org/jira/browse/AIRFLOW-924
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel
Assignee: Alex Guziel


Right now, the import statement for (Named)HivePartitionSensor uses the wrong 
path





[jira] [Commented] (AIRFLOW-921) 1.8.0rc Issues

2017-02-28 Thread Alex Guziel (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15888782#comment-15888782
 ] 

Alex Guziel commented on AIRFLOW-921:
-

Also add this
https://issues.apache.org/jira/browse/AIRFLOW-924

> 1.8.0rc Issues
> --
>
> Key: AIRFLOW-921
> URL: https://issues.apache.org/jira/browse/AIRFLOW-921
> Project: Apache Airflow
>  Issue Type: Task
>Reporter: Dan Davydov
>Priority: Blocker
>
> These are the pending issues for the Airflow 1.8.0 release:
> Blockers:
> [~bolke] please merge into the next RC and then remove from the list the 
> issues below once they are merged into master
> - Sub-tasks linked in this JIRA
> - Skipped tasks potentially cause a dagrun to be marked as failure/success 
> prematurely (one theory is that this is the same issue as 
> https://issues.apache.org/jira/browse/AIRFLOW-872)
> Other Issues:
> - High DB Load (~8x more than 1.7) - We are still investigating but it's 
> probably not a blocker for the release - (Theories: Might need execution_date 
> index on dag_run (based on slow process list) OR it might be this query which 
> is long running SELECT union_ti.dag_id AS union_ti_dag_id, union_ti.state AS 
> union_ti_state, count( *) AS count_1
> FR))
> [~bolke]





[jira] [Commented] (AIRFLOW-921) 1.8.0rc Issues

2017-03-02 Thread Alex Guziel (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893094#comment-15893094
 ] 

Alex Guziel commented on AIRFLOW-921:
-

The high DB load looks pretty periodic on our end (i.e., it comes and goes every 10 
minutes). I did some profiling and found a lot of areas to fix, so I'm working 
on those.

> 1.8.0rc Issues
> --
>
> Key: AIRFLOW-921
> URL: https://issues.apache.org/jira/browse/AIRFLOW-921
> Project: Apache Airflow
>  Issue Type: Task
>Reporter: Dan Davydov
>Priority: Blocker
>
> These are the pending issues for the Airflow 1.8.0 release:
> Blockers:
> [~bolke] please merge into the next RC and then remove from the list the 
> issues below once they are merged into master
> - Sub-tasks linked in this JIRA
> - Skipped tasks potentially cause a dagrun to be marked as failure/success 
> prematurely (one theory is that this is the same issue as 
> https://issues.apache.org/jira/browse/AIRFLOW-872)
> Other Issues:
> - High DB Load (~8x more than 1.7) - We are still investigating but it's 
> probably not a blocker for the release - (Theories: Might need execution_date 
> index on dag_run (based on slow process list) OR it might be this query which 
> is long running SELECT union_ti.dag_id AS union_ti_dag_id, union_ti.state AS 
> union_ti_state, count( *) AS count_1
> FR))
> - Front page loading time is a lot slower
> [~bolke]





[jira] [Created] (AIRFLOW-937) task_stats makes extremely large prepared query

2017-03-02 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-937:
---

 Summary: task_stats makes extremely large prepared query
 Key: AIRFLOW-937
 URL: https://issues.apache.org/jira/browse/AIRFLOW-937
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Alex Guziel
Assignee: Alex Guziel


Right now, the task_stats endpoint makes a few extremely long queries. We can 
give up some accuracy and get huge speed wins.





[jira] [Created] (AIRFLOW-938) SQLAlchemy query in task_stats should be compatible with Postgres

2017-03-03 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-938:
---

 Summary: SQLAlchemy query in task_stats should be compatible with 
Postgres
 Key: AIRFLOW-938
 URL: https://issues.apache.org/jira/browse/AIRFLOW-938
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel
Assignee: Alex Guziel


Right now, we check for truthiness by comparing to 1, which is not portable and 
does not work on pgsql





[jira] [Created] (AIRFLOW-939) Add swp files to gitignore

2017-03-03 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-939:
---

 Summary: Add swp files to gitignore
 Key: AIRFLOW-939
 URL: https://issues.apache.org/jira/browse/AIRFLOW-939
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel
Assignee: Alex Guziel








[jira] [Created] (AIRFLOW-961) LocalTaskJob onkill should get run on TERM

2017-03-09 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-961:
---

 Summary: LocalTaskJob onkill should get run on TERM
 Key: AIRFLOW-961
 URL: https://issues.apache.org/jira/browse/AIRFLOW-961
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel
Assignee: Alex Guziel


Right now, on_kill is called in the finally block; it should also be called 
when the process receives SIGTERM.
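
A minimal sketch of running a cleanup hook on SIGTERM with the standard `signal` module. The `Job` class and `make_term_handler` are hypothetical stand-ins, not the real LocalTaskJob; only `signal.signal` and `signal.SIGTERM` are actual library names.

```python
import signal


class Job:
    """Hypothetical stand-in for a job with an on_kill cleanup hook."""

    def __init__(self):
        self.killed = False

    def on_kill(self):
        self.killed = True


def make_term_handler(job):
    def handler(signum, frame):
        # Run the cleanup hook, then exit so the process still terminates.
        job.on_kill()
        raise SystemExit(1)

    return handler


if __name__ == "__main__":
    job = Job()
    # Handlers can only be registered from the main thread.
    signal.signal(signal.SIGTERM, make_term_handler(job))
```

Raising `SystemExit` from the handler also lets any surrounding finally block run, so both paths share the same cleanup.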





[jira] [Created] (AIRFLOW-970) Latest runs on homepage should load async and in batch

2017-03-10 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-970:
---

 Summary: Latest runs on homepage should load async and in batch
 Key: AIRFLOW-970
 URL: https://issues.apache.org/jira/browse/AIRFLOW-970
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Alex Guziel
Assignee: Alex Guziel


The latest_dag_run column on the homepage makes one query for each dag and does 
it synchronously. We should instead load it asynchronously and in batch.
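
A sketch of the batching half of this idea, using an in-memory SQLite table. The simplified `dag_run` schema and the sample rows are assumptions for illustration; the asynchronous loading in the browser is not covered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag_run (dag_id TEXT, execution_date TEXT)")
conn.executemany(
    "INSERT INTO dag_run VALUES (?, ?)",
    [("a", "2017-03-01"), ("a", "2017-03-09"), ("b", "2017-03-05")],
)

# One grouped query returns the latest run for every DAG at once,
# instead of issuing one query per DAG.
latest = dict(
    conn.execute(
        "SELECT dag_id, MAX(execution_date) FROM dag_run GROUP BY dag_id"
    ).fetchall()
)
print(latest)
```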





[jira] [Assigned] (AIRFLOW-976) Mark success running task causes it to fail

2017-03-13 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel reassigned AIRFLOW-976:
---

Assignee: Alex Guziel

> Mark success running task causes it to fail
> ---
>
> Key: AIRFLOW-976
> URL: https://issues.apache.org/jira/browse/AIRFLOW-976
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Dan Davydov
>Assignee: Alex Guziel
>
> Marking success on a running task in the UI causes it to fail.
> Expected Behavior:
> Task instance is killed and marked as successful
> Actual Behavior:
> Task instance is killed and marked as failed
> [~saguziel] [~bolke]





[jira] [Commented] (AIRFLOW-928) Same {task,execution_date} run multiple times in worker when using CeleryExecutor

2017-03-14 Thread Alex Guziel (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15925006#comment-15925006
 ] 

Alex Guziel commented on AIRFLOW-928:
-

[~bolke] Did see a double trigger one hour after, will see if related.

> Same {task,execution_date} run multiple times in worker when using 
> CeleryExecutor
> -
>
> Key: AIRFLOW-928
> URL: https://issues.apache.org/jira/browse/AIRFLOW-928
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: celery
>Affects Versions: Airflow 1.7.1.3
> Environment: Docker
>Reporter: Uri Shamay
> Attachments: airflow.log, dag_runs.png, dummy_dag.py, processes.list, 
> rabbitmq.queue, scheduler.log, worker_2.log, worker.log
>
>
> Hi,
> When using Airflow with CeleryExecutor (I tested both RabbitMQ and Redis), 
> I see that when workers are down, the scheduler on each scheduling period 
> **appends** the same {task,execution_date} to the same key in the broker. 
> This means that if workers are down or can't connect to the broker for a few 
> hours, the broker ends up holding thousands of identical executions.
> In my scenario I have just one dummy DAG running with dag_concurrency of 4. 
> I expected that the broker would hold just 4 messages and that the scheduler 
> would not keep queuing more and more messages for the same 
> {task,execution_date}.
> What happened is that when the workers started to consume messages, they got 
> thousands of tasks for just 4 tasks, and when they tried to write the 
> task_instances to the database there were integrity errors because those 
> {task,execution_date} rows already existed.
> Note that in my test, after letting Airflow schedule the work of just one DAG 
> without workers for a few hours, I connected to the broker from outside with 
> a custom client and retrieved the messages: there were thousands of the same 
> {dag,execution_date}.
> Even if only one instance actually runs when thousands of messages for the 
> same key are polled, this is still bad behavior. It would be better to 
> produce one message to the queue and, if some timeout occurs (such as a 
> visibility timeout), to set the key rather than append to it.
> When workers are down for a long time and many jobs are scheduled every 
> minute, the workers get thousands of identical jobs when they come back => 
> the workers run the same DAGs many times => a lot of wasted Python runners 
> => all Celery worker threads/processes are used up => all other jobs starve 
> until it is determined that only one instance of each identical job is 
> needed.
> Attached files:
> 1. airflow.log - the task log; you can see that several processes for the 
> same {task,execution_date} write to the same log file.
> 2. worker.log - the worker log; you can see the worker trying to run the 
> same {task,execution_date} multiple times, plus the database integrity 
> errors saying that those tasks already exist for those dates.
> 3. scheduler.log - shows that the scheduler decided to send the same 
> {job,execution_date} again and again, indefinitely.
> 4. dummy_dag.py - the DAG used in the test.
> 5. rabbitmq.queue - shows that after 5 minutes the broker queue contains 40 
> messages for the same 4 {job,execution_date}.
> 6. dag_runs.png - shows that only 4 jobs need to be run, while there are 
> many more messages in the queue.
> 7. processes.list - shows that after starting a worker and running ps -ef | 
> grep "airflow run", the worker runs the same {job,execution_date} multiple 
> times.
> 8. worker_2.log - shows that when the worker started, the same 
> {job,execution_date} keys appear multiple times.
> Thanks.





[jira] [Commented] (AIRFLOW-867) Tons of unit tests are ignored

2017-03-15 Thread Alex Guziel (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15925649#comment-15925649
 ] 

Alex Guziel commented on AIRFLOW-867:
-

I wonder if this is bad news or good news

> Tons of unit tests are ignored
> --
>
> Key: AIRFLOW-867
> URL: https://issues.apache.org/jira/browse/AIRFLOW-867
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: tests
>Reporter: George Sakkis
>Assignee: George Sakkis
>
> I was poking around in tests and found out that lots of tests are not 
> discovered by nosetests:
> {noformat}
> $ nosetests -q --collect-only 
> --
> Ran 254 tests in 0.948s
> $ grep -R 'def test' tests/ | wc -l
> 360
> {noformat}
> Initially I thought it might be related to not having installed all extra 
> dependencies but it turns out it's because apparently nosetests expects 
> explicit import of the related modules instead of discovering them 
> automatically (like py.test). For example, when adding an {{from 
> .ti_deps.deps.runnable_exec_date_dep import *}} in {{tests/__init__.py}} it 
> finds 260 tests, while when commenting out all imports in this module it 
> finds only 15!
> h4. Possible options
> * Quick fix: Add the necessary missing "import *" to discover all current 
> tests.
> * Better fix: Rename all test modules to start with "test_"
>   -Move from nosetests to py.test and get rid of the ugly error-prone 'import 
> *' hack.-





[jira] [Created] (AIRFLOW-991) Mark_success while a task is running leads to failure state

2017-03-15 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-991:
---

 Summary: Mark_success while a task is running leads to failure 
state
 Key: AIRFLOW-991
 URL: https://issues.apache.org/jira/browse/AIRFLOW-991
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel
Assignee: Alex Guziel








[jira] [Closed] (AIRFLOW-991) Mark_success while a task is running leads to failure state

2017-03-17 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel closed AIRFLOW-991.
---
Resolution: Duplicate

> Mark_success while a task is running leads to failure state
> ---
>
> Key: AIRFLOW-991
> URL: https://issues.apache.org/jira/browse/AIRFLOW-991
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Alex Guziel
>Assignee: Alex Guziel
>






[jira] [Resolved] (AIRFLOW-938) SQLAlchemy query in task_stats should be compatible with Postgres

2017-03-17 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel resolved AIRFLOW-938.
-
Resolution: Fixed

> SQLAlchemy query in task_stats should be compatible with Postgres
> -
>
> Key: AIRFLOW-938
> URL: https://issues.apache.org/jira/browse/AIRFLOW-938
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Alex Guziel
>Assignee: Alex Guziel
>
> Right now, we check for truthiness by comparing to 1, which is not portable 
> and does not work on pgsql





[jira] [Resolved] (AIRFLOW-861) Pickle_info endpoint is unauthenticated

2017-03-17 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel resolved AIRFLOW-861.
-
Resolution: Fixed

> Pickle_info endpoint is unauthenticated
> ---
>
> Key: AIRFLOW-861
> URL: https://issues.apache.org/jira/browse/AIRFLOW-861
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Alex Guziel
>Assignee: Alex Guziel
>
> Right now the admin/airflow/pickle_info is unauthenticated, allowing anyone 
> to see the list of dags





[jira] [Created] (AIRFLOW-1007) Jinja sandbox is vulnerable to RCE

2017-03-17 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1007:


 Summary: Jinja sandbox is vulnerable to RCE
 Key: AIRFLOW-1007
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1007
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel
Assignee: Alex Guziel


Right now, the jinja template functionality in chart_data takes arbitrary 
strings and executes them. We should use the sandbox functionality to prevent 
this.





[jira] [Created] (AIRFLOW-1035) Exponential backoff retry logic should use 2 as base

2017-03-23 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1035:


 Summary: Exponential backoff retry logic should use 2 as base
 Key: AIRFLOW-1035
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1035
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel
Assignee: Alex Guziel


Right now, the exponential backoff logic computes it as 
(retry_period) ^ (retry_number) instead of retry_period * 2 ^ retry_number. 
See https://en.wikipedia.org/wiki/Exponential_backoff
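
A quick numeric comparison of the two formulas, with the retry period expressed in seconds. The function names and the 300-second period are illustrative assumptions, and whether retry numbering starts at 0 or 1 in Airflow is not stated here.

```python
def described_delay(retry_period, retry_number):
    # Behaviour described in this report: the period itself is the base.
    return retry_period ** retry_number


def proposed_delay(retry_period, retry_number):
    # Proposed behaviour: base 2, scaled by the period.
    return retry_period * 2 ** retry_number


for n in range(1, 4):
    print(n, described_delay(300, n), proposed_delay(300, n))
```

With a 300-second period the described formula reaches 90000 seconds on the second retry, while the base-2 version gives 1200 seconds.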





[jira] [Created] (AIRFLOW-1036) Exponential backoff should use randomization

2017-03-23 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1036:


 Summary: Exponential backoff should use randomization
 Key: AIRFLOW-1036
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1036
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Alex Guziel
Assignee: Alex Guziel


This prevents the thundering herd problem. I think with the current way this is 
used, we would need to use some hashing function based on some subset of the 
dag_run, task_id, dag_id, and execution_date to emulate the RNG.
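
One way the hashing idea might look, as a sketch only. The function name `jittered_delay`, the choice of SHA-1, the key format, and the half-to-full delay range are all assumptions; the report only suggests hashing some subset of these fields.

```python
import hashlib


def jittered_delay(base_delay, dag_id, task_id, execution_date):
    # Build a stable key so the same task instance always gets the same jitter.
    key = "{}#{}#{}".format(dag_id, task_id, execution_date)
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    # Map the hash to a fraction between 0.0 and 0.9999.
    fraction = (int(digest, 16) % 10000) / 10000.0
    # Deterministic delay of at least base_delay / 2 and less than base_delay.
    return base_delay * (0.5 + fraction / 2)


print(jittered_delay(300, "example_dag", "task_a", "2017-03-23T00:00:00"))
```

Because the result depends only on the identifying fields, repeated evaluations agree with each other, which a call to a real RNG would not guarantee.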





[jira] [Created] (AIRFLOW-1038) Specify celery serializers explicitly

2017-03-24 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1038:


 Summary: Specify celery serializers explicitly
 Key: AIRFLOW-1038
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1038
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel
Assignee: Alex Guziel


Celery 3->4 upgrade changes the default task and result serializer from pickle 
to json. Pickle is faster and supports more types 
http://docs.celeryproject.org/en/latest/userguide/calling.html
This also causes issues when different versions of celery are running on 
different hosts.
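
A configuration fragment showing what pinning the serializers explicitly could look like. The setting names are Celery's old-style uppercase names (Celery 4 introduced lowercase equivalents); the container class name `CeleryConfig` and the choice of pickle for every setting are assumptions for illustration.

```python
class CeleryConfig(object):
    # Set the serializers explicitly so every host agrees on them,
    # regardless of the default in the installed Celery version.
    CELERY_TASK_SERIALIZER = "pickle"
    CELERY_RESULT_SERIALIZER = "pickle"
    CELERY_ACCEPT_CONTENT = ["pickle"]
```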





[jira] [Updated] (AIRFLOW-1038) Specify celery serializers explicitly and pin version

2017-03-24 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel updated AIRFLOW-1038:
-
Summary: Specify celery serializers explicitly and pin version  (was: 
Specify celery serializers explicitly)

> Specify celery serializers explicitly and pin version
> -
>
> Key: AIRFLOW-1038
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1038
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Alex Guziel
>Assignee: Alex Guziel
>
> Celery 3->4 upgrade changes the default task and result serializer from 
> pickle to json. Pickle is faster and supports more types 
> http://docs.celeryproject.org/en/latest/userguide/calling.html
> This also causes issues when different versions of celery are running on 
> different hosts.





[jira] [Assigned] (AIRFLOW-1036) Exponential backoff should use randomization

2017-03-27 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel reassigned AIRFLOW-1036:


Assignee: (was: Alex Guziel)

> Exponential backoff should use randomization
> 
>
> Key: AIRFLOW-1036
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1036
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Alex Guziel
>
> This prevents the thundering herd problem. I think with the current way this 
> is used, we would need to use some hashing function based on some subset of 
> the dag_run, task_id, dag_id, and execution_date to emulate the RNG.
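
A rough sketch of the hashing idea, assuming the retry delay is recomputed from task metadata each time rather than stored. The function name, the use of SHA-1, and the half-to-full jitter range are all assumptions made for illustration.

```python
import hashlib
from datetime import datetime, timedelta


def jittered_retry_delay(base_delay, try_number, dag_id, task_id, execution_date):
    """Deterministic 'random' backoff: the same inputs always give the same delay."""
    # Plain exponential backoff: base, 2 * base, 4 * base, ...
    ceiling = base_delay.total_seconds() * (2 ** (try_number - 1))

    # Hash the identifying fields so different tasks get different jitter
    # without keeping any RNG state between scheduler runs.
    key = '{}#{}#{}#{}'.format(dag_id, task_id, execution_date.isoformat(), try_number)
    digest = hashlib.sha1(key.encode('utf-8')).hexdigest()

    # Map the first 32 bits of the hash to a fraction in [0, 1).
    fraction = int(digest[:8], 16) / float(2 ** 32)

    # Spread retries between half the ceiling and the ceiling.
    return timedelta(seconds=ceiling * (0.5 + 0.5 * fraction))
```

Because the delay is a pure function of its inputs, repeated evaluations agree with each other, while tasks that fail at the same moment are spread across the window instead of retrying together.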



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-1047) Airflow logs vulnerable to XSS

2017-03-27 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1047:


 Summary: Airflow logs vulnerable to XSS
 Key: AIRFLOW-1047
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1047
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel
Assignee: Alex Guziel


Navigating to a page with the dag_id parameter set to an HTML tag causes that 
tag to be rendered, because the value is wrapped in Markup (which marks the 
HTML as safe).
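
A small sketch of the escaping that avoids this, assuming Python 3 and using only the standard library. The function name and the heading markup are hypothetical; the real view code is not shown here.

```python
import html


def render_dag_title(dag_id):
    """Build a heading from a request parameter, escaping it first."""
    # html.escape turns <, >, & and quotes into entities, so a tag passed in
    # as dag_id is displayed as text instead of being rendered.
    return '<h2>' + html.escape(dag_id) + '</h2>'
```

The point is that only the fixed template part should be treated as safe markup; anything taken from the request is escaped before it is combined with it.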



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-1059) Reset_state_for_orphaned_task should operate in batch for the scheduler

2017-03-30 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1059:


 Summary: Reset_state_for_orphaned_task should operate in batch for 
the scheduler
 Key: AIRFLOW-1059
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1059
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Alex Guziel
Assignee: Alex Guziel


Scheduler startup is very slow because resetting state makes a query for each 
dag run. We should be able to do this in a constant number of queries, which 
will reduce scheduler startup time significantly.
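
A sketch of the batch idea using sqlite3 so it is self-contained. The table layout, the state names, and the rule for which rows count as orphaned are simplified assumptions, not Airflow's actual schema or logic.

```python
import sqlite3


def reset_orphaned_tasks(conn):
    """Clear the state of queued/scheduled task instances in running dag runs.

    One UPDATE covers every dag run, instead of one query per dag run.
    """
    cursor = conn.execute(
        """
        UPDATE task_instance
           SET state = NULL
         WHERE state IN ('queued', 'scheduled')
           AND EXISTS (
               SELECT 1
                 FROM dag_run
                WHERE dag_run.dag_id = task_instance.dag_id
                  AND dag_run.execution_date = task_instance.execution_date
                  AND dag_run.state = 'running')
        """
    )
    conn.commit()
    # Number of task instances that were reset.
    return cursor.rowcount
```

The number of round trips to the database stays constant no matter how many dag runs exist, which is where the startup saving would come from.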



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-1064) TaskInstanceModelView is slow

2017-04-03 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1064:


 Summary: TaskInstanceModelView is slow
 Key: AIRFLOW-1064
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1064
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Alex Guziel
Assignee: Alex Guziel


Due to a bad query (a full table scan), the TaskInstanceModelView is very slow. 
Adding an index is costly, and job_id is a good approximation.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-1069) Pool slots not obeyed

2017-04-04 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1069:


 Summary: Pool slots not obeyed
 Key: AIRFLOW-1069
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1069
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel
Assignee: Alex Guziel


Right now, the decrement is done in an incorrect way that is not preserved 
across iterations



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (AIRFLOW-1069) Pool slots not obeyed

2017-04-04 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel closed AIRFLOW-1069.

Resolution: Invalid

> Pool slots not obeyed
> -
>
> Key: AIRFLOW-1069
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1069
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Alex Guziel
>Assignee: Alex Guziel
>
> Right now, the decrement is done in an incorrect way that is not preserved 
> across iterations



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-1074) Do not count queued tasks in scheduler concurrency check

2017-04-05 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1074:


 Summary: Do not count queued tasks in scheduler concurrency check
 Key: AIRFLOW-1074
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1074
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-1077) Subdags can deadlock

2017-04-05 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1077:


 Summary: Subdags can deadlock
 Key: AIRFLOW-1077
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1077
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel


Given a concurrency of n, if all n running tasks are SubDags, they occupy every 
slot and block any of their own tasks from executing, leading to deadlock.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-1078) Latest_runs endpoint broken in old flask versions

2017-04-05 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1078:


 Summary: Latest_runs endpoint broken in old flask versions
 Key: AIRFLOW-1078
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1078
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel
Assignee: Alex Guziel






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-1081) Task duration page is slow

2017-04-06 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1081:


 Summary: Task duration page is slow
 Key: AIRFLOW-1081
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1081
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel
Assignee: Alex Guziel


It makes a number of queries proportional to the data size, instead of just 2.
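
A sketch of fetching everything in a single query and grouping it in memory, using sqlite3 to stay self-contained. The table and column names are assumptions for illustration and the real page may need different fields.

```python
import sqlite3
from collections import defaultdict


def durations_by_task(conn, dag_id):
    """Fetch every duration for a DAG in one query and group it by task_id."""
    rows = conn.execute(
        "SELECT task_id, execution_date, duration FROM task_instance "
        "WHERE dag_id = ? ORDER BY execution_date",
        (dag_id,),
    )
    grouped = defaultdict(list)
    for task_id, execution_date, duration in rows:
        # Grouping happens in Python, so no per-task query is needed.
        grouped[task_id].append((execution_date, duration))
    return dict(grouped)
```

The query count no longer grows with the number of tasks or runs; only the amount of data returned does.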



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-1104) Concurrency check in scheduler should count queued tasks as well as running

2017-04-11 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1104:


 Summary: Concurrency check in scheduler should count queued tasks 
as well as running
 Key: AIRFLOW-1104
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1104
 Project: Apache Airflow
  Issue Type: Bug
 Environment: see https://github.com/apache/incubator-airflow/pull/2221
"Tasks with the QUEUED state should also be counted below, but for now we 
cannot count them. This is because there is no guarantee that queued tasks in 
failed dagruns will or will not eventually run and queued tasks that will never 
run will consume slots and can stall a DAG. Once we can guarantee that all 
queued tasks in failed dagruns will never run (e.g. make sure that all 
running/newly queued TIs have running dagruns), then we can include QUEUED 
tasks here, with the constraint that they are in running dagruns."
Reporter: Alex Guziel
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-1105) Consolidate airflow run "raw" and "local"

2017-04-11 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1105:


 Summary: Consolidate airflow run "raw" and "local"
 Key: AIRFLOW-1105
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1105
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel
Assignee: Alex Guziel






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-1109) kill_process_tree should use KILL signal and log results

2017-04-12 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1109:


 Summary: kill_process_tree should use KILL signal and log results
 Key: AIRFLOW-1109
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1109
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel
Assignee: Alex Guziel






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (AIRFLOW-1109) kill_process_tree should use KILL signal and log results

2017-04-13 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel closed AIRFLOW-1109.

Resolution: Fixed

> kill_process_tree should use KILL signal and log results
> 
>
> Key: AIRFLOW-1109
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1109
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Alex Guziel
>Assignee: Alex Guziel
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-1112) Log which pool is full in scheduler when pool slots are full

2017-04-14 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1112:


 Summary: Log which pool is full in scheduler when pool slots are 
full
 Key: AIRFLOW-1112
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1112
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel
Assignee: Alex Guziel






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-1112) Log which pool is full in scheduler when pool slots are full

2017-04-14 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel resolved AIRFLOW-1112.
--
   Resolution: Fixed
Fix Version/s: 1.9.0

Issue resolved by pull request #2242
[https://github.com/apache/incubator-airflow/pull/2242]

> Log which pool is full in scheduler when pool slots are full
> 
>
> Key: AIRFLOW-1112
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1112
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Alex Guziel
>Assignee: Alex Guziel
> Fix For: 1.9.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AIRFLOW-1131) DockerOperator jobs time out and get stuck in "running" forever

2017-04-20 Thread Alex Guziel (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977160#comment-15977160
 ] 

Alex Guziel commented on AIRFLOW-1131:
--

Did you verify it was not actually running? Are you using celery_executor? The 
reload task actually also fails because of ```{models.py:1140} INFO - 
Dependencies not met for , dependency 'Task Instance Not Already Running' FAILED: 
Task is already running, it started on 2017-04-20 11:19:59.597425.``` so it 
never actually gets run. The original continues to run in our case.

> DockerOperator jobs time out and get stuck in "running" forever
> ---
>
> Key: AIRFLOW-1131
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1131
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.9.0
> Environment: Python 2.7.12
> git+git://github.com/apache/incubator-airflow.git@35e43f5067f4741640278b765c0e54e4fd45ffa3#egg=airflow[async,password,celery,crypto,postgres,hive,hdfs,jdbc]
>Reporter: Vitor Baptista
>
> With the following DAG and task:
> {code}
> import os
> from datetime import datetime, timedelta
> from airflow.models import DAG
> from airflow.operators.docker_operator import DockerOperator
> default_args = {
>     'owner': 'airflow',
>     'depends_on_past': False,
>     'start_date': datetime(2017, 1, 1),
>     'retries': 3,
>     'retry_delay': timedelta(minutes=10),
> }
> dag = DAG(
>     dag_id='smoke_test',
>     default_args=default_args,
>     max_active_runs=1,
>     schedule_interval='@daily'
> )
> sleep_forever_task = DockerOperator(
>     task_id='sleep_forever',
>     dag=dag,
>     image='alpine:latest',
>     api_version=os.environ.get('DOCKER_API_VERSION', '1.23'),
>     command='sleep {}'.format(60 * 60 * 24),
> )
> {code}
> When I run it, this is what I get:
> {code}
> *** Log file isn't local.
> *** Fetching here: 
> http://589ea17432ec:8793/log/smoke_test/sleep_forever/2017-04-18T00:00:00
> [2017-04-20 11:19:58,258] {models.py:172} INFO - Filling up the DagBag from 
> /usr/local/airflow/dags/smoke_test.py
> [2017-04-20 11:19:58,438] {base_task_runner.py:112} INFO - Running: ['bash', 
> '-c', u'airflow run smoke_test sleep_forever 2017-04-18T00:00:00 --job_id 
> 2537 --raw -sd DAGS_FOLDER/smoke_test.py']
> [2017-04-20 11:19:58,888] {base_task_runner.py:95} INFO - Subtask: 
> /usr/local/airflow/src/airflow/airflow/configuration.py:128: 
> DeprecationWarning: This method will be removed in future versions.  Use 
> 'parser.read_file()' instead.
> [2017-04-20 11:19:58,888] {base_task_runner.py:95} INFO - Subtask:   
> self.readfp(StringIO.StringIO(string))
> [2017-04-20 11:19:59,214] {base_task_runner.py:95} INFO - Subtask: 
> [2017-04-20 11:19:59,214] {__init__.py:56} INFO - Using executor 
> CeleryExecutor
> [2017-04-20 11:19:59,227] {base_task_runner.py:95} INFO - Subtask: 
> [2017-04-20 11:19:59,227] {driver.py:120} INFO - Generating grammar tables 
> from /usr/lib/python2.7/lib2to3/Grammar.txt
> [2017-04-20 11:19:59,244] {base_task_runner.py:95} INFO - Subtask: 
> [2017-04-20 11:19:59,244] {driver.py:120} INFO - Generating grammar tables 
> from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
> [2017-04-20 11:19:59,377] {base_task_runner.py:95} INFO - Subtask: 
> [2017-04-20 11:19:59,377] {models.py:172} INFO - Filling up the DagBag from 
> /usr/local/airflow/dags/smoke_test.py
> [2017-04-20 11:19:59,597] {base_task_runner.py:95} INFO - Subtask: 
> [2017-04-20 11:19:59,597] {models.py:1146} INFO - Dependencies all met for 
> 
> [2017-04-20 11:19:59,605] {base_task_runner.py:95} INFO - Subtask: 
> [2017-04-20 11:19:59,605] {models.py:1146} INFO - Dependencies all met for 
> 
> [2017-04-20 11:19:59,605] {base_task_runner.py:95} INFO - Subtask: 
> [2017-04-20 11:19:59,605] {models.py:1338} INFO - 
> [2017-04-20 11:19:59,606] {base_task_runner.py:95} INFO - Subtask: 
> 
> [2017-04-20 11:19:59,606] {base_task_runner.py:95} INFO - Subtask: Starting 
> attempt 1 of 4
> [2017-04-20 11:19:59,606] {base_task_runner.py:95} INFO - Subtask: 
> 
> [2017-04-20 11:19:59,606] {base_task_runner.py:95} INFO - Subtask: 
> [2017-04-20 11:19:59,620] {base_task_runner.py:95} INFO - Subtask: 
> [2017-04-20 11:19:59,620] {models.py:1362} INFO - Executing 
>  on 2017-04-18 00:00:00
> [2017-04-20 11:19:59,662] {base_task_runner.py:95} INFO - Subtask: 
> [2017-04-20 11:19:59,661] {docker_operator.py:132} INFO - Starting docker 
> container from image alpine:latest
> [2017-04-20 12:21:25,661] {models.py:172} INFO - Filling up the DagBag from 
> /usr/local/airflow/dags/smoke_test.py
> [2017-04-20 12:21:25,809] {base_task_runner.py:112} I

[jira] [Created] (AIRFLOW-1133) More tasks than the concurrency limit can run

2017-04-20 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1133:


 Summary: More tasks than the concurrency limit can run
 Key: AIRFLOW-1133
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1133
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel


There are two ways to put checks on dag concurrency:
1) The scheduler not queueing too many tasks (via checking the number of tasks 
running)
2) The worker checking that not too many tasks are running (via the db)

Right now, both have issues.
Check 1 doesn't consider queued tasks, which may not be running now but will be 
running soon. Hopefully, check 2 should catch it, but it does not check the 
condition properly as it only locks the row, and it seems locking the dag would 
also be expensive.
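
A toy sketch of the scheduler-side check with queued tasks included. The function and parameter names are hypothetical, and it says nothing about the locking problem in the worker-side check.

```python
def slots_available(concurrency, running, queued):
    """Number of additional task instances the scheduler may queue for one DAG."""
    # Queued tasks are about to run, so they consume slots just like running ones.
    return max(concurrency - (running + queued), 0)
```

For example, with a concurrency of 16, 10 running tasks and 6 queued tasks, this returns 0, whereas counting only running tasks would leave 6 slots and let the DAG exceed its limit once the queued tasks start.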



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (AIRFLOW-1036) Exponential backoff should use randomization

2017-04-27 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel reassigned AIRFLOW-1036:


Assignee: Alex Guziel

> Exponential backoff should use randomization
> 
>
> Key: AIRFLOW-1036
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1036
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Alex Guziel
>Assignee: Alex Guziel
>
> This prevents the thundering herd problem. I think with the current way this 
> is used, we would need to use some hashing function based on some subset of 
> the dag_run, task_id, dag_id, and execution_date to emulate the RNG.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AIRFLOW-1155) Add Tails.com to community

2017-04-28 Thread Alex Guziel (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989420#comment-15989420
 ] 

Alex Guziel commented on AIRFLOW-1155:
--

Hmm, for some reason I can't close this issue.

> Add Tails.com to community
> --
>
> Key: AIRFLOW-1155
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1155
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: Documentation
>Reporter: Alan Cruickshank
>Assignee: Alex Guziel
>Priority: Trivial
> Fix For: 1.9.0
>
>
> Add to README.md
> ```
> 1. [Tails.com](https://tails.com/) 
> [[@alanmcruickshank](https://github.com/alanmcruickshank)]
> ```



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (AIRFLOW-1155) Add Tails.com to community

2017-04-28 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel reassigned AIRFLOW-1155:


Assignee: Alex Guziel  (was: Alan Cruickshank)

> Add Tails.com to community
> --
>
> Key: AIRFLOW-1155
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1155
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: Documentation
>Reporter: Alan Cruickshank
>Assignee: Alex Guziel
>Priority: Trivial
> Fix For: 1.9.0
>
>
> Add to README.md
> ```
> 1. [Tails.com](https://tails.com/) 
> [[@alanmcruickshank](https://github.com/alanmcruickshank)]
> ```



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-1268) Celery bug can cause tasks to be delayed indefinitely

2017-06-02 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1268:


 Summary: Celery bug can cause tasks to be delayed indefinitely
 Key: AIRFLOW-1268
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1268
 Project: Apache Airflow
  Issue Type: Bug
  Components: celery
 Environment: With celery_executor with redis
Reporter: Alex Guziel
Priority: Blocker


With Celery, tasks can be delayed indefinitely (or for the default of 1 hour) 
due to a Celery bug; see https://github.com/celery/celery/issues/3765



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-1268) Celery bug can cause tasks to be delayed indefinitely

2017-06-02 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel updated AIRFLOW-1268:
-
Priority: Critical  (was: Blocker)

> Celery bug can cause tasks to be delayed indefinitely
> -
>
> Key: AIRFLOW-1268
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1268
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: celery
> Environment: With celery_executor with redis
>Reporter: Alex Guziel
>Priority: Critical
>
> With celery, tasks can get delayed indefinitely (or default 1 hour) due to a 
> bug with celery, see https://github.com/celery/celery/issues/3765



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-1269) Job table should be reloaded into memory on Scheduler start

2017-06-02 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1269:


 Summary: Job table should be reloaded into memory on Scheduler 
start
 Key: AIRFLOW-1269
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1269
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel
Assignee: Alex Guziel


Right now, running jobs that stop heartbeating will not be restarted if the 
scheduler has been restarted since then.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-1289) Don't restrict scheduler threads to CPU cores

2017-06-07 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1289:


 Summary: Don't restrict scheduler threads to CPU cores
 Key: AIRFLOW-1289
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1289
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel
Assignee: Alex Guziel


There's really no reason to, and DAG processing can have blocking IO
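
A small illustration of why a worker count above the core count can help when the work blocks on IO. It uses a thread pool and a sleep as a stand-in for the blocking call; the function names are hypothetical, and this is not how the scheduler itself is structured.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def parse_dag_file(path):
    """Stand-in for DAG processing that spends its time waiting on IO."""
    time.sleep(0.05)  # simulated blocking read; no CPU is used while waiting
    return path


def process_all(paths, max_workers):
    # max_workers is used as given and is not clamped to the CPU core count.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(parse_dag_file, paths))
```

With 16 files and 16 workers the waits overlap, so the total time stays close to a single wait rather than 16 of them, regardless of how many cores the machine has.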



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (AIRFLOW-1265) Exception happens when loading celery configurations.

2017-06-07 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel reassigned AIRFLOW-1265:


Assignee: Alex Guziel  (was: Chienhsiung Chao)

> Exception happens when loading celery configurations.
> -
>
> Key: AIRFLOW-1265
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1265
> Project: Apache Airflow
>  Issue Type: Bug
> Environment: OSX
>Reporter: Chienhsiung Chao
>Assignee: Alex Guziel
>
> airflow@f300f25ced3a:/usr/local/script$ airflow
> [2017-06-02 02:25:59,263] {configuration.py:199} WARNING - section/key 
> [celery/celery_ssl_key] not found in config
> Traceback (most recent call last):
>   File "/incubator-airflow/airflow/executors/celery_executor.py", line 52, in 
> CeleryConfig
> BROKER_USE_SSL = {'keyfile': configuration.get('celery', 
> 'CELERY_SSL_KEY'),
>   File "/incubator-airflow/airflow/configuration.py", line 398, in get
> return conf.get(section, key, **kwargs)
>   File "/incubator-airflow/airflow/configuration.py", line 203, in get
> "in config".format(**locals()))
> airflow.exceptions.AirflowConfigException: section/key 
> [celery/celery_ssl_key] not found in config
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/usr/local/bin/airflow", line 6, in 
> exec(compile(open(__file__).read(), __file__, 'exec'))
>   File "/incubator-airflow/airflow/bin/airflow", line 18, in 
> from airflow.bin.cli import CLIFactory
>   File "/incubator-airflow/airflow/bin/cli.py", line 46, in 
> from airflow import jobs, settings
>   File "/incubator-airflow/airflow/jobs.py", line 66, in 
> class BaseJob(Base, LoggingMixin):
>   File "/incubator-airflow/airflow/jobs.py", line 98, in BaseJob
> executor=executors.GetDefaultExecutor(),
>   File "/incubator-airflow/airflow/executors/__init__.py", line 43, in 
> GetDefaultExecutor
> DEFAULT_EXECUTOR = _get_executor(executor_name)
>   File "/incubator-airflow/airflow/executors/__init__.py", line 60, in 
> _get_executor
> from airflow.executors.celery_executor import CeleryExecutor
>   File "/incubator-airflow/airflow/executors/celery_executor.py", line 38, in 
> 
> class CeleryConfig(object):
>   File "/incubator-airflow/airflow/executors/celery_executor.py", line 60, in 
> CeleryConfig
> raise AirflowException('Exception: There was an unknown Celery SSL Error. 
>  Please ensure you want to use '
> airflow.exceptions.AirflowException: Exception: There was an unknown Celery 
> SSL Error.  Please ensure you want to use SSL and/or have all necessary certs 
> and key.
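
A sketch of one way to avoid reading the SSL keys when SSL is not wanted, using the standard-library configparser (Python 3) rather than Airflow's own configuration module. The celery_ssl_active and celery_ssl_cert option names are assumptions; only celery_ssl_key and the 'keyfile' entry appear in the traceback above.

```python
import configparser


def broker_ssl_options(parser):
    """Return SSL options for the broker, or None when SSL is not enabled."""
    # 'celery_ssl_active' is an assumed option name used as the on/off switch.
    if not parser.getboolean('celery', 'celery_ssl_active', fallback=False):
        return None

    # fallback=None avoids an exception for a missing key, so a clearer
    # error can be raised below.
    keyfile = parser.get('celery', 'celery_ssl_key', fallback=None)
    certfile = parser.get('celery', 'celery_ssl_cert', fallback=None)
    if not keyfile or not certfile:
        raise ValueError('SSL is enabled but celery_ssl_key/celery_ssl_cert is missing')
    return {'keyfile': keyfile, 'certfile': certfile}
```

With this shape, a config that never mentions SSL loads cleanly, and a half-configured one fails with a message that names the missing options.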



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-1289) Don't restrict scheduler threads to CPU cores

2017-06-08 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel resolved AIRFLOW-1289.
--
   Resolution: Fixed
Fix Version/s: 1.9.0

Issue resolved by pull request #2353
[https://github.com/apache/incubator-airflow/pull/2353]

> Don't restrict scheduler threads to CPU cores
> -
>
> Key: AIRFLOW-1289
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1289
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Alex Guziel
>Assignee: Alex Guziel
> Fix For: 1.9.0
>
>
> There's really no reason to, and DAG processing can have blocking IO



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-1309) Add optional hive_tblproperties in HiveToDruidTransfer

2017-06-15 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1309:


 Summary: Add optional hive_tblproperties in HiveToDruidTransfer
 Key: AIRFLOW-1309
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1309
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Alex Guziel
Assignee: Alex Guziel
Priority: Minor


We should accept tblproperties for the tmp table in druid



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1334) Improve efficiency of checking for backfills on scheduler loop

2017-06-21 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1334:


 Summary: Improve efficiency of checking for backfills on scheduler 
loop
 Key: AIRFLOW-1334
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1334
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Alex Guziel
Assignee: Alex Guziel


Right now, it makes a query for each TI, which is quite slow.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1335) Use buffered logger

2017-06-21 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1335:


 Summary: Use buffered logger
 Key: AIRFLOW-1335
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1335
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Alex Guziel
Assignee: Alex Guziel






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1334) Improve efficiency of checking for backfills on scheduler loop

2017-06-22 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel updated AIRFLOW-1334:
-
Fix Version/s: 1.8.3

> Improve efficiency of checking for backfills on scheduler loop
> --
>
> Key: AIRFLOW-1334
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1334
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Alex Guziel
>Assignee: Alex Guziel
> Fix For: 1.8.3
>
>
> Right now, it makes a query for each TI, which is quite slow.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1335) Use buffered logger

2017-06-22 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel updated AIRFLOW-1335:
-
Fix Version/s: 1.9.0

> Use buffered logger
> ---
>
> Key: AIRFLOW-1335
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1335
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Alex Guziel
>Assignee: Alex Guziel
> Fix For: 1.9.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1335) Use buffered logger

2017-06-22 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel resolved AIRFLOW-1335.
--
Resolution: Fixed

Issue resolved by pull request #2386
[https://github.com/apache/incubator-airflow/pull/2386]

> Use buffered logger
> ---
>
> Key: AIRFLOW-1335
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1335
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Alex Guziel
>Assignee: Alex Guziel
> Fix For: 1.9.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1345) Don't commit on each loop

2017-06-23 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1345:


 Summary: Don't commit on each loop
 Key: AIRFLOW-1345
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1345
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Alex Guziel
Assignee: Alex Guziel
 Fix For: 1.8.3


Right now, in the main scheduler loop, we commit for each TI. While this 
minimizes the time a lock is held, it expires all TIs, forcing us to do an n+1 
query.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AIRFLOW-1348) Paginated UI has broken toggles after first page

2017-06-26 Thread Alex Guziel (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063831#comment-16063831
 ] 

Alex Guziel commented on AIRFLOW-1348:
--

I don't think this is a new issue. It has been like this on Airbnb production 
for quite a while.

> Paginated UI has broken toggles after first page
> 
>
> Key: AIRFLOW-1348
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1348
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.8.2
>Reporter: Chris Riccomini
> Attachments: page1.png, page2.png
>
>
> After upgrading to 1.8.2rc2, I'm seeing the main page paginate my list of 
> Airflow DAGs. Unfortunately, the toggles turn to checkboxes after the first 
> page. I'm attaching some screenshots to illustrate.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1334) Improve efficiency of checking for backfills on scheduler loop

2017-06-27 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel resolved AIRFLOW-1334.
--
Resolution: Fixed

Issue resolved by pull request #2384
[https://github.com/apache/incubator-airflow/pull/2384]

> Improve efficiency of checking for backfills on scheduler loop
> --
>
> Key: AIRFLOW-1334
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1334
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Alex Guziel
>Assignee: Alex Guziel
> Fix For: 1.8.3
>
>
> Right now, it makes a query for each TI, which is quite slow.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1352) Revert bad logging Handler

2017-06-27 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1352:


 Summary: Revert bad logging Handler
 Key: AIRFLOW-1352
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1352
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel
Assignee: Alex Guziel


Right now it uses some weird API so I'll revert rather than fix



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (AIRFLOW-1352) Revert bad logging Handler

2017-06-27 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel resolved AIRFLOW-1352.
--
   Resolution: Fixed
Fix Version/s: 1.9.0

Issue resolved by pull request #2403
[https://github.com/apache/incubator-airflow/pull/2403]

> Revert bad logging Handler
> --
>
> Key: AIRFLOW-1352
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1352
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Alex Guziel
>Assignee: Alex Guziel
> Fix For: 1.9.0
>
>
> Right now it uses some weird API so I'll revert rather than fix



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AIRFLOW-1265) Exception happens when loading celery configurations.

2017-06-30 Thread Alex Guziel (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070582#comment-16070582
 ] 

Alex Guziel commented on AIRFLOW-1265:
--

This is done but celery messed up

> Exception happens when loading celery configurations.
> -
>
> Key: AIRFLOW-1265
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1265
> Project: Apache Airflow
>  Issue Type: Bug
> Environment: OSX
>Reporter: Chienhsiung Chao
>Assignee: Alex Guziel
>
> airflow@f300f25ced3a:/usr/local/script$ airflow
> [2017-06-02 02:25:59,263] {configuration.py:199} WARNING - section/key 
> [celery/celery_ssl_key] not found in config
> Traceback (most recent call last):
>   File "/incubator-airflow/airflow/executors/celery_executor.py", line 52, in 
> CeleryConfig
> BROKER_USE_SSL = {'keyfile': configuration.get('celery', 
> 'CELERY_SSL_KEY'),
>   File "/incubator-airflow/airflow/configuration.py", line 398, in get
> return conf.get(section, key, **kwargs)
>   File "/incubator-airflow/airflow/configuration.py", line 203, in get
> "in config".format(**locals()))
> airflow.exceptions.AirflowConfigException: section/key 
> [celery/celery_ssl_key] not found in config
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/usr/local/bin/airflow", line 6, in <module>
>     exec(compile(open(__file__).read(), __file__, 'exec'))
>   File "/incubator-airflow/airflow/bin/airflow", line 18, in <module>
>     from airflow.bin.cli import CLIFactory
>   File "/incubator-airflow/airflow/bin/cli.py", line 46, in <module>
>     from airflow import jobs, settings
>   File "/incubator-airflow/airflow/jobs.py", line 66, in <module>
>     class BaseJob(Base, LoggingMixin):
>   File "/incubator-airflow/airflow/jobs.py", line 98, in BaseJob
>     executor=executors.GetDefaultExecutor(),
>   File "/incubator-airflow/airflow/executors/__init__.py", line 43, in GetDefaultExecutor
>     DEFAULT_EXECUTOR = _get_executor(executor_name)
>   File "/incubator-airflow/airflow/executors/__init__.py", line 60, in _get_executor
>     from airflow.executors.celery_executor import CeleryExecutor
>   File "/incubator-airflow/airflow/executors/celery_executor.py", line 38, in <module>
> class CeleryConfig(object):
>   File "/incubator-airflow/airflow/executors/celery_executor.py", line 60, in 
> CeleryConfig
> raise AirflowException('Exception: There was an unknown Celery SSL Error. 
>  Please ensure you want to use '
> airflow.exceptions.AirflowException: Exception: There was an unknown Celery 
> SSL Error.  Please ensure you want to use SSL and/or have all necessary certs 
> and key.





[jira] [Resolved] (AIRFLOW-1059) Reset_state_for_orphaned_task should operate in batch for the scheduler

2017-07-14 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel resolved AIRFLOW-1059.
--
   Resolution: Fixed
Fix Version/s: 1.9.0

Issue resolved by pull request #2205
[https://github.com/apache/incubator-airflow/pull/2205]

> Reset_state_for_orphaned_task should operate in batch for the scheduler
> ---
>
> Key: AIRFLOW-1059
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1059
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Alex Guziel
>Assignee: Alex Guziel
> Fix For: 1.9.0
>
>
> Scheduler startup is very slow because resetting state issues a query for each 
> dag run. We should be able to do this in a constant number of queries, which 
> will reduce scheduler startup time significantly.





[jira] [Resolved] (AIRFLOW-1345) Don't commit on each loop

2017-07-14 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel resolved AIRFLOW-1345.
--
   Resolution: Fixed
Fix Version/s: (was: 1.8.3)
   1.9.0

Issue resolved by pull request #2397
[https://github.com/apache/incubator-airflow/pull/2397]

> Don't commit on each loop
> -
>
> Key: AIRFLOW-1345
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1345
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Alex Guziel
>Assignee: Alex Guziel
> Fix For: 1.9.0
>
>
> Right now, in the main scheduler loop, we commit for each TI. While this 
> minimizes the time a lock is held, it expires all TIs, forcing us to do an 
> n+1 query.





[jira] [Created] (AIRFLOW-1438) Scheduler batch queries should have a limit

2017-07-20 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1438:


 Summary: Scheduler batch queries should have a limit
 Key: AIRFLOW-1438
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1438
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel
Assignee: Alex Guziel


Each batch is issued as a single query, which is subject to a length limit 
and holds locks, so the batch size should be capped.
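
One way to picture a cap on batch size is a small chunking helper. This is only a sketch and is not taken from the pull request; `chunked`, `run_in_batches`, `max_size` and `run_query` are hypothetical names used for illustration.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: Sequence[T], max_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each at most `max_size` long."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    for start in range(0, len(items), max_size):
        yield list(items[start:start + max_size])


def run_in_batches(items: Sequence[T], max_size: int,
                   run_query: Callable[[List[T]], R]) -> List[R]:
    """Call `run_query` once per chunk instead of once for the whole list.

    `run_query` is a hypothetical stand-in for whatever issues the real
    query; each call sees at most `max_size` items, which bounds both the
    query length and the number of rows locked at once.
    """
    return [run_query(chunk) for chunk in chunked(items, max_size)]
```

With a cap in place, the number of queries grows with the input size divided by `max_size`, while each individual query stays bounded.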





[jira] [Resolved] (AIRFLOW-1438) Scheduler batch queries should have a limit

2017-07-21 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel resolved AIRFLOW-1438.
--
   Resolution: Fixed
Fix Version/s: 1.9.0

Issue resolved by pull request #2462
[https://github.com/apache/incubator-airflow/pull/2462]

> Scheduler batch queries should have a limit
> ---
>
> Key: AIRFLOW-1438
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1438
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Alex Guziel
>Assignee: Alex Guziel
> Fix For: 1.9.0
>
>
> Each batch is issued as a single query, which is subject to a length limit 
> and holds locks, so the batch size should be capped.





[jira] [Created] (AIRFLOW-1492) Add metric for task success/failure

2017-08-07 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1492:


 Summary: Add metric for task success/failure
 Key: AIRFLOW-1492
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1492
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Alex Guziel
Assignee: Alex Guziel








[jira] [Created] (AIRFLOW-1493) Fix race condition with airflow run

2017-08-07 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1493:


 Summary: Fix race condition with airflow run
 Key: AIRFLOW-1493
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1493
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel
Assignee: Alex Guziel


Currently, airflow run spawns a process `airflow run --local` which spawns 
`airflow run --raw`.

Local manages the heartbeat. Raw performs a series of checks, sets the state to 
running, runs the task, then sets the state to failed or success. 

The problem is the heartbeat check on `airflow run --local` has to monitor the 
state in the DB, but because the change of state to running happens 
asynchronously, it must first observe the state in the DB to be running before 
it has the power of termination. However, there is no guarantee that it will 
observe this state. Thus, we should move the pre-execution logic to 
`airflow run --local`.





[jira] [Resolved] (AIRFLOW-1492) Add metric for task success/failure

2017-08-09 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel resolved AIRFLOW-1492.
--
   Resolution: Fixed
Fix Version/s: 1.9.0

Issue resolved by pull request #2504
[https://github.com/apache/incubator-airflow/pull/2504]

> Add metric for task success/failure
> ---
>
> Key: AIRFLOW-1492
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1492
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Alex Guziel
>Assignee: Alex Guziel
> Fix For: 1.9.0
>
>






[jira] [Created] (AIRFLOW-1512) Add operator for running Python functions in a virtualenv

2017-08-15 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1512:


 Summary: Add operator for running Python functions in a virtualenv
 Key: AIRFLOW-1512
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1512
 Project: Apache Airflow
  Issue Type: New Feature
  Components: operators
Reporter: Alex Guziel
Assignee: Alex Guziel








[jira] [Created] (AIRFLOW-1522) Increase size of val column for variable table in MySQL

2017-08-18 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1522:


 Summary: Increase size of val column for variable table in MySQL
 Key: AIRFLOW-1522
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1522
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Alex Guziel
Assignee: Alex Guziel


Right now, it's 64KB, which is a bit too small. This increases it to 16MB.





[jira] [Created] (AIRFLOW-1569) Requeue Celery tasks in RESERVED state

2017-09-05 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1569:


 Summary: Requeue Celery tasks in RESERVED state
 Key: AIRFLOW-1569
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1569
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel
Assignee: Alex Guziel


Right now, "fair" scheduling in Celery doesn't quite work (some tasks get 
RESERVED, which means they will get blocked from execution even if there are 
open slots). We should requeue them after 2 heartbeats.
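
A rough sketch of the "requeue after 2 heartbeats" idea, under the assumption that it means a task observed as RESERVED on two consecutive heartbeats. `ReservedTracker`, `observe` and the plain-string state values are hypothetical and do not reflect the Celery or Airflow APIs.

```python
from typing import Dict, Hashable, List


class ReservedTracker:
    """Counts consecutive heartbeats on which a task was seen as RESERVED."""

    def __init__(self, max_heartbeats: int = 2) -> None:
        self.max_heartbeats = max_heartbeats
        self._seen: Dict[Hashable, int] = {}

    def observe(self, states: Dict[Hashable, str]) -> List[Hashable]:
        """Record one heartbeat's view of task states.

        Returns the keys that have now been RESERVED for `max_heartbeats`
        consecutive heartbeats and should be requeued by the caller.
        """
        to_requeue: List[Hashable] = []
        next_seen: Dict[Hashable, int] = {}
        for key, state in states.items():
            if state != "RESERVED":
                continue  # any other state resets the counter for this key
            count = self._seen.get(key, 0) + 1
            if count >= self.max_heartbeats:
                to_requeue.append(key)  # not carried over, so it starts fresh
            else:
                next_seen[key] = count
        self._seen = next_seen
        return to_requeue
```

The caller would invoke `observe` once per heartbeat and resubmit whatever keys it returns.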





[jira] [Created] (AIRFLOW-1570) Tasks that finish unnaturally don't surface error messages.

2017-09-05 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1570:


 Summary: Tasks that finish unnaturally don't surface error 
messages.
 Key: AIRFLOW-1570
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1570
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alex Guziel
Assignee: Alex Guziel


Right now, there are two Airflow wrapper processes for running tasks: local 
and raw (local runs raw and checks its status). Local's checks work fine when 
raw changes the state, but do not surface errors when raw exits abruptly 
(i.e., in the event of a SIGKILL). This should be changed.





[jira] [Commented] (AIRFLOW-1570) Tasks that finish unnaturally don't surface error messages.

2017-09-05 Thread Alex Guziel (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154572#comment-16154572
 ] 

Alex Guziel commented on AIRFLOW-1570:
--

This is fixed by another PR

> Tasks that finish unnaturally don't surface error messages.
> ---
>
> Key: AIRFLOW-1570
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1570
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Alex Guziel
>Assignee: Alex Guziel
>
> Right now, there are two Airflow wrapper processes for running tasks: local 
> and raw (local runs raw and checks its status). Local's checks work fine 
> when raw changes the state, but do not surface errors when raw exits 
> abruptly (i.e., in the event of a SIGKILL). This should be changed.





[jira] [Resolved] (AIRFLOW-1493) Fix race condition with airflow run

2017-09-06 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel resolved AIRFLOW-1493.
--
   Resolution: Fixed
Fix Version/s: 1.9.0

Issue resolved by pull request #2505
[https://github.com/apache/incubator-airflow/pull/2505]

> Fix race condition with airflow run
> ---
>
> Key: AIRFLOW-1493
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1493
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Alex Guziel
>Assignee: Alex Guziel
> Fix For: 1.9.0
>
>
> Currently, airflow run spawns a process `airflow run --local` which spawns 
> `airflow run --raw`.
> Local manages the heartbeat. Raw performs a series of checks, sets the state 
> to running, runs the task, then sets the state to failed or success. 
> The problem is the heartbeat check on `airflow run --local` has to monitor 
> the state in the DB, but because the change of state to running happens 
> asynchronously, it must first observe the state in the DB to be running 
> before it has the power of termination. However, there is no guarantee that 
> it will observe this state. Thus, we should move the pre-execution logic to 
> `airflow run --local`.





[jira] [Created] (AIRFLOW-422) Expose email for ingestion by performance monitoring tools

2016-08-12 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-422:
---

 Summary: Expose email for ingestion by performance monitoring tools
 Key: AIRFLOW-422
 URL: https://issues.apache.org/jira/browse/AIRFLOW-422
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Alex Guziel
Assignee: Alex Guziel


Right now, we surface some data for third party tools 
(https://issues.apache.org/jira/browse/AIRFLOW-244). We want emails as well so 
these tools can notify the right people about jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (AIRFLOW-422) Expose task information for ingestion by performance monitoring tools

2016-08-16 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel updated AIRFLOW-422:

Summary: Expose task information for ingestion by performance monitoring 
tools  (was: Expose email for ingestion by performance monitoring tools)

> Expose task information for ingestion by performance monitoring tools
> -
>
> Key: AIRFLOW-422
> URL: https://issues.apache.org/jira/browse/AIRFLOW-422
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Alex Guziel
>Assignee: Alex Guziel
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Right now, we surface some data for third party tools 
> (https://issues.apache.org/jira/browse/AIRFLOW-244). We want emails as well 
> so these tools can notify the right people about jobs.





[jira] [Updated] (AIRFLOW-422) Expose task information for ingestion by performance monitoring tools

2016-08-16 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel updated AIRFLOW-422:

Description: 
Right now, we surface some data for third party tools 
(https://issues.apache.org/jira/browse/AIRFLOW-244). We want other fields to be 
accessible by these tools. For example, emails will allow us to send digests.


  was:Right now, we surface some data for third party tools 
(https://issues.apache.org/jira/browse/AIRFLOW-244). We want emails as well so 
these tools can notify the right people about jobs.


> Expose task information for ingestion by performance monitoring tools
> -
>
> Key: AIRFLOW-422
> URL: https://issues.apache.org/jira/browse/AIRFLOW-422
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Alex Guziel
>Assignee: Alex Guziel
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Right now, we surface some data for third party tools 
> (https://issues.apache.org/jira/browse/AIRFLOW-244). We want other fields to 
> be accessible by these tools. For example, emails will allow us to send 
> digests.





[jira] [Created] (AIRFLOW-677) Kill zombie tasks after missing heartbeats

2016-12-06 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-677:
---

 Summary: Kill zombie tasks after missing heartbeats
 Key: AIRFLOW-677
 URL: https://issues.apache.org/jira/browse/AIRFLOW-677
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Alex Guziel
Assignee: Alex Guziel


If there's a connection error while heartbeating, it should retry. Also, if it 
hasn't been able to heartbeat for a while, it should kill the child processes 
so that we don't have 2 of the same task running.
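
The behaviour described above could look roughly like the following. This is a sketch under assumptions, not the actual change: `HeartbeatMonitor`, `max_silence_seconds`, `send_heartbeat` and `kill_children` are hypothetical names, and the connection failure is modelled with Python's built-in `ConnectionError`.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Tracks heartbeat successes and decides when to give up."""

    def __init__(self, max_silence_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_silence_seconds = max_silence_seconds
        self._clock = clock
        self._last_success = clock()

    def beat(self, send_heartbeat: Callable[[], None],
             kill_children: Callable[[], None]) -> bool:
        """Attempt one heartbeat and return True if it succeeded.

        A connection error is swallowed so the caller can simply retry on
        its next tick. If no heartbeat has succeeded for longer than
        `max_silence_seconds`, `kill_children` is called so that a second
        copy of the task cannot run alongside this one.
        """
        try:
            send_heartbeat()
        except ConnectionError:
            if self._clock() - self._last_success > self.max_silence_seconds:
                kill_children()
            return False
        self._last_success = self._clock()
        return True
```

Injecting the clock keeps the timeout logic testable without sleeping.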





[jira] [Created] (AIRFLOW-679) Stop concurrent task instances from running due to race conditions

2016-12-06 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-679:
---

 Summary: Stop concurrent task instances from running due to race 
conditions
 Key: AIRFLOW-679
 URL: https://issues.apache.org/jira/browse/AIRFLOW-679
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Alex Guziel
Assignee: Alex Guziel
Priority: Minor


Right now, multiple copies of the same task instance can run if someone clicks 
on the UI multiple times. To fix this, I propose two things:

1) Use a transaction to set state to running, and don't run otherwise
2) record hostname and pid in TaskInstance table, then when heartbeating, only 
continue running if it matches
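
The second proposal can be sketched as a comparison between the identity recorded for the task instance and the identity of the running process. The names below (`RunnerIdentity`, `current_identity`, `should_keep_running`) are hypothetical; how the recorded value is read from the TaskInstance table is left out.

```python
import os
import socket
from typing import NamedTuple, Optional


class RunnerIdentity(NamedTuple):
    """The (hostname, pid) pair that claimed a task instance."""
    hostname: str
    pid: int


def current_identity() -> RunnerIdentity:
    """Identity of the process executing this code."""
    return RunnerIdentity(socket.gethostname(), os.getpid())


def should_keep_running(recorded: Optional[RunnerIdentity],
                        me: RunnerIdentity) -> bool:
    """Return True only if the recorded owner is this process.

    `recorded` is whatever was last stored for the task instance; None
    means no owner was recorded, which this sketch treats as a mismatch.
    """
    return recorded is not None and recorded == me
```

On each heartbeat, a runner would call `should_keep_running` and stop itself when it returns False.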





[jira] [Closed] (AIRFLOW-677) Kill zombie tasks after missing heartbeats

2016-12-09 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel closed AIRFLOW-677.
---
Resolution: Fixed

> Kill zombie tasks after missing heartbeats
> --
>
> Key: AIRFLOW-677
> URL: https://issues.apache.org/jira/browse/AIRFLOW-677
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Alex Guziel
>Assignee: Alex Guziel
>
> If there's a connection error while heartbeating, it should retry. Also, if 
> it hasn't been able to heartbeat for a while, it should kill the child 
> processes so that we don't have 2 of the same task running.





[jira] [Updated] (AIRFLOW-679) Stop concurrent task instances from running due to race conditions

2016-12-09 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel updated AIRFLOW-679:

Description: 
Right now, multiple copies of the same task instance can run if someone clicks 
on the UI multiple times. To fix this, I propose two things:

1) record hostname and pid in TaskInstance table, then when heartbeating, only 
continue running if it matches

  was:
Right now, multiple copies of the same task instance can run if someone clicks 
on the UI multiple times. To fix this, I propose two things:

1) Use a transaction to set state to running, and don't run otherwise
2) record hostname and pid in TaskInstance table, then when heartbeating, only 
continue running if it matches


> Stop concurrent task instances from running due to race conditions
> --
>
> Key: AIRFLOW-679
> URL: https://issues.apache.org/jira/browse/AIRFLOW-679
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Alex Guziel
>Assignee: Alex Guziel
>Priority: Minor
>
> Right now, multiple copies of the same task instance can run if someone 
> clicks on the UI multiple times. To fix this, I propose two things:
> 1) record hostname and pid in TaskInstance table, then when heartbeating, 
> only continue running if it matches





[jira] [Updated] (AIRFLOW-696) Monitor queue lengths in CeleryExecutor

2016-12-12 Thread Alex Guziel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Guziel updated AIRFLOW-696:

Summary: Monitor queue lengths in CeleryExecutor  (was: Monitoring for 
queue lengths in CeleryExecutor)

> Monitor queue lengths in CeleryExecutor
> ---
>
> Key: AIRFLOW-696
> URL: https://issues.apache.org/jira/browse/AIRFLOW-696
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Alex Guziel
>Assignee: Alex Guziel
>
> Monitor queue lengths for CeleryExecutor. This will make it easier to see how 
> much of the cluster is being used.





[jira] [Created] (AIRFLOW-696) Monitoring for queue lengths in CeleryExecutor

2016-12-12 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-696:
---

 Summary: Monitoring for queue lengths in CeleryExecutor
 Key: AIRFLOW-696
 URL: https://issues.apache.org/jira/browse/AIRFLOW-696
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Alex Guziel
Assignee: Alex Guziel


Monitor queue lengths for CeleryExecutor. This will make it easier to see how 
much of the cluster is being used.





[jira] [Created] (AIRFLOW-1601) Add configurable time between SIGTERM and SIGKILL in task killer

2017-09-12 Thread Alex Guziel (JIRA)
Alex Guziel created AIRFLOW-1601:


 Summary: Add configurable time between SIGTERM and SIGKILL in task 
killer
 Key: AIRFLOW-1601
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1601
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Alex Guziel
Assignee: Alex Guziel


Right now, you can only wait 5 seconds, which might not be enough for some 
things to clean up.
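
For illustration, a configurable grace period between the two signals might be sketched with the standard `subprocess` module as follows. `terminate_then_kill` and `grace_seconds` are hypothetical names, not Airflow's task killer or its configuration key.

```python
import subprocess
import sys


def terminate_then_kill(proc: subprocess.Popen, grace_seconds: float) -> int:
    """Ask `proc` to stop, then force it if it has not exited in time.

    Sends SIGTERM (via Popen.terminate on POSIX), waits up to
    `grace_seconds` for cleanup, and falls back to SIGKILL (Popen.kill).
    Returns the process's exit code.
    """
    proc.terminate()
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()
        return proc.wait()


if __name__ == "__main__":
    # Start a child that would otherwise sleep for a minute, then stop it.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    print(terminate_then_kill(child, grace_seconds=2.0))
```

Making `grace_seconds` a parameter is the point of the sketch: a task that needs longer to clean up can be given more time before the forced kill.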




