[jira] [Commented] (AIRFLOW-2105) Exception on known event creation

2018-02-15 Thread Yuliya Volkova (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366681#comment-16366681
 ] 

Yuliya Volkova commented on AIRFLOW-2105:
-

[~paymahn], try running {{airflow upgradedb}}; does the error go away after that? 

Is this the first start after migrating to 1.9.0, or not? 

> Exception on known event creation
> -
>
> Key: AIRFLOW-2105
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2105
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Paymahn Moghadasian
>Priority: Minor
>
> I tried to create a known event through the UI and was shown the following 
> error:
> {noformat}
> ---
> Node: PaymahnSolvvy.local
> ---
> Traceback (most recent call last):
>   File 
> "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/flask/app.py",
>  line 1988, in wsgi_app
> response = self.full_dispatch_request()
>   File 
> "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/flask/app.py",
>  line 1641, in full_dispatch_request
> rv = self.handle_user_exception(e)
>   File 
> "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/flask/app.py",
>  line 1544, in handle_user_exception
> reraise(exc_type, exc_value, tb)
>   File 
> "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/flask/_compat.py",
>  line 33, in reraise
> raise value
>   File 
> "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/flask/app.py",
>  line 1639, in full_dispatch_request
> rv = self.dispatch_request()
>   File 
> "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/flask/app.py",
>  line 1625, in dispatch_request
> return self.view_functions[rule.endpoint](**req.view_args)
>   File 
> "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/flask_admin/base.py",
>  line 69, in inner
> return self._run_view(f, *args, **kwargs)
>   File 
> "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/flask_admin/base.py",
>  line 368, in _run_view
> return fn(self, *args, **kwargs)
>   File 
> "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/flask_admin/model/base.py",
>  line 1947, in create_view
> return_url=return_url)
>   File 
> "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/flask_admin/base.py",
>  line 308, in render
> return render_template(template, **kwargs)
>   File 
> "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/flask/templating.py",
>  line 134, in render_template
> context, ctx.app)
>   File 
> "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/flask/templating.py",
>  line 116, in _render
> rv = template.render(context)
>   File 
> "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/jinja2/environment.py",
>  line 989, in render
> return self.environment.handle_exception(exc_info, True)
>   File 
> "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/jinja2/environment.py",
>  line 754, in handle_exception
> reraise(exc_type, exc_value, tb)
>   File 
> "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/jinja2/_compat.py",
>  line 37, in reraise
> raise value.with_traceback(tb)
>   File 
> "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/airflow/www/templates/airflow/model_create.html",
>  line 18, in top-level template code
> {% extends 'admin/model/create.html' %}
>   File 
> "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/flask_admin/templates/bootstrap3/admin/model/create.html",
>  line 3, in top-level template code
> {% from 'admin/lib.html' import extra with context %} {# backward 
> compatible #}
>   File 
> "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/airflow/www/templates/admin/master.html",
>  line 18, in top-level template code
> {% extends 'admin/base.html' %}
>   File 
> "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/flask_admin/templates/bootstrap3/admin/base.html",
>  line 30, in top-level template code
> {% block page_body %}
>   File 
> "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/airflow/www/templates/admin/master.html",
>  line 104, in block "page_body"
> {% block body %}
>   File 
> "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/airflow/www/templates/airflow/model_create.html",
>  line 28, in block "body"
> {{ super() }}
>   File 
> "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/flask_admin/templates/bootstrap3/admin/model/create.html",
>  line 22, in block "body"
> {% block create_form %}
>   File 
> 

[jira] [Commented] (AIRFLOW-215) Airflow worker (CeleryExecutor) needs to be restarted to pick up tasks

2018-02-15 Thread Yuliya Volkova (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366680#comment-16366680
 ] 

Yuliya Volkova commented on AIRFLOW-215:


[~bolke], is this issue fixed already? If so, maybe we can close the task? 

I didn't see this behavior on 1.9.0. 

> Airflow worker (CeleryExecutor) needs to be restarted to pick up tasks
> --
>
> Key: AIRFLOW-215
> URL: https://issues.apache.org/jira/browse/AIRFLOW-215
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: celery, subdag
>Affects Versions: Airflow 1.7.1.2
>Reporter: Cyril Scetbon
>Priority: Major
>
> We have a main dag that dynamically creates subdags containing tasks using 
> BashOperator. Using CeleryExecutor we see Celery tasks been created with 
> *STARTED* status but they are not picked up by our worker. However, if we 
> restart our worker, then tasks are picked up. 
> Here you can find code if you want to try to reproduce it 
> https://www.dropbox.com/s/8u7xf8jt55v8zio/dags.zip.
> We also tested using LocalExecutor and everything worked fine.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AIRFLOW-336) extra variable in celery_executor.py

2018-02-15 Thread Yuliya Volkova (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366670#comment-16366670
 ] 

Yuliya Volkova edited comment on AIRFLOW-336 at 2/16/18 7:23 AM:
-

I think the question is why this: 

!Screen Shot 2018-02-16 at 101648 AM.png!

is still in celery_executor.py on master (above 1.9.0).

The variable seems redundant and is not used inside CeleryExecutor.

Should it be removed?


was (Author: xnuinside):
I think the question is why this: 

!Screen Shot 2018-02-16 at 101648 AM.png!

is in celery_executor.py.

The variable seems redundant and is not used inside CeleryExecutor.

Should it be removed?

> extra variable in celery_executor.py
> 
>
> Key: AIRFLOW-336
> URL: https://issues.apache.org/jira/browse/AIRFLOW-336
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Armando Fandango
>Priority: Major
> Attachments: Screen Shot 2018-02-16 at 101648 AM.png
>
>
> PARALLELISM = configuration.get('core', 'PARALLELISM')
> Why is this extra variable in celery_executor.py ? 
> Is there a plan to use it in celeryConfig object ?
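
For context, a hedged sketch of what actually using that value could look 
like, assuming it were wired into the Celery app configuration (the setting 
name below is the Celery 3.x style of that era; this is an illustration, not 
the project's code):

{code:python}
# Hypothetical sketch: feed Airflow's core PARALLELISM into the Celery
# worker concurrency setting.
from airflow import configuration

PARALLELISM = configuration.getint('core', 'PARALLELISM')

CELERY_CONFIG = {
    # Caps the number of concurrent worker processes; reusing the Airflow
    # core parallelism value here is an assumption for illustration only.
    'CELERYD_CONCURRENCY': PARALLELISM,
}
{code}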



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-336) extra variable in celery_executor.py

2018-02-15 Thread Yuliya Volkova (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366670#comment-16366670
 ] 

Yuliya Volkova commented on AIRFLOW-336:


I think the question is why this: 

!Screen Shot 2018-02-16 at 101648 AM.png!

is in celery_executor.py.

The variable seems redundant and is not used inside CeleryExecutor.

Should it be removed?

> extra variable in celery_executor.py
> 
>
> Key: AIRFLOW-336
> URL: https://issues.apache.org/jira/browse/AIRFLOW-336
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Armando Fandango
>Priority: Major
> Attachments: Screen Shot 2018-02-16 at 101648 AM.png
>
>
> PARALLELISM = configuration.get('core', 'PARALLELISM')
> Why is this extra variable in celery_executor.py ? 
> Is there a plan to use it in celeryConfig object ?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-336) extra variable in celery_executor.py

2018-02-15 Thread Yuliya Volkova (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuliya Volkova updated AIRFLOW-336:
---
Attachment: Screen Shot 2018-02-16 at 101648 AM.png

> extra variable in celery_executor.py
> 
>
> Key: AIRFLOW-336
> URL: https://issues.apache.org/jira/browse/AIRFLOW-336
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Armando Fandango
>Priority: Major
> Attachments: Screen Shot 2018-02-16 at 101648 AM.png
>
>
> PARALLELISM = configuration.get('core', 'PARALLELISM')
> Why is this extra variable in celery_executor.py ? 
> Is there a plan to use it in celeryConfig object ?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2115) Default Airflow.cfg & Airflow Web UI contain links to pythonhosted

2018-02-15 Thread Kaxil Naik (JIRA)
Kaxil Naik created AIRFLOW-2115:
---

 Summary: Default Airflow.cfg & Airflow Web UI contain links to 
pythonhosted
 Key: AIRFLOW-2115
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2115
 Project: Apache Airflow
  Issue Type: Task
  Components: Documentation
Reporter: Kaxil Naik
Assignee: Kaxil Naik


The default `airflow.cfg` file points to the pythonhosted docs, which should be fixed:
https://github.com/apache/incubator-airflow/blob/1e36b37b68ab354d1d7d1d1d3abd151ce2a7cac7/airflow/config_templates/default_airflow.cfg#L227

The Airflow Web UI also contains links to the old documentation located at PythonHosted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2114) Add AIRFLOW_HOME to PythonVirtualenvOperator

2018-02-15 Thread John Bodley (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Bodley updated AIRFLOW-2114:
-
Summary: Add AIRFLOW_HOME to PythonVirtualenvOperator  (was: Add 
AIRFLOW_HOME_VAR to PythonVirtualenvOperator)

> Add AIRFLOW_HOME to PythonVirtualenvOperator
> 
>
> Key: AIRFLOW-2114
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2114
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Reporter: John Bodley
>Assignee: John Bodley
>Priority: Minor
>
> The subprocess used by the PythonVirtualenvOperator should be passed the 
> AIRFLOW_HOME_VAR (similar to the BashOperator), which ensures that it gets 
> propagated to the subprocess in order to leverage the hook connection 
> information.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2114) Add AIRFLOW_HOME to PythonVirtualenvOperator

2018-02-15 Thread John Bodley (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Bodley updated AIRFLOW-2114:
-
Description: The subprocess used by the PythonVirtualenvOperator should be 
passed the AIRFLOW_HOME (similar to the BashOperator), which ensures that it 
gets propagated to the subprocess in order to leverage the hook connection 
information.  (was: The subprocess used by the PythonVirtualenvOperator 
should be passed the AIRFLOW_HOME_VAR (similar to the BashOperator), which 
ensures that it gets propagated to the subprocess in order to leverage the 
hook connection information.)

> Add AIRFLOW_HOME to PythonVirtualenvOperator
> 
>
> Key: AIRFLOW-2114
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2114
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Reporter: John Bodley
>Assignee: John Bodley
>Priority: Minor
>
> The subprocess used by the PythonVirtualenvOperator should be passed the 
> AIRFLOW_HOME (similar to the BashOperator), which ensures that it gets 
> propagated to the subprocess in order to leverage the hook connection 
> information.
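
A minimal sketch of the idea, assuming the fix simply copies AIRFLOW_HOME 
into the child environment the way the BashOperator builds its env (the 
helper name is illustrative, not the actual patch):

{code:python}
# Sketch: hand AIRFLOW_HOME to the virtualenv subprocess so code running
# inside it resolves the same airflow.cfg (and therefore the same
# connection metadata) as the parent task.
import os
import subprocess

def execute_in_subprocess(cmd):
    env = os.environ.copy()
    if 'AIRFLOW_HOME' in os.environ:
        env['AIRFLOW_HOME'] = os.environ['AIRFLOW_HOME']  # explicit hand-off
    subprocess.check_call(cmd, env=env)

# e.g. execute_in_subprocess(['/tmp/venv/bin/python', 'script.py'])
{code}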



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2114) Add AIRFLOW_HOME_VAR to PythonVirtualenvOperator

2018-02-15 Thread John Bodley (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Bodley updated AIRFLOW-2114:
-
Summary: Add AIRFLOW_HOME_VAR to PythonVirtualenvOperator  (was: Add 
AIRFLOW_HOME_VAR to PythonVirtualenvOperator inline with the BashOperator)

> Add AIRFLOW_HOME_VAR to PythonVirtualenvOperator
> 
>
> Key: AIRFLOW-2114
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2114
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Reporter: John Bodley
>Assignee: John Bodley
>Priority: Minor
>
> The subprocess used by the PythonVirtualenvOperator should be passed the 
> AIRFLOW_HOME_VAR (similar to the BashOperator), which ensures that it gets 
> propagated to the subprocess in order to leverage the hook connection 
> information.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2114) Add AIRFLOW_HOME_VAR to PythonVirtualenvOperator inline with the BashOperator

2018-02-15 Thread John Bodley (JIRA)
John Bodley created AIRFLOW-2114:


 Summary: Add AIRFLOW_HOME_VAR to PythonVirtualenvOperator inline 
with the BashOperator
 Key: AIRFLOW-2114
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2114
 Project: Apache Airflow
  Issue Type: Bug
  Components: operators
Reporter: John Bodley
Assignee: John Bodley


The subprocess used by the PythonVirtualenvOperator should be passed the 
AIRFLOW_HOME_VAR (similar to the BashOperator), which ensures that it gets 
propagated to the subprocess in order to leverage the hook connection 
information.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-83) Add MongoDB hook and operator

2018-02-15 Thread Andy Cooper (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366285#comment-16366285
 ] 

Andy Cooper commented on AIRFLOW-83:


[~simasim] I have created the PR and the code, and am waiting for a response 
on the CI/CD issues.

 

https://github.com/apache/incubator-airflow/pull/2962

> Add MongoDB hook and operator
> -
>
> Key: AIRFLOW-83
> URL: https://issues.apache.org/jira/browse/AIRFLOW-83
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: contrib, hooks, operators
>Reporter: vishal srivastava
>Assignee: Ajay Yadava
>Priority: Minor
>
> A MongoDB hook and operator would be really useful for people who use 
> Airflow with MongoDB. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2103) Authentication using password_auth backend prevents webserver from running

2018-02-15 Thread Marcos Bernardelli (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366282#comment-16366282
 ] 

Marcos Bernardelli commented on AIRFLOW-2103:
-

PR : https://github.com/apache/incubator-airflow/pull/3048

> Authentication using password_auth backend prevents webserver from running
> --
>
> Key: AIRFLOW-2103
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2103
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Mark Ward
>Assignee: Kaxil Naik
>Priority: Critical
>
> {{airflow webserver}} fails to run with the config
> {code:java}
> [webserver]
> authenticate = True
> auth_backend = airflow.contrib.auth.backends.password_auth
> {code}
> and errors out with the following traceback
> {code:java}
> Traceback (most recent call last):
>   File "/usr/local/bin/airflow", line 4, in 
>     
> __import__('pkg_resources').run_script('apache-airflow==1.10.0.dev0+incubating',
>  'airflow')
>   File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", 
> line 750, in run_script
>     self.require(requires)[0].run_script(script_name, ns)
>   File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", 
> line 1527, in run_script
>     exec(code, namespace, namespace)
>   File 
> "/usr/local/lib/python3.6/site-packages/apache_airflow-1.10.0.dev0+incubating-py3.6.egg/EGG-INFO/scripts/airflow",
>  line 27, in <module>
>     args.func(args)
>   File 
> "/usr/local/lib/python3.6/site-packages/apache_airflow-1.10.0.dev0+incubating-py3.6.egg/airflow/bin/cli.py",
>  line 696, in webserver
>     app = cached_app(conf)
>   File 
> "/usr/local/lib/python3.6/site-packages/apache_airflow-1.10.0.dev0+incubating-py3.6.egg/airflow/www/app.py",
>  line 176, in cached_app
>     app = create_app(config, testing)
>   File 
> "/usr/local/lib/python3.6/site-packages/apache_airflow-1.10.0.dev0+incubating-py3.6.egg/airflow/www/app.py",
>  line 63, in create_app
>     from airflow.www import views
>   File 
> "/usr/local/lib/python3.6/site-packages/apache_airflow-1.10.0.dev0+incubating-py3.6.egg/airflow/www/views.py",
>  line 97, in <module>
>     login_required = airflow.login.login_required
> AttributeError: module 'airflow.contrib.auth.backends.password_auth' has no 
> attribute 'login_required'
> {code}
> This broke at [https://github.com/apache/incubator-airflow/pull/2730] with 
> the change of 
> {code:java}
> from flask_login import login_required, current_user, logout_user
> {code}
> to 
> {code:java}
> from flask_login import current_user
> {code}
>  
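
One hedged sketch of a fix (the actual change is in the PR above): restore 
the attribute the webserver looks up by re-exporting flask_login's decorator 
from the password_auth backend module:

{code:python}
# In airflow/contrib/auth/backends/password_auth.py (sketch): re-export the
# decorator so that airflow.login.login_required resolves again.
from flask_login import login_required  # noqa: F401
{code}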



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2113) Address missing DagRuns callbacks

2018-02-15 Thread Alan Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Ma updated AIRFLOW-2113:
-
Summary: Address missing DagRuns callbacks  (was: DagRuns does not complete 
callbacks)

> Address missing DagRuns callbacks
> -
>
> Key: AIRFLOW-2113
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2113
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Alan Ma
>Assignee: Alan Ma
>Priority: Critical
>
> This originally arose from the missing notification from the on_failure and 
> on_success callback at the dag level. The stack trace is as follows:
> {code:java}
> [2018-02-07 07:00:08,145] \{models.py:2984} DagFileProcessor172 INFO - 
> Executing dag callback function: 
>  .GeneralNotifyFailed instance at 0x7fec9d8ad368>
> [2018-02-07 07:00:08,161] \{models.py:168} DagFileProcessor172 INFO - Filling 
> up the DagBag from /home/charon/.virtualenvs/airflow/airflow_home/dags
> Dag: , paused: False
> Dag: , paused: False
> Dag: , paused: False
> Dag: , paused: False
> Dag: , paused: False
> [2018-02-07 07:00:12,103] \{jobs.py:354} DagFileProcessor172 ERROR - Got an 
> exception! Propagating...
> Traceback (most recent call last):
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py",
>  line 346, in helper
> pickle_dags)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py",
>  line 53, in wrapper
> result = func(*args, **kwargs)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py",
>  line 1586, in process_file
> self._process_dags(dagbag, dags, ti_keys_to_schedule)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py",
>  line 1175, in _process_dags
> dag_run = self.create_dag_run(dag)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py",
>  line 53, in wrapper
> result = func(*args, **kwargs)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py",
>  line 747, in create_dag_run
> dag.handle_callback(dr, success=False, reason='dagrun_timeout', 
> session=session)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py",
>  line 53, in wrapper
> result = func(*args, **kwargs)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/models.py",
>  line 2990, in handle_callback
> d = dagrun.dag or DagBag().get_dag(dag_id=dagrun.dag_id)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py",
>  line 237, in __get__
> return self.impl.get(instance_state(instance), dict_)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py",
>  line 579, in get
> value = state._load_expired(state, passive)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/state.py",
>  line 592, in _load_expired
> self.manager.deferred_scalar_loader(self, toload)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/loading.py",
>  line 644, in load_scalar_attributes
> (state_str(state)))
> DetachedInstanceError: Instance  is not bound to a 
> Session; attribute refresh operation cannot proceed
> [2018-02-07 07:00:31,003] \{jobs.py:343} DagFileProcessor208 INFO - Started 
> process (PID=7813) to work on 
> /home/charon/.virtualenvs/airflow/airflow_home/dags/c
> haron-airflow/dags/inapp_vendor_sku_breakdown.py\
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2113) Address missing DagRun callbacks

2018-02-15 Thread Alan Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Ma updated AIRFLOW-2113:
-
Summary: Address missing DagRun callbacks  (was: Address missing DagRuns 
callbacks)

> Address missing DagRun callbacks
> 
>
> Key: AIRFLOW-2113
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2113
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Alan Ma
>Assignee: Alan Ma
>Priority: Critical
>
> This originally arose from the missing notification from the on_failure and 
> on_success callback at the dag level. The stack trace is as follows:
> {code:java}
> [2018-02-07 07:00:08,145] \{models.py:2984} DagFileProcessor172 INFO - 
> Executing dag callback function: 
>  .GeneralNotifyFailed instance at 0x7fec9d8ad368>
> [2018-02-07 07:00:08,161] \{models.py:168} DagFileProcessor172 INFO - Filling 
> up the DagBag from /home/charon/.virtualenvs/airflow/airflow_home/dags
> Dag: , paused: False
> Dag: , paused: False
> Dag: , paused: False
> Dag: , paused: False
> Dag: , paused: False
> [2018-02-07 07:00:12,103] \{jobs.py:354} DagFileProcessor172 ERROR - Got an 
> exception! Propagating...
> Traceback (most recent call last):
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py",
>  line 346, in helper
> pickle_dags)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py",
>  line 53, in wrapper
> result = func(*args, **kwargs)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py",
>  line 1586, in process_file
> self._process_dags(dagbag, dags, ti_keys_to_schedule)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py",
>  line 1175, in _process_dags
> dag_run = self.create_dag_run(dag)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py",
>  line 53, in wrapper
> result = func(*args, **kwargs)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py",
>  line 747, in create_dag_run
> dag.handle_callback(dr, success=False, reason='dagrun_timeout', 
> session=session)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py",
>  line 53, in wrapper
> result = func(*args, **kwargs)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/models.py",
>  line 2990, in handle_callback
> d = dagrun.dag or DagBag().get_dag(dag_id=dagrun.dag_id)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py",
>  line 237, in __get__
> return self.impl.get(instance_state(instance), dict_)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py",
>  line 579, in get
> value = state._load_expired(state, passive)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/state.py",
>  line 592, in _load_expired
> self.manager.deferred_scalar_loader(self, toload)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/loading.py",
>  line 644, in load_scalar_attributes
> (state_str(state)))
> DetachedInstanceError: Instance  is not bound to a 
> Session; attribute refresh operation cannot proceed
> [2018-02-07 07:00:31,003] \{jobs.py:343} DagFileProcessor208 INFO - Started 
> process (PID=7813) to work on 
> /home/charon/.virtualenvs/airflow/airflow_home/dags/c
> haron-airflow/dags/inapp_vendor_sku_breakdown.py\
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2113) DagRuns sometimes does not complete callbacks

2018-02-15 Thread Alan Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366251#comment-16366251
 ] 

Alan Ma commented on AIRFLOW-2113:
--

Given that the *handle_callback* method belongs to the DAG object, we are able 
to get the tasks directly with *get_task* instead of communicating with 
the database.
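
A hedged sketch of that approach (illustrative, not the merged diff):

{code:python}
# Since handle_callback is a DAG method, resolve each task via
# dag.get_task() instead of reloading the DAG through DagBag or touching
# the detached dagrun.dag ORM attribute that raised DetachedInstanceError.
def handle_callback(dag, dagrun, success=True, reason=None, session=None):
    callback = dag.on_success_callback if success else dag.on_failure_callback
    if not callback:
        return
    for ti in dagrun.get_task_instances(session=session):
        ti.task = dag.get_task(ti.task_id)  # in-memory lookup, no DB query
    context = {'dag': dag, 'dag_run': dagrun, 'reason': reason}
    callback(context)
{code}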

> DagRuns sometimes does not complete callbacks
> -
>
> Key: AIRFLOW-2113
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2113
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Alan Ma
>Assignee: Alan Ma
>Priority: Critical
>
> This originally arose from the missing notification from the on_failure and 
> on_success callback at the dag level. The stack trace is as follows:
> {code:java}
> [2018-02-07 07:00:08,145] \{models.py:2984} DagFileProcessor172 INFO - 
> Executing dag callback function: 
>  .GeneralNotifyFailed instance at 0x7fec9d8ad368>
> [2018-02-07 07:00:08,161] \{models.py:168} DagFileProcessor172 INFO - Filling 
> up the DagBag from /home/charon/.virtualenvs/airflow/airflow_home/dags
> Dag: , paused: False
> Dag: , paused: False
> Dag: , paused: False
> Dag: , paused: False
> Dag: , paused: False
> [2018-02-07 07:00:12,103] \{jobs.py:354} DagFileProcessor172 ERROR - Got an 
> exception! Propagating...
> Traceback (most recent call last):
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py",
>  line 346, in helper
> pickle_dags)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py",
>  line 53, in wrapper
> result = func(*args, **kwargs)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py",
>  line 1586, in process_file
> self._process_dags(dagbag, dags, ti_keys_to_schedule)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py",
>  line 1175, in _process_dags
> dag_run = self.create_dag_run(dag)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py",
>  line 53, in wrapper
> result = func(*args, **kwargs)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py",
>  line 747, in create_dag_run
> dag.handle_callback(dr, success=False, reason='dagrun_timeout', 
> session=session)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py",
>  line 53, in wrapper
> result = func(*args, **kwargs)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/models.py",
>  line 2990, in handle_callback
> d = dagrun.dag or DagBag().get_dag(dag_id=dagrun.dag_id)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py",
>  line 237, in __get__
> return self.impl.get(instance_state(instance), dict_)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py",
>  line 579, in get
> value = state._load_expired(state, passive)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/state.py",
>  line 592, in _load_expired
> self.manager.deferred_scalar_loader(self, toload)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/loading.py",
>  line 644, in load_scalar_attributes
> (state_str(state)))
> DetachedInstanceError: Instance  is not bound to a 
> Session; attribute refresh operation cannot proceed
> [2018-02-07 07:00:31,003] \{jobs.py:343} DagFileProcessor208 INFO - Started 
> process (PID=7813) to work on 
> /home/charon/.virtualenvs/airflow/airflow_home/dags/c
> haron-airflow/dags/inapp_vendor_sku_breakdown.py\
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2113) DagRuns sometimes does not complete callbacks

2018-02-15 Thread Alan Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Ma updated AIRFLOW-2113:
-
Summary: DagRuns sometimes does not complete callbacks  (was: DagRuns 
sometimes does not execute callbacks)

> DagRuns sometimes does not complete callbacks
> -
>
> Key: AIRFLOW-2113
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2113
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Alan Ma
>Assignee: Alan Ma
>Priority: Critical
>
> This originally arose from the missing notification from the on_failure and 
> on_success callback at the dag level. The stack trace is as follows:
> {code}
> [2018-02-07 07:00:08,145] \{models.py:2984} DagFileProcessor172 INFO - 
> Executing dag callback function: 
>  .GeneralNotifyFailed instance at 0x7fec9d8ad368>
> [2018-02-07 07:00:08,161] \{models.py:168} DagFileProcessor172 INFO - Filling 
> up the DagBag from /home/charon/.virtualenvs/airflow/airflow_home/dags
> Dag: , paused: False
> Dag: , paused: False
> Dag: , paused: False
> Dag: , paused: False
> Dag: , paused: False
> [2018-02-07 07:00:12,103] \{jobs.py:354} DagFileProcessor172 ERROR - Got an 
> exception! Propagating...
> Traceback (most recent call last):
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py",
>  line 346, in helper
> pickle_dags)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py",
>  line 53, in wrapper
> result = func(*args, **kwargs)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py",
>  line 1586, in process_file
> self._process_dags(dagbag, dags, ti_keys_to_schedule)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py",
>  line 1175, in _process_dags
> dag_run = self.create_dag_run(dag)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py",
>  line 53, in wrapper
> result = func(*args, **kwargs)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py",
>  line 747, in create_dag_run
> dag.handle_callback(dr, success=False, reason='dagrun_timeout', 
> session=session)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py",
>  line 53, in wrapper
> result = func(*args, **kwargs)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/models.py",
>  line 2990, in handle_callback
> d = dagrun.dag or DagBag().get_dag(dag_id=dagrun.dag_id)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py",
>  line 237, in __get__
> return self.impl.get(instance_state(instance), dict_)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py",
>  line 579, in get
> value = state._load_expired(state, passive)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/state.py",
>  line 592, in _load_expired
> self.manager.deferred_scalar_loader(self, toload)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/loading.py",
>  line 644, in load_scalar_attributes
> (state_str(state)))
> DetachedInstanceError: Instance  is not bound to a 
> Session; attribute refresh operation cannot proceed
> [2018-02-07 07:00:31,003] \{jobs.py:343} DagFileProcessor208 INFO - Started 
> process (PID=7813) to work on 
> /home/charon/.virtualenvs/airflow/airflow_home/dags/c
> haron-airflow/dags/inapp_vendor_sku_breakdown.py\
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2113) DagRuns sometimes does not execute callbacks

2018-02-15 Thread Alan Ma (JIRA)
Alan Ma created AIRFLOW-2113:


 Summary: DagRuns sometimes does not execute callbacks
 Key: AIRFLOW-2113
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2113
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Alan Ma
Assignee: Alan Ma


This originally arose from the missing notification from the on_failure and 
on_success callback at the dag level. The stack trace is as follows:
{code:java}
[2018-02-07 07:00:08,145] \{models.py:2984} DagFileProcessor172 INFO - 
Executing dag callback function: 

[2018-02-07 07:00:08,161] \{models.py:168} DagFileProcessor172 INFO - Filling 
up the DagBag from /home/charon/.virtualenvs/airflow/airflow_home/dags
Dag: , paused: False
Dag: , paused: False
Dag: , paused: False
Dag: , paused: False
Dag: , paused: False
[2018-02-07 07:00:12,103] \{jobs.py:354} DagFileProcessor172 ERROR - Got an 
exception! Propagating...
Traceback (most recent call last):
File 
"/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py",
 line 346, in helper
pickle_dags)
File 
"/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py",
 line 53, in wrapper
result = func(*args, **kwargs)
File 
"/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py",
 line 1586, in process_file
self._process_dags(dagbag, dags, ti_keys_to_schedule)
File 
"/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py",
 line 1175, in _process_dags
dag_run = self.create_dag_run(dag)
File 
"/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py",
 line 53, in wrapper
result = func(*args, **kwargs)
File 
"/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py",
 line 747, in create_dag_run
dag.handle_callback(dr, success=False, reason='dagrun_timeout', session=session)
File 
"/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py",
 line 53, in wrapper
result = func(*args, **kwargs)
File 
"/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/models.py",
 line 2990, in handle_callback
d = dagrun.dag or DagBag().get_dag(dag_id=dagrun.dag_id)
File 
"/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py",
 line 237, in __get__
return self.impl.get(instance_state(instance), dict_)
File 
"/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py",
 line 579, in get
value = state._load_expired(state, passive)
File 
"/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/state.py",
 line 592, in _load_expired
self.manager.deferred_scalar_loader(self, toload)
File 
"/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/loading.py",
 line 644, in load_scalar_attributes
(state_str(state)))
DetachedInstanceError: Instance  is not bound to a 
Session; attribute refresh operation cannot proceed
[2018-02-07 07:00:31,003] \{jobs.py:343} DagFileProcessor208 INFO - Started 
process (PID=7813) to work on 
/home/charon/.virtualenvs/airflow/airflow_home/dags/c
haron-airflow/dags/inapp_vendor_sku_breakdown.py\
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2112) Recent Tasks UI cell does not show all items

2018-02-15 Thread Diego Rabatone Oliveira (JIRA)
Diego Rabatone Oliveira created AIRFLOW-2112:


 Summary: Recent Tasks UI cell does not show all items
 Key: AIRFLOW-2112
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2112
 Project: Apache Airflow
  Issue Type: Bug
  Components: ui
Affects Versions: 1.9.1
Reporter: Diego Rabatone Oliveira
Assignee: Diego Rabatone Oliveira
 Fix For: 1.9.1
 Attachments: 2018-02-15-182303_401x103_scrot.png

The Recent Tasks cell in the UI shows only some of the items from the SVG 
file within it.

!2018-02-15-182303_401x103_scrot.png!

This is because the SVG has its width hardcoded at 180px.

As we expect to have 8 items at around 25-27px each, changing the width to 
240px would solve the problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2111) Add the ability to pause/unpause a task within a DAG

2018-02-15 Thread Mark Reid (JIRA)
Mark Reid created AIRFLOW-2111:
--

 Summary: Add the ability to pause/unpause a task within a DAG
 Key: AIRFLOW-2111
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2111
 Project: Apache Airflow
  Issue Type: New Feature
  Components: scheduler, ui
Reporter: Mark Reid


It would be convenient to be able to pause/unpause a task within a DAG (rather 
than the entire DAG).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-835) SMTP Mail delivery fails with server using CRAM-MD5 auth

2018-02-15 Thread James Davidheiser (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366192#comment-16366192
 ] 

James Davidheiser commented on AIRFLOW-835:
---

I ran into this in 1.9.0 as well.  I solved it by creating a copy of email.py, 
with the username and password cast as Python 2 strings, and referencing that 
in airflow.cfg's email_backend.
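
A minimal sketch of that workaround, assuming the copied module differs from 
airflow/utils/email.py only in casting the configured credentials to native 
Python 2 strings before smtplib sees them:

{code:python}
# Cast the future.types.newstr values returned by the configuration module
# to native str so hmac.HMAC (used by smtplib's CRAM-MD5 path) gets a plain
# Python 2 string. Assumes ASCII-only credentials.
from airflow import configuration

SMTP_USER = str(configuration.get('smtp', 'SMTP_USER'))
SMTP_PASSWORD = str(configuration.get('smtp', 'SMTP_PASSWORD'))
{code}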

> SMTP Mail delivery fails with server using CRAM-MD5 auth
> 
>
> Key: AIRFLOW-835
> URL: https://issues.apache.org/jira/browse/AIRFLOW-835
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: utils
>Affects Versions: Airflow 1.7.1
> Environment: https://hub.docker.com/_/python/ (debian:jessie + 
> python2.7 in docker)
>Reporter: Joseph Harris
>Priority: Minor
>
> Traceback when sending email via an SMTP server configured to offer CRAM-MD5 
> (in all cases, TLS included). This occurs because the configuration module 
> returns the password as a future.types.newstr instead of a plain str (see 
> below for the gory details of why this breaks).
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1308, 
> in handle_failure
> self.email_alert(error, is_retry=False)
>   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1425, 
> in email_alert
> send_email(task.email, title, body)
>   File "/usr/local/lib/python2.7/site-packages/airflow/utils/email.py", line 
> 43, in send_email
> return backend(to, subject, html_content, files=files, dryrun=dryrun)
>   File "/usr/local/lib/python2.7/site-packages/airflow/utils/email.py", line 
> 79, in send_email_smtp
> send_MIME_email(SMTP_MAIL_FROM, to, msg, dryrun)
>   File "/usr/local/lib/python2.7/site-packages/airflow/utils/email.py", line 
> 95, in send_MIME_email
> s.login(SMTP_USER, SMTP_PASSWORD)
>   File "/usr/local/lib/python2.7/smtplib.py", line 607, in login
> (code, resp) = self.docmd(encode_cram_md5(resp, user, password))
>   File "/usr/local/lib/python2.7/smtplib.py", line 571, in encode_cram_md5
> response = user + " " + hmac.HMAC(password, challenge).hexdigest()
>   File "/usr/local/lib/python2.7/hmac.py", line 75, in __init__
> self.outer.update(key.translate(trans_5C))
>   File "/usr/local/lib/python2.7/site-packages/future/types/newstr.py", line 
> 390, in translate
> if ord(c) in table:
> TypeError: 'in ' requires string as left operand, not int
> SMTP configs:
> [email]
> email_backend = airflow.utils.email.send_email_smtp
> [smtp]
> smtp_host = {a_smtp_server}
> smtp_port = 587
> smtp_starttls = True
> smtp_ssl = False
> smtp_user = {a_username}
> smtp_password = {a_password}
> smtp_mail_from = {a_email_addr}
> *Gory details
> If the server offers CRAM-MD5, smptlib prefers this by default, and will try 
> to use hmac.HMAC to hash the password:
> https://hg.python.org/cpython/file/2.7/Lib/smtplib.py#l602
> https://hg.python.org/cpython/file/2.7/Lib/smtplib.py#l571
> But if the password is a newstr, newstr.translate expects a dict mapping 
> instead of str, and raises an exception.
> https://hg.python.org/cpython/file/2.7/Lib/hmac.py#l75
> All of this occurs after a successful SMTP.ehlo(), so it's probably not crap 
> container networking
> Could be resolved by passing the smtp password as a future.types.newbytes, 
> as this behaves as expected:
> from future.types import newstr, newbytes
> import hmac
> # Make str / newstr types
> test = 'a_string'
> test_newstr = newstr(test)
> test_newbytes = newbytes(test)
> msg = 'future problems'
> # Test 1 - Try to do a HMAC:
> # fine
> hmac.HMAC(test, msg)
> # fails horribly
> hmac.HMAC(test_newstr, msg)
> # is completely fine
> hmac.HMAC(test_newbytes, msg)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-2058) Scheduler uses MainThread for DAG file processing

2018-02-15 Thread Anonymous (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-2058:
--

Assignee: Yang Pan

> Scheduler uses MainThread for DAG file processing
> -
>
> Key: AIRFLOW-2058
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2058
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG
>Affects Versions: 1.9.0
> Environment: Ubuntu, Airflow 1.9, Sequential executor
>Reporter: Yang Pan
>Assignee: Yang Pan
>Priority: Blocker
>
> By reading the [source code 
> |https://github.com/apache/incubator-airflow/blob/61ff29e578d1121ab4606fe122fb4e2db8f075b9/airflow/utils/dag_processing.py#L538]
>  it appears the scheduler will process each DAG file, either a .py or .zip, 
> using a new process. 
>  
> If I understand correctly, what should happen in theory when processing a 
> .zip file is that the dedicated process adds the .zip file to the 
> PYTHONPATH and loads the file's module and dependencies. When the DAG read 
> is done, the process is destroyed, and since the PYTHONPATH is 
> process-scoped, it won't pollute other processes.
>  
> However, by printing out the thread and process IDs, it looks like the 
> Airflow scheduler can sometimes accidentally pick up the main process 
> instead of creating a new one, and that's when the collision happens.
>  
> Here is a snippet of the PYTHONPATH when advanced_dag_dependency-1.zip is 
> being processed. As you can see, when it's executed by the MainThread it 
> contains other .zip files; when a dedicated thread is used, only the 
> required .zip is added.
>  
> sys.path :['/root/airflow/dags/yang_subdag_2.zip', 
> '/root/airflow/dags/yang_subdag_2.zip', 
> '/root/airflow/dags/yang_subdag_1.zip', 
> '/root/airflow/dags/yang_subdag_1.zip', 
> '/root/airflow/dags/advanced_dag_dependency-2.zip', 
> '/root/airflow/dags/advanced_dag_dependency-2.zip', 
> '/root/airflow/dags/advanced_dag_dependency-1.zip', 
> '/root/airflow/dags/advanced_dag_dependency-1.zip', 
> '/root/airflow/dags/yang_subdag_1', '/usr/local/bin', '/usr/lib/python2.7', 
> '/usr/lib/python2.7/plat-x86_64-linux-gnu', '/usr/lib/python2.7/lib-tk', 
> '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', 
> '/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages', 
> '/usr/lib/python2.7/dist-packages/PILcompat', '/root/airflow/config', 
> '/root/airflow/dags', '/root/airflow/plugins'] 
> Print from MyFirstOperator in Dag 1 
> process id: 5059 
> thread id: <_MainThread(*MainThread*, started 140339858560768)> 
>  
> sys.path :[u'/root/airflow/dags/advanced_dag_dependency-1.zip', 
> '/usr/local/bin', '/usr/lib/python2.7', 
> '/usr/lib/python2.7/plat-x86_64-linux-gnu', '/usr/lib/python2.7/lib-tk', 
> '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', 
> '/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages', 
> '/usr/lib/python2.7/dist-packages/PILcompat', '/root/airflow/config', 
> '/root/airflow/dags', '/root/airflow/plugins'] 
> Print from MyFirstOperator in Dag 1 
> process id: 5076 
> thread id: <_MainThread(*DagFileProcessor283*, started 140137838294784)> 
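
For reference, a small self-contained sketch of the isolation the reporter 
describes, assuming each DAG file is handled in its own forked process (paths 
are illustrative):

{code:python}
# A child process mutates its own sys.path to load a zipped DAG; because
# sys.path is process-scoped, the parent stays unpolluted. If the file were
# instead processed on the MainThread of the main process, the .zip entries
# would accumulate as in the first sys.path dump above.
import multiprocessing
import sys

def process_dag_zip(zip_path):
    sys.path.insert(0, zip_path)  # visible only inside this process
    print('child sys.path head:', sys.path[0])

if __name__ == '__main__':
    p = multiprocessing.Process(target=process_dag_zip,
                                args=('/root/airflow/dags/example.zip',))
    p.start()
    p.join()
    assert '/root/airflow/dags/example.zip' not in sys.path
{code}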



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-2108) BashOperator discards process indentation

2018-02-15 Thread Kaxil Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik reassigned AIRFLOW-2108:
---

Assignee: Kaxil Naik

> BashOperator discards process indentation
> -
>
> Key: AIRFLOW-2108
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2108
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: 1.9.0
>Reporter: Chris Bandy
>Assignee: Kaxil Naik
>Priority: Minor
>
> When the BashOperator logs every line of output from the executing process, 
> it strips leading whitespace, which makes it difficult to interpret output 
> that was formatted with indentation.
> For example, I'm executing [PGLoader|http://pgloader.readthedocs.io/] through 
> this operator. When it finishes, it prints a summary which appears in the 
> logs like so:
> {noformat}
> [2018-02-14 07:31:44,524] {bash_operator.py:101} INFO - 
> 2018-02-14T07:31:44.115000Z LOG report summary reset
> [2018-02-14 07:31:44,564] {bash_operator.py:101} INFO - table name errors 
>   read   imported  bytes  total time   read  write
> [2018-02-14 07:31:44,564] {bash_operator.py:101} INFO - 
> --  -  -  -  -  
> --  -  -
> [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - fetch meta data   
>0524524 1.438s
> [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Create Schemas
>   0  0  0 0.161s
> [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Create SQL Types  
> 0 19 1920.413s
> [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Create tables 
>  0310310   3m2.316s
> [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Set Table OIDs
>   0155155 0.458s
> [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - 
> --  -  -  -  -  
> --  -  -
> [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - 
> --  -  -  -  -  
> --  -  -
> [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Index Build 
> Completion  0353353  1m37.323s
> [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Create Indexes
>   0353353  3m25.929s
> [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Reset Sequences   
>0  0  0 2.677s
> [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Primary Keys  
> 0147147  1m21.091s
> [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Create Foreign Keys   
>0 16 16 8.283s
> [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Create Triggers   
>0  0  0 0.339s
> [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Install Comments  
> 0  0  0 0.000s
> [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - 
> --  -  -  -  -  
> --  -  -
> [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Total import time 
>  ✓  0  0  6m35.642s
> {noformat}
> Ideally, the leading whitespace would be retained, so the logs look like this:
> {noformat}
> [2018-02-14 07:31:44,524] {bash_operator.py:101} INFO - 
> 2018-02-14T07:31:44.115000Z LOG report summary reset
> [2018-02-14 07:31:44,564] {bash_operator.py:101} INFO - table 
> name errors   read   imported  bytes  total time   read   
>write
> [2018-02-14 07:31:44,564] {bash_operator.py:101} INFO - 
> --  -  -  -  -  
> --  -  -
> [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO -fetch meta 
> data  0524524 1.438s
> [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Create 
> Schemas  0  0  0 0.161s
> [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO -   Create SQL 
> Types  0 19 1920.413s
> [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO -  Create 
> tables  0310310   3m2.316s
> [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Set Table 
> OIDs  
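
A hedged sketch of the kind of change that would preserve the indentation 
(illustrative; the operator's real logging loop may differ): trim only the 
trailing newline instead of calling strip() on each output line:

{code:python}
# Log subprocess output with leading whitespace intact by using
# rstrip('\n') rather than strip(), which eats the indentation that aligns
# tables like the pgloader summary above.
import subprocess

proc = subprocess.Popen(['bash', '-c', 'echo "  indented"; echo "flush"'],
                        stdout=subprocess.PIPE)
for raw_line in iter(proc.stdout.readline, b''):
    print(raw_line.decode('utf-8').rstrip('\n'))  # keeps the two spaces
proc.wait()
{code}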

[jira] [Work started] (AIRFLOW-2110) Enhance Http Hook to use a header passed in the "extra" argument and add tenacity retry

2018-02-15 Thread Alberto Calderari (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-2110 started by Alberto Calderari.
--
> Enhance Http Hook to use a header passed in the "extra" argument and add 
> tenacity retry
> --
>
> Key: AIRFLOW-2110
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2110
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Affects Versions: Airflow 1.8
>Reporter: Alberto Calderari
>Assignee: Alberto Calderari
>Priority: Minor
> Fix For: Airflow 2.0
>
>
> Add the possibility to pass a JSON header in the "extra" field of the 
> connection:
> Also add tenacity retry so the operator won't fall over in case of a bad 
> handshake.
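
A hedged sketch of the two pieces of the proposal (illustrative names, not 
the merged HttpHook code): headers merged in from the connection's "extra" 
JSON, plus a tenacity-based retry around the request:

{code:python}
# Build a requests session whose headers come from the connection "extra"
# JSON, and retry transient failures with tenacity.
import json
import requests
import tenacity

def session_from_extra(conn_extra):
    session = requests.Session()
    if conn_extra:
        session.headers.update(json.loads(conn_extra))  # e.g. the Bearer token
    return session

@tenacity.retry(stop=tenacity.stop_after_attempt(3),
                wait=tenacity.wait_exponential(multiplier=1),
                reraise=True)
def run(session, url):
    response = session.get(url)
    response.raise_for_status()  # a bad handshake / 5xx triggers a retry
    return response
{code}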



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2110) Enhance Http Hook to use a header passed in the "extra" argument and add tenacity retry

2018-02-15 Thread Alberto Calderari (JIRA)
Alberto Calderari created AIRFLOW-2110:
--

 Summary: Enhance Http Hook to use a header passed in the 
"extra" argument and add tenacity retry
 Key: AIRFLOW-2110
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2110
 Project: Apache Airflow
  Issue Type: Improvement
  Components: hooks
Affects Versions: Airflow 1.8
Reporter: Alberto Calderari
Assignee: Alberto Calderari
 Fix For: Airflow 2.0


Add the possibility to pass a JSON header in the "extra" field of the connection:

{"Authorization": "Bearer Here1sMyT0k3N"}

Also add tenacity retry so the operator won't fall over in case of a bad 
handshake.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work started] (AIRFLOW-2107) add time_partitioning to run_query

2018-02-15 Thread Ben Marengo (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-2107 started by Ben Marengo.

> add time_partitioning to run_query
> --
>
> Key: AIRFLOW-2107
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2107
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: contrib, gcp, hooks, operators
>Reporter: Ben Marengo
>Assignee: Ben Marengo
>Priority: Minor
>
> Google has added a time-partitioning field to the query part of its jobs 
> API:
> [https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query.timePartitioning]
> We should mirror that in Airflow (hook and operator).
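
A hedged sketch of what mirroring the field might look like (the parameter 
name and dict shape are assumptions based on the BigQuery jobs API, not the 
eventual Airflow change):

{code:python}
# Thread a time_partitioning dict straight through to the jobs API's
# configuration.query.timePartitioning field.
def build_query_configuration(bql, time_partitioning=None):
    configuration = {'query': {'query': bql}}
    if time_partitioning:
        # e.g. {'type': 'DAY', 'field': 'event_date'}, passed through verbatim
        configuration['query']['timePartitioning'] = time_partitioning
    return configuration

# Usage: build_query_configuration('SELECT 1', {'type': 'DAY'})
{code}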



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-83) Add MongoDB hook and operator

2018-02-15 Thread Simon Dubois (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16365498#comment-16365498
 ] 

Simon Dubois commented on AIRFLOW-83:
-

Hey [~andscoop].

Any news on this?

> Add MongoDB hook and operator
> -
>
> Key: AIRFLOW-83
> URL: https://issues.apache.org/jira/browse/AIRFLOW-83
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: contrib, hooks, operators
>Reporter: vishal srivastava
>Assignee: Ajay Yadava
>Priority: Minor
>
> A MongoDB hook and operator would be really useful for people who use 
> Airflow with MongoDB. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2109) scheduler stopped picking up jobs suddenly

2018-02-15 Thread rahul (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16365286#comment-16365286
 ] 

rahul commented on AIRFLOW-2109:


And we are using MySQL RDS as the Airflow metastore DB.

> scheduler stopped picking up jobs suddenly
> --
>
> Key: AIRFLOW-2109
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2109
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.8
> Environment: aws EC2 server
>Reporter: rahul
>Assignee: rahul
>Priority: Major
>
> Hi,
>  
> The scheduler suddenly stopped picking up jobs. When I ran 
> {{airflow resetdb}}, it started picking them up properly again.
>  
> Could you please let me know why {{airflow resetdb}} resolved this issue?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2109) scheduler stopped picking up jobs suddenly

2018-02-15 Thread rahul (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16365285#comment-16365285
 ] 

rahul commented on AIRFLOW-2109:


We are using the LocalExecutor, and Airflow is configured on an AWS EC2 server.

{{airflow resetdb}} resolved the issue. I just want to know why it did, and 
what the root cause might be.

Thanks in advance!

> scheduler stopped picking up jobs suddenly
> --
>
> Key: AIRFLOW-2109
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2109
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.8
> Environment: aws EC2 server
>Reporter: rahul
>Assignee: rahul
>Priority: Major
>
> Hi,
>  
> The scheduler stopped picking up jobs suddenly. when i resetdb it started 
> picking up properly.
>  
> could you please let me know the reason why airflow resetdb resolved this 
> issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2109) scheduler stopped picking up jobs suddenly

2018-02-15 Thread Fokko Driesprong (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16365283#comment-16365283
 ] 

Fokko Driesprong commented on AIRFLOW-2109:
---

We need much more information to give some sensible feedback. Do you have 
logs, and what is your environment? What kind of executor are you using?

> scheduler stopped picking up jobs suddenly
> --
>
> Key: AIRFLOW-2109
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2109
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.8
> Environment: aws EC2 server
>Reporter: rahul
>Assignee: rahul
>Priority: Major
>
> Hi,
>  
> The scheduler suddenly stopped picking up jobs. When I ran 
> {{airflow resetdb}}, it started picking them up properly again.
>  
> Could you please let me know why {{airflow resetdb}} resolved this issue?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2065) Worker logging can raise FileExistsError when more than one process execute concurrently

2018-02-15 Thread Sébastien Brochet (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16365272#comment-16365272
 ] 

Sébastien Brochet commented on AIRFLOW-2065:


I've opened a new PR with my fix: 
https://github.com/apache/incubator-airflow/pull/3040

Cheers,

> Worker logging can raise FileExistsError when more than one process execute 
> concurrently
> 
>
> Key: AIRFLOW-2065
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2065
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: executor, logging
>Affects Versions: 1.9.0
>Reporter: Sébastien Brochet
>Priority: Critical
>
> Hello,
>  
> We started observing random failures during the execution of our DAGs after 
> upgrading to 1.9.0. After careful debugging, we noticed the following 
> exception in the worker logs:
> {noformat}
> Traceback (most recent call last):
>    File "/projects/airflow-hadoop/anaconda3/lib/python3.6/logging/config.py", 
> line 558, in configure
>      handler = self.configure_handler(handlers[name])
>    File "/projects/airflow-hadoop/anaconda3/lib/python3.6/logging/config.py", 
> line 731, in configure_handler
>      result = factory(**kwargs)
>    File 
> "/projects/airflow-hadoop/anaconda3/lib/python3.6/site-packages/airflow/utils/log/file_processor_handler.py",
>  line 48, in __init__
>      os.makedirs(self._get_log_directory())
>    File "/projects/airflow-hadoop/anaconda3/lib/python3.6/os.py", line 220, 
> in makedirs
>      mkdir(name, mode)
>  FileExistsError: [Errno 17] File exists: 
> '/projects/airflow-hadoop/airflow/logs/scheduler/2018-02-05'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>    File "/projects/airflow-hadoop/anaconda3/bin/airflow", line 16, in 
>      from airflow import configuration
>    File 
> "/projects/airflow-hadoop/anaconda3/lib/python3.6/site-packages/airflow/__init__.py",
>  line 31, in <module>
>      from airflow import settings
>    File 
> "/projects/airflow-hadoop/anaconda3/lib/python3.6/site-packages/airflow/settings.py",
>  line 148, in <module>
>      configure_logging()
>    File 
> "/projects/airflow-hadoop/anaconda3/lib/python3.6/site-packages/airflow/logging_config.py",
>  line 75, in configure_logging
>      raise e
>    File 
> "/projects/airflow-hadoop/anaconda3/lib/python3.6/site-packages/airflow/logging_config.py",
>  line 70, in configure_logging
>      dictConfig(logging_config)
>    File "/projects/airflow-hadoop/anaconda3/lib/python3.6/logging/config.py", 
> line 795, in dictConfig
>      dictConfigClass(config).configure()
>    File "/projects/airflow-hadoop/anaconda3/lib/python3.6/logging/config.py", 
> line 566, in configure
>      '%r: %s' % (name, e))
>  ValueError: Unable to configure handler 'file.processor': [Errno 17] File 
> exists: '/projects/airflow-hadoop/airflow/logs/scheduler/2018-02-05'
> {noformat}
>  
> As you can see, an exception is raised when trying to create the directory 
> where the executor logs are stored. This can happen if two tasks are 
> scheduled at the exact same time on the same worker. That appears to be the 
> case here:
>  
> {noformat}
>  [2018-02-05 02:10:07,886] \{celery_executor.py:50} INFO - Executing command 
> in Celery: airflow run  pairing_sensor_check 2018-02-04T02:10:00 --local 
> --pool sensor -sd /projects/airflow-hadoop/airflow/dags/flow.py
>  [2018-02-05 02:10:07,908] \{celery_executor.py:50} INFO - Executing command 
> in Celery: airflow run yyy pairing_sensor_check 2018-02-04T02:10:00 --local 
> --pool sensor -sd /projects/airflow-hadoop/airflow/dags/flow.py
> {noformat}
> The culprit is here: 
> [https://github.com/apache/incubator-airflow/blob/v1-9-stable/airflow/utils/log/file_processor_handler.py#L47-L48]
>  (not fixed in master)
> A simple fix would be to wrap the {{makedirs}} call in a {{try}} / 
> {{except}} block.
>  
> Thanks,
>  
> Sébastien
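
A minimal sketch of that suggestion, tolerating the race where another 
process creates the log directory first:

{code:python}
# Wrap os.makedirs so a concurrent worker creating the same directory no
# longer aborts logging setup (Python 2/3 compatible).
import errno
import os

def makedirs_safe(path):
    try:
        os.makedirs(path)
    except OSError as e:
        if e.errno != errno.EEXIST:  # only swallow "File exists"
            raise
{code}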



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)