[jira] [Commented] (AIRFLOW-2105) Exception on known event creation
[ https://issues.apache.org/jira/browse/AIRFLOW-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366681#comment-16366681 ] Yuliya Volkova commented on AIRFLOW-2105: - [~paymahn], try running {{airflow upgradedb}}; does the error go away? Is this the first start after migrating to 1.9.0? > Exception on known event creation > - > > Key: AIRFLOW-2105 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2105 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: 1.9.0 >Reporter: Paymahn Moghadasian >Priority: Minor > > I tried to create a known event through the UI and was shown the following > error: > {noformat} > --- > Node: PaymahnSolvvy.local > --- > Traceback (most recent call last): > File > "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/flask/app.py", > line 1988, in wsgi_app > response = self.full_dispatch_request() > File > "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/flask/app.py", > line 1641, in full_dispatch_request > rv = self.handle_user_exception(e) > File > "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/flask/app.py", > line 1544, in handle_user_exception > reraise(exc_type, exc_value, tb) > File > "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/flask/_compat.py", > line 33, in reraise > raise value > File > "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/flask/app.py", > line 1639, in full_dispatch_request > rv = self.dispatch_request() > File > "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/flask/app.py", > line 1625, in dispatch_request > return self.view_functions[rule.endpoint](**req.view_args) > File > "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/flask_admin/base.py", > line 69, in inner > return self._run_view(f, *args, **kwargs) > File > "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/flask_admin/base.py", > line 368, in _run_view > return fn(self, *args, **kwargs) > File > "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/flask_admin/model/base.py", > line 1947, in create_view > return_url=return_url) > File > "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/flask_admin/base.py", > line 308, in render > return render_template(template, **kwargs) > File > "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/flask/templating.py", > line 134, in render_template > context, ctx.app) > File > "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/flask/templating.py", > line 116, in _render > rv = template.render(context) > File > "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/jinja2/environment.py", > line 989, in render > return self.environment.handle_exception(exc_info, True) > File > "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/jinja2/environment.py", > line 754, in handle_exception > reraise(exc_type, exc_value, tb) > File > "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/jinja2/_compat.py", > line 37, in reraise > raise value.with_traceback(tb) > File > "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/airflow/www/templates/airflow/model_create.html", > line 18, in top-level template code > {% extends 'admin/model/create.html' %} > File > "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/flask_admin/templates/bootstrap3/admin/model/create.html", > line 3, in top-level template code > {% from 'admin/lib.html' import extra with context %} {# 
backward > compatible #} > File > "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/airflow/www/templates/admin/master.html", > line 18, in top-level template code > {% extends 'admin/base.html' %} > File > "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/flask_admin/templates/bootstrap3/admin/base.html", > line 30, in top-level template code > {% block page_body %} > File > "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/airflow/www/templates/admin/master.html", > line 104, in block "page_body" > {% block body %} > File > "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/airflow/www/templates/airflow/model_create.html", > line 28, in block "body" > {{ super() }} > File > "/Users/paymahn/solvvy/scheduler/venv/lib/python3.6/site-packages/flask_admin/templates/bootstrap3/admin/model/create.html", > line 22, in block "body" > {% block create_form %} > File >
[jira] [Commented] (AIRFLOW-215) Airflow worker (CeleryExecutor) needs to be restarted to pick up tasks
[ https://issues.apache.org/jira/browse/AIRFLOW-215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366680#comment-16366680 ] Yuliya Volkova commented on AIRFLOW-215: [~bolke], is this issue already fixed? If so, maybe close the task? I didn't see this behavior on 1.9.0. > Airflow worker (CeleryExecutor) needs to be restarted to pick up tasks > -- > > Key: AIRFLOW-215 > URL: https://issues.apache.org/jira/browse/AIRFLOW-215 > Project: Apache Airflow > Issue Type: Bug > Components: celery, subdag >Affects Versions: Airflow 1.7.1.2 >Reporter: Cyril Scetbon >Priority: Major > > We have a main dag that dynamically creates subdags containing tasks using > BashOperator. Using CeleryExecutor we see Celery tasks being created with > *STARTED* status, but they are not picked up by our worker. However, if we > restart our worker, then the tasks are picked up. > Here you can find code if you want to try to reproduce it: > https://www.dropbox.com/s/8u7xf8jt55v8zio/dags.zip. > We also tested using LocalExecutor and everything worked fine. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (AIRFLOW-336) extra variable in celery_executor.py
[ https://issues.apache.org/jira/browse/AIRFLOW-336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366670#comment-16366670 ] Yuliya Volkova edited comment on AIRFLOW-336 at 2/16/18 7:23 AM: - I think the question is why this: !Screen Shot 2018-02-16 at 101648 AM.png! is still in celery_executor.py on master (above 1.9.0). The variable seems redundant and is not used inside CeleryExecutor. Should we remove it? was (Author: xnuinside): I think the question is why this: !Screen Shot 2018-02-16 at 101648 AM.png! is in celery_executor.py. The variable seems redundant and is not used inside CeleryExecutor. Should we remove it? > extra variable in celery_executor.py > > > Key: AIRFLOW-336 > URL: https://issues.apache.org/jira/browse/AIRFLOW-336 > Project: Apache Airflow > Issue Type: Bug >Reporter: Armando Fandango >Priority: Major > Attachments: Screen Shot 2018-02-16 at 101648 AM.png > > > PARALLELISM = configuration.get('core', 'PARALLELISM') > Why is this extra variable in celery_executor.py ? > Is there a plan to use it in celeryConfig object ? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-336) extra variable in celery_executor.py
[ https://issues.apache.org/jira/browse/AIRFLOW-336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366670#comment-16366670 ] Yuliya Volkova commented on AIRFLOW-336: I think the question is why this: !Screen Shot 2018-02-16 at 101648 AM.png! is in celery_executor.py. The variable seems redundant and is not used inside CeleryExecutor. Should we remove it? > extra variable in celery_executor.py > > > Key: AIRFLOW-336 > URL: https://issues.apache.org/jira/browse/AIRFLOW-336 > Project: Apache Airflow > Issue Type: Bug >Reporter: Armando Fandango >Priority: Major > Attachments: Screen Shot 2018-02-16 at 101648 AM.png > > > PARALLELISM = configuration.get('core', 'PARALLELISM') > Why is this extra variable in celery_executor.py ? > Is there a plan to use it in celeryConfig object ? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
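Since the question above is whether {{PARALLELISM}} is actually unused, a quick heuristic is to scan the module for top-level names that are assigned but never read in the same file. A minimal sketch only: the path is an assumption, and the check cannot see imports of the name from other modules.
{code:python}
import ast

def unused_top_level_names(path):
    """Heuristic: top-level names assigned in a module but never loaded
    anywhere else in the same file."""
    with open(path) as f:
        tree = ast.parse(f.read())
    assigned = {t.id for node in tree.body if isinstance(node, ast.Assign)
                for t in node.targets if isinstance(t, ast.Name)}
    loaded = {n.id for n in ast.walk(tree)
              if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)}
    return assigned - loaded

# Hypothetical path to the module in question
print(unused_top_level_names("airflow/executors/celery_executor.py"))
{code}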
[jira] [Updated] (AIRFLOW-336) extra variable in celery_executor.py
[ https://issues.apache.org/jira/browse/AIRFLOW-336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuliya Volkova updated AIRFLOW-336: --- Attachment: Screen Shot 2018-02-16 at 101648 AM.png > extra variable in celery_executor.py > > > Key: AIRFLOW-336 > URL: https://issues.apache.org/jira/browse/AIRFLOW-336 > Project: Apache Airflow > Issue Type: Bug >Reporter: Armando Fandango >Priority: Major > Attachments: Screen Shot 2018-02-16 at 101648 AM.png > > > PARALLELISM = configuration.get('core', 'PARALLELISM') > Why is this extra variable in celery_executor.py ? > Is there a plan to use it in celeryConfig object ? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2115) Default Airflow.cfg & Airflow Web UI contains links to pythonhosted
Kaxil Naik created AIRFLOW-2115: --- Summary: Default Airflow.cfg & Airflow Web UI contains links to pythonhosted Key: AIRFLOW-2115 URL: https://issues.apache.org/jira/browse/AIRFLOW-2115 Project: Apache Airflow Issue Type: Task Components: Documentation Reporter: Kaxil Naik Assignee: Kaxil Naik The default `airflow.cfg` file points to the pythonhosted docs, which should be fixed: https://github.com/apache/incubator-airflow/blob/1e36b37b68ab354d1d7d1d1d3abd151ce2a7cac7/airflow/config_templates/default_airflow.cfg#L227 The Airflow Web UI also contains links to the old documentation hosted at PythonHosted. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-2114) Add AIRFLOW_HOME to PythonVirtualenvOperator
[ https://issues.apache.org/jira/browse/AIRFLOW-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Bodley updated AIRFLOW-2114: - Summary: Add AIRFLOW_HOME to PythonVirtualenvOperator (was: Add AIRFLOW_HOME_VAR to PythonVirtualenvOperator) > Add AIRFLOW_HOME to PythonVirtualenvOperator > > > Key: AIRFLOW-2114 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2114 > Project: Apache Airflow > Issue Type: Bug > Components: operators >Reporter: John Bodley >Assignee: John Bodley >Priority: Minor > > The subprocess used by the PythonVirtualenvOperator should be passed the > AIRFLOW_HOME_VAR (similar to the BashOperator), which ensures that it gets > propagated to the subprocess in order to leverage the hook connection information. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-2114) Add AIRFLOW_HOME to PythonVirtualenvOperator
[ https://issues.apache.org/jira/browse/AIRFLOW-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Bodley updated AIRFLOW-2114: - Description: The subprocess used by the PythonVirtualenvOperator should be passed the AIRFLOW_HOME (similar to the BashOperator), which ensures that it gets propagated to the subprocess in order to leverage the hook connection information. (was: The subprocess used by the PythonVirtualenvOperator should be passed the AIRFLOW_HOME_VAR (similar to the BashOperator), which ensures that it gets propagated to the subprocess in order to leverage the hook connection information.) > Add AIRFLOW_HOME to PythonVirtualenvOperator > > > Key: AIRFLOW-2114 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2114 > Project: Apache Airflow > Issue Type: Bug > Components: operators >Reporter: John Bodley >Assignee: John Bodley >Priority: Minor > > The subprocess used by the PythonVirtualenvOperator should be passed the > AIRFLOW_HOME (similar to the BashOperator), which ensures that it gets > propagated to the subprocess in order to leverage the hook connection information. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-2114) Add AIRFLOW_HOME_VAR to PythonVirtualenvOperator
[ https://issues.apache.org/jira/browse/AIRFLOW-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Bodley updated AIRFLOW-2114: - Summary: Add AIRFLOW_HOME_VAR to PythonVirtualenvOperator (was: Add AIRFLOW_HOME_VAR to PythonVirtualenvOperator inline with the BashOperator) > Add AIRFLOW_HOME_VAR to PythonVirtualenvOperator > > > Key: AIRFLOW-2114 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2114 > Project: Apache Airflow > Issue Type: Bug > Components: operators >Reporter: John Bodley >Assignee: John Bodley >Priority: Minor > > The subprocess used by the PythonVirtualenvOperator should be passed the > AIRFLOW_HOME_VAR (similar to the BashOperator), which ensures that it gets > propagated to the subprocess in order to leverage the hook connection information. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2114) Add AIRFLOW_HOME_VAR to PythonVirtualenvOperator inline with the BashOperator
John Bodley created AIRFLOW-2114: Summary: Add AIRFLOW_HOME_VAR to PythonVirtualenvOperator inline with the BashOperator Key: AIRFLOW-2114 URL: https://issues.apache.org/jira/browse/AIRFLOW-2114 Project: Apache Airflow Issue Type: Bug Components: operators Reporter: John Bodley Assignee: John Bodley The subprocess used by the PythonVirtualenvOperator should be passed the AIRFLOW_HOME_VAR (similar to the BashOperator), which ensures that it gets propagated to the subprocess in order to leverage the hook connection information. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
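The gist of the AIRFLOW-2114 proposal, as a minimal sketch (the function name below is illustrative, not the actual patch): copy the parent environment and set AIRFLOW_HOME in it before spawning the subprocess, the way BashOperator already does.
{code:python}
import os
from subprocess import PIPE, Popen

def run_in_virtualenv(cmd, airflow_home):
    """Sketch: spawn the virtualenv subprocess with AIRFLOW_HOME set so
    that hooks inside it resolve the same configuration and connections."""
    env = os.environ.copy()
    env["AIRFLOW_HOME"] = airflow_home  # mirrors BashOperator's behavior
    proc = Popen(cmd, env=env, stdout=PIPE, stderr=PIPE)
    out, err = proc.communicate()
    return proc.returncode, out, err
{code}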
[jira] [Commented] (AIRFLOW-83) Add MongoDB hook and operator
[ https://issues.apache.org/jira/browse/AIRFLOW-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366285#comment-16366285 ] Andy Cooper commented on AIRFLOW-83: [~simasim] I have created the PR and the code - waiting for a response on CI/CD issues. https://github.com/apache/incubator-airflow/pull/2962 > Add MongoDB hook and operator > - > > Key: AIRFLOW-83 > URL: https://issues.apache.org/jira/browse/AIRFLOW-83 > Project: Apache Airflow > Issue Type: Wish > Components: contrib, hooks, operators >Reporter: vishal srivastava >Assignee: Ajay Yadava >Priority: Minor > > A mongodb hook and operator will be really useful for people who use airflow > with the mongo database. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2103) Authentication using password_auth backend prevents webserver from running
[ https://issues.apache.org/jira/browse/AIRFLOW-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366282#comment-16366282 ] Marcos Bernardelli commented on AIRFLOW-2103: - PR : https://github.com/apache/incubator-airflow/pull/3048 > Authentication using password_auth backend prevents webserver from running > -- > > Key: AIRFLOW-2103 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2103 > Project: Apache Airflow > Issue Type: Bug >Reporter: Mark Ward >Assignee: Kaxil Naik >Priority: Critical > > airflow webserver fails to run with config > {code:java} > [webserver] > authenticate = True > auth_backend = airflow.contrib.auth.backends.password_auth > {code} > and errors out with following traceback > {code:java} > Traceback (most recent call last): > File "/usr/local/bin/airflow", line 4, in > > __import__('pkg_resources').run_script('apache-airflow==1.10.0.dev0+incubating', > 'airflow') > File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", > line 750, in run_script > self.require(requires)[0].run_script(script_name, ns) > File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", > line 1527, in run_script > exec(code, namespace, namespace) > File > "/usr/local/lib/python3.6/site-packages/apache_airflow-1.10.0.dev0+incubating-py3.6.egg/EGG-INFO/scripts/airflow", > line 27, in > args.func(args) > File > "/usr/local/lib/python3.6/site-packages/apache_airflow-1.10.0.dev0+incubating-py3.6.egg/airflow/bin/cli.py", > line 696, in webserver > app = cached_app(conf) > File > "/usr/local/lib/python3.6/site-packages/apache_airflow-1.10.0.dev0+incubating-py3.6.egg/airflow/www/app.py", > line 176, in cached_app > app = create_app(config, testing) > File > "/usr/local/lib/python3.6/site-packages/apache_airflow-1.10.0.dev0+incubating-py3.6.egg/airflow/www/app.py", > line 63, in create_app > from airflow.www import views > File > "/usr/local/lib/python3.6/site-packages/apache_airflow-1.10.0.dev0+incubating-py3.6.egg/airflow/www/views.py", > line 97, in > login_required = airflow.login.login_required > AttributeError: module 'airflow.contrib.auth.backends.password_auth' has no > attribute 'login_required' > {code} > Broke at [https://github.com/apache/incubator-airflow/pull/2730] with the > changing of > {code:java} > from flask_login import login_required, current_user, logout_user > {code} > to > {code:java} > from flask_login import current_user > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
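A minimal sketch of the kind of fix the AIRFLOW-2103 PR implies (not necessarily the actual patch): since the configured auth backend module is used as {{airflow.login}}, it has to keep re-exporting the flask_login helpers that {{airflow/www/views.py}} looks up on it.
{code:python}
# In the auth backend module (e.g. password_auth): restore the names
# that views.py resolves via airflow.login. The noqa comment marks them
# as intentional re-exports.
from flask_login import current_user, login_required, logout_user  # noqa: F401
{code}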
[jira] [Updated] (AIRFLOW-2113) Address missing DagRuns callbacks
[ https://issues.apache.org/jira/browse/AIRFLOW-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Ma updated AIRFLOW-2113: - Summary: Address missing DagRuns callbacks (was: DagRuns does not complete callbacks) > Address missing DagRuns callbacks > - > > Key: AIRFLOW-2113 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2113 > Project: Apache Airflow > Issue Type: Bug >Reporter: Alan Ma >Assignee: Alan Ma >Priority: Critical > > This originally arose from the missing notification from the on_failure and > on_success callback at the dag level. The stack trace is as follows: > {code:java} > [2018-02-07 07:00:08,145] \{models.py:2984} DagFileProcessor172 INFO - > Executing dag callback function: > .GeneralNotifyFailed instance at 0x7fec9d8ad368> > [2018-02-07 07:00:08,161] \{models.py:168} DagFileProcessor172 INFO - Filling > up the DagBag from /home/charon/.virtualenvs/airflow/airflow_home/dags > Dag: , paused: False > Dag: , paused: False > Dag: , paused: False > Dag: , paused: False > Dag: , paused: False > [2018-02-07 07:00:12,103] \{jobs.py:354} DagFileProcessor172 ERROR - Got an > exception! Propagating... > Traceback (most recent call last): > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py", > line 346, in helper > pickle_dags) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py", > line 53, in wrapper > result = func(*args, **kwargs) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py", > line 1586, in process_file > self._process_dags(dagbag, dags, ti_keys_to_schedule) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py", > line 1175, in _process_dags > dag_run = self.create_dag_run(dag) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py", > line 53, in wrapper > result = func(*args, **kwargs) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py", > line 747, in create_dag_run > dag.handle_callback(dr, success=False, reason='dagrun_timeout', > session=session) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py", > line 53, in wrapper > result = func(*args, **kwargs) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/models.py", > line 2990, in handle_callback > d = dagrun.dag or DagBag().get_dag(dag_id=dagrun.dag_id) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py", > line 237, in __get__ > return self.impl.get(instance_state(instance), dict_) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py", > line 579, in get > value = state._load_expired(state, passive) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/state.py", > line 592, in _load_expired > self.manager.deferred_scalar_loader(self, toload) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/loading.py", > line 644, in load_scalar_attributes > (state_str(state))) > DetachedInstanceError: Instance is not bound to a > Session; attribute refresh operation cannot proceed > [2018-02-07 07:00:31,003] \{jobs.py:343} DagFileProcessor208 INFO - Started > process (PID=7813) to work on > /home/charon/.virtualenvs/airflow/airflow_home/dags/c > 
haron-airflow/dags/inapp_vendor_sku_breakdown.py\ > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-2113) Address missing DagRun callbacks
[ https://issues.apache.org/jira/browse/AIRFLOW-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Ma updated AIRFLOW-2113: - Summary: Address missing DagRun callbacks (was: Address missing DagRuns callbacks) > Address missing DagRun callbacks > > > Key: AIRFLOW-2113 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2113 > Project: Apache Airflow > Issue Type: Bug >Reporter: Alan Ma >Assignee: Alan Ma >Priority: Critical > > This originally arose from the missing notification from the on_failure and > on_success callback at the dag level. The stack trace is as follows: > {code:java} > [2018-02-07 07:00:08,145] \{models.py:2984} DagFileProcessor172 INFO - > Executing dag callback function: > .GeneralNotifyFailed instance at 0x7fec9d8ad368> > [2018-02-07 07:00:08,161] \{models.py:168} DagFileProcessor172 INFO - Filling > up the DagBag from /home/charon/.virtualenvs/airflow/airflow_home/dags > Dag: , paused: False > Dag: , paused: False > Dag: , paused: False > Dag: , paused: False > Dag: , paused: False > [2018-02-07 07:00:12,103] \{jobs.py:354} DagFileProcessor172 ERROR - Got an > exception! Propagating... > Traceback (most recent call last): > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py", > line 346, in helper > pickle_dags) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py", > line 53, in wrapper > result = func(*args, **kwargs) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py", > line 1586, in process_file > self._process_dags(dagbag, dags, ti_keys_to_schedule) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py", > line 1175, in _process_dags > dag_run = self.create_dag_run(dag) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py", > line 53, in wrapper > result = func(*args, **kwargs) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py", > line 747, in create_dag_run > dag.handle_callback(dr, success=False, reason='dagrun_timeout', > session=session) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py", > line 53, in wrapper > result = func(*args, **kwargs) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/models.py", > line 2990, in handle_callback > d = dagrun.dag or DagBag().get_dag(dag_id=dagrun.dag_id) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py", > line 237, in __get__ > return self.impl.get(instance_state(instance), dict_) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py", > line 579, in get > value = state._load_expired(state, passive) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/state.py", > line 592, in _load_expired > self.manager.deferred_scalar_loader(self, toload) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/loading.py", > line 644, in load_scalar_attributes > (state_str(state))) > DetachedInstanceError: Instance is not bound to a > Session; attribute refresh operation cannot proceed > [2018-02-07 07:00:31,003] \{jobs.py:343} DagFileProcessor208 INFO - Started > process (PID=7813) to work on > /home/charon/.virtualenvs/airflow/airflow_home/dags/c > 
haron-airflow/dags/inapp_vendor_sku_breakdown.py\ > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2113) DagRuns sometimes does not complete callbacks
[ https://issues.apache.org/jira/browse/AIRFLOW-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366251#comment-16366251 ] Alan Ma commented on AIRFLOW-2113: -- Given that the *handle_callback* method belongs to the DAG object, we are able to get the list of tasks directly with *get_task* instead of communicating with the database. > DagRuns sometimes does not complete callbacks > - > > Key: AIRFLOW-2113 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2113 > Project: Apache Airflow > Issue Type: Bug >Reporter: Alan Ma >Assignee: Alan Ma >Priority: Critical > > This originally arose from the missing notification from the on_failure and > on_success callback at the dag level. The stack trace is as follows: > {code:java} > [2018-02-07 07:00:08,145] \{models.py:2984} DagFileProcessor172 INFO - > Executing dag callback function: > .GeneralNotifyFailed instance at 0x7fec9d8ad368> > [2018-02-07 07:00:08,161] \{models.py:168} DagFileProcessor172 INFO - Filling > up the DagBag from /home/charon/.virtualenvs/airflow/airflow_home/dags > Dag: , paused: False > Dag: , paused: False > Dag: , paused: False > Dag: , paused: False > Dag: , paused: False > [2018-02-07 07:00:12,103] \{jobs.py:354} DagFileProcessor172 ERROR - Got an > exception! Propagating... > Traceback (most recent call last): > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py", > line 346, in helper > pickle_dags) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py", > line 53, in wrapper > result = func(*args, **kwargs) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py", > line 1586, in process_file > self._process_dags(dagbag, dags, ti_keys_to_schedule) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py", > line 1175, in _process_dags > dag_run = self.create_dag_run(dag) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py", > line 53, in wrapper > result = func(*args, **kwargs) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py", > line 747, in create_dag_run > dag.handle_callback(dr, success=False, reason='dagrun_timeout', > session=session) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py", > line 53, in wrapper > result = func(*args, **kwargs) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/models.py", > line 2990, in handle_callback > d = dagrun.dag or DagBag().get_dag(dag_id=dagrun.dag_id) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py", > line 237, in __get__ > return self.impl.get(instance_state(instance), dict_) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py", > line 579, in get > value = state._load_expired(state, passive) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/state.py", > line 592, in _load_expired > self.manager.deferred_scalar_loader(self, toload) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/loading.py", > line 644, in load_scalar_attributes > (state_str(state))) > DetachedInstanceError: Instance is not bound to a > Session; attribute refresh operation cannot proceed > [2018-02-07 07:00:31,003] \{jobs.py:343} DagFileProcessor208 
INFO - Started > process (PID=7813) to work on > /home/charon/.virtualenvs/airflow/airflow_home/dags/c > haron-airflow/dags/inapp_vendor_sku_breakdown.py\ > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
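A minimal sketch of the idea in the comment above, under the assumption that it matches the eventual patch: because {{handle_callback}} is a DAG method, the tasks can be resolved from the DAG itself via {{get_task}}, avoiding the {{dagrun.dag}} attribute access that triggers the {{DetachedInstanceError}} in the trace.
{code:python}
def handle_dag_callback(dag, dagrun, success=True, reason=None, session=None):
    """Sketch only: resolve tasks from the DAG object instead of via
    dagrun.dag, which forces a lazy load on a detached SQLAlchemy
    instance."""
    callback = dag.on_success_callback if success else dag.on_failure_callback
    if callback is None:
        return
    for ti in dagrun.get_task_instances(session=session):
        ti.task = dag.get_task(ti.task_id)  # no extra DagBag / DB lookup
    callback({"dag": dag, "dag_run": dagrun, "reason": reason})
{code}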
[jira] [Updated] (AIRFLOW-2113) DagRuns sometimes does not complete callbacks
[ https://issues.apache.org/jira/browse/AIRFLOW-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Ma updated AIRFLOW-2113: - Summary: DagRuns sometimes does not complete callbacks (was: DagRuns sometimes does not execute callbacks) > DagRuns sometimes does not complete callbacks > - > > Key: AIRFLOW-2113 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2113 > Project: Apache Airflow > Issue Type: Bug >Reporter: Alan Ma >Assignee: Alan Ma >Priority: Critical > > This originally arose from the missing notification from the on_failure and > on_success callback at the dag level. The stack trace is as follows: > {code} > [2018-02-07 07:00:08,145] \{models.py:2984} DagFileProcessor172 INFO - > Executing dag callback function: > .GeneralNotifyFailed instance at 0x7fec9d8ad368> > [2018-02-07 07:00:08,161] \{models.py:168} DagFileProcessor172 INFO - Filling > up the DagBag from /home/charon/.virtualenvs/airflow/airflow_home/dags > Dag: , paused: False > Dag: , paused: False > Dag: , paused: False > Dag: , paused: False > Dag: , paused: False > [2018-02-07 07:00:12,103] \{jobs.py:354} DagFileProcessor172 ERROR - Got an > exception! Propagating... > Traceback (most recent call last): > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py", > line 346, in helper > pickle_dags) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py", > line 53, in wrapper > result = func(*args, **kwargs) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py", > line 1586, in process_file > self._process_dags(dagbag, dags, ti_keys_to_schedule) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py", > line 1175, in _process_dags > dag_run = self.create_dag_run(dag) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py", > line 53, in wrapper > result = func(*args, **kwargs) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py", > line 747, in create_dag_run > dag.handle_callback(dr, success=False, reason='dagrun_timeout', > session=session) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py", > line 53, in wrapper > result = func(*args, **kwargs) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/models.py", > line 2990, in handle_callback > d = dagrun.dag or DagBag().get_dag(dag_id=dagrun.dag_id) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py", > line 237, in __get__ > return self.impl.get(instance_state(instance), dict_) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py", > line 579, in get > value = state._load_expired(state, passive) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/state.py", > line 592, in _load_expired > self.manager.deferred_scalar_loader(self, toload) > File > "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/loading.py", > line 644, in load_scalar_attributes > (state_str(state))) > DetachedInstanceError: Instance is not bound to a > Session; attribute refresh operation cannot proceed > [2018-02-07 07:00:31,003] \{jobs.py:343} DagFileProcessor208 INFO - Started > process (PID=7813) to work on > /home/charon/.virtualenvs/airflow/airflow_home/dags/c > 
haron-airflow/dags/inapp_vendor_sku_breakdown.py\ > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2113) DagRuns sometimes does not execute callbacks
Alan Ma created AIRFLOW-2113: Summary: DagRuns sometimes does not execute callbacks Key: AIRFLOW-2113 URL: https://issues.apache.org/jira/browse/AIRFLOW-2113 Project: Apache Airflow Issue Type: Bug Reporter: Alan Ma Assignee: Alan Ma This originally arose from the missing notification from the on_failure and on_success callback at the dag level. The stack trace is as follows: {code:java} [2018-02-07 07:00:08,145] \{models.py:2984} DagFileProcessor172 INFO - Executing dag callback function: [2018-02-07 07:00:08,161] \{models.py:168} DagFileProcessor172 INFO - Filling up the DagBag from /home/charon/.virtualenvs/airflow/airflow_home/dags Dag: , paused: False Dag: , paused: False Dag: , paused: False Dag: , paused: False Dag: , paused: False [2018-02-07 07:00:12,103] \{jobs.py:354} DagFileProcessor172 ERROR - Got an exception! Propagating... Traceback (most recent call last): File "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py", line 346, in helper pickle_dags) File "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py", line 53, in wrapper result = func(*args, **kwargs) File "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py", line 1586, in process_file self._process_dags(dagbag, dags, ti_keys_to_schedule) File "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py", line 1175, in _process_dags dag_run = self.create_dag_run(dag) File "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py", line 53, in wrapper result = func(*args, **kwargs) File "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py", line 747, in create_dag_run dag.handle_callback(dr, success=False, reason='dagrun_timeout', session=session) File "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py", line 53, in wrapper result = func(*args, **kwargs) File "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/models.py", line 2990, in handle_callback d = dagrun.dag or DagBag().get_dag(dag_id=dagrun.dag_id) File "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py", line 237, in __get__ return self.impl.get(instance_state(instance), dict_) File "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py", line 579, in get value = state._load_expired(state, passive) File "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/state.py", line 592, in _load_expired self.manager.deferred_scalar_loader(self, toload) File "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/loading.py", line 644, in load_scalar_attributes (state_str(state))) DetachedInstanceError: Instance is not bound to a Session; attribute refresh operation cannot proceed [2018-02-07 07:00:31,003] \{jobs.py:343} DagFileProcessor208 INFO - Started process (PID=7813) to work on /home/charon/.virtualenvs/airflow/airflow_home/dags/c haron-airflow/dags/inapp_vendor_sku_breakdown.py\{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2112) Recent Tasks UI cell does not show all items
Diego Rabatone Oliveira created AIRFLOW-2112: Summary: Recent Tasks UI cell does not show all items Key: AIRFLOW-2112 URL: https://issues.apache.org/jira/browse/AIRFLOW-2112 Project: Apache Airflow Issue Type: Bug Components: ui Affects Versions: 1.9.1 Reporter: Diego Rabatone Oliveira Assignee: Diego Rabatone Oliveira Fix For: 1.9.1 Attachments: 2018-02-15-182303_401x103_scrot.png The Recent Tasks cell on the UI shows only some of the items from the SVG file within it. !2018-02-15-182303_401x103_scrot.png! This is because the SVG has its width hardcoded at 180px. Since we expect to have 8 items at around 25-27px each, changing the width to 240px would solve the problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2111) Add the ability to pause/unpause a task within a DAG
Mark Reid created AIRFLOW-2111: -- Summary: Add the ability to pause/unpause a task within a DAG Key: AIRFLOW-2111 URL: https://issues.apache.org/jira/browse/AIRFLOW-2111 Project: Apache Airflow Issue Type: New Feature Components: scheduler, ui Reporter: Mark Reid It would be convenient to be able to pause/unpause a task within a DAG (rather than the entire DAG). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-835) SMTP Mail delivery fails with server using CRAM-MD5 auth
[ https://issues.apache.org/jira/browse/AIRFLOW-835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366192#comment-16366192 ] James Davidheiser commented on AIRFLOW-835: --- I ran into this in 1.9.0 as well. I solved it by creating a copy of email.py, with the username and password cast as Python 2 strings, and referencing that in airflow.cfg's email_backend. > SMTP Mail delivery fails with server using CRAM-MD5 auth > > > Key: AIRFLOW-835 > URL: https://issues.apache.org/jira/browse/AIRFLOW-835 > Project: Apache Airflow > Issue Type: Bug > Components: utils >Affects Versions: Airflow 1.7.1 > Environment: https://hub.docker.com/_/python/ (debian:jessie + > python2.7 in docker) >Reporter: Joseph Harris >Priority: Minor > > Traceback when sending email from smtp-server configured to offer CRAM-MD5 > (in all cases, tls included). This occurs because the configuration module > returns the password as a futures.types.newstr, instead of a plain str (see > below for gory details of why this breaks). > Traceback (most recent call last): > File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1308, > in handle_failure > self.email_alert(error, is_retry=False) > File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1425, > in email_alert > send_email(task.email, title, body) > File "/usr/local/lib/python2.7/site-packages/airflow/utils/email.py", line > 43, in send_email > return backend(to, subject, html_content, files=files, dryrun=dryrun) > File "/usr/local/lib/python2.7/site-packages/airflow/utils/email.py", line > 79, in send_email_smtp > send_MIME_email(SMTP_MAIL_FROM, to, msg, dryrun) > File "/usr/local/lib/python2.7/site-packages/airflow/utils/email.py", line > 95, in send_MIME_email > s.login(SMTP_USER, SMTP_PASSWORD) > File "/usr/local/lib/python2.7/smtplib.py", line 607, in login > (code, resp) = self.docmd(encode_cram_md5(resp, user, password)) > File "/usr/local/lib/python2.7/smtplib.py", line 571, in encode_cram_md5 > response = user + " " + hmac.HMAC(password, challenge).hexdigest() > File "/usr/local/lib/python2.7/hmac.py", line 75, in __init__ > self.outer.update(key.translate(trans_5C)) > File "/usr/local/lib/python2.7/site-packages/future/types/newstr.py", line > 390, in translate > if ord(c) in table: > TypeError: 'in ' requires string as left operand, not int > SMTP configs: > [email] > email_backend = airflow.utils.email.send_email_smtp > [smtp] > smtp_host = {a_smtp_server} > smtp_port = 587 > smtp_starttls = True > smtp_ssl = False > smtp_user = {a_username} > smtp_password = {a_password} > smtp_mail_from = {a_email_addr} > *Gory details > If the server offers CRAM-MD5, smptlib prefers this by default, and will try > to use hmac.HMAC to hash the password: > https://hg.python.org/cpython/file/2.7/Lib/smtplib.py#l602 > https://hg.python.org/cpython/file/2.7/Lib/smtplib.py#l571 > But if the password is a newstr, newstr.translate expects a dict mapping > instead of str, and raises an exception. 
> https://hg.python.org/cpython/file/2.7/Lib/hmac.py#l75 > All of this occurs after a successful SMTP.ehlo(), so it's probably not crap > container networking > Could be resolved by passing the smtp password as a futures.types.newbytes, > as this behaves as expected: > from future.types import newstr, newbytes > import hmac > # Make str / newstr types > test = 'a_string' > test_newstr = newstr(test) > test_newbytes = newbytes(test) > msg = 'future problems' > # Test 1 - Try to do a HMAC: > # fine > hmac.HMAC(test, msg) > # fails horribly > hmac.HMAC(test_newstr, msg) > # is completely fine > hmac.HMAC(test_newbytes, msg) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
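Following the AIRFLOW-835 description's own suggestion, a minimal sketch of the workaround (the function name is illustrative; this is not the actual airflow.utils.email code): hand smtplib a {{newbytes}} password so that the CRAM-MD5 path in {{hmac.HMAC}} gets a type it can hash.
{code:python}
from future.types import newbytes

def login_with_safe_password(server, smtp_user, smtp_password):
    """Sketch: cast the configured password before SMTP.login so that a
    server offering CRAM-MD5 does not trip over future's newstr type.
    `server` is an already-connected smtplib.SMTP instance."""
    server.login(smtp_user, newbytes(smtp_password))
{code}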
[jira] [Assigned] (AIRFLOW-2058) Scheduler uses MainThread for DAG file processing
[ https://issues.apache.org/jira/browse/AIRFLOW-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-2058: -- Assignee: Yang Pan > Scheduler uses MainThread for DAG file processing > - > > Key: AIRFLOW-2058 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2058 > Project: Apache Airflow > Issue Type: Bug > Components: DAG >Affects Versions: 1.9.0 > Environment: Ubuntu, Airflow 1.9, Sequential executor >Reporter: Yang Pan >Assignee: Yang Pan >Priority: Blocker > > By reading the [source code > |https://github.com/apache/incubator-airflow/blob/61ff29e578d1121ab4606fe122fb4e2db8f075b9/airflow/utils/dag_processing.py#L538] > it appears the scheduler will process each DAG file, either a .py or .zip, > using a new process. > > If I understand correctly, in theory what should happen in terms of > processing a .zip file is that the dedicated process will add the .zip file > to the PYTHONPATH, and load the file's module and dependency. When the DAG > read is done, the process gets destroyed. And since the PYTHONPATH is process > scoped, it won't pollute other processes. > > However by printing out the threads and process id, it looks like Airflow > scheduler can sometimes accidentally pick up the main process instead of > creating a new one, and that's when collision happens. > > Here is snippet of the PYTHONPATH when advanced_dag_dependency-1.zip is being > processed. As you can see when it's executed by MainThread, it contains other > .zip files. When it's using dedicated thread, only required .zip is added. > > sys.path :['/root/airflow/dags/yang_subdag_2.zip', > '/root/airflow/dags/yang_subdag_2.zip', > '/root/airflow/dags/yang_subdag_1.zip', > '/root/airflow/dags/yang_subdag_1.zip', > '/root/airflow/dags/advanced_dag_dependency-2.zip', > '/root/airflow/dags/advanced_dag_dependency-2.zip', > '/root/airflow/dags/advanced_dag_dependency-1.zip', > '/root/airflow/dags/advanced_dag_dependency-1.zip', > '/root/airflow/dags/yang_subdag_1', '/usr/local/bin', '/usr/lib/python2.7', > '/usr/lib/python2.7/plat-x86_64-linux-gnu', '/usr/lib/python2.7/lib-tk', > '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', > '/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages', > '/usr/lib/python2.7/dist-packages/PILcompat', '/root/airflow/config', > '/root/airflow/dags', '/root/airflow/plugins'] > Print from MyFirstOperator in Dag 1 > process id: 5059 > thread id: <_MainThread(*MainThread*, started 140339858560768)> > > sys.path :[u'/root/airflow/dags/advanced_dag_dependency-1.zip', > '/usr/local/bin', '/usr/lib/python2.7', > '/usr/lib/python2.7/plat-x86_64-linux-gnu', '/usr/lib/python2.7/lib-tk', > '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', > '/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages', > '/usr/lib/python2.7/dist-packages/PILcompat', '/root/airflow/config', > '/root/airflow/dags', '/root/airflow/plugins'] > Print from MyFirstOperator in Dag 1 > process id: 5076 > thread id: <_MainThread(*DagFileProcessor283*, started 140137838294784)> -- This message was sent by Atlassian JIRA (v7.6.3#76005)
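A sketch of the diagnostic described in AIRFLOW-2058, for anyone trying to reproduce it: drop something like this into an operator or a module under dags/ and compare the process/thread identifiers and sys.path across parses (the print format loosely follows the output quoted in the report).
{code:python}
import os
import sys
import threading

def report_parse_context(tag):
    """Sketch: show which process/thread parsed this file and what
    sys.path looked like at the time, to spot MainThread pollution."""
    print("Print from %s" % tag)
    print("process id: %d" % os.getpid())
    print("thread id: %r" % threading.current_thread())
    print("sys.path: %r" % sys.path)

report_parse_context("MyFirstOperator in Dag 1")
{code}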
[jira] [Assigned] (AIRFLOW-2108) BashOperator discards process indentation
[ https://issues.apache.org/jira/browse/AIRFLOW-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kaxil Naik reassigned AIRFLOW-2108: --- Assignee: Kaxil Naik > BashOperator discards process indentation > - > > Key: AIRFLOW-2108 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2108 > Project: Apache Airflow > Issue Type: Bug > Components: operators >Affects Versions: 1.9.0 >Reporter: Chris Bandy >Assignee: Kaxil Naik >Priority: Minor > > When the BashOperator logs every line of output from the executing process, > it strips leading whitespace which makes it difficult to interpret output > that was formatted with indentation. > For example, I'm executing [PGLoader|http://pgloader.readthedocs.io/] through > this operator. When it finishes, it prints a summary which appears in the > logs like so: > {noformat} > [2018-02-14 07:31:44,524] {bash_operator.py:101} INFO - > 2018-02-14T07:31:44.115000Z LOG report summary reset > [2018-02-14 07:31:44,564] {bash_operator.py:101} INFO - table name errors > read imported bytes total time read write > [2018-02-14 07:31:44,564] {bash_operator.py:101} INFO - > -- - - - - > -- - - > [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - fetch meta data >0524524 1.438s > [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Create Schemas > 0 0 0 0.161s > [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Create SQL Types > 0 19 1920.413s > [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Create tables > 0310310 3m2.316s > [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Set Table OIDs > 0155155 0.458s > [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - > -- - - - - > -- - - > [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - > -- - - - - > -- - - > [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Index Build > Completion 0353353 1m37.323s > [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Create Indexes > 0353353 3m25.929s > [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Reset Sequences >0 0 0 2.677s > [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Primary Keys > 0147147 1m21.091s > [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Create Foreign Keys >0 16 16 8.283s > [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Create Triggers >0 0 0 0.339s > [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Install Comments > 0 0 0 0.000s > [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - > -- - - - - > -- - - > [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Total import time > ✓ 0 0 6m35.642s > {noformat} > Ideally, the leading whitespace would be retained, so the logs look like this: > {noformat} > [2018-02-14 07:31:44,524] {bash_operator.py:101} INFO - > 2018-02-14T07:31:44.115000Z LOG report summary reset > [2018-02-14 07:31:44,564] {bash_operator.py:101} INFO - table > name errors read imported bytes total time read >write > [2018-02-14 07:31:44,564] {bash_operator.py:101} INFO - > -- - - - - > -- - - > [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO -fetch meta > data 0524524 1.438s > [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Create > Schemas 0 0 0 0.161s > [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Create SQL > Types 0 19 1920.413s > [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Create > tables 0310310 3m2.316s > [2018-02-14 07:31:44,567] {bash_operator.py:101} INFO - Set Table > OIDs
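A minimal sketch of the change AIRFLOW-2108 asks for, assuming the 1.9.0 bash_operator loop strips each output line before logging it: strip only the trailing newline so leading indentation survives.
{code:python}
def log_subprocess_output(stream, log, encoding="utf-8"):
    """Sketch: relay subprocess output line by line without discarding
    leading whitespace (a bare .strip() drops the indentation)."""
    for raw in iter(stream.readline, b""):
        log.info(raw.decode(encoding).rstrip("\n"))
{code}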
[jira] [Work started] (AIRFLOW-2110) Enhance Http Hook to use a header passed in the "extra" argument and add tenacity retry
[ https://issues.apache.org/jira/browse/AIRFLOW-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on AIRFLOW-2110 started by Alberto Calderari. -- > Enhance Http Hook to use a header passed in the "extra" argument and add > tenacity retry > -- > > Key: AIRFLOW-2110 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2110 > Project: Apache Airflow > Issue Type: Improvement > Components: hooks >Affects Versions: Airflow 1.8 >Reporter: Alberto Calderari >Assignee: Alberto Calderari >Priority: Minor > Fix For: Airflow 2.0 > > > Add the possibility to add a JSON header in the "extra" field of the connection: > {"Authorization": "Bearer Here1sMyT0k3N"} > Also add a tenacity retry so the operator won't fall over in case of a bad > handshake. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2110) Enhance Http Hook to use a header passed in the "extra" argument and add tenacity retry
Alberto Calderari created AIRFLOW-2110: -- Summary: Enhance Http Hook to use a header passed in the "extra" argument and add tenacity retry Key: AIRFLOW-2110 URL: https://issues.apache.org/jira/browse/AIRFLOW-2110 Project: Apache Airflow Issue Type: Improvement Components: hooks Affects Versions: Airflow 1.8 Reporter: Alberto Calderari Assignee: Alberto Calderari Fix For: Airflow 2.0 Add the possibility to add a JSON header in the "extra" field of the connection: {"Authorization": "Bearer Here1sMyT0k3N"} Also add a tenacity retry so the operator won't fall over in case of a bad handshake. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
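A minimal sketch of the proposed behavior (the function is illustrative, not the final hook API): parse the connection's "extra" field as JSON headers and wrap the request in a tenacity retry so a single bad handshake does not fail the task.
{code:python}
import json

import requests
import tenacity

@tenacity.retry(stop=tenacity.stop_after_attempt(3),
                wait=tenacity.wait_exponential(multiplier=1, max=10),
                reraise=True)
def call_with_extra_headers(url, extra, **kwargs):
    """Sketch: `extra` is the raw connection extra, e.g.
    '{"Authorization": "Bearer Here1sMyT0k3N"}'."""
    headers = json.loads(extra) if extra else {}
    response = requests.get(url, headers=headers, **kwargs)
    response.raise_for_status()  # retried by tenacity until attempts run out
    return response
{code}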
[jira] [Work started] (AIRFLOW-2107) add time_partitioning to run_query
[ https://issues.apache.org/jira/browse/AIRFLOW-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on AIRFLOW-2107 started by Ben Marengo. > add time_partitioning to run_query > -- > > Key: AIRFLOW-2107 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2107 > Project: Apache Airflow > Issue Type: New Feature > Components: contrib, gcp, hooks, operators >Reporter: Ben Marengo >Assignee: Ben Marengo >Priority: Minor > > Google has added a time partitioning field to the query part of their jobs > API: > [https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query.timePartitioning] > We should mirror that in Airflow (hook and operator). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
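For reference, the field AIRFLOW-2107 points at sits in the query job configuration; a sketch of what the hook would need to emit (field names follow the linked REST reference; the project/dataset/table values are placeholders, and the surrounding dict layout is an assumption about how the hook would build it).
{code:python}
# Sketch of a jobs.query configuration carrying timePartitioning.
configuration = {
    "query": {
        "query": "SELECT * FROM my_dataset.source_table",
        "destinationTable": {
            "projectId": "my-project",
            "datasetId": "my_dataset",
            "tableId": "partitioned_table",
        },
        "timePartitioning": {
            "type": "DAY",                 # the documented partitioning type
            "expirationMs": "7776000000",  # optional: keep 90 days
        },
    }
}
{code}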
[jira] [Commented] (AIRFLOW-83) Add MongoDB hook and operator
[ https://issues.apache.org/jira/browse/AIRFLOW-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365498#comment-16365498 ] Simon Dubois commented on AIRFLOW-83: - Hey [~andscoop]. Any news on this? > Add MongoDB hook and operator > - > > Key: AIRFLOW-83 > URL: https://issues.apache.org/jira/browse/AIRFLOW-83 > Project: Apache Airflow > Issue Type: Wish > Components: contrib, hooks, operators >Reporter: vishal srivastava >Assignee: Ajay Yadava >Priority: Minor > > A mongodb hook and operator will be really useful for people who use airflow > with the mongo database. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2109) scheduler stopped picking up jobs suddenly
[ https://issues.apache.org/jira/browse/AIRFLOW-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365286#comment-16365286 ] rahul commented on AIRFLOW-2109: We are also using MySQL RDS as the Airflow metastore DB. > scheduler stopped picking up jobs suddenly > -- > > Key: AIRFLOW-2109 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2109 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 1.8 > Environment: aws EC2 server >Reporter: rahul >Assignee: rahul >Priority: Major > > Hi, > > The scheduler stopped picking up jobs suddenly. When I ran resetdb, it started > picking up properly. > > Could you please let me know why airflow resetdb resolved this > issue? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2109) scheduler stopped picking up jobs suddenly
[ https://issues.apache.org/jira/browse/AIRFLOW-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365285#comment-16365285 ] rahul commented on AIRFLOW-2109: We are using the LocalExecutor, and Airflow is configured on an AWS EC2 server. Airflow resetdb resolved the issue. I just want to know why resetdb resolved it and what the root cause might be. Thanks in advance! > scheduler stopped picking up jobs suddenly > -- > > Key: AIRFLOW-2109 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2109 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 1.8 > Environment: aws EC2 server >Reporter: rahul >Assignee: rahul >Priority: Major > > Hi, > > The scheduler stopped picking up jobs suddenly. When I ran resetdb, it started > picking up properly. > > Could you please let me know why airflow resetdb resolved this > issue? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2109) scheduler stopped picking up jobs suddenly
[ https://issues.apache.org/jira/browse/AIRFLOW-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365283#comment-16365283 ] Fokko Driesprong commented on AIRFLOW-2109: --- We need much more information to give sensible feedback. Do you have logs, and what is your environment? What kind of executor are you using? > scheduler stopped picking up jobs suddenly > -- > > Key: AIRFLOW-2109 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2109 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 1.8 > Environment: aws EC2 server >Reporter: rahul >Assignee: rahul >Priority: Major > > Hi, > > The scheduler stopped picking up jobs suddenly. When I ran resetdb, it started > picking up properly. > > Could you please let me know why airflow resetdb resolved this > issue? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2065) Worker logging can raise FileExistsError when more than one process execute concurrently
[ https://issues.apache.org/jira/browse/AIRFLOW-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365272#comment-16365272 ] Sébastien Brochet commented on AIRFLOW-2065: I've opened a new PR with my fix: https://github.com/apache/incubator-airflow/pull/3040 Cheers, > Worker logging can raise FileExistsError when more than one process execute > concurrently > > > Key: AIRFLOW-2065 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2065 > Project: Apache Airflow > Issue Type: Bug > Components: executor, logging >Affects Versions: 1.9.0 >Reporter: Sébastien Brochet >Priority: Critical > > Hello, > > We started observing random failures during the execution of our DAGs after > upgrading to 1.9.0. After careful debugging, we noticed the following > exception in the worker logs: > {noformat} > Traceback (most recent call last): > File "/projects/airflow-hadoop/anaconda3/lib/python3.6/logging/config.py", > line 558, in configure > handler = self.configure_handler(handlers[name]) > File "/projects/airflow-hadoop/anaconda3/lib/python3.6/logging/config.py", > line 731, in configure_handler > result = factory(**kwargs) > File > "/projects/airflow-hadoop/anaconda3/lib/python3.6/site-packages/airflow/utils/log/file_processor_handler.py", > line 48, in __init__ > os.makedirs(self._get_log_directory()) > File "/projects/airflow-hadoop/anaconda3/lib/python3.6/os.py", line 220, > in makedirs > mkdir(name, mode) > FileExistsError: [Errno 17] File exists: > '/projects/airflow-hadoop/airflow/logs/scheduler/2018-02-05' > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File "/projects/airflow-hadoop/anaconda3/bin/airflow", line 16, in > from airflow import configuration > File > "/projects/airflow-hadoop/anaconda3/lib/python3.6/site-packages/airflow/__init__.py", > line 31, in > from airflow import settings > File > "/projects/airflow-hadoop/anaconda3/lib/python3.6/site-packages/airflow/settings.py", > line 148, in > configure_logging() > File > "/projects/airflow-hadoop/anaconda3/lib/python3.6/site-packages/airflow/logging_config.py", > line 75, in configure_logging > raise e > File > "/projects/airflow-hadoop/anaconda3/lib/python3.6/site-packages/airflow/logging_config.py", > line 70, in configure_logging > dictConfig(logging_config) > File "/projects/airflow-hadoop/anaconda3/lib/python3.6/logging/config.py", > line 795, in dictConfig > dictConfigClass(config).configure() > File "/projects/airflow-hadoop/anaconda3/lib/python3.6/logging/config.py", > line 566, in configure > '%r: %s' % (name, e)) > ValueError: Unable to configure handler 'file.processor': [Errno 17] File > exists: '/projects/airflow-hadoop/airflow/logs/scheduler/2018-02-05 > {noformat} > > As you can see, an exception is raised when trying to create the directory > where to store the executor logs. This can happen if two tasks are scheduled > at the exact same time on the same worker. 
It appears to be the case here: > > {noformat} > [2018-02-05 02:10:07,886] \{celery_executor.py:50} INFO - Executing command > in Celery: airflow run pairing_sensor_check 2018-02-04T02:10:00 --local > --pool sensor -sd /projects/airflow-hadoop/airflow/dags/flow.py > [2018-02-05 02:10:07,908] \{celery_executor.py:50} INFO - Executing command > in Celery: airflow run yyy pairing_sensor_check 2018-02-04T02:10:00 --local > --pool sensor -sd /projects/airflow-hadoop/airflow/dags/flow.py > {noformat} > The culprit is here: > [https://github.com/apache/incubator-airflow/blob/v1-9-stable/airflow/utils/log/file_processor_handler.py#L47-L48] > (not fixed in master) > A simple fix would be to wrap the {{makedirs}} call in a {{try}} / > {{except}} block. > > Thanks, > > Sébastien -- This message was sent by Atlassian JIRA (v7.6.3#76005)
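A minimal sketch of that simple fix (the helper name is illustrative): tolerate the directory already existing when two workers race on the same log path.
{code:python}
import errno
import os

def makedirs_race_safe(path):
    """Sketch: like os.makedirs, but a concurrent creation of the same
    directory is not an error."""
    try:
        os.makedirs(path)
    except OSError as e:
        if e.errno != errno.EEXIST:  # swallow only "File exists"
            raise
{code}
On Python 3 alone, {{os.makedirs(path, exist_ok=True)}} achieves the same thing in one call; the try/except form also covers Python 2.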