[jira] [Commented] (AIRFLOW-6981) Move Google Cloud Build from Discovery API to Python Library
[ https://issues.apache.org/jira/browse/AIRFLOW-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130084#comment-17130084 ] ASF GitHub Bot commented on AIRFLOW-6981: - ryanyuan commented on pull request #8575: URL: https://github.com/apache/airflow/pull/8575#issuecomment-641730079 Thanks @mik-laj, I've applied your patch. As for the exception you hit, it seems you ran the example DAG, it failed midway, and you re-ran it, so it reported that the trigger already existed. I tried adding a try/except to return the trigger if it already exists, but I could not find a good way to get the trigger ID, which is needed by CloudBuildHook.get_build_trigger(). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Move Google Cloud Build from Discovery API to Python Library > > > Key: AIRFLOW-6981 > URL: https://issues.apache.org/jira/browse/AIRFLOW-6981 > Project: Apache Airflow > Issue Type: Improvement > Components: gcp >Affects Versions: 2.0.0 >Reporter: Ryan Yuan >Assignee: Ryan Yuan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [airflow] ryanyuan commented on pull request #8575: [AIRFLOW-6981] Move Google Cloud Build from Discovery API to Python Library
ryanyuan commented on pull request #8575: URL: https://github.com/apache/airflow/pull/8575#issuecomment-641730079 Thanks @mik-laj, I've applied your patch. As for the exception you hit, it seems you ran the example DAG, it failed midway, and you re-ran it, so it reported that the trigger already existed. I tried adding a try/except to return the trigger if it already exists, but I could not find a good way to get the trigger ID, which is needed by CloudBuildHook.get_build_trigger().
[GitHub] [airflow] shanit-saha commented on issue #9197: dagrun_operator in Airflow Version 1.10.10 ERRORS _run_raw_task result = task_copy.execute(context=context)
shanit-saha commented on issue #9197: URL: https://github.com/apache/airflow/issues/9197#issuecomment-641702409 Please advise on a possible workaround. The DAG code works fine in our production environment with `Airflow 1.10.2`; this peculiar error only appeared now that we urgently need to migrate to `Airflow 1.10.10`. CC: @ashb
[GitHub] [airflow] syucream commented on pull request #8550: Add DataflowStartFlexTemplateOperator
syucream commented on pull request #8550: URL: https://github.com/apache/airflow/pull/8550#issuecomment-641689350 What's the current status? Has there been any progress? I want this patch. Can I help with anything? @mik-laj
[airflow] tag nightly-master updated (7fd3695 -> 82c8343)
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a change to tag nightly-master in repository https://gitbox.apache.org/repos/asf/airflow.git. *** WARNING: tag nightly-master was modified! *** from 7fd3695 (commit) to 82c8343 (commit) from 7fd3695 Don't use the `|safe` filter in code, it's risky (#9180) add 337a2dc Fixes failure of the build scripts when remote repo does not exist (#9188) add de9d340 Improved cloud tool available in the trimmed down CI container (#9167) add b1c8c5e Allows using private endpoints in GKEStartPodOperator (#9169) add d8e5490 Update AWS connection example to show how to set from env var (#9191) add b762763 Query TaskReschedule only if task is UP_FOR_RESCHEDULE (#9087) add 7ad827f Set conn_type as not-nullable (#9187) add efb86df Call super.tearDown in SystemTest tearDown (#9196) add 6d4972a Remove httplib2 from Google requirements (#9194) add c18f4c0 Fix typo in BREEZE.rst (#9199) add 82c8343 Support additional apt dependencies (#9189) No new revisions were added by this update. Summary of changes: BREEZE.rst | 5 +- Dockerfile | 12 +++- Dockerfile.ci | 31 --- IMAGES.rst | 44 +++ UPDATING.md| 8 +++ ... 
8f966b9c467a_set_conn_type_as_non_nullable.py} | 21 --- airflow/models/connection.py | 2 +- airflow/models/taskinstance.py | 10 ++-- airflow/models/taskreschedule.py | 31 --- .../google/cloud/operators/kubernetes_engine.py| 36 +--- airflow/ti_deps/deps/ready_to_reschedule.py| 10 +++- breeze | 1 + docs/concepts.rst | 2 +- docs/howto/connection/aws.rst | 26 - docs/howto/operator/gcp/kubernetes_engine.rst | 18 +- requirements/requirements-python3.6.txt| 56 +-- requirements/requirements-python3.7.txt| 53 +- requirements/requirements-python3.8.txt| 55 +-- requirements/setup-3.6.md5 | 2 +- requirements/setup-3.7.md5 | 2 +- requirements/setup-3.8.md5 | 2 +- scripts/ci/ci_docs.sh | 3 + scripts/ci/ci_fix_ownership.sh | 3 + scripts/ci/ci_flake8.sh| 6 ++ scripts/ci/ci_mypy.sh | 3 + scripts/ci/ci_pylint_main.sh | 6 ++ scripts/ci/ci_pylint_tests.sh | 6 ++ scripts/ci/ci_refresh_pylint_todo.sh | 3 + ...ci_test_backport_packages_import_all_classes.sh | 3 + ...ci_test_backport_packages_install_separately.sh | 3 + scripts/ci/docker-compose/forward-credentials.yml | 2 + scripts/ci/docker-compose/local-prod.yml | 2 + scripts/ci/docker-compose/local.yml| 2 + scripts/ci/in_container/_in_container_utils.sh | 16 -- scripts/ci/libraries/_build_images.sh | 2 +- scripts/ci/libraries/_initialization.sh| 7 +++ scripts/ci/libraries/_runs.sh | 12 scripts/ci/prepare_tool_scripts.sh | 64 ++ setup.py | 1 - tests/jobs/test_local_task_job.py | 2 +- .../cloud/operators/test_kubernetes_engine.py | 49 - tests/test_utils/system_tests_class.py | 2 +- tests/ti_deps/deps/test_ready_to_reschedule_dep.py | 40 +++--- 43 files changed, 485 insertions(+), 179 deletions(-) copy airflow/migrations/versions/{5e7d17757c7a_add_pid_field_to_taskinstance.py => 8f966b9c467a_set_conn_type_as_non_nullable.py} (61%) create mode 100755 scripts/ci/prepare_tool_scripts.sh
[GitHub] [airflow] thesuperzapper commented on pull request #9133: Include missing RBAC roles
thesuperzapper commented on pull request #9133: URL: https://github.com/apache/airflow/pull/9133#issuecomment-641659270 @ashb, I have made the requested change. Is this PR now acceptable?
[GitHub] [airflow] mik-laj commented on pull request #9170: [WIP] Read only endpoint for XCom #8134
mik-laj commented on pull request #9170: URL: https://github.com/apache/airflow/pull/9170#issuecomment-641653054 @ashb It looks good, but I don't know if it suits our code style. We use explicit joins more often than ORM-generated relationships. Having the ORM generate joins, combined with multi-column primary keys, can lead to spaghetti: each model would end up with a relationship to every other model, because many values are repeated. We use an ORM-based relationship in only one place, DagModel.tags. It fits there because we can easily retrieve the object (DagModel) and its metadata (DagTag). Elsewhere, we need careful control to ensure good performance. I also prepared an example: https://github.com/apache/airflow/pull/9170/commits/16f829927859604336013dd7a9755fcf7000f786
```python
query = session.query(XCom)
query = query.filter(and_(XCom.dag_id == dag_id, XCom.task_id == task_id, XCom.key == xcom_key))
query = query.join(DR, and_(XCom.dag_id == DR.dag_id, XCom.execution_date == DR.execution_date))
query = query.filter(DR.run_id == dag_run_id)
q_object = query.one_or_none()
```
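The explicit-join approach described in this comment can be sketched without Airflow or SQLAlchemy. The following is only an illustration under stated assumptions: the `xcom` and `dag_run` tables are simplified stand-ins rather than Airflow's real schema, the sample rows are made up, and `get_xcom_value` and `build_demo_db` are hypothetical helpers. It uses the standard-library `sqlite3` module, joins on the shared `(dag_id, execution_date)` pair, and then filters by `run_id`.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with simplified, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dag_run (dag_id TEXT, execution_date TEXT, run_id TEXT);
        CREATE TABLE xcom (
            dag_id TEXT, task_id TEXT, xcom_key TEXT,
            execution_date TEXT, value TEXT
        );
        INSERT INTO dag_run VALUES ('example_dag', '2020-06-09', 'manual_1');
        INSERT INTO dag_run VALUES ('example_dag', '2020-06-10', 'manual_2');
        INSERT INTO xcom VALUES ('example_dag', 'push', 'result', '2020-06-09', '42');
        INSERT INTO xcom VALUES ('example_dag', 'push', 'result', '2020-06-10', '43');
        """
    )
    return conn


def get_xcom_value(conn, dag_id, dag_run_id, task_id, xcom_key):
    """Return one XCom value for a DAG run, or None if nothing matches.

    The join condition is spelled out explicitly on the repeated columns
    (dag_id, execution_date) instead of relying on a declared relationship.
    """
    row = conn.execute(
        """
        SELECT x.value
        FROM xcom AS x
        JOIN dag_run AS dr
          ON x.dag_id = dr.dag_id
         AND x.execution_date = dr.execution_date
        WHERE x.dag_id = ? AND x.task_id = ? AND x.xcom_key = ?
          AND dr.run_id = ?
        """,
        (dag_id, task_id, xcom_key, dag_run_id),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    demo = build_demo_db()
    print(get_xcom_value(demo, "example_dag", "manual_1", "push", "result"))
```

Writing the join condition by hand keeps the composite-key matching visible at the call site, which is the kind of control over the generated query that the comment argues for.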
[GitHub] [airflow] casassg commented on issue #8058: [AIP-31] Add lazy way to get Airflow context from a wrapped function
casassg commented on issue #8058: URL: https://github.com/apache/airflow/issues/8058#issuecomment-641650049 Any updates on this?
[GitHub] [airflow] kaxil commented on pull request #8186: Add support for AWS Secrets Manager as Secrets Backend
kaxil commented on pull request #8186: URL: https://github.com/apache/airflow/pull/8186#issuecomment-641627856
> @kaxil Just curious if the choice to not support encrypted parameters was just to simplify the implementation or if there's another motivation?
Yes, it was just to simplify the implementation and get the basic functionality in :) Now that the scaffolding is done and the basic features are working, I would be happy to accept PRs that add more features.
[GitHub] [airflow] robmoore commented on pull request #8186: Add support for AWS Secrets Manager as Secrets Backend
robmoore commented on pull request #8186: URL: https://github.com/apache/airflow/pull/8186#issuecomment-641627087 @kaxil Just curious if the choice to not support encrypted parameters was just to simplify the implementation or if there's another motivation?
[GitHub] [airflow] casassg commented on a change in pull request #8962: [AIRFLOW-8057] [AIP-31] Add @task decorator
casassg commented on a change in pull request #8962: URL: https://github.com/apache/airflow/pull/8962#discussion_r437767677 ## File path: airflow/operators/python.py ## @@ -145,6 +149,131 @@ def execute_callable(self): return self.python_callable(*self.op_args, **self.op_kwargs) +class _PythonFunctionalOperator(BaseOperator): +""" +Wraps a Python callable and captures args/kwargs when called for execution. + +:param python_callable: A reference to an object that is callable +:type python_callable: python callable +:param op_kwargs: a dictionary of keyword arguments that will get unpacked +in your function +:type op_kwargs: dict (templated) +:param op_args: a list of positional arguments that will get unpacked when +calling your callable +:type op_args: list (templated) +:param multiple_outputs: if set, function return value will be +unrolled to multiple XCom values. Dict will unroll to xcom values with keys as keys. +Defaults to False. +:type multiple_outputs: bool +""" + +template_fields = ('op_args', 'op_kwargs') +ui_color = '#ffefeb' + +# since we won't mutate the arguments, we should just do the shallow copy +# there are some cases we can't deepcopy the objects(e.g protobuf). 
+shallow_copy_attrs = ('python_callable',) + +@apply_defaults +def __init__( +self, +python_callable: Callable, +op_args: Tuple[Any], +op_kwargs: Dict[str, Any], +multiple_outputs: bool = False, +**kwargs +) -> None: +kwargs['task_id'] = self._get_unique_task_id(kwargs['task_id'], kwargs.get('dag', None)) +super().__init__(**kwargs) +self.python_callable = python_callable + +# Check that arguments can be binded +signature(python_callable).bind(*op_args, **op_kwargs) +self.multiple_outputs = multiple_outputs +self.op_args = op_args +self.op_kwargs = op_kwargs + +@staticmethod +def _get_unique_task_id(task_id: str, dag: Optional[DAG]) -> str: +dag = dag or DagContext.get_current_dag() +if not dag or task_id not in dag.task_ids: +return task_id +core = re.split(r'__\d+$', task_id)[0] +suffixes = sorted( +[int(re.split(r'^.+__', task_id)[1]) + for task_id in dag.task_ids + if re.match(rf'^{core}__\d+$', task_id)] +) +if not suffixes: +return f'{core}__1' +return f'{core}__{suffixes[-1] + 1}' + +@staticmethod +def validate_python_callable(python_callable): +"""Validate that python callable can be wrapped by operator. +Raises exception if invalid. + +:param python_callable: Python object to be validated +:raises: TypeError, AirflowException +""" +if not callable(python_callable): +raise TypeError('`python_callable` param must be callable') +if 'self' in signature(python_callable).parameters.keys(): +raise AirflowException('@task does not support methods') Review comment: It will fail as `@classmethod` does not return a callable it seems (surprising to me as well, but it fails on checking if python_callable is a callable). ## File path: airflow/operators/python.py ## @@ -145,6 +149,131 @@ def execute_callable(self): return self.python_callable(*self.op_args, **self.op_kwargs) +class _PythonFunctionalOperator(BaseOperator): +""" +Wraps a Python callable and captures args/kwargs when called for execution. 
+ +:param python_callable: A reference to an object that is callable +:type python_callable: python callable +:param op_kwargs: a dictionary of keyword arguments that will get unpacked +in your function +:type op_kwargs: dict (templated) +:param op_args: a list of positional arguments that will get unpacked when +calling your callable +:type op_args: list (templated) +:param multiple_outputs: if set, function return value will be +unrolled to multiple XCom values. Dict will unroll to xcom values with keys as keys. +Defaults to False. +:type multiple_outputs: bool +""" + +template_fields = ('op_args', 'op_kwargs') +ui_color = '#ffefeb' + +# since we won't mutate the arguments, we should just do the shallow copy +# there are some cases we can't deepcopy the objects(e.g protobuf). +shallow_copy_attrs = ('python_callable',) + +@apply_defaults +def __init__( +self, +python_callable: Callable, +op_args: Tuple[Any], +op_kwargs: Dict[str, Any], +multiple_outputs: bool = False, +**kwargs +) -> None: +kwargs['task_id'] = self._get_unique_task_id(kwargs['task_id'], kwargs.get('dag', None)) +super().__init__(**kwargs) Review comment: Nvm, seems this proposed change fails
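The suffixing scheme in the quoted `_get_unique_task_id` hunk can be tried in isolation. This is a standalone sketch, not the PR's code: `unique_task_id` is a hypothetical free function that takes a plain collection of existing IDs instead of a DAG, and it adds `re.escape` around the base name, an assumption made here so that IDs containing regex metacharacters (such as `.`) are matched literally.

```python
import re


def unique_task_id(task_id, existing_ids):
    """Return task_id unchanged if unused, else base__N with the next free N.

    Hypothetical stand-in for the operator's static method: existing_ids
    plays the role of dag.task_ids.
    """
    existing = set(existing_ids)
    if task_id not in existing:
        return task_id

    # Strip a trailing "__<number>" so "extract__3" and "extract" share a base.
    core = re.split(r"__\d+$", task_id)[0]

    # Collect the numeric suffixes already taken for this base name.
    pattern = re.compile(rf"^{re.escape(core)}__(\d+)$")
    suffixes = [int(m.group(1)) for m in map(pattern.match, existing) if m]

    next_suffix = max(suffixes) + 1 if suffixes else 1
    return f"{core}__{next_suffix}"


if __name__ == "__main__":
    print(unique_task_id("extract", ["extract", "extract__1"]))
```

Taking the maximum existing suffix plus one, rather than the first gap, mirrors the quoted hunk's use of the last element of the sorted suffix list.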
[GitHub] [airflow] jkbngl closed pull request #9201: Added MSSql To Oracle Transfer Operator
jkbngl closed pull request #9201: URL: https://github.com/apache/airflow/pull/9201
[jira] [Commented] (AIRFLOW-3391) Upgrade pendulum to latest major version
[ https://issues.apache.org/jira/browse/AIRFLOW-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129838#comment-17129838 ] ASF GitHub Bot commented on AIRFLOW-3391: - crhyatt closed pull request #7626: URL: https://github.com/apache/airflow/pull/7626 > Upgrade pendulum to latest major version > > > Key: AIRFLOW-3391 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3391 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Joshua Carp >Assignee: Bolke de Bruin >Priority: Trivial > > We're a major version behind on pendulum; we should upgrade to >=2.0.
[jira] [Commented] (AIRFLOW-3391) Upgrade pendulum to latest major version
[ https://issues.apache.org/jira/browse/AIRFLOW-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129839#comment-17129839 ] ASF GitHub Bot commented on AIRFLOW-3391: - crhyatt commented on pull request #7626: URL: https://github.com/apache/airflow/pull/7626#issuecomment-641607055 @ashb I messed something up with a merge. I am going to close this PR and just use https://github.com/apache/airflow/pull/9184. > Upgrade pendulum to latest major version > > > Key: AIRFLOW-3391 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3391 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Joshua Carp >Assignee: Bolke de Bruin >Priority: Trivial > > We're a major version behind on pendulum; we should upgrade to >=2.0.
[GitHub] [airflow] crhyatt closed pull request #7626: [AIRFLOW-3391] Upgrade pendulum to latest major version
crhyatt closed pull request #7626: URL: https://github.com/apache/airflow/pull/7626
[GitHub] [airflow] crhyatt commented on pull request #7626: [AIRFLOW-3391] Upgrade pendulum to latest major version
crhyatt commented on pull request #7626: URL: https://github.com/apache/airflow/pull/7626#issuecomment-641607055 @ashb I messed something up with a merge. I am going to close this PR and just use https://github.com/apache/airflow/pull/9184.
[jira] [Commented] (AIRFLOW-3973) `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is used for the internal database
[ https://issues.apache.org/jira/browse/AIRFLOW-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129818#comment-17129818 ] ASF GitHub Bot commented on AIRFLOW-3973: - eeshugerman commented on pull request #9182: URL: https://github.com/apache/airflow/pull/9182#issuecomment-641595881 :+1: I see, thanks, I was wondering how that works. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is > used for the internal database > --- > > Key: AIRFLOW-3973 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3973 > Project: Apache Airflow > Issue Type: Bug >Reporter: Elliott Shugerman >Assignee: Elliott Shugerman >Priority: Minor > Fix For: 2.0.0 > > > h2. Notes: > * This does not occur if the database is already initialized. If it is, run > `resetdb` instead to observe the bug. > * This does not occur with the default SQLite database. > h2. Example > {{ERROR [airflow.models.DagBag] Failed to import: > /home/elliott/clean-airflow/dags/dag.py Traceback (most recent call last): > File > "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py", > line 1236, in _execute_context cursor, statement, parameters, context File > "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/default.py", > line 536, in do_execute cursor.execute(statement, parameters) > psycopg2.ProgrammingError: relation "variable" does not exist LINE 2: FROM > variable}} > h2. Explanation > The first thing {{airflow initdb}} does is run the Alembic migrations. All > migrations are run in one transaction. Most tables, including the > {{variable}} table, are defined in the initial migration. 
A [later > migration|https://github.com/apache/airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py] > imports and initializes {{models.DagBag}}. Upon initialization, {{DagBag}} > calls its {{collect_dags}} method, which scans the DAGs directory and > attempts to load all DAGs it finds. When it loads a DAG that uses a > {{Variable}}, it will query the database to see if that {{Variable}} is > defined in the {{variable}} table. It's not clear to me how exactly the > connection for that query is created, but I think it is apparent that it does > _not_ use the same transaction that is used to run the migrations. Since the > migrations are not yet complete, and all migrations are run in one > transaction, the migration that creates the {{variable}} table has not yet > been committed, and therefore the table does not exist to any other > connection/transaction. This raises {{ProgrammingError}}, which is caught and > logged by {{collect_dags}}. > > h2. Proposed Solution > Run each Alembic migration in its own transaction. I will open a pull request > which accomplishes this shortly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
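The visibility rule behind this explanation (a table created inside an uncommitted transaction does not exist for any other connection) can be reproduced with nothing but the standard library. This is a hedged sketch, not Airflow or Alembic code: SQLite is used only because it ships with Python, even though the issue notes that the actual bug does not occur with Airflow's default SQLite setup. The names `migrator`, `reader`, `table_visible`, and `demo` are hypothetical. The first connection stands in for the migration transaction, and the second stands in for the query issued while a DAG that uses `Variable` is being loaded.

```python
import os
import sqlite3
import tempfile


def table_visible(conn, table):
    """Return True if `table` can be queried on this connection."""
    try:
        conn.execute(f"SELECT 1 FROM {table} LIMIT 1")
        return True
    except sqlite3.OperationalError as exc:
        # Only swallow the specific "table is missing" failure.
        if "no such table" in str(exc):
            return False
        raise


def demo():
    """Return (visible_before_commit, visible_after_commit) for a second connection."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    # isolation_level=None puts sqlite3 in autocommit mode, so the explicit
    # BEGIN/COMMIT statements below fully control the transaction boundaries.
    migrator = sqlite3.connect(path, isolation_level=None)
    reader = sqlite3.connect(path, isolation_level=None)
    try:
        migrator.execute("BEGIN")
        migrator.execute("CREATE TABLE variable (name TEXT, val TEXT)")
        before = table_visible(reader, "variable")  # migration not committed yet
        migrator.execute("COMMIT")
        after = table_visible(reader, "variable")  # committed, now visible
        return before, after
    finally:
        migrator.close()
        reader.close()
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Committing after each migration, as the proposed solution suggests, corresponds to moving the `COMMIT` ahead of any point where another connection needs to see the new table.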
[GitHub] [airflow] eeshugerman commented on pull request #9182: [AIRFLOW-3973] Commit after each alembic migration
eeshugerman commented on pull request #9182: URL: https://github.com/apache/airflow/pull/9182#issuecomment-641595881 :+1: I see, thanks, I was wondering how that works.
[GitHub] [airflow] boring-cyborg[bot] commented on pull request #9201: Added MSSql To Oracle Transfer Operator
boring-cyborg[bot] commented on pull request #9201: URL: https://github.com/apache/airflow/pull/9201#issuecomment-641592925 Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst). Here are some useful points: - Pay attention to the quality of your code (flake8, pylint and type annotations). Our [pre-commits](https://github.com/apache/airflow/blob/master/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that. - In case of a new feature, add useful documentation (in docstrings or in the `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/master/docs/howto/custom-operator.rst). Consider adding an example DAG that shows how users should use it. - Consider using the [Breeze environment](https://github.com/apache/airflow/blob/master/BREEZE.rst) for testing locally; it’s a heavy Docker image, but it ships with a working Airflow and a lot of integrations. - Be patient and persistent. It might take some time to get a review or the final approval from Committers. - Be sure to read the [Airflow Coding style](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#coding-style-and-best-practices). Apache Airflow is a community-driven project and together we are making it better. In case of doubts, contact the developers at: Mailing List: d...@airflow.apache.org Slack: https://apache-airflow-slack.herokuapp.com/
[GitHub] [airflow] jkbngl opened a new pull request #9201: Added MSSql To Oracle Transfer Operator
jkbngl opened a new pull request #9201: URL: https://github.com/apache/airflow/pull/9201 --- Make sure to mark the boxes below before creating PR: [x] Added an operator that builds on the OracleToOracleTransfer operator to extract data from a Microsoft SQL Server database into an Oracle database. (This is my first pull request; I am sure I can still improve a lot, so please let me know. I read the guidelines and did my best to follow them.) - [X] Description above provides context of the change - [X] Unit tests coverage for changes (not needed for documentation changes) -> not sure if this applies to operators as well; I don't see a test for every operator in https://github.com/apache/airflow/tree/master/tests/operators - [X] Target Github ISSUE in description if exists -> doesn't exist - [X] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)" - [X] Relevant documentation is updated including usage instructions. -> I don't think any is needed for operators, as it should be auto-generated; let me know if I need to do anything here - [X] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example). --- In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed. In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md). Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.
[airflow] branch master updated (c18f4c0 -> 82c8343)
This is an automated email from the ASF dual-hosted git repository. potiuk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/airflow.git. from c18f4c0 Fix typo in BREEZE.rst (#9199) add 82c8343 Support additional apt dependencies (#9189) No new revisions were added by this update. Summary of changes: Dockerfile| 12 ++-- Dockerfile.ci | 10 +- IMAGES.rst| 44 docs/concepts.rst | 2 +- 4 files changed, 64 insertions(+), 4 deletions(-)
[GitHub] [airflow] potiuk merged pull request #9189: Support additional apt dependencies
potiuk merged pull request #9189: URL: https://github.com/apache/airflow/pull/9189
[jira] [Commented] (AIRFLOW-3391) Upgrade pendulum to latest major version
[ https://issues.apache.org/jira/browse/AIRFLOW-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129771#comment-17129771 ] ASF GitHub Bot commented on AIRFLOW-3391: - ashb commented on a change in pull request #7626: URL: https://github.com/apache/airflow/pull/7626#discussion_r437697053 ## File path: tests/www/test_views.py ## @@ -883,7 +882,7 @@ def test_run_with_not_runnable_states(self, get_default_executor_function): self.check_content_in_response('', resp, resp_code=200) msg = "Task is in the {} state which is not a valid state for execution. " \ - .format(state) + "The task must be cleared in order to be run" +.format(state) + "The task must be cleared in order to be run" Review comment: You have a lot of whitespace-only/formatting-only changes here. Please remove them from this PR (having unrelated changes like this in a PR makes it harder to back-port fixes). > Upgrade pendulum to latest major version > > > Key: AIRFLOW-3391 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3391 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Joshua Carp >Assignee: Bolke de Bruin >Priority: Trivial > > We're a major version behind on pendulum; we should upgrade to >=2.0.
[GitHub] [airflow] ashb commented on a change in pull request #7626: [AIRFLOW-3391] Upgrade pendulum to latest major version
ashb commented on a change in pull request #7626: URL: https://github.com/apache/airflow/pull/7626#discussion_r437697053 ## File path: tests/www/test_views.py ## @@ -883,7 +882,7 @@ def test_run_with_not_runnable_states(self, get_default_executor_function): self.check_content_in_response('', resp, resp_code=200) msg = "Task is in the {} state which is not a valid state for execution. " \ - .format(state) + "The task must be cleared in order to be run" +.format(state) + "The task must be cleared in order to be run" Review comment: You have a lot of whitespace-only/formatting-only changes here. Please remove them from this PR (having unrelated changes like this in a PR makes it harder to back-port fixes).
[GitHub] [airflow] ashb commented on pull request #9195: Update pre-commit-hooks repo version
ashb commented on pull request #9195: URL: https://github.com/apache/airflow/pull/9195#issuecomment-641549005 isort seems to be wreaking havoc with UPDATING.md; we might have to avoid the latest version (or else tune it down):
```diff
diff --git a/docs/howto/operator/gcp/gcs_to_gcs.rst b/docs/howto/operator/gcp/gcs_to_gcs.rst
index e17bedf..bb3fda1 100644
--- a/docs/howto/operator/gcp/gcs_to_gcs.rst
+++ b/docs/howto/operator/gcp/gcs_to_gcs.rst
@@ -40,11 +40,9 @@ perform this task faster and more economically. The economic effects are especia
 Airflow is not hosted in Google Cloud Platform, because these operators reduce egress traffic.
 These operators modify source objects if the option that specifies whether objects should be deleted
-from the source after they are transferred to the sink is enabled.
 When you use the Google Cloud Data Transfer service, you can specify whether overwriting objects that already exist in the sink is allowed, whether objects that exist only in the sink should be deleted, or whether objects should be deleted
-from the source after they are transferred to the sink.
 Source objects can be specified using include and exclusion prefixes, as well as based on the file modification date.
```
It seems to think _any_ line starting with `from` is an import line! We need it to operate only on .py files.
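The failure mode described in this comment, and the suggested fix of restricting the hook to `.py` files, can be illustrated with a toy model. This is explicitly not isort's real detection logic or configuration: `looks_like_import`, `should_sort_imports`, and `lines_at_risk` are hypothetical helpers that only show why a prefix-based check misfires on prose and how a file-extension filter avoids it.

```python
from pathlib import PurePosixPath


def looks_like_import(line):
    """Naive prefix check (an assumption, not isort's actual rule)."""
    return line.lstrip().startswith(("from ", "import "))


def should_sort_imports(path):
    """Only Python source files are eligible for import sorting."""
    return PurePosixPath(path).suffix == ".py"


def lines_at_risk(path, lines):
    """Return the lines an import sorter would touch for this file."""
    if not should_sort_imports(path):
        return []  # documentation and other non-Python files are left alone
    return [line for line in lines if looks_like_import(line)]


if __name__ == "__main__":
    prose = "from the source after they are transferred to the sink."
    # The prose line passes the naive check, which is the reported problem.
    print(looks_like_import(prose))
    # Filtering on the extension keeps the .rst file untouched.
    print(lines_at_risk("docs/howto/operator/gcp/gcs_to_gcs.rst", [prose]))
```

In a pre-commit setup the same restriction would normally be expressed as a file filter on the hook rather than in Python code; the sketch above only models the effect.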
[jira] [Commented] (AIRFLOW-3973) `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is used for the internal database
[ https://issues.apache.org/jira/browse/AIRFLOW-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129761#comment-17129761 ] ASF GitHub Bot commented on AIRFLOW-3973: - ashb edited a comment on pull request #9182: URL: https://github.com/apache/airflow/pull/9182#issuecomment-641543813 Ah, we don't need a PR in this case -- just marking the original PR for the milestone 1.10.11 is enough and we'd cherry pick it. Since you've got it opened I'll merge this though. (We like cherry-picks to have the "(cherry picked from commit ...)" footer that `-x` adds, and our tooling to work out which commits are yet to be backported for a release relies upon the `(#)` on the end of the commit, so I have manually done both of these for this PR.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is > used for the internal database > --- > > Key: AIRFLOW-3973 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3973 > Project: Apache Airflow > Issue Type: Bug >Reporter: Elliott Shugerman >Assignee: Elliott Shugerman >Priority: Minor > Fix For: 2.0.0 > > > h2. Notes: > * This does not occur if the database is already initialized. If it is, run > `resetdb` instead to observe the bug. > * This does not occur with the default SQLite database. > h2. 
Example > {{ERROR [airflow.models.DagBag] Failed to import: > /home/elliott/clean-airflow/dags/dag.py Traceback (most recent call last): > File > "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py", > line 1236, in _execute_context cursor, statement, parameters, context File > "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/default.py", > line 536, in do_execute cursor.execute(statement, parameters) > psycopg2.ProgrammingError: relation "variable" does not exist LINE 2: FROM > variable}} > h2. Explanation > The first thing {{airflow initdb}} does is run the Alembic migrations. All > migrations are run in one transaction. Most tables, including the > {{variable}} table, are defined in the initial migration. A [later > migration|https://github.com/apache/airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py] > imports and initializes {{models.DagBag}}. Upon initialization, {{DagBag}} > calls its {{collect_dags}} method, which scans the DAGs directory and > attempts to load all DAGs it finds. When it loads a DAG that uses a > {{Variable}}, it will query the database to see if that {{Variable}} is > defined in the {{variable}} table. It's not clear to me how exactly the > connection for that query is created, but I think it is apparent that it does > _not_ use the same transaction that is used to run the migrations. Since the > migrations are not yet complete, and all migrations are run in one > transaction, the migration that creates the {{variable}} table has not yet > been committed, and therefore the table does not exist to any other > connection/transaction. This raises {{ProgrammingError}}, which is caught and > logged by {{collect_dags}}. > > h2. Proposed Solution > Run each Alembic migration in its own transaction. I will open a pull request > which accomplishes this shortly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [airflow] ashb edited a comment on pull request #9182: [AIRFLOW-3973] Commit after each alembic migration
ashb edited a comment on pull request #9182: URL: https://github.com/apache/airflow/pull/9182#issuecomment-641543813 Ah, we don't need a PR in this case -- just marking the original PR for the milestone 1.10.11 is enough and we'd cherry pick it. Since you've got it opened I'll merge this though. (We like cherry-picks to have the "(cherry picked from commit ...)" footer that `-x` adds, and our tooling to work out which commits are yet to be backported for a release relies upon the `(#)` on the end of the commit, so I have manually done both of these for this PR.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
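The two commit-message conventions mentioned in this comment (a trailing PR number on the subject line, and the footer added by `git cherry-pick -x`) can be checked mechanically. The following is a minimal sketch, not the project's actual backport tooling; the function names and regular expressions are assumptions made for illustration.

```python
import re
from typing import Optional

# Subject lines are expected to end in "(#<PR number>)".
PR_SUFFIX = re.compile(r"\(#(\d+)\)\s*$")
# `git cherry-pick -x` appends a line of this form to the commit message.
CHERRY_PICK_FOOTER = re.compile(
    r"^\(cherry picked from commit ([0-9a-f]{40})\)$", re.MULTILINE
)


def pr_number(subject: str) -> Optional[int]:
    """Return the PR number at the end of a commit subject, or None if absent."""
    match = PR_SUFFIX.search(subject)
    return int(match.group(1)) if match else None


def cherry_picked_from(message: str) -> Optional[str]:
    """Return the original commit hash recorded by `-x`, or None if absent."""
    match = CHERRY_PICK_FOOTER.search(message)
    return match.group(1) if match else None


subject = "[AIRFLOW-3973] Commit after each alembic migration (#4797)"
message = (
    subject
    + "\n\nThis commit corrects this by running each migration in a separate transaction.\n\n"
    + "(cherry picked from commit ea95e9c7236969acc807c65de0f12633d04753a0)"
)
```

A commit whose subject lacks the suffix yields `None`, which is the case a release script would need to flag.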
[jira] [Commented] (AIRFLOW-3973) `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is used for the internal database
[ https://issues.apache.org/jira/browse/AIRFLOW-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129758#comment-17129758 ] ASF GitHub Bot commented on AIRFLOW-3973: - ashb merged pull request #9182: URL: https://github.com/apache/airflow/pull/9182
[jira] [Commented] (AIRFLOW-3973) `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is used for the internal database
[ https://issues.apache.org/jira/browse/AIRFLOW-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129759#comment-17129759 ] ASF subversion and git services commented on AIRFLOW-3973: -- Commit 5b48a5394ecf5aa1f2b50a00807e6149ade21968 in airflow's branch refs/heads/v1-10-stable from Elliott Shugerman [ https://gitbox.apache.org/repos/asf?p=airflow.git;h=5b48a53 ] [AIRFLOW-3973] Commit after each alembic migration (#4797) If `Variable`s are used in DAGs, and Postgres is used for the internal database, a fresh `$ airflow initdb` (or `$ airflow resetdb`) spams the logs with error messages (but does not fail). This commit corrects this by running each migration in a separate transaction. Co-authored-by: Elliott Shugerman (cherry picked from commit ea95e9c7236969acc807c65de0f12633d04753a0)
[GitHub] [airflow] ashb merged pull request #9182: [AIRFLOW-3973] Commit after each alembic migration
ashb merged pull request #9182: URL: https://github.com/apache/airflow/pull/9182
[airflow] branch v1-10-stable updated: [AIRFLOW-3973] Commit after each alembic migration (#4797)
This is an automated email from the ASF dual-hosted git repository. ash pushed a commit to branch v1-10-stable in repository https://gitbox.apache.org/repos/asf/airflow.git The following commit(s) were added to refs/heads/v1-10-stable by this push: new 5b48a53 [AIRFLOW-3973] Commit after each alembic migration (#4797) 5b48a53 is described below commit 5b48a5394ecf5aa1f2b50a00807e6149ade21968 Author: Elliott Shugerman AuthorDate: Tue Jun 9 14:12:47 2020 -0600 [AIRFLOW-3973] Commit after each alembic migration (#4797) If `Variable`s are used in DAGs, and Postgres is used for the internal database, a fresh `$ airflow initdb` (or `$ airflow resetdb`) spams the logs with error messages (but does not fail). This commit corrects this by running each migration in a separate transaction. Co-authored-by: Elliott Shugerman (cherry picked from commit ea95e9c7236969acc807c65de0f12633d04753a0) --- airflow/migrations/env.py | 1 + 1 file changed, 1 insertion(+) diff --git a/airflow/migrations/env.py b/airflow/migrations/env.py index 2de0c2f..234c795 100644 --- a/airflow/migrations/env.py +++ b/airflow/migrations/env.py @@ -81,6 +81,7 @@ def run_migrations_online(): with connectable.connect() as connection: context.configure( connection=connection, +transaction_per_migration=True, target_metadata=target_metadata, compare_type=COMPARE_TYPE, render_as_batch=True
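Why committing after each migration helps can be shown with a small stand-alone sketch. It uses the standard-library `sqlite3` module purely to illustrate that a table created in an uncommitted transaction is invisible to a second connection; it is not a reproduction of the Airflow bug (the report says it does not occur with SQLite there), and `run_migrations` and `table_visible` are hypothetical helpers, not Alembic or Airflow APIs.

```python
import os
import sqlite3
import tempfile


def table_visible(db_path: str, table: str) -> bool:
    """Check from a separate connection whether `table` can be queried."""
    probe = sqlite3.connect(db_path)
    try:
        probe.execute(f"SELECT 1 FROM {table} LIMIT 1")
        return True
    except sqlite3.OperationalError:
        # Raised as "no such table" when the creating transaction is uncommitted.
        return False
    finally:
        probe.close()


def run_migrations(db_path: str, transaction_per_migration: bool) -> bool:
    """Run two toy migrations; return what the second one sees via another connection."""
    # isolation_level=None lets this sketch issue BEGIN/COMMIT explicitly.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute("BEGIN")
        # Toy migration 1: create the table.
        conn.execute("CREATE TABLE variable (key TEXT, val TEXT)")
        if transaction_per_migration:
            conn.execute("COMMIT")
            conn.execute("BEGIN")
        # Toy migration 2: stands in for code that queries through its own connection.
        seen = table_visible(db_path, "variable")
        conn.execute("COMMIT")
        return seen
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    single = run_migrations(os.path.join(tmp, "single.db"), transaction_per_migration=False)
    per_migration = run_migrations(os.path.join(tmp, "per.db"), transaction_per_migration=True)
```

With one enclosing transaction the probe cannot see the table; with a commit between the two toy migrations it can, which mirrors the intent of the one-line `transaction_per_migration=True` change in the diff above.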
[jira] [Commented] (AIRFLOW-3973) `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is used for the internal database
[ https://issues.apache.org/jira/browse/AIRFLOW-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129757#comment-17129757 ] ASF GitHub Bot commented on AIRFLOW-3973: - ashb commented on pull request #9182: URL: https://github.com/apache/airflow/pull/9182#issuecomment-641543813 Ah, we don't need a PR in this case -- just marking the original PR for the milestone 1.10.11 is enough and we'd cherry pick it. Since you've got it opened I'll merge this though.
[GitHub] [airflow] ashb commented on pull request #9182: [AIRFLOW-3973] Commit after each alembic migration
ashb commented on pull request #9182: URL: https://github.com/apache/airflow/pull/9182#issuecomment-641543813 Ah, we don't need a PR in this case -- just marking the original PR for the milestone 1.10.11 is enough and we'd cherry pick it. Since you've got it opened I'll merge this though.
[GitHub] [airflow] CodingJonas commented on pull request #8617: Extend docker swarm operator configuration
CodingJonas commented on pull request #8617: URL: https://github.com/apache/airflow/pull/8617#issuecomment-641541320 Yes, I guess I should have structured this PR better in the first place. I'll split the commit into more reviewable commits in the next few days!
[GitHub] [airflow] CodingJonas commented on a change in pull request #8617: Extend docker swarm operator configuration
CodingJonas commented on a change in pull request #8617: URL: https://github.com/apache/airflow/pull/8617#discussion_r437686189 ## File path: airflow/providers/docker/operators/docker_swarm.py ## @@ -93,20 +95,37 @@ class DockerSwarmOperator(DockerOperator): Supported only if the Docker engine is using json-file or journald logging drivers. The `tty` parameter should be set to use this with Python applications. :type enable_logging: bool +:param configs: List of ConfigReferences that will be exposed to the service +:type configs: list + Example: [{'source': my-conf, 'target': where/config/should/be/.env}] +:param secrets: List of SecretReference to be made available inside the containers +:type secrets: list +:param networks: List of network names or IDs to attach the service to +:type networks: list """ @apply_defaults def __init__( self, image, enable_logging=True, +networks=None, +configs=None, +secrets=None, *args, **kwargs): super().__init__(image=image, *args, **kwargs) self.enable_logging = enable_logging +self.networks = networks +self.configs = configs +self.secrets = secrets +self.service_name = self._get_service_name() self.service = None +def _get_service_name(self): +return '%s__af_%s' % (self.task_id, get_random_string()) Review comment: Good point, I looked into it but didn't find any mentions of length limitations. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
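If a length limit on service names did turn out to matter, the naming scheme from the diff could be made defensive along the following lines. This is a sketch under stated assumptions: `get_random_string` here is a local stand-in for the helper used in the PR, and `max_length=63` is a hypothetical cap chosen for illustration, not a documented Docker limit.

```python
import random
import string


def get_random_string(length: int = 8) -> str:
    """Local stand-in for the PR's helper: lowercase letters and digits."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(random.choices(alphabet, k=length))


def build_service_name(task_id: str, max_length: int = 63) -> str:
    """Build '<task_id>__af_<random>' and truncate task_id so the result fits max_length.

    max_length is an assumed cap for illustration only.
    """
    suffix = "__af_" + get_random_string()
    # Keep the random suffix intact; only the task_id prefix is shortened.
    return task_id[: max_length - len(suffix)] + suffix


short_name = build_service_name("my_task")
long_name = build_service_name("x" * 200)
```

Truncating the prefix rather than the suffix keeps names unique even for very long task ids.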
[GitHub] [airflow] nullhack edited a comment on issue #9176: [Feature request] Add reverse run of DAGs
nullhack edited a comment on issue #9176: URL: https://github.com/apache/airflow/issues/9176#issuecomment-641536838 > I think `back_run=True` can work only if `catchup=True` but `depends_on_past=False` > `catchup=False` means past runs are not necessary. > Can anyone correct me if I am wrong? Yes, this is what I meant, thanks for spotting that. I'm changing the description
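The condition agreed on in this exchange can be written down as a small predicate. Note that `back_run` is only the flag proposed in this feature request, not an existing Airflow option, and `back_run_allowed` is a hypothetical name used here for illustration.

```python
def back_run_allowed(catchup: bool, depends_on_past: bool) -> bool:
    """Reverse-order runs only make sense when past runs are scheduled at all
    (catchup=True) and no run has to wait for its predecessor
    (depends_on_past=False)."""
    return catchup and not depends_on_past


# Enumerate all four combinations of the two existing settings.
combinations = {
    (catchup, depends_on_past): back_run_allowed(catchup, depends_on_past)
    for catchup in (True, False)
    for depends_on_past in (True, False)
}
```

Only one of the four combinations permits the proposed behaviour.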
[GitHub] [airflow] nullhack commented on issue #9176: [Feature request] Add reverse run of DAGs
nullhack commented on issue #9176: URL: https://github.com/apache/airflow/issues/9176#issuecomment-641536838 > I think `back_run=True` can work only if `catchup=True` but `depends_on_past=False` > `catchup=False` means past runs are not necessary. > Can anyone correct me if I am wrong? Yes, this is what I meant, thanks for spotting that. I'm changing the question
[GitHub] [airflow] eeshugerman commented on pull request #9182: [AIRFLOW-3973] Commit after each alembic migration
eeshugerman commented on pull request #9182: URL: https://github.com/apache/airflow/pull/9182#issuecomment-641534402 This is already in `master`. I'm trying to backport to v1.10. Please see the discussion here: https://github.com/apache/airflow/pull/4797 (Sorry for the confusion, I should have added a note of explanation to these.)
[jira] [Commented] (AIRFLOW-3973) `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is used for the internal database
[ https://issues.apache.org/jira/browse/AIRFLOW-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129748#comment-17129748 ] ASF GitHub Bot commented on AIRFLOW-3973: - eeshugerman commented on pull request #9182: URL: https://github.com/apache/airflow/pull/9182#issuecomment-641534402 This is already in `master`. I'm trying to backport to v1.10. Please see the discussion here: https://github.com/apache/airflow/pull/4797 (Sorry for the confusion, I should have added a note of explanation to these.)
[airflow] branch v1-10-test updated: fixup! Kubernetes Cluster is started on host not in the container (#8265)
This is an automated email from the ASF dual-hosted git repository. potiuk pushed a commit to branch v1-10-test in repository https://gitbox.apache.org/repos/asf/airflow.git The following commit(s) were added to refs/heads/v1-10-test by this push: new 782d916 fixup! Kubernetes Cluster is started on host not in the container (#8265) 782d916 is described below commit 782d916e0b86eff5fad48a78bf8f4c912b7570e3 Author: Jarek Potiuk AuthorDate: Tue Jun 9 20:52:30 2020 +0200 fixup! Kubernetes Cluster is started on host not in the container (#8265) --- airflow/www_rbac/views.py| 3 ++- docs/conf.py | 1 - scripts/ci/kubernetes/docker/airflow-test-env-init-db.sh | 2 +- scripts/ci/kubernetes/docker/bootstrap.sh| 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/airflow/www_rbac/views.py b/airflow/www_rbac/views.py index 453cf83..ef2b751 100644 --- a/airflow/www_rbac/views.py +++ b/airflow/www_rbac/views.py @@ -53,6 +53,8 @@ from sqlalchemy import and_, desc, func, or_, union_all from sqlalchemy.orm import joinedload from wtforms import SelectField, validators +import nvd3 + import airflow from airflow import models, jobs from airflow import settings, configuration @@ -69,7 +71,6 @@ from airflow.utils.dates import infer_time_unit, scale_time_units from airflow.utils.db import provide_session, create_session from airflow.utils.helpers import alchemy_to_dict, render_log_filename from airflow.utils.state import State -from airflow._vendor import nvd3 from airflow.www_rbac import utils as wwwutils from airflow.www_rbac.app import app, appbuilder from airflow.www_rbac.decorators import action_logging, gzipped, has_dag_access diff --git a/docs/conf.py b/docs/conf.py index b32a30a..67010b3 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -185,7 +185,6 @@ release = airflow.__version__ # List of patterns, relative to source directory, that match files and # directories to ignore when looking for source files. 
exclude_patterns = [ -'_api/airflow/_vendor', '_api/airflow/api', '_api/airflow/bin', '_api/airflow/config_templates', diff --git a/scripts/ci/kubernetes/docker/airflow-test-env-init-db.sh b/scripts/ci/kubernetes/docker/airflow-test-env-init-db.sh index c790160..70c710e 100755 --- a/scripts/ci/kubernetes/docker/airflow-test-env-init-db.sh +++ b/scripts/ci/kubernetes/docker/airflow-test-env-init-db.sh @@ -26,7 +26,7 @@ echo # Init and upgrade the database to latest heads cd "${AIRFLOW_SOURCES}"/airflow || exit 1 -airflow db init +airflow initdb alembic upgrade heads echo diff --git a/scripts/ci/kubernetes/docker/bootstrap.sh b/scripts/ci/kubernetes/docker/bootstrap.sh index 9099b6b..7f9ca10 100755 --- a/scripts/ci/kubernetes/docker/bootstrap.sh +++ b/scripts/ci/kubernetes/docker/bootstrap.sh @@ -28,7 +28,7 @@ echo echo "Save minimised web files" echo -mv "$(python -m site | grep ^USER_SITE | awk '{print $2}' | tr -d "'")/airflow/www/static/dist/" \ +mv "$(python -m site | grep ^USER_SITE | awk '{print $2}' | tr -d "'")/airflow/www_rbac/static/dist/" \ "/tmp" echo
[GitHub] [airflow] RachaelDS commented on a change in pull request #8954: Wait for pipeline state in Data Fusion operators
RachaelDS commented on a change in pull request #8954: URL: https://github.com/apache/airflow/pull/8954#discussion_r437672204 ## File path: airflow/providers/google/cloud/hooks/datafusion.py ## @@ -386,15 +480,29 @@ def start_pipeline( pipeline_name, "workflows", "DataPipelineWorkflow", -"start" +"start", ) +runtime_args = runtime_args or {} +# Unfortunately making the start call to CDAP does not return a run_id to poll for state. Review comment: You can avoid using the faux run Id by making a call to the batch start pipeline endpoint - the run id will be returned in this case. For example: TYPE: POST URL: 'https://xxx.datafusion.googleusercontent.com/api/v3/namespaces/default/start' BODY: [{"appId": "app_id", "programType": "workflow", "programId": "DataPipelineWorkflow","runtimeargs": {}}] Batch start pipeline endpoint info: https://docs.cdap.io/cdap/current/en/reference-manual/http-restful-api/lifecycle.html#H3293 (documentation does not currently reflect that the run Id is returned) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
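The batch start call described in this review comment could be assembled as follows. This is a sketch that only builds the request and does not send it; the URL shape and body fields are copied from the comment above, while the function name, the `instance_url` parameter, and the omission of authentication headers are assumptions of this example. The format of the response (where the run id is said to be returned) is not shown because the comment does not specify it.

```python
import json
import urllib.request
from typing import Dict, Optional


def build_batch_start_request(
    instance_url: str,
    namespace: str,
    app_id: str,
    runtime_args: Optional[Dict[str, str]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST to the batch start endpoint."""
    url = f"{instance_url.rstrip('/')}/v3/namespaces/{namespace}/start"
    body = [
        {
            "appId": app_id,
            "programType": "workflow",
            "programId": "DataPipelineWorkflow",
            "runtimeargs": runtime_args or {},
        }
    ]
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # auth headers omitted in this sketch
        method="POST",
    )


request = build_batch_start_request(
    "https://xxx.datafusion.googleusercontent.com/api", "default", "app_id"
)
payload = json.loads(request.data)
```

Keeping request construction separate from sending makes the payload easy to inspect in a unit test.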
[GitHub] [airflow] mik-laj opened a new issue #9200: CLI tools wrapper doesn't support pipe
mik-laj opened a new issue #9200: URL: https://github.com/apache/airflow/issues/9200 When I run the following command using native gcloud ``` Copying from ... / [1 files][0.0 B/0.0 B] Operation completed over 1 objects. ``` When I executed this command in Breeze, I got the following output ``` the input device is not a TTY ``` I would be happy if the behavior in these environments was identical. This problem causes Cloud Build system tests to fail. https://github.com/apache/airflow/blob/c18f4c035c3fcee2e47c5f274b2f59add4d14ced/tests/providers/google/cloud/operators/test_cloud_build_system_helper.py#L66
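One common cause of the "the input device is not a TTY" message is a wrapper that always asks Docker for a pseudo-terminal, even when stdin is a pipe. Assuming the wrapper in question invokes `docker run` (the issue does not say so explicitly), a fix could look like the sketch below; `build_command` and the image name used in the example are hypothetical.

```python
import sys
from typing import List, Optional


def build_command(
    image: str, tool_args: List[str], stdin_is_tty: Optional[bool] = None
) -> List[str]:
    """Build a `docker run` command that requests a TTY only when stdin is one."""
    if stdin_is_tty is None:
        stdin_is_tty = sys.stdin.isatty()
    flags = ["--interactive"]  # keep stdin open so piped input still reaches the tool
    if stdin_is_tty:
        flags.append("--tty")  # allocating a TTY fails when stdin is a pipe
    return ["docker", "run", "--rm", *flags, image, *tool_args]


# Hypothetical image and arguments, for illustration only.
piped = build_command("example/cloud-sdk", ["gsutil", "ls"], stdin_is_tty=False)
terminal = build_command("example/cloud-sdk", ["gsutil", "ls"], stdin_is_tty=True)
```

Detecting the TTY at call time lets the same wrapper behave identically in interactive shells and in piped test helpers.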
[GitHub] [airflow] stale[bot] commented on pull request #8242: Jinjafy params arg in BaseOperator
stale[bot] commented on pull request #8242: URL: https://github.com/apache/airflow/pull/8242#issuecomment-641514215 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
[jira] [Commented] (AIRFLOW-6981) Move Google Cloud Build from Discovery API to Python Library
[ https://issues.apache.org/jira/browse/AIRFLOW-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129723#comment-17129723 ]

ASF GitHub Bot commented on AIRFLOW-6981:
-----------------------------------------

mik-laj commented on pull request #8575:
URL: https://github.com/apache/airflow/pull/8575#issuecomment-641510720

I saw that a new DAG was not added to the system tests. I prepared a change: https://termbin.com/7ezm
To add it to your branch, run the following command:
```bash
curl https://termbin.com/7ezm | git am
```
When I ran it, I saw that there was a bug. One of the tasks ends with the following error.
```
logs/previous_runs/2020-06-09_19_00_50/example_gcp_cloud_build_trigger/create_build_trigger/2020-06-08T00:00:00+00:00/1.log
```
```
[2020-06-09 19:00:33,927] {taskinstance.py:719} INFO - Dependencies all met for
[2020-06-09 19:00:33,944] {taskinstance.py:719} INFO - Dependencies all met for
[2020-06-09 19:00:33,945] {taskinstance.py:908} INFO -
[2020-06-09 19:00:33,945] {taskinstance.py:909} INFO - Starting attempt 1 of 1
[2020-06-09 19:00:33,946] {taskinstance.py:910} INFO -
[2020-06-09 19:00:33,953] {taskinstance.py:929} INFO - Executing on 2020-06-08T00:00:00+00:00
[2020-06-09 19:00:33,957] {standard_task_runner.py:53} INFO - Started process 9042 to run task
[2020-06-09 19:00:34,039] {logging_mixin.py:91} INFO - Running on host 70cf9669db62
[2020-06-09 19:00:34,067] {taskinstance.py:1001} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=example_gcp_cloud_build_trigger
AIRFLOW_CTX_TASK_ID=create_build_trigger
AIRFLOW_CTX_EXECUTION_DATE=2020-06-08T00:00:00+00:00
AIRFLOW_CTX_DAG_RUN_ID=backfill__2020-06-08T00:00:00+00:00
[2020-06-09 19:00:35,144] {taskinstance.py:1194} ERROR - 409 trigger (198907790164, test-cloud-build-trigger) already exists
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/grpc/_channel.py", line 826, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.8/site-packages/grpc/_channel.py", line 729, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.ALREADY_EXISTS
    details = "trigger (198907790164, test-cloud-build-trigger) already exists"
    debug_error_string = "{"created":"@1591729235.142798800","description":"Error received from peer ipv4:216.58.215.74:443","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"trigger (198907790164, test-cloud-build-trigger) already exists","grpc_status":6}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/airflow/airflow/models/taskinstance.py", line 1026, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/opt/airflow/airflow/providers/google/cloud/operators/cloud_build.py", line 239, in execute
    result = hook.create_build_trigger(
  File "/opt/airflow/airflow/providers/google/common/hooks/base_google.py", line 356, in inner_wrapper
    return func(self, *args, **kwargs)
  File "/opt/airflow/airflow/providers/google/cloud/hooks/cloud_build.py", line 188, in create_build_trigger
    return client.create_build_trigger(
  File "/usr/local/lib/python3.8/site-packages/google/cloud/devtools/cloudbuild_v1/gapic/cloud_build_client.py", line 595, in create_build_trigger
    return self._inner_api_calls["create_build_trigger"](
  File "/usr/local/lib/python3.8/site-packages/google/api_core/gapic_v1/method.py", line 143, in __call__
    return wrapped_func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/google/api_core/grpc_helpers.py", line 59, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "", line 3, in raise_from
google.api_core.exceptions.AlreadyExists: 409 trigger (198907790164, test-cloud-build-trigger) already exists
[2020-06-09 19:00:35,158] {taskinstance.py:1231} INFO - Marking task as FAILED. dag_id=example_gcp_cloud_build_trigger, task_id=create_build_trigger, execution_date=20200608T00, start_date=20200609T190033, end_date=20200609T190035
[2020-06-09 19:00:35,917] {logging_mixin.py:91} INFO - [2020-06-09 19:00:35,917] {local_task_job.py:103} INFO - Task exited with return code 1
```
Do you have an idea how to solve it?
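One possible way to make the task safe to re-run is to catch the "already exists" error and look the trigger up by name. The sketch below is written against a hypothetical client interface (`create_build_trigger` and `list_build_triggers` taking a project id, triggers represented as dicts), not against the real Cloud Build client, and it assumes trigger names are unique within a project. In the hook, `already_exists_exc` would presumably be `google.api_core.exceptions.AlreadyExists`, the exception shown in the traceback.

```python
def create_or_get_trigger(client, project_id, trigger, already_exists_exc):
    """Create a build trigger; if it already exists, return the existing one.

    `client` is any object with the two hypothetical methods used below.
    """
    try:
        return client.create_build_trigger(project_id, trigger)
    except already_exists_exc:
        # Fall back to a lookup by name, which is the only key the caller knows.
        for existing in client.list_build_triggers(project_id):
            if existing["name"] == trigger["name"]:
                return existing
        # The service said it exists but the listing disagrees: surface the error.
        raise
```

The returned object carries the id of the existing trigger, so later steps that need a trigger id can proceed after a partial earlier run.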
[GitHub] [airflow] potiuk commented on issue #9193: Breaks production Docker image: Celery 4.4.4 (known bug) lacks "future" module thus breaks airflow
potiuk commented on issue #9193: URL: https://github.com/apache/airflow/issues/9193#issuecomment-641502284 Fixed in #9194 - the images should rebuild tonight (we have nightly builds) and they should contain celery 4.4.5.
[GitHub] [airflow] potiuk closed issue #9193: Breaks production Docker image: Celery 4.4.4 (known bug) lacks "future" module thus breaks airflow
potiuk closed issue #9193: URL: https://github.com/apache/airflow/issues/9193
[jira] [Commented] (AIRFLOW-3973) `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is used for the internal database
[ https://issues.apache.org/jira/browse/AIRFLOW-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129699#comment-17129699 ]

ASF GitHub Bot commented on AIRFLOW-3973:
-----------------------------------------

ashb commented on pull request #9182:
URL: https://github.com/apache/airflow/pull/9182#issuecomment-641497817

As mentioned in our other PR https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#airflow-git-branches -- this PR needs to target master, not v1-10-test/stable please.

> `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is used for the internal database
> ---
>
> Key: AIRFLOW-3973
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3973
> Project: Apache Airflow
> Issue Type: Bug
> Reporter: Elliott Shugerman
> Assignee: Elliott Shugerman
> Priority: Minor
> Fix For: 2.0.0
>
> h2. Notes:
> * This does not occur if the database is already initialized. If it is, run `resetdb` instead to observe the bug.
> * This does not occur with the default SQLite database.
>
> h2. Example
> {{ERROR [airflow.models.DagBag] Failed to import: /home/elliott/clean-airflow/dags/dag.py
> Traceback (most recent call last):
>   File "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1236, in _execute_context
>     cursor, statement, parameters, context
>   File "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 536, in do_execute
>     cursor.execute(statement, parameters)
> psycopg2.ProgrammingError: relation "variable" does not exist
> LINE 2: FROM variable}}
>
> h2. Explanation
> The first thing {{airflow initdb}} does is run the Alembic migrations. All migrations are run in one transaction. Most tables, including the {{variable}} table, are defined in the initial migration. A [later migration|https://github.com/apache/airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py] imports and initializes {{models.DagBag}}. Upon initialization, {{DagBag}} calls its {{collect_dags}} method, which scans the DAGs directory and attempts to load all DAGs it finds. When it loads a DAG that uses a {{Variable}}, it will query the database to see if that {{Variable}} is defined in the {{variable}} table. It's not clear to me how exactly the connection for that query is created, but I think it is apparent that it does _not_ use the same transaction that is used to run the migrations. Since the migrations are not yet complete, and all migrations are run in one transaction, the migration that creates the {{variable}} table has not yet been committed, and therefore the table does not exist to any other connection/transaction. This raises {{ProgrammingError}}, which is caught and logged by {{collect_dags}}.
>
> h2. Proposed Solution
> Run each Alembic migration in its own transaction. I will open a pull request which accomplishes this shortly.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
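The visibility problem described in the explanation can be illustrated with a toy model. This is a simulation in plain Python, not Alembic or Postgres code: `ToyDatabase`, `run_migrations`, and the `MIGRATIONS` list are hypothetical names invented for the illustration. Each migration may create a table and may trigger a read from a separate connection, which sees only committed tables.

```python
class ToyDatabase:
    """Tracks which tables are committed, i.e. visible to every connection."""

    def __init__(self):
        self.committed_tables = set()


# Stand-ins for the two migrations in the report.
MIGRATIONS = [
    {"creates": "variable", "reads_elsewhere": None},  # initial migration
    {"creates": None, "reads_elsewhere": "variable"},  # later migration that loads DagBag
]


def run_migrations(db, migrations, transaction_per_migration):
    """Run migrations and report whether each out-of-transaction read found its table."""
    observations = []
    pending = set()  # tables created in the open transaction, not yet committed
    for migration in migrations:
        needed = migration["reads_elsewhere"]
        if needed is not None:
            # A separate connection cannot see uncommitted tables.
            observations.append(needed in db.committed_tables)
        if migration["creates"] is not None:
            pending.add(migration["creates"])
        if transaction_per_migration:
            db.committed_tables |= pending  # commit after every migration
            pending.clear()
    db.committed_tables |= pending  # final commit of the single big transaction
    return observations
```

With one transaction for all migrations the read fails, and with a commit after each migration it succeeds, which matches the proposed solution. Alembic exposes a `transaction_per_migration` flag on `context.configure()` for this behaviour; whether the pull request uses it is not stated here.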
[jira] [Commented] (AIRFLOW-3973) `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is used for the internal database
[ https://issues.apache.org/jira/browse/AIRFLOW-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129698#comment-17129698 ]

ASF GitHub Bot commented on AIRFLOW-3973:
-----------------------------------------

ashb closed pull request #9183:
URL: https://github.com/apache/airflow/pull/9183
[jira] [Commented] (AIRFLOW-3973) `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is used for the internal database
[ https://issues.apache.org/jira/browse/AIRFLOW-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129697#comment-17129697 ]

ASF GitHub Bot commented on AIRFLOW-3973:
-----------------------------------------

ashb commented on pull request #9183:
URL: https://github.com/apache/airflow/pull/9183#issuecomment-641497412

Duplicate of #9182 - please see https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#airflow-git-branches
[GitHub] [airflow] ashb commented on pull request #9182: [AIRFLOW-3973] Commit after each alembic migration
ashb commented on pull request #9182: URL: https://github.com/apache/airflow/pull/9182#issuecomment-641497817 As mentioned in our other PR https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#airflow-git-branches -- this PR needs to target master, not v1-10-test/stable please.
[GitHub] [airflow] ashb commented on pull request #9183: [AIRFLOW-3973] Commit after each alembic migration
ashb commented on pull request #9183: URL: https://github.com/apache/airflow/pull/9183#issuecomment-641497412 Duplicate of #9182 - please see https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#airflow-git-branches
[GitHub] [airflow] ashb closed pull request #9183: [AIRFLOW-3973] Commit after each alembic migration
ashb closed pull request #9183: URL: https://github.com/apache/airflow/pull/9183
[GitHub] [airflow] Udit107710 commented on pull request #9199: Fixed typo
Udit107710 commented on pull request #9199: URL: https://github.com/apache/airflow/pull/9199#issuecomment-641497300 Thanks @ashb !
[GitHub] [airflow] ashb merged pull request #9199: Fixed typo
ashb merged pull request #9199: URL: https://github.com/apache/airflow/pull/9199
[airflow] branch master updated (6d4972a -> c18f4c0)
This is an automated email from the ASF dual-hosted git repository.

ash pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/airflow.git.

    from 6d4972a  Remove httplib2 from Google requirements (#9194)
     add c18f4c0  Fix typo in BREEZE.rst (#9199)

No new revisions were added by this update.

Summary of changes:
 BREEZE.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[GitHub] [airflow] boring-cyborg[bot] commented on pull request #9199: Fixed typo
boring-cyborg[bot] commented on pull request #9199: URL: https://github.com/apache/airflow/pull/9199#issuecomment-641496812 Awesome work, congrats on your first merged pull request!
[airflow] branch master updated: Remove httplib2 from Google requirements (#9194)
This is an automated email from the ASF dual-hosted git repository. potiuk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/airflow.git The following commit(s) were added to refs/heads/master by this push: new 6d4972a Remove httplib2 from Google requirements (#9194) 6d4972a is described below commit 6d4972a0a7ff497a018f15a919a829f58de6be32 Author: Kamil Breguła AuthorDate: Tue Jun 9 20:33:17 2020 +0200 Remove httplib2 from Google requirements (#9194) * Remove httplib2 from Google requirements * fixup! Remove httplib2 from Google requirements --- requirements/requirements-python3.6.txt | 56 +++-- requirements/requirements-python3.7.txt | 53 +++ requirements/requirements-python3.8.txt | 55 +++- requirements/setup-3.6.md5 | 2 +- requirements/setup-3.7.md5 | 2 +- requirements/setup-3.8.md5 | 2 +- setup.py| 1 - 7 files changed, 80 insertions(+), 91 deletions(-) diff --git a/requirements/requirements-python3.6.txt b/requirements/requirements-python3.6.txt index d0c240e..3ebbbe9 100644 --- a/requirements/requirements-python3.6.txt +++ b/requirements/requirements-python3.6.txt @@ -29,11 +29,11 @@ Pygments==2.6.1 SQLAlchemy-JSONField==0.9.0 SQLAlchemy-Utils==0.36.6 SQLAlchemy==1.3.17 -Sphinx==3.0.4 +Sphinx==3.1.0 Unidecode==1.1.1 WTForms==2.3.1 Werkzeug==0.16.1 -adal==1.2.3 +adal==1.2.4 aiohttp==3.6.2 alabaster==0.7.12 alembic==1.4.2 @@ -45,13 +45,13 @@ apispec==1.3.3 appdirs==1.4.4 argcomplete==1.11.1 asn1crypto==1.3.0 -astroid==2.4.1 +astroid==2.3.3 async-generator==1.10 async-timeout==3.0.1 atlasclient==1.0.0 attrs==19.3.0 aws-sam-translator==1.24.0 -aws-xray-sdk==2.5.0 +aws-xray-sdk==2.6.0 azure-batch==9.0.0 azure-common==1.1.25 azure-cosmos==3.1.2 @@ -72,16 +72,16 @@ beautifulsoup4==4.7.1 billiard==3.6.3.0 black==19.10b0 blinker==1.4 -boto3==1.13.23 +boto3==1.13.25 boto==2.49.0 -botocore==1.16.23 +botocore==1.16.25 bowler==0.8.0 cached-property==1.5.1 cachetools==4.1.0 cassandra-driver==3.20.2 cattrs==1.0.0 -celery==4.4.4 
-certifi==2020.4.5.1 +celery==4.4.5 +certifi==2020.4.5.2 cffi==1.14.0 cfgv==3.1.0 cfn-lint==0.33.0 @@ -100,14 +100,13 @@ croniter==0.3.32 cryptography==2.9.2 curlify==2.2.1 cx-Oracle==7.3.0 -dask==2.17.2 -dataclasses==0.7 +dask==2.18.0 datadog==0.36.0 decorator==4.4.2 defusedxml==0.6.0 dill==0.3.1.1 distlib==0.3.0 -distributed==2.17.0 +distributed==2.18.0 dnspython==1.16.0 docker-pycreds==0.4.0 docker==3.7.3 @@ -126,7 +125,7 @@ fastavro==0.23.4 filelock==3.0.12 fissix==20.5.1 flake8-colors==0.1.6 -flake8==3.8.2 +flake8==3.8.3 flaky==3.6.1 flask-swagger==0.2.13 flower==0.9.4 @@ -136,10 +135,10 @@ funcsigs==1.0.2 future-fstrings==1.2.0 future==0.18.2 gcsfs==0.6.2 -gevent==20.5.2 +gevent==20.6.0 gitdb==4.0.5 google-ads==4.0.0 -google-api-core==1.19.0 +google-api-core==1.19.1 google-api-python-client==1.9.1 google-auth-httplib2==0.0.3 google-auth-oauthlib==0.4.1 @@ -151,7 +150,7 @@ google-cloud-bigtable==1.2.1 google-cloud-container==0.5.0 google-cloud-core==1.3.0 google-cloud-datacatalog==0.7.0 -google-cloud-dataproc==0.8.0 +google-cloud-dataproc==0.8.1 google-cloud-dlp==0.15.0 google-cloud-kms==1.4.0 google-cloud-language==1.3.0 @@ -171,7 +170,7 @@ google-cloud-vision==1.0.0 google-resumable-media==0.5.1 googleapis-common-protos==1.52.0 graphviz==0.14 -greenlet==0.4.15 +greenlet==0.4.16 grpc-google-iam-v1==0.12.3 grpcio-gcp==0.2.2 grpcio==1.29.0 @@ -187,9 +186,9 @@ idna==2.9 ijson==2.6.1 imagesize==1.2.0 immutables==0.14 -importlib-metadata==1.6.0 -importlib-resources==1.5.0 -inflection==0.4.0 +importlib-metadata==1.6.1 +importlib-resources==2.0.0 +inflection==0.5.0 ipdb==0.13.2 ipython-genutils==0.2.0 ipython==7.15.0 @@ -211,9 +210,8 @@ jupyter-client==6.1.3 jupyter-core==4.6.3 kombu==4.6.10 kubernetes==11.0.0 -lazy-object-proxy==1.4.3 +lazy-object-proxy==1.5.0 ldap3==2.7 -libcst==0.3.6 lockfile==0.12.2 marshmallow-enum==1.5.1 marshmallow-sqlalchemy==0.23.1 @@ -261,11 +259,10 @@ pickleshare==0.7.5 pinotdb==0.1.1 pipdeptree==0.13.2 pluggy==0.13.1 -pre-commit==2.4.0 
+pre-commit==2.5.0 presto-python-client==0.7.0 prison==0.1.3 prompt-toolkit==3.0.5 -proto-plus==0.4.0 protobuf==3.12.2 psutil==5.7.0 psycopg2-binary==2.8.5 @@ -293,7 +290,7 @@ pyparsing==2.4.7 pypd==1.1.0 pyrsistent==0.16.0 pysftp==0.2.9 -pyspark==2.4.5 +pyspark==2.4.6 pytest-cov==2.9.0 pytest-forked==1.1.3 pytest-instafail==0.4.1.post0 @@ -316,7 +313,7 @@ pywinrm==0.4.1 pyzmq==19.0.1 qds-sdk==1.16.0 redis==3.5.3 -regex==2020.5.14 +regex==2020.6.8 requests-kerberos==0.12.0 requests-mock==1.8.0 requests-ntlm==1.1.0 @@ -333,14 +330,14 @@ sentinels==1.0.0 sentry-sdk==0.14.4 setproctitle==1.1.10 sh==1.13.1 -simple-salesforce==1.0.0 +simple-salesforce==1.1.0 six==1.15.0 slackclient==2.7.1
[GitHub] [airflow] potiuk merged pull request #9194: Remove httplib2 from Google requirements
potiuk merged pull request #9194: URL: https://github.com/apache/airflow/pull/9194
[GitHub] [airflow] boring-cyborg[bot] commented on pull request #9199: Fixed typo
boring-cyborg[bot] commented on pull request #9199:
URL: https://github.com/apache/airflow/pull/9199#issuecomment-641490412

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
Here are some useful points:
- Pay attention to the quality of your code (flake8, pylint and type annotations). Our [pre-commits](https://github.com/apache/airflow/blob/master/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
- In case of a new feature add useful documentation (in docstrings or in `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/master/docs/howto/custom-operator.rst) Consider adding an example DAG that shows how users should use it.
- Consider using [Breeze environment](https://github.com/apache/airflow/blob/master/BREEZE.rst) for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
- Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
- Be sure to read the [Airflow Coding style](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#coding-style-and-best-practices).
Apache Airflow is a community-driven project and together we are making it better.
In case of doubts contact the developers at:
Mailing List: d...@airflow.apache.org
Slack: https://apache-airflow-slack.herokuapp.com/
[GitHub] [airflow] Udit107710 opened a new pull request #9199: Fixed typo
Udit107710 opened a new pull request #9199:
URL: https://github.com/apache/airflow/pull/9199

Changed 'y' to 'by' since it was incorrect.

---
Make sure to mark the boxes below before creating PR: [x]

- [x] Description above provides context of the change
- [ ] Unit tests coverage for changes (not needed for documentation changes)
- [x] Target Github ISSUE in description if exists
- [x] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
- [x] Relevant documentation is updated including usage instructions.
- [x] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).

---
In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.
[GitHub] [airflow] swapniel99 commented on issue #9176: [Feature request] Add reverse run of DAGs
swapniel99 commented on issue #9176: URL: https://github.com/apache/airflow/issues/9176#issuecomment-641485098 I think `back_run=True` can work only if `catchup=True` but `depends_on_past=False` catchup=False means past runs are not necessary. Can anyone correct me if I am wrong? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] swapniel99 edited a comment on issue #9176: [Feature request] Add reverse run of DAGs
swapniel99 edited a comment on issue #9176: URL: https://github.com/apache/airflow/issues/9176#issuecomment-641485098 I think `back_run=True` can work only if `catchup=True` and `depends_on_past=False`, because `catchup=False` means past runs are not necessary. Can anyone correct me if I am wrong? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] SamWheating commented on a change in pull request #9159: Adding Google Deployment Manager Hook
SamWheating commented on a change in pull request #9159: URL: https://github.com/apache/airflow/pull/9159#discussion_r437614992 ## File path: airflow/providers/google/cloud/hooks/gdm.py ## @@ -0,0 +1,103 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# + +from typing import Any, Dict, List, Optional + +from googleapiclient.discovery import build + +from airflow.exceptions import AirflowException +from airflow.providers.google.common.hooks.base_google import GoogleBaseHook + + +class GoogleDeploymentManagerHook(GoogleBaseHook): # pylint: disable=abstract-method +""" +Interact with Google Cloud Deployment Manager using the Google Cloud Platform connection. +This allows for scheduled and programatic inspection and deletion fo resources managed by GDM. +""" + +def __init__(self, gcp_conn_id='google_cloud_default', delegate_to=None): +super(GoogleDeploymentManagerHook, self).__init__(gcp_conn_id, delegate_to=delegate_to) + +def get_conn(self): +""" +Returns a Google Deployment Manager service object. 
+ +:rtype: googleapiclient.discovery.Resource +""" +http_authorized = self._authorize() +return build('deploymentmanager', 'v2', http=http_authorized, cache_discovery=False) + +@GoogleBaseHook.fallback_to_default_project_id +def list_deployments(self, project_id: Optional[str] = None, # pylint: disable=too-many-arguments + deployment_filter: Optional[str] = None, + max_results: Optional[int] = None, + order_by: Optional[str] = None, + page_token: Optional[str] = None) -> List[Dict[str, Any]]: +""" +Lists deployments in a google cloud project. + +:param project_id: The project ID for this request. +:type project_id: str +:param deployment_filter: A filter expression which limits resources returned in the response. +:type filter: str +:param max_results: The maximum number of results to return +:type max_results: Optional[int] +:param order_by: A field name to order by, ex: "creationTimestamp desc" +:type order_by: Optional[str] +:param page_token: specifies a page_token to use +:type page_token: str Review comment: Ok - I've updated the hook to fetch all deployments. I also removed the ability to specify maxResults, since it actually specifies max results _per page_ which is irrelevant if we're fetching all pages. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
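For reference, fetching every page with the discovery client usually follows the `list` / `list_next` pattern, which is why a per-page `maxResults` stops being interesting to the caller. The sketch below is only an illustration and not the code from this PR: `list_all_deployments` is a hypothetical helper, and it takes the `deployments()` collection as an argument so that it can be exercised with a stand-in object.

```python
from typing import Any, Dict, List, Optional


def list_all_deployments(
    deployments: Any,
    project_id: str,
    deployment_filter: Optional[str] = None,
    order_by: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Collect deployments from every page (hypothetical helper, not from the PR)."""
    results: List[Dict[str, Any]] = []
    request = deployments.list(
        project=project_id, filter=deployment_filter, orderBy=order_by
    )
    while request is not None:
        response = request.execute()
        # A page with no matching deployments may omit the key entirely.
        results.extend(response.get("deployments", []))
        # list_next returns None once the response carries no nextPageToken.
        request = deployments.list_next(
            previous_request=request, previous_response=response
        )
    return results
```

With the hook from the diff above, a call could look like `list_all_deployments(hook.get_conn().deployments(), project_id="my-project")`, where `my-project` is a placeholder.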
[GitHub] [airflow] shanit-saha edited a comment on pull request #7324: [AIRFLOW-6704] Copy common TaskInstance attributes from Task
shanit-saha edited a comment on pull request #7324: URL: https://github.com/apache/airflow/pull/7324#issuecomment-641433495

After migrating Airflow from v1.10.2 to v1.10.10, one of our DAGs has a task of the dagrun_operator type that fails. The task's code looks roughly like the snippet below. Please assume that the DAG `dag_process_pos` exists.

```
task_trigger_dag_positional = TriggerDagRunOperator(
    trigger_dag_id="dag_process_pos",
    python_callable=set_up_dag_run_preprocessing,
    task_id="trigger_preprocess_dag",
    on_failure_callback=log_failure,
    execution_date=datetime.now(),
    provide_context=False,
    owner='airflow')

def set_up_dag_run_preprocessing(context, dag_run_obj):
    ti = context['ti']
    dag_name = context['ti'].task.trigger_dag_id
    dag_run = context['dag_run']
    trans_id = dag_run.conf['transaction_id']
    routing_info = ti.xcom_pull(task_ids="json_validation", key="route_info")
    new_file_path = routing_info['file_location']
    new_file_name = os.path.basename(routing_info['new_file_name'])
    file_path = os.path.join(new_file_path, new_file_name)
    batch_id = "123-AD-FF"
    dag_run_obj.payload = {'inputfilepath': file_path, 'transaction_id': trans_id, 'Id': batch_id}
```

The DAG runs fine. In fact, the Python callable of the task runs through to its last line. Then it errors out.

```
[2020-06-09 11:36:22,838] {taskinstance.py:1145} ERROR - No row was found for one()
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 983, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.6/site-packages/airflow/operators/dagrun_operator.py", line 95, in execute
    replace_microseconds=False)
  File "/usr/local/lib/python3.6/site-packages/airflow/api/common/experimental/trigger_dag.py", line 141, in trigger_dag
    replace_microseconds=replace_microseconds,
  File "/usr/local/lib/python3.6/site-packages/airflow/api/common/experimental/trigger_dag.py", line 98, in _trigger_dag
    external_trigger=True,
  File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/airflow/models/dag.py", line 1471, in create_dagrun
    run.refresh_from_db()
  File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/airflow/models/dagrun.py", line 109, in refresh_from_db
    DR.run_id == self.run_id
  File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3446, in one
    raise orm_exc.NoResultFound("No row was found for one()")
sqlalchemy.orm.exc.NoResultFound: No row was found for one()
```

After that, the `on_failure_callback` of that task is executed, and all the code of that callable runs as expected. The question is: why did the dagrun_operator fail after the Python callable had finished?

**P.S.**: The DAG that is triggered by the `TriggerDagRunOperator`, in this case `dag_process_pos`, starts with a task of type `dummy_operator`.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (AIRFLOW-6704) TaskInstance.operator is not set when task is marked success or failed in the Web UI
[ https://issues.apache.org/jira/browse/AIRFLOW-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129665#comment-17129665 ] ASF GitHub Bot commented on AIRFLOW-6704: - shanit-saha edited a comment on pull request #7324: URL: https://github.com/apache/airflow/pull/7324#issuecomment-641433495 On Migrating Airflow from V1.10.2 to V1.10.10 One of our DAG have a task which is of dagrun_operator type. Code snippet of the task looks something as below. Please assume that DAG `dag_process_pos` exists ``` task_trigger_dag_positional = TriggerDagRunOperator( trigger_dag_id="dag_process_pos", python_callable=set_up_dag_run_preprocessing, task_id="trigger_preprocess_dag", on_failure_callback=log_failure, execution_date=datetime.now(), provide_context=False, owner='airflow') def set_up_dag_run_preprocessing(context, dag_run_obj): ti = context['ti'] dag_name = context['ti'].task.trigger_dag_id dag_run = context['dag_run'] trans_id = dag_run.conf['transaction_id'] routing_info = ti.xcom_pull(task_ids="json_validation", key="route_info") new_file_path = routing_info['file_location'] new_file_name = os.path.basename(routing_info['new_file_name']) file_path = os.path.join(new_file_path, new_file_name) batch_id = "123-AD-FF" dag_run_obj.payload = {'inputfilepath': file_path, 'transaction_id': trans_id, 'Id': batch_id} ``` The DAG runs all fine. In fact the python callable of the task mentioned until the last line. Then it errors out. 
``` [2020-06-09 11:36:22,838] {taskinstance.py:1145} ERROR - No row was found for one() Traceback (most recent call last): File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 983, in _run_raw_task result = task_copy.execute(context=context) File "/usr/local/lib/python3.6/site-packages/airflow/operators/dagrun_operator.py", line 95, in execute replace_microseconds=False) File "/usr/local/lib/python3.6/site-packages/airflow/api/common/experimental/trigger_dag.py", line 141, in trigger_dag replace_microseconds=replace_microseconds, File "/usr/local/lib/python3.6/site-packages/airflow/api/common/experimental/trigger_dag.py", line 98, in _trigger_dag external_trigger=True, File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 74, in wrapper return func(*args, **kwargs) File "/usr/local/lib/python3.6/site-packages/airflow/models/dag.py", line 1471, in create_dagrun run.refresh_from_db() File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 74, in wrapper return func(*args, **kwargs) File "/usr/local/lib/python3.6/site-packages/airflow/models/dagrun.py", line 109, in refresh_from_db DR.run_id == self.run_id File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3446, in one raise orm_exc.NoResultFound("No row was found for one()") sqlalchemy.orm.exc.NoResultFound: No row was found for one() ``` After which the `on_failure_callback` of that task is executed and all code of that callable runs perfectly ok as is expected. The query here is why did the dagrun_operator fail after the python callable. **P.S** : The DAG that is being triggered by the `TriggerDagRunOperator` , in this case `dag_process_pos` starts with task of type`dummy_operator` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > TaskInstance.operator is not set when task is marked success or failed in the > Web UI > > > Key: AIRFLOW-6704 > URL: https://issues.apache.org/jira/browse/AIRFLOW-6704 > Project: Apache Airflow > Issue Type: Bug > Components: models >Affects Versions: 1.10.7 >Reporter: Qian Yu >Assignee: Qian Yu >Priority: Major > Fix For: 1.10.10 > > > {{TaskInstance.operator}} is currently only set when task is executed. But if > a task is marked success or failed, the {{operator}} field is left as > {{None}}. > This causes bugs when some code tries to use the operator field to find the > name of the class. > The fix is trivial, just set {{TaskInstance.operator}} in its constructor. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [airflow] mik-laj commented on issue #8629: DockerOperator cannot pull image when docker client uses json-file log driver
mik-laj commented on issue #8629: URL: https://github.com/apache/airflow/issues/8629#issuecomment-641461423 @languitar We plan to release it in Airflow 2.0. However, you will be able to use it in Airflow 1.10 using backport packages. The releasing process of these packages is ongoing. I hope it will take us up to two weeks. https://github.com/apache/airflow#using-hooks-and-operators-from-master-in-airflow-110 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[airflow] branch master updated (7ad827f -> efb86df)
This is an automated email from the ASF dual-hosted git repository. turbaszek pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/airflow.git. from 7ad827f Set conn_type as not-nullable (#9187) add efb86df Call super.tearDown in SystemTest tearDown (#9196) No new revisions were added by this update. Summary of changes: tests/test_utils/system_tests_class.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
[GitHub] [airflow] turbaszek merged pull request #9196: Call super.tearDown in SystemTest tearDown
turbaszek merged pull request #9196: URL: https://github.com/apache/airflow/pull/9196 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] mik-laj commented on issue #9186: Add SRV DNS record support to http connections
mik-laj commented on issue #9186: URL: https://github.com/apache/airflow/issues/9186#issuecomment-641456055 It makes sense. Support for service discovery in Airflow may be helpful in some use cases. I'd love to see implementations of this change if you plan to work on it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] ephraimbuddy commented on a change in pull request #9095: [WIP]Add readonly connection endpoints
ephraimbuddy commented on a change in pull request #9095: URL: https://github.com/apache/airflow/pull/9095#discussion_r437590382 ## File path: airflow/api_connexion/openapi/v1.yaml ## @@ -1146,14 +1146,19 @@ components: type: string conn_type: type: string + nullable: true Review comment: I pushed this in a hurry just now. I will come back to it tonight. [e40d532](https://github.com/apache/airflow/pull/9095/commits/e40d532c01461672514fa07291633bb3796fd9b6) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] jaketf commented on pull request #8954: Wait for pipeline state in Data Fusion operators
jaketf commented on pull request #8954: URL: https://github.com/apache/airflow/pull/8954#issuecomment-641453114 Thanks for following up @turbaszek. *tl;dr* I think we should merge this PR as it fixes the immediate issue. We can file a lower-priority issue to handle streaming pipelines in the future. That could be an additional kwarg that accepts a streaming flag and uses a different path for polling. I've updated the threads. I agree that we should keep this PR small and focused on patching the existing operator for starting Data Fusion batch pipelines. In general, I think batch is used more than streaming, and Spark is used more than MR. In batch, both MR and Spark can be polled at the .../DataPipelineWorkflow/runs/run_id endpoint. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
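To make the "wait for pipeline state" idea concrete, here is a minimal polling sketch. It is not the operator's actual code: `wait_for_pipeline_state` and its parameters are hypothetical, the status names are assumed CDAP run statuses, and `get_state` stands in for whatever performs the GET against the runs endpoint mentioned above and returns the `status` field.

```python
import time
from typing import Callable, FrozenSet

# Assumed status names; adjust to whatever the runs endpoint really returns.
SUCCESS_STATES = frozenset({"COMPLETED"})
FAILURE_STATES = frozenset({"FAILED", "KILLED"})


def wait_for_pipeline_state(
    get_state: Callable[[], str],
    success_states: FrozenSet[str] = SUCCESS_STATES,
    failure_states: FrozenSet[str] = FAILURE_STATES,
    timeout: float = 300.0,
    poll_interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_state() until a terminal state is seen or the timeout expires."""
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state in success_states:
            return state
        if state in failure_states:
            raise RuntimeError(f"Pipeline ended in state {state}")
        if clock() >= deadline:
            raise TimeoutError(f"Pipeline still in state {state} after {timeout}s")
        sleep(poll_interval)
```

A streaming variant, as suggested in the comment, could reuse the same loop with a different `get_state` callable and a different set of success states.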
[airflow] branch v1-10-test updated (f21618b -> 6ecede2)
This is an automated email from the ASF dual-hosted git repository. potiuk pushed a change to branch v1-10-test in repository https://gitbox.apache.org/repos/asf/airflow.git. discard f21618b Chown should work now when building the documentation (#8600) add 6ecede2 Chown should work now when building the documentation (#8600) This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this: * -- * -- B -- O -- O -- O (f21618b) \ N -- N -- N refs/heads/v1-10-test (6ecede2) You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. No new revisions were added by this update. Summary of changes: .github/workflows/ci.yml | 1 + 1 file changed, 1 insertion(+)
[GitHub] [airflow] potiuk commented on pull request #9198: Fixes failures if python2 is still default on path
potiuk commented on pull request #9198: URL: https://github.com/apache/airflow/pull/9198#issuecomment-641442431 Closed. The reason was different (major/minor version wrongly set). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk closed pull request #9198: Fixes failures if python2 is still default on path
potiuk closed pull request #9198: URL: https://github.com/apache/airflow/pull/9198 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk opened a new pull request #9198: Fixes failures if python2 is still default on path
potiuk opened a new pull request #9198: URL: https://github.com/apache/airflow/pull/9198 In some systems python2 is still default. This causes failure of the script when run without the image --- Make sure to mark the boxes below before creating PR: [x] - [x] Description above provides context of the change - [x] Unit tests coverage for changes (not needed for documentation changes) - [x] Target Github ISSUE in description if exists - [x] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)" - [x] Relevant documentation is updated including usage instructions. - [x] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example). --- In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed. In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md). Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] jaketf commented on a change in pull request #8954: Wait for pipeline state in Data Fusion operators
jaketf commented on a change in pull request #8954: URL: https://github.com/apache/airflow/pull/8954#discussion_r437578759 ## File path: airflow/providers/google/cloud/hooks/datafusion.py ## @@ -435,20 +435,52 @@ def _get_pipeline( "workflows", "DataPipelineWorkflow", "runs", +pipeline_id, ) response = self._cdap_request(url=url, method="GET") if response.status != 200: raise AirflowException( -f"Retrieving a pipeline failed with code {response.status}" +f"Retrieving a pipeline state failed with code {response.status}" ) +workflow = json.loads(response.data) +return workflow["status"] -pipelines_list = json.loads(response.data) -for pipe in pipelines_list: -runtime_args = json.loads(pipe["properties"]["runtimeArgs"]) -if runtime_args[job_id_key] == faux_pipeline_id: -return pipe +def _get_pipeline_run_id( +self, +pipeline_name: str, +faux_pipeline_id: str, +instance_url: str, +namespace: str = "default", +) -> str: +url = os.path.join( +instance_url, +"v3", +"namespaces", +namespace, +"apps", +pipeline_name, +"workflows", Review comment: As long as we cover batch pipelines (with a Spark or MR backend), I think we should be good. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] jaketf commented on a change in pull request #8954: Wait for pipeline state in Data Fusion operators
jaketf commented on a change in pull request #8954: URL: https://github.com/apache/airflow/pull/8954#discussion_r437578183 ## File path: airflow/providers/google/cloud/hooks/datafusion.py ## @@ -435,20 +435,52 @@ def _get_pipeline( "workflows", "DataPipelineWorkflow", "runs", +pipeline_id, ) response = self._cdap_request(url=url, method="GET") if response.status != 200: raise AirflowException( -f"Retrieving a pipeline failed with code {response.status}" +f"Retrieving a pipeline state failed with code {response.status}" ) +workflow = json.loads(response.data) +return workflow["status"] -pipelines_list = json.loads(response.data) -for pipe in pipelines_list: -runtime_args = json.loads(pipe["properties"]["runtimeArgs"]) -if runtime_args[job_id_key] == faux_pipeline_id: -return pipe +def _get_pipeline_run_id( +self, +pipeline_name: str, +faux_pipeline_id: str, +instance_url: str, +namespace: str = "default", +) -> str: +url = os.path.join( +instance_url, +"v3", +"namespaces", +namespace, +"apps", +pipeline_name, +"workflows", +"DataPipelineWorkflow", +"runs", +) +# Try 5 times to get the CDAP runid. We do this because the pipeline +# may not be present instantly +for _ in range(5): +response = self._cdap_request(url=url, method="GET") +if response.status != 200: Review comment: I think we should just handle batch pipelines in this PR (as this is implicitly all the current operator does). Also, anecdotally, I think this covers 90% of use cases for Airflow. In the field, I have not seen a lot of streaming orchestration with Airflow. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
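The "try 5 times" lookup in the diff above is cut off by the review comment, so the following is only a sketch of how such a bounded retry could look, not the PR's implementation. `find_run_id` and `fetch_runs` are hypothetical names; the matching on `properties.runtimeArgs` mirrors the removed lines of the diff, and the `runid` field name is an assumption about the run record.

```python
import json
import time
from typing import Any, Callable, Dict, List, Optional


def find_run_id(
    fetch_runs: Callable[[], List[Dict[str, Any]]],
    job_id_key: str,
    faux_pipeline_id: str,
    attempts: int = 5,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Look for the run tagged with faux_pipeline_id, retrying a bounded number of times."""
    for attempt in range(attempts):
        for run in fetch_runs():
            # runtimeArgs is stored as a JSON string inside the run properties.
            runtime_args = json.loads(run["properties"]["runtimeArgs"])
            if runtime_args.get(job_id_key) == faux_pipeline_id:
                return run["runid"]  # assumed field name
        # The run may not be listed yet; wait before the next attempt.
        if attempt < attempts - 1:
            sleep(delay)
    return None
```

Returning `None` after the last attempt leaves it to the caller to decide whether a missing run should raise an exception.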
[GitHub] [airflow] shanit-saha commented on pull request #7324: [AIRFLOW-6704] Copy common TaskInstance attributes from Task
shanit-saha commented on pull request #7324: URL: https://github.com/apache/airflow/pull/7324#issuecomment-641433495 On Migrating Airflow from V1.10.2 to V1.10.10 One of our DAG have a task which is of dagrun_operator type. Code snippet of the task looks something as below. Please assume that DAG `dag_process_pos` exists ``` task_trigger_dag_positional = TriggerDagRunOperator( trigger_dag_id="dag_process_pos", python_callable=set_up_dag_run_preprocessing, task_id="trigger_preprocess_dag", on_failure_callback=log_failure, execution_date=datetime.now(), provide_context=False, owner='airflow') def set_up_dag_run_preprocessing(context, dag_run_obj): ti = context['ti'] dag_name = context['ti'].task.trigger_dag_id dag_run = context['dag_run'] trans_id = dag_run.conf['transaction_id'] routing_info = ti.xcom_pull(task_ids="json_validation", key="route_info") new_file_path = routing_info['file_location'] new_file_name = os.path.basename(routing_info['new_file_name']) file_path = os.path.join(new_file_path, new_file_name) batch_id = "123-AD-FF" dag_run_obj.payload = {'inputfilepath': file_path, 'transaction_id': trans_id, 'Id': batch_id} ``` The DAG runs all fine. In fact the python callable of the task mentioned until the last line. Then it errors out. 
``` [2020-06-09 11:36:22,838] {taskinstance.py:1145} ERROR - No row was found for one() Traceback (most recent call last): File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 983, in _run_raw_task result = task_copy.execute(context=context) File "/usr/local/lib/python3.6/site-packages/airflow/operators/dagrun_operator.py", line 95, in execute replace_microseconds=False) File "/usr/local/lib/python3.6/site-packages/airflow/api/common/experimental/trigger_dag.py", line 141, in trigger_dag replace_microseconds=replace_microseconds, File "/usr/local/lib/python3.6/site-packages/airflow/api/common/experimental/trigger_dag.py", line 98, in _trigger_dag external_trigger=True, File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 74, in wrapper return func(*args, **kwargs) File "/usr/local/lib/python3.6/site-packages/airflow/models/dag.py", line 1471, in create_dagrun run.refresh_from_db() File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 74, in wrapper return func(*args, **kwargs) File "/usr/local/lib/python3.6/site-packages/airflow/models/dagrun.py", line 109, in refresh_from_db DR.run_id == self.run_id File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3446, in one raise orm_exc.NoResultFound("No row was found for one()") sqlalchemy.orm.exc.NoResultFound: No row was found for one() ``` After which the `on_failure_callback` of that task is executed and all code of that callable runs perfectly ok as is expected. The query here is why did the dagrun_operator fail after the python callable. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (AIRFLOW-6704) TaskInstance.operator is not set when task is marked success or failed in the Web UI
[ https://issues.apache.org/jira/browse/AIRFLOW-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129613#comment-17129613 ] ASF GitHub Bot commented on AIRFLOW-6704: - shanit-saha commented on pull request #7324: URL: https://github.com/apache/airflow/pull/7324#issuecomment-641433495 On Migrating Airflow from V1.10.2 to V1.10.10 One of our DAG have a task which is of dagrun_operator type. Code snippet of the task looks something as below. Please assume that DAG `dag_process_pos` exists ``` task_trigger_dag_positional = TriggerDagRunOperator( trigger_dag_id="dag_process_pos", python_callable=set_up_dag_run_preprocessing, task_id="trigger_preprocess_dag", on_failure_callback=log_failure, execution_date=datetime.now(), provide_context=False, owner='airflow') def set_up_dag_run_preprocessing(context, dag_run_obj): ti = context['ti'] dag_name = context['ti'].task.trigger_dag_id dag_run = context['dag_run'] trans_id = dag_run.conf['transaction_id'] routing_info = ti.xcom_pull(task_ids="json_validation", key="route_info") new_file_path = routing_info['file_location'] new_file_name = os.path.basename(routing_info['new_file_name']) file_path = os.path.join(new_file_path, new_file_name) batch_id = "123-AD-FF" dag_run_obj.payload = {'inputfilepath': file_path, 'transaction_id': trans_id, 'Id': batch_id} ``` The DAG runs all fine. In fact the python callable of the task mentioned until the last line. Then it errors out. 
``` [2020-06-09 11:36:22,838] {taskinstance.py:1145} ERROR - No row was found for one() Traceback (most recent call last): File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 983, in _run_raw_task result = task_copy.execute(context=context) File "/usr/local/lib/python3.6/site-packages/airflow/operators/dagrun_operator.py", line 95, in execute replace_microseconds=False) File "/usr/local/lib/python3.6/site-packages/airflow/api/common/experimental/trigger_dag.py", line 141, in trigger_dag replace_microseconds=replace_microseconds, File "/usr/local/lib/python3.6/site-packages/airflow/api/common/experimental/trigger_dag.py", line 98, in _trigger_dag external_trigger=True, File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 74, in wrapper return func(*args, **kwargs) File "/usr/local/lib/python3.6/site-packages/airflow/models/dag.py", line 1471, in create_dagrun run.refresh_from_db() File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 74, in wrapper return func(*args, **kwargs) File "/usr/local/lib/python3.6/site-packages/airflow/models/dagrun.py", line 109, in refresh_from_db DR.run_id == self.run_id File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3446, in one raise orm_exc.NoResultFound("No row was found for one()") sqlalchemy.orm.exc.NoResultFound: No row was found for one() ``` After which the `on_failure_callback` of that task is executed and all code of that callable runs perfectly ok as is expected. The query here is why did the dagrun_operator fail after the python callable. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > TaskInstance.operator is not set when task is marked success or failed in the > Web UI > > > Key: AIRFLOW-6704 > URL: https://issues.apache.org/jira/browse/AIRFLOW-6704 > Project: Apache Airflow > Issue Type: Bug > Components: models >Affects Versions: 1.10.7 >Reporter: Qian Yu >Assignee: Qian Yu >Priority: Major > Fix For: 1.10.10 > > > {{TaskInstance.operator}} is currently only set when task is executed. But if > a task is marked success or failed, the {{operator}} field is left as > {{None}}. > This causes bugs when some code tries to use the operator field to find the > name of the class. > The fix is trivial, just set {{TaskInstance.operator}} in its constructor. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [airflow] boring-cyborg[bot] commented on issue #9197: Apache Airflow dagrun_operator in Airflow Version 1.10.10
boring-cyborg[bot] commented on issue #9197: URL: https://github.com/apache/airflow/issues/9197#issuecomment-641408208 Thanks for opening your first issue here! Be sure to follow the issue template!
[GitHub] [airflow] potiuk commented on pull request #9195: Update pre-commit-hooks repo version
potiuk commented on pull request #9195: URL: https://github.com/apache/airflow/pull/9195#issuecomment-641418199 It looks like the new stylelint requires some style improvements as well @feluelle
[GitHub] [airflow] dimon222 commented on issue #9118: Improve/surface errors when attempting to read S3 logs
dimon222 commented on issue #9118: URL: https://github.com/apache/airflow/issues/9118#issuecomment-641416176 Any clue which component I should look at to change this? I'm having trouble finding the code that swallows the exceptions. I'd really like to investigate and resolve the S3 logs issue.
[GitHub] [airflow] edejong edited a comment on issue #8903: BigQueryHook refactor + deterministic BQ Job ID
edejong edited a comment on issue #8903: URL: https://github.com/apache/airflow/issues/8903#issuecomment-641387021

I should really check GitHub more often; I only saw the notification now.

Let me know if I understand the question correctly: should the BigQueryHook's interface rely on classes from the Google API client library, or should all data be passed in as dictionaries?

I think it's one thing to have the Airflow hooks/operators coupled to the BigQuery REST interface, which I guess is what you get by passing in the config as a Python dict. This allows you to translate online examples to a DAG very easily. But it's a much bigger step to rely on the Google client library in the API, because that introduces a tight coupling to this specific library. It would only look good if the Airflow code can stay 100% agnostic about what is passed to the library. Can we guarantee that, even for the future? And does that align with other GCP products? So my personal opinion is: stick with the dict :)

As for generating a job id for all job types, I agree that would be a very good move. Without it, you would have to wait for a response to even have something to check up on afterwards. That works most of the time, but in cases where it goes wrong it makes it harder to troubleshoot.

I love the suggested job id string. One small thing I would change is to add a prefix such as `airflow_` or even just `af_` to make it even easier to spot these in Stackdriver, for example. I would generate some well-defined job id string every time one wasn't provided by the user. See https://cloud.google.com/bigquery/docs/running-jobs#generate-jobid for recommendations.
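To make the job id suggestion concrete, here is a minimal sketch of a prefixed, deterministic job id. The helper name `generate_job_id`, its parameters, and the exact layout of the id are assumptions for illustration only; they are not the format proposed in the issue and not an existing Airflow API. The character filter reflects the documented BigQuery rule that job ids may contain only letters, digits, underscores and dashes, up to 1,024 characters.

```python
import hashlib
import json
import re
from datetime import datetime, timezone


def generate_job_id(dag_id, task_id, execution_date, configuration, prefix="airflow_"):
    """Build a deterministic job id (hypothetical helper, not part of Airflow)."""
    # Hash the job configuration so identical inputs always map to the same id.
    config_hash = hashlib.md5(
        json.dumps(configuration, sort_keys=True).encode("utf-8")
    ).hexdigest()
    raw = f"{prefix}{dag_id}_{task_id}_{execution_date.isoformat()}_{config_hash}"
    # Replace anything outside letters, digits, underscores and dashes,
    # then cap the length at 1024 characters.
    return re.sub(r"[^0-9a-zA-Z_\-]+", "_", raw)[:1024]


if __name__ == "__main__":
    when = datetime(2020, 6, 9, tzinfo=timezone.utc)
    print(generate_job_id("my_dag", "load", when, {"query": {"query": "SELECT 1"}}))
```

Because the id depends only on the inputs, a retried task would produce the same id, which is what makes it usable for looking a job up before any response has arrived.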
[GitHub] [airflow] mik-laj commented on a change in pull request #9095: [WIP]Add readonly connection endpoints
mik-laj commented on a change in pull request #9095: URL: https://github.com/apache/airflow/pull/9095#discussion_r437559058

## File path: airflow/api_connexion/openapi/v1.yaml ##
@@ -1146,14 +1146,19 @@ components:
           type: string
         conn_type:
           type: string
+          nullable: true

Review comment: OK. Can you rebase? My change has been merged.
[GitHub] [airflow] potiuk commented on a change in pull request #9162: Improve production image iteration speed
potiuk commented on a change in pull request #9162: URL: https://github.com/apache/airflow/pull/9162#discussion_r437550994

## File path: scripts/ci/in_container/entrypoint_ci.sh ##
@@ -134,10 +123,6 @@ if [[ ${INTEGRATION_KERBEROS:="false"} == "true" ]]; then
 fi

-# Start MiniCluster
-java -cp "/opt/minicluster-1.1-SNAPSHOT/*" com.ing.minicluster.MiniCluster \

Review comment: Exactly.
[GitHub] [airflow] shanit-saha opened a new issue #9197: Apache Airflow dagrun_operator in Airflow Version 1.10.10
shanit-saha opened a new issue #9197: URL: https://github.com/apache/airflow/issues/9197

After migrating Airflow from v1.10.2 to v1.10.10, one of our DAGs has a task of the dagrun_operator type. The code of the task looks roughly as shown below. Please assume that the DAG `dag_process_pos` exists.

```
task_trigger_dag_positional = TriggerDagRunOperator(
    trigger_dag_id="dag_process_pos",
    python_callable=set_up_dag_run_preprocessing,
    task_id="trigger_preprocess_dag",
    on_failure_callback=log_failure,
    execution_date=datetime.now(),
    provide_context=False,
    owner='airflow')


def set_up_dag_run_preprocessing(context, dag_run_obj):
    ti = context['ti']
    dag_name = context['ti'].task.trigger_dag_id
    dag_run = context['dag_run']
    trans_id = dag_run.conf['transaction_id']
    routing_info = ti.xcom_pull(task_ids="json_validation", key="route_info")
    new_file_path = routing_info['file_location']
    new_file_name = os.path.basename(routing_info['new_file_name'])
    file_path = os.path.join(new_file_path, new_file_name)
    batch_id = "123-AD-FF"
    dag_run_obj.payload = {'inputfilepath': file_path, 'transaction_id': trans_id, 'Id': batch_id}
```

The DAG runs fine. In fact, the python callable of the task mentioned above runs until its last line. Then it errors out.

```
[2020-06-09 11:36:22,838] {taskinstance.py:1145} ERROR - No row was found for one()
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 983, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.6/site-packages/airflow/operators/dagrun_operator.py", line 95, in execute
    replace_microseconds=False)
  File "/usr/local/lib/python3.6/site-packages/airflow/api/common/experimental/trigger_dag.py", line 141, in trigger_dag
    replace_microseconds=replace_microseconds,
  File "/usr/local/lib/python3.6/site-packages/airflow/api/common/experimental/trigger_dag.py", line 98, in _trigger_dag
    external_trigger=True,
  File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/airflow/models/dag.py", line 1471, in create_dagrun
    run.refresh_from_db()
  File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/airflow/models/dagrun.py", line 109, in refresh_from_db
    DR.run_id == self.run_id
  File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3446, in one
    raise orm_exc.NoResultFound("No row was found for one()")
sqlalchemy.orm.exc.NoResultFound: No row was found for one()
```

After that, the `on_failure_callback` of the task is executed, and all the code of that callable runs as expected. The question is: why did the dagrun_operator fail after the python callable had run?
[GitHub] [airflow] feluelle opened a new pull request #9196: Call super.tearDown in SystemTest tearDown
feluelle opened a new pull request #9196: URL: https://github.com/apache/airflow/pull/9196

`TestCase.setUp` and `tearDown` are empty functions, but I think it does no harm to change this accordingly.

---
Make sure to mark the boxes below before creating PR: [x]

- [ ] Description above provides context of the change
- [ ] Unit tests coverage for changes (not needed for documentation changes)
- [ ] Target Github ISSUE in description if exists
- [ ] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
- [ ] Relevant documentation is updated including usage instructions.
- [ ] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).

---
In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.
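As a rough illustration of why calling `super()` in `setUp`/`tearDown` is a reasonable habit even when `unittest.TestCase` itself does nothing there, the sketch below uses a made-up intermediate base class (`TrackingBase`) and a made-up test class (`ExampleSystemTest`); neither name comes from the pull request. The point is only that once a class in the hierarchy does real work in these hooks, subclasses that skip `super()` would silently lose it.

```python
import unittest

# Records the order in which the hooks run (illustration only).
EVENTS = []


class TrackingBase(unittest.TestCase):
    """Hypothetical base class that does real work in setUp/tearDown."""

    def setUp(self):
        super().setUp()
        EVENTS.append("base setUp")

    def tearDown(self):
        EVENTS.append("base tearDown")
        super().tearDown()


class ExampleSystemTest(TrackingBase):
    def setUp(self):
        super().setUp()  # run the base set-up first
        EVENTS.append("child setUp")

    def tearDown(self):
        EVENTS.append("child tearDown")
        super().tearDown()  # then let the base class clean up

    def test_nothing(self):
        EVENTS.append("test")


if __name__ == "__main__":
    unittest.main()
```

Tearing down in the reverse order of set-up (child first, then base) mirrors how resources are usually released.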
[GitHub] [airflow] mik-laj merged pull request #9187: Set conn_type as not-nullable
mik-laj merged pull request #9187: URL: https://github.com/apache/airflow/pull/9187
[GitHub] [airflow] casassg commented on a change in pull request #8962: [AIRFLOW-8057] [AIP-31] Add @task decorator
casassg commented on a change in pull request #8962: URL: https://github.com/apache/airflow/pull/8962#discussion_r432601178

## File path: airflow/operators/python.py ##
@@ -145,6 +149,131 @@ def execute_callable(self):
         return self.python_callable(*self.op_args, **self.op_kwargs)


+class _PythonFunctionalOperator(BaseOperator):
+    """
+    Wraps a Python callable and captures args/kwargs when called for execution.
+
+    :param python_callable: A reference to an object that is callable
+    :type python_callable: python callable
+    :param op_kwargs: a dictionary of keyword arguments that will get unpacked
+        in your function
+    :type op_kwargs: dict (templated)
+    :param op_args: a list of positional arguments that will get unpacked when
+        calling your callable
+    :type op_args: list (templated)
+    :param multiple_outputs: if set, function return value will be
+        unrolled to multiple XCom values. Dict will unroll to xcom values with keys as keys.
+        Defaults to False.
+    :type multiple_outputs: bool
+    """
+
+    template_fields = ('op_args', 'op_kwargs')
+    ui_color = '#ffefeb'
+
+    # since we won't mutate the arguments, we should just do the shallow copy
+    # there are some cases we can't deepcopy the objects(e.g protobuf).
+    shallow_copy_attrs = ('python_callable',)
+
+    @apply_defaults
+    def __init__(
+        self,
+        python_callable: Callable,
+        op_args: Tuple[Any],
+        op_kwargs: Dict[str, Any],
+        multiple_outputs: bool = False,
+        **kwargs
+    ) -> None:
+        kwargs['task_id'] = self._get_unique_task_id(kwargs['task_id'], kwargs.get('dag', None))
+        super().__init__(**kwargs)
+        self.python_callable = python_callable
+
+        # Check that arguments can be binded
+        signature(python_callable).bind(*op_args, **op_kwargs)
+        self.multiple_outputs = multiple_outputs
+        self.op_args = op_args
+        self.op_kwargs = op_kwargs
+
+    @staticmethod
+    def _get_unique_task_id(task_id: str, dag: Optional[DAG]) -> str:
+        dag = dag or DagContext.get_current_dag()
+        if not dag or task_id not in dag.task_ids:
+            return task_id
+        core = re.split(r'__\d+$', task_id)[0]
+        suffixes = sorted(
+            [int(re.split(r'^.+__', task_id)[1])
+             for task_id in dag.task_ids
+             if re.match(rf'^{core}__\d+$', task_id)]
+        )
+        if not suffixes:
+            return f'{core}__1'
+        return f'{core}__{suffixes[-1] + 1}'
+
+    @staticmethod
+    def validate_python_callable(python_callable):
+        """Validate that python callable can be wrapped by operator.
+        Raises exception if invalid.
+
+        :param python_callable: Python object to be validated
+        :raises: TypeError, AirflowException
+        """
+        if not callable(python_callable):
+            raise TypeError('`python_callable` param must be callable')
+        if 'self' in signature(python_callable).parameters.keys():
+            raise AirflowException('@task does not support methods')

Review comment: It seems `ismethod` does not work (see https://stackoverflow.com/questions/47599749/check-if-function-belongs-to-a-class). I'm keeping it as is, since the `__qualname__` approach may involve a more difficult setup and would not accept functions defined in classes that are not methods.
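The behaviour discussed in the review comment above can be checked with a small standalone sketch. The class `Example`, the function `plain`, and the helper `looks_like_method` are made up for illustration; only `inspect.ismethod` and `inspect.signature` are real standard-library calls. It shows that a function accessed through its class is not reported as a method by `inspect.ismethod`, while the `self`-parameter heuristic from the diff does catch it, and that `__qualname__` also contains the class name for a static method.

```python
import inspect
from inspect import signature


class Example:
    def method(self):
        return "needs an instance"

    @staticmethod
    def helper(x):
        return x


def plain(x):
    return x


def looks_like_method(func):
    """Heuristic from the diff: a parameter named `self` suggests a method."""
    return "self" in signature(func).parameters


if __name__ == "__main__":
    # Accessed via the class, `method` is just a function object.
    print(inspect.ismethod(Example.method))    # False
    # Only the bound form is reported as a method.
    print(inspect.ismethod(Example().method))  # True
    print(looks_like_method(Example.method))   # True
    print(looks_like_method(plain))            # False
    # A static method has the class in its __qualname__ but is not a method.
    print(Example.helper.__qualname__)         # Example.helper
```

The heuristic is name-based, so a plain function that happens to take a parameter called `self` would also be rejected; that trade-off is inherent in the approach.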
[airflow] 01/02: Improved cloud tool available in the trimmed down CI container (#9167)
This is an automated email from the ASF dual-hosted git repository. potiuk pushed a commit to branch v1-10-test in repository https://gitbox.apache.org/repos/asf/airflow.git commit 5661dc95a7e157b612fe817057048486275d735a Author: Jarek Potiuk AuthorDate: Tue Jun 9 09:33:16 2020 +0200 Improved cloud tool available in the trimmed down CI container (#9167) * Improved cloud tool available in the trimmed down CI container The tools now have shebangs which make them available for python tools. Also /opt/airflow is now mounted from the host Airflow sources which makes it possible for the tools to copy files directly to/from the sources of Airflow. It also contains one small change for Linux users - the files created by docker gcloud are created with root user so in order to fix that the directories mounted from the host are fixed when you exit the tool - their ownership is changed to be owned by the host user (cherry picked from commit de9d3401f9668c703a8f0665769fd17268f607ed) --- .pre-commit-config.yaml | 2 +- BREEZE.rst| 3 + Dockerfile.ci | 21 + breeze| 1 + requirements/requirements-python2.7.txt | 79 +- requirements/requirements-python3.5.txt | 95 +++--- requirements/requirements-python3.6.txt | 99 --- requirements/requirements-python3.7.txt | 97 +++--- requirements/setup-2.7.md5| 2 +- requirements/setup-3.5.md5| 2 +- requirements/setup-3.6.md5| 2 +- requirements/setup-3.7.md5| 2 +- scripts/ci/ci_docs.sh | 3 + scripts/ci/ci_fix_ownership.sh| 3 + scripts/ci/ci_flake8.sh | 6 ++ scripts/ci/ci_mypy.sh | 3 + scripts/ci/docker-compose/forward-credentials.yml | 2 + scripts/ci/docker-compose/local-prod.yml | 2 + scripts/ci/docker-compose/local.yml | 2 + scripts/ci/in_container/_in_container_utils.sh| 16 ++-- scripts/ci/libraries/_initialization.sh | 7 ++ scripts/ci/libraries/_runs.sh | 6 ++ scripts/ci/prepare_tool_scripts.sh| 64 +++ 23 files changed, 308 insertions(+), 211 deletions(-) diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 4979f24..7e49b77 100644 
--- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -60,7 +60,7 @@ repos: - id: insert-license name: Add license for all JS/CSS files files: \.(js|css)$ -exclude: ^\.github/.*$ +exclude: ^\.github/.*$|^airflow/www/static/.*|^airflow/www_rbac/static/.*$ args: - --comment-style - "/**| *| */" diff --git a/BREEZE.rst b/BREEZE.rst index aec975c..3a1a6a0 100644 --- a/BREEZE.rst +++ b/BREEZE.rst @@ -243,6 +243,9 @@ it is downloaded, it will stay until you remove the downloaded images from your For each of those CLI credentials are taken (automatically) from the credentials you have defined in your ${HOME} directory on host. +Those tools also have host Airflow source directory mounted in /opt/airflow path +so you can directly transfer files to/from your airflow host sources. + Those are currently installed CLIs (they are available as aliases to the docker commands): +---+--+-+---+ diff --git a/Dockerfile.ci b/Dockerfile.ci index 8035fa2..8051431 100644 --- a/Dockerfile.ci +++ b/Dockerfile.ci @@ -307,26 +307,7 @@ RUN if [[ -n "${ADDITIONAL_PYTHON_DEPS}" ]]; then \ pip install ${ADDITIONAL_PYTHON_DEPS}; \ fi -RUN \ -AWSCLI_IMAGE="amazon/aws-cli:latest" && \ -AZURECLI_IMAGE="mcr.microsoft.com/azure-cli:latest" && \ -GCLOUD_IMAGE="gcr.io/google.com/cloudsdktool/cloud-sdk:latest" && \ -echo "docker run --rm -it -v \${HOST_HOME}/.aws:/root/.aws ${AWSCLI_IMAGE} \"\$@\"" \ -> /usr/bin/aws && \ -echo "docker pull ${AWSCLI_IMAGE}" > /usr/bin/aws-update && \ -echo "docker run --rm -it -v \${HOST_HOME}/.azure:/root/.azure ${AZURECLI_IMAGE} \"\$@\"" \ -> /usr/bin/az && \ -echo "docker pull ${AZURECLI_IMAGE}" > /usr/bin/az-update && \ -echo "docker run --rm -it -v \${HOST_HOME}/.config:/root/.config ${GCLOUD_IMAGE} bq \"\$@\"" \ -> /usr/bin/bq && \ -echo "docker pull ${GCLOUD_IMAGE}" > /usr/bin/bq-update && \ -echo "docker run --rm -it -v \${HOST_HOME}/.config:/root/.config ${GCLOUD_IMAGE} gcloud \"\$@\"" \ -> /usr/bin/gcloud && \ -echo "docker pull ${GCLOUD_IMAGE}" > 
/usr/bin/gcloud-update && \ -echo "docker run --rm -it -v
[airflow] 02/02: Chown should work now when building the documentation (#8600)
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v1-10-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit f21618bdae4ba157929477bfb96819b5caa15a78
Author: Jarek Potiuk
AuthorDate: Tue Apr 28 13:03:08 2020 +0200

    Chown should work now when building the documentation (#8600)

    (cherry picked from commit 1291ded55f9708d464947d878e61945c1b2ad3f3)
---
 .github/workflows/ci.yml |  3 +++
 docs/build               | 18 +++++++++---------
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index c48a7ac..21dbcb4 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -78,6 +78,9 @@ jobs:
       CI_JOB_TYPE: "Documentation"
     steps:
       - uses: actions/checkout@master
+      - uses: actions/setup-python@v1
+        with:
+          python-version: '3.6'
       - name: "Build CI image ${{ matrix.python-version }}"
         run: ./scripts/ci/ci_prepare_image_on_ci.sh
       - name: "Build docs"
diff --git a/docs/build b/docs/build
index 3b51212..f9856e9 100755
--- a/docs/build
+++ b/docs/build
@@ -78,12 +78,12 @@ def prepare_directories() -> None:
         # The _api folder should be deleted by then but just in case we should change the ownership
         host_user_id = os.environ["HOST_USER_ID"]
         host_group_id = os.environ["HOST_GROUP_ID"]
-        print(f"Changing ownership of docs/_build folder back to ${host_user_id}:${host_group_id}")
-        run(["sudo", "chown", f'"${host_user_id}":"${host_group_id}"', "_build"], check=True)
+        print(f"Changing ownership of docs/_build folder back to {host_user_id}:{host_group_id}")
+        run(["sudo", "chown", "-R", f'{host_user_id}:{host_group_id}', "_build"], check=True)
         if os.path.exists("_api"):
-            run(["sudo", "chown", f'"${host_user_id}":"${host_group_id}"', "_api"], check=True)
+            run(["sudo", "chown", "-R", f'{host_user_id}:{host_group_id}', "_api"], check=True)

-        print(f"Changed ownership of docs/_build folder back to ${host_user_id}:${host_group_id}")
+        print(f"Changed ownership of docs/_build folder back to {host_user_id}:{host_group_id}")

     atexit.register(restore_ownership)

@@ -198,11 +198,11 @@ def parse_sphinx_warnings(warning_text: str) -> List[DocBuildError]:
     for sphinx_warning in warning_text.split("\n"):
         if not sphinx_warning:
             continue
-        warining_parts = sphinx_warning.split(":", 2)
-        if len(warining_parts) == 3:
+        warning_parts = sphinx_warning.split(":", 2)
+        if len(warning_parts) == 3:
             sphinx_build_errors.append(
                 DocBuildError(
-                    file_path=warining_parts[0], line_no=int(warining_parts[1]), message=warining_parts[2]
+                    file_path=warning_parts[0], line_no=int(warning_parts[1]), message=warning_parts[2]
                 )
             )
         else:
@@ -239,8 +239,8 @@ def build_sphinx_docs() -> None:
         warning_text = tmp_file.read().decode()
         # Remove 7-bit C1 ANSI escape sequences
         warning_text = re.sub(r"\x1B[@-_][0-?]*[ -/]*[@-~]", "", warning_text)
-        sphinx_build_errrors = parse_sphinx_warnings(warning_text)
-        build_errors.extend(sphinx_build_errrors)
+        sphinx_build_errors = parse_sphinx_warnings(warning_text)
+        build_errors.extend(sphinx_build_errors)

     print("Current working directory: ", os.getcwd())
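The chown fix in this commit comes down to two details of how Python builds the argument: `$` has no meaning inside an f-string, and quotes placed inside an argument list are passed to the program literally because no shell parses them. The sketch below reproduces both with made-up ids; the helper `chown_command` is hypothetical and only builds the argument list, it does not run anything.

```python
host_user_id = "1000"   # example values, normally read from the environment
host_group_id = "1000"

# Old form: the `$` and the double quotes end up in the argument verbatim,
# so chown would receive the owner spec "$1000":"$1000" (quotes included).
broken = f'"${host_user_id}":"${host_group_id}"'

# New form: plain uid:gid, which is what chown expects.
fixed = f"{host_user_id}:{host_group_id}"


def chown_command(owner, path):
    """Build the argument list used for a recursive chown (illustrative helper)."""
    return ["sudo", "chown", "-R", owner, path]


if __name__ == "__main__":
    print(broken)
    print(fixed)
    print(chown_command(fixed, "_build"))
```

With a list passed to `subprocess.run`, each element is one argument, so shell-style quoting and `${...}` expansion are neither needed nor interpreted.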
[GitHub] [airflow] mik-laj commented on a change in pull request #9170: [WIP] Read only endpoint for XCom #8134
mik-laj commented on a change in pull request #9170: URL: https://github.com/apache/airflow/pull/9170#discussion_r437084427

## File path: airflow/api_connexion/schemas/xcom_schema.py ##
@@ -0,0 +1,67 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from marshmallow import post_dump
+from marshmallow_sqlalchemy import SQLAlchemySchema, auto_field
+
+from airflow.models import XCom
+
+
+class XComCollectionItemSchema(SQLAlchemySchema):
+
+    class Meta:
+        """ Meta """
+        model = XCom
+
+    COLLECTION_NAME = 'xcom_entries'
+    FIELDS_FROM_NONE_TO_EMPTY_STRING = ['key', 'task_id', 'dag_id']

Review comment: These fields are primary keys, so they must always have a value. Why do you want them to be empty?
[GitHub] [airflow] SamWheating commented on a change in pull request #9159: Adding Google Deployment Manager Hook
SamWheating commented on a change in pull request #9159: URL: https://github.com/apache/airflow/pull/9159#discussion_r436680469

## File path: airflow/providers/google/cloud/hooks/gdm.py ##
@@ -0,0 +1,103 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+from typing import Any, Dict, List, Optional
+
+from googleapiclient.discovery import build
+
+from airflow.exceptions import AirflowException
+from airflow.providers.google.common.hooks.base_google import GoogleBaseHook
+
+
+class GoogleDeploymentManagerHook(GoogleBaseHook):  # pylint: disable=abstract-method
+    """
+    Interact with Google Cloud Deployment Manager using the Google Cloud Platform connection.
+    This allows for scheduled and programatic inspection and deletion fo resources managed by GDM.
+    """
+
+    def __init__(self, gcp_conn_id='google_cloud_default', delegate_to=None):
+        super(GoogleDeploymentManagerHook, self).__init__(gcp_conn_id, delegate_to=delegate_to)
+
+    def get_conn(self):
+        """
+        Returns a Google Deployment Manager service object.
+
+        :rtype: googleapiclient.discovery.Resource
+        """
+        http_authorized = self._authorize()
+        return build('deploymentmanager', 'v2', http=http_authorized, cache_discovery=False)
+
+    @GoogleBaseHook.fallback_to_default_project_id
+    def list_deployments(self, project_id: Optional[str] = None,  # pylint: disable=too-many-arguments
+                         deployment_filter: Optional[str] = None,
+                         max_results: Optional[int] = None,
+                         order_by: Optional[str] = None,
+                         page_token: Optional[str] = None) -> List[Dict[str, Any]]:
+        """
+        Lists deployments in a google cloud project.
+
+        :param project_id: The project ID for this request.
+        :type project_id: str
+        :param deployment_filter: A filter expression which limits resources returned in the response.
+        :type filter: str
+        :param max_results: The maximum number of results to return
+        :type max_results: Optional[int]
+        :param order_by: A field name to order by, ex: "creationTimestamp desc"
+        :type order_by: Optional[str]
+        :param page_token: specifies a page_token to use
+        :type page_token: str

Review comment: This option comes straight from the [API](https://cloud.google.com/deployment-manager/docs/reference/latest/deployments/list#parameters):

_maxResults, unsigned integer:_ _The maximum number of results per page that should be returned. If the number of available results is larger than maxResults, Compute Engine returns a nextPageToken that can be used to get the next page of results in subsequent list requests. Acceptable values are 0 to 500, inclusive. (Default: 500)_

So there's no way to get all of the entries in a single request, though I would assume it's super rare to have >500 deployments in the same project. Can we leave this param in, just so the hook doesn't become unusable past 500 deployments?
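The paging behaviour quoted from the API docs in the review comment above (results capped per page, with a `nextPageToken` returned when more are available) is usually consumed with a loop like the one sketched below. This is only an illustration under assumptions: `iterate_deployments` and `fake_list_page` are made-up names, the `deployments` response key is assumed, and the fake function stands in for the real API call so the sketch runs without credentials.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iterate_deployments(
    list_page: Callable[[Optional[str]], Dict[str, Any]]
) -> Iterator[Dict[str, Any]]:
    """Yield every item by following nextPageToken until it is absent."""
    page_token = None
    while True:
        response = list_page(page_token)
        # Assumed response shape: items under "deployments".
        yield from response.get("deployments", [])
        page_token = response.get("nextPageToken")
        if not page_token:
            break


def fake_list_page(page_token: Optional[str]) -> Dict[str, Any]:
    """Stand-in for the real list call; serves two pages of made-up data."""
    pages = {
        None: {"deployments": [{"name": "a"}, {"name": "b"}], "nextPageToken": "t1"},
        "t1": {"deployments": [{"name": "c"}]},
    }
    return pages[page_token]


if __name__ == "__main__":
    print([d["name"] for d in iterate_deployments(fake_list_page)])
```

Keeping `max_results` and `page_token` in the hook's signature leaves room for this kind of loop, either inside the hook or in the calling code.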
[airflow] branch v1-10-test updated (4b8fbf2 -> f21618b)
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a change to branch v1-10-test
in repository https://gitbox.apache.org/repos/asf/airflow.git.

    omit 4b8fbf2  Improved cloud tool available in the trimmed down CI container (#9167)
     new 5661dc9  Improved cloud tool available in the trimmed down CI container (#9167)
     new f21618b  Chown should work now when building the documentation (#8600)

This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this:

 * -- * -- B -- O -- O -- O   (4b8fbf2)
            \
             N -- N -- N   refs/heads/v1-10-test (f21618b)

You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B.

Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever.

The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.

Summary of changes:
 .github/workflows/ci.yml                |  3 +
 .pre-commit-config.yaml                 |  2 +-
 docs/build                              | 18 +++---
 requirements/requirements-python2.7.txt | 79 +-
 requirements/requirements-python3.5.txt | 95 ---
 requirements/requirements-python3.6.txt | 99 +
 requirements/requirements-python3.7.txt | 97
 requirements/setup-2.7.md5              |  2 +-
 requirements/setup-3.5.md5              |  2 +-
 requirements/setup-3.6.md5              |  2 +-
 requirements/setup-3.7.md5              |  2 +-
 11 files changed, 206 insertions(+), 195 deletions(-)
[GitHub] [airflow] potiuk commented on pull request #9029: Remove Hive/Hadoop/Java dependency from unit tests
potiuk commented on pull request #9029: URL: https://github.com/apache/airflow/pull/9029#issuecomment-641191536 Hey @jhtimmins -> I cherry-picked your changes to https://github.com/apache/airflow/commit/d4a00b7c69d5d1c4514bc70b2a0022f36b85a06b I'd love if you can take a look and see if I figured the cherry-picking well :)
[GitHub] [airflow] leahecole commented on pull request #9174: Don't use the term "whitelist" - language matters
leahecole commented on pull request #9174: URL: https://github.com/apache/airflow/pull/9174#issuecomment-640865479 I was just going to make this PR - so glad you beat me to it!!
[GitHub] [airflow] crhyatt opened a new pull request #8500: [Airflow-3391] Upgrade pendulum to latest major version
crhyatt opened a new pull request #8500: URL: https://github.com/apache/airflow/pull/8500

---
Make sure to mark the boxes below before creating PR: [x]

- [x] Description above provides context of the change
- [x] Unit tests coverage for changes (not needed for documentation changes)
- [x] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
- [x] Relevant documentation is updated including usage instructions.
- [x] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).

---
In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.