[jira] [Created] (AIRFLOW-3592) Logs cannot be viewed while in rescheduled state
Stefan Seelmann created AIRFLOW-3592:

Summary: Logs cannot be viewed while in rescheduled state
Key: AIRFLOW-3592
URL: https://issues.apache.org/jira/browse/AIRFLOW-3592
Project: Apache Airflow
Issue Type: Sub-task
Components: webserver
Affects Versions: 1.10.1
Reporter: Stefan Seelmann
Fix For: 1.10.2, 2.0.0

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] seelmann commented on issue #3596: [AIRFLOW-2747] Explicit re-schedule of sensors
seelmann commented on issue #3596: [AIRFLOW-2747] Explicit re-schedule of sensors
URL: https://github.com/apache/incubator-airflow/pull/3596#issuecomment-450479933

I created several sub-tasks in https://issues.apache.org/jira/browse/AIRFLOW-2747

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
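For readers following AIRFLOW-2747: the core idea of an explicit re-schedule is that a sensor stops occupying a worker slot between pokes. Instead of sleeping inside the task, it raises a signal that the scheduler interprets as "try this task again later". The sketch below illustrates that mechanism in plain Python only; `RescheduleSignal` and `run_sensor` are hypothetical names, not Airflow's actual API (Airflow exposes this as the sensor's `mode="reschedule"` setting).

```python
class RescheduleSignal(Exception):
    """Raised by a sensor poke to hand the worker slot back to the scheduler."""
    def __init__(self, retry_in):
        super().__init__("reschedule in {}s".format(retry_in))
        self.retry_in = retry_in


def run_sensor(poke, retry_in=60):
    """Run a single poke. On failure, raise instead of sleeping so the
    executor can mark the task as waiting and free the slot."""
    if poke():
        return "success"
    raise RescheduleSignal(retry_in)
```

The point of the design is that a long-running wait costs one short poke per interval rather than one blocked worker process for the whole wait.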
[jira] [Created] (AIRFLOW-3591) Fix start date, end date, duration for rescheduled tasks
Stefan Seelmann created AIRFLOW-3591:

Summary: Fix start date, end date, duration for rescheduled tasks
Key: AIRFLOW-3591
URL: https://issues.apache.org/jira/browse/AIRFLOW-3591
Project: Apache Airflow
Issue Type: Sub-task
Components: webserver
Affects Versions: 1.10.1
Reporter: Stefan Seelmann
Fix For: 1.10.2, 2.0.0
[GitHub] odracci commented on issue #3770: [AIRFLOW-3281] Fix Kubernetes operator with git-sync
odracci commented on issue #3770: [AIRFLOW-3281] Fix Kubernetes operator with git-sync
URL: https://github.com/apache/incubator-airflow/pull/3770#issuecomment-450485058

@dimberman I mentioned it in https://github.com/apache/incubator-airflow/pull/3770/files#diff-bbf16e7665ac448883f2ceeb40db35cdR624
[jira] [Created] (AIRFLOW-3594) Update License Headers in Python Files
Felix Uellendall created AIRFLOW-3594:

Summary: Update License Headers in Python Files
Key: AIRFLOW-3594
URL: https://issues.apache.org/jira/browse/AIRFLOW-3594
Project: Apache Airflow
Issue Type: Task
Reporter: Felix Uellendall
Assignee: Felix Uellendall

Some Python Files still have an old version of the Apache License.
[jira] [Commented] (AIRFLOW-3594) Unify different License Header
[ https://issues.apache.org/jira/browse/AIRFLOW-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730667#comment-16730667 ]

ASF GitHub Bot commented on AIRFLOW-3594:

feluelle commented on pull request #4399: [AIRFLOW-3594] Unify different License Header
URL: https://github.com/apache/incubator-airflow/pull/4399

Make sure you have checked _all_ steps below.

### Jira
- [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-3594
  - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.

### Description
- [x] Here are some details about my PR, including screenshots of any UI changes:
  Some files have an old version of the Apache License. This PR updates these and so unifies the license header for all files.

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:

### Commits
- [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation
- [x] In case of new functionality, my PR adds documentation that describes how to use it.
  - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.
  - All the public functions and the classes in the PR contain docstrings that explain what it does

### Code Quality
- [x] Passes `flake8`

> Unify different License Header
> ------------------------------
>
>     Key: AIRFLOW-3594
>     URL: https://issues.apache.org/jira/browse/AIRFLOW-3594
>     Project: Apache Airflow
>     Issue Type: Task
>     Reporter: Felix Uellendall
>     Assignee: Felix Uellendall
>     Priority: Trivial
>
> Some Files still have an old version of the Apache License.
[jira] [Assigned] (AIRFLOW-3462) Refactor: Move TaskReschedule out of models.py
[ https://issues.apache.org/jira/browse/AIRFLOW-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Seelmann reassigned AIRFLOW-3462:

Assignee: Stefan Seelmann

> Refactor: Move TaskReschedule out of models.py
> ----------------------------------------------
>
>     Key: AIRFLOW-3462
>     URL: https://issues.apache.org/jira/browse/AIRFLOW-3462
>     Project: Apache Airflow
>     Issue Type: Task
>     Components: models
>     Affects Versions: 1.10.1
>     Reporter: Fokko Driesprong
>     Assignee: Stefan Seelmann
>     Priority: Major
>     Fix For: 2.0.0
[jira] [Created] (AIRFLOW-3590) In case of reschedule executor should not log success
Stefan Seelmann created AIRFLOW-3590:

Summary: In case of reschedule executor should not log success
Key: AIRFLOW-3590
URL: https://issues.apache.org/jira/browse/AIRFLOW-3590
Project: Apache Airflow
Issue Type: Sub-task
Components: executor
Reporter: Stefan Seelmann
Fix For: 1.10.2, 2.0.0
[jira] [Updated] (AIRFLOW-3593) Allow '@' in usernames.
[ https://issues.apache.org/jira/browse/AIRFLOW-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

will-beta updated AIRFLOW-3593:

Description: The username provided by *Azure Database for PostgreSQL server* contains an '@', but *sql_alchemy_conn* and *result_backend* in *airflow.cfg* do not allow it.

{panel:title=exception info}
(virtualenv-airflow) AirFlowTest@AirFlowTest:~/airflow$ airflow initdb
[2018-12-29 11:00:40,925] {settings.py:174} INFO - setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800
[2018-12-29 11:00:41,418] {__init__.py:51} INFO - Using executor CeleryExecutor
DB: postgresql+psycopg2://admin%40pg-test1:***@pg-test1.postgres.database.chinacloudapi.cn/airflow
[2018-12-29 11:00:41,620] {db.py:338} INFO - Creating tables
Traceback (most recent call last):
  File "/home/AirFlowTest/virtualenv-airflow/bin/airflow", line 32, in <module>
    args.func(args)
  File "/home/AirFlowTest/virtualenv-airflow/local/lib/python2.7/site-packages/airflow/bin/cli.py", line 1011, in initdb
    db_utils.initdb(settings.RBAC)
  File "/home/AirFlowTest/virtualenv-airflow/local/lib/python2.7/site-packages/airflow/utils/db.py", line 92, in initdb
    upgradedb()
  File "/home/AirFlowTest/virtualenv-airflow/local/lib/python2.7/site-packages/airflow/utils/db.py", line 346, in upgradedb
    command.upgrade(config, 'heads')
  File "/home/AirFlowTest/virtualenv-airflow/local/lib/python2.7/site-packages/alembic/command.py", line 174, in upgrade
    script.run_env()
  File "/home/AirFlowTest/virtualenv-airflow/local/lib/python2.7/site-packages/alembic/script/base.py", line 416, in run_env
    util.load_python_file(self.dir, 'env.py')
  File "/home/AirFlowTest/virtualenv-airflow/local/lib/python2.7/site-packages/alembic/util/pyfiles.py", line 93, in load_python_file
    module = load_module_py(module_id, path)
  File "/home/AirFlowTest/virtualenv-airflow/local/lib/python2.7/site-packages/alembic/util/compat.py", line 79, in load_module_py
    mod = imp.load_source(module_id, path, fp)
  File "/home/AirFlowTest/virtualenv-airflow/local/lib/python2.7/site-packages/airflow/migrations/env.py", line 91, in <module>
    run_migrations_online()
  File "/home/AirFlowTest/virtualenv-airflow/local/lib/python2.7/site-packages/airflow/migrations/env.py", line 78, in run_migrations_online
    with connectable.connect() as connection:
  File "/home/AirFlowTest/virtualenv-airflow/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2091, in connect
    return self._connection_cls(self, **kwargs)
  File "/home/AirFlowTest/virtualenv-airflow/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 90, in __init__
    if connection is not None else engine.raw_connection()
  File "/home/AirFlowTest/virtualenv-airflow/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2177, in raw_connection
    self.pool.unique_connection, _connection)
  File "/home/AirFlowTest/virtualenv-airflow/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2151, in _wrap_pool_connect
    e, dialect, self)
  File "/home/AirFlowTest/virtualenv-airflow/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1465, in _handle_dbapi_exception_noconnection
    exc_info
  File "/home/AirFlowTest/virtualenv-airflow/local/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/home/AirFlowTest/virtualenv-airflow/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2147, in _wrap_pool_connect
    return fn()
  File "/home/AirFlowTest/virtualenv-airflow/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 328, in unique_connection
    return _ConnectionFairy._checkout(self)
  File "/home/AirFlowTest/virtualenv-airflow/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 768, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/home/AirFlowTest/virtualenv-airflow/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 516, in checkout
    rec = pool._do_get()
  File "/home/AirFlowTest/virtualenv-airflow/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 1140, in _do_get
    self._dec_overflow()
  File "/home/AirFlowTest/virtualenv-airflow/local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/home/AirFlowTest/virtualenv-airflow/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 1137, in _do_get
    return self._create_connection()
  File "/home/AirFlowTest/virtualenv-airflow/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 333, in _create_connection
    return _ConnectionRecord(self)
  File "/home/AirFlowTest/virtualenv-airflow/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 461, in __init__
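For context on AIRFLOW-3593: the conventional way to put a username like `admin@pg-test1` into a SQLAlchemy URI is to percent-encode it, since an unescaped '@' would terminate the credentials part of the URL. The reporter's log shows the username already encoded as `admin%40pg-test1` and `initdb` still failing, so the snippet below only illustrates the standard encoding convention, not the fix for the reported bug; `build_conn_uri` is a hypothetical helper, and `quote_plus` is from the Python standard library.

```python
from urllib.parse import quote_plus


def build_conn_uri(user, password, host, db):
    """Percent-encode credentials before embedding them in a SQLAlchemy URI.

    '@' becomes %40 and ':' becomes %3A, so only one routable '@' separator
    remains in the final URI.
    """
    return "postgresql+psycopg2://{}:{}@{}/{}".format(
        quote_plus(user), quote_plus(password), host, db)


uri = build_conn_uri("admin@pg-test1", "p@ss:word",
                     "pg-test1.postgres.database.chinacloudapi.cn", "airflow")
```

The resulting string can be used directly as `sql_alchemy_conn` in `airflow.cfg`, assuming the consuming code parses percent-encoded URIs correctly.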
[GitHub] stale[bot] commented on issue #3605: [AIRFLOW-1238] Decode URL-encoded characters.
stale[bot] commented on issue #3605: [AIRFLOW-1238] Decode URL-encoded characters.
URL: https://github.com/apache/incubator-airflow/pull/3605#issuecomment-450503386

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
[GitHub] jmcarp opened a new pull request #4401: [AIRFLOW-3596] Clean up undefined template variables.
jmcarp opened a new pull request #4401: [AIRFLOW-3596] Clean up undefined template variables.
URL: https://github.com/apache/incubator-airflow/pull/4401

Make sure you have checked _all_ steps below.

### Jira
- [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-3596
  - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.

### Description
- [x] Here are some details about my PR, including screenshots of any UI changes:

### Tests
- [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:

### Commits
- [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation
- [x] In case of new functionality, my PR adds documentation that describes how to use it.
  - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.
  - All the public functions and the classes in the PR contain docstrings that explain what it does

### Code Quality
- [x] Passes `flake8`
[jira] [Updated] (AIRFLOW-3594) Unify different License Header
[ https://issues.apache.org/jira/browse/AIRFLOW-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felix Uellendall updated AIRFLOW-3594:

Description: Some Files still have an old version of the Apache License. (was: Some Python Files still have an old version of the Apache License.)

> Unify different License Header
> ------------------------------
>
>     Key: AIRFLOW-3594
>     URL: https://issues.apache.org/jira/browse/AIRFLOW-3594
>     Project: Apache Airflow
>     Issue Type: Task
>     Reporter: Felix Uellendall
>     Assignee: Felix Uellendall
>     Priority: Trivial
>
> Some Files still have an old version of the Apache License.
[GitHub] feluelle opened a new pull request #4400: [AIRFLOW-3595] Add tests for Hive2SambaOperator
feluelle opened a new pull request #4400: [AIRFLOW-3595] Add tests for Hive2SambaOperator
URL: https://github.com/apache/incubator-airflow/pull/4400

Make sure you have checked _all_ steps below.

### Jira
- [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-3595
  - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.

### Description
- [x] Here are some details about my PR, including screenshots of any UI changes:
  - adds missing doc parameter destination_filepath
  - adds missing file close for tmp file (through context manager usage)
  - refactoring

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:

### Commits
- [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation
- [x] In case of new functionality, my PR adds documentation that describes how to use it.
  - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.
  - All the public functions and the classes in the PR contain docstrings that explain what it does

### Code Quality
- [x] Passes `flake8`
[GitHub] chinhngt commented on issue #4389: [AIRFLOW-3583] Fix AirflowException import
chinhngt commented on issue #4389: [AIRFLOW-3583] Fix AirflowException import
URL: https://github.com/apache/incubator-airflow/pull/4389#issuecomment-450495458

@jgao54 Thanks for taking a look. I must have missed something then. Below is the exception I got when turning remote logging to wasb on:

webserver_1 | Unable to load the config, contains a configuration error.
webserver_1 | Traceback (most recent call last):
webserver_1 |   File "/usr/lib/python3.5/logging/config.py", line 382, in resolve
webserver_1 |     found = getattr(found, frag)
webserver_1 | AttributeError: module 'airflow.utils.log' has no attribute 'wasb_task_handler'
webserver_1 |
webserver_1 | During handling of the above exception, another exception occurred:
webserver_1 |
webserver_1 | Traceback (most recent call last):
webserver_1 |   File "/usr/lib/python3.5/logging/config.py", line 384, in resolve
webserver_1 |     self.importer(used)
webserver_1 |   File "/usr/local/lib/python3.5/dist-packages/airflow/utils/log/wasb_task_handler.py", line 23, in <module>
webserver_1 |     from airflow.contrib.hooks.wasb_hook import WasbHook
webserver_1 |   File "/usr/local/lib/python3.5/dist-packages/airflow/contrib/hooks/wasb_hook.py", line 21, in <module>
webserver_1 |     from airflow import AirflowException
webserver_1 | ImportError: cannot import name 'AirflowException'
webserver_1 |
webserver_1 | The above exception was the direct cause of the following exception:
webserver_1 |
webserver_1 | Traceback (most recent call last):
webserver_1 |   File "/usr/lib/python3.5/logging/config.py", line 558, in configure
webserver_1 |     handler = self.configure_handler(handlers[name])
webserver_1 |   File "/usr/lib/python3.5/logging/config.py", line 708, in configure_handler
webserver_1 |     klass = self.resolve(cname)
webserver_1 |   File "/usr/lib/python3.5/logging/config.py", line 391, in resolve
webserver_1 |     raise v
webserver_1 |   File "/usr/lib/python3.5/logging/config.py", line 384, in resolve
webserver_1 |     self.importer(used)
webserver_1 |   File "/usr/local/lib/python3.5/dist-packages/airflow/utils/log/wasb_task_handler.py", line 23, in <module>
webserver_1 |     from airflow.contrib.hooks.wasb_hook import WasbHook
webserver_1 |   File "/usr/local/lib/python3.5/dist-packages/airflow/contrib/hooks/wasb_hook.py", line 21, in <module>
webserver_1 |     from airflow import AirflowException
webserver_1 | ValueError: Cannot resolve 'airflow.utils.log.wasb_task_handler.WasbTaskHandler': cannot import name 'AirflowException'
webserver_1 |
webserver_1 | During handling of the above exception, another exception occurred:
webserver_1 |
webserver_1 | Traceback (most recent call last):
webserver_1 |   File "/usr/local/bin/airflow", line 21, in <module>
webserver_1 |     from airflow import configuration
webserver_1 |   File "/usr/local/lib/python3.5/dist-packages/airflow/__init__.py", line 36, in <module>
webserver_1 |     from airflow import settings
webserver_1 |   File "/usr/local/lib/python3.5/dist-packages/airflow/settings.py", line 259, in <module>
webserver_1 |     configure_logging()
webserver_1 |   File "/usr/local/lib/python3.5/dist-packages/airflow/logging_config.py", line 72, in configure_logging
webserver_1 |     raise e
webserver_1 |   File "/usr/local/lib/python3.5/dist-packages/airflow/logging_config.py", line 67, in configure_logging
webserver_1 |     dictConfig(logging_config)
webserver_1 |   File "/usr/lib/python3.5/logging/config.py", line 795, in dictConfig
webserver_1 |     dictConfigClass(config).configure()
webserver_1 |   File "/usr/lib/python3.5/logging/config.py", line 566, in configure
webserver_1 |     '%r: %s' % (name, e))
webserver_1 | ValueError: Unable to configure handler 'processor': Cannot resolve 'airflow.utils.log.wasb_task_handler.WasbTaskHandler': cannot import name 'AirflowException'
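A note on why the traceback above bottoms out in `ValueError: Unable to configure handler 'processor'`: `logging.config.dictConfig` resolves each handler's dotted `class` path by importing it, and any exception raised during that import (here, the circular import of `AirflowException`) is re-raised by `dictConfig` wrapped in a `ValueError`. The minimal sketch below reproduces that wrapping with a deliberately non-importable module path; it is an illustration of the stdlib behavior, not Airflow code.

```python
import logging.config

# Handler whose 'class' points at a module that cannot be imported,
# standing in for wasb_task_handler failing to import AirflowException.
config = {
    "version": 1,
    "handlers": {
        "processor": {"class": "no_such_module.NoSuchHandler"},
    },
}

try:
    logging.config.dictConfig(config)
except ValueError as e:
    # The import failure surfaces as "Unable to configure handler 'processor'",
    # just as in the webserver log above.
    print(e)
```

This is why a circular-import bug deep inside a handler module shows up to the user as a logging configuration error rather than a plain ImportError.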
[GitHub] stale[bot] commented on issue #1964: [AIRFLOW-722] Add Celery queue sensor
stale[bot] commented on issue #1964: [AIRFLOW-722] Add Celery queue sensor
URL: https://github.com/apache/incubator-airflow/pull/1964#issuecomment-450503388

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
[GitHub] stale[bot] commented on issue #2747: AIRFLOW-1772: Fix bug with handling cron expressions as an schedule i…
stale[bot] commented on issue #2747: AIRFLOW-1772: Fix bug with handling cron expressions as an schedule i…
URL: https://github.com/apache/incubator-airflow/pull/2747#issuecomment-450503385

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
[jira] [Updated] (AIRFLOW-3594) Unify different License Header
[ https://issues.apache.org/jira/browse/AIRFLOW-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felix Uellendall updated AIRFLOW-3594:

Summary: Unify different License Header (was: Update License Headers in Python Files)

> Unify different License Header
> ------------------------------
>
>     Key: AIRFLOW-3594
>     URL: https://issues.apache.org/jira/browse/AIRFLOW-3594
>     Project: Apache Airflow
>     Issue Type: Task
>     Reporter: Felix Uellendall
>     Assignee: Felix Uellendall
>     Priority: Trivial
>
> Some Python Files still have an old version of the Apache License.
[jira] [Created] (AIRFLOW-3595) Add tests for HiveToSambaOperator
Felix Uellendall created AIRFLOW-3595:

Summary: Add tests for HiveToSambaOperator
Key: AIRFLOW-3595
URL: https://issues.apache.org/jira/browse/AIRFLOW-3595
Project: Apache Airflow
Issue Type: Test
Reporter: Felix Uellendall
Assignee: Felix Uellendall
[jira] [Resolved] (AIRFLOW-3327) BiqQuery job checking doesn't include location, which api requires outside US/EU
[ https://issues.apache.org/jira/browse/AIRFLOW-3327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaxil Naik resolved AIRFLOW-3327.

Resolution: Fixed
Fix Version/s: 2.0.0

> BiqQuery job checking doesn't include location, which api requires outside US/EU
> --------------------------------------------------------------------------------
>
>     Key: AIRFLOW-3327
>     URL: https://issues.apache.org/jira/browse/AIRFLOW-3327
>     Project: Apache Airflow
>     Issue Type: Bug
>     Reporter: Daniel Swiegers
>     Assignee: Kaxil Naik
>     Priority: Minor
>     Labels: google-cloud-bigquery
>     Fix For: 1.10.2, 2.0.0
>
>     Original Estimate: 24h
>     Remaining Estimate: 24h
>
> We use this api but don't set / pass through the geographical location, which is required in areas other than US and EU.
> Can be seen in contrib/hooks/big_query_hook.py poll_job_complete
> [https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get]
> |The geographic location of the job. Required except for US and EU. See details at https://cloud.google.com/bigquery/docs/locations#specifying_your_location.|
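To make the AIRFLOW-3327 fix concrete: per the `jobs.get` documentation quoted above, polling a BigQuery job outside the US/EU multi-regions must pass the job's location as a query parameter. The sketch below only builds the REST URL, makes no request, and uses placeholder project/job IDs; `job_get_url` is a hypothetical helper, not the hook's actual code.

```python
from urllib.parse import urlencode


def job_get_url(project_id, job_id, location=None):
    """Build the BigQuery REST jobs.get URL, appending the geographic
    location when given (required except for the US and EU multi-regions)."""
    base = ("https://www.googleapis.com/bigquery/v2/"
            "projects/{}/jobs/{}".format(project_id, job_id))
    if location:  # e.g. "australia-southeast1"; may be omitted for US/EU
        base += "?" + urlencode({"location": location})
    return base
```

Without the `location` parameter, `jobs.get` for a regional job returns "not found" even though the job exists, which is the symptom the issue describes.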
[GitHub] feluelle opened a new pull request #4399: [AIRFLOW-3594] Unify different License Header
feluelle opened a new pull request #4399: [AIRFLOW-3594] Unify different License Header
URL: https://github.com/apache/incubator-airflow/pull/4399

Make sure you have checked _all_ steps below.

### Jira
- [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-3594
  - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.

### Description
- [x] Here are some details about my PR, including screenshots of any UI changes:
  Some files have an old version of the Apache License. This PR updates these and so unifies the license header for all files.

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:

### Commits
- [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation
- [x] In case of new functionality, my PR adds documentation that describes how to use it.
  - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.
  - All the public functions and the classes in the PR contain docstrings that explain what it does

### Code Quality
- [x] Passes `flake8`
[jira] [Commented] (AIRFLOW-3595) Add tests for HiveToSambaOperator
[ https://issues.apache.org/jira/browse/AIRFLOW-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730694#comment-16730694 ]

ASF GitHub Bot commented on AIRFLOW-3595:

feluelle commented on pull request #4400: [AIRFLOW-3595] Add tests for Hive2SambaOperator
URL: https://github.com/apache/incubator-airflow/pull/4400

Make sure you have checked _all_ steps below.

### Jira
- [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-3595
  - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.

### Description
- [x] Here are some details about my PR, including screenshots of any UI changes:
  - adds missing doc parameter destination_filepath
  - adds missing file close for tmp file (through context manager usage)
  - refactoring

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:

### Commits
- [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation
- [x] In case of new functionality, my PR adds documentation that describes how to use it.
  - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.
  - All the public functions and the classes in the PR contain docstrings that explain what it does

### Code Quality
- [x] Passes `flake8`

> Add tests for HiveToSambaOperator
> ---------------------------------
>
>     Key: AIRFLOW-3595
>     URL: https://issues.apache.org/jira/browse/AIRFLOW-3595
>     Project: Apache Airflow
>     Issue Type: Test
>     Reporter: Felix Uellendall
>     Assignee: Felix Uellendall
>     Priority: Major
[GitHub] codecov-io edited a comment on issue #4400: [AIRFLOW-3595] Add tests for Hive2SambaOperator
codecov-io edited a comment on issue #4400: [AIRFLOW-3595] Add tests for Hive2SambaOperator
URL: https://github.com/apache/incubator-airflow/pull/4400#issuecomment-450495465

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4400?src=pr=h1) Report
> Merging [#4400](https://codecov.io/gh/apache/incubator-airflow/pull/4400?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/e3fd7d4d809e18eef85ad24c9c6dbd2ce1c782a1?src=pr=desc) will **increase** coverage by `0.18%`.
> The diff coverage is `100%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/4400/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4400?src=pr=tree)

```diff
@@            Coverage Diff             @@
##           master    #4400     +/-   ##
=========================================
+ Coverage   78.17%   78.35%   +0.18%
=========================================
  Files         204      204
  Lines       16529    16529
=========================================
+ Hits        12921    12951      +30
+ Misses       3608     3578      -30
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/4400?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/operators/hive\_to\_samba\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/4400/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvaGl2ZV90b19zYW1iYV9vcGVyYXRvci5weQ==) | `100% <100%> (+100%)` | :arrow_up: |
| [airflow/models/\_\_init\_\_.py](https://codecov.io/gh/apache/incubator-airflow/pull/4400/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvX19pbml0X18ucHk=) | `92.76% <0%> (-0.05%)` | :arrow_down: |
| [airflow/hooks/samba\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/4400/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9zYW1iYV9ob29rLnB5) | `38.88% <0%> (+38.88%)` | :arrow_up: |

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4400?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4400?src=pr=footer). Last update [e3fd7d4...5fe5acc](https://codecov.io/gh/apache/incubator-airflow/pull/4400?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[jira] [Created] (AIRFLOW-3596) Clean up undefined template variables
Josh Carp created AIRFLOW-3596: -- Summary: Clean up undefined template variables Key: AIRFLOW-3596 URL: https://issues.apache.org/jira/browse/AIRFLOW-3596 Project: Apache Airflow Issue Type: Improvement Reporter: Josh Carp Assignee: Josh Carp Several jinja templates refer to variables that are never defined. We should either provide those variables or stop using them in the templates. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-496) HiveServer2Hook invokes incorrect Auth mechanism when user not specified
[ https://issues.apache.org/jira/browse/AIRFLOW-496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730891#comment-16730891 ]

jack commented on AIRFLOW-496:
The problem is not in Impala, it's in Airflow. Airflow uses impyla as a library, so it should send parameters in the required format. I assume it's caused here: https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/hive_hooks.py#L766

> HiveServer2Hook invokes incorrect Auth mechanism when user not specified
>
> Key: AIRFLOW-496
> URL: https://issues.apache.org/jira/browse/AIRFLOW-496
> Project: Apache Airflow
> Issue Type: Bug
> Components: hive_hooks
> Reporter: Shreyas Joshi
> Assignee: Sandish Kumar HN
> Priority: Major
>
> h3. Summary
> {{HiveServer2Hook}} seems to be ignoring the auth_mechanism when the user is not specified. I am not entirely sure whether the solution should change impyla or Airflow.
> h3. Reproducing the problem
> With this connection string for Hive: {{AIRFLOW_CONN_GH_HIVE=hive2://@localhost:1/}} (no user name and no password)
> I get the following error from {{HiveServer2Hook}}:
> {code}
> from airflow.hooks import HiveServer2Hook
> hive_hook = HiveServer2Hook(hiveserver2_conn_id='GH_HIVE')
> {code}
> {noformat}
> [2016-09-08 14:30:52,420] {base_hook.py:53} INFO - Using connection to: localhost
> Traceback (most recent call last):
>   File "", line 1, in
>   File "/Users/shreyasjoshis/python-envs/default-env/lib/python3.5/site-packages/airflow/hooks/hive_hooks.py", line 464, in get_conn
>     database=db.schema or 'default')
>   File "/Users/shreyasjoshis/python-envs/default-env/lib/python3.5/site-packages/impala/dbapi.py", line 147, in connect
>     auth_mechanism=auth_mechanism)
>   File "/Users/shreyasjoshis/python-envs/default-env/lib/python3.5/site-packages/impala/hiveserver2.py", line 658, in connect
>     transport.open()
>   File "/Users/shreyasjoshis/python-envs/default-env/lib/python3.5/site-packages/thrift_sasl/__init__.py", line 72, in open
>     message=("Could not start SASL: %s" % self.sasl.getError()))
> thriftpy.transport.TTransportException: TTransportException(type=1, message="Could not start SASL: b'Error in sasl_client_start (-4) SASL(-4): no mechanism available: No worthy mechs found'")
> {noformat}
> h3. More detail
> [Here|https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/hive_hooks.py#L591] {{db.login}} ends up being an empty string rather than {{None}}. This seems to cause impyla to try SASL. Changing {{db.login}} from an empty string to {{None}} seems to fix the issue.
> So, the following does not work:
> {code}
> from impala.dbapi import connect
> connect(host='localhost', port=1, user='', auth_mechanism='PLAIN', database='default')
> {code}
> The error is:
> {noformat}
> Traceback (most recent call last):
>   File "", line 1, in
>   File "/Users/shreyasjoshis/python-envs/default-env/lib/python3.5/site-packages/impala/dbapi.py", line 147, in connect
>     auth_mechanism=auth_mechanism)
>   File "/Users/shreyasjoshis/python-envs/default-env/lib/python3.5/site-packages/impala/hiveserver2.py", line 658, in connect
>     transport.open()
>   File "/Users/shreyasjoshis/python-envs/default-env/lib/python3.5/site-packages/thrift_sasl/__init__.py", line 72, in open
>     message=("Could not start SASL: %s" % self.sasl.getError()))
> thriftpy.transport.TTransportException: TTransportException(type=1, message="Could not start SASL: b'Error in sasl_client_start (-4) SASL(-4): no mechanism available: No worthy mechs found'")
> {noformat}
> But the following does:
> {code}
> from impala.dbapi import connect
> connect(host='localhost', port=1, user=None, auth_mechanism='PLAIN', database='default')
> {code}
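The workaround discussed above can be sketched as a small helper: impyla treats `user=''` differently from `user=None`, so the empty login should be normalized before it reaches `impala.dbapi.connect`. `build_connect_kwargs` is a hypothetical helper for illustration, not Airflow code, and the escaping of the remaining fields is simplified.

```python
def build_connect_kwargs(host, port, login, auth_mechanism='PLAIN', schema=None):
    """Return kwargs for impala.dbapi.connect, mapping an empty login to None."""
    return {
        'host': host,
        'port': port,
        # An empty string makes thrift_sasl attempt SASL negotiation and fail
        # with "no mechanism available"; None skips that code path entirely.
        'user': login or None,
        'auth_mechanism': auth_mechanism,
        'database': schema or 'default',
    }
```

With this in place, a connection with a blank login would hand `user=None` to impyla, matching the working call shown in the report.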
[GitHub] jgao54 commented on issue #4389: [AIRFLOW-3583] Fix AirflowException import
jgao54 commented on issue #4389: [AIRFLOW-3583] Fix AirflowException import URL: https://github.com/apache/incubator-airflow/pull/4389#issuecomment-450528332 @chinhngt actually you are right, I was able to reproduce it. I'd expect the airflow init module to be imported, but it turns out that's not the case for the logging config.
[jira] [Updated] (AIRFLOW-3327) BiqQuery job checking doesn't include location, which api requires outside US/EU
[ https://issues.apache.org/jira/browse/AIRFLOW-3327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kaxil Naik updated AIRFLOW-3327: Fix Version/s: (was: 2.0.0) > BiqQuery job checking doesn't include location, which api requires outside > US/EU > > > Key: AIRFLOW-3327 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3327 > Project: Apache Airflow > Issue Type: Bug >Reporter: Daniel Swiegers >Assignee: Kaxil Naik >Priority: Minor > Labels: google-cloud-bigquery > Fix For: 1.10.2 > > Original Estimate: 24h > Remaining Estimate: 24h > > We use this api but don't set / pass through the geographical location. > Which is required in areas other than US and EU. > Can be seen in contrib/hooks/big_query_hook.py poll_job_complete > [https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get] > |The geographic location of the job. Required except for US and EU. See > details at > https://cloud.google.com/bigquery/docs/locations#specifying_your_location.| -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] dimberman commented on issue #3770: [AIRFLOW-3281] Fix Kubernetes operator with git-sync
dimberman commented on issue #3770: [AIRFLOW-3281] Fix Kubernetes operator with git-sync URL: https://github.com/apache/incubator-airflow/pull/3770#issuecomment-450533177 @odracci yeah that LGTM then. @Fokko good to go! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Assigned] (AIRFLOW-2790) snakebite syntax error: baseTime = min(time * (1L << retries), cap);
[ https://issues.apache.org/jira/browse/AIRFLOW-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yohei Onishi reassigned AIRFLOW-2790: - Assignee: Yohei Onishi > snakebite syntax error: baseTime = min(time * (1L << retries), cap); > > > Key: AIRFLOW-2790 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2790 > Project: Apache Airflow > Issue Type: Bug > Components: hooks >Affects Versions: 1.9.0 > Environment: Amazon Linux >Reporter: Yohei Onishi >Assignee: Yohei Onishi >Priority: Major > > Does anybody know how can I fix this issue? > * Got the following error when importing > airflow.operators.sensors.ExternalTaskSensor. > * apache-airflow 1.9.0 depends on snakebite 2.11.0 and it does not work with > Python3. https://github.com/spotify/snakebite/issues/250 > [2018-07-23 06:42:51,828] \{models.py:288} ERROR - Failed to import: > /home/airflow/airflow/dags/example_task_sensor2.py > Traceback (most recent call last): > File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 285, > in process_file > m = imp.load_source(mod_name, filepath) > File "/usr/lib64/python3.6/imp.py", line 172, in load_source > module = _load(spec) > File "", line 675, in _load > File "", line 655, in _load_unlocked > File "", line 678, in exec_module > File "", line 205, in _call_with_frames_removed > File "/home/airflow/airflow/dags/example_task_sensor2.py", line 10, in > > from airflow.operators.sensors import ExternalTaskSensor > File "/usr/local/lib/python3.6/site-packages/airflow/operators/sensors.py", > line 34, in > from airflow.hooks.hdfs_hook import HDFSHook > File "/usr/local/lib/python3.6/site-packages/airflow/hooks/hdfs_hook.py", > line 20, in > from snakebite.client import Client, HAClient, Namenode, AutoConfigClient > File "/usr/local/lib/python3.6/site-packages/snakebite/client.py", line 1473 > baseTime = min(time * (1L << retries), cap); > ^ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-3571) GoogleCloudStorageToBigQueryOperator succeeds to uploading CSV file from GCS to BiqQuery but a task is failed
[ https://issues.apache.org/jira/browse/AIRFLOW-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730906#comment-16730906 ]

Yohei Onishi commented on AIRFLOW-3571:
OK will do

> GoogleCloudStorageToBigQueryOperator succeeds to uploading CSV file from GCS to BiqQuery but a task is failed
>
> Key: AIRFLOW-3571
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3571
> Project: Apache Airflow
> Issue Type: Bug
> Components: contrib
> Affects Versions: 1.10.0
> Reporter: Yohei Onishi
> Assignee: Yohei Onishi
> Priority: Major
>
> I am using the following services, all in the asia-northeast1-c zone:
> * GCS: asia-northeast1-c
> * BigQuery dataset and table: asia-northeast1-c
> * Composer: asia-northeast1-c
> My task created by GoogleCloudStorageToBigQueryOperator succeeded in uploading a CSV file from a GCS bucket to a BigQuery table, but the task failed due to the following error:
> {code:java}
> [2018-12-26 21:35:47,464] {base_task_runner.py:107} INFO - Job 146: Subtask bq_load_data_into_dest_table_from_gcs [2018-12-26 21:35:47,464] {discovery.py:871} INFO - URL being requested: GET https://www.googleapis.com/bigquery/v2/projects/my-project/jobs/job_abc123?alt=json
> [2018-12-26 21:35:47,931] {models.py:1736} ERROR - ('BigQuery job status check failed. Final error was: %s', 404)
> Traceback (most recent call last):
>   File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line 981, in run_with_configuration
>     jobId=self.running_job_id).execute()
>   File "/usr/local/lib/python3.6/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
>     return wrapped(*args, **kwargs)
>   File "/usr/local/lib/python3.6/site-packages/googleapiclient/http.py", line 851, in execute
>     raise HttpError(resp, content, uri=self.uri)
> googleapiclient.errors.HttpError: https://www.googleapis.com/bigquery/v2/projects/my-project/jobs/job_abc123?alt=json returned "Not found: Job my-project:job_abc123"
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/usr/local/lib/airflow/airflow/models.py", line 1633, in _run_raw_task
>     result = task_copy.execute(context=context)
>   File "/usr/local/lib/airflow/airflow/contrib/operators/gcs_to_bq.py", line 237, in execute
>     time_partitioning=self.time_partitioning)
>   File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line 951, in run_load
>     return self.run_with_configuration(configuration)
>   File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line 1003, in run_with_configuration
>     err.resp.status)
> Exception: ('BigQuery job status check failed. Final error was: %s', 404)
> {code}
> The task failed to find a job my-project:job_abc123, but the correct job id is my-project:asia-northeast1:job_abc123. (Note: this is just an example, not the actual id.)
> I suppose the operator does not treat the zone properly.
> {code:java}
> $ bq show -j my-project:asia-northeast1:job_abc123
> Job my-project:asia-northeast1:job_abc123
>   Job Type   State     Start Time        Duration   User Email                    Bytes Processed   Bytes Billed   Billing Tier   Labels
>   --------   -------   ---------------   --------   ---------------------------   ---------------   ------------   ------------   ------
>   load       SUCCESS   27 Dec 05:35:47   0:00:01    my-service-account-id-email
> {code}
[jira] [Commented] (AIRFLOW-3316) GCS to BQ operator leaves schema_fields operator unset when autodetect=True
[ https://issues.apache.org/jira/browse/AIRFLOW-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730901#comment-16730901 ]

jack commented on AIRFLOW-3316:
I'm unable to reproduce this issue.
First, schema_fields is an optional field. You don't need to assign None; if there is no schema then don't specify this field.
Second, even if you specified schema_fields = None it doesn't matter, as this is the default value of schema_fields.
The block
{code:java}
if not self.schema_fields:
{code}
is there for cases where schema_fields needs to be overwritten; after this block it will either have a value or be None.
Please provide your DAG for us to test.

> GCS to BQ operator leaves schema_fields operator unset when autodetect=True
>
> Key: AIRFLOW-3316
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3316
> Project: Apache Airflow
> Issue Type: Bug
> Components: operators
> Affects Versions: 1.10.1
> Reporter: Conrad Lee
> Assignee: Conrad Lee
> Priority: Minor
>
> When I use the GoogleCloudStorageToBigQueryOperator to load data from Parquet into BigQuery, I leave the schema_fields argument set to 'None' and set autodetect=True.
> This causes the following error:
> {code:java}
> [2018-11-08 09:42:03,690] {models.py:1736} ERROR - local variable 'schema_fields' referenced before assignment
> Traceback (most recent call last):
>   File "/usr/local/lib/airflow/airflow/models.py", line 1633, in _run_raw_task
>     result = task_copy.execute(context=context)
>   File "/home/airflow/gcs/plugins/bq_operator_updated.py", line 2018, in execute
>     schema_fields=schema_fields
> UnboundLocalError: local variable 'schema_fields' referenced before assignment
> {code}
> The problem is that this set of checks, in which the schema_fields variable is set, neglects to cover all the cases:
> {code:java}
> if not self.schema_fields:
>     if self.schema_object and self.source_format != 'DATASTORE_BACKUP':
>         gcs_hook = GoogleCloudStorageHook(
>             google_cloud_storage_conn_id=self.google_cloud_storage_conn_id,
>             delegate_to=self.delegate_to)
>         schema_fields = json.loads(gcs_hook.download(
>             self.bucket,
>             self.schema_object).decode("utf-8"))
>     elif self.schema_object is None and self.autodetect is False:
>         raise ValueError('At least one of `schema_fields`, `schema_object`, '
>                          'or `autodetect` must be passed.')
> else:
>     schema_fields = self.schema_fields
> {code}
> After the `elif` we need to handle the case where autodetect is set to True. This can be done by simply adding two lines:
> {code:java}
> if not self.schema_fields:
>     if self.schema_object and self.source_format != 'DATASTORE_BACKUP':
>         gcs_hook = GoogleCloudStorageHook(
>             google_cloud_storage_conn_id=self.google_cloud_storage_conn_id,
>             delegate_to=self.delegate_to)
>         schema_fields = json.loads(gcs_hook.download(
>             self.bucket,
>             self.schema_object).decode("utf-8"))
>     elif self.schema_object is None and self.autodetect is False:
>         raise ValueError('At least one of `schema_fields`, `schema_object`, '
>                          'or `autodetect` must be passed.')
>     else:
>         schema_fields = None
> else:
>     schema_fields = self.schema_fields
> {code}
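The branching proposed in the report above can be sketched as a standalone function, which makes the missing autodetect=True case easy to see and test. This is a minimal sketch, not the operator's actual code: `download_schema` is a hypothetical callable standing in for the GoogleCloudStorageHook download.

```python
import json

def resolve_schema_fields(schema_fields, schema_object, source_format,
                          autodetect, download_schema):
    """Mirror the patched branching so every path assigns a value."""
    if schema_fields:
        return schema_fields
    if schema_object and source_format != 'DATASTORE_BACKUP':
        # Schema stored as a JSON object in GCS
        return json.loads(download_schema(schema_object).decode('utf-8'))
    if schema_object is None and not autodetect:
        raise ValueError('At least one of `schema_fields`, `schema_object`, '
                         'or `autodetect` must be passed.')
    # autodetect=True and no explicit schema: let BigQuery infer it
    return None
```

The key point is the final `return None`: in the unpatched operator this path fell through without ever assigning `schema_fields`, producing the UnboundLocalError shown in the report.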
[GitHub] kaxil closed pull request #4364: [AIRFLOW-3550] Standardize GKE hook.
kaxil closed pull request #4364: [AIRFLOW-3550] Standardize GKE hook. URL: https://github.com/apache/incubator-airflow/pull/4364

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/airflow/contrib/hooks/gcp_container_hook.py b/airflow/contrib/hooks/gcp_container_hook.py
index 3934f07a95..4a610e56c9 100644
--- a/airflow/contrib/hooks/gcp_container_hook.py
+++ b/airflow/contrib/hooks/gcp_container_hook.py
@@ -21,7 +21,7 @@ import time

 from airflow import AirflowException, version
-from airflow.hooks.base_hook import BaseHook
+from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
 from google.api_core.exceptions import AlreadyExists, NotFound
 from google.api_core.gapic_v1.method import DEFAULT
@@ -34,15 +34,24 @@ OPERATIONAL_POLL_INTERVAL = 15

-class GKEClusterHook(BaseHook):
+class GKEClusterHook(GoogleCloudBaseHook):

-    def __init__(self, project_id, location):
-        self.project_id = project_id
+    def __init__(self,
+                 gcp_conn_id='google_cloud_default',
+                 delegate_to=None,
+                 location=None):
+        super(GKEClusterHook, self).__init__(
+            gcp_conn_id=gcp_conn_id, delegate_to=delegate_to)
+        self._client = None
         self.location = location

-        # Add client library info for better error tracking
-        client_info = ClientInfo(client_library_version='airflow_v' + version.version)
-        self.client = container_v1.ClusterManagerClient(client_info=client_info)
+    def get_client(self):
+        if self._client is None:
+            credentials = self._get_credentials()
+            # Add client library info for better error tracking
+            client_info = ClientInfo(client_library_version='airflow_v' + version.version)
+            self._client = container_v1.ClusterManagerClient(credentials=credentials, client_info=client_info)
+        return self._client

     @staticmethod
     def _dict_to_proto(py_dict, proto):
@@ -60,13 +69,15 @@ def _dict_to_proto(py_dict, proto):
         dict_json_str = json.dumps(py_dict)
         return json_format.Parse(dict_json_str, proto)

-    def wait_for_operation(self, operation):
+    def wait_for_operation(self, operation, project_id=None):
         """
         Given an operation, continuously fetches the status from Google Cloud until either
         completion or an error occurring

         :param operation: The Operation to wait for
         :type operation: A google.cloud.container_V1.gapic.enums.Operator
+        :param project_id: Google Cloud Platform project ID
+        :type project_id: str
         :return: A new, updated operation fetched from Google Cloud
         """
         self.log.info("Waiting for OPERATION_NAME %s" % operation.name)
@@ -79,20 +90,22 @@ def wait_for_operation(self, operation):
                 raise exceptions.GoogleCloudError(
                     "Operation has failed with status: %s" % operation.status)
             # To update status of operation
-            operation = self.get_operation(operation.name)
+            operation = self.get_operation(operation.name, project_id=project_id or self.project_id)
         return operation

-    def get_operation(self, operation_name):
+    def get_operation(self, operation_name, project_id=None):
         """
         Fetches the operation from Google Cloud

         :param operation_name: Name of operation to fetch
         :type operation_name: str
+        :param project_id: Google Cloud Platform project ID
+        :type project_id: str
         :return: The new, updated operation from Google Cloud
         """
-        return self.client.get_operation(project_id=self.project_id,
-                                         zone=self.location,
-                                         operation_id=operation_name)
+        return self.get_client().get_operation(project_id=project_id or self.project_id,
+                                               zone=self.location,
+                                               operation_id=operation_name)

     @staticmethod
     def _append_label(cluster_proto, key, val):
@@ -114,7 +127,7 @@ def _append_label(cluster_proto, key, val):
         cluster_proto.resource_labels.update({key: val})
         return cluster_proto

-    def delete_cluster(self, name, retry=DEFAULT, timeout=DEFAULT):
+    def delete_cluster(self, name, project_id=None, retry=DEFAULT, timeout=DEFAULT):
         """
         Deletes the cluster, including the Kubernetes endpoint and
         all worker nodes. Firewalls and routes that were configured during
@@ -125,6 +138,8 @@ def delete_cluster(self, name, retry=DEFAULT, timeout=DEFAULT):

         :param name: The name of the cluster to delete
[jira] [Resolved] (AIRFLOW-3568) S3ToGoogleCloudStorageOperator failed after succeeding in copying files from s3
[ https://issues.apache.org/jira/browse/AIRFLOW-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaxil Naik resolved AIRFLOW-3568.
Resolution: Fixed
Fix Version/s: 1.10.2
https://github.com/apache/incubator-airflow/pull/4371

> S3ToGoogleCloudStorageOperator failed after succeeding in copying files from s3
>
> Key: AIRFLOW-3568
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3568
> Project: Apache Airflow
> Issue Type: Bug
> Components: contrib
> Affects Versions: 1.10.0
> Reporter: Yohei Onishi
> Assignee: Yohei Onishi
> Priority: Major
> Fix For: 1.10.2
>
> I tried to copy files from s3 to gcs using S3ToGoogleCloudStorageOperator. The file was successfully uploaded to GCS, but the task failed with the following error:
> {code:java}
> [2018-12-26 07:56:33,062] {base_task_runner.py:107} INFO - Job 39: Subtask gcs_copy_files_from_s3 [2018-12-26 07:56:33,062] {discovery.py:871} INFO - URL being requested: POST https://www.googleapis.com/upload/storage/v1/b/stg-rfid-etl-tmp/o?name=rfid_wh%2Fuq%2Fjp%2Fno_resp_carton_1D%2F2018%2F12%2F24%2F21%2Fno_resp_carton_20181224210201.csv=json=media
> [2018-12-26 07:56:33,214] {base_task_runner.py:107} INFO - Job 39: Subtask gcs_copy_files_from_s3 [2018-12-26 07:56:33,213] {s3_to_gcs_operator.py:177} INFO - All done, uploaded 1 files to Google Cloud Storage
> [2018-12-26 07:56:33,217] {models.py:1736} ERROR - Object of type 'set' is not JSON serializable
> Traceback (most recent call last):
>   File "/usr/local/lib/airflow/airflow/models.py", line 1637, in _run_raw_task
>     self.xcom_push(key=XCOM_RETURN_KEY, value=result)
>   File "/usr/local/lib/airflow/airflow/models.py", line 1983, in xcom_push
>     execution_date=execution_date or self.execution_date)
>   File "/usr/local/lib/airflow/airflow/utils/db.py", line 74, in wrapper
>     return func(*args, **kwargs)
>   File "/usr/local/lib/airflow/airflow/models.py", line 4531, in set
>     value = json.dumps(value).encode('UTF-8')
>   File "/usr/local/lib/python3.6/json/__init__.py", line 231, in dumps
>     return _default_encoder.encode(obj)
>   File "/usr/local/lib/python3.6/json/encoder.py", line 199, in encode
>     chunks = self.iterencode(o, _one_shot=True)
>   File "/usr/local/lib/python3.6/json/encoder.py", line 257, in iterencode
>     return _iterencode(o, 0)
>   File "/usr/local/lib/python3.6/json/encoder.py", line 180, in default
>     o.__class__.__name__
> TypeError: Object of type 'set' is not JSON serializable
> [2018-12-26 07:56:33,220] {models.py:1756} INFO - Marking task as UP_FOR_RETRY
> {code}
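The failure above happens at XCom push time: the operator's return value is JSON-serialized, and `json.dumps` rejects sets. A minimal illustration of the usual remedy, converting the set to a list before it is returned; `files` and `make_xcom_safe` are stand-in names for this sketch, not the operator's actual identifiers.

```python
import json

def make_xcom_safe(value):
    """Convert sets to sorted lists so json.dumps (the XCom push) succeeds."""
    if isinstance(value, set):
        return sorted(value)
    return value

# What the operator computed via set arithmetic on file listings
files = {'a.csv', 'b.csv'}

# json.dumps(files) would raise TypeError; the converted value serializes fine
serialized = json.dumps(make_xcom_safe(files))
```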
[jira] [Commented] (AIRFLOW-2939) `set` fails in case of `exisiting_files is None` and in case of `json.dumps`
[ https://issues.apache.org/jira/browse/AIRFLOW-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730853#comment-16730853 ]

ASF GitHub Bot commented on AIRFLOW-2939:
kaxil commented on pull request #4371: [AIRFLOW-2939][AIRFLOW-3568] fix TypeError on GoogleCloudStorageToS3Operator / S3ToGoogleCloudStorageOperator URL: https://github.com/apache/incubator-airflow/pull/4371

> `set` fails in case of `exisiting_files is None` and in case of `json.dumps`
>
> Key: AIRFLOW-2939
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2939
> Project: Apache Airflow
> Issue Type: Bug
> Components: operators
> Affects Versions: 2.0.0
> Reporter: Kiyoshi Nomo
> Assignee: Yohei Onishi
> Priority: Major
> Fix For: 1.10.2
>
> h1. Problems
> h2. TypeError: 'NoneType' object is not iterable
> [https://github.com/apache/incubator-airflow/blob/06b62c42b0b55ea55b86b130317594738d2f36a2/airflow/contrib/operators/gcs_to_s3.py#L91]
> {code:java}
> >>> set(None)
> Traceback (most recent call last):
>   File "", line 1, in
> TypeError: 'NoneType' object is not iterable
> {code}
> h2. TypeError: set(['a']) is not JSON serializable
> [https://github.com/apache/incubator-airflow/blob/b78c7fb8512f7a40f58b46530e9b3d5562fe84ea/airflow/models.py#L4483]
> {code:python}
> >>> json.dumps(set(['a']))
> Traceback (most recent call last):
>   File "", line 1, in
>   File "/usr/local/opt/pyenv/versions/2.7.11/lib/python2.7/json/__init__.py", line 244, in dumps
>     return _default_encoder.encode(obj)
>   File "/usr/local/opt/pyenv/versions/2.7.11/lib/python2.7/json/encoder.py", line 207, in encode
>     chunks = self.iterencode(o, _one_shot=True)
>   File "/usr/local/opt/pyenv/versions/2.7.11/lib/python2.7/json/encoder.py", line 270, in iterencode
>     return _iterencode(o, 0)
>   File "/usr/local/opt/pyenv/versions/2.7.11/lib/python2.7/json/encoder.py", line 184, in default
>     raise TypeError(repr(o) + " is not JSON serializable")
> TypeError: set(['a']) is not JSON serializable
> {code}
> h1. Solution
> * Check that the existing files variable is not None.
> * Convert the lists to sets, take the difference, and convert the result back to a list.
> {code:python}
> if existing_files is not None:
>     files = list(set(files) - set(existing_files))
> {code}
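The proposed solution above can be sketched as a small function: guard against `existing_files` being `None` (so `set(None)` is never evaluated) and keep `files` a list, since sets do not JSON-serialize for XCom. A minimal sketch following the snippet in the report; `dedupe_files` is a hypothetical name.

```python
def dedupe_files(files, existing_files):
    """Remove keys already present at the destination; tolerate None."""
    if existing_files is not None:
        # Set difference, then back to a list so the result is JSON-safe
        files = list(set(files) - set(existing_files))
    return files
```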
[jira] [Updated] (AIRFLOW-3371) BigQueryHook's Ability to Create View
[ https://issues.apache.org/jira/browse/AIRFLOW-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kaxil Naik updated AIRFLOW-3371: Fix Version/s: (was: 2.0.0) 1.10.2 > BigQueryHook's Ability to Create View > - > > Key: AIRFLOW-3371 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3371 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Ryan Yuan >Assignee: Ryan Yuan >Priority: Major > Fix For: 1.10.2 > > > Modify *BigQueryBaseCursor.create_empty_table()* to take in an optional > 'view' parameter to create view in BigQuery. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-3355) Fix BigQueryCursor.execute to work with Python3
[ https://issues.apache.org/jira/browse/AIRFLOW-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kaxil Naik updated AIRFLOW-3355: Fix Version/s: (was: 2.0.0) 1.10.2 > Fix BigQueryCursor.execute to work with Python3 > --- > > Key: AIRFLOW-3355 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3355 > Project: Apache Airflow > Issue Type: Bug > Components: gcp, hooks >Reporter: Kengo Seki >Assignee: Kengo Seki >Priority: Major > Fix For: 1.10.2 > > > {{BigQueryCursor.execute}} uses {{dict.iteritems}} internally, so it fails > with Python3 if binding parameters are provided. > {code} > In [1]: import sys > In [2]: sys.version > Out[2]: '3.6.6 (default, Sep 12 2018, 18:26:19) \n[GCC 8.0.1 20180414 > (experimental) [trunk revision 259383]]' > In [3]: from airflow.contrib.hooks.bigquery_hook import BigQueryHook > In [4]: hook = BigQueryHook() > In [5]: conn = hook.get_conn() > [2018-11-15 19:01:35,856] {discovery.py:267} INFO - URL being requested: GET > https://www.googleapis.com/discovery/v1/apis/bigquery/v2/rest > In [6]: cur = conn.cursor() > In [7]: cur.execute("SELECT count(*) FROM ds.t WHERE c = %(v)d", {"v": 0}) > --- > AttributeErrorTraceback (most recent call last) > in > > 1 cur.execute("SELECT count(*) FROM ds.t WHERE c = %(v)d", {"v": 0}) > ~/dev/incubator-airflow/airflow/contrib/hooks/bigquery_hook.py in > execute(self, operation, parameters) >1561 """ >1562 sql = _bind_parameters(operation, > -> 1563parameters) if parameters else > operation >1564 self.job_id = self.run_query(sql) >1565 > ~/dev/incubator-airflow/airflow/contrib/hooks/bigquery_hook.py in > _bind_parameters(operation, parameters) >1684 # inspired by MySQL Python Connector (conversion.py) >1685 string_parameters = {} > -> 1686 for (name, value) in parameters.iteritems(): >1687 if value is None: >1688 string_parameters[name] = 'NULL' > AttributeError: 'dict' object has no attribute 'iteritems' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
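The Python 3 fix for the failure above is to iterate with `dict.items()` (which also works on Python 2) instead of the removed `dict.iteritems()`. A hedged sketch modeled on the `_bind_parameters` behavior described in the issue; the quoting/escaping here is simplified for illustration and is not the hook's exact implementation.

```python
def bind_parameters(operation, parameters):
    """Substitute %(name)s-style placeholders, mapping None to NULL."""
    string_parameters = {}
    for name, value in parameters.items():  # .iteritems() raises on Python 3
        if value is None:
            string_parameters[name] = 'NULL'
        elif isinstance(value, str):
            # Simplified escaping for the sketch
            string_parameters[name] = "'" + value.replace("'", "\\'") + "'"
        else:
            string_parameters[name] = str(value)
    return operation % string_parameters
```

Note that because every value becomes a string literal, placeholders must use `%(v)s` rather than `%(v)d`.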
[jira] [Updated] (AIRFLOW-3332) Add BigQuery Streaming insert_all to BigQueryHook
[ https://issues.apache.org/jira/browse/AIRFLOW-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaxil Naik updated AIRFLOW-3332:
Fix Version/s: (was: 2.0.0) 1.10.2

> Add BigQuery Streaming insert_all to BigQueryHook
> -------------------------------------------------
>
> Key: AIRFLOW-3332
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3332
> Project: Apache Airflow
> Issue Type: New Feature
> Reporter: Ryan Yuan
> Assignee: Ryan Yuan
> Priority: Major
> Fix For: 1.10.2
>
> Add a function to BigQueryHook to allow inserting one or more rows into a
> BigQuery table.
[jira] [Resolved] (AIRFLOW-2863) GKEClusterHook catches wrong exception
[ https://issues.apache.org/jira/browse/AIRFLOW-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaxil Naik resolved AIRFLOW-2863.
Resolution: Fixed
Fix Version/s: 1.10.2

> GKEClusterHook catches wrong exception
> --------------------------------------
>
> Key: AIRFLOW-2863
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2863
> Project: Apache Airflow
> Issue Type: Bug
> Reporter: Cameron Moberg
> Assignee: Cameron Moberg
> Priority: Minor
> Fix For: 1.10.2
>
> Because the hook catches the wrong exception type, it reports a failure
> instead of catching the error and reporting success.
[jira] [Commented] (AIRFLOW-3550) GKEClusterHook doesn't use gcp_conn_id
[ https://issues.apache.org/jira/browse/AIRFLOW-3550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730856#comment-16730856 ]

ASF GitHub Bot commented on AIRFLOW-3550:

kaxil commented on pull request #4364: [AIRFLOW-3550] Standardize GKE hook.
URL: https://github.com/apache/incubator-airflow/pull/4364

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> GKEClusterHook doesn't use gcp_conn_id
> --------------------------------------
>
> Key: AIRFLOW-3550
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3550
> Project: Apache Airflow
> Issue Type: Bug
> Components: contrib
> Affects Versions: 1.10.0, 1.10.1
> Reporter: Wilson Lian
> Priority: Major
>
> The hook doesn't inherit from GoogleCloudBaseHook. API calls are made using
> the default service account (if present).
[jira] [Updated] (AIRFLOW-2917) Set AIRFLOW__CORE__SQL_ALCHEMY_CONN only when needed for k8s executor
[ https://issues.apache.org/jira/browse/AIRFLOW-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaxil Naik updated AIRFLOW-2917:
Fix Version/s: (was: 2.0.0) 1.10.2

> Set AIRFLOW__CORE__SQL_ALCHEMY_CONN only when needed for k8s executor
> ---------------------------------------------------------------------
>
> Key: AIRFLOW-2917
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2917
> Project: Apache Airflow
> Issue Type: Improvement
> Components: executor
> Affects Versions: 1.10.0
> Reporter: John Cheng
> Assignee: John Cheng
> Priority: Minor
> Fix For: 1.10.2
>
> In the Kubernetes executor, `AIRFLOW__CORE__SQL_ALCHEMY_CONN` is set as an
> environment variable even when it is specified in a configmap or secrets.
[jira] [Reopened] (AIRFLOW-2997) Support for clustered tables in Bigquery hooks/operators
[ https://issues.apache.org/jira/browse/AIRFLOW-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaxil Naik reopened AIRFLOW-2997:

> Support for clustered tables in Bigquery hooks/operators
> --------------------------------------------------------
>
> Key: AIRFLOW-2997
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2997
> Project: Apache Airflow
> Issue Type: Improvement
> Components: gcp
> Reporter: Gordon Ball
> Priority: Minor
> Fix For: 1.10.2
>
> Bigquery support for clustered tables was added (at GCP "Beta" level) on
> 2018-07-30. This feature allows load or table-creating query operations to
> request that data be stored sorted by a subset of columns, allowing more
> efficient (and potentially cheaper) subsequent queries.
> Support for specifying fields to cluster on should be added to at least the
> bigquery hook, load-from-GCS operator and query operator.
> Documentation: https://cloud.google.com/bigquery/docs/clustered-tables
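For context, clustering is requested through the BigQuery jobs API configuration. A hedged sketch of what a load-job configuration with clustering might look like — field names follow the public BigQuery REST reference, while the project, bucket, and column names are made up for illustration:

```python
# Illustrative jobs.insert load configuration requesting clustered storage.
# Per the BigQuery docs, a clustered table must also be partitioned.
load_config = {
    "destinationTable": {
        "projectId": "my-project",      # placeholder project
        "datasetId": "my_dataset",      # placeholder dataset
        "tableId": "events",            # placeholder table
    },
    "sourceUris": ["gs://my-bucket/events/*.json"],
    "sourceFormat": "NEWLINE_DELIMITED_JSON",
    # Partition by a date column...
    "timePartitioning": {"type": "DAY", "field": "event_date"},
    # ...and cluster (sort) the stored data by these columns.
    "clustering": {"fields": ["customer_id", "event_type"]},
}
```

The ticket asks for the hook and operators to expose something like the `clustering.fields` list so DAG authors don't have to build this configuration by hand.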
[jira] [Updated] (AIRFLOW-2640) Add Cassandra table sensor
[ https://issues.apache.org/jira/browse/AIRFLOW-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaxil Naik updated AIRFLOW-2640:
Fix Version/s: (was: 2.0.0) 1.10.2

> Add Cassandra table sensor
> --------------------------
>
> Key: AIRFLOW-2640
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2640
> Project: Apache Airflow
> Issue Type: New Feature
> Reporter: Kengo Seki
> Assignee: Kengo Seki
> Priority: Major
> Fix For: 1.10.2
>
> Just like the partition sensor for Hive, add a sensor to wait for a table to be
> created in a Cassandra cluster.
[jira] [Updated] (AIRFLOW-2916) Add argument `verify` for AwsHook() and S3 related sensors/operators
[ https://issues.apache.org/jira/browse/AIRFLOW-2916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaxil Naik updated AIRFLOW-2916:
Fix Version/s: (was: 2.0.0) 1.10.2

> Add argument `verify` for AwsHook() and S3 related sensors/operators
> --------------------------------------------------------------------
>
> Key: AIRFLOW-2916
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2916
> Project: Apache Airflow
> Issue Type: Improvement
> Components: hooks, operators
> Reporter: Xiaodong DENG
> Assignee: Xiaodong DENG
> Priority: Minor
> Fix For: 1.10.2
>
> AwsHook() and the S3-related operators/sensors depend on the package boto3.
> In boto3, when we initiate a client or a resource, the argument `verify` is
> provided (https://boto3.readthedocs.io/en/latest/reference/core/session.html).
> It is useful when
> # users want to use a different CA cert bundle than the one used by botocore.
> # users want to have '--no-verify-ssl'. This is especially useful when we're
> using on-premises S3 or other implementations of object storage, like IBM's
> Cloud Object Storage.
> However, this feature is not provided in Airflow for S3 yet.
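To make the proposal concrete: boto3's `client()`/`resource()` accept `verify`, where `False` disables TLS verification (like `--no-verify-ssl`) and a string is treated as a path to a custom CA bundle. A small sketch of how a hook might forward that argument — the helper name and kwargs layout are illustrative, not the actual AwsHook implementation:

```python
def client_kwargs(verify=None):
    """Build the kwargs a hook would forward to boto3.client('s3', **kwargs).

    verify=None  -> use botocore's default CA bundle (omit the argument)
    verify=False -> disable TLS certificate verification
    verify="/path/to/ca.pem" -> use a custom CA bundle (e.g. for
                                on-premises S3-compatible storage)
    """
    kwargs = {}
    if verify is not None:
        kwargs["verify"] = verify
    return kwargs
```

With this shape, the `verify` value can come from the Airflow connection or the operator argument and be passed straight through to boto3 untouched.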
[jira] [Updated] (AIRFLOW-2889) Fix typos detected by github.com/client9/misspell
[ https://issues.apache.org/jira/browse/AIRFLOW-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaxil Naik updated AIRFLOW-2889:
Fix Version/s: (was: 2.0.0) 1.10.2

> Fix typos detected by github.com/client9/misspell
> -------------------------------------------------
>
> Key: AIRFLOW-2889
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2889
> Project: Apache Airflow
> Issue Type: Improvement
> Reporter: Kazuhiro Sera
> Priority: Minor
> Fix For: 1.10.2
>
> Fixing typos is sometimes very hard. It's not so easy to review them
> visually. Recently, I discovered a very useful tool for it,
> [misspell](https://github.com/client9/misspell).
> This pull request fixes minor typos detected by
> [misspell](https://github.com/client9/misspell) except for the false
> positives. If you would like me to work on other files as well, let me know.
[jira] [Updated] (AIRFLOW-491) Add cache parameter in BigQuery query method
[ https://issues.apache.org/jira/browse/AIRFLOW-491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaxil Naik updated AIRFLOW-491:
Fix Version/s: (was: 2.0.0) 1.10.2

> Add cache parameter in BigQuery query method
> --------------------------------------------
>
> Key: AIRFLOW-491
> URL: https://issues.apache.org/jira/browse/AIRFLOW-491
> Project: Apache Airflow
> Issue Type: Bug
> Components: contrib, gcp
> Affects Versions: 1.7.1
> Reporter: Chris Riccomini
> Assignee: Iuliia Volkova
> Priority: Major
> Fix For: 1.10.2
>
> The current BigQuery query() method does not have a user_query_cache
> parameter. This param always defaults to true (see
> [here|https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.query]).
> I'd like to disable query caching for some data consistency checks.
[jira] [Updated] (AIRFLOW-2758) Add a sensor for MongoDB
[ https://issues.apache.org/jira/browse/AIRFLOW-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaxil Naik updated AIRFLOW-2758:
Fix Version/s: (was: 2.0.0) 1.10.2

> Add a sensor for MongoDB
> ------------------------
>
> Key: AIRFLOW-2758
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2758
> Project: Apache Airflow
> Issue Type: New Feature
> Reporter: Kengo Seki
> Assignee: Kengo Seki
> Priority: Major
> Fix For: 1.10.2
[jira] [Updated] (AIRFLOW-2755) k8s workers think DAGs are always in `/tmp/dags`
[ https://issues.apache.org/jira/browse/AIRFLOW-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaxil Naik updated AIRFLOW-2755:
Fix Version/s: (was: 2.0.0) 1.10.2

> k8s workers think DAGs are always in `/tmp/dags`
> ------------------------------------------------
>
> Key: AIRFLOW-2755
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2755
> Project: Apache Airflow
> Issue Type: Bug
> Components: configuration, worker
> Reporter: Aldo
> Assignee: Aldo
> Priority: Minor
> Fix For: 1.10.2
>
> We have Airflow configured to use the `KubernetesExecutor` and run tasks in
> newly created pods.
> I tried to use the `PythonOperator` to import the python callable from a
> python module located in the DAGs directory, as [that should be
> possible|https://github.com/apache/incubator-airflow/blob/c7a472ed6b0d8a4720f57ba1140c8cf665757167/airflow/__init__.py#L42].
> Airflow complained that the module was not found.
> After a fair amount of digging we found that the issue was that the workers
> have the `AIRFLOW__CORE__DAGS_FOLDER` environment variable set to `/tmp/dags`,
> as [you can see from the
> code|https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/kubernetes/worker_configuration.py#L84].
> Unsetting that environment variable from within the task's pod and running the
> task manually worked as expected. I think that this path should be
> configurable (I'll give it a try to add a `kubernetes.worker_dags_folder`
> configuration).
[jira] [Updated] (AIRFLOW-2655) Default Kubernetes worker configurations are inconsistent
[ https://issues.apache.org/jira/browse/AIRFLOW-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaxil Naik updated AIRFLOW-2655:
Fix Version/s: (was: 2.0.0) 1.10.2

> Default Kubernetes worker configurations are inconsistent
> ---------------------------------------------------------
>
> Key: AIRFLOW-2655
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2655
> Project: Apache Airflow
> Issue Type: Bug
> Components: executor
> Affects Versions: 1.10.0
> Reporter: Shintaro Murakami
> Priority: Minor
> Fix For: 1.10.2
>
> If the optional config `airflow_configmap` is not set, the worker starts
> configured with `LocalExecutor` and a `sql_alchemy_conn` that defaults to
> `sqlite`. This combination is not allowed.
[jira] [Updated] (AIRFLOW-3402) Set default kubernetes affinity and toleration settings in airflow.cfg
[ https://issues.apache.org/jira/browse/AIRFLOW-3402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaxil Naik updated AIRFLOW-3402:
Fix Version/s: 1.10.2

> Set default kubernetes affinity and toleration settings in airflow.cfg
> ----------------------------------------------------------------------
>
> Key: AIRFLOW-3402
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3402
> Project: Apache Airflow
> Issue Type: Improvement
> Components: kubernetes
> Reporter: Kevin Pullin
> Priority: Major
> Fix For: 1.10.2
>
> Currently airflow supports setting kubernetes `affinity` and `toleration`
> configuration inside dags using either a `KubernetesExecutorConfig`
> definition or using the `KubernetesPodOperator`.
> In order to reduce having to set and maintain this configuration in every
> dag, it'd be useful to have the ability to set these globally in the
> airflow.cfg file. One use case is to force all kubernetes pods to run on a
> particular set of dedicated airflow nodes, which requires both affinity rules
> and tolerations.
[GitHub] stale[bot] closed pull request #4127: Bug Fix: Secrets object and key separated by ":"
stale[bot] closed pull request #4127: Bug Fix: Secrets object and key separated by ":"
URL: https://github.com/apache/incubator-airflow/pull/4127

This is a PR from a forked repository. As GitHub hides the original diff, it is displayed below for the sake of provenance:

diff --git a/airflow/contrib/kubernetes/worker_configuration.py b/airflow/contrib/kubernetes/worker_configuration.py
index 74658e384a..83fa93e431 100644
--- a/airflow/contrib/kubernetes/worker_configuration.py
+++ b/airflow/contrib/kubernetes/worker_configuration.py
@@ -97,7 +97,7 @@ def _get_secrets(self):
         """Defines any necessary secrets for the pod executor"""
         worker_secrets = []
         for env_var_name, obj_key_pair in six.iteritems(self.kube_config.kube_secrets):
-            k8s_secret_obj, k8s_secret_key = obj_key_pair.split('=')
+            k8s_secret_obj, k8s_secret_key = obj_key_pair.split(':')
             worker_secrets.append(
                 Secret('env', env_var_name, k8s_secret_obj, k8s_secret_key))
         return worker_secrets
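The one-line change above swaps the delimiter used to parse each configured secret: the worker config maps an environment variable name to a `"<k8s-secret-name>:<key-in-secret>"` pair. A tiny sketch of the parsing behaviour this (since-closed) PR proposed — the helper name is illustrative, only the `split(':')` mirrors the diff:

```python
def parse_secret_spec(obj_key_pair):
    """Split a 'secret_object:secret_key' pair into its two parts,
    using ':' as the delimiter (the PR's change from '=')."""
    k8s_secret_obj, k8s_secret_key = obj_key_pair.split(':')
    return k8s_secret_obj, k8s_secret_key
```

For example, `parse_secret_spec("airflow-secrets:sql_alchemy_conn")` returns `("airflow-secrets", "sql_alchemy_conn")`; with the old `split('=')` the same spec would raise a ValueError because no `=` is present.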
[jira] [Resolved] (AIRFLOW-2939) `set` fails in case of `exisiting_files is None` and in case of `json.dumps`
[ https://issues.apache.org/jira/browse/AIRFLOW-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaxil Naik resolved AIRFLOW-2939.
Resolution: Fixed
Fix Version/s: 1.10.2

Resolved by https://github.com/apache/incubator-airflow/pull/4371

> `set` fails in case of `exisiting_files is None` and in case of `json.dumps`
> ----------------------------------------------------------------------------
>
> Key: AIRFLOW-2939
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2939
> Project: Apache Airflow
> Issue Type: Bug
> Components: operators
> Affects Versions: 2.0.0
> Reporter: Kiyoshi Nomo
> Assignee: Yohei Onishi
> Priority: Major
> Fix For: 1.10.2
>
> h1. Problems
> h2. TypeError: 'NoneType' object is not iterable
> [https://github.com/apache/incubator-airflow/blob/06b62c42b0b55ea55b86b130317594738d2f36a2/airflow/contrib/operators/gcs_to_s3.py#L91]
> {code:python}
> >>> set(None)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: 'NoneType' object is not iterable
> {code}
> h2. TypeError: set(['a']) is not JSON serializable
> [https://github.com/apache/incubator-airflow/blob/b78c7fb8512f7a40f58b46530e9b3d5562fe84ea/airflow/models.py#L4483]
> {code:python}
> >>> json.dumps(set(['a']))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/local/opt/pyenv/versions/2.7.11/lib/python2.7/json/__init__.py", line 244, in dumps
>     return _default_encoder.encode(obj)
>   File "/usr/local/opt/pyenv/versions/2.7.11/lib/python2.7/json/encoder.py", line 207, in encode
>     chunks = self.iterencode(o, _one_shot=True)
>   File "/usr/local/opt/pyenv/versions/2.7.11/lib/python2.7/json/encoder.py", line 270, in iterencode
>     return _iterencode(o, 0)
>   File "/usr/local/opt/pyenv/versions/2.7.11/lib/python2.7/json/encoder.py", line 184, in default
>     raise TypeError(repr(o) + " is not JSON serializable")
> TypeError: set(['a']) is not JSON serializable
> {code}
> h1. Solution
> * Check that the existing files list is not None.
> * Convert the files to a `set`, take the difference, and convert the result
> back to a `list`.
> {code:python}
> if existing_files is not None:
>     files = list(set(files) - set(existing_files))
> {code}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
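The solution above can be sketched end to end. `files_to_sync` is an illustrative name, not the operator's actual method; the point is that guarding against `None` avoids the first TypeError and converting the set difference back to a list avoids the second:

```python
import json

def files_to_sync(files, existing_files):
    """Return files not already present, as a JSON-serializable list.

    Guards against existing_files being None (set(None) raises TypeError)
    and converts the set difference back to a list, since json.dumps
    cannot serialize a set.
    """
    if existing_files is not None:
        files = list(set(files) - set(existing_files))
    return files

pending = files_to_sync(["a", "b", "c"], ["b"])
json.dumps(sorted(pending))  # a set would raise TypeError here; a list is fine
```

Without the `None` guard, an empty destination bucket (where `list_keys` returns `None`) would crash the operator before any file was transferred.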
[jira] [Updated] (AIRFLOW-2887) Add to BigQueryBaseCursor methods for creating insert dataset
[ https://issues.apache.org/jira/browse/AIRFLOW-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaxil Naik updated AIRFLOW-2887:
Fix Version/s: (was: 2.0.0) 1.10.2

> Add to BigQueryBaseCursor methods for creating insert dataset
> -------------------------------------------------------------
>
> Key: AIRFLOW-2887
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2887
> Project: Apache Airflow
> Issue Type: New Feature
> Reporter: Iuliia Volkova
> Assignee: Iuliia Volkova
> Priority: Minor
> Fix For: 1.10.2
>
> In BigQueryBaseCursor only the following exists:
> def delete_dataset(self, project_id, dataset_id)
> There is no hook method to create a dataset
> ([https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/insert]).
[GitHub] kaxil closed pull request #4371: [AIRFLOW-2939][AIRFLOW-3568] fix TypeError on GoogleCloudStorageToS3Operator / S3ToGoogleCloudStorageOperator
kaxil closed pull request #4371: [AIRFLOW-2939][AIRFLOW-3568] fix TypeError on GoogleCloudStorageToS3Operator / S3ToGoogleCloudStorageOperator
URL: https://github.com/apache/incubator-airflow/pull/4371

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/airflow/contrib/operators/gcs_to_s3.py b/airflow/contrib/operators/gcs_to_s3.py
index 23a4e9cec8..6029661f37 100644
--- a/airflow/contrib/operators/gcs_to_s3.py
+++ b/airflow/contrib/operators/gcs_to_s3.py
@@ -101,7 +101,7 @@ def execute(self, context):
             # Google Cloud Storage and not in S3
             bucket_name, _ = S3Hook.parse_s3_url(self.dest_s3_key)
             existing_files = s3_hook.list_keys(bucket_name)
-            files = set(files) - set(existing_files)
+            files = list(set(files) - set(existing_files))

         if files:
             hook = GoogleCloudStorageHook(

diff --git a/airflow/contrib/operators/s3_to_gcs_operator.py b/airflow/contrib/operators/s3_to_gcs_operator.py
index 6fbe2c0b83..9008c2da1c 100644
--- a/airflow/contrib/operators/s3_to_gcs_operator.py
+++ b/airflow/contrib/operators/s3_to_gcs_operator.py
@@ -152,7 +152,7 @@ def execute(self, context):
                 else:
                     existing_files.append(f)

-            files = set(files) - set(existing_files)
+            files = list(set(files) - set(existing_files))

         if len(files) > 0:
             self.log.info('{0} files are going to be synced: {1}.'.format(
                 len(files), files))