[jira] [Created] (AIRFLOW-2931) Make Quick Start work when user doesn't have `cryptography` installed
Mitchell Lloyd created AIRFLOW-2931:
-----------------------------------

             Summary: Make Quick Start work when user doesn't have `cryptography` installed
                 Key: AIRFLOW-2931
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2931
             Project: Apache Airflow
          Issue Type: Improvement
    Affects Versions: Airflow 1.8
            Reporter: Mitchell Lloyd

Following the [Quick Start guide|https://airflow.incubator.apache.org/start.html] I ran into the following error when running airflow initdb:

{code:java}
[2018-08-21 21:31:38,602] {__init__.py:45} INFO - Using executor SequentialExecutor
DB: sqlite:Users/mitch/projects/airflow/airflow.db
[2018-08-21 21:31:38,716] {db.py:312} INFO - Creating tables
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
[2018-08-21 21:31:38,792] {models.py:643} ERROR - Failed to load fernet while encrypting value, using non-encrypted value.
Traceback (most recent call last):
  File "/Users/mitch/.pyenv/versions/3.6.6/lib/python3.6/site-packages/airflow/models.py", line 105, in get_fernet
    return Fernet(configuration.get('core', 'FERNET_KEY').encode('utf-8'))
  File "/Users/mitch/.pyenv/versions/3.6.6/lib/python3.6/site-packages/cryptography/fernet.py", line 34, in __init__
    key = base64.urlsafe_b64decode(key)
  File "/Users/mitch/.pyenv/versions/3.6.6/lib/python3.6/base64.py", line 133, in urlsafe_b64decode
    return b64decode(s)
  File "/Users/mitch/.pyenv/versions/3.6.6/lib/python3.6/base64.py", line 87, in b64decode
    return binascii.a2b_base64(s)
binascii.Error: Incorrect padding
{code}

My airflow.cfg file contained:

{code:java}
# Secret key to save connection passwords in the db
fernet_key = cryptography_not_found_storing_passwords_in_plain_text
{code}

[~kevcampb] helped me on Gitter and directed me to [this link|https://bcb.github.io/airflow/fernet-key], which resolved my issue.
It seems that the Quick Start instructions should be updated to include this fernet key generation step; alternatively, there may be a way to ensure that `cryptography` is installed whenever apache-airflow is installed, so that a valid key can be generated automatically.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
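For reference, the workaround behind the linked page amounts to generating a valid Fernet key and setting it as `fernet_key` in airflow.cfg. A stdlib-only sketch of the key format (with `cryptography` installed, `Fernet.generate_key()` produces the same thing), and of why the placeholder string triggers the traceback above:

```python
import base64
import binascii
import os

# A valid Fernet key is 32 random bytes, url-safe base64-encoded (44 chars).
# With the `cryptography` package installed, Fernet.generate_key() does this.
key = base64.urlsafe_b64encode(os.urandom(32))
print(key.decode())  # paste this value as fernet_key under [core] in airflow.cfg

# The placeholder written when `cryptography` is missing is not valid base64
# (its length is not a multiple of 4), which is exactly the
# "binascii.Error: Incorrect padding" seen in the traceback:
try:
    base64.urlsafe_b64decode(b"cryptography_not_found_storing_passwords_in_plain_text")
except binascii.Error as e:
    print("invalid key:", e)
```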
[GitHub] YingboWang commented on a change in pull request #3740: [AIRFLOW-2888] Remove shell=True and bash from task launch
YingboWang commented on a change in pull request #3740: [AIRFLOW-2888] Remove shell=True and bash from task launch
URL: https://github.com/apache/incubator-airflow/pull/3740#discussion_r211827451

File path: airflow/executors/celery_executor.py

@@ -84,7 +84,7 @@ def execute_async(self, key, command,
         self.log.info("[celery] queuing {key} through celery, "
                       "queue={queue}".format(**locals()))
         self.tasks[key] = execute_command.apply_async(
-            args=[command], queue=queue)
+            args=command, queue=queue)

Review comment:
   I would like to.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] feng-tao commented on a change in pull request #3740: [AIRFLOW-2888] Remove shell=True and bash from task launch
feng-tao commented on a change in pull request #3740: [AIRFLOW-2888] Remove shell=True and bash from task launch
URL: https://github.com/apache/incubator-airflow/pull/3740#discussion_r211821195

File path: airflow/executors/celery_executor.py

@@ -84,7 +84,7 @@ def execute_async(self, key, command,
         self.log.info("[celery] queuing {key} through celery, "
                       "queue={queue}".format(**locals()))
         self.tasks[key] = execute_command.apply_async(
-            args=[command], queue=queue)
+            args=command, queue=queue)

Review comment:
   @YingboWang , good find. Do you want to create a PR for this issue?
[jira] [Work started] (AIRFLOW-2930) scheduler exit when using celery executor
[ https://issues.apache.org/jira/browse/AIRFLOW-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on AIRFLOW-2930 started by Yingbo Wang.

> scheduler exit when using celery executor
> -----------------------------------------
>
>                 Key: AIRFLOW-2930
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2930
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: celery
>            Reporter: Yingbo Wang
>            Assignee: Yingbo Wang
>            Priority: Major
>
> Caused by:
> [https://github.com/apache/incubator-airflow/pull/3740]
>
> Use CeleryExecutor for airflow, scheduler exit after a Dag is activated.
[jira] [Created] (AIRFLOW-2930) scheduler exit when using celery executor
Yingbo Wang created AIRFLOW-2930:
--------------------------------

             Summary: scheduler exit when using celery executor
                 Key: AIRFLOW-2930
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2930
             Project: Apache Airflow
          Issue Type: Bug
          Components: celery
            Reporter: Yingbo Wang
            Assignee: Yingbo Wang

Caused by: [https://github.com/apache/incubator-airflow/pull/3740]

When using the CeleryExecutor, the scheduler exits after a DAG is activated.
[GitHub] feng-tao commented on issue #3754: [Airflow-1737] Handle `execution_date`s with fractional seconds in www/views
feng-tao commented on issue #3754: [Airflow-1737] Handle `execution_date`s with fractional seconds in www/views
URL: https://github.com/apache/incubator-airflow/pull/3754#issuecomment-414893693

@7yl4r , I don't think we update the 1.9 branch anymore.
[GitHub] feng-tao commented on issue #3760: [AIRFLOW-2909] Deprecate airflow.operators.sensors module
feng-tao commented on issue #3760: [AIRFLOW-2909] Deprecate airflow.operators.sensors module
URL: https://github.com/apache/incubator-airflow/pull/3760#issuecomment-414892947

lgtm. @Fokko , I think those are from your local build dir. I can't find those in my repo.
[GitHub] feng-tao removed a comment on issue #3760: [AIRFLOW-2909] Deprecate airflow.operators.sensors module
feng-tao removed a comment on issue #3760: [AIRFLOW-2909] Deprecate airflow.operators.sensors module
URL: https://github.com/apache/incubator-airflow/pull/3760#issuecomment-414890984

lgtm
[GitHub] feng-tao removed a comment on issue #3760: [AIRFLOW-2909] Deprecate airflow.operators.sensors module
feng-tao removed a comment on issue #3760: [AIRFLOW-2909] Deprecate airflow.operators.sensors module
URL: https://github.com/apache/incubator-airflow/pull/3760#issuecomment-414891090

@Fokko , I think the files are from your local build dir. I can't find them in my repo.
[GitHub] feng-tao commented on issue #3760: [AIRFLOW-2909] Deprecate airflow.operators.sensors module
feng-tao commented on issue #3760: [AIRFLOW-2909] Deprecate airflow.operators.sensors module
URL: https://github.com/apache/incubator-airflow/pull/3760#issuecomment-414891090

@Fokko , I think the files are from your local build dir. I can't find them in my repo.
[GitHub] feng-tao commented on issue #3760: [AIRFLOW-2909] Deprecate airflow.operators.sensors module
feng-tao commented on issue #3760: [AIRFLOW-2909] Deprecate airflow.operators.sensors module
URL: https://github.com/apache/incubator-airflow/pull/3760#issuecomment-414890984

lgtm
[GitHub] XD-DENG commented on issue #3773: [AIRFLOW-2921][AIRFLOW-2922] Fix two potential bugs in CeleryExecutor()
XD-DENG commented on issue #3773: [AIRFLOW-2921][AIRFLOW-2922] Fix two potential bugs in CeleryExecutor()
URL: https://github.com/apache/incubator-airflow/pull/3773#issuecomment-414875711

Thanks @yrqls21 ! Luckily Big `O` is not involved this time ;-)
[GitHub] ChengzhiZhao commented on issue #3730: [AIRFLOW-2882] Add import and export for pool cli using JSON
ChengzhiZhao commented on issue #3730: [AIRFLOW-2882] Add import and export for pool cli using JSON
URL: https://github.com/apache/incubator-airflow/pull/3730#issuecomment-414874406

I added a new JIRA ticket to add set and get to the pool class for a more descriptive response: https://issues.apache.org/jira/browse/AIRFLOW-2929
[GitHub] yrqls21 commented on issue #3773: [AIRFLOW-2921][AIRFLOW-2922] Fix two potential bugs in CeleryExecutor()
yrqls21 commented on issue #3773: [AIRFLOW-2921][AIRFLOW-2922] Fix two potential bugs in CeleryExecutor()
URL: https://github.com/apache/incubator-airflow/pull/3773#issuecomment-414874161

Great catch. We actually have a patch at Airbnb that covers the 2nd bug (it involves more changes and we're baking it in our env). I agree with the fix for the 1st bug and its test. Also agree that it is very hard to write a test to guard against the 2nd bug. Tyvm for the fixes, LGTM.
[jira] [Created] (AIRFLOW-2929) Add get and set for pool class in models.py
Chengzhi Zhao created AIRFLOW-2929:
----------------------------------

             Summary: Add get and set for pool class in models.py
                 Key: AIRFLOW-2929
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2929
             Project: Apache Airflow
          Issue Type: Improvement
          Components: models
    Affects Versions: 1.9.0
            Reporter: Chengzhi Zhao
            Assignee: Chengzhi Zhao

Currently the Pool class in models.py doesn't have get and set methods. I suggest we add them to mirror the Variable/Connection classes; this will also make it easier for the CLI to return a more descriptive response, as discussed here: https://github.com/apache/incubator-airflow/pull/3730#discussion_r210185544
[GitHub] andrewmchen commented on a change in pull request #3570: [AIRFLOW-2709] Improve error handling in Databricks hook
andrewmchen commented on a change in pull request #3570: [AIRFLOW-2709] Improve error handling in Databricks hook
URL: https://github.com/apache/incubator-airflow/pull/3570#discussion_r211802793

File path: airflow/contrib/hooks/databricks_hook.py

@@ -117,29 +123,51 @@ def _do_api_call(self, endpoint_info, json):
         else:
             raise AirflowException('Unexpected HTTP Method: ' + method)

-        for attempt_num in range(1, self.retry_limit + 1):
+        attempt_num = 1
+        while True:
             try:
                 response = request_func(
                     url,
                     json=json,
                     auth=auth,
                     headers=USER_AGENT_HEADER,
                     timeout=self.timeout_seconds)
-                if response.status_code == requests.codes.ok:
-                    return response.json()
-                else:
+                response.raise_for_status()
+                return response.json()
+            except (requests_exceptions.ConnectionError,
+                    requests_exceptions.Timeout) as e:
+                self._log_request_error(attempt_num, e)
+            except requests_exceptions.HTTPError as e:
+                response = e.response
+                if not self._retriable_error(response):
                     # In this case, the user probably made a mistake.
                     # Don't retry.
                     raise AirflowException('Response: {0}, Status Code: {1}'.format(
                         response.content, response.status_code))
-            except (requests_exceptions.ConnectionError,
-                    requests_exceptions.Timeout) as e:
-                self.log.error(
-                    'Attempt %s API Request to Databricks failed with reason: %s',
-                    attempt_num, e
-                )
-        raise AirflowException(('API requests to Databricks failed {} times. ' +
-                                'Giving up.').format(self.retry_limit))
+
+                self._log_request_error(attempt_num, e)
+
+            if attempt_num == self.retry_limit:
+                raise AirflowException(('API requests to Databricks failed {} times. ' +
+                                        'Giving up.').format(self.retry_limit))
+
+            attempt_num += 1
+            sleep(self.retry_delay)
+
+    def _log_request_error(self, attempt_num, error):
+        self.log.error(
+            'Attempt %s API Request to Databricks failed with reason: %s',
+            attempt_num, error
+        )
+
+    @staticmethod
+    def _retriable_error(response):
+        try:
+            error_code = response.json().get('error_code')
+            return error_code == 'TEMPORARILY_UNAVAILABLE'

Review comment:
   Yup.
[GitHub] YingboWang commented on a change in pull request #3740: [AIRFLOW-2888] Remove shell=True and bash from task launch
YingboWang commented on a change in pull request #3740: [AIRFLOW-2888] Remove shell=True and bash from task launch
URL: https://github.com/apache/incubator-airflow/pull/3740#discussion_r211801472

File path: airflow/executors/celery_executor.py

@@ -84,7 +84,7 @@ def execute_async(self, key, command,
         self.log.info("[celery] queuing {key} through celery, "
                       "queue={queue}".format(**locals()))
         self.tasks[key] = execute_command.apply_async(
-            args=[command], queue=queue)
+            args=command, queue=queue)

Review comment:
   @bolkedebruin Just found an issue with this commit. The change to the `apply_async` call crashes the scheduler when using the celery executor. Tested in a local environment.

   Concern about the update on line 87:

   > **Before**:
   > `args=[command]`, where command is a single unicode string.
   > Sample: `command = "airflow run dag323..."`, so `args = ["airflow run example"]`,
   > and `execute_command("airflow run dag323")` is called asynchronously.

   > **After**:
   > `args=command`, where command is a list of short unicode strings.
   > Sample: `command = ["airflow", "run", "dag323", ...]`, so `args = ["airflow", "run", "dag323", ...]`,
   > and `execute_command("airflow", "run", "dag323", ...)` is called. **Error here**

   `execute_command.apply_async(args=command, queue=queue)` will pass a list of arguments rather than one to `execute_command`, which is defined as taking only one argument. The test in test_celery_executor.py currently has only one item in the testing `command` list and cannot cover this case.
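The failure mode described above can be reproduced without Celery. The sketch below uses a hypothetical stand-in for `apply_async`, which ultimately invokes the task function as `func(*args)`:

```python
# Hypothetical stand-in for Celery task invocation: apply_async(args=...)
# ultimately calls the task function as func(*args).
def execute_command(command):
    return "would run: {}".format(command)

def fake_apply_async(func, args):
    return func(*args)

# Before the change: a one-element list unpacks to one positional argument -> OK.
print(fake_apply_async(execute_command, args=["airflow run dag323"]))

# After the change: each list element becomes its own positional argument,
# so a task function defined with a single parameter raises TypeError.
try:
    fake_apply_async(execute_command, args=["airflow", "run", "dag323"])
except TypeError as e:
    print("crash:", e)
```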
[GitHub] betabandido commented on a change in pull request #3570: [AIRFLOW-2709] Improve error handling in Databricks hook
betabandido commented on a change in pull request #3570: [AIRFLOW-2709] Improve error handling in Databricks hook
URL: https://github.com/apache/incubator-airflow/pull/3570#discussion_r211783774

File path: airflow/contrib/hooks/databricks_hook.py

@@ -117,29 +123,51 @@ def _do_api_call(self, endpoint_info, json):
         else:
             raise AirflowException('Unexpected HTTP Method: ' + method)

-        for attempt_num in range(1, self.retry_limit + 1):
+        attempt_num = 1
+        while True:
             try:
                 response = request_func(
                     url,
                     json=json,
                     auth=auth,
                     headers=USER_AGENT_HEADER,
                     timeout=self.timeout_seconds)
-                if response.status_code == requests.codes.ok:
-                    return response.json()
-                else:
+                response.raise_for_status()
+                return response.json()
+            except (requests_exceptions.ConnectionError,
+                    requests_exceptions.Timeout) as e:
+                self._log_request_error(attempt_num, e)
+            except requests_exceptions.HTTPError as e:
+                response = e.response
+                if not self._retriable_error(response):
                     # In this case, the user probably made a mistake.
                     # Don't retry.
                     raise AirflowException('Response: {0}, Status Code: {1}'.format(
                         response.content, response.status_code))
-            except (requests_exceptions.ConnectionError,
-                    requests_exceptions.Timeout) as e:
-                self.log.error(
-                    'Attempt %s API Request to Databricks failed with reason: %s',
-                    attempt_num, e
-                )
-        raise AirflowException(('API requests to Databricks failed {} times. ' +
-                                'Giving up.').format(self.retry_limit))
+
+                self._log_request_error(attempt_num, e)
+
+            if attempt_num == self.retry_limit:
+                raise AirflowException(('API requests to Databricks failed {} times. ' +
+                                        'Giving up.').format(self.retry_limit))
+
+            attempt_num += 1
+            sleep(self.retry_delay)
+
+    def _log_request_error(self, attempt_num, error):
+        self.log.error(
+            'Attempt %s API Request to Databricks failed with reason: %s',
+            attempt_num, error
+        )
+
+    @staticmethod
+    def _retriable_error(response):
+        try:
+            error_code = response.json().get('error_code')
+            return error_code == 'TEMPORARILY_UNAVAILABLE'

Review comment:
   So, you think it is safe to drop the test for the `TEMPORARILY_UNAVAILABLE` code in the JSON response, and simply retry for any 5XX error?
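The alternative being discussed (treating any 5xx response as transient, rather than matching a specific error code in the body) could be sketched as follows; this is a hypothetical helper, not the PR's final code:

```python
# Hypothetical retry predicate: treat any server-side (5xx) HTTP status as
# transient and retriable, instead of matching only the TEMPORARILY_UNAVAILABLE
# error code in the response body.
def is_retriable_status(status_code):
    return 500 <= status_code < 600

# Transient server errors are retried; client mistakes (4xx) are not.
print(is_retriable_status(503))  # True
print(is_retriable_status(400))  # False
```

This avoids parsing the response body at all, at the cost of also retrying 5xx responses that are not actually transient.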
[GitHub] kaxil commented on a change in pull request #3777: Add Flipp to list of Airflow users
kaxil commented on a change in pull request #3777: Add Flipp to list of Airflow users
URL: https://github.com/apache/incubator-airflow/pull/3777#discussion_r211774722

File path: README.md

@@ -144,6 +144,7 @@ Currently **officially** using Airflow:
 1. [Easy Taxi](http://www.easytaxi.com/) [[@caique-lima](https://github.com/caique-lima) & [@WesleyBatista](https://github.com/WesleyBatista) & [@diraol](https://github.com/diraol)]
 1. [eRevalue](https://www.datamaran.com) [[@hamedhsn](https://github.com/hamedhsn)]
 1. [evo.company](https://evo.company/) [[@orhideous](https://github.com/orhideous)]
+1. [Flipp] (https://www.flipp.com) [@sethwilsonwishabi]

Review comment:
   It should be
   ```
   [Flipp](https://www.flipp.com) [[@sethwilsonwishabi](https://github.com/sethwilsonwishabi)]
   ```
[GitHub] kaxil closed pull request #3778: [AIRFLOW-XXX] Fix some operator names in the docs
kaxil closed pull request #3778: [AIRFLOW-XXX] Fix some operator names in the docs
URL: https://github.com/apache/incubator-airflow/pull/3778

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/docs/concepts.rst b/docs/concepts.rst
index 09c1293806..50c18c9b98 100644
--- a/docs/concepts.rst
+++ b/docs/concepts.rst
@@ -115,13 +115,12 @@ Airflow provides operators for many common tasks, including:
 - ``BashOperator`` - executes a bash command
 - ``PythonOperator`` - calls an arbitrary Python function
 - ``EmailOperator`` - sends an email
-- ``HTTPOperator`` - sends an HTTP request
+- ``SimpleHttpOperator`` - sends an HTTP request
 - ``MySqlOperator``, ``SqliteOperator``, ``PostgresOperator``, ``MsSqlOperator``,
   ``OracleOperator``, ``JdbcOperator``, etc. - executes a SQL command
 - ``Sensor`` - waits for a certain time, file, database row, S3 key, etc...
-
 In addition to these basic building blocks, there are many more specific
-operators: ``DockerOperator``, ``HiveOperator``, ``S3FileTransferOperator``,
+operators: ``DockerOperator``, ``HiveOperator``, ``S3FileTransformOperator``,
 ``PrestoToMysqlOperator``, ``SlackOperator``... you get the idea!

 The ``airflow/contrib/`` directory contains yet more operators built by the
[GitHub] codecov-io edited a comment on issue #3779: [AIRFLOW-2928] replace uuid1 with uuid4 for better randomness
codecov-io edited a comment on issue #3779: [AIRFLOW-2928] replace uuid1 with uuid4 for better randomness
URL: https://github.com/apache/incubator-airflow/pull/3779#issuecomment-414834847

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3779?src=pr=h1) Report
> Merging [#3779](https://codecov.io/gh/apache/incubator-airflow/pull/3779?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/27309b13f17402eaa61d4e4fede8785effa8bbb7?src=pr=desc) will **decrease** coverage by `<.01%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3779/graphs/tree.svg?height=150=650=WdLKlKHOAU=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3779?src=pr=tree)

```diff
@@            Coverage Diff             @@
##           master    #3779      +/-   ##
==========================================
- Coverage   77.67%   77.67%   -0.01%
==========================================
  Files         204      204
  Lines       15840    15840
==========================================
- Hits        12304    12303       -1
- Misses       3536     3537       +1
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3779?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3779/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=) | `88.74% <0%> (-0.05%)` | :arrow_down: |

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3779?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3779?src=pr=footer). Last update [27309b1...31c2c8d](https://codecov.io/gh/apache/incubator-airflow/pull/3779?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
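For context on the PR itself: `uuid1` derives part of its bits from the host's MAC address and a timestamp, while `uuid4` is built from random bits, which is the randomness improvement the title refers to. Both are in the Python standard library:

```python
import uuid

# uuid1 embeds the host's MAC address and a timestamp, so values leak host
# information and are partially predictable; uuid4 is built from random bits.
u1 = uuid.uuid1()
u4 = uuid.uuid4()

print(u1, "version", u1.version)  # version 1
print(u4, "version", u4.version)  # version 4
```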
[GitHub] andrewmchen commented on a change in pull request #3570: [AIRFLOW-2709] Improve error handling in Databricks hook
andrewmchen commented on a change in pull request #3570: [AIRFLOW-2709] Improve error handling in Databricks hook URL: https://github.com/apache/incubator-airflow/pull/3570#discussion_r211770305 ## File path: airflow/contrib/hooks/databricks_hook.py ## @@ -117,29 +123,51 @@ def _do_api_call(self, endpoint_info, json): else: raise AirflowException('Unexpected HTTP Method: ' + method) -for attempt_num in range(1, self.retry_limit + 1): +attempt_num = 1 +while True: try: response = request_func( url, json=json, auth=auth, headers=USER_AGENT_HEADER, timeout=self.timeout_seconds) -if response.status_code == requests.codes.ok: -return response.json() -else: +response.raise_for_status() +return response.json() +except (requests_exceptions.ConnectionError, +requests_exceptions.Timeout) as e: +self._log_request_error(attempt_num, e) +except requests_exceptions.HTTPError as e: +response = e.response +if not self._retriable_error(response): # In this case, the user probably made a mistake. # Don't retry. raise AirflowException('Response: {0}, Status Code: {1}'.format( response.content, response.status_code)) -except (requests_exceptions.ConnectionError, -requests_exceptions.Timeout) as e: -self.log.error( -'Attempt %s API Request to Databricks failed with reason: %s', -attempt_num, e -) -raise AirflowException(('API requests to Databricks failed {} times. ' + - 'Giving up.').format(self.retry_limit)) + +self._log_request_error(attempt_num, e) + +if attempt_num == self.retry_limit: +raise AirflowException(('API requests to Databricks failed {} times. ' + +'Giving up.').format(self.retry_limit)) + +attempt_num += 1 +sleep(self.retry_delay) + +def _log_request_error(self, attempt_num, error): +self.log.error( +'Attempt %s API Request to Databricks failed with reason: %s', +attempt_num, error +) + +@staticmethod +def _retriable_error(response): Review comment: I think it's best to move it outside of the class here. 
See discussion in https://stackoverflow.com/questions/735975/static-methods-in-python. > Finally, use staticmethod() sparingly! There are very few situations where static-methods are necessary in Python, and I've seen them used many times where a separate "top-level" function would have been clearer. > You don't really need to use the staticmethod decorator. Just declaring a method (that doesn't expect the self parameter) and call it from the class. The decorator is only there in case you want to be able to call it from an instance as well (which was not what you wanted to do) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
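The staticmethod-versus-top-level-function tradeoff quoted above can be sketched in a few lines. This is illustrative only — `DatabricksHookStyle`, `retryable_error`, and the status-code logic are made-up stand-ins, not the actual Databricks hook code:

```python
# Same predicate written two ways. Both are callable without an instance;
# the module-level form avoids tying the helper to the class at all.

class DatabricksHookStyle:
    @staticmethod
    def _retriable_error(status_code):
        # staticmethod: no self, but still namespaced under the class
        return status_code >= 500


def retryable_error(status_code):
    # top-level function: same logic, no class or decorator needed
    return status_code >= 500


print(DatabricksHookStyle._retriable_error(503))  # True
print(retryable_error(404))                       # False
```

As the Stack Overflow answer notes, the decorator mainly matters when the helper must also be reachable through an instance; otherwise a plain function is the simpler choice.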
[GitHub] andrewmchen commented on a change in pull request #3570: [AIRFLOW-2709] Improve error handling in Databricks hook
andrewmchen commented on a change in pull request #3570: [AIRFLOW-2709] Improve error handling in Databricks hook URL: https://github.com/apache/incubator-airflow/pull/3570#discussion_r211769661

## File path: airflow/contrib/hooks/databricks_hook.py (same hunk as quoted in the first review message; the comment is on the lines below)

```py
@staticmethod
def _retriable_error(response):
    try:
        error_code = response.json().get('error_code')
        return error_code == 'TEMPORARILY_UNAVAILABLE'
```

Review comment: Yeah, that sounds correct and may be simpler than parsing the response into JSON.
[jira] [Commented] (AIRFLOW-2928) Use uuid.uuid4 to create unique job name
[ https://issues.apache.org/jira/browse/AIRFLOW-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16588015 ]

ASF GitHub Bot commented on AIRFLOW-2928:

imsut opened a new pull request #3779: [AIRFLOW-2928] replace uuid1 with uuid4 for better randomness URL: https://github.com/apache/incubator-airflow/pull/3779

### Description

Some components in Airflow use the first 8 bytes of uuid.uuid1 to generate a unique job name. Those first bytes, however, come from the clock, so if this is called multiple times in a short time period, two ids will likely collide. uuid.uuid4 provides random values.

### Tests

```py
Python 2.7.15 (default, Jun 17 2018, 12:46:58)
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import uuid
>>> for i in range(10):
...     uuid.uuid1()
...
UUID('e8bc9959-a586-11e8-ab8c-8c859010d0c2')
UUID('e8c254e3-a586-11e8-ac39-8c859010d0c2')
UUID('e8c2560f-a586-11e8-8251-8c859010d0c2')
UUID('e8c256c2-a586-11e8-994a-8c859010d0c2')
UUID('e8c25759-a586-11e8-9ba6-8c859010d0c2')
UUID('e8c257e6-a586-11e8-a854-8c859010d0c2')
UUID('e8c2587d-a586-11e8-89e9-8c859010d0c2')
UUID('e8c2590a-a586-11e8-a825-8c859010d0c2')
UUID('e8c25994-a586-11e8-9421-8c859010d0c2')
UUID('e8c25a21-a586-11e8-83fd-8c859010d0c2')
>>> for i in range(10):
...     uuid.uuid4()
...
UUID('f1eba69f-18ea-467e-a414-b18d67f34a51')
UUID('aaa4e18e-d4e6-42c9-905c-3cde714c2741')
UUID('82f55c27-69ae-474b-ab9a-afcc7891587c')
UUID('fab63643-ad33-4307-837b-68444fce7240')
UUID('c4efca6c-3d1b-436c-8b09-e9b7f55ccefb')
UUID('58de3a76-9d98-4427-8232-d6d7df2a1904')
UUID('4f0a55e8-1357-4697-a345-e60891685b00')
UUID('0fed47a3-07b6-423e-ae2e-d821c440cb63')
UUID('144b2c55-a9bd-431d-b536-239fb2048a5e')
UUID('d47fd8a0-48e9-4dcc-87f7-42c022c309a8')
```

> Use uuid.uuid4 to create unique job name
> Key: AIRFLOW-2928
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2928
> Project: Apache Airflow
> Issue Type: Improvement
> Reporter: Ken Kawamoto
> Priority: Minor
> (The issue description and interactive session duplicate the pull-request text above.)

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
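The report's point can be reduced to a tiny sketch: derive a short job-name suffix from uuid4 rather than uuid1. The helper name `unique_job_name` is illustrative, not an Airflow API:

```python
import uuid

def unique_job_name(prefix="airflow-job"):
    # uuid1's leading hex digits are clock-derived, so short prefixes taken
    # from it can repeat across calls; uuid4's leading digits are random,
    # drawn from 16**8 possible values per 8-character slice.
    return "{0}-{1}".format(prefix, str(uuid.uuid4())[:8])

print(unique_job_name())
```

With uuid4, the chance of a collision among even a few hundred names generated back-to-back is negligible, regardless of the clock.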
[GitHub] betabandido commented on a change in pull request #3570: [AIRFLOW-2709] Improve error handling in Databricks hook
betabandido commented on a change in pull request #3570: [AIRFLOW-2709] Improve error handling in Databricks hook URL: https://github.com/apache/incubator-airflow/pull/3570#discussion_r211763481

## File path: airflow/contrib/hooks/databricks_hook.py (same hunk as quoted in the first review message; the comment is on the line below)

```py
@staticmethod
def _retriable_error(response):
```

Review comment: I am tempted to leave it in the class, as it is not used anywhere else. But I have no strong objection to moving it above the class definition. I will use `retryable` instead.
[GitHub] imsut opened a new pull request #3779: [AIRFLOW-2928] replace uuid1 with uuid4 for better randomness
imsut opened a new pull request #3779: [AIRFLOW-2928] replace uuid1 with uuid4 for better randomness URL: https://github.com/apache/incubator-airflow/pull/3779

(The pull-request body — the description and the uuid1/uuid4 interactive session — is identical to the text quoted in the AIRFLOW-2928 comment above.)
[jira] [Created] (AIRFLOW-2928) Use uuid.uuid4 to create unique job name
Ken Kawamoto created AIRFLOW-2928:
Summary: Use uuid.uuid4 to create unique job name
Key: AIRFLOW-2928
URL: https://issues.apache.org/jira/browse/AIRFLOW-2928
Project: Apache Airflow
Issue Type: Improvement
Reporter: Ken Kawamoto

Some components in Airflow use the first 8 bytes of _uuid.uuid1_ to generate a unique job name. Those first bytes, however, come from the clock, so if this is called multiple times in a short time period, two ids will likely collide. _uuid.uuid4_ provides random values.

{code}
(interactive uuid1/uuid4 session, identical to the one quoted in the pull-request description above)
{code}
[GitHub] betabandido commented on a change in pull request #3570: [AIRFLOW-2709] Improve error handling in Databricks hook
betabandido commented on a change in pull request #3570: [AIRFLOW-2709] Improve error handling in Databricks hook URL: https://github.com/apache/incubator-airflow/pull/3570#discussion_r211760792

## File path: airflow/contrib/hooks/databricks_hook.py (same hunk as quoted in the first review message; the comment is on the lines below)

```py
try:
    error_code = response.json().get('error_code')
    return error_code == 'TEMPORARILY_UNAVAILABLE'
```

Review comment: @andrewmchen I'm certainly not claiming that. What I meant is that we don't know what the HTTP status code was when we encountered the `TEMPORARILY_UNAVAILABLE` error (as we did not log the status code). I suppose it is most likely a `500` or something similar. Do you know what the exact status code is in this case? So, yes, the goal of this PR is mostly to add retries for the case when the `TEMPORARILY_UNAVAILABLE` error occurs. Having said that, we could also retry whenever the HTTP status code is >= 500. Does that sound correct?
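A hedged sketch of the policy being discussed — retry on any 5xx status, and additionally whenever the Databricks error payload carries `TEMPORARILY_UNAVAILABLE`. This is illustrative only, not the merged Airflow code: the function takes a status code and raw body rather than a `requests` response object:

```python
import json

def retryable_error(status_code, body):
    # Server-side failures (5xx) are worth retrying regardless of the body.
    if status_code >= 500:
        return True
    # Otherwise, retry only when the JSON payload names a transient error.
    try:
        return json.loads(body).get('error_code') == 'TEMPORARILY_UNAVAILABLE'
    except ValueError:
        # Non-JSON body with a non-5xx status: treat as a caller error.
        return False
```

A 400 with a malformed body would therefore fail fast, while a 503 — or any response explicitly flagged as temporarily unavailable — would be retried up to the configured limit.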
[GitHub] andrewmchen commented on a change in pull request #3570: [AIRFLOW-2709] Improve error handling in Databricks hook
andrewmchen commented on a change in pull request #3570: [AIRFLOW-2709] Improve error handling in Databricks hook URL: https://github.com/apache/incubator-airflow/pull/3570#discussion_r211756760

## File path: airflow/contrib/hooks/databricks_hook.py (same hunk as quoted in the first review message; the comment is on the line below)

```py
@staticmethod
def _retriable_error(response):
```

Review comment: nit: lift outside of the class and `retriable` -> `retryable`?
[GitHub] andrewmchen commented on a change in pull request #3570: [AIRFLOW-2709] Improve error handling in Databricks hook
andrewmchen commented on a change in pull request #3570: [AIRFLOW-2709] Improve error handling in Databricks hook URL: https://github.com/apache/incubator-airflow/pull/3570#discussion_r211756631

## File path: airflow/contrib/hooks/databricks_hook.py (same hunk as quoted in the first review message; the comment is on the lines below)

```py
try:
    error_code = response.json().get('error_code')
    return error_code == 'TEMPORARILY_UNAVAILABLE'
```

Review comment: Sorry, just to clarify: you're not claiming that the HTTP response code can be 200 with a non-JSON response body, right? If so, that is not a known issue. If the fix is to start catching `TEMPORARILY_UNAVAILABLE` error codes in the response, then I do think that the ValueError check is reasonable. I do think we should say it is retryable, though.
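The ValueError scenario being debated — a response body that is not valid JSON, e.g. an HTML error page from a misbehaving gateway — can be isolated in a small sketch. This is illustrative, using `json` directly rather than a real `requests` response; the helper name `error_code_of` is made up:

```python
import json

def error_code_of(body):
    # A proxy or gateway may return a non-JSON body (e.g. an HTML error
    # page). response.json() raises ValueError in that case, so a defensive
    # caller returns "no error code" instead of crashing the retry loop.
    try:
        return json.loads(body).get('error_code')
    except ValueError:
        return None
```

This is exactly why guarding the `error_code` lookup with a ValueError handler is reasonable: the absence of a parseable code should feed into the retry decision, not abort it.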
[GitHub] andrewmchen commented on a change in pull request #3570: [AIRFLOW-2709] Improve error handling in Databricks hook
andrewmchen commented on a change in pull request #3570: [AIRFLOW-2709] Improve error handling in Databricks hook URL: https://github.com/apache/incubator-airflow/pull/3570#discussion_r211753269

## File path: airflow/contrib/hooks/databricks_hook.py (same hunk as quoted in the first review message; the comment is on the lines below)

```py
try:
    error_code = response.json().get('error_code')
    return error_code == 'TEMPORARILY_UNAVAILABLE'
```

Review comment: Thanks, this is definitely a known issue and can happen sometimes when the gateway proxy is not behaving correctly... That being said, this fix is definitely necessary and appreciated!
[GitHub] Fokko commented on a change in pull request #3393: [AIRFLOW-2499] Dockerised CI pipeline
Fokko commented on a change in pull request #3393: [AIRFLOW-2499] Dockerised CI pipeline URL: https://github.com/apache/incubator-airflow/pull/3393#discussion_r211750311

## File path: .travis.yml

```diff
@@ -19,94 +19,40 @@
 sudo: true
 dist: trusty
 language: python
-jdk:
-  - oraclejdk8
-services:
-  - cassandra
-  - mongodb
-  - mysql
-  - postgresql
-  - rabbitmq
-addons:
-  apt:
-    packages:
-      - slapd
-      - ldap-utils
-      - openssh-server
-      - mysql-server-5.6
-      - mysql-client-core-5.6
-      - mysql-client-5.6
-      - krb5-user
-      - krb5-kdc
-      - krb5-admin-server
-      - oracle-java8-installer
-  postgresql: "9.2"
-python:
-  - "2.7"
-  - "3.5"
 env:
   global:
+  - DOCKER_COMPOSE_VERSION=1.20.0
   - SLUGIFY_USES_TEXT_UNIDECODE=yes
   - TRAVIS_CACHE=$HOME/.travis_cache/
-  - KRB5_CONFIG=/etc/krb5.conf
-  - KRB5_KTNAME=/etc/airflow.keytab
-  # Travis on google cloud engine has a global /etc/boto.cfg that
-  # does not work with python 3
-  - BOTO_CONFIG=/tmp/bogusvalue
   matrix:
+  - TOX_ENV=flake8
   - TOX_ENV=py27-backend_mysql
   - TOX_ENV=py27-backend_sqlite
   - TOX_ENV=py27-backend_postgres
-  - TOX_ENV=py35-backend_mysql
-  - TOX_ENV=py35-backend_sqlite
-  - TOX_ENV=py35-backend_postgres
-  - TOX_ENV=flake8
+  - TOX_ENV=py35-backend_mysql PYTHON_VERSION=3
+  - TOX_ENV=py35-backend_sqlite PYTHON_VERSION=3
+  - TOX_ENV=py35-backend_postgres PYTHON_VERSION=3
   - TOX_ENV=py27-backend_postgres KUBERNETES_VERSION=v1.9.0
-  - TOX_ENV=py35-backend_postgres KUBERNETES_VERSION=v1.10.0
-matrix:
-  exclude:
-    - python: "3.5"
-      env: TOX_ENV=py27-backend_mysql
-    - python: "3.5"
-      env: TOX_ENV=py27-backend_sqlite
-    - python: "3.5"
-      env: TOX_ENV=py27-backend_postgres
-    - python: "2.7"
-      env: TOX_ENV=py35-backend_mysql
-    - python: "2.7"
-      env: TOX_ENV=py35-backend_sqlite
-    - python: "2.7"
-      env: TOX_ENV=py35-backend_postgres
-    - python: "2.7"
-      env: TOX_ENV=flake8
-    - python: "3.5"
-      env: TOX_ENV=py27-backend_postgres KUBERNETES_VERSION=v1.9.0
-    - python: "2.7"
-      env: TOX_ENV=py35-backend_postgres KUBERNETES_VERSION=v1.10.0
+  - TOX_ENV=py35-backend_postgres KUBERNETES_VERSION=v1.10.0 PYTHON_VERSION=3
 cache:
   directories:
   - $HOME/.wheelhouse/
+  - $HOME/.cache/pip
   - $HOME/.travis_cache/
 before_install:
-  - yes | ssh-keygen -t rsa -C your_em...@youremail.com -P '' -f ~/.ssh/id_rsa
-  - cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
-  - ln -s ~/.ssh/authorized_keys ~/.ssh/authorized_keys2
-  - chmod 600 ~/.ssh/*
-  - jdk_switcher use oraclejdk8
+  - sudo ls -lh $HOME/.cache/pip/
+  - sudo rm -rf $HOME/.cache/pip/* $HOME/.wheelhouse/*
+  - sudo chown -R travis.travis $HOME/.cache/pip
 install:
+  # Use recent docker-compose version
+  - sudo rm /usr/local/bin/docker-compose
```

Review comment: Makes sense to me
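The quoted hunk ends at the `install:` step being reviewed — removing the preinstalled docker-compose binary. The continuation is not shown in the hunk, but a typical Travis recipe for pinning docker-compose to `DOCKER_COMPOSE_VERSION` (an assumption, sketched here for context, not the PR's actual lines) looks like:

```yaml
install:
  # Use recent docker-compose version; DOCKER_COMPOSE_VERSION is set in env.global
  - sudo rm /usr/local/bin/docker-compose
  - curl -L "https://github.com/docker/compose/releases/download/${DOCKER_COMPOSE_VERSION}/docker-compose-$(uname -s)-$(uname -m)" -o docker-compose
  - chmod +x docker-compose
  - sudo mv docker-compose /usr/local/bin
```

Pinning the version this way keeps CI builds reproducible instead of depending on whatever docker-compose the Travis image happens to ship.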
[GitHub] codecov-io commented on issue #3778: [AIRFLOW-XXX] Fix some operator names in the docs
codecov-io commented on issue #3778: [AIRFLOW-XXX] Fix some operator names in the docs URL: https://github.com/apache/incubator-airflow/pull/3778#issuecomment-414792827

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3778?src=pr) Report

> Merging [#3778](https://codecov.io/gh/apache/incubator-airflow/pull/3778?src=pr) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/27309b13f17402eaa61d4e4fede8785effa8bbb7?src=pr) will **not change** coverage. The diff coverage is `n/a`.

```diff
@@           Coverage Diff           @@
##           master    #3778   +/-  ##
=======================================
  Coverage   77.67%   77.67%
=======================================
  Files         204      204
  Lines       15840    15840
=======================================
  Hits        12304    12304
  Misses       3536     3536
```

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3778?src=pr). Last update [27309b1...27814c0](https://codecov.io/gh/apache/incubator-airflow/pull/3778?src=pr).
[GitHub] codecov-io edited a comment on issue #3777: Add Flipp to list of Airflow users
codecov-io edited a comment on issue #3777: Add Flipp to list of Airflow users URL: https://github.com/apache/incubator-airflow/pull/3777#issuecomment-414784611 (The edited report body is identical to the original codecov-io comment on #3777 below.)
[jira] [Updated] (AIRFLOW-2927) Task.get_task_instances() not setting start date if none provided
[ https://issues.apache.org/jira/browse/AIRFLOW-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek Prussak updated AIRFLOW-2927: --- Description: Task.get_task_instances() does not have any logic to set a default start_date. This is inconsistent with Dag.get_task_instances() which does. [The offending function|https://github.com/apache/incubator-airflow/blob/b78c7fb8512f7a40f58b46530e9b3d5562fe84ea/airflow/models.py#L2867] was: Task.get_task_instances() does not have any logic to set a default start_date. This is inconsistent with Dag.get_task_instances() which does. > Task.get_task_instances() not setting start date if none provided > - > > Key: AIRFLOW-2927 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2927 > Project: Apache Airflow > Issue Type: Bug >Reporter: Marek Prussak >Priority: Minor > > Task.get_task_instances() does not have any logic to set a default > start_date. This is inconsistent with Dag.get_task_instances() which does. > [The offending > function|https://github.com/apache/incubator-airflow/blob/b78c7fb8512f7a40f58b46530e9b3d5562fe84ea/airflow/models.py#L2867] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] sethwilsonwishabi opened a new pull request #3777: Add Flipp to list of Airflow users
sethwilsonwishabi opened a new pull request #3777: Add Flipp to list of Airflow users URL: https://github.com/apache/incubator-airflow/pull/3777 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Created] (AIRFLOW-2927) Task.get_task_instances() not setting start date if none provided
Marek Prussak created AIRFLOW-2927: -- Summary: Task.get_task_instances() not setting start date if none provided Key: AIRFLOW-2927 URL: https://issues.apache.org/jira/browse/AIRFLOW-2927 Project: Apache Airflow Issue Type: Bug Reporter: Marek Prussak Task.get_task_instances() does not have any logic to set a default start_date. This is inconsistent with Dag.get_task_instances() which does. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
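For context, the inconsistency reported above can be sketched as a small defaulting helper. This is illustrative only (the function name and the 30-day window are mine, not Airflow's exact code): Dag.get_task_instances() falls back to a default window when no start_date is given, while Task.get_task_instances() passes the missing value straight into its query filter.

```python
from datetime import datetime, timedelta

def resolve_window(start_date=None, end_date=None, now=None):
    # Illustrative defaulting: fall back to "the last 30 days ending now"
    # when the caller provides no bounds, mirroring what
    # Dag.get_task_instances() does and Task.get_task_instances() lacks.
    now = now or datetime.utcnow()
    end_date = end_date or now
    if start_date is None:
        start_date = end_date - timedelta(days=30)
    return start_date, end_date
```

With a fallback like this, calling the method with no arguments still yields a well-defined date range instead of an unbounded (or empty) filter.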
[GitHub] tedmiston commented on issue #3760: [AIRFLOW-2909] Deprecate airflow.operators.sensors module
tedmiston commented on issue #3760: [AIRFLOW-2909] Deprecate airflow.operators.sensors module URL: https://github.com/apache/incubator-airflow/pull/3760#issuecomment-414772164 @Fokko Sure, I just checked and I do not have this `build` directory in my incubator-airflow. Is it possible that those files you're seeing locally are cached from a setuptools build or something? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] tswast commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'
tswast commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r211658743 ## File path: airflow/contrib/hooks/bigquery_hook.py ## @@ -656,6 +671,10 @@ def run_query(self, configuration['query'][ 'schemaUpdateOptions'] = schema_update_options +if 'configuration' in api_resource_configs: Review comment: Good point, `jobReference` could be useful to set, as that is how you can specify an explicit `location` for the job to run. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (AIRFLOW-2887) Add to BigQueryBaseCursor methods for creating and updating datasets
[ https://issues.apache.org/jira/browse/AIRFLOW-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Iuliia Volkova updated AIRFLOW-2887: Description: In BigQueryBaseCursor exist only: def delete_dataset(self, project_id, dataset_id) And there are no hooks to create([https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/insert)] was: In BigQueryBaseCursor exist only: def delete_dataset(self, project_id, dataset_id) And there are no hooks to create([https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/insert)] and update datasets ([https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/update]) > Add to BigQueryBaseCursor methods for creating and updating datasets > > > Key: AIRFLOW-2887 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2887 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Iuliia Volkova >Assignee: Iuliia Volkova >Priority: Minor > > In BigQueryBaseCursor exist only: > def delete_dataset(self, project_id, dataset_id) > And there are no hooks to > create([https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/insert)] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-2887) Add to BigQueryBaseCursor methods for creating and updating datasets
[ https://issues.apache.org/jira/browse/AIRFLOW-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Iuliia Volkova updated AIRFLOW-2887: Description: In BigQueryBaseCursor exist only: def delete_dataset(self, project_id, dataset_id) And there are no hooks to create([https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/insert)] and update datasets ([https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/update]) If it's so, could I add methods and operators for those actions? was: In BigQueryBaseCursor exist only: def delete_dataset(self, project_id, dataset_id) And there are no hooks to create([https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/insert)] and update datasets ([https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/update]) [~kaxilnaik], or I'm not right? If it's so, could I add methods and operators for those actions? > Add to BigQueryBaseCursor methods for creating and updating datasets > > > Key: AIRFLOW-2887 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2887 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Iuliia Volkova >Assignee: Iuliia Volkova >Priority: Minor > > In BigQueryBaseCursor exist only: > def delete_dataset(self, project_id, dataset_id) > And there are no hooks to > create([https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/insert)] > and update datasets > ([https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/update]) > > If it's so, could I add methods and operators for those actions? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
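As a sketch of what such a create method might send, the request body below follows the datasets.insert REST reference linked above. The helper name and the idea of splitting out body construction are hypothetical; a hook method would pass this dict as `body` to the authorized datasets insert call and execute it.

```python
def dataset_insert_body(project_id, dataset_id, description=None):
    """Build the request body for BigQuery's datasets.insert endpoint.

    Hypothetical helper: only `datasetReference` with `projectId` and
    `datasetId` is required by the REST resource; `description` is one
    of the optional top-level fields.
    """
    body = {
        'datasetReference': {
            'projectId': project_id,
            'datasetId': dataset_id,
        }
    }
    if description is not None:
        body['description'] = description
    return body
```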
[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r211637030 ## File path: airflow/contrib/hooks/bigquery_hook.py ## @@ -656,6 +671,10 @@ def run_query(self, configuration['query'][ 'schemaUpdateOptions'] = schema_update_options +if 'configuration' in api_resource_configs: +for key in api_resource_configs['configuration']: +configuration[key] = api_resource_configs['configuration'][key] Review comment: @tswast , check it pls This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r211634204 ## File path: airflow/contrib/hooks/bigquery_hook.py ## @@ -656,6 +671,10 @@ def run_query(self, configuration['query'][ 'schemaUpdateOptions'] = schema_update_options +if 'configuration' in api_resource_configs: Review comment: I thought of this param as a way to set, in one dict, any settings the API supports (by analogy with 'src_fmt_configs'). But if we say it is only a way to set 'configuration', with no way to send other params such as 'jobReference' or 'statistics', then the extra key isn't needed. What do you think? Use it only for 'configuration'? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] tswast commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'
tswast commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r211627543 ## File path: airflow/contrib/hooks/bigquery_hook.py ## @@ -656,6 +671,10 @@ def run_query(self, configuration['query'][ 'schemaUpdateOptions'] = schema_update_options +if 'configuration' in api_resource_configs: +for key in api_resource_configs['configuration']: +configuration[key] = api_resource_configs['configuration'][key] Review comment: There are many sub-keys of `query` that we care about. For example: `['query']['schemaUpdateOptions']` would get overridden, as would `['query']['query']`. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r211625138 ## File path: airflow/contrib/hooks/bigquery_hook.py ## @@ -656,6 +671,10 @@ def run_query(self, configuration['query'][ 'schemaUpdateOptions'] = schema_update_options +if 'configuration' in api_resource_configs: +for key in api_resource_configs['configuration']: +configuration[key] = api_resource_configs['configuration'][key] Review comment: mm.. no, it will overwrite only the keys the user adds in api_resource_configs; or maybe I didn't catch what you mean. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] tswast commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'
tswast commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r211623810 ## File path: tests/contrib/hooks/test_bigquery_hook.py ## @@ -281,6 +281,18 @@ def test_run_query_sql_dialect_override(self, run_with_config): args, kwargs = run_with_config.call_args self.assertIs(args[0]['query']['useLegacySql'], bool_val) +@mock.patch.object(hook.BigQueryBaseCursor, 'run_with_configuration') +def test_api_resource_configs(self, run_with_config): +for bool_val in [True, False]: +cursor = hook.BigQueryBaseCursor(mock.Mock(), "project_id") +cursor.run_query('query', + api_resource_configs={ + 'configuration': + {'query': {'useQueryCache': bool_val}}}) + +args, kwargs = run_with_config.call_args +self.assertIs(args[0]['query']['useQueryCache'], bool_val) Review comment: Double-check that the query string still appears in the resource. I think it might be overwritten with the current logic. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] tswast commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'
tswast commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r211623025 ## File path: airflow/contrib/hooks/bigquery_hook.py ## @@ -656,6 +671,10 @@ def run_query(self, configuration['query'][ 'schemaUpdateOptions'] = schema_update_options +if 'configuration' in api_resource_configs: Review comment: Why require an extra `configuration` key? What other keys would be present? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] tswast commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'
tswast commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r211622902 ## File path: airflow/contrib/hooks/bigquery_hook.py ## @@ -656,6 +671,10 @@ def run_query(self, configuration['query'][ 'schemaUpdateOptions'] = schema_update_options +if 'configuration' in api_resource_configs: +for key in api_resource_configs['configuration']: +configuration[key] = api_resource_configs['configuration'][key] Review comment: There are pretty few top-level keys: "query" and "labels" are the only two I can think of. Wouldn't specifying a `query` key in `api_resource_configs` overwrite all the options that were just applied? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
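One way to avoid the clobbering described here is to merge nested dicts recursively instead of assigning top-level keys. This is a sketch under the reviewer's scenario, not the PR's actual code:

```python
def deep_merge(base, overrides):
    # Recursively merge `overrides` into `base`, so that setting
    # configuration['query']['useQueryCache'] does not wipe out sibling
    # keys such as ['query']['query'] or ['query']['schemaUpdateOptions'].
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_merge(base[key], value)
        else:
            base[key] = value
    return base

# The scenario from the review thread: options already applied by
# run_query(), then a user-supplied api_resource_configs on top.
configuration = {
    'query': {
        'query': 'SELECT 1',
        'schemaUpdateOptions': ['ALLOW_FIELD_ADDITION'],
    }
}
api_resource_configs = {'configuration': {'query': {'useQueryCache': False}}}
deep_merge(configuration, api_resource_configs.get('configuration', {}))
```

With the naive top-level loop, the user's `query` dict would replace the whole existing one; with the recursive merge, `useQueryCache` is added while `query` and `schemaUpdateOptions` survive.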
[GitHub] xoen commented on a change in pull request #3504: [AIRFLOW-2310]: Add AWS Glue Job Compatibility to Airflow
xoen commented on a change in pull request #3504: [AIRFLOW-2310]: Add AWS Glue Job Compatibility to Airflow URL: https://github.com/apache/incubator-airflow/pull/3504#discussion_r211610465 ## File path: airflow/contrib/aws_glue_job_hook.py ## @@ -0,0 +1,212 @@
```diff
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+
+from airflow.exceptions import AirflowException
+from airflow.contrib.hooks.aws_hook import AwsHook
+import os.path
+import time
+
+
+class AwsGlueJobHook(AwsHook):
+    """
+    Interact with AWS Glue - create job, trigger, crawler
+
+    :param job_name: unique job name per AWS account
+    :type str
+    :param desc: job description
+    :type str
+    :param concurrent_run_limit: The maximum number of concurrent runs allowed for a job
+    :type int
+    :param script_location: path to etl script either on s3 or local
+    :type str
+    :param conns: A list of connections used by the job
+    :type list
+    :param retry_limit: Maximum number of times to retry this job if it fails
+    :type int
+    :param num_of_dpus: Number of AWS Glue DPUs to allocate to this Job
+    :type int
+    :param region_name: aws region name (example: us-east-1)
+    :type region_name: str
+    :param s3_bucket: S3 bucket where logs and local etl script will be uploaded
+    :type str
+    :param iam_role_name: AWS IAM Role for Glue Job
+    :type str
+    """
+
+    def __init__(self,
+                 job_name=None,
+                 desc=None,
+                 concurrent_run_limit=None,
+                 script_location=None,
+                 conns=None,
+                 retry_limit=None,
+                 num_of_dpus=None,
+                 aws_conn_id='aws_default',
+                 region_name=None,
+                 iam_role_name=None,
+                 s3_bucket=None, *args, **kwargs):
+        self.job_name = job_name
+        self.desc = desc
+        self.concurrent_run_limit = concurrent_run_limit or 1
+        self.script_location = script_location
+        self.conns = conns or ["s3"]
+        self.retry_limit = retry_limit or 0
+        self.num_of_dpus = num_of_dpus or 10
+        self.aws_conn_id = aws_conn_id
+        self.region_name = region_name
+        self.s3_bucket = s3_bucket
+        self.role_name = iam_role_name
+        self.S3_PROTOCOL = "s3://"
+        self.S3_ARTIFACTS_PREFIX = 'artifacts/glue-scripts/'
+        self.S3_GLUE_LOGS = 'logs/glue-logs/'
+        super(AwsGlueJobHook, self).__init__(*args, **kwargs)
+
+    def get_conn(self):
+        conn = self.get_client_type('glue', self.region_name)
+        return conn
+
+    def list_jobs(self):
+        conn = self.get_conn()
+        return conn.get_jobs()
+
+    def get_iam_execution_role(self):
+        """
+        :return: iam role for job execution
+        """
+        iam_client = self.get_client_type('iam', self.region_name)
+
+        try:
+            glue_execution_role = iam_client.get_role(RoleName=self.role_name)
+            self.log.info("Iam Role Name: {}".format(self.role_name))
+            return glue_execution_role
+        except Exception as general_error:
+            raise AirflowException(
+                'Failed to create aws glue job, error: {error}'.format(
+                    error=str(general_error)
+                )
+            )
+
+    def initialize_job(self, script_arguments=None):
+        """
+        Initializes connection with AWS Glue
+        to run job
+        :return:
+        """
+        if self.s3_bucket is None:
+            raise AirflowException(
+                'Could not initialize glue job, '
+                'error: Specify Parameter `s3_bucket`'
+            )
+
+        glue_client = self.get_conn()
+
+        try:
+            job_response = self.get_or_create_glue_job()
+            job_name = job_response['Name']
+            job_run = glue_client.start_job_run(
+                JobName=job_name,
+                Arguments=script_arguments
+            )
+            return self.job_completion(job_name, job_run['JobRunId'])
+        except Exception as general_error:
+            raise AirflowException(
+                'Failed to run aws glue job, error:
```
[jira] [Closed] (AIRFLOW-2242) Add Gousto to companies list
[ https://issues.apache.org/jira/browse/AIRFLOW-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor closed AIRFLOW-2242. -- Resolution: Won't Fix Please open a PR to add your company, re-open this Jira and we'll merge it. > Add Gousto to companies list > > > Key: AIRFLOW-2242 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2242 > Project: Apache Airflow > Issue Type: Task >Reporter: Dejan >Assignee: Dejan >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] xnuinside commented on issue #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'
xnuinside commented on issue #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#issuecomment-414666013 @ashb , yeap.. thx.. I'm trying to understand how I broke it) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Closed] (AIRFLOW-1287) Python ShortCircuitOperator missing DAG filter when setting downstream deps as skipped
[ https://issues.apache.org/jira/browse/AIRFLOW-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor closed AIRFLOW-1287. -- Resolution: Duplicate > Python ShortCircuitOperator missing DAG filter when setting downstream deps > as skipped > -- > > Key: AIRFLOW-1287 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1287 > Project: Apache Airflow > Issue Type: Bug > Components: operators >Affects Versions: Airflow 1.8 >Reporter: Luke Rohde >Assignee: Luke Rohde >Priority: Major > > The ShortCircuitOperator sets status to skipped for task instances by name > and execution date, but fails to filter by DAG. So if multiple DAGs have the > tasks with the same name, this sets downstream tasks to skipped on other DAGs. > https://github.com/apache/incubator-airflow/pull/2351 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-408) Not enough arguments for format string
[ https://issues.apache.org/jira/browse/AIRFLOW-408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor resolved AIRFLOW-408. --- Resolution: Fixed > Not enough arguments for format string > -- > > Key: AIRFLOW-408 > URL: https://issues.apache.org/jira/browse/AIRFLOW-408 > Project: Apache Airflow > Issue Type: Bug >Reporter: Li Xuanji >Assignee: Li Xuanji >Priority: Major > > https://landscape.io/github/apache/incubator-airflow/559/modules/airflow/bin/cli.py#L68 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-1395) sql_alchemy_conn_cmd does not work when executor is non SequentialExecutor
[ https://issues.apache.org/jira/browse/AIRFLOW-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587375#comment-16587375 ] Ash Berlin-Taylor commented on AIRFLOW-1395: There were some issues with {{_cmd}} options not working. This may have been fixed (by correctly checking for cmd options if the non-cmd form isn't in the config) in 1.9.0 > sql_alchemy_conn_cmd does not work when executor is non SequentialExecutor > -- > > Key: AIRFLOW-1395 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1395 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: Airflow 1.8 > Environment: Linux >Reporter: mohamed younuskhan >Assignee: mohamed younuskhan >Priority: Major > > I was trying to use sql_alchemy_conn_cmd to proctect the postgress password. > I am using LocalExecutor as executor. Airflow initialization fails with below > exception. It appears the it load the default config and along with > overridden environment variable. When executor was already overridden and > sql_alchemy_conn was set to defaul sqlite. The error check fails and throws > below exception. > File "/bin/airflow", line 17, in > from airflow import configuration > File "/lib/python2.7/site-packages/airflow/__init__.py", line 29, in > > from airflow import configuration as conf > File "/lib/python2.7/site-packages/airflow/configuration.py", line > 784, in > conf.read(AIRFLOW_CONFIG) > File "/lib/python2.7/site-packages/airflow/configuration.py", line > 636, in read > self._validate() > File "/lib/python2.7/site-packages/airflow/configuration.py", line > 548, in _validate > self.get('core', 'executor'))) > airflow.exceptions.AirflowConfigException: error: cannot use sqlite with the > LocalExecutor -- This message was sent by Atlassian JIRA (v7.6.3#76005)
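For reference, the {{_cmd}} pattern mentioned in the comment looks roughly like the fragment below in airflow.cfg. This is a sketch; the secrets path is illustrative, and the key point is that the connection string is produced by running the command rather than being stored in plain text:

```
[core]
executor = LocalExecutor
# _cmd variant: Airflow runs this shell command and uses its stdout as
# the sql_alchemy_conn value, so the password never lives in airflow.cfg
sql_alchemy_conn_cmd = cat /path/to/secret/sql_alchemy_conn
```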
[jira] [Closed] (AIRFLOW-1328) Daily DAG execute the past day
[ https://issues.apache.org/jira/browse/AIRFLOW-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor closed AIRFLOW-1328. -- Resolution: Not A Bug > Daily DAG execute the past day > -- > > Key: AIRFLOW-1328 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1328 > Project: Apache Airflow > Issue Type: Bug > Components: DagRun >Affects Versions: Airflow 1.8 > Environment: debian jessie >Reporter: Pierre-Antoine Tible >Priority: Major > > Hello, > I'm running Airflow 1.8 under debian jessie. I installed it via pip. > I am using the LocalScheduler with a Mysql. > I made a simple DAG with a BashOperator for a daily task (two times) : > +_default_args = { > 'owner': 'airflow', > 'depends_on_past': False, > 'start_date': datetime.now() - timedelta(days=1, seconds=6), > 'email': ['XXX'], > 'email_on_failure': True, > 'email_on_retry': False, > 'retries': 1, > 'retry_delay': timedelta(minutes=5), > 'execution_timeout': None, > #'catchup': False, > #'backfill': False, > # 'queue': 'bash_queue', > # 'pool': 'backfill', > # 'priority_weight': 10, > # 'end_date': datetime(2016, 1, 1), > } > dag = DAG('campaign-reminder', default_args=default_args, > schedule_interval="0,0 7,15 * * *", concurrency=1, max_active_runs=1) > dag.catchup = False > t1 = BashOperator( > task_id='campaign-reminder', > bash_command=' ', > dag=dag)_+ > I did it today, it works, but the execution date was "06-19T15:00:00", we are > the 20th, so it's one day behind the schedule. > My first thought was that I had made a mistake with the start_date, so I put a fixed datetime() and > it did the same ... > I don't understand why.
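The "Not A Bug" resolution reflects Airflow's interval semantics: a run fires at the *end* of its schedule interval, and its execution_date marks the *start* of that interval. A minimal sketch (dates are illustrative, matching the reporter's example):

```python
from datetime import datetime, timedelta

# For a daily schedule, the run that fires on June 20 covers the interval
# that *began* on June 19, so its execution_date is June 19 -- one interval
# "behind" the wall clock by design, not an error.
schedule_interval = timedelta(days=1)
interval_start = datetime(2017, 6, 19, 15, 0)   # this is the execution_date
runs_at = interval_start + schedule_interval    # the run actually fires June 20
```

So an execution_date of "06-19T15:00:00" observed on the 20th is exactly the expected behaviour.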
[jira] [Resolved] (AIRFLOW-1572) Add Carbonite to to companies list
[ https://issues.apache.org/jira/browse/AIRFLOW-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor resolved AIRFLOW-1572. Resolution: Fixed > Add Carbonite to to companies list > -- > > Key: AIRFLOW-1572 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1572 > Project: Apache Airflow > Issue Type: Bug >Reporter: Adam Boscarino >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-1791) Unexpected "AttributeError: 'unicode' object has no attribute 'val'" from Variable.setdefault
[ https://issues.apache.org/jira/browse/AIRFLOW-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor resolved AIRFLOW-1791. Resolution: Fixed Fix Version/s: 1.9.0 > Unexpected "AttributeError: 'unicode' object has no attribute 'val'" from > Variable.setdefault > - > > Key: AIRFLOW-1791 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1791 > Project: Apache Airflow > Issue Type: Bug > Components: core >Affects Versions: Airflow 1.8 > Environment: Python 2.7, Airflow 1.8.2 >Reporter: Shawn Wang >Priority: Major > Fix For: 1.9.0 > > > In Variable.setdefault method, > {code:python} > obj = Variable.get(key, default_var=default_sentinel, > deserialize_json=False) > if obj is default_sentinel: > # ... > else: > if deserialize_json: > return json.loads(obj.val) > else: > return obj.val > {code} > obj is retrieved by the "get" method, which has already fetched the val > attribute, so this "obj.val" throws the AttributeError.
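The corrected branch simply stops dereferencing `.val` a second time, since Variable.get already returns the plain string value. A simplified sketch of the fix (not the exact Airflow source):

```python
import json

def return_value(obj, deserialize_json=False):
    # "obj" here is already the string value returned by Variable.get,
    # not an ORM row, so "obj.val" raised AttributeError on Python 2
    # unicode strings; return obj directly instead.
    if deserialize_json:
        return json.loads(obj)
    return obj
```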
[jira] [Resolved] (AIRFLOW-1146) izip use in Python 3.4
[ https://issues.apache.org/jira/browse/AIRFLOW-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor resolved AIRFLOW-1146. Resolution: Fixed Fix Version/s: 1.9.0 > izip use in Python 3.4 > -- > > Key: AIRFLOW-1146 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1146 > Project: Apache Airflow > Issue Type: Bug > Components: hive_hooks >Affects Versions: Airflow 1.8 >Reporter: Alexander Panzhin >Priority: Major > Fix For: 1.9.0 > > > Python 3 no longer has itertools.izip, but it is still used in > airflow/hooks/hive_hooks.py > This causes all kinds of havoc. > This needs to be fixed if Airflow is to be used on Python 3+.
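In Python 3 the built-in zip() is already lazy, so the usual fix is a small compatibility shim (a sketch of the common pattern, not necessarily the exact patch that landed in 1.9.0):

```python
try:
    from itertools import izip  # Python 2: lazy pairwise iterator
except ImportError:
    izip = zip  # Python 3: built-in zip is already an iterator
```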
[jira] [Closed] (AIRFLOW-2182) Configured
[ https://issues.apache.org/jira/browse/AIRFLOW-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor closed AIRFLOW-2182. -- Resolution: Invalid > Configured > -- > > Key: AIRFLOW-2182 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2182 > Project: Apache Airflow > Issue Type: New Feature > Components: authentication >Reporter: Richard Ferrer >Assignee: Richard Ferrer >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] ashb commented on issue #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'
ashb commented on issue #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#issuecomment-414654266 Your error seems to be specific to Py 2.7: ``` == 11) ERROR: test_bql_deprecation_warning (tests.contrib.hooks.test_bigquery_hook.TestBigQueryBaseCursor) -- Traceback (most recent call last): File ".tox/py27-backend_postgres/lib/python2.7/site-packages/mock/mock.py", line 1305, in patched return func(*args, **keywargs) File "tests/contrib/hooks/test_bigquery_hook.py", line 227, in test_bql_deprecation_warning w[0].message.args[0]) IndexError: list index out of range ``` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
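An IndexError like this in a warning-capture test is typically caused by Python's once-per-location warning registry: if the DeprecationWarning already fired earlier in the test run, catch_warnings records nothing and `w[0]` fails. A sketch of the usual remedy (this is an assumed cause, not confirmed from the PR): reset the filter inside the context so the warning is always re-emitted.

```python
import warnings

def deprecated_bql():
    # Stand-in for the deprecated code path under test (hypothetical).
    warnings.warn("bql is deprecated, use sql", DeprecationWarning)

with warnings.catch_warnings(record=True) as caught:
    # Without "always", a warning already emitted at this call site earlier
    # in the process is suppressed by the registry and "caught" stays empty,
    # which is exactly when caught[0] raises IndexError.
    warnings.simplefilter("always")
    deprecated_bql()
```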
[jira] [Commented] (AIRFLOW-169) Hide expire dags in UI
[ https://issues.apache.org/jira/browse/AIRFLOW-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587340#comment-16587340 ] jack commented on AIRFLOW-169: -- To work around this, simply "turn off" the DAGs from the UI; then you have a button to hide them. However, I can see how it may be a problem with the CLI command. If you have many DAGs that ran once or had an end date, the list can be large for no good reason. > Hide expire dags in UI > -- > > Key: AIRFLOW-169 > URL: https://issues.apache.org/jira/browse/AIRFLOW-169 > Project: Apache Airflow > Issue Type: Wish > Components: ui >Reporter: Sumit Maheshwari >Priority: Major > > It would be great if we had the option to hide expired schedules from the UI.
[jira] [Commented] (AIRFLOW-1395) sql_alchemy_conn_cmd does not work when executor is non SequentialExecutor
[ https://issues.apache.org/jira/browse/AIRFLOW-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587331#comment-16587331 ] jack commented on AIRFLOW-1395: --- Your error says you are trying to work with SQLite & LocalExecutor - +which you can't.+ For LocalExecutor you need to use PostgreSQL or MySQL. Your title isn't related to the details of the error. > sql_alchemy_conn_cmd does not work when executor is non SequentialExecutor > -- > > Key: AIRFLOW-1395 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1395 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: Airflow 1.8 > Environment: Linux >Reporter: mohamed younuskhan >Assignee: mohamed younuskhan >Priority: Major > > I was trying to use sql_alchemy_conn_cmd to protect the postgres password. > I am using LocalExecutor as executor. Airflow initialization fails with the > exception below. It appears that it loads the default config along with the > overridden environment variables: the executor was already overridden while > sql_alchemy_conn was still set to the default sqlite, so the validation check > fails and throws the exception below. > File "/bin/airflow", line 17, in <module> > from airflow import configuration > File "/lib/python2.7/site-packages/airflow/__init__.py", line 29, in <module> > from airflow import configuration as conf > File "/lib/python2.7/site-packages/airflow/configuration.py", line 784, in <module> > conf.read(AIRFLOW_CONFIG) > File "/lib/python2.7/site-packages/airflow/configuration.py", line 636, in read > self._validate() > File "/lib/python2.7/site-packages/airflow/configuration.py", line 548, in _validate > self.get('core', 'executor'))) > airflow.exceptions.AirflowConfigException: error: cannot use sqlite with the > LocalExecutor
[GitHub] ron819 commented on issue #2638: AIRFLOW-1649 - Adding xcom_push to S3KeySensor wildcard matches.
ron819 commented on issue #2638: AIRFLOW-1649 - Adding xcom_push to S3KeySensor wildcard matches. URL: https://github.com/apache/incubator-airflow/pull/2638#issuecomment-414648350 @andylockran any chance to revisit this? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] xnuinside commented on issue #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'
xnuinside commented on issue #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#issuecomment-414645184 @kaxil and @tswast , pls check This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] xnuinside edited a comment on issue #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'
xnuinside edited a comment on issue #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#issuecomment-414644891 @ashb , hi Ash, sorry for pulling you in, but I know only you can help me with Travis :) I got a strange error and have already re-triggered Travis once (got 2 checks with the same error). Thank you in advance!
[GitHub] Fokko closed pull request #3776: [AIRFLOW-XXX] Fix minor typo
Fokko closed pull request #3776: [AIRFLOW-XXX] Fix minor typo URL: https://github.com/apache/incubator-airflow/pull/3776 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/executors/base_executor.py b/airflow/executors/base_executor.py index 8baed1a250..a989dc4408 100644 --- a/airflow/executors/base_executor.py +++ b/airflow/executors/base_executor.py @@ -193,7 +193,7 @@ def execute_async(self, def end(self): # pragma: no cover """ -This method is called when the caller is done submitting job and is +This method is called when the caller is done submitting job and wants to wait synchronously for the job submitted previously to be all done. """ This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (AIRFLOW-2923) LatestOnlyOperator cascade skip through all_done and dummy
[ https://issues.apache.org/jira/browse/AIRFLOW-2923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmytro Kulyk updated AIRFLOW-2923: -- Description: DAG with consolidating point (calc_ready : dummy) as per [https://airflow.apache.org/concepts.html#latest-run-only] the given task should be run even when catching up DagRuns for previous periods. However, LatestOnlyOperator cascades the skip through calc_ready despite it being a dummy with trigger_rule=all_done. Same behavior when trigger_rule=all_success {code} t_ready = DummyOperator( task_id = 'calc_ready', trigger_rule = TriggerRule.ALL_DONE, dag=dag) {code} !screenshot-1.png! was: DAG with consolidating point (calc_ready : dummy) as per https://airflow.apache.org/concepts.html#latest-run-only given task should be ran even catching up an execution DagRuns for a previous periods However, LatestOnlyOperator cascading through `calc_ready` despite of it is a `dummy` and `trigger_rule`=`all_done` Same behavior when `trigger_rule`=`all_success` {code:python} t_ready = DummyOperator( task_id = 'calc_ready', trigger_rule = TriggerRule.ALL_DONE, dag=dag) {code} !screenshot-1.png! 
> LatestOnlyOperator cascade skip through all_done and dummy > -- > > Key: AIRFLOW-2923 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2923 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: 1.9.0 > Environment: CeleryExecutor, 2-nodes cluster, RMQ, PostgreSQL >Reporter: Dmytro Kulyk >Priority: Major > Labels: cascade, latestonly, skip > Attachments: cube_update.py, screenshot-1.png > > > DAG with consolidating point (calc_ready : dummy) > as per [https://airflow.apache.org/concepts.html#latest-run-only] the given task > should be run even when catching up DagRuns for previous periods. > However, LatestOnlyOperator cascades the skip through calc_ready despite it being a > dummy with trigger_rule=all_done > Same behavior when trigger_rule=all_success > {code} > t_ready = DummyOperator( > task_id = 'calc_ready', > trigger_rule = TriggerRule.ALL_DONE, > dag=dag) > {code} > !screenshot-1.png!
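For context, the gate LatestOnlyOperator applies is purely time-based: only the run whose schedule interval contains "now" proceeds, and every catch-up run skips its downstream tasks, which is why a dummy with trigger_rule=all_done does not stop the cascade in 1.9.0. A rough sketch of that window check (hypothetical helper, not the operator's actual code):

```python
from datetime import datetime, timedelta

def is_latest_run(execution_date, schedule_interval, now):
    # Only the DagRun whose interval [execution_date, execution_date +
    # schedule_interval) contains "now" counts as latest; earlier
    # catch-up runs fail this check and skip their downstream tasks.
    return execution_date <= now < execution_date + schedule_interval

now = datetime(2018, 8, 21, 12, 0)
latest = is_latest_run(datetime(2018, 8, 21, 7, 0), timedelta(hours=8), now)
stale = is_latest_run(datetime(2018, 8, 20, 7, 0), timedelta(hours=8), now)
```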
[jira] [Updated] (AIRFLOW-2923) LatestOnlyOperator cascade skip through all_done and dummy
[ https://issues.apache.org/jira/browse/AIRFLOW-2923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmytro Kulyk updated AIRFLOW-2923: -- Environment: CeleryExecutor, 2-nodes cluster, RMQ, PostgreSQL > LatestOnlyOperator cascade skip through all_done and dummy > -- > > Key: AIRFLOW-2923 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2923 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: 1.9.0 > Environment: CeleryExecutor, 2-nodes cluster, RMQ, PostgreSQL >Reporter: Dmytro Kulyk >Priority: Major > Labels: cascade, latestonly, skip > Attachments: cube_update.py, screenshot-1.png > > > DAG with consolidating point (calc_ready : dummy) > as per https://airflow.apache.org/concepts.html#latest-run-only given task > should be ran even catching up an execution DagRuns for a previous periods > However, LatestOnlyOperator cascading through `calc_ready` despite of it is a > `dummy` and `trigger_rule`=`all_done` > Same behavior when `trigger_rule`=`all_success` > {code:python} > t_ready = DummyOperator( > task_id = 'calc_ready', > trigger_rule = TriggerRule.ALL_DONE, > dag=dag) > {code} > !screenshot-1.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-2923) LatestOnlyOperator cascade skip through all_done and dummy
[ https://issues.apache.org/jira/browse/AIRFLOW-2923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmytro Kulyk updated AIRFLOW-2923: -- Labels: cascade latestonly skip (was: ) > LatestOnlyOperator cascade skip through all_done and dummy > -- > > Key: AIRFLOW-2923 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2923 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: 1.9.0 >Reporter: Dmytro Kulyk >Priority: Major > Labels: cascade, latestonly, skip > Attachments: cube_update.py, screenshot-1.png > > > DAG with consolidating point (calc_ready : dummy) > as per https://airflow.apache.org/concepts.html#latest-run-only given task > should be ran even catching up an execution DagRuns for a previous periods > However, LatestOnlyOperator cascading through `calc_ready` despite of it is a > `dummy` and `trigger_rule`=`all_done` > Same behavior when `trigger_rule`=`all_success` > {code:python} > t_ready = DummyOperator( > task_id = 'calc_ready', > trigger_rule = TriggerRule.ALL_DONE, > dag=dag) > {code} > !screenshot-1.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] xnuinside commented on issue #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'src_fmt_configs'
xnuinside commented on issue #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'src_fmt_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#issuecomment-414561148 @kaxil, no problem) I will make changes today This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services