[GitHub] tedmiston commented on a change in pull request #3703: [AIRFLOW-2857] Fix broken RTD env
tedmiston commented on a change in pull request #3703: [AIRFLOW-2857] Fix broken RTD env
URL: https://github.com/apache/incubator-airflow/pull/3703#discussion_r208734436

## File path: setup.py

```diff
@@ -161,6 +164,7 @@ def write_version(filename=os.path.join(*['airflow',
 databricks = ['requests>=2.5.1, <3']
 datadog = ['datadog>=0.14.0']
 doc = [
+    'mock',
```

Review comment:
@Fokko What Kaxil found is consistent with what I saw as well. There's a bit more info on it in (1) in my comment here: https://github.com/apache/incubator-airflow/pull/3703#issuecomment-410836135

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
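The `mock` addition matters because docs builds (such as the Read the Docs environment this PR fixes) import modules whose optional dependencies are not installed; a common Sphinx `conf.py` pattern that this dependency enables is stubbing those imports before autodoc runs. A minimal sketch of that pattern, with placeholder module names (not taken from the PR):

```python
import sys
from unittest import mock  # 2018-era Python 2 builds used the standalone `mock` package

# Assumed heavy optional dependencies to stub out during the docs build;
# these names are illustrative, not Airflow's actual list.
MOCK_MODULES = ['pandas', 'psycopg2']
for mod_name in MOCK_MODULES:
    sys.modules[mod_name] = mock.MagicMock()

# Any later `import pandas` now resolves to the stub instead of failing.
import pandas
assert isinstance(pandas, mock.MagicMock)
```

Because `import` consults `sys.modules` first, the stubs short-circuit the real imports, so autodoc can introspect modules whose dependencies are absent on the docs host.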
[GitHub] codecov-io edited a comment on issue #3648: [AIRFLOW-2786] Fix editing Variable with empty key crashing
codecov-io edited a comment on issue #3648: [AIRFLOW-2786] Fix editing Variable with empty key crashing
URL: https://github.com/apache/incubator-airflow/pull/3648#issuecomment-408263467

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3648?src=pr=h1) Report
> Merging [#3648](https://codecov.io/gh/apache/incubator-airflow/pull/3648?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/8b04e20709ebeb41aeefc0c5e3f12d35108ea504?src=pr=desc) will **increase** coverage by `<.01%`.
> The diff coverage is `86.66%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3648/graphs/tree.svg?token=WdLKlKHOAU=150=pr=650)](https://codecov.io/gh/apache/incubator-airflow/pull/3648?src=pr=tree)

```diff
@@            Coverage Diff             @@
##           master    #3648      +/-   ##
==========================================
+ Coverage   77.64%   77.64%    +<.01%
==========================================
  Files         204      204
  Lines       15801    15825      +24
==========================================
+ Hits        12268    12287      +19
- Misses       3533     3538       +5
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3648?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/www\_rbac/utils.py](https://codecov.io/gh/apache/incubator-airflow/pull/3648/diff?src=pr=tree#diff-YWlyZmxvdy93d3dfcmJhYy91dGlscy5weQ==) | `67.1% <100%> (+0.88%)` | :arrow_up: |
| [airflow/www/utils.py](https://codecov.io/gh/apache/incubator-airflow/pull/3648/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvdXRpbHMucHk=) | `88.81% <100%> (+0.28%)` | :arrow_up: |
| [airflow/www/views.py](https://codecov.io/gh/apache/incubator-airflow/pull/3648/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvdmlld3MucHk=) | `69.04% <100%> (+0.15%)` | :arrow_up: |
| [airflow/www\_rbac/views.py](https://codecov.io/gh/apache/incubator-airflow/pull/3648/diff?src=pr=tree#diff-YWlyZmxvdy93d3dfcmJhYy92aWV3cy5weQ==) | `72.72% <60%> (-0.14%)` | :arrow_down: |
| [airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3648/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=) | `88.78% <0%> (-0.05%)` | :arrow_down: |

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3648?src=pr=continue).

> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3648?src=pr=footer). Last update [8b04e20...e8669f1](https://codecov.io/gh/apache/incubator-airflow/pull/3648?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] codecov-io commented on issue #3722: [AIRFLOW-2759] Add changes to extract proxy details at the base hook …
codecov-io commented on issue #3722: [AIRFLOW-2759] Add changes to extract proxy details at the base hook …
URL: https://github.com/apache/incubator-airflow/pull/3722#issuecomment-411556404

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3722?src=pr=h1) Report
> Merging [#3722](https://codecov.io/gh/apache/incubator-airflow/pull/3722?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/8b04e20709ebeb41aeefc0c5e3f12d35108ea504?src=pr=desc) will **decrease** coverage by `0.14%`.
> The diff coverage is `46.15%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3722/graphs/tree.svg?token=WdLKlKHOAU=pr=150=650)](https://codecov.io/gh/apache/incubator-airflow/pull/3722?src=pr=tree)

```diff
@@            Coverage Diff             @@
##           master    #3722      +/-   ##
==========================================
- Coverage   77.64%   77.49%    -0.15%
==========================================
  Files         204      204
  Lines       15801    15826      +25
==========================================
- Hits        12268    12264       -4
- Misses       3533     3562      +29
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3722?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/hooks/base\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/3722/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9iYXNlX2hvb2sucHk=) | `76.56% <15.38%> (-15.6%)` | :arrow_down: |
| [airflow/utils/helpers.py](https://codecov.io/gh/apache/incubator-airflow/pull/3722/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9oZWxwZXJzLnB5) | `71.59% <76.92%> (+0.24%)` | :arrow_up: |
| [airflow/utils/sqlalchemy.py](https://codecov.io/gh/apache/incubator-airflow/pull/3722/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9zcWxhbGNoZW15LnB5) | `71.42% <0%> (-10%)` | :arrow_down: |
| [airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/incubator-airflow/pull/3722/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==) | `88.6% <0%> (-0.85%)` | :arrow_down: |
| [airflow/jobs.py](https://codecov.io/gh/apache/incubator-airflow/pull/3722/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzLnB5) | `82.49% <0%> (-0.27%)` | :arrow_down: |
| [airflow/www/views.py](https://codecov.io/gh/apache/incubator-airflow/pull/3722/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvdmlld3MucHk=) | `68.76% <0%> (-0.13%)` | :arrow_down: |
| [airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3722/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=) | `88.78% <0%> (-0.05%)` | :arrow_down: |

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3722?src=pr=continue).

> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3722?src=pr=footer). Last update [8b04e20...5aaf8ea](https://codecov.io/gh/apache/incubator-airflow/pull/3722?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] codecov-io commented on issue #3723: [AIRFLOW-2876] Update Tenacity to 4.12
codecov-io commented on issue #3723: [AIRFLOW-2876] Update Tenacity to 4.12
URL: https://github.com/apache/incubator-airflow/pull/3723#issuecomment-411565604

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3723?src=pr=h1) Report
> Merging [#3723](https://codecov.io/gh/apache/incubator-airflow/pull/3723?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/8b04e20709ebeb41aeefc0c5e3f12d35108ea504?src=pr=desc) will **not change** coverage.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3723/graphs/tree.svg?token=WdLKlKHOAU=pr=650=150)](https://codecov.io/gh/apache/incubator-airflow/pull/3723?src=pr=tree)

```diff
@@           Coverage Diff            @@
##           master    #3723   +/-   ##
=======================================
  Coverage   77.64%   77.64%
=======================================
  Files         204      204
  Lines       15801    15801
=======================================
  Hits        12268    12268
  Misses       3533     3533
```

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3723?src=pr=continue).

> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3723?src=pr=footer). Last update [8b04e20...271ea66](https://codecov.io/gh/apache/incubator-airflow/pull/3723?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] r39132 commented on issue #3723: [AIRFLOW-2876] Update Tenacity to 4.12
r39132 commented on issue #3723: [AIRFLOW-2876] Update Tenacity to 4.12
URL: https://github.com/apache/incubator-airflow/pull/3723#issuecomment-411582062

@Fokko I ran setup.py install on a branch based on this PR and I am getting exceptions when starting the scheduler:

```
sianand@LM-SJN-21002367:~/Projects/airflow_incubator $ pip freeze | grep tenacity
tenacity==4.12.0

[2018-08-08 16:06:35,130] {models.py:368} ERROR - Failed to import: /usr/local/lib/python3.6/site-packages/apache_airflow-2.0.0.dev0+incubating-py3.6.egg/airflow/example_dags/example_http_operator.py
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/apache_airflow-2.0.0.dev0+incubating-py3.6.egg/airflow/models.py", line 365, in process_file
    m = imp.load_source(mod_name, filepath)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/imp.py", line 172, in load_source
    module = _load(spec)
  File "", line 684, in _load
  File "", line 665, in _load_unlocked
  File "", line 678, in exec_module
  File "", line 219, in _call_with_frames_removed
  File "/usr/local/lib/python3.6/site-packages/apache_airflow-2.0.0.dev0+incubating-py3.6.egg/airflow/example_dags/example_http_operator.py", line 27, in
    from airflow.operators.http_operator import SimpleHttpOperator
  File "/usr/local/lib/python3.6/site-packages/apache_airflow-2.0.0.dev0+incubating-py3.6.egg/airflow/operators/http_operator.py", line 21, in
    from airflow.hooks.http_hook import HttpHook
  File "/usr/local/lib/python3.6/site-packages/apache_airflow-2.0.0.dev0+incubating-py3.6.egg/airflow/hooks/http_hook.py", line 23, in
    import tenacity
  File "/usr/local/lib/python3.6/site-packages/tenacity-4.12.0-py3.6.egg/tenacity/__init__.py", line 21, in
    import asyncio
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/asyncio/__init__.py", line 21, in
    from .base_events import *
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/asyncio/base_events.py", line 17, in
    import concurrent.futures
  File "/usr/local/lib/python3.6/site-packages/concurrent/futures/__init__.py", line 8, in
    from concurrent.futures._base import (FIRST_COMPLETED,
  File "/usr/local/lib/python3.6/site-packages/concurrent/futures/_base.py", line 381
    raise exception_type, self._exception, self._traceback
                        ^
SyntaxError: invalid syntax
```
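The `SyntaxError` at the bottom of that traceback is the giveaway: `raise exception_type, self._exception, self._traceback` is Python-2-only raise syntax, and the frame's path (`/site-packages/concurrent/futures/_base.py`) shows the Python 2 `futures` backport package shadowing the 3.6 standard library's `concurrent.futures`. A quick way to confirm that this three-expression `raise` form cannot even parse under Python 3:

```python
# The exact statement from the traceback: valid Python 2, a SyntaxError on Python 3.
py2_raise = "raise exception_type, self._exception, self._traceback"
try:
    compile(py2_raise, "<check>", "exec")
    parses = True
except SyntaxError:
    parses = False
print(parses)  # False on any Python 3 interpreter
```

So the failure here is an environment issue (a stale `futures` install), independent of the Tenacity bump itself.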
[GitHub] Noremac201 commented on issue #3648: [AIRFLOW-2786] Fix editing Variable with empty key crashing
Noremac201 commented on issue #3648: [AIRFLOW-2786] Fix editing Variable with empty key crashing
URL: https://github.com/apache/incubator-airflow/pull/3648#issuecomment-411556954

The tests are passing now; however, instead of checking the POST response webpage, they check the variables in the database.
[jira] [Commented] (AIRFLOW-2876) Bump version of Tenacity
[ https://issues.apache.org/jira/browse/AIRFLOW-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573904#comment-16573904 ]

ASF GitHub Bot commented on AIRFLOW-2876:

Fokko opened a new pull request #3723: [AIRFLOW-2876] Update Tenacity to 4.12
URL: https://github.com/apache/incubator-airflow/pull/3723

Tenacity 4.8 is not Python 3.7 compatible because it contains reserved keywords in the code:

```
[2018-08-08 21:21:22,016] {models.py:366} ERROR - Failed to import: /usr/local/lib/python3.7/site-packages/airflow/example_dags/example_http_operator.py
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models.py", line 363, in process_file
    m = imp.load_source(mod_name, filepath)
  File "/usr/local/lib/python3.7/imp.py", line 172, in load_source
    module = _load(spec)
  File "", line 696, in _load
  File "", line 677, in _load_unlocked
  File "", line 728, in exec_module
  File "", line 219, in _call_with_frames_removed
  File "/usr/local/lib/python3.7/site-packages/airflow/example_dags/example_http_operator.py", line 27, in
    from airflow.operators.http_operator import SimpleHttpOperator
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/http_operator.py", line 21, in
    from airflow.hooks.http_hook import HttpHook
  File "/usr/local/lib/python3.7/site-packages/airflow/hooks/http_hook.py", line 23, in
    import tenacity
  File "/usr/local/lib/python3.7/site-packages/tenacity/__init__.py", line 352
    from tenacity.async import AsyncRetrying
```

Make sure you have checked _all_ steps below.

### Jira
- [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-2876
  - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-2876\], code changes always need a Jira issue.

### Description
- [x] Here are some details about my PR, including screenshots of any UI changes:

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:

### Commits
- [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation
- [x] In case of new functionality, my PR adds documentation that describes how to use it.
  - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

### Code Quality
- [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`

> Bump version of Tenacity
>
> Key: AIRFLOW-2876
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2876
> Project: Apache Airflow
> Issue Type: Bug
> Reporter: Fokko Driesprong
> Priority: Major
>
> Since 4.8.0 is not Python 3.7 compatible, we want to bump the version to 4.12.0

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
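The failing line is `from tenacity.async import AsyncRetrying`: `async` was a soft keyword in Python 3.5/3.6 but became a fully reserved keyword in 3.7, so a module literally named `async` can no longer appear in an import statement. A quick check (on a 3.7+ interpreter):

```python
import keyword

# `async` is a reserved keyword on Python 3.7+, so it cannot be used as a
# module name in an import statement.
print(keyword.iskeyword("async"))  # True on 3.7+

try:
    compile("from tenacity.async import AsyncRetrying", "<check>", "exec")
    importable = True
except SyntaxError:
    importable = False
print(importable)  # False on 3.7+
```

This is why the fix is a version bump rather than a code change in Airflow: newer Tenacity releases renamed the `async` module.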
[jira] [Commented] (AIRFLOW-2759) Simplify proxy server based access to external platforms like Google cloud
[ https://issues.apache.org/jira/browse/AIRFLOW-2759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573840#comment-16573840 ]

ASF GitHub Bot commented on AIRFLOW-2759:

amohan34 opened a new pull request #3722: [AIRFLOW-2759] Add changes to extract proxy details at the base hook …
URL: https://github.com/apache/incubator-airflow/pull/3722

…level

Make sure you have checked _all_ steps below.

### Jira
- [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-2759\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-2759
  - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-2759\], code changes always need a Jira issue.

### Description
- [x] Here are some details about my PR, including screenshots of any UI changes:

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:

  To test, run the following commands within tests/contrib/hooks:

  ```
  nosetests test_gcp_api_base_hook_proxy.py --with-coverage --cover-package=airflow.contrib.hooks.gcp_api_base_hook
  ```

### Commits
- [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation
- [x] In case of new functionality, my PR adds documentation that describes how to use it.
  - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

### Code Quality
- [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`

> Simplify proxy server based access to external platforms like Google cloud
>
> Key: AIRFLOW-2759
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2759
> Project: Apache Airflow
> Issue Type: Improvement
> Components: hooks
> Reporter: Aishwarya Mohan
> Assignee: Aishwarya Mohan
> Priority: Major
> Labels: hooks, proxy
>
> Several companies adopt a proxy-server-based approach in order to provide an additional layer of security while communicating with external platforms, establishing the legitimacy of caller and callee. A potential use case would be writing logs from Airflow to a cloud storage platform like Google Cloud via an intermediary proxy server.
>
> In the current scenario the proxy details need to be hardcoded and passed to the HTTP client library (httplib2) in the GoogleCloudBaseHook class (gcp_api_base_hook.py). It would be convenient if the proxy details (for example, host and port) could be extracted from the Airflow configuration file as opposed to hardcoding the details at hook level.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
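The PR's goal is to read proxy details from the Airflow configuration file instead of hardcoding them at hook level. A hypothetical sketch of that lookup using Python's stdlib `configparser` (the `[proxy]` section and option names below are assumptions for illustration, not the PR's actual keys):

```python
import configparser

# Assumed airflow.cfg-style section; the section and option names are
# illustrative, not taken from the PR.
cfg = configparser.ConfigParser()
cfg.read_string("""
[proxy]
proxy_host = proxy.internal.example.com
proxy_port = 3128
""")

proxy_host = cfg.get("proxy", "proxy_host")
proxy_port = cfg.getint("proxy", "proxy_port")

# An httplib2-style client takes the proxy as explicit host/port values,
# which the hook could now build from config instead of from literals.
print(proxy_host, proxy_port)
```

Centralizing the lookup this way means every hook built on the base hook picks up the proxy settings without per-hook changes.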
[GitHub] amohan34 opened a new pull request #3722: [AIRFLOW-2759] Add changes to extract proxy details at the base hook …
amohan34 opened a new pull request #3722: [AIRFLOW-2759] Add changes to extract proxy details at the base hook … URL: https://github.com/apache/incubator-airflow/pull/3722 …level Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-2759\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-2759 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-2759\], code changes always need a Jira issue. ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: To test, run the following commands within tests/contrib/hooks nosetests test_gcp_api_base_hook_proxy.py --with-coverage --cover-package=airflow.contrib.hooks.gcp_api_base_hook ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. 
To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] r39132 edited a comment on issue #3723: [AIRFLOW-2876] Update Tenacity to 4.12
r39132 edited a comment on issue #3723: [AIRFLOW-2876] Update Tenacity to 4.12 URL: https://github.com/apache/incubator-airflow/pull/3723#issuecomment-411582062 @Fokko I ran setup.py install on a branch based on this PR and I am getting exceptions when starting the scheduler. Here's the version of tenacity that was installed:
```
sianand@LM-SJN-21002367:~/Projects/airflow_incubator $ pip freeze | grep tenacity
tenacity==4.12.0
```
When I run the scheduler, I see:
```
[2018-08-08 16:06:35,130] {models.py:368} ERROR - Failed to import: /usr/local/lib/python3.6/site-packages/apache_airflow-2.0.0.dev0+incubating-py3.6.egg/airflow/example_dags/example_http_operator.py
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/apache_airflow-2.0.0.dev0+incubating-py3.6.egg/airflow/models.py", line 365, in process_file
    m = imp.load_source(mod_name, filepath)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/imp.py", line 172, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 684, in _load
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/lib/python3.6/site-packages/apache_airflow-2.0.0.dev0+incubating-py3.6.egg/airflow/example_dags/example_http_operator.py", line 27, in <module>
    from airflow.operators.http_operator import SimpleHttpOperator
  File "/usr/local/lib/python3.6/site-packages/apache_airflow-2.0.0.dev0+incubating-py3.6.egg/airflow/operators/http_operator.py", line 21, in <module>
    from airflow.hooks.http_hook import HttpHook
  File "/usr/local/lib/python3.6/site-packages/apache_airflow-2.0.0.dev0+incubating-py3.6.egg/airflow/hooks/http_hook.py", line 23, in <module>
    import tenacity
  File "/usr/local/lib/python3.6/site-packages/tenacity-4.12.0-py3.6.egg/tenacity/__init__.py", line 21, in <module>
    import asyncio
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/asyncio/__init__.py", line 21, in <module>
    from .base_events import *
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/asyncio/base_events.py", line 17, in <module>
    import concurrent.futures
  File "/usr/local/lib/python3.6/site-packages/concurrent/futures/__init__.py", line 8, in <module>
    from concurrent.futures._base import (FIRST_COMPLETED,
  File "/usr/local/lib/python3.6/site-packages/concurrent/futures/_base.py", line 381
    raise exception_type, self._exception, self._traceback
SyntaxError: invalid syntax
```
This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
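For context, the `raise exception_type, self._exception, self._traceback` line in that traceback is Python 2-only syntax: it lives in the `futures` backport package under site-packages, which can shadow the standard library's `concurrent.futures` on Python 3 when both are installed. A quick diagnostic (this is an interpretation of the traceback, not something confirmed in the thread):

```python
import concurrent.futures

# On a healthy Python 3 install, concurrent.futures resolves to the
# interpreter's own lib/python3.x/ directory. If __file__ points into
# site-packages/, the Python 2 'futures' backport is shadowing the stdlib
# module, which would explain the SyntaxError above.
origin = concurrent.futures.__file__
backport_suspected = 'site-packages' in origin
print(origin, backport_suspected)
```

If the backport is the culprit, `pip uninstall futures` in the Python 3 environment would be the usual remedy.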
[jira] [Created] (AIRFLOW-2876) Bump version of Tenacity
Fokko Driesprong created AIRFLOW-2876: - Summary: Bump version of Tenacity Key: AIRFLOW-2876 URL: https://issues.apache.org/jira/browse/AIRFLOW-2876 Project: Apache Airflow Issue Type: Bug Reporter: Fokko Driesprong Since 4.8.0 is not Python 3.7 compatible, we want to bump the version to 4.12.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] codecov-io edited a comment on issue #3658: [AIRFLOW-2524] Add Amazon SageMaker Training
codecov-io edited a comment on issue #3658: [AIRFLOW-2524] Add Amazon SageMaker Training URL: https://github.com/apache/incubator-airflow/pull/3658#issuecomment-408564225 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3658?src=pr=h1) Report > Merging [#3658](https://codecov.io/gh/apache/incubator-airflow/pull/3658?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/096ba9ecd961cdaebd062599f408571ffb21165a?src=pr=desc) will **increase** coverage by `0.52%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3658/graphs/tree.svg?height=150=WdLKlKHOAU=650=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3658?src=pr=tree)
```diff
@@            Coverage Diff             @@
##           master    #3658      +/-   ##
==========================================
+ Coverage   77.11%   77.63%   +0.52%
==========================================
  Files         206      204       -2
  Lines       15772    15801      +29
==========================================
+ Hits        12162    12267     +105
+ Misses       3610     3534      -76
```
| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3658?src=pr=tree) | Coverage Δ | | |---|---|---| | [airflow/api/common/experimental/mark\_tasks.py](https://codecov.io/gh/apache/incubator-airflow/pull/3658/diff?src=pr=tree#diff-YWlyZmxvdy9hcGkvY29tbW9uL2V4cGVyaW1lbnRhbC9tYXJrX3Rhc2tzLnB5) | `66.92% <0%> (-1.08%)` | :arrow_down: | | [airflow/hooks/druid\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/3658/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9kcnVpZF9ob29rLnB5) | `87.67% <0%> (-1.07%)` | :arrow_down: | | [airflow/www/app.py](https://codecov.io/gh/apache/incubator-airflow/pull/3658/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvYXBwLnB5) | `99.01% <0%> (-0.99%)` | :arrow_down: | | [airflow/www/validators.py](https://codecov.io/gh/apache/incubator-airflow/pull/3658/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvdmFsaWRhdG9ycy5weQ==) | `100% <0%> (ø)` | :arrow_up: | | [airflow/operators/hive\_stats\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/3658/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvaGl2ZV9zdGF0c19vcGVyYXRvci5weQ==) | 
`0% <0%> (ø)` | :arrow_up: | | [airflow/sensors/hdfs\_sensor.py](https://codecov.io/gh/apache/incubator-airflow/pull/3658/diff?src=pr=tree#diff-YWlyZmxvdy9zZW5zb3JzL2hkZnNfc2Vuc29yLnB5) | `100% <0%> (ø)` | :arrow_up: | | [airflow/www/views.py](https://codecov.io/gh/apache/incubator-airflow/pull/3658/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvdmlld3MucHk=) | `68.88% <0%> (ø)` | :arrow_up: | | [airflow/hooks/presto\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/3658/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9wcmVzdG9faG9vay5weQ==) | `39.13% <0%> (ø)` | :arrow_up: | | [airflow/\_\_init\_\_.py](https://codecov.io/gh/apache/incubator-airflow/pull/3658/diff?src=pr=tree#diff-YWlyZmxvdy9fX2luaXRfXy5weQ==) | `80.43% <0%> (ø)` | :arrow_up: | | [airflow/bin/cli.py](https://codecov.io/gh/apache/incubator-airflow/pull/3658/diff?src=pr=tree#diff-YWlyZmxvdy9iaW4vY2xpLnB5) | `64.35% <0%> (ø)` | :arrow_up: | | ... and [15 more](https://codecov.io/gh/apache/incubator-airflow/pull/3658/diff?src=pr=tree-more) | | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3658?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3658?src=pr=footer). Last update [096ba9e...2ef4f6f](https://codecov.io/gh/apache/incubator-airflow/pull/3658?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (AIRFLOW-2872) Implement "Ad Hoc Query" in /www_rbac, and refine existing QueryView()
[ https://issues.apache.org/jira/browse/AIRFLOW-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaodong DENG updated AIRFLOW-2872: --- Description: To implement "Ad Hoc Query" for RBAC in /www_rbac, based on the existing implementation in /www. In addition, refine the existing QueryView(): # The ".csv" button in the *Ad Hoc Query* view responds with a plain text file, rather than a CSV file (even though users can manually change the extension). # Argument 'has_data' passed to the template is not used by the template 'airflow/query.html'. # Errors like 'UnboundLocalError: local variable 'df' referenced before assignment' sometimes occur. # 'result = df.to_html()' should only be invoked when the user does NOT choose '.csv'. Otherwise it is a waste of resources to invoke 'df.to_html()', since the result it returns will not be used if the user asks for a CSV download instead of an HTML page. was: # The ".csv" button in the *Ad Hoc Query* view responds with a plain text file, rather than a CSV file (even though users can manually change the extension). # Argument 'has_data' passed to the template is not used by the template 'airflow/query.html'. # Errors like 'UnboundLocalError: local variable 'df' referenced before assignment' sometimes occur. # 'result = df.to_html()' should only be invoked when the user does NOT choose '.csv'. Otherwise it is a waste of resources to invoke 'df.to_html()', since the result it returns will not be used if the user asks for a CSV download instead of an HTML page. 
Summary: Implement "Ad Hoc Query" in /www_rbac, and refine existing QueryView() (was: Minor bugs in "Ad Hoc Query" view, and refinement) > Implement "Ad Hoc Query" in /www_rbac, and refine existing QueryView() > -- > > Key: AIRFLOW-2872 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2872 > Project: Apache Airflow > Issue Type: Improvement > Components: ui >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Critical > > To implement "Ad Hoc Query" for RBAC in /www_rbac, based on the existing > implementation in /www. > In addition, refine the existing QueryView(): > # The ".csv" button in the *Ad Hoc Query* view responds with a plain text > file, rather than a CSV file (even though users can manually change the > extension). > # Argument 'has_data' passed to the template is not used by the template > 'airflow/query.html'. > # Errors like 'UnboundLocalError: local variable 'df' referenced > before assignment' sometimes occur. > # 'result = df.to_html()' should only be invoked when the user does NOT choose > '.csv'. Otherwise it is a waste of resources to invoke 'df.to_html()', since the > result it returns will not be used if the user asks for a CSV download instead > of an HTML page. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
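The last point in the issue amounts to rendering lazily: build either the CSV payload or the HTML table, never both. A minimal sketch of that branching (function and variable names here are illustrative, not Airflow's actual view code):

```python
import csv
import io


def render_query_result(header, rows, want_csv):
    """Return (body, mimetype); the HTML table is only built when CSV
    was not requested, mirroring the point about df.to_html()."""
    if want_csv:
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(header)
        writer.writerows(rows)
        # Serve with mimetype='text/csv' so the download really is a CSV
        # file, not plain text the user has to rename.
        return buf.getvalue(), 'text/csv'
    # The to_html()-style work happens only on this branch.
    body = ''.join(
        '<tr>{}</tr>'.format(''.join('<td>{}</td>'.format(c) for c in row))
        for row in rows)
    return '<table>{}</table>'.format(body), 'text/html'


body, mime = render_query_result(
    ['dag_id', 'state'], [['example_dag', 'success']], want_csv=True)
```

The same structure also removes the dead `has_data` argument problem: each branch computes exactly what its response needs.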
[jira] [Resolved] (AIRFLOW-2861) Need index on log table
[ https://issues.apache.org/jira/browse/AIRFLOW-2861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vardan Gupta resolved AIRFLOW-2861. --- Resolution: Fixed Fix Version/s: 2.0.0 Change has been merged to master. > Need index on log table > --- > > Key: AIRFLOW-2861 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2861 > Project: Apache Airflow > Issue Type: Improvement > Components: database >Affects Versions: 1.10.0 >Reporter: Vardan Gupta >Assignee: Vardan Gupta >Priority: Major > Fix For: 2.0.0 > > > Delete-DAG functionality was added in v1-10-stable. During the metadata cleanup > [part|https://github.com/apache/incubator-airflow/blob/dc78b9196723ca6724185231ccd6f5bbe8edcaf3/airflow/api/common/experimental/delete_dag.py#L48], > its implementation looks for classes that have an attribute named dag_id, formulates > a query on each matching model, and then deletes from the metadata. We have observed > slowness, especially in the log table, because it has no single- or multiple-column > index. Creating an index would boost query performance, though insertion will be a > bit slower. Since deletion is a synchronous call, it would be a good idea to create > the index. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
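The trade-off described above (faster `dag_id` lookups, slightly slower inserts) can be demonstrated in miniature with SQLite; the schema is simplified and the index name is an assumption, not the one used by the actual migration:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute(
    "CREATE TABLE log (id INTEGER PRIMARY KEY, dag_id TEXT, event TEXT)")

# Without an index, DELETE ... WHERE dag_id = ? must scan the whole table;
# with one, the planner can seek directly to the matching rows.
conn.execute("CREATE INDEX idx_log_dag_id ON log (dag_id)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN DELETE FROM log WHERE dag_id = 'example_dag'"
).fetchall()
print(plan)
```

On large log tables this is the difference between a table scan and an index seek, which is why the sync delete-dag call benefits.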
[GitHub] lxneng closed pull request #3554: [AIRFLOW-2686] Fix Default Variables not base on default_timezone
lxneng closed pull request #3554: [AIRFLOW-2686] Fix Default Variables not base on default_timezone URL: https://github.com/apache/incubator-airflow/pull/3554 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/models.py b/airflow/models.py index 260c0ba5a2..2fe229a685 100755 --- a/airflow/models.py +++ b/airflow/models.py @@ -1809,14 +1809,15 @@ def get_template_context(self, session=None): tables = None if 'tables' in task.params: tables = task.params['tables'] +# convert to default timezone +execution_date_tz = settings.TIMEZONE.convert(self.execution_date) +ds = execution_date_tz.strftime('%Y-%m-%d') +ts = execution_date_tz.isoformat() +yesterday_ds = (execution_date_tz - timedelta(1)).strftime('%Y-%m-%d') +tomorrow_ds = (execution_date_tz + timedelta(1)).strftime('%Y-%m-%d') -ds = self.execution_date.strftime('%Y-%m-%d') -ts = self.execution_date.isoformat() -yesterday_ds = (self.execution_date - timedelta(1)).strftime('%Y-%m-%d') -tomorrow_ds = (self.execution_date + timedelta(1)).strftime('%Y-%m-%d') - -prev_execution_date = task.dag.previous_schedule(self.execution_date) -next_execution_date = task.dag.following_schedule(self.execution_date) +prev_execution_date = task.dag.previous_schedule(execution_date_tz) +next_execution_date = task.dag.following_schedule(execution_date_tz) next_ds = None if next_execution_date: @@ -1903,7 +1904,7 @@ def __repr__(self): 'end_date': ds, 'dag_run': dag_run, 'run_id': run_id, -'execution_date': self.execution_date, +'execution_date': execution_date_tz, 'prev_execution_date': prev_execution_date, 'next_execution_date': next_execution_date, 'latest_date': ds, This is an automated message from the Apache Git Service. 
To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
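The diff above converts `execution_date` into the configured default timezone before deriving `ds`, `ts`, `yesterday_ds`, and `tomorrow_ds`. A standard-library sketch of the same idea; a fixed UTC+8 offset stands in for `settings.TIMEZONE` (which is an assumption for illustration, since Airflow resolves it from configuration):

```python
from datetime import datetime, timedelta, timezone

# Stand-in for settings.TIMEZONE; Airflow would read this from airflow.cfg.
DEFAULT_TZ = timezone(timedelta(hours=8))


def template_dates(execution_date):
    """Derive the ds/ts/yesterday_ds/tomorrow_ds template variables in the
    default timezone, mirroring the convert() call in the diff."""
    local = execution_date.astimezone(DEFAULT_TZ)
    return {
        'ds': local.strftime('%Y-%m-%d'),
        'ts': local.isoformat(),
        'yesterday_ds': (local - timedelta(days=1)).strftime('%Y-%m-%d'),
        'tomorrow_ds': (local + timedelta(days=1)).strftime('%Y-%m-%d'),
    }


# 20:00 UTC on 2018-08-08 is already 2018-08-09 in UTC+8, so the date-based
# variables shift by a day compared to the naive UTC rendering.
ctx = template_dates(datetime(2018, 8, 8, 20, 0, tzinfo=timezone.utc))
```

This is exactly the class of off-by-one-day bug the PR fixes: without the conversion, `ds` would read 2018-08-08 even though the run is already "tomorrow" in the default timezone.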
[GitHub] XD-DENG commented on issue #3718: [AIRFLOW-2872] Implement 'Ad Hoc Query' for RBAC and Refine QueryView()
XD-DENG commented on issue #3718: [AIRFLOW-2872] Implement 'Ad Hoc Query' for RBAC and Refine QueryView() URL: https://github.com/apache/incubator-airflow/pull/3718#issuecomment-411634875 Hi @bolkedebruin , as suggested, I have implemented the **Ad Hoc Query** view for RBAC under `/www_rbac`. It's mainly based on the current implementation in `/www`. I have updated the subject and contents of this PR accordingly as well. **Screenshot - Nav Bar** https://user-images.githubusercontent.com/11539188/43878430-ac307542-9bd1-11e8-988e-114daaff4b10.png **Screenshot - *Ad Hoc Query* view** https://user-images.githubusercontent.com/11539188/43878434-b2321a0e-9bd1-11e8-911d-9a841fc1ed7a.png This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] codecov-io edited a comment on issue #3722: [AIRFLOW-2759] Add changes to extract proxy details at the base hook …
codecov-io edited a comment on issue #3722: [AIRFLOW-2759] Add changes to extract proxy details at the base hook … URL: https://github.com/apache/incubator-airflow/pull/3722#issuecomment-411556404 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3722?src=pr=h1) Report > Merging [#3722](https://codecov.io/gh/apache/incubator-airflow/pull/3722?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/8b04e20709ebeb41aeefc0c5e3f12d35108ea504?src=pr=desc) will **decrease** coverage by `59.92%`. > The diff coverage is `15.38%`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3722/graphs/tree.svg?token=WdLKlKHOAU=pr=150=650)](https://codecov.io/gh/apache/incubator-airflow/pull/3722?src=pr=tree)
```diff
@@             Coverage Diff             @@
##           master    #3722       +/-   ##
===========================================
- Coverage   77.64%   17.71%   -59.93%
===========================================
  Files         204      204
  Lines       15801    15826      +25
===========================================
- Hits        12268     2803     -9465
- Misses       3533    13023     +9490
```
| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3722?src=pr=tree) | Coverage Δ | | |---|---|---| | [airflow/utils/helpers.py](https://codecov.io/gh/apache/incubator-airflow/pull/3722/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9oZWxwZXJzLnB5) | `26.13% <15.38%> (-45.21%)` | :arrow_down: | | [airflow/hooks/base\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/3722/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9iYXNlX2hvb2sucHk=) | `40.62% <15.38%> (-51.54%)` | :arrow_down: | | [airflow/hooks/slack\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/3722/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9zbGFja19ob29rLnB5) | `0% <0%> (-100%)` | :arrow_down: | | [airflow/api/common/experimental/get\_dag\_runs.py](https://codecov.io/gh/apache/incubator-airflow/pull/3722/diff?src=pr=tree#diff-YWlyZmxvdy9hcGkvY29tbW9uL2V4cGVyaW1lbnRhbC9nZXRfZGFnX3J1bnMucHk=) | `0% <0%> (-100%)` | :arrow_down: | | 
[airflow/example\_dags/test\_utils.py](https://codecov.io/gh/apache/incubator-airflow/pull/3722/diff?src=pr=tree#diff-YWlyZmxvdy9leGFtcGxlX2RhZ3MvdGVzdF91dGlscy5weQ==) | `0% <0%> (-100%)` | :arrow_down: | | [airflow/www/forms.py](https://codecov.io/gh/apache/incubator-airflow/pull/3722/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvZm9ybXMucHk=) | `0% <0%> (-100%)` | :arrow_down: | | [airflow/sensors/time\_delta\_sensor.py](https://codecov.io/gh/apache/incubator-airflow/pull/3722/diff?src=pr=tree#diff-YWlyZmxvdy9zZW5zb3JzL3RpbWVfZGVsdGFfc2Vuc29yLnB5) | `0% <0%> (-100%)` | :arrow_down: | | [airflow/operators/email\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/3722/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvZW1haWxfb3BlcmF0b3IucHk=) | `0% <0%> (-100%)` | :arrow_down: | | [airflow/sensors/time\_sensor.py](https://codecov.io/gh/apache/incubator-airflow/pull/3722/diff?src=pr=tree#diff-YWlyZmxvdy9zZW5zb3JzL3RpbWVfc2Vuc29yLnB5) | `0% <0%> (-100%)` | :arrow_down: | | [airflow/utils/json.py](https://codecov.io/gh/apache/incubator-airflow/pull/3722/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9qc29uLnB5) | `0% <0%> (-100%)` | :arrow_down: | | ... and [163 more](https://codecov.io/gh/apache/incubator-airflow/pull/3722/diff?src=pr=tree-more) | | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3722?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3722?src=pr=footer). Last update [8b04e20...9742956](https://codecov.io/gh/apache/incubator-airflow/pull/3722?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] codecov-io edited a comment on issue #3718: [AIRFLOW-2872] Implement 'Ad Hoc Query' for RBAC and Refine QueryView()
codecov-io edited a comment on issue #3718: [AIRFLOW-2872] Implement 'Ad Hoc Query' for RBAC and Refine QueryView() URL: https://github.com/apache/incubator-airflow/pull/3718#issuecomment-411331261 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3718?src=pr=h1) Report > Merging [#3718](https://codecov.io/gh/apache/incubator-airflow/pull/3718?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/8b04e20709ebeb41aeefc0c5e3f12d35108ea504?src=pr=desc) will **decrease** coverage by `0.2%`. > The diff coverage is `25%`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3718/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3718?src=pr=tree)
```diff
@@            Coverage Diff             @@
##           master    #3718      +/-   ##
==========================================
- Coverage   77.64%   77.43%   -0.21%
==========================================
  Files         204      204
  Lines       15801    15859     +58
==========================================
+ Hits        12268    12281     +13
- Misses       3533     3578     +45
```
| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3718?src=pr=tree) | Coverage Δ | | |---|---|---| | [airflow/www/views.py](https://codecov.io/gh/apache/incubator-airflow/pull/3718/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvdmlld3MucHk=) | `68.88% <100%> (ø)` | :arrow_up: | | [airflow/www\_rbac/app.py](https://codecov.io/gh/apache/incubator-airflow/pull/3718/diff?src=pr=tree#diff-YWlyZmxvdy93d3dfcmJhYy9hcHAucHk=) | `97.82% <100%> (+0.04%)` | :arrow_up: | | [airflow/www\_rbac/views.py](https://codecov.io/gh/apache/incubator-airflow/pull/3718/diff?src=pr=tree#diff-YWlyZmxvdy93d3dfcmJhYy92aWV3cy5weQ==) | `71.2% <17.5%> (-1.66%)` | :arrow_down: | | [airflow/www\_rbac/utils.py](https://codecov.io/gh/apache/incubator-airflow/pull/3718/diff?src=pr=tree#diff-YWlyZmxvdy93d3dfcmJhYy91dGlscy5weQ==) | `62.42% <29.41%> (-3.8%)` | :arrow_down: | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3718?src=pr=continue). 
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3718?src=pr=footer). Last update [8b04e20...6b13086](https://codecov.io/gh/apache/incubator-airflow/pull/3718?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-2686) Default Variables not base on default_timezone
[ https://issues.apache.org/jira/browse/AIRFLOW-2686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574189#comment-16574189 ] ASF GitHub Bot commented on AIRFLOW-2686: - lxneng closed pull request #3554: [AIRFLOW-2686] Fix Default Variables not base on default_timezone URL: https://github.com/apache/incubator-airflow/pull/3554 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/models.py b/airflow/models.py index 260c0ba5a2..2fe229a685 100755 --- a/airflow/models.py +++ b/airflow/models.py @@ -1809,14 +1809,15 @@ def get_template_context(self, session=None): tables = None if 'tables' in task.params: tables = task.params['tables'] +# convert to default timezone +execution_date_tz = settings.TIMEZONE.convert(self.execution_date) +ds = execution_date_tz.strftime('%Y-%m-%d') +ts = execution_date_tz.isoformat() +yesterday_ds = (execution_date_tz - timedelta(1)).strftime('%Y-%m-%d') +tomorrow_ds = (execution_date_tz + timedelta(1)).strftime('%Y-%m-%d') -ds = self.execution_date.strftime('%Y-%m-%d') -ts = self.execution_date.isoformat() -yesterday_ds = (self.execution_date - timedelta(1)).strftime('%Y-%m-%d') -tomorrow_ds = (self.execution_date + timedelta(1)).strftime('%Y-%m-%d') - -prev_execution_date = task.dag.previous_schedule(self.execution_date) -next_execution_date = task.dag.following_schedule(self.execution_date) +prev_execution_date = task.dag.previous_schedule(execution_date_tz) +next_execution_date = task.dag.following_schedule(execution_date_tz) next_ds = None if next_execution_date: @@ -1903,7 +1904,7 @@ def __repr__(self): 'end_date': ds, 'dag_run': dag_run, 'run_id': run_id, -'execution_date': self.execution_date, +'execution_date': execution_date_tz, 'prev_execution_date': prev_execution_date, 
'next_execution_date': next_execution_date, 'latest_date': ds, This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Default Variables not base on default_timezone > -- > > Key: AIRFLOW-2686 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2686 > Project: Apache Airflow > Issue Type: Bug >Reporter: Eric Luo >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] lxneng commented on issue #3554: [AIRFLOW-2686] Fix Default Variables not base on default_timezone
lxneng commented on issue #3554: [AIRFLOW-2686] Fix Default Variables not base on default_timezone URL: https://github.com/apache/incubator-airflow/pull/3554#issuecomment-411614674 Agree with @Fokko , that would be much better. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] feng-tao commented on a change in pull request #3648: [AIRFLOW-2786] Fix editing Variable with empty key crashing
feng-tao commented on a change in pull request #3648: [AIRFLOW-2786] Fix editing Variable with empty key crashing URL: https://github.com/apache/incubator-airflow/pull/3648#discussion_r208799225 ## File path: tests/www_rbac/test_views.py ## @@ -172,6 +172,26 @@ def test_xss_prevention(self): self.assertNotIn("", resp.data.decode("utf-8")) +def test_import_variables(self): +import mock Review comment: nit: put `import mock` among the other library imports. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Closed] (AIRFLOW-2686) Default Variables not base on default_timezone
[ https://issues.apache.org/jira/browse/AIRFLOW-2686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Luo closed AIRFLOW-2686. - Resolution: Fixed > Default Variables not base on default_timezone > -- > > Key: AIRFLOW-2686 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2686 > Project: Apache Airflow > Issue Type: Bug >Reporter: Eric Luo >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] codecov-io edited a comment on issue #3708: [AIRFLOW-2859] Implement own UtcDateTime
codecov-io edited a comment on issue #3708: [AIRFLOW-2859] Implement own UtcDateTime URL: https://github.com/apache/incubator-airflow/pull/3708#issuecomment-410963628 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3708?src=pr=h1) Report > Merging [#3708](https://codecov.io/gh/apache/incubator-airflow/pull/3708?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/d47580feaf80eeebb416d0179dfa8db3f4e1d2c9?src=pr=desc) will **decrease** coverage by `59.85%`. > The diff coverage is `44.44%`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3708/graphs/tree.svg?token=WdLKlKHOAU=150=pr=650)](https://codecov.io/gh/apache/incubator-airflow/pull/3708?src=pr=tree)
```diff
@@             Coverage Diff             @@
##           master    #3708       +/-   ##
===========================================
- Coverage   77.57%   17.72%   -59.86%
===========================================
  Files         204      204
  Lines       15776    15789      +13
===========================================
- Hits        12239     2798     -9441
- Misses       3537    12991     +9454
```
| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3708?src=pr=tree) | Coverage Δ | | |---|---|---| | [airflow/bin/cli.py](https://codecov.io/gh/apache/incubator-airflow/pull/3708/diff?src=pr=tree#diff-YWlyZmxvdy9iaW4vY2xpLnB5) | `14.54% <ø> (-49.81%)` | :arrow_down: | | [airflow/jobs.py](https://codecov.io/gh/apache/incubator-airflow/pull/3708/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzLnB5) | `12.22% <100%> (-70.54%)` | :arrow_down: | | [airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3708/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=) | `27.58% <20%> (-61.03%)` | :arrow_down: | | [airflow/utils/sqlalchemy.py](https://codecov.io/gh/apache/incubator-airflow/pull/3708/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9zcWxhbGNoZW15LnB5) | `56.92% <42.1%> (-16.99%)` | :arrow_down: | | [...w/example\_dags/example\_latest\_only\_with\_trigger.py](https://codecov.io/gh/apache/incubator-airflow/pull/3708/diff?src=pr=tree#diff-YWlyZmxvdy9leGFtcGxlX2RhZ3MvZXhhbXBsZV9sYXRlc3Rfb25seV93aXRoX3RyaWdnZXIucHk=) | `0% <0%> (-100%)` | :arrow_down: | 
| [airflow/hooks/pig\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/3708/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9waWdfaG9vay5weQ==) | `0% <0%> (-100%)` | :arrow_down: | | [airflow/example\_dags/example\_branch\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/3708/diff?src=pr=tree#diff-YWlyZmxvdy9leGFtcGxlX2RhZ3MvZXhhbXBsZV9icmFuY2hfb3BlcmF0b3IucHk=) | `0% <0%> (-100%)` | :arrow_down: | | [airflow/example\_dags/example\_docker\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/3708/diff?src=pr=tree#diff-YWlyZmxvdy9leGFtcGxlX2RhZ3MvZXhhbXBsZV9kb2NrZXJfb3BlcmF0b3IucHk=) | `0% <0%> (-100%)` | :arrow_down: | | [airflow/example\_dags/subdags/subdag.py](https://codecov.io/gh/apache/incubator-airflow/pull/3708/diff?src=pr=tree#diff-YWlyZmxvdy9leGFtcGxlX2RhZ3Mvc3ViZGFncy9zdWJkYWcucHk=) | `0% <0%> (-100%)` | :arrow_down: | | [airflow/www\_rbac/blueprints.py](https://codecov.io/gh/apache/incubator-airflow/pull/3708/diff?src=pr=tree#diff-YWlyZmxvdy93d3dfcmJhYy9ibHVlcHJpbnRzLnB5) | `0% <0%> (-100%)` | :arrow_down: | | ... and [167 more](https://codecov.io/gh/apache/incubator-airflow/pull/3708/diff?src=pr=tree-more) | | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3708?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3708?src=pr=footer). Last update [d47580f...97e4be4](https://codecov.io/gh/apache/incubator-airflow/pull/3708?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] Fokko commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines
Fokko commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines URL: https://github.com/apache/incubator-airflow/pull/3714#discussion_r208483513

## File path: airflow/contrib/operators/oracle_to_azure_data_lake_transfer.py

## @@ -1,113 +1,115 @@
-# -*- coding: utf-8 -*-
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-from airflow.hooks.oracle_hook import OracleHook
-from airflow.contrib.hooks.azure_data_lake_hook import AzureDataLakeHook
-from airflow.models import BaseOperator
-from airflow.utils.decorators import apply_defaults
-from airflow.utils.file import TemporaryDirectory
-
-import unicodecsv as csv
-import os
-
-
-class OracleToAzureDataLakeTransfer(BaseOperator):
-    """
-    Moves data from Oracle to Azure Data Lake. The operator runs the query against
-    Oracle and stores the file locally before loading it into Azure Data Lake.
-
-    :param filename: file name to be used by the csv file.
-    :type filename: str
-    :param azure_data_lake_conn_id: destination azure data lake connection.
-    :type azure_data_lake_conn_id: str
-    :param azure_data_lake_path: destination path in azure data lake to put the file.
-    :type azure_data_lake_path: str
-    :param oracle_conn_id: source Oracle connection.
-    :type oracle_conn_id: str
-    :param sql: SQL query to execute against the Oracle database. (templated)
-    :type sql: str
-    :param sql_params: Parameters to use in sql query. (templated)
-    :type sql_params: str
-    :param delimiter: field delimiter in the file.
-    :type delimiter: str
-    :param encoding: encoding type for the file.
-    :type encoding: str
-    :param quotechar: Character to use in quoting.
-    :type quotechar: str
-    :param quoting: Quoting strategy. See unicodecsv quoting for more information.
-    :type quoting: str
-    """
-
-    template_fields = ('filename', 'sql', 'sql_params')
-    ui_color = '#e08c8c'
-
-    @apply_defaults
-    def __init__(
-            self,
-            filename,
-            azure_data_lake_conn_id,
-            azure_data_lake_path,
-            oracle_conn_id,
-            sql,
-            sql_params={},
-            delimiter=",",
-            encoding="utf-8",
-            quotechar='"',
-            quoting=csv.QUOTE_MINIMAL,
-            *args, **kwargs):
-        super(OracleToAzureDataLakeTransfer, self).__init__(*args, **kwargs)
-        self.filename = filename
-        self.oracle_conn_id = oracle_conn_id
-        self.sql = sql
-        self.sql_params = sql_params
-        self.azure_data_lake_conn_id = azure_data_lake_conn_id
-        self.azure_data_lake_path = azure_data_lake_path
-        self.delimiter = delimiter
-        self.encoding = encoding
-        self.quotechar = quotechar
-        self.quoting = quoting
-
-    def _write_temp_file(self, cursor, path_to_save):
-        with open(path_to_save, 'wb') as csvfile:
-            csv_writer = csv.writer(csvfile, delimiter=self.delimiter,
-                                    encoding=self.encoding, quotechar=self.quotechar,
-                                    quoting=self.quoting)
-            csv_writer.writerow(map(lambda field: field[0], cursor.description))
-            csv_writer.writerows(cursor)
-            csvfile.flush()
-
-    def execute(self, context):
-        oracle_hook = OracleHook(oracle_conn_id=self.oracle_conn_id)
-        azure_data_lake_hook = AzureDataLakeHook(
-            azure_data_lake_conn_id=self.azure_data_lake_conn_id)
-
-        self.log.info("Dumping Oracle query results to local file")
-        conn = oracle_hook.get_conn()
-        cursor = conn.cursor()
-        cursor.execute(self.sql, self.sql_params)
-
-        with TemporaryDirectory(prefix='airflow_oracle_to_azure_op_') as temp:
-            self._write_temp_file(cursor, os.path.join(temp, self.filename))
-            self.log.info("Uploading local file to Azure Data Lake")
-            azure_data_lake_hook.upload_file(os.path.join(temp, self.filename),
-                                             os.path.join(self.azure_data_lake_path,
-                                                          self.filename))
-
[GitHub] Fokko commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines
Fokko commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines URL: https://github.com/apache/incubator-airflow/pull/3714#discussion_r208478718

## File path: airflow/contrib/hooks/bigquery_hook.py

## @@ -238,6 +238,8 @@ def create_empty_table(self,
         :return:
         """
+        if time_partitioning is None:
+            time_partitioning = dict()

Review comment: What do you mean? Setting the `{}` in the arguments is bad practice because of the way Python creates these objects.
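The review comment above refers to Python evaluating default argument values once, at function definition time, so a `{}` default is shared across calls. A minimal sketch of the pitfall and the `None`-sentinel fix the diff uses (function names here are illustrative, not from the PR):

```python
def risky(config={}):
    # The default dict is created ONCE, when `def` runs, and every call
    # that omits `config` mutates that same shared dict.
    config['hits'] = config.get('hits', 0) + 1
    return config


def safe(config=None):
    # The pattern from the reviewed change: use None as a sentinel and
    # build a fresh dict inside the function body on each call.
    if config is None:
        config = {}
    config['hits'] = config.get('hits', 0) + 1
    return config


first = risky()
second = risky()
print(second)            # {'hits': 2} - state leaked across calls
print(first is second)   # True - both calls returned the same shared dict
print(safe(), safe())    # {'hits': 1} {'hits': 1} - independent dicts
```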
[GitHub] Fokko commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines
Fokko commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines URL: https://github.com/apache/incubator-airflow/pull/3714#discussion_r208483352

## File path: airflow/contrib/operators/mongo_to_s3.py

## @@ -105,7 +106,8 @@ def _stringify(self, iterable, joinable='\n'):
         [json.dumps(doc, default=json_util.default) for doc in iterable]
         )

-    def transform(self, docs):
+    @staticmethod

Review comment: Nit: I always put static functions above the non-static ones in the class order.
[GitHub] Fokko commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training
Fokko commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r208486017

## File path: airflow/contrib/hooks/sagemaker_hook.py

## @@ -0,0 +1,239 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import copy
+import time
+from botocore.exceptions import ClientError
+
+from airflow.exceptions import AirflowException
+from airflow.contrib.hooks.aws_hook import AwsHook
+from airflow.hooks.S3_hook import S3Hook
+
+
+class SageMakerHook(AwsHook):
+    """
+    Interact with Amazon SageMaker.
+    sagemaker_conn_id is required for using
+    the config stored in db for training/tuning
+    """
+
+    def __init__(self,
+                 sagemaker_conn_id=None,
+                 use_db_config=False,
+                 region_name=None,
+                 check_interval=5,
+                 max_ingestion_time=None,
+                 *args, **kwargs):
+        super(SageMakerHook, self).__init__(*args, **kwargs)
+        self.sagemaker_conn_id = sagemaker_conn_id
+        self.use_db_config = use_db_config
+        self.region_name = region_name
+        self.check_interval = check_interval
+        self.max_ingestion_time = max_ingestion_time
+        self.conn = self.get_conn()
+
+    def check_for_url(self, s3url):
+        """
+        check if the s3url exists
+        :param s3url: S3 url
+        :type s3url: str
+        :return: bool
+        """
+        bucket, key = S3Hook.parse_s3_url(s3url)
+        s3hook = S3Hook(aws_conn_id=self.aws_conn_id)
+        if not s3hook.check_for_bucket(bucket_name=bucket):
+            raise AirflowException(
+                "The input S3 Bucket {} does not exist ".format(bucket))
+        if not s3hook.check_for_key(key=key, bucket_name=bucket):
+            raise AirflowException("The input S3 Key {} does not exist in the Bucket"
+                                   .format(s3url, bucket))
+        return True
+
+    def check_valid_training_input(self, training_config):
+        """
+        Run checks before a training starts
+        :param config: training_config
+        :type config: dict
+        :return: None
+        """
+        for channel in training_config['InputDataConfig']:
+            self.check_for_url(channel['DataSource']
+                               ['S3DataSource']['S3Uri'])
+
+    def check_valid_tuning_input(self, tuning_config):
+        """
+        Run checks before a tuning job starts
+        :param config: tuning_config
+        :type config: dict
+        :return: None
+        """
+        for channel in tuning_config['TrainingJobDefinition']['InputDataConfig']:
+            self.check_for_url(channel['DataSource']
+                               ['S3DataSource']['S3Uri'])
+
+    def check_status(self, non_terminal_states,
+                     failed_state, key,
+                     describe_function, *args):
+        """
+        :param non_terminal_states: the set of non_terminal states
+        :type non_terminal_states: dict
+        :param failed_state: the set of failed states
+        :type failed_state: dict
+        :param key: the key of the response dict
+            that points to the state
+        :type key: string
+        :param describe_function: the function used to retrieve the status
+        :type describe_function: python callable
+        :param args: the arguments for the function
+        :return: None
+        """
+        sec = 0
+        running = True
+
+        while running:
+
+            sec = sec + self.check_interval
+
+            if self.max_ingestion_time and sec > self.max_ingestion_time:
+                # ensure that the job gets killed if the max ingestion time is exceeded
+                raise AirflowException("SageMaker job took more than "
+                                       "%s seconds", self.max_ingestion_time)
+
+            time.sleep(self.check_interval)
+            try:
+                status = describe_function(*args)[key]
+                self.log.info("Job still running for %s seconds... "
+                              "current status is %s" % (sec, status))
+            except
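The `check_status` method in the diff above is a poll-until-terminal-state loop with a hard timeout. A self-contained sketch of that pattern, outside the hook (the function and state names below mirror the diff but are illustrative, not the hook itself):

```python
import time


def wait_for_state(describe, key, non_terminal_states, failed_states,
                   check_interval=0.01, max_ingestion_time=None):
    """Poll `describe()` every `check_interval` seconds until the value at
    `key` leaves `non_terminal_states`; raise on failure or timeout."""
    sec = 0
    while True:
        sec += check_interval
        if max_ingestion_time and sec > max_ingestion_time:
            # ensure the wait is abandoned if the max ingestion time is exceeded
            raise TimeoutError("job took more than %s seconds" % max_ingestion_time)
        time.sleep(check_interval)
        status = describe()[key]
        if status in failed_states:
            raise RuntimeError("job failed with status %s" % status)
        if status not in non_terminal_states:
            return status


# Fake describe function that reports InProgress twice, then Completed.
states = iter(['InProgress', 'InProgress', 'Completed'])
result = wait_for_state(lambda: {'TrainingJobStatus': next(states)},
                        'TrainingJobStatus',
                        non_terminal_states={'InProgress', 'Stopping'},
                        failed_states={'Failed'})
print(result)  # prints Completed
```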
[jira] [Commented] (AIRFLOW-2871) Harden and improve Read the Docs build environment
[ https://issues.apache.org/jira/browse/AIRFLOW-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572828#comment-16572828 ] Ash Berlin-Taylor commented on AIRFLOW-2871:

One other thing that would be nice to do is to have multiple versions of the docs on RTD, and set the default to the latest release, rather than master. There has been a bit of confusion in the chat channel over things in the docs not "working" because of this.

> Harden and improve Read the Docs build environment
> --
>
> Key: AIRFLOW-2871
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2871
> Project: Apache Airflow
> Issue Type: Bug
> Components: docs, Documentation
> Reporter: Taylor Edmiston
> Assignee: Taylor Edmiston
> Priority: Major
>
> h2. Context
> In the process of resolving AIRFLOW-2857 (via [PR 3703|https://github.com/apache/incubator-airflow/pull/3703]), I noticed some oddities in our Read the Docs (RTD) build environment especially around cached dependencies. This motivates hardening and showing some love to our RTD setup.
> h2. Problem
> I dug into the RTD build logs for a moment to find some closure on the mock dependency discussed in PR #3703 above. I think that our RTD environment possibly has been working by coincidence off of cached dependencies.
> {code:java}
> python /home/docs/checkouts/readthedocs.org/user_builds/airflow/envs/latest/bin/pip install --ignore-installed --cache-dir /home/docs/checkouts/readthedocs.org/user_builds/airflow/.cache/pip .[doc,docker,gcp_api,emr]{code}
> The directory referenced by that --cache-dir arg earlier in the log happens to have mock installed already.
> {code:java}
> python /home/docs/checkouts/readthedocs.org/user_builds/airflow/envs/latest/bin/pip install --upgrade --cache-dir /home/docs/checkouts/readthedocs.org/user_builds/airflow/.cache/pip Pygments==2.2.0 setuptools<40 docutils==0.13.1 mock==1.0.1 pillow==2.6.1 alabaster>=0.7,<0.8,!=0.7.5 commonmark==0.5.4 recommonmark==0.4.0 sphinx<1.8 sphinx-rtd-theme<0.5 readthedocs-sphinx-ext<0.6{code}
> Here are some logs where you can see that (view raw):
> # Latest successful (Aug. 7, 2018. 9:21 a.m.) - [7602630|https://readthedocs.org/projects/airflow/builds/7602630/]
> # Last unsuccessful before (1) (Aug. 5, 2018. 1:24 p.m.) - [7593052|https://readthedocs.org/projects/airflow/builds/7593052/]
> # Last successful before (2) (July 18, 2018. 3:23 a.m.) - [7503718|https://readthedocs.org/projects/airflow/builds/7503718/]
> # First build (2016) - [4150778|https://readthedocs.org/projects/airflow/builds/4150778/]
> It appears that mock and others have potentially been cached since the first RTD build in 2016 (4).
> These versions like mock==1.0.1 do not appear to be coming from anywhere in our current config in incubator-airflow; I believe they are installed as [core dependencies of RTD itself|https://github.com/rtfd/readthedocs.org/blob/ca7afe6577672e129ccfe63abe33561dc32a6651/readthedocs/doc_builder/python_environments.py#L220-L235].
> Some but not all of these dependencies get upgraded to newer versions further down in the build. In the case of mock, we were getting lucky that mock==1.0.1 was a dependency of RTD and our setup inherited that old version which allowed the docs build to succeed. (There might be other cases of dependencies like this too.)
> h2. Solution
> My proposed enhancements to harden and improve our RTD setup are:
> * Hardening
> ** Set our RTD build to use a virtual environment if it's not already
> ** Set our RTD build to ignore packages outside of its virtualenv like dependencies of RTD itself
> ** Specify any dependencies broken by ^
> ** Test wiping a version in the build environment (not sure if this clears cache dir)
> *** [https://docs.readthedocs.io/en/latest/guides/wipe-environment.html#wiping-a-build-environment]
> *** [https://docs.readthedocs.io/en/latest/builds.html#deleting-a-stale-or-broken-build-environment]
> ** Make build.image, python.version, etc explicit in yaml config
> *** [https://docs.readthedocs.io/en/latest/yaml-config.html]
> ** Test upgrading our RTD environment from CPython 2.x to using CPython 3.x
> * Misc
> ** Improve RTD project page to have tags and description
> ** Lint YAML file
> Note: I don't yet have maintainer access for airflow on RTD which I believe this would require. I am happy to take this issue if I can get that. I have experience as an admin of another project on RTD (simple-salesforce).
>
> /cc Everyone who commented in PR #3703 - [~kaxilnaik], [~ashb], [~TaoFeng]

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
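The hardening item about making `build.image`, `python.version`, etc. explicit could look roughly like the following `readthedocs.yml` fragment. This is a sketch following the RTD yaml-config docs linked above; the values are illustrative (the extras mirror the `.[doc,docker,gcp_api,emr]` install seen in the build log), not the project's actual settings.

```yaml
# Hypothetical readthedocs.yml making the build environment explicit
build:
  image: latest          # pin the build image instead of relying on the default
python:
  version: 3.5           # move off the implicit CPython 2.x environment
  pip_install: true      # install the package itself with pip...
  extra_requirements:    # ...including the extras RTD currently passes to pip
    - doc
    - docker
    - gcp_api
    - emr
```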
[GitHub] Fokko commented on a change in pull request #3703: [AIRFLOW-2857] Fix broken RTD env
Fokko commented on a change in pull request #3703: [AIRFLOW-2857] Fix broken RTD env URL: https://github.com/apache/incubator-airflow/pull/3703#discussion_r208478397

## File path: setup.py

## @@ -161,6 +164,7 @@ def write_version(filename=os.path.join(*['airflow',
 databricks = ['requests>=2.5.1, <3']
 datadog = ['datadog>=0.14.0']
 doc = [
+    'mock',

Review comment: Good point. @tedmiston Any idea?
[GitHub] kaxil commented on a change in pull request #3703: [AIRFLOW-2857] Fix broken RTD env
kaxil commented on a change in pull request #3703: [AIRFLOW-2857] Fix broken RTD env URL: https://github.com/apache/incubator-airflow/pull/3703#discussion_r208502954

## File path: setup.py

## @@ -161,6 +164,7 @@ def write_version(filename=os.path.join(*['airflow',
 databricks = ['requests>=2.5.1, <3']
 datadog = ['datadog>=0.14.0']
 doc = [
+    'mock',

Review comment: I did some digging and it looks like we use it here: https://github.com/apache/incubator-airflow/blob/acca61c602e341da06ebee2eca3a26f4e7400238/docs/conf.py#L16

This is used for mocking the import of various modules: https://github.com/apache/incubator-airflow/blob/master/docs/conf.py#L18-L31

Reference: http://blog.rtwilson.com/how-to-make-your-sphinx-documentation-compile-with-readthedocs-when-youre-using-numpy-and-scipy/
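The `docs/conf.py` trick kaxil points to follows the pattern from the linked blog post: register mock objects in `sys.modules` so Sphinx autodoc can import airflow modules on Read the Docs without their heavy optional dependencies installed. A standalone sketch of the pattern (the module names below are illustrative, not the exact list from conf.py; `docs/conf.py` imports the standalone `mock` package, which is why it is added to the `doc` extra in setup.py — the stdlib `unittest.mock` is used here only to keep the example self-contained):

```python
import sys
from unittest import mock

# Register a MagicMock under each heavy dependency's name BEFORE anything
# imports it; subsequent `import X` statements then get the mock instead of
# failing with ImportError on the docs build host.
MOCK_MODULES = ['MySQLdb', 'vertica_python', 'mesos']
for mod_name in MOCK_MODULES:
    sys.modules[mod_name] = mock.MagicMock()

# Any module that does `import MySQLdb` now succeeds and receives the mock,
# so autodoc can walk code that depends on it.
import MySQLdb
print(type(MySQLdb).__name__)  # prints MagicMock
```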
[jira] [Commented] (AIRFLOW-2871) Harden and improve Read the Docs build environment
[ https://issues.apache.org/jira/browse/AIRFLOW-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572901#comment-16572901 ] Ash Berlin-Taylor commented on AIRFLOW-2871:

Oh - so we do. I guess what I'd like then is:
- latest to not be master, but the latest release.
- Nicer version numbers (i.e. from the tags, not v1-9-stable etc.)
- if a.i.a.org is to be kept, a warning banner saying this is for master, with a link to RTD.

I don't know how possible any of that is.

> Harden and improve Read the Docs build environment
[jira] [Updated] (AIRFLOW-2873) Improvements to Quick Start flow
[ https://issues.apache.org/jira/browse/AIRFLOW-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] G. Geijteman updated AIRFLOW-2873:
--
Description:

Thank you for developing Airflow! Having run through the [Quick Start|https://airflow.incubator.apache.org/start.html], I've come across two issues that I would like to highlight:
{code:java}
bash-3.2$ cd ~/project/airflow/
bash-3.2$ export AIRFLOW_HOME=~/project/airflow
bash-3.2$ python3 -m venv $AIRFLOW_HOME/venv
bash-3.2$ source venv/bin/activate
(venv) bash-3.2$ pip install --upgrade pip
(venv) bash-3.2$ uname -a
Darwin mac.local 17.7.0 Darwin Kernel Version 17.7.0: Thu Jun 21 22:53:14 PDT 2018; root:xnu-4570.71.2~1/RELEASE_X86_64 x86_64
(venv) bash-3.2$ python -V
Python 3.6.5
(venv) bash-3.2$ pip -V
pip 18.0 from ~/project/airflow/venv/lib/python3.6/site-packages/pip (python 3.6)
(venv) bash-3.2$ pip install apache-airflow[redis,postgres] -U
{code}
Results in:
{code:java}
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "~/project/airflow/venv/lib/python3.6/site-packages/airflow/models.py", line 639, in set_extra
    fernet = get_fernet()
  File "~/project/airflow/venv/lib/python3.6/site-packages/airflow/models.py", line 103, in get_fernet
    raise AirflowException('Failed to import Fernet, it may not be installed')
airflow.exceptions.AirflowException: Failed to import Fernet, it may not be installed
[2018-08-08 10:49:01,121] {models.py:643} ERROR - Failed to load fernet while encrypting value, using non-encrypted value.
Traceback (most recent call last):
  File "~/project/airflow/venv/lib/python3.6/site-packages/airflow/models.py", line 101, in get_fernet
    from cryptography.fernet import Fernet
ModuleNotFoundError: No module named 'cryptography'{code}
This is solved by:
{code:java}
(venv) bash-3.2$ pip install cryptography{code}
*Proposed fix:* _Include the `cryptography` package in the setup / package requirements_

Having fixed that, the following issue occurs when trying to:
{code:java}
(venv) bash-3.2$ airflow initdb{code}
Excerpt:
{code:java}
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "~/project/airflow/venv/lib/python3.6/site-packages/airflow/models.py", line 639, in set_extra
    fernet = get_fernet()
  File "~/project/airflow/venv/lib/python3.6/site-packages/airflow/models.py", line 107, in get_fernet
    raise AirflowException("Could not create Fernet object: {}".format(ve))
airflow.exceptions.AirflowException: Could not create Fernet object: Incorrect padding
[2018-08-08 10:50:50,697] {models.py:643} ERROR - Failed to load fernet while encrypting value, using non-encrypted value.
Traceback (most recent call last):
  File "~/project/airflow/venv/lib/python3.6/site-packages/airflow/models.py", line 105, in get_fernet
    return Fernet(configuration.get('core', 'FERNET_KEY').encode('utf-8'))
  File "~/project/airflow/venv/lib/python3.6/site-packages/cryptography/fernet.py", line 34, in _init_
    key = base64.urlsafe_b64decode(key)
  File "/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/base64.py", line 133, in urlsafe_b64decode
    return b64decode(s)
  File "/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/base64.py", line 87, in b64decode
    return binascii.a2b_base64(s)
binascii.Error: Incorrect padding{code}
Which after some googling leads to the conclusion that the ~/project/airflow/airflow.cfg fernet_key field is not set to the correct value.

*Feature request:* _Have the setup automatically generate a valid fernet key for the user._

The fact that this page exists: [https://bcb.github.io/airflow/fernet-key] suggests this could easily be a part of the package.

I understand that this project is in the incubator phase, but I would say having a quick start that does not work as-is will discourage users from trying out this project. Thank you for considering.
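The fernet key the reporter asks the setup to generate automatically is produced by a single call in the `cryptography` library; this is the same call the linked fernet-key page suggests running by hand to fill the `fernet_key` field in airflow.cfg. A minimal sketch:

```python
from cryptography.fernet import Fernet

# Generate a valid key: 32 random bytes, url-safe base64-encoded (44 chars).
fernet_key = Fernet.generate_key()
print(fernet_key.decode())  # paste this value into airflow.cfg's fernet_key

# A key produced this way round-trips cleanly, avoiding the
# "Incorrect padding" error from a malformed key:
token = Fernet(fernet_key).encrypt(b'connection password')
assert Fernet(fernet_key).decrypt(token) == b'connection password'
```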
[jira] [Commented] (AIRFLOW-2871) Harden and improve Read the Docs build environment
[ https://issues.apache.org/jira/browse/AIRFLOW-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572916#comment-16572916 ] Kaxil Naik commented on AIRFLOW-2871: - [~ashb] I will sort that out with a new PR and ping you once it is ready for review. > Harden and improve Read the Docs build environment > -- > > Key: AIRFLOW-2871 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2871 > Project: Apache Airflow > Issue Type: Bug > Components: docs, Documentation >Reporter: Taylor Edmiston >Assignee: Taylor Edmiston >Priority: Major > Attachments: screenshot-1.png, screenshot-2.png > > > h2. Context > In the process of resolving AIRFLOW-2857 (via [PR > 3703|https://github.com/apache/incubator-airflow/pull/3703]), I noticed some > oddities in our Read the Docs (RTD) build environment especially around > cached dependencies. This motivates hardening and showing some love to our > RTD setup. > h2. Problem > I dug into the RTD build logs for a moment to find some closure on the mock > dependency discussed in PR #3703 above. I think that our RTD environment > possibly has been working by coincidence off of cached dependencies. > {code:java} > python > /home/docs/checkouts/readthedocs.org/user_builds/airflow/envs/latest/bin/pip > install --ignore-installed --cache-dir > /home/docs/checkouts/readthedocs.org/user_builds/airflow/.cache/pip > .[doc,docker,gcp_api,emr]{code} > The directory referenced by that --cache-dir arg earlier in the log happens > to have mock installed already. 
> {code:java} > python > /home/docs/checkouts/readthedocs.org/user_builds/airflow/envs/latest/bin/pip > install --upgrade --cache-dir > /home/docs/checkouts/readthedocs.org/user_builds/airflow/.cache/pip > Pygments==2.2.0 setuptools<40 docutils==0.13.1 mock==1.0.1 pillow==2.6.1 > alabaster>=0.7,<0.8,!=0.7.5 commonmark==0.5.4 recommonmark==0.4.0 sphinx<1.8 > sphinx-rtd-theme<0.5 readthedocs-sphinx-ext<0.6{code} > Here are some logs where you can see that (view raw): > # Latest successful (Aug. 7, 2018. 9:21 a.m.) - > [7602630|https://readthedocs.org/projects/airflow/builds/7602630/] > # Last unsuccessful before (1) (Aug. 5, 2018. 1:24 p.m.) - > [7593052|https://readthedocs.org/projects/airflow/builds/7593052/] > # Last successful before (2) (July 18, 2018. 3:23 a.m.) - > [7503718|https://readthedocs.org/projects/airflow/builds/7503718/] > # First build (2016) - > [4150778|https://readthedocs.org/projects/airflow/builds/4150778/] > It appears that mock and others have potentially been cached since the first > RTD build in 2016 (4). > These versions like mock==1.0.1 do not appear to be coming from anywhere in > our current config in incubator-airflow; I believe they are installed as > [core dependencies of RTD > itself|https://github.com/rtfd/readthedocs.org/blob/ca7afe6577672e129ccfe63abe33561dc32a6651/readthedocs/doc_builder/python_environments.py#L220-L235]. > Some but not all of these dependencies get upgraded to newer versions further > down in the build. In the case of mock, we were getting lucky that > mock==1.0.1 was a dependency of RTD and our setup inherited that old version > which allowed the docs build to succeed. (There might be other cases of > dependencies like this too.) > h2. 
Solution > My proposed enhancements to harden and improve our RTD setup are: > * Hardening > ** Set our RTD build to use a virtual environment if it's not already > ** Set our RTD build to ignore packages outside of its virtualenv like > dependencies of RTD itself > ** Specify any dependencies broken by ^ > ** Test wiping a version in the build environment (not sure if this clears > cache dir) > *** > [https://docs.readthedocs.io/en/latest/guides/wipe-environment.html#wiping-a-build-environment] > *** > [https://docs.readthedocs.io/en/latest/builds.html#deleting-a-stale-or-broken-build-environment] > ** Make build.image, python.version, etc explicit in yaml config > *** [https://docs.readthedocs.io/en/latest/yaml-config.html] > ** Test upgrading our RTD environment from CPython 2.x to using CPython 3.x > * Misc > ** Improve RTD project page to have tags and description > ** Lint YAML file > Note: I don't yet have maintainer access for airflow on RTD which I believe > this would require. I am happy to take this issue if I can get that. I have > experience as an admin of another project on RTD (simple-salesforce). > > /cc Everyone who commented in PR #3703 - [~kaxilnaik], [~ashb], [~TaoFeng] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
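For context on why `mock` shows up as a docs-only dependency at all: a common Sphinx-on-RTD recipe stubs out heavy imports in `conf.py` so autodoc can import modules without installing every backend. A sketch of that well-known pattern (the stubbed module names here are hypothetical, not necessarily what Airflow's `conf.py` does):

```python
# Sketch of the classic Read the Docs / Sphinx conf.py recipe: replace heavy
# or optional dependencies with Mock objects before autodoc imports anything.
import sys

try:
    from mock import MagicMock           # the 'mock' backport package
except ImportError:
    from unittest.mock import MagicMock  # stdlib equivalent on Python 3.3+

# Hypothetical list of modules the docs build should not need installed.
MOCK_MODULES = ["psycopg2", "kubernetes", "pandas_gbq"]
for mod_name in MOCK_MODULES:
    sys.modules[mod_name] = MagicMock()

# Any subsequent `import psycopg2` now resolves to the stub.
import psycopg2
print(isinstance(psycopg2, MagicMock))  # True
```

If RTD's own cached copy of `mock` was supplying this import, the build would indeed "work by coincidence" until the cache changed.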
[jira] [Commented] (AIRFLOW-2871) Harden and improve Read the Docs build environment
[ https://issues.apache.org/jira/browse/AIRFLOW-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572921#comment-16572921 ] Kaxil Naik commented on AIRFLOW-2871: - [~tedmiston] A few pointers: - We are using a virtualenv setup for RTD - Based on the discussion above with Ash, I will also make some distinction between latest vs stable. - I remember I previously wiped out the cache. I will redo that. If there is anything on the list that I am unable to do, I will add you to RTD. > Harden and improve Read the Docs build environment > -- > > Key: AIRFLOW-2871 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2871 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] xnuinside commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines
xnuinside commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines URL: https://github.com/apache/incubator-airflow/pull/3714#discussion_r208514928 ## File path: airflow/contrib/hooks/bigquery_hook.py ## @@ -238,6 +238,8 @@ def create_empty_table(self, :return: """ +if time_partitioning is None: +time_partitioning = dict() Review comment: @kaxil but why `dict()` instead of `{}`? Just want to understand for myself, as I know `{}` is more efficient: https://stackoverflow.com/questions/664118/whats-the-difference-between-dict-and. In the official docs I cannot see any recommendation to use `dict()` instead of `{}` (https://docs.python.org/3.6/library/stdtypes.html#dict), and the standard library uses `{}`, not `dict()`. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] ashb commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines
ashb commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines URL: https://github.com/apache/incubator-airflow/pull/3714#discussion_r208515928 ## File path: airflow/contrib/hooks/bigquery_hook.py ## @@ -238,6 +238,8 @@ def create_empty_table(self, :return: """ +if time_partitioning is None: +time_partitioning = dict() Review comment: `{}` would probably be better in hindsight, but there's not much in it. On my laptop: ``` In [5]: %timeit {} 47.2 ns ± 2.3 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) In [6]: %timeit dict() 168 ns ± 1.77 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) ``` so yes dict() is 3 times as "slow" as `{}`, but neither is particularly slow. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] kaxil commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines
kaxil commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines URL: https://github.com/apache/incubator-airflow/pull/3714#discussion_r208519487 ## File path: airflow/contrib/hooks/bigquery_hook.py ## @@ -238,6 +238,8 @@ def create_empty_table(self, :return: """ +if time_partitioning is None: +time_partitioning = dict() Review comment: @xnuinside Hi, there is no specific reason for me to use `dict()` over `{}`. The point of this change, as I mentioned earlier, was to remove an empty dictionary from the default arguments. And as Ash pointed out, the `{}` dict literal is faster than the `dict()` constructor, but there is no huge difference as the dict builds up. Check out the links below for a more detailed read: - https://doughellmann.com/blog/2012/11/12/the-performance-impact-of-using-dict-instead-of-in-cpython-2-7-2/ - https://stackoverflow.com/questions/6610606/is-there-a-difference-between-using-a-dict-literal-and-a-dict-constructor This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
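The motivation Kaxil mentions (removing a mutable default argument) is easy to demonstrate; the functions below are an illustrative sketch, not the hook's real signature:

```python
# A mutable default like `def f(state={})` is evaluated once, at function
# definition time, so every call that omits the argument shares ONE dict.
def counter_bad(state={}):
    state["calls"] = state.get("calls", 0) + 1
    return state["calls"]

# The None-sentinel pattern used in the PR gives each call a fresh dict.
def counter_good(state=None):
    if state is None:
        state = {}  # or dict(); the difference is a few nanoseconds
    state["calls"] = state.get("calls", 0) + 1
    return state["calls"]

print(counter_bad(), counter_bad())    # 1 2  <- state leaks between calls
print(counter_good(), counter_good())  # 1 1  <- fresh state every call
```

The `{}` vs `dict()` question is a side issue; the shared-state leak above is what the change actually fixes.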
[GitHub] xnuinside commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines
xnuinside commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines URL: https://github.com/apache/incubator-airflow/pull/3714#discussion_r208520636 ## File path: airflow/contrib/hooks/bigquery_hook.py ## @@ -238,6 +238,8 @@ def create_empty_table(self, :return: """ +if time_partitioning is None: +time_partitioning = dict() Review comment: @kaxil , @ashb , thx! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (AIRFLOW-2874) Enable Flask App Builder theme support
[ https://issues.apache.org/jira/browse/AIRFLOW-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Verdan Mahmood updated AIRFLOW-2874: Description: To customize the look and feel of Apache Airflow (an effort towards making Airflow a white-label application), we should enable support for FAB's themes, which can be set in the configuration. The theme can be used in conjunction with the existing `navbar_color` configuration, or on its own by simply unsetting the navbar_color config. http://flask-appbuilder.readthedocs.io/en/latest/customizing.html#changing-themes was: To customize the look and feel of Apache Airflow, we should enable support for FAB's themes, which can be set in the configuration. The theme can be used in conjunction with the existing `navbar_color` configuration, or on its own by simply unsetting the navbar_color config. > Enable Flask App Builder theme support > -- > > Key: AIRFLOW-2874 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2874 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Verdan Mahmood >Assignee: Verdan Mahmood >Priority: Major > > To customize the look and feel of Apache Airflow (an effort towards making > Airflow a white-label application), we should enable support for FAB's > themes, which can be set in the configuration. > The theme can be used in conjunction with the existing `navbar_color` > configuration, or on its own by simply unsetting the navbar_color config. > > http://flask-appbuilder.readthedocs.io/en/latest/customizing.html#changing-themes -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2874) Enable Flask App Builder theme support
Verdan Mahmood created AIRFLOW-2874: --- Summary: Enable Flask App Builder theme support Key: AIRFLOW-2874 URL: https://issues.apache.org/jira/browse/AIRFLOW-2874 Project: Apache Airflow Issue Type: Improvement Reporter: Verdan Mahmood Assignee: Verdan Mahmood To customize the look and feel of Apache Airflow, we should enable support for FAB's themes, which can be set in the configuration. The theme can be used in conjunction with the existing `navbar_color` configuration, or on its own by simply unsetting the navbar_color config. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
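Per the linked FAB customizing docs, Flask-AppBuilder selects its theme from a single app-config key; how Airflow exposes that is exactly what this issue proposes, so treat the snippet below as an illustrative sketch rather than the final wiring:

```python
# Hypothetical Flask-AppBuilder configuration fragment (see the linked
# "Changing themes" docs): FAB reads the theme from the app config.
APP_THEME = "cerulean.css"  # one of FAB's bundled Bootswatch themes

# Leaving the theme unset/empty falls back to the default look, which is
# where the existing navbar_color setting would apply on its own.
```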
[jira] [Commented] (AIRFLOW-2871) Harden and improve Read the Docs build environment
[ https://issues.apache.org/jira/browse/AIRFLOW-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572898#comment-16572898 ] Kaxil Naik commented on AIRFLOW-2871: - We have also listed this at our Confluence page at https://cwiki.apache.org/confluence/display/AIRFLOW/Building+and+deploying+the+docs >The site for the Airflow documentation that used to be located at >pythonhosted.org is now located at http://airflow.incubator.apache.org/ . This >should point to latest stable, while readthedocs.org keeps track of versioned >documentation. > Harden and improve Read the Docs build environment > -- > > Key: AIRFLOW-2871 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2871
[jira] [Commented] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance
[ https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572906#comment-16572906 ] Bolke de Bruin commented on AIRFLOW-2870: - or use with_entities, trying that > Migrations fail when upgrading from below > cc1e65623dc7_add_max_tries_column_to_task_instance > > > Key: AIRFLOW-2870 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2870 > Project: Apache Airflow > Issue Type: Bug >Reporter: George Leslie-Waksman >Priority: Blocker > > Running migrations from below > cc1e65623dc7_add_max_tries_column_to_task_instance.py fail with: > {noformat} > INFO [alembic.runtime.migration] Context impl PostgresqlImpl. > INFO [alembic.runtime.migration] Will assume transactional DDL. > INFO [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> > cc1e65623dc7, add max tries column to task instance > Traceback (most recent call last): > File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", > line 1182, in _execute_context > context) > File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", > line 470, in do_execute > cursor.execute(statement, parameters) > psycopg2.ProgrammingError: column task_instance.executor_config does not exist > LINE 1: ...ued_dttm, task_instance.pid AS task_instance_pid, task_insta... > {noformat} > The failure is occurring because > cc1e65623dc7_add_max_tries_column_to_task_instance.py imports TaskInstance > from the current code version, which has changes to the task_instance table > that are not expected by the migration. > Specifically, 27c6a30d7c24_add_executor_config_to_task_instance.py adds an > executor_config column that does not exist as of when > cc1e65623dc7_add_max_tries_column_to_task_instance.py is run. > It is worth noting that this will not be observed for new installs because > the migration branches on table existence/non-existence at a point that will > hide the issue from new installs. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
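The shape of the fix being discussed (whether via `with_entities` or a locally defined table) is to have the migration touch only columns that exist at that point in the chain, instead of importing the current `TaskInstance` model. A minimal stdlib sketch of the idea, with an illustrative schema:

```python
import sqlite3

# The task_instance table as it exists *at this migration* -- note there is
# no executor_config column yet; that is added by a later migration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_instance (task_id TEXT, try_number INTEGER)")
conn.execute("INSERT INTO task_instance VALUES ('t1', 2)")

# Importing the current model and doing the ORM equivalent of SELECT * would
# reference executor_config and fail, exactly as in the traceback above.
# Selecting only the columns the migration needs sidesteps the mismatch.
rows = conn.execute(
    "SELECT task_id, try_number FROM task_instance"
).fetchall()
print(rows)  # [('t1', 2)]
```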
[jira] [Comment Edited] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance
[ https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572906#comment-16572906 ] Bolke de Bruin edited comment on AIRFLOW-2870 at 8/8/18 9:05 AM: - or use with_entities, trying that. It's a very annoying migration. DagBags can be huge. was (Author: bolke): or use with_entities, trying that > Migrations fail when upgrading from below > cc1e65623dc7_add_max_tries_column_to_task_instance > -- > > Key: AIRFLOW-2870 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2870 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2873) Improvements to Quick Start flow
G. Geijteman created AIRFLOW-2873: - Summary: Improvements to Quick Start flow Key: AIRFLOW-2873 URL: https://issues.apache.org/jira/browse/AIRFLOW-2873 Project: Apache Airflow Issue Type: Improvement Components: configuration Affects Versions: Airflow 1.9.0 Reporter: G. Geijteman Thank you for developing Airflow! Having run through the Quick Start, I've come across two issues that I would like to highlight: {code:java} bash-3.2$ cd ~/project/airflow/ bash-3.2$ export AIRFLOW_HOME=~/project/airflow bash-3.2$ python3 -m venv $AIRFLOW_HOME/venv bash-3.2$ source venv/bin/activate (venv) bash-3.2$ pip install --upgrade pip (venv) bash-3.2$ uname -a Darwin mac.local 17.7.0 Darwin Kernel Version 17.7.0: Thu Jun 21 22:53:14 PDT 2018; root:xnu-4570.71.2~1/RELEASE_X86_64 x86_64 (venv) bash-3.2$ python -V Python 3.6.5 (venv) bash-3.2$ pip -V pip 18.0 from ~/project/airflow/venv/lib/python3.6/site-packages/pip (python 3.6) (venv) bash-3.2$ pip install apache-airflow[redis,postgres] -U {code} Results in: {code:java} During handling of the above exception, another exception occurred: Traceback (most recent call last): File "~/project/airflow/venv/lib/python3.6/site-packages/airflow/models.py", line 639, in set_extra fernet = get_fernet() File "~/project/airflow/venv/lib/python3.6/site-packages/airflow/models.py", line 103, in get_fernet raise AirflowException('Failed to import Fernet, it may not be installed') airflow.exceptions.AirflowException: Failed to import Fernet, it may not be installed [2018-08-08 10:49:01,121]{models.py:643}ERROR - Failed to load fernet while encrypting value, using non-encrypted value. 
Traceback (most recent call last): File "~/project/airflow/venv/lib/python3.6/site-packages/airflow/models.py", line 101, in get_fernet from cryptography.fernet import Fernet ModuleNotFoundError: No module named 'cryptography'{code} This is solved by: {code:java} (venv) bash-3.2$ pip install cryptography{code} *Proposed fix:* _Include the `cryptography` package_ Having fixed that, the following issue occurs when trying to: {code:java} (venv) bash-3.2$ airflow initdb{code} Excerpt: {code:java} During handling of the above exception, another exception occurred: Traceback (most recent call last): File "~/project/airflow/venv/lib/python3.6/site-packages/airflow/models.py", line 639, in set_extra fernet = get_fernet() File "~/project/airflow/venv/lib/python3.6/site-packages/airflow/models.py", line 107, in get_fernet raise AirflowException("Could not create Fernet object: {}".format(ve)) airflow.exceptions.AirflowException: Could not create Fernet object: Incorrect padding [2018-08-08 10:50:50,697] {models.py:643} ERROR - Failed to load fernet while encrypting value, using non-encrypted value. Traceback (most recent call last): File "~/project/airflow/venv/lib/python3.6/site-packages/airflow/models.py", line 105, in get_fernet return Fernet(configuration.get('core', 'FERNET_KEY').encode('utf-8')) File "~/project/airflow/venv/lib/python3.6/site-packages/cryptography/fernet.py", line 34, in __init__ key = base64.urlsafe_b64decode(key) File "/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/base64.py", line 133, in urlsafe_b64decode return b64decode(s) File "/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/base64.py", line 87, in b64decode return binascii.a2b_base64(s) binascii.Error: Incorrect padding{code} Which after some googling leads to the conclusion that the ~/project/airflow/airflow.cfg fernet_key field is not set to the correct value. 
*Feature request:* _Have the setup automatically generate a valid fernet key for the user._ The fact that this page exists: [https://bcb.github.io/airflow/fernet-key] suggests this could easily be a part of the package. I understand that this project is in the incubator phase, but I would say that a quick start that does not work as-is will discourage users from trying out this project. Thank you for considering. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
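The linked fernet-key page boils down to a one-liner; a sketch of generating a valid key, with a stdlib fallback for when `cryptography` is absent. A Fernet key is, by construction, 32 random bytes urlsafe-base64-encoded, which is why a hand-edited or truncated value raises the "Incorrect padding" error above:

```python
import base64
import os

try:
    from cryptography.fernet import Fernet
    key = Fernet.generate_key()
except ImportError:
    # Equivalent by construction: 32 urlsafe-base64-encoded random bytes.
    key = base64.urlsafe_b64encode(os.urandom(32))

# Paste this 44-character value into airflow.cfg's fernet_key field.
print(key.decode())
```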
[jira] [Commented] (AIRFLOW-2873) Improvements to Quick Start flow
[ https://issues.apache.org/jira/browse/AIRFLOW-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572914#comment-16572914 ] Ash Berlin-Taylor commented on AIRFLOW-2873: Thanks for reporting this. An easy quickstart procedure is definitely something we value! The hard-dependency on cryptography should have been removed in master/1.10.0rc3 The padding issue might be as well. Could you try again after installing this? {code:bash} AIRFLOW_GPL_UNIDECODE=yes pip install 'https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow>=1.10' {code} > Improvements to Quick Start flow > > > Key: AIRFLOW-2873 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2873 > Project: Apache Airflow > Issue Type: Improvement > Components: configuration >Affects Versions: Airflow 1.9.0 >Reporter: G. Geijteman >Priority: Major > > Thank you for developing Airflow! > Having ran through the [Quick > Start|https://airflow.incubator.apache.org/start.html], i've come across two > issues that I would like to highlight: > {code:java} > bash-3.2$ cd ~/project/airflow/ > bash-3.2$ export AIRFLOW_HOME=~/project/airflow > bash-3.2$ python3 -m venv $AIRFLOW_HOME/venv > bash-3.2$ source venv/bin/activate > (venv) bash-3.2$ pip install --upgrade pip > (venv) bash-3.2$ uname -a Darwin mac.local 17.7.0 Darwin Kernel Version > 17.7.0: Thu Jun 21 22:53:14 PDT 2018; root:xnu-4570.71.2~1/RELEASE_X86_64 > x86_64 > (venv) bash-3.2$ python -V > Python 3.6.5 > (venv) bash-3.2$ pip -V > pip 18.0 from ~/project/airflow/venv/lib/python3.6/site-packages/pip (python > 3.6) > (venv) bash-3.2$ pip install apache-airflow[redis,postgres] -U {code} > Results in: > {code:java} > During handling of the above exception, another exception occurred:Traceback > (most recent call last): > File "~/project/airflow/venv/lib/python3.6/site-packages/airflow/models.py", > line 639, in set_extra > fernet = get_fernet() > File 
"~/project/airflow/venv/lib/python3.6/site-packages/airflow/models.py", > line 103, in get_fernet > raise AirflowException('Failed to import Fernet, it may not be installed') > airflow.exceptions.AirflowException: Failed to import Fernet, it may not be > installed > [2018-08-08 10:49:01,121]{models.py:643}ERROR - Failed to load fernet while > encrypting value, using non-encrypted value. > Traceback (most recent call last): > File "~/project/airflow/venv/lib/python3.6/site-packages/airflow/models.py", > line 101, in get_fernet > from cryptography.fernet import Fernet > ModuleNotFoundError: No module named 'cryptography'{code} > This is solved by: > {code:java} > (venv) bash-3.2$ pip install cryptography{code} > *Proposed fix:* > _Include the `cryptography` package in the setup / package requirements_ > > Having fixed that, the following issue occurs when trying to: > {code:java} > (venv) bash-3.2$ airflow initdb{code} > Exempt: > {code:java} > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File "~/project/airflow/venv/lib/python3.6/site-packages/airflow/models.py", > line 639, in set_extra > fernet = get_fernet() > File "~/project/airflow/venv/lib/python3.6/site-packages/airflow/models.py", > line 107, in get_fernet > raise AirflowException("Could not create Fernet object: {}".format(ve)) > airflow.exceptions.AirflowException: Could not create Fernet object: > Incorrect padding > [2018-08-08 10:50:50,697] > {models.py:643} > ERROR - Failed to load fernet while encrypting value, using non-encrypted > value. 
> Traceback (most recent call last): > File "~/project/airflow/venv/lib/python3.6/site-packages/airflow/models.py", > line 105, in get_fernet > return Fernet(configuration.get('core', 'FERNET_KEY').encode('utf-8')) > File > "~/project/airflow/venv/lib/python3.6/site-packages/cryptography/fernet.py", > line 34, in __init__ > key = base64.urlsafe_b64decode(key) > File > "/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/base64.py", > line 133, in urlsafe_b64decode > return b64decode(s) > File > "/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/base64.py", > line 87, in b64decode > return binascii.a2b_base64(s) > binascii.Error: Incorrect padding{code} > Which after some googling leads to the conclusion that the > ~/project/airflow/airflow.cfg fernet_key field is not set to the correct > value. > *Feature request:* > _Have the setup automatically generate a valid fernet key for the user._ > The fact that this page exists: [https://bcb.github.io/airflow/fernet-key] > suggests this could easily be a part of the package. > I understand that this project is in incubator phase, but I would say having > a quick start that
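On the fernet feature request above: a valid Fernet key is simply 32 random bytes, url-safe base64 encoded, which is what `cryptography`'s `Fernet.generate_key()` returns. A stdlib-only sketch of generating one (illustrative only; the proposed setup integration is not implemented here):

```python
import base64
import os

def generate_fernet_key() -> bytes:
    # A Fernet key is 32 random bytes, url-safe base64 encoded (44 ASCII chars).
    # cryptography's Fernet.generate_key() does exactly this.
    return base64.urlsafe_b64encode(os.urandom(32))

key = generate_fernet_key()
print(len(key))  # 44
```

Pasting such a value into the `fernet_key` field of `airflow.cfg` is what avoids the "Incorrect padding" error above, which comes from a malformed or placeholder key.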
[jira] [Commented] (AIRFLOW-2873) Improvements to Quick Start flow
[ https://issues.apache.org/jira/browse/AIRFLOW-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572924#comment-16572924 ] G. Geijteman commented on AIRFLOW-2873: --- [~ashb] Thank you, I could now run the installation phase without a problem. I suppose the Quick start will be fixed when 1.10 is out of RC then. One small thing, I do get a warning when I try: {code:java} airflow initdb{code} Results in: {code:java} ~/project/airflow/venv/lib/python3.6/site-packages/airflow/models.py:154: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead exc_info=1) WARNI [airflow.utils.log.logging_mixin.LoggingMixin] cryptography not found - values will not be stored encrypted. Traceback (most recent call last): File "~/project/airflow/venv/lib/python3.6/site-packages/airflow/models.py", line 147, in get_fernet from cryptography.fernet import Fernet, InvalidToken ModuleNotFoundError: No module named 'cryptography' WARNI [airflow.utils.log.logging_mixin.LoggingMixin] Could not import KubernetesPodOperator: No module named 'kubernetes' WARNI [airflow.utils.log.logging_mixin.LoggingMixin] Install kubernetes dependencies with: pip install airflow['kubernetes']{code} It's cool to see Kubernetes is supported, but it now seems a requirement for the database init. That seems a little silly. I suppose that's still work in progress on the master? > Improvements to Quick Start flow > > > Key: AIRFLOW-2873 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2873 > Project: Apache Airflow > Issue Type: Improvement > Components: configuration >Affects Versions: Airflow 1.9.0 >Reporter: G. Geijteman >Priority: Major
[GitHub] xnuinside edited a comment on issue #3717: [AIRFLOW-1874] use_legacy_sql added to BigQueryCheck operators
xnuinside edited a comment on issue #3717: [AIRFLOW-1874] use_legacy_sql added to BigQueryCheck operators URL: https://github.com/apache/incubator-airflow/pull/3717#issuecomment-411349555 @kaxil, yeah, sure. The first time I had done it the way you suggested, but then I found a closed PR citing https://issues.apache.org/jira/browse/AIRFLOW-559, was confused, and made changes. Anyway, done. And thanks for the review! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Comment Edited] (AIRFLOW-2416) executor_config column in task_instance isn't getting created
[ https://issues.apache.org/jira/browse/AIRFLOW-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572886#comment-16572886 ] John Cheng edited comment on AIRFLOW-2416 at 8/8/18 8:42 AM: - It's not fixed completely. I tried to change the executor_config in my DAG and cleared the task status. However, the new task instance didn't pick up the new value. I looked into the DB and found that task_instance.executor_config is not cleared when I clear the task status. was (Author: johnchenghk01): It's not fixed yet. I tried to change the executor_config in my DAG and cleared the task status. However, the new task instance didn't pick up the new value. I looked into the DB and found that task_instance.executor_config is not cleared when I clear the task status. > executor_config column in task_instance isn't getting created > - > > Key: AIRFLOW-2416 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2416 > Project: Apache Airflow > Issue Type: Bug > Components: db >Affects Versions: 1.9.0 > Environment: Running on a mac (System Version: OS X 10.11.6 > (15G19009)) dev environment with Python 3.6. >Reporter: Curtis Deems >Assignee: Cameron Moberg >Priority: Major > > There's a new column called 'executor_config' in the 'task_instance' table > that the scheduler is attempting to query. The column isn't created with > initdb or upgradedb, so the scheduler just loops and never picks up any dag > objects. The only way I discovered this was to run the scheduler through the > debugger and review the exceptions thrown by the scheduler. This issue > doesn't show up in the scheduler logs or any other output that I could see. > The workaround is to create the column manually, but since the root issue is > not easily discoverable this could be a blocker for some people. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-2871) Harden and improve Read the Docs build environment
[ https://issues.apache.org/jira/browse/AIRFLOW-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kaxil Naik updated AIRFLOW-2871: Attachment: screenshot-2.png > Harden and improve Read the Docs build environment > -- > > Key: AIRFLOW-2871 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2871 > Project: Apache Airflow > Issue Type: Bug > Components: docs, Documentation >Reporter: Taylor Edmiston >Assignee: Taylor Edmiston >Priority: Major > Attachments: screenshot-1.png, screenshot-2.png > > > h2. Context > In the process of resolving AIRFLOW-2857 (via [PR > 3703|https://github.com/apache/incubator-airflow/pull/3703]), I noticed some > oddities in our Read the Docs (RTD) build environment especially around > cached dependencies. This motivates hardening and showing some love to our > RTD setup. > h2. Problem > I dug into the RTD build logs for a moment to find some closure on the mock > dependency discussed in PR #3703 above. I think that our RTD environment > possibly has been working by coincidence off of cached dependencies. > {code:java} > python > /home/docs/checkouts/readthedocs.org/user_builds/airflow/envs/latest/bin/pip > install --ignore-installed --cache-dir > /home/docs/checkouts/readthedocs.org/user_builds/airflow/.cache/pip > .[doc,docker,gcp_api,emr]{code} > The directory referenced by that --cache-dir arg earlier in the log happens > to have mock installed already. > {code:java} > python > /home/docs/checkouts/readthedocs.org/user_builds/airflow/envs/latest/bin/pip > install --upgrade --cache-dir > /home/docs/checkouts/readthedocs.org/user_builds/airflow/.cache/pip > Pygments==2.2.0 setuptools<40 docutils==0.13.1 mock==1.0.1 pillow==2.6.1 > alabaster>=0.7,<0.8,!=0.7.5 commonmark==0.5.4 recommonmark==0.4.0 sphinx<1.8 > sphinx-rtd-theme<0.5 readthedocs-sphinx-ext<0.6{code} > Here are some logs where you can see that (view raw): > # Latest successful (Aug. 7, 2018. 9:21 a.m.) 
- > [7602630|https://readthedocs.org/projects/airflow/builds/7602630/] > # Last unsuccessful before (1) (Aug. 5, 2018. 1:24 p.m.) - > [7593052|https://readthedocs.org/projects/airflow/builds/7593052/] > # Last successful before (2) (July 18, 2018. 3:23 a.m.) - > [7503718|https://readthedocs.org/projects/airflow/builds/7503718/] > # First build (2016) - > [4150778|https://readthedocs.org/projects/airflow/builds/4150778/] > It appears that mock and others have potentially been cached since the first > RTD build in 2016 (4). > These versions like mock==1.0.1 do not appear to be coming from anywhere in > our current config in incubator-airflow; I believe they are installed as > [core dependencies of RTD > itself|https://github.com/rtfd/readthedocs.org/blob/ca7afe6577672e129ccfe63abe33561dc32a6651/readthedocs/doc_builder/python_environments.py#L220-L235]. > Some but not all of these dependencies get upgraded to newer versions further > down in the build. In the case of mock, we were getting lucky that > mock==1.0.1 was a dependency of RTD and our setup inherited that old version > which allowed the docs build to succeed. (There might be other cases of > dependencies like this too.) > h2. 
Solution > My proposed enhancements to harden and improve our RTD setup are: > * Hardening > ** Set our RTD build to use a virtual environment if it's not already > ** Set our RTD build to ignore packages outside of its virtualenv like > dependencies of RTD itself > ** Specify any dependencies broken by ^ > ** Test wiping a version in the build environment (not sure if this clears > cache dir) > *** > [https://docs.readthedocs.io/en/latest/guides/wipe-environment.html#wiping-a-build-environment] > *** > [https://docs.readthedocs.io/en/latest/builds.html#deleting-a-stale-or-broken-build-environment] > ** Make build.image, python.version, etc explicit in yaml config > *** [https://docs.readthedocs.io/en/latest/yaml-config.html] > ** Test upgrading our RTD environment from CPython 2.x to using CPython 3.x > * Misc > ** Improve RTD project page to have tags and description > ** Lint YAML file > Note: I don't yet have maintainer access for airflow on RTD which I believe > this would require. I am happy to take this issue if I can get that. I have > experience as an admin of another project on RTD (simple-salesforce). > > /cc Everyone who commented in PR #3703 - [~kaxilnaik], [~ashb], [~TaoFeng] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
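To make the "explicit yaml config" bullet in the solution above concrete, here is a hypothetical `readthedocs.yml` along the lines of the yaml-config docs linked there. The key names follow RTD's v1 config schema; the exact values for this repo would be decided in the PR, so treat this as a sketch, not the committed config:

```yaml
# readthedocs.yml (illustrative sketch, not the committed config)
build:
  image: latest            # pin the build image explicitly
python:
  version: 3.6             # move off the implicit CPython 2.x default
  use_system_site_packages: false  # isolate the docs virtualenv from RTD's own deps
  pip_install: true
  extra_requirements:
    - doc                  # pip install .[doc]
```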
[jira] [Updated] (AIRFLOW-2871) Harden and improve Read the Docs build environment
[ https://issues.apache.org/jira/browse/AIRFLOW-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kaxil Naik updated AIRFLOW-2871: Attachment: screenshot-1.png > Harden and improve Read the Docs build environment > -- > > Key: AIRFLOW-2871 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2871 > Project: Apache Airflow > Issue Type: Bug > Components: docs, Documentation >Reporter: Taylor Edmiston >Assignee: Taylor Edmiston >Priority: Major > Attachments: screenshot-1.png, screenshot-2.png
[jira] [Commented] (AIRFLOW-2871) Harden and improve Read the Docs build environment
[ https://issues.apache.org/jira/browse/AIRFLOW-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572895#comment-16572895 ] Kaxil Naik commented on AIRFLOW-2871: - [~ashb] We had documented this at https://github.com/apache/incubator-airflow#getting-started We also have docs for 1.9 release as shown in the image below: !screenshot-2.png! > Harden and improve Read the Docs build environment > -- > > Key: AIRFLOW-2871 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2871 > Project: Apache Airflow > Issue Type: Bug > Components: docs, Documentation >Reporter: Taylor Edmiston >Assignee: Taylor Edmiston >Priority: Major > Attachments: screenshot-1.png, screenshot-2.png
[GitHub] verdan opened a new pull request #3719: [AIRFLOW-2874] Enables FAB's theme support
verdan opened a new pull request #3719: [AIRFLOW-2874] Enables FAB's theme support URL: https://github.com/apache/incubator-airflow/pull/3719 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Airflow 2874](https://issues.apache.org/jira/browse/AIRFLOW-2874) issues and references them in the PR title. ### Description - [ ] Here are some details about my PR, including screenshots of any UI changes: In an effort to make Apache Airflow a white label application, we should enable the FAB's theme support. Users can then easily change the theme according to their needs. There already exists a configuration to set the Navbar Color, that can be used in conjunction with the theme option, or can simply be unset to fully utilize the theme. List of available themes are available here: https://github.com/dpgaspar/Flask-AppBuilder-Skeleton/blob/master/config.py#L88 _Note: Some of the dark themes might not work properly because of the charts that we are using at the moment._ Here are some of the screenshots from different themes: ![screen shot 2018-08-08 at 11 41 17 am](https://user-images.githubusercontent.com/25360476/43831843-bb6705d4-9b05-11e8-85cc-a818fb48c77a.png) ![screen shot 2018-08-08 at 11 42 05 am](https://user-images.githubusercontent.com/25360476/43831844-bb812356-9b05-11e8-8429-3b06c8cf1a7a.png) ![screen shot 2018-08-08 at 11 42 48 am](https://user-images.githubusercontent.com/25360476/43831846-bb9b7ada-9b05-11e8-9396-b8c9a67d84fa.png) ![screen shot 2018-08-08 at 11 43 20 am](https://user-images.githubusercontent.com/25360476/43831847-bbb150da-9b05-11e8-98e3-29f32d0b71e1.png) ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. 
In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
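For context on how little configuration FAB theme support needs: in Flask-AppBuilder a theme is selected with a single `APP_THEME` config key. A hedged sketch of what enabling one might look like (whether and where Airflow exposes this key is exactly what this PR decides):

```python
# Illustrative Flask-AppBuilder config fragment.
# APP_THEME selects one of the Bootswatch stylesheets bundled with FAB;
# an empty string falls back to the default look.
APP_THEME = "cerulean.css"

print(APP_THEME)  # cerulean.css
```

As the PR notes, the existing navbar color setting can be used alongside the theme, or left unset to let the theme take over fully.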
[jira] [Commented] (AIRFLOW-2874) Enable Flask App Builder theme support
[ https://issues.apache.org/jira/browse/AIRFLOW-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572989#comment-16572989 ] ASF GitHub Bot commented on AIRFLOW-2874: - verdan opened a new pull request #3719: [AIRFLOW-2874] Enables FAB's theme support URL: https://github.com/apache/incubator-airflow/pull/3719 > Enable Flask App Builder theme support > -- > > Key: AIRFLOW-2874 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2874 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Verdan Mahmood >Assignee: Verdan Mahmood >Priority: Major > > To customize the look and feel of Apache Airflow (an effort towards making > Airflow a whitelabel application), we should enable the support of FAB's > theme, which can be set in configuration. > The theme can be used in conjunction with the existing `navbar_color` configuration or > can be used separately by simply unsetting the navbar_color config. > > http://flask-appbuilder.readthedocs.io/en/latest/customizing.html#changing-themes
[GitHub] xnuinside commented on issue #3717: [AIRFLOW-1874] use_legacy_sql added to BigQueryCheck operators
xnuinside commented on issue #3717: [AIRFLOW-1874] use_legacy_sql added to BigQueryCheck operators URL: https://github.com/apache/incubator-airflow/pull/3717#issuecomment-411360602 Tests failed with: ERROR [airflow.models.DagBag] Failed to bag_dag: /home/travis/build/apache/incubator-airflow/tests/dags/test_zip_invalid_cron.zip It seems like the tests on master are broken.
[GitHub] bolkedebruin closed pull request #3708: [AIRFLOW-2859] Implement own UtcDateTime
bolkedebruin closed pull request #3708: [AIRFLOW-2859] Implement own UtcDateTime URL: https://github.com/apache/incubator-airflow/pull/3708
This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/airflow/bin/cli.py b/airflow/bin/cli.py
index e2001789d9..45b7903d3e 100644
--- a/airflow/bin/cli.py
+++ b/airflow/bin/cli.py
@@ -1007,7 +1007,6 @@ def initdb(args):  # noqa
     print("Done.")


-@cli_utils.action_logging
 def resetdb(args):
     print("DB: " + repr(settings.engine.url))
     if args.yes or input("This will drop existing tables "
diff --git a/airflow/jobs.py b/airflow/jobs.py
index cc26feee53..e8ba437e0b 100644
--- a/airflow/jobs.py
+++ b/airflow/jobs.py
@@ -40,7 +40,6 @@
     Column, Integer, String, func, Index, or_, and_, not_)
 from sqlalchemy.exc import OperationalError
 from sqlalchemy.orm.session import make_transient
-from sqlalchemy_utc import UtcDateTime
 from tabulate import tabulate
 from time import sleep

@@ -52,6 +51,7 @@
 from airflow.task.task_runner import get_task_runner
 from airflow.ti_deps.dep_context import DepContext, QUEUE_DEPS, RUN_DEPS
 from airflow.utils import asciiart, helpers, timezone
+from airflow.utils.configuration import tmp_configuration_copy
 from airflow.utils.dag_processing import (AbstractDagFileProcessor,
                                           DagFileProcessorManager,
                                           SimpleDag,
@@ -60,9 +60,9 @@
 from airflow.utils.db import create_session, provide_session
 from airflow.utils.email import send_email
 from airflow.utils.log.logging_mixin import LoggingMixin, set_context, StreamLogWriter
-from airflow.utils.state import State
-from airflow.utils.configuration import tmp_configuration_copy
 from airflow.utils.net import get_hostname
+from airflow.utils.state import State
+from airflow.utils.sqlalchemy import UtcDateTime

 Base = models.Base
 ID_LEN = models.ID_LEN
diff --git a/airflow/models.py b/airflow/models.py
index 288bd4c937..e7d38ebd65 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -60,7 +60,6 @@
 from sqlalchemy import func, or_, and_, true as sqltrue
 from sqlalchemy.ext.declarative import declarative_base, declared_attr
 from sqlalchemy.orm import reconstructor, relationship, synonym
-from sqlalchemy_utc import UtcDateTime

 from croniter import croniter
 import six
@@ -88,6 +87,7 @@
     as_tuple, is_container, validate_key, pprinttable)
 from airflow.utils.operator_resources import Resources
 from airflow.utils.state import State
+from airflow.utils.sqlalchemy import UtcDateTime
 from airflow.utils.timeout import timeout
 from airflow.utils.trigger_rule import TriggerRule
 from airflow.utils.weight_rule import WeightRule
diff --git a/airflow/utils/sqlalchemy.py b/airflow/utils/sqlalchemy.py
index baddd9dcf1..76c112785f 100644
--- a/airflow/utils/sqlalchemy.py
+++ b/airflow/utils/sqlalchemy.py
@@ -22,15 +22,19 @@
 from __future__ import print_function
 from __future__ import unicode_literals

+import datetime
 import os
+import pendulum
 import time
 import random

 from sqlalchemy import event, exc, select
+from sqlalchemy.types import DateTime, TypeDecorator

 from airflow.utils.log.logging_mixin import LoggingMixin

 log = LoggingMixin().log
+utc = pendulum.timezone('UTC')


 def setup_event_handlers(
@@ -101,13 +105,21 @@ def ping_connection(connection, branch):
     def connect(dbapi_connection, connection_record):
         connection_record.info['pid'] = os.getpid()

-    @event.listens_for(engine, "connect")
-    def set_sqlite_pragma(dbapi_connection, connection_record):
-        if 'sqlite3.Connection' in str(type(dbapi_connection)):
+    if engine.dialect.name == "sqlite":
+        @event.listens_for(engine, "connect")
+        def set_sqlite_pragma(dbapi_connection, connection_record):
             cursor = dbapi_connection.cursor()
             cursor.execute("PRAGMA foreign_keys=ON")
             cursor.close()

+    # this ensures sanity in mysql when storing datetimes (not required for postgres)
+    if engine.dialect.name == "mysql":
+        @event.listens_for(engine, "connect")
+        def set_mysql_timezone(dbapi_connection, connection_record):
+            cursor = dbapi_connection.cursor()
+            cursor.execute("SET time_zone = '+00:00'")
+            cursor.close()
+
     @event.listens_for(engine, "checkout")
     def checkout(dbapi_connection, connection_record, connection_proxy):
         pid = os.getpid()
@@ -117,3 +129,46 @@ def checkout(dbapi_connection, connection_record, connection_proxy):
             "Connection record belongs to pid {}, "
             "attempting to check out in pid {}".format(connection_record.info['pid'], pid)
         )
+
+
+class UtcDateTime(TypeDecorator):
+    """
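The diff is cut off just as the new UtcDateTime TypeDecorator begins, but its essential behavior, per the commit message, is to reject naive datetimes and genuinely convert aware ones to UTC rather than only replacing tzinfo. A minimal stdlib sketch of that conversion rule (pendulum and SQLAlchemy omitted; the function name to_utc is illustrative, not Airflow's API):

```python
from datetime import datetime, timezone, timedelta

def to_utc(value):
    # Bind-side rule sketched from the commit message: naive datetimes are
    # rejected, aware datetimes are converted (not merely relabeled) to UTC.
    if value is None:
        return None
    if value.tzinfo is None:
        raise ValueError("naive datetime is not allowed")
    return value.astimezone(timezone.utc)

# 12:00 at UTC+02:00 is the instant 10:00 UTC; the offset is applied, not dropped.
cest = timezone(timedelta(hours=2))
print(to_utc(datetime(2018, 8, 8, 12, 0, tzinfo=cest)))  # 2018-08-08 10:00:00+00:00
```

The buggy implementations the commit message complains about are the ones that would do `value.replace(tzinfo=utc)` here, silently relabeling 12:00+02:00 as 12:00 UTC.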
[jira] [Commented] (AIRFLOW-2859) DateTimes returned from the database are not converted to UTC
[ https://issues.apache.org/jira/browse/AIRFLOW-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572715#comment-16572715 ] ASF GitHub Bot commented on AIRFLOW-2859: - bolkedebruin closed pull request #3708: [AIRFLOW-2859] Implement own UtcDateTime URL: https://github.com/apache/incubator-airflow/pull/3708
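The #3708 diff gates the connect-event listeners by dialect, so the SQLite PRAGMA is only registered on SQLite engines. What the listener does on each new connection can be replayed directly on a raw DB-API connection, without SQLAlchemy:

```python
import sqlite3

# Replaying the body of set_sqlite_pragma from the diff on a plain
# DB-API connection:
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("PRAGMA foreign_keys=ON")
cursor.close()

# Foreign-key enforcement is now on for this connection
# (it is off by default in SQLite).
print(conn.execute("PRAGMA foreign_keys").fetchone()[0])  # 1
```

The MySQL branch follows the same pattern, issuing `SET time_zone = '+00:00'` per connection instead of the PRAGMA.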
[jira] [Commented] (AIRFLOW-2859) DateTimes returned from the database are not converted to UTC
[ https://issues.apache.org/jira/browse/AIRFLOW-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572716#comment-16572716 ] ASF subversion and git services commented on AIRFLOW-2859: -- Commit 6fd4e6055e36e9867923b0b402363fcd8c30e297 in incubator-airflow's branch refs/heads/master from bolkedebruin [ https://gitbox.apache.org/repos/asf?p=incubator-airflow.git;h=6fd4e60 ] [AIRFLOW-2859] Implement own UtcDateTime (#3708) The different UtcDateTime implementations all have issues. Either they replace tzinfo directly without converting or they do not convert to UTC at all. We also ensure all mysql connections are in UTC in order to keep sanity, as mysql will ignore the timezone of a field when inserting/updating. > DateTimes returned from the database are not converted to UTC > - > > Key: AIRFLOW-2859 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2859 > Project: Apache Airflow > Issue Type: Bug > Components: database >Reporter: Bolke de Bruin >Priority: Blocker > Fix For: 1.10.0 > > > This is due to the fact that sqlalchemy-utcdatetime does not convert to UTC > when the database returns datetimes with tzinfo. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
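The commit message notes that MySQL ignores a value's timezone on insert/update, so two aware datetimes naming the same instant could be stored as different naive values unless everything is converted to UTC first. A quick stdlib illustration of the point (no database involved):

```python
from datetime import datetime, timezone, timedelta

paris_summer = timezone(timedelta(hours=2))  # illustrative fixed offset
a = datetime(2018, 8, 8, 12, 0, tzinfo=paris_summer)
b = datetime(2018, 8, 8, 10, 0, tzinfo=timezone.utc)
assert a == b  # the same instant

# A column that drops tzinfo (as MySQL effectively does) would store
# these equal instants as different naive values:
print(a.replace(tzinfo=None) == b.replace(tzinfo=None))  # False

# Converting to UTC before binding makes the stored naive values agree:
print(a.astimezone(timezone.utc).replace(tzinfo=None) == b.replace(tzinfo=None))  # True
```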
[jira] [Commented] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance
[ https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572816#comment-16572816 ] George Leslie-Waksman commented on AIRFLOW-2870: The process to reproduce is as follows:
# Start with an Airflow deployment that predates {{cc1e65623dc7_add_max_tries_column_to_task_instance.py}} (e.g. 1.8.1)
# Run Airflow enough to populate task_instances in the metadata database (run one of the sample dags)
# Install an Airflow version after {{27c6a30d7c24_add_executor_config_to_task_instance.py}} (e.g. 1.10rc3)
# Run {{airflow upgradedb}}
This will fail with a message about the column "task_instance.executor_config" not existing. My current understanding of what is happening:
* When constructing a SQLAlchemy ORM query using a declarative model (i.e. {{TaskInstance}}), the database table must be consistent with the structure of that model.
** SQLAlchemy's mapper will query all columns known to the ORM mapper (code side) and assume they exist in the database
* When running a migration, the database table is in a transitional state
* The code in {{airflow/models.py}} reflects the state of the database after running ALL migrations through the present
* When we are using the 1.10rc3 code to run migrations and we reach {{cc1e65623dc7_add_max_tries_column_to_task_instance.py}}, we [import TaskInstance|https://github.com/apache/incubator-airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py#L36] as if it has all future columns and then [query the old schema|https://github.com/apache/incubator-airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py#L64]
Under typical circumstances, one can avoid this issue by performing migrations with Alembic + SQLAlchemy core (no ORM) and directly manipulating the tables.
However, in this case, we need to populate information from a {{Task}} object that does not have a representation in the database. We may be able to work around the database issues by manipulating SQLAlchemy's [column loading|http://docs.sqlalchemy.org/en/latest/orm/loading_columns.html#load-only-cols] but that may be tricky given the intertwined nature of Airflow's model code. > Migrations fail when upgrading from below > cc1e65623dc7_add_max_tries_column_to_task_instance > > > Key: AIRFLOW-2870 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2870 > Project: Apache Airflow > Issue Type: Bug >Reporter: George Leslie-Waksman >Priority: Blocker > > Running migrations from below > cc1e65623dc7_add_max_tries_column_to_task_instance.py fail with: > {noformat} > INFO [alembic.runtime.migration] Context impl PostgresqlImpl. > INFO [alembic.runtime.migration] Will assume transactional DDL. > INFO [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> > cc1e65623dc7, add max tries column to task instance > Traceback (most recent call last): > File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", > line 1182, in _execute_context > context) > File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", > line 470, in do_execute > cursor.execute(statement, parameters) > psycopg2.ProgrammingError: column task_instance.executor_config does not exist > LINE 1: ...ued_dttm, task_instance.pid AS task_instance_pid, task_insta... > {noformat} > The failure is occurring because > cc1e65623dc7_add_max_tries_column_to_task_instance.py imports TaskInstance > from the current code version, which has changes to the task_instance table > that are not expected by the migration. > Specifically, 27c6a30d7c24_add_executor_config_to_task_instance.py adds an > executor_config column that does not exist as of when > cc1e65623dc7_add_max_tries_column_to_task_instance.py is run. 
> It is worth noting that this will not be observed for new installs because > the migration branches on table existence/non-existence at a point that will > hide the issue from new installs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
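The failure mode George describes, an ORM model that maps a column the mid-migration schema does not yet have, can be reproduced in miniature with sqlite3 standing in for Postgres (table and column names taken from the report):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema as it exists *before* the executor_config migration has run:
conn.execute("CREATE TABLE task_instance (task_id TEXT, try_number INTEGER)")

# The imported TaskInstance model maps all *future* columns, so the SELECT
# that SQLAlchemy generates names executor_config too -- and the
# mid-migration schema rejects it (sqlite analogue of the
# psycopg2.ProgrammingError in the report):
error = None
try:
    conn.execute("SELECT task_id, try_number, executor_config FROM task_instance")
except sqlite3.OperationalError as exc:
    error = str(exc)
print(error)  # no such column: executor_config
```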
[jira] [Commented] (AIRFLOW-2826) Add hook for Google Cloud KMS
[ https://issues.apache.org/jira/browse/AIRFLOW-2826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572848#comment-16572848 ] ASF subversion and git services commented on AIRFLOW-2826: -- Commit acca61c602e341da06ebee2eca3a26f4e7400238 in incubator-airflow's branch refs/heads/master from [~jakahn] [ https://gitbox.apache.org/repos/asf?p=incubator-airflow.git;h=acca61c ] [AIRFLOW-2826] Add GoogleCloudKMSHook (#3677) Adds a hook enabling encryption and decryption through Google Cloud KMS. This should also contribute to AIRFLOW-2062. > Add hook for Google Cloud KMS > - > > Key: AIRFLOW-2826 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2826 > Project: Apache Airflow > Issue Type: Improvement > Components: hooks >Reporter: Jasper Kahn >Assignee: Jasper Kahn >Priority: Minor > Labels: features > > Add a hook to support interacting with Google Cloud KMS. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2062) Support fine-grained Connection encryption
[ https://issues.apache.org/jira/browse/AIRFLOW-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572849#comment-16572849 ] ASF subversion and git services commented on AIRFLOW-2062: -- Commit acca61c602e341da06ebee2eca3a26f4e7400238 in incubator-airflow's branch refs/heads/master from [~jakahn] [ https://gitbox.apache.org/repos/asf?p=incubator-airflow.git;h=acca61c ] [AIRFLOW-2826] Add GoogleCloudKMSHook (#3677) Adds a hook enabling encryption and decryption through Google Cloud KMS. This should also contribute to AIRFLOW-2062. > Support fine-grained Connection encryption > -- > > Key: AIRFLOW-2062 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2062 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib >Reporter: Wilson Lian >Priority: Minor > > This effort targets containerized tasks (e.g., those launched by > KubernetesExecutor). Under that paradigm, each task could potentially operate > under different credentials, and fine-grained Connection encryption will > enable an administrator to restrict which connections can be accessed by > which tasks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2826) Add hook for Google Cloud KMS
[ https://issues.apache.org/jira/browse/AIRFLOW-2826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572847#comment-16572847 ] ASF GitHub Bot commented on AIRFLOW-2826: - Fokko closed pull request #3677: [AIRFLOW-2826] Add GoogleCloudKMSHook URL: https://github.com/apache/incubator-airflow/pull/3677
This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/airflow/contrib/hooks/gcp_kms_hook.py b/airflow/contrib/hooks/gcp_kms_hook.py
new file mode 100644
index 00..6f2b3aedff
--- /dev/null
+++ b/airflow/contrib/hooks/gcp_kms_hook.py
@@ -0,0 +1,108 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+import base64
+
+from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
+
+from apiclient.discovery import build
+
+
+def _b64encode(s):
+    """ Base 64 encodes a bytes object to a string """
+    return base64.b64encode(s).decode('ascii')
+
+
+def _b64decode(s):
+    """ Base 64 decodes a string to bytes. """
+    return base64.b64decode(s.encode('utf-8'))
+
+
+class GoogleCloudKMSHook(GoogleCloudBaseHook):
+    """
+    Interact with Google Cloud KMS. This hook uses the Google Cloud Platform
+    connection.
+    """
+
+    def __init__(self, gcp_conn_id='google_cloud_default', delegate_to=None):
+        super(GoogleCloudKMSHook, self).__init__(gcp_conn_id, delegate_to=delegate_to)
+
+    def get_conn(self):
+        """
+        Returns a KMS service object.
+
+        :rtype: apiclient.discovery.Resource
+        """
+        http_authorized = self._authorize()
+        return build(
+            'cloudkms', 'v1', http=http_authorized, cache_discovery=False)
+
+    def encrypt(self, key_name, plaintext, authenticated_data=None):
+        """
+        Encrypts a plaintext message using Google Cloud KMS.
+
+        :param key_name: The Resource Name for the key (or key version)
+                         to be used for encyption. Of the form
+                         ``projects/*/locations/*/keyRings/*/cryptoKeys/**``
+        :type key_name: str
+        :param plaintext: The message to be encrypted.
+        :type plaintext: bytes
+        :param authenticated_data: Optional additional authenticated data that
+                                   must also be provided to decrypt the message.
+        :type authenticated_data: bytes
+        :return: The base 64 encoded ciphertext of the original message.
+        :rtype: str
+        """
+        keys = self.get_conn().projects().locations().keyRings().cryptoKeys()
+        body = {'plaintext': _b64encode(plaintext)}
+        if authenticated_data:
+            body['additionalAuthenticatedData'] = _b64encode(authenticated_data)
+
+        request = keys.encrypt(name=key_name, body=body)
+        response = request.execute()
+
+        ciphertext = response['ciphertext']
+        return ciphertext
+
+    def decrypt(self, key_name, ciphertext, authenticated_data=None):
+        """
+        Decrypts a ciphertext message using Google Cloud KMS.
+
+        :param key_name: The Resource Name for the key to be used for decyption.
+                         Of the form ``projects/*/locations/*/keyRings/*/cryptoKeys/**``
+        :type key_name: str
+        :param ciphertext: The message to be decrypted.
+        :type ciphertext: str
+        :param authenticated_data: Any additional authenticated data that was
+                                   provided when encrypting the message.
+        :type authenticated_data: bytes
+        :return: The original message.
+        :rtype: bytes
+        """
+        keys = self.get_conn().projects().locations().keyRings().cryptoKeys()
+        body = {'ciphertext': ciphertext}
+        if authenticated_data:
+            body['additionalAuthenticatedData'] = _b64encode(authenticated_data)
+
+        request = keys.decrypt(name=key_name, body=body)
+        response = request.execute()
+
+        plaintext =
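The hook's _b64encode/_b64decode helpers exist because the KMS REST API exchanges binary payloads as base-64 strings. They are stdlib-only, so they can be exercised without any GCP credentials (restated here outside the diff, without the leading underscores):

```python
import base64

def b64encode(s):
    # bytes -> ASCII string, as the hook does before placing plaintext
    # in the request body
    return base64.b64encode(s).decode('ascii')

def b64decode(s):
    # string -> bytes, as the hook does with the API's response fields
    return base64.b64decode(s.encode('utf-8'))

message = b'attack at dawn'
encoded = b64encode(message)
print(encoded)  # YXR0YWNrIGF0IGRhd24=
assert b64decode(encoded) == message  # round-trips losslessly
```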
[jira] [Commented] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance
[ https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572876#comment-16572876 ] Bolke de Bruin commented on AIRFLOW-2870: - Gotcha. The weakness of using orm in alembic. Column loading might be an option as we do not need the full model. Or instead of using the database as a reference use the dagbag as a reference and update by using direct sql. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance
[ https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572883#comment-16572883 ] Ash Berlin-Taylor commented on AIRFLOW-2870: Direct SQL might be an option, or https://stackoverflow.com/questions/24612395/how-do-i-execute-inserts-and-updates-in-an-alembic-upgrade-script suggests defining a model in the migration file directly, rather than importing one. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
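Both suggested fixes avoid importing the full, future-shaped model into the migration. The direct-SQL route can be sketched with sqlite3 standing in for the real backend; because the UPDATE names only columns that exist at this revision, it cannot trip over columns added by later migrations (the -1 backfill value is illustrative, not taken from the actual migration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema at the cc1e65623dc7 revision: max_tries was just added,
# executor_config does not exist yet.
conn.execute(
    "CREATE TABLE task_instance (task_id TEXT, try_number INTEGER, max_tries INTEGER)"
)
conn.execute("INSERT INTO task_instance VALUES ('runme_0', 2, NULL)")

# Direct SQL instead of an ORM query: touches only columns this
# revision knows about, so no mapper can sneak in future columns.
conn.execute("UPDATE task_instance SET max_tries = -1 WHERE max_tries IS NULL")
print(conn.execute("SELECT max_tries FROM task_instance").fetchone()[0])  # -1
```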
[jira] [Commented] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance
[ https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572834#comment-16572834 ] George Leslie-Waksman commented on AIRFLOW-2870: Exact steps to reproduce:
{noformat}
cd temp
pyenv virtualenv 2.7.15 temp
pyenv local temp
pip install pip==9.0.1
pip install apache-airflow==1.8.1
AIRFLOW_HOME=. airflow initdb
AIRFLOW_HOME=. airflow backfill -s 2018-01-01 -e 2018-01-02 example_bash_operator
SLUGIFY_USES_TEXT_UNIDECODE=yes pip install https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-source.tar.gz
AIRFLOW_HOME=. airflow upgradedb
{noformat}
This results in the following error output:
{noformat}
[2018-08-08 00:51:32,656] {__init__.py:51} INFO - Using executor SequentialExecutor
/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/airflow/bin/cli.py:1596: DeprecationWarning: The celeryd_concurrency option in [celery] has been renamed to worker_concurrency - the old setting has been used, but please update your config.
  default=conf.get('celery', 'worker_concurrency')),
DB: sqlite:///./airflow.db
[2018-08-08 00:51:32,833] {db.py:338} INFO - Creating tables
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> cc1e65623dc7, add max tries column to task instance
WARNI [airflow.utils.log.logging_mixin.LoggingMixin] Could not import KubernetesPodOperator: No module named kubernetes
WARNI [airflow.utils.log.logging_mixin.LoggingMixin] Install kubernetes dependencies with: pip install airflow['kubernetes']
Traceback (most recent call last):
  File "/Users/georgelesliewaksman/.pyenv/versions/temp2/bin/airflow", line 32, in <module>
    args.func(args)
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/airflow/utils/cli.py", line 74, in wrapper
    return f(*args, **kwargs)
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/airflow/bin/cli.py", line 1020, in upgradedb
    db_utils.upgradedb()
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/airflow/utils/db.py", line 346, in upgradedb
    command.upgrade(config, 'heads')
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/alembic/command.py", line 174, in upgrade
    script.run_env()
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/alembic/script/base.py", line 416, in run_env
    util.load_python_file(self.dir, 'env.py')
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/alembic/util/pyfiles.py", line 93, in load_python_file
    module = load_module_py(module_id, path)
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/alembic/util/compat.py", line 79, in load_module_py
    mod = imp.load_source(module_id, path, fp)
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/airflow/migrations/env.py", line 91, in <module>
    run_migrations_online()
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/airflow/migrations/env.py", line 86, in run_migrations_online
    context.run_migrations()
  File "<string>", line 8, in run_migrations
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/alembic/runtime/environment.py", line 807, in run_migrations
    self.get_context().run_migrations(**kw)
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/alembic/runtime/migration.py", line 321, in run_migrations
    step.migration_fn(**kw)
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py", line 66, in upgrade
    ).limit(BATCH_SIZE).all()
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2703, in all
    return list(self)
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2855, in __iter__
    return self._execute_and_instances(context)
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2878, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 945, in execute
    return meth(self, multiparams,
[jira] [Reopened] (AIRFLOW-2848) dag_id is missing in metadata table "job" for LocalTaskJob
[ https://issues.apache.org/jira/browse/AIRFLOW-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor reopened AIRFLOW-2848: Re-opening so I can set Fix Version - it doesn't appear that this is doable without it being open. Thanks Jira. > dag_id is missing in metadata table "job" for LocalTaskJob > -- > > Key: AIRFLOW-2848 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2848 > Project: Apache Airflow > Issue Type: Bug > Components: db >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Critical > Fix For: 2.0.0 > > Attachments: After this fix.png, Before this fix.png > > > dag_id is missing for all entries in metadata table "job" with job_type > "LocalTaskJob". > This is due to that dag_id was not specified within class LocalTaskJob. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2872) Minor bugs in "Ad Hoc Query" view, and refinement
Xiaodong DENG created AIRFLOW-2872: -- Summary: Minor bugs in "Ad Hoc Query" view, and refinement Key: AIRFLOW-2872 URL: https://issues.apache.org/jira/browse/AIRFLOW-2872 Project: Apache Airflow Issue Type: Improvement Components: ui Reporter: Xiaodong DENG Assignee: Xiaodong DENG
# The ".csv" button in the *Ad Hoc Query* view responds with a plain text file rather than a CSV file (even though users can manually change the extension).
# The 'has_data' argument passed to the template is not used by the template 'airflow/query.html'.
# Sometimes an error is raised: 'UnboundLocalError: local variable 'df' referenced before assignment'.
# 'result = df.to_html()' should only be invoked when the user does NOT choose '.csv'. Otherwise it is a waste of resources, since the result it returns will not be used if the user asks for a CSV download instead of an HTML page.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
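The direction of the fix for the CSV issues can be sketched with a small, hypothetical helper (render_results is not Airflow's API; the real view works with a pandas DataFrame and Flask response objects): serialize to CSV only when the ".csv" button was used, and call df.to_html() only on the HTML branch.

```python
import csv
import io

def render_results(header, rows, want_csv):
    # Hypothetical helper mirroring the issue's points: the CSV branch
    # builds real CSV (to be served with mimetype='text/csv' and a .csv
    # attachment filename), and HTML rendering happens only when CSV
    # was not requested.
    if want_csv:
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(header)
        writer.writerows(rows)
        return 'text/csv', buf.getvalue()
    # df.to_html() would run only on this branch in the real view
    return 'text/html', '<table>...</table>'

ctype, body = render_results(('id', 'name'), [(1, 'airflow')], want_csv=True)
print(ctype)                  # text/csv
print(body.splitlines()[0])   # id,name
```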
[jira] [Resolved] (AIRFLOW-2848) dag_id is missing in metadata table "job" for LocalTaskJob
[ https://issues.apache.org/jira/browse/AIRFLOW-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor resolved AIRFLOW-2848. Resolution: Fixed Fix Version/s: 2.0.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2869) Remove smart quote from default config
[ https://issues.apache.org/jira/browse/AIRFLOW-2869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572854#comment-16572854 ] ASF subversion and git services commented on AIRFLOW-2869: -- Commit 700f5f088dbead866170c9a3fe7e021e86ab30bb in incubator-airflow's branch refs/heads/v1-10-test from William Horton [ https://gitbox.apache.org/repos/asf?p=incubator-airflow.git;h=700f5f0 ] [AIRFLOW-2869] Remove smart quote from default config Closes #3716 from wdhorton/remove-smart-quote- from-cfg (cherry picked from commit 67e2bb96cdc5ea37226d11332362d3bd3778cea0) Signed-off-by: Bolke de Bruin > Remove smart quote from default config > -- > > Key: AIRFLOW-2869 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2869 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Siddharth Anand >Assignee: Siddharth Anand >Priority: Trivial > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2416) executor_config column in task_instance isn't getting created
[ https://issues.apache.org/jira/browse/AIRFLOW-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572886#comment-16572886 ] John Cheng commented on AIRFLOW-2416: - It's not fixed yet. I tried changing the executor_config in my DAG and cleared the task status. However, the new task instance didn't pick up the new value. I looked into the DB and found that task_instance.executor_config is not cleared when I clear the task status. > executor_config column in task_instance isn't getting created > - > > Key: AIRFLOW-2416 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2416 > Project: Apache Airflow > Issue Type: Bug > Components: db >Affects Versions: 1.9.0 > Environment: Running on a mac (System Version: OS X 10.11.6 > (15G19009)) dev environment with Python 3.6. >Reporter: Curtis Deems >Assignee: Cameron Moberg >Priority: Major > > There's a new column called 'executor_config' in the 'task_instance' table > that the scheduler is attempting to query. The column isn't created with > initdb or upgradedb, so the scheduler just loops and never picks up any dag > objects. The only way I discovered this was to run the scheduler through the > debugger and review the exceptions thrown by the scheduler. This issue > doesn't show up in the scheduler logs or any other output that I could see. > The workaround is to create the column manually, but since the root issue is > not easily discoverable this could be a blocker for some people.
[GitHub] Fokko closed pull request #3677: [AIRFLOW-2826] Add GoogleCloudKMSHook
Fokko closed pull request #3677: [AIRFLOW-2826] Add GoogleCloudKMSHook URL: https://github.com/apache/incubator-airflow/pull/3677 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/contrib/hooks/gcp_kms_hook.py b/airflow/contrib/hooks/gcp_kms_hook.py new file mode 100644 index 00..6f2b3aedff --- /dev/null +++ b/airflow/contrib/hooks/gcp_kms_hook.py @@ -0,0 +1,108 @@ +# -*- coding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# + +import base64 + +from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook + +from apiclient.discovery import build + + +def _b64encode(s): +""" Base 64 encodes a bytes object to a string """ +return base64.b64encode(s).decode('ascii') + + +def _b64decode(s): +""" Base 64 decodes a string to bytes. """ +return base64.b64decode(s.encode('utf-8')) + + +class GoogleCloudKMSHook(GoogleCloudBaseHook): +""" +Interact with Google Cloud KMS. This hook uses the Google Cloud Platform +connection. 
+""" + +def __init__(self, gcp_conn_id='google_cloud_default', delegate_to=None): +super(GoogleCloudKMSHook, self).__init__(gcp_conn_id, delegate_to=delegate_to) + +def get_conn(self): +""" +Returns a KMS service object. + +:rtype: apiclient.discovery.Resource +""" +http_authorized = self._authorize() +return build( +'cloudkms', 'v1', http=http_authorized, cache_discovery=False) + +def encrypt(self, key_name, plaintext, authenticated_data=None): +""" +Encrypts a plaintext message using Google Cloud KMS. + +:param key_name: The Resource Name for the key (or key version) + to be used for encyption. Of the form + ``projects/*/locations/*/keyRings/*/cryptoKeys/**`` +:type key_name: str +:param plaintext: The message to be encrypted. +:type plaintext: bytes +:param authenticated_data: Optional additional authenticated data that + must also be provided to decrypt the message. +:type authenticated_data: bytes +:return: The base 64 encoded ciphertext of the original message. +:rtype: str +""" +keys = self.get_conn().projects().locations().keyRings().cryptoKeys() +body = {'plaintext': _b64encode(plaintext)} +if authenticated_data: +body['additionalAuthenticatedData'] = _b64encode(authenticated_data) + +request = keys.encrypt(name=key_name, body=body) +response = request.execute() + +ciphertext = response['ciphertext'] +return ciphertext + +def decrypt(self, key_name, ciphertext, authenticated_data=None): +""" +Decrypts a ciphertext message using Google Cloud KMS. + +:param key_name: The Resource Name for the key to be used for decyption. + Of the form ``projects/*/locations/*/keyRings/*/cryptoKeys/**`` +:type key_name: str +:param ciphertext: The message to be decrypted. +:type ciphertext: str +:param authenticated_data: Any additional authenticated data that was + provided when encrypting the message. +:type authenticated_data: bytes +:return: The original message. 
+:rtype: bytes +""" +keys = self.get_conn().projects().locations().keyRings().cryptoKeys() +body = {'ciphertext': ciphertext} +if authenticated_data: +body['additionalAuthenticatedData'] = _b64encode(authenticated_data) + +request = keys.decrypt(name=key_name, body=body) +response = request.execute() + +plaintext = _b64decode(response['plaintext']) +return plaintext diff --git a/tests/contrib/hooks/test_gcp_kms_hook.py b/tests/contrib/hooks/test_gcp_kms_hook.py new file mode 100644 index 00..eabf20e564 --- /dev/null +++
[jira] [Commented] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance
[ https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572857#comment-16572857 ] George Leslie-Waksman commented on AIRFLOW-2870: If we instead upgrade from 1.8.1 -> 1.9.0 -> 1.10rc3, we do not run into a problem because 1.9.0 exists between {{cc1e65623dc7_add_max_tries_column_to_task_instance.py}} and {{27c6a30d7c24_add_executor_config_to_task_instance.py}} {noformat} cd temp pyenv virtualenv 2.7.15 temp pyenv local temp pip install pip==9.0.1 pip install apache-airflow==1.8.1 AIRFLOW_HOME=. airflow initdb AIRFLOW_HOME=. airflow backfill -s 2018-01-01 -e 2018-01-02 example_bash_operator pip install apache-airflow==1.9.0 AIRFLOW_HOME=. airflow upgradedb SLUGIFY_USES_TEXT_UNIDECODE=yes pip install https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-source.tar.gz AIRFLOW_HOME=. airflow upgradedb {noformat} This is a fine workaround but, absent a warning notice or a great deal of digging into the code and understanding how the migrations work, there is no way for a user to know that it is not possible to upgrade Airflow from <1.9.0 to >=1.10.0. Furthermore, failing to upgrade and then trying to go through the intermediate version will leave the database in an inconsistent state that requires manual database intervention to repair: {noformat} cd temp pyenv virtualenv 2.7.15 temp pyenv local temp pip install pip==9.0.1 pip install apache-airflow==1.8.1 AIRFLOW_HOME=. airflow initdb AIRFLOW_HOME=. airflow backfill -s 2018-01-01 -e 2018-01-02 example_bash_operator SLUGIFY_USES_TEXT_UNIDECODE=yes pip install https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-source.tar.gz AIRFLOW_HOME=. airflow upgradedb pip install apache-airflow==1.9.0 AIRFLOW_HOME=.
airflow upgradedb {noformat} failure on 1.9.0 upgrade attempt: {noformat} [2018-08-08 01:04:53,318] {__init__.py:45} INFO - Using executor SequentialExecutor DB: sqlite:///./airflow.db [2018-08-08 01:04:53,450] {db.py:312} INFO - Creating tables INFO [alembic.runtime.migration] Context impl SQLiteImpl. INFO [alembic.runtime.migration] Will assume non-transactional DDL. INFO [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> cc1e65623dc7, add max tries column to task instance Traceback (most recent call last): File "/Users/georgelesliewaksman/.pyenv/versions/temp2/bin/airflow", line 27, in args.func(args) File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/airflow/bin/cli.py", line 913, in upgradedb db_utils.upgradedb() File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/airflow/utils/db.py", line 320, in upgradedb command.upgrade(config, 'heads') File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/alembic/command.py", line 174, in upgrade script.run_env() File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/alembic/script/base.py", line 416, in run_env util.load_python_file(self.dir, 'env.py') File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/alembic/util/pyfiles.py", line 93, in load_python_file module = load_module_py(module_id, path) File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/alembic/util/compat.py", line 79, in load_module_py mod = imp.load_source(module_id, path, fp) File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/airflow/migrations/env.py", line 86, in run_migrations_online() File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/airflow/migrations/env.py", line 81, in run_migrations_online 
context.run_migrations() File "", line 8, in run_migrations File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/alembic/runtime/environment.py", line 807, in run_migrations self.get_context().run_migrations(**kw) File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/alembic/runtime/migration.py", line 321, in run_migrations step.migration_fn(**kw) File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py", line 39, in upgrade server_default="-1")) File "", line 8, in add_column File "", line 3, in add_column File "/Users/georgelesliewaksman/.pyenv/versions/2.7.15/envs/temp2/lib/python2.7/site-packages/alembic/operations/ops.py", line 1541, in add_column return operations.invoke(op) File
[jira] [Commented] (AIRFLOW-2140) Add Kubernetes Scheduler to Spark Submit Operator
[ https://issues.apache.org/jira/browse/AIRFLOW-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572855#comment-16572855 ] ASF subversion and git services commented on AIRFLOW-2140: -- Commit f58246d2ef265eb762c179a12c40e011ce62cea1 in incubator-airflow's branch refs/heads/v1-10-test from [~ashb] [ https://gitbox.apache.org/repos/asf?p=incubator-airflow.git;h=f58246d ] [AIRFLOW-2140] Don't require kubernetes for the SparkSubmit hook (#3700) This extra dep is a quasi-breaking change when upgrading - previously there were no deps outside of Airflow itself for this hook. Importing the k8s libs breaks installs that aren't also using Kubernetes. This makes the dep optional for anyone who doesn't explicitly use the functionality (cherry picked from commit 0be002eebb182b607109a0390d7f6fb8795c668b) Signed-off-by: Bolke de Bruin > Add Kubernetes Scheduler to Spark Submit Operator > - > > Key: AIRFLOW-2140 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2140 > Project: Apache Airflow > Issue Type: New Feature >Affects Versions: 1.9.0 >Reporter: Rob Keevil >Assignee: Rob Keevil >Priority: Major > Fix For: 2.0.0 > > > Spark 2.3 adds the Kubernetes resource manager to Spark, alongside the > existing Standalone, Yarn and Mesos resource managers. > https://github.com/apache/spark/blob/master/docs/running-on-kubernetes.md > We should extend the spark submit operator to enable the new K8s spark submit > options, and to be able to monitor Spark jobs running within Kubernetes. > I already have working code for this, I need to test the monitoring/log > parsing code and make sure that Airflow is able to terminate Kubernetes pods > when jobs are cancelled etc.
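The commit message above describes making a heavy dependency optional by deferring its import until the feature that needs it is actually used. A minimal sketch of that pattern, assuming nothing about Airflow's actual SparkSubmit code (the helper name and messages are hypothetical):

```python
import importlib


def require_optional_dep(module_name, feature):
    """Import an optional dependency lazily, so a plain install only
    fails - with an actionable message - when the feature is invoked."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        raise ImportError(
            "%s is required for %s; install it separately "
            "(e.g. pip install %s)" % (module_name, feature, module_name))


# A module that is present imports fine...
json_mod = require_optional_dep('json', 'the example')
assert json_mod.loads('{"a": 1}') == {'a': 1}

# ...while a missing one only fails when this code path actually runs.
try:
    require_optional_dep('kubernetes_not_installed_here', 'k8s deploy mode')
    raise AssertionError("expected ImportError")
except ImportError:
    pass
```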
[GitHub] kaxil commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines
kaxil commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines URL: https://github.com/apache/incubator-airflow/pull/3714#discussion_r208498985 ## File path: airflow/contrib/hooks/bigquery_hook.py ## @@ -238,6 +238,8 @@ def create_empty_table(self, :return: """ +if time_partitioning is None: +time_partitioning = dict() Review comment: @Fokko No no, it is a bad practice to use `{}` or `dict()` as a default argument. Reference: https://docs.python-guide.org/writing/gotchas/#mutable-default-arguments The above change is because we had the below code: ![image](https://user-images.githubusercontent.com/8811558/43825996-141df72c-9aee-11e8-8b2f-c55f383f5ae9.png) which I changed to the following: ![image](https://user-images.githubusercontent.com/8811558/43826016-1f5c03a4-9aee-11e8-8c2c-06e2f715ae33.png)
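The gotcha kaxil links to is easy to demonstrate. A minimal, self-contained sketch (not Airflow code) of why a mutable default is shared across calls, and of the `None`-sentinel fix shown in the diff:

```python
def append_bad(item, bucket=[]):        # the default list is created ONCE
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):     # the sentinel pattern from the diff
    if bucket is None:
        bucket = []                     # fresh list on every call
    bucket.append(item)
    return bucket


# The shared default accumulates state across calls...
assert append_bad('a') == ['a']
assert append_bad('b') == ['a', 'b']    # surprise!

# ...while the None-sentinel version starts clean every time.
assert append_good('a') == ['a']
assert append_good('b') == ['b']
```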
[GitHub] codecov-io commented on issue #3718: [AIRFLOW-2872] Fix and Refine 'Ad Hoc Query' View
codecov-io commented on issue #3718: [AIRFLOW-2872] Fix and Refine 'Ad Hoc Query' View URL: https://github.com/apache/incubator-airflow/pull/3718#issuecomment-411331261 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3718?src=pr=h1) Report > Merging [#3718](https://codecov.io/gh/apache/incubator-airflow/pull/3718?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/6fd4e6055e36e9867923b0b402363fcd8c30e297?src=pr=desc) will **not change** coverage. > The diff coverage is `100%`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3718/graphs/tree.svg?width=650=150=pr=WdLKlKHOAU)](https://codecov.io/gh/apache/incubator-airflow/pull/3718?src=pr=tree)

```diff
@@           Coverage Diff           @@
##           master    #3718   +/-   ##
=======================================
  Coverage   77.63%   77.63%
=======================================
  Files         204      204
  Lines       15800    15800
=======================================
  Hits        12267    12267
  Misses       3533     3533
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3718?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/www/views.py](https://codecov.io/gh/apache/incubator-airflow/pull/3718/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvdmlld3MucHk=) | `68.88% <100%> (ø)` | :arrow_up: |

-- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3718?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3718?src=pr=footer). Last update [6fd4e60...9c62c02](https://codecov.io/gh/apache/incubator-airflow/pull/3718?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] Fokko commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training
Fokko commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r208485615 ## File path: airflow/contrib/hooks/sagemaker_hook.py ## @@ -0,0 +1,239 @@ +# -*- coding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +import copy +import time +from botocore.exceptions import ClientError + +from airflow.exceptions import AirflowException +from airflow.contrib.hooks.aws_hook import AwsHook +from airflow.hooks.S3_hook import S3Hook + + +class SageMakerHook(AwsHook): +""" +Interact with Amazon SageMaker. 
+sagemaker_conn_id is required for using +the config stored in db for training/tuning +""" + +def __init__(self, + sagemaker_conn_id=None, + use_db_config=False, + region_name=None, + check_interval=5, + max_ingestion_time=None, + *args, **kwargs): +super(SageMakerHook, self).__init__(*args, **kwargs) +self.sagemaker_conn_id = sagemaker_conn_id +self.use_db_config = use_db_config +self.region_name = region_name +self.check_interval = check_interval +self.max_ingestion_time = max_ingestion_time +self.conn = self.get_conn() + +def check_for_url(self, s3url): +""" +check if the s3url exists +:param s3url: S3 url +:type s3url:str +:return: bool +""" +bucket, key = S3Hook.parse_s3_url(s3url) +s3hook = S3Hook(aws_conn_id=self.aws_conn_id) +if not s3hook.check_for_bucket(bucket_name=bucket): +raise AirflowException( +"The input S3 Bucket {} does not exist ".format(bucket)) +if not s3hook.check_for_key(key=key, bucket_name=bucket): +raise AirflowException("The input S3 Key {} does not exist in the Bucket" + .format(s3url, bucket)) +return True + +def check_valid_training_input(self, training_config): +""" +Run checks before a training starts +:param config: training_config +:type config: dict +:return: None +""" +for channel in training_config['InputDataConfig']: +self.check_for_url(channel['DataSource'] + ['S3DataSource']['S3Uri']) + +def check_valid_tuning_input(self, tuning_config): +""" +Run checks before a tuning job starts +:param config: tuning_config +:type config: dict +:return: None +""" +for channel in tuning_config['TrainingJobDefinition']['InputDataConfig']: +self.check_for_url(channel['DataSource'] + ['S3DataSource']['S3Uri']) + +def check_status(self, non_terminal_states, + failed_state, key, + describe_function, *args): +""" +:param non_terminal_states: the set of non_terminal states +:type non_terminal_states: dict +:param failed_state: the set of failed states +:type failed_state: dict +:param key: the key of the response dict +that points to the state 
+:type key: string +:param describe_function: the function used to retrieve the status +:type describe_function: python callable +:param args: the arguments for the function +:return: None +""" +sec = 0 +running = True + +while running: + +sec = sec + self.check_interval + +if self.max_ingestion_time and sec > self.max_ingestion_time: +# ensure that the job gets killed if the max ingestion time is exceeded +raise AirflowException("SageMaker job took more than " + "%s seconds", self.max_ingestion_time) + +time.sleep(self.check_interval) +try: +status = describe_function(*args)[key] +self.log.info("Job still running for %s seconds... " + "current status is %s" % (sec, status)) +except
[GitHub] Fokko commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training
Fokko commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r208486176 ## File path: airflow/contrib/hooks/sagemaker_hook.py ## @@ -0,0 +1,239 @@ +# -*- coding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +import copy +import time +from botocore.exceptions import ClientError + +from airflow.exceptions import AirflowException +from airflow.contrib.hooks.aws_hook import AwsHook +from airflow.hooks.S3_hook import S3Hook + + +class SageMakerHook(AwsHook): +""" +Interact with Amazon SageMaker. 
+sagemaker_conn_id is required for using +the config stored in db for training/tuning +""" + +def __init__(self, + sagemaker_conn_id=None, + use_db_config=False, + region_name=None, + check_interval=5, + max_ingestion_time=None, + *args, **kwargs): +super(SageMakerHook, self).__init__(*args, **kwargs) +self.sagemaker_conn_id = sagemaker_conn_id +self.use_db_config = use_db_config +self.region_name = region_name +self.check_interval = check_interval +self.max_ingestion_time = max_ingestion_time +self.conn = self.get_conn() + +def check_for_url(self, s3url): +""" +check if the s3url exists +:param s3url: S3 url +:type s3url:str +:return: bool +""" +bucket, key = S3Hook.parse_s3_url(s3url) +s3hook = S3Hook(aws_conn_id=self.aws_conn_id) +if not s3hook.check_for_bucket(bucket_name=bucket): +raise AirflowException( +"The input S3 Bucket {} does not exist ".format(bucket)) +if not s3hook.check_for_key(key=key, bucket_name=bucket): +raise AirflowException("The input S3 Key {} does not exist in the Bucket" + .format(s3url, bucket)) +return True + +def check_valid_training_input(self, training_config): +""" +Run checks before a training starts +:param config: training_config +:type config: dict +:return: None +""" +for channel in training_config['InputDataConfig']: +self.check_for_url(channel['DataSource'] + ['S3DataSource']['S3Uri']) + +def check_valid_tuning_input(self, tuning_config): +""" +Run checks before a tuning job starts +:param config: tuning_config +:type config: dict +:return: None +""" +for channel in tuning_config['TrainingJobDefinition']['InputDataConfig']: +self.check_for_url(channel['DataSource'] + ['S3DataSource']['S3Uri']) + +def check_status(self, non_terminal_states, + failed_state, key, + describe_function, *args): +""" +:param non_terminal_states: the set of non_terminal states +:type non_terminal_states: dict +:param failed_state: the set of failed states +:type failed_state: dict +:param key: the key of the response dict +that points to the state 
+:type key: string +:param describe_function: the function used to retrieve the status +:type describe_function: python callable +:param args: the arguments for the function +:return: None +""" +sec = 0 +running = True + +while running: + +sec = sec + self.check_interval + +if self.max_ingestion_time and sec > self.max_ingestion_time: +# ensure that the job gets killed if the max ingestion time is exceeded +raise AirflowException("SageMaker job took more than " + "%s seconds", self.max_ingestion_time) + +time.sleep(self.check_interval) +try: +status = describe_function(*args)[key] +self.log.info("Job still running for %s seconds... " + "current status is %s" % (sec, status)) +except
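The `check_status` loop quoted above polls a describe function until the job leaves its non-terminal states or a time budget is exhausted. A condensed, dependency-free sketch of that control flow (function and state names are illustrative, not the hook's actual API):

```python
import time


def wait_for_terminal_state(describe_fn, key, non_terminal_states,
                            failed_states, check_interval=0.01,
                            max_ingestion_time=None):
    """Poll describe_fn until the status under `key` leaves the
    non-terminal set; raise on failure or timeout, mirroring the
    quoted check_status loop."""
    sec = 0
    while True:
        sec += check_interval
        if max_ingestion_time and sec > max_ingestion_time:
            # ensure the wait is bounded, as in the hook
            raise RuntimeError("job exceeded max ingestion time")
        time.sleep(check_interval)
        status = describe_fn()[key]
        if status in failed_states:
            raise RuntimeError("job failed with status %s" % status)
        if status not in non_terminal_states:
            return status


# A fake describe function that finishes on the third poll.
responses = iter([{'Status': 'InProgress'},
                  {'Status': 'InProgress'},
                  {'Status': 'Completed'}])
final = wait_for_terminal_state(lambda: next(responses), 'Status',
                                non_terminal_states={'InProgress'},
                                failed_states={'Failed'})
assert final == 'Completed'
```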
[GitHub] XD-DENG opened a new pull request #3718: [AIRFLOW-2872] Fix and Refine 'Ad Hoc Query' View
XD-DENG opened a new pull request #3718: [AIRFLOW-2872] Fix and Refine 'Ad Hoc Query' View URL: https://github.com/apache/incubator-airflow/pull/3718 ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-2872 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: - 1. The `.csv` button in `Ad Hoc Query` actually downloads a plain text file, while it should be a CSV file. A plain text file works too after changing the extension to '.csv', but it would be nicer to directly respond with a CSV file. This is addressed by changing `mimetype` to `text/csv`. - 2. Argument `has_data` passed to the template is not used at all in the template `airflow/query.html`. It can be removed in the `self.render()`. - 3. We should respond with a CSV file only when the user opts in for '.csv' AND the data is available (`df` is generated). Otherwise we will encounter the error below: `UnboundLocalError: local variable 'df' referenced before assignment`. - 4. `result = df.to_html(...)` should only be invoked when the user does NOT choose '.csv'. Otherwise it's a waste of resources to invoke `df.to_html()`, since the result it returns will not be used if the user asks for CSV downloading instead of an HTML page. - 5. Remove a commented line. ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1.
Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
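Point 1 of the description (serving the query result with `mimetype` `text/csv` rather than plain text) can be sketched without Flask. The actual view wraps `df.to_csv()` in a `flask.Response`; the dict below only shows the pieces the PR changes, and the filename and header values are illustrative:

```python
import csv
import io


def rows_to_csv_response(header, rows):
    """Build a CSV payload plus the response metadata that makes the
    browser treat it as a real CSV download."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return {
        'body': buf.getvalue(),
        'mimetype': 'text/csv',  # the fix: previously served as plain text
        'headers': {
            'Content-Disposition': 'attachment; filename="results.csv"',
        },
    }


resp = rows_to_csv_response(['dag_id', 'state'],
                            [['example_bash_operator', 'success']])
assert resp['mimetype'] == 'text/csv'
assert resp['body'].startswith('dag_id,state')
```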
[jira] [Commented] (AIRFLOW-2872) Minor bugs in "Ad Hoc Query" view, and refinement
[ https://issues.apache.org/jira/browse/AIRFLOW-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572840#comment-16572840 ] ASF GitHub Bot commented on AIRFLOW-2872: - XD-DENG opened a new pull request #3718: [AIRFLOW-2872] Fix and Refine 'Ad Hoc Query' View URL: https://github.com/apache/incubator-airflow/pull/3718 ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-2872 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: - 1. The `.csv` button in `Ad Hoc Query` actually downloads a plain text file, while it should be a CSV file. A plain text file works too after changing the extension to '.csv', but it would be nicer to directly respond with a CSV file. This is addressed by changing `mimetype` to `text/csv`. - 2. Argument `has_data` passed to the template is not used at all in the template `airflow/query.html`. It can be removed in the `self.render()`. - 3. We should respond with a CSV file only when the user opts in for '.csv' AND the data is available (`df` is generated). Otherwise we will encounter the error below: `UnboundLocalError: local variable 'df' referenced before assignment`. - 4. `result = df.to_html(...)` should only be invoked when the user does NOT choose '.csv'. Otherwise it's a waste of resources to invoke `df.to_html()`, since the result it returns will not be used if the user asks for CSV downloading instead of an HTML page. - 5. Remove a commented line.
### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` > Minor bugs in "Ad Hoc Query" view, and refinement > - > > Key: AIRFLOW-2872 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2872 > Project: Apache Airflow > Issue Type: Improvement > Components: ui >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Critical > > # The ".csv" button in *Ad Hoc Query* view is responding with a plain text > file, rather than a CSV file (even though users can manually change the > extension). > # Argument 'has_data' passed to the template is not used by the template > 'airflow/query.html'. > # Sometimes the error 'UnboundLocalError: local variable 'df' referenced > before assignment' is raised. > # 'result = df.to_html()' should only be invoked when the user does NOT choose > '.csv'.
Otherwise it's a waste of resources to invoke 'df.to_html()' since the > result it returns will not be used if the user asks for CSV downloading instead > of an HTML page.
[jira] [Commented] (AIRFLOW-2859) DateTimes returned from the database are not converted to UTC
[ https://issues.apache.org/jira/browse/AIRFLOW-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572863#comment-16572863 ] ASF subversion and git services commented on AIRFLOW-2859: -- Commit 8fc8c7ae5483c002f5264b087b26a20fd8ae7b67 in incubator-airflow's branch refs/heads/v1-10-test from bolkedebruin [ https://gitbox.apache.org/repos/asf?p=incubator-airflow.git;h=8fc8c7a ] [AIRFLOW-2859] Implement own UtcDateTime (#3708) The different UtcDateTime implementations all have issues. Either they replace tzinfo directly without converting or they do not convert to UTC at all. We also ensure all mysql connections are in UTC in order to keep sanity, as mysql will ignore the timezone of a field when inserting/updating. (cherry picked from commit 6fd4e6055e36e9867923b0b402363fcd8c30e297) Signed-off-by: Bolke de Bruin > DateTimes returned from the database are not converted to UTC > - > > Key: AIRFLOW-2859 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2859 > Project: Apache Airflow > Issue Type: Bug > Components: database >Reporter: Bolke de Bruin >Priority: Blocker > Fix For: 1.10.0 > > > This is due to the fact that sqlalchemy-utcdatetime does not convert to UTC > when the database returns datetimes with tzinfo.
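The commit message above distinguishes between replacing `tzinfo` and actually converting to UTC. A minimal stdlib sketch of the difference (the offset and timestamps are illustrative; as I understand it, Airflow's `UtcDateTime` applies the `astimezone`-style conversion on the way in and out of the database):

```python
from datetime import datetime, timedelta, timezone

cet = timezone(timedelta(hours=2))           # e.g. a UTC+2 summer offset
local = datetime(2018, 8, 8, 12, 0, tzinfo=cet)

# The bug class: swapping tzinfo keeps the wall-clock time, silently
# shifting the instant by the offset...
wrong = local.replace(tzinfo=timezone.utc)   # 12:00 "UTC" -- off by 2h!

# ...while astimezone() converts, preserving the instant.
right = local.astimezone(timezone.utc)       # 10:00 UTC -- same moment

assert wrong.hour == 12
assert right.hour == 10
assert right == local                        # still the same point in time
```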
[GitHub] kaxil commented on issue #3717: [AIRFLOW-1874] use_legacy_sql added to BigQueryCheck operators
kaxil commented on issue #3717: [AIRFLOW-1874] use_legacy_sql added to BigQueryCheck operators URL: https://github.com/apache/incubator-airflow/pull/3717#issuecomment-411331977 As far as I know, there is a plan to deprecate args and kwargs keywords in Airflow. Can you instead just add this as a param, like all the other BQ operators?
[GitHub] bolkedebruin commented on issue #3720: [AIRFLOW-2870] Use abstract TaskInstance for migration
bolkedebruin commented on issue #3720: [AIRFLOW-2870] Use abstract TaskInstance for migration URL: https://github.com/apache/incubator-airflow/pull/3720#issuecomment-411468352 Ah let me fix that
[GitHub] codecov-io edited a comment on issue #3720: [AIRFLOW-2870] Use abstract TaskInstance for migration
codecov-io edited a comment on issue #3720: [AIRFLOW-2870] Use abstract TaskInstance for migration URL: https://github.com/apache/incubator-airflow/pull/3720#issuecomment-411401924 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3720?src=pr=h1) Report > Merging [#3720](https://codecov.io/gh/apache/incubator-airflow/pull/3720?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/8687ab9271b7b93473584a720f225f20fa9a7aa4?src=pr=desc) will **not change** coverage. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3720/graphs/tree.svg?token=WdLKlKHOAU=pr=650=150)](https://codecov.io/gh/apache/incubator-airflow/pull/3720?src=pr=tree) ```diff @@ Coverage Diff @@ ## master#3720 +/- ## === Coverage 77.63% 77.63% === Files 204 204 Lines 1580015800 === Hits1226712267 Misses 3533 3533 ``` -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3720?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3720?src=pr=footer). Last update [8687ab9...311bc91](https://codecov.io/gh/apache/incubator-airflow/pull/3720?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-2861) Need index on log table
[ https://issues.apache.org/jira/browse/AIRFLOW-2861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573530#comment-16573530 ] ASF GitHub Bot commented on AIRFLOW-2861: - bolkedebruin closed pull request #3709: [AIRFLOW-2861] Added index on log table URL: https://github.com/apache/incubator-airflow/pull/3709 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/migrations/versions/dd25f486b8ea_add_idx_log_dag.py b/airflow/migrations/versions/dd25f486b8ea_add_idx_log_dag.py new file mode 100644 index 00..3249a2e058 --- /dev/null +++ b/airflow/migrations/versions/dd25f486b8ea_add_idx_log_dag.py @@ -0,0 +1,41 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +from alembic import op + +"""add idx_log_dag + +Revision ID: dd25f486b8ea +Revises: 9635ae0956e7 +Create Date: 2018-08-07 06:41:41.028249 + +""" + +# revision identifiers, used by Alembic. 
+revision = 'dd25f486b8ea' +down_revision = '9635ae0956e7' +branch_labels = None +depends_on = None + + +def upgrade(): +op.create_index('idx_log_dag', 'log', ['dag_id'], unique=False) + + +def downgrade(): +op.drop_index('idx_log_dag', table_name='log') diff --git a/airflow/models.py b/airflow/models.py index 288bd4c937..c79220805c 100755 --- a/airflow/models.py +++ b/airflow/models.py @@ -2098,6 +2098,10 @@ class Log(Base): owner = Column(String(500)) extra = Column(Text) +__table_args__ = ( +Index('idx_log_dag', dag_id), +) + def __init__(self, event, task_instance, owner=None, extra=None, **kwargs): self.dttm = timezone.utcnow() self.event = event This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Need index on log table > --- > > Key: AIRFLOW-2861 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2861 > Project: Apache Airflow > Issue Type: Improvement > Components: database >Affects Versions: 1.10.0 >Reporter: Vardan Gupta >Assignee: Vardan Gupta >Priority: Major > > Delete dag functionality is added in v1-10-stable, whose implementation > during the metadata cleanup > [part|https://github.com/apache/incubator-airflow/blob/dc78b9196723ca6724185231ccd6f5bbe8edcaf3/airflow/api/common/experimental/delete_dag.py#L48], > look for classes which has attribute named as dag_id and then formulate the > query on matching model and then delete from metadata, we've few numbers > where we've observed slowness especially in log table because it doesn't have > any single or multiple-column index. Creating an index would boost the > performance though insertion will be a bit slower. Since deletion will be a > sync call, would be good idea to create index. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2861) Need index on log table
[ https://issues.apache.org/jira/browse/AIRFLOW-2861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573531#comment-16573531 ] ASF subversion and git services commented on AIRFLOW-2861: -- Commit 6f7fe74b9ff6c2abae2764988c152af8bcc8e199 in incubator-airflow's branch refs/heads/master from [~vardan] [ https://gitbox.apache.org/repos/asf?p=incubator-airflow.git;h=6f7fe74 ] [AIRFLOW-2861] Add index on log table (#3709) > Need index on log table > --- > > Key: AIRFLOW-2861 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2861 > Project: Apache Airflow > Issue Type: Improvement > Components: database >Affects Versions: 1.10.0 >Reporter: Vardan Gupta >Assignee: Vardan Gupta >Priority: Major > > Delete dag functionality is added in v1-10-stable, whose implementation > during the metadata cleanup > [part|https://github.com/apache/incubator-airflow/blob/dc78b9196723ca6724185231ccd6f5bbe8edcaf3/airflow/api/common/experimental/delete_dag.py#L48], > look for classes which has attribute named as dag_id and then formulate the > query on matching model and then delete from metadata, we've few numbers > where we've observed slowness especially in log table because it doesn't have > any single or multiple-column index. Creating an index would boost the > performance though insertion will be a bit slower. Since deletion will be a > sync call, would be good idea to create index. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] bolkedebruin closed pull request #3709: [AIRFLOW-2861] Added index on log table
bolkedebruin closed pull request #3709: [AIRFLOW-2861] Added index on log table URL: https://github.com/apache/incubator-airflow/pull/3709 This is a PR merged from a forked repository; the merged diff is identical to the one quoted in the AIRFLOW-2861 Jira comment above.
[jira] [Updated] (AIRFLOW-2875) Env variables should have percent signs escaped before writing to tmp config
[ https://issues.apache.org/jira/browse/AIRFLOW-2875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] William Horton updated AIRFLOW-2875: Description: I encountered this when I was using an environment variable for `AIRFLOW__CELERY__BROKER_URL`. The airflow worker was able to run and communicate with the SQS queue, but when it received a task and began to run it, I encountered an error with this trace: {code:java} [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask mirroring Traceback (most recent call last): [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask mirroring File "/opt/airflow/venv/bin/airflow", line 32, in [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask mirroring args.func(args) [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask mirroring File "/opt/airflow/venv/local/lib/python2.7/site-packages/airflow/utils/cli.py", line 74, in wrapper [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask mirroring return f(*args, **kwargs) [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask mirroring File "/opt/airflow/venv/local/lib/python2.7/site-packages/airflow/bin/cli.py", line 460, in run [2018-08-08 15:19:24,403] {base_task_runner.py:108} INFO - Job 13898: Subtask mirroring conf.set(section, option, value) [2018-08-08 15:19:24,403] {base_task_runner.py:108} INFO - Job 13898: Subtask mirroring File "/opt/airflow/venv/local/lib/python2.7/site-packages/backports/configparser/_init_.py", line 1239, in set [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask mirroring super(ConfigParser, self).set(section, option, value) [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask mirroring File "/opt/airflow/venv/local/lib/python2.7/site-packages/backports/configparser/_init_.py", line 914, in set [2018-08-08 15:19:24,406] 
{base_task_runner.py:108} INFO - Job 13898: Subtask mirroring value) [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask mirroring File "/opt/airflow/venv/local/lib/python2.7/site-packages/backports/configparser/__init__.py", line 392, in before_set [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask mirroring "position %d" % (value, tmp_value.find('%'))) [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask mirroring ValueError: invalid interpolation syntax in {code} The issue was that the broker URL had a percent sign, and when the CLI called `conf.set(section, option, value)`, it threw because it interpreted the percent as the start of an interpolation. To avoid this issue, I would propose that the environment variables be escaped when written in `utils.configuration.tmp_configuration_copy`, so that when `conf.set` is called in `bin/cli`, it doesn't throw on these unescaped values.
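The proposed escaping amounts to doubling percent signs before handing a value to ConfigParser, which treats a lone `%` as the start of an interpolation like `%(name)s`. A minimal sketch under that assumption (the helper name is invented; the actual patch targets `tmp_configuration_copy`):

```python
import configparser


def escape_airflow_env(value):
    # ConfigParser interpolation treats '%' as special; '%%' is a
    # literal percent sign. Hypothetical helper sketching the fix
    # proposed in AIRFLOW-2875 / PR #3721, not the exact patch.
    return value.replace("%", "%%")


cp = configparser.ConfigParser()
cp.add_section("celery")

raw = "sqs://user:pass%2Fword@"   # broker URLs often contain '%'
try:
    cp.set("celery", "broker_url", raw)   # unescaped: ValueError
except ValueError:
    cp.set("celery", "broker_url", escape_airflow_env(raw))

# reading the value back undoes the '%%' escaping
assert cp.get("celery", "broker_url") == raw
```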
[GitHub] wdhorton opened a new pull request #3721: [AIRFLOW-2875] Escape env vars in tmp config
wdhorton opened a new pull request #3721: [AIRFLOW-2875] Escape env vars in tmp config URL: https://github.com/apache/incubator-airflow/pull/3721 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-XXX - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [ ] Here are some details about my PR, including screenshots of any UI changes: When writing tmp config files, escapes the env variables, since they are read by `ConfigParser` when the tmp config file is loaded when the task is being run. ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Resolved] (AIRFLOW-2826) Add hook for Google Cloud KMS
[ https://issues.apache.org/jira/browse/AIRFLOW-2826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jasper Kahn resolved AIRFLOW-2826. -- Resolution: Fixed > Add hook for Google Cloud KMS > - > > Key: AIRFLOW-2826 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2826 > Project: Apache Airflow > Issue Type: Improvement > Components: hooks >Reporter: Jasper Kahn >Assignee: Jasper Kahn >Priority: Minor > Labels: features > > Add a hook to support interacting with Google Cloud KMS. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] troychen728 commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training
troychen728 commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r208672024 ## File path: airflow/contrib/hooks/sagemaker_hook.py ## @@ -0,0 +1,239 @@ +# -*- coding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +import copy +import time +from botocore.exceptions import ClientError + +from airflow.exceptions import AirflowException +from airflow.contrib.hooks.aws_hook import AwsHook +from airflow.hooks.S3_hook import S3Hook + + +class SageMakerHook(AwsHook): +""" +Interact with Amazon SageMaker. 
+sagemaker_conn_id is required for using +the config stored in db for training/tuning +""" + +def __init__(self, + sagemaker_conn_id=None, + use_db_config=False, + region_name=None, + check_interval=5, + max_ingestion_time=None, + *args, **kwargs): +super(SageMakerHook, self).__init__(*args, **kwargs) +self.sagemaker_conn_id = sagemaker_conn_id +self.use_db_config = use_db_config +self.region_name = region_name +self.check_interval = check_interval +self.max_ingestion_time = max_ingestion_time +self.conn = self.get_conn() + +def check_for_url(self, s3url): +""" +check if the s3url exists +:param s3url: S3 url +:type s3url:str +:return: bool +""" +bucket, key = S3Hook.parse_s3_url(s3url) +s3hook = S3Hook(aws_conn_id=self.aws_conn_id) +if not s3hook.check_for_bucket(bucket_name=bucket): +raise AirflowException( +"The input S3 Bucket {} does not exist ".format(bucket)) +if not s3hook.check_for_key(key=key, bucket_name=bucket): +raise AirflowException("The input S3 Key {} does not exist in the Bucket" + .format(s3url, bucket)) +return True + +def check_valid_training_input(self, training_config): +""" +Run checks before a training starts +:param config: training_config +:type config: dict +:return: None +""" +for channel in training_config['InputDataConfig']: +self.check_for_url(channel['DataSource'] + ['S3DataSource']['S3Uri']) + +def check_valid_tuning_input(self, tuning_config): +""" +Run checks before a tuning job starts +:param config: tuning_config +:type config: dict +:return: None +""" +for channel in tuning_config['TrainingJobDefinition']['InputDataConfig']: +self.check_for_url(channel['DataSource'] + ['S3DataSource']['S3Uri']) + +def check_status(self, non_terminal_states, + failed_state, key, + describe_function, *args): +""" +:param non_terminal_states: the set of non_terminal states +:type non_terminal_states: dict +:param failed_state: the set of failed states +:type failed_state: dict +:param key: the key of the response dict +that points to the state 
+:type key: string +:param describe_function: the function used to retrieve the status +:type describe_function: python callable +:param args: the arguments for the function +:return: None +""" +sec = 0 +running = True + +while running: + +sec = sec + self.check_interval + +if self.max_ingestion_time and sec > self.max_ingestion_time: +# ensure that the job gets killed if the max ingestion time is exceeded +raise AirflowException("SageMaker job took more than " + "%s seconds", self.max_ingestion_time) + +time.sleep(self.check_interval) +try: +status = describe_function(*args)[key] +self.log.info("Job still running for %s seconds... " + "current status is %s" % (sec, status)) +
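The polling loop in `check_status` above can be distilled into a generic sketch. It is simplified from the code under review, not the exact implementation, and the standalone function name and signature are invented for illustration:

```python
import time


def wait_for_job(describe, key, non_terminal_states, failed_states,
                 check_interval=5, max_ingestion_time=None):
    """Poll describe() until the job leaves the non-terminal states.

    Sketch of the SageMaker hook's check_status pattern: sleep between
    polls, fail fast on a failed state, and kill the wait if the total
    time exceeds max_ingestion_time.
    """
    sec = 0
    while True:
        sec += check_interval
        if max_ingestion_time and sec > max_ingestion_time:
            # ensure the wait ends if the max ingestion time is exceeded
            raise RuntimeError("job exceeded max_ingestion_time")
        time.sleep(check_interval)
        status = describe()[key]
        if status in failed_states:
            raise RuntimeError("job failed with status %s" % status)
        if status not in non_terminal_states:
            return status
```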
[GitHub] bolkedebruin commented on issue #3720: [AIRFLOW-2870] Use abstract TaskInstance for migration
bolkedebruin commented on issue #3720: [AIRFLOW-2870] Use abstract TaskInstance for migration URL: https://github.com/apache/incubator-airflow/pull/3720#issuecomment-411469023 @gwax done. PTAL
[jira] [Commented] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance
[ https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573528#comment-16573528 ] ASF subversion and git services commented on AIRFLOW-2870: -- Commit 546f1cdb5208ba8e1cf3bde36bbdbb639fa20b22 in incubator-airflow's branch refs/heads/master from bolkedebruin [ https://gitbox.apache.org/repos/asf?p=incubator-airflow.git;h=546f1cd ] [AIRFLOW-2870] Use abstract TaskInstance for migration (#3720) If we use the full model for migration it can have columns added that are not available yet in the database. Using an abstraction ensures only the columns that are required for data migration are present. > Migrations fail when upgrading from below > cc1e65623dc7_add_max_tries_column_to_task_instance > > > Key: AIRFLOW-2870 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2870 > Project: Apache Airflow > Issue Type: Bug >Reporter: George Leslie-Waksman >Priority: Blocker > > Running migrations from below > cc1e65623dc7_add_max_tries_column_to_task_instance.py fail with: > {noformat} > INFO [alembic.runtime.migration] Context impl PostgresqlImpl. > INFO [alembic.runtime.migration] Will assume transactional DDL. > INFO [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> > cc1e65623dc7, add max tries column to task instance > Traceback (most recent call last): > File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", > line 1182, in _execute_context > context) > File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", > line 470, in do_execute > cursor.execute(statement, parameters) > psycopg2.ProgrammingError: column task_instance.executor_config does not exist > LINE 1: ...ued_dttm, task_instance.pid AS task_instance_pid, task_insta... 
> {noformat} > The failure is occurring because > cc1e65623dc7_add_max_tries_column_to_task_instance.py imports TaskInstance > from the current code version, which has changes to the task_instance table > that are not expected by the migration. > Specifically, 27c6a30d7c24_add_executor_config_to_task_instance.py adds an > executor_config column that does not exist as of when > cc1e65623dc7_add_max_tries_column_to_task_instance.py is run. > It is worth noting that this will not be observed for new installs because > the migration branches on table existence/non-existence at a point that will > hide the issue from new installs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
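The abstraction the commit message describes can be sketched with SQLAlchemy's lightweight `table()`/`column()` constructs inside the migration. This illustrates the idea only, not the merged code, and the backfill statement is hypothetical:

```python
import sqlalchemy as sa

# Sketch of the idea behind the AIRFLOW-2870 fix: declare only the
# columns the data migration touches, instead of importing the live
# airflow.models.TaskInstance. The live model may already define
# columns (e.g. executor_config) that do not exist in the database at
# this point in the migration history, which breaks the generated SQL.
task_instance = sa.table(
    "task_instance",
    sa.column("task_id", sa.String),
    sa.column("dag_id", sa.String),
    sa.column("try_number", sa.Integer),
    sa.column("max_tries", sa.Integer),
)


def backfill_stmt():
    # hypothetical data fix expressed against the abstract table only
    return task_instance.update().values(
        max_tries=task_instance.c.try_number)


def upgrade():
    from alembic import op  # valid only inside a running migration
    op.execute(backfill_stmt())
```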
[jira] [Commented] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance
[ https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573527#comment-16573527 ] ASF GitHub Bot commented on AIRFLOW-2870: - bolkedebruin closed pull request #3720: [AIRFLOW-2870] Use abstract TaskInstance for migration URL: https://github.com/apache/incubator-airflow/pull/3720 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/migrations/versions/27c6a30d7c24_add_executor_config_to_task_instance.py b/airflow/migrations/versions/27c6a30d7c24_add_executor_config_to_task_instance.py index b7213a3031..27a9f593b5 100644 --- a/airflow/migrations/versions/27c6a30d7c24_add_executor_config_to_task_instance.py +++ b/airflow/migrations/versions/27c6a30d7c24_add_executor_config_to_task_instance.py @@ -1,16 +1,22 @@ # flake8: noqa # -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at # -# http://www.apache.org/licenses/LICENSE-2.0 +# http://www.apache.org/licenses/LICENSE-2.0 # -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + """kubernetes_resource_checkpointing diff --git a/airflow/migrations/versions/33ae817a1ff4_add_kubernetes_resource_checkpointing.py b/airflow/migrations/versions/33ae817a1ff4_add_kubernetes_resource_checkpointing.py index 4347bae92a..c489c05f7e 100644 --- a/airflow/migrations/versions/33ae817a1ff4_add_kubernetes_resource_checkpointing.py +++ b/airflow/migrations/versions/33ae817a1ff4_add_kubernetes_resource_checkpointing.py @@ -1,16 +1,22 @@ # flake8: noqa # -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at # -# http://www.apache.org/licenses/LICENSE-2.0 +# http://www.apache.org/licenses/LICENSE-2.0 # -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
+# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + """kubernetes_resource_checkpointing diff --git a/airflow/migrations/versions/86770d1215c0_add_kubernetes_scheduler_uniqueness.py b/airflow/migrations/versions/86770d1215c0_add_kubernetes_scheduler_uniqueness.py index 6bc48f1105..5c921c6a98 100644 --- a/airflow/migrations/versions/86770d1215c0_add_kubernetes_scheduler_uniqueness.py +++ b/airflow/migrations/versions/86770d1215c0_add_kubernetes_scheduler_uniqueness.py @@ -1,16 +1,22 @@ # flake8: noqa # -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at +# Licensed to the Apache Software Foundation (ASF) under one +# or more
[GitHub] bolkedebruin edited a comment on issue #3560: [AIRFLOW-2697] Drop snakebite in favour of hdfs3
bolkedebruin edited a comment on issue #3560: [AIRFLOW-2697] Drop snakebite in favour of hdfs3 URL: https://github.com/apache/incubator-airflow/pull/3560#issuecomment-411484838 Can you revise and squash your commits? Then we can (most likely :-) ) merge.