[GitHub] yeluolei commented on issue #3675: [AIRFLOW-2834] fix build script for k8s docker
yeluolei commented on issue #3675: [AIRFLOW-2834] fix build script for k8s docker URL: https://github.com/apache/incubator-airflow/pull/3675#issuecomment-417559822 @ashb This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] gerardo commented on issue #3797: [AIRFLOW-2952] Splits CI into k8s + docker-compose
gerardo commented on issue #3797: [AIRFLOW-2952] Splits CI into k8s + docker-compose URL: https://github.com/apache/incubator-airflow/pull/3797#issuecomment-417558995 @dimberman we missed [creating the airflow database for postgresql](https://github.com/apache/incubator-airflow/blob/b7f63c59d75ad21d210a72bd6212e5a7b2c6f25b/.travis.yml#L104)
[GitHub] gerardo commented on issue #3797: [AIRFLOW-2952] Splits CI into k8s + docker-compose
gerardo commented on issue #3797: [AIRFLOW-2952] Splits CI into k8s + docker-compose URL: https://github.com/apache/incubator-airflow/pull/3797#issuecomment-417552202 @dimberman I'm trying to figure out the simplest changes that can get this to work. So far:

- `airflow initdb` is failing. It might be easier to [install postgres on the Travis CI host again](https://github.com/apache/incubator-airflow/blob/c37fc0b6ba19e3fe5656ae37cef9b59cef3c29e8/.travis.yml#L28).
- After this, we'll need another value for [`backend_postgres`](https://github.com/apache/incubator-airflow/blob/a9705c21f1bbd5d79cbd92dee84673b34332dab8/tox.ini#L51) (or another variable altogether) when running the k8s tests. This one should point to [localhost instead](https://github.com/apache/incubator-airflow/blob/c37fc0b6ba19e3fe5656ae37cef9b59cef3c29e8/tox.ini#L49).
- The final error I see is `kinit: command not found`, though the script keeps running after that failure anyway.
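The last bullet above (`kinit: command not found`, with the script continuing regardless) suggests treating Kerberos initialization as an optional step. A minimal sketch of that guard, with a hypothetical helper name and injectable `run`/`which` callables (this is not the actual CI script):

```python
import shutil


def maybe_kinit(run, which=shutil.which, keytab=None, principal=None):
    """Invoke kinit via `run` only when the binary is on PATH.

    Returns True if kinit was invoked, False if it was skipped because
    the Kerberos tooling is absent (e.g. on a minimal CI image).
    """
    if which("kinit") is None:
        return False  # skip quietly instead of failing the whole script
    cmd = ["kinit"]
    if keytab:
        cmd += ["-kt", keytab]
    if principal:
        cmd.append(principal)
    run(cmd)
    return True
```

In the real script this would wrap `subprocess.run(cmd, check=True)`; the callables are parameters here only so the behavior is easy to exercise without Kerberos installed.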
[GitHub] dimberman commented on issue #3797: [AIRFLOW-2952] Splits CI into k8s + docker-compose
dimberman commented on issue #3797: [AIRFLOW-2952] Splits CI into k8s + docker-compose URL: https://github.com/apache/incubator-airflow/pull/3797#issuecomment-417545995 @gerardo Ok it's now solidly back in the court of "getting TOX to work". Kubeadm is able to build and deploy. PTAL and let me know how we can get these to pass.
[GitHub] dimberman commented on issue #3797: [AIRFLOW-2952] Splits CI into k8s + docker-compose
dimberman commented on issue #3797: [AIRFLOW-2952] Splits CI into k8s + docker-compose URL: https://github.com/apache/incubator-airflow/pull/3797#issuecomment-417541379 cc: @feng-tao @kaxil just a warning: any PR merged right now is not being tested against Kubernetes.
[GitHub] codecov-io edited a comment on issue #3823: [AIRFLOW-2985] An operator for S3 object copying
codecov-io edited a comment on issue #3823: [AIRFLOW-2985] An operator for S3 object copying URL: https://github.com/apache/incubator-airflow/pull/3823#issuecomment-417296502

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr=h1) Report

> Merging [#3823](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/82454477c57699ece5c6515ce85b7df0c0583571?src=pr=desc) will **not change** coverage.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3823/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr=tree)

```diff
@@           Coverage Diff           @@
##           master   #3823   +/-   ##
======================================
  Coverage   77.43%   77.43%
======================================
  Files         203      203
  Lines       15840    15840
======================================
  Hits        12266    12266
  Misses       3574     3574
```

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr=continue).

> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr=footer). Last update [8245447...63cf213](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] XD-DENG commented on issue #3823: [AIRFLOW-2985] An operator for S3 object copying
XD-DENG commented on issue #3823: [AIRFLOW-2985] An operator for S3 object copying URL: https://github.com/apache/incubator-airflow/pull/3823#issuecomment-417539567 Hi both @ashb @feng-tao, I have updated the code based on your earlier inputs:

- Changed the way to specify bucket/key, in order to be consistent with existing S3 operators/sensors.
- Added this class to `docs/code.rst`.
- Updated the test cases (there are two cases to test different argument combinations).
- Added a note in the comment (which will become documentation later), highlighting that the S3 connection used here must be able to access both the source and destination bucket/key.

CI passed. PTAL.
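The first bullet above concerns how existing S3 operators/sensors let a key be given either as a full `s3://bucket/key` URL or as a separate bucket plus key. A small sketch of that resolution convention (the function name and signature here are illustrative, not the PR's actual code):

```python
from urllib.parse import urlparse


def split_s3_url(key, bucket_name=None):
    """Resolve (bucket, key) from either a full s3:// URL or bucket + key.

    If no bucket is given, `key` must be a full s3://bucket/key URL;
    otherwise `key` is taken as a plain object key within `bucket_name`.
    """
    if bucket_name is None:
        parsed = urlparse(key)
        if parsed.scheme != "s3" or not parsed.netloc:
            raise ValueError("expected an s3://bucket/key URL when no bucket is given")
        return parsed.netloc, parsed.path.lstrip("/")
    return bucket_name, key
```

Supporting both forms is why the PR needs two test cases for the different argument combinations.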
[GitHub] dimberman commented on issue #3797: [AIRFLOW-2952] Splits CI into k8s + docker-compose
dimberman commented on issue #3797: [AIRFLOW-2952] Splits CI into k8s + docker-compose URL: https://github.com/apache/incubator-airflow/pull/3797#issuecomment-417538581 @Fokko @bolkedebruin @gerardo I was able to get kubeadm to work with a local registry (that was a rough experience lol). I'm still running into some weird TOX issues (like being unable to find python 3.5) but progress!
[GitHub] feng-tao commented on issue #3820: [AIRFLOW-XXX] Fix Docstrings for Hooks/Operators
feng-tao commented on issue #3820: [AIRFLOW-XXX] Fix Docstrings for Hooks/Operators URL: https://github.com/apache/incubator-airflow/pull/3820#issuecomment-417536750 lgtm @kaxil
[jira] [Commented] (AIRFLOW-2983) Add prev_ds_nodash and next_ds_nodash macro
[ https://issues.apache.org/jira/browse/AIRFLOW-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598158#comment-16598158 ] ASF GitHub Bot commented on AIRFLOW-2983: feng-tao closed pull request #3821: [AIRFLOW-2983] Add prev_ds_nodash and next_ds_nodash macro URL: https://github.com/apache/incubator-airflow/pull/3821 The merged diff is reproduced in the corresponding GitHub notification below.
[GitHub] feng-tao closed pull request #3821: [AIRFLOW-2983] Add prev_ds_nodash and next_ds_nodash macro
feng-tao closed pull request #3821: [AIRFLOW-2983] Add prev_ds_nodash and next_ds_nodash macro URL: https://github.com/apache/incubator-airflow/pull/3821 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

```diff
diff --git a/airflow/models.py b/airflow/models.py
index 55badf4828..93368e1f18 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -1815,12 +1815,16 @@ def get_template_context(self, session=None):
         next_execution_date = task.dag.following_schedule(self.execution_date)
 
         next_ds = None
+        next_ds_nodash = None
         if next_execution_date:
             next_ds = next_execution_date.strftime('%Y-%m-%d')
+            next_ds_nodash = next_ds.replace('-', '')
 
         prev_ds = None
+        prev_ds_nodash = None
         if prev_execution_date:
             prev_ds = prev_execution_date.strftime('%Y-%m-%d')
+            prev_ds_nodash = prev_ds.replace('-', '')
 
         ds_nodash = ds.replace('-', '')
         ts_nodash = ts.replace('-', '').replace(':', '')
@@ -1887,7 +1891,9 @@ def __repr__(self):
             'dag': task.dag,
             'ds': ds,
             'next_ds': next_ds,
+            'next_ds_nodash': next_ds_nodash,
             'prev_ds': prev_ds,
+            'prev_ds_nodash': prev_ds_nodash,
             'ds_nodash': ds_nodash,
             'ts': ts,
             'ts_nodash': ts_nodash,
diff --git a/docs/code.rst b/docs/code.rst
index 80ec76193f..c9e9b3d431 100644
--- a/docs/code.rst
+++ b/docs/code.rst
@@ -242,12 +242,14 @@ Variable                        Description
 ``{{ ds }}``                    the execution date as ``YYYY-MM-DD``
 ``{{ ds_nodash }}``             the execution date as ``YYYYMMDD``
-``{{ prev_ds }}``               the previous execution date as ``YYYY-MM-DD``.
+``{{ prev_ds }}``               the previous execution date as ``YYYY-MM-DD``
                                 if ``{{ ds }}`` is ``2016-01-08`` and ``schedule_interval`` is ``@weekly``,
-                                ``{{ prev_ds }}`` will be ``2016-01-01``.
-``{{ next_ds }}``               the next execution date as ``YYYY-MM-DD``.
+                                ``{{ prev_ds }}`` will be ``2016-01-01``
+``{{ prev_ds_nodash }}``        the previous execution date as ``YYYYMMDD`` if exists, else ``None``
+``{{ next_ds }}``               the next execution date as ``YYYY-MM-DD``
                                 if ``{{ ds }}`` is ``2016-01-01`` and ``schedule_interval`` is ``@weekly``,
-                                ``{{ prev_ds }}`` will be ``2016-01-08``.
+                                ``{{ prev_ds }}`` will be ``2016-01-08``
+``{{ next_ds_nodash }}``        the next execution date as ``YYYYMMDD`` if exists, else ``None``
 ``{{ yesterday_ds }}``          yesterday's date as ``YYYY-MM-DD``
 ``{{ yesterday_ds_nodash }}``   yesterday's date as ``YYYYMMDD``
 ``{{ tomorrow_ds }}``           tomorrow's date as ``YYYY-MM-DD``
diff --git a/tests/core.py b/tests/core.py
index 8df6312eeb..f8b8691912 100644
--- a/tests/core.py
+++ b/tests/core.py
@@ -626,6 +626,35 @@ def __bool__(self):
             dag=self.dag)
         t.resolve_template_files()
 
+    def test_task_get_template(self):
+        TI = models.TaskInstance
+        ti = TI(
+            task=self.runme_0, execution_date=DEFAULT_DATE)
+        ti.dag = self.dag_bash
+        ti.run(ignore_ti_state=True)
+        context = ti.get_template_context()
+
+        # DEFAULT DATE is 2015-01-01
+        self.assertEquals(context['ds'], '2015-01-01')
+        self.assertEquals(context['ds_nodash'], '20150101')
+
+        # next_ds is 2015-01-02 as the dag interval is daily
+        self.assertEquals(context['next_ds'], '2015-01-02')
+        self.assertEquals(context['next_ds_nodash'], '20150102')
+
+        # prev_ds is 2014-12-31 as the dag interval is daily
+        self.assertEquals(context['prev_ds'], '2014-12-31')
+        self.assertEquals(context['prev_ds_nodash'], '20141231')
+
+        self.assertEquals(context['ts'], '2015-01-01T00:00:00+00:00')
+        self.assertEquals(context['ts_nodash'], '20150101T000000+0000')
+
+        self.assertEquals(context['yesterday_ds'], '2014-12-31')
+        self.assertEquals(context['yesterday_ds_nodash'], '20141231')
+
+        self.assertEquals(context['tomorrow_ds'], '2015-01-02')
+        self.assertEquals(context['tomorrow_ds_nodash'], '20150102')
+
     def test_import_examples(self):
         self.assertEqual(len(self.dagbag.dags), NUM_EXAMPLE_DAGS)
```
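The merged macro change boils down to a `strftime` plus a dash-strip, with `None` passed through when there is no previous/next schedule. A standalone sketch of the same transformation (the helper name `ds_pair` is hypothetical, not from the PR):

```python
from datetime import datetime, timedelta


def ds_pair(dt):
    """Return (ds, ds_nodash) for a datetime, mirroring the template context.

    When there is no previous/next execution date, both values are None,
    matching the None handling added in the diff above.
    """
    if dt is None:
        return None, None
    ds = dt.strftime('%Y-%m-%d')
    return ds, ds.replace('-', '')


# Hypothetical daily schedule around an execution date of 2015-01-01:
execution_date = datetime(2015, 1, 1)
next_ds, next_ds_nodash = ds_pair(execution_date + timedelta(days=1))
prev_ds, prev_ds_nodash = ds_pair(execution_date - timedelta(days=1))
```

This reproduces the values asserted in the PR's `test_task_get_template` unit test for a daily interval.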
[GitHub] feng-tao commented on issue #3821: [AIRFLOW-2983] Add prev_ds_nodash and next_ds_nodash macro
feng-tao commented on issue #3821: [AIRFLOW-2983] Add prev_ds_nodash and next_ds_nodash macro URL: https://github.com/apache/incubator-airflow/pull/3821#issuecomment-417536518 Hey @kaxil, @r39132, a test to check the template context has been added. Merging it now.
[GitHub] codecov-io edited a comment on issue #3821: [AIRFLOW-2983] Add prev_ds_nodash and next_ds_nodash macro
codecov-io edited a comment on issue #3821: [AIRFLOW-2983] Add prev_ds_nodash and next_ds_nodash macro URL: https://github.com/apache/incubator-airflow/pull/3821#issuecomment-417186769

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3821?src=pr=h1) Report

> Merging [#3821](https://codecov.io/gh/apache/incubator-airflow/pull/3821?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/82454477c57699ece5c6515ce85b7df0c0583571?src=pr=desc) will **increase** coverage by `<.01%`.
> The diff coverage is `100%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3821/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3821?src=pr=tree)

```diff
@@            Coverage Diff             @@
##           master   #3821      +/-   ##
=========================================
+ Coverage   77.43%   77.44%   +<.01%
=========================================
  Files         203      203
  Lines       15840    15844       +4
=========================================
+ Hits        12266    12270       +4
  Misses       3574     3574
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3821?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3821/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=) | `88.78% <100%> (+0.01%)` | :arrow_up: |

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3821?src=pr=continue).

> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3821?src=pr=footer). Last update [8245447...c78c818](https://codecov.io/gh/apache/incubator-airflow/pull/3821?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] feng-tao commented on issue #3821: [AIRFLOW-2983] Add prev_ds_nodash and next_ds_nodash macro
feng-tao commented on issue #3821: [AIRFLOW-2983] Add prev_ds_nodash and next_ds_nodash macro URL: https://github.com/apache/incubator-airflow/pull/3821#issuecomment-417515885 @kaxil, thanks for the comment. I added a unit test to check. Will wait for CI to finish.
[GitHub] gerardo commented on a change in pull request #3797: [AIRFLOW-2952] Splits CI into k8s + docker-compose
gerardo commented on a change in pull request #3797: [AIRFLOW-2952] Splits CI into k8s + docker-compose URL: https://github.com/apache/incubator-airflow/pull/3797#discussion_r214217582

## File path: .travis.yml

```diff
@@ -26,14 +26,14 @@ env:
     - TRAVIS_CACHE=$HOME/.travis_cache/
   matrix:
     - TOX_ENV=flake8
-    - TOX_ENV=py27-backend_mysql
-    - TOX_ENV=py27-backend_sqlite
-    - TOX_ENV=py27-backend_postgres
-    - TOX_ENV=py35-backend_mysql PYTHON_VERSION=3
-    - TOX_ENV=py35-backend_sqlite PYTHON_VERSION=3
-    - TOX_ENV=py35-backend_postgres PYTHON_VERSION=3
-    - TOX_ENV=py27-backend_postgres KUBERNETES_VERSION=v1.9.0
-    - TOX_ENV=py35-backend_postgres KUBERNETES_VERSION=v1.10.0 PYTHON_VERSION=3
+    - TOX_ENV=py27-backend_mysql-env_docker
+    - TOX_ENV=py27-backend_sqlite-env_docker
+    - TOX_ENV=py27-backend_postgres-env_docker
+    - TOX_ENV=py35-backend_mysql-env_docker PYTHON_VERSION=3
+    - TOX_ENV=py35-backend_sqlite-env_ddocker PYTHON_VERSION=3
```

Review comment: There's a typo here
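The matrix entries reviewed above are hyphen-separated tox factors (interpreter, backend, environment), which is why a typo like `env_ddocker` slips through until tox fails to match an env. A quick sketch of validating such strings up front (the helper and its factor whitelist are illustrative, not part of the PR):

```python
# Hypothetical whitelist of factor values used in this CI matrix.
VALID_FACTORS = {
    "backend": {"mysql", "sqlite", "postgres"},
    "env": {"docker", "kubernetes"},
}


def parse_tox_env(tox_env):
    """Split a TOX_ENV like 'py35-backend_postgres-env_docker' into factors,
    rejecting unknown factor values (catching typos such as 'env_ddocker')."""
    py, *rest = tox_env.split("-")
    factors = {"python": py}
    for factor in rest:
        name, _, value = factor.partition("_")
        if name in VALID_FACTORS and value not in VALID_FACTORS[name]:
            raise ValueError("unknown %s factor: %r" % (name, value))
        factors[name] = value
    return factors
```

Run against the matrix entries above, this would flag the `env_ddocker` line immediately instead of at tox invocation time.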
[GitHub] ndmar commented on issue #2708: [AIRFLOW-1746] Add a Nomad operator to trigger job from Airflow
ndmar commented on issue #2708: [AIRFLOW-1746] Add a Nomad operator to trigger job from Airflow URL: https://github.com/apache/incubator-airflow/pull/2708#issuecomment-417502998 @etrabelsi @Fokko Just discovered this PR and am pretty excited about it, as we just started using Airflow on Nomad. This'll greatly simplify our deployment setup. Seems like it's almost there; I'm more than happy to help in any way to push this across the finish line!
[GitHub] codecov-io edited a comment on issue #3825: [AIRFLOW-2989] Add param to set bootDiskType in Dataproc Op
codecov-io edited a comment on issue #3825: [AIRFLOW-2989] Add param to set bootDiskType in Dataproc Op URL: https://github.com/apache/incubator-airflow/pull/3825#issuecomment-417496304

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3825?src=pr=h1) Report

> Merging [#3825](https://codecov.io/gh/apache/incubator-airflow/pull/3825?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/82454477c57699ece5c6515ce85b7df0c0583571?src=pr=desc) will **not change** coverage.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3825/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3825?src=pr=tree)

```diff
@@           Coverage Diff           @@
##           master   #3825   +/-   ##
======================================
  Coverage   77.43%   77.43%
======================================
  Files         203      203
  Lines       15840    15840
======================================
  Hits        12266    12266
  Misses       3574     3574
```

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3825?src=pr=continue).

> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3825?src=pr=footer). Last update [8245447...486efa8](https://codecov.io/gh/apache/incubator-airflow/pull/3825?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[jira] [Commented] (AIRFLOW-2548) Output Plugin Import Errors to WebUI
[ https://issues.apache.org/jira/browse/AIRFLOW-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598028#comment-16598028 ] Jimmy Cao commented on AIRFLOW-2548: This makes sense to me. Are you working on a PR?

> Output Plugin Import Errors to WebUI
>
> Key: AIRFLOW-2548
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2548
> Project: Apache Airflow
> Issue Type: Bug
> Reporter: Andy Cooper
> Priority: Major
> Fix For: 2.0.0
>
> All,
>
> We currently output all DAG import errors to the webUI. I propose we do the same with plugin errors as well. This will provide a better user experience by bubbling up all errors to the webUI instead of hiding them in stdOut.
>
> Proposal...
> * Extend models.ImportError to have a "type" field to distinguish between error types.
> * Prevent SchedulerJob methods from clearing out and pulling from models.ImportError if type = 'plugin'
> * Create new ImportError records in plugins_manager.py for each plugin that fails to import
> * Prompt the user in views.py with plugin ImportErrors - specifying that they need to fix and restart the webserver to resolve.
>
> Does this seem reasonable to everyone? I'd be interested in taking on this work if needed

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
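The core of the AIRFLOW-2548 proposal is catching each plugin's import failure and recording it with a `type` field, the way DAG import errors are recorded today. A minimal sketch of that collection step, independent of the actual `models.ImportError` table (the record shape here is an assumption based on the proposal, not existing Airflow code):

```python
import importlib


def collect_import_errors(module_names):
    """Try importing each plugin module; record failures as dicts shaped
    like the proposed models.ImportError rows, tagged with type='plugin'."""
    errors = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:
            # Surface the failure for the webUI instead of hiding it in stdout.
            errors.append({
                "type": "plugin",
                "filename": name,
                "stacktrace": str(exc),
            })
    return errors
```

The scheduler would then skip rows with `type='plugin'` when it clears DAG import errors, per the second bullet of the proposal.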
[GitHub] codecov-io edited a comment on issue #3825: [AIRFLOW-2989] Add param to set bootDiskType in Dataproc Op
codecov-io edited a comment on issue #3825: [AIRFLOW-2989] Add param to set bootDiskType in Dataproc Op URL: https://github.com/apache/incubator-airflow/pull/3825#issuecomment-417496304

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3825?src=pr=h1) Report
> Merging [#3825](https://codecov.io/gh/apache/incubator-airflow/pull/3825?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/82454477c57699ece5c6515ce85b7df0c0583571?src=pr=desc) will **decrease** coverage by `0.28%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3825/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3825?src=pr=tree)

```diff
@@            Coverage Diff             @@
##           master    #3825      +/-   ##
==========================================
- Coverage   77.43%   77.15%   -0.29%
==========================================
  Files         203      203
  Lines       15840    15840
==========================================
- Hits        12266    12221      -45
- Misses       3574     3619      +45
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3825?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/hooks/hdfs\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/3825/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9oZGZzX2hvb2sucHk=) | `27.5% <0%> (-65%)` | :arrow_down: |
| [airflow/utils/decorators.py](https://codecov.io/gh/apache/incubator-airflow/pull/3825/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kZWNvcmF0b3JzLnB5) | `85.41% <0%> (-6.25%)` | :arrow_down: |
| [airflow/hooks/dbapi\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/3825/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9kYmFwaV9ob29rLnB5) | `79.67% <0%> (-3.26%)` | :arrow_down: |
| [airflow/task/task\_runner/base\_task\_runner.py](https://codecov.io/gh/apache/incubator-airflow/pull/3825/diff?src=pr=tree#diff-YWlyZmxvdy90YXNrL3Rhc2tfcnVubmVyL2Jhc2VfdGFza19ydW5uZXIucHk=) | `77.96% <0%> (-1.7%)` | :arrow_down: |
| [airflow/operators/docker\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/3825/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvZG9ja2VyX29wZXJhdG9yLnB5) | `96.51% <0%> (-1.17%)` | :arrow_down: |
| [airflow/www\_rbac/app.py](https://codecov.io/gh/apache/incubator-airflow/pull/3825/diff?src=pr=tree#diff-YWlyZmxvdy93d3dfcmJhYy9hcHAucHk=) | `96.66% <0%> (-1.12%)` | :arrow_down: |
| [airflow/configuration.py](https://codecov.io/gh/apache/incubator-airflow/pull/3825/diff?src=pr=tree#diff-YWlyZmxvdy9jb25maWd1cmF0aW9uLnB5) | `82.96% <0%> (-1.12%)` | :arrow_down: |
| [airflow/www/app.py](https://codecov.io/gh/apache/incubator-airflow/pull/3825/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvYXBwLnB5) | `98.97% <0%> (-1.03%)` | :arrow_down: |
| [airflow/bin/cli.py](https://codecov.io/gh/apache/incubator-airflow/pull/3825/diff?src=pr=tree#diff-YWlyZmxvdy9iaW4vY2xpLnB5) | `64.53% <0%> (-0.26%)` | :arrow_down: |
| [airflow/www\_rbac/views.py](https://codecov.io/gh/apache/incubator-airflow/pull/3825/diff?src=pr=tree#diff-YWlyZmxvdy93d3dfcmJhYy92aWV3cy5weQ==) | `72.47% <0%> (-0.15%)` | :arrow_down: |
| ... and [1 more](https://codecov.io/gh/apache/incubator-airflow/pull/3825/diff?src=pr=tree-more) | |

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3825?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3825?src=pr=footer). Last update [8245447...486efa8](https://codecov.io/gh/apache/incubator-airflow/pull/3825?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-2988) GCP Dataflow hook should specifically run python2
[ https://issues.apache.org/jira/browse/AIRFLOW-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598001#comment-16598001 ] ASF GitHub Bot commented on AIRFLOW-2988: jcao219 opened a new pull request #3826: [AIRFLOW-2988] Run specifically python2 for dataflow URL: https://github.com/apache/incubator-airflow/pull/3826

Apache Beam does not yet support Python 3, so it is best to run Dataflow jobs specifically with python2 until Python 3 support is complete (BEAM-1251), in case the user's 'python' in PATH is python3.

Make sure you have checked _all_ steps below.

### Jira
- [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-XXX - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\]; code changes always need a Jira issue.

### Description
- [ ] Here are some details about my PR, including screenshots of any UI changes:

### Tests
- [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:

### Commits
- [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation
- [ ] In case of new functionality, my PR adds documentation that describes how to use it.
  - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

### Code Quality
- [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`

> GCP Dataflow hook should specifically run python2
>
> Key: AIRFLOW-2988
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2988
> Project: Apache Airflow
> Issue Type: Improvement
> Components: Dataflow, gcp, hooks
> Reporter: Jimmy Cao
> Priority: Major
>
> Currently the GCP Dataflow hook invokes 'python' [here|https://github.com/apache/incubator-airflow/blob/c3939c8e721870d263997e7aeaebc28e678d544b/airflow/contrib/hooks/gcp_dataflow_hook.py#L239]. This can fail if the user's 'python' in PATH starts Python 3, which Apache Beam does not yet support (see BEAM-1251). It should be changed to 'python2' to ensure that Apache Beam is run with the correct version of Python.
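The fix described above comes down to pinning which interpreter the hook shells out to, instead of relying on whatever `python` resolves to on PATH. A minimal sketch of that idea — the job file name and flags are placeholders, not the hook's real command line, and the fallback to the current interpreter is only so the sketch runs anywhere:

```python
import shutil
import sys

# Prefer an explicit python2 binary for launching Beam/Dataflow jobs,
# rather than whatever "python" happens to resolve to on PATH.
# Falling back to sys.executable is just to keep this sketch runnable
# on machines without a python2 binary.
interpreter = shutil.which("python2") or sys.executable

# Hypothetical command the hook might assemble before calling subprocess.
command = [interpreter, "my_dataflow_job.py", "--runner=DataflowRunner"]
print(command[1:])  # ['my_dataflow_job.py', '--runner=DataflowRunner']
```

In the actual PR the change is simply which interpreter name the hook hard-codes; the point of `shutil.which` here is that availability of `python2` can be checked up front rather than failing inside the subprocess.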
[jira] [Commented] (AIRFLOW-2989) No Parameter to change bootDiskType for DataprocClusterCreateOperator
[ https://issues.apache.org/jira/browse/AIRFLOW-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597996#comment-16597996 ] ASF GitHub Bot commented on AIRFLOW-2989: kaxil opened a new pull request #3825: [AIRFLOW-2989] Add param to set bootDiskType in Dataproc Op URL: https://github.com/apache/incubator-airflow/pull/3825

Make sure you have checked _all_ steps below.

### Jira
- [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-2989

### Description
- [x] Here are some details about my PR, including screenshots of any UI changes:
  - Add param to set bootDiskType for master and worker nodes in `DataprocClusterCreateOperator`

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: Modified `DataprocClusterCreateOperatorTest`

### Commits
- [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation
- [x] In case of new functionality, my PR adds documentation that describes how to use it.
  - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

### Code Quality
- [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`

> No Parameter to change bootDiskType for DataprocClusterCreateOperator
>
> Key: AIRFLOW-2989
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2989
> Project: Apache Airflow
> Issue Type: New Feature
> Components: contrib, gcp
> Affects Versions: 1.9.0, 1.10.0
> Reporter: Kaxil Naik
> Assignee: Kaxil Naik
> Priority: Minor
> Fix For: 1.10.1
>
> Currently, we cannot set the primary disk type for the master and worker nodes to `pd-ssd` in DataprocClusterCreateOperator.
> Google API: https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.clusters#diskconfig
> Related StackOverflow issue: https://stackoverflow.com/questions/52090315/airflow-dataprocclustercreateoperator/52092942#52092942
[GitHub] kaxil commented on issue #3825: [AIRFLOW-2989] Add param to set bootDiskType in Dataproc Op
kaxil commented on issue #3825: [AIRFLOW-2989] Add param to set bootDiskType in Dataproc Op URL: https://github.com/apache/incubator-airflow/pull/3825#issuecomment-417488271 cc @Fokko @fenglu-g
[jira] [Created] (AIRFLOW-2989) No Parameter to change bootDiskType for DataprocClusterCreateOperator
Kaxil Naik created AIRFLOW-2989:
Summary: No Parameter to change bootDiskType for DataprocClusterCreateOperator
Key: AIRFLOW-2989
URL: https://issues.apache.org/jira/browse/AIRFLOW-2989
Project: Apache Airflow
Issue Type: New Feature
Components: contrib, gcp
Affects Versions: 1.9.0, 1.10.0
Reporter: Kaxil Naik
Assignee: Kaxil Naik
Fix For: 1.10.1

Currently, we cannot set the primary disk type for the master and worker nodes to `pd-ssd` in DataprocClusterCreateOperator.
Google API: https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.clusters#diskconfig
Related StackOverflow issue: https://stackoverflow.com/questions/52090315/airflow-dataprocclustercreateoperator/52092942#52092942
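The field being exposed lives under each instance group's `diskConfig` in the Dataproc v1 cluster resource linked above. A sketch of the cluster body the operator would need to emit — the project, cluster name, and instance counts are placeholders, and the operator plumbing around this dict is omitted:

```python
# Sketch of a Dataproc v1 cluster body with bootDiskType set to pd-ssd.
# Field names under "config" follow the public REST API
# (projects.regions.clusters); the concrete values are placeholders.
cluster_body = {
    "projectId": "my-project",          # placeholder
    "clusterName": "example-cluster",   # placeholder
    "config": {
        "masterConfig": {
            "numInstances": 1,
            "diskConfig": {"bootDiskType": "pd-ssd", "bootDiskSizeGb": 500},
        },
        "workerConfig": {
            "numInstances": 2,
            "diskConfig": {"bootDiskType": "pd-ssd", "bootDiskSizeGb": 500},
        },
    },
}
print(cluster_body["config"]["masterConfig"]["diskConfig"]["bootDiskType"])  # pd-ssd
```

A new operator parameter would presumably just thread a string like `"pd-ssd"` (or the default `"pd-standard"`) into these two `diskConfig` entries.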
[GitHub] codecov-io edited a comment on issue #3817: [AIRFLOW-2974] Extended Databricks hook with cluster operation
codecov-io edited a comment on issue #3817: [AIRFLOW-2974] Extended Databricks hook with cluster operation URL: https://github.com/apache/incubator-airflow/pull/3817#issuecomment-416701478

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3817?src=pr=h1) Report
> Merging [#3817](https://codecov.io/gh/apache/incubator-airflow/pull/3817?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/82454477c57699ece5c6515ce85b7df0c0583571?src=pr=desc) will **not change** coverage.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3817/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3817?src=pr=tree)

```diff
@@           Coverage Diff            @@
##           master    #3817   +/-   ##
=======================================
  Coverage   77.43%   77.43%
=======================================
  Files         203      203
  Lines       15840    15840
=======================================
  Hits        12266    12266
  Misses       3574     3574
```

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3817?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3817?src=pr=footer). Last update [8245447...e70aa98](https://codecov.io/gh/apache/incubator-airflow/pull/3817?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[jira] [Created] (AIRFLOW-2988) GCP Dataflow hook should specifically run python2
Jimmy Cao created AIRFLOW-2988:
Summary: GCP Dataflow hook should specifically run python2
Key: AIRFLOW-2988
URL: https://issues.apache.org/jira/browse/AIRFLOW-2988
Project: Apache Airflow
Issue Type: Improvement
Components: Dataflow, gcp, hooks
Reporter: Jimmy Cao

Currently the GCP Dataflow hook invokes 'python' [here|https://github.com/apache/incubator-airflow/blob/c3939c8e721870d263997e7aeaebc28e678d544b/airflow/contrib/hooks/gcp_dataflow_hook.py#L239]. This can fail if the user's 'python' in PATH starts Python 3, which Apache Beam does not yet support (see BEAM-1251). It should be changed to 'python2' to ensure that Apache Beam is run with the correct version of Python.
[GitHub] xnuinside commented on issue #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'
xnuinside commented on issue #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#issuecomment-417475558 @kaxil, please check. Thanks in advance!
[GitHub] kaxil commented on issue #3821: [AIRFLOW-2983] Add prev_ds_nodash and next_ds_nodash macro
kaxil commented on issue #3821: [AIRFLOW-2983] Add prev_ds_nodash and next_ds_nodash macro URL: https://github.com/apache/incubator-airflow/pull/3821#issuecomment-417475014 LGTM. @feng-tao Can we add a simple test for this?
[GitHub] codecov-io commented on issue #3820: [AIRFLOW-XXX] Fix Docstrings for Hooks/Operators
codecov-io commented on issue #3820: [AIRFLOW-XXX] Fix Docstrings for Hooks/Operators URL: https://github.com/apache/incubator-airflow/pull/3820#issuecomment-417466806

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3820?src=pr=h1) Report
> Merging [#3820](https://codecov.io/gh/apache/incubator-airflow/pull/3820?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/82454477c57699ece5c6515ce85b7df0c0583571?src=pr=desc) will **decrease** coverage by `<.01%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3820/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3820?src=pr=tree)

```diff
@@            Coverage Diff             @@
##           master    #3820      +/-   ##
==========================================
- Coverage   77.43%   77.43%   -0.01%
==========================================
  Files         203      203
  Lines       15840    15840
==========================================
- Hits        12266    12265       -1
- Misses       3574     3575       +1
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3820?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/sensors/s3\_key\_sensor.py](https://codecov.io/gh/apache/incubator-airflow/pull/3820/diff?src=pr=tree#diff-YWlyZmxvdy9zZW5zb3JzL3MzX2tleV9zZW5zb3IucHk=) | `30.3% <ø> (ø)` | :arrow_up: |
| [airflow/sensors/s3\_prefix\_sensor.py](https://codecov.io/gh/apache/incubator-airflow/pull/3820/diff?src=pr=tree#diff-YWlyZmxvdy9zZW5zb3JzL3MzX3ByZWZpeF9zZW5zb3IucHk=) | `0% <ø> (ø)` | :arrow_up: |
| [airflow/operators/s3\_file\_transform\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/3820/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvczNfZmlsZV90cmFuc2Zvcm1fb3BlcmF0b3IucHk=) | `93.87% <ø> (ø)` | :arrow_up: |
| [airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3820/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=) | `88.72% <ø> (-0.05%)` | :arrow_down: |

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3820?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3820?src=pr=footer). Last update [8245447...583f2b7](https://codecov.io/gh/apache/incubator-airflow/pull/3820?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] dimberman commented on issue #3797: [AIRFLOW-2952] Splits CI into k8s + docker-compose
dimberman commented on issue #3797: [AIRFLOW-2952] Splits CI into k8s + docker-compose URL: https://github.com/apache/incubator-airflow/pull/3797#issuecomment-417466718 @gerardo I agree that it would be a pain, but it's going to REALLY hurt if we merge PRs for a couple of weeks and then can't track down what broke the k8s executor when it restarts. Definitely please try on a different branch.
[GitHub] gerardo commented on issue #3797: [AIRFLOW-2952] Splits CI into k8s + docker-compose
gerardo commented on issue #3797: [AIRFLOW-2952] Splits CI into k8s + docker-compose URL: https://github.com/apache/incubator-airflow/pull/3797#issuecomment-417461794 @dimberman I can take a stab at making this work in a separate branch if you want. This is definitely a blocker, but reverting sounds like even more work.
[jira] [Updated] (AIRFLOW-2963) Error parsing AIRFLOW_CONN_ URI
[ https://issues.apache.org/jira/browse/AIRFLOW-2963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Casandra julie mitchell updated AIRFLOW-2963: Attachment: (was: 2811fb90a1302h.txt)
> Error parsing AIRFLOW_CONN_ URI
>
> Key: AIRFLOW-2963
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2963
> Project: Apache Airflow
> Issue Type: Bug
> Components: boto3, configuration
> Affects Versions: 1.9.0, 1.10.0
> Reporter: Leonardo de Campos Almeida
> Assignee: Casandra julie mitchell
> Priority: Minor
> Labels: easyfix
>
> I'm using the environment variable AIRFLOW_CONN_ to define my connection to AWS, but my AWS secret access key has a slash in it, e.g.:
> {code:java}
> s3://login:pass/word@bucket
> {code}
> The problem is that the method *BaseHook._get_connection_from_env* doesn't accept this URI as valid. When it finds the /, it assumes that the path starts there, so it returns:
> * host: login
> * port: pass
> * path: word
> It ignores the rest, so I get an error, because "pass" is not a valid port number.
> So I tried to pass the URI quoted:
> {code:java}
> s3://login:pass%2Fword@bucket
> {code}
> But then the values are not unquoted correctly, and the AwsHook tries to use pass%2Fword as the secret access key.
> I took a look at the method that parses the URI, and it only unquotes the host, manually:
> {code:python}
> def parse_from_uri(self, uri):
>     temp_uri = urlparse(uri)
>     hostname = temp_uri.hostname or ''
>     if '%2f' in hostname:
>         hostname = hostname.replace('%2f', '/').replace('%2F', '/')
>     conn_type = temp_uri.scheme
>     if conn_type == 'postgresql':
>         conn_type = 'postgres'
>     self.conn_type = conn_type
>     self.host = hostname
>     self.schema = temp_uri.path[1:]
>     self.login = temp_uri.username
>     self.password = temp_uri.password
>     self.port = temp_uri.port
> {code}
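The behavior described in the issue can be reproduced with the standard library directly: `urlparse` leaves the userinfo percent-encoded, so the password must be unquoted explicitly, which the method above only does for the hostname. A minimal demonstration:

```python
from urllib.parse import urlparse, unquote

# Unquoted slash in the password: urlparse treats everything after "/"
# as the path, so the credentials are mangled.
raw = urlparse("s3://login:pass/word@bucket")
print(raw.path)  # '/word@bucket'

# Quoted slash: the URI parses cleanly, but the password attribute is
# returned still percent-encoded.
quoted = urlparse("s3://login:pass%2Fword@bucket")
print(quoted.password)           # 'pass%2Fword'
print(unquote(quoted.password))  # 'pass/word' — what the hook actually needs
```

So the fix suggested by the report is to apply `unquote` to the username and password fields as well, not just the handwritten `%2f` replacement on the hostname.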
[jira] [Updated] (AIRFLOW-2963) Error parsing AIRFLOW_CONN_ URI
[ https://issues.apache.org/jira/browse/AIRFLOW-2963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Casandra julie mitchell updated AIRFLOW-2963: Attachment: 2811fb90a1302h.txt
[jira] [Assigned] (AIRFLOW-2963) Error parsing AIRFLOW_CONN_ URI
[ https://issues.apache.org/jira/browse/AIRFLOW-2963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Casandra julie mitchell reassigned AIRFLOW-2963: Assignee: Casandra julie mitchell (quoted issue body identical to the preceding message omitted)
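A minimal sketch of the fix the reporter is pointing at: percent-decode every component that `urlparse` returns, not just the hostname. The standalone helper below is illustrative, not the project's actual code; the real method is the `parse_from_uri` quoted above.

```python
from urllib.parse import unquote, urlparse


def parse_conn_uri(uri):
    """Sketch: unquote each URI component so a quoted '/' in the
    password (e.g. pass%2Fword) survives parsing intact."""
    parts = urlparse(uri)
    conn_type = parts.scheme
    if conn_type == 'postgresql':
        conn_type = 'postgres'
    return {
        'conn_type': conn_type,
        'host': unquote(parts.hostname) if parts.hostname else '',
        'schema': parts.path[1:],
        'login': unquote(parts.username) if parts.username else None,
        'password': unquote(parts.password) if parts.password else None,
        'port': parts.port,
    }
```

With this approach, `s3://login:pass%2Fword@bucket` yields the password `pass/word` instead of the still-quoted `pass%2Fword`.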
[jira] [Updated] (AIRFLOW-2900) Code not visible for Packaged DAGs
[ https://issues.apache.org/jira/browse/AIRFLOW-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kaxil Naik updated AIRFLOW-2900: Affects Version/s: (was: Airflow 1.9.0) 1.10.0, 1.9.0

> Code not visible for Packaged DAGs
> ----------------------------------
>
> Key: AIRFLOW-2900
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2900
> Project: Apache Airflow
> Issue Type: Bug
> Components: webapp, webserver
> Affects Versions: 1.9.0, 1.10.0
> Reporter: Jacob Biesinger
> Assignee: Jacob Biesinger
> Priority: Minor
> Fix For: 1.10.1
>
> Packaged DAGs are present on the server as ZIP files. The [rendering
> code|https://github.com/apache/incubator-airflow/blob/a29fe350164937b28f525b46f7aecbc309665e5a/airflow/www/views.py#L668]
> is not aware of zip files and fails to show code for packaged apps.
>
> Easy fix: if .zip appears as a suffix in the path components, attempt to
> open the file using ZipFile.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-2900) Code not visible for Packaged DAGs
[ https://issues.apache.org/jira/browse/AIRFLOW-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kaxil Naik updated AIRFLOW-2900: Fix Version/s: 1.10.1 (quoted issue body identical to the preceding message omitted)
[jira] [Resolved] (AIRFLOW-2900) Code not visible for Packaged DAGs
[ https://issues.apache.org/jira/browse/AIRFLOW-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kaxil Naik resolved AIRFLOW-2900. Resolution: Fixed. Resolved by https://github.com/apache/incubator-airflow/pull/3749 (quoted issue body identical to the preceding message omitted)
[GitHub] kaxil closed pull request #3749: [AIRFLOW-2900] Show code for packaged DAGs
kaxil closed pull request #3749: [AIRFLOW-2900] Show code for packaged DAGs
URL: https://github.com/apache/incubator-airflow/pull/3749

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/airflow/models.py b/airflow/models.py
index 94e18794d6..ddf3094567 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -337,7 +337,8 @@ def process_file(self, filepath, only_if_updated=True, safe_mode=True):
             return found_dags
         mods = []
-        if not zipfile.is_zipfile(filepath):
+        is_zipfile = zipfile.is_zipfile(filepath)
+        if not is_zipfile:
             if safe_mode and os.path.isfile(filepath):
                 with open(filepath, 'rb') as f:
                     content = f.read()
@@ -409,7 +410,7 @@ def process_file(self, filepath, only_if_updated=True, safe_mode=True):
             if isinstance(dag, DAG):
                 if not dag.full_filepath:
                     dag.full_filepath = filepath
-                if dag.fileloc != filepath:
+                if dag.fileloc != filepath and not is_zipfile:
                     dag.fileloc = filepath
                 try:
                     dag.is_subdag = False
diff --git a/airflow/www/utils.py b/airflow/www/utils.py
index 9ce114d5ed..e85bc5909a 100644
--- a/airflow/www/utils.py
+++ b/airflow/www/utils.py
@@ -20,17 +20,21 @@
 # flake8: noqa: E402
 import inspect
 from future import standard_library
-standard_library.install_aliases()
+standard_library.install_aliases()  # noqa: E402
 from builtins import str, object
 from cgi import escape
 from io import BytesIO as IO
 import functools
 import gzip
+import io
 import json
+import os
+import re
 import time
 import wtforms
 from wtforms.compat import text_type
+import zipfile
 from flask import after_this_request, request, Response
 from flask_admin.model import filters
@@ -372,6 +376,22 @@ def zipper(response):
     return view_func


+def open_maybe_zipped(f, mode='r'):
+    """
+    Opens the given file. If the path contains a folder with a .zip suffix, then
+    the folder is treated as a zip archive, opening the file inside the archive.
+
+    :return: a file object, as in `open`, or as in `ZipFile.open`.
+    """
+
+    _, archive, filename = re.search(
+        r'((.*\.zip){})?(.*)'.format(re.escape(os.sep)), f).groups()
+    if archive and zipfile.is_zipfile(archive):
+        return zipfile.ZipFile(archive, mode=mode).open(filename)
+    else:
+        return io.open(f, mode=mode)
+
+
 def make_cache_key(*args, **kwargs):
     """
     Used by cache to get a unique key per URL
diff --git a/airflow/www/views.py b/airflow/www/views.py
index e1a7caa8bb..aa2530e458 100644
--- a/airflow/www/views.py
+++ b/airflow/www/views.py
@@ -661,7 +661,7 @@ def code(self):
         dag = dagbag.get_dag(dag_id)
         title = dag_id
         try:
-            with open(dag.fileloc, 'r') as f:
+            with wwwutils.open_maybe_zipped(dag.fileloc, 'r') as f:
                 code = f.read()
             html_code = highlight(
                 code, lexers.PythonLexer(), HtmlFormatter(linenos=True))
diff --git a/airflow/www_rbac/utils.py b/airflow/www_rbac/utils.py
index a0e9258eae..0176a5312c 100644
--- a/airflow/www_rbac/utils.py
+++ b/airflow/www_rbac/utils.py
@@ -26,6 +26,10 @@
 import wtforms
 import bleach
 import markdown
+import re
+import zipfile
+import os
+import io
 from builtins import str
 from past.builtins import basestring
@@ -202,6 +206,22 @@ def json_response(obj):
         mimetype="application/json")


+def open_maybe_zipped(f, mode='r'):
+    """
+    Opens the given file. If the path contains a folder with a .zip suffix, then
+    the folder is treated as a zip archive, opening the file inside the archive.
+
+    :return: a file object, as in `open`, or as in `ZipFile.open`.
+    """
+
+    _, archive, filename = re.search(
+        r'((.*\.zip){})?(.*)'.format(re.escape(os.sep)), f).groups()
+    if archive and zipfile.is_zipfile(archive):
+        return zipfile.ZipFile(archive, mode=mode).open(filename)
+    else:
+        return io.open(f, mode=mode)
+
+
 def make_cache_key(*args, **kwargs):
     """
     Used by cache to get a unique key per URL
diff --git a/airflow/www_rbac/views.py b/airflow/www_rbac/views.py
index d011724cc6..3dc3400968 100644
--- a/airflow/www_rbac/views.py
+++ b/airflow/www_rbac/views.py
@@ -400,7 +400,7 @@ def code(self):
         dag = dagbag.get_dag(dag_id)
         title = dag_id
         try:
-            with open(dag.fileloc, 'r') as f:
+            with wwwutils.open_maybe_zipped(dag.fileloc, 'r') as f:
                 code = f.read()
             html_code = highlight(
                 code, lexers.PythonLexer(),
[jira] [Commented] (AIRFLOW-2900) Code not visible for Packaged DAGs
[ https://issues.apache.org/jira/browse/AIRFLOW-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597881#comment-16597881 ] ASF GitHub Bot commented on AIRFLOW-2900: kaxil closed pull request #3749: [AIRFLOW-2900] Show code for packaged DAGs URL: https://github.com/apache/incubator-airflow/pull/3749 (duplicated PR diff omitted; identical to the preceding message)
[GitHub] kaxil commented on issue #3749: [AIRFLOW-2900] Show code for packaged DAGs
kaxil commented on issue #3749: [AIRFLOW-2900] Show code for packaged DAGs URL: https://github.com/apache/incubator-airflow/pull/3749#issuecomment-417451770 Awesome, Great work. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (AIRFLOW-2981) TypeError in dataflow operators when using GCS jar or py_file
[ https://issues.apache.org/jira/browse/AIRFLOW-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Payne updated AIRFLOW-2981: --- Description: The {{GoogleCloudBucketHelper.google_cloud_to_local}} function attempts to compare a list to an int, resulting in the TypeError, with: {noformat} ... path_components = file_name[self.GCS_PREFIX_LENGTH:].split('/') if path_components < 2: ... {noformat} This should be {{if len(path_components) < 2:}}. Also, fix {{if file_size > 0:}} in same function... was: The {{GoogleCloudBucketHelper.google_cloud_to_local}} function attempts to compare a list to an int, resulting in the TypeError, with: {noformat} ... path_components = file_name[self.GCS_PREFIX_LENGTH:].split('/') if path_components < 2: ... {noformat} This should be {{if len(path_components) < 2:}}. > TypeError in dataflow operators when using GCS jar or py_file > -- > > Key: AIRFLOW-2981 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2981 > Project: Apache Airflow > Issue Type: Bug > Components: contrib, Dataflow >Affects Versions: 1.9.0, 1.10 >Reporter: Jeffrey Payne >Assignee: Jeffrey Payne >Priority: Major > > The {{GoogleCloudBucketHelper.google_cloud_to_local}} function attempts to > compare a list to an int, resulting in the TypeError, with: > {noformat} > ... > path_components = file_name[self.GCS_PREFIX_LENGTH:].split('/') > if path_components < 2: > ... > {noformat} > This should be {{if len(path_components) < 2:}}. > Also, fix {{if file_size > 0:}} in same function... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work started] (AIRFLOW-2981) TypeError in dataflow operators when using GCS jar or py_file
[ https://issues.apache.org/jira/browse/AIRFLOW-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on AIRFLOW-2981 started by Jeffrey Payne. (quoted issue body identical to the preceding message omitted)
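The one-line fix described in the ticket, wrapped in a hypothetical standalone helper (the real code lives in `GoogleCloudBucketHelper.google_cloud_to_local`): compare `len(path_components)` to the int, not the list itself.

```python
GCS_PREFIX_LENGTH = 5  # len('gs://')


def split_gcs_path(file_name):
    """Sketch of the corrected guard: a GCS URI must contain at least a
    bucket and an object name after the gs:// prefix."""
    path_components = file_name[GCS_PREFIX_LENGTH:].split('/')
    # was: `if path_components < 2:` (a list/int comparison,
    # which raises TypeError on Python 3)
    if len(path_components) < 2:
        raise ValueError('Invalid GCS path: {}'.format(file_name))
    return path_components[0], '/'.join(path_components[1:])
```

The same reasoning applies to the `if file_size > 0:` check the reporter mentions: the comparison must be against a number, not an object of another type.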
[GitHub] feng-tao commented on issue #3821: [AIRFLOW-2983] Add prev_ds_nodash and next_ds_nodash macro
feng-tao commented on issue #3821: [AIRFLOW-2983] Add prev_ds_nodash and next_ds_nodash macro URL: https://github.com/apache/incubator-airflow/pull/3821#issuecomment-417446554 Thanks @r39132 for the feedback. I updated the description in the PR and JIRA; let me know if it looks OK to you.
[jira] [Commented] (AIRFLOW-2986) Airflow Worker does not reach sqs
[ https://issues.apache.org/jira/browse/AIRFLOW-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597832#comment-16597832 ] Shivakumar Gopalakrishnan commented on AIRFLOW-2986: Yes, I have. In fact, I have loaded the proxies in the airflow script into os.environ["http_proxy"], https_proxy, and no_proxy. Also, the scheduler is able to write to the queue; only the worker is not able to read it. I have checked the proxy logs, and they do show a tunnel connection to [eu-west-1.queue.amazonaws.com|https://eu-west-1.queue.amazonaws.com/] port 443. The only thing that comes to mind is this line in the worker banner: {{-- .> transport: sqs://localhost//}}. Is it looking for a queue named localhost? I tried debugging, but I was not able to figure out where this is being set.

> Airflow Worker does not reach sqs
> ---------------------------------
>
> Key: AIRFLOW-2986
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2986
> Project: Apache Airflow
> Issue Type: Bug
> Environment: amazon linux
> Reporter: Shivakumar Gopalakrishnan
> Priority: Major
>
> I am running the airflow worker service. The service is not able to connect
> to SQS. The scheduler is able to reach and write to the queue.
> Proxies are fine; I have implemented this in both Python 2.7 and 3.5, same
> issue.
> Copy of the log is below:
> {code}
> starting airflow-worker...
> /data/share/airflow
> /data/share/airflow/airflow.cfg
> [2018-08-30 15:41:44,367] \{settings.py:146} DEBUG - Setting up DB connection pool (PID 12304)
> [2018-08-30 15:41:44,367] \{settings.py:174} INFO - setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800
> [2018-08-30 15:41:44,468] \{__init__.py:42} DEBUG - Cannot import due to doesn't look like a module path
> [2018-08-30 15:41:44,875] \{__init__.py:51} INFO - Using executor CeleryExecutor
> [2018-08-30 15:41:44,886] \{cli_action_loggers.py:40} DEBUG - Adding to pre execution callback
> [2018-08-30 15:41:44,995] \{cli_action_loggers.py:64} DEBUG - Calling callbacks: []
> [2018-08-30 15:41:45,768] \{settings.py:146} DEBUG - Setting up DB connection pool (PID 12308)
> [2018-08-30 15:41:45,768] \{settings.py:174} INFO - setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800
> [2018-08-30 15:41:45,883] \{__init__.py:42} DEBUG - Cannot import due to doesn't look like a module path
> [2018-08-30 15:41:46,345] \{__init__.py:51} INFO - Using executor CeleryExecutor
> [2018-08-30 15:41:46,358] \{cli_action_loggers.py:40} DEBUG - Adding to pre execution callback
> [2018-08-30 15:41:46,476] \{cli_action_loggers.py:64} DEBUG - Calling callbacks: []
> Starting flask
> [2018-08-30 15:41:46,519] \{_internal.py:88} INFO - * Running on http://0.0.0.0:8793/ (Press CTRL+C to quit)
> [2018-08-30 15:43:58,779: CRITICAL/MainProcess] Unrecoverable error: Exception('Request Empty body HTTP 599 Failed to connect to eu-west-1.queue.amazonaws.com port 443: Connection timed out (None)',)
> Traceback (most recent call last):
> File "/usr/local/lib/python3.5/site-packages/celery/worker/worker.py", line 207, in start
>     self.blueprint.start(self)
> File "/usr/local/lib/python3.5/site-packages/celery/bootsteps.py", line 119, in start
>     step.start(parent)
> File "/usr/local/lib/python3.5/site-packages/celery/bootsteps.py", line 370, in start
>     return self.obj.start()
> File "/usr/local/lib/python3.5/site-packages/celery/worker/consumer/consumer.py", line 316, in start
>     blueprint.start(self)
> File "/usr/local/lib/python3.5/site-packages/celery/bootsteps.py", line 119, in start
>     step.start(parent)
> File "/usr/local/lib/python3.5/site-packages/celery/worker/consumer/consumer.py", line 592, in start
>     c.loop(*c.loop_args())
> File "/usr/local/lib/python3.5/site-packages/celery/worker/loops.py", line 91, in asynloop
>     next(loop)
> File "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/hub.py", line 354, in create_loop
>     cb(*cbargs)
> File "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/http/curl.py", line 114, in on_writable
>     return self._on_event(fd, _pycurl.CSELECT_OUT)
> File "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/http/curl.py", line 124, in _on_event
>     self._process_pending_requests()
> File "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/http/curl.py", line 132, in _process_pending_requests
>     self._process(curl, errno, reason)
> File "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/http/curl.py", line 178, in _process
>     buffer=buffer, effective_url=effective_url, error=error,
> File "/usr/local/lib/python3.5/site-packages/vine/promises.py", line 150, in __call__
>     svpending(*ca, **ck)
> File
[jira] [Commented] (AIRFLOW-2986) Airflow Worker does not reach sqs
[ https://issues.apache.org/jira/browse/AIRFLOW-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597828#comment-16597828 ] Ash Berlin-Taylor commented on AIRFLOW-2986: You mentioned you have a proxy? Does airflow worker run such that {{https_proxy}} environment variable is set? (quoted issue body and worker log identical to the preceding message omitted)
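One quick thing to rule out along the lines of Ash's question (a generic sketch, not Airflow-specific): since `airflow worker` often runs as a service, it may not inherit proxy variables exported in an interactive shell. A helper like this, run from inside the worker's process, shows which proxy settings the process actually sees:

```python
import os


def proxy_settings(environ=os.environ):
    """Return the proxy-related variables visible to this process.

    Checks the conventional lowercase and uppercase variable names,
    since different libraries consult different casings.
    """
    keys = ('http_proxy', 'https_proxy', 'no_proxy',
            'HTTP_PROXY', 'HTTPS_PROXY', 'NO_PROXY')
    return {k: environ[k] for k in keys if k in environ}
```

If the scheduler process reports the proxy variables but the worker process does not, that would explain the worker-only connection timeout.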
[GitHub] r39132 commented on issue #1875: [AIRFLOW-620] Add log refresh button to TI's log view page
r39132 commented on issue #1875: [AIRFLOW-620] Add log refresh button to TI's log view page URL: https://github.com/apache/incubator-airflow/pull/1875#issuecomment-417429399 @msumit What do you want to do with this PR? Close and re-open a fresh one, or rebase and ask for a review? If I don't hear back in a few days, I'll close this.
[GitHub] r39132 commented on issue #3821: [AIRFLOW-2983] Add prev_ds_nodash and next_ds_nodash macro
r39132 commented on issue #3821: [AIRFLOW-2983] Add prev_ds_nodash and next_ds_nodash macro URL: https://github.com/apache/incubator-airflow/pull/3821#issuecomment-417428702 @feng-tao can you provide a description for your change to the JIRA? The problem statement should be apparent to anyone in the community.
[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'
URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r213896240

## File path: airflow/contrib/hooks/bigquery_hook.py

@@ -566,95 +612,108 @@ def run_query(self,
                 'Airflow.', category=DeprecationWarning)
-        if sql is None:
-            raise TypeError('`BigQueryBaseCursor.run_query` missing 1 required '
-                            'positional argument: `sql`')
+        if not sql and not configuration['query'].get('query', None):
+            raise TypeError('`BigQueryBaseCursor.run_query` '
+                            'missing 1 required positional argument: `sql`')
+
+        # BigQuery also allows you to define how you want a table's schema
+        # to change as a side effect of a query job for more details:
+        # https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query.schemaUpdateOptions
-        # BigQuery also allows you to define how you want a table's schema to change
-        # as a side effect of a query job
-        # for more details:
-        # https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query.schemaUpdateOptions
         allowed_schema_update_options = [
             'ALLOW_FIELD_ADDITION', "ALLOW_FIELD_RELAXATION"
         ]
-        if not set(allowed_schema_update_options).issuperset(
-                set(schema_update_options)):
-            raise ValueError(
-                "{0} contains invalid schema update options. "
-                "Please only use one or more of the following options: {1}"
-                .format(schema_update_options, allowed_schema_update_options))
-        if use_legacy_sql is None:
-            use_legacy_sql = self.use_legacy_sql
+        if not set(allowed_schema_update_options
+                   ).issuperset(set(schema_update_options)):
+            raise ValueError("{0} contains invalid schema update options. "
+                             "Please only use one or more of the following "
+                             "options: {1}"
+                             .format(schema_update_options,
+                                     allowed_schema_update_options))
-        configuration = {
-            'query': {
-                'query': sql,
-                'useLegacySql': use_legacy_sql,
-                'maximumBillingTier': maximum_billing_tier,
-                'maximumBytesBilled': maximum_bytes_billed,
-                'priority': priority
-            }
-        }
+        if schema_update_options:
+            if write_disposition not in ["WRITE_APPEND", "WRITE_TRUNCATE"]:
+                raise ValueError("schema_update_options is only "
+                                 "allowed if write_disposition is "
+                                 "'WRITE_APPEND' or 'WRITE_TRUNCATE'.")
         if destination_dataset_table:
-            if '.' not in destination_dataset_table:
-                raise ValueError(
-                    'Expected destination_dataset_table name in the format of '
-                    '<dataset>.<table>. Got: {}'.format(
-                        destination_dataset_table))
             destination_project, destination_dataset, destination_table = \
                 _split_tablename(table_input=destination_dataset_table,
                                  default_project_id=self.project_id)
-            configuration['query'].update({
-                'allowLargeResults': allow_large_results,
-                'flattenResults': flatten_results,
-                'writeDisposition': write_disposition,
-                'createDisposition': create_disposition,
-                'destinationTable': {
-                    'projectId': destination_project,
-                    'datasetId': destination_dataset,
-                    'tableId': destination_table,
-                }
-            })
-        if udf_config:
-            if not isinstance(udf_config, list):
-                raise TypeError("udf_config argument must have a type 'list'"
-                                " not {}".format(type(udf_config)))
-            configuration['query'].update({
-                'userDefinedFunctionResources': udf_config
-            })
-        if query_params:
-            if self.use_legacy_sql:
-                raise ValueError("Query parameters are not allowed when using "
-                                 "legacy SQL")
-            else:
-                configuration['query']['queryParameters'] = query_params
+            destination_dataset_table = {
+                'projectId': destination_project,
+                'datasetId': destination_dataset,
+                'tableId': destination_table,
+            }
-        if labels:
-            configuration['labels'] = labels
+        query_param_list = [
+            (sql,
[jira] [Commented] (AIRFLOW-491) Add cache parameter in BigQuery query method
[ https://issues.apache.org/jira/browse/AIRFLOW-491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597733#comment-16597733 ] ASF GitHub Bot commented on AIRFLOW-491: xnuinside opened a new pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-491 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: Added "useQueryCache" from job BQ configuration https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. 
### Code Quality - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add cache parameter in BigQuery query method > > > Key: AIRFLOW-491 > URL: https://issues.apache.org/jira/browse/AIRFLOW-491 > Project: Apache Airflow > Issue Type: Bug > Components: contrib, gcp >Affects Versions: Airflow 1.7.1 >Reporter: Chris Riccomini >Assignee: Iuliia Volkova >Priority: Major > Fix For: Airflow 1.8 > > > The current BigQuery query() method does not have a user_query_cache > parameter. This param always defaults to true (see > [here|https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.query]). > I'd like to disable query caching for some data consistency checks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-491) Add cache parameter in BigQuery query method
[ https://issues.apache.org/jira/browse/AIRFLOW-491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597730#comment-16597730 ] ASF GitHub Bot commented on AIRFLOW-491: xnuinside closed pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/contrib/hooks/bigquery_hook.py b/airflow/contrib/hooks/bigquery_hook.py index e4c0653bfe..e4957b3831 100644 --- a/airflow/contrib/hooks/bigquery_hook.py +++ b/airflow/contrib/hooks/bigquery_hook.py @@ -24,6 +24,7 @@ import time from builtins import range +from copy import deepcopy from past.builtins import basestring @@ -195,10 +196,19 @@ class BigQueryBaseCursor(LoggingMixin): PEP 249 cursor isn't needed. 
""" -def __init__(self, service, project_id, use_legacy_sql=True): +def __init__(self, + service, + project_id, + use_legacy_sql=True, + api_resource_configs=None): + self.service = service self.project_id = project_id self.use_legacy_sql = use_legacy_sql +if api_resource_configs: +_validate_value("api_resource_configs", api_resource_configs, dict) +self.api_resource_configs = api_resource_configs \ +if api_resource_configs else {} self.running_job_id = None def create_empty_table(self, @@ -238,8 +248,7 @@ def create_empty_table(self, :return: """ -if time_partitioning is None: -time_partitioning = dict() + project_id = project_id if project_id is not None else self.project_id table_resource = { @@ -473,11 +482,11 @@ def create_external_table(self, def run_query(self, bql=None, sql=None, - destination_dataset_table=False, + destination_dataset_table=None, write_disposition='WRITE_EMPTY', allow_large_results=False, flatten_results=False, - udf_config=False, + udf_config=None, use_legacy_sql=None, maximum_billing_tier=None, maximum_bytes_billed=None, @@ -486,7 +495,8 @@ def run_query(self, labels=None, schema_update_options=(), priority='INTERACTIVE', - time_partitioning=None): + time_partitioning=None, + api_resource_configs=None): """ Executes a BigQuery SQL query. Optionally persists results in a BigQuery table. 
See here: @@ -550,12 +560,22 @@ def run_query(self, :type time_partitioning: dict """ +if not api_resource_configs: +api_resource_configs = self.api_resource_configs +else: +_validate_value('api_resource_configs', +api_resource_configs, dict) +configuration = deepcopy(api_resource_configs) +if 'query' not in configuration: +configuration['query'] = {} + +else: +_validate_value("api_resource_configs['query']", +configuration['query'], dict) -# TODO remove `bql` in Airflow 2.0 - Jira: [AIRFLOW-2513] -if time_partitioning is None: -time_partitioning = {} sql = bql if sql is None else sql +# TODO remove `bql` in Airflow 2.0 - Jira: [AIRFLOW-2513] if bql: import warnings warnings.warn('Deprecated parameter `bql` used in ' @@ -566,95 +586,109 @@ def run_query(self, 'Airflow.', category=DeprecationWarning) -if sql is None: -raise TypeError('`BigQueryBaseCursor.run_query` missing 1 required ' -'positional argument: `sql`') +if sql is None and not configuration['query'].get('query', None): +raise TypeError('`BigQueryBaseCursor.run_query` ' +'missing 1 required positional argument: `sql`') # BigQuery also allows you to define how you want a table's schema to change # as a side effect of a query job # for more details: # https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query.schemaUpdateOptions + allowed_schema_update_options = [ 'ALLOW_FIELD_ADDITION', "ALLOW_FIELD_RELAXATION" ] -if not
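The deepcopy-and-merge pattern in the diff — seed the job configuration from `api_resource_configs`, then fill in `sql` — can be sketched as below. The function name is hypothetical (the real logic lives inside `run_query`), and `useQueryCache` is the BigQuery job flag this PR exposes.

```python
from copy import deepcopy


def build_configuration(sql=None, api_resource_configs=None):
    # Sketch of the merge pattern from the diff: the user-supplied
    # api_resource_configs seed the job configuration; `sql` fills
    # configuration['query']['query'] when given.
    configuration = deepcopy(api_resource_configs or {})
    if 'query' not in configuration:
        configuration['query'] = {}
    # Mirror the patched error: require a query from one of the two sources.
    if sql is None and not configuration['query'].get('query'):
        raise TypeError('`run_query` missing 1 required '
                        'positional argument: `sql`')
    if sql is not None:
        configuration['query']['query'] = sql
    return configuration
```

With this shape, a caller could disable the query cache via `api_resource_configs={'query': {'useQueryCache': False}}` together with `sql`, which is the use case AIRFLOW-491 asks for; the `deepcopy` keeps the caller's dict unmodified.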
[GitHub] tedmiston edited a comment on issue #3656: [AIRFLOW-2803] Fix all ESLint issues
tedmiston edited a comment on issue #3656: [AIRFLOW-2803] Fix all ESLint issues URL: https://github.com/apache/incubator-airflow/pull/3656#issuecomment-417341304 @r39132 Yes, I've been working on the revisions discussed above and anticipate pushing up the next pass later today or tomorrow. Edit: The `npm run lint` command is still valid. I've updated CONTRIBUTING.md locally with a note and a workaround on how to use `npm run lint:fix` with the Jinja plugin.
[jira] [Created] (AIRFLOW-2987) "About" page version info is not available
Frank Maritato created AIRFLOW-2987: --- Summary: "About" page version info is not available Key: AIRFLOW-2987 URL: https://issues.apache.org/jira/browse/AIRFLOW-2987 Project: Apache Airflow Issue Type: Bug Affects Versions: 1.10.0 Reporter: Frank Maritato Attachments: Screen Shot 2018-08-30 at 10.17.52 AM.png From the Airflow 1.10.0 UI, click About and the resulting page shows the version and git version as "Not available". -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] feng-tao edited a comment on issue #3823: [AIRFLOW-2985] An operator for S3 object copying
feng-tao edited a comment on issue #3823: [AIRFLOW-2985] An operator for S3 object copying URL: https://github.com/apache/incubator-airflow/pull/3823#issuecomment-417384630 Thanks for the info. Then I think we should add a TODO note in the comment about the limitation on cross-bucket copying.
[GitHub] feng-tao commented on issue #3823: [AIRFLOW-2985] An operator for S3 object copying
feng-tao commented on issue #3823: [AIRFLOW-2985] An operator for S3 object copying URL: https://github.com/apache/incubator-airflow/pull/3823#issuecomment-417384630 Thanks for the info. Then I think we should mention a TODO note in the code about the limitation on cross-bucket copying.
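An S3 copy operator like the one discussed needs to split S3 locations into bucket and key before handing them to `boto3`'s `copy_object`. A small helper of that kind (hypothetical; not the PR's actual code) could be:

```python
from urllib.parse import urlparse


def parse_s3_url(s3_url):
    # Split "s3://bucket/key" into (bucket, key); hypothetical helper,
    # not taken from the PR under discussion.
    parsed = urlparse(s3_url)
    if parsed.scheme != 's3' or not parsed.netloc:
        raise ValueError('Not a valid S3 URL: {}'.format(s3_url))
    return parsed.netloc, parsed.path.lstrip('/')
```

The returned pair maps onto `boto3`'s plain-dict `CopySource={'Bucket': ..., 'Key': ...}` form, which works the same whether source and destination buckets differ or not.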
[jira] [Commented] (AIRFLOW-2986) Airflow Worker does not reach sqs
[ https://issues.apache.org/jira/browse/AIRFLOW-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597631#comment-16597631 ] Shivakumar Gopalakrishnan commented on AIRFLOW-2986: With {{curl https://eu-west-1.queue.amazonaws.com}} it reaches the environment; in fact, I have assigned IAM roles to the machine and I am able to read and write. > Airflow Worker does not reach sqs > - > > Key: AIRFLOW-2986 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2986 > Project: Apache Airflow > Issue Type: Bug > Environment: amazon linux >Reporter: Shivakumar Gopalakrishnan >Priority: Major > > I am running the airflow worker service. The service is not able to connect > to the sqs > The scheduler is able to reach and write to the queue > Proxies are fine; I have implemented this in both python 2.7 and 3.5 same > issue > Copy of the log is below > {code} > starting airflow-worker... > /data/share/airflow > /data/share/airflow/airflow.cfg > [2018-08-30 15:41:44,367] \{settings.py:146} DEBUG - Setting up DB connection > pool (PID 12304) > [2018-08-30 15:41:44,367] \{settings.py:174} INFO - setting.configure_orm(): > Using pool settings. pool_size=5, pool_recycle=1800 > [2018-08-30 15:41:44,468] \{__init__.py:42} DEBUG - Cannot import due to > doesn't look like a module path > [2018-08-30 15:41:44,875] \{__init__.py:51} INFO - Using executor > CeleryExecutor > [2018-08-30 15:41:44,886] \{cli_action_loggers.py:40} DEBUG - Adding > to pre execution callback > [2018-08-30 15:41:44,995] \{cli_action_loggers.py:64} DEBUG - Calling > callbacks: [] > [2018-08-30 15:41:45,768] \{settings.py:146} DEBUG - Setting up DB connection > pool (PID 12308) > [2018-08-30 15:41:45,768] \{settings.py:174} INFO - setting.configure_orm(): > Using pool settings. 
pool_size=5, pool_recycle=1800 > [2018-08-30 15:41:45,883] \{__init__.py:42} DEBUG - Cannot import due to > doesn't look like a module path > [2018-08-30 15:41:46,345] \{__init__.py:51} INFO - Using executor > CeleryExecutor > [2018-08-30 15:41:46,358] \{cli_action_loggers.py:40} DEBUG - Adding > to pre execution callback > [2018-08-30 15:41:46,476] \{cli_action_loggers.py:64} DEBUG - Calling > callbacks: [] > Starting flask > [2018-08-30 15:41:46,519] \{_internal.py:88} INFO - * Running on > http://0.0.0.0:8793/ (Press CTRL+C to quit) > [2018-08-30 15:43:58,779: CRITICAL/MainProcess] Unrecoverable error: > Exception('Request Empty body HTTP 599 Failed to connect to > eu-west-1.queue.amazonaws.com port 443: Connection timed out (None)',) > Traceback (most recent call last): > File "/usr/local/lib/python3.5/site-packages/celery/worker/worker.py", line > 207, in start > self.blueprint.start(self) > File "/usr/local/lib/python3.5/site-packages/celery/bootsteps.py", line 119, > in start > step.start(parent) > File "/usr/local/lib/python3.5/site-packages/celery/bootsteps.py", line 370, > in start > return self.obj.start() > File > "/usr/local/lib/python3.5/site-packages/celery/worker/consumer/consumer.py", > line 316, in start > blueprint.start(self) > File "/usr/local/lib/python3.5/site-packages/celery/bootsteps.py", line 119, > in start > step.start(parent) > File > "/usr/local/lib/python3.5/site-packages/celery/worker/consumer/consumer.py", > line 592, in start > c.loop(*c.loop_args()) > File "/usr/local/lib/python3.5/site-packages/celery/worker/loops.py", line > 91, in asynloop > next(loop) > File "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/hub.py", > line 354, in create_loop > cb(*cbargs) > File > "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/http/curl.py", > line 114, in on_writable > return self._on_event(fd, _pycurl.CSELECT_OUT) > File > "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/http/curl.py", > line 124, in _on_event 
> self._process_pending_requests() > File > "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/http/curl.py", > line 132, in _process_pending_requests > self._process(curl, errno, reason) > File > "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/http/curl.py", > line 178, in _process > buffer=buffer, effective_url=effective_url, error=error, > File "/usr/local/lib/python3.5/site-packages/vine/promises.py", line 150, in > __call__ > svpending(*ca, **ck) > File "/usr/local/lib/python3.5/site-packages/vine/promises.py", line 143, in > __call__ > return self.throw() > File "/usr/local/lib/python3.5/site-packages/vine/promises.py", line 140, in > __call__ > retval = fun(*final_args, **final_kwargs) > File "/usr/local/lib/python3.5/site-packages/vine/funtools.py", line 100, in > _transback > return callback(ret) > File "/usr/local/lib/python3.5/site-packages/vine/promises.py", line 143, in > __call__ > return self.throw() > File
[jira] [Commented] (AIRFLOW-2986) Airflow Worker does not reach sqs
[ https://issues.apache.org/jira/browse/AIRFLOW-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597626#comment-16597626 ] Ash Berlin-Taylor commented on AIRFLOW-2986: This sounds like a problem with the AWS networking for your worker instances. If you SSH to the instance can you run {{curl https://eu-west-1.queue.amazonaws.com}}? > Airflow Worker does not reach sqs > - > > Key: AIRFLOW-2986 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2986 > Project: Apache Airflow > Issue Type: Bug > Environment: amazon linux > Reporter: Shivakumar Gopalakrishnan > Priority: Major
[jira] [Updated] (AIRFLOW-2986) Airflow Worker does not reach sqs
[ https://issues.apache.org/jira/browse/AIRFLOW-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor updated AIRFLOW-2986: --- Description: I am running the airflow worker service. The service is not able to connect to the sqs The scheduler is able to reach and write to the queue Proxies are fine; I have implemented this in both python 2.7 and 3.5 same issue Copy of the log is below {code} starting airflow-worker... /data/share/airflow /data/share/airflow/airflow.cfg [2018-08-30 15:41:44,367] \{settings.py:146} DEBUG - Setting up DB connection pool (PID 12304) [2018-08-30 15:41:44,367] \{settings.py:174} INFO - setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800 [2018-08-30 15:41:44,468] \{__init__.py:42} DEBUG - Cannot import due to doesn't look like a module path [2018-08-30 15:41:44,875] \{__init__.py:51} INFO - Using executor CeleryExecutor [2018-08-30 15:41:44,886] \{cli_action_loggers.py:40} DEBUG - Adding to pre execution callback [2018-08-30 15:41:44,995] \{cli_action_loggers.py:64} DEBUG - Calling callbacks: [] [2018-08-30 15:41:45,768] \{settings.py:146} DEBUG - Setting up DB connection pool (PID 12308) [2018-08-30 15:41:45,768] \{settings.py:174} INFO - setting.configure_orm(): Using pool settings. 
pool_size=5, pool_recycle=1800 [2018-08-30 15:41:45,883] \{__init__.py:42} DEBUG - Cannot import due to doesn't look like a module path [2018-08-30 15:41:46,345] \{__init__.py:51} INFO - Using executor CeleryExecutor [2018-08-30 15:41:46,358] \{cli_action_loggers.py:40} DEBUG - Adding to pre execution callback [2018-08-30 15:41:46,476] \{cli_action_loggers.py:64} DEBUG - Calling callbacks: [] Starting flask [2018-08-30 15:41:46,519] \{_internal.py:88} INFO - * Running on http://0.0.0.0:8793/ (Press CTRL+C to quit) [2018-08-30 15:43:58,779: CRITICAL/MainProcess] Unrecoverable error: Exception('Request Empty body HTTP 599 Failed to connect to eu-west-1.queue.amazonaws.com port 443: Connection timed out (None)',) Traceback (most recent call last): File "/usr/local/lib/python3.5/site-packages/celery/worker/worker.py", line 207, in start self.blueprint.start(self) File "/usr/local/lib/python3.5/site-packages/celery/bootsteps.py", line 119, in start step.start(parent) File "/usr/local/lib/python3.5/site-packages/celery/bootsteps.py", line 370, in start return self.obj.start() File "/usr/local/lib/python3.5/site-packages/celery/worker/consumer/consumer.py", line 316, in start blueprint.start(self) File "/usr/local/lib/python3.5/site-packages/celery/bootsteps.py", line 119, in start step.start(parent) File "/usr/local/lib/python3.5/site-packages/celery/worker/consumer/consumer.py", line 592, in start c.loop(*c.loop_args()) File "/usr/local/lib/python3.5/site-packages/celery/worker/loops.py", line 91, in asynloop next(loop) File "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/hub.py", line 354, in create_loop cb(*cbargs) File "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/http/curl.py", line 114, in on_writable return self._on_event(fd, _pycurl.CSELECT_OUT) File "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/http/curl.py", line 124, in _on_event self._process_pending_requests() File 
"/usr/local/lib/python3.5/site-packages/kombu/asynchronous/http/curl.py", line 132, in _process_pending_requests self._process(curl, errno, reason) File "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/http/curl.py", line 178, in _process buffer=buffer, effective_url=effective_url, error=error, File "/usr/local/lib/python3.5/site-packages/vine/promises.py", line 150, in __call__ svpending(*ca, **ck) File "/usr/local/lib/python3.5/site-packages/vine/promises.py", line 143, in __call__ return self.throw() File "/usr/local/lib/python3.5/site-packages/vine/promises.py", line 140, in __call__ retval = fun(*final_args, **final_kwargs) File "/usr/local/lib/python3.5/site-packages/vine/funtools.py", line 100, in _transback return callback(ret) File "/usr/local/lib/python3.5/site-packages/vine/promises.py", line 143, in __call__ return self.throw() File "/usr/local/lib/python3.5/site-packages/vine/promises.py", line 140, in __call__ retval = fun(*final_args, **final_kwargs) File "/usr/local/lib/python3.5/site-packages/vine/funtools.py", line 98, in _transback callback.throw() File "/usr/local/lib/python3.5/site-packages/vine/funtools.py", line 96, in _transback ret = filter_(*args + (ret,), **kwargs) File "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/aws/connection.py", line 233, in _on_list_ready raise self._for_status(response, response.read()) Exception: Request Empty body HTTP 599 Failed to connect to eu-west-1.queue.amazonaws.com port 443: Connection timed out (None) -- celery@ip-10-92-19-197 v4.1.1 (latentcall) - --- * *** * --
[jira] [Created] (AIRFLOW-2986) Airflow Worker does not reach sqs
Shivakumar Gopalakrishnan created AIRFLOW-2986: -- Summary: Airflow Worker does not reach SQS Key: AIRFLOW-2986 URL: https://issues.apache.org/jira/browse/AIRFLOW-2986 Project: Apache Airflow Issue Type: Bug Environment: amazon linux Reporter: Shivakumar Gopalakrishnan

I am running the Airflow worker service. The service is not able to connect to SQS, while the scheduler is able to reach and write to the queue. Proxies are fine; I have tried this with both Python 2.7 and 3.5, same issue. A copy of the log is below:

{code}
starting airflow-worker... /data/share/airflow /data/share/airflow/airflow.cfg
[2018-08-30 15:41:44,367] {settings.py:146} DEBUG - Setting up DB connection pool (PID 12304)
[2018-08-30 15:41:44,367] {settings.py:174} INFO - setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800
[2018-08-30 15:41:44,468] {__init__.py:42} DEBUG - Cannot import due to doesn't look like a module path
[2018-08-30 15:41:44,875] {__init__.py:51} INFO - Using executor CeleryExecutor
[2018-08-30 15:41:44,886] {cli_action_loggers.py:40} DEBUG - Adding to pre execution callback
[2018-08-30 15:41:44,995] {cli_action_loggers.py:64} DEBUG - Calling callbacks: []
[2018-08-30 15:41:45,768] {settings.py:146} DEBUG - Setting up DB connection pool (PID 12308)
[2018-08-30 15:41:45,768] {settings.py:174} INFO - setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800
[2018-08-30 15:41:45,883] {__init__.py:42} DEBUG - Cannot import due to doesn't look like a module path
[2018-08-30 15:41:46,345] {__init__.py:51} INFO - Using executor CeleryExecutor
[2018-08-30 15:41:46,358] {cli_action_loggers.py:40} DEBUG - Adding to pre execution callback
[2018-08-30 15:41:46,476] {cli_action_loggers.py:64} DEBUG - Calling callbacks: []
Starting flask
[2018-08-30 15:41:46,519] {_internal.py:88} INFO - * Running on http://0.0.0.0:8793/ (Press CTRL+C to quit)
[2018-08-30 15:43:58,779: CRITICAL/MainProcess] Unrecoverable error: Exception('Request Empty body HTTP 599 Failed to connect to eu-west-1.queue.amazonaws.com port 443: Connection timed out (None)',)
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/celery/worker/worker.py", line 207, in start
    self.blueprint.start(self)
  File "/usr/local/lib/python3.5/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/usr/local/lib/python3.5/site-packages/celery/bootsteps.py", line 370, in start
    return self.obj.start()
  File "/usr/local/lib/python3.5/site-packages/celery/worker/consumer/consumer.py", line 316, in start
    blueprint.start(self)
  File "/usr/local/lib/python3.5/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/usr/local/lib/python3.5/site-packages/celery/worker/consumer/consumer.py", line 592, in start
    c.loop(*c.loop_args())
  File "/usr/local/lib/python3.5/site-packages/celery/worker/loops.py", line 91, in asynloop
    next(loop)
  File "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/hub.py", line 354, in create_loop
    cb(*cbargs)
  File "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/http/curl.py", line 114, in on_writable
    return self._on_event(fd, _pycurl.CSELECT_OUT)
  File "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/http/curl.py", line 124, in _on_event
    self._process_pending_requests()
  File "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/http/curl.py", line 132, in _process_pending_requests
    self._process(curl, errno, reason)
  File "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/http/curl.py", line 178, in _process
    buffer=buffer, effective_url=effective_url, error=error,
  File "/usr/local/lib/python3.5/site-packages/vine/promises.py", line 150, in __call__
    svpending(*ca, **ck)
  File "/usr/local/lib/python3.5/site-packages/vine/promises.py", line 143, in __call__
    return self.throw()
  File "/usr/local/lib/python3.5/site-packages/vine/promises.py", line 140, in __call__
    retval = fun(*final_args, **final_kwargs)
  File "/usr/local/lib/python3.5/site-packages/vine/funtools.py", line 100, in _transback
    return callback(ret)
  File "/usr/local/lib/python3.5/site-packages/vine/promises.py", line 143, in __call__
    return self.throw()
  File "/usr/local/lib/python3.5/site-packages/vine/promises.py", line 140, in __call__
    retval = fun(*final_args, **final_kwargs)
  File "/usr/local/lib/python3.5/site-packages/vine/funtools.py", line 98, in _transback
    callback.throw()
  File "/usr/local/lib/python3.5/site-packages/vine/funtools.py", line 96, in _transback
    ret = filter_(*args + (ret,), **kwargs)
  File "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/aws/connection.py", line 233, in _on_list_ready
    raise self._for_status(response, response.read())
Exception: Request Empty body HTTP 599 Failed to connect to
{code}
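Since the scheduler can reach the queue but the worker cannot, a useful first step is to check from the worker host whether a raw TCP connection to the SQS endpoint succeeds at all, and what proxy settings the worker's environment actually carries. The sketch below is a diagnostic aid, not part of Airflow: only the endpoint host comes from the traceback above; everything else is an assumption.

```python
# Minimal connectivity probe for the SQS endpoint the worker times out on.
# The endpoint is taken from the traceback; proxy variable names are just
# the common ones and may differ in your environment.
import os
import socket

ENDPOINT = "eu-west-1.queue.amazonaws.com"


def can_reach(host, port=443, timeout=5):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # kombu's curl-based client and boto may resolve proxies differently,
    # so print both spellings of the proxy variables for comparison.
    for var in ("https_proxy", "HTTPS_PROXY", "no_proxy", "NO_PROXY"):
        print(var, "=", os.environ.get(var))
    print("reachable:", can_reach(ENDPOINT))
```

If the probe fails on the worker host but succeeds where the scheduler runs, the problem is network/proxy configuration rather than Airflow itself.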
[GitHub] XD-DENG commented on issue #3823: [AIRFLOW-2985] An operator for S3 object copying
XD-DENG commented on issue #3823: [AIRFLOW-2985] An operator for S3 object copying URL: https://github.com/apache/incubator-airflow/pull/3823#issuecomment-417363095 Thanks @ashb for following up & clarifying. My intention for this PR is to provide the same `copy_object()` feature that `boto3` offers, for which the current implementation should suffice. BTW, I have made changes based on your earlier reviews. Will re-push to this PR after my isolated tests pass. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] jakebiesinger commented on issue #3749: [AIRFLOW-2900] Show code for packaged DAGs
jakebiesinger commented on issue #3749: [AIRFLOW-2900] Show code for packaged DAGs URL: https://github.com/apache/incubator-airflow/pull/3749#issuecomment-417359964 Thanks, @kaxil. Looks like I had a bad merge when I squashed my commit. Should be working now.
[GitHub] jakahn commented on issue #3805: [AIRFLOW-2062] Add per-connection KMS encryption.
jakahn commented on issue #3805: [AIRFLOW-2062] Add per-connection KMS encryption. URL: https://github.com/apache/incubator-airflow/pull/3805#issuecomment-417359277 @bolkedebruin @gerardo, thanks for the feedback. The intent is that this implementation _should_ be able to support other KMSs in the future; what aspects were you concerned about regarding Amazon KMS integration? For example, an AWS KMS hook could be added later (similar to `GcpKmsHook` now) following the `KmsApiHook` interface (in addition to supporting any AWS-specific features), and then registered in the list of supported KMSs in `get_kms_hook` (`models.py`, line 883 in this PR). You would then be able to choose between AWS or GCP KMS on a per-connection basis. The reason the `kms_*` fields are not stored as part of the `extra` field is so that you can encrypt *any* connection via KMS-managed credentials (not just Google connections). Since other connections may not use JSON extras, we didn't want to mess with their extra data.
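The dispatch described above can be sketched as follows. This is an illustrative mock of the pattern the comment describes, not the PR's actual code: `KmsApiHook`, `GcpKmsHook`, and `get_kms_hook` are names from the discussion, but every method body and the registry dict below are hypothetical stand-ins.

```python
# Illustrative sketch of per-connection KMS backend dispatch. Only the names
# KmsApiHook, GcpKmsHook, and get_kms_hook come from the PR discussion; the
# bodies are fakes standing in for real KMS API calls.
import abc


class KmsApiHook(abc.ABC):
    """Common interface every KMS backend hook must implement."""

    @abc.abstractmethod
    def encrypt(self, plaintext: bytes, key_id: str) -> bytes: ...

    @abc.abstractmethod
    def decrypt(self, ciphertext: bytes, key_id: str) -> bytes: ...


class GcpKmsHook(KmsApiHook):
    """Stand-in for the GCP hook; a real one would call the Cloud KMS API."""

    def encrypt(self, plaintext, key_id):
        return b"gcp:" + plaintext  # fake transformation for illustration

    def decrypt(self, ciphertext, key_id):
        return ciphertext[len(b"gcp:"):]


# An AWS hook implementing KmsApiHook would simply be registered here too.
_KMS_BACKENDS = {"gcp_kms": GcpKmsHook}


def get_kms_hook(kms_type: str) -> KmsApiHook:
    """Per-connection dispatch to the configured KMS backend."""
    try:
        return _KMS_BACKENDS[kms_type]()
    except KeyError:
        raise ValueError("unsupported KMS backend: %s" % kms_type)
```

The point of the interface is that adding a backend is a registration, not a change to the connection model.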
[GitHub] feng-tao commented on issue #3823: [AIRFLOW-2985] An operator for S3 object copying
feng-tao commented on issue #3823: [AIRFLOW-2985] An operator for S3 object copying URL: https://github.com/apache/incubator-airflow/pull/3823#issuecomment-417358938 Different S3 buckets may only be accessible via different IAM roles. How does this operator work with different IAM roles?
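One common pattern for the cross-role case (a sketch of how it can be done in general, not what this PR implements) is to assume a single role that is granted read on the source bucket and write on the destination, then build the S3 client from the temporary credentials. All ARNs and names below are placeholders.

```python
# Sketch: cross-account S3 object copy via an assumed role. The role ARN,
# bucket names, and keys are placeholders, and this is not the operator's
# actual implementation.


def copy_source(bucket: str, key: str, version_id=None) -> dict:
    """Build the CopySource argument expected by S3 copy_object."""
    src = {"Bucket": bucket, "Key": key}
    if version_id:
        src["VersionId"] = version_id
    return src


def copy_with_role(role_arn, src_bucket, src_key, dest_bucket, dest_key):
    """Assume a role that can read the source and write the destination,
    then perform a server-side copy."""
    import boto3  # deferred so copy_source stays usable without boto3

    creds = boto3.client("sts").assume_role(
        RoleArn=role_arn, RoleSessionName="s3-copy")["Credentials"]
    s3 = boto3.client(
        "s3",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    s3.copy_object(CopySource=copy_source(src_bucket, src_key),
                   Bucket=dest_bucket, Key=dest_key)
```

When no single role can span both buckets, the copy degrades to a download with one role and an upload with another, which `copy_object` alone cannot express.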
[GitHub] dimberman commented on issue #3797: [AIRFLOW-2952] Splits CI into k8s + docker-compose
dimberman commented on issue #3797: [AIRFLOW-2952] Splits CI into k8s + docker-compose URL: https://github.com/apache/incubator-airflow/pull/3797#issuecomment-417349648 @Fokko @gerardo Quick update. I've still been running into weird minikube issues and have been unable to get the CI to build properly. This has become a blocker for me implementing/PRing fixes for the k8s executor, and the bug reports are starting to pile up. Could we revert the dockerized CI and then re-merge it once we get it working with k8s? I'm working with the k8s-kubeadm-dind folks, as I think the best way forward might be to switch to that.
[GitHub] tedmiston commented on issue #3656: [AIRFLOW-2803] Fix all ESLint issues
tedmiston commented on issue #3656: [AIRFLOW-2803] Fix all ESLint issues URL: https://github.com/apache/incubator-airflow/pull/3656#issuecomment-417341304 @r39132 Yes, I've been working on the revisions discussed above and anticipate pushing up the next pass later today or tomorrow.
[GitHub] msumit closed pull request #3824: [AIRFLOW-XXX] Add Format to company list
msumit closed pull request #3824: [AIRFLOW-XXX] Add Format to company list URL: https://github.com/apache/incubator-airflow/pull/3824 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/README.md b/README.md
index 275f6cc600..e911225aee 100644
--- a/README.md
+++ b/README.md
@@ -153,6 +153,7 @@ Currently **officially** using Airflow:
 1. [eRevalue](https://www.datamaran.com) [[@hamedhsn](https://github.com/hamedhsn)]
 1. [evo.company](https://evo.company/) [[@orhideous](https://github.com/orhideous)]
 1. [Flipp](https://www.flipp.com) [[@sethwilsonwishabi](https://github.com/sethwilsonwishabi)]
+1. [Format](https://www.format.com) [[@format](https://github.com/4ormat) & [@jasonicarter](https://github.com/jasonicarter)]
 1. [FreshBooks](https://github.com/freshbooks) [[@DinoCow](https://github.com/DinoCow)]
 1. [Fundera](https://fundera.com) [[@andyxhadji](https://github.com/andyxhadji)]
 1. [G Adventures](https://gadventures.com) [[@samuelmullin](https://github.com/samuelmullin)]
[GitHub] codecov-io edited a comment on issue #3824: [AIRFLOW-XXX] Add Format to company list
codecov-io edited a comment on issue #3824: [AIRFLOW-XXX] Add Format to company list URL: https://github.com/apache/incubator-airflow/pull/3824#issuecomment-417325998 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3824?src=pr=h1) Report > Merging [#3824](https://codecov.io/gh/apache/incubator-airflow/pull/3824?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/274f093da42d300b4295b5489013a65439fc11e4?src=pr=desc) will **not change** coverage. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3824/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3824?src=pr=tree)

```diff
@@           Coverage Diff           @@
##           master    #3824   +/-   ##
=======================================
  Coverage   77.41%   77.41%
=======================================
  Files         203      203
  Lines       15821    15821
=======================================
  Hits        12248    12248
  Misses       3573     3573
```

-- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3824?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3824?src=pr=footer). Last update [274f093...085113f](https://codecov.io/gh/apache/incubator-airflow/pull/3824?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] mascah edited a comment on issue #2946: [AIRFLOW-1927] Convert naive datetimes for TaskInstances
mascah edited a comment on issue #2946: [AIRFLOW-1927] Convert naive datetimes for TaskInstances URL: https://github.com/apache/incubator-airflow/pull/2946#issuecomment-417314553 @Fokko @bolkedebruin Can you guys please confirm that this was intended to change the output of macros? In 1.9, {{ ts_nodash }} would output `20160317T00`, in 1.10 it outputs `20160317T00+`. I would consider this a breaking change since previously generated file names using the result of this macro are now incompatible. Am I missing something?
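As an illustrative workaround for the macro change above (not an official fix, and the helper name is made up): a DAG can render the stamp itself from `execution_date` via a user-defined macro, so generated file names do not depend on which Airflow version formats `ts_nodash` or on the trailing offset that 1.10 adds.

```python
# Hypothetical helper: render execution_date without separators and without
# any timezone offset, regardless of Airflow version.
from datetime import datetime


def ts_nodash_naive(execution_date: datetime) -> str:
    """Compact timestamp with no '+00:00'-style suffix."""
    return execution_date.strftime("%Y%m%dT%H%M%S")


# Usage sketch: register as a user-defined macro and template with
# {{ ts_nodash_naive(execution_date) }} instead of {{ ts_nodash }}.
# dag = DAG(..., user_defined_macros={"ts_nodash_naive": ts_nodash_naive})
```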
[GitHub] jasonicarter opened a new pull request #3824: [AIRFLOW-XXX] Add Format to company list
jasonicarter opened a new pull request #3824: [AIRFLOW-XXX] Add Format to company list URL: https://github.com/apache/incubator-airflow/pull/3824 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-XXX - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: Add Format to the official list of companies in README.md ### Tests - [x] None. Update to README.md ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] None. Update to README.md ### Code Quality - [x] Not a code change
[jira] [Comment Edited] (AIRFLOW-2966) KubernetesExecutor + namespace quotas kills scheduler if the pod can't be launched
[ https://issues.apache.org/jira/browse/AIRFLOW-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597385#comment-16597385 ] Roland de Boo edited comment on AIRFLOW-2966 at 8/30/18 12:40 PM: -- Colleague of John here. Some additional info: * Updated to 1.10.0 and retried, same issue remains * Last observation in the log (not mentioned above): {{[2018-08-30 12:19:46,967] \{jobs.py:1585} INFO - Exited execute loop}} In the Pod I can see 2 other threads remaining, but they don't seem to do anything. {{$ ps -ef}} {{airflow 16 1 0 12:19 ? 00:00:02 /usr/local/bin/python /usr/local/bin/airflow scheduler -n -1}} {{airflow 38 16 0 12:19 ? 00:00:00 /usr/local/bin/python /usr/local/bin/airflow scheduler -n -1}} The Pod is stuck but does not exit, so we need to kill it by hand. If we increase the quota on the namespace, nothing happens to the scheduler. Steps to reproduce: set a pod quota on your namespace. First count the current number of pods and set the quota to that value.
{code:yaml}
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
spec:
  hard:
    pods: "4"
{code}
Then try to schedule a task.
> KubernetesExecutor + namespace quotas kills scheduler if the pod can't be > launched > -- > > Key: AIRFLOW-2966 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2966 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: 1.10 > Environment: Kubernetes 1.9.8 >Reporter: John Hofman >Priority: Major > > When running Airflow in Kubernetes with the KubernetesExecutor and resource > quota's set on the namespace Airflow is deployed in. If the scheduler tries > to launch a pod into the namespace that exceeds the namespace limits it gets > an ApiException, and crashes the scheduler. > This stack trace is an example of the ApiException from the kubernetes client: > {code:java} > [2018-08-27 09:51:08,516] {pod_launcher.py:58} ERROR - Exception when > attempting to create Namespaced Pod. > Traceback (most recent call last): > File "/src/apache-airflow/airflow/contrib/kubernetes/pod_launcher.py", line > 55, in run_pod_async > resp = self._client.create_namespaced_pod(body=req, namespace=pod.namespace) > File > "/usr/local/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", > line 6057, in create_namespaced_pod > (data) = self.create_namespaced_pod_with_http_info(namespace, body, **kwargs) > File > "/usr/local/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", > line 6142, in create_namespaced_pod_with_http_info > collection_formats=collection_formats) > File > "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", > line 321, in call_api > _return_http_data_only, collection_formats, _preload_content, > _request_timeout) > File > "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", > line 155, in __call_api > _request_timeout=_request_timeout) > File > "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", > line 364, in request > body=body) > File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line > 266, in POST > body=body) > 
File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line > 222, in request > raise ApiException(http_resp=r) > kubernetes.client.rest.ApiException: (403) > Reason: Forbidden > HTTP response headers: HTTPHeaderDict({'Audit-Id': > 'b00e2cbb-bdb2-41f3-8090-824aee79448c', 'Content-Type': 'application/json', > 'Date': 'Mon, 27 Aug 2018 09:51:08 GMT', 'Content-Length': '410'}) > HTTP response body: > {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods > \"podname-ec366e89ef934d91b2d3ffe96234a725\" is forbidden: exceeded quota: > compute-resources, requested: limits.memory=4Gi, used: limits.memory=6508Mi, > limited: > limits.memory=10Gi","reason":"Forbidden","details":{"name":"podname-ec366e89ef934d91b2d3ffe96234a725","kind":"pods"},"code":403}{code} > > I would expect the scheduler to catch the Exception and at least mark the > task as failed, or better yet retry the task later.
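The behavior the report asks for can be sketched as follows: instead of letting the quota 403 propagate out of pod creation and kill the scheduler loop, the launcher could catch it and requeue the task. This is illustrative only; the class below is a stand-in for `kubernetes.client.rest.ApiException`, and `run_pod_async` here is not Airflow's actual function body.

```python
# Sketch of quota-tolerant pod launching. ApiException is a local stand-in
# for kubernetes.client.rest.ApiException so the example is self-contained.


class ApiException(Exception):
    def __init__(self, status):
        super().__init__("(%d)" % status)
        self.status = status


def run_pod_async(create_pod, task_id, retry_queue):
    """Try to launch a pod; on quota rejection (HTTP 403), requeue the
    task instead of crashing the scheduler loop."""
    try:
        create_pod()
        return "launched"
    except ApiException as exc:
        if exc.status == 403:  # quota exceeded: back off and retry later
            retry_queue.append(task_id)
            return "requeued"
        raise  # any other API failure is still surfaced
```

A real fix would also need backoff so a full namespace does not turn into a hot retry loop.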
[GitHub] codecov-io edited a comment on issue #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'
codecov-io edited a comment on issue #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#issuecomment-413105867 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3733?src=pr=h1) Report > Merging [#3733](https://codecov.io/gh/apache/incubator-airflow/pull/3733?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/ac9033db0981ae1f770a8bdb5597055751ab15bd?src=pr=desc) will **decrease** coverage by `<.01%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3733/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3733?src=pr=tree)

```diff
@@            Coverage Diff            @@
##           master   #3733      +/-   ##
=========================================
- Coverage   77.41%   77.4%   -0.01%
=========================================
  Files         203     203
  Lines       15817   15817
=========================================
- Hits        12244   12243       -1
- Misses       3573    3574       +1
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3733?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3733/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=) | `88.74% <0%> (-0.05%)` | :arrow_down: |

-- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3733?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3733?src=pr=footer). Last update [ac9033d...07ee01b](https://codecov.io/gh/apache/incubator-airflow/pull/3733?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] bolkedebruin commented on issue #3526: [AIRFLOW-2651] Add file system hooks with a common interface
bolkedebruin commented on issue #3526: [AIRFLOW-2651] Add file system hooks with a common interface URL: https://github.com/apache/incubator-airflow/pull/3526#issuecomment-417304220 @jrderuiter I like the possibilities that this will deliver, but I think some architectural updates are required. The `lineage` improvements basically allow for the same kind of functionality, and these changes will need to tie in with it. Maybe a discussion offline or on the mailing list (or an improvement proposal) can speed this up?
[jira] [Comment Edited] (AIRFLOW-2966) KubernetesExecutor + namespace quotas kills scheduler if the pod can't be launched
[ https://issues.apache.org/jira/browse/AIRFLOW-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597385#comment-16597385 ] Roland de Boo edited comment on AIRFLOW-2966 at 8/30/18 12:35 PM: -- Colleague of John here. Some additional info: * Updated to 1.10.0 and retried, same issue remains * Last observation in the log (not mentioned above): {{[2018-08-30 12:19:46,967] \{jobs.py:1585} INFO - Exited execute loop}} In the Pod I can see 2 other threads remaining, but they don't seem to do anything. {{$ ps -ef}} {{airflow 16 1 0 12:19 ? 00:00:02 /usr/local/bin/python /usr/local/bin/airflow scheduler -n -1}} {{airflow 38 16 0 12:19 ? 00:00:00 /usr/local/bin/python /usr/local/bin/airflow scheduler -n -1}} The Pod is stuck but does not exit. So we need to kill it by hand. If we increase the quota on the namespace, nothing happens to the scheduler. > KubernetesExecutor + namespace quotas kills scheduler if the pod can't be > launched > -- > > Key: AIRFLOW-2966 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2966 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: 1.10 > Environment: Kubernetes 1.9.8 >Reporter: John Hofman >Priority: Major > > When running Airflow in Kubernetes with the KubernetesExecutor and resource > quota's set on the namespace Airflow is deployed in. 
If the scheduler tries > to launch a pod into the namespace that exceeds the namespace limits it gets > an ApiException, and crashes the scheduler. > This stack trace is an example of the ApiException from the kubernetes client: > {code:java} > [2018-08-27 09:51:08,516] {pod_launcher.py:58} ERROR - Exception when > attempting to create Namespaced Pod. > Traceback (most recent call last): > File "/src/apache-airflow/airflow/contrib/kubernetes/pod_launcher.py", line > 55, in run_pod_async > resp = self._client.create_namespaced_pod(body=req, namespace=pod.namespace) > File > "/usr/local/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", > line 6057, in create_namespaced_pod > (data) = self.create_namespaced_pod_with_http_info(namespace, body, **kwargs) > File > "/usr/local/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", > line 6142, in create_namespaced_pod_with_http_info > collection_formats=collection_formats) > File > "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", > line 321, in call_api > _return_http_data_only, collection_formats, _preload_content, > _request_timeout) > File > "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", > line 155, in __call_api > _request_timeout=_request_timeout) > File > "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", > line 364, in request > body=body) > File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line > 266, in POST > body=body) > File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line > 222, in request > raise ApiException(http_resp=r) > kubernetes.client.rest.ApiException: (403) > Reason: Forbidden > HTTP response headers: HTTPHeaderDict({'Audit-Id': > 'b00e2cbb-bdb2-41f3-8090-824aee79448c', 'Content-Type': 'application/json', > 'Date': 'Mon, 27 Aug 2018 09:51:08 GMT', 'Content-Length': '410'}) > HTTP response body: > 
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods > \"podname-ec366e89ef934d91b2d3ffe96234a725\" is forbidden: exceeded quota: > compute-resources, requested: limits.memory=4Gi, used: limits.memory=6508Mi, > limited: > limits.memory=10Gi","reason":"Forbidden","details":{"name":"podname-ec366e89ef934d91b2d3ffe96234a725","kind":"pods"},"code":403}{code} > > I would expect the scheduler to catch the Exception and at least mark the > task as failed, or better yet retry the task later. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] codecov-io edited a comment on issue #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'
codecov-io edited a comment on issue #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#issuecomment-413105867 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3733?src=pr=h1) Report > Merging [#3733](https://codecov.io/gh/apache/incubator-airflow/pull/3733?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/ac9033db0981ae1f770a8bdb5597055751ab15bd?src=pr=desc) will **decrease** coverage by `0.04%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3733/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3733?src=pr=tree)

```diff
@@            Coverage Diff             @@
##           master    #3733      +/-   ##
==========================================
- Coverage   77.41%   77.36%   -0.05%
==========================================
  Files         203      203
  Lines       15817    15817
==========================================
- Hits        12244    12237       -7
- Misses       3573     3580       +7
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3733?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/utils/sqlalchemy.py](https://codecov.io/gh/apache/incubator-airflow/pull/3733/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9zcWxhbGNoZW15LnB5) | `75.71% <0%> (-5.72%)` | :arrow_down: |
| [airflow/www/views.py](https://codecov.io/gh/apache/incubator-airflow/pull/3733/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvdmlld3MucHk=) | `69.18% <0%> (-0.13%)` | :arrow_down: |
| [airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3733/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=) | `88.74% <0%> (-0.05%)` | :arrow_down: |

-- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3733?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3733?src=pr=footer). Last update [ac9033d...07ee01b](https://codecov.io/gh/apache/incubator-airflow/pull/3733?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[jira] [Commented] (AIRFLOW-2966) KubernetesExecutor + namespace quotas kills scheduler if the pod can't be launched
[ https://issues.apache.org/jira/browse/AIRFLOW-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597385#comment-16597385 ] Roland de Boo commented on AIRFLOW-2966:

Colleague of John here. Some additional info:
* Updated to 1.10.0 and retried; the same issue remains
* Last observation in the log (not mentioned above): {{[2018-08-30 12:19:46,967] {jobs.py:1585} INFO - Exited execute loop}}

In the Pod I can see two other scheduler processes remaining, but they don't seem to do anything:

{{$ ps -ef}}
{{airflow 16  1 0 12:19 ? 00:00:02 /usr/local/bin/python /usr/local/bin/airflow scheduler -n -1}}
{{airflow 38 16 0 12:19 ? 00:00:00 /usr/local/bin/python /usr/local/bin/airflow scheduler -n -1}}

The Pod is stuck but does not exit, so we need to kill it by hand.

> KubernetesExecutor + namespace quotas kills scheduler if the pod can't be
> launched
> --------------------------------------------------------------------------
>
>                 Key: AIRFLOW-2966
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2966
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 1.10
>        Environment: Kubernetes 1.9.8
>           Reporter: John Hofman
>           Priority: Major
>
> When running Airflow in Kubernetes with the KubernetesExecutor and resource
> quotas set on the namespace Airflow is deployed in, the scheduler gets an
> ApiException if it tries to launch a pod that exceeds the namespace limits,
> and the exception crashes the scheduler.
> This stack trace is an example of the ApiException from the kubernetes client:
> {code:java}
> [2018-08-27 09:51:08,516] {pod_launcher.py:58} ERROR - Exception when
> attempting to create Namespaced Pod.
> Traceback (most recent call last):
>   File "/src/apache-airflow/airflow/contrib/kubernetes/pod_launcher.py", line 55, in run_pod_async
>     resp = self._client.create_namespaced_pod(body=req, namespace=pod.namespace)
>   File "/usr/local/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 6057, in create_namespaced_pod
>     (data) = self.create_namespaced_pod_with_http_info(namespace, body, **kwargs)
>   File "/usr/local/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 6142, in create_namespaced_pod_with_http_info
>     collection_formats=collection_formats)
>   File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 321, in call_api
>     _return_http_data_only, collection_formats, _preload_content, _request_timeout)
>   File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 155, in __call_api
>     _request_timeout=_request_timeout)
>   File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 364, in request
>     body=body)
>   File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 266, in POST
>     body=body)
>   File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 222, in request
>     raise ApiException(http_resp=r)
> kubernetes.client.rest.ApiException: (403)
> Reason: Forbidden
> HTTP response headers: HTTPHeaderDict({'Audit-Id': 'b00e2cbb-bdb2-41f3-8090-824aee79448c', 'Content-Type': 'application/json', 'Date': 'Mon, 27 Aug 2018 09:51:08 GMT', 'Content-Length': '410'})
> HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"podname-ec366e89ef934d91b2d3ffe96234a725\" is forbidden: exceeded quota: compute-resources, requested: limits.memory=4Gi, used: limits.memory=6508Mi, limited: limits.memory=10Gi","reason":"Forbidden","details":{"name":"podname-ec366e89ef934d91b2d3ffe96234a725","kind":"pods"},"code":403}
> {code}
>
> I would expect the scheduler to catch the Exception and at least mark the
> task as failed, or better yet retry the task later.
>
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
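The fix the reporter asks for amounts to catching the `ApiException` at pod-creation time instead of letting it escape the scheduler loop. A minimal sketch of that idea (this is not the actual Airflow patch; `ApiException` and `create_namespaced_pod` are stubbed here so the shape of the handling can be shown without a cluster):

```python
# Sketch: wrap pod creation so a 403 "exceeded quota" ApiException
# fails the task instead of killing the scheduler. The kubernetes
# client types are stubbed for illustration.

class ApiException(Exception):
    def __init__(self, status, reason):
        super().__init__(reason)
        self.status = status
        self.reason = reason

def create_namespaced_pod(body, namespace):
    # Simulate the quota rejection from the JIRA report.
    raise ApiException(403, "Forbidden: exceeded quota: compute-resources")

def run_pod_async_safe(body, namespace):
    """Return (pod, error) instead of letting the exception escape."""
    try:
        return create_namespaced_pod(body=body, namespace=namespace), None
    except ApiException as e:
        # Quota violations come back as HTTP 403; report failure upstream
        # so the scheduler loop can mark the task failed and move on.
        if e.status == 403:
            return None, "pod creation forbidden: %s" % e.reason
        raise

pod, err = run_pod_async_safe({}, "airflow")
```

With this shape, the caller decides whether to mark the task failed or requeue it for a later attempt, which matches the behavior the reporter expects.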
[GitHub] bolkedebruin closed pull request #3761: Subdag inherit runid *do not merge*
bolkedebruin closed pull request #3761: Subdag inherit runid *do not merge* URL: https://github.com/apache/incubator-airflow/pull/3761

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/airflow/jobs.py b/airflow/jobs.py
index e7fff3114f..4c82e722cb 100644
--- a/airflow/jobs.py
+++ b/airflow/jobs.py
@@ -1922,6 +1922,7 @@ def __init__(
             ignore_task_deps=False,
             pool=None,
             delay_on_limit_secs=1.0,
+            run_id_template=None,
             *args, **kwargs):
         self.dag = dag
         self.dag_id = dag.dag_id
@@ -1934,6 +1935,9 @@ def __init__(
         self.ignore_task_deps = ignore_task_deps
         self.pool = pool
         self.delay_on_limit_secs = delay_on_limit_secs
+        self.run_id_template = BackfillJob.ID_FORMAT_PREFIX
+        if run_id_template:
+            self.run_id_template = run_id_template
         super(BackfillJob, self).__init__(*args, **kwargs)

     def _update_counters(self, ti_status):
@@ -2023,7 +2027,7 @@ def _get_dag_run(self, run_date, session=None):
         :type session: Session
         :return: a DagRun in state RUNNING or None
         """
-        run_id = BackfillJob.ID_FORMAT_PREFIX.format(run_date.isoformat())
+        run_id = self.run_id_template.format(run_date.isoformat())
         # consider max_active_runs but ignore when running subdags
         respect_dag_max_active_limit = (True

diff --git a/airflow/models.py b/airflow/models.py
old mode 100755
new mode 100644
index 3e296eb58b..90546f5940
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -3675,7 +3675,9 @@ def run(
             ignore_task_deps=False,
             ignore_first_depends_on_past=False,
             pool=None,
-            delay_on_limit_secs=1.0):
+            delay_on_limit_secs=1.0,
+            run_id_template=None
+    ):
         """
         Runs the DAG.
@@ -3703,6 +3705,8 @@ def run(
         :param delay_on_limit_secs: Time in seconds to wait before next attempt to run
             dag run when max_active_runs limit has been reached
         :type delay_on_limit_secs: float
+        :param run_id_template: Template for the run_id, to be formatted with the execution date
+        :type run_id_template: string
         """
         from airflow.jobs import BackfillJob
         if not executor and local:
@@ -3720,7 +3724,9 @@ def run(
             ignore_task_deps=ignore_task_deps,
             ignore_first_depends_on_past=ignore_first_depends_on_past,
             pool=pool,
-            delay_on_limit_secs=delay_on_limit_secs)
+            delay_on_limit_secs=delay_on_limit_secs,
+            run_id_template=run_id_template
+        )
         job.run()

     def cli(self):

diff --git a/airflow/operators/subdag_operator.py b/airflow/operators/subdag_operator.py
index 9445c4c96d..369c645ed7
--- a/airflow/operators/subdag_operator.py
+++ b/airflow/operators/subdag_operator.py
@@ -87,6 +87,11 @@ def __init__(

     def execute(self, context):
         ed = context['execution_date']
+        # Use the parent's run id as a template for the subdag dag run's run_id
+        run_id = context['run_id']
+        run_id_template = run_id + '.{0}'
         self.subdag.run(
             start_date=ed, end_date=ed, donot_pickle=True,
-            executor=self.executor)
+            executor=self.executor,
+            run_id_template=run_id_template
+        )
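The effect of the change above is plain string formatting: the subdag's run_id is the parent's run_id with the execution date appended. A small sketch (the sample run_id values are illustrative, and the `'backfill_{0}'` default for `BackfillJob.ID_FORMAT_PREFIX` is stated as an assumption about this Airflow version):

```python
# Parent dag run's run_id plus execution date -> subdag run_id,
# mirroring run_id_template = run_id + '.{0}' in the diff.
parent_run_id = 'scheduled__2018-08-30T00:00:00'
execution_date_iso = '2018-08-30T00:00:00'

run_id_template = parent_run_id + '.{0}'
subdag_run_id = run_id_template.format(execution_date_iso)

# Without a template, the backfill falls back to its usual prefix
# (assumed to be 'backfill_{0}' here):
default_run_id = 'backfill_{0}'.format(execution_date_iso)
```

This is what lets a subdag's dag runs be correlated with the parent run that spawned them.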
[GitHub] bolkedebruin commented on issue #3761: Subdag inherit runid *do not merge*
bolkedebruin commented on issue #3761: Subdag inherit runid *do not merge* URL: https://github.com/apache/incubator-airflow/pull/3761#issuecomment-417302471

Please run this on your own CI and discuss it on the mailing list; CI costs Apache Infra money.
[GitHub] xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'
xnuinside commented on a change in pull request #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs' URL: https://github.com/apache/incubator-airflow/pull/3733#discussion_r213894129

## File path: airflow/contrib/hooks/bigquery_hook.py ##

@@ -566,95 +612,108 @@ def run_query(self,
                 'Airflow.', category=DeprecationWarning)

-        if sql is None:
-            raise TypeError('`BigQueryBaseCursor.run_query` missing 1 required '
-                            'positional argument: `sql`')
+        if not sql and not configuration['query'].get('query', None):
+            raise TypeError('`BigQueryBaseCursor.run_query` '
+                            'missing 1 required positional argument: `sql`')
+
+        # BigQuery also allows you to define how you want a table's schema
+        # to change as a side effect of a query job for more details:
+        # https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query.schemaUpdateOptions

-        # BigQuery also allows you to define how you want a table's schema to change
-        # as a side effect of a query job
-        # for more details:
-        # https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query.schemaUpdateOptions
         allowed_schema_update_options = [
             'ALLOW_FIELD_ADDITION', "ALLOW_FIELD_RELAXATION"
         ]
-        if not set(allowed_schema_update_options).issuperset(
-                set(schema_update_options)):
-            raise ValueError(
-                "{0} contains invalid schema update options. "
-                "Please only use one or more of the following options: {1}"
-                .format(schema_update_options, allowed_schema_update_options))
-        if use_legacy_sql is None:
-            use_legacy_sql = self.use_legacy_sql
+        if not set(allowed_schema_update_options
+                   ).issuperset(set(schema_update_options)):
+            raise ValueError("{0} contains invalid schema update options. "
+                             "Please only use one or more of the following "
+                             "options: {1}"
+                             .format(schema_update_options,
+                                     allowed_schema_update_options))

-        configuration = {
-            'query': {
-                'query': sql,
-                'useLegacySql': use_legacy_sql,
-                'maximumBillingTier': maximum_billing_tier,
-                'maximumBytesBilled': maximum_bytes_billed,
-                'priority': priority
-            }
-        }
+        if schema_update_options:
+            if write_disposition not in ["WRITE_APPEND", "WRITE_TRUNCATE"]:
+                raise ValueError("schema_update_options is only "
+                                 "allowed if write_disposition is "
+                                 "'WRITE_APPEND' or 'WRITE_TRUNCATE'.")

         if destination_dataset_table:
-            if '.' not in destination_dataset_table:
-                raise ValueError(
-                    'Expected destination_dataset_table name in the format of '
-                    '<dataset>.<table>. Got: {}'.format(
-                        destination_dataset_table))
             destination_project, destination_dataset, destination_table = \
                 _split_tablename(table_input=destination_dataset_table,
                                  default_project_id=self.project_id)
-            configuration['query'].update({
-                'allowLargeResults': allow_large_results,
-                'flattenResults': flatten_results,
-                'writeDisposition': write_disposition,
-                'createDisposition': create_disposition,
-                'destinationTable': {
-                    'projectId': destination_project,
-                    'datasetId': destination_dataset,
-                    'tableId': destination_table,
-                }
-            })
-        if udf_config:
-            if not isinstance(udf_config, list):
-                raise TypeError("udf_config argument must have a type 'list'"
-                                " not {}".format(type(udf_config)))
-            configuration['query'].update({
-                'userDefinedFunctionResources': udf_config
-            })
-        if query_params:
-            if self.use_legacy_sql:
-                raise ValueError("Query parameters are not allowed when using "
-                                 "legacy SQL")
-            else:
-                configuration['query']['queryParameters'] = query_params
+            destination_dataset_table = {
+                'projectId': destination_project,
+                'datasetId': destination_dataset,
+                'tableId': destination_table,
+            }

-        if labels:
-            configuration['labels'] = labels
+        query_param_list = [
+            (sql,
[jira] [Commented] (AIRFLOW-2779) Verify and correct licenses
[ https://issues.apache.org/jira/browse/AIRFLOW-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597382#comment-16597382 ] ASF GitHub Bot commented on AIRFLOW-2779:

bolkedebruin closed pull request #3803: [AIRFLOW-2779] Restore Copyright notice of GHE auth backend URL: https://github.com/apache/incubator-airflow/pull/3803

> Verify and correct licenses
> ---------------------------
>
>                 Key: AIRFLOW-2779
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2779
>             Project: Apache Airflow
>          Issue Type: Improvement
>            Reporter: Bolke de Bruin
>            Priority: Major
>              Labels: licenses
>             Fix For: 1.10.0
>
> 1. ./airflow/security/utils.py
> 2. ./airflow/security/kerberos.py
> 3. ./airflow/www_rbac/static/jqClock.min.js
> 4. ./airflow/www/static/bootstrap3-typeahead.min.js
> 5. ./apache-airflow-1.10.0rc2+incubating/scripts/ci/flake8_diff.sh
> 6. https://www.apache.org/legal/resolved.html#optional
> 7. ./docs/license.rst
> 8. ./airflow/contrib/auth/backends/google_auth.py
> 9. ./airflow/contrib/auth/backends/github_enterprise_auth.py
> 10. ./airflow/contrib/hooks/ssh_hook.py
> 11. ./airflow/minihivecluster.py
>
> These files [1][2] seem to be 3rd party ALv2 licensed files that refer to a
> NOTICE file; the information in that NOTICE file (at the very least the
> copyright info) should be in your NOTICE file. This should also be noted in
> LICENSE.
>
> LICENSE is:
> - missing jQuery clock [3] and typeahead [4]; as they are ALv2 it's not
>   required to list them, but it's a good idea to do so.
> - missing the license for this [5]
> - this file [7] oddly has © 2016 GitHub, Inc. at the bottom of it
>
> * Year in NOTICE is not correct: "2016 and onwards" isn't valid, as
>   copyright has an expiry date
[GitHub] bolkedebruin closed pull request #3803: [AIRFLOW-2779] Restore Copyright notice of GHE auth backend
bolkedebruin closed pull request #3803: [AIRFLOW-2779] Restore Copyright notice of GHE auth backend URL: https://github.com/apache/incubator-airflow/pull/3803
[jira] [Commented] (AIRFLOW-2984) Cannot convert naive_datetime when task has a naive start_date/end_date
[ https://issues.apache.org/jira/browse/AIRFLOW-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597377#comment-16597377 ] ASF GitHub Bot commented on AIRFLOW-2984:

bolkedebruin closed pull request #3822: [AIRFLOW-2984] Convert operator dates to UTC URL: https://github.com/apache/incubator-airflow/pull/3822

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/airflow/models.py b/airflow/models.py
index 55badf4828..94e18794d6 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -2413,10 +2413,17 @@ def __init__(
         self.email = email
         self.email_on_retry = email_on_retry
         self.email_on_failure = email_on_failure
+
         self.start_date = start_date
         if start_date and not isinstance(start_date, datetime):
             self.log.warning("start_date for %s isn't datetime.datetime", self)
+        elif start_date:
+            self.start_date = timezone.convert_to_utc(start_date)
+
         self.end_date = end_date
+        if end_date:
+            self.end_date = timezone.convert_to_utc(end_date)
+
         if not TriggerRule.is_valid(trigger_rule):
             raise AirflowException(
                 "The trigger_rule must be one of {all_triggers},"

diff --git a/docs/timezone.rst b/docs/timezone.rst
index 9e8598e2ed..fe44ecfbb9 100644
--- a/docs/timezone.rst
+++ b/docs/timezone.rst
@@ -2,23 +2,23 @@
 Time zones
 ==========

 Support for time zones is enabled by default. Airflow stores datetime information in UTC internally and in the database.
-It allows you to run your DAGs with time zone dependent schedules. At the moment Airflow does not convert them to the
-end user’s time zone in the user interface. There it will always be displayed in UTC. Also templates used in Operators
+It allows you to run your DAGs with time zone dependent schedules. At the moment Airflow does not convert them to the
+end user’s time zone in the user interface. There it will always be displayed in UTC. Also templates used in Operators
 are not converted. Time zone information is exposed and it is up to the writer of DAG what do with it.

-This is handy if your users live in more than one time zone and you want to display datetime information according to
+This is handy if your users live in more than one time zone and you want to display datetime information according to
 each user’s wall clock.

-Even if you are running Airflow in only one time zone it is still good practice to store data in UTC in your database
-(also before Airflow became time zone aware this was also to recommended or even required setup). The main reason is
-Daylight Saving Time (DST). Many countries have a system of DST, where clocks are moved forward in spring and backward
-in autumn. If you’re working in local time, you’re likely to encounter errors twice a year, when the transitions
-happen. (The pendulum and pytz documentation discusses these issues in greater detail.) This probably doesn’t matter
-for a simple DAG, but it’s a problem if you are in, for example, financial services where you have end of day
-deadlines to meet.
+Even if you are running Airflow in only one time zone it is still good practice to store data in UTC in your database
+(also before Airflow became time zone aware this was also to recommended or even required setup). The main reason is
+Daylight Saving Time (DST). Many countries have a system of DST, where clocks are moved forward in spring and backward
+in autumn. If you’re working in local time, you’re likely to encounter errors twice a year, when the transitions
+happen. (The pendulum and pytz documentation discusses these issues in greater detail.) This probably doesn’t matter
+for a simple DAG, but it’s a problem if you are in, for example, financial services where you have end of day
+deadlines to meet.

-The time zone is set in `airflow.cfg`. By default it is set to utc, but you change it to use the system’s settings or
-an arbitrary IANA time zone, e.g. `Europe/Amsterdam`. It is dependent on `pendulum`, which is more accurate than `pytz`.
+The time zone is set in `airflow.cfg`. By default it is set to utc, but you change it to use the system’s settings or
+an arbitrary IANA time zone, e.g. `Europe/Amsterdam`. It is dependent on `pendulum`, which is more accurate than `pytz`.
 Pendulum is installed when you install Airflow.

 Please note that the Web UI currently only runs in UTC.

@@ -28,8 +28,8 @@ Concepts

 Naïve and aware datetime objects
 --------------------------------

-Python’s datetime.datetime objects have a tzinfo attribute that can be used to store time zone information,
-represented as an instance of a subclass of
[GitHub] bolkedebruin closed pull request #3822: [AIRFLOW-2984] Convert operator dates to UTC
bolkedebruin closed pull request #3822: [AIRFLOW-2984] Convert operator dates to UTC URL: https://github.com/apache/incubator-airflow/pull/3822
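The core of the AIRFLOW-2984 change is normalizing naive operator `start_date`/`end_date` values to UTC. A simplified sketch of what `timezone.convert_to_utc` does (the real Airflow helper uses the configured default timezone via pendulum; the fixed +02:00 `LOCAL_TZ` here is an illustrative stand-in):

```python
from datetime import datetime, timedelta, timezone as dt_timezone

# Stand-in for the default timezone configured in airflow.cfg.
LOCAL_TZ = dt_timezone(timedelta(hours=2))

def convert_to_utc(value):
    """If value is naive, assume the configured default timezone,
    then normalize to UTC. Simplified sketch of the idea behind
    airflow.utils.timezone.convert_to_utc."""
    if value is None:
        return value
    if value.tzinfo is None:
        # Naive datetime: attach the default timezone first.
        value = value.replace(tzinfo=LOCAL_TZ)
    return value.astimezone(dt_timezone.utc)

# A naive 12:00 local start_date becomes 10:00 UTC under a +02:00 zone.
start = convert_to_utc(datetime(2018, 8, 30, 12, 0))
```

Storing the converted value on the operator keeps every datetime in the metadata database in UTC, which is exactly what the `timezone.rst` doc in the diff recommends.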
[GitHub] ashb commented on a change in pull request #3823: [AIRFLOW-2985] An operator for S3 object copying
ashb commented on a change in pull request #3823: [AIRFLOW-2985] An operator for S3 object copying URL: https://github.com/apache/incubator-airflow/pull/3823#discussion_r214007887

## File path: airflow/contrib/operators/s3_copy_object_operator.py ##

@@ -0,0 +1,84 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from pprint import pformat
+
+from airflow.hooks.S3_hook import S3Hook
+from airflow.models import BaseOperator
+from airflow.utils.decorators import apply_defaults
+
+
+class S3CopyObjectOperator(BaseOperator):

Review comment: Please add a link to this class in docs/code.rst
[GitHub] ashb commented on a change in pull request #3823: [AIRFLOW-2985] An operator for S3 object copying
ashb commented on a change in pull request #3823: [AIRFLOW-2985] An operator for S3 object copying URL: https://github.com/apache/incubator-airflow/pull/3823#discussion_r214007593

## File path: airflow/contrib/operators/s3_copy_object_operator.py ##

@@ -0,0 +1,84 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from pprint import pformat
+
+from airflow.hooks.S3_hook import S3Hook
+from airflow.models import BaseOperator
+from airflow.utils.decorators import apply_defaults
+
+
+class S3CopyObjectOperator(BaseOperator):
+    """
+    Creates a copy of an object that is already stored in S3.
+
+    :param dest_s3_bucket: The name of the bucket to copy to
+    :type dest_s3_bucket: str
+    :param dest_s3_key: The name of the key to copy to
+    :type dest_s3_key: str
+    :param source_s3_bucket: The name of the source bucket
+    :type source_s3_bucket: str
+    :param source_s3_key: Key name of the source object
+    :type source_s3_key: str
+    :param source_s3_version_id: Version ID of the source object (OPTIONAL)
+    :type source_s3_version_id: str
+    :param s3_conn_id: Connection id of the S3 connection to use
+    :type s3_conn_id: str
+    :param verify: Whether or not to verify SSL certificates for the S3 connection.
+        By default SSL certificates are verified.
+        You can provide the following values:
+        - False: do not validate SSL certificates. SSL will still be used,
+          but SSL certificates will not be verified.
+        - path/to/cert/bundle.pem: A filename of the CA cert bundle to use.
+          You can specify this argument if you want to use a different
+          CA cert bundle than the one used by botocore.
+        This is also applicable to ``dest_verify``.
+    :type verify: bool or str
+    """
+
+    @apply_defaults
+    def __init__(
+            self,
+            dest_s3_bucket,
+            dest_s3_key,
+            source_s3_bucket,

Review comment: It would be nice to support a single `s3://bucket/key` style parameter here like we do in the other S3 ops/sensors https://github.com/apache/incubator-airflow/blob/ac9033db0981ae1f770a8bdb5597055751ab15bd/airflow/sensors/s3_key_sensor.py#L34-L39
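The single-parameter style ashb suggests means the operator would accept one `s3://bucket/key` URL and split it itself, as the other S3 sensors do. A minimal sketch of such parsing with `urllib.parse` (similar in spirit to `S3Hook.parse_s3_url`; the function name and error message here are illustrative, not the hook's exact code):

```python
from urllib.parse import urlparse

def parse_s3_url(s3url):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(s3url)
    if parsed.scheme != 's3' or not parsed.netloc:
        raise ValueError('Not a valid s3:// style URL: "%s"' % s3url)
    # netloc is the bucket; the path (minus its leading slash) is the key.
    return parsed.netloc, parsed.path.lstrip('/')

bucket, key = parse_s3_url('s3://my-bucket/path/to/obj.csv')
```

With this helper, `source_s3_bucket`/`source_s3_key` could collapse into one `source_s3_url` parameter while staying backward compatible internally.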
[GitHub] codecov-io edited a comment on issue #3823: [AIRFLOW-2985] An operator for S3 object copying
codecov-io edited a comment on issue #3823: [AIRFLOW-2985] An operator for S3 object copying URL: https://github.com/apache/incubator-airflow/pull/3823#issuecomment-417296502

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr=h1) Report

> Merging [#3823](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/ac9033db0981ae1f770a8bdb5597055751ab15bd?src=pr=desc) will **not change** coverage.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3823/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr=tree)

```diff
@@           Coverage Diff           @@
##           master    #3823   +/-   ##
=======================================
  Coverage   77.41%   77.41%
=======================================
  Files         203      203
  Lines       15817    15817
=======================================
  Hits        12244    12244
  Misses       3573     3573
```

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr=continue).

> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr=footer).

Last update [ac9033d...4fc1df4](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] codecov-io commented on issue #3823: [AIRFLOW-2985] An operator for S3 object copying
codecov-io commented on issue #3823: [AIRFLOW-2985] An operator for S3 object copying URL: https://github.com/apache/incubator-airflow/pull/3823#issuecomment-417296502
[jira] [Commented] (AIRFLOW-2985) An operator for S3 object copying [boto3.client.copy_object()]
[ https://issues.apache.org/jira/browse/AIRFLOW-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597338#comment-16597338 ]

ASF GitHub Bot commented on AIRFLOW-2985:

XD-DENG opened a new pull request #3823: [AIRFLOW-2985] An operator for S3 object copying
URL: https://github.com/apache/incubator-airflow/pull/3823

Make sure you have checked _all_ steps below.

### Jira
- [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-2985
  - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.

### Description
- [x] Here are some details about my PR, including screenshots of any UI changes:
  Currently we don't have an operator in Airflow to help copy objects within S3, while this is a quite common use case when we deal with the data in S3. Under the hood, this operator is using `boto3.client.copy_object()`.

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
  Test case has been added.

### Commits
- [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation
- [x] In case of new functionality, my PR adds documentation that describes how to use it.
  - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

### Code Quality
- [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`

> An operator for S3 object copying [boto3.client.copy_object()]
>
> Key: AIRFLOW-2985
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2985
> Project: Apache Airflow
> Issue Type: Improvement
> Components: operators
> Reporter: Xiaodong DENG
> Assignee: Xiaodong DENG
> Priority: Minor
>
> Currently we don't have an operator in Airflow to help copy objects within S3, while this is a quite common use case when we deal with the data in S3.

This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2985) An operator for S3 object copying [boto3.client.copy_object()]
Xiaodong DENG created AIRFLOW-2985:

Summary: An operator for S3 object copying [boto3.client.copy_object()]
Key: AIRFLOW-2985
URL: https://issues.apache.org/jira/browse/AIRFLOW-2985
Project: Apache Airflow
Issue Type: Improvement
Components: operators
Reporter: Xiaodong DENG
Assignee: Xiaodong DENG

Currently we don't have an operator in Airflow to help copy objects within S3, while this is a quite common use case when we deal with the data in S3.
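As the issue notes, the proposed operator wraps `boto3.client.copy_object()`. A minimal sketch of the keyword arguments that call expects, with hypothetical bucket and key names (the helper `build_copy_kwargs` is illustrative and not part of the PR):

```python
def build_copy_kwargs(source_bucket, source_key, dest_bucket, dest_key):
    """Assemble keyword arguments for S3's copy_object call.

    copy_object takes the destination as Bucket/Key and the source as a
    CopySource mapping (a "bucket/key" string is also accepted).
    """
    return {
        "Bucket": dest_bucket,
        "Key": dest_key,
        "CopySource": {"Bucket": source_bucket, "Key": source_key},
    }


# Hypothetical names for illustration only.
kwargs = build_copy_kwargs("my-src-bucket", "data/in.csv",
                           "my-dst-bucket", "data/out.csv")

# With AWS credentials configured, the actual copy would be:
#   import boto3
#   boto3.client("s3").copy_object(**kwargs)
```

Because `copy_object` performs the copy server-side within S3, the object's bytes never pass through the worker running the operator.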
[GitHub] codecov-io edited a comment on issue #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'
codecov-io edited a comment on issue #3733: [AIRFLOW-491] Add cache parameter in BigQuery query method - with 'api_resource_configs'
URL: https://github.com/apache/incubator-airflow/pull/3733#issuecomment-413105867

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3733?src=pr=h1) Report
> Merging [#3733](https://codecov.io/gh/apache/incubator-airflow/pull/3733?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/ac9033db0981ae1f770a8bdb5597055751ab15bd?src=pr=desc) will **not change** coverage.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3733/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3733?src=pr=tree)

```diff
@@           Coverage Diff           @@
##           master    #3733   +/-   ##
=======================================
  Coverage   77.41%   77.41%
=======================================
  Files         203      203
  Lines       15817    15817
=======================================
  Hits        12244    12244
  Misses       3573     3573
```

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3733?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3733?src=pr=footer). Last update [ac9033d...3e7742d](https://codecov.io/gh/apache/incubator-airflow/pull/3733?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
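PR #3733's `api_resource_configs` approach passes raw jobs-API configuration through to BigQuery rather than adding one keyword per setting. A sketch of how such a dict might be overlaid on default query settings; the merge helper is hypothetical, while `useQueryCache` is BigQuery's standard query-cache flag in the jobs API:

```python
def merge_api_resource_configs(base_config, api_resource_configs):
    """Overlay caller-supplied jobs-API options onto default job config.

    A simplified illustration: copy the defaults, then update each
    section ("query", "load", ...) with the caller's overrides.
    """
    merged = {section: dict(options) for section, options in base_config.items()}
    for section, options in api_resource_configs.items():
        merged.setdefault(section, {}).update(options)
    return merged


default_config = {"query": {"useLegacySql": False}}

# Disable BigQuery's query cache via the standard useQueryCache flag.
config = merge_api_resource_configs(
    default_config, {"query": {"useQueryCache": False}}
)
```

Merging a free-form dict like this means future jobs-API options work without further hook changes, which is the appeal over adding a dedicated `cache` parameter.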