[GitHub] tedmiston commented on issue #3703: [AIRFLOW-2857] Fix broken RTD env
tedmiston commented on issue #3703: [AIRFLOW-2857] Fix broken RTD env URL: https://github.com/apache/incubator-airflow/pull/3703#issuecomment-411272356 Hey all - It looks like our post-merge RTD build was successful. I dug into the mock dependency and related packages in the RTD build environment logs, which led me to create a follow-up Jira issue. If you have time to take a look and leave any comments there, it would be appreciated. https://issues.apache.org/jira/browse/AIRFLOW-2871 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Created] (AIRFLOW-2871) Harden and improve Read the Docs build environment
Taylor Edmiston created AIRFLOW-2871: Summary: Harden and improve Read the Docs build environment Key: AIRFLOW-2871 URL: https://issues.apache.org/jira/browse/AIRFLOW-2871 Project: Apache Airflow Issue Type: Bug Components: docs, Documentation Reporter: Taylor Edmiston Assignee: Taylor Edmiston h2. Context In the process of resolving AIRFLOW-2857 (via [PR 3703|https://github.com/apache/incubator-airflow/pull/3703]), I noticed some oddities in our Read the Docs (RTD) build environment especially around cached dependencies. This motivates hardening and showing some love to our RTD setup. h2. Problem I dug into the RTD build logs for a moment to find some closure on the mock dependency discussed in PR #3703 above. I think that our RTD environment possibly has been working by coincidence off of cached dependencies. {code:java} python /home/docs/checkouts/readthedocs.org/user_builds/airflow/envs/latest/bin/pip install --ignore-installed --cache-dir /home/docs/checkouts/readthedocs.org/user_builds/airflow/.cache/pip .[doc,docker,gcp_api,emr]{code} The directory referenced by that --cache-dir arg earlier in the log happens to have mock installed already. {code:java} python /home/docs/checkouts/readthedocs.org/user_builds/airflow/envs/latest/bin/pip install --upgrade --cache-dir /home/docs/checkouts/readthedocs.org/user_builds/airflow/.cache/pip Pygments==2.2.0 setuptools<40 docutils==0.13.1 mock==1.0.1 pillow==2.6.1 alabaster>=0.7,<0.8,!=0.7.5 commonmark==0.5.4 recommonmark==0.4.0 sphinx<1.8 sphinx-rtd-theme<0.5 readthedocs-sphinx-ext<0.6{code} Here are some logs where you can see that (view raw): # Latest successful (Aug. 7, 2018. 9:21 a.m.) - [7602630|https://readthedocs.org/projects/airflow/builds/7602630/] # Last unsuccessful before (1) (Aug. 5, 2018. 1:24 p.m.) - [7593052|https://readthedocs.org/projects/airflow/builds/7593052/] # Last successful before (2) (July 18, 2018. 3:23 a.m.) 
- [7503718|https://readthedocs.org/projects/airflow/builds/7503718/] # First build (2016) - [4150778|https://readthedocs.org/projects/airflow/builds/4150778/] It appears that mock and others have potentially been cached since the first RTD build in 2016 (4). These versions like mock==1.0.1 do not appear to be coming from anywhere in our current config in incubator-airflow; I believe they are installed as [core dependencies of RTD itself|https://github.com/rtfd/readthedocs.org/blob/ca7afe6577672e129ccfe63abe33561dc32a6651/readthedocs/doc_builder/python_environments.py#L220-L235]. Some but not all of these dependencies get upgraded to newer versions further down in the build. In the case of mock, we were getting lucky that mock==1.0.1 was a dependency of RTD and our setup inherited that old version which allowed the docs build to succeed. (There might be other cases of dependencies like this too.) h2. Solution My proposed enhancements to harden and improve our RTD setup are: * Hardening ** Set our RTD build to use a virtual environment if it's not already ** Set our RTD build to ignore packages outside of its virtualenv like dependencies of RTD itself ** Specify any dependencies broken by ^ ** Test wiping a version in the build environment (not sure if this clears cache dir) *** [https://docs.readthedocs.io/en/latest/guides/wipe-environment.html#wiping-a-build-environment] *** [https://docs.readthedocs.io/en/latest/builds.html#deleting-a-stale-or-broken-build-environment] ** Make build.image, python.version, etc explicit in yaml config *** [https://docs.readthedocs.io/en/latest/yaml-config.html] ** Test upgrading our RTD environment from CPython 2.x to using CPython 3.x * Misc ** Improve RTD project page to have tags and description ** Lint YAML file Note: I don't yet have maintainer access for airflow on RTD which I believe this would require. I am happy to take this issue if I can get that. 
I have experience as an admin of another project on RTD (simple-salesforce). /cc Everyone who commented in PR #3703 - [~kaxilnaik], [~ashb], [~TaoFeng] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
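One of the proposed hardening steps above is making build.image, python.version, etc. explicit in the YAML config. As a rough sketch only (the key names follow the RTD yaml-config docs linked above, v1 schema; the concrete values here are assumptions, not a vetted Airflow config):

```yaml
# readthedocs.yml (v1 schema) - illustrative values, not a tested config
build:
  image: latest        # pin the build image explicitly
python:
  version: 3.6         # part of testing the CPython 2.x -> 3.x upgrade
  pip_install: true
  extra_requirements:
    - doc              # matches the .[doc,...] extras seen in the build log
```

Pinning these explicitly would also make the cached-dependency behavior described above easier to reproduce and audit.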
[jira] [Commented] (AIRFLOW-1410) BigQuery Operator flattens the repeated column and it's not configurable
[ https://issues.apache.org/jira/browse/AIRFLOW-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572579#comment-16572579 ] Iuliia Volkova commented on AIRFLOW-1410: - That task seems to be done already by '[AIRFLOW-1404] Add 'flatten_results' & 'maximum_bytes_billed' to BQ Operator'. > BigQuery Operator flattens the repeated column and it's not configurable > > > Key: AIRFLOW-1410 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1410 > Project: Apache Airflow > Issue Type: Bug > Components: contrib, hooks >Reporter: Pratap Koritala >Priority: Minor > > For Legacy SQL, the BigQuery Operator/Hook flattens repeated columns. The > underlying Google Query JSON API's parameter "flattenResult" has a default > value of false; by not setting it, every Legacy SQL result is flattened. > Without this config option, Legacy SQL can't be used for queries with > repeated columns. Even though it looks like a feature request, I consider this > a bug, as this config option should have been there in the first place. > I am working on submitting a PR to the GitHub project.
[GitHub] xnuinside edited a comment on issue #3717: [AIRFLOW-1874] use_legacy_sql added to BigQueryCheck operators
xnuinside edited a comment on issue #3717: [AIRFLOW-1874] use_legacy_sql added to BigQueryCheck operators URL: https://github.com/apache/incubator-airflow/pull/3717#issuecomment-411263370 apt-get install failed .net/chris-lea/redis-server/ubuntu/dists/trusty/main/binary-amd64/Packages Unable to connect to ppa.launchpad.net:http: W: Failed to fetch http://ppa.launchpad.net/chris-lea/redis-server/ubuntu/dists/trusty/main/binary-i386/Packages Unable to connect to ppa.launchpad.net:http: W: Some index files failed to download. They have been ignored, or old ones used instead. The command "sudo -E apt-get -yq --no-install-suggests --no-install-recommends $TRAVIS_APT_OPTS install slapd ldap-utils openssh-server mysql-server-5.6 mysql-client-core-5.6 mysql-client-5.6 krb5-user krb5-kdc krb5-admin-server oracle-java8-installer" failed and exited with 100. Travis died with this error; what should I do?
[GitHub] xnuinside commented on issue #3717: [AIRFLOW-1874] use_legacy_sql added to BigQueryCheck operators
xnuinside commented on issue #3717: [AIRFLOW-1874] use_legacy_sql added to BigQueryCheck operators URL: https://github.com/apache/incubator-airflow/pull/3717#issuecomment-411263370 apt-get install failed $ cat ~/apt-get-update.log Ign:1 http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.4 InRelease Get:2 http://apt.postgresql.org/pub/repos/apt trusty-pgdg InRelease [61.4 kB] Get:3 http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.4 Release [2,495 B] Get:5 http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.4 Release.gpg [801 B] Get:4 http://dl.bintray.com/apache/cassandra 39x InRelease [3,168 B] Get:7 http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.4/multiverse amd64 Packages [11.8 kB] Ign:6 http://toolbelt.heroku.com/ubuntu ./ InRelease Get:8 http://ppa.launchpad.net/chris-lea/redis-server/ubuntu trusty InRelease [15.4 kB] Get:10 http://apt.postgresql.org/pub/repos/apt trusty-pgdg/main amd64 Packages [190 kB] Hit:9 http://toolbelt.heroku.com/ubuntu ./ Release Get:11 http://apt.postgresql.org/pub/repos/apt trusty-pgdg/main i386 Packages [189 kB] Hit:13 https://packagecloud.io/computology/apt-backport/ubuntu trusty InRelease Get:14 https://packagecloud.io/github/git-lfs/ubuntu trusty InRelease [23.2 kB] Get:15 https://packagecloud.io/rabbitmq/rabbitmq-server/ubuntu trusty InRelease [23.7 kB] Get:16 https://packagecloud.io/github/git-lfs/ubuntu trusty/main amd64 Packages [6,863 B] Get:17 https://packagecloud.io/github/git-lfs/ubuntu trusty/main i386 Packages [6,629 B] Get:18 https://packagecloud.io/rabbitmq/rabbitmq-server/ubuntu trusty/main amd64 Packages [5,515 B] Get:19 https://packagecloud.io/rabbitmq/rabbitmq-server/ubuntu trusty/main i386 Packages [5,515 B] Get:20 https://download.docker.com/linux/ubuntu trusty InRelease [26.5 kB] Get:21 https://download.docker.com/linux/ubuntu trusty/stable amd64 Packages [3,545 B] Get:22 https://download.docker.com/linux/ubuntu trusty/edge amd64 Packages [4,756 B] Get:23 https://dl.hhvm.com/ubuntu 
trusty InRelease [2,411 B] Get:24 https://dl.hhvm.com/ubuntu trusty/main amd64 Packages [1,816 B] Err:25 http://mirror.jmu.edu/pub/ubuntu trusty InRelease Could not connect to mirror.jmu.edu:80 (134.126.36.183), connection timed out Err:26 http://mirror.jmu.edu/pub/ubuntu trusty-updates InRelease Unable to connect to mirror.jmu.edu:http: Err:27 http://mirror.jmu.edu/pub/ubuntu trusty-backports InRelease Unable to connect to mirror.jmu.edu:http: Ign:28 http://dl.google.com/linux/chrome/deb stable InRelease Get:29 http://dl.google.com/linux/chrome/deb stable Release [1,189 B] Get:30 http://dl.google.com/linux/chrome/deb stable Release.gpg [819 B] Get:31 http://dl.google.com/linux/chrome/deb stable/main amd64 Packages [1,111 B] Get:32 http://security.ubuntu.com/ubuntu trusty-security InRelease [65.9 kB] Get:33 http://security.ubuntu.com/ubuntu trusty-security/main Sources [205 kB] Get:34 http://security.ubuntu.com/ubuntu trusty-security/restricted Sources [5,050 B] Get:35 http://security.ubuntu.com/ubuntu trusty-security/universe Sources [93.0 kB] Get:36 http://security.ubuntu.com/ubuntu trusty-security/multiverse Sources [3,072 B] Get:37 http://security.ubuntu.com/ubuntu trusty-security/main amd64 Packages [941 kB] Get:38 http://security.ubuntu.com/ubuntu trusty-security/main i386 Packages [869 kB] Err:39 http://ppa.launchpad.net/couchdb/stable/ubuntu trusty InRelease Could not connect to ppa.launchpad.net:80 (91.189.95.83), connection timed out Err:40 http://ppa.launchpad.net/git-core/ppa/ubuntu trusty InRelease Unable to connect to ppa.launchpad.net:http: Err:41 http://ppa.launchpad.net/openjdk-r/ppa/ubuntu trusty InRelease Unable to connect to ppa.launchpad.net:http: Err:42 http://ppa.launchpad.net/pollinate/ppa/ubuntu trusty InRelease Unable to connect to ppa.launchpad.net:http: Err:43 http://ppa.launchpad.net/webupd8team/java/ubuntu trusty InRelease Unable to connect to ppa.launchpad.net:http: Ign:44 http://ppa.launchpad.net/chris-lea/redis-server/ubuntu 
trusty/main amd64 Packages Ign:45 http://ppa.launchpad.net/chris-lea/redis-server/ubuntu trusty/main i386 Packages Ign:44 http://ppa.launchpad.net/chris-lea/redis-server/ubuntu trusty/main amd64 Packages Ign:45 http://ppa.launchpad.net/chris-lea/redis-server/ubuntu trusty/main i386 Packages Err:44 http://ppa.launchpad.net/chris-lea/redis-server/ubuntu trusty/main amd64 Packages Unable to connect to ppa.launchpad.net:http: Err:45 http://ppa.launchpad.net/chris-lea/redis-server/ubuntu trusty/main i386 Packages Unable to connect to ppa.launchpad.net:http: Get:46 http://security.ubuntu.com/ubuntu trusty-security/main Translation-en [496 kB] Get:47 http://security.ubuntu.com/ubuntu trusty-security/restricted amd64 Packages [18.1 kB] Get:48
[jira] [Commented] (AIRFLOW-1503) AssertionError: INTERNAL: No default project is specified
[ https://issues.apache.org/jira/browse/AIRFLOW-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572578#comment-16572578 ] Iuliia Volkova commented on AIRFLOW-1503: - Per the docstring, :param destination_dataset_table: is a dotted (<project>.|<project>:)<dataset>.<table> that, if set, will store the results. 'temp.percentage' cannot work on its own; it must have a project at the start. Is this issue still actual for anybody? > AssertionError: INTERNAL: No default project is specified > - > > Key: AIRFLOW-1503 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1503 > Project: Apache Airflow > Issue Type: Bug > Components: gcp >Affects Versions: Airflow 1.8 > Environment: Unix platform >Reporter: chaitanya >Priority: Minor > Labels: beginner > > Hi , > New to airflow. Tried to run a BigQuery query and store the result in another > table. Getting the following error. > Please let me know where to set the default project. > Code: > sql_bigquery = BigQueryOperator( > task_id='sql_bigquery', > use_legacy_sql=False, > write_disposition='WRITE_TRUNCATE', > allow_large_results=True, > bql=''' > #standardSQL > SELECT ID, Name, Group, Mark, RATIO_TO_REPORT(Mark) > OVER(PARTITION BY Group) AS percent FROM `tensile-site-168620.temp.marks` > ''', > destination_dataset_table='temp.percentage', > dag=dag > ) > Error Message: > Traceback (most recent call last): > File "/usr/local/bin/airflow", line 28, in > args.func(args) > File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 585, > in test > ti.run(ignore_task_deps=True, ignore_ti_state=True, test_mode=True) > File "/usr/local/lib/python2.7/dist-packages/airflow/utils/db.py", line 53, > in wrapper > result = func(*args, **kwargs) > File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 1374, > in run > result = task_copy.execute(context=context) > File > "/usr/local/lib/python2.7/dist-packages/airflow/contrib/operators/bigquery_operator.py", > line 82, in execute > self.allow_large_results, self.udf_config, self.use_legacy_sql) 
> File > "/usr/local/lib/python2.7/dist-packages/airflow/contrib/hooks/bigquery_hook.py", > line 228, in run_query > default_project_id=self.project_id) > File > "/usr/local/lib/python2.7/dist-packages/airflow/contrib/hooks/bigquery_hook.py", > line 917, in _split_tablename > assert default_project_id is not None, "INTERNAL: No default project is > specified" > AssertionError: INTERNAL: No default project is specified
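The assertion in the traceback comes from splitting the destination table name. A simplified, self-contained sketch of that logic (a hypothetical helper, not the actual `_split_tablename` from `bigquery_hook.py`) shows why `destination_dataset_table='temp.percentage'` fails unless a default project is configured:

```python
def split_tablename(table_input, default_project_id=None):
    """Split '<project>(.|:)<dataset>.<table>' or '<dataset>.<table>'."""
    parts = table_input.replace(':', '.').split('.')
    if len(parts) == 3:
        project, dataset, table = parts
    elif len(parts) == 2:
        # This is the branch 'temp.percentage' takes: no project in the
        # string, so a default project must come from the connection.
        assert default_project_id is not None, \
            "INTERNAL: No default project is specified"
        project, dataset, table = default_project_id, parts[0], parts[1]
    else:
        raise ValueError("Expected [<project>(.|:)]<dataset>.<table>")
    return project, dataset, table

# A fully qualified name needs no default project:
assert split_tablename('tensile-site-168620.temp.marks') == \
    ('tensile-site-168620', 'temp', 'marks')
```

So either fully qualify the destination table or make sure the hook's connection carries a default project id.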
[GitHub] xnuinside commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines
xnuinside commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines URL: https://github.com/apache/incubator-airflow/pull/3714#discussion_r208438535 ## File path: airflow/contrib/hooks/bigquery_hook.py ## @@ -238,6 +238,8 @@ def create_empty_table(self, :return: """ +if time_partitioning is None: +time_partitioning = dict() Review comment: @kaxil it seems this was missed.
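The diff above replaces a mutable default argument with a `None` sentinel. A minimal standalone illustration of why that idiom matters (a generic example, unrelated to the actual `create_empty_table` signature):

```python
def risky(item, bucket={}):   # anti-pattern: the default dict is created
    bucket[item] = True       # once, at definition time, and shared
    return bucket

def safe(item, bucket=None):  # the idiom the review comment asks for
    if bucket is None:
        bucket = {}           # fresh dict on every call
    bucket[item] = True
    return bucket

first = risky('a')
second = risky('b')           # same shared dict as 'first'!
```

With `risky`, state leaks between calls; with `safe`, each call starts clean, which is exactly why the `time_partitioning = dict()` assignment belongs inside the function body.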
[jira] [Resolved] (AIRFLOW-2836) Minor improvement of contrib.sensors.FileSensor
[ https://issues.apache.org/jira/browse/AIRFLOW-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaodong DENG resolved AIRFLOW-2836. Resolution: Fixed > Minor improvement of contrib.sensors.FileSensor > --- > > Key: AIRFLOW-2836 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2836 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Minor > > h4. *Background* > The default *fs_conn_id* in contrib.sensors.FileSensor is '_*fs_default2*_'. > However, when we initialize the database > (https://github.com/apache/incubator-airflow/blob/master/airflow/utils/db.py#L88), > there isn't such an entry. It doesn't exist anywhere else. > h4. *Issue* > The purpose of _contrib.sensors.FileSensor_ is mainly checking the local file > system (of course it can also be used for NAS). For that, the path ("/") from the default > connection 'fs_default' would suffice. > However, given that the default value for *fs_conn_id* in > contrib.sensors.FileSensor is "fs_default2" (a value that doesn't exist), it > makes the situation much more complex. > When users intend to check the local file system only, they should be able to > leave *fs_conn_id* at its default, instead of having to set up another > connection separately. > h4. Proposal > Change the default value for *fs_conn_id* in contrib.sensors.FileSensor from > "fs_default2" to "fs_default" (in the related tests, *fs_conn_id* > is in fact always specified as "fs_default").
[jira] [Resolved] (AIRFLOW-2855) Need to Check Validity of Cron Expression When Process DAG File/Zip File
[ https://issues.apache.org/jira/browse/AIRFLOW-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaodong DENG resolved AIRFLOW-2855. Resolution: Fixed > Need to Check Validity of Cron Expression When Process DAG File/Zip File > > > Key: AIRFLOW-2855 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2855 > Project: Apache Airflow > Issue Type: Improvement > Components: DAG >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Critical > > *schedule_interval* of a DAG can be either a *timedelta* or a *Cron expression*. > When it's a Cron expression, there is currently no mechanism to check its validity. > If there is anything wrong with the Cron expression itself, it > will cause issues when the methods _*following_schedule()*_ and > _*previous_schedule()*_ are invoked (affecting scheduling). However, the > exceptions are only written into logs. From the Web UI, it’s hard for users > to identify the issue & its source while no new task can be initiated > (especially for users who’re not very familiar with Cron). > It would be good to show error messages in the web UI when a DAG's Cron expression > (as schedule_interval) cannot be parsed by *croniter* properly.
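As the issue notes, the real validation is delegated to croniter; a deliberately simplified pre-flight check (shape and character set only, no range or month/day-name handling) illustrates the kind of early validation DagBag.process_file() can now surface instead of burying a parse error in the logs:

```python
import re

# Rough sketch only: accepts digits, '*', '/', ',' and '-' per field.
_CRON_FIELD = re.compile(r'^[\d*/,\-]+$')

def looks_like_cron(expr):
    """Rough 5-field shape check; croniter remains the real validator."""
    fields = expr.strip().split()
    return len(fields) == 5 and all(_CRON_FIELD.match(f) for f in fields)
```

A scheduler-side check like this lets the DAG-file processor attach a readable import error to the DAG rather than failing later inside following_schedule()/previous_schedule().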
[jira] [Commented] (AIRFLOW-2848) dag_id is missing in metadata table "job" for LocalTaskJob
[ https://issues.apache.org/jira/browse/AIRFLOW-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572561#comment-16572561 ] Xiaodong DENG commented on AIRFLOW-2848: Thanks [~xnuinside] for the reminder. Have closed this ticket. > dag_id is missing in metadata table "job" for LocalTaskJob > -- > > Key: AIRFLOW-2848 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2848 > Project: Apache Airflow > Issue Type: Bug > Components: db >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Critical > Attachments: After this fix.png, Before this fix.png > > > dag_id is missing for all entries in metadata table "job" with job_type > "LocalTaskJob". > This is because dag_id was not specified within the class LocalTaskJob.
[jira] [Closed] (AIRFLOW-2848) dag_id is missing in metadata table "job" for LocalTaskJob
[ https://issues.apache.org/jira/browse/AIRFLOW-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaodong DENG closed AIRFLOW-2848. -- Resolution: Fixed > dag_id is missing in metadata table "job" for LocalTaskJob > -- > > Key: AIRFLOW-2848 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2848 > Project: Apache Airflow > Issue Type: Bug > Components: db >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Critical > Attachments: After this fix.png, Before this fix.png > > > dag_id is missing for all entries in metadata table "job" with job_type > "LocalTaskJob". > This is because dag_id was not specified within the class LocalTaskJob.
[jira] [Assigned] (AIRFLOW-491) Add cache parameter in BigQuery query method
[ https://issues.apache.org/jira/browse/AIRFLOW-491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Iuliia Volkova reassigned AIRFLOW-491: -- Assignee: Iuliia Volkova > Add cache parameter in BigQuery query method > > > Key: AIRFLOW-491 > URL: https://issues.apache.org/jira/browse/AIRFLOW-491 > Project: Apache Airflow > Issue Type: Bug > Components: contrib, gcp >Affects Versions: Airflow 1.7.1 >Reporter: Chris Riccomini >Assignee: Iuliia Volkova >Priority: Major > Fix For: Airflow 1.8 > > > The current BigQuery query() method does not have a use_query_cache > parameter. This param always defaults to true (see > [here|https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.query]). > I'd like to disable query caching for some data consistency checks.
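For reference, `useQueryCache` is where this flag sits in the query job configuration of the JSON API linked above. A hypothetical sketch of how a hook could expose it (the helper name and shape are invented for illustration; only the payload key mirrors the API):

```python
def build_query_config(bql, use_query_cache=True):
    """Build the 'configuration' fragment of a BigQuery query job."""
    return {
        'query': {
            'query': bql,
            # The API defaults useQueryCache to true, hence the matching
            # default here; pass False for data-consistency checks.
            'useQueryCache': use_query_cache,
        }
    }

config = build_query_config('SELECT 1', use_query_cache=False)
```

Disabling the cache forces BigQuery to recompute the result, which is what a consistency check needs.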
[GitHub] codecov-io commented on issue #3717: [AIRFLOW-1874] use_legacy_sql added to BigQueryCheck operators
codecov-io commented on issue #3717: [AIRFLOW-1874] use_legacy_sql added to BigQueryCheck operators URL: https://github.com/apache/incubator-airflow/pull/3717#issuecomment-411257030 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3717?src=pr=h1) Report > Merging [#3717](https://codecov.io/gh/apache/incubator-airflow/pull/3717?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/d47580feaf80eeebb416d0179dfa8db3f4e1d2c9?src=pr=desc) will **not change** coverage. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3717/graphs/tree.svg?src=pr=650=WdLKlKHOAU=150)](https://codecov.io/gh/apache/incubator-airflow/pull/3717?src=pr=tree)
```diff
@@           Coverage Diff           @@
##           master    #3717   +/-   ##
=======================================
  Coverage   77.57%   77.57%
=======================================
  Files         204      204
  Lines       15776    15776
=======================================
  Hits        12239    12239
  Misses       3537     3537
```
-- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3717?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3717?src=pr=footer). Last update [d47580f...ab05faf](https://codecov.io/gh/apache/incubator-airflow/pull/3717?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] yrqls21 commented on issue #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file()
yrqls21 commented on issue #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file() URL: https://github.com/apache/incubator-airflow/pull/3698#issuecomment-411255565 TY @feng-tao!
[jira] [Comment Edited] (AIRFLOW-1874) Support standard SQL in Check, ValueCheck and IntervalCheck BigQuery operators
[ https://issues.apache.org/jira/browse/AIRFLOW-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572530#comment-16572530 ] Iuliia Volkova edited comment on AIRFLOW-1874 at 8/8/18 1:13 AM: - [~kaxilnaik], I opened PR ^^ was (Author: xnuinside): [~kaxilnaik], [https://github.com/apache/incubator-airflow/pull/3717] I opened PR > Support standard SQL in Check, ValueCheck and IntervalCheck BigQuery operators > -- > > Key: AIRFLOW-1874 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1874 > Project: Apache Airflow > Issue Type: Bug > Components: contrib, gcp, operators >Reporter: Guillermo Rodríguez Cano >Assignee: Iuliia Volkova >Priority: Major > Fix For: 2.0.0 > > > BigQueryCheckOperator, BigQueryValueCheckOperator and > BigQueryIntervalCheckOperator do not support disabling use of default legacy > SQL in BigQuery. > This is a major blocker to support correct migration to standard SQL when > queries are complicated. For example, a query that can be queried in legacy > SQL may be blocked from any subsequent view done in standard SQL that this > view uses as the queries are bound to either standard or legacy SQL but not a > mix. > These operators inherit from base ones of the same name (without the BigQuery > prefix) from Airflow which may make the process more complicated as the flag > to use standard SQL should be enabled because the underlying BigQueryHook has > the corresponding parameter, use_legacy_sql, set to True, when running a > query. But it is not possible to pass parameters all the way to it via the > aforementioned operators. > The workaround of including #standardSQL and a new line before the query > doesn't work either as there is mismatch. 
BigQuery reports the following in > fact: "Query text specifies use_legacy_sql:false, while API options > specify:true" > A workaround for queries on views using standard SQL is to persist the result > of the query in a temporary table, then run the check operation and > thereafter delete the temporary table.
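The fix this issue asks for is essentially parameter plumbing: let the check operators accept use_legacy_sql and forward it to the hook, instead of silently inheriting the hook's True default. A toy sketch with fake classes (not the real Airflow operator/hook code) of that shape:

```python
class FakeBigQueryHook:
    """Stand-in for the hook whose use_legacy_sql defaults to True."""
    def __init__(self, use_legacy_sql=True):
        self.use_legacy_sql = use_legacy_sql

class FakeBigQueryCheckOperator:
    """Stand-in for a check operator that accepts the flag."""
    def __init__(self, sql, use_legacy_sql=True):
        self.sql = sql
        self.use_legacy_sql = use_legacy_sql

    def get_hook(self):
        # The whole fix: pass the flag down instead of dropping it,
        # so the query text and the API option can't disagree.
        return FakeBigQueryHook(use_legacy_sql=self.use_legacy_sql)

op = FakeBigQueryCheckOperator('SELECT COUNT(*) FROM ds.t',
                               use_legacy_sql=False)
```

Once the flag reaches the hook, the "#standardSQL prefix vs. API option" mismatch described above disappears.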
[jira] [Commented] (AIRFLOW-1874) Support standard SQL in Check, ValueCheck and IntervalCheck BigQuery operators
[ https://issues.apache.org/jira/browse/AIRFLOW-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572530#comment-16572530 ] Iuliia Volkova commented on AIRFLOW-1874: - [~kaxilnaik], [https://github.com/apache/incubator-airflow/pull/3717] I opened PR > Support standard SQL in Check, ValueCheck and IntervalCheck BigQuery operators > -- > > Key: AIRFLOW-1874 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1874 > Project: Apache Airflow > Issue Type: Bug > Components: contrib, gcp, operators >Reporter: Guillermo Rodríguez Cano >Assignee: Iuliia Volkova >Priority: Major > Fix For: 2.0.0 > > > BigQueryCheckOperator, BigQueryValueCheckOperator and > BigQueryIntervalCheckOperator do not support disabling use of default legacy > SQL in BigQuery. > This is a major blocker to support correct migration to standard SQL when > queries are complicated. For example, a query that can be queried in legacy > SQL may be blocked from any subsequent view done in standard SQL that this > view uses as the queries are bound to either standard or legacy SQL but not a > mix. > These operators inherit from base ones of the same name (without the BigQuery > prefix) from Airflow which may make the process more complicated as the flag > to use standard SQL should be enabled because the underlying BigQueryHook has > the corresponding parameter, use_legacy_sql, set to True, when running a > query. But it is not possible to pass parameters all the way to it via the > aforementioned operators. > The workaround of including #standardSQL and a new line before the query > doesn't work either as there is mismatch. BigQuery reports the following in > fact: "Query text specifies use_legacy_sql:false, while API options > specify:true" > A workaround for queries on views using standard SQL is to persist the result > of the query in a temporary table, then run the check operation and > thereafter delete the temporary table. 
[jira] [Commented] (AIRFLOW-1874) Support standard SQL in Check, ValueCheck and IntervalCheck BigQuery operators
[ https://issues.apache.org/jira/browse/AIRFLOW-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572529#comment-16572529 ] ASF GitHub Bot commented on AIRFLOW-1874: - xnuinside opened a new pull request #3717: [AIRFLOW-1874] use_legacy_sql added to BigQueryCheck operators URL: https://github.com/apache/incubator-airflow/pull/3717 Hi everyone! I saw the old task relative to my current work scope and decide to do this little PR ) Just adding use_legacy_sql to BigQueryCheck operator. [AIRFLOW-1874] use_legacy_sql added to BigQueryCheck operators Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-XXX - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [ ] Here are some details about my PR, including screenshots of any UI changes: ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. 
- When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Support standard SQL in Check, ValueCheck and IntervalCheck BigQuery operators > -- > > Key: AIRFLOW-1874 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1874 > Project: Apache Airflow > Issue Type: Bug > Components: contrib, gcp, operators >Reporter: Guillermo Rodríguez Cano >Assignee: Iuliia Volkova >Priority: Major > Fix For: 2.0.0 > > > BigQueryCheckOperator, BigQueryValueCheckOperator and > BigQueryIntervalCheckOperator do not support disabling use of default legacy > SQL in BigQuery. > This is a major blocker to support correct migration to standard SQL when > queries are complicated. For example, a query that can be queried in legacy > SQL may be blocked from any subsequent view done in standard SQL that this > view uses as the queries are bound to either standard or legacy SQL but not a > mix. > These operators inherit from base ones of the same name (without the BigQuery > prefix) from Airflow which may make the process more complicated as the flag > to use standard SQL should be enabled because the underlying BigQueryHook has > the corresponding parameter, use_legacy_sql, set to True, when running a > query. But it is not possible to pass parameters all the way to it via the > aforementioned operators. > The workaround of including #standardSQL and a new line before the query > doesn't work either as there is mismatch. 
> BigQuery reports the following, in fact: "Query text specifies use_legacy_sql:false, while API options specify:true"
> A workaround for queries on views using standard SQL is to persist the result of the query in a temporary table, then run the check operation, and thereafter delete the temporary table.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
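The fix discussed in this issue is to let the check operators forward a use_legacy_sql flag down to the hook that actually runs the query. The sketch below uses hypothetical Fake* classes to illustrate the plumbing only; the real classes live in airflow.contrib and take many more parameters.

```python
# Hypothetical sketch of plumbing a use_legacy_sql flag from a check
# operator down to the hook that runs the query. The Fake* classes are
# illustrative stand-ins, not the real airflow.contrib API.
class FakeBigQueryHook(object):
    def __init__(self, use_legacy_sql=True):
        self.use_legacy_sql = use_legacy_sql

    def run_query(self, sql):
        dialect = "legacy" if self.use_legacy_sql else "standard"
        return "running %s SQL: %s" % (dialect, sql)


class FakeBigQueryCheckOperator(object):
    def __init__(self, sql, use_legacy_sql=True):
        # Default True mirrors the hook's historical behaviour; callers
        # migrating to standard SQL pass use_legacy_sql=False explicitly.
        self.sql = sql
        self.use_legacy_sql = use_legacy_sql

    def execute(self):
        hook = FakeBigQueryHook(use_legacy_sql=self.use_legacy_sql)
        return hook.run_query(self.sql)


op = FakeBigQueryCheckOperator("SELECT 1", use_legacy_sql=False)
print(op.execute())  # running standard SQL: SELECT 1
```

The point of the issue is exactly this parameter threading: without it, the hook's default of use_legacy_sql=True cannot be overridden from the operator.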
[GitHub] xnuinside opened a new pull request #3717: [AIRFLOW-1874] use_legacy_sql added to BigQueryCheck operators
xnuinside opened a new pull request #3717: [AIRFLOW-1874] use_legacy_sql added to BigQueryCheck operators URL: https://github.com/apache/incubator-airflow/pull/3717
[GitHub] feng-tao commented on issue #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file()
feng-tao commented on issue #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file() URL: https://github.com/apache/incubator-airflow/pull/3698#issuecomment-411251048 thanks @XD-DENG @yrqls21
[jira] [Commented] (AIRFLOW-2855) Need to Check Validity of Cron Expression When Process DAG File/Zip File
[ https://issues.apache.org/jira/browse/AIRFLOW-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572528#comment-16572528 ] ASF subversion and git services commented on AIRFLOW-2855: -- Commit d47580feaf80eeebb416d0179dfa8db3f4e1d2c9 in incubator-airflow's branch refs/heads/master from Xiaodong [ https://gitbox.apache.org/repos/asf?p=incubator-airflow.git;h=d47580f ]

[AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file() (#3698)

A DAG can be imported as a .py script properly, but the Cron expression inside it as "schedule_interval" may be invalid, like "0 100 * * *". This commit checks the validity of the Cron expression in DAG files (.py) and packaged DAG files (.zip), and helps show exception messages in the web UI by adding these exceptions to the "import_error" metadata.

> Need to Check Validity of Cron Expression When Process DAG File/Zip File
>
> Key: AIRFLOW-2855
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2855
> Project: Apache Airflow
> Issue Type: Improvement
> Components: DAG
> Reporter: Xiaodong DENG
> Assignee: Xiaodong DENG
> Priority: Critical
>
> *schedule_interval* of DAGs can be either a *timedelta* or a *Cron expression*.
> When it's a Cron expression, there is currently no mechanism to check its validity. If there is anything wrong with the Cron expression itself, it will cause issues when the methods *following_schedule()* and *previous_schedule()* are invoked (which affects scheduling). However, exceptions are only written into logs. From the web UI, it's hard for users to identify this issue and its source while no new task can be initiated (especially for users who're not very familiar with Cron).
> It may be good to show error messages in the web UI when a DAG's Cron expression (as schedule_interval) cannot be parsed by *croniter* properly.
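The core idea of the patch — validate the Cron string up front and turn parser errors into user-visible messages — can be sketched without Airflow or croniter. The helper below is a simplified, dependency-free stand-in: it checks only "*" and plain numeric fields, whereas croniter (which the actual patch calls) also accepts ranges, steps, and named fields.

```python
# Dependency-free sketch of a cron validity check. Standard cron field
# ranges: minute 0-59, hour 0-23, day-of-month 1-31, month 1-12,
# day-of-week 0-6. Only "*" and plain numbers are handled here; croniter
# (used by the actual patch) also accepts ranges, steps, and names.
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]

def is_valid_cron(expr):
    fields = expr.split()
    if len(fields) != len(FIELD_RANGES):
        return False
    for field, (lo, hi) in zip(fields, FIELD_RANGES):
        if field == "*":
            continue
        if not field.isdigit() or not lo <= int(field) <= hi:
            return False
    return True

print(is_valid_cron("0 100 * * *"))  # False -- hour 100 is out of range
print(is_valid_cron("0 10 * * *"))   # True
```

In the merged change itself, croniter(dag._schedule_interval) is called inside DagBag.process_file() and the croniter exceptions are caught and recorded in import_errors so the web UI can display them.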
[GitHub] feng-tao closed pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file()
feng-tao closed pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file() URL: https://github.com/apache/incubator-airflow/pull/3698

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/models.py b/airflow/models.py
index cf7eb0a64f..206106a4e9 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -62,7 +62,9 @@ from sqlalchemy.orm import reconstructor, relationship, synonym
 from sqlalchemy_utc import UtcDateTime
-from croniter import croniter
+from croniter import (
+    croniter, CroniterBadCronError, CroniterBadDateError, CroniterNotAlphaError
+)
 import six
 from airflow import settings, utils
@@ -412,8 +414,18 @@ def process_file(self, filepath, only_if_updated=True, safe_mode=True):
             try:
                 dag.is_subdag = False
                 self.bag_dag(dag, parent_dag=dag, root_dag=dag)
+                if isinstance(dag._schedule_interval, six.string_types):
+                    croniter(dag._schedule_interval)
                 found_dags.append(dag)
                 found_dags += dag.subdags
+            except (CroniterBadCronError,
+                    CroniterBadDateError,
+                    CroniterNotAlphaError) as cron_e:
+                self.log.exception("Failed to bag_dag: %s", dag.full_filepath)
+                self.import_errors[dag.full_filepath] = \
+                    "Invalid Cron expression: " + str(cron_e)
+                self.file_last_changed[dag.full_filepath] = \
+                    file_last_changed_on_disk
             except AirflowDagCycleException as cycle_exception:
                 self.log.exception("Failed to bag_dag: %s", dag.full_filepath)
                 self.import_errors[dag.full_filepath] = str(cycle_exception)
diff --git a/tests/dags/test_invalid_cron.py b/tests/dags/test_invalid_cron.py
new file mode 100755
index 00..51a0e43cb5
--- /dev/null
+++ b/tests/dags/test_invalid_cron.py
@@ -0,0 +1,35 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.models import DAG
+from airflow.operators.dummy_operator import DummyOperator
+from airflow.utils.timezone import datetime
+
+# The schedule_interval specified here is an INVALID
+# Cron expression. This invalid DAG will be used to
+# test whether dagbag.process_file() can identify
+# invalid Cron expression.
+dag1 = DAG(
+    dag_id='test_invalid_cron',
+    start_date=datetime(2015, 1, 1),
+    schedule_interval="0 100 * * *")
+dag1_task1 = DummyOperator(
+    task_id='task1',
+    dag=dag1,
+    owner='airflow')
diff --git a/tests/dags/test_zip_invalid_cron.zip b/tests/dags/test_zip_invalid_cron.zip
new file mode 100644
index 00..fe45153abe
Binary files /dev/null and b/tests/dags/test_zip_invalid_cron.zip differ
diff --git a/tests/models.py b/tests/models.py
index 1c88ea47f7..5a0397dc08 100644
--- a/tests/models.py
+++ b/tests/models.py
@@ -56,7 +56,7 @@ from airflow.utils.trigger_rule import TriggerRule
 from mock import patch, ANY
 from parameterized import parameterized
-from tempfile import NamedTemporaryFile
+from tempfile import mkdtemp, NamedTemporaryFile
 DEFAULT_DATE = timezone.datetime(2016, 1, 1)
 TEST_DAGS_FOLDER = os.path.join(
@@ -1038,6 +1038,19 @@ def test_zip(self):
         dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, "test_zip.zip"))
         self.assertTrue(dagbag.get_dag("test_zip_dag"))
+    def test_process_file_cron_validity_check(self):
+        """
+        test if an invalid cron expression
+        as schedule interval can be identified
+        """
+        invalid_dag_files = ["test_invalid_cron.py", "test_zip_invalid_cron.zip"]
+        dagbag = models.DagBag(dag_folder=mkdtemp())
+
+        self.assertEqual(len(dagbag.import_errors), 0)
+        for d in invalid_dag_files:
+
[jira] [Commented] (AIRFLOW-2855) Need to Check Validity of Cron Expression When Process DAG File/Zip File
[ https://issues.apache.org/jira/browse/AIRFLOW-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572527#comment-16572527 ] ASF GitHub Bot commented on AIRFLOW-2855: - feng-tao closed pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file() URL: https://github.com/apache/incubator-airflow/pull/3698
[jira] [Assigned] (AIRFLOW-1874) Support standard SQL in Check, ValueCheck and IntervalCheck BigQuery operators
[ https://issues.apache.org/jira/browse/AIRFLOW-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Iuliia Volkova reassigned AIRFLOW-1874: --- Assignee: Iuliia Volkova (was: Guillermo Rodríguez Cano)
[jira] [Commented] (AIRFLOW-2848) dag_id is missing in metadata table "job" for LocalTaskJob
[ https://issues.apache.org/jira/browse/AIRFLOW-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572512#comment-16572512 ] Iuliia Volkova commented on AIRFLOW-2848: - [~XD-DENG] Seems like it's already done :) Just a friendly reminder about closing the task.

> dag_id is missing in metadata table "job" for LocalTaskJob
> --
>
> Key: AIRFLOW-2848
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2848
> Project: Apache Airflow
> Issue Type: Bug
> Components: db
> Reporter: Xiaodong DENG
> Assignee: Xiaodong DENG
> Priority: Critical
> Attachments: After this fix.png, Before this fix.png
>
> dag_id is missing for all entries in the metadata table "job" with job_type "LocalTaskJob".
> This is because dag_id was not specified within the LocalTaskJob class.
[GitHub] codecov-io edited a comment on issue #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file()
codecov-io edited a comment on issue #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file() URL: https://github.com/apache/incubator-airflow/pull/3698#issuecomment-410575101

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3698?src=pr=h1) Report
> Merging [#3698](https://codecov.io/gh/apache/incubator-airflow/pull/3698?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/120f4856cdea5134971c4c4a239ddbfdc80db77e?src=pr=desc) will **increase** coverage by `<.01%`.
> The diff coverage is `100%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3698/graphs/tree.svg?width=650=150=pr=WdLKlKHOAU)](https://codecov.io/gh/apache/incubator-airflow/pull/3698?src=pr=tree)

```diff
@@            Coverage Diff            @@
##           master    #3698     +/-  ##
========================================
+ Coverage   77.57%   77.57%    +<.01%
========================================
  Files         204      204
  Lines       15770    15776       +6
========================================
+ Hits        12233    12239       +6
  Misses       3537     3537
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3698?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3698/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=) | `88.61% <100%> (+0.02%)` | :arrow_up: |

--
[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3698?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3698?src=pr=footer). Last update [120f485...1a5936f](https://codecov.io/gh/apache/incubator-airflow/pull/3698?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] codecov-io edited a comment on issue #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file()
codecov-io edited a comment on issue #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file() URL: https://github.com/apache/incubator-airflow/pull/3698#issuecomment-410575101
[GitHub] troychen728 commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training
troychen728 commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r208422276

## File path: airflow/contrib/operators/sagemaker_create_training_job_operator.py

@@ -0,0 +1,98 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.contrib.hooks.sagemaker_hook import SageMakerHook
+from airflow.models import BaseOperator
+from airflow.utils import apply_defaults
+from airflow.exceptions import AirflowException
+
+
+class SageMakerCreateTrainingJobOperator(BaseOperator):
+
+    """
+    Initiate a SageMaker training
+
+    This operator returns The ARN of the model created in Amazon SageMaker
+
+    :param training_job_config:
+        The configuration necessary to start a training job (templated)
+    :type training_job_config: dict
+    :param region_name: The AWS region_name
+    :type region_name: string
+    :param sagemaker_conn_id: The SageMaker connection ID to use.
+    :type aws_conn_id: string

Review comment: @Fokko @gglanzani I agree with you that giving the user an easier way is always helpful, so I updated the code and added a wait option to the operator. The default is True, and if it is not set to False by the user, the operator will only signal success after the training job has finished. The control logic should be very similar to that of the Druid hook you mentioned before.
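The wait behaviour described in this comment — block in execute() until the training job reaches a terminal state — amounts to a polling loop. The sketch below uses illustrative names, not the operator's real API; the terminal statuses listed are the ones SageMaker training jobs report ("Completed", "Failed", "Stopped").

```python
# Illustrative polling loop for a "wait" option: with wait=True the call
# blocks until the job reaches a terminal state; with wait=False it
# returns immediately. Function names are assumptions for this sketch.
import time

TERMINAL_STATES = ("Completed", "Failed", "Stopped")

def wait_for_training_job(get_status, poll_interval=30.0, wait=True):
    if not wait:
        return None  # fire-and-forget: the caller checks the job later
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        time.sleep(poll_interval)

# Simulate a job that is in progress twice before completing.
statuses = iter(["InProgress", "InProgress", "Completed"])
print(wait_for_training_job(lambda: next(statuses), poll_interval=0))  # Completed
```

Signalling success only on "Completed" (and raising otherwise) is what lets a downstream Airflow task safely assume the trained model exists.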
[GitHub] XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file()
XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file() URL: https://github.com/apache/incubator-airflow/pull/3698#discussion_r208416023

## File path: tests/models.py

@@ -56,7 +56,7 @@ from airflow.utils.trigger_rule import TriggerRule
 from mock import patch, ANY
 from parameterized import parameterized
-from tempfile import NamedTemporaryFile
+from tempfile import NamedTemporaryFile, mkdtemp

Review comment: Updated as suggested. Thanks.
[jira] [Commented] (AIRFLOW-2867) Airflow Python Code not compatible to coding guidelines and standards
[ https://issues.apache.org/jira/browse/AIRFLOW-2867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572471#comment-16572471 ] ASF GitHub Bot commented on AIRFLOW-2867: - feng-tao closed pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines URL: https://github.com/apache/incubator-airflow/pull/3714
[GitHub] feng-tao closed pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines
feng-tao closed pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines URL: https://github.com/apache/incubator-airflow/pull/3714

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/hooks/bigquery_hook.py b/airflow/contrib/hooks/bigquery_hook.py
index aa8fc382a6..2a94580f50 100644
--- a/airflow/contrib/hooks/bigquery_hook.py
+++ b/airflow/contrib/hooks/bigquery_hook.py
@@ -206,7 +206,7 @@ def create_empty_table(self,
                            dataset_id,
                            table_id,
                            schema_fields=None,
-                           time_partitioning={},
+                           time_partitioning=None,
                            labels=None
                            ):
         """
@@ -238,6 +238,8 @@ def create_empty_table(self,
         :return:
         """
+        if time_partitioning is None:
+            time_partitioning = dict()
         project_id = project_id if project_id is not None else self.project_id
         table_resource = {
@@ -286,7 +288,7 @@ def create_external_table(self,
                               quote_character=None,
                               allow_quoted_newlines=False,
                               allow_jagged_rows=False,
-                              src_fmt_configs={},
+                              src_fmt_configs=None,
                               labels=None
                               ):
         """
@@ -352,6 +354,8 @@ def create_external_table(self,
         :type labels: dict
         """
+        if src_fmt_configs is None:
+            src_fmt_configs = {}
         project_id, dataset_id, external_table_id = \
             _split_tablename(table_input=external_project_dataset_table,
                              default_project_id=self.project_id,
@@ -482,7 +486,7 @@ def run_query(self,
                   labels=None,
                   schema_update_options=(),
                   priority='INTERACTIVE',
-                  time_partitioning={}):
+                  time_partitioning=None):
         """
         Executes a BigQuery SQL query. Optionally persists results in a BigQuery table.
         See here:
@@ -548,6 +552,8 @@ def run_query(self,
         """
         # TODO remove `bql` in Airflow 2.0 - Jira: [AIRFLOW-2513]
+        if time_partitioning is None:
+            time_partitioning = {}
         sql = bql if sql is None else sql
         if bql:
@@ -808,8 +814,8 @@ def run_load(self,
                  allow_quoted_newlines=False,
                  allow_jagged_rows=False,
                  schema_update_options=(),
-                 src_fmt_configs={},
-                 time_partitioning={}):
+                 src_fmt_configs=None,
+                 time_partitioning=None):
         """
         Executes a BigQuery load command to load data from Google Cloud Storage
         to BigQuery. See here:
@@ -880,6 +886,10 @@ def run_load(self,
         # if it's not, we raise a ValueError
         # Refer to this link for more details:
         # https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query.tableDefinitions.(key).sourceFormat
+        if src_fmt_configs is None:
+            src_fmt_configs = {}
+        if time_partitioning is None:
+            time_partitioning = {}
         source_format = source_format.upper()
         allowed_formats = [
             "CSV", "NEWLINE_DELIMITED_JSON", "AVRO", "GOOGLE_SHEETS",
@@ -1011,12 +1021,12 @@ def run_with_configuration(self, configuration):
         # Wait for query to finish.
         keep_polling_job = True
-        while (keep_polling_job):
+        while keep_polling_job:
             try:
                 job = jobs.get(
                     projectId=self.project_id,
                     jobId=self.running_job_id).execute()
-                if (job['status']['state'] == 'DONE'):
+                if job['status']['state'] == 'DONE':
                     keep_polling_job = False
                     # Check if job had errors.
                     if 'errorResult' in job['status']:
@@ -1045,7 +1055,7 @@ def poll_job_complete(self, job_id):
         jobs = self.service.jobs()
         try:
             job = jobs.get(projectId=self.project_id, jobId=job_id).execute()
-            if (job['status']['state'] == 'DONE'):
+            if job['status']['state'] == 'DONE':
                 return True
         except HttpError as err:
             if err.resp.status in [500, 503]:
@@ -1079,13 +1089,13 @@ def cancel_query(self):
         polling_attempts = 0
         job_complete = False
-        while (polling_attempts < max_polling_attempts and not job_complete):
+        while polling_attempts < max_polling_attempts and
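The recurring change in this diff (time_partitioning={} becoming time_partitioning=None plus an in-body check) avoids Python's mutable-default-argument pitfall: a default dict is created once at function definition time and shared across all calls, so mutations leak between calls. A minimal demonstration:

```python
# Why the refactor replaces mutable default arguments with None:
# the default dict below is created once, when the function is defined,
# and shared by every call that omits the argument.
def bad(config={}):
    config.setdefault("calls", 0)
    config["calls"] += 1
    return config["calls"]

# The None-sentinel idiom used in the diff creates a fresh dict per call.
def good(config=None):
    if config is None:
        config = {}
    config.setdefault("calls", 0)
    config["calls"] += 1
    return config["calls"]

print(bad(), bad())    # 1 2  -- state leaks across calls via the shared dict
print(good(), good())  # 1 1  -- each call gets its own dict
```

The same reasoning applies to src_fmt_configs={}: any caller-visible mutation of those dicts inside the hook would otherwise persist into later, unrelated calls.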
[GitHub] codecov-io commented on issue #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines
codecov-io commented on issue #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines URL: https://github.com/apache/incubator-airflow/pull/3714#issuecomment-411232814 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3714?src=pr=h1) Report > Merging [#3714](https://codecov.io/gh/apache/incubator-airflow/pull/3714?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/d9fecba14c5eb56990508573a91b13ab27ca5153?src=pr=desc) will **increase** coverage by `<.01%`. > The diff coverage is `76.19%`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3714/graphs/tree.svg?token=WdLKlKHOAU=pr=150=650)](https://codecov.io/gh/apache/incubator-airflow/pull/3714?src=pr=tree) ```diff @@Coverage Diff @@ ## master#3714 +/- ## == + Coverage 77.56% 77.56% +<.01% == Files 204 204 Lines 1576815770 +2 == + Hits1223012232 +2 Misses 3538 3538 ``` | [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3714?src=pr=tree) | Coverage Δ | | |---|---|---| | [airflow/www\_rbac/forms.py](https://codecov.io/gh/apache/incubator-airflow/pull/3714/diff?src=pr=tree#diff-YWlyZmxvdy93d3dfcmJhYy9mb3Jtcy5weQ==) | `100% <ø> (ø)` | :arrow_up: | | [airflow/jobs.py](https://codecov.io/gh/apache/incubator-airflow/pull/3714/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzLnB5) | `82.76% <ø> (ø)` | :arrow_up: | | [airflow/www/views.py](https://codecov.io/gh/apache/incubator-airflow/pull/3714/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvdmlld3MucHk=) | `68.88% <0%> (ø)` | :arrow_up: | | [airflow/www\_rbac/views.py](https://codecov.io/gh/apache/incubator-airflow/pull/3714/diff?src=pr=tree#diff-YWlyZmxvdy93d3dfcmJhYy92aWV3cy5weQ==) | `72.85% <0%> (ø)` | :arrow_up: | | [airflow/utils/log/gcs\_task\_handler.py](https://codecov.io/gh/apache/incubator-airflow/pull/3714/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9sb2cvZ2NzX3Rhc2tfaGFuZGxlci5weQ==) | `0% <0%> (ø)` | :arrow_up: | | 
[airflow/operators/hive\_stats\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/3714/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvaGl2ZV9zdGF0c19vcGVyYXRvci5weQ==) | `0% <0%> (ø)` | :arrow_up: | | [airflow/executors/dask\_executor.py](https://codecov.io/gh/apache/incubator-airflow/pull/3714/diff?src=pr=tree#diff-YWlyZmxvdy9leGVjdXRvcnMvZGFza19leGVjdXRvci5weQ==) | `2% <0%> (ø)` | :arrow_up: | | [airflow/operators/check\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/3714/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvY2hlY2tfb3BlcmF0b3IucHk=) | `58.26% <100%> (ø)` | :arrow_up: | | [airflow/hooks/S3\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/3714/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9TM19ob29rLnB5) | `95% <100%> (+0.17%)` | :arrow_up: | | [airflow/utils/cli.py](https://codecov.io/gh/apache/incubator-airflow/pull/3714/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9jbGkucHk=) | `100% <100%> (ø)` | :arrow_up: | | ... and [8 more](https://codecov.io/gh/apache/incubator-airflow/pull/3714/diff?src=pr=tree-more) | | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3714?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3714?src=pr=footer). Last update [d9fecba...c3b5ea1](https://codecov.io/gh/apache/incubator-airflow/pull/3714?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] codecov-io commented on issue #3716: [AIRFLOW-2869] Remove smart quote from default config
codecov-io commented on issue #3716: [AIRFLOW-2869] Remove smart quote from default config
URL: https://github.com/apache/incubator-airflow/pull/3716#issuecomment-411232064

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3716?src=pr=h1) Report
> Merging [#3716](https://codecov.io/gh/apache/incubator-airflow/pull/3716?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/d9fecba14c5eb56990508573a91b13ab27ca5153?src=pr=desc) will **not change** coverage.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3716/graphs/tree.svg?token=WdLKlKHOAU=pr=150=650)](https://codecov.io/gh/apache/incubator-airflow/pull/3716?src=pr=tree)

```diff
@@           Coverage Diff           @@
##           master    #3716   +/-   ##
=======================================
  Coverage   77.56%   77.56%
=======================================
  Files         204      204
  Lines       15768    15768
=======================================
  Hits        12230    12230
  Misses       3538     3538
```

--

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3716?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3716?src=pr=footer). Last update [d9fecba...6c1b97c](https://codecov.io/gh/apache/incubator-airflow/pull/3716?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance
[ https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

George Leslie-Waksman updated AIRFLOW-2870:
-------------------------------------------
    Description:

Running migrations from below cc1e65623dc7_add_max_tries_column_to_task_instance.py fail with:

{noformat}
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> cc1e65623dc7, add max tries column to task instance
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
    context)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 470, in do_execute
    cursor.execute(statement, parameters)
psycopg2.ProgrammingError: column task_instance.executor_config does not exist
LINE 1: ...ued_dttm, task_instance.pid AS task_instance_pid, task_insta...
{noformat}

The failure is occurring because cc1e65623dc7_add_max_tries_column_to_task_instance.py imports TaskInstance from the current code version, which has changes to the task_instance table that are not expected by the migration.

Specifically, 27c6a30d7c24_add_executor_config_to_task_instance.py adds an executor_config column that does not exist as of when cc1e65623dc7_add_max_tries_column_to_task_instance.py is run.

It is worth noting that this will not be observed for new installs because the migration branches on table existence/non-existence at a point that will hide the issue from new installs.

    was:

Running migrations from below cc1e65623dc7_add_max_tries_column_to_task_instance.py fail with the same {noformat} traceback as above. The failure is occurring because cc1e65623dc7_add_max_tries_column_to_task_instance.py imports TaskInstance from the current code version, which has changes to the task_instance table that are not expected by the migration. Specifically, 27c6a30d7c24_add_executor_config_to_task_instance.py adds an executor_config column that does not exist as of when cc1e65623dc7_add_max_tries_column_to_task_instance.py is run.

> Migrations fail when upgrading from below
> cc1e65623dc7_add_max_tries_column_to_task_instance
>
>                 Key: AIRFLOW-2870
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2870
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: George Leslie-Waksman
>            Priority: Blocker
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
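The root cause described in AIRFLOW-2870, a migration importing the live `TaskInstance` model instead of the schema as it existed at that revision, can be illustrated without Airflow or a database. Everything below is a hypothetical, self-contained sketch of the failure mode and the usual fix (freezing the column list inside the migration):

```python
# Columns the *current* model declares, after a later migration
# (27c6a30d7c24) added executor_config:
CURRENT_MODEL_COLUMNS = ["task_id", "dag_id", "max_tries", "executor_config"]

# Columns that actually exist in the database at the moment the *older*
# migration (cc1e65623dc7) runs during an upgrade from an old install:
COLUMNS_AT_OLD_REVISION = ["task_id", "dag_id", "max_tries"]

def query_via_live_model():
    # Mirrors the bug: a SELECT built from the live model references
    # executor_config before the column exists, so the database rejects it
    # (psycopg2.ProgrammingError in the real traceback).
    missing = set(CURRENT_MODEL_COLUMNS) - set(COLUMNS_AT_OLD_REVISION)
    if missing:
        raise RuntimeError("column(s) do not exist: %s" % sorted(missing))
    return COLUMNS_AT_OLD_REVISION

def query_via_frozen_columns():
    # The usual fix: the migration declares only the columns it knew about
    # at its own point in history, so later model changes are invisible to it.
    return list(COLUMNS_AT_OLD_REVISION)
```

This is why Alembic migrations conventionally define a minimal table description inline rather than importing ORM models from the application.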
[GitHub] feng-tao commented on issue #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines
feng-tao commented on issue #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines URL: https://github.com/apache/incubator-airflow/pull/3714#issuecomment-411230444 lgtm This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] codecov-io edited a comment on issue #3586: [AIRFLOW-2733] Reconcile psutil and subprocess in webserver cli
codecov-io edited a comment on issue #3586: [AIRFLOW-2733] Reconcile psutil and subprocess in webserver cli
URL: https://github.com/apache/incubator-airflow/pull/3586#issuecomment-403631506

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3586?src=pr=h1) Report
> Merging [#3586](https://codecov.io/gh/apache/incubator-airflow/pull/3586?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/d9fecba14c5eb56990508573a91b13ab27ca5153?src=pr=desc) will **decrease** coverage by `<.01%`.
> The diff coverage is `50%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3586/graphs/tree.svg?token=WdLKlKHOAU=pr=150=650)](https://codecov.io/gh/apache/incubator-airflow/pull/3586?src=pr=tree)

```diff
@@            Coverage Diff            @@
##           master    #3586      +/-   ##
==========================================
- Coverage   77.56%   77.56%   -0.01%
==========================================
  Files         204      204
  Lines       15768    15767       -1
==========================================
- Hits        12230    12229       -1
  Misses       3538     3538
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3586?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/bin/cli.py](https://codecov.io/gh/apache/incubator-airflow/pull/3586/diff?src=pr=tree#diff-YWlyZmxvdy9iaW4vY2xpLnB5) | `64.43% <50%> (+0.08%)` | :arrow_up: |
| [airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3586/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=) | `88.54% <0%> (-0.05%)` | :arrow_down: |

--

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3586?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3586?src=pr=footer). Last update [d9fecba...de54ae0](https://codecov.io/gh/apache/incubator-airflow/pull/3586?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] kaxil commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines
kaxil commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines
URL: https://github.com/apache/incubator-airflow/pull/3714#discussion_r208411271

## File path: airflow/contrib/operators/oracle_to_azure_data_lake_transfer.py ##

@@ -1,113 +1,115 @@
-# -*- coding: utf-8 -*-
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-from airflow.hooks.oracle_hook import OracleHook
-from airflow.contrib.hooks.azure_data_lake_hook import AzureDataLakeHook
-from airflow.models import BaseOperator
-from airflow.utils.decorators import apply_defaults
-from airflow.utils.file import TemporaryDirectory
-
-import unicodecsv as csv
-import os
-
-
-class OracleToAzureDataLakeTransfer(BaseOperator):
-    """
-    Moves data from Oracle to Azure Data Lake. The operator runs the query against
-    Oracle and stores the file locally before loading it into Azure Data Lake.
-
-
-    :param filename: file name to be used by the csv file.
-    :type filename: str
-    :param azure_data_lake_conn_id: destination azure data lake connection.
-    :type azure_data_lake_conn_id: str
-    :param azure_data_lake_path: destination path in azure data lake to put the file.
-    :type azure_data_lake_path: str
-    :param oracle_conn_id: source Oracle connection.
-    :type oracle_conn_id: str
-    :param sql: SQL query to execute against the Oracle database. (templated)
-    :type sql: str
-    :param sql_params: Parameters to use in sql query. (templated)
-    :type sql_params: str
-    :param delimiter: field delimiter in the file.
-    :type delimiter: str
-    :param encoding: enconding type for the file.
-    :type encoding: str
-    :param quotechar: Character to use in quoting.
-    :type quotechar: str
-    :param quoting: Quoting strategy. See unicodecsv quoting for more information.
-    :type quoting: str
-    """
-
-    template_fields = ('filename', 'sql', 'sql_params')
-    ui_color = '#e08c8c'
-
-    @apply_defaults
-    def __init__(
-            self,
-            filename,
-            azure_data_lake_conn_id,
-            azure_data_lake_path,
-            oracle_conn_id,
-            sql,
-            sql_params={},
-            delimiter=",",
-            encoding="utf-8",
-            quotechar='"',
-            quoting=csv.QUOTE_MINIMAL,
-            *args, **kwargs):
-        super(OracleToAzureDataLakeTransfer, self).__init__(*args, **kwargs)
-        self.filename = filename
-        self.oracle_conn_id = oracle_conn_id
-        self.sql = sql
-        self.sql_params = sql_params
-        self.azure_data_lake_conn_id = azure_data_lake_conn_id
-        self.azure_data_lake_path = azure_data_lake_path
-        self.delimiter = delimiter
-        self.encoding = encoding
-        self.quotechar = quotechar
-        self.quoting = quoting
-
-    def _write_temp_file(self, cursor, path_to_save):
-        with open(path_to_save, 'wb') as csvfile:
-            csv_writer = csv.writer(csvfile, delimiter=self.delimiter,
-                                    encoding=self.encoding, quotechar=self.quotechar,
-                                    quoting=self.quoting)
-            csv_writer.writerow(map(lambda field: field[0], cursor.description))
-            csv_writer.writerows(cursor)
-            csvfile.flush()
-
-    def execute(self, context):
-        oracle_hook = OracleHook(oracle_conn_id=self.oracle_conn_id)
-        azure_data_lake_hook = AzureDataLakeHook(
-            azure_data_lake_conn_id=self.azure_data_lake_conn_id)
-
-        self.log.info("Dumping Oracle query results to local file")
-        conn = oracle_hook.get_conn()
-        cursor = conn.cursor()
-        cursor.execute(self.sql, self.sql_params)
-
-        with TemporaryDirectory(prefix='airflow_oracle_to_azure_op_') as temp:
-            self._write_temp_file(cursor, os.path.join(temp, self.filename))
-            self.log.info("Uploading local file to Azure Data Lake")
-            azure_data_lake_hook.upload_file(os.path.join(temp, self.filename),
-                                             os.path.join(self.azure_data_lake_path,
-                                                          self.filename))
-
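The `_write_temp_file` helper quoted above (a header row built from `cursor.description`, then all data rows) can be sketched with the stdlib `csv` module. The real operator uses `unicodecsv` and a live Oracle cursor, so the fake description and rows below are illustrative only:

```python
import csv
import os
import tempfile

def write_cursor_to_csv(description, rows, path, delimiter=","):
    # description mimics a DB-API cursor.description: a sequence of
    # (name, type, ...) tuples where the first element is the column name.
    with open(path, "w", newline="") as csvfile:
        writer = csv.writer(csvfile, delimiter=delimiter)
        writer.writerow([field[0] for field in description])
        writer.writerows(rows)

# Fake cursor metadata and rows standing in for an Oracle result set.
description = [("ID",), ("NAME",)]
rows = [(1, "alice"), (2, "bob")]
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "export.csv")
    write_cursor_to_csv(description, rows, target)
    with open(target, newline="") as f:
        exported = f.read()
```

Note the `sql_params={}` default in the quoted `__init__` has the same mutable-default hazard that PR #3714 fixes elsewhere in `bigquery_hook.py`.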
[jira] [Commented] (AIRFLOW-2869) Remove smart quote from default config
[ https://issues.apache.org/jira/browse/AIRFLOW-2869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572455#comment-16572455 ] ASF GitHub Bot commented on AIRFLOW-2869: - r39132 closed pull request #3716: [AIRFLOW-2869] Remove smart quote from default config URL: https://github.com/apache/incubator-airflow/pull/3716 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/config_templates/default_airflow.cfg b/airflow/config_templates/default_airflow.cfg index d4a7242118..4f1f0df383 100644 --- a/airflow/config_templates/default_airflow.cfg +++ b/airflow/config_templates/default_airflow.cfg @@ -612,7 +612,7 @@ image_pull_secrets = gcp_service_account_keys = # Use the service account kubernetes gives to pods to connect to kubernetes cluster. -# It’s intended for clients that expect to be running inside a pod running on kubernetes. +# It's intended for clients that expect to be running inside a pod running on kubernetes. # It will raise an exception if called from a process not running in a kubernetes environment. in_cluster = True This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Remove smart quote from default config > -- > > Key: AIRFLOW-2869 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2869 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Siddharth Anand >Assignee: Siddharth Anand >Priority: Trivial > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2869) Remove smart quote from default config
[ https://issues.apache.org/jira/browse/AIRFLOW-2869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Anand resolved AIRFLOW-2869. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request #3716 [https://github.com/apache/incubator-airflow/pull/3716] > Remove smart quote from default config > -- > > Key: AIRFLOW-2869 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2869 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Siddharth Anand >Assignee: Siddharth Anand >Priority: Trivial > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] r39132 closed pull request #3716: [AIRFLOW-2869] Remove smart quote from default config
r39132 closed pull request #3716: [AIRFLOW-2869] Remove smart quote from default config URL: https://github.com/apache/incubator-airflow/pull/3716 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/config_templates/default_airflow.cfg b/airflow/config_templates/default_airflow.cfg index d4a7242118..4f1f0df383 100644 --- a/airflow/config_templates/default_airflow.cfg +++ b/airflow/config_templates/default_airflow.cfg @@ -612,7 +612,7 @@ image_pull_secrets = gcp_service_account_keys = # Use the service account kubernetes gives to pods to connect to kubernetes cluster. -# It’s intended for clients that expect to be running inside a pod running on kubernetes. +# It's intended for clients that expect to be running inside a pod running on kubernetes. # It will raise an exception if called from a process not running in a kubernetes environment. in_cluster = True This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
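Smart quotes like the U+2019 fixed in this one-character diff are easy to reintroduce by copy-paste. A hypothetical helper (not part of Airflow) that flags every non-ASCII character in a config file's text, by line and column:

```python
def find_non_ascii(text):
    """Return (line_no, column, char) for each non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for column, char in enumerate(line, start=1):
            if ord(char) > 127:
                hits.append((line_no, column, char))
    return hits

# The offending default_airflow.cfg comment, with the smart quote present:
sample = "# It\u2019s intended for clients that expect to be running inside a pod\n"
hits = find_non_ascii(sample)
```

Running such a check in CI (or a pre-commit hook) would catch a stray `’` before it lands in `default_airflow.cfg`.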
[jira] [Commented] (AIRFLOW-2869) Remove smart quote from default config
[ https://issues.apache.org/jira/browse/AIRFLOW-2869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572453#comment-16572453 ] ASF subversion and git services commented on AIRFLOW-2869: -- Commit 67e2bb96cdc5ea37226d11332362d3bd3778cea0 in incubator-airflow's branch refs/heads/master from William Horton [ https://gitbox.apache.org/repos/asf?p=incubator-airflow.git;h=67e2bb9 ] [AIRFLOW-2869] Remove smart quote from default config Closes #3716 from wdhorton/remove-smart-quote- from-cfg > Remove smart quote from default config > -- > > Key: AIRFLOW-2869 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2869 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Siddharth Anand >Assignee: Siddharth Anand >Priority: Trivial > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] feng-tao commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines
feng-tao commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines
URL: https://github.com/apache/incubator-airflow/pull/3714#discussion_r208408841

## File path: airflow/contrib/operators/oracle_to_azure_data_lake_transfer.py ##

(The quoted diff is identical to the one in kaxil's review comment above.)
[jira] [Updated] (AIRFLOW-2869) Remove smart quote from default config
[ https://issues.apache.org/jira/browse/AIRFLOW-2869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Anand updated AIRFLOW-2869:
External issue URL: https://github.com/apache/incubator-airflow/pull/3716

> Remove smart quote from default config
> --
>
> Key: AIRFLOW-2869
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2869
> Project: Apache Airflow
> Issue Type: Improvement
> Reporter: Siddharth Anand
> Assignee: Siddharth Anand
> Priority: Trivial
>

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] feng-tao commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines
feng-tao commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines URL: https://github.com/apache/incubator-airflow/pull/3714#discussion_r208408365

File path: airflow/contrib/hooks/databricks_hook.py

```diff
@@ -65,7 +65,8 @@ def __init__(
             raise ValueError('Retry limit must be greater than equal to 1')
         self.retry_limit = retry_limit
 
-    def _parse_host(self, host):
+    @staticmethod
```

Review comment: did you change the places where this method is used?

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
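The review question above hinges on how `@staticmethod` interacts with existing `self._parse_host(...)` call sites. A minimal sketch, assuming a hypothetical `Hook` class and made-up parsing logic (not the real `DatabricksHook`):

```python
class Hook:
    @staticmethod
    def _parse_host(host):
        # Hypothetical parsing logic: strip an https:// scheme prefix if present.
        prefix = "https://"
        return host[len(prefix):] if host.startswith(prefix) else host

    def connect(self, host):
        # Existing call sites written as self._parse_host(...) keep working after
        # the method becomes a @staticmethod, so callers need no changes -- though
        # Hook._parse_host(...) now also works without an instance.
        return self._parse_host(host)

print(Hook().connect("https://xx.cloud.databricks.com"))  # xx.cloud.databricks.com
```

In other words, the conversion is backward compatible at the call sites, which is likely why the diff touches only the definition.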
[GitHub] r39132 edited a comment on issue #3716: [AIRFLOW-2869] Remove smart quote from default config
r39132 edited a comment on issue #3716: [AIRFLOW-2869] Remove smart quote from default config URL: https://github.com/apache/incubator-airflow/pull/3716#issuecomment-411222958 +1 -- please follow contributor guidelines in the future. I'll create a tracking JIRA and update the JIRA & commit message for this one.
[jira] [Created] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance
George Leslie-Waksman created AIRFLOW-2870: Summary: Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance Key: AIRFLOW-2870 URL: https://issues.apache.org/jira/browse/AIRFLOW-2870 Project: Apache Airflow Issue Type: Bug Reporter: George Leslie-Waksman

Running migrations from below cc1e65623dc7_add_max_tries_column_to_task_instance.py fails with:

{noformat}
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> cc1e65623dc7, add max tries column to task instance
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
    context)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 470, in do_execute
    cursor.execute(statement, parameters)
psycopg2.ProgrammingError: column task_instance.executor_config does not exist
LINE 1: ...ued_dttm, task_instance.pid AS task_instance_pid, task_insta...
{noformat}

The failure occurs because cc1e65623dc7_add_max_tries_column_to_task_instance.py imports TaskInstance from the current code version, which exists in a post-migration state.
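The failure mode described above, a historical migration importing the current model, is avoided when the migration declares the schema it needs itself. A stdlib-only sketch of that principle using sqlite3 (the table and column names follow the issue; the actual fix would live in the Alembic revision, not in raw DDL like this):

```python
import sqlite3

# Sketch: a self-contained migration step. It issues its own DDL for the
# point-in-history schema instead of importing TaskInstance from the current
# code, whose model already includes later columns such as executor_config.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance (task_id TEXT, dag_id TEXT, try_number INTEGER)"
)
# The cc1e65623dc7 upgrade only needs to add max_tries; no ORM model required.
conn.execute("ALTER TABLE task_instance ADD COLUMN max_tries INTEGER DEFAULT -1")

cols = [row[1] for row in conn.execute("PRAGMA table_info(task_instance)")]
print(cols)  # ['task_id', 'dag_id', 'try_number', 'max_tries']
```

Because the migration never touches the ORM, it works the same no matter how far ahead the installed code's models have moved.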
[jira] [Created] (AIRFLOW-2869) Remove smart quote from default config
Siddharth Anand created AIRFLOW-2869: Summary: Remove smart quote from default config Key: AIRFLOW-2869 URL: https://issues.apache.org/jira/browse/AIRFLOW-2869 Project: Apache Airflow Issue Type: Improvement Reporter: Siddharth Anand Assignee: Siddharth Anand
[GitHub] r39132 edited a comment on issue #3716: [AIRFLOW-XXX] Remove smart quote from default config
r39132 edited a comment on issue #3716: [AIRFLOW-XXX] Remove smart quote from default config URL: https://github.com/apache/incubator-airflow/pull/3716#issuecomment-411222958 +1 -- please follow contributor guidelines in the future. I'll create a tracking JIRA and update the commit message for this one.
[GitHub] r39132 commented on issue #3716: [AIRFLOW-XXX] Remove smart quote from default config
r39132 commented on issue #3716: [AIRFLOW-XXX] Remove smart quote from default config URL: https://github.com/apache/incubator-airflow/pull/3716#issuecomment-411222958 +1
[GitHub] wdhorton opened a new pull request #3716: [AIRFLOW-XXX] Remove smart quote from default config
wdhorton opened a new pull request #3716: [AIRFLOW-XXX] Remove smart quote from default config URL: https://github.com/apache/incubator-airflow/pull/3716

Make sure you have checked _all_ steps below.

### Jira
- [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-XXX
  - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.

### Description
- [ ] Here are some details about my PR, including screenshots of any UI changes:

I removed a [smart quotation mark](https://www.fileformat.info/info/unicode/char/2019/index.htm) from the default config, as it could cause problems with tooling because of unicode issues, and there's no real need for it. In my case, I was trying to use Fabric and cuisine to copy a config to a remote machine and it crashed with this error:

```
File "/Users/william.horton/development/urbancompass/build-support/python/venvs/shorts/lib/python2.7/site-packages/cuisine.py", line 545, in file_write
    sig= hashlib.md5(content).hexdigest()
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 20037: ordinal not in range(128)
```

### Tests
- [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:

Just changing a quotation mark in a comment in the default config

### Commits
- [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation
- [ ] In case of new functionality, my PR adds documentation that describes how to use it.
  - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

### Code Quality
- [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
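The UnicodeEncodeError quoted in the PR is reproducible with nothing but the smart quote itself; a small sketch (the config line here is made up for illustration):

```python
# U+2019 is the "right single quotation mark" that was sitting in the default
# config. Any tool that implicitly encodes to ASCII (as cuisine's md5 call did
# under Python 2) will choke on it.
line = "# it\u2019s a comment in airflow.cfg"  # hypothetical config line

try:
    line.encode("ascii")
except UnicodeEncodeError as exc:
    print("ascii encode failed:", exc.reason)

# Replacing the smart quote with a plain apostrophe makes the text pure ASCII.
fixed = line.replace("\u2019", "'")
print(fixed.encode("ascii"))  # b"# it's a comment in airflow.cfg"
```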
[GitHub] feng-tao commented on issue #3708: [AIRFLOW-2859] Implement own UtcDateTime
feng-tao commented on issue #3708: [AIRFLOW-2859] Implement own UtcDateTime URL: https://github.com/apache/incubator-airflow/pull/3708#issuecomment-411222194 flake8 fails.
[jira] [Commented] (AIRFLOW-1460) Tasks get stuck in the "removed" state
[ https://issues.apache.org/jira/browse/AIRFLOW-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572430#comment-16572430 ] ASF GitHub Bot commented on AIRFLOW-1460: gwax closed pull request #2482: AIRFLOW-1460 clear "REMOVED" tis on DagRun update URL: https://github.com/apache/incubator-airflow/pull/2482

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

```diff
diff --git a/airflow/models.py b/airflow/models.py
index d1f8e59fe3..ff059b4fcb 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -4255,6 +4255,7 @@ def update_state(self, session=None):
             # skip in db?
             if ti.state == State.REMOVED:
                 tis.remove(ti)
+                session.delete(ti)
             else:
                 ti.task = dag.get_task(ti.task_id)
diff --git a/tests/models.py b/tests/models.py
index cf2734b746..b8157a664b 100644
--- a/tests/models.py
+++ b/tests/models.py
@@ -487,6 +487,34 @@ def test_dagrun_success_conditions(self):
         state = dr.update_state()
         self.assertEqual(State.FAILED, state)
 
+    def test_dagrun_clear_removed(self):
+        session = settings.Session()
+
+        dag = DAG(
+            'test_dagrun_clear_removed',
+            start_date=DEFAULT_DATE,
+            default_args={'owner': 'owner1'})
+
+        with dag:
+            op = DummyOperator(task_id='A')
+
+        dag.clear()
+
+        now = datetime.datetime.now()
+        dr = dag.create_dagrun(run_id='test_dagrun_clear_removed',
+                               state=State.RUNNING,
+                               execution_date=now,
+                               start_date=now)
+
+        ti_op = dr.get_task_instance(task_id=op.task_id)
+        ti_op.set_state(state=State.REMOVED, session=session)
+
+        self.assertEqual(
+            dr.get_task_instance(task_id=op.task_id).state,
+            State.REMOVED)
+        dr.update_state()
+        self.assertIsNone(dr.get_task_instance(task_id=op.task_id))
+
     def test_get_task_instance_on_empty_dagrun(self):
         """
         Make sure that a proper value is returned when a dagrun has no task instances
```

> Tasks get stuck in the "removed" state
> --
>
> Key: AIRFLOW-1460
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1460
> Project: Apache Airflow
> Issue Type: Bug
> Reporter: George Leslie-Waksman
> Assignee: Maxime Beauchemin
> Priority: Major
> Fix For: 1.9.1
>
> The current handling of task instances that get assigned the state "removed" is that they get ignored.
> If the underlying task later gets re-added, the existing task instances will prevent the task from running in the future.
[GitHub] gwax commented on issue #2482: AIRFLOW-1460 clear "REMOVED" tis on DagRun update
gwax commented on issue #2482: AIRFLOW-1460 clear "REMOVED" tis on DagRun update URL: https://github.com/apache/incubator-airflow/pull/2482#issuecomment-411221356 This Issue has been resolved by https://github.com/apache/incubator-airflow/pull/3137 I will close this PR as a duplicate.
[GitHub] gwax closed pull request #2482: AIRFLOW-1460 clear "REMOVED" tis on DagRun update
gwax closed pull request #2482: AIRFLOW-1460 clear "REMOVED" tis on DagRun update URL: https://github.com/apache/incubator-airflow/pull/2482

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

```diff
diff --git a/airflow/models.py b/airflow/models.py
index d1f8e59fe3..ff059b4fcb 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -4255,6 +4255,7 @@ def update_state(self, session=None):
             # skip in db?
             if ti.state == State.REMOVED:
                 tis.remove(ti)
+                session.delete(ti)
             else:
                 ti.task = dag.get_task(ti.task_id)
diff --git a/tests/models.py b/tests/models.py
index cf2734b746..b8157a664b 100644
--- a/tests/models.py
+++ b/tests/models.py
@@ -487,6 +487,34 @@ def test_dagrun_success_conditions(self):
         state = dr.update_state()
         self.assertEqual(State.FAILED, state)
 
+    def test_dagrun_clear_removed(self):
+        session = settings.Session()
+
+        dag = DAG(
+            'test_dagrun_clear_removed',
+            start_date=DEFAULT_DATE,
+            default_args={'owner': 'owner1'})
+
+        with dag:
+            op = DummyOperator(task_id='A')
+
+        dag.clear()
+
+        now = datetime.datetime.now()
+        dr = dag.create_dagrun(run_id='test_dagrun_clear_removed',
+                               state=State.RUNNING,
+                               execution_date=now,
+                               start_date=now)
+
+        ti_op = dr.get_task_instance(task_id=op.task_id)
+        ti_op.set_state(state=State.REMOVED, session=session)
+
+        self.assertEqual(
+            dr.get_task_instance(task_id=op.task_id).state,
+            State.REMOVED)
+        dr.update_state()
+        self.assertIsNone(dr.get_task_instance(task_id=op.task_id))
+
     def test_get_task_instance_on_empty_dagrun(self):
         """
         Make sure that a proper value is returned when a dagrun has no task instances
```
[jira] [Resolved] (AIRFLOW-2787) Airflow scheduler dies on DAGs with NULL DagRun run_id
[ https://issues.apache.org/jira/browse/AIRFLOW-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] George Leslie-Waksman resolved AIRFLOW-2787. Resolution: Fixed

> Airflow scheduler dies on DAGs with NULL DagRun run_id
> --
>
> Key: AIRFLOW-2787
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2787
> Project: Apache Airflow
> Issue Type: Bug
> Reporter: George Leslie-Waksman
> Assignee: George Leslie-Waksman
> Priority: Critical
>
> When a DagRun is created with NULL run_id, the scheduler subprocess will crash when checking `is_backfill`:
> {noformat}
> Got an exception! Propagating...
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 347, in helper
>     pickle_dags)
>   File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 53, in wrapper
>     result = func(*args, **kwargs)
>   File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 1583, in process_file
>     self._process_dags(dagbag, dags, ti_keys_to_schedule)
>   File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 1175, in _process_dags
>     self._process_task_instances(dag, tis_out)
>   File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 873, in _process_task_instances
>     if run.is_backfill:
>   File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 4257, in is_backfill
>     if "backfill" in self.run_id:
> TypeError: argument of type 'NoneType' is not iterable
> {noformat}
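The traceback in the issue comes down to evaluating `"backfill" in None`. The kind of guard the fix needed can be sketched as follows (the standalone function is illustrative, not the actual patch to `DagRun.is_backfill`):

```python
def is_backfill(run_id):
    # A NULL run_id from the database arrives as None, and `"backfill" in None`
    # raises TypeError, so treat a missing run_id as not-a-backfill instead.
    return run_id is not None and "backfill" in run_id

print(is_backfill(None))                     # False
print(is_backfill("backfill_2018-08-08"))    # True
print(is_backfill("scheduled__2018-08-08"))  # False
```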
[jira] [Created] (AIRFLOW-2868) Mesos Executor use executor_config to specify CPU, Memory and Docker image on the task level
Amir Shahatit created AIRFLOW-2868: Summary: Mesos Executor use executor_config to specify CPU, Memory and Docker image on the task level Key: AIRFLOW-2868 URL: https://issues.apache.org/jira/browse/AIRFLOW-2868 Project: Apache Airflow Issue Type: Improvement Components: contrib Affects Versions: 1.10, 1.10.1 Reporter: Amir Shahatit Assignee: Amir Shahatit Executor_config was added as a part of [AIRFLOW-1314|https://github.com/apache/incubator-airflow/commit/c0920efc012468681cff3d3c9cfe25c7381dc976]. This task extends the mesosExecutor to make use of specified executor configs to pass on resource requirements (CPU/Memory) as well as docker images on the task level.
[jira] [Updated] (AIRFLOW-2868) Mesos Executor should use executor_config to specify CPU, Memory and Docker image on the task level
[ https://issues.apache.org/jira/browse/AIRFLOW-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amir Shahatit updated AIRFLOW-2868: Summary: Mesos Executor should use executor_config to specify CPU, Memory and Docker image on the task level (was: Mesos Executor use executor_config to specify CPU, Memory and Docker image on the task level)

> Mesos Executor should use executor_config to specify CPU, Memory and Docker image on the task level
> --
>
> Key: AIRFLOW-2868
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2868
> Project: Apache Airflow
> Issue Type: Improvement
> Components: contrib
> Affects Versions: 1.10, 1.10.1
> Reporter: Amir Shahatit
> Assignee: Amir Shahatit
> Priority: Major
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> Executor_config was added as a part of [AIRFLOW-1314|https://github.com/apache/incubator-airflow/commit/c0920efc012468681cff3d3c9cfe25c7381dc976]. This task extends the mesosExecutor to make use of specified executor configs to pass on resource requirements (CPU/Memory) as well as docker images on the task level.
[GitHub] bolkedebruin edited a comment on issue #3708: [AIRFLOW-2859] Implement own UtcDateTime
bolkedebruin edited a comment on issue #3708: [AIRFLOW-2859] Implement own UtcDateTime URL: https://github.com/apache/incubator-airflow/pull/3708#issuecomment-411195033 @ashb I think I finally nailed it. The issue was in my understanding of how sqlalchemy deals with timezone information inside fields for mysql. If you insert `2018-08-08 20:40:21.443732+02:00` into mysql, mysql will ignore the `+02:00` _and_ apply the connection's timezone setting (e.g. `set time_zone='+01:00'`). I thought sqlalchemy would handle this somehow (i.e. by separating the timezone and doing something like this [1]). So this creates chaos obviously, although not too much as long as you do not change the connection's timezone. In my tests I was actually doing so, hence they didn't pass. Strangely enough, when executing them isolated / locally they worked for some reason, probably due to connection re-use. Anyway, I borrowed your event watcher and now make sure we always connect in UTC with mysql. For postgres it doesn't matter, and we can test that it doesn't. [1] https://stackoverflow.com/questions/7651409/mysql-datetime-insert-a-date-with-timezone-offset
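The behaviour described, MySQL silently dropping the `+02:00` offset and applying the connection's time zone instead, is why everything gets normalized to UTC before it reaches the driver. A stdlib-only sketch of that bind-side coercion (the `UtcDateTime` name matches the PR title, but this function is illustrative, not the actual type decorator):

```python
from datetime import datetime, timedelta, timezone

def to_utc(value):
    # Reject naive datetimes outright and convert aware ones to UTC, so the
    # value handed to MySQL never carries an offset the server would ignore.
    if value.tzinfo is None:
        raise ValueError("naive datetime given; UtcDateTime requires tz-aware values")
    return value.astimezone(timezone.utc)

cest = timezone(timedelta(hours=2))
local = datetime(2018, 8, 8, 20, 40, 21, 443732, tzinfo=cest)
print(to_utc(local).isoformat())  # 2018-08-08T18:40:21.443732+00:00
```

Combined with forcing the session to UTC on connect (`SET time_zone = '+00:00'` via an event listener, as the comment describes), the stored and retrieved values agree regardless of the server's default time zone.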
[GitHub] ashb commented on issue #3708: [AIRFLOW-2859] Implement own UtcDateTime
ashb commented on issue #3708: [AIRFLOW-2859] Implement own UtcDateTime URL: https://github.com/apache/incubator-airflow/pull/3708#issuecomment-411195448 AIP-3: Drop support for Mysql :D
[GitHub] bolkedebruin commented on issue #3708: [AIRFLOW-2859] Implement own UtcDateTime
bolkedebruin commented on issue #3708: [AIRFLOW-2859] Implement own UtcDateTime URL: https://github.com/apache/incubator-airflow/pull/3708#issuecomment-411195033 @ashb I think I finally nailed it. The issue was in my understanding of how sqlalchemy deals with timezone information inside fields for mysql. If you insert `2018-08-08 20:40:21.443732+02:00` into mysql, mysql will ignore the `+02:00` _and_ apply the connection's timezone setting (e.g. `set time_zone='+01:00'`). This creates chaos obviously, although not too much as long as you do not change the connection's timezone. In my tests I was actually doing so, hence they didn't pass. Strangely enough, when executing them isolated / locally they worked for some reason, probably due to connection re-use. Anyway, I borrowed your event watcher and now make sure we always connect in UTC with mysql. For postgres it doesn't matter, and we can test that it doesn't.
[GitHub] ashb commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines
ashb commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines URL: https://github.com/apache/incubator-airflow/pull/3714#discussion_r208377732

File path: tests/core.py

```diff
@@ -2423,7 +2423,7 @@ def test_init_proxy_user(self):
 class HDFSHookTest(unittest.TestCase):
     def setUp(self):
         configuration.load_test_config()
-        os.environ['AIRFLOW_CONN_HDFS_DEFAULT'] = ('hdfs://localhost:8020')
+        os.environ['AIRFLOW_CONN_HDFS_DEFAULT'] = 'hdfs://localhost:8020'
```

Review comment: (Not related to this PR, but we have HDFS tests in core.py? o_O)
[GitHub] kaxil commented on issue #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines
kaxil commented on issue #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines URL: https://github.com/apache/incubator-airflow/pull/3714#issuecomment-411194032 @ashb This is now ready for review. :)
[GitHub] bolkedebruin closed pull request #3715: [AIRFLOW-XXX] Updating instructions about logging changes in 1.10
bolkedebruin closed pull request #3715: [AIRFLOW-XXX] Updating instructions about logging changes in 1.10 URL: https://github.com/apache/incubator-airflow/pull/3715

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

````diff
diff --git a/UPDATING.md b/UPDATING.md
index f829bed3f9..4fda57663f 100644
--- a/UPDATING.md
+++ b/UPDATING.md
@@ -124,6 +124,17 @@ elasticsearch_log_id_template = {{dag_id}}-{{task_id}}-{{execution_date}}-{{try_
 elasticsearch_end_of_log_mark = end_of_log
 ```
 
+The previous setting of `log_task_reader` is not needed in many cases now when using the default logging config with remote storages. (Previously it needed to be set to `s3.task` or similar. This is not needed with the default config anymore)
+
+#### Change of per-task log path
+
+With the change to Airflow core to be timezone aware the default log path for task instances will now include timezone information. This will by default mean all previous task logs won't be found. You can get the old behaviour back by setting the following config options:
+
+```
+[core]
+log_filename_template = {{ ti.dag_id }}/{{ ti.task_id }}/{{ execution_date.strftime("%%Y-%%m-%%dT%%H:%%M:%%S") }}/{{ try_number }}.log
+```
+
 ## Airflow 1.9
 
 ### SSH Hook updates, along with new SSH Operator & SFTP Operator
````
[GitHub] ashb opened a new pull request #3715: [AIRFLOW-XXX] Updating instructions about logging changes in 1.10
ashb opened a new pull request #3715: [AIRFLOW-XXX] Updating instructions about logging changes in 1.10 URL: https://github.com/apache/incubator-airflow/pull/3715

We had a few other logging changes that weren't mentioned in here that meant previous logs were not viewable anymore.

Make sure you have checked _all_ steps below.

### Jira
- [X] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.

### Description
- [x] Here are some details about my PR, including screenshots of any UI changes:

Mention changes about remote logging config in UPDATING.md

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:

Docs only

### Commits
- [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation
- [x] In case of new functionality, my PR adds documentation that describes how to use it.
  - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

### Code Quality
- [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
[GitHub] kaxil commented on issue #3714: [WIP][AIRFLOW-2867] Refactor code to conform Python standards & guidelines
kaxil commented on issue #3714: [WIP][AIRFLOW-2867] Refactor code to conform Python standards & guidelines URL: https://github.com/apache/incubator-airflow/pull/3714#issuecomment-411190182 Hi @ashb, Sorry, I need to sort out a couple of things here. I have added a WIP flag. Will ping you as soon as it is ready for review.
[GitHub] ashb commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines
ashb commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines URL: https://github.com/apache/incubator-airflow/pull/3714#discussion_r208372051

File path: airflow/operators/s3_to_hive_operator.py

```diff
@@ -261,9 +261,9 @@ def _match_headers(self, header_list):
         else:
             return True
 
+    @staticmethod
     def _delete_top_row_and_compress(
-            self,
-            input_file_name,
+            input_file_name,
```

Review comment: indent
[GitHub] ashb commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines
ashb commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines URL: https://github.com/apache/incubator-airflow/pull/3714#discussion_r208371591

File path: airflow/contrib/operators/dataflow_operator.py

```diff
@@ -331,7 +331,7 @@ def execute(self, context):
             self.py_file, self.py_options)
 
-class GoogleCloudBucketHelper():
+class GoogleCloudBucketHelper:
```

Review comment: For py2 compat this should probably be `class GoogleCloudBucketHelper(object)`
[GitHub] ashb commented on issue #3708: [AIRFLOW-2859] Implement own UtcDateTime
ashb commented on issue #3708: [AIRFLOW-2859] Implement own UtcDateTime URL: https://github.com/apache/incubator-airflow/pull/3708#issuecomment-411189623 Can I help?
[GitHub] ashb commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines
ashb commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines URL: https://github.com/apache/incubator-airflow/pull/3714#discussion_r208371167

File path: airflow/configuration.py

```diff
@@ -186,7 +186,8 @@ def _validate(self):
 
         self.is_validated = True
 
-    def _get_env_var_option(self, section, key):
+    @staticmethod
+    def _get_env_var_option(key):
```

Review comment: What happened to `section`?
[GitHub] ashb commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines
ashb commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines URL: https://github.com/apache/incubator-airflow/pull/3714#discussion_r208371759 ## File path: airflow/contrib/operators/oracle_to_oracle_transfer.py ## @@ -52,10 +52,12 @@ def __init__( destination_table, oracle_source_conn_id, source_sql, -source_sql_params={}, +source_sql_params=None, Review comment: indent? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-2867) Airflow Python Code not compatible to coding guidelines and standards
[ https://issues.apache.org/jira/browse/AIRFLOW-2867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572266#comment-16572266 ] ASF GitHub Bot commented on AIRFLOW-2867: - kaxil opened a new pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines URL: https://github.com/apache/incubator-airflow/pull/3714 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-2867 ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: - Dictionary creation should be written as a dictionary literal - Python's default arguments are evaluated once when the function is defined, not each time the function is called (like it is in, say, Ruby). This means that if you use a mutable default argument and mutate it, you will have mutated that object for all future calls to the function as well. - Functions calling sets which can be replaced by a set literal are now replaced by a set literal - Replace list literals - Some of the static methods haven't been marked static - Remove redundant parentheses ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: N/A, nothing new added ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Airflow Python Code not compatible to coding guidelines and standards > -- > > Key: AIRFLOW-2867 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2867 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Kaxil Naik >Assignee: Kaxil Naik >Priority: Minor > Fix For: 2.0.0 > > > Some of the Airflow code doesn't conform to python coding guidelines and > standards. > The improvements I have analyzed are below: > - Dictionary creation should be written as a dictionary literal > - Mutable default argument. Python's default arguments are evaluated once > when the function is defined, not each time the function is called (like it > is in, say, Ruby). This means that if you use a mutable default argument and > mutate it, you will have mutated that object for all future calls to the > function as well. > - Functions calling sets can be replaced by a set literal > - Replace list literals > - Some of the static methods haven't been marked static > - Redundant parentheses -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2867) Airflow Python Code not compatible to coding guidelines and standards
Kaxil Naik created AIRFLOW-2867: --- Summary: Airflow Python Code not compatible to coding guidelines and standards Key: AIRFLOW-2867 URL: https://issues.apache.org/jira/browse/AIRFLOW-2867 Project: Apache Airflow Issue Type: Improvement Reporter: Kaxil Naik Assignee: Kaxil Naik Fix For: 2.0.0 Some of the Airflow code doesn't conform to python coding guidelines and standards. The improvements I have analyzed are below: - Dictionary creation should be written as a dictionary literal - Mutable default argument. Python's default arguments are evaluated once when the function is defined, not each time the function is called (like it is in, say, Ruby). This means that if you use a mutable default argument and mutate it, you will have mutated that object for all future calls to the function as well. - Functions calling sets can be replaced by a set literal - Replace list literals - Some of the static methods haven't been marked static - Redundant parentheses -- This message was sent by Atlassian JIRA (v7.6.3#76005)
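The mutable-default pitfall listed above (and fixed in PR #3714 by changing `source_sql_params={}` to `source_sql_params=None`) can be demonstrated in a few lines; this is a minimal standalone sketch, not Airflow code:

```python
def append_bad(item, bucket=[]):
    # The default list is created once at definition time, so the
    # same object is reused and mutated across calls.
    bucket.append(item)
    return bucket

def append_good(item, bucket=None):
    # The None sentinel gives each call its own fresh list.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

print(append_bad(1))   # [1]
print(append_bad(2))   # [1, 2]  <- state leaked between calls
print(append_good(1))  # [1]
print(append_good(2))  # [2]
```

This is exactly why the style fix replaces `={}` defaults with `=None` plus an in-function check.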
[GitHub] kaxil edited a comment on issue #3696: [AIRFLOW-2851] Canonicalize "as _..." etc imports
kaxil edited a comment on issue #3696: [AIRFLOW-2851] Canonicalize "as _..." etc imports URL: https://github.com/apache/incubator-airflow/pull/3696#issuecomment-411169973 @mistercrunch 🤔 I might have definitely done it on `gcs_hook`. Sorry about that. Not sure if I ever used `_os`. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] kaxil commented on issue #3696: [AIRFLOW-2851] Canonicalize "as _..." etc imports
kaxil commented on issue #3696: [AIRFLOW-2851] Canonicalize "as _..." etc imports URL: https://github.com/apache/incubator-airflow/pull/3696#issuecomment-411169973 @mistercrunch 🤔 I might have definitely done it on `gcs_hook`. Sorry about that. Not sure if I ever use `_os`. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-2851) Canonicalize "as _..." etc imports
[ https://issues.apache.org/jira/browse/AIRFLOW-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572155#comment-16572155 ] ASF GitHub Bot commented on AIRFLOW-2851: - mistercrunch closed pull request #3696: [AIRFLOW-2851] Canonicalize "as _..." etc imports URL: https://github.com/apache/incubator-airflow/pull/3696 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/configuration.py b/airflow/configuration.py index 2e05fde0cd..ed8943ac77 100644 --- a/airflow/configuration.py +++ b/airflow/configuration.py @@ -36,7 +36,7 @@ import warnings from backports.configparser import ConfigParser -from zope.deprecation import deprecated as _deprecated +from zope.deprecation import deprecated from airflow.exceptions import AirflowConfigException from airflow.utils.log.logging_mixin import LoggingMixin @@ -534,7 +534,7 @@ def parameterized_config(template): for func in [load_test_config, get, getboolean, getfloat, getint, has_option, remove_option, as_dict, set]: -_deprecated( +deprecated( func, "Accessing configuration method '{f.__name__}' directly from " "the configuration module is deprecated. 
Please access the " diff --git a/airflow/contrib/auth/backends/kerberos_auth.py b/airflow/contrib/auth/backends/kerberos_auth.py index 08be299a19..fdb6204967 100644 --- a/airflow/contrib/auth/backends/kerberos_auth.py +++ b/airflow/contrib/auth/backends/kerberos_auth.py @@ -28,7 +28,7 @@ # pykerberos should be used as it verifies the KDC, the "kerberos" module does not do so # and make it possible to spoof the KDC import kerberos -import airflow.security.utils as utils +from airflow.security import utils from flask import url_for, redirect diff --git a/airflow/contrib/hooks/__init__.py b/airflow/contrib/hooks/__init__.py index 95f9892d89..6c31842578 100644 --- a/airflow/contrib/hooks/__init__.py +++ b/airflow/contrib/hooks/__init__.py @@ -24,7 +24,7 @@ import sys -import os as _os +import os # # @@ -64,7 +64,7 @@ } -if not _os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False): +if not os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False): from airflow.utils.helpers import AirflowImporter airflow_importer = AirflowImporter(sys.modules[__name__], _hooks) diff --git a/airflow/contrib/operators/__init__.py b/airflow/contrib/operators/__init__.py index 3485aaa7fd..247ec5941f 100644 --- a/airflow/contrib/operators/__init__.py +++ b/airflow/contrib/operators/__init__.py @@ -24,7 +24,7 @@ import sys -import os as _os +import os # # @@ -48,6 +48,6 @@ 'hive_to_dynamodb': ['HiveToDynamoDBTransferOperator'] } -if not _os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False): +if not os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False): from airflow.utils.helpers import AirflowImporter airflow_importer = AirflowImporter(sys.modules[__name__], _operators) diff --git a/airflow/hooks/__init__.py b/airflow/hooks/__init__.py index 1f1c40f16a..38a7dcfebe 100644 --- a/airflow/hooks/__init__.py +++ b/airflow/hooks/__init__.py @@ -18,7 +18,7 @@ # under the License. 
-import os as _os +import os import sys @@ -67,7 +67,7 @@ } -if not _os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False): +if not os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False): from airflow.utils.helpers import AirflowImporter airflow_importer = AirflowImporter(sys.modules[__name__], _hooks) @@ -82,12 +82,12 @@ def _integrate_plugins(): ## # TODO FIXME Remove in Airflow 2.0 -if not _os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False): -from zope.deprecation import deprecated as _deprecated +if not os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False): +from zope.deprecation import deprecated for _hook in hooks_module._objects: hook_name = _hook.__name__ globals()[hook_name] = _hook -_deprecated( +deprecated( hook_name, "Importing plugin hook '{i}' directly from " "'airflow.hooks' has been deprecated. Please " diff --git a/airflow/hooks/hive_hooks.py b/airflow/hooks/hive_hooks.py index 5ac99eec6c..8df96c464f 100644 --- a/airflow/hooks/hive_hooks.py +++ b/airflow/hooks/hive_hooks.py @@ -34,10 +34,10 @@ from past.builtins import unicode from six.moves import zip -import airflow.security.utils as utils from airflow import configuration from
[jira] [Commented] (AIRFLOW-2851) Canonicalize "as _..." etc imports
[ https://issues.apache.org/jira/browse/AIRFLOW-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572156#comment-16572156 ] ASF subversion and git services commented on AIRFLOW-2851: -- Commit 9131d6cc8fce515222da6e4ab9d86cce69f20d1e in incubator-airflow's branch refs/heads/master from Taylor D. Edmiston [ https://gitbox.apache.org/repos/asf?p=incubator-airflow.git;h=9131d6c ] [AIRFLOW-2851] Canonicalize "as _..." etc imports (#3696) > Canonicalize "as _..." etc imports > -- > > Key: AIRFLOW-2851 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2851 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Taylor Edmiston >Assignee: Taylor Edmiston >Priority: Minor > > This PR: > 1. Replaces `import foo as _foo` style imports with the more common `import > foo` used everywhere else across the codebase. I dug through history and > couldn't find special reasons to maintain the as style imports here (I think > it's just old code). Currently (33dd33c89d4b6454d224ca34bab5ae37fb9812a6), > there are just a handful of import lines using `as _...` vs thousands not > using it, so the goal here is to improve consistency. > 2. It also simplifies `import foo.bar as bar` style imports to equivalent > `from foo import bar`. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] mistercrunch closed pull request #3696: [AIRFLOW-2851] Canonicalize "as _..." etc imports
mistercrunch closed pull request #3696: [AIRFLOW-2851] Canonicalize "as _..." etc imports URL: https://github.com/apache/incubator-airflow/pull/3696 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/configuration.py b/airflow/configuration.py index 2e05fde0cd..ed8943ac77 100644 --- a/airflow/configuration.py +++ b/airflow/configuration.py @@ -36,7 +36,7 @@ import warnings from backports.configparser import ConfigParser -from zope.deprecation import deprecated as _deprecated +from zope.deprecation import deprecated from airflow.exceptions import AirflowConfigException from airflow.utils.log.logging_mixin import LoggingMixin @@ -534,7 +534,7 @@ def parameterized_config(template): for func in [load_test_config, get, getboolean, getfloat, getint, has_option, remove_option, as_dict, set]: -_deprecated( +deprecated( func, "Accessing configuration method '{f.__name__}' directly from " "the configuration module is deprecated. 
Please access the " diff --git a/airflow/contrib/auth/backends/kerberos_auth.py b/airflow/contrib/auth/backends/kerberos_auth.py index 08be299a19..fdb6204967 100644 --- a/airflow/contrib/auth/backends/kerberos_auth.py +++ b/airflow/contrib/auth/backends/kerberos_auth.py @@ -28,7 +28,7 @@ # pykerberos should be used as it verifies the KDC, the "kerberos" module does not do so # and make it possible to spoof the KDC import kerberos -import airflow.security.utils as utils +from airflow.security import utils from flask import url_for, redirect diff --git a/airflow/contrib/hooks/__init__.py b/airflow/contrib/hooks/__init__.py index 95f9892d89..6c31842578 100644 --- a/airflow/contrib/hooks/__init__.py +++ b/airflow/contrib/hooks/__init__.py @@ -24,7 +24,7 @@ import sys -import os as _os +import os # # @@ -64,7 +64,7 @@ } -if not _os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False): +if not os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False): from airflow.utils.helpers import AirflowImporter airflow_importer = AirflowImporter(sys.modules[__name__], _hooks) diff --git a/airflow/contrib/operators/__init__.py b/airflow/contrib/operators/__init__.py index 3485aaa7fd..247ec5941f 100644 --- a/airflow/contrib/operators/__init__.py +++ b/airflow/contrib/operators/__init__.py @@ -24,7 +24,7 @@ import sys -import os as _os +import os # # @@ -48,6 +48,6 @@ 'hive_to_dynamodb': ['HiveToDynamoDBTransferOperator'] } -if not _os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False): +if not os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False): from airflow.utils.helpers import AirflowImporter airflow_importer = AirflowImporter(sys.modules[__name__], _operators) diff --git a/airflow/hooks/__init__.py b/airflow/hooks/__init__.py index 1f1c40f16a..38a7dcfebe 100644 --- a/airflow/hooks/__init__.py +++ b/airflow/hooks/__init__.py @@ -18,7 +18,7 @@ # under the License. 
-import os as _os +import os import sys @@ -67,7 +67,7 @@ } -if not _os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False): +if not os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False): from airflow.utils.helpers import AirflowImporter airflow_importer = AirflowImporter(sys.modules[__name__], _hooks) @@ -82,12 +82,12 @@ def _integrate_plugins(): ## # TODO FIXME Remove in Airflow 2.0 -if not _os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False): -from zope.deprecation import deprecated as _deprecated +if not os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False): +from zope.deprecation import deprecated for _hook in hooks_module._objects: hook_name = _hook.__name__ globals()[hook_name] = _hook -_deprecated( +deprecated( hook_name, "Importing plugin hook '{i}' directly from " "'airflow.hooks' has been deprecated. Please " diff --git a/airflow/hooks/hive_hooks.py b/airflow/hooks/hive_hooks.py index 5ac99eec6c..8df96c464f 100644 --- a/airflow/hooks/hive_hooks.py +++ b/airflow/hooks/hive_hooks.py @@ -34,10 +34,10 @@ from past.builtins import unicode from six.moves import zip -import airflow.security.utils as utils from airflow import configuration from airflow.exceptions import AirflowException from airflow.hooks.base_hook import BaseHook +from airflow.security import utils from airflow.utils.file import TemporaryDirectory from airflow.utils.helpers import as_flattened_list from
[GitHub] mistercrunch commented on issue #3696: [AIRFLOW-2851] Canonicalize "as _..." etc imports
mistercrunch commented on issue #3696: [AIRFLOW-2851] Canonicalize "as _..." etc imports URL: https://github.com/apache/incubator-airflow/pull/3696#issuecomment-411163359 I'm unclear as to why contributors went out of their way to alias imports, but it appears to be 100% unnecessary. `git blame` shows @kaxil as one of the people doing this on `_os`. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Created] (AIRFLOW-2866) Missing CSRF Token Error on Web UI Create/Update Operations
Jasper Kahn created AIRFLOW-2866: Summary: Missing CSRF Token Error on Web UI Create/Update Operations Key: AIRFLOW-2866 URL: https://issues.apache.org/jira/browse/AIRFLOW-2866 Project: Apache Airflow Issue Type: Bug Components: webapp Reporter: Jasper Kahn Attempting to modify or delete many resources (such as Connections or Users) results in a 400 from the webserver: {quote}{{Bad Request}} {{The CSRF session token is missing.}}{quote} Logs report: {quote}{{[2018-08-07 18:45:15,771] \{csrf.py:251} INFO - The CSRF session token is missing.}} {{192.168.9.1 - - [07/Aug/2018:18:45:15 +] "POST /admin/connection/delete/ HTTP/1.1" 400 150 "http://localhost:8081/admin/connection/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36"}}{quote} Chrome dev tools show the CSRF token is present in the request payload. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2865) Race condition between on_success_callback and LocalTaskJob's cleanup
Marcin Mejran created AIRFLOW-2865: -- Summary: Race condition between on_success_callback and LocalTaskJob's cleanup Key: AIRFLOW-2865 URL: https://issues.apache.org/jira/browse/AIRFLOW-2865 Project: Apache Airflow Issue Type: Bug Reporter: Marcin Mejran The TaskInstance's run_raw_task method first records SUCCESS for the task instance and then runs the on_success_callback function. The LocalTaskJob's heartbeat_callback checks for any TI's with a SUCCESS state and terminates their processes. As such it's possible for the TI process to be terminated before the on_success_callback function finishes running. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
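The window described in AIRFLOW-2865 can be sketched with plain threads (an illustrative toy with hypothetical names, not Airflow's actual `TaskInstance`/`LocalTaskJob` classes): the task marks its state SUCCESS before the success callback finishes, so a monitor keying off that state can act too early.

```python
import threading
import time

# Shared state standing in for the task instance's DB record.
state = {"ti": "RUNNING", "callback_done": False}

def run_raw_task():
    # Mirrors the ordering described above: SUCCESS is recorded first,
    # then on_success_callback runs.
    state["ti"] = "SUCCESS"
    time.sleep(0.2)            # on_success_callback still running...
    state["callback_done"] = True

seen = []

def heartbeat_callback():
    # A monitor that terminates the process on SUCCESS can fire inside
    # the window where the callback has not yet finished -- the race.
    if state["ti"] == "SUCCESS":
        seen.append(state["callback_done"])

worker = threading.Thread(target=run_raw_task)
worker.start()
time.sleep(0.05)               # heartbeat lands mid-callback
heartbeat_callback()
worker.join()
print(seen)  # [False]: SUCCESS was visible before the callback completed
```

Reordering (record SUCCESS only after the callback returns) or having the monitor wait for callback completion would close the window.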
[jira] [Commented] (AIRFLOW-2140) Add Kubernetes Scheduler to Spark Submit Operator
[ https://issues.apache.org/jira/browse/AIRFLOW-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572143#comment-16572143 ] ASF GitHub Bot commented on AIRFLOW-2140: - bolkedebruin closed pull request #3700: [AIRFLOW-2140] Don't require kubernetes for the SparkSubmit hook URL: https://github.com/apache/incubator-airflow/pull/3700 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/contrib/hooks/spark_submit_hook.py b/airflow/contrib/hooks/spark_submit_hook.py index 0185cab283..65bb6134e6 100644 --- a/airflow/contrib/hooks/spark_submit_hook.py +++ b/airflow/contrib/hooks/spark_submit_hook.py @@ -26,7 +26,6 @@ from airflow.exceptions import AirflowException from airflow.utils.log.logging_mixin import LoggingMixin from airflow.contrib.kubernetes import kube_client -from kubernetes.client.rest import ApiException class SparkSubmitHook(BaseHook, LoggingMixin): @@ -136,6 +135,10 @@ def __init__(self, self._connection = self._resolve_connection() self._is_yarn = 'yarn' in self._connection['master'] self._is_kubernetes = 'k8s' in self._connection['master'] +if self._is_kubernetes and kube_client is None: +raise RuntimeError( +"{master} specified by kubernetes dependencies are not installed!".format( +self._connection['master'])) self._should_track_driver_status = self._resolve_should_track_driver_status() self._driver_id = None @@ -559,6 +562,6 @@ def on_kill(self): self.log.info("Spark on K8s killed with response: %s", api_response) -except ApiException as e: +except kube_client.ApiException as e: self.log.info("Exception when attempting to kill Spark on K8s:") self.log.exception(e) diff --git a/airflow/contrib/kubernetes/kube_client.py b/airflow/contrib/kubernetes/kube_client.py index 8b71f41242..4b8fa17155 100644 
--- a/airflow/contrib/kubernetes/kube_client.py +++ b/airflow/contrib/kubernetes/kube_client.py @@ -17,9 +17,21 @@ from airflow.configuration import conf from six import PY2 +try: +from kubernetes import config, client +from kubernetes.client.rest import ApiException +has_kubernetes = True +except ImportError as e: +# We need an exception class to be able to use it in ``except`` elsewhere +# in the code base +ApiException = BaseException +has_kubernetes = False +_import_err = e + def _load_kube_config(in_cluster, cluster_context, config_file): -from kubernetes import config, client +if not has_kubernetes: +raise _import_err if in_cluster: config.load_incluster_config() else: This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add Kubernetes Scheduler to Spark Submit Operator > - > > Key: AIRFLOW-2140 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2140 > Project: Apache Airflow > Issue Type: New Feature >Affects Versions: 1.9.0 >Reporter: Rob Keevil >Assignee: Rob Keevil >Priority: Major > Fix For: 2.0.0 > > > Spark 2.3 adds the Kubernetes resource manager to Spark, alongside the > existing Standalone, Yarn and Mesos resource managers. > https://github.com/apache/spark/blob/master/docs/running-on-kubernetes.md > We should extend the spark submit operator to enable the new K8s spark submit > options, and to be able to monitor Spark jobs running within Kubernetes. > I already have working code for this, I need to test the monitoring/log > parsing code and make sure that Airflow is able to terminate Kubernetes pods > when jobs are cancelled etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2140) Add Kubernetes Scheduler to Spark Submit Operator
[ https://issues.apache.org/jira/browse/AIRFLOW-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572144#comment-16572144 ] ASF subversion and git services commented on AIRFLOW-2140: -- Commit 0be002eebb182b607109a0390d7f6fb8795c668b in incubator-airflow's branch refs/heads/master from [~ashb] [ https://gitbox.apache.org/repos/asf?p=incubator-airflow.git;h=0be002e ] [AIRFLOW-2140] Don't require kubernetes for the SparkSubmit hook (#3700) This extra dep is a quasi-breaking change when upgrading - previously there were no deps outside of Airflow itself for this hook. Importing the k8s libs breaks installs that aren't also using Kubernetes. This makes the dep optional for anyone who doesn't explicitly use the functionality > Add Kubernetes Scheduler to Spark Submit Operator > - > > Key: AIRFLOW-2140 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2140 > Project: Apache Airflow > Issue Type: New Feature >Affects Versions: 1.9.0 >Reporter: Rob Keevil >Assignee: Rob Keevil >Priority: Major > Fix For: 2.0.0 > > > Spark 2.3 adds the Kubernetes resource manager to Spark, alongside the > existing Standalone, Yarn and Mesos resource managers. > https://github.com/apache/spark/blob/master/docs/running-on-kubernetes.md > We should extend the spark submit operator to enable the new K8s spark submit > options, and to be able to monitor Spark jobs running within Kubernetes. > I already have working code for this, I need to test the monitoring/log > parsing code and make sure that Airflow is able to terminate Kubernetes pods > when jobs are cancelled etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] bolkedebruin closed pull request #3700: [AIRFLOW-2140] Don't require kubernetes for the SparkSubmit hook
bolkedebruin closed pull request #3700: [AIRFLOW-2140] Don't require kubernetes for the SparkSubmit hook URL: https://github.com/apache/incubator-airflow/pull/3700 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/contrib/hooks/spark_submit_hook.py b/airflow/contrib/hooks/spark_submit_hook.py index 0185cab283..65bb6134e6 100644 --- a/airflow/contrib/hooks/spark_submit_hook.py +++ b/airflow/contrib/hooks/spark_submit_hook.py @@ -26,7 +26,6 @@ from airflow.exceptions import AirflowException from airflow.utils.log.logging_mixin import LoggingMixin from airflow.contrib.kubernetes import kube_client -from kubernetes.client.rest import ApiException class SparkSubmitHook(BaseHook, LoggingMixin): @@ -136,6 +135,10 @@ def __init__(self, self._connection = self._resolve_connection() self._is_yarn = 'yarn' in self._connection['master'] self._is_kubernetes = 'k8s' in self._connection['master'] +if self._is_kubernetes and kube_client is None: +raise RuntimeError( +"{master} specified by kubernetes dependencies are not installed!".format( +self._connection['master'])) self._should_track_driver_status = self._resolve_should_track_driver_status() self._driver_id = None @@ -559,6 +562,6 @@ def on_kill(self): self.log.info("Spark on K8s killed with response: %s", api_response) -except ApiException as e: +except kube_client.ApiException as e: self.log.info("Exception when attempting to kill Spark on K8s:") self.log.exception(e) diff --git a/airflow/contrib/kubernetes/kube_client.py b/airflow/contrib/kubernetes/kube_client.py index 8b71f41242..4b8fa17155 100644 --- a/airflow/contrib/kubernetes/kube_client.py +++ b/airflow/contrib/kubernetes/kube_client.py @@ -17,9 +17,21 @@ from airflow.configuration import conf from six import PY2 +try: +from 
kubernetes import config, client +from kubernetes.client.rest import ApiException +has_kubernetes = True +except ImportError as e: +# We need an exception class to be able to use it in ``except`` elsewhere +# in the code base +ApiException = BaseException +has_kubernetes = False +_import_err = e + def _load_kube_config(in_cluster, cluster_context, config_file): -from kubernetes import config, client +if not has_kubernetes: +raise _import_err if in_cluster: config.load_incluster_config() else: This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
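The diff above follows a common optional-dependency pattern: attempt the import once at module load, remember any failure, and raise it only when the optional feature is actually used. A generic sketch of the pattern (using stdlib `json` as a stand-in for the optional package; this is not the Airflow module itself):

```python
try:
    import json as optional_dep  # stand-in for e.g. the kubernetes client
    HAS_DEP = True
    _import_err = None
except ImportError as err:
    optional_dep = None
    HAS_DEP = False
    _import_err = err  # kept so the original error can be re-raised lazily

def use_feature(payload):
    # Importing this module never fails; only using the feature does.
    if not HAS_DEP:
        raise _import_err
    return optional_dep.dumps(payload)

print(use_feature({"ok": True}))  # {"ok": true}
```

This keeps the hook importable for users who never touch the Kubernetes code path, which is exactly the breakage the PR fixes.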
[GitHub] jakahn commented on issue #3677: [AIRFLOW-2826] Add GoogleCloudKMSHook
jakahn commented on issue #3677: [AIRFLOW-2826] Add GoogleCloudKMSHook URL: https://github.com/apache/incubator-airflow/pull/3677#issuecomment-411156427 @Fokko PTAL when you can This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-852) Docs: Cancel a triggered dag
[ https://issues.apache.org/jira/browse/AIRFLOW-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572113#comment-16572113 ] Quijano Flores commented on AIRFLOW-852: I get an error when I do this [~salad_king] `AttributeError: 'NoneType' object has no attribute 'is_paused'`

```
Traceback (most recent call last):
  File "/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1988, in wsgi_app
    response = self.full_dispatch_request()
  File "/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1641, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1544, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/anaconda3/lib/python3.6/site-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1639, in full_dispatch_request
    rv = self.dispatch_request()
  File "/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1625, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/anaconda3/lib/python3.6/site-packages/flask_admin/base.py", line 69, in inner
    return self._run_view(f, *args, **kwargs)
  File "/anaconda3/lib/python3.6/site-packages/flask_admin/base.py", line 368, in _run_view
    return fn(self, *args, **kwargs)
  File "/anaconda3/lib/python3.6/site-packages/flask_login.py", line 755, in decorated_view
    return func(*args, **kwargs)
  File "/anaconda3/lib/python3.6/site-packages/airflow/www/utils.py", line 125, in wrapper
    return f(*args, **kwargs)
  File "/anaconda3/lib/python3.6/site-packages/airflow/www/views.py", line 1606, in paused
    orm_dag.is_paused = True
```

> Docs: Cancel a triggered dag > > > Key: AIRFLOW-852 > URL: https://issues.apache.org/jira/browse/AIRFLOW-852 > Project: Apache Airflow > Issue Type: Improvement > Components: docs >Reporter: Andi Pl >Priority: Major > > Issue: Cancel an externally triggered dag.
> The dag is running and contains 200 parallel tasks. Only 3 are run at once, > as the celery executor is limited to 3 in parallel. > 23 tasks are finished till now. > We want to cancel the execution of the missing tasks. But they are not queued > yet. How can we cancel them? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] tedmiston commented on issue #3696: [AIRFLOW-2851] Canonicalize "as _..." etc imports
tedmiston commented on issue #3696: [AIRFLOW-2851] Canonicalize "as _..." etc imports URL: https://github.com/apache/incubator-airflow/pull/3696#issuecomment-411139302 Thank you both for the review and initial comments! It sounds like the next step here is probably to get a quick thought from Max when he has a chance. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] verdan commented on issue #3656: [WIP][AIRFLOW-2803] Fix all ESLint issues
verdan commented on issue #3656: [WIP][AIRFLOW-2803] Fix all ESLint issues URL: https://github.com/apache/incubator-airflow/pull/3656#issuecomment-411138672 @tedmiston sounds good to me!!
[GitHub] tedmiston commented on issue #3587: [WIP][AIRFLOW-2732] Remove hooks and operators from core
tedmiston commented on issue #3587: [WIP][AIRFLOW-2732] Remove hooks and operators from core URL: https://github.com/apache/incubator-airflow/pull/3587#issuecomment-411135195 Closing this PR for now as the next step for this issue is drafting an AIP.
[GitHub] tedmiston commented on issue #3656: [WIP][AIRFLOW-2803] Fix all ESLint issues
tedmiston commented on issue #3656: [WIP][AIRFLOW-2803] Fix all ESLint issues URL: https://github.com/apache/incubator-airflow/pull/3656#issuecomment-411134901 Hey all - thanks for bumping. Sorry for any delay here — my work sprint has been quite busy this past week. This PR is actually very nearly done. I have a local WIP commit to be pushed up with a small bit of cleanup left IIRC. Given that, my preference would be to wrap up and merge this PR since I've already done ~95% of the work, then rebase/update @verdan's extraction PR above (hopefully this doesn't cause much extra work there). This is something I can get done very soon. I'm also happy to help review/test/etc on the extraction PR as well. Does this work for you @verdan ?
[jira] [Commented] (AIRFLOW-2863) GKEClusterHook catches wrong exception
[ https://issues.apache.org/jira/browse/AIRFLOW-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571966#comment-16571966 ] ASF GitHub Bot commented on AIRFLOW-2863: feng-tao closed pull request #3711: [AIRFLOW-2863] Fix GKEClusterHook catching wrong exception URL: https://github.com/apache/incubator-airflow/pull/3711 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

```
diff --git a/airflow/contrib/hooks/gcp_container_hook.py b/airflow/contrib/hooks/gcp_container_hook.py
index d36d796d76..227cd3d124 100644
--- a/airflow/contrib/hooks/gcp_container_hook.py
+++ b/airflow/contrib/hooks/gcp_container_hook.py
@@ -23,7 +23,7 @@
 from airflow import AirflowException, version
 from airflow.hooks.base_hook import BaseHook
-from google.api_core.exceptions import AlreadyExists
+from google.api_core.exceptions import AlreadyExists, NotFound
 from google.api_core.gapic_v1.method import DEFAULT
 from google.cloud import container_v1, exceptions
 from google.cloud.container_v1.gapic.enums import Operation
@@ -141,7 +141,7 @@ def delete_cluster(self, name, retry=DEFAULT, timeout=DEFAULT):
             op = self.wait_for_operation(op)
             # Returns server-defined url for the resource
             return op.self_link
-        except exceptions.NotFound as error:
+        except NotFound as error:
             self.log.info('Assuming Success: ' + error.message)

     def create_cluster(self, cluster, retry=DEFAULT, timeout=DEFAULT):
diff --git a/tests/contrib/hooks/test_gcp_container_hook.py b/tests/contrib/hooks/test_gcp_container_hook.py
index f3705ea4ce..6e13461395 100644
--- a/tests/contrib/hooks/test_gcp_container_hook.py
+++ b/tests/contrib/hooks/test_gcp_container_hook.py
@@ -61,6 +61,22 @@ def test_delete_cluster(self, wait_mock, convert_mock):
         wait_mock.assert_called_with(client_delete.return_value)
         convert_mock.assert_not_called()

+    @mock.patch("airflow.contrib.hooks.gcp_container_hook.GKEClusterHook.log")
+    @mock.patch("airflow.contrib.hooks.gcp_container_hook.GKEClusterHook._dict_to_proto")
+    @mock.patch("airflow.contrib.hooks.gcp_container_hook.GKEClusterHook.wait_for_operation")
+    def test_delete_cluster_not_found(self, wait_mock, convert_mock, log_mock):
+        from google.api_core.exceptions import NotFound
+        # To force an error
+        message = 'Not Found'
+        self.gke_hook.client.delete_cluster.side_effect = NotFound(message=message)
+
+        self.gke_hook.delete_cluster(None)
+        wait_mock.assert_not_called()
+        convert_mock.assert_not_called()
+        log_mock.info.assert_any_call("Assuming Success: " + message)
+
     @mock.patch("airflow.contrib.hooks.gcp_container_hook.GKEClusterHook._dict_to_proto")
     @mock.patch(
         "airflow.contrib.hooks.gcp_container_hook.GKEClusterHook.wait_for_operation")
@@ -107,7 +123,7 @@ def test_create_cluster_proto(self, wait_mock, convert_mock):
     @mock.patch("airflow.contrib.hooks.gcp_container_hook.GKEClusterHook._dict_to_proto")
     @mock.patch(
         "airflow.contrib.hooks.gcp_container_hook.GKEClusterHook.wait_for_operation")
-    def test_delete_cluster_dict(self, wait_mock, convert_mock):
+    def test_create_cluster_dict(self, wait_mock, convert_mock):
         mock_cluster_dict = {'name': CLUSTER_NAME}
         retry_mock, timeout_mock = mock.Mock(), mock.Mock()
@@ -135,6 +151,22 @@ def test_create_cluster_error(self, wait_mock, convert_mock):
         wait_mock.assert_not_called()
         convert_mock.assert_not_called()

+    @mock.patch("airflow.contrib.hooks.gcp_container_hook.GKEClusterHook.log")
+    @mock.patch("airflow.contrib.hooks.gcp_container_hook.GKEClusterHook._dict_to_proto")
+    @mock.patch("airflow.contrib.hooks.gcp_container_hook.GKEClusterHook.wait_for_operation")
+    def test_create_cluster_already_exists(self, wait_mock, convert_mock, log_mock):
+        from google.api_core.exceptions import AlreadyExists
+        # To force an error
+        message = 'Already Exists'
+        self.gke_hook.client.create_cluster.side_effect = AlreadyExists(message=message)
+
+        self.gke_hook.create_cluster({})
+        wait_mock.assert_not_called()
+        self.assertEquals(convert_mock.call_count, 1)
+        log_mock.info.assert_any_call("Assuming Success: " + message)
+
 class GKEClusterHookGetTest(unittest.TestCase):
     def setUp(self):
```
[jira] [Commented] (AIRFLOW-2863) GKEClusterHook catches wrong exception
[ https://issues.apache.org/jira/browse/AIRFLOW-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571967#comment-16571967 ] ASF subversion and git services commented on AIRFLOW-2863: Commit 142a9425db3675fc433b0dbbb637f53638e70a8a in incubator-airflow's branch refs/heads/master from [~noremac201] [ https://gitbox.apache.org/repos/asf?p=incubator-airflow.git;h=142a942 ] [AIRFLOW-2863] Fix GKEClusterHook catching wrong exception (#3711)

> GKEClusterHook catches wrong exception
>
> Key: AIRFLOW-2863
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2863
> Project: Apache Airflow
> Issue Type: Bug
> Reporter: Cameron Moberg
> Assignee: Cameron Moberg
> Priority: Minor
>
> Instead of successfully catching the error and reporting success, it reports
> a failure, since it catches the wrong error.
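The root cause of AIRFLOW-2863 is easy to reproduce in isolation: a Python `except` clause only matches the exception class (or a subclass) it names, so a `NotFound` imported from one module does not catch a distinct `NotFound` raised from another. A toy illustration with stand-in classes — the assumption here is that, in the affected versions, `google.cloud.exceptions` and `google.api_core.exceptions` exposed different classes:

```python
# Stand-ins for google.api_core.exceptions.NotFound and
# google.cloud.exceptions.NotFound. The names below are illustrative only.

class ApiCoreNotFound(Exception):
    """Plays the role of the exception the client library actually raises."""

class CloudNotFound(Exception):
    """Plays the role of the exception the hook used to catch."""

def delete_cluster(raised, caught):
    """Raise `raised` and try to catch it as `caught`, mirroring the
    hook's 'Assuming Success' path in delete_cluster()."""
    try:
        raise raised("cluster not found")
    except caught:
        return "assumed success"
```

With matching classes the error is swallowed and success is assumed; with mismatched classes (the pre-fix situation) the exception escapes and the task fails.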
[GitHub] feng-tao closed pull request #3711: [AIRFLOW-2863] Fix GKEClusterHook catching wrong exception
feng-tao closed pull request #3711: [AIRFLOW-2863] Fix GKEClusterHook catching wrong exception URL: https://github.com/apache/incubator-airflow/pull/3711 This is a PR merged from a forked repository; the notification reproduced the same diff already shown in the ASF GitHub Bot comment on AIRFLOW-2863 above.
[GitHub] feng-tao commented on issue #3711: [AIRFLOW-2863] Fix GKEClusterHook catching wrong exception
feng-tao commented on issue #3711: [AIRFLOW-2863] Fix GKEClusterHook catching wrong exception URL: https://github.com/apache/incubator-airflow/pull/3711#issuecomment-411127557 LGTM
[GitHub] Noremac201 commented on issue #3711: [AIRFLOW-2863] Fix GKEClusterHook catching wrong exception
Noremac201 commented on issue #3711: [AIRFLOW-2863] Fix GKEClusterHook catching wrong exception URL: https://github.com/apache/incubator-airflow/pull/3711#issuecomment-49591 Done! Thanks
[GitHub] awelsh93 commented on issue #3710: [AIRFLOW-2860] Update tests for druid hook
awelsh93 commented on issue #3710: [AIRFLOW-2860] Update tests for druid hook URL: https://github.com/apache/incubator-airflow/pull/3710#issuecomment-47114 Thanks for merging this - can the change in #3707 get another review please? It was merged but then reverted due to tests failing, which have now been fixed in this pull request.
[GitHub] feng-tao commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file()
feng-tao commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file() URL: https://github.com/apache/incubator-airflow/pull/3698#discussion_r208296250

## File path: tests/models.py ##

```
@@ -56,7 +56,7 @@
 from airflow.utils.trigger_rule import TriggerRule
 from mock import patch, ANY
 from parameterized import parameterized
-from tempfile import NamedTemporaryFile
+from tempfile import NamedTemporaryFile, mkdtemp
```

Review comment: small nit: could you move `mkdtemp` before `NamedTemporaryFile`, given we put lower-case imports before upper-case ones (L57)?
[GitHub] ashb commented on a change in pull request #3713: [AIRFLOW-2428] Add AutoScalingRole key to emr_hook
ashb commented on a change in pull request #3713: [AIRFLOW-2428] Add AutoScalingRole key to emr_hook URL: https://github.com/apache/incubator-airflow/pull/3713#discussion_r208267811

## File path: airflow/contrib/hooks/emr_hook.py ##

```
@@ -63,6 +63,8 @@
         VisibleToAllUsers=config.get('VisibleToAllUsers'),
         JobFlowRole=config.get('JobFlowRole'),
         ServiceRole=config.get('ServiceRole'),
+        AutoScalingRole=config.get('AutoScalingRole'),
+        ScaleDownBehavior=config.get('ScaleDownBehavior'),
         Tags=config.get('Tags')
```

Review comment: Hmmm, I wonder if it might be an idea to do `self.get_conn().run_job_flow(**config)` so that we don't have to stay up-to-date with the AWS API. For that to work we'd need three `config.setdefault()` calls for the three list-types. I also don't know whether all the kwargs are optional (but I suspect they are).
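ashb's suggestion can be sketched as follows. This is a hedged illustration, not the hook's actual implementation: which keys count as "the three list-types" is an assumption (Steps, BootstrapActions, Tags), and `FakeEmrClient` stands in for the boto3 EMR client returned by `get_conn()`:

```python
# Sketch: forward the whole overrides dict to run_job_flow(**config) instead
# of enumerating every AWS API key by hand, so new keys like AutoScalingRole
# pass through without code changes. Key names below are assumptions.

def create_job_flow(emr_client, job_flow_overrides):
    config = dict(job_flow_overrides)  # don't mutate the caller's dict
    # Default the list-typed keys so boto3 always receives them.
    for key in ("Steps", "BootstrapActions", "Tags"):
        config.setdefault(key, [])
    return emr_client.run_job_flow(**config)

class FakeEmrClient:
    """Minimal stand-in for boto3's EMR client; just echoes the kwargs."""
    def run_job_flow(self, **kwargs):
        return kwargs
```

Used this way, a caller-supplied `AutoScalingRole` (or any future key) reaches the API untouched, while the list defaults keep the call valid when the overrides omit them.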
[jira] [Commented] (AIRFLOW-2428) Add AutoScalingRole key to emr_hook
[ https://issues.apache.org/jira/browse/AIRFLOW-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571799#comment-16571799 ] ASF GitHub Bot commented on AIRFLOW-2428: dmnpignaud opened a new pull request #3713: [AIRFLOW-2428] Add AutoScalingRole key to emr_hook URL: https://github.com/apache/incubator-airflow/pull/3713

### Jira
- [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW-2428)

### Description
- [x] Enable the use of the autoscaling functionality when creating an EMR cluster

### Tests
- [ ] None

### Commits
- [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  2. Subject is limited to 50 characters (not including Jira issue reference)
  3. Subject does not end with a period
  4. Subject uses the imperative mood ("add", not "adding")
  5. Body wraps at 72 characters
  6. Body explains "what" and "why", not "how"

### Documentation
- [x] In case of new functionality, my PR adds documentation that describes how to use it.
  - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

### Code Quality
- [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`

> Add AutoScalingRole key to emr_hook
>
> Key: AIRFLOW-2428
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2428
> Project: Apache Airflow
> Issue Type: Improvement
> Components: hooks
> Reporter: Kyle Hamlin
> Priority: Minor
> Fix For: 1.10.0
>
> Need to be able to pass the `AutoScalingRole` param to the `run_job_flow`
> method for EMR autoscaling to work.