[GitHub] tedmiston commented on issue #3703: [AIRFLOW-2857] Fix broken RTD env

2018-08-07 Thread GitBox
tedmiston commented on issue #3703: [AIRFLOW-2857] Fix broken RTD env 
URL: 
https://github.com/apache/incubator-airflow/pull/3703#issuecomment-411272356
 
 
   Hey all - It looks like our RTD build post-merge was successful. 
   
   I dug into the mock dependency, etc., in the RTD build environment logs, 
which led to creating a follow-up Jira issue.  If you have time to take a look 
and provide any comments there, it would be appreciated.
   
   https://issues.apache.org/jira/browse/AIRFLOW-2871
   




[jira] [Created] (AIRFLOW-2871) Harden and improve Read the Docs build environment

2018-08-07 Thread Taylor Edmiston (JIRA)
Taylor Edmiston created AIRFLOW-2871:


 Summary: Harden and improve Read the Docs build environment
 Key: AIRFLOW-2871
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2871
 Project: Apache Airflow
  Issue Type: Bug
  Components: docs, Documentation
Reporter: Taylor Edmiston
Assignee: Taylor Edmiston


h2. Context

In the process of resolving AIRFLOW-2857 (via [PR 
3703|https://github.com/apache/incubator-airflow/pull/3703]), I noticed some 
oddities in our Read the Docs (RTD) build environment especially around cached 
dependencies.  This motivates hardening and showing some love to our RTD setup.
h2. Problem

I dug into the RTD build logs for a moment to find some closure on the mock 
dependency discussed in PR #3703 above. I think that our RTD environment has 
possibly been working by coincidence, off of cached dependencies.
{code:bash}
python /home/docs/checkouts/readthedocs.org/user_builds/airflow/envs/latest/bin/pip \
  install --ignore-installed \
  --cache-dir /home/docs/checkouts/readthedocs.org/user_builds/airflow/.cache/pip \
  .[doc,docker,gcp_api,emr]{code}
The directory referenced by that --cache-dir arg earlier in the log happens to 
have mock installed already.
{code:bash}
python /home/docs/checkouts/readthedocs.org/user_builds/airflow/envs/latest/bin/pip \
  install --upgrade \
  --cache-dir /home/docs/checkouts/readthedocs.org/user_builds/airflow/.cache/pip \
  Pygments==2.2.0 setuptools<40 docutils==0.13.1 mock==1.0.1 pillow==2.6.1 \
  alabaster>=0.7,<0.8,!=0.7.5 commonmark==0.5.4 recommonmark==0.4.0 sphinx<1.8 \
  sphinx-rtd-theme<0.5 readthedocs-sphinx-ext<0.6{code}
 Here are some logs where you can see that (view raw):
 # Latest successful (Aug. 7, 2018. 9:21 a.m.) - 
[7602630|https://readthedocs.org/projects/airflow/builds/7602630/]
 # Last unsuccessful before (1) (Aug. 5, 2018. 1:24 p.m.) - 
[7593052|https://readthedocs.org/projects/airflow/builds/7593052/]
 # Last successful before (2) (July 18, 2018. 3:23 a.m.) - 
[7503718|https://readthedocs.org/projects/airflow/builds/7503718/]
 # First build (2016) - 
[4150778|https://readthedocs.org/projects/airflow/builds/4150778/]

It appears that mock and others have potentially been cached since the first 
RTD build in 2016 (4).

These versions like mock==1.0.1 do not appear to be coming from anywhere in our 
current config in incubator-airflow; I believe they are installed as [core 
dependencies of RTD 
itself|https://github.com/rtfd/readthedocs.org/blob/ca7afe6577672e129ccfe63abe33561dc32a6651/readthedocs/doc_builder/python_environments.py#L220-L235].

Some but not all of these dependencies get upgraded to newer versions further 
down in the build.  In the case of mock, we were getting lucky that mock==1.0.1 
was a dependency of RTD and our setup inherited that old version which allowed 
the docs build to succeed.  (There might be other cases of dependencies like 
this too.)
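
To make the coincidence concrete, here is a hedged sketch of what pinning the 
docs-only dependencies explicitly in our own setup.py doc extra could look 
like, so the build stops relying on RTD's copy (names and version pins below 
are illustrative assumptions, not the actual list):
{code:python}
from setuptools import setup

# Hedged sketch (assumed names/pins): declare every docs-only dependency
# explicitly so the RTD build never silently inherits packages, like mock,
# from RTD's own environment.
doc = [
    'mock',
    'sphinx<1.8',
    'sphinx-rtd-theme<0.5',
]

setup(
    # ... name, version, packages, etc. as in the existing setup.py ...
    extras_require={
        'doc': doc,
    },
)
{code}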
h2. Solution

My proposed enhancements to harden and improve our RTD setup are:
 * Hardening
 ** Set our RTD build to use a virtual environment if it's not already
 ** Set our RTD build to ignore packages outside of its virtualenv like 
dependencies of RTD itself
 ** Specify any dependencies broken by ^
 ** Test wiping a version in the build environment (not sure if this clears 
cache dir)
 *** 
[https://docs.readthedocs.io/en/latest/guides/wipe-environment.html#wiping-a-build-environment]
 *** 
[https://docs.readthedocs.io/en/latest/builds.html#deleting-a-stale-or-broken-build-environment]
 ** Make build.image, python.version, etc. explicit in yaml config (see the 
sketch after this list)
 *** [https://docs.readthedocs.io/en/latest/yaml-config.html]
 ** Test upgrading our RTD environment from CPython 2.x to CPython 3.x
 * Misc
 ** Improve RTD project page to have tags and description
 ** Lint YAML file
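
For reference, a minimal sketch of an explicit config (assuming the yaml 
schema from the docs linked above; the values below are placeholders, not our 
current settings):
{code:yaml}
# readthedocs.yml - hedged sketch; build image and Python version are
# assumptions to confirm, not the project's current settings.
build:
  image: latest
python:
  version: 3.6
  pip_install: true
  extra_requirements:
    - doc
{code}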

Note: I don't yet have maintainer access for airflow on RTD, which I believe 
this would require.  I am happy to take this issue if I can get that.  I have 
experience as an admin of another project on RTD (simple-salesforce).

/cc Everyone who commented in PR #3703 - [~kaxilnaik], [~ashb], [~TaoFeng] 





[jira] [Commented] (AIRFLOW-1410) BigQuery Operator flattens the repeated column and it's not configurable

2018-08-07 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572579#comment-16572579
 ] 

Iuliia Volkova commented on AIRFLOW-1410:
-

'[AIRFLOW-1404] Add 'flatten_results' & 'maximum_bytes_billed' to BQ Operator' 
seems to have done this task already.

> BigQuery Operator flattens the repeated column and it's not configurable
> 
>
> Key: AIRFLOW-1410
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1410
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib, hooks
>Reporter: Pratap Koritala
>Priority: Minor
>
> For Legacy SQL, BigQuery Operator/Hook flattens the repeated column. The 
> underlying Google Query JSON API's parameter "flattenResult" has default 
> value of false, by not setting it every Legacy SQL result is flattened. 
> Without this config option, Legacy SQL can't be used for queries with 
> repeated columns.  Even though it looks like feature request, I consider this 
> as a bug as this config option should have been in the first place.
> I am working on submitting PR to github project.





[GitHub] xnuinside edited a comment on issue #3717: [AIRFLOW-1874] use_legacy_sql added to BigQueryCheck operators

2018-08-07 Thread GitBox
xnuinside edited a comment on issue #3717: [AIRFLOW-1874] use_legacy_sql added 
to BigQueryCheck operators
URL: 
https://github.com/apache/incubator-airflow/pull/3717#issuecomment-411263370
 
 
   apt-get install failed
   
   .net/chris-lea/redis-server/ubuntu/dists/trusty/main/binary-amd64/Packages  
Unable to connect to ppa.launchpad.net:http:
   W: Failed to fetch 
http://ppa.launchpad.net/chris-lea/redis-server/ubuntu/dists/trusty/main/binary-i386/Packages
  Unable to connect to ppa.launchpad.net:http:
   W: Some index files failed to download. They have been ignored, or old ones 
used instead.
   The command "sudo -E apt-get -yq --no-install-suggests 
--no-install-recommends $TRAVIS_APT_OPTS install slapd ldap-utils 
openssh-server mysql-server-5.6 mysql-client-core-5.6 mysql-client-5.6 
krb5-user krb5-kdc krb5-admin-server oracle-java8-installer" failed and exited 
with 100 during .
   
   Travis died with this error; what should I do? 




[GitHub] xnuinside commented on issue #3717: [AIRFLOW-1874] use_legacy_sql added to BigQueryCheck operators

2018-08-07 Thread GitBox
xnuinside commented on issue #3717: [AIRFLOW-1874] use_legacy_sql added to 
BigQueryCheck operators
URL: 
https://github.com/apache/incubator-airflow/pull/3717#issuecomment-411263370
 
 
   apt-get install failed
   $ cat ~/apt-get-update.log
   Ign:1 http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.4 InRelease
   Get:2 http://apt.postgresql.org/pub/repos/apt trusty-pgdg InRelease [61.4 kB]
   Get:3 http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.4 Release 
[2,495 B]
   Get:5 http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.4 Release.gpg 
[801 B]
   Get:4 http://dl.bintray.com/apache/cassandra 39x InRelease [3,168 B]
   Get:7 http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.4/multiverse 
amd64 Packages [11.8 kB]
   Ign:6 http://toolbelt.heroku.com/ubuntu ./ InRelease
   Get:8 http://ppa.launchpad.net/chris-lea/redis-server/ubuntu trusty 
InRelease [15.4 kB]
   Get:10 http://apt.postgresql.org/pub/repos/apt trusty-pgdg/main amd64 
Packages [190 kB]
   Hit:9 http://toolbelt.heroku.com/ubuntu ./ Release
   Get:11 http://apt.postgresql.org/pub/repos/apt trusty-pgdg/main i386 
Packages [189 kB]
   Hit:13 https://packagecloud.io/computology/apt-backport/ubuntu trusty 
InRelease
   Get:14 https://packagecloud.io/github/git-lfs/ubuntu trusty InRelease [23.2 
kB]
   Get:15 https://packagecloud.io/rabbitmq/rabbitmq-server/ubuntu trusty 
InRelease [23.7 kB]
   Get:16 https://packagecloud.io/github/git-lfs/ubuntu trusty/main amd64 
Packages [6,863 B]
   Get:17 https://packagecloud.io/github/git-lfs/ubuntu trusty/main i386 
Packages [6,629 B]
   Get:18 https://packagecloud.io/rabbitmq/rabbitmq-server/ubuntu trusty/main 
amd64 Packages [5,515 B]
   Get:19 https://packagecloud.io/rabbitmq/rabbitmq-server/ubuntu trusty/main 
i386 Packages [5,515 B]
   Get:20 https://download.docker.com/linux/ubuntu trusty InRelease [26.5 kB]
   Get:21 https://download.docker.com/linux/ubuntu trusty/stable amd64 Packages 
[3,545 B]
   Get:22 https://download.docker.com/linux/ubuntu trusty/edge amd64 Packages 
[4,756 B]
   Get:23 https://dl.hhvm.com/ubuntu trusty InRelease [2,411 B]
   Get:24 https://dl.hhvm.com/ubuntu trusty/main amd64 Packages [1,816 B]
   Err:25 http://mirror.jmu.edu/pub/ubuntu trusty InRelease
 Could not connect to mirror.jmu.edu:80 (134.126.36.183), connection timed 
out
   Err:26 http://mirror.jmu.edu/pub/ubuntu trusty-updates InRelease
 Unable to connect to mirror.jmu.edu:http:
   Err:27 http://mirror.jmu.edu/pub/ubuntu trusty-backports InRelease
 Unable to connect to mirror.jmu.edu:http:
   Ign:28 http://dl.google.com/linux/chrome/deb stable InRelease
   Get:29 http://dl.google.com/linux/chrome/deb stable Release [1,189 B]
   Get:30 http://dl.google.com/linux/chrome/deb stable Release.gpg [819 B]
   Get:31 http://dl.google.com/linux/chrome/deb stable/main amd64 Packages 
[1,111 B]
   Get:32 http://security.ubuntu.com/ubuntu trusty-security InRelease [65.9 kB]
   Get:33 http://security.ubuntu.com/ubuntu trusty-security/main Sources [205 
kB]
   Get:34 http://security.ubuntu.com/ubuntu trusty-security/restricted Sources 
[5,050 B]
   Get:35 http://security.ubuntu.com/ubuntu trusty-security/universe Sources 
[93.0 kB]
   Get:36 http://security.ubuntu.com/ubuntu trusty-security/multiverse Sources 
[3,072 B]
   Get:37 http://security.ubuntu.com/ubuntu trusty-security/main amd64 Packages 
[941 kB]
   Get:38 http://security.ubuntu.com/ubuntu trusty-security/main i386 Packages 
[869 kB]
   Err:39 http://ppa.launchpad.net/couchdb/stable/ubuntu trusty InRelease
 Could not connect to ppa.launchpad.net:80 (91.189.95.83), connection timed 
out
   Err:40 http://ppa.launchpad.net/git-core/ppa/ubuntu trusty InRelease
 Unable to connect to ppa.launchpad.net:http:
   Err:41 http://ppa.launchpad.net/openjdk-r/ppa/ubuntu trusty InRelease
 Unable to connect to ppa.launchpad.net:http:
   Err:42 http://ppa.launchpad.net/pollinate/ppa/ubuntu trusty InRelease
 Unable to connect to ppa.launchpad.net:http:
   Err:43 http://ppa.launchpad.net/webupd8team/java/ubuntu trusty InRelease
 Unable to connect to ppa.launchpad.net:http:
   Ign:44 http://ppa.launchpad.net/chris-lea/redis-server/ubuntu trusty/main 
amd64 Packages
   Ign:45 http://ppa.launchpad.net/chris-lea/redis-server/ubuntu trusty/main 
i386 Packages
   Ign:44 http://ppa.launchpad.net/chris-lea/redis-server/ubuntu trusty/main 
amd64 Packages
   Ign:45 http://ppa.launchpad.net/chris-lea/redis-server/ubuntu trusty/main 
i386 Packages
   Err:44 http://ppa.launchpad.net/chris-lea/redis-server/ubuntu trusty/main 
amd64 Packages
 Unable to connect to ppa.launchpad.net:http:
   Err:45 http://ppa.launchpad.net/chris-lea/redis-server/ubuntu trusty/main 
i386 Packages
 Unable to connect to ppa.launchpad.net:http:
   Get:46 http://security.ubuntu.com/ubuntu trusty-security/main Translation-en 
[496 kB]
   Get:47 http://security.ubuntu.com/ubuntu trusty-security/restricted amd64 
Packages [18.1 kB]
   Get:48 

[jira] [Commented] (AIRFLOW-1503) AssertionError: INTERNAL: No default project is specified

2018-08-07 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572578#comment-16572578
 ] 

Iuliia Volkova commented on AIRFLOW-1503:
-

The docstring says:

:param destination_dataset_table: A dotted
(<project>.|<project>:)<dataset>.<table> that, if set, will store the results

so 'temp.percentage' alone could not work; at this moment it must have a 
project at the start.

Is this still an actual issue for anybody? 
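
For clarity, a hedged example of a fully-qualified destination (the project id 
below is hypothetical, and a `dag` object is assumed as in the snippet quoted 
below):
{code:python}
from airflow.contrib.operators.bigquery_operator import BigQueryOperator

# 'my-gcp-project' is a made-up project id; fully qualifying the destination
# avoids the "No default project is specified" assertion.
sql_bigquery = BigQueryOperator(
    task_id='sql_bigquery',
    use_legacy_sql=False,
    write_disposition='WRITE_TRUNCATE',
    bql='SELECT 1',
    destination_dataset_table='my-gcp-project.temp.percentage',
    dag=dag)
{code}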

 

> AssertionError: INTERNAL: No default project is specified
> -
>
> Key: AIRFLOW-1503
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1503
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: gcp
>Affects Versions: Airflow 1.8
> Environment: Unix platform
>Reporter: chaitanya
>Priority: Minor
>  Labels: beginner
>
> Hi,
> New to airflow. I tried to run a BigQuery query and store the result in 
> another table, and am getting the following error. 
> Please let me know where to set the default project. 
> Code: 
> sql_bigquery = BigQueryOperator(
>     task_id='sql_bigquery',
>     use_legacy_sql=False,
>     write_disposition='WRITE_TRUNCATE',
>     allow_large_results=True,
>     bql='''
>     #standardSQL
>     SELECT ID, Name, Group, Mark, RATIO_TO_REPORT(Mark)
>     OVER(PARTITION BY Group) AS percent FROM `tensile-site-168620.temp.marks`
>     ''',
>     destination_dataset_table='temp.percentage',
>     dag=dag
> )
> Error Message: 
> Traceback (most recent call last):
>   File "/usr/local/bin/airflow", line 28, in 
> args.func(args)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 585, 
> in test
> ti.run(ignore_task_deps=True, ignore_ti_state=True, test_mode=True)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/utils/db.py", line 53, 
> in wrapper
> result = func(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 1374, 
> in run
> result = task_copy.execute(context=context)
>   File 
> "/usr/local/lib/python2.7/dist-packages/airflow/contrib/operators/bigquery_operator.py",
>  line 82, in execute
> self.allow_large_results, self.udf_config, self.use_legacy_sql)
>   File 
> "/usr/local/lib/python2.7/dist-packages/airflow/contrib/hooks/bigquery_hook.py",
>  line 228, in run_query
> default_project_id=self.project_id)
>   File 
> "/usr/local/lib/python2.7/dist-packages/airflow/contrib/hooks/bigquery_hook.py",
>  line 917, in _split_tablename
> assert default_project_id is not None, "INTERNAL: No default project is 
> specified"
> AssertionError: INTERNAL: No default project is specified





[GitHub] xnuinside commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines

2018-08-07 Thread GitBox
xnuinside commented on a change in pull request #3714: [AIRFLOW-2867] Refactor 
code to conform Python standards & guidelines
URL: https://github.com/apache/incubator-airflow/pull/3714#discussion_r208438535
 
 

 ##
 File path: airflow/contrib/hooks/bigquery_hook.py
 ##
 @@ -238,6 +238,8 @@ def create_empty_table(self,
 
 :return:
 """
+if time_partitioning is None:
+time_partitioning = dict()
 
 Review comment:
   @kaxil it seems this one was missed
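
   For context, a generic illustration of the pitfall the None-default idiom 
avoids (illustrative code only, not the hook's actual signature):

```python
# A mutable default like [] or {} is created once, at function definition
# time, and shared across every call that relies on the default.
def bad(seen=[]):
    seen.append('call')
    return seen

print(bad())  # ['call']
print(bad())  # ['call', 'call'] -- same list object, state leaks

# The idiom in the diff: default to None and create a fresh object per call.
def good(seen=None):
    if seen is None:
        seen = []
    seen.append('call')
    return seen

print(good())  # ['call']
print(good())  # ['call']
```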




[jira] [Resolved] (AIRFLOW-2836) Minor improvement of contrib.sensors.FileSensor

2018-08-07 Thread Xiaodong DENG (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaodong DENG resolved AIRFLOW-2836.

Resolution: Fixed

> Minor improvement of contrib.sensors.FileSensor
> ---
>
> Key: AIRFLOW-2836
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2836
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Minor
>
> h4. *Background*
> The default *fs_conn_id* in contrib.sensors.FileSensor is '_*fs_default2*_'. 
> However, when we initialize the database 
> (https://github.com/apache/incubator-airflow/blob/master/airflow/utils/db.py#L88),
>  there isn't such an entry. It doesn't exist anywhere else.
> h4. *Issue*
> The purpose of _contrib.sensors.FileSensor_ is mainly to check the local file 
> system (of course it can also be used for NAS). For that, the path ("/") from 
> the default connection 'fs_default' would suffice.
> However, given that the default value for *fs_conn_id* in 
> contrib.sensors.FileSensor is "fs_default2" (a value that doesn't exist), it 
> makes the situation much more complex. 
> When users intend to check the local file system only, they should be able to 
> leave *fs_conn_id* at its default, instead of having to set up another 
> connection separately.
> h4. Proposal
> Change the default value for *fs_conn_id* in contrib.sensors.FileSensor from 
> "fs_default2" to "fs_default" (in the related tests, *fs_conn_id* is already 
> specified as "fs_default").
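
To illustrate the proposal, a hedged usage sketch (the file path is made up, 
and a `dag` object is assumed):
{code:python}
from airflow.contrib.sensors.file_sensor import FileSensor

# With fs_conn_id defaulting to a connection that actually exists
# ('fs_default', base path "/"), a local-filesystem check needs no extra
# connection setup.
wait_for_file = FileSensor(
    task_id='wait_for_file',
    filepath='/tmp/data_ready.flag',
    dag=dag)
{code}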





[jira] [Resolved] (AIRFLOW-2855) Need to Check Validity of Cron Expression When Process DAG File/Zip File

2018-08-07 Thread Xiaodong DENG (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaodong DENG resolved AIRFLOW-2855.

Resolution: Fixed

> Need to Check Validity of Cron Expression When Process DAG File/Zip File
> 
>
> Key: AIRFLOW-2855
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2855
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: DAG
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
>
> *schedule_interval* of DAGs can either be a *timedelta* or a *Cron expression*.
> When it's a Cron expression, there is currently no mechanism to check its 
> validity. If there is anything wrong with the Cron expression itself, it 
> will cause issues when the methods _*following_schedule()*_ and 
> _*previous_schedule()*_ are invoked (which will affect scheduling). However, 
> the exceptions will only be written into logs. From the web UI, it's hard for 
> users to identify this issue & its source while no new task can be initiated 
> (especially for users who're not very familiar with Cron).
> It may be good to show error messages in the web UI when a DAG's Cron 
> expression (as schedule_interval) cannot be parsed by *croniter* properly.
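
As a hedged sketch of the proposed check (croniter raises on construction; the 
invalid expression below is the example from the fix):
{code:python}
from croniter import croniter, CroniterBadCronError

try:
    # Invalid: the hour field "100" is out of range.
    croniter("0 100 * * *")
except CroniterBadCronError as e:
    print("Invalid Cron expression: %s" % e)
{code}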





[jira] [Commented] (AIRFLOW-2848) dag_id is missing in metadata table "job" for LocalTaskJob

2018-08-07 Thread Xiaodong DENG (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572561#comment-16572561
 ] 

Xiaodong DENG commented on AIRFLOW-2848:


Thanks [~xnuinside] for the reminder. I have closed this ticket.

> dag_id is missing in metadata table "job" for LocalTaskJob
> --
>
> Key: AIRFLOW-2848
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2848
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: db
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
> Attachments: After this fix.png, Before this fix.png
>
>
> dag_id is missing for all entries in metadata table "job" with job_type 
> "LocalTaskJob".
> This is because dag_id was not specified within the LocalTaskJob class.





[jira] [Closed] (AIRFLOW-2848) dag_id is missing in metadata table "job" for LocalTaskJob

2018-08-07 Thread Xiaodong DENG (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaodong DENG closed AIRFLOW-2848.
--
Resolution: Fixed

> dag_id is missing in metadata table "job" for LocalTaskJob
> --
>
> Key: AIRFLOW-2848
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2848
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: db
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
> Attachments: After this fix.png, Before this fix.png
>
>
> dag_id is missing for all entries in metadata table "job" with job_type 
> "LocalTaskJob".
> This is because dag_id was not specified within the LocalTaskJob class.





[jira] [Assigned] (AIRFLOW-491) Add cache parameter in BigQuery query method

2018-08-07 Thread Iuliia Volkova (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iuliia Volkova reassigned AIRFLOW-491:
--

Assignee: Iuliia Volkova

> Add cache parameter in BigQuery query method
> 
>
> Key: AIRFLOW-491
> URL: https://issues.apache.org/jira/browse/AIRFLOW-491
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib, gcp
>Affects Versions: Airflow 1.7.1
>Reporter: Chris Riccomini
>Assignee: Iuliia Volkova
>Priority: Major
> Fix For: Airflow 1.8
>
>
> The current BigQuery query() method does not have a use_query_cache 
> parameter. This param always defaults to true on the API side (see 
> [here|https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.query]).
>  I'd like to disable query caching for some data consistency checks.
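
For reference, a hedged sketch of the knob this issue asks to expose: the 
jobs API's configuration.query.useQueryCache field (the query itself is a 
placeholder):
{code:python}
# Minimal sketch of the BigQuery job configuration the new parameter would
# map to; useQueryCache defaults to true on the API side.
configuration = {
    'query': {
        'query': 'SELECT 1',
        'useQueryCache': False,  # bypass cached results for consistency checks
    }
}
{code}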





[GitHub] codecov-io commented on issue #3717: [AIRFLOW-1874] use_legacy_sql added to BigQueryCheck operators

2018-08-07 Thread GitBox
codecov-io commented on issue #3717: [AIRFLOW-1874] use_legacy_sql added to 
BigQueryCheck operators
URL: 
https://github.com/apache/incubator-airflow/pull/3717#issuecomment-411257030
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3717?src=pr=h1)
 Report
   > Merging 
[#3717](https://codecov.io/gh/apache/incubator-airflow/pull/3717?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/d47580feaf80eeebb416d0179dfa8db3f4e1d2c9?src=pr=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3717/graphs/tree.svg?src=pr=650=WdLKlKHOAU=150)](https://codecov.io/gh/apache/incubator-airflow/pull/3717?src=pr=tree)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master    #3717   +/-   ##
   =======================================
     Coverage   77.57%   77.57%           
   =======================================
     Files         204      204           
     Lines       15776    15776           
   =======================================
     Hits        12239    12239           
     Misses       3537     3537
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3717?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3717?src=pr=footer).
 Last update 
[d47580f...ab05faf](https://codecov.io/gh/apache/incubator-airflow/pull/3717?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] yrqls21 commented on issue #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file()

2018-08-07 Thread GitBox
yrqls21 commented on issue #3698: [AIRFLOW-2855] Check Cron Expression Validity 
in DagBag.process_file()
URL: 
https://github.com/apache/incubator-airflow/pull/3698#issuecomment-411255565
 
 
   TY @feng-tao!




[jira] [Comment Edited] (AIRFLOW-1874) Support standard SQL in Check, ValueCheck and IntervalCheck BigQuery operators

2018-08-07 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572530#comment-16572530
 ] 

Iuliia Volkova edited comment on AIRFLOW-1874 at 8/8/18 1:13 AM:
-

[~kaxilnaik], I opened PR ^^


was (Author: xnuinside):
[~kaxilnaik], [https://github.com/apache/incubator-airflow/pull/3717] I opened 
PR

> Support standard SQL in Check, ValueCheck and IntervalCheck BigQuery operators
> --
>
> Key: AIRFLOW-1874
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1874
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib, gcp, operators
>Reporter: Guillermo Rodríguez Cano
>Assignee: Iuliia Volkova
>Priority: Major
> Fix For: 2.0.0
>
>
> BigQueryCheckOperator, BigQueryValueCheckOperator and 
> BigQueryIntervalCheckOperator do not support disabling use of default legacy 
> SQL in BigQuery.
> This is a major blocker to support correct migration to standard SQL when 
> queries are complicated. For example, a query that can be queried in legacy 
> SQL may be blocked from any subsequent view done in standard SQL that this 
> view uses as the queries are bound to either standard or legacy SQL but not a 
> mix.
> These operators inherit from base ones of the same name (without the BigQuery 
> prefix) from Airflow which may make the process more complicated as the flag 
> to use standard SQL should be enabled because the underlying BigQueryHook has 
> the corresponding parameter, use_legacy_sql, set to True, when running a 
> query. But it is not possible to pass parameters all the way to it via the 
> aforementioned operators.
> The workaround of including #standardSQL and a new line before the query 
> doesn't work either, as there is a mismatch. BigQuery in fact reports the 
> following: "Query text specifies use_legacy_sql:false, while API options 
> specify:true"
> A workaround for queries on views using standard SQL is to persist the result 
> of the query in a temporary table, then run the check operation and 
> thereafter delete the temporary table. 
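
For reference, a hedged sketch of the interface the PR adds (parameter name 
from the PR title; the table name is made up and a `dag` object is assumed):
{code:python}
from airflow.contrib.operators.bigquery_check_operator import (
    BigQueryCheckOperator
)

# Hedged sketch: run a check with standard SQL instead of the legacy default.
check = BigQueryCheckOperator(
    task_id='check_count',
    sql='SELECT COUNT(*) > 0 FROM `my-project.my_dataset.my_table`',
    use_legacy_sql=False,
    dag=dag)
{code}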





[jira] [Commented] (AIRFLOW-1874) Support standard SQL in Check, ValueCheck and IntervalCheck BigQuery operators

2018-08-07 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572530#comment-16572530
 ] 

Iuliia Volkova commented on AIRFLOW-1874:
-

[~kaxilnaik], [https://github.com/apache/incubator-airflow/pull/3717] I opened 
PR

> Support standard SQL in Check, ValueCheck and IntervalCheck BigQuery operators
> --
>
> Key: AIRFLOW-1874
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1874
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib, gcp, operators
>Reporter: Guillermo Rodríguez Cano
>Assignee: Iuliia Volkova
>Priority: Major
> Fix For: 2.0.0
>
>
> BigQueryCheckOperator, BigQueryValueCheckOperator and 
> BigQueryIntervalCheckOperator do not support disabling use of default legacy 
> SQL in BigQuery.
> This is a major blocker to support correct migration to standard SQL when 
> queries are complicated. For example, a query that can be queried in legacy 
> SQL may be blocked from any subsequent view done in standard SQL that this 
> view uses as the queries are bound to either standard or legacy SQL but not a 
> mix.
> These operators inherit from base ones of the same name (without the BigQuery 
> prefix) from Airflow which may make the process more complicated as the flag 
> to use standard SQL should be enabled because the underlying BigQueryHook has 
> the corresponding parameter, use_legacy_sql, set to True, when running a 
> query. But it is not possible to pass parameters all the way to it via the 
> aforementioned operators.
> The workaround of including #standardSQL and a new line before the query 
> doesn't work either, as there is a mismatch. BigQuery in fact reports the 
> following: "Query text specifies use_legacy_sql:false, while API options 
> specify:true"
> A workaround for queries on views using standard SQL is to persist the result 
> of the query in a temporary table, then run the check operation and 
> thereafter delete the temporary table. 





[jira] [Commented] (AIRFLOW-1874) Support standard SQL in Check, ValueCheck and IntervalCheck BigQuery operators

2018-08-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572529#comment-16572529
 ] 

ASF GitHub Bot commented on AIRFLOW-1874:
-

xnuinside opened a new pull request #3717: [AIRFLOW-1874] use_legacy_sql added 
to BigQueryCheck operators
URL: https://github.com/apache/incubator-airflow/pull/3717
 
 
   Hi everyone! I saw this old task, related to my current work scope, and 
decided to do this little PR ) 
   Just adding use_legacy_sql to the BigQueryCheck operators.
   
   [AIRFLOW-1874] use_legacy_sql added to BigQueryCheck operators
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




> Support standard SQL in Check, ValueCheck and IntervalCheck BigQuery operators
> --
>
> Key: AIRFLOW-1874
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1874
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib, gcp, operators
>Reporter: Guillermo Rodríguez Cano
>Assignee: Iuliia Volkova
>Priority: Major
> Fix For: 2.0.0
>
>
> BigQueryCheckOperator, BigQueryValueCheckOperator and 
> BigQueryIntervalCheckOperator do not support disabling use of default legacy 
> SQL in BigQuery.
> This is a major blocker to support correct migration to standard SQL when 
> queries are complicated. For example, a query that can be queried in legacy 
> SQL may be blocked from any subsequent view done in standard SQL that this 
> view uses as the queries are bound to either standard or legacy SQL but not a 
> mix.
> These operators inherit from base ones of the same name (without the BigQuery 
> prefix) from Airflow which may make the process more complicated as the flag 
> to use standard SQL should be enabled because the underlying BigQueryHook has 
> the corresponding parameter, use_legacy_sql, set to True, when running a 
> query. But it is not possible to pass parameters all the way to it via the 
> aforementioned operators.
> The workaround of including #standardSQL and a new line before the query 
> doesn't work either, as there is a mismatch. BigQuery in fact reports the 
> following: "Query text specifies use_legacy_sql:false, while API options 
> specify:true"
> A workaround for queries on views using standard SQL is to persist the result 
> of the query in a temporary table, then run the check operation and 
> thereafter delete the temporary table. 





[GitHub] xnuinside opened a new pull request #3717: [AIRFLOW-1874] use_legacy_sql added to BigQueryCheck operators

2018-08-07 Thread GitBox
xnuinside opened a new pull request #3717: [AIRFLOW-1874] use_legacy_sql added 
to BigQueryCheck operators
URL: https://github.com/apache/incubator-airflow/pull/3717
 
 
   Hi everyone! I saw this old task, related to my current work scope, and 
decided to do this little PR ) 
   Just adding use_legacy_sql to the BigQueryCheck operators.
   
   [AIRFLOW-1874] use_legacy_sql added to BigQueryCheck operators
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




[GitHub] feng-tao commented on issue #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file()

2018-08-07 Thread GitBox
feng-tao commented on issue #3698: [AIRFLOW-2855] Check Cron Expression 
Validity in DagBag.process_file()
URL: 
https://github.com/apache/incubator-airflow/pull/3698#issuecomment-411251048
 
 
   thanks @XD-DENG @yrqls21 




[jira] [Commented] (AIRFLOW-2855) Need to Check Validity of Cron Expression When Process DAG File/Zip File

2018-08-07 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572528#comment-16572528
 ] 

ASF subversion and git services commented on AIRFLOW-2855:
--

Commit d47580feaf80eeebb416d0179dfa8db3f4e1d2c9 in incubator-airflow's branch 
refs/heads/master from Xiaodong
[ https://gitbox.apache.org/repos/asf?p=incubator-airflow.git;h=d47580f ]

[AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file() (#3698)

A DAG may import properly as a .py script,
while the Cron expression inside it, used as "schedule_interval",
is invalid, like "0 100 * * *".

This commit checks the validity of the Cron expression in DAG
files (.py) and packaged DAG files (.zip), and helps show the
exception messages in the web UI by adding these exceptions into
the "import_error" metadata.

> Need to Check Validity of Cron Expression When Process DAG File/Zip File
> 
>
> Key: AIRFLOW-2855
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2855
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: DAG
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
>
> *schedule_interval* of DAGs can either be a *timedelta* or a *Cron expression*.
> When it's a Cron expression, there is currently no mechanism to check its 
> validity. If there is anything wrong with the Cron expression itself, it 
> will cause issues when the methods _*following_schedule()*_ and 
> _*previous_schedule()*_ are invoked (which will affect scheduling). However, 
> the exceptions will only be written into logs. From the web UI, it's hard for 
> users to identify this issue & its source while no new task can be initiated 
> (especially for users who're not very familiar with Cron).
> It may be good to show error messages in the web UI when a DAG's Cron 
> expression (as schedule_interval) cannot be parsed by *croniter* properly.





[GitHub] feng-tao closed pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file()

2018-08-07 Thread GitBox
feng-tao closed pull request #3698: [AIRFLOW-2855] Check Cron Expression 
Validity in DagBag.process_file()
URL: https://github.com/apache/incubator-airflow/pull/3698
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/models.py b/airflow/models.py
index cf7eb0a64f..206106a4e9 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -62,7 +62,9 @@
 from sqlalchemy.orm import reconstructor, relationship, synonym
 from sqlalchemy_utc import UtcDateTime
 
-from croniter import croniter
+from croniter import (
+croniter, CroniterBadCronError, CroniterBadDateError, CroniterNotAlphaError
+)
 import six
 
 from airflow import settings, utils
@@ -412,8 +414,18 @@ def process_file(self, filepath, only_if_updated=True, 
safe_mode=True):
 try:
 dag.is_subdag = False
 self.bag_dag(dag, parent_dag=dag, root_dag=dag)
+if isinstance(dag._schedule_interval, 
six.string_types):
+croniter(dag._schedule_interval)
 found_dags.append(dag)
 found_dags += dag.subdags
+except (CroniterBadCronError,
+CroniterBadDateError,
+CroniterNotAlphaError) as cron_e:
+self.log.exception("Failed to bag_dag: %s", 
dag.full_filepath)
+self.import_errors[dag.full_filepath] = \
+"Invalid Cron expression: " + str(cron_e)
+self.file_last_changed[dag.full_filepath] = \
+file_last_changed_on_disk
 except AirflowDagCycleException as cycle_exception:
 self.log.exception("Failed to bag_dag: %s", 
dag.full_filepath)
 self.import_errors[dag.full_filepath] = 
str(cycle_exception)
diff --git a/tests/dags/test_invalid_cron.py b/tests/dags/test_invalid_cron.py
new file mode 100755
index 00..51a0e43cb5
--- /dev/null
+++ b/tests/dags/test_invalid_cron.py
@@ -0,0 +1,35 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.models import DAG
+from airflow.operators.dummy_operator import DummyOperator
+from airflow.utils.timezone import datetime
+
+# The schedule_interval specified here is an INVALID
+# Cron expression. This invalid DAG will be used to
+# test whether dagbag.process_file() can identify
+# invalid Cron expression.
+dag1 = DAG(
+dag_id='test_invalid_cron',
+start_date=datetime(2015, 1, 1),
+schedule_interval="0 100 * * *")
+dag1_task1 = DummyOperator(
+task_id='task1',
+dag=dag1,
+owner='airflow')
diff --git a/tests/dags/test_zip_invalid_cron.zip 
b/tests/dags/test_zip_invalid_cron.zip
new file mode 100644
index 00..fe45153abe
Binary files /dev/null and b/tests/dags/test_zip_invalid_cron.zip differ
diff --git a/tests/models.py b/tests/models.py
index 1c88ea47f7..5a0397dc08 100644
--- a/tests/models.py
+++ b/tests/models.py
@@ -56,7 +56,7 @@
 from airflow.utils.trigger_rule import TriggerRule
 from mock import patch, ANY
 from parameterized import parameterized
-from tempfile import NamedTemporaryFile
+from tempfile import mkdtemp, NamedTemporaryFile
 
 DEFAULT_DATE = timezone.datetime(2016, 1, 1)
 TEST_DAGS_FOLDER = os.path.join(
@@ -1038,6 +1038,19 @@ def test_zip(self):
 dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, "test_zip.zip"))
 self.assertTrue(dagbag.get_dag("test_zip_dag"))
 
+def test_process_file_cron_validity_check(self):
+"""
+test if an invalid cron expression
+as schedule interval can be identified
+"""
+invalid_dag_files = ["test_invalid_cron.py", 
"test_zip_invalid_cron.zip"]
+dagbag = models.DagBag(dag_folder=mkdtemp())
+
+self.assertEqual(len(dagbag.import_errors), 0)
+for d in invalid_dag_files:
+

[jira] [Commented] (AIRFLOW-2855) Need to Check Validity of Cron Expression When Process DAG File/Zip File

2018-08-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572527#comment-16572527
 ] 

ASF GitHub Bot commented on AIRFLOW-2855:
-

feng-tao closed pull request #3698: [AIRFLOW-2855] Check Cron Expression 
Validity in DagBag.process_file()
URL: https://github.com/apache/incubator-airflow/pull/3698
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/models.py b/airflow/models.py
index cf7eb0a64f..206106a4e9 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -62,7 +62,9 @@
 from sqlalchemy.orm import reconstructor, relationship, synonym
 from sqlalchemy_utc import UtcDateTime
 
-from croniter import croniter
+from croniter import (
+croniter, CroniterBadCronError, CroniterBadDateError, CroniterNotAlphaError
+)
 import six
 
 from airflow import settings, utils
@@ -412,8 +414,18 @@ def process_file(self, filepath, only_if_updated=True, 
safe_mode=True):
 try:
 dag.is_subdag = False
 self.bag_dag(dag, parent_dag=dag, root_dag=dag)
+if isinstance(dag._schedule_interval, 
six.string_types):
+croniter(dag._schedule_interval)
 found_dags.append(dag)
 found_dags += dag.subdags
+except (CroniterBadCronError,
+CroniterBadDateError,
+CroniterNotAlphaError) as cron_e:
+self.log.exception("Failed to bag_dag: %s", 
dag.full_filepath)
+self.import_errors[dag.full_filepath] = \
+"Invalid Cron expression: " + str(cron_e)
+self.file_last_changed[dag.full_filepath] = \
+file_last_changed_on_disk
 except AirflowDagCycleException as cycle_exception:
 self.log.exception("Failed to bag_dag: %s", 
dag.full_filepath)
 self.import_errors[dag.full_filepath] = 
str(cycle_exception)
diff --git a/tests/dags/test_invalid_cron.py b/tests/dags/test_invalid_cron.py
new file mode 100755
index 00..51a0e43cb5
--- /dev/null
+++ b/tests/dags/test_invalid_cron.py
@@ -0,0 +1,35 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.models import DAG
+from airflow.operators.dummy_operator import DummyOperator
+from airflow.utils.timezone import datetime
+
+# The schedule_interval specified here is an INVALID
+# Cron expression. This invalid DAG will be used to
+# test whether dagbag.process_file() can identify
+# invalid Cron expression.
+dag1 = DAG(
+dag_id='test_invalid_cron',
+start_date=datetime(2015, 1, 1),
+schedule_interval="0 100 * * *")
+dag1_task1 = DummyOperator(
+task_id='task1',
+dag=dag1,
+owner='airflow')
diff --git a/tests/dags/test_zip_invalid_cron.zip 
b/tests/dags/test_zip_invalid_cron.zip
new file mode 100644
index 00..fe45153abe
Binary files /dev/null and b/tests/dags/test_zip_invalid_cron.zip differ
diff --git a/tests/models.py b/tests/models.py
index 1c88ea47f7..5a0397dc08 100644
--- a/tests/models.py
+++ b/tests/models.py
@@ -56,7 +56,7 @@
 from airflow.utils.trigger_rule import TriggerRule
 from mock import patch, ANY
 from parameterized import parameterized
-from tempfile import NamedTemporaryFile
+from tempfile import mkdtemp, NamedTemporaryFile
 
 DEFAULT_DATE = timezone.datetime(2016, 1, 1)
 TEST_DAGS_FOLDER = os.path.join(
@@ -1038,6 +1038,19 @@ def test_zip(self):
 dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, "test_zip.zip"))
 self.assertTrue(dagbag.get_dag("test_zip_dag"))
 
+def test_process_file_cron_validity_check(self):
+"""
+test if an invalid cron expression
+as schedule interval can be identified
+"""
+invalid_dag_files = 

[jira] [Assigned] (AIRFLOW-1874) Support standard SQL in Check, ValueCheck and IntervalCheck BigQuery operators

2018-08-07 Thread Iuliia Volkova (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iuliia Volkova reassigned AIRFLOW-1874:
---

Assignee: Iuliia Volkova  (was: Guillermo Rodríguez Cano)

> Support standard SQL in Check, ValueCheck and IntervalCheck BigQuery operators
> --
>
> Key: AIRFLOW-1874
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1874
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib, gcp, operators
>Reporter: Guillermo Rodríguez Cano
>Assignee: Iuliia Volkova
>Priority: Major
> Fix For: 2.0.0
>
>
> BigQueryCheckOperator, BigQueryValueCheckOperator and 
> BigQueryIntervalCheckOperator do not support disabling use of default legacy 
> SQL in BigQuery.
> This is a major blocker to support correct migration to standard SQL when 
> queries are complicated. For example, a query that can be queried in legacy 
> SQL may be blocked from any subsequent view done in standard SQL that this 
> view uses as the queries are bound to either standard or legacy SQL but not a 
> mix.
> These operators inherit from base ones of the same name (without the BigQuery 
> prefix) from Airflow which may make the process more complicated as the flag 
> to use standard SQL should be enabled because the underlying BigQueryHook has 
> the corresponding parameter, use_legacy_sql, set to True, when running a 
> query. But it is not possible to pass parameters all the way to it via the 
> aforementioned operators.
> The workaround of including #standardSQL and a new line before the query 
> doesn't work either, as there is a mismatch. BigQuery in fact reports the 
> following: "Query text specifies use_legacy_sql:false, while API options 
> specify:true"
> A workaround for queries on views using standard SQL is to persist the result 
> of the query in a temporary table, then run the check operation and 
> thereafter delete the temporary table. 





[jira] [Commented] (AIRFLOW-2848) dag_id is missing in metadata table "job" for LocalTaskJob

2018-08-07 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572512#comment-16572512
 ] 

Iuliia Volkova commented on AIRFLOW-2848:
-

[~XD-DENG] it seems like it's already done ) just a friendly reminder about 
closing the task

> dag_id is missing in metadata table "job" for LocalTaskJob
> --
>
> Key: AIRFLOW-2848
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2848
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: db
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
> Attachments: After this fix.png, Before this fix.png
>
>
> dag_id is missing for all entries in metadata table "job" with job_type 
> "LocalTaskJob".
> This is because dag_id was not specified within the LocalTaskJob class.





[GitHub] codecov-io edited a comment on issue #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file()

2018-08-07 Thread GitBox
codecov-io edited a comment on issue #3698: [AIRFLOW-2855] Check Cron 
Expression Validity in DagBag.process_file()
URL: 
https://github.com/apache/incubator-airflow/pull/3698#issuecomment-410575101
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3698?src=pr=h1)
 Report
   > Merging 
[#3698](https://codecov.io/gh/apache/incubator-airflow/pull/3698?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/120f4856cdea5134971c4c4a239ddbfdc80db77e?src=pr=desc)
 will **increase** coverage by `<.01%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3698/graphs/tree.svg?width=650=150=pr=WdLKlKHOAU)](https://codecov.io/gh/apache/incubator-airflow/pull/3698?src=pr=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #3698      +/-   ##
   ==========================================
   + Coverage   77.57%   77.57%    +<.01%     
   ==========================================
     Files         204      204              
     Lines       15770    15776       +6     
   ==========================================
   + Hits        12233    12239       +6     
     Misses       3537     3537
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/3698?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3698/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=)
 | `88.61% <100%> (+0.02%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3698?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3698?src=pr=footer).
 Last update 
[120f485...1a5936f](https://codecov.io/gh/apache/incubator-airflow/pull/3698?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] codecov-io edited a comment on issue #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file()

2018-08-07 Thread GitBox
codecov-io edited a comment on issue #3698: [AIRFLOW-2855] Check Cron 
Expression Validity in DagBag.process_file()
URL: 
https://github.com/apache/incubator-airflow/pull/3698#issuecomment-410575101
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3698?src=pr=h1)
 Report
   > Merging 
[#3698](https://codecov.io/gh/apache/incubator-airflow/pull/3698?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/120f4856cdea5134971c4c4a239ddbfdc80db77e?src=pr=desc)
 will **increase** coverage by `<.01%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3698/graphs/tree.svg?width=650=pr=WdLKlKHOAU=150)](https://codecov.io/gh/apache/incubator-airflow/pull/3698?src=pr=tree)
   
   ```diff
   @@Coverage Diff @@
   ##   master#3698  +/-   ##
   ==
   + Coverage   77.57%   77.57%   +<.01% 
   ==
 Files 204  204  
 Lines   1577015776   +6 
   ==
   + Hits1223312239   +6 
 Misses   3537 3537
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/3698?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3698/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=)
 | `88.61% <100%> (+0.02%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3698?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3698?src=pr=footer).
 Last update 
[120f485...1a5936f](https://codecov.io/gh/apache/incubator-airflow/pull/3698?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] troychen728 commented on a change in pull request #3658: [AIRFLOW-2524] Add Amazon SageMaker Training

2018-08-07 Thread GitBox
troychen728 commented on a change in pull request #3658: [AIRFLOW-2524] Add 
Amazon SageMaker Training
URL: https://github.com/apache/incubator-airflow/pull/3658#discussion_r208422276
 
 

 ##
 File path: airflow/contrib/operators/sagemaker_create_training_job_operator.py
 ##
 @@ -0,0 +1,98 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.contrib.hooks.sagemaker_hook import SageMakerHook
+from airflow.models import BaseOperator
+from airflow.utils import apply_defaults
+from airflow.exceptions import AirflowException
+
+
+class SageMakerCreateTrainingJobOperator(BaseOperator):
+
+    """
+    Initiate a SageMaker training job.
+
+    This operator returns the ARN of the model created in Amazon SageMaker.
+
+    :param training_job_config:
+        The configuration necessary to start a training job (templated)
+    :type training_job_config: dict
+    :param region_name: The AWS region_name
+    :type region_name: string
+    :param sagemaker_conn_id: The SageMaker connection ID to use.
+    :type sagemaker_conn_id: string
 
 Review comment:
   @Fokko @gglanzani 
   I agree that giving the user an easier way is always helpful, so I updated 
the code and added a wait option to the operator. It defaults to True; unless 
the user sets it to False, the operator signals success only after the 
training job has finished. The control logic is very similar to that of the 
Druid hook you mentioned earlier.
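   A minimal sketch of that poll-until-done control flow, assuming a
   hypothetical `describe_job` callable that returns the job status (this is
   not the PR's actual code):
   
   ```python
   import time
   
   def wait_for_training_job(describe_job, job_name, poll_interval=30):
       """Block until the training job leaves the 'InProgress' state."""
       while True:
           status = describe_job(job_name)  # e.g. a thin wrapper over boto3
           if status == 'Completed':
               return
           if status in ('Failed', 'Stopped'):
               raise RuntimeError('%s ended in state %s' % (job_name, status))
           time.sleep(poll_interval)
   ```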


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file()

2018-08-07 Thread GitBox
XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron 
Expression Validity in DagBag.process_file()
URL: https://github.com/apache/incubator-airflow/pull/3698#discussion_r208416023
 
 

 ##
 File path: tests/models.py
 ##
 @@ -56,7 +56,7 @@
 from airflow.utils.trigger_rule import TriggerRule
 from mock import patch, ANY
 from parameterized import parameterized
-from tempfile import NamedTemporaryFile
+from tempfile import NamedTemporaryFile, mkdtemp
 
 Review comment:
   Updated as suggested. Thanks.
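   For context on the PR title: a schedule interval can be checked with the
   croniter library, which Airflow already depends on. A minimal sketch of
   such a validity check (not the PR's exact code):
   
   ```python
   from croniter import croniter
   
   def is_valid_cron(expression):
       """Return True if croniter can parse the cron expression."""
       try:
           croniter(expression)
           return True
       except (ValueError, KeyError):
           return False
   
   assert is_valid_cron('0 12 * * *')        # well-formed five-field expression
   assert not is_valid_cron('every monday')  # rejected as malformed
   ```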


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-2867) Airflow Python Code not compatible to coding guidelines and standards

2018-08-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572471#comment-16572471
 ] 

ASF GitHub Bot commented on AIRFLOW-2867:
-

feng-tao closed pull request #3714: [AIRFLOW-2867] Refactor code to conform 
Python standards & guidelines
URL: https://github.com/apache/incubator-airflow/pull/3714
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/hooks/bigquery_hook.py 
b/airflow/contrib/hooks/bigquery_hook.py
index aa8fc382a6..2a94580f50 100644
--- a/airflow/contrib/hooks/bigquery_hook.py
+++ b/airflow/contrib/hooks/bigquery_hook.py
@@ -206,7 +206,7 @@ def create_empty_table(self,
dataset_id,
table_id,
schema_fields=None,
-   time_partitioning={},
+   time_partitioning=None,
labels=None
):
 """
@@ -238,6 +238,8 @@ def create_empty_table(self,
 
 :return:
 """
+if time_partitioning is None:
+time_partitioning = dict()
 project_id = project_id if project_id is not None else self.project_id
 
 table_resource = {
@@ -286,7 +288,7 @@ def create_external_table(self,
   quote_character=None,
   allow_quoted_newlines=False,
   allow_jagged_rows=False,
-  src_fmt_configs={},
+  src_fmt_configs=None,
   labels=None
   ):
 """
@@ -352,6 +354,8 @@ def create_external_table(self,
 :type labels: dict
 """
 
+if src_fmt_configs is None:
+src_fmt_configs = {}
 project_id, dataset_id, external_table_id = \
 _split_tablename(table_input=external_project_dataset_table,
  default_project_id=self.project_id,
@@ -482,7 +486,7 @@ def run_query(self,
   labels=None,
   schema_update_options=(),
   priority='INTERACTIVE',
-  time_partitioning={}):
+  time_partitioning=None):
 """
 Executes a BigQuery SQL query. Optionally persists results in a 
BigQuery
 table. See here:
@@ -548,6 +552,8 @@ def run_query(self,
 """
 
 # TODO remove `bql` in Airflow 2.0 - Jira: [AIRFLOW-2513]
+if time_partitioning is None:
+time_partitioning = {}
 sql = bql if sql is None else sql
 
 if bql:
@@ -808,8 +814,8 @@ def run_load(self,
  allow_quoted_newlines=False,
  allow_jagged_rows=False,
  schema_update_options=(),
- src_fmt_configs={},
- time_partitioning={}):
+ src_fmt_configs=None,
+ time_partitioning=None):
 """
 Executes a BigQuery load command to load data from Google Cloud Storage
 to BigQuery. See here:
@@ -880,6 +886,10 @@ def run_load(self,
 # if it's not, we raise a ValueError
 # Refer to this link for more details:
 #   
https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query.tableDefinitions.(key).sourceFormat
+if src_fmt_configs is None:
+src_fmt_configs = {}
+if time_partitioning is None:
+time_partitioning = {}
 source_format = source_format.upper()
 allowed_formats = [
 "CSV", "NEWLINE_DELIMITED_JSON", "AVRO", "GOOGLE_SHEETS",
@@ -1011,12 +1021,12 @@ def run_with_configuration(self, configuration):
 
 # Wait for query to finish.
 keep_polling_job = True
-while (keep_polling_job):
+while keep_polling_job:
 try:
 job = jobs.get(
 projectId=self.project_id,
 jobId=self.running_job_id).execute()
-if (job['status']['state'] == 'DONE'):
+if job['status']['state'] == 'DONE':
 keep_polling_job = False
 # Check if job had errors.
 if 'errorResult' in job['status']:
@@ -1045,7 +1055,7 @@ def poll_job_complete(self, job_id):
 jobs = self.service.jobs()
 try:
 job = jobs.get(projectId=self.project_id, jobId=job_id).execute()
-if (job['status']['state'] == 'DONE'):
+if job['status']['state'] == 'DONE':
 return True
 except HttpError as err:
 if err.resp.status in [500, 503]:
@@ 
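
For context on the recurring `time_partitioning={}` to `time_partitioning=None`
change above: a mutable default argument is created once, at function definition
time, and is shared across every call. A minimal illustration (not from the
Airflow codebase):

```python
def append_bad(item, acc=[]):       # default list is created once and shared
    acc.append(item)
    return acc

def append_good(item, acc=None):    # the conventional fix, as in the diff above
    if acc is None:
        acc = []
    acc.append(item)
    return acc

print(append_bad(1), append_bad(2))    # [1, 2] [1, 2] -- state leaks across calls
print(append_good(1), append_good(2))  # [1] [2]
```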

[GitHub] feng-tao closed pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines

2018-08-07 Thread GitBox
feng-tao closed pull request #3714: [AIRFLOW-2867] Refactor code to conform 
Python standards & guidelines
URL: https://github.com/apache/incubator-airflow/pull/3714
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/hooks/bigquery_hook.py 
b/airflow/contrib/hooks/bigquery_hook.py
index aa8fc382a6..2a94580f50 100644
--- a/airflow/contrib/hooks/bigquery_hook.py
+++ b/airflow/contrib/hooks/bigquery_hook.py
@@ -206,7 +206,7 @@ def create_empty_table(self,
dataset_id,
table_id,
schema_fields=None,
-   time_partitioning={},
+   time_partitioning=None,
labels=None
):
 """
@@ -238,6 +238,8 @@ def create_empty_table(self,
 
 :return:
 """
+if time_partitioning is None:
+time_partitioning = dict()
 project_id = project_id if project_id is not None else self.project_id
 
 table_resource = {
@@ -286,7 +288,7 @@ def create_external_table(self,
   quote_character=None,
   allow_quoted_newlines=False,
   allow_jagged_rows=False,
-  src_fmt_configs={},
+  src_fmt_configs=None,
   labels=None
   ):
 """
@@ -352,6 +354,8 @@ def create_external_table(self,
 :type labels: dict
 """
 
+if src_fmt_configs is None:
+src_fmt_configs = {}
 project_id, dataset_id, external_table_id = \
 _split_tablename(table_input=external_project_dataset_table,
  default_project_id=self.project_id,
@@ -482,7 +486,7 @@ def run_query(self,
   labels=None,
   schema_update_options=(),
   priority='INTERACTIVE',
-  time_partitioning={}):
+  time_partitioning=None):
 """
 Executes a BigQuery SQL query. Optionally persists results in a 
BigQuery
 table. See here:
@@ -548,6 +552,8 @@ def run_query(self,
 """
 
 # TODO remove `bql` in Airflow 2.0 - Jira: [AIRFLOW-2513]
+if time_partitioning is None:
+time_partitioning = {}
 sql = bql if sql is None else sql
 
 if bql:
@@ -808,8 +814,8 @@ def run_load(self,
  allow_quoted_newlines=False,
  allow_jagged_rows=False,
  schema_update_options=(),
- src_fmt_configs={},
- time_partitioning={}):
+ src_fmt_configs=None,
+ time_partitioning=None):
 """
 Executes a BigQuery load command to load data from Google Cloud Storage
 to BigQuery. See here:
@@ -880,6 +886,10 @@ def run_load(self,
 # if it's not, we raise a ValueError
 # Refer to this link for more details:
 #   
https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query.tableDefinitions.(key).sourceFormat
+if src_fmt_configs is None:
+src_fmt_configs = {}
+if time_partitioning is None:
+time_partitioning = {}
 source_format = source_format.upper()
 allowed_formats = [
 "CSV", "NEWLINE_DELIMITED_JSON", "AVRO", "GOOGLE_SHEETS",
@@ -1011,12 +1021,12 @@ def run_with_configuration(self, configuration):
 
 # Wait for query to finish.
 keep_polling_job = True
-while (keep_polling_job):
+while keep_polling_job:
 try:
 job = jobs.get(
 projectId=self.project_id,
 jobId=self.running_job_id).execute()
-if (job['status']['state'] == 'DONE'):
+if job['status']['state'] == 'DONE':
 keep_polling_job = False
 # Check if job had errors.
 if 'errorResult' in job['status']:
@@ -1045,7 +1055,7 @@ def poll_job_complete(self, job_id):
 jobs = self.service.jobs()
 try:
 job = jobs.get(projectId=self.project_id, jobId=job_id).execute()
-if (job['status']['state'] == 'DONE'):
+if job['status']['state'] == 'DONE':
 return True
 except HttpError as err:
 if err.resp.status in [500, 503]:
@@ -1079,13 +1089,13 @@ def cancel_query(self):
 polling_attempts = 0
 
 job_complete = False
-while (polling_attempts < max_polling_attempts and not job_complete):
+while polling_attempts < max_polling_attempts and 

[GitHub] codecov-io commented on issue #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines

2018-08-07 Thread GitBox
codecov-io commented on issue #3714: [AIRFLOW-2867] Refactor code to conform 
Python standards & guidelines
URL: 
https://github.com/apache/incubator-airflow/pull/3714#issuecomment-411232814
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3714?src=pr=h1)
 Report
   > Merging 
[#3714](https://codecov.io/gh/apache/incubator-airflow/pull/3714?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/d9fecba14c5eb56990508573a91b13ab27ca5153?src=pr=desc)
 will **increase** coverage by `<.01%`.
   > The diff coverage is `76.19%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3714/graphs/tree.svg?token=WdLKlKHOAU=pr=150=650)](https://codecov.io/gh/apache/incubator-airflow/pull/3714?src=pr=tree)
   
   ```diff
    @@            Coverage Diff             @@
    ##           master    #3714      +/-   ##
    ==========================================
    + Coverage   77.56%   77.56%   +<.01%
    ==========================================
      Files         204      204
      Lines       15768    15770       +2
    ==========================================
    + Hits        12230    12232       +2
      Misses       3538     3538
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/3714?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/www\_rbac/forms.py](https://codecov.io/gh/apache/incubator-airflow/pull/3714/diff?src=pr=tree#diff-YWlyZmxvdy93d3dfcmJhYy9mb3Jtcy5weQ==)
 | `100% <ø> (ø)` | :arrow_up: |
   | 
[airflow/jobs.py](https://codecov.io/gh/apache/incubator-airflow/pull/3714/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzLnB5)
 | `82.76% <ø> (ø)` | :arrow_up: |
   | 
[airflow/www/views.py](https://codecov.io/gh/apache/incubator-airflow/pull/3714/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvdmlld3MucHk=)
 | `68.88% <0%> (ø)` | :arrow_up: |
   | 
[airflow/www\_rbac/views.py](https://codecov.io/gh/apache/incubator-airflow/pull/3714/diff?src=pr=tree#diff-YWlyZmxvdy93d3dfcmJhYy92aWV3cy5weQ==)
 | `72.85% <0%> (ø)` | :arrow_up: |
   | 
[airflow/utils/log/gcs\_task\_handler.py](https://codecov.io/gh/apache/incubator-airflow/pull/3714/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9sb2cvZ2NzX3Rhc2tfaGFuZGxlci5weQ==)
 | `0% <0%> (ø)` | :arrow_up: |
   | 
[airflow/operators/hive\_stats\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/3714/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvaGl2ZV9zdGF0c19vcGVyYXRvci5weQ==)
 | `0% <0%> (ø)` | :arrow_up: |
   | 
[airflow/executors/dask\_executor.py](https://codecov.io/gh/apache/incubator-airflow/pull/3714/diff?src=pr=tree#diff-YWlyZmxvdy9leGVjdXRvcnMvZGFza19leGVjdXRvci5weQ==)
 | `2% <0%> (ø)` | :arrow_up: |
   | 
[airflow/operators/check\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/3714/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvY2hlY2tfb3BlcmF0b3IucHk=)
 | `58.26% <100%> (ø)` | :arrow_up: |
   | 
[airflow/hooks/S3\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/3714/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9TM19ob29rLnB5)
 | `95% <100%> (+0.17%)` | :arrow_up: |
   | 
[airflow/utils/cli.py](https://codecov.io/gh/apache/incubator-airflow/pull/3714/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9jbGkucHk=)
 | `100% <100%> (ø)` | :arrow_up: |
   | ... and [8 
more](https://codecov.io/gh/apache/incubator-airflow/pull/3714/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3714?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3714?src=pr=footer).
 Last update 
[d9fecba...c3b5ea1](https://codecov.io/gh/apache/incubator-airflow/pull/3714?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] codecov-io commented on issue #3716: [AIRFLOW-2869] Remove smart quote from default config

2018-08-07 Thread GitBox
codecov-io commented on issue #3716: [AIRFLOW-2869] Remove smart quote from 
default config
URL: 
https://github.com/apache/incubator-airflow/pull/3716#issuecomment-411232064
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3716?src=pr=h1)
 Report
   > Merging 
[#3716](https://codecov.io/gh/apache/incubator-airflow/pull/3716?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/d9fecba14c5eb56990508573a91b13ab27ca5153?src=pr=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3716/graphs/tree.svg?token=WdLKlKHOAU=pr=150=650)](https://codecov.io/gh/apache/incubator-airflow/pull/3716?src=pr=tree)
   
   ```diff
    @@           Coverage Diff           @@
    ##           master    #3716   +/-   ##
    =======================================
      Coverage   77.56%   77.56%
    =======================================
      Files         204      204
      Lines       15768    15768
    =======================================
      Hits        12230    12230
      Misses       3538     3538
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3716?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3716?src=pr=footer).
 Last update 
[d9fecba...6c1b97c](https://codecov.io/gh/apache/incubator-airflow/pull/3716?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance

2018-08-07 Thread George Leslie-Waksman (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

George Leslie-Waksman updated AIRFLOW-2870:
---
Description: 
Running migrations from below 
cc1e65623dc7_add_max_tries_column_to_task_instance.py fail with:

{noformat}
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> cc1e65623dc7, 
add max tries column to task instance
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 
1182, in _execute_context
context)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", 
line 470, in do_execute
cursor.execute(statement, parameters)
psycopg2.ProgrammingError: column task_instance.executor_config does not exist
LINE 1: ...ued_dttm, task_instance.pid AS task_instance_pid, task_insta...
{noformat}

The failure is occurring because 
cc1e65623dc7_add_max_tries_column_to_task_instance.py imports TaskInstance from 
the current code version, which has changes to the task_instance table that are 
not expected by the migration.

Specifically, 27c6a30d7c24_add_executor_config_to_task_instance.py adds an 
executor_config column that does not exist as of when 
cc1e65623dc7_add_max_tries_column_to_task_instance.py is run.

It is worth noting that this will not be observed for new installs because the 
migration branches on table existence/non-existence at a point that will hide 
the issue from new installs.

  was:
Running migrations from below 
cc1e65623dc7_add_max_tries_column_to_task_instance.py fail with:

{noformat}
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> cc1e65623dc7, 
add max tries column to task instance
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 
1182, in _execute_context
context)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", 
line 470, in do_execute
cursor.execute(statement, parameters)
psycopg2.ProgrammingError: column task_instance.executor_config does not exist
LINE 1: ...ued_dttm, task_instance.pid AS task_instance_pid, task_insta...
{noformat}

The failure is occurring because 
cc1e65623dc7_add_max_tries_column_to_task_instance.py imports TaskInstance from 
the current code version, which has changes to the task_instance table that are 
not expected by the migration.

Specifically, 27c6a30d7c24_add_executor_config_to_task_instance.py adds an 
executor_config column that does not exist as of when 
cc1e65623dc7_add_max_tries_column_to_task_instance.py is run.


> Migrations fail when upgrading from below 
> cc1e65623dc7_add_max_tries_column_to_task_instance
> 
>
> Key: AIRFLOW-2870
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2870
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: George Leslie-Waksman
>Priority: Blocker
>
> Running migrations from below 
> cc1e65623dc7_add_max_tries_column_to_task_instance.py fail with:
> {noformat}
> INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
> INFO  [alembic.runtime.migration] Will assume transactional DDL.
> INFO  [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> 
> cc1e65623dc7, add max tries column to task instance
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", 
> line 1182, in _execute_context
> context)
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", 
> line 470, in do_execute
> cursor.execute(statement, parameters)
> psycopg2.ProgrammingError: column task_instance.executor_config does not exist
> LINE 1: ...ued_dttm, task_instance.pid AS task_instance_pid, task_insta...
> {noformat}
> The failure is occurring because 
> cc1e65623dc7_add_max_tries_column_to_task_instance.py imports TaskInstance 
> from the current code version, which has changes to the task_instance table 
> that are not expected by the migration.
> Specifically, 27c6a30d7c24_add_executor_config_to_task_instance.py adds an 
> executor_config column that does not exist as of when 
> cc1e65623dc7_add_max_tries_column_to_task_instance.py is run.
> It is worth noting that this will not be observed for new installs because 
> the migration branches on table existence/non-existence at a point that will 
> hide the issue from new installs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
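
A common remedy for the failure mode described above is for a migration to
declare the handful of columns it actually touches, instead of importing the
live TaskInstance model. A sketch under that assumption (Alembic + SQLAlchemy;
the column list is illustrative, not the project's actual fix):

```python
import sqlalchemy as sa
from alembic import op

# Lightweight table definition local to the migration: it stays valid even
# when later revisions add columns (such as executor_config) to the model.
task_instance = sa.table(
    'task_instance',
    sa.column('task_id', sa.String),
    sa.column('dag_id', sa.String),
    sa.column('try_number', sa.Integer),
)

def upgrade():
    op.add_column('task_instance',
                  sa.Column('max_tries', sa.Integer, server_default='-1'))
    conn = op.get_bind()
    # Queries go through the local table above, never through models.py.
    conn.execute(sa.select([task_instance.c.task_id, task_instance.c.try_number]))
```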


[jira] [Updated] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance

2018-08-07 Thread George Leslie-Waksman (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

George Leslie-Waksman updated AIRFLOW-2870:
---
Description: 
Running migrations from below 
cc1e65623dc7_add_max_tries_column_to_task_instance.py fail with:

{noformat}
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> cc1e65623dc7, 
add max tries column to task instance
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 
1182, in _execute_context
context)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", 
line 470, in do_execute
cursor.execute(statement, parameters)
psycopg2.ProgrammingError: column task_instance.executor_config does not exist
LINE 1: ...ued_dttm, task_instance.pid AS task_instance_pid, task_insta...
{noformat}

The failure is occurring because 
cc1e65623dc7_add_max_tries_column_to_task_instance.py imports TaskInstance from 
the current code version, which has changes to the task_instance table that are 
not expected by the migration.

Specifically, 27c6a30d7c24_add_executor_config_to_task_instance.py adds an 
executor_config column that does not exist as of when 
cc1e65623dc7_add_max_tries_column_to_task_instance.py is run.

  was:
Running migrations from below 
cc1e65623dc7_add_max_tries_column_to_task_instance.py fail with:

{noformat}
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> cc1e65623dc7, 
add max tries column to task instance
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 
1182, in _execute_context
context)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", 
line 470, in do_execute
cursor.execute(statement, parameters)
psycopg2.ProgrammingError: column task_instance.executor_config does not exist
LINE 1: ...ued_dttm, task_instance.pid AS task_instance_pid, task_insta...
{noformat}

The failure is occurring because 
cc1e65623dc7_add_max_tries_column_to_task_instance.py imports TaskInstance from 
the current code version, which exists in a post-migration state.


> Migrations fail when upgrading from below 
> cc1e65623dc7_add_max_tries_column_to_task_instance
> 
>
> Key: AIRFLOW-2870
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2870
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: George Leslie-Waksman
>Priority: Blocker
>
> Running migrations from below 
> cc1e65623dc7_add_max_tries_column_to_task_instance.py fail with:
> {noformat}
> INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
> INFO  [alembic.runtime.migration] Will assume transactional DDL.
> INFO  [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> 
> cc1e65623dc7, add max tries column to task instance
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", 
> line 1182, in _execute_context
> context)
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", 
> line 470, in do_execute
> cursor.execute(statement, parameters)
> psycopg2.ProgrammingError: column task_instance.executor_config does not exist
> LINE 1: ...ued_dttm, task_instance.pid AS task_instance_pid, task_insta...
> {noformat}
> The failure is occurring because 
> cc1e65623dc7_add_max_tries_column_to_task_instance.py imports TaskInstance 
> from the current code version, which has changes to the task_instance table 
> that are not expected by the migration.
> Specifically, 27c6a30d7c24_add_executor_config_to_task_instance.py adds an 
> executor_config column that does not exist as of when 
> cc1e65623dc7_add_max_tries_column_to_task_instance.py is run.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] feng-tao commented on issue #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines

2018-08-07 Thread GitBox
feng-tao commented on issue #3714: [AIRFLOW-2867] Refactor code to conform 
Python standards & guidelines
URL: 
https://github.com/apache/incubator-airflow/pull/3714#issuecomment-411230444
 
 
   lgtm


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] codecov-io edited a comment on issue #3586: [AIRFLOW-2733] Reconcile psutil and subprocess in webserver cli

2018-08-07 Thread GitBox
codecov-io edited a comment on issue #3586: [AIRFLOW-2733] Reconcile psutil and 
subprocess in webserver cli
URL: 
https://github.com/apache/incubator-airflow/pull/3586#issuecomment-403631506
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3586?src=pr=h1)
 Report
   > Merging 
[#3586](https://codecov.io/gh/apache/incubator-airflow/pull/3586?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/d9fecba14c5eb56990508573a91b13ab27ca5153?src=pr=desc)
 will **decrease** coverage by `<.01%`.
   > The diff coverage is `50%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3586/graphs/tree.svg?token=WdLKlKHOAU=pr=150=650)](https://codecov.io/gh/apache/incubator-airflow/pull/3586?src=pr=tree)
   
   ```diff
    @@            Coverage Diff             @@
    ##           master    #3586      +/-   ##
    ==========================================
    - Coverage   77.56%   77.56%   -0.01%
    ==========================================
      Files         204      204
      Lines       15768    15767       -1
    ==========================================
    - Hits        12230    12229       -1
      Misses       3538     3538
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/3586?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/bin/cli.py](https://codecov.io/gh/apache/incubator-airflow/pull/3586/diff?src=pr=tree#diff-YWlyZmxvdy9iaW4vY2xpLnB5)
 | `64.43% <50%> (+0.08%)` | :arrow_up: |
   | 
[airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3586/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=)
 | `88.54% <0%> (-0.05%)` | :arrow_down: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3586?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3586?src=pr=footer).
 Last update 
[d9fecba...de54ae0](https://codecov.io/gh/apache/incubator-airflow/pull/3586?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] codecov-io edited a comment on issue #3586: [AIRFLOW-2733] Reconcile psutil and subprocess in webserver cli

2018-08-07 Thread GitBox
codecov-io edited a comment on issue #3586: [AIRFLOW-2733] Reconcile psutil and 
subprocess in webserver cli
URL: 
https://github.com/apache/incubator-airflow/pull/3586#issuecomment-403631506
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3586?src=pr=h1)
 Report
   > Merging 
[#3586](https://codecov.io/gh/apache/incubator-airflow/pull/3586?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/d9fecba14c5eb56990508573a91b13ab27ca5153?src=pr=desc)
 will **decrease** coverage by `<.01%`.
   > The diff coverage is `50%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3586/graphs/tree.svg?width=650=150=pr=WdLKlKHOAU)](https://codecov.io/gh/apache/incubator-airflow/pull/3586?src=pr=tree)
   
   ```diff
    @@            Coverage Diff             @@
    ##           master    #3586      +/-   ##
    ==========================================
    - Coverage   77.56%   77.56%   -0.01%
    ==========================================
      Files         204      204
      Lines       15768    15767       -1
    ==========================================
    - Hits        12230    12229       -1
      Misses       3538     3538
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/3586?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/bin/cli.py](https://codecov.io/gh/apache/incubator-airflow/pull/3586/diff?src=pr=tree#diff-YWlyZmxvdy9iaW4vY2xpLnB5)
 | `64.43% <50%> (+0.08%)` | :arrow_up: |
   | 
[airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3586/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=)
 | `88.54% <0%> (-0.05%)` | :arrow_down: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3586?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3586?src=pr=footer).
 Last update 
[d9fecba...de54ae0](https://codecov.io/gh/apache/incubator-airflow/pull/3586?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kaxil commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines

2018-08-07 Thread GitBox
kaxil commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code 
to conform Python standards & guidelines
URL: https://github.com/apache/incubator-airflow/pull/3714#discussion_r208411271
 
 

 ##
 File path: airflow/contrib/operators/oracle_to_azure_data_lake_transfer.py
 ##
 @@ -1,113 +1,115 @@
-# -*- coding: utf-8 -*-
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-from airflow.hooks.oracle_hook import OracleHook
-from airflow.contrib.hooks.azure_data_lake_hook import AzureDataLakeHook
-from airflow.models import BaseOperator
-from airflow.utils.decorators import apply_defaults
-from airflow.utils.file import TemporaryDirectory
-
-import unicodecsv as csv
-import os
-
-
-class OracleToAzureDataLakeTransfer(BaseOperator):
-"""
-Moves data from Oracle to Azure Data Lake. The operator runs the query 
against
-Oracle and stores the file locally before loading it into Azure Data Lake.
-
-
-:param filename: file name to be used by the csv file.
-:type filename: str
-:param azure_data_lake_conn_id: destination azure data lake connection.
-:type azure_data_lake_conn_id: str
-:param azure_data_lake_path: destination path in azure data lake to put 
the file.
-:type azure_data_lake_path: str
-:param oracle_conn_id: source Oracle connection.
-:type oracle_conn_id: str
-:param sql: SQL query to execute against the Oracle database. (templated)
-:type sql: str
-:param sql_params: Parameters to use in sql query. (templated)
-:type sql_params: str
-:param delimiter: field delimiter in the file.
-:type delimiter: str
-:param encoding: enconding type for the file.
-:type encoding: str
-:param quotechar: Character to use in quoting.
-:type quotechar: str
-:param quoting: Quoting strategy. See unicodecsv quoting for more 
information.
-:type quoting: str
-"""
-
-template_fields = ('filename', 'sql', 'sql_params')
-ui_color = '#e08c8c'
-
-@apply_defaults
-def __init__(
-self,
-filename,
-azure_data_lake_conn_id,
-azure_data_lake_path,
-oracle_conn_id,
-sql,
-sql_params={},
-delimiter=",",
-encoding="utf-8",
-quotechar='"',
-quoting=csv.QUOTE_MINIMAL,
-*args, **kwargs):
-super(OracleToAzureDataLakeTransfer, self).__init__(*args, **kwargs)
-self.filename = filename
-self.oracle_conn_id = oracle_conn_id
-self.sql = sql
-self.sql_params = sql_params
-self.azure_data_lake_conn_id = azure_data_lake_conn_id
-self.azure_data_lake_path = azure_data_lake_path
-self.delimiter = delimiter
-self.encoding = encoding
-self.quotechar = quotechar
-self.quoting = quoting
-
-def _write_temp_file(self, cursor, path_to_save):
-with open(path_to_save, 'wb') as csvfile:
-csv_writer = csv.writer(csvfile, delimiter=self.delimiter,
-encoding=self.encoding, 
quotechar=self.quotechar,
-quoting=self.quoting)
-csv_writer.writerow(map(lambda field: field[0], 
cursor.description))
-csv_writer.writerows(cursor)
-csvfile.flush()
-
-def execute(self, context):
-oracle_hook = OracleHook(oracle_conn_id=self.oracle_conn_id)
-azure_data_lake_hook = AzureDataLakeHook(
-azure_data_lake_conn_id=self.azure_data_lake_conn_id)
-
-self.log.info("Dumping Oracle query results to local file")
-conn = oracle_hook.get_conn()
-cursor = conn.cursor()
-cursor.execute(self.sql, self.sql_params)
-
-with TemporaryDirectory(prefix='airflow_oracle_to_azure_op_') as temp:
-self._write_temp_file(cursor, os.path.join(temp, self.filename))
-self.log.info("Uploading local file to Azure Data Lake")
-azure_data_lake_hook.upload_file(os.path.join(temp, self.filename),
- 
os.path.join(self.azure_data_lake_path,
-  self.filename))
-

[jira] [Commented] (AIRFLOW-2869) Remove smart quote from default config

2018-08-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572455#comment-16572455
 ] 

ASF GitHub Bot commented on AIRFLOW-2869:
-

r39132 closed pull request #3716: [AIRFLOW-2869] Remove smart quote from 
default config
URL: https://github.com/apache/incubator-airflow/pull/3716
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/config_templates/default_airflow.cfg 
b/airflow/config_templates/default_airflow.cfg
index d4a7242118..4f1f0df383 100644
--- a/airflow/config_templates/default_airflow.cfg
+++ b/airflow/config_templates/default_airflow.cfg
@@ -612,7 +612,7 @@ image_pull_secrets =
 gcp_service_account_keys =
 
 # Use the service account kubernetes gives to pods to connect to kubernetes 
cluster.
-# It’s intended for clients that expect to be running inside a pod running on 
kubernetes.
+# It's intended for clients that expect to be running inside a pod running on 
kubernetes.
 # It will raise an exception if called from a process not running in a 
kubernetes environment.
 in_cluster = True
 


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove smart quote from default config
> --
>
> Key: AIRFLOW-2869
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2869
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Siddharth Anand
>Assignee: Siddharth Anand
>Priority: Trivial
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2869) Remove smart quote from default config

2018-08-07 Thread Siddharth Anand (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Anand resolved AIRFLOW-2869.
--
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request #3716
[https://github.com/apache/incubator-airflow/pull/3716]

> Remove smart quote from default config
> --
>
> Key: AIRFLOW-2869
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2869
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Siddharth Anand
>Assignee: Siddharth Anand
>Priority: Trivial
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] r39132 closed pull request #3716: [AIRFLOW-2869] Remove smart quote from default config

2018-08-07 Thread GitBox
r39132 closed pull request #3716: [AIRFLOW-2869] Remove smart quote from 
default config
URL: https://github.com/apache/incubator-airflow/pull/3716
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/config_templates/default_airflow.cfg 
b/airflow/config_templates/default_airflow.cfg
index d4a7242118..4f1f0df383 100644
--- a/airflow/config_templates/default_airflow.cfg
+++ b/airflow/config_templates/default_airflow.cfg
@@ -612,7 +612,7 @@ image_pull_secrets =
 gcp_service_account_keys =
 
 # Use the service account kubernetes gives to pods to connect to kubernetes 
cluster.
-# It’s intended for clients that expect to be running inside a pod running on 
kubernetes.
+# It's intended for clients that expect to be running inside a pod running on 
kubernetes.
 # It will raise an exception if called from a process not running in a 
kubernetes environment.
 in_cluster = True
 


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-2869) Remove smart quote from default config

2018-08-07 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572453#comment-16572453
 ] 

ASF subversion and git services commented on AIRFLOW-2869:
--

Commit 67e2bb96cdc5ea37226d11332362d3bd3778cea0 in incubator-airflow's branch 
refs/heads/master from William Horton
[ https://gitbox.apache.org/repos/asf?p=incubator-airflow.git;h=67e2bb9 ]

[AIRFLOW-2869] Remove smart quote from default config

Closes #3716 from wdhorton/remove-smart-quote-
from-cfg


> Remove smart quote from default config
> --
>
> Key: AIRFLOW-2869
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2869
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Siddharth Anand
>Assignee: Siddharth Anand
>Priority: Trivial
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] feng-tao commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines

2018-08-07 Thread GitBox
feng-tao commented on a change in pull request #3714: [AIRFLOW-2867] Refactor 
code to conform Python standards & guidelines
URL: https://github.com/apache/incubator-airflow/pull/3714#discussion_r208408841
 
 

 ##
 File path: airflow/contrib/operators/oracle_to_azure_data_lake_transfer.py
 ##
 @@ -1,113 +1,115 @@
-# -*- coding: utf-8 -*-
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-from airflow.hooks.oracle_hook import OracleHook
-from airflow.contrib.hooks.azure_data_lake_hook import AzureDataLakeHook
-from airflow.models import BaseOperator
-from airflow.utils.decorators import apply_defaults
-from airflow.utils.file import TemporaryDirectory
-
-import unicodecsv as csv
-import os
-
-
-class OracleToAzureDataLakeTransfer(BaseOperator):
-"""
-Moves data from Oracle to Azure Data Lake. The operator runs the query 
against
-Oracle and stores the file locally before loading it into Azure Data Lake.
-
-
-:param filename: file name to be used by the csv file.
-:type filename: str
-:param azure_data_lake_conn_id: destination azure data lake connection.
-:type azure_data_lake_conn_id: str
-:param azure_data_lake_path: destination path in azure data lake to put 
the file.
-:type azure_data_lake_path: str
-:param oracle_conn_id: source Oracle connection.
-:type oracle_conn_id: str
-:param sql: SQL query to execute against the Oracle database. (templated)
-:type sql: str
-:param sql_params: Parameters to use in sql query. (templated)
-:type sql_params: str
-:param delimiter: field delimiter in the file.
-:type delimiter: str
-:param encoding: enconding type for the file.
-:type encoding: str
-:param quotechar: Character to use in quoting.
-:type quotechar: str
-:param quoting: Quoting strategy. See unicodecsv quoting for more 
information.
-:type quoting: str
-"""
-
-template_fields = ('filename', 'sql', 'sql_params')
-ui_color = '#e08c8c'
-
-@apply_defaults
-def __init__(
-self,
-filename,
-azure_data_lake_conn_id,
-azure_data_lake_path,
-oracle_conn_id,
-sql,
-sql_params={},
-delimiter=",",
-encoding="utf-8",
-quotechar='"',
-quoting=csv.QUOTE_MINIMAL,
-*args, **kwargs):
-super(OracleToAzureDataLakeTransfer, self).__init__(*args, **kwargs)
-self.filename = filename
-self.oracle_conn_id = oracle_conn_id
-self.sql = sql
-self.sql_params = sql_params
-self.azure_data_lake_conn_id = azure_data_lake_conn_id
-self.azure_data_lake_path = azure_data_lake_path
-self.delimiter = delimiter
-self.encoding = encoding
-self.quotechar = quotechar
-self.quoting = quoting
-
-def _write_temp_file(self, cursor, path_to_save):
-with open(path_to_save, 'wb') as csvfile:
-csv_writer = csv.writer(csvfile, delimiter=self.delimiter,
-encoding=self.encoding, 
quotechar=self.quotechar,
-quoting=self.quoting)
-csv_writer.writerow(map(lambda field: field[0], 
cursor.description))
-csv_writer.writerows(cursor)
-csvfile.flush()
-
-def execute(self, context):
-oracle_hook = OracleHook(oracle_conn_id=self.oracle_conn_id)
-azure_data_lake_hook = AzureDataLakeHook(
-azure_data_lake_conn_id=self.azure_data_lake_conn_id)
-
-self.log.info("Dumping Oracle query results to local file")
-conn = oracle_hook.get_conn()
-cursor = conn.cursor()
-cursor.execute(self.sql, self.sql_params)
-
-with TemporaryDirectory(prefix='airflow_oracle_to_azure_op_') as temp:
-self._write_temp_file(cursor, os.path.join(temp, self.filename))
-self.log.info("Uploading local file to Azure Data Lake")
-azure_data_lake_hook.upload_file(os.path.join(temp, self.filename),
- 
os.path.join(self.azure_data_lake_path,
-  self.filename))
-

[jira] [Updated] (AIRFLOW-2869) Remove smart quote from default config

2018-08-07 Thread Siddharth Anand (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Anand updated AIRFLOW-2869:
-
External issue URL: https://github.com/apache/incubator-airflow/pull/3716

> Remove smart quote from default config
> --
>
> Key: AIRFLOW-2869
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2869
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Siddharth Anand
>Assignee: Siddharth Anand
>Priority: Trivial
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] feng-tao commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines

2018-08-07 Thread GitBox
feng-tao commented on a change in pull request #3714: [AIRFLOW-2867] Refactor 
code to conform Python standards & guidelines
URL: https://github.com/apache/incubator-airflow/pull/3714#discussion_r208408365
 
 

 ##
 File path: airflow/contrib/hooks/databricks_hook.py
 ##
 @@ -65,7 +65,8 @@ def __init__(
 raise ValueError('Retry limit must be greater than equal to 1')
 self.retry_limit = retry_limit
 
-def _parse_host(self, host):
+@staticmethod
 
 Review comment:
   Did you also update the call sites that use this method?
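   Worth noting: converting a method to `@staticmethod` does not by itself
   break `self._parse_host(...)` call sites, because Python also resolves
   staticmethods through the instance. A standalone illustration (hypothetical
   class name):
   
   ```python
   class Hook(object):
       @staticmethod
       def _parse_host(host):
           # Strip an https:// prefix if the user supplied a full URL.
           prefix = 'https://'
           return host[len(prefix):] if host.startswith(prefix) else host
   
   hook = Hook()
   # Instance-style and class-style calls resolve to the same function:
   assert hook._parse_host('https://xx.cloud.databricks.com') == \
       Hook._parse_host('https://xx.cloud.databricks.com')
   ```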


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] r39132 edited a comment on issue #3716: [AIRFLOW-2869] Remove smart quote from default config

2018-08-07 Thread GitBox
r39132 edited a comment on issue #3716: [AIRFLOW-2869] Remove smart quote from 
default config
URL: 
https://github.com/apache/incubator-airflow/pull/3716#issuecomment-411222958
 
 
   +1 -- please follow contributor guidelines in the future. I'll create a 
tracking JIRA and update the JIRA & commit message for this one.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance

2018-08-07 Thread George Leslie-Waksman (JIRA)
George Leslie-Waksman created AIRFLOW-2870:
--

 Summary: Migrations fail when upgrading from below 
cc1e65623dc7_add_max_tries_column_to_task_instance
 Key: AIRFLOW-2870
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2870
 Project: Apache Airflow
  Issue Type: Bug
Reporter: George Leslie-Waksman


Running migrations from below 
cc1e65623dc7_add_max_tries_column_to_task_instance.py fail with:

{noformat}
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> cc1e65623dc7, 
add max tries column to task instance
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 
1182, in _execute_context
context)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", 
line 470, in do_execute
cursor.execute(statement, parameters)
psycopg2.ProgrammingError: column task_instance.executor_config does not exist
LINE 1: ...ued_dttm, task_instance.pid AS task_instance_pid, task_insta...
{noformat}

The failure is occurring because 
cc1e65623dc7_add_max_tries_column_to_task_instance.py imports TaskInstance from 
the current code version, which exists in a post-migration state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2869) Remove smart quote from default config

2018-08-07 Thread Siddharth Anand (JIRA)
Siddharth Anand created AIRFLOW-2869:


 Summary: Remove smart quote from default config
 Key: AIRFLOW-2869
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2869
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Siddharth Anand
Assignee: Siddharth Anand






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] r39132 edited a comment on issue #3716: [AIRFLOW-XXX] Remove smart quote from default config

2018-08-07 Thread GitBox
r39132 edited a comment on issue #3716: [AIRFLOW-XXX] Remove smart quote from 
default config
URL: 
https://github.com/apache/incubator-airflow/pull/3716#issuecomment-411222958
 
 
   +1 -- please follow contributor guidelines in the future. I'll create a 
tracking JIRA and update the commit message for this one.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] r39132 commented on issue #3716: [AIRFLOW-XXX] Remove smart quote from default config

2018-08-07 Thread GitBox
r39132 commented on issue #3716: [AIRFLOW-XXX] Remove smart quote from default 
config
URL: 
https://github.com/apache/incubator-airflow/pull/3716#issuecomment-411222958
 
 
   +1


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] wdhorton opened a new pull request #3716: [AIRFLOW-XXX] Remove smart quote from default config

2018-08-07 Thread GitBox
wdhorton opened a new pull request #3716: [AIRFLOW-XXX] Remove smart quote from 
default config
URL: https://github.com/apache/incubator-airflow/pull/3716
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   I removed a [smart quotation 
mark](https://www.fileformat.info/info/unicode/char/2019/index.htm) from the 
default config, as it could cause problems with tooling because of unicode 
issues, and there's no real need for it. In my case, I was trying to use Fabric 
and cuisine to copy a config to a remote machine and it crashed with this error:
   ```
 File 
"/Users/william.horton/development/urbancompass/build-support/python/venvs/shorts/lib/python2.7/site-packages/cuisine.py",
 line 545, in file_write
   sig= hashlib.md5(content).hexdigest()
   UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in 
position 20037: ordinal not in range(128)
   ```
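   As an aside, characters like this can be located before they reach
   ASCII-only tooling with a short scan (a sketch; the path is the file
   changed in this PR):
   
   ```python
   import io
   
   def find_non_ascii(path):
       """Yield (line, column, char) for every non-ASCII character."""
       with io.open(path, encoding='utf-8') as f:
           for lineno, text in enumerate(f, start=1):
               for col, ch in enumerate(text, start=1):
                   if ord(ch) > 127:
                       yield lineno, col, ch
   
   for lineno, col, ch in find_non_ascii('airflow/config_templates/default_airflow.cfg'):
       print('%d:%d %r' % (lineno, col, ch))
   ```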
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   Just changing a quotation mark in a comment in the default config
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] feng-tao commented on issue #3708: [AIRFLOW-2859] Implement own UtcDateTime

2018-08-07 Thread GitBox
feng-tao commented on issue #3708: [AIRFLOW-2859] Implement own UtcDateTime
URL: 
https://github.com/apache/incubator-airflow/pull/3708#issuecomment-411222194
 
 
   flake8 fails.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-1460) Tasks get stuck in the "removed" state

2018-08-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572430#comment-16572430
 ] 

ASF GitHub Bot commented on AIRFLOW-1460:
-

gwax closed pull request #2482: AIRFLOW-1460 clear "REMOVED" tis on DagRun 
update
URL: https://github.com/apache/incubator-airflow/pull/2482
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/models.py b/airflow/models.py
index d1f8e59fe3..ff059b4fcb 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -4255,6 +4255,7 @@ def update_state(self, session=None):
             # skip in db?
             if ti.state == State.REMOVED:
                 tis.remove(ti)
+                session.delete(ti)
             else:
                 ti.task = dag.get_task(ti.task_id)
 
diff --git a/tests/models.py b/tests/models.py
index cf2734b746..b8157a664b 100644
--- a/tests/models.py
+++ b/tests/models.py
@@ -487,6 +487,34 @@ def test_dagrun_success_conditions(self):
         state = dr.update_state()
         self.assertEqual(State.FAILED, state)
 
+    def test_dagrun_clear_removed(self):
+        session = settings.Session()
+
+        dag = DAG(
+            'test_dagrun_clear_removed',
+            start_date=DEFAULT_DATE,
+            default_args={'owner': 'owner1'})
+
+        with dag:
+            op = DummyOperator(task_id='A')
+
+        dag.clear()
+
+        now = datetime.datetime.now()
+        dr = dag.create_dagrun(run_id='test_dagrun_clear_removed',
+                               state=State.RUNNING,
+                               execution_date=now,
+                               start_date=now)
+
+        ti_op = dr.get_task_instance(task_id=op.task_id)
+        ti_op.set_state(state=State.REMOVED, session=session)
+
+        self.assertEqual(
+            dr.get_task_instance(task_id=op.task_id).state,
+            State.REMOVED)
+        dr.update_state()
+        self.assertIsNone(dr.get_task_instance(task_id=op.task_id))
+
     def test_get_task_instance_on_empty_dagrun(self):
         """
         Make sure that a proper value is returned when a dagrun has no task instances


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Tasks get stuck in the "removed" state
> --
>
> Key: AIRFLOW-1460
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1460
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: George Leslie-Waksman
>Assignee: Maxime Beauchemin
>Priority: Major
> Fix For: 1.9.1
>
>
> The current handling of task instances that get assigned the state "removed" 
> is that they get ignored.
> If the underlying task later gets re-added, the existing task instances will 
> prevent the task from running in the future.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] gwax commented on issue #2482: AIRFLOW-1460 clear "REMOVED" tis on DagRun update

2018-08-07 Thread GitBox
gwax commented on issue #2482: AIRFLOW-1460 clear "REMOVED" tis on DagRun update
URL: 
https://github.com/apache/incubator-airflow/pull/2482#issuecomment-411221356
 
 
   This issue has been resolved by 
https://github.com/apache/incubator-airflow/pull/3137
   
   I will close this PR as a duplicate.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] gwax closed pull request #2482: AIRFLOW-1460 clear "REMOVED" tis on DagRun update

2018-08-07 Thread GitBox
gwax closed pull request #2482: AIRFLOW-1460 clear "REMOVED" tis on DagRun 
update
URL: https://github.com/apache/incubator-airflow/pull/2482
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/models.py b/airflow/models.py
index d1f8e59fe3..ff059b4fcb 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -4255,6 +4255,7 @@ def update_state(self, session=None):
             # skip in db?
             if ti.state == State.REMOVED:
                 tis.remove(ti)
+                session.delete(ti)
             else:
                 ti.task = dag.get_task(ti.task_id)
 
diff --git a/tests/models.py b/tests/models.py
index cf2734b746..b8157a664b 100644
--- a/tests/models.py
+++ b/tests/models.py
@@ -487,6 +487,34 @@ def test_dagrun_success_conditions(self):
         state = dr.update_state()
         self.assertEqual(State.FAILED, state)
 
+    def test_dagrun_clear_removed(self):
+        session = settings.Session()
+
+        dag = DAG(
+            'test_dagrun_clear_removed',
+            start_date=DEFAULT_DATE,
+            default_args={'owner': 'owner1'})
+
+        with dag:
+            op = DummyOperator(task_id='A')
+
+        dag.clear()
+
+        now = datetime.datetime.now()
+        dr = dag.create_dagrun(run_id='test_dagrun_clear_removed',
+                               state=State.RUNNING,
+                               execution_date=now,
+                               start_date=now)
+
+        ti_op = dr.get_task_instance(task_id=op.task_id)
+        ti_op.set_state(state=State.REMOVED, session=session)
+
+        self.assertEqual(
+            dr.get_task_instance(task_id=op.task_id).state,
+            State.REMOVED)
+        dr.update_state()
+        self.assertIsNone(dr.get_task_instance(task_id=op.task_id))
+
     def test_get_task_instance_on_empty_dagrun(self):
         """
         Make sure that a proper value is returned when a dagrun has no task instances


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Resolved] (AIRFLOW-2787) Airflow scheduler dies on DAGs with NULL DagRun run_id

2018-08-07 Thread George Leslie-Waksman (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

George Leslie-Waksman resolved AIRFLOW-2787.

Resolution: Fixed

> Airflow scheduler dies on DAGs with NULL DagRun run_id
> --
>
> Key: AIRFLOW-2787
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2787
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: George Leslie-Waksman
>Assignee: George Leslie-Waksman
>Priority: Critical
>
> When a DagRun is created with NULL run_id, the scheduler subprocess will 
> crash when checking `is_backfill`:
> {noformat}
> Got an exception! Propagating...
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 347, in 
> helper
> pickle_dags)
>   File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 53, 
> in wrapper
> result = func(*args, **kwargs)
>   File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 1583, 
> in process_file
> self._process_dags(dagbag, dags, ti_keys_to_schedule)
>   File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 1175, 
> in _process_dags
> self._process_task_instances(dag, tis_out)
>   File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 873, in 
> _process_task_instances
> if run.is_backfill:
>   File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 4257, 
> in is_backfill
> if "backfill" in self.run_id:
> TypeError: argument of type 'NoneType' is not iterable
> {noformat}
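
A minimal sketch of the kind of guard that avoids this crash (illustrative; 
not necessarily the exact patch that resolved the issue):
{code:python}
class DagRun(object):
    def __init__(self, run_id=None):
        self.run_id = run_id

    @property
    def is_backfill(self):
        # run_id can be NULL in the database; `"backfill" in None` raises the
        # TypeError above, so guard before the membership test.
        return self.run_id is not None and "backfill" in self.run_id
{code}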



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2868) Mesos Executor use executor_config to specify CPU, Memory and Docker image on the task level

2018-08-07 Thread Amir Shahatit (JIRA)
Amir Shahatit created AIRFLOW-2868:
--

 Summary: Mesos Executor use executor_config to specify CPU, Memory 
and Docker image on the task level
 Key: AIRFLOW-2868
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2868
 Project: Apache Airflow
  Issue Type: Improvement
  Components: contrib
Affects Versions: 1.10, 1.10.1
Reporter: Amir Shahatit
Assignee: Amir Shahatit


executor_config was added as part of 
[AIRFLOW-1314|https://github.com/apache/incubator-airflow/commit/c0920efc012468681cff3d3c9cfe25c7381dc976].
 This task extends the MesosExecutor to make use of the specified executor 
configs to pass on resource requirements (CPU/memory) as well as Docker images 
at the task level. 
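
For illustration, a task-level configuration might look like the following 
once the MesosExecutor honors executor_config (the key names below are 
hypothetical; the actual schema would be defined by this task):
{code:python}
from airflow.operators.bash_operator import BashOperator

# `dag` is assumed to be an existing DAG object; the executor_config keys
# below are placeholders, not a schema this issue has committed to.
heavy_task = BashOperator(
    task_id='heavy_task',
    bash_command='python process_data.py',
    executor_config={
        'cpus': 2.0,
        'mem': 4096,  # MB
        'docker_image': 'my-org/etl:latest',
    },
    dag=dag,
)
{code}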



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2868) Mesos Executor should use executor_config to specify CPU, Memory and Docker image on the task level

2018-08-07 Thread Amir Shahatit (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Shahatit updated AIRFLOW-2868:
---
Summary: Mesos Executor should use executor_config to specify CPU, Memory 
and Docker image on the task level  (was: Mesos Executor use executor_config to 
specify CPU, Memory and Docker image on the task level)

> Mesos Executor should use executor_config to specify CPU, Memory and Docker 
> image on the task level
> ---
>
> Key: AIRFLOW-2868
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2868
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Affects Versions: 1.10, 1.10.1
>Reporter: Amir Shahatit
>Assignee: Amir Shahatit
>Priority: Major
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> executor_config was added as part of 
> [AIRFLOW-1314|https://github.com/apache/incubator-airflow/commit/c0920efc012468681cff3d3c9cfe25c7381dc976].
> This task extends the MesosExecutor to make use of the specified executor 
> configs to pass on resource requirements (CPU/memory) as well as Docker 
> images at the task level. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] bolkedebruin edited a comment on issue #3708: [AIRFLOW-2859] Implement own UtcDateTime

2018-08-07 Thread GitBox
bolkedebruin edited a comment on issue #3708: [AIRFLOW-2859] Implement own 
UtcDateTime
URL: 
https://github.com/apache/incubator-airflow/pull/3708#issuecomment-411195033
 
 
   @ashb I think I finally nailed it. The issue was in my understanding of 
how sqlalchemy deals with timezone information inside fields for mysql. If 
you insert `2018-08-08 20:40:21.443732+02:00` into mysql, it will ignore the 
`+02:00` _and_ apply the connection's timezone setting (e.g. `set 
time_zone='+01:00'`). I thought sqlalchemy would handle this somehow (i.e. by 
separating the timezone and doing something like this [1]).
   
   So this obviously creates chaos, although not too much as long as you do 
not change the connection's timezone. In my tests I was actually doing so, 
hence they didn't pass. Strangely enough, when executed isolated / locally 
they worked for some reason, probably due to connection re-use. 
   
   Anyway, I borrowed your event watcher and we now make sure we always 
connect in UTC with mysql. For postgres it doesn't matter, and we can test 
that it doesn't. 
   
   [1] 
https://stackoverflow.com/questions/7651409/mysql-datetime-insert-a-date-with-timezone-offset
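   
   A minimal sketch of the kind of connect-time event listener described 
above (assuming SQLAlchemy's `event` API; the actual code in this PR may 
differ):
   
   ```python
   from sqlalchemy import create_engine, event
   
   # Hypothetical connection string, for illustration only.
   engine = create_engine("mysql://airflow:airflow@localhost/airflow")
   
   @event.listens_for(engine, "connect")
   def set_utc_timezone(dbapi_connection, connection_record):
       # Force every new MySQL connection into UTC so stored datetimes are
       # interpreted consistently regardless of the server's default zone.
       cursor = dbapi_connection.cursor()
       cursor.execute("SET time_zone = '+00:00'")
       cursor.close()
   ```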


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] ashb commented on issue #3708: [AIRFLOW-2859] Implement own UtcDateTime

2018-08-07 Thread GitBox
ashb commented on issue #3708: [AIRFLOW-2859] Implement own UtcDateTime
URL: 
https://github.com/apache/incubator-airflow/pull/3708#issuecomment-411195448
 
 
   AIP-3: Drop support for Mysql :D
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] bolkedebruin commented on issue #3708: [AIRFLOW-2859] Implement own UtcDateTime

2018-08-07 Thread GitBox
bolkedebruin commented on issue #3708: [AIRFLOW-2859] Implement own UtcDateTime
URL: 
https://github.com/apache/incubator-airflow/pull/3708#issuecomment-411195033
 
 
   @ashb I think I finally nailed it. The issue was in my understanding of 
how sqlalchemy deals with timezone information inside fields for mysql. If 
you insert `2018-08-08 20:40:21.443732+02:00` into mysql, it will ignore the 
`+02:00` _and_ apply the connection's timezone setting (e.g. `set 
time_zone='+01:00'`). This obviously creates chaos, although not too much as 
long as you do not change the connection's timezone. In my tests I was 
actually doing so, hence they didn't pass. Strangely enough, when executed 
isolated / locally they worked for some reason, probably due to connection 
re-use. 
   
   Anyway, I borrowed your event watcher and we now make sure we always 
connect in UTC with mysql. For postgres it doesn't matter, and we can test 
that it doesn't. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] ashb commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines

2018-08-07 Thread GitBox
ashb commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code 
to conform Python standards & guidelines
URL: https://github.com/apache/incubator-airflow/pull/3714#discussion_r208377732
 
 

 ##
 File path: tests/core.py
 ##
 @@ -2423,7 +2423,7 @@ def test_init_proxy_user(self):
 class HDFSHookTest(unittest.TestCase):
 def setUp(self):
 configuration.load_test_config()
-os.environ['AIRFLOW_CONN_HDFS_DEFAULT'] = ('hdfs://localhost:8020')
+os.environ['AIRFLOW_CONN_HDFS_DEFAULT'] = 'hdfs://localhost:8020'
 
 Review comment:
   (Not related to this PR, but we have HDFS tests in core.py? o_O)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kaxil commented on issue #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines

2018-08-07 Thread GitBox
kaxil commented on issue #3714: [AIRFLOW-2867] Refactor code to conform Python 
standards & guidelines
URL: 
https://github.com/apache/incubator-airflow/pull/3714#issuecomment-411194032
 
 
   @ashb This is now ready for review. :)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] bolkedebruin closed pull request #3715: [AIRFLOW-XXX] Updating instructions about logging changes in 1.10

2018-08-07 Thread GitBox
bolkedebruin closed pull request #3715: [AIRFLOW-XXX] Updating instructions 
about logging changes in 1.10
URL: https://github.com/apache/incubator-airflow/pull/3715
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/UPDATING.md b/UPDATING.md
index f829bed3f9..4fda57663f 100644
--- a/UPDATING.md
+++ b/UPDATING.md
@@ -124,6 +124,17 @@ elasticsearch_log_id_template = {{dag_id}}-{{task_id}}-{{execution_date}}-{{try_number}}
 elasticsearch_end_of_log_mark = end_of_log
 ```
 
+The previous setting of `log_task_reader` is not needed in many cases now when using the default logging config with remote storages. (Previously it needed to be set to `s3.task` or similar. This is not needed with the default config anymore.)
+
+#### Change of per-task log path
+
+With the change to Airflow core to be timezone aware the default log path for task instances will now include timezone information. This will by default mean all previous task logs won't be found. You can get the old behaviour back by setting the following config options:
+
+```
+[core]
+log_filename_template = {{ ti.dag_id }}/{{ ti.task_id }}/{{ execution_date.strftime("%%Y-%%m-%%dT%%H:%%M:%%S") }}/{{ try_number }}.log
+```
+
 ## Airflow 1.9
 
 ### SSH Hook updates, along with new SSH Operator & SFTP Operator


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] ashb opened a new pull request #3715: [AIRFLOW-XXX] Updating instructions about logging changes in 1.10

2018-08-07 Thread GitBox
ashb opened a new pull request #3715: [AIRFLOW-XXX] Updating instructions about 
logging changes in 1.10
URL: https://github.com/apache/incubator-airflow/pull/3715
 
 
   We had a few other logging changes that weren't mentioned in here that
   meant previous logs were not viewable anymore.
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [X] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes: Mention changes about remote logging config in UPDATING.md
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason: Docs only
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kaxil commented on issue #3714: [WIP][AIRFLOW-2867] Refactor code to conform Python standards & guidelines

2018-08-07 Thread GitBox
kaxil commented on issue #3714: [WIP][AIRFLOW-2867] Refactor code to conform 
Python standards & guidelines
URL: 
https://github.com/apache/incubator-airflow/pull/3714#issuecomment-411190182
 
 
   Hi @ashb, sorry, I need to sort out a couple of things here. I have added a 
WIP flag. Will ping you as soon as it is ready for review.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] ashb commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines

2018-08-07 Thread GitBox
ashb commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code 
to conform Python standards & guidelines
URL: https://github.com/apache/incubator-airflow/pull/3714#discussion_r208372051
 
 

 ##
 File path: airflow/operators/s3_to_hive_operator.py
 ##
 @@ -261,9 +261,9 @@ def _match_headers(self, header_list):
 else:
 return True
 
+@staticmethod
 def _delete_top_row_and_compress(
-self,
-input_file_name,
+input_file_name,
 
 Review comment:
   indent


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] ashb commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines

2018-08-07 Thread GitBox
ashb commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code 
to conform Python standards & guidelines
URL: https://github.com/apache/incubator-airflow/pull/3714#discussion_r208371591
 
 

 ##
 File path: airflow/contrib/operators/dataflow_operator.py
 ##
 @@ -331,7 +331,7 @@ def execute(self, context):
 self.py_file, self.py_options)
 
 
-class GoogleCloudBucketHelper():
+class GoogleCloudBucketHelper:
 
 Review comment:
   For py2 compat this should probably be `class 
GoogleCloudBucketHelper(object)`
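   
   For reference, a small sketch of the Python 2 difference (standard language 
behaviour, nothing Airflow-specific):
   
   ```python
   class OldStyle:          # Python 2: classic class; Python 3: new-style
       pass
   
   class NewStyle(object):  # new-style under both interpreters
       pass
   
   # Under Python 2:
   #   type(OldStyle())  -> <type 'instance'>
   #   type(NewStyle())  -> <class '__main__.NewStyle'>
   ```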


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] ashb commented on issue #3708: [AIRFLOW-2859] Implement own UtcDateTime

2018-08-07 Thread GitBox
ashb commented on issue #3708: [AIRFLOW-2859] Implement own UtcDateTime
URL: 
https://github.com/apache/incubator-airflow/pull/3708#issuecomment-411189623
 
 
   Can I help?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] ashb commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines

2018-08-07 Thread GitBox
ashb commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code 
to conform Python standards & guidelines
URL: https://github.com/apache/incubator-airflow/pull/3714#discussion_r208371167
 
 

 ##
 File path: airflow/configuration.py
 ##
 @@ -186,7 +186,8 @@ def _validate(self):
 
 self.is_validated = True
 
-def _get_env_var_option(self, section, key):
+@staticmethod
+def _get_env_var_option(key):
 
 Review comment:
   What happened to `section`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] ashb commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code to conform Python standards & guidelines

2018-08-07 Thread GitBox
ashb commented on a change in pull request #3714: [AIRFLOW-2867] Refactor code 
to conform Python standards & guidelines
URL: https://github.com/apache/incubator-airflow/pull/3714#discussion_r208371759
 
 

 ##
 File path: airflow/contrib/operators/oracle_to_oracle_transfer.py
 ##
 @@ -52,10 +52,12 @@ def __init__(
 destination_table,
 oracle_source_conn_id,
 source_sql,
-source_sql_params={},
+source_sql_params=None,
 
 Review comment:
   indent?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-2867) Airflow Python Code not compatible to coding guidelines and standards

2018-08-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572266#comment-16572266
 ] 

ASF GitHub Bot commented on AIRFLOW-2867:
-

kaxil opened a new pull request #3714: [AIRFLOW-2867] Refactor code to conform 
Python standards & guidelines
URL: https://github.com/apache/incubator-airflow/pull/3714
 
 
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-2867
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   - Dictionary creation should be written by dictionary literal
   - Python’s default arguments are evaluated once when the function is 
defined, not each time the function is called (like it is in say, Ruby). This 
means that if you use a mutable default argument and mutate it, you will and 
have mutated that object for all future calls to the function as well.
   - Functions calling sets which can be replaced by set literal are now 
replaced by set literal
   - Replace list literals
   - Some of the static methods haven't been set static
   - Remove redundant parentheses
   
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   N/a, Nothing new added
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Airflow Python Code not compatible to coding guidelines and standards 
> --
>
> Key: AIRFLOW-2867
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2867
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Minor
> Fix For: 2.0.0
>
>
> Some of the Airflow code doesn't conform to python coding guidelines and 
> standards.
> The improvements I have analyzed are below:
> - Dictionary creation should be written by dictionary literal
> - Mutable default argument. Python’s default arguments are evaluated once 
> when the function is defined, not each time the function is called (like it 
> is in say, Ruby). This means that if you use a mutable default argument and 
> mutate it, you will and have mutated that object for all future calls to the 
> function as well.
> - Functions calling sets can be replaced by set literal 
> - Replace list literals
> - Some of the static methods haven't been set static
> - Redundant parentheses



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2867) Airflow Python Code not compatible to coding guidelines and standards

2018-08-07 Thread Kaxil Naik (JIRA)
Kaxil Naik created AIRFLOW-2867:
---

 Summary: Airflow Python Code not compatible to coding guidelines 
and standards 
 Key: AIRFLOW-2867
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2867
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Kaxil Naik
Assignee: Kaxil Naik
 Fix For: 2.0.0


Some of the Airflow code doesn't conform to python coding guidelines and 
standards.

The improvements I have analyzed are below:
- Dictionary creation should be written by dictionary literal
- Mutable default argument. Python’s default arguments are evaluated once when 
the function is defined, not each time the function is called (like it is in 
say, Ruby). This means that if you use a mutable default argument and mutate 
it, you will and have mutated that object for all future calls to the function 
as well (see the short demonstration after this list).
- Functions calling sets can be replaced by set literal 
- Replace list literals
- Some of the static methods haven't been set static
- Redundant parentheses
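
A short demonstration of the mutable default argument gotcha mentioned above 
(standard Python behaviour):
{code:python}
def append_bad(item, bucket=[]):
    # The single list created at definition time is shared by every call.
    bucket.append(item)
    return bucket

def append_good(item, bucket=None):
    # A None sentinel creates a fresh list per call instead.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

print(append_bad(1))   # [1]
print(append_bad(2))   # [1, 2]  <- state leaks across calls
print(append_good(1))  # [1]
print(append_good(2))  # [2]
{code}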



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] kaxil edited a comment on issue #3696: [AIRFLOW-2851] Canonicalize "as _..." etc imports

2018-08-07 Thread GitBox
kaxil edited a comment on issue #3696: [AIRFLOW-2851] Canonicalize "as _..." 
etc imports
URL: 
https://github.com/apache/incubator-airflow/pull/3696#issuecomment-411169973
 
 
   @mistercrunch I might have definitely done it on `gcs_hook`. Sorry about 
that. Not sure if I ever used `_os`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kaxil commented on issue #3696: [AIRFLOW-2851] Canonicalize "as _..." etc imports

2018-08-07 Thread GitBox
kaxil commented on issue #3696: [AIRFLOW-2851] Canonicalize "as _..." etc 
imports
URL: 
https://github.com/apache/incubator-airflow/pull/3696#issuecomment-411169973
 
 
   @mistercrunch I might have definitely done it on `gcs_hook`. Sorry about 
that. Not sure if I ever use `_os`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-2851) Canonicalize "as _..." etc imports

2018-08-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572155#comment-16572155
 ] 

ASF GitHub Bot commented on AIRFLOW-2851:
-

mistercrunch closed pull request #3696: [AIRFLOW-2851] Canonicalize "as _..." 
etc imports
URL: https://github.com/apache/incubator-airflow/pull/3696
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/configuration.py b/airflow/configuration.py
index 2e05fde0cd..ed8943ac77 100644
--- a/airflow/configuration.py
+++ b/airflow/configuration.py
@@ -36,7 +36,7 @@
 import warnings
 
 from backports.configparser import ConfigParser
-from zope.deprecation import deprecated as _deprecated
+from zope.deprecation import deprecated
 
 from airflow.exceptions import AirflowConfigException
 from airflow.utils.log.logging_mixin import LoggingMixin
@@ -534,7 +534,7 @@ def parameterized_config(template):
 
 for func in [load_test_config, get, getboolean, getfloat, getint, has_option,
              remove_option, as_dict, set]:
-    _deprecated(
+    deprecated(
         func,
         "Accessing configuration method '{f.__name__}' directly from "
         "the configuration module is deprecated. Please access the "
diff --git a/airflow/contrib/auth/backends/kerberos_auth.py b/airflow/contrib/auth/backends/kerberos_auth.py
index 08be299a19..fdb6204967 100644
--- a/airflow/contrib/auth/backends/kerberos_auth.py
+++ b/airflow/contrib/auth/backends/kerberos_auth.py
@@ -28,7 +28,7 @@
 # pykerberos should be used as it verifies the KDC, the "kerberos" module does not do so
 # and make it possible to spoof the KDC
 import kerberos
-import airflow.security.utils as utils
+from airflow.security import utils
 
 from flask import url_for, redirect
 
diff --git a/airflow/contrib/hooks/__init__.py b/airflow/contrib/hooks/__init__.py
index 95f9892d89..6c31842578 100644
--- a/airflow/contrib/hooks/__init__.py
+++ b/airflow/contrib/hooks/__init__.py
@@ -24,7 +24,7 @@
 
 
 import sys
-import os as _os
+import os
 
 # 
 #
@@ -64,7 +64,7 @@
 }
 
 
-if not _os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False):
+if not os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False):
     from airflow.utils.helpers import AirflowImporter
 
     airflow_importer = AirflowImporter(sys.modules[__name__], _hooks)
diff --git a/airflow/contrib/operators/__init__.py b/airflow/contrib/operators/__init__.py
index 3485aaa7fd..247ec5941f 100644
--- a/airflow/contrib/operators/__init__.py
+++ b/airflow/contrib/operators/__init__.py
@@ -24,7 +24,7 @@
 
 
 import sys
-import os as _os
+import os
 
 # 
 #
@@ -48,6 +48,6 @@
     'hive_to_dynamodb': ['HiveToDynamoDBTransferOperator']
 }
 
-if not _os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False):
+if not os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False):
     from airflow.utils.helpers import AirflowImporter
     airflow_importer = AirflowImporter(sys.modules[__name__], _operators)
diff --git a/airflow/hooks/__init__.py b/airflow/hooks/__init__.py
index 1f1c40f16a..38a7dcfebe 100644
--- a/airflow/hooks/__init__.py
+++ b/airflow/hooks/__init__.py
@@ -18,7 +18,7 @@
 # under the License.
 
 
-import os as _os
+import os
 import sys
 
 
@@ -67,7 +67,7 @@
 }
 
 
-if not _os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False):
+if not os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False):
     from airflow.utils.helpers import AirflowImporter
     airflow_importer = AirflowImporter(sys.modules[__name__], _hooks)
 
@@ -82,12 +82,12 @@ def _integrate_plugins():
 ##
 # TODO FIXME Remove in Airflow 2.0
 
-if not _os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False):
-    from zope.deprecation import deprecated as _deprecated
+if not os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False):
+    from zope.deprecation import deprecated
     for _hook in hooks_module._objects:
         hook_name = _hook.__name__
         globals()[hook_name] = _hook
-        _deprecated(
+        deprecated(
             hook_name,
             "Importing plugin hook '{i}' directly from "
             "'airflow.hooks' has been deprecated. Please "
diff --git a/airflow/hooks/hive_hooks.py b/airflow/hooks/hive_hooks.py
index 5ac99eec6c..8df96c464f 100644
--- a/airflow/hooks/hive_hooks.py
+++ b/airflow/hooks/hive_hooks.py
@@ -34,10 +34,10 @@
 from past.builtins import unicode
 from six.moves import zip
 
-import airflow.security.utils as utils
 from airflow import configuration
 from 
[jira] [Commented] (AIRFLOW-2851) Canonicalize "as _..." etc imports

2018-08-07 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572156#comment-16572156
 ] 

ASF subversion and git services commented on AIRFLOW-2851:
--

Commit 9131d6cc8fce515222da6e4ab9d86cce69f20d1e in incubator-airflow's branch 
refs/heads/master from Taylor D. Edmiston
[ https://gitbox.apache.org/repos/asf?p=incubator-airflow.git;h=9131d6c ]

[AIRFLOW-2851] Canonicalize "as _..." etc imports (#3696)



> Canonicalize "as _..." etc imports
> --
>
> Key: AIRFLOW-2851
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2851
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Taylor Edmiston
>Assignee: Taylor Edmiston
>Priority: Minor
>
> This PR:
> 1. Replaces `import foo as _foo` style imports with the more common `import 
> foo` used everywhere else across the codebase. I dug through history and 
> couldn't find special reasons to maintain the as style imports here (I think 
> it's just old code). Currently (33dd33c89d4b6454d224ca34bab5ae37fb9812a6), 
> there are just a handful of import lines using `as _...` vs thousands not 
> using it, so the goal here is to improve consistency.
> 2. It also simplifies `import foo.bar as bar` style imports to equivalent 
> `from foo import bar`.
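
Concretely, the two rewrites look like this (a schematic sketch based on the 
merged diff):
{code:python}
# Before: aliased and dotted-module imports
import os as _os
import airflow.security.utils as utils

# After: the canonical forms used across the rest of the codebase
import os
from airflow.security import utils
{code}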



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] mistercrunch closed pull request #3696: [AIRFLOW-2851] Canonicalize "as _..." etc imports

2018-08-07 Thread GitBox
mistercrunch closed pull request #3696: [AIRFLOW-2851] Canonicalize "as _..." 
etc imports
URL: https://github.com/apache/incubator-airflow/pull/3696
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/configuration.py b/airflow/configuration.py
index 2e05fde0cd..ed8943ac77 100644
--- a/airflow/configuration.py
+++ b/airflow/configuration.py
@@ -36,7 +36,7 @@
 import warnings
 
 from backports.configparser import ConfigParser
-from zope.deprecation import deprecated as _deprecated
+from zope.deprecation import deprecated
 
 from airflow.exceptions import AirflowConfigException
 from airflow.utils.log.logging_mixin import LoggingMixin
@@ -534,7 +534,7 @@ def parameterized_config(template):
 
 for func in [load_test_config, get, getboolean, getfloat, getint, has_option,
              remove_option, as_dict, set]:
-    _deprecated(
+    deprecated(
         func,
         "Accessing configuration method '{f.__name__}' directly from "
         "the configuration module is deprecated. Please access the "
diff --git a/airflow/contrib/auth/backends/kerberos_auth.py b/airflow/contrib/auth/backends/kerberos_auth.py
index 08be299a19..fdb6204967 100644
--- a/airflow/contrib/auth/backends/kerberos_auth.py
+++ b/airflow/contrib/auth/backends/kerberos_auth.py
@@ -28,7 +28,7 @@
 # pykerberos should be used as it verifies the KDC, the "kerberos" module does not do so
 # and make it possible to spoof the KDC
 import kerberos
-import airflow.security.utils as utils
+from airflow.security import utils
 
 from flask import url_for, redirect
 
diff --git a/airflow/contrib/hooks/__init__.py b/airflow/contrib/hooks/__init__.py
index 95f9892d89..6c31842578 100644
--- a/airflow/contrib/hooks/__init__.py
+++ b/airflow/contrib/hooks/__init__.py
@@ -24,7 +24,7 @@
 
 
 import sys
-import os as _os
+import os
 
 # 
 #
@@ -64,7 +64,7 @@
 }
 
 
-if not _os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False):
+if not os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False):
     from airflow.utils.helpers import AirflowImporter
 
     airflow_importer = AirflowImporter(sys.modules[__name__], _hooks)
diff --git a/airflow/contrib/operators/__init__.py b/airflow/contrib/operators/__init__.py
index 3485aaa7fd..247ec5941f 100644
--- a/airflow/contrib/operators/__init__.py
+++ b/airflow/contrib/operators/__init__.py
@@ -24,7 +24,7 @@
 
 
 import sys
-import os as _os
+import os
 
 # 
 #
@@ -48,6 +48,6 @@
     'hive_to_dynamodb': ['HiveToDynamoDBTransferOperator']
 }
 
-if not _os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False):
+if not os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False):
     from airflow.utils.helpers import AirflowImporter
     airflow_importer = AirflowImporter(sys.modules[__name__], _operators)
diff --git a/airflow/hooks/__init__.py b/airflow/hooks/__init__.py
index 1f1c40f16a..38a7dcfebe 100644
--- a/airflow/hooks/__init__.py
+++ b/airflow/hooks/__init__.py
@@ -18,7 +18,7 @@
 # under the License.
 
 
-import os as _os
+import os
 import sys
 
 
@@ -67,7 +67,7 @@
 }
 
 
-if not _os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False):
+if not os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False):
     from airflow.utils.helpers import AirflowImporter
     airflow_importer = AirflowImporter(sys.modules[__name__], _hooks)
 
@@ -82,12 +82,12 @@ def _integrate_plugins():
 ##
 # TODO FIXME Remove in Airflow 2.0
 
-if not _os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False):
-    from zope.deprecation import deprecated as _deprecated
+if not os.environ.get('AIRFLOW_USE_NEW_IMPORTS', False):
+    from zope.deprecation import deprecated
     for _hook in hooks_module._objects:
         hook_name = _hook.__name__
         globals()[hook_name] = _hook
-        _deprecated(
+        deprecated(
             hook_name,
             "Importing plugin hook '{i}' directly from "
             "'airflow.hooks' has been deprecated. Please "
diff --git a/airflow/hooks/hive_hooks.py b/airflow/hooks/hive_hooks.py
index 5ac99eec6c..8df96c464f 100644
--- a/airflow/hooks/hive_hooks.py
+++ b/airflow/hooks/hive_hooks.py
@@ -34,10 +34,10 @@
 from past.builtins import unicode
 from six.moves import zip
 
-import airflow.security.utils as utils
 from airflow import configuration
 from airflow.exceptions import AirflowException
 from airflow.hooks.base_hook import BaseHook
+from airflow.security import utils
 from airflow.utils.file import TemporaryDirectory
 from airflow.utils.helpers import as_flattened_list
 from 

[GitHub] mistercrunch commented on issue #3696: [AIRFLOW-2851] Canonicalize "as _..." etc imports

2018-08-07 Thread GitBox
mistercrunch commented on issue #3696: [AIRFLOW-2851] Canonicalize "as _..." 
etc imports
URL: 
https://github.com/apache/incubator-airflow/pull/3696#issuecomment-411163359
 
 
   I'm unclear as to why contributors went out of their way to alias imports, 
but it appears to be 100% unnecessary. `git blame` shows @kaxil as one of the 
people doing this on `_os`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (AIRFLOW-2866) Missing CSRF Token Error on Web UI Create/Update Operations

2018-08-07 Thread Jasper Kahn (JIRA)
Jasper Kahn created AIRFLOW-2866:


 Summary: Missing CSRF Token Error on Web UI Create/Update 
Operations
 Key: AIRFLOW-2866
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2866
 Project: Apache Airflow
  Issue Type: Bug
  Components: webapp
Reporter: Jasper Kahn


Attempting to modify or delete many resources (such as Connections or Users) 
results in a 400 from the webserver:
{quote}{{Bad Request}}
{{The CSRF session token is missing.}}{quote}
Logs report:
{quote}{{[2018-08-07 18:45:15,771] \{csrf.py:251} INFO - The CSRF session token 
is missing.}}
{{192.168.9.1 - - [07/Aug/2018:18:45:15 +0000] "POST /admin/connection/delete/ 
HTTP/1.1" 400 150 "http://localhost:8081/admin/connection/" "Mozilla/5.0 (X11; 
Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 
Safari/537.36"}}{quote}
Chrome dev tools show the CSRF token is present in the request payload.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2865) Race condition between on_success_callback and LocalTaskJob's cleanup

2018-08-07 Thread Marcin Mejran (JIRA)
Marcin Mejran created AIRFLOW-2865:
--

 Summary: Race condition between on_success_callback and 
LocalTaskJob's cleanup
 Key: AIRFLOW-2865
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2865
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Marcin Mejran


The TaskInstance's run_raw_task method first records SUCCESS for the task 
instance and then runs the on_success_callback function.

The LocalTaskJob's heartbeat_callback checks for any TI's with a SUCCESS state 
and terminates their processes.

As such it's possible for the TI process to be terminated before the 
on_success_callback function finishes running.
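
A schematic of the ordering problem (illustrative pseudocode with hypothetical 
names, not the actual Airflow source):
{code:python}
class State:
    SUCCESS = "success"

def run_raw_task(ti, session, on_success_callback):
    ti.state = State.SUCCESS
    session.commit()            # (1) SUCCESS becomes visible to LocalTaskJob
    on_success_callback(ti)     # (2) starts only after (1)

def heartbeat_callback(job, ti):
    if ti.state == State.SUCCESS:
        job.terminate_process()  # may fire between (1) and (2), killing the
                                 # task process mid-callback
{code}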



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2140) Add Kubernetes Scheduler to Spark Submit Operator

2018-08-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572143#comment-16572143
 ] 

ASF GitHub Bot commented on AIRFLOW-2140:
-

bolkedebruin closed pull request #3700: [AIRFLOW-2140] Don't require kubernetes 
for the SparkSubmit hook
URL: https://github.com/apache/incubator-airflow/pull/3700
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/hooks/spark_submit_hook.py b/airflow/contrib/hooks/spark_submit_hook.py
index 0185cab283..65bb6134e6 100644
--- a/airflow/contrib/hooks/spark_submit_hook.py
+++ b/airflow/contrib/hooks/spark_submit_hook.py
@@ -26,7 +26,6 @@
 from airflow.exceptions import AirflowException
 from airflow.utils.log.logging_mixin import LoggingMixin
 from airflow.contrib.kubernetes import kube_client
-from kubernetes.client.rest import ApiException
 
 
 class SparkSubmitHook(BaseHook, LoggingMixin):
@@ -136,6 +135,10 @@ def __init__(self,
         self._connection = self._resolve_connection()
         self._is_yarn = 'yarn' in self._connection['master']
         self._is_kubernetes = 'k8s' in self._connection['master']
+        if self._is_kubernetes and kube_client is None:
+            raise RuntimeError(
+                "{master} specified by kubernetes dependencies are not "
+                "installed!".format(self._connection['master']))
 
         self._should_track_driver_status = self._resolve_should_track_driver_status()
         self._driver_id = None
@@ -559,6 +562,6 @@ def on_kill(self):
 
             self.log.info("Spark on K8s killed with response: %s", api_response)
 
-        except ApiException as e:
+        except kube_client.ApiException as e:
             self.log.info("Exception when attempting to kill Spark on K8s:")
             self.log.exception(e)
diff --git a/airflow/contrib/kubernetes/kube_client.py b/airflow/contrib/kubernetes/kube_client.py
index 8b71f41242..4b8fa17155 100644
--- a/airflow/contrib/kubernetes/kube_client.py
+++ b/airflow/contrib/kubernetes/kube_client.py
@@ -17,9 +17,21 @@
 from airflow.configuration import conf
 from six import PY2
 
+try:
+    from kubernetes import config, client
+    from kubernetes.client.rest import ApiException
+    has_kubernetes = True
+except ImportError as e:
+    # We need an exception class to be able to use it in ``except`` elsewhere
+    # in the code base
+    ApiException = BaseException
+    has_kubernetes = False
+    _import_err = e
+
 
 def _load_kube_config(in_cluster, cluster_context, config_file):
-    from kubernetes import config, client
+    if not has_kubernetes:
+        raise _import_err
     if in_cluster:
         config.load_incluster_config()
     else:


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Kubernetes Scheduler to Spark Submit Operator
> -
>
> Key: AIRFLOW-2140
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2140
> Project: Apache Airflow
>  Issue Type: New Feature
>Affects Versions: 1.9.0
>Reporter: Rob Keevil
>Assignee: Rob Keevil
>Priority: Major
> Fix For: 2.0.0
>
>
> Spark 2.3 adds the Kubernetes resource manager to Spark, alongside the 
> existing Standalone, Yarn and Mesos resource managers. 
> https://github.com/apache/spark/blob/master/docs/running-on-kubernetes.md
> We should extend the spark submit operator to enable the new K8s spark submit 
> options, and to be able to monitor Spark jobs running within Kubernetes.
> I already have working code for this, I need to test the monitoring/log 
> parsing code and make sure that Airflow is able to terminate Kubernetes pods 
> when jobs are cancelled etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2140) Add Kubernetes Scheduler to Spark Submit Operator

2018-08-07 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572144#comment-16572144
 ] 

ASF subversion and git services commented on AIRFLOW-2140:
--

Commit 0be002eebb182b607109a0390d7f6fb8795c668b in incubator-airflow's branch 
refs/heads/master from [~ashb]
[ https://gitbox.apache.org/repos/asf?p=incubator-airflow.git;h=0be002e ]

[AIRFLOW-2140] Don't require kubernetes for the SparkSubmit hook (#3700)

This extra dep is a quasi-breaking change when upgrading - previously
there were no deps outside of Airflow itself for this hook. Importing
the k8s libs breaks installs that aren't also using Kubernetes.

This makes the dep optional for anyone who doesn't explicitly use the
functionality

> Add Kubernetes Scheduler to Spark Submit Operator
> -
>
> Key: AIRFLOW-2140
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2140
> Project: Apache Airflow
>  Issue Type: New Feature
>Affects Versions: 1.9.0
>Reporter: Rob Keevil
>Assignee: Rob Keevil
>Priority: Major
> Fix For: 2.0.0
>
>
> Spark 2.3 adds the Kubernetes resource manager to Spark, alongside the 
> existing Standalone, Yarn and Mesos resource managers. 
> https://github.com/apache/spark/blob/master/docs/running-on-kubernetes.md
> We should extend the spark submit operator to enable the new K8s spark submit 
> options, and to be able to monitor Spark jobs running within Kubernetes.
> I already have working code for this, I need to test the monitoring/log 
> parsing code and make sure that Airflow is able to terminate Kubernetes pods 
> when jobs are cancelled etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] bolkedebruin closed pull request #3700: [AIRFLOW-2140] Don't require kubernetes for the SparkSubmit hook

2018-08-07 Thread GitBox
bolkedebruin closed pull request #3700: [AIRFLOW-2140] Don't require kubernetes 
for the SparkSubmit hook
URL: https://github.com/apache/incubator-airflow/pull/3700
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/hooks/spark_submit_hook.py b/airflow/contrib/hooks/spark_submit_hook.py
index 0185cab283..65bb6134e6 100644
--- a/airflow/contrib/hooks/spark_submit_hook.py
+++ b/airflow/contrib/hooks/spark_submit_hook.py
@@ -26,7 +26,6 @@
 from airflow.exceptions import AirflowException
 from airflow.utils.log.logging_mixin import LoggingMixin
 from airflow.contrib.kubernetes import kube_client
-from kubernetes.client.rest import ApiException
 
 
 class SparkSubmitHook(BaseHook, LoggingMixin):
@@ -136,6 +135,10 @@ def __init__(self,
         self._connection = self._resolve_connection()
         self._is_yarn = 'yarn' in self._connection['master']
         self._is_kubernetes = 'k8s' in self._connection['master']
+        if self._is_kubernetes and kube_client is None:
+            raise RuntimeError(
+                "{master} specified by kubernetes dependencies are not "
+                "installed!".format(self._connection['master']))
 
         self._should_track_driver_status = self._resolve_should_track_driver_status()
         self._driver_id = None
@@ -559,6 +562,6 @@ def on_kill(self):
 
             self.log.info("Spark on K8s killed with response: %s", api_response)
 
-        except ApiException as e:
+        except kube_client.ApiException as e:
             self.log.info("Exception when attempting to kill Spark on K8s:")
             self.log.exception(e)
diff --git a/airflow/contrib/kubernetes/kube_client.py b/airflow/contrib/kubernetes/kube_client.py
index 8b71f41242..4b8fa17155 100644
--- a/airflow/contrib/kubernetes/kube_client.py
+++ b/airflow/contrib/kubernetes/kube_client.py
@@ -17,9 +17,21 @@
 from airflow.configuration import conf
 from six import PY2
 
+try:
+    from kubernetes import config, client
+    from kubernetes.client.rest import ApiException
+    has_kubernetes = True
+except ImportError as e:
+    # We need an exception class to be able to use it in ``except`` elsewhere
+    # in the code base
+    ApiException = BaseException
+    has_kubernetes = False
+    _import_err = e
+
 
 def _load_kube_config(in_cluster, cluster_context, config_file):
-    from kubernetes import config, client
+    if not has_kubernetes:
+        raise _import_err
     if in_cluster:
         config.load_incluster_config()
     else:


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jakahn commented on issue #3677: [AIRFLOW-2826] Add GoogleCloudKMSHook

2018-08-07 Thread GitBox
jakahn commented on issue #3677: [AIRFLOW-2826] Add GoogleCloudKMSHook
URL: 
https://github.com/apache/incubator-airflow/pull/3677#issuecomment-411156427
 
 
   @Fokko PTAL when you can


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-852) Docs: Cancel a triggered dag

2018-08-07 Thread Quijano Flores (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572113#comment-16572113
 ] 

Quijano Flores commented on AIRFLOW-852:


I get an error when I do this [~salad_king]

`AttributeError: 'NoneType' object has no attribute 'is_paused'`

```
Traceback (most recent call last):
 File "/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1988, in 
wsgi_app
 response = self.full_dispatch_request()
 File "/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1641, in 
full_dispatch_request
 rv = self.handle_user_exception(e)
 File "/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1544, in 
handle_user_exception
 reraise(exc_type, exc_value, tb)
 File "/anaconda3/lib/python3.6/site-packages/flask/_compat.py", line 33, in 
reraise
 raise value
 File "/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1639, in 
full_dispatch_request
 rv = self.dispatch_request()
 File "/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1625, in 
dispatch_request
 return self.view_functions[rule.endpoint](**req.view_args)
 File "/anaconda3/lib/python3.6/site-packages/flask_admin/base.py", line 69, in 
inner
 return self._run_view(f, *args, **kwargs)
 File "/anaconda3/lib/python3.6/site-packages/flask_admin/base.py", line 368, 
in _run_view
 return fn(self, *args, **kwargs)
 File "/anaconda3/lib/python3.6/site-packages/flask_login.py", line 755, in 
decorated_view
 return func(*args, **kwargs)
 File "/anaconda3/lib/python3.6/site-packages/airflow/www/utils.py", line 125, 
in wrapper
 return f(*args, **kwargs)
 File "/anaconda3/lib/python3.6/site-packages/airflow/www/views.py", line 1606, 
in paused
 orm_dag.is_paused = True
```

> Docs: Cancel a triggered dag
> 
>
> Key: AIRFLOW-852
> URL: https://issues.apache.org/jira/browse/AIRFLOW-852
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: docs
>Reporter: Andi Pl
>Priority: Major
>
> Issue: Cancel an externally triggered dag.
> The dag is running and contains 200 parallel tasks. Only 3 are run at once, 
> as the celery executor is limited to 3 in parallel.
> 23 tasks are finished till now.
> We want to cancel the execution of the missing tasks. But they are not queued 
> yet. How can we  cancel them?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] tedmiston commented on issue #3696: [AIRFLOW-2851] Canonicalize "as _..." etc imports

2018-08-07 Thread GitBox
tedmiston commented on issue #3696: [AIRFLOW-2851] Canonicalize "as _..." etc 
imports
URL: 
https://github.com/apache/incubator-airflow/pull/3696#issuecomment-411139302
 
 
   Thank you both for the review and initial comments!  It sounds like the next 
step here is probably to get a quick thought from Max when he has a chance.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] verdan commented on issue #3656: [WIP][AIRFLOW-2803] Fix all ESLint issues

2018-08-07 Thread GitBox
verdan commented on issue #3656: [WIP][AIRFLOW-2803] Fix all ESLint issues
URL: 
https://github.com/apache/incubator-airflow/pull/3656#issuecomment-411138672
 
 
   @tedmiston sounds good to me!!  


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] tedmiston commented on issue #3587: [WIP][AIRFLOW-2732] Remove hooks and operators from core

2018-08-07 Thread GitBox
tedmiston commented on issue #3587: [WIP][AIRFLOW-2732] Remove hooks and 
operators from core
URL: 
https://github.com/apache/incubator-airflow/pull/3587#issuecomment-411135195
 
 
   Closing this PR for now as the next step for this issue is drafting an AIP.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] tedmiston commented on issue #3656: [WIP][AIRFLOW-2803] Fix all ESLint issues

2018-08-07 Thread GitBox
tedmiston commented on issue #3656: [WIP][AIRFLOW-2803] Fix all ESLint issues
URL: 
https://github.com/apache/incubator-airflow/pull/3656#issuecomment-411134901
 
 
   Hey all - thanks for bumping.  Sorry for any delay here — my work sprint has 
been quite busy this past week.
   
   This PR is actually very nearly done.  I have a local WIP commit to be 
pushed up with a small bit of cleanup left IIRC.  Given that, my preference 
would be to wrap up and merge this PR since I've already done ~95% of the work, 
then rebase/update @verdan's extraction PR above (hopefully this doesn't cause 
much extra work there).  This is something I can get done very soon.
   
   I'm also happy to help review/test/etc on the extraction PR as well.  Does 
this work for you @verdan ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-2863) GKEClusterHook catches wrong exception

2018-08-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16571966#comment-16571966
 ] 

ASF GitHub Bot commented on AIRFLOW-2863:
-----------------------------------------

feng-tao closed pull request #3711: [AIRFLOW-2863] Fix GKEClusterHook catching 
wrong exception
URL: https://github.com/apache/incubator-airflow/pull/3711
 
 
   


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[jira] [Commented] (AIRFLOW-2863) GKEClusterHook catches wrong exception

2018-08-07 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16571967#comment-16571967
 ] 

ASF subversion and git services commented on AIRFLOW-2863:
----------------------------------------------------------

Commit 142a9425db3675fc433b0dbbb637f53638e70a8a in incubator-airflow's branch 
refs/heads/master from [~noremac201]
[ https://gitbox.apache.org/repos/asf?p=incubator-airflow.git;h=142a942 ]

[AIRFLOW-2863] Fix GKEClusterHook catching wrong exception (#3711)



> GKEClusterHook catches wrong exception
> --------------------------------------
>
> Key: AIRFLOW-2863
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2863
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Cameron Moberg
>Assignee: Cameron Moberg
>Priority: Minor
>
> Instead of successfully catching the error and reporting success, it reports 
> a failure, since it catches the wrong error.
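
A generic illustration of that failure mode (hypothetical exception classes, not the hook's real ones): an `except` clause only matches the raised class or one of its superclasses, so catching an unrelated `NotFound` lets the real one propagate as an error.

```python
class ClientNotFound(Exception):   # stands in for what the client actually raises
    pass

class WrapperNotFound(Exception):  # stands in for what the old code caught
    pass

try:
    raise ClientNotFound('cluster gone')
except WrapperNotFound:
    print('Assuming Success')      # never reached: the two classes are unrelated
```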



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] feng-tao closed pull request #3711: [AIRFLOW-2863] Fix GKEClusterHook catching wrong exception

2018-08-07 Thread GitBox
feng-tao closed pull request #3711: [AIRFLOW-2863] Fix GKEClusterHook catching 
wrong exception
URL: https://github.com/apache/incubator-airflow/pull/3711
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/hooks/gcp_container_hook.py b/airflow/contrib/hooks/gcp_container_hook.py
index d36d796d76..227cd3d124 100644
--- a/airflow/contrib/hooks/gcp_container_hook.py
+++ b/airflow/contrib/hooks/gcp_container_hook.py
@@ -23,7 +23,7 @@
 from airflow import AirflowException, version
 from airflow.hooks.base_hook import BaseHook
 
-from google.api_core.exceptions import AlreadyExists
+from google.api_core.exceptions import AlreadyExists, NotFound
 from google.api_core.gapic_v1.method import DEFAULT
 from google.cloud import container_v1, exceptions
 from google.cloud.container_v1.gapic.enums import Operation
@@ -141,7 +141,7 @@ def delete_cluster(self, name, retry=DEFAULT, timeout=DEFAULT):
             op = self.wait_for_operation(op)
             # Returns server-defined url for the resource
             return op.self_link
-        except exceptions.NotFound as error:
+        except NotFound as error:
             self.log.info('Assuming Success: ' + error.message)
 
     def create_cluster(self, cluster, retry=DEFAULT, timeout=DEFAULT):
diff --git a/tests/contrib/hooks/test_gcp_container_hook.py b/tests/contrib/hooks/test_gcp_container_hook.py
index f3705ea4ce..6e13461395 100644
--- a/tests/contrib/hooks/test_gcp_container_hook.py
+++ b/tests/contrib/hooks/test_gcp_container_hook.py
@@ -61,6 +61,22 @@ def test_delete_cluster(self, wait_mock, convert_mock):
         wait_mock.assert_called_with(client_delete.return_value)
         convert_mock.assert_not_called()
 
+    @mock.patch(
+        "airflow.contrib.hooks.gcp_container_hook.GKEClusterHook.log")
+    @mock.patch("airflow.contrib.hooks.gcp_container_hook.GKEClusterHook._dict_to_proto")
+    @mock.patch(
+        "airflow.contrib.hooks.gcp_container_hook.GKEClusterHook.wait_for_operation")
+    def test_delete_cluster_not_found(self, wait_mock, convert_mock, log_mock):
+        from google.api_core.exceptions import NotFound
+        # To force an error
+        message = 'Not Found'
+        self.gke_hook.client.delete_cluster.side_effect = NotFound(message=message)
+
+        self.gke_hook.delete_cluster(None)
+        wait_mock.assert_not_called()
+        convert_mock.assert_not_called()
+        log_mock.info.assert_any_call("Assuming Success: " + message)
+
     @mock.patch("airflow.contrib.hooks.gcp_container_hook.GKEClusterHook._dict_to_proto")
     @mock.patch(
         "airflow.contrib.hooks.gcp_container_hook.GKEClusterHook.wait_for_operation")
@@ -107,7 +123,7 @@ def test_create_cluster_proto(self, wait_mock, convert_mock):
 
     @mock.patch("airflow.contrib.hooks.gcp_container_hook.GKEClusterHook._dict_to_proto")
     @mock.patch(
         "airflow.contrib.hooks.gcp_container_hook.GKEClusterHook.wait_for_operation")
-    def test_delete_cluster_dict(self, wait_mock, convert_mock):
+    def test_create_cluster_dict(self, wait_mock, convert_mock):
         mock_cluster_dict = {'name': CLUSTER_NAME}
         retry_mock, timeout_mock = mock.Mock(), mock.Mock()
 
@@ -135,6 +151,22 @@ def test_create_cluster_error(self, wait_mock, convert_mock):
         wait_mock.assert_not_called()
         convert_mock.assert_not_called()
 
+    @mock.patch(
+        "airflow.contrib.hooks.gcp_container_hook.GKEClusterHook.log")
+    @mock.patch("airflow.contrib.hooks.gcp_container_hook.GKEClusterHook._dict_to_proto")
+    @mock.patch(
+        "airflow.contrib.hooks.gcp_container_hook.GKEClusterHook.wait_for_operation")
+    def test_create_cluster_already_exists(self, wait_mock, convert_mock, log_mock):
+        from google.api_core.exceptions import AlreadyExists
+        # To force an error
+        message = 'Already Exists'
+        self.gke_hook.client.create_cluster.side_effect = AlreadyExists(message=message)
+
+        self.gke_hook.create_cluster({})
+        wait_mock.assert_not_called()
+        self.assertEquals(convert_mock.call_count, 1)
+        log_mock.info.assert_any_call("Assuming Success: " + message)
+
 
 class GKEClusterHookGetTest(unittest.TestCase):
     def setUp(self):


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] feng-tao commented on issue #3711: [AIRFLOW-2863] Fix GKEClusterHook catching wrong exception

2018-08-07 Thread GitBox
feng-tao commented on issue #3711: [AIRFLOW-2863] Fix GKEClusterHook catching 
wrong exception
URL: 
https://github.com/apache/incubator-airflow/pull/3711#issuecomment-411127557
 
 
   LGTM


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Noremac201 commented on issue #3711: [AIRFLOW-2863] Fix GKEClusterHook catching wrong exception

2018-08-07 Thread GitBox
Noremac201 commented on issue #3711: [AIRFLOW-2863] Fix GKEClusterHook catching 
wrong exception
URL: 
https://github.com/apache/incubator-airflow/pull/3711#issuecomment-49591
 
 
   Done!
   Thanks


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] awelsh93 commented on issue #3710: [AIRFLOW-2860] Update tests for druid hook

2018-08-07 Thread GitBox
awelsh93 commented on issue #3710: [AIRFLOW-2860] Update tests for druid hook
URL: 
https://github.com/apache/incubator-airflow/pull/3710#issuecomment-47114
 
 
   Thanks for merging this - can the change in #3707 get another review, please? 
It was merged but then reverted due to failing tests, which have now been fixed 
in this pull request.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] feng-tao commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file()

2018-08-07 Thread GitBox
feng-tao commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron 
Expression Validity in DagBag.process_file()
URL: https://github.com/apache/incubator-airflow/pull/3698#discussion_r208296250
 
 

 ##########
 File path: tests/models.py
 ##########
 @@ -56,7 +56,7 @@
 from airflow.utils.trigger_rule import TriggerRule
 from mock import patch, ANY
 from parameterized import parameterized
-from tempfile import NamedTemporaryFile
+from tempfile import NamedTemporaryFile, mkdtemp
 
 Review comment:
   small nit: could you move `mkdtemp` before `NamedTemporaryFile`, given we put 
lower-case imports before upper-case ones (L57)?
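
For concreteness, the ordering being asked for would look like this (a one-line sketch, not the PR's actual diff):

```python
# lower-case names sorted before upper-case ones
from tempfile import mkdtemp, NamedTemporaryFile
```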


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] ashb commented on a change in pull request #3713: [AIRFLOW-2428] Add AutoScalingRole key to emr_hook

2018-08-07 Thread GitBox
ashb commented on a change in pull request #3713: [AIRFLOW-2428] Add 
AutoScalingRole key to emr_hook
URL: https://github.com/apache/incubator-airflow/pull/3713#discussion_r208267811
 
 

 ##########
 File path: airflow/contrib/hooks/emr_hook.py
 ##########
 @@ -63,6 +63,8 @@ def create_job_flow(self, job_flow_overrides):
              VisibleToAllUsers=config.get('VisibleToAllUsers'),
              JobFlowRole=config.get('JobFlowRole'),
              ServiceRole=config.get('ServiceRole'),
 +            AutoScalingRole=config.get('AutoScalingRole'),
 +            ScaleDownBehavior=config.get('ScaleDownBehavior'),
              Tags=config.get('Tags')
 
 Review comment:
   Hmmm, I wonder if it might be an idea to do 
`self.get_conn().run_job_flow(**config)` so that we don't have to stay 
up to date with the AWS API.
   
   For that to work we'd need three `config.setdefault()` calls for the three 
list-typed arguments. I also don't know if all the kwargs are optional or not 
(but I suspect they are).
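
A rough sketch of that idea (the connection lookup and the choice of list-typed kwargs are assumptions here, not code from the PR):

```python
def create_job_flow(self, job_flow_overrides):
    # Merge connection-level config with per-run overrides, then forward
    # everything, so new AWS API keys need no hook changes.
    config = self.get_connection(self.emr_conn_id).extra_dejson.copy()
    config.update(job_flow_overrides)
    # boto3's run_job_flow expects lists for these when present; which
    # kwargs need defaulting is an assumption.
    for key in ('Steps', 'BootstrapActions', 'Tags'):
        config.setdefault(key, [])
    return self.get_conn().run_job_flow(**config)
```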


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-2428) Add AutoScalingRole key to emr_hook

2018-08-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16571799#comment-16571799
 ] 

ASF GitHub Bot commented on AIRFLOW-2428:
-----------------------------------------

dmnpignaud opened a new pull request #3713: [AIRFLOW-2428] Add AutoScalingRole 
key to emr_hook
URL: https://github.com/apache/incubator-airflow/pull/3713
 
 
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW-2428)
   
   ### Description
   
   - [x] Enable the use of the autoscaling functionality when creating an EMR 
cluster
   
   ### Tests
   
   - [ ] None
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add AutoScalingRole key to emr_hook
> -----------------------------------
>
> Key: AIRFLOW-2428
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2428
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Reporter: Kyle Hamlin
>Priority: Minor
> Fix For: 1.10.0
>
>
> Need to be able to pass the `AutoScalingRole` param to the `run_job_flow` 
> method for EMR autoscaling to work.
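
For illustration, a hypothetical `job_flow_overrides` once the key is forwarded (the role name shown is the AWS default, not something specified in this issue):

```python
JOB_FLOW_OVERRIDES = {
    'Name': 'autoscaling-cluster',
    'AutoScalingRole': 'EMR_AutoScaling_DefaultRole',     # AWS default role name
    'ScaleDownBehavior': 'TERMINATE_AT_TASK_COMPLETION',
}
```

This dict would be passed as the `job_flow_overrides` argument of `EmrCreateJobFlowOperator`.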



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

