[GitHub] Rotemlofer commented on issue #2740: [AIRFLOW-1768] add if to show trigger only if not paused

2019-01-02 Thread GitBox
Rotemlofer commented on issue #2740: [AIRFLOW-1768] add if to show trigger only 
if not paused
URL: 
https://github.com/apache/incubator-airflow/pull/2740#issuecomment-450804541
 
 
   This is a nice tweak! I was confused by it several times. It would be nice 
to have this.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] milton0825 closed pull request #3478: [PTAL][AIRFLOW-2571] airflow backfill progress bar

2019-01-02 Thread GitBox
milton0825 closed pull request #3478: [PTAL][AIRFLOW-2571] airflow backfill 
progress bar
URL: https://github.com/apache/incubator-airflow/pull/3478
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/bin/cli.py b/airflow/bin/cli.py
index b56e325327..949a18c419 100644
--- a/airflow/bin/cli.py
+++ b/airflow/bin/cli.py
@@ -215,6 +215,7 @@ def backfill(args, dag=None):
         verbose=args.verbose,
         conf=run_conf,
         rerun_failed_tasks=args.rerun_failed_tasks,
+        show_progress_bar=args.progress_bar,
     )
 
 
@@ -1388,7 +1389,12 @@ class CLIFactory(object):
                 "all the failed tasks for the backfill date range "
                 "instead of throwing exceptions"),
             "store_true"),
-
+        'progress_bar': Arg(
+            ("--progress_bar", "-pbar"),
+            (
+                "Show backfill progress"
+            ),
+            "store_true"),
         # list_tasks
         'tree': Arg(("-t", "--tree"), "Tree view", "store_true"),
         # list_dags
@@ -1716,7 +1722,7 @@ class CLIFactory(object):
             'mark_success', 'local', 'donot_pickle',
             'bf_ignore_dependencies', 'bf_ignore_first_depends_on_past',
             'subdir', 'pool', 'delay_on_limit', 'dry_run', 'verbose', 'conf',
-            'reset_dag_run', 'rerun_failed_tasks',
+            'reset_dag_run', 'rerun_failed_tasks', 'progress_bar',
         )
     }, {
         'func': list_tasks,
diff --git a/airflow/executors/celery_executor.py b/airflow/executors/celery_executor.py
index 6cfd2d3769..b3d2b18be7 100644
--- a/airflow/executors/celery_executor.py
+++ b/airflow/executors/celery_executor.py
@@ -56,8 +56,14 @@ def execute_command(command):
     log.info("Executing command in Celery: %s", command)
     env = os.environ.copy()
     try:
-        subprocess.check_call(command, shell=True, stderr=subprocess.STDOUT,
-                              close_fds=True, env=env)
+        output = subprocess.check_output(
+            command,
+            shell=True,
+            close_fds=True,
+            stderr=subprocess.STDOUT,
+            env=env,
+        )
+        log.info(output)
     except subprocess.CalledProcessError as e:
         log.exception('execute_command encountered a CalledProcessError')
         log.error(e.output)
diff --git a/airflow/executors/dask_executor.py b/airflow/executors/dask_executor.py
index a6ba677f8b..26b90f81d7 100644
--- a/airflow/executors/dask_executor.py
+++ b/airflow/executors/dask_executor.py
@@ -64,7 +64,13 @@ def execute_async(self, key, command, queue=None, executor_config=None):
         )
 
         def airflow_run():
-            return subprocess.check_call(command, shell=True, close_fds=True)
+            output = subprocess.check_output(
+                command,
+                shell=True,
+                close_fds=True,
+            )
+            self.log.info(output)
+            return 0
 
         future = self.client.submit(airflow_run, pure=False)
         self.futures[future] = key
diff --git a/airflow/executors/local_executor.py b/airflow/executors/local_executor.py
index 0c85262324..0cdaaff026 100644
--- a/airflow/executors/local_executor.py
+++ b/airflow/executors/local_executor.py
@@ -84,7 +84,12 @@ def execute_work(self, key, command):
         self.log.info("%s running %s", self.__class__.__name__, command)
         command = "exec bash -c '{0}'".format(command)
         try:
-            subprocess.check_call(command, shell=True, close_fds=True)
+            output = subprocess.check_output(
+                command,
+                shell=True,
+                close_fds=True,
+            )
+            self.log.info(output)
             state = State.SUCCESS
         except subprocess.CalledProcessError as e:
             state = State.FAILED
diff --git a/airflow/executors/sequential_executor.py b/airflow/executors/sequential_executor.py
index 9c0d8ecf0c..f76217014a 100644
--- a/airflow/executors/sequential_executor.py
+++ b/airflow/executors/sequential_executor.py
@@ -45,7 +45,12 @@ def sync(self):
         self.log.info("Executing command: %s", command)
 
         try:
-            subprocess.check_call(command, shell=True, close_fds=True)
+            output = subprocess.check_output(
+                command,
+                shell=True,
+                close_fds=True,
+            )
+            self.log.info(output)
             self.change_state(key, State.SUCCESS)
         except subprocess.CalledProcessError as e:
             self.change_state(key, State.FAILED)
diff --git a/airflow/jobs.py b/airflow/jobs.py
index ad114abda3..a4691d67fa 100644
--- 
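
The executor changes in the diff above replace `subprocess.check_call` with `subprocess.check_output` so the command's output can be captured and logged rather than discarded. A minimal standalone sketch of that difference (placeholder command, not code from the PR):

```python
import subprocess

# Placeholder command; a real executor would receive an "airflow run ..." string.
cmd = "echo 'task finished'"

# check_call only verifies the exit status; the child's stdout goes straight
# to the parent's stdout and cannot be forwarded to a logger.
subprocess.check_call(cmd, shell=True, close_fds=True)

# check_output captures stdout (and stderr, when redirected) and still raises
# CalledProcessError on a non-zero exit, so the executor can log the output.
output = subprocess.check_output(cmd, shell=True, close_fds=True,
                                 stderr=subprocess.STDOUT)
print(output)
```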

[jira] [Commented] (AIRFLOW-2571) airflow backfill progress bar

2019-01-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731837#comment-16731837
 ] 

ASF GitHub Bot commented on AIRFLOW-2571:
-

milton0825 commented on pull request #3478: [PTAL][AIRFLOW-2571] airflow 
backfill progress bar
URL: https://github.com/apache/incubator-airflow/pull/3478
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> airflow backfill progress bar
> -
>
> Key: AIRFLOW-2571
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2571
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> Improve airflow backfill to show percentage completed like:
>  100% ||
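
For illustration, a rough sketch of how a backfill loop could render such a percentage display; the bar characters and task counts are made-up placeholders, not taken from the PR:

```python
import sys
import time

def render_progress(finished, total, width=40):
    """Write a one-line progress bar such as ' 75%|##############      |'."""
    fraction = finished / float(total)
    filled = int(width * fraction)
    bar = "#" * filled + " " * (width - filled)
    sys.stdout.write("\r{:3.0f}%|{}|".format(fraction * 100, bar))
    sys.stdout.flush()

# Hypothetical usage inside a backfill loop:
total_tasks = 120
for done in range(total_tasks + 1):
    render_progress(done, total_tasks)
    time.sleep(0.01)
print()
```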



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3614) Operators loading to GCS - to support CSV files

2019-01-02 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731869#comment-16731869
 ] 

jack commented on AIRFLOW-3614:
---

Some work was started with https://issues.apache.org/jira/browse/AIRFLOW-2224  
but this was never merged.

> Operators loading to GCS - to support CSV files
> ---
>
> Key: AIRFLOW-3614
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3614
> Project: Apache Airflow
>  Issue Type: Task
>Affects Versions: 1.10.1, 2.0.0
>Reporter: jack
>Priority: Major
>
> Currently, only {color:#6f42c1}BigQueryToCloudStorageOperator{color} supports 
> both json and csv.
> The rest of the operators:
> {color:#6f42c1}CassandraToGoogleCloudStorageOperator{color}
> {color:#6f42c1}MySqlToGoogleCloudStorageOperator{color}
> etc..
> Support only json format.
>  
> It would be great if an {color:#032f62}export_format{color} param were 
> introduced (default json) and the user were able to set it to CSV if they 
> wish.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] Fokko commented on issue #4298: [AIRFLOW-3478] Make sure that the session is closed

2019-01-02 Thread GitBox
Fokko commented on issue #4298: [AIRFLOW-3478] Make sure that the session is 
closed
URL: 
https://github.com/apache/incubator-airflow/pull/4298#issuecomment-450805895
 
 
   @kaxil Done, thanks! :-)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jrderuiter commented on a change in pull request #4363: [AIRFLOW-3560] Add WeekEnd & DayOfWeek Sensors

2019-01-02 Thread GitBox
jrderuiter commented on a change in pull request #4363: [AIRFLOW-3560] Add 
WeekEnd & DayOfWeek Sensors
URL: https://github.com/apache/incubator-airflow/pull/4363#discussion_r244692162
 
 

 ##
 File path: airflow/contrib/sensors/weekday_sensor.py
 ##
 @@ -0,0 +1,83 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.sensors.base_sensor_operator import BaseSensorOperator
+from airflow.utils import timezone
+from airflow.utils.decorators import apply_defaults
+from airflow.contrib.utils.weekday import WeekDay
+
+
+class DayOfWeekSensor(BaseSensorOperator):
+"""
+Waits until the first specified day of the week. For example, if the 
execution
+day of the task is '2018-12-22' (Saturday) and you pass 'FRIDAY', the task 
will wait
+until next Friday.
+
+:param week_day: Day of the week (full name). Example: "MONDAY"
+:type week_day_number: str
+:param use_task_execution_day: If ``True``, uses task's execution day to 
compare
+with week_day_number. Execution Date is Useful for backfilling.
+If ``False``, uses system's day of the week. Useful when you
+don't want to run anything on weekdays on the system.
+:type use_task_execution_day: bool
+"""
+
+@apply_defaults
+def __init__(self, week_day,
+ use_task_execution_day=False,
+ *args, **kwargs):
+super(DayOfWeekSensor, self).__init__(*args, **kwargs)
+self.week_day = week_day
+self.use_task_execution_day = use_task_execution_day
+self._week_day_num = 
WeekDay.get_weekday_number(week_day_str=self.week_day)
+
+def poke(self, context):
+self.log.info('Poking until weekday is %s, Today is %s',
+  WeekDay(self._week_day_num).name,
+  WeekDay(timezone.utcnow().isoweekday()).name)
+if self.use_task_execution_day:
+return context['execution_date'].isoweekday() == self._week_day_num
+else:
+return timezone.utcnow().isoweekday() == self._week_day_num
+
+
+class WeekEndSensor(BaseSensorOperator):
 
 Review comment:
   If `DayOfWeekSensor` were to take a set (or iterable) of days rather than 
just a single day, the `WeekendSensor` could be replaced by 
`DayOfWeekSensor(weekdays={Weekday.SATURDAY, Weekday.SUNDAY})`, avoiding the 
requirement of having two separate classes.
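
To make that suggestion concrete, a hedged sketch of what such a set-based sensor could look like, assuming `WeekDay` is an IntEnum as the quoted code suggests; the class name, argument name, and defaults are illustrative, not the code under review:

```python
from airflow.contrib.utils.weekday import WeekDay
from airflow.sensors.base_sensor_operator import BaseSensorOperator
from airflow.utils import timezone
from airflow.utils.decorators import apply_defaults


class MultiDayOfWeekSensor(BaseSensorOperator):
    """Illustrative only: succeeds on any day contained in ``week_days``."""

    @apply_defaults
    def __init__(self, week_days, use_task_execution_day=False, *args, **kwargs):
        super(MultiDayOfWeekSensor, self).__init__(*args, **kwargs)
        # Accept a single WeekDay or any iterable of WeekDay values.
        if isinstance(week_days, WeekDay):
            week_days = {week_days}
        self.week_days = set(week_days)
        self.use_task_execution_day = use_task_execution_day

    def poke(self, context):
        if self.use_task_execution_day:
            today = context['execution_date'].isoweekday()
        else:
            today = timezone.utcnow().isoweekday()
        # Assumes WeekDay members compare equal to their ISO weekday numbers.
        return today in {int(day) for day in self.week_days}


# The weekend case then becomes a parametrisation instead of a separate class:
# wait_for_weekend = MultiDayOfWeekSensor(
#     task_id='wait_for_weekend',
#     week_days={WeekDay.SATURDAY, WeekDay.SUNDAY},
# )
```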


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jrderuiter commented on a change in pull request #4363: [AIRFLOW-3560] Add WeekEnd & DayOfWeek Sensors

2019-01-02 Thread GitBox
jrderuiter commented on a change in pull request #4363: [AIRFLOW-3560] Add 
WeekEnd & DayOfWeek Sensors
URL: https://github.com/apache/incubator-airflow/pull/4363#discussion_r244691751
 
 

 ##
 File path: airflow/contrib/sensors/weekday_sensor.py
 ##
 @@ -0,0 +1,83 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.sensors.base_sensor_operator import BaseSensorOperator
+from airflow.utils import timezone
+from airflow.utils.decorators import apply_defaults
+from airflow.contrib.utils.weekday import WeekDay
+
+
+class DayOfWeekSensor(BaseSensorOperator):
+"""
+Waits until the first specified day of the week. For example, if the 
execution
+day of the task is '2018-12-22' (Saturday) and you pass 'FRIDAY', the task 
will wait
+until next Friday.
+
+:param week_day: Day of the week (full name). Example: "MONDAY"
+:type week_day_number: str
+:param use_task_execution_day: If ``True``, uses task's execution day to 
compare
+with week_day_number. Execution Date is Useful for backfilling.
+If ``False``, uses system's day of the week. Useful when you
+don't want to run anything on weekdays on the system.
+:type use_task_execution_day: bool
+"""
+
+@apply_defaults
+def __init__(self, week_day,
+ use_task_execution_day=False,
+ *args, **kwargs):
+super(DayOfWeekSensor, self).__init__(*args, **kwargs)
+self.week_day = week_day
 
 Review comment:
   Could be changed to a set of days instead of a single day, so that the 
sensor can filter on any given combination of days rather than only a single 
day.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jrderuiter commented on a change in pull request #4363: [AIRFLOW-3560] Add WeekEnd & DayOfWeek Sensors

2019-01-02 Thread GitBox
jrderuiter commented on a change in pull request #4363: [AIRFLOW-3560] Add 
WeekEnd & DayOfWeek Sensors
URL: https://github.com/apache/incubator-airflow/pull/4363#discussion_r244691551
 
 

 ##
 File path: airflow/contrib/sensors/weekday_sensor.py
 ##
 @@ -0,0 +1,83 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.sensors.base_sensor_operator import BaseSensorOperator
+from airflow.utils import timezone
+from airflow.utils.decorators import apply_defaults
+from airflow.contrib.utils.weekday import WeekDay
+
+
+class DayOfWeekSensor(BaseSensorOperator):
+"""
+Waits until the first specified day of the week. For example, if the 
execution
+day of the task is '2018-12-22' (Saturday) and you pass 'FRIDAY', the task 
will wait
+until next Friday.
+
+:param week_day: Day of the week (full name). Example: "MONDAY"
+:type week_day_number: str
+:param use_task_execution_day: If ``True``, uses task's execution day to 
compare
+with week_day_number. Execution Date is Useful for backfilling.
+If ``False``, uses system's day of the week. Useful when you
+don't want to run anything on weekdays on the system.
+:type use_task_execution_day: bool
+"""
+
+@apply_defaults
+def __init__(self, week_day,
+ use_task_execution_day=False,
+ *args, **kwargs):
+super(DayOfWeekSensor, self).__init__(*args, **kwargs)
+self.week_day = week_day
+self.use_task_execution_day = use_task_execution_day
 
 Review comment:
   Shouldn't the sensor use the execution date by default? As Airflow's model 
is built around execution dates, I think they should be preferred over 
`datetime.utcnow`. In that case, I would change the argument to `now`, with a 
value of `True` indicating that the current time should be used (and the 
default being `False`).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] BasPH commented on issue #4370: [AIRFLOW-3459] Move DagPickle to separate file

2019-01-02 Thread GitBox
BasPH commented on issue #4370: [AIRFLOW-3459] Move DagPickle to separate file
URL: 
https://github.com/apache/incubator-airflow/pull/4370#issuecomment-450821127
 
 
   The contributing guide states to squash all commits into a single commit 
yourself, with a message like `[AIRFLOW-XXX] ...`, so you're saying that's not 
needed?
   
   If so, I could update the contributing guide.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jrderuiter commented on a change in pull request #4363: [AIRFLOW-3560] Add WeekEnd & DayOfWeek Sensors

2019-01-02 Thread GitBox
jrderuiter commented on a change in pull request #4363: [AIRFLOW-3560] Add 
WeekEnd & DayOfWeek Sensors
URL: https://github.com/apache/incubator-airflow/pull/4363#discussion_r244692934
 
 

 ##
 File path: airflow/contrib/sensors/weekday_sensor.py
 ##
 @@ -0,0 +1,83 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.sensors.base_sensor_operator import BaseSensorOperator
+from airflow.utils import timezone
+from airflow.utils.decorators import apply_defaults
+from airflow.contrib.utils.weekday import WeekDay
+
+
+class DayOfWeekSensor(BaseSensorOperator):
+"""
+Waits until the first specified day of the week. For example, if the 
execution
+day of the task is '2018-12-22' (Saturday) and you pass 'FRIDAY', the task 
will wait
+until next Friday.
+
+:param week_day: Day of the week (full name). Example: "MONDAY"
+:type week_day_number: str
 
 Review comment:
   Should be updated to reflect the new ENUM (same goes for docstring above).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (AIRFLOW-3615) Connection parsed from URI - case-insensitive UNIX socket paths in python 2.7 -> 3.5 (but not in 3.6)

2019-01-02 Thread Jarek Potiuk (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jarek Potiuk updated AIRFLOW-3615:
--
Description: 
There is a problem with case sensitivity of parsing URI for database 
connections which are using local UNIX sockets rather than TCP connection.

In the case of local UNIX sockets, the hostname part of the URI contains the 
url-encoded local socket path rather than an actual hostname, and if this path 
contains uppercase characters, urlparse will deliberately lowercase them when 
parsing. This is perfectly fine for hostnames: according to 
[https://tools.ietf.org/html/rfc3986#section-6.2.3], case normalisation should 
be done for hostnames.

However, urlparse still populates hostname if the URI does not contain a host 
but only a local path (i.e. when the location starts with %2F ("/")). What's 
more, the host gets converted to lowercase for Python 2.7 - 3.5. Surprisingly 
this is somewhat "fixed" in 3.6: if the URL location starts with %2F, the 
hostname is not normalized to lowercase any more (see the snippets below 
showing the behaviour for different Python versions).

In Airflow's Connection this problem bubbles up. Airflow uses urlparse to get 
the hostname/path in models.py:parse_from_uri, and in the case of UNIX sockets 
this is done via hostname. There is no other reliable way when using urlparse, 
because the path can also contain the 'authority' (user/password) and it is 
urlparse's job to separate them out. Airflow's Connection similarly makes no 
distinction between TCP and local socket connections and uses the host field to 
store the socket path (which is case sensitive, however). So you can use 
UPPERCASE when you define the connection in the database, but this is a problem 
for parsing connections from environment variables, because we currently cannot 
pass a URI where the socket path contains UPPERCASE characters.

Since urlparse is really there to parse URLs and is not good for parsing 
non-URL URIs, we should likely use a different parser which handles more 
generic URIs, including not lowercasing the path, for all versions of Python.

I think we could also consider adding a local path to the Connection model and 
using it instead of hostname to store the socket path. This approach would be 
the "correct" one, but it might introduce some compatibility issues, so maybe 
it's not worth it, considering that host is case sensitive in Airflow.

Snippet showing urlparse behaviour in different python versions:
{quote}Python 2.7.10 (default, Aug 17 2018, 19:45:58)
 [GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
 Type "help", "copyright", "credits" or "license" for more information.
 >>> from urlparse import urlparse,unquote
 >>> conn = urlparse("http://AAA;)
 >>> conn.hostname
 'aaa'
 >>> conn = urlparse("http://%2FAAA;)
 >>> conn.hostname
 '%2faaa'
{quote}
 
{quote}Python 3.5.4 (v3.5.4:3f56838976, Aug 7 2017, 12:56:33)
 [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
 Type "help", "copyright", "credits" or "license" for more information.
 >>> from urlparse import urlparse,unquote
 Traceback (most recent call last):
 File "", line 1, in 
 ImportError: No module named 'urlparse'
 >>> from urllib.parse import urlparse,unquote
 >>> conn = urlparse("http://AAA;)
 >>> conn.hostname
 'aaa'
 >>> conn = urlparse("http://%2FAAA;)
 >>> conn.hostname
 '%2faaa'
{quote}
 
{quote}Python 3.6.7 (v3.6.7:6ec5cf24b7, Oct 20 2018, 03:02:14)
 [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
 Type "help", "copyright", "credits" or "license" for more information.
 >>> from urllib.parse import urlparse,unquote
 >>> conn = urlparse("http://AAA;)
 >>> conn.hostname
 'aaa'
 >>> conn = urlparse("http://%2FAAA;)
 >>> conn.hostname
 {color:#ff}'%2FAAA'{color}
{quote}
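
As a small illustration of the urlparse point (not the fix adopted for this issue), the raw netloc keeps its original casing even where hostname normalises it, so the socket path could in principle be recovered case-sensitively from it; the URI below is made up:

```python
from urllib.parse import urlparse, unquote

# Hypothetical connection URI whose "host" is really a url-encoded socket path.
uri = "mysql://user:pass@%2Ftmp%2FMySQL.sock/schema"
parsed = urlparse(uri)

print(parsed.hostname)  # '%2ftmp%2fmysql.sock' on Python 2.7-3.5; case kept on 3.6+
print(parsed.netloc)    # 'user:pass@%2Ftmp%2FMySQL.sock' (case always preserved)

# Case-preserving extraction of the socket path from the authority component:
authority = parsed.netloc.rsplit('@', 1)[-1]
print(unquote(authority))  # '/tmp/MySQL.sock'
```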
 

  was:
There is a problem with case sensitivity of parsing URI for database 
connections which are using local UNIX sockets rather than TCP connection.

In case of local UNIX sockets the hostname part of the URI contains url-encoded 
local socket path rather than actual hostname and incase this path contains 
uppercase characters, urlparse will deliberately lowercase them when parsing. 
This is perfectly fine for hostnames (according to 
[https://tools.ietf.org/html/rfc3986#section-6.2.3)] case normalisation should 
be done for hostnames.

However urlparse still uses hostname if the URI does not contain host but only 
local path (i.e. when the location starts with %2F ("/")). What's more - the 
host gets converted to lowercase for python 2.7 - 3.5. Surprisingly this is 
somewhat "fixed" in 3.6 (i.e if the URL location starts with %2F, the hostname 
is not normalized to lowercase any more ! - see below snippets showing the 
behaviours for different python versions) .

In Airflow's Connection this problem bubbles up. Airflow uses urlparse to get 
the hostname/path in models.py:parse_from_uri and in case of UNIX sockets it is 
done via 

[GitHub] XD-DENG opened a new pull request #4422: [AIRFLOW-XXX] Fix Flake8 error

2019-01-02 Thread GitBox
XD-DENG opened a new pull request #4422: [AIRFLOW-XXX] Fix Flake8 error
URL: https://github.com/apache/incubator-airflow/pull/4422
 
 
   This error passed the PR CI (in 
https://github.com/apache/incubator-airflow/pull/4403) because the Flake8 check 
was not working from 
https://github.com/apache/incubator-airflow/commit/7a6acbf5b343e4a6895d1cc8af75ecc02b4fd0e8
 (10 days ago) to 
https://github.com/apache/incubator-airflow/commit/0d5c127d720e3b602fb4322065511c5c1c046adb
 (today).
   
   The Flake8 check was already fixed in 
https://github.com/apache/incubator-airflow/commit/0d5c127d720e3b602fb4322065511c5c1c046adb,
 so this error only surfaced after that commit was merged into master.
   
   This can be considered a follow-up to 
https://github.com/apache/incubator-airflow/pull/4415


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] codecov-io edited a comment on issue #4247: [AIRFLOW-3402] Support global k8s affinity and toleration configs

2019-01-02 Thread GitBox
codecov-io edited a comment on issue #4247: [AIRFLOW-3402] Support global k8s 
affinity and toleration configs
URL: 
https://github.com/apache/incubator-airflow/pull/4247#issuecomment-443592089
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4247?src=pr=h1)
 Report
   > Merging 
[#4247](https://codecov.io/gh/apache/incubator-airflow/pull/4247?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/8938d2c727b16762529ef9f1e257ba5fa39f4bea?src=pr=desc)
 will **decrease** coverage by `7.02%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/4247/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4247?src=pr=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #4247      +/-   ##
   ==========================================
   - Coverage   78.38%   71.35%   -7.03%     
   ==========================================
     Files         204      204              
     Lines       16445    18631    +2186     
   ==========================================
   + Hits        12890    13294     +404     
   - Misses       3555     5337    +1782     
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/4247?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/www\_rbac/views.py](https://codecov.io/gh/apache/incubator-airflow/pull/4247/diff?src=pr=tree#diff-YWlyZmxvdy93d3dfcmJhYy92aWV3cy5weQ==)
 | `56.03% <0%> (-17.69%)` | :arrow_down: |
   | 
[airflow/models/\_\_init\_\_.py](https://codecov.io/gh/apache/incubator-airflow/pull/4247/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvX19pbml0X18ucHk=)
 | `75.67% <0%> (-16.93%)` | :arrow_down: |
   | 
[airflow/www/views.py](https://codecov.io/gh/apache/incubator-airflow/pull/4247/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvdmlld3MucHk=)
 | `53.58% <0%> (-16.11%)` | :arrow_down: |
   | 
[airflow/bin/cli.py](https://codecov.io/gh/apache/incubator-airflow/pull/4247/diff?src=pr=tree#diff-YWlyZmxvdy9iaW4vY2xpLnB5)
 | `52.58% <0%> (-12.49%)` | :arrow_down: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4247?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4247?src=pr=footer).
 Last update 
[8938d2c...e55f8c7](https://codecov.io/gh/apache/incubator-airflow/pull/4247?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Rotemlofer commented on issue #2879: [AIRFLOW-1921] Added support for presto https and user auth

2019-01-02 Thread GitBox
Rotemlofer commented on issue #2879: [AIRFLOW-1921] Added support for presto 
https and user auth
URL: 
https://github.com/apache/incubator-airflow/pull/2879#issuecomment-450804770
 
 
   Thanks @Fokko, will this be in the next release?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko commented on issue #4359: [AIRFLOW-3150] Make execution_date templated in TriggerDagRunOperator

2019-01-02 Thread GitBox
Fokko commented on issue #4359: [AIRFLOW-3150] Make execution_date templated in 
TriggerDagRunOperator
URL: 
https://github.com/apache/incubator-airflow/pull/4359#issuecomment-450806005
 
 
   @kaxil Can you resolve the conflicts?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] ramandumcs commented on a change in pull request #4420: [AIRFLOW-3613] Updated ReadMe to add Adobe as an airflow user

2019-01-02 Thread GitBox
ramandumcs commented on a change in pull request #4420: [AIRFLOW-3613] Updated 
ReadMe to add Adobe as an airflow user
URL: https://github.com/apache/incubator-airflow/pull/4420#discussion_r244678876
 
 

 ##
 File path: README.md
 ##
 @@ -109,6 +109,7 @@ Currently **officially** using Airflow:
 1. [90 Seconds](https://90seconds.tv/) 
[[@aaronmak](https://github.com/aaronmak)]
 1. [99](https://99taxis.com) [[@fbenevides](https://github.com/fbenevides), 
[@gustavoamigo](https://github.com/gustavoamigo) & 
[@mmmaia](https://github.com/mmmaia)]
 1. [AdBOOST](https://www.adboost.sk) [[AdBOOST](https://github.com/AdBOOST)]
+1. [Adobe](https://www.adobe.com/)
 
 Review comment:
   Thanks @feng-tao for reviewing this. I have tried to follow the same format 
as [AirDNA](https://www.airdna.co).
   Am I missing something here?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jrderuiter commented on a change in pull request #4363: [AIRFLOW-3560] Add WeekEnd & DayOfWeek Sensors

2019-01-02 Thread GitBox
jrderuiter commented on a change in pull request #4363: [AIRFLOW-3560] Add 
WeekEnd & DayOfWeek Sensors
URL: https://github.com/apache/incubator-airflow/pull/4363#discussion_r244680805
 
 

 ##
 File path: airflow/contrib/sensors/weekday_sensor.py
 ##
 @@ -0,0 +1,87 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import calendar
+from airflow.sensors.base_sensor_operator import BaseSensorOperator
+from airflow.utils import timezone
+from airflow.utils.decorators import apply_defaults
+
+
+class DayOfWeekSensor(BaseSensorOperator):
+"""
+Waits until the first specified day of the week. For example, if the 
execution
+day of the task is '2018-12-22' (Saturday) and you pass '5', the task will 
wait
+until next Friday (Week Day: 5)
+
+:param week_day_number: Day of the week as an integer, where Monday is 1 
and
+Sunday is 7 (ISO Week Numbering)
+:type week_day_number: int
+:param use_task_execution_day: If ``True``, uses task's execution day to 
compare
+with week_day_number. Execution Date is Useful for backfilling.
+If ``False``, uses system's day of the week. Useful when you
+don't want to run anything on weekdays on the system.
+:type use_task_execution_day: bool
+"""
+
+@apply_defaults
+def __init__(self, week_day_number,
+ use_task_execution_day=False,
+ *args, **kwargs):
+super(DayOfWeekSensor, self).__init__(*args, **kwargs)
+self.week_day_number = week_day_number
+self.use_task_execution_day = use_task_execution_day
+
+def poke(self, context):
+if self.week_day_number > 7:
+raise ValueError(
+'Invalid value ({}) for week_day_number! '
+'Valid value: 1 <= week_day_number <= 
7'.format(self.week_day_number))
+self.log.info('Poking until weekday is %s, Today is %s',
+  calendar.day_name[self.week_day_number - 1],
+  calendar.day_name[timezone.utcnow().isoweekday() - 1])
+if self.use_task_execution_day:
+return context['execution_date'].isoweekday() == 
self.week_day_number
+else:
+return timezone.utcnow().isoweekday() == self.week_day_number
+
+
+class WeekEndSensor(BaseSensorOperator):
 
 Review comment:
   IMO, in Python the general preference is to group classes/functions with 
related (coherent!) functionality in a single module. For example, scikit-learn 
also groups multiple estimators in a single module, as long as they belong to 
the same 'family' of estimators.
   
   In this case, I would argue that these Sensor classes should be included in 
a single module because they are strongly related to one another. From a user 
perspective it's also easier to use if you don't have to have a separate import 
for every single (related) sensor/operator/hook. 
   
   The one-class-per-file preference is more java-esque and seems to be mainly 
preferred by (ex) Java programmers.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko commented on issue #2879: [AIRFLOW-1921] Added support for presto https and user auth

2019-01-02 Thread GitBox
Fokko commented on issue #2879: [AIRFLOW-1921] Added support for presto https 
and user auth
URL: 
https://github.com/apache/incubator-airflow/pull/2879#issuecomment-450809542
 
 
   It is now scheduled for Airflow 2.0. We can include this for Airflow 1.11 if 
@kaxil agrees :-)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jrderuiter commented on issue #4363: [AIRFLOW-3560] Add WeekEnd & DayOfWeek Sensors

2019-01-02 Thread GitBox
jrderuiter commented on issue #4363: [AIRFLOW-3560] Add WeekEnd & DayOfWeek 
Sensors
URL: 
https://github.com/apache/incubator-airflow/pull/4363#issuecomment-450809641
 
 
   Small comment: weekend is a single word, so the name `WeekendSensor` would 
be more appropriate than `WeekEndSensor`. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (AIRFLOW-3614) Operators loading to GCS - to support CSV files

2019-01-02 Thread jack (JIRA)
jack created AIRFLOW-3614:
-

 Summary: Operators loading to GCS - to support CSV files
 Key: AIRFLOW-3614
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3614
 Project: Apache Airflow
  Issue Type: Task
Affects Versions: 1.10.1, 2.0.0
Reporter: jack


Currently, only {color:#6f42c1}BigQueryToCloudStorageOperator{color} supports 
both json and csv.

The rest of the operators:

{color:#6f42c1}CassandraToGoogleCloudStorageOperator{color}

{color:#6f42c1}MySqlToGoogleCloudStorageOperator{color}

etc..

Support only json format.

 

It would be great if an {color:#032f62}export_format{color} param were 
introduced (default json) and the user were able to set it to CSV if they wish.
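
To illustrate the request, a hypothetical DAG snippet: the `export_format` argument does not exist on `MySqlToGoogleCloudStorageOperator` today and is exactly what this issue proposes, and the connection, bucket, and table names are placeholders.

```python
from airflow.contrib.operators.mysql_to_gcs import MySqlToGoogleCloudStorageOperator

extract_orders = MySqlToGoogleCloudStorageOperator(
    task_id='extract_orders_to_gcs',
    mysql_conn_id='mysql_default',
    google_cloud_storage_conn_id='google_cloud_default',
    sql='SELECT * FROM orders',
    bucket='example-bucket',
    filename='orders/orders_{}.csv',
    export_format='csv',  # proposed parameter; the operator currently always writes JSON
)
```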



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] Fokko commented on issue #4370: [AIRFLOW-3459] Move DagPickle to separate file

2019-01-02 Thread GitBox
Fokko commented on issue #4370: [AIRFLOW-3459] Move DagPickle to separate file
URL: 
https://github.com/apache/incubator-airflow/pull/4370#issuecomment-450804103
 
 
   Squashing is the default merge strategy; 99% of merged PRs are merged 
using GitHub's `Squash and merge`. Only when a PR is too big to fit 
everything into one commit do we rebase and merge.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] XD-DENG commented on a change in pull request #4309: [AIRFLOW-3504] Extend/refine the functionality of "/health" endpoint

2019-01-02 Thread GitBox
XD-DENG commented on a change in pull request #4309: [AIRFLOW-3504] 
Extend/refine the functionality of "/health" endpoint
URL: https://github.com/apache/incubator-airflow/pull/4309#discussion_r244679305
 
 

 ##
 File path: airflow/www/views.py
 ##
 @@ -377,7 +377,6 @@ def index(self):
 @expose('/chart_data')
 @data_profiling_required
 @wwwutils.gzipped
-# @cache.cached(timeout=3600, key_prefix=wwwutils.make_cache_key)
 
 Review comment:
   I deleted it here "by the way", since it may not be worthwhile to have a 
separate PR for it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jrderuiter commented on a change in pull request #4363: [AIRFLOW-3560] Add WeekEnd & DayOfWeek Sensors

2019-01-02 Thread GitBox
jrderuiter commented on a change in pull request #4363: [AIRFLOW-3560] Add 
WeekEnd & DayOfWeek Sensors
URL: https://github.com/apache/incubator-airflow/pull/4363#discussion_r244680805
 
 

 ##
 File path: airflow/contrib/sensors/weekday_sensor.py
 ##
 @@ -0,0 +1,87 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import calendar
+from airflow.sensors.base_sensor_operator import BaseSensorOperator
+from airflow.utils import timezone
+from airflow.utils.decorators import apply_defaults
+
+
+class DayOfWeekSensor(BaseSensorOperator):
+"""
+Waits until the first specified day of the week. For example, if the 
execution
+day of the task is '2018-12-22' (Saturday) and you pass '5', the task will 
wait
+until next Friday (Week Day: 5)
+
+:param week_day_number: Day of the week as an integer, where Monday is 1 
and
+Sunday is 7 (ISO Week Numbering)
+:type week_day_number: int
+:param use_task_execution_day: If ``True``, uses task's execution day to 
compare
+with week_day_number. Execution Date is Useful for backfilling.
+If ``False``, uses system's day of the week. Useful when you
+don't want to run anything on weekdays on the system.
+:type use_task_execution_day: bool
+"""
+
+@apply_defaults
+def __init__(self, week_day_number,
+ use_task_execution_day=False,
+ *args, **kwargs):
+super(DayOfWeekSensor, self).__init__(*args, **kwargs)
+self.week_day_number = week_day_number
+self.use_task_execution_day = use_task_execution_day
+
+def poke(self, context):
+if self.week_day_number > 7:
+raise ValueError(
+'Invalid value ({}) for week_day_number! '
+'Valid value: 1 <= week_day_number <= 
7'.format(self.week_day_number))
+self.log.info('Poking until weekday is %s, Today is %s',
+  calendar.day_name[self.week_day_number - 1],
+  calendar.day_name[timezone.utcnow().isoweekday() - 1])
+if self.use_task_execution_day:
+return context['execution_date'].isoweekday() == 
self.week_day_number
+else:
+return timezone.utcnow().isoweekday() == self.week_day_number
+
+
+class WeekEndSensor(BaseSensorOperator):
 
 Review comment:
   IMO, in Python the general preference is to group classes/functions with 
related functionality in a single module. For example, scikit-learn also 
groups multiple estimators in a single module, as long as they belong to the 
same 'family' of estimators.
   
   This idea agrees with the recommendations here:
   - https://docs.python-guide.org/writing/structure
   - https://docs.python.org/3/tutorial/modules.html#packages
   - 
https://stackoverflow.com/questions/106896/how-many-classes-should-i-put-in-one-file
   
   In this case, I would argue that these Sensor classes should be included in 
a single module because they are strongly related to one another. From a user 
perspective it's also easier to use if you don't have to have a separate import 
for every single (related) sensor/operator/hook. 
   
   The one-class-per-file preference is more Java-esque and seems to be mainly 
preferred by (ex) Java programmers.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Comment Edited] (AIRFLOW-3605) Load Plugins via entry_points

2019-01-02 Thread Drew Sonne (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731836#comment-16731836
 ] 

Drew Sonne edited comment on AIRFLOW-3605 at 1/2/19 8:33 AM:
-

Hi [~dzakus13], I have had this [pointed out to 
me|https://github.com/apache/incubator-airflow/pull/4412#issuecomment-450723914],
 and I will definitely be using this as reference, but as that PR has been 
closed, unless otherwise directed by the maintainers, I'll do this in two 
parts: one for the plugins and one for DAGs.


was (Author: drewsonne):
Hi [~dzakus13], I have had this pointed out to me, and I will definitely be 
using this as reference, but as that PR has been closed, unless otherwise 
directed by the maintainers, I'll do this in two parts: one for the plugins; 
and one for DAGs.

> Load Plugins via entry_points
> -
>
> Key: AIRFLOW-3605
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3605
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: plugins
>Affects Versions: 1.10.1
>Reporter: Drew Sonne
>Assignee: Drew Sonne
>Priority: Minor
>  Labels: features, newbie, pull-request-available
> Fix For: 1.10.2
>
>
> Added logic to load AirflowPlugins from the 
> [entry_points|https://setuptools.readthedocs.io/en/latest/pkg_resources.html#id16].
>  Rather than moving files on the OS, this allows plugins to be installed and 
> distributed via {{pip}}.
> Also added a callback to execute business logic from the plugin when the 
> plugin is loaded.
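
As a rough sketch of the proposal (not the final API): a third-party package could advertise its AirflowPlugin subclass through setuptools entry_points, so a plain `pip install` is all that is needed to register it. The entry-point group and package names below are assumptions for illustration only.

```python
# setup.py of a hypothetical package "my-airflow-plugin".
# The entry-point group 'airflow.plugins' is assumed, not confirmed by this issue.
from setuptools import setup

setup(
    name='my-airflow-plugin',
    version='0.1.0',
    py_modules=['my_airflow_plugin'],
    entry_points={
        'airflow.plugins': [
            # '<plugin name> = <module>:<AirflowPlugin subclass>'
            'my_plugin = my_airflow_plugin:MyAirflowPlugin',
        ],
    },
)
```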



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] Fokko opened a new pull request #4421: [AIRFLOW-3468] Remove KnownEvent(Event)?

2019-01-02 Thread GitBox
Fokko opened a new pull request #4421: [AIRFLOW-3468] Remove KnownEvent(Event)?
URL: https://github.com/apache/incubator-airflow/pull/4421
 
 
   The KnownEvent and KnownEventType models aren't used by 99% of companies, so 
we would like to deprecate them for Airflow 2.0. They also aren't available in 
the new RBAC UI.
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-3468\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-3468\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-3468) Refactor: Move KnownEventType out of models.py

2019-01-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731835#comment-16731835
 ] 

ASF GitHub Bot commented on AIRFLOW-3468:
-

Fokko commented on pull request #4421: [AIRFLOW-3468] Remove KnownEvent(Event)?
URL: https://github.com/apache/incubator-airflow/pull/4421
 
 
   The KnownEvent and KnownEventType models aren't used by 99% of companies, so 
we would like to deprecate them for Airflow 2.0. They also aren't available in 
the new RBAC UI.
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-3468\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-3468\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor: Move KnownEventType out of models.py
> --
>
> Key: AIRFLOW-3468
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3468
> Project: Apache Airflow
>  Issue Type: Task
>  Components: models
>Affects Versions: 1.10.1
>Reporter: Fokko Driesprong
>Priority: Major
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] codecov-io edited a comment on issue #2569: [AIRFLOW-1569][WIP] Requeue tasks that stay in reserved state too long

2019-01-02 Thread GitBox
codecov-io edited a comment on issue #2569: [AIRFLOW-1569][WIP] Requeue tasks 
that stay in reserved state too long
URL: 
https://github.com/apache/incubator-airflow/pull/2569#issuecomment-327657977
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/2569?src=pr=h1)
 Report
   > Merging 
[#2569](https://codecov.io/gh/apache/incubator-airflow/pull/2569?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/e703d6beeb379ee88ef5e7df495e8a785666f8af?src=pr=desc)
 will **decrease** coverage by `5.01%`.
   > The diff coverage is `58.53%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/2569/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/2569?src=pr=tree)
   
   ```diff
    @@            Coverage Diff             @@
    ##           master    #2569      +/-   ##
    ==========================================
    - Coverage   76.67%   71.66%    -5.02%    
    ==========================================
      Files         199      154       -45    
      Lines       16186    11842     -4344    
    ==========================================
    - Hits        12410     8486     -3924    
    + Misses       3776     3356      -420    
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/2569?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/executors/celery\_executor.py](https://codecov.io/gh/apache/incubator-airflow/pull/2569/diff?src=pr=tree#diff-YWlyZmxvdy9leGVjdXRvcnMvY2VsZXJ5X2V4ZWN1dG9yLnB5)
 | `74.46% <58.53%> (-6.15%)` | :arrow_down: |
   | 
[airflow/operators/email\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/2569/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvZW1haWxfb3BlcmF0b3IucHk=)
 | `0% <0%> (-100%)` | :arrow_down: |
   | 
[airflow/hooks/pig\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/2569/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9waWdfaG9vay5weQ==)
 | `0% <0%> (-100%)` | :arrow_down: |
   | 
[airflow/utils/log/s3\_task\_handler.py](https://codecov.io/gh/apache/incubator-airflow/pull/2569/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9sb2cvczNfdGFza19oYW5kbGVyLnB5)
 | `0% <0%> (-98.58%)` | :arrow_down: |
   | 
[airflow/operators/slack\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/2569/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvc2xhY2tfb3BlcmF0b3IucHk=)
 | `0% <0%> (-97.37%)` | :arrow_down: |
   | 
[airflow/operators/s3\_file\_transform\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/2569/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvczNfZmlsZV90cmFuc2Zvcm1fb3BlcmF0b3IucHk=)
 | `0% <0%> (-96.23%)` | :arrow_down: |
   | 
[airflow/operators/redshift\_to\_s3\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/2569/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvcmVkc2hpZnRfdG9fczNfb3BlcmF0b3IucHk=)
 | `0% <0%> (-95.46%)` | :arrow_down: |
   | 
[airflow/hooks/jdbc\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/2569/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9qZGJjX2hvb2sucHk=)
 | `0% <0%> (-94.45%)` | :arrow_down: |
   | 
[airflow/hooks/S3\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/2569/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9TM19ob29rLnB5)
 | `21.76% <0%> (-72.57%)` | :arrow_down: |
   | 
[airflow/hooks/mssql\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/2569/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9tc3NxbF9ob29rLnB5)
 | `6.66% <0%> (-66.67%)` | :arrow_down: |
   | ... and [197 
more](https://codecov.io/gh/apache/incubator-airflow/pull/2569/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/2569?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/2569?src=pr=footer).
 Last update 
[e703d6b...ba7883d](https://codecov.io/gh/apache/incubator-airflow/pull/2569?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] feng-tao commented on a change in pull request #4420: [AIRFLOW-3613] Updated ReadMe to add Adobe as an airflow user

2019-01-02 Thread GitBox
feng-tao commented on a change in pull request #4420: [AIRFLOW-3613] Updated 
ReadMe to add Adobe as an airflow user
URL: https://github.com/apache/incubator-airflow/pull/4420#discussion_r244830366
 
 

 ##
 File path: README.md
 ##
 @@ -109,6 +109,7 @@ Currently **officially** using Airflow:
 1. [90 Seconds](https://90seconds.tv/) 
[[@aaronmak](https://github.com/aaronmak)]
 1. [99](https://99taxis.com) [[@fbenevides](https://github.com/fbenevides), 
[@gustavoamigo](https://github.com/gustavoamigo) & 
[@mmmaia](https://github.com/mmmaia)]
 1. [AdBOOST](https://www.adboost.sk) [[AdBOOST](https://github.com/AdBOOST)]
+1. [Adobe](https://www.adobe.com/)
 
 Review comment:
   the correct format should be 
[Adobe](https://www.adobe.com/)[[@ramandumcs](https://github.com/ramandumcs)]


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko commented on a change in pull request #4421: [AIRFLOW-3468] Remove KnownEvent(Event)?

2019-01-02 Thread GitBox
Fokko commented on a change in pull request #4421: [AIRFLOW-3468] Remove 
KnownEvent(Event)?
URL: https://github.com/apache/incubator-airflow/pull/4421#discussion_r244846778
 
 

 ##
 File path: airflow/models/__init__.py
 ##
 @@ -4376,37 +4376,6 @@ def __repr__(self):
 return self.label
 
 
-class KnownEventType(Base):
 
 Review comment:
   Leaving out an alembic script was a conscious decision. The table will be 
kept for existing airflow setups, but won't be created for the new ones. 
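
For readers unfamiliar with the trade-off: a drop migration like the sketch below was deliberately not added. Skipping it means existing databases keep the tables while fresh installs simply never create them. Revision identifiers are placeholders; the table names follow the removed models and are assumptions here.

```python
# Sketch only: the alembic migration this PR consciously does NOT add.
from alembic import op

revision = 'xxxxxxxxxxxx'       # placeholder
down_revision = 'yyyyyyyyyyyy'  # placeholder


def upgrade():
    op.drop_table('known_event')
    op.drop_table('known_event_type')


def downgrade():
    # Recreating the dropped tables is omitted from this sketch.
    pass
```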


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] mistercrunch commented on issue #4419: [AIRFLOW-3612] Remove incubation/incubator mention

2019-01-02 Thread GitBox
mistercrunch commented on issue #4419: [AIRFLOW-3612] Remove 
incubation/incubator mention
URL: 
https://github.com/apache/incubator-airflow/pull/4419#issuecomment-450986769
 
 
   Side note, we'll have to open a Jira with Apache infra for them to edit this:
   https://user-images.githubusercontent.com/487433/50612781-1f04d280-0e90-11e9-8b3a-d06afeb7bbb6.png
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (AIRFLOW-3620) Inlets and outlets are always empty in bash operator templates

2019-01-02 Thread Adam C Baker (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam C Baker updated AIRFLOW-3620:
--
Description: 
When creating a data pipeline where one task's input is its upstream task's 
output, it seems the only way to automatically coordinate these is to use the 
lineage feature and set {{inlets = \{"auto": True}}}. Doing this with a 
PythonOperator allows one to get the input and output data sources for the task 
by passing in the context and getting {{task.inlets}} and {{task.outlets}} 
values.

This fails with the BashOperator. With a template including something like 
{{ task.inlets[0] }}, it throws an exception, and templating with 
{{ task.inlets }} or {{ task.outlets }} always reveals these values to be 
an empty list.

  was:
When creating a data pipeline where one task's input is its upstream task's 
output, it seems the only way to automatically coordinate these is to use the 
lineage feature and set {{inlets = \{"auto": True}}}. Doing this with a 
PythonOperator allows one to get the input and output data sources for the task 
by passing in the context and getting {{task.inlets}} and {{task.outlets}} 
values.

This fails with the BashOperator. With a template including something like 
task.inlets[0], it throws an exception, and templating with 
{{{\{task.inlets or {{{\{task.outlets always reveals these values to be 
an empty list.


> Inlets and outlets are always empty in bash operator templates
> --
>
> Key: AIRFLOW-3620
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3620
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.10.1
>Reporter: Adam C Baker
>Priority: Major
>
> When creating a data pipeline where one task's input is its upstream task's 
> output, it seems the only way to automatically coordinate these is to use the 
> lineage feature and set {{inlets = \{"auto": True}}}. Doing this with a 
> PythonOperator allows one to get the input and output data sources for the 
> task by passing in the context and getting {{task.inlets}} and 
> {{task.outlets}} values.
> This fails with the BashOperator. With a template including something like 
> {{ task.inlets[0] }}, it throws an exception, and templating with 
> {{ task.inlets }} or {{ task.outlets }} always reveals these values to be 
> an empty list.
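
For context, a minimal sketch of the pattern the report describes, assuming the lineage arguments as written above (`inlets={"auto": True}`) and an upstream task that declares outlets (omitted here). The PythonOperator path reportedly works; the BashOperator template is the one that renders an empty list.

```python
# Sketch of the reported behaviour; not a fix.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator


def read_inlets(task, **_):
    # With provide_context=True the callable receives the task instance's
    # context as kwargs, so task.inlets can be inspected directly.
    print(task.inlets)


with DAG(dag_id='lineage_inlets_example', start_date=datetime(2019, 1, 1)) as dag:
    python_consumer = PythonOperator(
        task_id='python_consumer',
        python_callable=read_inlets,
        provide_context=True,
        inlets={'auto': True},                     # reportedly populated here
    )
    bash_consumer = BashOperator(
        task_id='bash_consumer',
        bash_command='echo "{{ task.inlets }}"',   # reportedly renders []
        inlets={'auto': True},
    )
```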



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] aliceabe opened a new pull request #4424: [AIRFLOW-3622] Add ability to pass hive_conf to HiveToMysqlTransfer

2019-01-02 Thread GitBox
aliceabe opened a new pull request #4424: [AIRFLOW-3622] Add ability to pass 
hive_conf to HiveToMysqlTransfer
URL: https://github.com/apache/incubator-airflow/pull/4424
 
 
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW-3622) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   Right now we cannot overwrite hive queue because hive_conf is not passed to 
`HiveToMySqlTransfer`
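   For illustration, a hedged usage sketch of what this change enables; the `hive_conf` argument is the one this PR adds, and `mapred.job.queue.name` is the standard Hadoop property for picking a queue:
   
```python
# Hypothetical usage once hive_conf is passed through to the Hive hook.
from airflow.operators.hive_to_mysql import HiveToMySqlTransfer

transfer = HiveToMySqlTransfer(
    task_id='hive_to_mysql',
    sql='SELECT id, name FROM my_db.my_table',
    mysql_table='target_table',
    hive_conf={'mapred.job.queue.name': 'etl'},  # the new parameter from this PR
)
```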
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-3622) Add ability to pass hive_conf to HiveToMySqlTransfer

2019-01-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732445#comment-16732445
 ] 

ASF GitHub Bot commented on AIRFLOW-3622:
-

aliceabe commented on pull request #4424: [AIRFLOW-3622] Add ability to pass 
hive_conf to HiveToMysqlTransfer
URL: https://github.com/apache/incubator-airflow/pull/4424
 
 
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW-3622) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   Right now we cannot overwrite hive queue because hive_conf is not passed to 
`HiveToMySqlTransfer`
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add ability to pass hive_conf to HiveToMySqlTransfer
> 
>
> Key: AIRFLOW-3622
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3622
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Alice Berard
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] stale[bot] commented on issue #2799: [AIRFLOW-1836] airflow uses OAuth Provider keycloak

2019-01-02 Thread GitBox
stale[bot] commented on issue #2799: [AIRFLOW-1836] airflow uses OAuth Provider 
keycloak
URL: 
https://github.com/apache/incubator-airflow/pull/2799#issuecomment-450983991
 
 
   This issue has been automatically marked as stale because it has not had 
recent activity. It will be closed if no further activity occurs. Thank you for 
your contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] feng-tao commented on issue #4422: [AIRFLOW-XXX] Fix Flake8 error

2019-01-02 Thread GitBox
feng-tao commented on issue #4422: [AIRFLOW-XXX] Fix Flake8 error
URL: 
https://github.com/apache/incubator-airflow/pull/4422#issuecomment-450959523
 
 
   thanks @XD-DENG @kaxil 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] verdan commented on a change in pull request #4339: [AIRFLOW-3303] Deprecate old UI in favor of FAB

2019-01-02 Thread GitBox
verdan commented on a change in pull request #4339: [AIRFLOW-3303] Deprecate 
old UI in favor of FAB
URL: https://github.com/apache/incubator-airflow/pull/4339#discussion_r244868664
 
 

 ##
 File path: airflow/bin/cli.py
 ##
 @@ -888,7 +882,7 @@ def webserver(args):
 '-b', args.hostname + ':' + str(args.port),
 '-n', 'airflow-webserver',
 '-p', str(pid),
-'-c', 'python:airflow.www.gunicorn_config',
+# '-c', 'python:airflow.www.gunicorn_config',
 
 Review comment:
   Ah, I need to remove this line. As these settings were specific to the older 
version, I was testing things by commenting out code. Will remove this once the 
review is complete. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-3618) airflow.models.connection.Connection class function get_hook doesn't work

2019-01-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732224#comment-16732224
 ] 

ASF GitHub Bot commented on AIRFLOW-3618:
-

griff122 commented on pull request #4423: [AIRFLOW-3618] - Fix 
models.connection.Connection get_hook function
URL: https://github.com/apache/incubator-airflow/pull/4423
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] https://issues.apache.org/jira/browse/AIRFLOW-3618
   
   ### Description
   
   - [x] The airflow.models.connection.Connection class function get_hook did 
not work as expected. The `conn_type` was not guaranteed to be lower case, and 
the get_hook function expected all lowercase type names. By updating the code 
so that the conditionals used `self.conn_type.lower()` instead of just 
`self.conn_type`, we can now make sure that the conditionals match the 
expected values. 
   
   For example:
   ```
   >>> import airflow
   >>> from airflow.hooks.base_hook import BaseHook
   >>> airflow.__version__
   '1.10.1'
   >>> conn = BaseHook.get_connection('test_local_mysql')
   [2019-01-02 11:36:15,530] {base_hook.py:83} INFO - Using connection to: 
localhost
   >>> conn.conn_type
   'MySQL'
    >>> type(conn.get_hook())
    <class 'NoneType'>
   >>> conn.conn_type = conn.conn_type.lower()
   >>> conn.conn_type
   'mysql'
    >>> type(conn.get_hook())
    <class 'airflow.hooks.mysql_hook.MySqlHook'>
   ```
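   
   A minimal sketch of the change being described, with the method body abridged to two connection types; only the `.lower()` normalisation is the point here:
   
```python
# Sketch of Connection.get_hook with the conn_type comparison normalised,
# so 'MySQL', 'mysql' and 'MYSQL' all resolve to the same hook (abridged).
def get_hook(self):
    conn_type = (self.conn_type or '').lower()
    if conn_type == 'mysql':
        from airflow.hooks.mysql_hook import MySqlHook
        return MySqlHook(mysql_conn_id=self.conn_id)
    elif conn_type == 'postgres':
        from airflow.hooks.postgres_hook import PostgresHook
        return PostgresHook(postgres_conn_id=self.conn_id)
    # ... remaining connection types elided ...
    return None
```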
   
   ### Tests
   
   - [x] Does not need tests as it is updating existing functionality which 
should already have test associated.
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] No new documentation is needed as the change is a bug fix which does 
not change the expected functionality of the function `Connection.get_hook`.
   
   ### Code Quality
   
   - [x] Passes `flake8`
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> airflow.models.connection.Connection class function get_hook doesn't work
> -
>
> Key: AIRFLOW-3618
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3618
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: models
>Affects Versions: 1.10.1
> Environment: MacOS, iTerm, python 3.5.2
>Reporter: Ryan Griffin
>Priority: Major
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The airflow.models.connection.Connection class function get_hook does not 
> work as expected. 
> It is currently using self.conn_type == 'type', where the 'type' is always 
> lowercase. In practice this is not true, and causes the get_hook function to 
> always return a NoneType. 
> Here is an example:
> {code:java}
> >>> import airflow
> >>> from airflow.hooks.base_hook import BaseHook
> >>> airflow.__version__
> '1.10.1'
> >>> conn = BaseHook.get_connection('test_local_mysql')
> [2019-01-02 11:36:15,530] {base_hook.py:83} INFO - Using connection to: 
> localhost
> >>> conn.conn_type
> 'MySQL'
> >>> type(conn.get_hook())
> <class 'NoneType'>
> >>> conn.conn_type = conn.conn_type.lower()
> >>> conn.conn_type
> 'mysql'
> >>> type(conn.get_hook())
> <class 'airflow.hooks.mysql_hook.MySqlHook'>
> {code}
>  
> As you can see, when using the BaseHook class to get the connection, the 
> conn_type is *MySQL* instead of *mysql*, so it is unable to find the correct 
> type to get the hook (reference: 
> [https://github.com/apache/incubator-airflow/blob/master/airflow/models/connection.py#L178).]
> By forcing the conn_type to be lowercase, the *get_hook* function now works. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] codecov-io commented on issue #4360: [AIRFLOW-1191] Simplify override of spark submit command

2019-01-02 Thread GitBox
codecov-io commented on issue #4360: [AIRFLOW-1191] Simplify override of spark 
submit command
URL: 
https://github.com/apache/incubator-airflow/pull/4360#issuecomment-450936401
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4360?src=pr=h1)
 Report
   > Merging 
[#4360](https://codecov.io/gh/apache/incubator-airflow/pull/4360?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/51d0aca83bc9b5c4cd073466fa7df2a43bd871bd?src=pr=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/4360/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4360?src=pr=tree)
   
   ```diff
   @@   Coverage Diff   @@
   ##   master#4360   +/-   ##
   ===
 Coverage   78.59%   78.59%   
   ===
 Files 204  204   
 Lines   1645316453   
   ===
 Hits1293212932   
 Misses   3521 3521
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4360?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
    > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4360?src=pr=footer).
 Last update 
[51d0aca...3217c8c](https://codecov.io/gh/apache/incubator-airflow/pull/4360?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Resolved] (AIRFLOW-3402) Set default kubernetes affinity and toleration settings in airflow.cfg

2019-01-02 Thread Fokko Driesprong (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong resolved AIRFLOW-3402.
---
   Resolution: Fixed
 Assignee: Kevin Pullin
Fix Version/s: (was: 1.10.2)
   2.0.0

> Set default kubernetes affinity and toleration settings in airflow.cfg
> --
>
> Key: AIRFLOW-3402
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3402
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: kubernetes
>Reporter: Kevin Pullin
>Assignee: Kevin Pullin
>Priority: Major
> Fix For: 2.0.0
>
>
> Currently airflow supports setting kubernetes `affinity` and `toleration` 
> configuration inside dags using either a `KubernetesExecutorConfig` 
> definition or using the `KubernetesPodOperator`.
> In order to reduce having to set and maintain this configuration in every 
> dag, it'd be useful to have the ability to set these globally in the 
> airflow.cfg file.  One use case is to force all kubernetes pods to run on a 
> particular set of dedicated airflow nodes, which requires both affinity rules 
> and tolerations.
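
For context, a rough sketch of the per-DAG boilerplate the issue wants to centralise in airflow.cfg. The dictionaries follow the standard Kubernetes pod spec; the node label/taint key 'dedicated' and the eventual airflow.cfg keys are assumptions, and the affinity/tolerations arguments are those exposed by the contrib KubernetesPodOperator.

```python
# Per-task configuration that would no longer need repeating in every DAG.
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

affinity = {
    'nodeAffinity': {
        'requiredDuringSchedulingIgnoredDuringExecution': {
            'nodeSelectorTerms': [{
                'matchExpressions': [
                    {'key': 'dedicated', 'operator': 'In', 'values': ['airflow']},
                ],
            }],
        },
    },
}

tolerations = [{
    'key': 'dedicated',
    'operator': 'Equal',
    'value': 'airflow',
    'effect': 'NoSchedule',
}]

run_on_dedicated_nodes = KubernetesPodOperator(
    task_id='run_on_dedicated_nodes',
    name='run-on-dedicated-nodes',
    namespace='airflow',
    image='python:3.6',
    cmds=['python', '-c', 'print("hello")'],
    affinity=affinity,
    tolerations=tolerations,
)
```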



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3622) Add ability to pass hive_conf to HiveToMySqlTransfer

2019-01-02 Thread Alice Berard (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alice Berard updated AIRFLOW-3622:
--
Issue Type: New Feature  (was: Bug)

> Add ability to pass hive_conf to HiveToMySqlTransfer
> 
>
> Key: AIRFLOW-3622
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3622
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Alice Berard
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] Fokko commented on a change in pull request #4339: [AIRFLOW-3303] Deprecate old UI in favor of FAB

2019-01-02 Thread GitBox
Fokko commented on a change in pull request #4339: [AIRFLOW-3303] Deprecate old 
UI in favor of FAB
URL: https://github.com/apache/incubator-airflow/pull/4339#discussion_r244848371
 
 

 ##
 File path: airflow/bin/cli.py
 ##
 @@ -888,7 +882,7 @@ def webserver(args):
 '-b', args.hostname + ':' + str(args.port),
 '-n', 'airflow-webserver',
 '-p', str(pid),
-'-c', 'python:airflow.www.gunicorn_config',
+# '-c', 'python:airflow.www.gunicorn_config',
 
 Review comment:
   Why is this commented out?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] griff122 opened a new pull request #4423: [AIRFLOW-3618] - Fix models.connection.Connection get_hook function

2019-01-02 Thread GitBox
griff122 opened a new pull request #4423: [AIRFLOW-3618] - Fix 
models.connection.Connection get_hook function
URL: https://github.com/apache/incubator-airflow/pull/4423
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] https://issues.apache.org/jira/browse/AIRFLOW-3618
   
   ### Description
   
   - [x] The airflow.models.connection.Connection class function get_hook did 
not work as expected. The `conn_type` was not guaranteed to be lower case, and 
the get_hook function expected all lowercase type names. By updating the code 
so that the conditionals used `self.conn_type.lower()` instead of just 
`self.conn_type`, we can now make sure that the conditionals match the 
expected values. 
   
   For example:
   ```
   >>> import airflow
   >>> from airflow.hooks.base_hook import BaseHook
   >>> airflow.__version__
   '1.10.1'
   >>> conn = BaseHook.get_connection('test_local_mysql')
   [2019-01-02 11:36:15,530] {base_hook.py:83} INFO - Using connection to: 
localhost
   >>> conn.conn_type
   'MySQL'
    >>> type(conn.get_hook())
    <class 'NoneType'>
   >>> conn.conn_type = conn.conn_type.lower()
   >>> conn.conn_type
   'mysql'
    >>> type(conn.get_hook())
    <class 'airflow.hooks.mysql_hook.MySqlHook'>
   ```
   
   ### Tests
   
   - [x] Does not need tests as it is updating existing functionality which 
should already have test associated.
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] No new documentation is needed as the change is a bug fix which does 
not change the expected functionality of the function `Connection.get_hook`.
   
   ### Code Quality
   
   - [x] Passes `flake8`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (AIRFLOW-3621) Document why packaged dags don't work with pickling enabled.

2019-01-02 Thread Thayne McCombs (JIRA)
Thayne McCombs created AIRFLOW-3621:
---

 Summary: Document why packaged dags don't work with pickling 
enabled.
 Key: AIRFLOW-3621
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3621
 Project: Apache Airflow
  Issue Type: Task
  Components: Documentation
Reporter: Thayne McCombs


The documentation for packaged DAGs states:

 

> packaged dags cannot be used with pickling turned on.

However, there is no explanation of why this is the case, or links to such an 
explanation. Please add an explanation of why these two features don't work 
together. I am assuming of course that there is a good reason. If it just 
hasn't been implemented yet, there should be an issue to implement it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3402) Set default kubernetes affinity and toleration settings in airflow.cfg

2019-01-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732404#comment-16732404
 ] 

ASF GitHub Bot commented on AIRFLOW-3402:
-

Fokko commented on pull request #4247: [AIRFLOW-3402] Support global k8s 
affinity and toleration configs
URL: https://github.com/apache/incubator-airflow/pull/4247
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Set default kubernetes affinity and toleration settings in airflow.cfg
> --
>
> Key: AIRFLOW-3402
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3402
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: kubernetes
>Reporter: Kevin Pullin
>Assignee: Kevin Pullin
>Priority: Major
> Fix For: 2.0.0
>
>
> Currently airflow supports setting kubernetes `affinity` and `toleration` 
> configuration inside dags using either a `KubernetesExecutorConfig` 
> definition or using the `KubernetesPodOperator`.
> In order to reduce having to set and maintain this configuration in every 
> dag, it'd be useful to have the ability to set these globally in the 
> airflow.cfg file.  One use case is to force all kubernetes pods to run on a 
> particular set of dedicated airflow nodes, which requires both affinity rules 
> and tolerations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] kaxil commented on issue #4363: [AIRFLOW-3560] Add WeekEnd & DayOfWeek Sensors

2019-01-02 Thread GitBox
kaxil commented on issue #4363: [AIRFLOW-3560] Add WeekEnd & DayOfWeek Sensors
URL: 
https://github.com/apache/incubator-airflow/pull/4363#issuecomment-451012148
 
 
   @Fokko  Updated to just have a single Sensor with an option to provide 
multiple days.
   
   Docs:
   
![image](https://user-images.githubusercontent.com/8811558/50616863-353b7e80-0ee2-11e9-85db-ebf7aa4b3e0c.png)
   
![image](https://user-images.githubusercontent.com/8811558/50616871-3a98c900-0ee2-11e9-85dc-f30726395bcb.png)
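   
   For anyone reading along, a hedged usage sketch of the consolidated sensor; the argument names below are inferred from the screenshots and may differ from the final API:
   
```python
# Hypothetical usage: wait until the execution day is Saturday or Sunday.
from airflow.contrib.sensors.weekday_sensor import DayOfWeekSensor

wait_for_weekend = DayOfWeekSensor(
    task_id='wait_for_weekend',
    week_day={'Saturday', 'Sunday'},   # accepts a set of days (assumed)
    use_task_execution_day=True,
)
```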
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kaxil commented on a change in pull request #4350: [AIRFLOW-3527] Cloud SQL Proxy has shorter path for UNIX socket

2019-01-02 Thread GitBox
kaxil commented on a change in pull request #4350: [AIRFLOW-3527] Cloud SQL 
Proxy has shorter path for UNIX socket
URL: https://github.com/apache/incubator-airflow/pull/4350#discussion_r244884738
 
 

 ##
 File path: airflow/contrib/hooks/gcp_sql_hook.py
 ##
 @@ -749,18 +754,39 @@ def _validate_inputs(self):
 self._check_ssl_file(self.sslcert, "sslcert")
 self._check_ssl_file(self.sslkey, "sslkey")
 self._check_ssl_file(self.sslrootcert, "sslrootcert")
+if self.use_proxy and not self.sql_proxy_use_tcp:
+if self.database_type == 'postgres':
+suffix = "/.s.PGSQL.5432"
+else:
+suffix = ""
+expected_path = "{}/{}:{}:{}{}".format(
+self._generate_unique_path(),
+self.project_id, self.instance,
+self.database, suffix)
+if len(expected_path) > UNIX_PATH_MAX:
+self.log.info("Too long ({}) path: 
{}".format(len(expected_path),
 
 Review comment:
   please change it to `%s` format
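   
   For clarity, a small self-contained sketch of the suggested style; the `UNIX_PATH_MAX` value and the example path are placeholders:
   
```python
import logging

log = logging.getLogger(__name__)

UNIX_PATH_MAX = 108  # typical Linux limit; assumed here for illustration
expected_path = "/tmp/xxx/my-project:my-instance:my-db/.s.PGSQL.5432"

if len(expected_path) > UNIX_PATH_MAX:
    # %s placeholders are interpolated lazily by the logging framework,
    # only if and when the record is actually emitted.
    log.info("Too long (%s) path: %s", len(expected_path), expected_path)
```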


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google Cloud BigTable operators

2019-01-02 Thread GitBox
kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google 
Cloud BigTable operators
URL: https://github.com/apache/incubator-airflow/pull/4354#discussion_r244886346
 
 

 ##
 File path: airflow/contrib/operators/gcp_bigtable_operator.py
 ##
 @@ -0,0 +1,424 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import google.api_core.exceptions
+
+from airflow import AirflowException
+from airflow.models import BaseOperator
+from airflow.sensors.base_sensor_operator import BaseSensorOperator
+from airflow.contrib.hooks.gcp_bigtable_hook import BigtableHook
+from airflow.utils.decorators import apply_defaults
+from google.cloud.bigtable_admin_v2 import enums
+from google.cloud.bigtable.table import ClusterState
+
+
+class BigtableValidationMixin(object):
+"""
+Common class for Cloud Bigtable operators for validating required fields.
+"""
+
+REQUIRED_ATTRIBUTES = []
+
+def _validate_inputs(self):
+for attr_name in self.REQUIRED_ATTRIBUTES:
+if not getattr(self, attr_name):
+raise AirflowException('Empty parameter: {}'.format(attr_name))
+
+
+class BigtableInstanceCreateOperator(BaseOperator, BigtableValidationMixin):
+"""
+Creates a new Cloud Bigtable instance.
+If the Cloud Bigtable instance with the given ID exists, the operator does 
not compare its configuration
+and immediately succeeds. No changes are made to the existing instance.
+
+For more details about instance creation have a look at the reference:
+
https://googleapis.github.io/google-cloud-python/latest/bigtable/instance.html#google.cloud.bigtable.instance.Instance.create
+
+:type project_id: str
+:param project_id: The ID of the GCP project.
+:type instance_id: str
+:param instance_id: The ID of the Cloud Bigtable instance to create.
+:type main_cluster_id: str
+:param main_cluster_id: The ID for main cluster for the new instance.
+:type main_cluster_zone: str
+:param main_cluster_zone: The zone for main cluster
+See https://cloud.google.com/bigtable/docs/locations for more details.
+:type replica_cluster_id: str
+:param replica_cluster_id: (optional) The ID for replica cluster for the 
new instance.
+:type replica_cluster_zone: str
+:param replica_cluster_zone: (optional)  The zone for replica cluster.
+:type instance_type: IntEnum
+:param instance_type: (optional) The type of the instance.
+:type instance_display_name: str
+:param instance_display_name: (optional) Human-readable name of the 
instance. Defaults to ``instance_id``.
+:type instance_labels: dict
+:param instance_labels: (optional) Dictionary of labels to associate with 
the instance.
+:type cluster_nodes: int
+:param cluster_nodes: (optional) Number of nodes for cluster.
+:type cluster_storage_type: IntEnum
+:param cluster_storage_type: (optional) The type of storage.
+:type timeout: int
+:param timeout: (optional) timeout (in seconds) for instance creation.
+If None is not specified, Operator will wait indefinitely.
+"""
+
+REQUIRED_ATTRIBUTES = ('project_id', 'instance_id', 'main_cluster_id', 
'main_cluster_zone')
+template_fields = ['project_id', 'instance_id', 'main_cluster_id', 
'main_cluster_zone']
+
+@apply_defaults
+def __init__(self,
+ project_id,
+ instance_id,
+ main_cluster_id,
+ main_cluster_zone,
+ replica_cluster_id=None,
+ replica_cluster_zone=None,
+ instance_display_name=None,
+ instance_type=None,
+ instance_labels=None,
+ cluster_nodes=None,
+ cluster_storage_type=None,
+ timeout=None,
+ *args, **kwargs):
+self.project_id = project_id
+self.instance_id = instance_id
+self.main_cluster_id = main_cluster_id
+self.main_cluster_zone = main_cluster_zone
+self.replica_cluster_id = replica_cluster_id
+self.replica_cluster_zone = replica_cluster_zone
+self.instance_display_name = 

[GitHub] kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google Cloud BigTable operators

2019-01-02 Thread GitBox
kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google 
Cloud BigTable operators
URL: https://github.com/apache/incubator-airflow/pull/4354#discussion_r244886061
 
 

 ##
 File path: airflow/contrib/hooks/gcp_bigtable_hook.py
 ##
 @@ -0,0 +1,232 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from google.cloud.bigtable import Client
+from google.cloud.bigtable.cluster import Cluster
+from google.cloud.bigtable.instance import Instance
+from google.cloud.bigtable.table import Table
+from google.cloud.bigtable_admin_v2 import enums
+from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
+
+
+# noinspection PyAbstractClass
+class BigtableHook(GoogleCloudBaseHook):
+"""
+Hook for Google Cloud Bigtable APIs.
+"""
+
+_client = None
+
+def __init__(self,
+ gcp_conn_id='google_cloud_default',
+ delegate_to=None):
+super(BigtableHook, self).__init__(gcp_conn_id, delegate_to)
+
+def get_client(self, project_id):
+if not self._client:
+self._client = Client(project=project_id, 
credentials=self._get_credentials(), admin=True)
+return self._client
+
+def get_instance(self, project_id, instance_id):
+"""
+Retrieves and returns the specified Cloud Bigtable instance if it 
exists.
+Otherwise, returns None.
+
+:param project_id: The ID of the GCP project.
+:type project_id: str
+:param instance_id: The ID of the Cloud Bigtable instance.
+:type instance_id: str
+"""
+
+client = self.get_client(project_id)
+
+instance = Instance(instance_id, client)
+if not instance.exists():
+return None
+return instance
+
+def delete_instance(self, project_id, instance_id):
+"""
+Deletes the specified Cloud Bigtable instance.
+Raises google.api_core.exceptions.NotFound if the Cloud Bigtable 
instance does not exist.
+
+:param project_id: The ID of the GCP project.
+:type project_id: str
+:param instance_id: The ID of the Cloud Bigtable instance.
+:type instance_id: str
+"""
+instance = Instance(instance_id, self.get_client(project_id))
+instance.delete()
+
+def create_instance(self,
+project_id,
+instance_id,
+main_cluster_id,
+main_cluster_zone,
+replica_cluster_id=None,
+replica_cluster_zone=None,
+instance_display_name=None,
+instance_type=enums.Instance.Type.TYPE_UNSPECIFIED,
+instance_labels=None,
+cluster_nodes=None,
+
cluster_storage_type=enums.StorageType.STORAGE_TYPE_UNSPECIFIED,
+timeout=None):
+"""
+Creates new instance.
+
+:type project_id: str
+:param project_id: The ID of the GCP project.
+:type instance_id: str
+:param instance_id: The ID for the new instance.
+:type main_cluster_id: str
+:param main_cluster_id: The ID for main cluster for the new instance.
+:type main_cluster_zone: str
+:param main_cluster_zone: The zone for main cluster.
+See https://cloud.google.com/bigtable/docs/locations for more 
details.
+:type replica_cluster_id: str
+:param replica_cluster_id: (optional) The ID for replica cluster for 
the new instance.
+:type replica_cluster_zone: str
+:param replica_cluster_zone: (optional)  The zone for replica cluster.
+:type instance_type: enums.Instance.Type
+:param instance_type: (optional) The type of the instance.
+:type instance_display_name: str
+:param instance_display_name: (optional) Human-readable name of the 
instance.
+Defaults to ``instance_id``.
+:type instance_labels: dict
+:param instance_labels: (optional) Dictionary of labels to associate 
with the instance.
+:type cluster_nodes: int
+:param 

[GitHub] kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google Cloud BigTable operators

2019-01-02 Thread GitBox
kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google 
Cloud BigTable operators
URL: https://github.com/apache/incubator-airflow/pull/4354#discussion_r244885772
 
 

 ##
 File path: airflow/contrib/hooks/gcp_bigtable_hook.py
 ##
 @@ -0,0 +1,232 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from google.cloud.bigtable import Client
+from google.cloud.bigtable.cluster import Cluster
+from google.cloud.bigtable.instance import Instance
+from google.cloud.bigtable.table import Table
+from google.cloud.bigtable_admin_v2 import enums
+from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
+
+
+# noinspection PyAbstractClass
+class BigtableHook(GoogleCloudBaseHook):
+"""
+Hook for Google Cloud Bigtable APIs.
+"""
+
+_client = None
+
+def __init__(self,
+ gcp_conn_id='google_cloud_default',
+ delegate_to=None):
+super(BigtableHook, self).__init__(gcp_conn_id, delegate_to)
+
+def get_client(self, project_id):
+if not self._client:
+self._client = Client(project=project_id, 
credentials=self._get_credentials(), admin=True)
+return self._client
+
+def get_instance(self, project_id, instance_id):
 
 Review comment:
   +1 for optional project_id
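   
   A hedged sketch of what an optional project_id could look like inside the hook, falling back to the connection's default project; the fallback attribute name is an assumption and the body otherwise mirrors the diff above:
   
```python
# Sketch of a BigtableHook method (Instance is already imported in the hook).
def get_instance(self, instance_id, project_id=None):
    # Fall back to the GCP connection's default project when none is given;
    # the self.project_id attribute is assumed here.
    project_id = project_id or self.project_id
    client = self.get_client(project_id)
    instance = Instance(instance_id, client)
    if not instance.exists():
        return None
    return instance
```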


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google Cloud BigTable operators

2019-01-02 Thread GitBox
kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google 
Cloud BigTable operators
URL: https://github.com/apache/incubator-airflow/pull/4354#discussion_r244886618
 
 

 ##
 File path: docs/howto/operator.rst
 ##
 @@ -361,6 +361,135 @@ More information
 See `Google Compute Engine API documentation
 
`_.
 
+Google Cloud Bigtable Operators
+
+
+Arguments
+"
+
+All examples below rely on the following variables, which can be passed via 
environment variables.
+
+.. literalinclude:: 
../../airflow/contrib/example_dags/example_gcp_bigtable_operators.py
+:language: python
+:start-after: [START howto_operator_gcp_bigtable_args]
+:end-before: [END howto_operator_gcp_bigtable_args]
+
+
+BigtableInstanceCreateOperator
+^
+
+Use the 
:class:`~airflow.contrib.operators.gcp_bigtable_operator.BigtableInstanceCreateOperator`
+to create a Google Cloud Bigtable instance.
+
+If the Cloud Bigtable instance with the given ID exists, the operator does not 
compare its configuration
+and immediately succeeds. No changes are made to the existing instance.
+
+Using the operator
+""
+
+.. literalinclude:: 
../../airflow/contrib/example_dags/example_gcp_bigtable_operators.py
+:language: python
+:dedent: 4
+:start-after: [START howto_operator_gcp_bigtable_instance_create]
+:end-before: [END howto_operator_gcp_bigtable_instance_create]
+
+
+BigtableInstanceDeleteOperator
+^
+
+Use the 
:class:`~airflow.contrib.operators.gcp_bigtable_operator.BigtableInstanceDeleteOperator`
+to delete a Google Cloud Bigtable instance.
+
+Using the operator
+""
+
+.. literalinclude:: 
../../airflow/contrib/example_dags/example_gcp_bigtable_operators.py
+:language: python
+:dedent: 4
+:start-after: [START howto_operator_gcp_bigtable_instance_delete]
+:end-before: [END howto_operator_gcp_bigtable_instance_delete]
+
+BigtableClusterUpdateOperator
+^
+
+Use the 
:class:`~airflow.contrib.operators.gcp_bigtable_operator.BigtableClusterUpdateOperator`
+to modify number of nodes in a Cloud Bigtable cluster.
+
+Using the operator
+""
+
+.. literalinclude:: 
../../airflow/contrib/example_dags/example_gcp_bigtable_operators.py
+:language: python
+:dedent: 4
+:start-after: [START howto_operator_gcp_bigtable_cluster_update]
+:end-before: [END howto_operator_gcp_bigtable_cluster_update]
+
+
+BigtableTableCreateOperator
+^
+
+Creates a table in a Cloud Bigtable instance.
+
+If the table with given ID exists in the Cloud Bigtable instance, the operator 
compares the Column Families.
+If the Column Families are identical operator succeeds. Otherwise, the 
operator fails with the appropriate
+error message.
+
+
+Using the operator
+""
+
+.. literalinclude:: 
../../airflow/contrib/example_dags/example_gcp_bigtable_operators.py
+:language: python
+:dedent: 4
+:start-after: [START howto_operator_gcp_bigtable_table_create]
+:end-before: [END howto_operator_gcp_bigtable_table_create]
+
+Advanced
+
+
+When creating a table, you can specify the optional ``initial_split_keys`` and 
``column_familes`.
+Please refer to the Python Client for Google Cloud Bigtable documentation
+`for Table 
`_ 
and `for Column
+Families 
`_.
+
+
+BigtableTableDeleteOperator
+^
+
+Use the 
:class:`~airflow.contrib.operators.gcp_bigtable_operator.BigtableTableDeleteOperator`
+to delete a table in Google Cloud Bigtable.
+
+Using the operator
+""
+
+.. literalinclude:: 
../../airflow/contrib/example_dags/example_gcp_bigtable_operators.py
+:language: python
+:dedent: 4
+:start-after: [START howto_operator_gcp_bigtable_table_delete]
+:end-before: [END howto_operator_gcp_bigtable_table_delete]
+
+BigtableTableWaitForReplicationSensor
+^
 
 Review comment:
   Add the extra `^` characters needed to match the heading length




[GitHub] kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google Cloud BigTable operators

2019-01-02 Thread GitBox
kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google 
Cloud BigTable operators
URL: https://github.com/apache/incubator-airflow/pull/4354#discussion_r244886264
 
 

 ##
 File path: airflow/contrib/operators/gcp_bigtable_operator.py
 ##
 @@ -0,0 +1,424 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import google.api_core.exceptions
+
+from airflow import AirflowException
+from airflow.models import BaseOperator
+from airflow.sensors.base_sensor_operator import BaseSensorOperator
+from airflow.contrib.hooks.gcp_bigtable_hook import BigtableHook
+from airflow.utils.decorators import apply_defaults
+from google.cloud.bigtable_admin_v2 import enums
+from google.cloud.bigtable.table import ClusterState
+
+
+class BigtableValidationMixin(object):
+"""
+Common class for Cloud Bigtable operators for validating required fields.
+"""
+
+REQUIRED_ATTRIBUTES = []
+
+def _validate_inputs(self):
+for attr_name in self.REQUIRED_ATTRIBUTES:
+if not getattr(self, attr_name):
+raise AirflowException('Empty parameter: {}'.format(attr_name))
+
+
+class BigtableInstanceCreateOperator(BaseOperator, BigtableValidationMixin):
+"""
+Creates a new Cloud Bigtable instance.
+If the Cloud Bigtable instance with the given ID exists, the operator does 
not compare its configuration
+and immediately succeeds. No changes are made to the existing instance.
+
+For more details about instance creation have a look at the reference:
+
https://googleapis.github.io/google-cloud-python/latest/bigtable/instance.html#google.cloud.bigtable.instance.Instance.create
+
+:type project_id: str
+:param project_id: The ID of the GCP project.
+:type instance_id: str
+:param instance_id: The ID of the Cloud Bigtable instance to create.
+:type main_cluster_id: str
+:param main_cluster_id: The ID for main cluster for the new instance.
+:type main_cluster_zone: str
+:param main_cluster_zone: The zone for main cluster
+See https://cloud.google.com/bigtable/docs/locations for more details.
+:type replica_cluster_id: str
+:param replica_cluster_id: (optional) The ID for replica cluster for the 
new instance.
+:type replica_cluster_zone: str
+:param replica_cluster_zone: (optional)  The zone for replica cluster.
+:type instance_type: IntEnum
+:param instance_type: (optional) The type of the instance.
+:type instance_display_name: str
+:param instance_display_name: (optional) Human-readable name of the 
instance. Defaults to ``instance_id``.
+:type instance_labels: dict
+:param instance_labels: (optional) Dictionary of labels to associate with 
the instance.
+:type cluster_nodes: int
+:param cluster_nodes: (optional) Number of nodes for cluster.
+:type cluster_storage_type: IntEnum
+:param cluster_storage_type: (optional) The type of storage.
+:type timeout: int
+:param timeout: (optional) timeout (in seconds) for instance creation.
+If None is not specified, Operator will wait indefinitely.
+"""
+
+REQUIRED_ATTRIBUTES = ('project_id', 'instance_id', 'main_cluster_id', 
'main_cluster_zone')
+template_fields = ['project_id', 'instance_id', 'main_cluster_id', 
'main_cluster_zone']
+
+@apply_defaults
+def __init__(self,
+ project_id,
+ instance_id,
+ main_cluster_id,
+ main_cluster_zone,
+ replica_cluster_id=None,
+ replica_cluster_zone=None,
+ instance_display_name=None,
+ instance_type=None,
+ instance_labels=None,
+ cluster_nodes=None,
+ cluster_storage_type=None,
+ timeout=None,
+ *args, **kwargs):
+self.project_id = project_id
+self.instance_id = instance_id
+self.main_cluster_id = main_cluster_id
+self.main_cluster_zone = main_cluster_zone
+self.replica_cluster_id = replica_cluster_id
+self.replica_cluster_zone = replica_cluster_zone
+self.instance_display_name = 

[GitHub] kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google Cloud BigTable operators

2019-01-02 Thread GitBox
kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google 
Cloud BigTable operators
URL: https://github.com/apache/incubator-airflow/pull/4354#discussion_r244886402
 
 

 ##
 File path: docs/howto/operator.rst
 ##
 @@ -361,6 +361,135 @@ More information
 See `Google Compute Engine API documentation
 
`_.
 
+Google Cloud Bigtable Operators
+
 
 Review comment:
   Remove extra dash




[GitHub] kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google Cloud BigTable operators

2019-01-02 Thread GitBox
kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google 
Cloud BigTable operators
URL: https://github.com/apache/incubator-airflow/pull/4354#discussion_r244886559
 
 

 ##
 File path: docs/howto/operator.rst
 ##
 @@ -361,6 +361,135 @@ More information
 See `Google Compute Engine API documentation
 
`_.
 
+Google Cloud Bigtable Operators
+
+
+Arguments
+"
+
+All examples below rely on the following variables, which can be passed via 
environment variables.
+
+.. literalinclude:: 
../../airflow/contrib/example_dags/example_gcp_bigtable_operators.py
+:language: python
+:start-after: [START howto_operator_gcp_bigtable_args]
+:end-before: [END howto_operator_gcp_bigtable_args]
+
+
+BigtableInstanceCreateOperator
+^
+
+Use the 
:class:`~airflow.contrib.operators.gcp_bigtable_operator.BigtableInstanceCreateOperator`
+to create a Google Cloud Bigtable instance.
+
+If the Cloud Bigtable instance with the given ID exists, the operator does not 
compare its configuration
+and immediately succeeds. No changes are made to the existing instance.
+
+Using the operator
+""
+
+.. literalinclude:: 
../../airflow/contrib/example_dags/example_gcp_bigtable_operators.py
+:language: python
+:dedent: 4
+:start-after: [START howto_operator_gcp_bigtable_instance_create]
+:end-before: [END howto_operator_gcp_bigtable_instance_create]
+
+
+BigtableInstanceDeleteOperator
+^
+
+Use the 
:class:`~airflow.contrib.operators.gcp_bigtable_operator.BigtableInstanceDeleteOperator`
+to delete a Google Cloud Bigtable instance.
+
+Using the operator
+""
+
+.. literalinclude:: 
../../airflow/contrib/example_dags/example_gcp_bigtable_operators.py
+:language: python
+:dedent: 4
+:start-after: [START howto_operator_gcp_bigtable_instance_delete]
+:end-before: [END howto_operator_gcp_bigtable_instance_delete]
+
+BigtableClusterUpdateOperator
+^
+
+Use the 
:class:`~airflow.contrib.operators.gcp_bigtable_operator.BigtableClusterUpdateOperator`
+to modify number of nodes in a Cloud Bigtable cluster.
+
+Using the operator
+""
+
+.. literalinclude:: 
../../airflow/contrib/example_dags/example_gcp_bigtable_operators.py
+:language: python
+:dedent: 4
+:start-after: [START howto_operator_gcp_bigtable_cluster_update]
+:end-before: [END howto_operator_gcp_bigtable_cluster_update]
+
+
+BigtableTableCreateOperator
+^
+
+Creates a table in a Cloud Bigtable instance.
+
+If the table with given ID exists in the Cloud Bigtable instance, the operator 
compares the Column Families.
+If the Column Families are identical operator succeeds. Otherwise, the 
operator fails with the appropriate
+error message.
+
+
+Using the operator
+""
+
+.. literalinclude:: 
../../airflow/contrib/example_dags/example_gcp_bigtable_operators.py
+:language: python
+:dedent: 4
+:start-after: [START howto_operator_gcp_bigtable_table_create]
+:end-before: [END howto_operator_gcp_bigtable_table_create]
+
+Advanced
+
+
+When creating a table, you can specify the optional ``initial_split_keys`` and 
``column_familes`.
 
 Review comment:
   Missing double back-tick
   
   ```suggestion
   When creating a table, you can specify the optional ``initial_split_keys`` 
and ``column_familes``.
   ```




[GitHub] kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google Cloud BigTable operators

2019-01-02 Thread GitBox
kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google 
Cloud BigTable operators
URL: https://github.com/apache/incubator-airflow/pull/4354#discussion_r244886645
 
 

 ##
 File path: docs/integration.rst
 ##
 @@ -814,6 +815,69 @@ Cloud SQL Hooks
 :members:
 
 
+Cloud Bigtable
+''
+
+Cloud Bigtable Operators
+""
 
 Review comment:
   Add the `"` characters needed to match the heading length




[GitHub] kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google Cloud BigTable operators

2019-01-02 Thread GitBox
kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google 
Cloud BigTable operators
URL: https://github.com/apache/incubator-airflow/pull/4354#discussion_r244885613
 
 

 ##
 File path: airflow/contrib/hooks/gcp_bigtable_hook.py
 ##
 @@ -0,0 +1,232 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from google.cloud.bigtable import Client
+from google.cloud.bigtable.cluster import Cluster
+from google.cloud.bigtable.instance import Instance
+from google.cloud.bigtable.table import Table
+from google.cloud.bigtable_admin_v2 import enums
+from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
+
+
+# noinspection PyAbstractClass
 
 Review comment:
   Can we remove this?




[GitHub] kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google Cloud BigTable operators

2019-01-02 Thread GitBox
kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google 
Cloud BigTable operators
URL: https://github.com/apache/incubator-airflow/pull/4354#discussion_r244886512
 
 

 ##
 File path: docs/howto/operator.rst
 ##
 @@ -361,6 +361,135 @@ More information
 See `Google Compute Engine API documentation
 
`_.
 
+Google Cloud Bigtable Operators
+
+
+Arguments
+"
+
+All examples below rely on the following variables, which can be passed via 
environment variables.
+
+.. literalinclude:: 
../../airflow/contrib/example_dags/example_gcp_bigtable_operators.py
+:language: python
+:start-after: [START howto_operator_gcp_bigtable_args]
+:end-before: [END howto_operator_gcp_bigtable_args]
+
+
+BigtableInstanceCreateOperator
+^
+
+Use the 
:class:`~airflow.contrib.operators.gcp_bigtable_operator.BigtableInstanceCreateOperator`
+to create a Google Cloud Bigtable instance.
+
+If the Cloud Bigtable instance with the given ID exists, the operator does not 
compare its configuration
+and immediately succeeds. No changes are made to the existing instance.
+
+Using the operator
+""
+
+.. literalinclude:: 
../../airflow/contrib/example_dags/example_gcp_bigtable_operators.py
+:language: python
+:dedent: 4
+:start-after: [START howto_operator_gcp_bigtable_instance_create]
+:end-before: [END howto_operator_gcp_bigtable_instance_create]
+
+
+BigtableInstanceDeleteOperator
+^
+
+Use the 
:class:`~airflow.contrib.operators.gcp_bigtable_operator.BigtableInstanceDeleteOperator`
+to delete a Google Cloud Bigtable instance.
+
+Using the operator
+""
+
+.. literalinclude:: 
../../airflow/contrib/example_dags/example_gcp_bigtable_operators.py
+:language: python
+:dedent: 4
+:start-after: [START howto_operator_gcp_bigtable_instance_delete]
+:end-before: [END howto_operator_gcp_bigtable_instance_delete]
+
+BigtableClusterUpdateOperator
+^
+
+Use the 
:class:`~airflow.contrib.operators.gcp_bigtable_operator.BigtableClusterUpdateOperator`
+to modify number of nodes in a Cloud Bigtable cluster.
+
+Using the operator
+""
+
+.. literalinclude:: 
../../airflow/contrib/example_dags/example_gcp_bigtable_operators.py
+:language: python
+:dedent: 4
+:start-after: [START howto_operator_gcp_bigtable_cluster_update]
+:end-before: [END howto_operator_gcp_bigtable_cluster_update]
+
+
+BigtableTableCreateOperator
+^
 
 Review comment:
   remove extra `^`
   
   




[GitHub] kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google Cloud BigTable operators

2019-01-02 Thread GitBox
kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google 
Cloud BigTable operators
URL: https://github.com/apache/incubator-airflow/pull/4354#discussion_r244886449
 
 

 ##
 File path: docs/howto/operator.rst
 ##
 @@ -361,6 +361,135 @@ More information
 See `Google Compute Engine API documentation
 
`_.
 
+Google Cloud Bigtable Operators
+
+
+Arguments
+"
+
+All examples below rely on the following variables, which can be passed via 
environment variables.
+
+.. literalinclude:: 
../../airflow/contrib/example_dags/example_gcp_bigtable_operators.py
+:language: python
+:start-after: [START howto_operator_gcp_bigtable_args]
+:end-before: [END howto_operator_gcp_bigtable_args]
+
+
+BigtableInstanceCreateOperator
+^
 
 Review comment:
   Remove extra `^`




[GitHub] kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google Cloud BigTable operators

2019-01-02 Thread GitBox
kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google 
Cloud BigTable operators
URL: https://github.com/apache/incubator-airflow/pull/4354#discussion_r244886010
 
 

 ##
 File path: airflow/contrib/hooks/gcp_bigtable_hook.py
 ##
 @@ -0,0 +1,232 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from google.cloud.bigtable import Client
+from google.cloud.bigtable.cluster import Cluster
+from google.cloud.bigtable.instance import Instance
+from google.cloud.bigtable.table import Table
+from google.cloud.bigtable_admin_v2 import enums
+from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
+
+
+# noinspection PyAbstractClass
+class BigtableHook(GoogleCloudBaseHook):
+"""
+Hook for Google Cloud Bigtable APIs.
+"""
+
+_client = None
+
+def __init__(self,
+ gcp_conn_id='google_cloud_default',
+ delegate_to=None):
+super(BigtableHook, self).__init__(gcp_conn_id, delegate_to)
+
+def get_client(self, project_id):
+if not self._client:
+self._client = Client(project=project_id, 
credentials=self._get_credentials(), admin=True)
+return self._client
+
+def get_instance(self, project_id, instance_id):
+"""
+Retrieves and returns the specified Cloud Bigtable instance if it 
exists.
+Otherwise, returns None.
+
+:param project_id: The ID of the GCP project.
+:type project_id: str
+:param instance_id: The ID of the Cloud Bigtable instance.
+:type instance_id: str
+"""
+
+client = self.get_client(project_id)
+
+instance = Instance(instance_id, client)
+if not instance.exists():
+return None
+return instance
+
+def delete_instance(self, project_id, instance_id):
+"""
+Deletes the specified Cloud Bigtable instance.
+Raises google.api_core.exceptions.NotFound if the Cloud Bigtable 
instance does not exist.
+
+:param project_id: The ID of the GCP project.
+:type project_id: str
+:param instance_id: The ID of the Cloud Bigtable instance.
+:type instance_id: str
+"""
+instance = Instance(instance_id, self.get_client(project_id))
+instance.delete()
+
+def create_instance(self,
+project_id,
+instance_id,
+main_cluster_id,
+main_cluster_zone,
+replica_cluster_id=None,
+replica_cluster_zone=None,
+instance_display_name=None,
+instance_type=enums.Instance.Type.TYPE_UNSPECIFIED,
+instance_labels=None,
+cluster_nodes=None,
+
cluster_storage_type=enums.StorageType.STORAGE_TYPE_UNSPECIFIED,
+timeout=None):
+"""
+Creates new instance.
+
+:type project_id: str
+:param project_id: The ID of the GCP project.
+:type instance_id: str
+:param instance_id: The ID for the new instance.
+:type main_cluster_id: str
+:param main_cluster_id: The ID for main cluster for the new instance.
+:type main_cluster_zone: str
+:param main_cluster_zone: The zone for main cluster.
+See https://cloud.google.com/bigtable/docs/locations for more 
details.
+:type replica_cluster_id: str
+:param replica_cluster_id: (optional) The ID for replica cluster for 
the new instance.
+:type replica_cluster_zone: str
+:param replica_cluster_zone: (optional)  The zone for replica cluster.
+:type instance_type: enums.Instance.Type
+:param instance_type: (optional) The type of the instance.
+:type instance_display_name: str
+:param instance_display_name: (optional) Human-readable name of the 
instance.
+Defaults to ``instance_id``.
+:type instance_labels: dict
+:param instance_labels: (optional) Dictionary of labels to associate 
with the instance.
+:type cluster_nodes: int
+:param 

[GitHub] kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google Cloud BigTable operators

2019-01-02 Thread GitBox
kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google 
Cloud BigTable operators
URL: https://github.com/apache/incubator-airflow/pull/4354#discussion_r244886593
 
 

 ##
 File path: docs/howto/operator.rst
 ##
 @@ -361,6 +361,135 @@ More information
 See `Google Compute Engine API documentation
 
`_.
 
+Google Cloud Bigtable Operators
+
+
+Arguments
+"
+
+All examples below rely on the following variables, which can be passed via 
environment variables.
+
+.. literalinclude:: 
../../airflow/contrib/example_dags/example_gcp_bigtable_operators.py
+:language: python
+:start-after: [START howto_operator_gcp_bigtable_args]
+:end-before: [END howto_operator_gcp_bigtable_args]
+
+
+BigtableInstanceCreateOperator
+^
+
+Use the 
:class:`~airflow.contrib.operators.gcp_bigtable_operator.BigtableInstanceCreateOperator`
+to create a Google Cloud Bigtable instance.
+
+If the Cloud Bigtable instance with the given ID exists, the operator does not 
compare its configuration
+and immediately succeeds. No changes are made to the existing instance.
+
+Using the operator
+""
+
+.. literalinclude:: 
../../airflow/contrib/example_dags/example_gcp_bigtable_operators.py
+:language: python
+:dedent: 4
+:start-after: [START howto_operator_gcp_bigtable_instance_create]
+:end-before: [END howto_operator_gcp_bigtable_instance_create]
+
+
+BigtableInstanceDeleteOperator
+^
+
+Use the 
:class:`~airflow.contrib.operators.gcp_bigtable_operator.BigtableInstanceDeleteOperator`
+to delete a Google Cloud Bigtable instance.
+
+Using the operator
+""
+
+.. literalinclude:: 
../../airflow/contrib/example_dags/example_gcp_bigtable_operators.py
+:language: python
+:dedent: 4
+:start-after: [START howto_operator_gcp_bigtable_instance_delete]
+:end-before: [END howto_operator_gcp_bigtable_instance_delete]
+
+BigtableClusterUpdateOperator
+^
+
+Use the 
:class:`~airflow.contrib.operators.gcp_bigtable_operator.BigtableClusterUpdateOperator`
+to modify number of nodes in a Cloud Bigtable cluster.
+
+Using the operator
+""
+
+.. literalinclude:: 
../../airflow/contrib/example_dags/example_gcp_bigtable_operators.py
+:language: python
+:dedent: 4
+:start-after: [START howto_operator_gcp_bigtable_cluster_update]
+:end-before: [END howto_operator_gcp_bigtable_cluster_update]
+
+
+BigtableTableCreateOperator
+^
+
+Creates a table in a Cloud Bigtable instance.
+
+If the table with given ID exists in the Cloud Bigtable instance, the operator 
compares the Column Families.
+If the Column Families are identical operator succeeds. Otherwise, the 
operator fails with the appropriate
+error message.
+
+
+Using the operator
+""
+
+.. literalinclude:: 
../../airflow/contrib/example_dags/example_gcp_bigtable_operators.py
+:language: python
+:dedent: 4
+:start-after: [START howto_operator_gcp_bigtable_table_create]
+:end-before: [END howto_operator_gcp_bigtable_table_create]
+
+Advanced
+
+
+When creating a table, you can specify the optional ``initial_split_keys`` and 
``column_familes`.
+Please refer to the Python Client for Google Cloud Bigtable documentation
+`for Table 
`_ 
and `for Column
+Families 
`_.
+
+
+BigtableTableDeleteOperator
+^
 
 Review comment:
   remove extra `^`




[GitHub] feng-tao closed pull request #4420: [AIRFLOW-3613] Updated ReadMe to add Adobe as an airflow user

2019-01-02 Thread GitBox
feng-tao closed pull request #4420: [AIRFLOW-3613] Updated ReadMe to add Adobe 
as an airflow user
URL: https://github.com/apache/incubator-airflow/pull/4420
 
 
   


diff --git a/README.md b/README.md
index 0f7c36fdc7..425df17873 100644
--- a/README.md
+++ b/README.md
@@ -109,6 +109,7 @@ Currently **officially** using Airflow:
 1. [90 Seconds](https://90seconds.tv/) 
[[@aaronmak](https://github.com/aaronmak)]
 1. [99](https://99taxis.com) [[@fbenevides](https://github.com/fbenevides), 
[@gustavoamigo](https://github.com/gustavoamigo) & 
[@mmmaia](https://github.com/mmmaia)]
 1. [AdBOOST](https://www.adboost.sk) [[AdBOOST](https://github.com/AdBOOST)]
+1. [Adobe](https://www.adobe.com/) 
[[@mishikaSingh](https://github.com/mishikaSingh), 
[@ramandumcs](https://github.com/ramandumcs), 
[@vardancse](https://github.com/vardancse)]
 1. [Agari](https://github.com/agaridata) [[@r39132](https://github.com/r39132)]
 1. [Airbnb](http://airbnb.io/) 
[[@mistercrunch](https://github.com/mistercrunch), 
[@artwr](https://github.com/artwr)]
 1. [AirDNA](https://www.airdna.co)


 




[GitHub] kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google Cloud BigTable operators

2019-01-02 Thread GitBox
kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google 
Cloud BigTable operators
URL: https://github.com/apache/incubator-airflow/pull/4354#discussion_r244885194
 
 

 ##
 File path: airflow/contrib/example_dags/example_gcp_bigtable_operators.py
 ##
 @@ -0,0 +1,149 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# 'License'); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# 'AS IS' BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+Example Airflow DAG that creates and performs following operations on Cloud 
Bigtable:
+- creates an Instance
+- creates a Table
+- updates Cluster
+- waits for Table replication completeness
+- deletes the Table
+- deletes the Instance
+
+This DAG relies on the following environment variables
+* CBT_PROJECT_ID - Google Cloud Platform project
 
 Review comment:
   prefix it with `GCP_`
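   Purely for illustration, the renamed example-DAG variables might then read as follows (variable names and defaults here are assumptions, not the final diff):

   ```python
   import os

   # Hypothetical variable block for the example DAG; only the project
   # variable is renamed to carry the GCP_ prefix the reviewer asks for.
   GCP_PROJECT_ID = os.environ.get('GCP_PROJECT_ID', 'example-project')
   CBT_INSTANCE_ID = os.environ.get('CBT_INSTANCE_ID', 'some-instance-id')
   CBT_CLUSTER_ID = os.environ.get('CBT_CLUSTER_ID', 'some-cluster-id')
   ```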




[GitHub] kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google Cloud BigTable operators

2019-01-02 Thread GitBox
kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google 
Cloud BigTable operators
URL: https://github.com/apache/incubator-airflow/pull/4354#discussion_r244886372
 
 

 ##
 File path: airflow/contrib/operators/gcp_bigtable_operator.py
 ##
 @@ -0,0 +1,424 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import google.api_core.exceptions
+
+from airflow import AirflowException
+from airflow.models import BaseOperator
+from airflow.sensors.base_sensor_operator import BaseSensorOperator
+from airflow.contrib.hooks.gcp_bigtable_hook import BigtableHook
+from airflow.utils.decorators import apply_defaults
+from google.cloud.bigtable_admin_v2 import enums
+from google.cloud.bigtable.table import ClusterState
+
+
+class BigtableValidationMixin(object):
+"""
+Common class for Cloud Bigtable operators for validating required fields.
+"""
+
+REQUIRED_ATTRIBUTES = []
+
+def _validate_inputs(self):
+for attr_name in self.REQUIRED_ATTRIBUTES:
+if not getattr(self, attr_name):
+raise AirflowException('Empty parameter: {}'.format(attr_name))
+
+
+class BigtableInstanceCreateOperator(BaseOperator, BigtableValidationMixin):
+"""
+Creates a new Cloud Bigtable instance.
+If the Cloud Bigtable instance with the given ID exists, the operator does 
not compare its configuration
+and immediately succeeds. No changes are made to the existing instance.
+
+For more details about instance creation have a look at the reference:
+
https://googleapis.github.io/google-cloud-python/latest/bigtable/instance.html#google.cloud.bigtable.instance.Instance.create
+
+:type project_id: str
+:param project_id: The ID of the GCP project.
+:type instance_id: str
+:param instance_id: The ID of the Cloud Bigtable instance to create.
+:type main_cluster_id: str
+:param main_cluster_id: The ID for main cluster for the new instance.
+:type main_cluster_zone: str
+:param main_cluster_zone: The zone for main cluster
+See https://cloud.google.com/bigtable/docs/locations for more details.
+:type replica_cluster_id: str
+:param replica_cluster_id: (optional) The ID for replica cluster for the 
new instance.
+:type replica_cluster_zone: str
+:param replica_cluster_zone: (optional)  The zone for replica cluster.
+:type instance_type: IntEnum
+:param instance_type: (optional) The type of the instance.
+:type instance_display_name: str
+:param instance_display_name: (optional) Human-readable name of the 
instance. Defaults to ``instance_id``.
+:type instance_labels: dict
+:param instance_labels: (optional) Dictionary of labels to associate with 
the instance.
+:type cluster_nodes: int
+:param cluster_nodes: (optional) Number of nodes for cluster.
+:type cluster_storage_type: IntEnum
+:param cluster_storage_type: (optional) The type of storage.
+:type timeout: int
+:param timeout: (optional) timeout (in seconds) for instance creation.
+If None is not specified, Operator will wait indefinitely.
+"""
+
+REQUIRED_ATTRIBUTES = ('project_id', 'instance_id', 'main_cluster_id', 
'main_cluster_zone')
+template_fields = ['project_id', 'instance_id', 'main_cluster_id', 
'main_cluster_zone']
+
+@apply_defaults
+def __init__(self,
+ project_id,
+ instance_id,
+ main_cluster_id,
+ main_cluster_zone,
+ replica_cluster_id=None,
+ replica_cluster_zone=None,
+ instance_display_name=None,
+ instance_type=None,
+ instance_labels=None,
+ cluster_nodes=None,
+ cluster_storage_type=None,
+ timeout=None,
+ *args, **kwargs):
+self.project_id = project_id
+self.instance_id = instance_id
+self.main_cluster_id = main_cluster_id
+self.main_cluster_zone = main_cluster_zone
+self.replica_cluster_id = replica_cluster_id
+self.replica_cluster_zone = replica_cluster_zone
+self.instance_display_name = 

[GitHub] kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google Cloud BigTable operators

2019-01-02 Thread GitBox
kaxil commented on a change in pull request #4354: [AIRFLOW-3446] Add Google 
Cloud BigTable operators
URL: https://github.com/apache/incubator-airflow/pull/4354#discussion_r244886037
 
 

 ##
 File path: airflow/contrib/hooks/gcp_bigtable_hook.py
 ##
 @@ -0,0 +1,232 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from google.cloud.bigtable import Client
+from google.cloud.bigtable.cluster import Cluster
+from google.cloud.bigtable.instance import Instance
+from google.cloud.bigtable.table import Table
+from google.cloud.bigtable_admin_v2 import enums
+from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
+
+
+# noinspection PyAbstractClass
+class BigtableHook(GoogleCloudBaseHook):
+"""
+Hook for Google Cloud Bigtable APIs.
+"""
+
+_client = None
+
+def __init__(self,
+ gcp_conn_id='google_cloud_default',
+ delegate_to=None):
+super(BigtableHook, self).__init__(gcp_conn_id, delegate_to)
+
+def get_client(self, project_id):
+if not self._client:
+self._client = Client(project=project_id, 
credentials=self._get_credentials(), admin=True)
+return self._client
+
+def get_instance(self, project_id, instance_id):
+"""
+Retrieves and returns the specified Cloud Bigtable instance if it 
exists.
+Otherwise, returns None.
+
+:param project_id: The ID of the GCP project.
+:type project_id: str
+:param instance_id: The ID of the Cloud Bigtable instance.
+:type instance_id: str
+"""
+
+client = self.get_client(project_id)
+
+instance = Instance(instance_id, client)
+if not instance.exists():
+return None
+return instance
+
+def delete_instance(self, project_id, instance_id):
+"""
+Deletes the specified Cloud Bigtable instance.
+Raises google.api_core.exceptions.NotFound if the Cloud Bigtable 
instance does not exist.
+
+:param project_id: The ID of the GCP project.
+:type project_id: str
+:param instance_id: The ID of the Cloud Bigtable instance.
+:type instance_id: str
+"""
+instance = Instance(instance_id, self.get_client(project_id))
+instance.delete()
+
+def create_instance(self,
+project_id,
+instance_id,
+main_cluster_id,
+main_cluster_zone,
+replica_cluster_id=None,
+replica_cluster_zone=None,
+instance_display_name=None,
+instance_type=enums.Instance.Type.TYPE_UNSPECIFIED,
+instance_labels=None,
+cluster_nodes=None,
+
cluster_storage_type=enums.StorageType.STORAGE_TYPE_UNSPECIFIED,
+timeout=None):
+"""
+Creates new instance.
+
+:type project_id: str
+:param project_id: The ID of the GCP project.
+:type instance_id: str
+:param instance_id: The ID for the new instance.
+:type main_cluster_id: str
+:param main_cluster_id: The ID for main cluster for the new instance.
+:type main_cluster_zone: str
+:param main_cluster_zone: The zone for main cluster.
+See https://cloud.google.com/bigtable/docs/locations for more 
details.
+:type replica_cluster_id: str
+:param replica_cluster_id: (optional) The ID for replica cluster for 
the new instance.
+:type replica_cluster_zone: str
+:param replica_cluster_zone: (optional)  The zone for replica cluster.
+:type instance_type: enums.Instance.Type
+:param instance_type: (optional) The type of the instance.
+:type instance_display_name: str
+:param instance_display_name: (optional) Human-readable name of the 
instance.
+Defaults to ``instance_id``.
+:type instance_labels: dict
+:param instance_labels: (optional) Dictionary of labels to associate 
with the instance.
+:type cluster_nodes: int
+:param 

[jira] [Created] (AIRFLOW-3623) Support download log file from UI

2019-01-02 Thread Ping Zhang (JIRA)
Ping Zhang created AIRFLOW-3623:
---

 Summary: Support download log file from UI
 Key: AIRFLOW-3623
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3623
 Project: Apache Airflow
  Issue Type: Improvement
  Components: ui
Reporter: Ping Zhang
Assignee: Ping Zhang


For some large log files, it is not a good idea to fetch and render them in the UI. 
Add the ability to let users download the log by try_number from the DAG modal.
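
A rough, purely illustrative sketch (not Airflow's actual webserver code) of how an endpoint could stream a log file as an attachment instead of rendering it; the function and parameter names are hypothetical:

```python
from flask import Response


def download_task_log(log_path, try_number):
    """Hypothetical helper: stream a task log file as a browser download."""
    def generate():
        # Read the file in chunks so large logs are not loaded into memory.
        with open(log_path, 'rb') as handle:
            for chunk in iter(lambda: handle.read(64 * 1024), b''):
                yield chunk

    return Response(
        generate(),
        mimetype='text/plain',
        headers={'Content-Disposition':
                 'attachment; filename=attempt-{}.log'.format(try_number)},
    )
```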





[GitHub] kaxil commented on a change in pull request #4353: [AIRFLOW-3480] Add Database Deploy/Update/Delete operators

2019-01-02 Thread GitBox
kaxil commented on a change in pull request #4353: [AIRFLOW-3480] Add Database 
Deploy/Update/Delete operators
URL: https://github.com/apache/incubator-airflow/pull/4353#discussion_r244882596
 
 

 ##
 File path: airflow/contrib/operators/gcp_spanner_operator.py
 ##
 @@ -197,3 +200,201 @@ def execute(self, context):
 def sanitize_queries(queries):
 if len(queries) and queries[-1] == '':
 del queries[-1]
+
+
+class CloudSpannerInstanceDatabaseDeployOperator(BaseOperator):
+"""
+Creates a new Cloud Spanner database, or if database exists,
+the operator does nothing.
+
+
+:param project_id: The ID of the project that owns the Cloud Spanner 
Database.
+:type project_id: str
+:param instance_id: The Cloud Spanner instance ID.
+:type instance_id: str
+:param database_id: The Cloud Spanner database ID.
+:type database_id: str
+:param ddl_statements: The string list containing DDL for the new database.
+:type ddl_statements: [str]
+:param gcp_conn_id: The connection ID used to connect to Google Cloud 
Platform.
+:type gcp_conn_id: str
+"""
+# [START gcp_spanner_database_deploy_template_fields]
+template_fields = ('project_id', 'instance_id', 'database_id', 
'ddl_statements',
+   'gcp_conn_id')
+template_ext = ('.sql', )
+# [END gcp_spanner_database_deploy_template_fields]
+
+@apply_defaults
+def __init__(self,
+ project_id,
+ instance_id,
+ database_id,
+ ddl_statements,
+ gcp_conn_id='google_cloud_default',
+ *args, **kwargs):
+# type: (str, str, str, [str], str, object, object) -> None
+self.instance_id = instance_id
+self.project_id = project_id
+self.database_id = database_id
+self.ddl_statements = ddl_statements
+self.gcp_conn_id = gcp_conn_id
+self._validate_inputs()
+self._hook = CloudSpannerHook(gcp_conn_id=gcp_conn_id)
+super(CloudSpannerInstanceDatabaseDeployOperator, 
self).__init__(*args, **kwargs)
+
+def _validate_inputs(self):
+if not self.project_id:
+raise AirflowException("The required parameter 'project_id' is 
empty")
+if not self.instance_id:
+raise AirflowException("The required parameter 'instance_id' is 
empty")
+if not self.database_id:
+raise AirflowException("The required parameter 'database_id' is 
empty")
+if not self.ddl_statements:
+raise AirflowException("The required parameter 'ddl_statements' is 
empty")
+
+def execute(self, context):
+if not self._hook.get_database(self.project_id,
+   self.instance_id,
+   self.database_id):
+self.log.info("Creating Cloud Spanner database "
+  "'{}' in project '{}' and instance '{}'".
+  format(self.database_id, self.project_id, 
self.instance_id))
+return self._hook.create_database(project_id=self.project_id,
+  instance_id=self.instance_id,
+  database_id=self.database_id,
+  
ddl_statements=self.ddl_statements)
+else:
+self.log.info("The database '{}' in project '{}' and instance '{}'"
+  " already exists. Nothing to do. Exiting.".
+  format(self.database_id, self.project_id, 
self.instance_id))
+return True
+
+
+class CloudSpannerInstanceDatabaseUpdateOperator(BaseOperator):
+"""
+Updates a Cloud Spanner database with the specified DDL statement.
+
+:param project_id: The ID of the project that owns the the Cloud Spanner 
Database.
+:type project_id: str
+:param instance_id: The Cloud Spanner instance ID.
+:type instance_id: str
+:param database_id: The Cloud Spanner database ID.
+:type database_id: str
+:param ddl_statements: The string list containing DDL to apply to the 
database.
+:type ddl_statements: [str]
+:param operation_id: (Optional) Unique per database operation id that can
+   be specified to implement idempotency check.
+:type operation_id: [str]
+:param gcp_conn_id: The connection ID used to connect to Google Cloud 
Platform.
+:type gcp_conn_id: str
+"""
+# [START gcp_spanner_database_update_template_fields]
+template_fields = ('project_id', 'instance_id', 'database_id', 
'ddl_statements',
+   'gcp_conn_id')
+template_ext = ('.sql', )
+# [END gcp_spanner_database_update_template_fields]
+
+@apply_defaults
+def __init__(self,
+ project_id,
+ instance_id,
+ database_id,
+ ddl_statements,
+ 

[GitHub] kaxil commented on a change in pull request #4353: [AIRFLOW-3480] Add Database Deploy/Update/Delete operators

2019-01-02 Thread GitBox
kaxil commented on a change in pull request #4353: [AIRFLOW-3480] Add Database 
Deploy/Update/Delete operators
URL: https://github.com/apache/incubator-airflow/pull/4353#discussion_r244882559
 
 

 ##
 File path: airflow/contrib/operators/gcp_spanner_operator.py
 ##
 @@ -197,3 +200,201 @@ def execute(self, context):
 def sanitize_queries(queries):
 if len(queries) and queries[-1] == '':
 del queries[-1]
+
+
+class CloudSpannerInstanceDatabaseDeployOperator(BaseOperator):
+"""
+Creates a new Cloud Spanner database, or if database exists,
+the operator does nothing.
+
+
+:param project_id: The ID of the project that owns the Cloud Spanner 
Database.
+:type project_id: str
+:param instance_id: The Cloud Spanner instance ID.
+:type instance_id: str
+:param database_id: The Cloud Spanner database ID.
+:type database_id: str
+:param ddl_statements: The string list containing DDL for the new database.
+:type ddl_statements: [str]
+:param gcp_conn_id: The connection ID used to connect to Google Cloud 
Platform.
+:type gcp_conn_id: str
+"""
+# [START gcp_spanner_database_deploy_template_fields]
+template_fields = ('project_id', 'instance_id', 'database_id', 
'ddl_statements',
+   'gcp_conn_id')
+template_ext = ('.sql', )
+# [END gcp_spanner_database_deploy_template_fields]
+
+@apply_defaults
+def __init__(self,
+ project_id,
+ instance_id,
+ database_id,
+ ddl_statements,
+ gcp_conn_id='google_cloud_default',
+ *args, **kwargs):
+# type: (str, str, str, [str], str, object, object) -> None
+self.instance_id = instance_id
+self.project_id = project_id
+self.database_id = database_id
+self.ddl_statements = ddl_statements
+self.gcp_conn_id = gcp_conn_id
+self._validate_inputs()
+self._hook = CloudSpannerHook(gcp_conn_id=gcp_conn_id)
+super(CloudSpannerInstanceDatabaseDeployOperator, 
self).__init__(*args, **kwargs)
+
+def _validate_inputs(self):
+if not self.project_id:
+raise AirflowException("The required parameter 'project_id' is 
empty")
+if not self.instance_id:
+raise AirflowException("The required parameter 'instance_id' is 
empty")
+if not self.database_id:
+raise AirflowException("The required parameter 'database_id' is 
empty")
+if not self.ddl_statements:
+raise AirflowException("The required parameter 'ddl_statements' is 
empty")
+
+def execute(self, context):
+if not self._hook.get_database(self.project_id,
+   self.instance_id,
+   self.database_id):
+self.log.info("Creating Cloud Spanner database "
+  "'{}' in project '{}' and instance '{}'".
+  format(self.database_id, self.project_id, 
self.instance_id))
+return self._hook.create_database(project_id=self.project_id,
+  instance_id=self.instance_id,
+  database_id=self.database_id,
+  
ddl_statements=self.ddl_statements)
+else:
+self.log.info("The database '{}' in project '{}' and instance '{}'"
+  " already exists. Nothing to do. Exiting.".
+  format(self.database_id, self.project_id, 
self.instance_id))
+return True
+
+
+class CloudSpannerInstanceDatabaseUpdateOperator(BaseOperator):
+"""
+Updates a Cloud Spanner database with the specified DDL statement.
+
+:param project_id: The ID of the project that owns the the Cloud Spanner 
Database.
+:type project_id: str
+:param instance_id: The Cloud Spanner instance ID.
+:type instance_id: str
+:param database_id: The Cloud Spanner database ID.
+:type database_id: str
+:param ddl_statements: The string list containing DDL to apply to the 
database.
+:type ddl_statements: [str]
+:param operation_id: (Optional) Unique per database operation id that can
+   be specified to implement idempotency check.
+:type operation_id: [str]
+:param gcp_conn_id: The connection ID used to connect to Google Cloud 
Platform.
+:type gcp_conn_id: str
+"""
+# [START gcp_spanner_database_update_template_fields]
+template_fields = ('project_id', 'instance_id', 'database_id', 
'ddl_statements',
+   'gcp_conn_id')
+template_ext = ('.sql', )
+# [END gcp_spanner_database_update_template_fields]
+
+@apply_defaults
+def __init__(self,
+ project_id,
+ instance_id,
+ database_id,
+ ddl_statements,
+ 

[GitHub] kaxil commented on a change in pull request #4353: [AIRFLOW-3480] Add Database Deploy/Update/Delete operators

2019-01-02 Thread GitBox
kaxil commented on a change in pull request #4353: [AIRFLOW-3480] Add Database 
Deploy/Update/Delete operators
URL: https://github.com/apache/incubator-airflow/pull/4353#discussion_r244881394
 
 

 ##
 File path: airflow/contrib/hooks/gcp_spanner_hook.py
 ##
 @@ -168,39 +168,171 @@ def delete_instance(self, project_id, instance_id):
 """
 Deletes an existing Cloud Spanner instance.
 
-:param project_id: The ID of the project which owns the instances, 
tables and data.
+:param project_id: The ID of the GCP project that owns the Cloud 
Spanner database.
 :type project_id: str
-:param instance_id: The ID of the instance.
+:param instance_id:  The ID of the Cloud Spanner instance.
 :type instance_id: str
 """
-client = self.get_client(project_id)
-instance = client.instance(instance_id)
+instance = self.get_client(project_id).instance(instance_id)
 try:
 instance.delete()
 return True
 except GoogleAPICallError as e:
-self.log.error('An error occurred: %s. Aborting.', e.message)
+self.log.error('An error occurred: %s. Exiting.', e.message)
+raise e
+
+def get_database(self, project_id, instance_id, database_id):
+# type: (str, str, str) -> Optional[Database]
+"""
+Retrieves a database in Cloud Spanner. If the database does not exist
+in the specified instance, it returns None.
+
+:param project_id: The ID of the GCP project that owns the Cloud 
Spanner database.
+:type project_id: str
+:param instance_id: The ID of the Cloud Spanner instance.
+:type instance_id: str
+:param database_id: The ID of the database in Cloud Spanner.
+:type database_id: str
+:return:
+"""
+
+instance = self.get_client(project_id=project_id).instance(
+instance_id=instance_id)
+if not instance.exists():
+raise AirflowException("The instance {} does not exist in project 
{} !".
+   format(instance_id, project_id))
+database = instance.database(database_id=database_id)
+if not database.exists():
+return None
+else:
+return database
+
+def create_database(self, project_id, instance_id, database_id, 
ddl_statements):
+# type: (str, str, str, [str]) -> bool
+"""
+Creates a new database in Cloud Spanner.
+
+:param project_id: The ID of the GCP project that owns the Cloud 
Spanner database.
+:type project_id: str
+:param instance_id: The ID of the Cloud Spanner instance.
+:type instance_id: str
+:param database_id: The ID of the database to create in Cloud Spanner.
+:type database_id: str
+:param ddl_statements: The string list containing DDL for the new 
database.
+:type ddl_statements: [str]
+:return:
+"""
+
+instance = self.get_client(project_id=project_id).instance(
+instance_id=instance_id)
+if not instance.exists():
+raise AirflowException("The instance {} does not exist in project 
{} !".
+   format(instance_id, project_id))
+database = instance.database(database_id=database_id,
+ ddl_statements=ddl_statements)
+try:
+operation = database.create()  # type: Operation
+except GoogleAPICallError as e:
+self.log.error('An error occurred: %s. Exiting.', e.message)
+raise e
+
+if operation:
+result = operation.result()
+self.log.info(result)
+return True
 
 Review comment:
   Good point @jmcarp 




[GitHub] kaxil commented on a change in pull request #4353: [AIRFLOW-3480] Add Database Deploy/Update/Delete operators

2019-01-02 Thread GitBox
kaxil commented on a change in pull request #4353: [AIRFLOW-3480] Add Database 
Deploy/Update/Delete operators
URL: https://github.com/apache/incubator-airflow/pull/4353#discussion_r244882408
 
 

 ##
 File path: airflow/contrib/operators/gcp_spanner_operator.py
 ##
 @@ -197,3 +200,201 @@ def execute(self, context):
 def sanitize_queries(queries):
 if len(queries) and queries[-1] == '':
 del queries[-1]
+
+
+class CloudSpannerInstanceDatabaseDeployOperator(BaseOperator):
+"""
+Creates a new Cloud Spanner database, or if database exists,
+the operator does nothing.
+
+
+:param project_id: The ID of the project that owns the Cloud Spanner 
Database.
+:type project_id: str
+:param instance_id: The Cloud Spanner instance ID.
+:type instance_id: str
+:param database_id: The Cloud Spanner database ID.
+:type database_id: str
+:param ddl_statements: The string list containing DDL for the new database.
+:type ddl_statements: [str]
+:param gcp_conn_id: The connection ID used to connect to Google Cloud 
Platform.
+:type gcp_conn_id: str
+"""
+# [START gcp_spanner_database_deploy_template_fields]
+template_fields = ('project_id', 'instance_id', 'database_id', 
'ddl_statements',
+   'gcp_conn_id')
+template_ext = ('.sql', )
+# [END gcp_spanner_database_deploy_template_fields]
+
+@apply_defaults
+def __init__(self,
+ project_id,
+ instance_id,
+ database_id,
+ ddl_statements,
+ gcp_conn_id='google_cloud_default',
+ *args, **kwargs):
+# type: (str, str, str, [str], str, object, object) -> None
+self.instance_id = instance_id
+self.project_id = project_id
+self.database_id = database_id
+self.ddl_statements = ddl_statements
+self.gcp_conn_id = gcp_conn_id
+self._validate_inputs()
+self._hook = CloudSpannerHook(gcp_conn_id=gcp_conn_id)
+super(CloudSpannerInstanceDatabaseDeployOperator, 
self).__init__(*args, **kwargs)
+
+def _validate_inputs(self):
+if not self.project_id:
+raise AirflowException("The required parameter 'project_id' is 
empty")
+if not self.instance_id:
+raise AirflowException("The required parameter 'instance_id' is 
empty")
+if not self.database_id:
+raise AirflowException("The required parameter 'database_id' is 
empty")
+if not self.ddl_statements:
+raise AirflowException("The required parameter 'ddl_statements' is 
empty")
+
+def execute(self, context):
+if not self._hook.get_database(self.project_id,
+   self.instance_id,
+   self.database_id):
+self.log.info("Creating Cloud Spanner database "
+  "'{}' in project '{}' and instance '{}'".
+  format(self.database_id, self.project_id, 
self.instance_id))
+return self._hook.create_database(project_id=self.project_id,
+  instance_id=self.instance_id,
+  database_id=self.database_id,
+  
ddl_statements=self.ddl_statements)
+else:
+self.log.info("The database '{}' in project '{}' and instance '{}'"
 
 Review comment:
   Update it to `%s` formatting so that the parameters are passed to the logger and 
interpolated correctly regardless of the configured log level
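   A minimal sketch of the suggested change for the call quoted above (paraphrased, not the final diff); with `%s`-style arguments the logging framework interpolates them only when the record is actually emitted:

   ```python
   import logging

   log = logging.getLogger(__name__)

   # Placeholder values standing in for self.database_id etc. in the operator.
   database_id, project_id, instance_id = 'example-db', 'example-project', 'example-instance'

   log.info(
       "The database '%s' in project '%s' and instance '%s' already exists. "
       "Nothing to do. Exiting.",
       database_id, project_id, instance_id,
   )
   ```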




[GitHub] kaxil commented on a change in pull request #4353: [AIRFLOW-3480] Add Database Deploy/Update/Delete operators

2019-01-02 Thread GitBox
kaxil commented on a change in pull request #4353: [AIRFLOW-3480] Add Database 
Deploy/Update/Delete operators
URL: https://github.com/apache/incubator-airflow/pull/4353#discussion_r244882051
 
 

 ##
 File path: airflow/contrib/operators/gcp_spanner_operator.py
 ##
 @@ -197,3 +200,201 @@ def execute(self, context):
 def sanitize_queries(queries):
 if len(queries) and queries[-1] == '':
 del queries[-1]
+
+
+class CloudSpannerInstanceDatabaseDeployOperator(BaseOperator):
+"""
+Creates a new Cloud Spanner database, or if database exists,
+the operator does nothing.
+
+
+:param project_id: The ID of the project that owns the Cloud Spanner 
Database.
+:type project_id: str
+:param instance_id: The Cloud Spanner instance ID.
+:type instance_id: str
+:param database_id: The Cloud Spanner database ID.
+:type database_id: str
+:param ddl_statements: The string list containing DDL for the new database.
+:type ddl_statements: [str]
+:param gcp_conn_id: The connection ID used to connect to Google Cloud 
Platform.
+:type gcp_conn_id: str
+"""
+# [START gcp_spanner_database_deploy_template_fields]
+template_fields = ('project_id', 'instance_id', 'database_id', 
'ddl_statements',
+   'gcp_conn_id')
+template_ext = ('.sql', )
+# [END gcp_spanner_database_deploy_template_fields]
+
+@apply_defaults
+def __init__(self,
+ project_id,
+ instance_id,
+ database_id,
+ ddl_statements,
+ gcp_conn_id='google_cloud_default',
+ *args, **kwargs):
+# type: (str, str, str, [str], str, object, object) -> None
 
 Review comment:
   This is not needed if you have docstrings
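   A small illustrative sketch (hypothetical class, not the operator under review) showing 
the type information carried only by the Sphinx-style docstring, with no separate 
`# type:` comment:
   
    ```python
    class ExampleSpannerOperator(object):
        """
        Illustrative only.

        :param project_id: The ID of the GCP project.
        :type project_id: str
        :param ddl_statements: The list of DDL statements to apply.
        :type ddl_statements: list[str]
        """

        def __init__(self, project_id, ddl_statements):
            # No "# type: (...) -> None" comment needed; the docstring above
            # already documents the parameter types.
            self.project_id = project_id
            self.ddl_statements = ddl_statements
    ```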


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kaxil commented on a change in pull request #4353: [AIRFLOW-3480] Add Database Deploy/Update/Delete operators

2019-01-02 Thread GitBox
kaxil commented on a change in pull request #4353: [AIRFLOW-3480] Add Database 
Deploy/Update/Delete operators
URL: https://github.com/apache/incubator-airflow/pull/4353#discussion_r244882475
 
 

 ##
 File path: airflow/contrib/operators/gcp_spanner_operator.py
 ##
 @@ -197,3 +200,201 @@ def execute(self, context):
 def sanitize_queries(queries):
 if len(queries) and queries[-1] == '':
 del queries[-1]
+
+
+class CloudSpannerInstanceDatabaseDeployOperator(BaseOperator):
+"""
+Creates a new Cloud Spanner database, or if database exists,
+the operator does nothing.
+
+
+:param project_id: The ID of the project that owns the Cloud Spanner 
Database.
+:type project_id: str
+:param instance_id: The Cloud Spanner instance ID.
+:type instance_id: str
+:param database_id: The Cloud Spanner database ID.
+:type database_id: str
+:param ddl_statements: The string list containing DDL for the new database.
+:type ddl_statements: [str]
+:param gcp_conn_id: The connection ID used to connect to Google Cloud 
Platform.
+:type gcp_conn_id: str
+"""
+# [START gcp_spanner_database_deploy_template_fields]
+template_fields = ('project_id', 'instance_id', 'database_id', 
'ddl_statements',
+   'gcp_conn_id')
+template_ext = ('.sql', )
+# [END gcp_spanner_database_deploy_template_fields]
+
+@apply_defaults
+def __init__(self,
+ project_id,
+ instance_id,
+ database_id,
+ ddl_statements,
+ gcp_conn_id='google_cloud_default',
+ *args, **kwargs):
+# type: (str, str, str, [str], str, object, object) -> None
+self.instance_id = instance_id
+self.project_id = project_id
+self.database_id = database_id
+self.ddl_statements = ddl_statements
+self.gcp_conn_id = gcp_conn_id
+self._validate_inputs()
+self._hook = CloudSpannerHook(gcp_conn_id=gcp_conn_id)
+super(CloudSpannerInstanceDatabaseDeployOperator, 
self).__init__(*args, **kwargs)
+
+def _validate_inputs(self):
+if not self.project_id:
+raise AirflowException("The required parameter 'project_id' is 
empty")
+if not self.instance_id:
+raise AirflowException("The required parameter 'instance_id' is 
empty")
+if not self.database_id:
+raise AirflowException("The required parameter 'database_id' is 
empty")
+if not self.ddl_statements:
+raise AirflowException("The required parameter 'ddl_statements' is 
empty")
+
+def execute(self, context):
+if not self._hook.get_database(self.project_id,
+   self.instance_id,
+   self.database_id):
+self.log.info("Creating Cloud Spanner database "
+  "'{}' in project '{}' and instance '{}'".
+  format(self.database_id, self.project_id, 
self.instance_id))
+return self._hook.create_database(project_id=self.project_id,
+  instance_id=self.instance_id,
+  database_id=self.database_id,
+  
ddl_statements=self.ddl_statements)
+else:
+self.log.info("The database '{}' in project '{}' and instance '{}'"
+  " already exists. Nothing to do. Exiting.".
+  format(self.database_id, self.project_id, 
self.instance_id))
+return True
+
+
+class CloudSpannerInstanceDatabaseUpdateOperator(BaseOperator):
+"""
+Updates a Cloud Spanner database with the specified DDL statement.
+
+:param project_id: The ID of the project that owns the the Cloud Spanner 
Database.
+:type project_id: str
+:param instance_id: The Cloud Spanner instance ID.
+:type instance_id: str
+:param database_id: The Cloud Spanner database ID.
+:type database_id: str
+:param ddl_statements: The string list containing DDL to apply to the 
database.
+:type ddl_statements: [str]
+:param operation_id: (Optional) Unique per database operation id that can
+   be specified to implement idempotency check.
+:type operation_id: [str]
+:param gcp_conn_id: The connection ID used to connect to Google Cloud 
Platform.
+:type gcp_conn_id: str
+"""
+# [START gcp_spanner_database_update_template_fields]
+template_fields = ('project_id', 'instance_id', 'database_id', 
'ddl_statements',
+   'gcp_conn_id')
+template_ext = ('.sql', )
+# [END gcp_spanner_database_update_template_fields]
+
+@apply_defaults
+def __init__(self,
+ project_id,
 
 Review comment:
   optional


This is an automated 

[GitHub] kaxil commented on a change in pull request #4353: [AIRFLOW-3480] Add Database Deploy/Update/Delete operators

2019-01-02 Thread GitBox
kaxil commented on a change in pull request #4353: [AIRFLOW-3480] Add Database 
Deploy/Update/Delete operators
URL: https://github.com/apache/incubator-airflow/pull/4353#discussion_r244881606
 
 

 ##
 File path: airflow/contrib/hooks/gcp_spanner_hook.py
 ##
 @@ -168,39 +168,175 @@ def delete_instance(self, project_id, instance_id):
 """
 Deletes an existing Cloud Spanner instance.
 
-:param project_id: The ID of the project which owns the instances, 
tables and data.
+:param project_id: The ID of the GCP project that owns the Cloud 
Spanner database.
 :type project_id: str
-:param instance_id: The ID of the instance.
+:param instance_id:  The ID of the Cloud Spanner instance.
 :type instance_id: str
 """
-client = self.get_client(project_id)
-instance = client.instance(instance_id)
+instance = self.get_client(project_id).instance(instance_id)
 try:
 instance.delete()
 return True
 except GoogleAPICallError as e:
-self.log.error('An error occurred: %s. Aborting.', e.message)
+self.log.error('An error occurred: %s. Exiting.', e.message)
+raise e
+
+def get_database(self, project_id, instance_id, database_id):
+# type: (str, str, str) -> Optional[Database]
+"""
+Retrieves a database in Cloud Spanner. If the database does not exist
+in the specified instance, it returns None.
+
+:param project_id: The ID of the GCP project that owns the Cloud 
Spanner database.
+:type project_id: str
+:param instance_id: The ID of the Cloud Spanner instance.
+:type instance_id: str
+:param database_id: The ID of the database in Cloud Spanner.
+:type database_id: str
+:return: Database object or None if database does not exist
+:rtype: Union[Database, None]
+"""
+
+instance = self.get_client(project_id=project_id).instance(
+instance_id=instance_id)
+if not instance.exists():
+raise AirflowException("The instance {} does not exist in project 
{} !".
+   format(instance_id, project_id))
+database = instance.database(database_id=database_id)
+if not database.exists():
+return None
+else:
+return database
+
+def create_database(self, project_id, instance_id, database_id, 
ddl_statements):
+# type: (str, str, str, [str]) -> bool
+"""
+Creates a new database in Cloud Spanner.
+
+:param project_id: The ID of the GCP project that owns the Cloud 
Spanner database.
+:type project_id: str
+:param instance_id: The ID of the Cloud Spanner instance.
+:type instance_id: str
+:param database_id: The ID of the database to create in Cloud Spanner.
+:type database_id: str
+:param ddl_statements: The string list containing DDL for the new 
database.
+:type ddl_statements: [str]
+:return: True if everything succeeded
+:rtype: bool
+"""
+
+instance = self.get_client(project_id=project_id).instance(
+instance_id=instance_id)
+if not instance.exists():
+raise AirflowException("The instance {} does not exist in project 
{} !".
+   format(instance_id, project_id))
+database = instance.database(database_id=database_id,
+ ddl_statements=ddl_statements)
+try:
+operation = database.create()  # type: Operation
+except GoogleAPICallError as e:
+self.log.error('An error occurred: %s. Exiting.', e.message)
 raise e
 
+if operation:
+result = operation.result()
+self.log.info(result)
+return True
+
+def update_database(self, project_id, instance_id, database_id, 
ddl_statements,
+operation_id=None):
+# type: (str, str, str, [str], str) -> bool
+"""
+Updates DDL of a database in Cloud Spanner.
+
+:param project_id: The ID of the GCP project that owns the Cloud 
Spanner database.
+:type project_id: str
+:param instance_id: The ID of the Cloud Spanner instance.
+:type instance_id: str
+:param database_id: The ID of the database in Cloud Spanner.
+:type database_id: str
+:param ddl_statements: The string list containing DDL for the new 
database.
+:type ddl_statements: [str]
 
 Review comment:
   Is this the general convention?? I have always seen `List(str)`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please 

[GitHub] kaxil commented on a change in pull request #4353: [AIRFLOW-3480] Add Database Deploy/Update/Delete operators

2019-01-02 Thread GitBox
kaxil commented on a change in pull request #4353: [AIRFLOW-3480] Add Database 
Deploy/Update/Delete operators
URL: https://github.com/apache/incubator-airflow/pull/4353#discussion_r244882226
 
 

 ##
 File path: airflow/contrib/operators/gcp_spanner_operator.py
 ##
 @@ -197,3 +200,201 @@ def execute(self, context):
 def sanitize_queries(queries):
 if len(queries) and queries[-1] == '':
 del queries[-1]
+
+
+class CloudSpannerInstanceDatabaseDeployOperator(BaseOperator):
+"""
+Creates a new Cloud Spanner database, or if database exists,
+the operator does nothing.
+
+
+:param project_id: The ID of the project that owns the Cloud Spanner 
Database.
+:type project_id: str
+:param instance_id: The Cloud Spanner instance ID.
+:type instance_id: str
+:param database_id: The Cloud Spanner database ID.
+:type database_id: str
+:param ddl_statements: The string list containing DDL for the new database.
+:type ddl_statements: [str]
+:param gcp_conn_id: The connection ID used to connect to Google Cloud 
Platform.
+:type gcp_conn_id: str
+"""
+# [START gcp_spanner_database_deploy_template_fields]
+template_fields = ('project_id', 'instance_id', 'database_id', 
'ddl_statements',
+   'gcp_conn_id')
+template_ext = ('.sql', )
+# [END gcp_spanner_database_deploy_template_fields]
+
+@apply_defaults
+def __init__(self,
+ project_id,
+ instance_id,
+ database_id,
+ ddl_statements,
+ gcp_conn_id='google_cloud_default',
+ *args, **kwargs):
+# type: (str, str, str, [str], str, object, object) -> None
 
 Review comment:
   And as @jmcarp mentioned somewhere, `project_id` needs to be made optional. 
While you are already making other changes, you can make this change as well in 
this PR. :)
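   A minimal standalone sketch of what an optional `project_id` could look like; the 
class and the fallback attribute are made up for illustration, not the actual operator:
   
    ```python
    class SpannerDeploySketch(object):
        """Illustrative only: project_id defaults to None and is resolved later."""

        def __init__(self, instance_id, database_id, ddl_statements,
                     project_id=None, gcp_conn_id='google_cloud_default'):
            self.instance_id = instance_id
            self.database_id = database_id
            self.ddl_statements = ddl_statements
            # None means "use the project configured on the GCP connection".
            self.project_id = project_id
            self.gcp_conn_id = gcp_conn_id

        def _resolve_project_id(self, connection_default):
            # Fall back to the connection's default project when none was given.
            return self.project_id if self.project_id else connection_default


    sketch = SpannerDeploySketch("testinstance", "testdb",
                                 ["CREATE TABLE t (id INT64) PRIMARY KEY (id)"])
    print(sketch._resolve_project_id("project-from-connection"))
    ```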


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] codecov-io edited a comment on issue #4363: [AIRFLOW-3560] Add DayOfWeek Sensor

2019-01-02 Thread GitBox
codecov-io edited a comment on issue #4363: [AIRFLOW-3560] Add DayOfWeek Sensor
URL: 
https://github.com/apache/incubator-airflow/pull/4363#issuecomment-449730551
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4363?src=pr=h1)
 Report
   > Merging 
[#4363](https://codecov.io/gh/apache/incubator-airflow/pull/4363?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/ecc77c2bf706a61d1dd77b1a95a00b74f29f74e3?src=pr=desc)
 will **increase** coverage by `<.01%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/4363/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4363?src=pr=tree)
   
   ```diff
    @@            Coverage Diff             @@
    ##           master    #4363      +/-   ##
    ==========================================
    + Coverage    78.6%   78.61%    +<.01%
    ==========================================
      Files         204      204
      Lines       16445    16453       +8
    ==========================================
    + Hits        12927    12934       +7
    - Misses       3518     3519       +1
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/4363?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/plugins\_manager.py](https://codecov.io/gh/apache/incubator-airflow/pull/4363/diff?src=pr=tree#diff-YWlyZmxvdy9wbHVnaW5zX21hbmFnZXIucHk=)
 | `92.13% <0%> (-0.97%)` | :arrow_down: |
   | 
[airflow/www/views.py](https://codecov.io/gh/apache/incubator-airflow/pull/4363/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvdmlld3MucHk=)
 | `69.68% <0%> (-0.01%)` | :arrow_down: |
   | 
[airflow/www\_rbac/views.py](https://codecov.io/gh/apache/incubator-airflow/pull/4363/diff?src=pr=tree#diff-YWlyZmxvdy93d3dfcmJhYy92aWV3cy5weQ==)
 | `73.81% <0%> (+0.13%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4363?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4363?src=pr=footer).
 Last update 
[ecc77c2...666e1e8](https://codecov.io/gh/apache/incubator-airflow/pull/4363?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] codecov-io edited a comment on issue #4359: [AIRFLOW-3150] Make execution_date templated in TriggerDagRunOperator

2019-01-02 Thread GitBox
codecov-io edited a comment on issue #4359: [AIRFLOW-3150] Make execution_date 
templated in TriggerDagRunOperator
URL: 
https://github.com/apache/incubator-airflow/pull/4359#issuecomment-449597919
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4359?src=pr=h1)
 Report
   > Merging 
[#4359](https://codecov.io/gh/apache/incubator-airflow/pull/4359?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/8bdf6c4ab953cb0b2eb923186e90273412bdf94e?src=pr=desc)
 will **increase** coverage by `<.01%`.
   > The diff coverage is `93.33%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/4359/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4359?src=pr=tree)
   
   ```diff
    @@            Coverage Diff            @@
    ##           master    #4359      +/-   ##
    =========================================
    + Coverage   78.59%    78.6%    +<.01%
    =========================================
      Files         204      204
      Lines       16453    16465      +12
    =========================================
    + Hits        12932    12943      +11
    - Misses       3521     3522       +1
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/4359?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/operators/dagrun\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/4359/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvZGFncnVuX29wZXJhdG9yLnB5)
 | `94.73% <93.33%> (-1.42%)` | :arrow_down: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4359?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4359?src=pr=footer).
 Last update 
[8bdf6c4...d6b0cb1](https://codecov.io/gh/apache/incubator-airflow/pull/4359?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-3623) Support download log file from UI

2019-01-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732583#comment-16732583
 ] 

ASF GitHub Bot commented on AIRFLOW-3623:
-

pingzh commented on pull request #4425: [AIRFLOW-3623] Support download logs by 
attempts from UI
URL: https://github.com/apache/incubator-airflow/pull/4425
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3623
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Support download logs by attempts from UI
   
   
![image](https://user-images.githubusercontent.com/8662365/50620229-2ee1de00-0eb3-11e9-8377-e0ffb19e437d.png)
   
![image](https://user-images.githubusercontent.com/8662365/50620233-330dfb80-0eb3-11e9-911a-4ee9946fb8e2.png)
   
![image](https://user-images.githubusercontent.com/8662365/50620236-3608ec00-0eb3-11e9-82a0-971a29e055a8.png)
   
![image](https://user-images.githubusercontent.com/8662365/50620240-3903dc80-0eb3-11e9-8b0c-2283e356fd44.png)
   
   
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support download log file from UI
> -
>
> Key: AIRFLOW-3623
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3623
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ui
>Reporter: Ping Zhang
>Assignee: Ping Zhang
>Priority: Major
>  Labels: newbie
>
> For some large log files, it is not a good idea to fetch and render them in 
> the UI. Add the ability to let users download logs by try_number in the dag 
> modal.
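
A minimal standalone Flask sketch (illustrative only; the route, handler name and log 
source are made up, not the actual Airflow view) of serving a log for a given 
try_number as a download rather than rendering it inline:

{code:python}
from flask import Flask, Response

app = Flask(__name__)


@app.route("/log/download/<int:try_number>")
def download_log(try_number):
    # In the real webserver the text would come from the task log handler;
    # here it is just a placeholder string.
    log_text = "log content for attempt %d\n" % try_number
    return Response(
        log_text,
        mimetype="text/plain",
        headers={"Content-Disposition":
                 "attachment; filename=attempt_%d.log" % try_number},
    )


if __name__ == "__main__":
    app.run()
{code}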



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] pingzh opened a new pull request #4425: [AIRFLOW-3623] Support download logs by attempts from UI

2019-01-02 Thread GitBox
pingzh opened a new pull request #4425: [AIRFLOW-3623] Support download logs by 
attempts from UI
URL: https://github.com/apache/incubator-airflow/pull/4425
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3623
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Support download logs by attempts from UI
   
   
![image](https://user-images.githubusercontent.com/8662365/50620229-2ee1de00-0eb3-11e9-8377-e0ffb19e437d.png)
   
![image](https://user-images.githubusercontent.com/8662365/50620233-330dfb80-0eb3-11e9-911a-4ee9946fb8e2.png)
   
![image](https://user-images.githubusercontent.com/8662365/50620236-3608ec00-0eb3-11e9-82a0-971a29e055a8.png)
   
![image](https://user-images.githubusercontent.com/8662365/50620240-3903dc80-0eb3-11e9-8b0c-2283e356fd44.png)
   
   
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Comment Edited] (AIRFLOW-3616) Connection parsed from URI - unacceptable underscore in schema part

2019-01-02 Thread Kamil (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732598#comment-16732598
 ] 

Kamil edited comment on AIRFLOW-3616 at 1/3/19 2:30 AM:


I looked at RFC 3986. It says that the scheme part must be of the form:

{{scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )}}

The underscore is not allowed.

Therefore, I think that introducing aliases is more beneficial. I propose 
substituting the underscore character with a minus sign.

This will allow you to define a connection using:

{{google-cloud-platform://user:pass@hostname/path}}

for which the connection type will be created as _google_cloud_platform_

_Reference:_

_https://www.ietf.org/rfc/rfc3986.txt_
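
A standalone sketch (illustrative only, not Airflow code) of the proposed aliasing: 
parse the hyphenated scheme and map it back to the underscored connection type:

{code:python}
from urllib.parse import urlparse

uri = "google-cloud-platform://user:pass@hostname/path"
parsed = urlparse(uri)

# Substitute the minus sign back to an underscore to recover the
# connection type used internally.
conn_type = parsed.scheme.replace("-", "_")

print(conn_type)        # google_cloud_platform
print(parsed.hostname)  # hostname
print(parsed.path)      # /path
{code}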


was (Author: dzakus13):
I looked at RFC 3986. It sys that scheme part must be in form:

scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )

The underscore is not allowed.

Therefore, I think that introducing aliases is more beneficial. I propose to 
create a substitution of an underline character to a minus sign.

This will allow you to define a connection using:

{\{ google-cloud-platform://user:pass@hostname/path }}

which connection type will be created as \{{ google_cloud_platform }}

> Connection parsed from URI  - unacceptable underscore in schema part
> 
>
> Key: AIRFLOW-3616
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3616
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kamil Bregula
>Priority: Major
>
> There is a problem with creating a new connection based on the state of the 
> environment variable if the connection type contains the underscore character.
> If we want to configure a connection based on an environment variable, we 
> must create an environment variable. The name of the environment must be 
> given according to the scheme:
>  {{AIRFLOW_CONN_[CONN_ID]}}
>  where {{[CONN_ID]}} is the connection identifier to be used, written in 
> upper case.
>  The content of the variable defines the connection and is saved in the form 
> of URI.
> Defining a URI is complex, but the key is that the connection type is given 
> as the schema. There are many possible values to give, but sample values 
> are {{mysql}}, {{postgresql}} or {{google_cloud_platform}}. Unfortunately, 
> the last case is not properly handled.
> This is caused by using {{urllib.parse}} to process the value. Unfortunately, 
> this module does not support the underscore character in the schema part of 
> the URI - see below snippets showing the behaviour.
> Since urllib.parse is really there to parse URLs and it is not good for 
> parsing non-URL URIs - we should likely use different parser which handles 
> more generic URIs. 
>  Especially that it also creates other problems:
>  https://issues.apache.org/jira/browse/AIRFLOW-3615
> Another solution is to create aliases for each connection type with a variant 
> that does not contain an unacceptable character. For example 
> {{google_cloud_platform => gcp}}. It is worth noting that one alias is 
> currently defined - {{postgresql => postgres}}.
> Snippet showing urrlib.parse behaviour:
> Python 3.6.5 (default, Oct 3 2018, 10:03:09)
> Type 'copyright', 'credits' or 'license' for more information
> IPython 7.0.1 – An enhanced Interactive Python. Type '?' for help
> In [1]: from urllib.parse import urlparse
> In [2]: 
> urlparse("google_cloud_platform://user:pass@hostname/path?extra_param=extra_param_value")
> Out[2]: ParseResult(scheme='', netloc='', 
> path='google_cloud_platform://user:pass@hostname/path', params='', 
> query='extra_param=extra_param_value', fragment='')
> In [3]: 
> urlparse("gcp://user:pass@hostname/path?extra_param=extra_param_value")
> Out[3]: ParseResult(scheme='gcp', netloc='user:pass@hostname', path='/path', 
> params='', query='extra_param=extra_param_value', fragment='')
> Connection parsed from URI - unacceptable underscore in schema part



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-3613) Update Readme to add adobe as airflow user

2019-01-02 Thread Tao Feng (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Feng closed AIRFLOW-3613.
-
Resolution: Fixed

> Update Readme to add adobe as airflow user
> --
>
> Key: AIRFLOW-3613
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3613
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.10.0, 1.10.1
>Reporter: raman
>Assignee: raman
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] codecov-io commented on issue #4425: [AIRFLOW-3623] Support download logs by attempts from UI

2019-01-02 Thread GitBox
codecov-io commented on issue #4425: [AIRFLOW-3623] Support download logs by 
attempts from UI
URL: 
https://github.com/apache/incubator-airflow/pull/4425#issuecomment-451043829
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4425?src=pr=h1)
 Report
   > Merging 
[#4425](https://codecov.io/gh/apache/incubator-airflow/pull/4425?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/8bdf6c4ab953cb0b2eb923186e90273412bdf94e?src=pr=desc)
 will **decrease** coverage by `<.01%`.
   > The diff coverage is `81.08%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/4425/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4425?src=pr=tree)
   
   ```diff
    @@            Coverage Diff             @@
    ##           master    #4425      +/-   ##
    ==========================================
    - Coverage   78.59%   78.59%    -0.01%
    ==========================================
      Files         204      204
      Lines       16453    16482      +29
    ==========================================
    + Hits        12932    12954      +22
    - Misses       3521     3528       +7
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/4425?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/www\_rbac/views.py](https://codecov.io/gh/apache/incubator-airflow/pull/4425/diff?src=pr=tree#diff-YWlyZmxvdy93d3dfcmJhYy92aWV3cy5weQ==)
 | `73.5% <66.66%> (-0.16%)` | :arrow_down: |
   | 
[airflow/utils/helpers.py](https://codecov.io/gh/apache/incubator-airflow/pull/4425/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9oZWxwZXJzLnB5)
 | `82.75% <85.71%> (+0.14%)` | :arrow_up: |
   | 
[airflow/www/views.py](https://codecov.io/gh/apache/incubator-airflow/pull/4425/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvdmlld3MucHk=)
 | `69.82% <93.33%> (+0.14%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4425?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4425?src=pr=footer).
 Last update 
[8bdf6c4...21f1380](https://codecov.io/gh/apache/incubator-airflow/pull/4425?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] feng-tao opened a new pull request #4427: [AIRFLOW-XXX] Update committer list based on latest TLP discussion

2019-01-02 Thread GitBox
feng-tao opened a new pull request #4427: [AIRFLOW-XXX] Update committer list 
based on latest TLP discussion
URL: https://github.com/apache/incubator-airflow/pull/4427
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   Update the doc based on 
http://mail-archives.apache.org/mod_mbox/airflow-dev/201812.mbox/%3ccadikvvsdzry1mx2n2_6rlnnhr0+7uhv0jdw81if3gjoe0oj...@mail.gmail.com%3E
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] feng-tao commented on issue #4427: [AIRFLOW-XXX] Update committer list based on latest TLP discussion

2019-01-02 Thread GitBox
feng-tao commented on issue #4427: [AIRFLOW-XXX] Update committer list based on 
latest TLP discussion
URL: 
https://github.com/apache/incubator-airflow/pull/4427#issuecomment-451073375
 
 
   PTAL @kaxil @ashb @Fokko 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] feng-tao commented on issue #4426: [AIRFLOW-XXX] Add a doc on how to add a new role in RBAC UI

2019-01-02 Thread GitBox
feng-tao commented on issue #4426: [AIRFLOW-XXX] Add a doc on how to add a new 
role in RBAC UI
URL: 
https://github.com/apache/incubator-airflow/pull/4426#issuecomment-451072081
 
 
   PTAL @kaxil @Fokko 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] feng-tao opened a new pull request #4426: [AIRFLOW-XXX] Add a doc on how to add a new role in RBAC UI

2019-01-02 Thread GitBox
feng-tao opened a new pull request #4426: [AIRFLOW-XXX] Add a doc on how to add 
a new role in RBAC UI
URL: https://github.com/apache/incubator-airflow/pull/4426
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   Add a doc on how to add role in new UI.
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] feng-tao commented on issue #4425: [AIRFLOW-3623] Support download logs by attempts from UI

2019-01-02 Thread GitBox
feng-tao commented on issue #4425: [AIRFLOW-3623] Support download logs by 
attempts from UI
URL: 
https://github.com/apache/incubator-airflow/pull/4425#issuecomment-451072313
 
 
   Does the feature works when the log is in a remote location(like some s3 
directory)?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] feng-tao edited a comment on issue #4425: [AIRFLOW-3623] Support download logs by attempts from UI

2019-01-02 Thread GitBox
feng-tao edited a comment on issue #4425: [AIRFLOW-3623] Support download logs 
by attempts from UI
URL: 
https://github.com/apache/incubator-airflow/pull/4425#issuecomment-451072313
 
 
   Does the feature work when the log is in a remote location(like some s3 
directory)?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] XD-DENG opened a new pull request #4422: [AIRFLOW-XXX] Fix Flake8 error

2019-01-02 Thread GitBox
XD-DENG opened a new pull request #4422: [AIRFLOW-XXX] Fix Flake8 error
URL: https://github.com/apache/incubator-airflow/pull/4422
 
 
   This error passed the PR CI (in 
https://github.com/apache/incubator-airflow/pull/4403) because the Flake8 test was 
not working from 
https://github.com/apache/incubator-airflow/commit/7a6acbf5b343e4a6895d1cc8af75ecc02b4fd0e8
 (10 days ago) to 
https://github.com/apache/incubator-airflow/commit/0d5c127d720e3b602fb4322065511c5c1c046adb
 (today).
   
   The Flake8 test itself was already fixed in 
https://github.com/apache/incubator-airflow/pull/4415 
(https://github.com/apache/incubator-airflow/commit/0d5c127d720e3b602fb4322065511c5c1c046adb),
 so this error only surfaced after that fix was merged into master.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko commented on issue #4414: Support for new version of python jenkins

2019-01-02 Thread GitBox
Fokko commented on issue #4414: Support for new version of python jenkins
URL: 
https://github.com/apache/incubator-airflow/pull/4414#issuecomment-450853592
 
 
   @sarmadali20 Please file a Jira issue.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] ron819 commented on issue #3930: [AIRFLOW-2548] Output plugin import errors to web UI

2019-01-02 Thread GitBox
ron819 commented on issue #3930: [AIRFLOW-2548] Output plugin import errors to 
web UI
URL: 
https://github.com/apache/incubator-airflow/pull/3930#issuecomment-450857165
 
 
   @Fokko @kaxil can you take a look?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Comment Edited] (AIRFLOW-3615) Connection parsed from URI - case-insensitive UNIX socket paths in python 2.7 -> 3.5 (but not in 3.6)

2019-01-02 Thread Jarek Potiuk (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731920#comment-16731920
 ] 

Jarek Potiuk edited comment on AIRFLOW-3615 at 1/2/19 10:30 AM:


In Django, this problem is solved somewhat hackishly:

 

[https://github.com/kennethreitz/dj-database-url/blob/master/dj_database_url.py]

{code:python}
# Handle postgres percent-encoded paths.
hostname = url.hostname or ''
if '%2f' in hostname.lower():
    # Switch to url.netloc to avoid lower cased paths
    hostname = url.netloc
    if "@" in hostname:
        hostname = hostname.rsplit("@", 1)[1]
    if ":" in hostname:
        hostname = hostname.split(":", 1)[0]
    hostname = hostname.replace('%2f', '/').replace('%2F', '/')
{code}


was (Author: higrys):
In Django, this problem is solved somewhat hackishly:

 

[https://github.com/kennethreitz/dj-database-url/blob/master/dj_database_url.py]

{{# Handle postgres percent-encoded paths.}}
{{ hostname = url.hostname or ''}}
{{ if '%2f' in hostname.lower():}}
{{    # Switch to url.netloc to avoid lower cased paths}}
{{   hostname = url.netloc}}
{{   if "@" in hostname:}}
{{     hostname = hostname.rsplit("@", 1)[1]}}
{{     if ":" in hostname:}}
{{         hostname = hostname.split(":", 1)[0]}}
{{         hostname = hostname.replace('%2f', '/').replace('%2F', '/')}}

> Connection parsed from URI - case-insensitive UNIX socket paths in python 2.7 
> -> 3.5 (but not in 3.6) 
> --
>
> Key: AIRFLOW-3615
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3615
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Jarek Potiuk
>Priority: Major
>
> There is a problem with case sensitivity of parsing URI for database 
> connections which are using local UNIX sockets rather than TCP connection.
> In case of local UNIX sockets the hostname part of the URI contains 
> url-encoded local socket path rather than actual hostname and in case this 
> path contains uppercase characters, urlparse will deliberately lowercase them 
> when parsing. This is perfectly fine for hostnames (according to 
> [https://tools.ietf.org/html/rfc3986#section-6.2.3)] case normalisation 
> should be done for hostnames.
> However urlparse still uses hostname if the URI does not contain host but 
> only local path (i.e. when the location starts with %2F ("/")). What's more - 
> the host gets converted to lowercase for python 2.7 - 3.5. Surprisingly this 
> is somewhat "fixed" in 3.6 (i.e if the URL location starts with %2F, the 
> hostname is not normalized to lowercase any more ! - see below snippets 
> showing the behaviours for different python versions) .
> In Airflow's Connection this problem bubbles up. Airflow uses urlparse to get 
> the hostname/path in models.py:parse_from_uri and in case of UNIX sockets it 
> is done via hostname. There is no other, reliable way when using urlparse 
> because the path can also contain 'authority' (user/password) and this is 
> urlparse's job to separate them out. The Airflow's Connection similarly does 
> not make a distinction of TCP vs. local socket connection and it uses host 
> field to store the  socket path (it's case sensitive however). So you can use 
> UPPERCASE when you define connection in the database, but this is a problem 
> for parsing connections from environment variables, because we currently 
> cannot pass a URI where socket path contains UPPERCASE characters.
> Since urlparse is really there to parse URLs and it is not good for parsing 
> non-URL URIs - we should likely use different parser which handles more 
> generic URIs - including non-lowercasing path for all versions of python.
> I think we could also consider adding local path to Connection model and use 
> it instead of hostname to store the socket path. This approach would be the 
> "correct" one, but it might introduce some compatibility issues, so maybe 
> it's not worth it, considering that host is case sensitive in Airflow.
> Snippet showing urlparse behaviour in different python versions:
> {quote}Python 2.7.10 (default, Aug 17 2018, 19:45:58)
>  [GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
>  Type "help", "copyright", "credits" or "license" for more information.
>  >>> from urlparse import urlparse,unquote
>  >>> conn = urlparse("http://AAA")
>  >>> conn.hostname
>  'aaa'
>  >>> conn = urlparse("http://%2FAAA")
>  >>> conn.hostname
>  '%2faaa'
> {quote}
>  
> {quote}Python 3.5.4 (v3.5.4:3f56838976, Aug 7 2017, 12:56:33)
>  [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
>  Type "help", "copyright", "credits" or "license" for more information.
>  >>> from urlparse import urlparse,unquote
>  Traceback (most recent call last):
>  File "", line 1, in 
>  

[GitHub] codecov-io edited a comment on issue #4422: [AIRFLOW-XXX] Fix Flake8 error

2019-01-02 Thread GitBox
codecov-io edited a comment on issue #4422: [AIRFLOW-XXX] Fix Flake8 error
URL: 
https://github.com/apache/incubator-airflow/pull/4422#issuecomment-450830532
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4422?src=pr=h1)
 Report
   > Merging 
[#4422](https://codecov.io/gh/apache/incubator-airflow/pull/4422?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/671769ce8d303c15ef380d92e158961200c5ede5?src=pr=desc)
 will **increase** coverage by `<.01%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/4422/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4422?src=pr=tree)
   
   ```diff
    @@           Coverage Diff            @@
    ##           master    #4422      +/-   ##
    =========================================
    + Coverage    78.6%    78.6%    +<.01%
    =========================================
      Files         204      204
      Lines       16445    16445
    =========================================
    + Hits        12926    12927       +1
    + Misses      3519     3518       -1
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/4422?src=pr=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/models/\_\_init\_\_.py](https://codecov.io/gh/apache/incubator-airflow/pull/4422/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvX19pbml0X18ucHk=)
 | `92.6% <0%> (+0.04%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4422?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4422?src=pr=footer).
 Last update 
[671769c...2e70530](https://codecov.io/gh/apache/incubator-airflow/pull/4422?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-3578) BigQueryOperator Type Error

2019-01-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732013#comment-16732013
 ] 

ASF GitHub Bot commented on AIRFLOW-3578:
-

Fokko commented on pull request #4384: [AIRFLOW-3578] Fix Type Error for 
BigQueryOperator
URL: https://github.com/apache/incubator-airflow/pull/4384
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> BigQueryOperator Type Error
> ---
>
> Key: AIRFLOW-3578
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3578
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Minor
> Fix For: 1.10.2
>
>
> The error occurs because the check only accepts the `str` type and not unicode
> *Error*:
> {noformat}
> [2018-12-27 13:33:08,756] {__init__.py:1548} ERROR - query argument must have 
> a type <type 'str'> not <type 'unicode'>
> Traceback (most recent call last):
>   File 
> "/Users/kaxil/Documents/GitHub/incubator-airflow/airflow/models/__init__.py", 
> line 1431, in _run_raw_task
> result = task_copy.execute(context=context)
>   File 
> "/Users/kaxil/Documents/GitHub/incubator-airflow/airflow/contrib/operators/bigquery_operator.py",
>  line 176, in execute
> cluster_fields=self.cluster_fields,
>   File 
> "/Users/kaxil/Documents/GitHub/incubator-airflow/airflow/contrib/hooks/bigquery_hook.py",
>  line 677, in run_query
> param_type)
>   File 
> "/Users/kaxil/Documents/GitHub/incubator-airflow/airflow/contrib/hooks/bigquery_hook.py",
>  line 1903, in _validate_value
> key, expected_type, type(value)))
> TypeError: query argument must have a type <type 'str'> not <type 'unicode'>
> {noformat}
> To Recreate the error, try the following code:
> {code:python}
> import airflow
> from airflow import DAG
> from airflow.contrib.operators.bigquery_operator import BigQueryOperator
> default_args = {
> 'owner': 'airflow',
> 'depends_on_past': False,
> 'start_date': airflow.utils.dates.days_ago(2),
> }
> dag = DAG(
> dag_id='airflow_dag_2',
> default_args=default_args,
> schedule_interval=None,
> )
> task_one = BigQueryOperator(
> task_id='task_one',
> sql='select * from airport.airport',
> bigquery_conn_id='bigquery_conn',
> dag=dag
> )
> task_one
> {code}
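
A short standalone illustration of the direction such a fix can take, assuming the 
`six` package is available (hypothetical helper, not the hook code): checking against 
`six.string_types` accepts both `str` and `unicode`:

{code:python}
import six


def validate_query(sql):
    # six.string_types is (str,) on Python 3 and (basestring,) on Python 2,
    # so both str and unicode values pass the check.
    if not isinstance(sql, six.string_types):
        raise TypeError("query argument must be a string, got %s" % type(sql))
    return True


print(validate_query(u'select * from airport.airport'))  # True
{code}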



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] Fokko closed pull request #4384: [AIRFLOW-3578] Fix Type Error for BigQueryOperator

2019-01-02 Thread GitBox
Fokko closed pull request #4384: [AIRFLOW-3578] Fix Type Error for 
BigQueryOperator
URL: https://github.com/apache/incubator-airflow/pull/4384
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/hooks/bigquery_hook.py 
b/airflow/contrib/hooks/bigquery_hook.py
index 30a16305db..5d8655fbdc 100644
--- a/airflow/contrib/hooks/bigquery_hook.py
+++ b/airflow/contrib/hooks/bigquery_hook.py
@@ -23,6 +23,7 @@
 """
 
 import time
+import six
 from builtins import range
 from copy import deepcopy
 from six import iteritems
@@ -640,8 +641,8 @@ def run_query(self,
 cluster_fields = {'fields': cluster_fields}
 
 query_param_list = [
-(sql, 'query', None, str),
-(priority, 'priority', 'INTERACTIVE', str),
+(sql, 'query', None, six.string_types),
+(priority, 'priority', 'INTERACTIVE', six.string_types),
 (use_legacy_sql, 'useLegacySql', self.use_legacy_sql, bool),
 (query_params, 'queryParameters', None, dict),
 (udf_config, 'userDefinedFunctionResources', None, list),
diff --git a/tests/contrib/operators/test_bigquery_operator.py 
b/tests/contrib/operators/test_bigquery_operator.py
index b92116a031..d005fcd519 100644
--- a/tests/contrib/operators/test_bigquery_operator.py
+++ b/tests/contrib/operators/test_bigquery_operator.py
@@ -17,12 +17,20 @@
 # specific language governing permissions and limitations
 # under the License.
 
+import os
 import unittest
+from datetime import datetime
+
+import six
+
+from airflow import configuration, models
+from airflow.models import TaskInstance, DAG
 
 from airflow.contrib.operators.bigquery_operator import \
 BigQueryCreateExternalTableOperator, BigQueryCreateEmptyTableOperator, \
 BigQueryDeleteDatasetOperator, BigQueryCreateEmptyDatasetOperator, \
 BigQueryOperator
+from airflow.settings import Session
 
 try:
 from unittest import mock
@@ -39,6 +47,8 @@
 TEST_GCS_BUCKET = 'test-bucket'
 TEST_GCS_DATA = ['dir1/*.csv']
 TEST_SOURCE_FORMAT = 'CSV'
+DEFAULT_DATE = datetime(2015, 1, 1)
+TEST_DAG_ID = 'test-bigquery-operators'
 
 
 class BigQueryCreateEmptyTableOperatorTest(unittest.TestCase):
@@ -147,6 +157,22 @@ def test_execute(self, mock_hook):
 
 
 class BigQueryOperatorTest(unittest.TestCase):
+def setUp(self):
+configuration.conf.load_test_config()
+self.dagbag = models.DagBag(
+dag_folder='/dev/null', include_examples=True)
+self.args = {'owner': 'airflow', 'start_date': DEFAULT_DATE}
+self.dag = DAG(TEST_DAG_ID, default_args=self.args)
+
+def tearDown(self):
+session = Session()
+session.query(models.TaskInstance).filter_by(
+dag_id=TEST_DAG_ID).delete()
+session.query(models.TaskFail).filter_by(
+dag_id=TEST_DAG_ID).delete()
+session.commit()
+session.close()
+
 @mock.patch('airflow.contrib.operators.bigquery_operator.BigQueryHook')
 def test_execute(self, mock_hook):
 operator = BigQueryOperator(
@@ -197,9 +223,11 @@ def test_execute(self, mock_hook):
 
 @mock.patch('airflow.contrib.operators.bigquery_operator.BigQueryHook')
 def test_bigquery_operator_defaults(self, mock_hook):
+
 operator = BigQueryOperator(
 task_id=TASK_ID,
 sql='Select * from test_table',
+dag=self.dag, default_args=self.args
 )
 
 operator.execute(None)
@@ -225,3 +253,8 @@ def test_bigquery_operator_defaults(self, mock_hook):
 api_resource_configs=None,
 cluster_fields=None,
 )
+
+self.assertTrue(isinstance(operator.sql, six.string_types))
+ti = TaskInstance(task=operator, execution_date=DEFAULT_DATE)
+ti.render_templates()
+self.assertTrue(isinstance(ti.task.sql, six.string_types))


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Closed] (AIRFLOW-3578) BigQueryOperator Type Error

2019-01-02 Thread Fokko Driesprong (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong closed AIRFLOW-3578.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

> BigQueryOperator Type Error
> ---
>
> Key: AIRFLOW-3578
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3578
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Minor
> Fix For: 1.10.2, 2.0.0
>
>
> The error occurs because the check only accepts the `str` type and not unicode
> *Error*:
> {noformat}
> [2018-12-27 13:33:08,756] {__init__.py:1548} ERROR - query argument must have 
> a type <type 'str'> not <type 'unicode'>
> Traceback (most recent call last):
>   File 
> "/Users/kaxil/Documents/GitHub/incubator-airflow/airflow/models/__init__.py", 
> line 1431, in _run_raw_task
> result = task_copy.execute(context=context)
>   File 
> "/Users/kaxil/Documents/GitHub/incubator-airflow/airflow/contrib/operators/bigquery_operator.py",
>  line 176, in execute
> cluster_fields=self.cluster_fields,
>   File 
> "/Users/kaxil/Documents/GitHub/incubator-airflow/airflow/contrib/hooks/bigquery_hook.py",
>  line 677, in run_query
> param_type)
>   File 
> "/Users/kaxil/Documents/GitHub/incubator-airflow/airflow/contrib/hooks/bigquery_hook.py",
>  line 1903, in _validate_value
> key, expected_type, type(value)))
> TypeError: query argument must have a type <type 'str'> not <type 'unicode'>
> {noformat}
> To Recreate the error, try the following code:
> {code:python}
> import airflow
> from airflow import DAG
> from airflow.contrib.operators.bigquery_operator import BigQueryOperator
> default_args = {
> 'owner': 'airflow',
> 'depends_on_past': False,
> 'start_date': airflow.utils.dates.days_ago(2),
> }
> dag = DAG(
> dag_id='airflow_dag_2',
> default_args=default_args,
> schedule_interval=None,
> )
> task_one = BigQueryOperator(
> task_id='task_one',
> sql='select * from airport.airport',
> bigquery_conn_id='bigquery_conn',
> dag=dag
> )
> task_one
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] XD-DENG commented on issue #4422: [AIRFLOW-XXX] Fix Flake8 error

2019-01-02 Thread GitBox
XD-DENG commented on issue #4422: [AIRFLOW-XXX] Fix Flake8 error
URL: 
https://github.com/apache/incubator-airflow/pull/4422#issuecomment-450823813
 
 
   @Fokko @feng-tao , for you notice please.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] XD-DENG closed pull request #4422: [AIRFLOW-XXX] Fix Flake8 error

2019-01-02 Thread GitBox
XD-DENG closed pull request #4422: [AIRFLOW-XXX] Fix Flake8 error
URL: https://github.com/apache/incubator-airflow/pull/4422
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/operators/mysql_to_gcs.py 
b/airflow/contrib/operators/mysql_to_gcs.py
index b1d2d81d84..99ca29bcb2 100644
--- a/airflow/contrib/operators/mysql_to_gcs.py
+++ b/airflow/contrib/operators/mysql_to_gcs.py
@@ -71,7 +71,7 @@ class MySqlToGoogleCloudStorageOperator(BaseOperator):
 :param delegate_to: The account to impersonate, if any. For this to
 work, the service account making the request must have domain-wide
 delegation enabled.
-:type delegate_to: str
+:type delegate_to: str
 """
 template_fields = ('sql', 'bucket', 'filename', 'schema_filename', 
'schema')
 template_ext = ('.sql',)


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] XD-DENG commented on issue #4422: [AIRFLOW-XXX] Fix Flake8 error

2019-01-02 Thread GitBox
XD-DENG commented on issue #4422: [AIRFLOW-XXX] Fix Flake8 error
URL: 
https://github.com/apache/incubator-airflow/pull/4422#issuecomment-450827664
 
 
   @eladkal No worries!
   I will close this PR then, given that @Fokko has already noticed this issue.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

