[GitHub] r39132 commented on issue #3862: [AIRFLOW-1917] Remove extra newline character from log

2018-09-07 Thread GitBox
r39132 commented on issue #3862: [AIRFLOW-1917] Remove extra newline character 
from log
URL: 
https://github.com/apache/incubator-airflow/pull/3862#issuecomment-419615260
 
 
   @ashb Not sure I fully grok your comment. Are you giving this a +1?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] yrqls21 commented on issue #3830: [AIRFLOW-2156] Parallelize Celery Executor

2018-09-07 Thread GitBox
yrqls21 commented on issue #3830: [AIRFLOW-2156] Parallelize Celery Executor
URL: 
https://github.com/apache/incubator-airflow/pull/3830#issuecomment-419605796
 
 
   @ashb Thank you for reviewing. I forgot to mention the release part: I agree 
with you on not releasing it in 1.10.1. Beyond your point, this PR is more 
meaningful together with some follow-up PRs I'm working on, which provide a 
better story on scheduler scaling and can then go into a bigger release.




[GitHub] yrqls21 commented on a change in pull request #3830: [AIRFLOW-2156] Parallelize Celery Executor

2018-09-07 Thread GitBox
yrqls21 commented on a change in pull request #3830: [AIRFLOW-2156] Parallelize 
Celery Executor
URL: https://github.com/apache/incubator-airflow/pull/3830#discussion_r216114428
 
 

 ##
 File path: airflow/executors/celery_executor.py
 ##
 @@ -85,30 +139,67 @@ def execute_async(self, key, command,
             args=[command], queue=queue)
         self.last_state[key] = celery_states.PENDING
 
+    def _num_tasks_per_process(self):
+        """
+        How many Celery tasks should be sent to each worker process.
+        :return: Number of tasks that should be used per process
+        :rtype: int
+        """
+        return max(1,
+                   int(math.ceil(1.0 * len(self.tasks) /
+                                 self._sync_parallelism)))
+
     def sync(self):
-        self.log.debug("Inquiring about %s celery task(s)", len(self.tasks))
-        for key, task in list(self.tasks.items()):
-            try:
-                state = task.state
-                if self.last_state[key] != state:
-                    if state == celery_states.SUCCESS:
-                        self.success(key)
-                        del self.tasks[key]
-                        del self.last_state[key]
-                    elif state == celery_states.FAILURE:
-                        self.fail(key)
-                        del self.tasks[key]
-                        del self.last_state[key]
-                    elif state == celery_states.REVOKED:
-                        self.fail(key)
-                        del self.tasks[key]
-                        del self.last_state[key]
-                    else:
-                        self.log.info("Unexpected state: %s", state)
-                    self.last_state[key] = state
-            except Exception as e:
-                self.log.error("Error syncing the celery executor, ignoring it:")
-                self.log.exception(e)
+        num_processes = min(len(self.tasks), self._sync_parallelism)
+        if num_processes == 0:
+            self.log.debug("No task to query celery, skipping sync")
+            return
+
+        self.log.debug("Inquiring about %s celery task(s) using %s processes",
+                       len(self.tasks), num_processes)
+
+        # Recreate the process pool each sync in case processes in the pool die
+        self._sync_pool = Pool(processes=num_processes)
 Review comment:
   I believe multiprocessing will use os.fork() on Unix systems, so we can take 
advantage of copy-on-write (COW) to reduce RAM usage. However, AFAIK on Windows 
each child process will re-run all module-level imports and thus may require 
some extra RAM. But I don't think the extra imports will create a big burden on 
RAM usage (or maybe our scheduler box is just too big :P). I'll add an entry in 
UPDATING.md regarding this config line.
   
   About the SQLAlchemy connection, you actually have a point. On Windows we 
might end up configuring 16 extra connection pools while reimporting. And since 
subprocesses spun up by the multiprocessing module do not run atexit() handlers, 
we might in theory leave some hanging connections there. However, from my 
observation and testing, SQLAlchemy initializes connections lazily, so at most 
we have an empty pool in the subprocesses.
   
   I might be wrong about the Windows behavior and SQLAlchemy's lazy 
initialization; open to discussing better handling if that is the case.
   
   FYI, this is the test script/result I was playing with:
   ```
   ▶ cat test.py
   import os
   from multiprocessing import Pool

   print('execute module code')

   def test_func(num):
       print(num)

   if __name__ == '__main__':
       pool = Pool(4)
       results = pool.map(test_func, [1, 2, 3, 4], 1)
       pool.close()
       pool.join()

   ▶ python test.py
   execute module code
   1
   2
   3
   4

   -- mimic Windows behavior
   ▶ cat test.py
   import os
   from multiprocessing import Pool

   print('execute module code')

   def test_func(num):
       from airflow import settings
       print(num)
       print(settings.engine.pool.status())

   if __name__ == '__main__':
       pool = Pool(4)
       results = pool.map(test_func, [1, 2, 3, 4], 1)
       pool.close()
       pool.join()

   ▶ python test.py
   execute module code
   execute module code
   execute module code
   execute module code
   execute module code
   airflow.settings [2018-09-07 17:37:24,218] {{settings.py:148}} DEBUG - Setting up DB connection pool (PID 80204)
   airflow.settings [2018-09-07 17:37:24,218] {{settings.py:148}} DEBUG - Setting up DB connection pool (PID 80202)
   airflow.settings [2018-09-07 17:37:24,218] {{settings.py:148}} DEBUG - Setting up DB connection pool (PID 80201)
   airflow.settings [2018-09-07 17:37:24,218] {{settings.py:148}} DEBUG - Setting up DB connection pool (PID 80203)
   airflow.settings [2018-09-07 17:37:24,219] {{settings.py:176}} INFO - setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=3600
   ```

[GitHub] yangaws commented on issue #3767: [AIRFLOW-2524]Add SageMaker Batch Inference

2018-09-07 Thread GitBox
yangaws commented on issue #3767: [AIRFLOW-2524]Add SageMaker Batch Inference
URL: 
https://github.com/apache/incubator-airflow/pull/3767#issuecomment-419594890
 
 
   Hi @Fokko and @schipiga 
   
   Sorry for the late updates. @troychen728 has been unavailable for development 
recently. I will aim to update this PR sometime next week.




[GitHub] yrqls21 commented on a change in pull request #3830: [AIRFLOW-2156] Parallelize Celery Executor

2018-09-07 Thread GitBox
yrqls21 commented on a change in pull request #3830: [AIRFLOW-2156] Parallelize 
Celery Executor
URL: https://github.com/apache/incubator-airflow/pull/3830#discussion_r216110731
 
 

 ##
 File path: airflow/executors/celery_executor.py
 ##
 @@ -85,30 +139,67 @@ def execute_async(self, key, command,
             args=[command], queue=queue)
         self.last_state[key] = celery_states.PENDING
 
+    def _num_tasks_per_process(self):
+        """
+        How many Celery tasks should be sent to each worker process.
+        :return: Number of tasks that should be used per process
+        :rtype: int
+        """
+        return max(1,
+                   int(math.ceil(1.0 * len(self.tasks) /
+                                 self._sync_parallelism)))
+
     def sync(self):
-        self.log.debug("Inquiring about %s celery task(s)", len(self.tasks))
-        for key, task in list(self.tasks.items()):
-            try:
-                state = task.state
-                if self.last_state[key] != state:
-                    if state == celery_states.SUCCESS:
-                        self.success(key)
-                        del self.tasks[key]
-                        del self.last_state[key]
-                    elif state == celery_states.FAILURE:
-                        self.fail(key)
-                        del self.tasks[key]
-                        del self.last_state[key]
-                    elif state == celery_states.REVOKED:
-                        self.fail(key)
-                        del self.tasks[key]
-                        del self.last_state[key]
-                    else:
-                        self.log.info("Unexpected state: %s", state)
-                    self.last_state[key] = state
-            except Exception as e:
-                self.log.error("Error syncing the celery executor, ignoring it:")
-                self.log.exception(e)
+        num_processes = min(len(self.tasks), self._sync_parallelism)
+        if num_processes == 0:
+            self.log.debug("No task to query celery, skipping sync")
+            return
+
+        self.log.debug("Inquiring about %s celery task(s) using %s processes",
+                       len(self.tasks), num_processes)
+
+        # Recreate the process pool each sync in case processes in the pool die
+        self._sync_pool = Pool(processes=num_processes)
+
+        # Use chunking instead of a work queue to reduce context switching
+        # since tasks are roughly uniform in size
+        chunksize = self._num_tasks_per_process()
+
+        self.log.debug("Waiting for inquiries to complete...")
+        task_keys_to_states = self._sync_pool.map(
+            fetch_celery_task_state,
+            self.tasks.items(),
+            chunksize=chunksize)
+        self._sync_pool.close()
+        self._sync_pool.join()
+        self.log.debug("Inquiries completed.")
+
+        for key_and_state in task_keys_to_states:
+            if isinstance(key_and_state, ExceptionWithTraceback):
+                self.log.error(
+                    CELERY_FETCH_ERR_MSG_HEADER + ", ignoring it:{}\n{}\n".format(
+                        key_and_state.exception, key_and_state.traceback))
+            else:
+                key, state = key_and_state
+                try:
+                    if self.last_state[key] != state:
+                        if state == celery_states.SUCCESS:
+                            self.success(key)
+                            del self.tasks[key]
+                            del self.last_state[key]
+                        elif state == celery_states.FAILURE:
+                            self.fail(key)
+                            del self.tasks[key]
+                            del self.last_state[key]
+                        elif state == celery_states.REVOKED:
+                            self.fail(key)
+                            del self.tasks[key]
+                            del self.last_state[key]
+                        else:
+                            self.log.info("Unexpected state: " + state)
+                        self.last_state[key] = state
+                except Exception as e:
+                    self.log.error("Error syncing the Celery executor, ignoring "
+                                   "it:\n{}\n{}".format(e, traceback.format_exc()))
 Review comment:
   I did not know log.exception would automatically include the traceback. 
Thank you for the tip! Will update to `self.log.exception("Error syncing the 
Celery executor, ignoring it.")`
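   The tip acknowledged above can be demonstrated with the stdlib logging module alone; this is an illustrative sketch (logger name and messages are made up, not from the PR):

   ```python
   import io
   import logging

   log = logging.getLogger("celery_executor_demo")
   stream = io.StringIO()
   log.addHandler(logging.StreamHandler(stream))

   try:
       {}["missing"]  # any failing operation
   except Exception:
       # log.exception() logs at ERROR level and appends the active
       # traceback automatically; no manual traceback.format_exc() needed.
       log.exception("Error syncing the Celery executor, ignoring it.")

   print("KeyError" in stream.getvalue())  # True
   ```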




[GitHub] yrqls21 commented on a change in pull request #3830: [AIRFLOW-2156] Parallelize Celery Executor

2018-09-07 Thread GitBox
yrqls21 commented on a change in pull request #3830: [AIRFLOW-2156] Parallelize 
Celery Executor
URL: https://github.com/apache/incubator-airflow/pull/3830#discussion_r216110256
 
 

 ##
 File path: airflow/executors/celery_executor.py
 ##
 @@ -85,30 +139,67 @@ def execute_async(self, key, command,
             args=[command], queue=queue)
         self.last_state[key] = celery_states.PENDING
 
+    def _num_tasks_per_process(self):
+        """
+        How many Celery tasks should be sent to each worker process.
+        :return: Number of tasks that should be used per process
+        :rtype: int
+        """
+        return max(1,
+                   int(math.ceil(1.0 * len(self.tasks) /
+                                 self._sync_parallelism)))
+
     def sync(self):
-        self.log.debug("Inquiring about %s celery task(s)", len(self.tasks))
-        for key, task in list(self.tasks.items()):
-            try:
-                state = task.state
-                if self.last_state[key] != state:
-                    if state == celery_states.SUCCESS:
-                        self.success(key)
-                        del self.tasks[key]
-                        del self.last_state[key]
-                    elif state == celery_states.FAILURE:
-                        self.fail(key)
-                        del self.tasks[key]
-                        del self.last_state[key]
-                    elif state == celery_states.REVOKED:
-                        self.fail(key)
-                        del self.tasks[key]
-                        del self.last_state[key]
-                    else:
-                        self.log.info("Unexpected state: %s", state)
-                    self.last_state[key] = state
-            except Exception as e:
-                self.log.error("Error syncing the celery executor, ignoring it:")
-                self.log.exception(e)
+        num_processes = min(len(self.tasks), self._sync_parallelism)
+        if num_processes == 0:
+            self.log.debug("No task to query celery, skipping sync")
+            return
+
+        self.log.debug("Inquiring about %s celery task(s) using %s processes",
+                       len(self.tasks), num_processes)
+
+        # Recreate the process pool each sync in case processes in the pool die
+        self._sync_pool = Pool(processes=num_processes)
+
+        # Use chunking instead of a work queue to reduce context switching
+        # since tasks are roughly uniform in size
+        chunksize = self._num_tasks_per_process()
+
+        self.log.debug("Waiting for inquiries to complete...")
+        task_keys_to_states = self._sync_pool.map(
+            fetch_celery_task_state,
+            self.tasks.items(),
+            chunksize=chunksize)
+        self._sync_pool.close()
+        self._sync_pool.join()
+        self.log.debug("Inquiries completed.")
+
+        for key_and_state in task_keys_to_states:
+            if isinstance(key_and_state, ExceptionWithTraceback):
+                self.log.error(
+                    CELERY_FETCH_ERR_MSG_HEADER + ", ignoring it:{}\n{}\n".format(
+                        key_and_state.exception, key_and_state.traceback))
+            else:
 Review comment:
   I don't have a preference on the style here; will update with `continue`.
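   For reference, the chunk-size rule from `_num_tasks_per_process` in the hunk above (ceiling of tasks over parallelism, floored at one) behaves like this; a standalone sketch, not the PR's exact code:

   ```python
   import math

   def num_tasks_per_process(num_tasks, sync_parallelism):
       """Chunk size handed to Pool.map: ceil(tasks / processes), at least 1."""
       return max(1, int(math.ceil(1.0 * num_tasks / sync_parallelism)))

   print(num_tasks_per_process(10, 4))   # 10 tasks across 4 processes -> 3
   print(num_tasks_per_process(2, 16))   # never less than 1 task per chunk -> 1
   ```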




[GitHub] yrqls21 commented on a change in pull request #3830: [AIRFLOW-2156] Parallelize Celery Executor

2018-09-07 Thread GitBox
yrqls21 commented on a change in pull request #3830: [AIRFLOW-2156] Parallelize 
Celery Executor
URL: https://github.com/apache/incubator-airflow/pull/3830#discussion_r216110104
 
 

 ##
 File path: airflow/executors/celery_executor.py
 ##
 @@ -63,6 +69,40 @@ def execute_command(command):
         raise AirflowException('Celery command failed')
 
 
+class ExceptionWithTraceback(object):
+    """
+    Wrapper class used to propagate exceptions to parent processes from subprocesses.
+    :param exception: The exception to wrap
+    :type exception: Exception
+    :param traceback: The stacktrace to wrap
+    :type traceback: str
+    """
+
+    def __init__(self, exception, exception_traceback):
+        self.exception = exception
+        self.traceback = exception_traceback
+
+
+def fetch_celery_task_state(celery_task):
+    """
+    Fetch and return the state of the given celery task. The scope of this
+    function is global so that it can be called by subprocesses in the pool.
+    :param celery_task: a tuple of the Celery task key and the async Celery
+        object used to fetch the task's state
+    :type celery_task: (str, celery.result.AsyncResult)
+    :return: a tuple of the Celery task key and the Celery state of the task
+    :rtype: (str, str)
+    """
+
+    try:
+        res = (celery_task[0], celery_task[1].state)
 
 Review comment:
   Exactly. Will add the comment there.
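   A runnable sketch of the wrapper pattern quoted above. Traceback objects are not picklable, so the traceback crosses the process boundary as a string; the dict lookup here is a hypothetical stand-in for `AsyncResult.state`:

   ```python
   import traceback

   class ExceptionWithTraceback:
       """Carries an exception plus its pre-formatted traceback string."""
       def __init__(self, exception, exception_traceback):
           self.exception = exception
           self.traceback = exception_traceback

   def fetch_state(celery_task):
       key, result = celery_task
       try:
           return key, result["state"]  # stand-in for AsyncResult.state
       except Exception as e:
           # Capture the traceback as a string so it survives pickling.
           return ExceptionWithTraceback(e, traceback.format_exc())

   print(fetch_state(("k1", {"state": "SUCCESS"})))  # ('k1', 'SUCCESS')
   ```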




[GitHub] yrqls21 commented on a change in pull request #3830: [AIRFLOW-2156] Parallelize Celery Executor

2018-09-07 Thread GitBox
yrqls21 commented on a change in pull request #3830: [AIRFLOW-2156] Parallelize 
Celery Executor
URL: https://github.com/apache/incubator-airflow/pull/3830#discussion_r216110018
 
 

 ##
 File path: airflow/executors/celery_executor.py
 ##
 @@ -72,10 +112,24 @@ class CeleryExecutor(BaseExecutor):
     vast amounts of messages, while providing operations with the tools
     required to maintain such a system.
     """
-    def start(self):
+
+    def __init__(self):
+        super(CeleryExecutor, self).__init__()
+
+        # Parallelize Celery requests here since Celery does not support parallelization.
 
 Review comment:
   What about `Parallelize Celery requests here since Celery does not support parallelization natively`?




[GitHub] yrqls21 commented on a change in pull request #3830: [AIRFLOW-2156] Parallelize Celery Executor

2018-09-07 Thread GitBox
yrqls21 commented on a change in pull request #3830: [AIRFLOW-2156] Parallelize 
Celery Executor
URL: https://github.com/apache/incubator-airflow/pull/3830#discussion_r216109814
 
 

 ##
 File path: airflow/executors/celery_executor.py
 ##
 @@ -63,6 +69,40 @@ def execute_command(command):
         raise AirflowException('Celery command failed')
 
 
+class ExceptionWithTraceback(object):
+    """
+    Wrapper class used to propagate exceptions to parent processes from subprocesses.
+    :param exception: The exception to wrap
+    :type exception: Exception
+    :param traceback: The stacktrace to wrap
+    :type traceback: str
+    """
+
+    def __init__(self, exception, exception_traceback):
+        self.exception = exception
+        self.traceback = exception_traceback
+
+
+def fetch_celery_task_state(celery_task):
 
 Review comment:
   No, it does not have to be a single param. Doing it this way means we don't 
need to unpack the `tasks` dict items, which looks cleaner.
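   A minimal illustration (hypothetical names, with `len` standing in for the real state lookup) of why the single tuple parameter keeps the call site clean: `tasks.items()` can be passed straight to `Pool.map` without any unpacking:

   ```python
   from multiprocessing import Pool

   def fetch_state(task_tuple):
       # Receives one (key, task) pair per call.
       key, task = task_tuple
       return key, len(task)  # hypothetical stand-in for task.state

   if __name__ == "__main__":
       tasks = {"a": "PENDING", "b": "OK"}
       with Pool(2) as pool:
           # No zip/unzip dance: the dict items map directly.
           print(sorted(pool.map(fetch_state, tasks.items())))
   ```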




[GitHub] codecov-io edited a comment on issue #3865: [AIRFLOW-3028] Update Text & Images in Readme.md

2018-09-07 Thread GitBox
codecov-io edited a comment on issue #3865: [AIRFLOW-3028] Update Text & Images 
in Readme.md
URL: 
https://github.com/apache/incubator-airflow/pull/3865#issuecomment-419591019
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3865?src=pr=h1) Report
   > Merging 
[#3865](https://codecov.io/gh/apache/incubator-airflow/pull/3865?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/08ecca47862f304dba548bcfc6c34406cdcf556f?src=pr=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3865/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3865?src=pr=tree)
   
   ```diff
   @@           Coverage Diff            @@
   ##           master    #3865   +/-   ##
   =======================================
     Coverage   77.51%   77.51%           
   =======================================
     Files         200      200           
     Lines       15815    15815           
   =======================================
     Hits        12259    12259           
     Misses       3556     3556
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3865?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3865?src=pr=footer).
 Last update 
[08ecca4...5a49e92](https://codecov.io/gh/apache/incubator-airflow/pull/3865?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[jira] [Created] (AIRFLOW-3029) New Operator - SqlOperator

2018-09-07 Thread Josh Bacon (JIRA)
Josh Bacon created AIRFLOW-3029:
---

 Summary: New Operator - SqlOperator
 Key: AIRFLOW-3029
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3029
 Project: Apache Airflow
  Issue Type: New Feature
  Components: operators
Reporter: Josh Bacon
Assignee: Josh Bacon


Proposal to add a new operator, "SqlOperator", which does not yet exist.

I will be submitting a pull request soon.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AIRFLOW-3011) CLI command to output actual airflow.cfg

2018-09-07 Thread Valerii Zhuk (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607750#comment-16607750
 ] 

Valerii Zhuk edited comment on AIRFLOW-3011 at 9/7/18 10:47 PM:


Hi [~victor.noel], I'd like to take this task, but it's not quite clear what 
you mean by "output".

Output to stdout? Or do you mean a function that takes args like path/to/file 
to output the config content into a file?


was (Author: valerii_zhuk):
[~victor.noel] 

Hi Victor, I`d like to take this task. But It`s quite not clear, what do you 
mean by "output"?

Output to stdout? Or you suppose a function to take args like path/to/file to 
output config content into file?

> CLI command to output actual airflow.cfg
> 
>
> Key: AIRFLOW-3011
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3011
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: cli
>Affects Versions: 1.10.0
>Reporter: Victor
>Assignee: Valerii Zhuk
>Priority: Major
>
> The only way to see the actual airflow configuration (including information 
> overridden with environment variables) is through the web UI.
> For security reasons, this is often disabled.
> A CLI command to do the same thing would:
>  * give admins a way to see it
>  * give operators a way to manipulate the configuration (storing it, etc)
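One possible shape for the core of such a command (purely illustrative, not part of any existing PR; names are hypothetical) is to re-serialize the parsed configuration with the stdlib ConfigParser:

```python
import io
from configparser import ConfigParser

def render_config(conf):
    """Serialize a parsed configuration back to INI text (stdout-friendly)."""
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()

conf = ConfigParser()
conf["core"] = {"executor": "CeleryExecutor"}
# A real command would first merge environment-variable overrides
# (e.g. AIRFLOW__CORE__EXECUTOR) into `conf` before rendering.
print(render_config(conf))
```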





[GitHub] r39132 edited a comment on issue #3865: [AIRFLOW-3028] Update Text & Images in Readme.md

2018-09-07 Thread GitBox
r39132 edited a comment on issue #3865: [AIRFLOW-3028] Update Text & Images in 
Readme.md
URL: 
https://github.com/apache/incubator-airflow/pull/3865#issuecomment-419585712
 
 
   @feng-tao @ashb  @kaxil PTAL




[jira] [Commented] (AIRFLOW-3011) CLI command to output actual airflow.cfg

2018-09-07 Thread Valerii Zhuk (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607750#comment-16607750
 ] 

Valerii Zhuk commented on AIRFLOW-3011:
---

[~victor.noel] 

Hi Victor, I'd like to take this task, but it's not quite clear what you mean 
by "output".

Output to stdout? Or do you mean a function that takes args like path/to/file 
to output the config content into a file?



[GitHub] r39132 commented on issue #3865: [AIRFLOW-3028] Update Text & Images in Readme.md

2018-09-07 Thread GitBox
r39132 commented on issue #3865: [AIRFLOW-3028] Update Text & Images in 
Readme.md
URL: 
https://github.com/apache/incubator-airflow/pull/3865#issuecomment-419585712
 
 
   @kaxil PTAL




[jira] [Assigned] (AIRFLOW-3011) CLI command to output actual airflow.cfg

2018-09-07 Thread Valerii Zhuk (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valerii Zhuk reassigned AIRFLOW-3011:
-

Assignee: Valerii Zhuk

> CLI command to output actual airflow.cfg
> 
>
> Key: AIRFLOW-3011
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3011
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: cli
>Affects Versions: 1.10.0
>Reporter: Victor
>Assignee: Valerii Zhuk
>Priority: Major
>
> The only way to see the actual airflow configuration (including information 
> overridden by environment variables) is through the web UI.
> For security reasons, this is often disabled.
> A CLI command to do the same thing would:
>  * give admins a way to see it
>  * give operators a way to manipulate the configuration (storing it, etc)





[jira] [Commented] (AIRFLOW-3028) Update Text & Images in Readme.md

2018-09-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607747#comment-16607747
 ] 

ASF GitHub Bot commented on AIRFLOW-3028:
-

r39132 opened a new pull request #3865: [AIRFLOW-3028] Update Text & Images in 
Readme.md
URL: https://github.com/apache/incubator-airflow/pull/3865
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3028
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   Updated Readme with new images and added Apache Airflow in a few places.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   N/A
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




> Update Text & Images in Readme.md
> -
>
> Key: AIRFLOW-3028
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3028
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Siddharth Anand
>Assignee: Siddharth Anand
>Priority: Minor
>
> The images in the Readme are more than 4 years old and no longer reflect the 
> look and feel of the project. Additionally, I'm updating the readme to 
> reference Airflow as Apache Airflow in a few places. This Readme.md has not 
> changed in these areas since it was originally ported from Airbnb.





[GitHub] r39132 opened a new pull request #3865: [AIRFLOW-3028] Update Text & Images in Readme.md

2018-09-07 Thread GitBox
r39132 opened a new pull request #3865: [AIRFLOW-3028] Update Text & Images in 
Readme.md
URL: https://github.com/apache/incubator-airflow/pull/3865
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3028
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   Updated Readme with new images and added Apache Airflow in a few places.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   N/A
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




[jira] [Updated] (AIRFLOW-3027) Read credentials from a file in the Databricks operators and hook

2018-09-07 Thread Anthony Miyaguchi (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Miyaguchi updated AIRFLOW-3027:
---
Description: 
The Databricks hook requires token-based authentication via the connections 
database. The token is passed into the connections field:
{code:java}
 Extras: {"token": ""}{code}
This means the token can be seen in plaintext in the Admin UI, which is 
undesirable for our setup. The AWS hook gets around this by either using boto's 
authentication mechanisms or by reading from a file.
{code:java}
elif 's3_config_file' in connection_object.extra_dejson:
aws_access_key_id, aws_secret_access_key = \
    _parse_s3_config(
    connection_object.extra_dejson['s3_config_file'],
    connection_object.extra_dejson.get('s3_config_format')){code}
[source] 
[https://github.com/apache/incubator-airflow/blob/08ecca47862f304dba548bcfc6c34406cdcf556f/airflow/contrib/hooks/aws_hook.py#L110-L114]

 

The Databricks hook should also support reading the token from a file to avoid 
exposing sensitive tokens in plaintext.
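The proposed lookup could mirror the AWS hook's `s3_config_file` handling
along these lines. Note that `resolve_databricks_token` and the `token_file`
extras key are hypothetical names invented for this sketch; the real hook
currently only reads `token`:

```python
from pathlib import Path


def resolve_databricks_token(extra_dejson: dict) -> str:
    # Illustrative helper: `token_file` is a hypothetical extras key mirroring
    # the AWS hook's `s3_config_file`; it is not part of the current hook.
    if "token" in extra_dejson:
        # Current behaviour: the secret sits in plaintext in the connection.
        return extra_dejson["token"]
    if "token_file" in extra_dejson:
        # Proposed behaviour: only a path is stored; the secret lives on disk
        # with filesystem permissions guarding it.
        return Path(extra_dejson["token_file"]).read_text().strip()
    raise ValueError("extras contain neither 'token' nor 'token_file'")
```

With this shape, the Admin UI would only ever display the path, never the
token itself.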

 

  was:
The Databricks hook requires token-based authentication via the connections 
database. The token is passed into the connections field:
{code:java}
 Extras: {"token": ""}{code}

 This means the token can be seen in plaintext in the Admin UI, which is 
undesirable for our setup. The AWS hook gets around this by either using boto's 
authentication mechanisms or by reading from a file.
{code:java}
elif 's3_config_file' in connection_object.extra_dejson:
aws_access_key_id, aws_secret_access_key = \
    _parse_s3_config(
    connection_object.extra_dejson['s3_config_file'],
    connection_object.extra_dejson.get('s3_config_format')){code}
[[source] 
[https://github.com/apache/incubator-airflow/blob/08ecca47862f304dba548bcfc6c34406cdcf556f/airflow/contrib/hooks/aws_hook.py#L110-L114]|https://github.com/apache/incubator-airflow/blob/08ecca47862f304dba548bcfc6c34406cdcf556f/airflow/contrib/hooks/aws_hook.py#L110-L114]

 

The databricks hook should also support reading the token from a file to avoid 
exposing sensitive tokens in plaintext.

 


> Read credentials from a file in the Databricks operators and hook
> -
>
> Key: AIRFLOW-3027
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3027
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: authentication, hooks, operators
>Affects Versions: 1.9.0
>Reporter: Anthony Miyaguchi
>Priority: Minor
>
> The Databricks hook requires token-based authentication via the connections 
> database. The token is passed into the connections field:
> {code:java}
>  Extras: {"token": ""}{code}
> This means the token can be seen in plaintext in the Admin UI, which is 
> undesirable for our setup. The AWS hook gets around this by either using 
> boto's authentication mechanisms or by reading from a file.
> {code:java}
> elif 's3_config_file' in connection_object.extra_dejson:
> aws_access_key_id, aws_secret_access_key = \
>     _parse_s3_config(
>     connection_object.extra_dejson['s3_config_file'],
>     connection_object.extra_dejson.get('s3_config_format')){code}
> [source] 
> [https://github.com/apache/incubator-airflow/blob/08ecca47862f304dba548bcfc6c34406cdcf556f/airflow/contrib/hooks/aws_hook.py#L110-L114]
>  
> The databricks hook should also support reading the token from a file to 
> avoid exposing sensitive tokens in plaintext.
>  





[jira] [Created] (AIRFLOW-3028) Update Text & Images in Readme.md

2018-09-07 Thread Siddharth Anand (JIRA)
Siddharth Anand created AIRFLOW-3028:


 Summary: Update Text & Images in Readme.md
 Key: AIRFLOW-3028
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3028
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Siddharth Anand
Assignee: Siddharth Anand


The images in the Readme are more than 4 years old and no longer reflect the 
look and feel of the project. Additionally, I'm updating the readme to 
reference Airflow as Apache Airflow in a few places. This Readme.md has not 
changed in these areas since it was originally ported from Airbnb.





[jira] [Updated] (AIRFLOW-3027) Read credentials from a file in the Databricks operators and hook

2018-09-07 Thread Anthony Miyaguchi (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Miyaguchi updated AIRFLOW-3027:
---
Description: 
The Databricks hook requires token-based authentication via the connections 
database. The token is passed into the connections field:
{code:java}
 Extras: {"token": ""}{code}

 This means the token can be seen in plaintext in the Admin UI, which is 
undesirable for our setup. The AWS hook gets around this by either using boto's 
authentication mechanisms or by reading from a file.
{code:java}
elif 's3_config_file' in connection_object.extra_dejson:
aws_access_key_id, aws_secret_access_key = \
    _parse_s3_config(
    connection_object.extra_dejson['s3_config_file'],
    connection_object.extra_dejson.get('s3_config_format')){code}
[[source] 
[https://github.com/apache/incubator-airflow/blob/08ecca47862f304dba548bcfc6c34406cdcf556f/airflow/contrib/hooks/aws_hook.py#L110-L114]|https://github.com/apache/incubator-airflow/blob/08ecca47862f304dba548bcfc6c34406cdcf556f/airflow/contrib/hooks/aws_hook.py#L110-L114]

 

The databricks hook should also support reading the token from a file to avoid 
exposing sensitive tokens in plaintext.

 

  was:
The Databricks hook requires token-based authentication via the connections 
database. The token is passed into the connections field:
Extras: \{"token": ""}
This means the token can be seen in plaintext in the Admin UI, which is 
undesirable for our setup. The AWS hook gets around this by either using boto's 
authentication mechanisms or by reading from a file.
{code:java}
elif 's3_config_file' in connection_object.extra_dejson:
aws_access_key_id, aws_secret_access_key = \
    _parse_s3_config(
    connection_object.extra_dejson['s3_config_file'],
    connection_object.extra_dejson.get('s3_config_format')){code}
[[source] 
https://github.com/apache/incubator-airflow/blob/08ecca47862f304dba548bcfc6c34406cdcf556f/airflow/contrib/hooks/aws_hook.py#L110-L114|https://github.com/apache/incubator-airflow/blob/08ecca47862f304dba548bcfc6c34406cdcf556f/airflow/contrib/hooks/aws_hook.py#L110-L114]

 

The databricks hook should also support reading the token from a file to avoid 
exposing sensitive tokens in plaintext.

 


> Read credentials from a file in the Databricks operators and hook
> -
>
> Key: AIRFLOW-3027
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3027
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: authentication, hooks, operators
>Affects Versions: 1.9.0
>Reporter: Anthony Miyaguchi
>Priority: Minor
>
> The Databricks hook requires token-based authentication via the connections 
> database. The token is passed into the connections field:
> {code:java}
>  Extras: {"token": ""}{code}
>  This means the token can be seen in plaintext in the Admin UI, which is 
> undesirable for our setup. The AWS hook gets around this by either using 
> boto's authentication mechanisms or by reading from a file.
> {code:java}
> elif 's3_config_file' in connection_object.extra_dejson:
> aws_access_key_id, aws_secret_access_key = \
>     _parse_s3_config(
>     connection_object.extra_dejson['s3_config_file'],
>     connection_object.extra_dejson.get('s3_config_format')){code}
> [[source] 
> [https://github.com/apache/incubator-airflow/blob/08ecca47862f304dba548bcfc6c34406cdcf556f/airflow/contrib/hooks/aws_hook.py#L110-L114]|https://github.com/apache/incubator-airflow/blob/08ecca47862f304dba548bcfc6c34406cdcf556f/airflow/contrib/hooks/aws_hook.py#L110-L114]
>  
> The databricks hook should also support reading the token from a file to 
> avoid exposing sensitive tokens in plaintext.
>  





[jira] [Created] (AIRFLOW-3027) Read credentials from a file in the Databricks operators and hook

2018-09-07 Thread Anthony Miyaguchi (JIRA)
Anthony Miyaguchi created AIRFLOW-3027:
--

 Summary: Read credentials from a file in the Databricks operators 
and hook
 Key: AIRFLOW-3027
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3027
 Project: Apache Airflow
  Issue Type: Improvement
  Components: authentication, hooks, operators
Affects Versions: 1.9.0
Reporter: Anthony Miyaguchi


The Databricks hook requires token-based authentication via the connections 
database. The token is passed into the connections field:
Extras: \{"token": ""}
This means the token can be seen in plaintext in the Admin UI, which is 
undesirable for our setup. The AWS hook gets around this by either using boto's 
authentication mechanisms or by reading from a file.
{code:java}
elif 's3_config_file' in connection_object.extra_dejson:
aws_access_key_id, aws_secret_access_key = \
    _parse_s3_config(
    connection_object.extra_dejson['s3_config_file'],
    connection_object.extra_dejson.get('s3_config_format')){code}
[[source] 
https://github.com/apache/incubator-airflow/blob/08ecca47862f304dba548bcfc6c34406cdcf556f/airflow/contrib/hooks/aws_hook.py#L110-L114|https://github.com/apache/incubator-airflow/blob/08ecca47862f304dba548bcfc6c34406cdcf556f/airflow/contrib/hooks/aws_hook.py#L110-L114]

 

The databricks hook should also support reading the token from a file to avoid 
exposing sensitive tokens in plaintext.

 





[GitHub] r39132 closed pull request #3864: [AIRFLOW-XXX] Redirect FAQ about `airflow[crypto]` to How-to Guides.

2018-09-07 Thread GitBox
r39132 closed pull request #3864: [AIRFLOW-XXX] Redirect FAQ about 
`airflow[crypto]` to How-to Guides.
URL: https://github.com/apache/incubator-airflow/pull/3864
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/docs/faq.rst b/docs/faq.rst
index 61c1ba9ce1..07c07c0bcf 100644
--- a/docs/faq.rst
+++ b/docs/faq.rst
@@ -66,13 +66,13 @@ How do I trigger tasks based on another task's failure?
 ---
 
 Check out the ``Trigger Rule`` section in the Concepts section of the
-documentation
+documentation.
 
 Why are connection passwords still not encrypted in the metadata db after I 
installed airflow[crypto]?
 
--
 
-Check out the ``Connections`` section in the Configuration section of the
-documentation
+Check out the ``Securing Connections`` section in the How-to Guides section of 
the
+documentation.
 
 What's the deal with ``start_date``?
 


 




[GitHub] r39132 commented on issue #3864: [AIRFLOW-XXX] Redirect FAQ about `airflow[crypto]` to How-to Guides.

2018-09-07 Thread GitBox
r39132 commented on issue #3864: [AIRFLOW-XXX] Redirect FAQ about 
`airflow[crypto]` to How-to Guides.
URL: 
https://github.com/apache/incubator-airflow/pull/3864#issuecomment-419573110
 
 
   LGTM




[GitHub] ashb commented on issue #3862: [AIRFLOW-1917] Remove extra newline character from log

2018-09-07 Thread GitBox
ashb commented on issue #3862: [AIRFLOW-1917] Remove extra newline character 
from log
URL: 
https://github.com/apache/incubator-airflow/pull/3862#issuecomment-419571432
 
 
   The extra line comes from reading stdout/stderr of the sub-process, so it's 
unavoidable; check out the linked Jira for a repro case involving an operator. 
This StreamLogWriter is used to capture stdout from operators etc.
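The behaviour under discussion can be reproduced with a small stand-in: a
redirected `print()` issues one `write()` for the message and a second for the
line terminator, so forwarding every chunk verbatim to a logger yields an
extra blank line per printed line. This is a simplified sketch, not Airflow's
actual `StreamLogWriter` implementation:

```python
class StreamLogWriter:
    """Minimal file-like object that forwards completed lines to a logger.

    Simplified sketch, not Airflow's real class: it buffers partial writes
    and strips the trailing newline before emitting, which avoids logging
    an extra blank line for every print() call in the captured code.
    """

    def __init__(self, emit):
        self._emit = emit  # e.g. logger.info
        self._buffer = ""

    def write(self, message):
        if not message.endswith("\n"):
            self._buffer += message  # partial line: keep buffering
            return
        # Flush the buffered line without its trailing newline.
        self._emit(self._buffer + message.rstrip("\n"))
        self._buffer = ""

    def flush(self):
        if self._buffer:
            self._emit(self._buffer)
            self._buffer = ""


records = []
writer = StreamLogWriter(records.append)
print("hello", file=writer)  # write("hello") then write("\n")
print("world", file=writer)
```

Without the `rstrip("\n")`, each record would carry its own newline on top of
the one the logging handler appends, which is exactly the doubled-newline
symptom in the linked Jira.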
   
   




[GitHub] wwlian commented on issue #3805: [AIRFLOW-2062] Add per-connection KMS encryption.

2018-09-07 Thread GitBox
wwlian commented on issue #3805: [AIRFLOW-2062] Add per-connection KMS 
encryption.
URL: 
https://github.com/apache/incubator-airflow/pull/3805#issuecomment-419570620
 
 
   @bolkedebruin @gerardo @Fokko I understand the concerns that this change 
might be coupled too tightly to Google Cloud KMS. However, I want to second 
@jakahn's assurance that this design is agnostic to the key management service 
being used. 
   
   The only opinionated design included here is that encryption will be 
performed using the envelope encryption pattern, which is a widely-recognized 
pattern by [AWS 
KMS](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#enveloping),
 [Google Cloud KMS](https://cloud.google.com/kms/docs/envelope-encryption), and 
[Azure Key 
Vault](https://docs.microsoft.com/en-us/azure/storage/common/storage-client-side-encryption#encryption-and-decryption-via-the-envelope-technique).
   
   To add to what @jakahn said re: embedding kms_conn_id and kms_extras in the 
existing _extra column, doing so would create a chicken and egg problem, as 
their values are needed to decrypt the _extras column.
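   The envelope pattern described above can be sketched as follows: a fresh
data-encryption key (DEK) encrypts the payload, and only the DEK is wrapped by
the key-encryption key (KEK) held in the KMS. The function names are
illustrative, and the XOR keystream is a toy stand-in for a real AEAD cipher
or KMS encrypt/decrypt call; it must never be used for actual encryption:

```python
import hashlib
import os


def _toy_cipher(key: bytes, data: bytes) -> bytes:
    # Stand-in for a real cipher (AES-GCM, or a KMS encrypt/decrypt call).
    # XOR against a SHA-256-derived keystream is symmetric but NOT secure.
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def envelope_encrypt(kek: bytes, plaintext: bytes):
    dek = os.urandom(32)                      # fresh data-encryption key
    ciphertext = _toy_cipher(dek, plaintext)  # bulk payload encrypted with DEK
    wrapped_dek = _toy_cipher(kek, dek)       # DEK wrapped by the KMS-held KEK
    return wrapped_dek, ciphertext            # store both; KEK never leaves KMS


def envelope_decrypt(kek: bytes, wrapped_dek: bytes, ciphertext: bytes) -> bytes:
    dek = _toy_cipher(kek, wrapped_dek)       # unwrap the DEK first
    return _toy_cipher(dek, ciphertext)       # then decrypt the payload
```

   Keeping kms_conn_id and kms_extras outside the encrypted _extra column is
the same idea as storing wrapped_dek next to the ciphertext: the material
needed to start decryption has to be readable before anything is decrypted.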




[GitHub] r39132 commented on issue #2092: [AIRFLOW-888] Don't automatically push XComs

2018-09-07 Thread GitBox
r39132 commented on issue #2092: [AIRFLOW-888] Don't automatically push XComs
URL: 
https://github.com/apache/incubator-airflow/pull/2092#issuecomment-419563010
 
 
   Thx @jlowin What do you want to do with the Jiras?




[GitHub] r39132 commented on issue #2048: [AIRFLOW-828] Add maximum XCom size

2018-09-07 Thread GitBox
r39132 commented on issue #2048: [AIRFLOW-828] Add maximum XCom size
URL: 
https://github.com/apache/incubator-airflow/pull/2048#issuecomment-419562868
 
 
   @jlowin What do you want to do with the Jiras?




[jira] [Commented] (AIRFLOW-888) Operators should not push XComs by default

2018-09-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607593#comment-16607593
 ] 

ASF GitHub Bot commented on AIRFLOW-888:


jlowin closed pull request #2092: [AIRFLOW-888] Don't automatically push XComs
URL: https://github.com/apache/incubator-airflow/pull/2092
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/UPDATING.md b/UPDATING.md
index 6fd7afbe5c..e0acc49a4c 100644
--- a/UPDATING.md
+++ b/UPDATING.md
@@ -7,18 +7,19 @@ assists people when migrating to a new version.
 
 ### New Features
 
- Dask Executor
-
-A new DaskExecutor allows Airflow tasks to be run in Dask Distributed clusters.
+- [AIRFLOW-862] A new DaskExecutor allows Airflow tasks to be run in Dask 
Distributed clusters
 
 ### Deprecated Features
 These features are marked for deprecation. They may still work (and raise a 
`DeprecationWarning`), but are no longer
-supported and will be removed entirely in Airflow 2.0
+supported and will be removed entirely in a future version of Airflow.
 
-- `post_execute()` hooks now take two arguments, `context` and `result`
-  (AIRFLOW-886)
+- [AIRFLOW-886] `post_execute()` hooks now take two arguments, `context` and 
`result`. Previously, post_execute() only took one argument, `context`.
 
-  Previously, post_execute() only took one argument, `context`.
+### Breaking Changes
+These changes are not backwards-compatible with previous versions of Airflow.
+
+- [AIRFLOW-888] Operators no longer automatically push XComs. This behavior 
can be reenabled globally
+  by setting `auto_xcom_push = True` in the `operators` setting of Airflow.cfg 
or on a per-Operator basis by passing `auto_xcom_push=True` when creating the 
Operator.
 
 ## Airflow 1.8
 
@@ -47,8 +48,8 @@ interfere.
 Please read through these options, defaults have changed since 1.7.1.
 
  child_process_log_directory
-In order the increase the robustness of the scheduler, DAGS our now processed 
in their own process. Therefore each 
-DAG has its own log file for the scheduler. These are placed in 
`child_process_log_directory` which defaults to 
+In order the increase the robustness of the scheduler, DAGS our now processed 
in their own process. Therefore each
+DAG has its own log file for the scheduler. These are placed in 
`child_process_log_directory` which defaults to
 `/scheduler/latest`. You will need to make sure these log files 
are removed.
 
 > DAG logs or processor logs ignore and command line settings for log file 
 > locations.
@@ -58,7 +59,7 @@ Previously the command line option `num_runs` was used to let 
the scheduler term
 loops. This is now time bound and defaults to `-1`, which means run 
continuously. See also num_runs.
 
  num_runs
-Previously `num_runs` was used to let the scheduler terminate after a certain 
amount of loops. Now num_runs specifies 
+Previously `num_runs` was used to let the scheduler terminate after a certain 
amount of loops. Now num_runs specifies
 the number of times to try to schedule each DAG file within `run_duration` 
time. Defaults to `-1`, which means try
 indefinitely. This is only available on the command line.
 
@@ -71,7 +72,7 @@ dags are not being picked up, have a look at this number and 
decrease it when ne
 
  catchup_by_default
 By default the scheduler will fill any missing interval DAG Runs between the 
last execution date and the current date.
-This setting changes that behavior to only execute the latest interval. This 
can also be specified per DAG as 
+This setting changes that behavior to only execute the latest interval. This 
can also be specified per DAG as
 `catchup = False / True`. Command line backfills will still work.
 
 ### Faulty Dags do not show an error in the Web UI
@@ -95,33 +96,33 @@ convenience variables to the config. In case your run a 
sceure Hadoop setup it m
 required to whitelist these variables by adding the following to your 
configuration:
 
 ```
- 
+
  hive.security.authorization.sqlstd.confwhitelist.append
  airflow\.ctx\..*
 
 ```
 ### Google Cloud Operator and Hook alignment
 
-All Google Cloud Operators and Hooks are aligned and use the same client 
library. Now you have a single connection 
+All Google Cloud Operators and Hooks are aligned and use the same client 
library. Now you have a single connection
 type for all kinds of Google Cloud Operators.
 
 If you experience problems connecting with your operator make sure you set the 
connection type "Google Cloud Platform".
 
-Also the old P12 key file type is not supported anymore and only the new JSON 
key files are supported as a service 
+Also the old P12 key file type is not supported anymore and only the new JSON 
key files 

[jira] [Commented] (AIRFLOW-828) Add maximum size for XComs

2018-09-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607594#comment-16607594
 ] 

ASF GitHub Bot commented on AIRFLOW-828:


jlowin closed pull request #2048: [AIRFLOW-828] Add maximum XCom size
URL: https://github.com/apache/incubator-airflow/pull/2048
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/configuration.py b/airflow/configuration.py
index 6752bdb283..927e6110f8 100644
--- a/airflow/configuration.py
+++ b/airflow/configuration.py
@@ -200,6 +200,10 @@ def run_command(command):
 default_disk = 512
 default_gpus = 0
 
+[xcom]
+
+# the maximum size of a pickled XCom object, in bytes
+max_size = 2
 
 [webserver]
 # The base url of your website as airflow cannot guess what domain or
diff --git a/airflow/exceptions.py b/airflow/exceptions.py
index 22312083d8..e65d907416 100644
--- a/airflow/exceptions.py
+++ b/airflow/exceptions.py
@@ -22,7 +22,7 @@ class AirflowException(Exception):
 
 class AirflowConfigException(AirflowException):
 pass
-
+
 
 class AirflowSensorTimeout(AirflowException):
 pass
@@ -34,3 +34,7 @@ class AirflowTaskTimeout(AirflowException):
 
 class AirflowSkipException(AirflowException):
 pass
+
+
+class XComException(AirflowException):
+pass
diff --git a/airflow/models.py b/airflow/models.py
index 6cf7ad9dee..8911242c1f 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -61,7 +61,8 @@
 from airflow import settings, utils
 from airflow.executors import DEFAULT_EXECUTOR, LocalExecutor
 from airflow import configuration
-from airflow.exceptions import AirflowException, AirflowSkipException, 
AirflowTaskTimeout
+from airflow.exceptions import (
+AirflowException, AirflowSkipException, AirflowTaskTimeout, XComException)
 from airflow.dag.base_dag import BaseDag, BaseDagBag
 from airflow.ti_deps.deps.not_in_retry_period_dep import NotInRetryPeriodDep
 from airflow.ti_deps.deps.prev_dagrun_dep import PrevDagrunDep
@@ -3600,6 +3601,15 @@ def set(
 """
 session.expunge_all()
 
+# check XCom size
+max_xcom_size = configuration.getint('XCOM', 'MAX_SIZE')
+xcom_size = sys.getsizeof(pickle.dumps(value))
+if xcom_size > max_xcom_size:
+raise XComException(
+"The XCom's pickled size ({} bytes) is larger than the "
+"maximum allowed size ({} bytes).".format(
+xcom_size, max_xcom_size))
+
 # remove any duplicate XComs
 session.query(cls).filter(
 cls.key == key,
diff --git a/tests/models.py b/tests/models.py
index 346f47cfab..31b8f80b93 100644
--- a/tests/models.py
+++ b/tests/models.py
@@ -23,7 +23,7 @@
 import time
 
 from airflow import models, settings, AirflowException
-from airflow.exceptions import AirflowSkipException
+from airflow.exceptions import AirflowSkipException, XComException
 from airflow.models import DAG, TaskInstance as TI
 from airflow.models import State as ST
 from airflow.models import DagModel
@@ -581,6 +581,31 @@ def test_check_task_dependencies(self, trigger_rule, 
successes, skipped,
 self.assertEqual(completed, expect_completed)
 self.assertEqual(ti.state, expect_state)
 
+
+class XComTest(unittest.TestCase):
+
+def test_xcom_max_size(self):
+"""
+Test that pushing large XComs raises an error
+"""
+small_value = [0] * 100
+large_value = [0] * 100
+
+dag = models.DAG(dag_id='test_xcom')
+task = DummyOperator(
+task_id='test_xcom',
+dag=dag,
+owner='airflow',
+start_date=datetime.datetime(2016, 6, 2, 0, 0, 0))
+ti = TI(task=task, execution_date=datetime.datetime.now())
+
+# this should work
+ti.xcom_push(key='small xcom', value=small_value)
+
+# this should fail
+with self.assertRaises(XComException):
+ti.xcom_push(key='large xcom', value=large_value)
+
 def test_xcom_pull_after_success(self):
 """
 tests xcom set/clear relative to a task in a 'success' rerun scenario


 




> Add maximum size for XComs
> --
>
> Key: AIRFLOW-828
> URL: https://issues.apache.org/jira/browse/AIRFLOW-828
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: 

[GitHub] jlowin commented on issue #2048: [AIRFLOW-828] Add maximum XCom size

2018-09-07 Thread GitBox
jlowin commented on issue #2048: [AIRFLOW-828] Add maximum XCom size
URL: 
https://github.com/apache/incubator-airflow/pull/2048#issuecomment-419547345
 
 
   Closed bc stale!




[GitHub] jlowin closed pull request #2048: [AIRFLOW-828] Add maximum XCom size

2018-09-07 Thread GitBox
jlowin closed pull request #2048: [AIRFLOW-828] Add maximum XCom size
URL: https://github.com/apache/incubator-airflow/pull/2048
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/configuration.py b/airflow/configuration.py
index 6752bdb283..927e6110f8 100644
--- a/airflow/configuration.py
+++ b/airflow/configuration.py
@@ -200,6 +200,10 @@ def run_command(command):
 default_disk = 512
 default_gpus = 0
 
+[xcom]
+
+# the maximum size of a pickled XCom object, in bytes
+max_size = 2
 
 [webserver]
 # The base url of your website as airflow cannot guess what domain or
diff --git a/airflow/exceptions.py b/airflow/exceptions.py
index 22312083d8..e65d907416 100644
--- a/airflow/exceptions.py
+++ b/airflow/exceptions.py
@@ -22,7 +22,7 @@ class AirflowException(Exception):
 
 class AirflowConfigException(AirflowException):
 pass
-
+
 
 class AirflowSensorTimeout(AirflowException):
 pass
@@ -34,3 +34,7 @@ class AirflowTaskTimeout(AirflowException):
 
 class AirflowSkipException(AirflowException):
 pass
+
+
+class XComException(AirflowException):
+pass
diff --git a/airflow/models.py b/airflow/models.py
index 6cf7ad9dee..8911242c1f 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -61,7 +61,8 @@
 from airflow import settings, utils
 from airflow.executors import DEFAULT_EXECUTOR, LocalExecutor
 from airflow import configuration
-from airflow.exceptions import AirflowException, AirflowSkipException, 
AirflowTaskTimeout
+from airflow.exceptions import (
+AirflowException, AirflowSkipException, AirflowTaskTimeout, XComException)
 from airflow.dag.base_dag import BaseDag, BaseDagBag
 from airflow.ti_deps.deps.not_in_retry_period_dep import NotInRetryPeriodDep
 from airflow.ti_deps.deps.prev_dagrun_dep import PrevDagrunDep
@@ -3600,6 +3601,15 @@ def set(
 """
 session.expunge_all()
 
+# check XCom size
+max_xcom_size = configuration.getint('XCOM', 'MAX_SIZE')
+xcom_size = sys.getsizeof(pickle.dumps(value))
+if xcom_size > max_xcom_size:
+raise XComException(
+"The XCom's pickled size ({} bytes) is larger than the "
+"maximum allowed size ({} bytes).".format(
+xcom_size, max_xcom_size))
+
 # remove any duplicate XComs
 session.query(cls).filter(
 cls.key == key,
diff --git a/tests/models.py b/tests/models.py
index 346f47cfab..31b8f80b93 100644
--- a/tests/models.py
+++ b/tests/models.py
@@ -23,7 +23,7 @@
 import time
 
 from airflow import models, settings, AirflowException
-from airflow.exceptions import AirflowSkipException
+from airflow.exceptions import AirflowSkipException, XComException
 from airflow.models import DAG, TaskInstance as TI
 from airflow.models import State as ST
 from airflow.models import DagModel
@@ -581,6 +581,31 @@ def test_check_task_dependencies(self, trigger_rule, 
successes, skipped,
 self.assertEqual(completed, expect_completed)
 self.assertEqual(ti.state, expect_state)
 
+
+class XComTest(unittest.TestCase):
+
+def test_xcom_max_size(self):
+"""
+Test that pushing large XComs raises an error
+"""
+small_value = [0] * 100
+large_value = [0] * 100
+
+dag = models.DAG(dag_id='test_xcom')
+task = DummyOperator(
+task_id='test_xcom',
+dag=dag,
+owner='airflow',
+start_date=datetime.datetime(2016, 6, 2, 0, 0, 0))
+ti = TI(task=task, execution_date=datetime.datetime.now())
+
+# this should work
+ti.xcom_push(key='small xcom', value=small_value)
+
+# this should fail
+with self.assertRaises(XComException):
+ti.xcom_push(key='large xcom', value=large_value)
+
 def test_xcom_pull_after_success(self):
 """
 tests xcom set/clear relative to a task in a 'success' rerun scenario
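The size guard added to `XCom.set` in the diff above can be exercised on its own. The sketch below mirrors the pickled-size check; the limit value here is purely illustrative (the PR reads it from `configuration.getint('XCOM', 'MAX_SIZE')`), and `check_xcom_size` is a hypothetical helper, not Airflow API:

```python
import pickle
import sys

MAX_XCOM_SIZE = 8192  # illustrative limit in bytes; the PR reads this from airflow.cfg


def check_xcom_size(value, max_size=MAX_XCOM_SIZE):
    # Mirror the check in XCom.set(): measure the pickled payload, not the object.
    xcom_size = sys.getsizeof(pickle.dumps(value))
    if xcom_size > max_size:
        raise ValueError(
            "The XCom's pickled size ({} bytes) is larger than the "
            "maximum allowed size ({} bytes).".format(xcom_size, max_size))
    return xcom_size


print(check_xcom_size([0] * 100))      # small payload passes
try:
    check_xcom_size([0] * 1000000)     # large payload is rejected
except ValueError as exc:
    print("rejected:", exc)
```

Measuring the pickled bytes rather than the in-memory object is what makes the limit meaningful, since that is what actually lands in the metadata database.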


 




[GitHub] jlowin commented on issue #2092: [AIRFLOW-888] Don't automatically push XComs

2018-09-07 Thread GitBox
jlowin commented on issue #2092: [AIRFLOW-888] Don't automatically push XComs
URL: 
https://github.com/apache/incubator-airflow/pull/2092#issuecomment-419547214
 
 
   @r39132 hello! I think this is stale by default at this point -- I'll close 
it.




[GitHub] jlowin closed pull request #2092: [AIRFLOW-888] Don't automatically push XComs

2018-09-07 Thread GitBox
jlowin closed pull request #2092: [AIRFLOW-888] Don't automatically push XComs
URL: https://github.com/apache/incubator-airflow/pull/2092
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/UPDATING.md b/UPDATING.md
index 6fd7afbe5c..e0acc49a4c 100644
--- a/UPDATING.md
+++ b/UPDATING.md
@@ -7,18 +7,19 @@ assists people when migrating to a new version.
 
 ### New Features
 
- Dask Executor
-
-A new DaskExecutor allows Airflow tasks to be run in Dask Distributed clusters.
+- [AIRFLOW-862] A new DaskExecutor allows Airflow tasks to be run in Dask 
Distributed clusters
 
 ### Deprecated Features
 These features are marked for deprecation. They may still work (and raise a 
`DeprecationWarning`), but are no longer
-supported and will be removed entirely in Airflow 2.0
+supported and will be removed entirely in a future version of Airflow.
 
-- `post_execute()` hooks now take two arguments, `context` and `result`
-  (AIRFLOW-886)
+- [AIRFLOW-886] `post_execute()` hooks now take two arguments, `context` and 
`result`. Previously, post_execute() only took one argument, `context`.
 
-  Previously, post_execute() only took one argument, `context`.
+### Breaking Changes
+These changes are not backwards-compatible with previous versions of Airflow.
+
+- [AIRFLOW-888] Operators no longer automatically push XComs. This behavior 
can be reenabled globally
+  by setting `auto_xcom_push = True` in the `operators` setting of Airflow.cfg 
or on a per-Operator basis by passing `auto_xcom_push=True` when creating the 
Operator.
 
 ## Airflow 1.8
 
@@ -47,8 +48,8 @@ interfere.
 Please read through these options, defaults have changed since 1.7.1.
 
  child_process_log_directory
-In order the increase the robustness of the scheduler, DAGS our now processed 
in their own process. Therefore each 
-DAG has its own log file for the scheduler. These are placed in 
`child_process_log_directory` which defaults to 
+In order the increase the robustness of the scheduler, DAGS our now processed 
in their own process. Therefore each
+DAG has its own log file for the scheduler. These are placed in 
`child_process_log_directory` which defaults to
 `/scheduler/latest`. You will need to make sure these log files 
are removed.
 
 > DAG logs or processor logs ignore and command line settings for log file 
 > locations.
@@ -58,7 +59,7 @@ Previously the command line option `num_runs` was used to let 
the scheduler term
 loops. This is now time bound and defaults to `-1`, which means run 
continuously. See also num_runs.
 
  num_runs
-Previously `num_runs` was used to let the scheduler terminate after a certain 
amount of loops. Now num_runs specifies 
+Previously `num_runs` was used to let the scheduler terminate after a certain 
amount of loops. Now num_runs specifies
 the number of times to try to schedule each DAG file within `run_duration` 
time. Defaults to `-1`, which means try
 indefinitely. This is only available on the command line.
 
@@ -71,7 +72,7 @@ dags are not being picked up, have a look at this number and 
decrease it when ne
 
  catchup_by_default
 By default the scheduler will fill any missing interval DAG Runs between the 
last execution date and the current date.
-This setting changes that behavior to only execute the latest interval. This 
can also be specified per DAG as 
+This setting changes that behavior to only execute the latest interval. This 
can also be specified per DAG as
 `catchup = False / True`. Command line backfills will still work.
 
 ### Faulty Dags do not show an error in the Web UI
@@ -95,33 +96,33 @@ convenience variables to the config. In case your run a 
sceure Hadoop setup it m
 required to whitelist these variables by adding the following to your 
configuration:
 
 ```
- 
+
  hive.security.authorization.sqlstd.confwhitelist.append
  airflow\.ctx\..*
 
 ```
 ### Google Cloud Operator and Hook alignment
 
-All Google Cloud Operators and Hooks are aligned and use the same client 
library. Now you have a single connection 
+All Google Cloud Operators and Hooks are aligned and use the same client 
library. Now you have a single connection
 type for all kinds of Google Cloud Operators.
 
 If you experience problems connecting with your operator make sure you set the 
connection type "Google Cloud Platform".
 
-Also the old P12 key file type is not supported anymore and only the new JSON 
key files are supported as a service 
+Also the old P12 key file type is not supported anymore and only the new JSON 
key files are supported as a service
 account.
 
 ### Deprecated Features
-These features are marked for deprecation. They may still work (and raise a 
`DeprecationWarning`), but are no longer 
+These features are marked for deprecation. They may still 

[GitHub] Fokko commented on a change in pull request #3767: [AIRFLOW-2524]Add SageMaker Batch Inference

2018-09-07 Thread GitBox
Fokko commented on a change in pull request #3767: [AIRFLOW-2524]Add SageMaker 
Batch Inference
URL: https://github.com/apache/incubator-airflow/pull/3767#discussion_r216045945
 
 

 ##
 File path: airflow/contrib/sensors/sagemaker_transform_sensor.py
 ##
 @@ -0,0 +1,69 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.contrib.hooks.sagemaker_hook import SageMakerHook
+from airflow.contrib.sensors.sagemaker_base_sensor import SageMakerBaseSensor
+from airflow.utils.decorators import apply_defaults
+
+
+class SageMakerTransformSensor(SageMakerBaseSensor):
+"""
+Asks for the state of the transform state until it reaches a terminal 
state.
+The sensor will error if the job errors, throwing a AirflowException
+containing the failure reason.
+
+:param job_name: job_name of the transform job instance to check the state 
of
+:type job_name: string
+:param region_name: The AWS region_name
+:type region_name: string
+"""
+
+template_fields = ['job_name']
+template_ext = ()
+
+@apply_defaults
+def __init__(self,
+ job_name,
+ region_name=None,
+ *args,
+ **kwargs):
+super(SageMakerTransformSensor, self).__init__(*args, **kwargs)
+self.job_name = job_name
+self.region_name = region_name
+
+def non_terminal_states(self):
 
 Review comment:
  Since the function does not use anything from the class itself, it is 
cleaner to make it static; it is then easier to reuse in other 
classes/functions.
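The suggestion can be sketched as follows; the status names are illustrative placeholders, not necessarily the exact values in the PR:

```python
class SageMakerTransformSensor:
    # The review suggests @staticmethod because the method never touches
    # `self`; it can then be called without an instance and reused elsewhere.
    @staticmethod
    def non_terminal_states():
        return {'InProgress', 'Stopping'}  # illustrative status names


# callable both on the class and on an instance
assert (SageMakerTransformSensor.non_terminal_states()
        == SageMakerTransformSensor().non_terminal_states())
```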




[GitHub] Fokko commented on issue #3767: [AIRFLOW-2524]Add SageMaker Batch Inference

2018-09-07 Thread GitBox
Fokko commented on issue #3767: [AIRFLOW-2524]Add SageMaker Batch Inference
URL: 
https://github.com/apache/incubator-airflow/pull/3767#issuecomment-419523736
 
 
   Can you rebase onto master as well? :)




[GitHub] codecov-io commented on issue #3864: [AIRFLOW-XXX] Redirect FAQ about `airflow[crypto]` to How-to Guides.

2018-09-07 Thread GitBox
codecov-io commented on issue #3864: [AIRFLOW-XXX] Redirect FAQ about 
`airflow[crypto]` to How-to Guides.
URL: 
https://github.com/apache/incubator-airflow/pull/3864#issuecomment-41952
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3864?src=pr=h1)
 Report
   > Merging 
[#3864](https://codecov.io/gh/apache/incubator-airflow/pull/3864?src=pr=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/abb12be84bce19aa0a00c96acab4daa8f8824db5?src=pr=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3864/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3864?src=pr=tree)
   
   ```diff
   @@   Coverage Diff   @@
   ##   master#3864   +/-   ##
   ===
 Coverage   77.51%   77.51%   
   ===
 Files 200  200   
 Lines   1581515815   
   ===
 Hits1225912259   
 Misses   3556 3556
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3864?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3864?src=pr=footer).
 Last update 
[abb12be...2963b28](https://codecov.io/gh/apache/incubator-airflow/pull/3864?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] dimberman edited a comment on issue #3855: [AIRFLOW-3022] Add volume mount to KubernetesExecutorConfig

2018-09-07 Thread GitBox
dimberman edited a comment on issue #3855: [AIRFLOW-3022] Add volume mount to 
KubernetesExecutorConfig
URL: 
https://github.com/apache/incubator-airflow/pull/3855#issuecomment-419517507
 
 
   @Fokko I can't approve anything involving the k8s executor because 
@gerardo's PR for dockerized CI broke the k8s tests s.t. they revert to the 
non-k8s tests (if you look at the CI you'll see that the k8s tests aren't 
actually launching a cluster). I've been trying to figure out how to get the 
k8s tests to work but for some reason the docker-compose stuff has broken the 
ability to correctly use minikube.
   
   PTAL at this https://github.com/apache/incubator-airflow/pull/3797. I 
managed to get most of the way with switching to the dind k8s cluster, but 
still need to figure out how to use the airflow API since we can't use 
`minikube ip` anymore.




[GitHub] dimberman commented on issue #3855: [AIRFLOW-3022] Add volume mount to KubernetesExecutorConfig

2018-09-07 Thread GitBox
dimberman commented on issue #3855: [AIRFLOW-3022] Add volume mount to 
KubernetesExecutorConfig
URL: 
https://github.com/apache/incubator-airflow/pull/3855#issuecomment-419517507
 
 
   @Fokko I can't approve anything involving the k8s executor because 
@gerardo's PR for dockerized CI broke the k8s tests s.t. they revert to the 
non-k8s tests (if you look at the CI you'll see that the k8s tests aren't 
actually launching a cluster). I've been trying to figure out how to get the 
k8s tests to work but for some reason the docker-compose stuff has broken the 
ability to correctly use minikube.
   
   PTAL at this https://github.com/apache/incubator-airflow/pull/3797. I managed 
to get most of the way with switching to the dind k8s cluster, but still need 
to figure out how to use the airflow API since we can't use `minikube ip` 
anymore.




[GitHub] Fokko commented on issue #3855: [AIRFLOW-3022] Add volume mount to KubernetesExecutorConfig

2018-09-07 Thread GitBox
Fokko commented on issue #3855: [AIRFLOW-3022] Add volume mount to 
KubernetesExecutorConfig
URL: 
https://github.com/apache/incubator-airflow/pull/3855#issuecomment-419515166
 
 
   @dimberman PTAL :)




[GitHub] Fokko commented on issue #3859: [AIRFLOW-3020]LDAP Authentication doesn't check whether a user belongs to a group correctly

2018-09-07 Thread GitBox
Fokko commented on issue #3859: [AIRFLOW-3020]LDAP Authentication doesn't check 
whether a user belongs to a group correctly
URL: 
https://github.com/apache/incubator-airflow/pull/3859#issuecomment-419512721
 
 
   @zeninpalm Would it be possible to test this?




[GitHub] zihengCat opened a new pull request #3864: [AIRFLOW-XXX] Redirect FAQ about `airflow[crypto]` to How-to Guides.

2018-09-07 Thread GitBox
zihengCat opened a new pull request #3864: [AIRFLOW-XXX] Redirect FAQ about 
`airflow[crypto]` to How-to Guides.
URL: https://github.com/apache/incubator-airflow/pull/3864
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Update `Apache Airflow (incubating)` Documentation, redirect *FAQ* about 
`airflow[crypto]` to *How-to Guides*.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




[GitHub] Fokko commented on a change in pull request #3860: [AIRFLOW-3025] Enable specifying dns and dns_search options for DockerOperator

2018-09-07 Thread GitBox
Fokko commented on a change in pull request #3860: [AIRFLOW-3025] Enable 
specifying dns and dns_search options for DockerOperator
URL: https://github.com/apache/incubator-airflow/pull/3860#discussion_r216030227
 
 

 ##
 File path: airflow/operators/docker_operator.py
 ##
 @@ -110,6 +114,8 @@ def __init__(
 api_version=None,
 command=None,
 cpus=1.0,
+dns=None,
+dns_search=None,
 
 Review comment:
   Can you move this to the end of the argument list, so we keep it as 
backward compatible as possible?
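The concern is positional callers: inserting parameters mid-signature silently shifts the meaning of later arguments. A minimal illustration with simplified signatures (not the real `DockerOperator.__init__`):

```python
def run_v1(image, command=None, cpus=1.0):
    return {'image': image, 'command': command, 'cpus': cpus}


# new keyword args added in the MIDDLE of the signature
def run_v2(image, command=None, dns=None, cpus=1.0):
    return {'image': image, 'command': command, 'dns': dns, 'cpus': cpus}


old = run_v1('img', 'echo hi', 2.0)
new = run_v2('img', 'echo hi', 2.0)
assert old['cpus'] == 2.0
# the same positional call now lands 2.0 in `dns` instead of `cpus`
assert new['cpus'] == 1.0 and new['dns'] == 2.0
```

Appending new parameters at the end of the list keeps existing positional calls working unchanged, which is what the review asks for.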




[GitHub] wmorris75 commented on issue #3828: [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators

2018-09-07 Thread GitBox
wmorris75 commented on issue #3828: [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 
operators
URL: 
https://github.com/apache/incubator-airflow/pull/3828#issuecomment-419508787
 
 
   I have resolved the outstanding conflicts and am awaiting feedback. 




[GitHub] Fokko commented on issue #3858: [AIRFLOW-2929] Add get and set for pool class in models

2018-09-07 Thread GitBox
Fokko commented on issue #3858: [AIRFLOW-2929] Add get and set for pool class 
in models
URL: 
https://github.com/apache/incubator-airflow/pull/3858#issuecomment-419506341
 
 
   Is the `from airflow.api.common.experimental import pool` still used? We try 
to unload code from the `models.py` since this file is huge :)




[GitHub] r39132 edited a comment on issue #3862: [AIRFLOW-1917] Remove extra newline character from log

2018-09-07 Thread GitBox
r39132 edited a comment on issue #3862: [AIRFLOW-1917] Remove extra newline 
character from log
URL: 
https://github.com/apache/incubator-airflow/pull/3862#issuecomment-419504375
 
 
   @ckljohn Any idea where this extra newline is being created? I'd rather 
remove the source of the extra newline than strip it afterwards. This feels 
like a hack.




[GitHub] r39132 commented on issue #3862: [AIRFLOW-1917] Remove extra newline character from log

2018-09-07 Thread GitBox
r39132 commented on issue #3862: [AIRFLOW-1917] Remove extra newline character 
from log
URL: 
https://github.com/apache/incubator-airflow/pull/3862#issuecomment-419504375
 
 
   @ckljohn Any idea where this extra newline is being created? This feels like 
a hack.




[GitHub] ChengzhiZhao commented on issue #3858: [AIRFLOW-2929] Add get and set for pool class in models

2018-09-07 Thread GitBox
ChengzhiZhao commented on issue #3858: [AIRFLOW-2929] Add get and set for pool 
class in models
URL: 
https://github.com/apache/incubator-airflow/pull/3858#issuecomment-419501219
 
 
   Add @Fokko and @r39132 to address the gap of [AIRFLOW-2882]




[GitHub] ChengzhiZhao commented on issue #3858: [AIRFLOW-2929] Add get and set for pool class in models

2018-09-07 Thread GitBox
ChengzhiZhao commented on issue #3858: [AIRFLOW-2929] Add get and set for pool 
class in models
URL: 
https://github.com/apache/incubator-airflow/pull/3858#issuecomment-419500933
 
 
   @feng-tao thanks for reviewing on this, I would need some suggestions here: 
   1. Since this is refactoring changes, the tests/core.py should cover the 
unit tests for them, for example,  
https://github.com/apache/incubator-airflow/blob/master/tests/core.py#L1493. 
Should I add more unit test?
   2. I can remove pool class from api client, but my question is should we 
continue using `airflow.api.common.experimental` for pool or have all methods 
in Pool class like we did for `Variable`?




[jira] [Commented] (AIRFLOW-2997) Support for clustered tables in Bigquery hooks/operators

2018-09-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607341#comment-16607341
 ] 

ASF GitHub Bot commented on AIRFLOW-2997:
-

Fokko closed pull request #3838: [AIRFLOW-2997] Support for Bigquery clustered 
tables
URL: https://github.com/apache/incubator-airflow/pull/3838
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/hooks/bigquery_hook.py 
b/airflow/contrib/hooks/bigquery_hook.py
index 44ecd49e9e..245e3a8233 100644
--- a/airflow/contrib/hooks/bigquery_hook.py
+++ b/airflow/contrib/hooks/bigquery_hook.py
@@ -496,7 +496,8 @@ def run_query(self,
   schema_update_options=(),
   priority='INTERACTIVE',
   time_partitioning=None,
-  api_resource_configs=None):
+  api_resource_configs=None,
+  cluster_fields=None):
 """
 Executes a BigQuery SQL query. Optionally persists results in a 
BigQuery
 table. See here:
@@ -565,8 +566,12 @@ def run_query(self,
 expiration as per API specifications. Note that 'field' is not 
available in
 conjunction with dataset.table$partition.
 :type time_partitioning: dict
-
+:param cluster_fields: Request that the result of this query be stored 
sorted
+by one or more columns. This is only available in combination with
+time_partitioning. The order of columns given determines the sort 
order.
+:type cluster_fields: list of str
 """
+
 if not api_resource_configs:
 api_resource_configs = self.api_resource_configs
 else:
@@ -631,6 +636,9 @@ def run_query(self,
 'tableId': destination_table,
 }
 
+if cluster_fields:
+cluster_fields = {'fields': cluster_fields}
+
 query_param_list = [
 (sql, 'query', None, str),
 (priority, 'priority', 'INTERACTIVE', str),
@@ -641,7 +649,8 @@ def run_query(self,
 (maximum_bytes_billed, 'maximumBytesBilled', None, float),
 (time_partitioning, 'timePartitioning', {}, dict),
 (schema_update_options, 'schemaUpdateOptions', None, tuple),
-(destination_dataset_table, 'destinationTable', None, dict)
+(destination_dataset_table, 'destinationTable', None, dict),
+(cluster_fields, 'clustering', None, dict),
 ]
 
 for param_tuple in query_param_list:
@@ -856,7 +865,8 @@ def run_load(self,
  allow_jagged_rows=False,
  schema_update_options=(),
  src_fmt_configs=None,
- time_partitioning=None):
+ time_partitioning=None,
+ cluster_fields=None):
 """
 Executes a BigQuery load command to load data from Google Cloud Storage
 to BigQuery. See here:
@@ -920,6 +930,10 @@ def run_load(self,
 expiration as per API specifications. Note that 'field' is not 
available in
 conjunction with dataset.table$partition.
 :type time_partitioning: dict
+:param cluster_fields: Request that the result of this load be stored 
sorted
+by one or more columns. This is only available in combination with
+time_partitioning. The order of columns given determines the sort 
order.
+:type cluster_fields: list of str
 """
 
 # bigquery only allows certain source formats
@@ -983,6 +997,9 @@ def run_load(self,
 'timePartitioning': time_partitioning
 })
 
+if cluster_fields:
+configuration['load'].update({'clustering': {'fields': 
cluster_fields}})
+
 if schema_fields:
 configuration['load']['schema'] = {'fields': schema_fields}
 
diff --git a/airflow/contrib/operators/bigquery_operator.py 
b/airflow/contrib/operators/bigquery_operator.py
index b0c0ce2d6e..025af034ad 100644
--- a/airflow/contrib/operators/bigquery_operator.py
+++ b/airflow/contrib/operators/bigquery_operator.py
@@ -100,6 +100,10 @@ class BigQueryOperator(BaseOperator):
 expiration as per API specifications. Note that 'field' is not 
available in
 conjunction with dataset.table$partition.
 :type time_partitioning: dict
+:param cluster_fields: Request that the result of this query be stored 
sorted
+by one or more columns. This is only available in conjunction with
+time_partitioning. The order of columns given determines the sort 
order.
+:type cluster_fields: list of str
 """
 
 template_fields = ('bql', 'sql', 'destination_dataset_table', 'labels')
@@ 
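Functionally, the new `cluster_fields` argument in the diff above is folded into the job configuration as a `clustering` entry. A minimal sketch of that transformation (`build_job_config` is a hypothetical helper; the field names follow the BigQuery jobs API):

```python
def build_job_config(time_partitioning=None, cluster_fields=None):
    # Mirror the hook's behavior: wrap the column list into the
    # {'fields': [...]} shape the BigQuery API expects.
    config = {}
    if time_partitioning:
        config['timePartitioning'] = time_partitioning
    if cluster_fields:
        config['clustering'] = {'fields': cluster_fields}
    return config


cfg = build_job_config(time_partitioning={'type': 'DAY'},
                       cluster_fields=['country', 'city'])
assert cfg['clustering'] == {'fields': ['country', 'city']}
```

Per the docstrings added in the diff, the order of the columns determines the sort order, and clustering is only available in combination with time partitioning.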

[GitHub] Fokko commented on issue #3838: [AIRFLOW-2997] Support for Bigquery clustered tables

2018-09-07 Thread GitBox
Fokko commented on issue #3838: [AIRFLOW-2997] Support for Bigquery clustered 
tables
URL: 
https://github.com/apache/incubator-airflow/pull/3838#issuecomment-419497532
 
 
   Thanks @chronitis for the PR. Thanks @tswast @kaxil for reviewing!




[GitHub] Fokko closed pull request #3838: [AIRFLOW-2997] Support for Bigquery clustered tables

2018-09-07 Thread GitBox
Fokko closed pull request #3838: [AIRFLOW-2997] Support for Bigquery clustered 
tables
URL: https://github.com/apache/incubator-airflow/pull/3838
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/hooks/bigquery_hook.py b/airflow/contrib/hooks/bigquery_hook.py
index 44ecd49e9e..245e3a8233 100644
--- a/airflow/contrib/hooks/bigquery_hook.py
+++ b/airflow/contrib/hooks/bigquery_hook.py
@@ -496,7 +496,8 @@ def run_query(self,
   schema_update_options=(),
   priority='INTERACTIVE',
   time_partitioning=None,
-  api_resource_configs=None):
+  api_resource_configs=None,
+  cluster_fields=None):
 """
 Executes a BigQuery SQL query. Optionally persists results in a BigQuery
 table. See here:
@@ -565,8 +566,12 @@ def run_query(self,
 expiration as per API specifications. Note that 'field' is not available in
 conjunction with dataset.table$partition.
 :type time_partitioning: dict
-
+:param cluster_fields: Request that the result of this query be stored sorted
+by one or more columns. This is only available in combination with
+time_partitioning. The order of columns given determines the sort order.
+:type cluster_fields: list of str
 """
+
 if not api_resource_configs:
 api_resource_configs = self.api_resource_configs
 else:
@@ -631,6 +636,9 @@ def run_query(self,
 'tableId': destination_table,
 }
 
+if cluster_fields:
+cluster_fields = {'fields': cluster_fields}
+
 query_param_list = [
 (sql, 'query', None, str),
 (priority, 'priority', 'INTERACTIVE', str),
@@ -641,7 +649,8 @@ def run_query(self,
 (maximum_bytes_billed, 'maximumBytesBilled', None, float),
 (time_partitioning, 'timePartitioning', {}, dict),
 (schema_update_options, 'schemaUpdateOptions', None, tuple),
-(destination_dataset_table, 'destinationTable', None, dict)
+(destination_dataset_table, 'destinationTable', None, dict),
+(cluster_fields, 'clustering', None, dict),
 ]
 
 for param_tuple in query_param_list:
@@ -856,7 +865,8 @@ def run_load(self,
  allow_jagged_rows=False,
  schema_update_options=(),
  src_fmt_configs=None,
- time_partitioning=None):
+ time_partitioning=None,
+ cluster_fields=None):
 """
 Executes a BigQuery load command to load data from Google Cloud Storage
 to BigQuery. See here:
@@ -920,6 +930,10 @@ def run_load(self,
 expiration as per API specifications. Note that 'field' is not available in
 conjunction with dataset.table$partition.
 :type time_partitioning: dict
+:param cluster_fields: Request that the result of this load be stored sorted
+by one or more columns. This is only available in combination with
+time_partitioning. The order of columns given determines the sort order.
+:type cluster_fields: list of str
 """
 
 # bigquery only allows certain source formats
@@ -983,6 +997,9 @@ def run_load(self,
 'timePartitioning': time_partitioning
 })
 
+if cluster_fields:
+configuration['load'].update({'clustering': {'fields': cluster_fields}})
+
 if schema_fields:
 configuration['load']['schema'] = {'fields': schema_fields}
 
diff --git a/airflow/contrib/operators/bigquery_operator.py b/airflow/contrib/operators/bigquery_operator.py
index b0c0ce2d6e..025af034ad 100644
--- a/airflow/contrib/operators/bigquery_operator.py
+++ b/airflow/contrib/operators/bigquery_operator.py
@@ -100,6 +100,10 @@ class BigQueryOperator(BaseOperator):
 expiration as per API specifications. Note that 'field' is not available in
 conjunction with dataset.table$partition.
 :type time_partitioning: dict
+:param cluster_fields: Request that the result of this query be stored sorted
+by one or more columns. This is only available in conjunction with
+time_partitioning. The order of columns given determines the sort order.
+:type cluster_fields: list of str
 """
 
 template_fields = ('bql', 'sql', 'destination_dataset_table', 'labels')
@@ -127,6 +131,7 @@ def __init__(self,
  priority='INTERACTIVE',
  time_partitioning=None,
  api_resource_configs=None,
+ cluster_fields=None,
  *args,
  

[jira] [Closed] (AIRFLOW-2997) Support for clustered tables in Bigquery hooks/operators

2018-09-07 Thread Fokko Driesprong (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong closed AIRFLOW-2997.
-
Resolution: Fixed

> Support for clustered tables in Bigquery hooks/operators
> 
>
> Key: AIRFLOW-2997
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2997
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: gcp
>Reporter: Gordon Ball
>Priority: Minor
> Fix For: 1.10.1
>
>
> Bigquery support for clustered tables was added (at GCP "Beta" level) on 
> 2018-07-30. This feature allows load or table-creating query operations to 
> request that data be stored sorted by a subset of columns, allowing more 
> efficient (and potentially cheaper) subsequent queries.
>  Support for specifying fields to cluster on should be added to at least the 
> bigquery hook, load-from-GCS operator and query operator.
>  Documentation: https://cloud.google.com/bigquery/docs/clustered-tables
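
The load-side change in the merged diff amounts to folding the new parameter into the BigQuery job configuration. The following sketch mirrors that mapping with plain dict manipulation (the helper name, table, and field values are illustrative, not the actual hook code; the key names follow the BigQuery jobs API):

```python
def build_load_config(destination_table, time_partitioning=None, cluster_fields=None):
    """Illustrative helper: fold optional partitioning and clustering
    settings into a BigQuery load job configuration dict."""
    configuration = {
        'load': {
            'destinationTable': destination_table,
            'sourceFormat': 'NEWLINE_DELIMITED_JSON',
        }
    }
    if time_partitioning:
        configuration['load'].update({'timePartitioning': time_partitioning})
    if cluster_fields:
        # Mirrors the PR: clustering is only honoured together with
        # time partitioning, per the BigQuery docs linked above.
        configuration['load'].update({'clustering': {'fields': cluster_fields}})
    return configuration

conf = build_load_config(
    {'projectId': 'proj', 'datasetId': 'ds', 'tableId': 'tbl'},
    time_partitioning={'type': 'DAY'},
    cluster_fields=['country', 'city'],
)
print(conf['load']['clustering'])  # {'fields': ['country', 'city']}
```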



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3026) running initdb with default parameters will cause unhandled exception in models.py

2018-09-07 Thread Shawn Dion (JIRA)
Shawn Dion created AIRFLOW-3026:
---

 Summary: running initdb with default parameters will cause 
unhandled exception in models.py
 Key: AIRFLOW-3026
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3026
 Project: Apache Airflow
  Issue Type: Bug
  Components: models
Affects Versions: 1.10.0, 1.9.0
Reporter: Shawn Dion


the default fernet key is:
 fernet_key = cryptography_not_found_storing_passwords_in_plain_text
  
 when running airflow initdb:
{code:java}
File "/home/usr/.local/lib/python3.6/site-packages/airflow/models.py", line 160, in get_fernet
_fernet = Fernet(configuration.conf.get('core', 'FERNET_KEY').encode('utf-8'))
File "/usr/local/lib64/python3.6/site-packages/cryptography/fernet.py", line 34, in __init__
key = base64.urlsafe_b64decode(key)
File "/usr/lib64/python3.6/base64.py", line 133, in urlsafe_b64decode
return b64decode(s)
File "/usr/lib64/python3.6/base64.py", line 87, in b64decode
return binascii.a2b_base64(s)
binascii.Error: Incorrect padding


{code}
 

TypeError and ValueError were handled in v1.9, but this error (binascii.Error) is 
not handled correctly / at all.
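
The failure reproduces without Airflow at all: the placeholder default key is not valid base64 (54 characters, not a multiple of 4), so the decode step inside `Fernet()` raises `binascii.Error('Incorrect padding')`, matching the traceback. A minimal reproduction:

```python
import base64
import binascii

# Airflow's placeholder default fernet key (used when cryptography is absent):
default_key = 'cryptography_not_found_storing_passwords_in_plain_text'

# Fernet.__init__ runs base64.urlsafe_b64decode(key); the placeholder is not
# padded to a multiple of 4, so binascii.Error escapes unless caught.
try:
    base64.urlsafe_b64decode(default_key)
except binascii.Error as exc:
    print('would need handling in get_fernet():', exc)
```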



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] XD-DENG opened a new pull request #3863: [AIRFLOW-XXX] Refine web UI authentication-related docs

2018-09-07 Thread GitBox
XD-DENG opened a new pull request #3863: [AIRFLOW-XXX] Refine web UI 
authentication-related docs
URL: https://github.com/apache/incubator-airflow/pull/3863
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Now in Airflow 1.10 we're already providing the older version of the web UI and the 
new FAB-based UI at the same time. But the documentation doesn't differentiate them 
very well. For example,
   - this doc https://airflow.apache.org/security.html#password is applicable to the 
old web UI only, but that's not highlighted.
   - the command line tool `create_user` is for the new FAB-based UI only, and it's 
not highlighted either.
   
   This may be confusing to users, especially given not everyone is aware of 
the existence of two UIs.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   Doc change only
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




[jira] [Commented] (AIRFLOW-1917) print() from python operators end up with extra new line

2018-09-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607102#comment-16607102
 ] 

ASF GitHub Bot commented on AIRFLOW-1917:
-

ckljohn opened a new pull request #3862: [AIRFLOW-1917] Remove extra newline 
character from log
URL: https://github.com/apache/incubator-airflow/pull/3862
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-1917
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   Remove extra newline character from log.
   
   ### Tests
   
   - [X] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   Trivial change.
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




> print() from python operators end up with extra new line
> 
>
> Key: AIRFLOW-1917
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1917
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Ash Berlin-Taylor
>Priority: Major
> Fix For: 1.10.1
>
>
> If I have the following as the callable for a PythonOperator:
> {code}
> def print_stuff(ti, **kwargs):
> print("Hi from", __file__)
> print("Hi 2 from", __file__)
> {code}
> I see the following in the log files
> {noformat}
> [2017-12-13 10:45:53,901] {logging_mixin.py:84} INFO - Hi from 
> /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,902] {logging_mixin.py:84} INFO - Hi 2 from 
> /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,905] {base_task_runner.py:98} INFO - Subtask: 
> [2017-12-13 10:45:53,904] {python_operator.py:90} INFO - Done. Returned value 
> was: None
> {noformat}
> Note the extra blank lines.
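
The blank lines come from how `print()` talks to a redirected stream: it issues a separate `write('\n')` after the message, and a naive stream-to-logger bridge logs that bare newline as its own record. A minimal sketch of the buffering idea behind the fix (hypothetical `LogWriter` class, not Airflow's actual implementation):

```python
import io
import logging
import sys

class LogWriter(io.TextIOBase):
    """Hypothetical file-like bridge from print() to a logger: a sketch of
    the idea behind the fix, not Airflow's actual code."""

    def __init__(self, logger):
        self.logger = logger
        self._buffer = ''

    def write(self, message):
        # Buffer until a newline arrives, then log complete lines only;
        # the trailing '\n' that print() writes never becomes its own record.
        self._buffer += message
        while '\n' in self._buffer:
            line, _, self._buffer = self._buffer.partition('\n')
            if line:
                self.logger.info(line)
        return len(message)

logger = logging.getLogger('demo')
logger.setLevel(logging.INFO)
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter('INFO - %(message)s'))
logger.addHandler(handler)

print('Hi from', 'my_dag.py', file=LogWriter(logger))  # single clean record
```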



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] ckljohn opened a new pull request #3862: [AIRFLOW-1917] Remove extra newline character from log

2018-09-07 Thread GitBox
ckljohn opened a new pull request #3862: [AIRFLOW-1917] Remove extra newline 
character from log
URL: https://github.com/apache/incubator-airflow/pull/3862
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-1917
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   Remove extra newline character from log.
   
   ### Tests
   
   - [X] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   Trivial change.
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




[GitHub] XD-DENG closed pull request #3861: [AIRFLOW-XXX] Fix doc of command line interface

2018-09-07 Thread GitBox
XD-DENG closed pull request #3861: [AIRFLOW-XXX] Fix doc of command line 
interface
URL: https://github.com/apache/incubator-airflow/pull/3861
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/bin/cli.py b/airflow/bin/cli.py
index 4ff1ae3679..4fa0f3a5ea 100644
--- a/airflow/bin/cli.py
+++ b/airflow/bin/cli.py
@@ -1457,7 +1457,7 @@ class CLIFactory(object):
 'subdir': Arg(
 ("-sd", "--subdir"),
 "File location or directory from which to look for the dag",
-default=settings.DAGS_FOLDER),
+default="[AIRFLOW_HOME]/dags"),
 'start_date': Arg(
 ("-s", "--start_date"), "Override start_date YYYY-MM-DD",
 type=parsedate),
@@ -1874,7 +1874,7 @@ class CLIFactory(object):
 "If reset_dag_run option is used,"
 " backfill will first prompt users whether airflow "
 "should clear all the previous dag_run and task_instances "
-"within the backfill date range."
+"within the backfill date range. "
 "If rerun_failed_tasks is used, backfill "
 "will auto re-run the previous failed task instances"
 " within the backfill date range.",


 




[GitHub] XD-DENG commented on issue #3861: [AIRFLOW-XXX] Fix doc of command line interface

2018-09-07 Thread GitBox
XD-DENG commented on issue #3861: [AIRFLOW-XXX] Fix doc of command line 
interface
URL: 
https://github.com/apache/incubator-airflow/pull/3861#issuecomment-419405297
 
 
   Thanks @kaxil for the further checking.
   
   I'll close this PR then. Regarding **issue 2** (the missing space), could you 
fix that in your new PR as well? Thanks!




[GitHub] kaxil commented on issue #3861: [AIRFLOW-XXX] Fix doc of command line interface

2018-09-07 Thread GitBox
kaxil commented on issue #3861: [AIRFLOW-XXX] Fix doc of command line interface
URL: 
https://github.com/apache/incubator-airflow/pull/3861#issuecomment-419398764
 
 
   Good catch @XD-DENG. Wonder how long this issue went unnoticed. We have 
this issue in the **1.9.0** and **1.8.2** docs as well:
   
   1.9.0: 
https://airflow.readthedocs.io/en/1.9.0/cli.html#Named%20Arguments_repeat4
   1.8.2: 
https://airflow-fork-k1.readthedocs.io/en/v1-8-stable/cli.html#Named%20Arguments_repeat1
   
   
![image](https://user-images.githubusercontent.com/8811558/45214222-c6327e00-b291-11e8-8a8f-15bd65137d6e.png)
   
   I fear we have had this bug since the start.
   
   @ashb you are right, changing this will cause issues. I will create a new PR 
to take care of this.




[GitHub] XD-DENG commented on a change in pull request #3861: [AIRFLOW-XXX] Fix doc of command line interface

2018-09-07 Thread GitBox
XD-DENG commented on a change in pull request #3861: [AIRFLOW-XXX] Fix doc of 
command line interface
URL: https://github.com/apache/incubator-airflow/pull/3861#discussion_r215888710
 
 

 ##
 File path: airflow/bin/cli.py
 ##
 @@ -1457,7 +1457,7 @@ class CLIFactory(object):
 'subdir': Arg(
 ("-sd", "--subdir"),
 "File location or directory from which to look for the dag",
-default=settings.DAGS_FOLDER),
+default="[AIRFLOW_HOME]/dags"),
 
 Review comment:
   You're right. Didn't realize this.
   
   I would leave this to @kaxil to decide how he would like to fix this (please 
feel free to close this PR should you're going to fix this separately).
   
   But we do need to fix this. `Default: /Users/kaxil/airflow/dags` in the 
documentation is very confusing and misleading to doc readers.




[GitHub] chronitis commented on issue #3838: [AIRFLOW-2997] Support for Bigquery clustered tables

2018-09-07 Thread GitBox
chronitis commented on issue #3838: [AIRFLOW-2997] Support for Bigquery 
clustered tables
URL: 
https://github.com/apache/incubator-airflow/pull/3838#issuecomment-419369853
 
 
   @kaxil squashed into a single commit
   
   @tswast removed client side validation logic
   
   There appears to be a transient failure on `py27-backend_mysql`; all the 
other database/python variants succeeded.




[GitHub] ashb commented on a change in pull request #3861: [AIRFLOW-XXX] Fix doc of command line interface

2018-09-07 Thread GitBox
ashb commented on a change in pull request #3861: [AIRFLOW-XXX] Fix doc of 
command line interface
URL: https://github.com/apache/incubator-airflow/pull/3861#discussion_r215887440
 
 

 ##
 File path: airflow/bin/cli.py
 ##
 @@ -1457,7 +1457,7 @@ class CLIFactory(object):
 'subdir': Arg(
 ("-sd", "--subdir"),
 "File location or directory from which to look for the dag",
-default=settings.DAGS_FOLDER),
+default="[AIRFLOW_HOME]/dags"),
 
 Review comment:
   Unfortunately changing this default actually changes the value that will be 
in `args.subdir` - so this change would break a lot of things.
   
   It might be possible to change the `$AIRFLOW_HOME` when rendering the docs, 
or to change the display of this param?
   
   @kaxil How are the docs built when render then? Would it maybe be worth 
setting something like `export AIRFLOW_HOME=\$AIRFLOW_HOME` so that these 
defaults make sense? (That might well break other things)
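
One way to keep `args.subdir` working while keeping machine-specific paths out of the generated help text is to advertise a symbolic default and expand it only at parse time. This is a sketch of one possible approach, not necessarily what the follow-up PR settled on:

```python
import argparse
import os

# Resolve AIRFLOW_HOME the way Airflow does by default (~/airflow fallback).
AIRFLOW_HOME = os.environ.get('AIRFLOW_HOME', os.path.expanduser('~/airflow'))

parser = argparse.ArgumentParser()
# The symbolic placeholder appears in --help and in Sphinx-generated docs,
# so the doc builder's home directory never leaks into the rendered pages.
parser.add_argument(
    '-sd', '--subdir',
    default='[AIRFLOW_HOME]/dags',
    help="File location or directory from which to look for the dag "
         "(default: [AIRFLOW_HOME]/dags)")

args = parser.parse_args([])
# Expand the placeholder only when the value is actually used.
subdir = args.subdir.replace('[AIRFLOW_HOME]', AIRFLOW_HOME)
```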






[GitHub] XD-DENG opened a new pull request #3861: [AIRFLOW-XXX] Fix doc of command line interface

2018-09-07 Thread GitBox
XD-DENG opened a new pull request #3861: [AIRFLOW-XXX] Fix doc of command line 
interface
URL: https://github.com/apache/incubator-airflow/pull/3861
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [X] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [X] Here are some details about my PR, including screenshots of any UI 
changes:
   
   There are two issues found in documentation of command line interface
   
   **Issue 1**
   https://user-images.githubusercontent.com/11539188/45207953-01e23d80-b2bc-11e8-92a7-d08e7d5c54bf.png
   The default value of `--subdir` is generated dynamically, so Kaxil's Dag 
Folder during compiling is used in the final documentation (this appeared 14 
times in https://airflow.apache.org/cli.html). I believe this should be 
"hardcoded" into `[AIRFLOW_HOME]/dags`.
   
   **Issue 2**
   A minor typo (a space is missing).
   https://user-images.githubusercontent.com/11539188/45208164-89c84780-b2bc-11e8-9ae9-4f72f48e8b37.png
   
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   Only doc is changed
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




[jira] [Commented] (AIRFLOW-3025) Allow to specify dns and dns-search parameters for DockerOperator

2018-09-07 Thread JIRA


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606815#comment-16606815
 ] 

Konrad Gołuchowski commented on AIRFLOW-3025:
-

Added PR here: https://github.com/apache/incubator-airflow/pull/3860

> Allow to specify dns and dns-search parameters for DockerOperator
> -
>
> Key: AIRFLOW-3025
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3025
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Konrad Gołuchowski
>Assignee: Konrad Gołuchowski
>Priority: Minor
>
> Docker allows to specify dns and dns-search options when starting a 
> container. It would be useful to enable DockerOperator to use these two 
> options.
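
In docker-py, which DockerOperator drives, these options map to the `dns` and `dns_search` parameters of `create_host_config()`. A sketch of the plumbing such a change needs, using a stand-in helper rather than the real operator code (`dns`/`dns_search` are real docker-py parameter names; the helper itself is hypothetical):

```python
def host_config_kwargs(dns=None, dns_search=None):
    """Hypothetical helper: collect the keyword arguments the operator
    would forward to docker-py's create_host_config()."""
    kwargs = {}
    if dns is not None:
        kwargs['dns'] = dns                # like `docker run --dns 8.8.8.8`
    if dns_search is not None:
        kwargs['dns_search'] = dns_search  # like `docker run --dns-search example.com`
    return kwargs

print(host_config_kwargs(dns=['8.8.8.8'], dns_search=['example.com']))
# {'dns': ['8.8.8.8'], 'dns_search': ['example.com']}
```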



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3025) Allow to specify dns and dns-search parameters for DockerOperator

2018-09-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606814#comment-16606814
 ] 

ASF GitHub Bot commented on AIRFLOW-3025:
-

kodieg opened a new pull request #3860: [AIRFLOW-3025] Enable specifying dns 
and dns_search options for DockerOperator
URL: https://github.com/apache/incubator-airflow/pull/3860
 
 
   
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3025
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




> Allow to specify dns and dns-search parameters for DockerOperator
> -
>
> Key: AIRFLOW-3025
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3025
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Konrad Gołuchowski
>Assignee: Konrad Gołuchowski
>Priority: Minor
>
> Docker allows to specify dns and dns-search options when starting a 
> container. It would be useful to enable DockerOperator to use these two 
> options.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] kodieg opened a new pull request #3860: [AIRFLOW-3025] Enable specifying dns and dns_search options for DockerOperator

2018-09-07 Thread GitBox
kodieg opened a new pull request #3860: [AIRFLOW-3025] Enable specifying dns 
and dns_search options for DockerOperator
URL: https://github.com/apache/incubator-airflow/pull/3860
 
 
   
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3025
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   




[jira] [Created] (AIRFLOW-3025) Allow to specify dns and dns-search parameters for DockerOperator

2018-09-07 Thread JIRA
Konrad Gołuchowski created AIRFLOW-3025:
---

 Summary: Allow to specify dns and dns-search parameters for 
DockerOperator
 Key: AIRFLOW-3025
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3025
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Konrad Gołuchowski
Assignee: Konrad Gołuchowski


Docker allows specifying dns and dns-search options when starting a container. 
It would be useful to enable DockerOperator to use these two options.
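For context, a minimal sketch of how such optional parameters might be collected 
before being handed to docker-py (whose `APIClient.create_host_config` does accept 
`dns` and `dns_search` keyword arguments). This is not the actual PR code; the 
helper name and the pattern of only forwarding options that were explicitly set 
are illustrative assumptions:

```python
# Hypothetical helper: gather only the networking options the caller actually
# provided, so unset options fall back to Docker's defaults rather than being
# passed as explicit None values.
def build_host_config_kwargs(dns=None, dns_search=None, **extra):
    """Collect host-config keyword arguments for docker-py.

    dns:        list of DNS server addresses, e.g. ["8.8.8.8", "8.8.4.4"]
    dns_search: list of DNS search domains, e.g. ["example.com"]
    extra:      any other host-config options, forwarded unchanged
    """
    kwargs = dict(extra)
    if dns is not None:
        kwargs["dns"] = list(dns)
    if dns_search is not None:
        kwargs["dns_search"] = list(dns_search)
    return kwargs


print(build_host_config_kwargs(dns=["8.8.8.8"], dns_search=["example.com"]))
# → {'dns': ['8.8.8.8'], 'dns_search': ['example.com']}
```

On the Docker CLI these correspond to the `docker run --dns` and `--dns-search` 
flags, which override the daemon's default DNS configuration per container.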


