[GitHub] r39132 commented on issue #3862: [AIRFLOW-1917] Remove extra newline character from log
r39132 commented on issue #3862: [AIRFLOW-1917] Remove extra newline character from log URL: https://github.com/apache/incubator-airflow/pull/3862#issuecomment-419615260

@ashb Not sure I grok your comment fully. Are you giving this a +1?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] yrqls21 commented on issue #3830: [AIRFLOW-2156] Parallelize Celery Executor
yrqls21 commented on issue #3830: [AIRFLOW-2156] Parallelize Celery Executor URL: https://github.com/apache/incubator-airflow/pull/3830#issuecomment-419605796

@ashb Thank you for reviewing. I forgot to mention the releasing part: I agree with you on not releasing it in 1.10.1. Beyond your point, this PR is more meaningful together with some follow-up PRs I'm working on, which provide a better story on scheduler scaling and can then go into a bigger release.
[GitHub] yrqls21 commented on a change in pull request #3830: [AIRFLOW-2156] Parallelize Celery Executor
yrqls21 commented on a change in pull request #3830: [AIRFLOW-2156] Parallelize Celery Executor URL: https://github.com/apache/incubator-airflow/pull/3830#discussion_r216114428

## File path: airflow/executors/celery_executor.py

```
@@ -85,30 +139,67 @@ def execute_async(self, key, command,
             args=[command], queue=queue)
         self.last_state[key] = celery_states.PENDING

+    def _num_tasks_per_process(self):
+        """
+        How many Celery tasks should be sent to each worker process.
+
+        :return: Number of tasks that should be used per process
+        :rtype: int
+        """
+        return max(1,
+                   int(math.ceil(1.0 * len(self.tasks) / self._sync_parallelism)))
+
     def sync(self):
-        self.log.debug("Inquiring about %s celery task(s)", len(self.tasks))
-        for key, task in list(self.tasks.items()):
-            try:
-                state = task.state
-                if self.last_state[key] != state:
-                    if state == celery_states.SUCCESS:
-                        self.success(key)
-                        del self.tasks[key]
-                        del self.last_state[key]
-                    elif state == celery_states.FAILURE:
-                        self.fail(key)
-                        del self.tasks[key]
-                        del self.last_state[key]
-                    elif state == celery_states.REVOKED:
-                        self.fail(key)
-                        del self.tasks[key]
-                        del self.last_state[key]
-                    else:
-                        self.log.info("Unexpected state: %s", state)
-                    self.last_state[key] = state
-            except Exception as e:
-                self.log.error("Error syncing the celery executor, ignoring it:")
-                self.log.exception(e)
+        num_processes = min(len(self.tasks), self._sync_parallelism)
+        if num_processes == 0:
+            self.log.debug("No task to query celery, skipping sync")
+            return
+
+        self.log.debug("Inquiring about %s celery task(s) using %s processes",
+                       len(self.tasks), num_processes)
+
+        # Recreate the process pool each sync in case processes in the pool die
+        self._sync_pool = Pool(processes=num_processes)
```

Review comment: I believe multiprocessing will use os.fork() on Unix systems, so we can take advantage of COW to reduce RAM usage. However, AFAIK on Windows the child processes will re-import all module-level imports and thus may require some extra RAM.

But I don't think the extra imports will create a big burden on RAM usage (or maybe our scheduler box is just too big :P). I'll add an entry in UPDATING.md regarding this config line. About the SQLA connection, you actually have a point: on Windows we might end up configuring 16 extra connection pools while re-importing. And since subprocesses spun up by the multiprocessing module do not run atexit() handlers, we might in theory leave some hanging connections there. However, from my observation and testing, SQLA initializes connections lazily, so at most we have empty pools in the subprocesses. I might be wrong about the Windows behavior and the SQLA lazy initialization; open to discussing better handling if that is the case. FYI this is the test script/result I was playing with:

```
▶ cat test.py
import os
from multiprocessing import Pool

print('execute module code')

def test_func(num):
    print(num)

if __name__ == '__main__':
    pool = Pool(4)
    results = pool.map(test_func, [1, 2, 3, 4], 1)
    pool.close()
    pool.join()

▶ python test.py
execute module code
1
2
3
4
```

-- mimic Windows behavior

```
▶ cat test.py
import os
from multiprocessing import Pool

print('execute module code')

def test_func(num):
    from airflow import settings
    print(num)
    print(settings.engine.pool.status())

if __name__ == '__main__':
    pool = Pool(4)
    results = pool.map(test_func, [1, 2, 3, 4], 1)
    pool.close()
    pool.join()

▶ python test.py
execute module code
execute module code
execute module code
execute module code
execute module code
airflow.settings [2018-09-07 17:37:24,218] {{settings.py:148}} DEBUG - Setting up DB connection pool (PID 80204)
airflow.settings [2018-09-07 17:37:24,218] {{settings.py:148}} DEBUG - Setting up DB connection pool (PID 80202)
airflow.settings [2018-09-07 17:37:24,218] {{settings.py:148}} DEBUG - Setting up DB connection pool (PID 80201)
airflow.settings [2018-09-07 17:37:24,218] {{settings.py:148}} DEBUG - Setting up DB connection pool (PID 80203)
airflow.settings [2018-09-07 17:37:24,219] {{settings.py:176}} INFO - setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=3600
```
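The `_num_tasks_per_process` computation quoted in the diff above can be exercised on its own; a minimal sketch, detached from the executor class for illustration (the function name mirrors the PR, the standalone signature does not):

```python
import math

# Hedged sketch of the chunk-size computation from the PR: each sync
# splits the pending tasks evenly across the sync processes, never
# sending fewer than one task per chunk.
def num_tasks_per_process(num_tasks, sync_parallelism):
    return max(1, int(math.ceil(1.0 * num_tasks / sync_parallelism)))
```

With 10 tasks and 4 sync processes this yields chunks of 3, matching `Pool.map(..., chunksize=...)` semantics, where each worker receives whole chunks rather than single items.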
[GitHub] yangaws commented on issue #3767: [AIRFLOW-2524] Add SageMaker Batch Inference
yangaws commented on issue #3767: [AIRFLOW-2524] Add SageMaker Batch Inference URL: https://github.com/apache/incubator-airflow/pull/3767#issuecomment-419594890

Hi @Fokko and @schipiga, sorry for the late update. @troychen728 has been unavailable for development recently; I will aim to update this PR sometime next week.
[GitHub] yrqls21 commented on a change in pull request #3830: [AIRFLOW-2156] Parallelize Celery Executor
yrqls21 commented on a change in pull request #3830: [AIRFLOW-2156] Parallelize Celery Executor URL: https://github.com/apache/incubator-airflow/pull/3830#discussion_r216110731

## File path: airflow/executors/celery_executor.py

```
@@ -85,30 +139,67 @@ def execute_async(self, key, command,
             args=[command], queue=queue)
         self.last_state[key] = celery_states.PENDING

+    def _num_tasks_per_process(self):
+        """
+        How many Celery tasks should be sent to each worker process.
+
+        :return: Number of tasks that should be used per process
+        :rtype: int
+        """
+        return max(1,
+                   int(math.ceil(1.0 * len(self.tasks) / self._sync_parallelism)))
+
     def sync(self):
-        self.log.debug("Inquiring about %s celery task(s)", len(self.tasks))
-        for key, task in list(self.tasks.items()):
-            try:
-                state = task.state
-                if self.last_state[key] != state:
-                    if state == celery_states.SUCCESS:
-                        self.success(key)
-                        del self.tasks[key]
-                        del self.last_state[key]
-                    elif state == celery_states.FAILURE:
-                        self.fail(key)
-                        del self.tasks[key]
-                        del self.last_state[key]
-                    elif state == celery_states.REVOKED:
-                        self.fail(key)
-                        del self.tasks[key]
-                        del self.last_state[key]
-                    else:
-                        self.log.info("Unexpected state: %s", state)
-                    self.last_state[key] = state
-            except Exception as e:
-                self.log.error("Error syncing the celery executor, ignoring it:")
-                self.log.exception(e)
+        num_processes = min(len(self.tasks), self._sync_parallelism)
+        if num_processes == 0:
+            self.log.debug("No task to query celery, skipping sync")
+            return
+
+        self.log.debug("Inquiring about %s celery task(s) using %s processes",
+                       len(self.tasks), num_processes)
+
+        # Recreate the process pool each sync in case processes in the pool die
+        self._sync_pool = Pool(processes=num_processes)
+
+        # Use chunking instead of a work queue to reduce context switching since tasks are
+        # roughly uniform in size
+        chunksize = self._num_tasks_per_process()
+
+        self.log.debug("Waiting for inquiries to complete...")
+        task_keys_to_states = self._sync_pool.map(
+            fetch_celery_task_state,
+            self.tasks.items(),
+            chunksize=chunksize)
+        self._sync_pool.close()
+        self._sync_pool.join()
+        self.log.debug("Inquiries completed.")
+
+        for key_and_state in task_keys_to_states:
+            if isinstance(key_and_state, ExceptionWithTraceback):
+                self.log.error(
+                    CELERY_FETCH_ERR_MSG_HEADER + ", ignoring it:{}\n{}\n".format(
+                        key_and_state.exception, key_and_state.traceback))
+            else:
+                key, state = key_and_state
+                try:
+                    if self.last_state[key] != state:
+                        if state == celery_states.SUCCESS:
+                            self.success(key)
+                            del self.tasks[key]
+                            del self.last_state[key]
+                        elif state == celery_states.FAILURE:
+                            self.fail(key)
+                            del self.tasks[key]
+                            del self.last_state[key]
+                        elif state == celery_states.REVOKED:
+                            self.fail(key)
+                            del self.tasks[key]
+                            del self.last_state[key]
+                        else:
+                            self.log.info("Unexpected state: " + state)
+                        self.last_state[key] = state
+                except Exception as e:
+                    self.log.error("Error syncing the Celery executor, ignoring "
+                                   "it:\n{}\n{}".format(e, traceback.format_exc()))
```

Review comment: I did not know log.exception would automatically include the traceback. Thank you for the tip! Will update to `self.log.exception("Error syncing the Celery executor, ignoring it.")`
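The tip discussed above can be seen in a small standalone sketch: `Logger.exception()` logs at ERROR level and appends the active traceback automatically, so there is no need to format `traceback.format_exc()` into the message by hand. The logger name and message here are illustrative:

```python
import io
import logging

# Capture log output in a string so the behavior is visible without a console.
stream = io.StringIO()
log = logging.getLogger("celery_executor_demo")
log.addHandler(logging.StreamHandler(stream))
log.setLevel(logging.DEBUG)

try:
    raise ValueError("boom")
except Exception:
    # Equivalent to log.error(..., exc_info=True): the traceback of the
    # exception being handled is appended to the record automatically.
    log.exception("Error syncing the Celery executor, ignoring it.")

output = stream.getvalue()
```

The captured output contains both the message and the full `Traceback (most recent call last):` block, which is exactly what the manual `format_exc()` concatenation was reproducing.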
[GitHub] yrqls21 commented on a change in pull request #3830: [AIRFLOW-2156] Parallelize Celery Executor
yrqls21 commented on a change in pull request #3830: [AIRFLOW-2156] Parallelize Celery Executor URL: https://github.com/apache/incubator-airflow/pull/3830#discussion_r216110256

## File path: airflow/executors/celery_executor.py

```
@@ -85,30 +139,67 @@ def execute_async(self, key, command,
             args=[command], queue=queue)
         self.last_state[key] = celery_states.PENDING

+    def _num_tasks_per_process(self):
+        """
+        How many Celery tasks should be sent to each worker process.
+
+        :return: Number of tasks that should be used per process
+        :rtype: int
+        """
+        return max(1,
+                   int(math.ceil(1.0 * len(self.tasks) / self._sync_parallelism)))
+
     def sync(self):
-        self.log.debug("Inquiring about %s celery task(s)", len(self.tasks))
-        for key, task in list(self.tasks.items()):
-            try:
-                state = task.state
-                if self.last_state[key] != state:
-                    if state == celery_states.SUCCESS:
-                        self.success(key)
-                        del self.tasks[key]
-                        del self.last_state[key]
-                    elif state == celery_states.FAILURE:
-                        self.fail(key)
-                        del self.tasks[key]
-                        del self.last_state[key]
-                    elif state == celery_states.REVOKED:
-                        self.fail(key)
-                        del self.tasks[key]
-                        del self.last_state[key]
-                    else:
-                        self.log.info("Unexpected state: %s", state)
-                    self.last_state[key] = state
-            except Exception as e:
-                self.log.error("Error syncing the celery executor, ignoring it:")
-                self.log.exception(e)
+        num_processes = min(len(self.tasks), self._sync_parallelism)
+        if num_processes == 0:
+            self.log.debug("No task to query celery, skipping sync")
+            return
+
+        self.log.debug("Inquiring about %s celery task(s) using %s processes",
+                       len(self.tasks), num_processes)
+
+        # Recreate the process pool each sync in case processes in the pool die
+        self._sync_pool = Pool(processes=num_processes)
+
+        # Use chunking instead of a work queue to reduce context switching since tasks are
+        # roughly uniform in size
+        chunksize = self._num_tasks_per_process()
+
+        self.log.debug("Waiting for inquiries to complete...")
+        task_keys_to_states = self._sync_pool.map(
+            fetch_celery_task_state,
+            self.tasks.items(),
+            chunksize=chunksize)
+        self._sync_pool.close()
+        self._sync_pool.join()
+        self.log.debug("Inquiries completed.")
+
+        for key_and_state in task_keys_to_states:
+            if isinstance(key_and_state, ExceptionWithTraceback):
+                self.log.error(
+                    CELERY_FETCH_ERR_MSG_HEADER + ", ignoring it:{}\n{}\n".format(
+                        key_and_state.exception, key_and_state.traceback))
+            else:
```

Review comment: I don't have a preference on the style here; will update with `continue`.
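The `continue` style agreed on above can be sketched standalone: handle the wrapped-exception case first and skip to the next result, so the normal path is not nested under an `else`. `ExceptionWithTraceback` and the result shapes mirror the PR, but the executor itself is stubbed out for illustration:

```python
# Minimal stand-in for the PR's wrapper class.
class ExceptionWithTraceback:
    def __init__(self, exception, exception_traceback):
        self.exception = exception
        self.traceback = exception_traceback

def process_results(task_keys_to_states, log_error):
    """Collect successfully fetched (key, state) pairs, logging failures."""
    fetched = []
    for key_and_state in task_keys_to_states:
        if isinstance(key_and_state, ExceptionWithTraceback):
            log_error(key_and_state.exception, key_and_state.traceback)
            continue  # flat control flow instead of an else branch
        fetched.append(key_and_state)
    return fetched

errors = []
fetched = process_results(
    [("task_a", "SUCCESS"),
     ExceptionWithTraceback(RuntimeError("broker down"), "fake traceback"),
     ("task_b", "FAILURE")],
    lambda exc, tb: errors.append((exc, tb)))
```

The early `continue` keeps the long state-handling body one indentation level shallower, which is the readability point raised in the review.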
[GitHub] yrqls21 commented on a change in pull request #3830: [AIRFLOW-2156] Parallelize Celery Executor
yrqls21 commented on a change in pull request #3830: [AIRFLOW-2156] Parallelize Celery Executor URL: https://github.com/apache/incubator-airflow/pull/3830#discussion_r216110104

## File path: airflow/executors/celery_executor.py

```
@@ -63,6 +69,40 @@ def execute_command(command):
         raise AirflowException('Celery command failed')


+class ExceptionWithTraceback(object):
+    """
+    Wrapper class used to propagate exceptions to parent processes from subprocesses.
+
+    :param exception: The exception to wrap
+    :type exception: Exception
+    :param traceback: The stacktrace to wrap
+    :type traceback: str
+    """
+
+    def __init__(self, exception, exception_traceback):
+        self.exception = exception
+        self.traceback = exception_traceback
+
+
+def fetch_celery_task_state(celery_task):
+    """
+    Fetch and return the state of the given celery task. The scope of this function is
+    global so that it can be called by subprocesses in the pool.
+
+    :param celery_task: a tuple of the Celery task key and the async Celery object used
+        to fetch the task's state
+    :type celery_task: (str, celery.result.AsyncResult)
+    :return: a tuple of the Celery task key and the Celery state of the task
+    :rtype: (str, str)
+    """
+
+    try:
+        res = (celery_task[0], celery_task[1].state)
```

Review comment: Exactly. Will add the comment there.
[GitHub] yrqls21 commented on a change in pull request #3830: [AIRFLOW-2156] Parallelize Celery Executor
yrqls21 commented on a change in pull request #3830: [AIRFLOW-2156] Parallelize Celery Executor URL: https://github.com/apache/incubator-airflow/pull/3830#discussion_r216110018

## File path: airflow/executors/celery_executor.py

```
@@ -72,10 +112,24 @@ class CeleryExecutor(BaseExecutor):
     vast amounts of messages, while providing operations with the tools
     required to maintain such a system.
     """
-    def start(self):
+
+    def __init__(self):
+        super(CeleryExecutor, self).__init__()
+
+        # Parallelize Celery requests here since Celery does not support parallelization.
```

Review comment: What about `Parallelize Celery requests here since Celery does not support parallelization natively`?
[GitHub] yrqls21 commented on a change in pull request #3830: [AIRFLOW-2156] Parallelize Celery Executor
yrqls21 commented on a change in pull request #3830: [AIRFLOW-2156] Parallelize Celery Executor URL: https://github.com/apache/incubator-airflow/pull/3830#discussion_r216109814

## File path: airflow/executors/celery_executor.py

```
@@ -63,6 +69,40 @@ def execute_command(command):
         raise AirflowException('Celery command failed')


+class ExceptionWithTraceback(object):
+    """
+    Wrapper class used to propagate exceptions to parent processes from subprocesses.
+
+    :param exception: The exception to wrap
+    :type exception: Exception
+    :param traceback: The stacktrace to wrap
+    :type traceback: str
+    """
+
+    def __init__(self, exception, exception_traceback):
+        self.exception = exception
+        self.traceback = exception_traceback
+
+
+def fetch_celery_task_state(celery_task):
```

Review comment: No, it does not have to be a single param. Doing it this way means we don't need to unpack the `tasks` dict, which looks cleaner.
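The single-parameter design discussed above can be sketched standalone: `Pool.map` passes exactly one argument per work item, so feeding it `tasks.items()` directly hands each worker one `(key, result)` tuple to unpack itself, with no need to split the dict into parallel key/value lists first. A thread-backed pool stands in here for the PR's process pool, and `fetch_state` plus the dict values are illustrative stand-ins for `fetch_celery_task_state` and Celery `AsyncResult` objects:

```python
from multiprocessing.dummy import Pool  # thread-backed Pool; same map() API as the process pool

def fetch_state(celery_task):
    key, result = celery_task       # unpack the (key, result) pair inside the worker
    return key, result["state"]     # stand-in for result.state

tasks = {"key1": {"state": "SUCCESS"}, "key2": {"state": "PENDING"}}
pool = Pool(2)
key_states = pool.map(fetch_state, tasks.items(), chunksize=1)
pool.close()
pool.join()
states = dict(key_states)
```

Passing the items through unchanged also keeps the key attached to its result, so the parent can match states back to tasks without relying on ordering tricks.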
[GitHub] codecov-io edited a comment on issue #3865: [AIRFLOW-3028] Update Text & Images in Readme.md
codecov-io edited a comment on issue #3865: [AIRFLOW-3028] Update Text & Images in Readme.md URL: https://github.com/apache/incubator-airflow/pull/3865#issuecomment-419591019

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3865?src=pr=h1) Report
> Merging [#3865](https://codecov.io/gh/apache/incubator-airflow/pull/3865?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/08ecca47862f304dba548bcfc6c34406cdcf556f?src=pr=desc) will **not change** coverage.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3865/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3865?src=pr=tree)

```diff
@@           Coverage Diff           @@
##           master   #3865   +/-   ##
=======================================
  Coverage   77.51%  77.51%
=======================================
  Files         200     200
  Lines       15815   15815
=======================================
  Hits        12259   12259
  Misses       3556    3556
```

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3865?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3865?src=pr=footer). Last update [08ecca4...5a49e92](https://codecov.io/gh/apache/incubator-airflow/pull/3865?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[jira] [Created] (AIRFLOW-3029) New Operator - SqlOperator
Josh Bacon created AIRFLOW-3029: --- Summary: New Operator - SqlOperator Key: AIRFLOW-3029 URL: https://issues.apache.org/jira/browse/AIRFLOW-3029 Project: Apache Airflow Issue Type: New Feature Components: operators Reporter: Josh Bacon Assignee: Josh Bacon

Proposal to add a new operator, "SqlOperator", which does not already exist. I will be submitting a pull request soon.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
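As a rough illustration of what such an operator could look like: this is purely a guess ahead of the actual pull request. The class takes a SQL string plus a callable returning a DB-API connection and runs the statement in `execute()`; Airflow's `BaseOperator` wiring, hooks, and templating are deliberately omitted, and all names here are hypothetical:

```python
import sqlite3

# Hedged sketch of a generic SqlOperator, not the eventual PR's API.
class SqlOperator:
    def __init__(self, task_id, sql, conn_factory):
        self.task_id = task_id
        self.sql = sql
        self.conn_factory = conn_factory  # callable returning a DB-API connection

    def execute(self, context=None):
        conn = self.conn_factory()
        try:
            cursor = conn.cursor()
            cursor.execute(self.sql)   # run the configured statement
            rows = cursor.fetchall()
            conn.commit()
            return rows
        finally:
            conn.close()

# Illustrative usage against an in-memory SQLite database.
op = SqlOperator("demo", "SELECT 1 + 1", lambda: sqlite3.connect(":memory:"))
rows = op.execute()
```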
[jira] [Comment Edited] (AIRFLOW-3011) CLI command to output actual airflow.cfg
[ https://issues.apache.org/jira/browse/AIRFLOW-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607750#comment-16607750 ] Valerii Zhuk edited comment on AIRFLOW-3011 at 9/7/18 10:47 PM:

Hi [~victor.noel], I'd like to take this task, but it's not quite clear what you mean by "output". Output to stdout? Or do you envision the command taking args like path/to/file to write the config content into a file?

was (Author: valerii_zhuk): [~victor.noel] Hi Victor, I`d like to take this task. But It`s quite not clear, what do you mean by "output"? Output to stdout? Or you suppose a function to take args like path/to/file to output config content into file?

> CLI command to output actual airflow.cfg
>
> Key: AIRFLOW-3011
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3011
> Project: Apache Airflow
> Issue Type: New Feature
> Components: cli
> Affects Versions: 1.10.0
> Reporter: Victor
> Assignee: Valerii Zhuk
> Priority: Major
>
> The only way to see the actual airflow configuration (including overridden information from environment variables) is through the web UI. For security reasons, this is often disabled. A CLI command to do the same thing would:
> * give admins a way to see it
> * give operators a way to manipulate the configuration (storing it, etc)
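A minimal sketch of the requested behaviour: render the effective configuration, i.e. file values plus environment overrides, as text suitable for stdout or a file. The `AIRFLOW__SECTION__KEY` naming follows Airflow's documented environment-variable override convention, but the function itself is illustrative, not the actual CLI implementation:

```python
import configparser
import io

def effective_config(raw_cfg, environ):
    """Merge AIRFLOW__SECTION__KEY env overrides into a parsed config and render it."""
    parser = configparser.ConfigParser()
    parser.read_string(raw_cfg)
    for name, value in environ.items():
        parts = name.split("__", 2)
        if len(parts) != 3 or parts[0] != "AIRFLOW":
            continue  # not an Airflow override variable
        section, key = parts[1].lower(), parts[2].lower()
        if not parser.has_section(section):
            parser.add_section(section)
        parser.set(section, key, value)  # env value wins over the file value
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()

rendered = effective_config(
    "[core]\nexecutor = SequentialExecutor\n",
    {"AIRFLOW__CORE__EXECUTOR": "CeleryExecutor", "PATH": "/usr/bin"})
```

Writing the merged parser back out (rather than dumping the raw file) is what makes the command useful for the "storing it" use case in the issue.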
[GitHub] r39132 edited a comment on issue #3865: [AIRFLOW-3028] Update Text & Images in Readme.md
r39132 edited a comment on issue #3865: [AIRFLOW-3028] Update Text & Images in Readme.md URL: https://github.com/apache/incubator-airflow/pull/3865#issuecomment-419585712 @feng-tao @ashb @kaxil PTAL
[jira] [Commented] (AIRFLOW-3011) CLI command to output actual airflow.cfg
[ https://issues.apache.org/jira/browse/AIRFLOW-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607750#comment-16607750 ] Valerii Zhuk commented on AIRFLOW-3011: ---

[~victor.noel] Hi Victor, I'd like to take this task, but it's not quite clear what you mean by "output". Output to stdout? Or do you envision the command taking args like path/to/file to write the config content into a file?

> CLI command to output actual airflow.cfg
>
> Key: AIRFLOW-3011
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3011
> Project: Apache Airflow
> Issue Type: New Feature
> Components: cli
> Affects Versions: 1.10.0
> Reporter: Victor
> Assignee: Valerii Zhuk
> Priority: Major
>
> The only way to see the actual airflow configuration (including overridden information from environment variables) is through the web UI. For security reasons, this is often disabled. A CLI command to do the same thing would:
> * give admins a way to see it
> * give operators a way to manipulate the configuration (storing it, etc)
[GitHub] r39132 commented on issue #3865: [AIRFLOW-3028] Update Text & Images in Readme.md
r39132 commented on issue #3865: [AIRFLOW-3028] Update Text & Images in Readme.md URL: https://github.com/apache/incubator-airflow/pull/3865#issuecomment-419585712 @kaxil PTAL
[jira] [Assigned] (AIRFLOW-3011) CLI command to output actual airflow.cfg
[ https://issues.apache.org/jira/browse/AIRFLOW-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valerii Zhuk reassigned AIRFLOW-3011: - Assignee: Valerii Zhuk

> CLI command to output actual airflow.cfg
>
> Key: AIRFLOW-3011
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3011
> Project: Apache Airflow
> Issue Type: New Feature
> Components: cli
> Affects Versions: 1.10.0
> Reporter: Victor
> Assignee: Valerii Zhuk
> Priority: Major
>
> The only way to see the actual airflow configuration (including overridden information from environment variables) is through the web UI. For security reasons, this is often disabled. A CLI command to do the same thing would:
> * give admins a way to see it
> * give operators a way to manipulate the configuration (storing it, etc)
[jira] [Commented] (AIRFLOW-3028) Update Text & Images in Readme.md
[ https://issues.apache.org/jira/browse/AIRFLOW-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607747#comment-16607747 ] ASF GitHub Bot commented on AIRFLOW-3028: ---

r39132 opened a new pull request #3865: [AIRFLOW-3028] Update Text & Images in Readme.md URL: https://github.com/apache/incubator-airflow/pull/3865

Make sure you have checked _all_ steps below.

### Jira
- [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-3028
  - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\]; code changes always need a Jira issue.

### Description
- [x] Here are some details about my PR, including screenshots of any UI changes: Updated the Readme with new images and added Apache Airflow in a few places.

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: N/A

### Commits
- [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation
- [x] In case of new functionality, my PR adds documentation that describes how to use it.
  - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

### Code Quality
- [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`

> Update Text & Images in Readme.md
>
> Key: AIRFLOW-3028
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3028
> Project: Apache Airflow
> Issue Type: Improvement
> Reporter: Siddharth Anand
> Assignee: Siddharth Anand
> Priority: Minor
>
> The images in the Readme are more than 4 years old and no longer reflect the look and feel of the project. Additionally, I'm updating the readme to reference Airflow as Apache Airflow in a few places. This Readme.md has not changed in these areas since it was originally ported from Airbnb.
[GitHub] r39132 opened a new pull request #3865: [AIRFLOW-3028] Update Text & Images in Readme.md
r39132 opened a new pull request #3865: [AIRFLOW-3028] Update Text & Images in Readme.md URL: https://github.com/apache/incubator-airflow/pull/3865
[jira] [Updated] (AIRFLOW-3027) Read credentials from a file in the Databricks operators and hook
[ https://issues.apache.org/jira/browse/AIRFLOW-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anthony Miyaguchi updated AIRFLOW-3027: --- Description: The Databricks hook requires token-based authentication via the connections database. The token is passed into the connections field: {code:java} Extras: {"token": ""}{code} This means the token can be seen in plaintext in the Admin UI, which is undesirable for our setup. The AWS hook gets around this by either using boto's authentication mechanisms or by reading from a file. {code:java} elif 's3_config_file' in connection_object.extra_dejson: aws_access_key_id, aws_secret_access_key = \ _parse_s3_config( connection_object.extra_dejson['s3_config_file'], connection_object.extra_dejson.get('s3_config_format')){code} [source] [https://github.com/apache/incubator-airflow/blob/08ecca47862f304dba548bcfc6c34406cdcf556f/airflow/contrib/hooks/aws_hook.py#L110-L114] The databricks hook should also support reading the token from a file to avoid exposing sensitive tokens in plaintext. was: The Databricks hook requires token-based authentication via the connections database. The token is passed into the connections field: {code:java} Extras: {"token": ""}{code} This means the token can be seen in plaintext in the Admin UI, which is undesirable for our setup. The AWS hook gets around this by either using boto's authentication mechanisms or by reading from a file. 
{code:java} elif 's3_config_file' in connection_object.extra_dejson: aws_access_key_id, aws_secret_access_key = \ _parse_s3_config( connection_object.extra_dejson['s3_config_file'], connection_object.extra_dejson.get('s3_config_format')){code} [[source] [https://github.com/apache/incubator-airflow/blob/08ecca47862f304dba548bcfc6c34406cdcf556f/airflow/contrib/hooks/aws_hook.py#L110-L114]|https://github.com/apache/incubator-airflow/blob/08ecca47862f304dba548bcfc6c34406cdcf556f/airflow/contrib/hooks/aws_hook.py#L110-L114] The databricks hook should also support reading the token from a file to avoid exposing sensitive tokens in plaintext. > Read credentials from a file in the Databricks operators and hook > - > > Key: AIRFLOW-3027 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3027 > Project: Apache Airflow > Issue Type: Improvement > Components: authentication, hooks, operators >Affects Versions: 1.9.0 >Reporter: Anthony Miyaguchi >Priority: Minor > > The Databricks hook requires token-based authentication via the connections > database. The token is passed into the connections field: > {code:java} > Extras: {"token": ""}{code} > This means the token can be seen in plaintext in the Admin UI, which is > undesirable for our setup. The AWS hook gets around this by either using > boto's authentication mechanisms or by reading from a file. > {code:java} > elif 's3_config_file' in connection_object.extra_dejson: > aws_access_key_id, aws_secret_access_key = \ > _parse_s3_config( > connection_object.extra_dejson['s3_config_file'], > connection_object.extra_dejson.get('s3_config_format')){code} > [source] > [https://github.com/apache/incubator-airflow/blob/08ecca47862f304dba548bcfc6c34406cdcf556f/airflow/contrib/hooks/aws_hook.py#L110-L114] > > The databricks hook should also support reading the token from a file to > avoid exposing sensitive tokens in plaintext. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
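A minimal sketch of the proposed behavior, mirroring the AWS hook's `s3_config_file` pattern: fall back to reading the token from a file path stored in the connection extras. The `token_file` extra key is a hypothetical name for illustration, not an existing Databricks hook option.

```python
# Illustrative sketch of the proposal: prefer an inline token, otherwise
# read it from a file on disk so the secret never appears in the Admin UI.
# The `token_file` key is hypothetical, not an existing hook option.
def resolve_token(extra_dejson):
    """Resolve the Databricks API token from connection extras."""
    if "token" in extra_dejson:
        return extra_dejson["token"]
    if "token_file" in extra_dejson:
        with open(extra_dejson["token_file"]) as f:
            # Strip the trailing newline most secret files carry.
            return f.read().strip()
    raise ValueError("Connection extras contain neither 'token' nor 'token_file'")
```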
[jira] [Created] (AIRFLOW-3028) Update Text & Images in Readme.md
Siddharth Anand created AIRFLOW-3028: Summary: Update Text & Images in Readme.md Key: AIRFLOW-3028 URL: https://issues.apache.org/jira/browse/AIRFLOW-3028 Project: Apache Airflow Issue Type: Improvement Reporter: Siddharth Anand Assignee: Siddharth Anand The images in the Readme are more than 4 years old and no longer reflect the look and feel of the project. Additionally, I'm updating the readme to reference Airflow as Apache Airflow in a few places. This Readme.md has not changed in these areas since it was originally ported from Airbnb.
[jira] [Updated] (AIRFLOW-3027) Read credentials from a file in the Databricks operators and hook
[ https://issues.apache.org/jira/browse/AIRFLOW-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anthony Miyaguchi updated AIRFLOW-3027: --- Description: The Databricks hook requires token-based authentication via the connections database. The token is passed into the connections field: {code:java} Extras: {"token": ""}{code} This means the token can be seen in plaintext in the Admin UI, which is undesirable for our setup. The AWS hook gets around this by either using boto's authentication mechanisms or by reading from a file. {code:java} elif 's3_config_file' in connection_object.extra_dejson: aws_access_key_id, aws_secret_access_key = \ _parse_s3_config( connection_object.extra_dejson['s3_config_file'], connection_object.extra_dejson.get('s3_config_format')){code} [[source] [https://github.com/apache/incubator-airflow/blob/08ecca47862f304dba548bcfc6c34406cdcf556f/airflow/contrib/hooks/aws_hook.py#L110-L114]|https://github.com/apache/incubator-airflow/blob/08ecca47862f304dba548bcfc6c34406cdcf556f/airflow/contrib/hooks/aws_hook.py#L110-L114] The databricks hook should also support reading the token from a file to avoid exposing sensitive tokens in plaintext. was: The Databricks hook requires token-based authentication via the connections database. The token is passed into the connections field: Extras: \{"token": ""} This means the token can be seen in plaintext in the Admin UI, which is undesirable for our setup. The AWS hook gets around this by either using boto's authentication mechanisms or by reading from a file. 
{code:java} elif 's3_config_file' in connection_object.extra_dejson: aws_access_key_id, aws_secret_access_key = \ _parse_s3_config( connection_object.extra_dejson['s3_config_file'], connection_object.extra_dejson.get('s3_config_format')){code} [[source] https://github.com/apache/incubator-airflow/blob/08ecca47862f304dba548bcfc6c34406cdcf556f/airflow/contrib/hooks/aws_hook.py#L110-L114|https://github.com/apache/incubator-airflow/blob/08ecca47862f304dba548bcfc6c34406cdcf556f/airflow/contrib/hooks/aws_hook.py#L110-L114] The databricks hook should also support reading the token from a file to avoid exposing sensitive tokens in plaintext. > Read credentials from a file in the Databricks operators and hook > - > > Key: AIRFLOW-3027 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3027 > Project: Apache Airflow > Issue Type: Improvement > Components: authentication, hooks, operators >Affects Versions: 1.9.0 >Reporter: Anthony Miyaguchi >Priority: Minor > > The Databricks hook requires token-based authentication via the connections > database. The token is passed into the connections field: > {code:java} > Extras: {"token": ""}{code} > This means the token can be seen in plaintext in the Admin UI, which is > undesirable for our setup. The AWS hook gets around this by either using > boto's authentication mechanisms or by reading from a file. 
> {code:java} > elif 's3_config_file' in connection_object.extra_dejson: > aws_access_key_id, aws_secret_access_key = \ > _parse_s3_config( > connection_object.extra_dejson['s3_config_file'], > connection_object.extra_dejson.get('s3_config_format')){code} > [[source] > [https://github.com/apache/incubator-airflow/blob/08ecca47862f304dba548bcfc6c34406cdcf556f/airflow/contrib/hooks/aws_hook.py#L110-L114]|https://github.com/apache/incubator-airflow/blob/08ecca47862f304dba548bcfc6c34406cdcf556f/airflow/contrib/hooks/aws_hook.py#L110-L114] > > The databricks hook should also support reading the token from a file to > avoid exposing sensitive tokens in plaintext. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-3027) Read credentials from a file in the Databricks operators and hook
Anthony Miyaguchi created AIRFLOW-3027: -- Summary: Read credentials from a file in the Databricks operators and hook Key: AIRFLOW-3027 URL: https://issues.apache.org/jira/browse/AIRFLOW-3027 Project: Apache Airflow Issue Type: Improvement Components: authentication, hooks, operators Affects Versions: 1.9.0 Reporter: Anthony Miyaguchi The Databricks hook requires token-based authentication via the connections database. The token is passed into the connections field: Extras: \{"token": ""} This means the token can be seen in plaintext in the Admin UI, which is undesirable for our setup. The AWS hook gets around this by either using boto's authentication mechanisms or by reading from a file. {code:java} elif 's3_config_file' in connection_object.extra_dejson: aws_access_key_id, aws_secret_access_key = \ _parse_s3_config( connection_object.extra_dejson['s3_config_file'], connection_object.extra_dejson.get('s3_config_format')){code} [[source] https://github.com/apache/incubator-airflow/blob/08ecca47862f304dba548bcfc6c34406cdcf556f/airflow/contrib/hooks/aws_hook.py#L110-L114|https://github.com/apache/incubator-airflow/blob/08ecca47862f304dba548bcfc6c34406cdcf556f/airflow/contrib/hooks/aws_hook.py#L110-L114] The databricks hook should also support reading the token from a file to avoid exposing sensitive tokens in plaintext. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] r39132 closed pull request #3864: [AIRFLOW-XXX] Redirect FAQ about `airflow[crypto]` to How-to Guides.
r39132 closed pull request #3864: [AIRFLOW-XXX] Redirect FAQ about `airflow[crypto]` to How-to Guides. URL: https://github.com/apache/incubator-airflow/pull/3864 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: diff --git a/docs/faq.rst b/docs/faq.rst index 61c1ba9ce1..07c07c0bcf 100644 --- a/docs/faq.rst +++ b/docs/faq.rst @@ -66,13 +66,13 @@ How do I trigger tasks based on another task's failure? --- Check out the ``Trigger Rule`` section in the Concepts section of the -documentation +documentation. Why are connection passwords still not encrypted in the metadata db after I installed airflow[crypto]? -- -Check out the ``Connections`` section in the Configuration section of the -documentation +Check out the ``Securing Connections`` section in the How-to Guides section of the +documentation. What's the deal with ``start_date``?
[GitHub] r39132 commented on issue #3864: [AIRFLOW-XXX] Redirect FAQ about `airflow[crypto]` to How-to Guides.
r39132 commented on issue #3864: [AIRFLOW-XXX] Redirect FAQ about `airflow[crypto]` to How-to Guides. URL: https://github.com/apache/incubator-airflow/pull/3864#issuecomment-419573110 LGTM
[GitHub] ashb commented on issue #3862: [AIRFLOW-1917] Remove extra newline character from log
ashb commented on issue #3862: [AIRFLOW-1917] Remove extra newline character from log URL: https://github.com/apache/incubator-airflow/pull/3862#issuecomment-419571432 The extra line comes from reading stdout/stderr of the sub-process, so it's unavoidable - check out the linked JIRA for a repro case involving an operator. This StreamLogWriter is used to capture stdout from operators, etc.
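The behavior under discussion, a writer that captures stdout and forwards complete lines to a logger without a redundant trailing newline, can be illustrated with a minimal sketch. This is not Airflow's actual `StreamLogWriter` implementation, just a self-contained demonstration of the line-buffering idea.

```python
# Minimal illustration of the issue being fixed: text arriving via
# write() ends in "\n", and logging each chunk verbatim would add a
# second newline. This sketch (not Airflow's actual StreamLogWriter)
# buffers partial writes and emits complete lines with the newline removed.
import logging


class StreamLogWriter:
    """File-like object that redirects writes to a logger, line by line."""

    def __init__(self, logger, level):
        self.logger = logger
        self.level = level
        self._buffer = ""

    def write(self, message):
        self._buffer += message
        while "\n" in self._buffer:
            line, self._buffer = self._buffer.split("\n", 1)
            if line:  # skip the empty string left by a bare newline
                self.logger.log(self.level, line)

    def flush(self):
        # Emit any partial line still buffered (e.g. print(..., end="")).
        if self._buffer:
            self.logger.log(self.level, self._buffer)
            self._buffer = ""
```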
[GitHub] wwlian commented on issue #3805: [AIRFLOW-2062] Add per-connection KMS encryption.
wwlian commented on issue #3805: [AIRFLOW-2062] Add per-connection KMS encryption. URL: https://github.com/apache/incubator-airflow/pull/3805#issuecomment-419570620 @bolkedebruin @gerardo @Fokko I understand the concerns that this change might be coupled too tightly to Google Cloud KMS. However, I want to second @jakahn's assurance that this design is agnostic to the key management service being used. The only opinionated design choice here is that encryption is performed using the envelope encryption pattern, which is recognized by [AWS KMS](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#enveloping), [Google Cloud KMS](https://cloud.google.com/kms/docs/envelope-encryption), and [Azure Key Vault](https://docs.microsoft.com/en-us/azure/storage/common/storage-client-side-encryption#encryption-and-decryption-via-the-envelope-technique) alike. To add to what @jakahn said re: embedding kms_conn_id and kms_extras in the existing _extra column, doing so would create a chicken-and-egg problem, as their values are needed to decrypt the _extras column.
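The envelope pattern referenced above can be shown structurally in a few lines: a fresh data key encrypts the payload, and only that data key is wrapped by the KMS-held key. The sketch below is a toy, it uses an XOR keystream so the example stays standard-library-only; a real implementation would use an AEAD cipher and an actual KMS client for the wrap/unwrap steps.

```python
# Structural sketch of envelope encryption: a random per-message data
# key (DEK) encrypts the payload, and the KMS-held key-encryption key
# (KEK) wraps only the DEK. The XOR "cipher" below is a stand-in for
# demonstration -- it is NOT secure cryptography.
import hashlib
import os


def _keystream_xor(key, data):
    # Toy stream cipher: XOR data with a SHA-256-derived keystream.
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def envelope_encrypt(kek, plaintext):
    dek = os.urandom(32)                    # fresh data key per message
    ciphertext = _keystream_xor(dek, plaintext)
    wrapped_dek = _keystream_xor(kek, dek)  # in practice, a KMS wraps the DEK
    return wrapped_dek, ciphertext


def envelope_decrypt(kek, wrapped_dek, ciphertext):
    dek = _keystream_xor(kek, wrapped_dek)  # in practice, the KMS unwraps
    return _keystream_xor(dek, ciphertext)
```

The design point in the thread follows directly from this shape: the database stores only `wrapped_dek` and `ciphertext`, so whatever identifies the KEK (the `kms_conn_id` in the PR) must live outside the encrypted column, otherwise decryption has nothing to bootstrap from.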
[GitHub] r39132 commented on issue #2092: [AIRFLOW-888] Don't automatically push XComs
r39132 commented on issue #2092: [AIRFLOW-888] Don't automatically push XComs URL: https://github.com/apache/incubator-airflow/pull/2092#issuecomment-419563010 Thx @jlowin What do you want to do with the Jiras?
[GitHub] r39132 commented on issue #2048: [AIRFLOW-828] Add maximum XCom size
r39132 commented on issue #2048: [AIRFLOW-828] Add maximum XCom size URL: https://github.com/apache/incubator-airflow/pull/2048#issuecomment-419562868 @jlowin What do you want to do with the Jiras?
[jira] [Commented] (AIRFLOW-888) Operators should not push XComs by default
[ https://issues.apache.org/jira/browse/AIRFLOW-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607593#comment-16607593 ] ASF GitHub Bot commented on AIRFLOW-888: jlowin closed pull request #2092: [AIRFLOW-888] Don't automatically push XComs URL: https://github.com/apache/incubator-airflow/pull/2092 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/UPDATING.md b/UPDATING.md index 6fd7afbe5c..e0acc49a4c 100644 --- a/UPDATING.md +++ b/UPDATING.md @@ -7,18 +7,19 @@ assists people when migrating to a new version. ### New Features - Dask Executor - -A new DaskExecutor allows Airflow tasks to be run in Dask Distributed clusters. +- [AIRFLOW-862] A new DaskExecutor allows Airflow tasks to be run in Dask Distributed clusters ### Deprecated Features These features are marked for deprecation. They may still work (and raise a `DeprecationWarning`), but are no longer -supported and will be removed entirely in Airflow 2.0 +supported and will be removed entirely in a future version of Airflow. -- `post_execute()` hooks now take two arguments, `context` and `result` - (AIRFLOW-886) +- [AIRFLOW-886] `post_execute()` hooks now take two arguments, `context` and `result`. Previously, post_execute() only took one argument, `context`. - Previously, post_execute() only took one argument, `context`. +### Breaking Changes +These changes are not backwards-compatible with previous versions of Airflow. + +- [AIRFLOW-888] Operators no longer automatically push XComs. This behavior can be reenabled globally + by setting `auto_xcom_push = True` in the `operators` setting of Airflow.cfg or on a per-Operator basis by passing `auto_xcom_push=True` when creating the Operator. ## Airflow 1.8 @@ -47,8 +48,8 @@ interfere. 
Please read through these options, defaults have changed since 1.7.1. child_process_log_directory -In order the increase the robustness of the scheduler, DAGS our now processed in their own process. Therefore each -DAG has its own log file for the scheduler. These are placed in `child_process_log_directory` which defaults to +In order the increase the robustness of the scheduler, DAGS our now processed in their own process. Therefore each +DAG has its own log file for the scheduler. These are placed in `child_process_log_directory` which defaults to `/scheduler/latest`. You will need to make sure these log files are removed. > DAG logs or processor logs ignore and command line settings for log file > locations. @@ -58,7 +59,7 @@ Previously the command line option `num_runs` was used to let the scheduler term loops. This is now time bound and defaults to `-1`, which means run continuously. See also num_runs. num_runs -Previously `num_runs` was used to let the scheduler terminate after a certain amount of loops. Now num_runs specifies +Previously `num_runs` was used to let the scheduler terminate after a certain amount of loops. Now num_runs specifies the number of times to try to schedule each DAG file within `run_duration` time. Defaults to `-1`, which means try indefinitely. This is only available on the command line. @@ -71,7 +72,7 @@ dags are not being picked up, have a look at this number and decrease it when ne catchup_by_default By default the scheduler will fill any missing interval DAG Runs between the last execution date and the current date. -This setting changes that behavior to only execute the latest interval. This can also be specified per DAG as +This setting changes that behavior to only execute the latest interval. This can also be specified per DAG as `catchup = False / True`. Command line backfills will still work. ### Faulty Dags do not show an error in the Web UI @@ -95,33 +96,33 @@ convenience variables to the config. 
In case you run a secure Hadoop setup it might be required to whitelist these variables by adding the following to your configuration: ``` - + hive.security.authorization.sqlstd.confwhitelist.append airflow\.ctx\..* ``` ### Google Cloud Operator and Hook alignment -All Google Cloud Operators and Hooks are aligned and use the same client library. Now you have a single connection +All Google Cloud Operators and Hooks are aligned and use the same client library. Now you have a single connection type for all kinds of Google Cloud Operators. If you experience problems connecting with your operator make sure you set the connection type "Google Cloud Platform". -Also the old P12 key file type is not supported anymore and only the new JSON key files are supported as a service +Also the old P12 key file type is not supported anymore and only the new JSON key files
[jira] [Commented] (AIRFLOW-828) Add maximum size for XComs
[ https://issues.apache.org/jira/browse/AIRFLOW-828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607594#comment-16607594 ] ASF GitHub Bot commented on AIRFLOW-828: jlowin closed pull request #2048: [AIRFLOW-828] Add maximum XCom size URL: https://github.com/apache/incubator-airflow/pull/2048 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/configuration.py b/airflow/configuration.py index 6752bdb283..927e6110f8 100644 --- a/airflow/configuration.py +++ b/airflow/configuration.py @@ -200,6 +200,10 @@ def run_command(command): default_disk = 512 default_gpus = 0 +[xcom] + +# the maximum size of a pickled XCom object, in bytes +max_size = 2 [webserver] # The base url of your website as airflow cannot guess what domain or diff --git a/airflow/exceptions.py b/airflow/exceptions.py index 22312083d8..e65d907416 100644 --- a/airflow/exceptions.py +++ b/airflow/exceptions.py @@ -22,7 +22,7 @@ class AirflowException(Exception): class AirflowConfigException(AirflowException): pass - + class AirflowSensorTimeout(AirflowException): pass @@ -34,3 +34,7 @@ class AirflowTaskTimeout(AirflowException): class AirflowSkipException(AirflowException): pass + + +class XComException(AirflowException): +pass diff --git a/airflow/models.py b/airflow/models.py index 6cf7ad9dee..8911242c1f 100755 --- a/airflow/models.py +++ b/airflow/models.py @@ -61,7 +61,8 @@ from airflow import settings, utils from airflow.executors import DEFAULT_EXECUTOR, LocalExecutor from airflow import configuration -from airflow.exceptions import AirflowException, AirflowSkipException, AirflowTaskTimeout +from airflow.exceptions import ( +AirflowException, AirflowSkipException, AirflowTaskTimeout, XComException) from airflow.dag.base_dag import BaseDag, 
BaseDagBag from airflow.ti_deps.deps.not_in_retry_period_dep import NotInRetryPeriodDep from airflow.ti_deps.deps.prev_dagrun_dep import PrevDagrunDep @@ -3600,6 +3601,15 @@ def set( """ session.expunge_all() +# check XCom size +max_xcom_size = configuration.getint('XCOM', 'MAX_SIZE') +xcom_size = sys.getsizeof(pickle.dumps(value)) +if xcom_size > max_xcom_size: +raise XComException( +"The XCom's pickled size ({} bytes) is larger than the " +"maximum allowed size ({} bytes).".format( +xcom_size, max_xcom_size)) + # remove any duplicate XComs session.query(cls).filter( cls.key == key, diff --git a/tests/models.py b/tests/models.py index 346f47cfab..31b8f80b93 100644 --- a/tests/models.py +++ b/tests/models.py @@ -23,7 +23,7 @@ import time from airflow import models, settings, AirflowException -from airflow.exceptions import AirflowSkipException +from airflow.exceptions import AirflowSkipException, XComException from airflow.models import DAG, TaskInstance as TI from airflow.models import State as ST from airflow.models import DagModel @@ -581,6 +581,31 @@ def test_check_task_dependencies(self, trigger_rule, successes, skipped, self.assertEqual(completed, expect_completed) self.assertEqual(ti.state, expect_state) + +class XComTest(unittest.TestCase): + +def test_xcom_max_size(self): +""" +Test that pushing large XComs raises an error +""" +small_value = [0] * 100 +large_value = [0] * 100 + +dag = models.DAG(dag_id='test_xcom') +task = DummyOperator( +task_id='test_xcom', +dag=dag, +owner='airflow', +start_date=datetime.datetime(2016, 6, 2, 0, 0, 0)) +ti = TI(task=task, execution_date=datetime.datetime.now()) + +# this should work +ti.xcom_push(key='small xcom', value=small_value) + +# this should fail +with self.assertRaises(XComException): +ti.xcom_push(key='large xcom', value=large_value) + def test_xcom_pull_after_success(self): """ tests xcom set/clear relative to a task in a 'success' rerun scenario This is an automated message from the Apache Git Service. 
> Add maximum size for XComs > -- > > Key: AIRFLOW-828 > URL: https://issues.apache.org/jira/browse/AIRFLOW-828 > Project: Apache Airflow > Issue Type: Improvement > Components:
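The size guard added by the diff above can be distilled into a standalone sketch: pickle the value, measure it, and refuse anything over a configured ceiling. `MAX_XCOM_SIZE` below is an illustrative constant, whereas the PR reads the limit from `[xcom] max_size` in airflow.cfg.

```python
# Standalone sketch of the XCom size check from the diff: measure the
# pickled payload and raise if it exceeds a ceiling. MAX_XCOM_SIZE is an
# illustrative default; the PR sources the limit from airflow.cfg.
import pickle
import sys

MAX_XCOM_SIZE = 64 * 1024  # bytes


def check_xcom_size(value, max_size=MAX_XCOM_SIZE):
    """Return the pickled size of `value`, raising ValueError if too large."""
    size = sys.getsizeof(pickle.dumps(value))
    if size > max_size:
        raise ValueError(
            "The XCom's pickled size ({} bytes) is larger than the "
            "maximum allowed size ({} bytes).".format(size, max_size))
    return size
```

Measuring the pickled representation (rather than the in-memory object) matches what actually lands in the metadata database, which is the resource the limit is protecting.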
[GitHub] jlowin commented on issue #2048: [AIRFLOW-828] Add maximum XCom size
jlowin commented on issue #2048: [AIRFLOW-828] Add maximum XCom size URL: https://github.com/apache/incubator-airflow/pull/2048#issuecomment-419547345 Closed bc stale!
[GitHub] jlowin closed pull request #2048: [AIRFLOW-828] Add maximum XCom size
jlowin closed pull request #2048: [AIRFLOW-828] Add maximum XCom size URL: https://github.com/apache/incubator-airflow/pull/2048 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/configuration.py b/airflow/configuration.py index 6752bdb283..927e6110f8 100644 --- a/airflow/configuration.py +++ b/airflow/configuration.py @@ -200,6 +200,10 @@ def run_command(command): default_disk = 512 default_gpus = 0 +[xcom] + +# the maximum size of a pickled XCom object, in bytes +max_size = 2 [webserver] # The base url of your website as airflow cannot guess what domain or diff --git a/airflow/exceptions.py b/airflow/exceptions.py index 22312083d8..e65d907416 100644 --- a/airflow/exceptions.py +++ b/airflow/exceptions.py @@ -22,7 +22,7 @@ class AirflowException(Exception): class AirflowConfigException(AirflowException): pass - + class AirflowSensorTimeout(AirflowException): pass @@ -34,3 +34,7 @@ class AirflowTaskTimeout(AirflowException): class AirflowSkipException(AirflowException): pass + + +class XComException(AirflowException): +pass diff --git a/airflow/models.py b/airflow/models.py index 6cf7ad9dee..8911242c1f 100755 --- a/airflow/models.py +++ b/airflow/models.py @@ -61,7 +61,8 @@ from airflow import settings, utils from airflow.executors import DEFAULT_EXECUTOR, LocalExecutor from airflow import configuration -from airflow.exceptions import AirflowException, AirflowSkipException, AirflowTaskTimeout +from airflow.exceptions import ( +AirflowException, AirflowSkipException, AirflowTaskTimeout, XComException) from airflow.dag.base_dag import BaseDag, BaseDagBag from airflow.ti_deps.deps.not_in_retry_period_dep import NotInRetryPeriodDep from airflow.ti_deps.deps.prev_dagrun_dep import PrevDagrunDep @@ -3600,6 +3601,15 @@ def set( """ 
session.expunge_all() +# check XCom size +max_xcom_size = configuration.getint('XCOM', 'MAX_SIZE') +xcom_size = sys.getsizeof(pickle.dumps(value)) +if xcom_size > max_xcom_size: +raise XComException( +"The XCom's pickled size ({} bytes) is larger than the " +"maximum allowed size ({} bytes).".format( +xcom_size, max_xcom_size)) + # remove any duplicate XComs session.query(cls).filter( cls.key == key, diff --git a/tests/models.py b/tests/models.py index 346f47cfab..31b8f80b93 100644 --- a/tests/models.py +++ b/tests/models.py @@ -23,7 +23,7 @@ import time from airflow import models, settings, AirflowException -from airflow.exceptions import AirflowSkipException +from airflow.exceptions import AirflowSkipException, XComException from airflow.models import DAG, TaskInstance as TI from airflow.models import State as ST from airflow.models import DagModel @@ -581,6 +581,31 @@ def test_check_task_dependencies(self, trigger_rule, successes, skipped, self.assertEqual(completed, expect_completed) self.assertEqual(ti.state, expect_state) + +class XComTest(unittest.TestCase): + +def test_xcom_max_size(self): +""" +Test that pushing large XComs raises an error +""" +small_value = [0] * 100 +large_value = [0] * 100 + +dag = models.DAG(dag_id='test_xcom') +task = DummyOperator( +task_id='test_xcom', +dag=dag, +owner='airflow', +start_date=datetime.datetime(2016, 6, 2, 0, 0, 0)) +ti = TI(task=task, execution_date=datetime.datetime.now()) + +# this should work +ti.xcom_push(key='small xcom', value=small_value) + +# this should fail +with self.assertRaises(XComException): +ti.xcom_push(key='large xcom', value=large_value) + def test_xcom_pull_after_success(self): """ tests xcom set/clear relative to a task in a 'success' rerun scenario This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
[GitHub] jlowin commented on issue #2092: [AIRFLOW-888] Don't automatically push XComs
jlowin commented on issue #2092: [AIRFLOW-888] Don't automatically push XComs URL: https://github.com/apache/incubator-airflow/pull/2092#issuecomment-419547214 @r39132 hello! I think this is stale by default at this point -- I'll close it.
[GitHub] jlowin closed pull request #2092: [AIRFLOW-888] Don't automatically push XComs
jlowin closed pull request #2092: [AIRFLOW-888] Don't automatically push XComs URL: https://github.com/apache/incubator-airflow/pull/2092 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/UPDATING.md b/UPDATING.md index 6fd7afbe5c..e0acc49a4c 100644 --- a/UPDATING.md +++ b/UPDATING.md @@ -7,18 +7,19 @@ assists people when migrating to a new version. ### New Features - Dask Executor - -A new DaskExecutor allows Airflow tasks to be run in Dask Distributed clusters. +- [AIRFLOW-862] A new DaskExecutor allows Airflow tasks to be run in Dask Distributed clusters ### Deprecated Features These features are marked for deprecation. They may still work (and raise a `DeprecationWarning`), but are no longer -supported and will be removed entirely in Airflow 2.0 +supported and will be removed entirely in a future version of Airflow. -- `post_execute()` hooks now take two arguments, `context` and `result` - (AIRFLOW-886) +- [AIRFLOW-886] `post_execute()` hooks now take two arguments, `context` and `result`. Previously, post_execute() only took one argument, `context`. - Previously, post_execute() only took one argument, `context`. +### Breaking Changes +These changes are not backwards-compatible with previous versions of Airflow. + +- [AIRFLOW-888] Operators no longer automatically push XComs. This behavior can be reenabled globally + by setting `auto_xcom_push = True` in the `operators` setting of Airflow.cfg or on a per-Operator basis by passing `auto_xcom_push=True` when creating the Operator. ## Airflow 1.8 @@ -47,8 +48,8 @@ interfere. Please read through these options, defaults have changed since 1.7.1. child_process_log_directory -In order the increase the robustness of the scheduler, DAGS our now processed in their own process. 
Therefore each -DAG has its own log file for the scheduler. These are placed in `child_process_log_directory` which defaults to +In order the increase the robustness of the scheduler, DAGS our now processed in their own process. Therefore each +DAG has its own log file for the scheduler. These are placed in `child_process_log_directory` which defaults to `/scheduler/latest`. You will need to make sure these log files are removed. > DAG logs or processor logs ignore and command line settings for log file > locations. @@ -58,7 +59,7 @@ Previously the command line option `num_runs` was used to let the scheduler term loops. This is now time bound and defaults to `-1`, which means run continuously. See also num_runs. num_runs -Previously `num_runs` was used to let the scheduler terminate after a certain amount of loops. Now num_runs specifies +Previously `num_runs` was used to let the scheduler terminate after a certain amount of loops. Now num_runs specifies the number of times to try to schedule each DAG file within `run_duration` time. Defaults to `-1`, which means try indefinitely. This is only available on the command line. @@ -71,7 +72,7 @@ dags are not being picked up, have a look at this number and decrease it when ne catchup_by_default By default the scheduler will fill any missing interval DAG Runs between the last execution date and the current date. -This setting changes that behavior to only execute the latest interval. This can also be specified per DAG as +This setting changes that behavior to only execute the latest interval. This can also be specified per DAG as `catchup = False / True`. Command line backfills will still work. ### Faulty Dags do not show an error in the Web UI @@ -95,33 +96,33 @@ convenience variables to the config. 
In case you run a secure Hadoop setup it might be required to whitelist these variables by adding the following to your configuration: ``` - + hive.security.authorization.sqlstd.confwhitelist.append airflow\.ctx\..* ``` ### Google Cloud Operator and Hook alignment -All Google Cloud Operators and Hooks are aligned and use the same client library. Now you have a single connection +All Google Cloud Operators and Hooks are aligned and use the same client library. Now you have a single connection type for all kinds of Google Cloud Operators. If you experience problems connecting with your operator make sure you set the connection type "Google Cloud Platform". -Also the old P12 key file type is not supported anymore and only the new JSON key files are supported as a service +Also the old P12 key file type is not supported anymore and only the new JSON key files are supported as a service account. ### Deprecated Features -These features are marked for deprecation. They may still work (and raise a `DeprecationWarning`), but are no longer +These features are marked for deprecation. They may still work (and raise a `DeprecationWarning`), but are no longer
[GitHub] Fokko commented on a change in pull request #3767: [AIRFLOW-2524]Add SageMaker Batch Inference
Fokko commented on a change in pull request #3767: [AIRFLOW-2524]Add SageMaker Batch Inference URL: https://github.com/apache/incubator-airflow/pull/3767#discussion_r216045945 ## File path: airflow/contrib/sensors/sagemaker_transform_sensor.py ## @@ -0,0 +1,69 @@ +# -*- coding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +from airflow.contrib.hooks.sagemaker_hook import SageMakerHook +from airflow.contrib.sensors.sagemaker_base_sensor import SageMakerBaseSensor +from airflow.utils.decorators import apply_defaults + + +class SageMakerTransformSensor(SageMakerBaseSensor): +""" +Asks for the state of the transform job until it reaches a terminal state. +The sensor will error if the job errors, throwing an AirflowException +containing the failure reason. 
+ +:param job_name: job_name of the transform job instance to check the state of +:type job_name: string +:param region_name: The AWS region_name +:type region_name: string +""" + +template_fields = ['job_name'] +template_ext = () + +@apply_defaults +def __init__(self, + job_name, + region_name=None, + *args, + **kwargs): +super(SageMakerTransformSensor, self).__init__(*args, **kwargs) +self.job_name = job_name +self.region_name = region_name + +def non_terminal_states(self): Review comment: Since the function does not use anything from the class itself, it is prettier to make it static since it is then easier to reuse in other classes/functions.
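The reviewer's point — a method that reads nothing from `self` can be declared `@staticmethod`, which makes it callable and reusable without constructing an instance — can be sketched as follows. The class and state names here are illustrative only, not the actual Airflow sensor:

```python
class TransformSensorSketch:
    @staticmethod
    def non_terminal_states():
        # No instance attributes are touched, so the method can be static
        # and called directly on the class (or reused by other classes).
        return {'InProgress', 'Stopping'}

    def state_is_terminal(self, state):
        # Instance methods can still call the static method via self.
        return state not in self.non_terminal_states()


# Reuse without an instance:
states = TransformSensorSketch.non_terminal_states()
done = TransformSensorSketch().state_is_terminal('Completed')
```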
[GitHub] Fokko commented on issue #3767: [AIRFLOW-2524]Add SageMaker Batch Inference
Fokko commented on issue #3767: [AIRFLOW-2524]Add SageMaker Batch Inference URL: https://github.com/apache/incubator-airflow/pull/3767#issuecomment-419523736 Can you rebase onto master as well? :)
[GitHub] codecov-io commented on issue #3864: [AIRFLOW-XXX] Redirect FAQ about `airflow[crypto]` to How-to Guides.
codecov-io commented on issue #3864: [AIRFLOW-XXX] Redirect FAQ about `airflow[crypto]` to How-to Guides. URL: https://github.com/apache/incubator-airflow/pull/3864#issuecomment-41952 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3864?src=pr=h1) Report > Merging [#3864](https://codecov.io/gh/apache/incubator-airflow/pull/3864?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/abb12be84bce19aa0a00c96acab4daa8f8824db5?src=pr=desc) will **not change** coverage. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3864/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3864?src=pr=tree) ```diff @@ Coverage Diff @@ ## master #3864 +/- ## === Coverage 77.51% 77.51% === Files 200 200 Lines 15815 15815 === Hits 12259 12259 Misses 3556 3556 ``` -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3864?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3864?src=pr=footer). Last update [abb12be...2963b28](https://codecov.io/gh/apache/incubator-airflow/pull/3864?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] dimberman edited a comment on issue #3855: [AIRFLOW-3022] Add volume mount to KubernetesExecutorConfig
dimberman edited a comment on issue #3855: [AIRFLOW-3022] Add volume mount to KubernetesExecutorConfig URL: https://github.com/apache/incubator-airflow/pull/3855#issuecomment-419517507 @Fokko I can't approve anything involving the k8s executor because @gerardo's PR for dockerized CI broke the k8s tests s.t. they revert to the non-k8s tests (if you look at the CI you'll see that the k8s tests aren't actually launching a cluster). I've been trying to figure out how to get the k8s tests to work but for some reason the docker-compose stuff has broken the ability to correctly use minikube. PTAL at this https://github.com/apache/incubator-airflow/pull/3797. I managed to get most of the way with switching to the dind k8s cluster, but still need to figure out how to use the airflow API since we can't use `minikube ip` anymore.
[GitHub] Fokko commented on issue #3855: [AIRFLOW-3022] Add volume mount to KubernetesExecutorConfig
Fokko commented on issue #3855: [AIRFLOW-3022] Add volume mount to KubernetesExecutorConfig URL: https://github.com/apache/incubator-airflow/pull/3855#issuecomment-419515166 @dimberman PTAL :)
[GitHub] Fokko commented on issue #3859: [AIRFLOW-3020]LDAP Authentication doesn't check whether a user belongs to a group correctly
Fokko commented on issue #3859: [AIRFLOW-3020]LDAP Authentication doesn't check whether a user belongs to a group correctly URL: https://github.com/apache/incubator-airflow/pull/3859#issuecomment-419512721 @zeninpalm Would it be possible to test this?
[GitHub] zihengCat opened a new pull request #3864: [AIRFLOW-XXX] Redirect FAQ about `airflow[crypto]` to How-to Guides.
zihengCat opened a new pull request #3864: [AIRFLOW-XXX] Redirect FAQ about `airflow[crypto]` to How-to Guides. URL: https://github.com/apache/incubator-airflow/pull/3864 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-XXX - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: Update `Apache Airflow (incubating)` Documentation, redirect *FAQ* about `airflow[crypto]` to *How-to Guides*. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
[GitHub] Fokko commented on a change in pull request #3860: [AIRFLOW-3025] Enable specifying dns and dns_search options for DockerOperator
Fokko commented on a change in pull request #3860: [AIRFLOW-3025] Enable specifying dns and dns_search options for DockerOperator URL: https://github.com/apache/incubator-airflow/pull/3860#discussion_r216030227 ## File path: airflow/operators/docker_operator.py ## @@ -110,6 +114,8 @@ def __init__( api_version=None, command=None, cpus=1.0, +dns=None, +dns_search=None, Review comment: Can you move this to the end of the argument list, so we keep it as backward compatible as possible?
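The reason behind the request above: Python binds positional arguments by position, so inserting new parameters in the middle of a signature silently shifts what existing positional callers pass, while appending them at the end is backward compatible. A minimal sketch with a hypothetical, cut-down signature (not the real `DockerOperator`):

```python
class DockerOperatorSketch:
    # New options (dns, dns_search) are appended *after* the existing
    # parameters, so callers who pass arguments positionally keep
    # binding the same values as before the change.
    def __init__(self, image, api_version=None, command=None, cpus=1.0,
                 dns=None, dns_search=None):
        self.image = image
        self.api_version = api_version
        self.command = command
        self.cpus = cpus
        self.dns = dns
        self.dns_search = dns_search


# A pre-existing positional call is unaffected by the new trailing kwargs:
op = DockerOperatorSketch('alpine:3.8', '1.30', 'echo hi', 2.0)
```

Had `dns` been inserted before `command`, the same call would have bound `'echo hi'` to `dns` instead, breaking callers without any error at call time.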
[GitHub] wmorris75 commented on issue #3828: [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators
wmorris75 commented on issue #3828: [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators URL: https://github.com/apache/incubator-airflow/pull/3828#issuecomment-419508787 I have resolved the outstanding conflicts and am awaiting feedback.
[GitHub] Fokko commented on issue #3858: [AIRFLOW-2929] Add get and set for pool class in models
Fokko commented on issue #3858: [AIRFLOW-2929] Add get and set for pool class in models URL: https://github.com/apache/incubator-airflow/pull/3858#issuecomment-419506341 Is the `from airflow.api.common.experimental import pool` still used? We try to unload code from the `models.py` since this file is huge :)
[GitHub] r39132 edited a comment on issue #3862: [AIRFLOW-1917] Remove extra newline character from log
r39132 edited a comment on issue #3862: [AIRFLOW-1917] Remove extra newline character from log URL: https://github.com/apache/incubator-airflow/pull/3862#issuecomment-419504375 @ckljohn Any idea where this extra newline is being created? I'd rather remove the source of the extra newline than strip it afterwards. This feels like a hack.
[GitHub] ChengzhiZhao commented on issue #3858: [AIRFLOW-2929] Add get and set for pool class in models
ChengzhiZhao commented on issue #3858: [AIRFLOW-2929] Add get and set for pool class in models URL: https://github.com/apache/incubator-airflow/pull/3858#issuecomment-419501219 Add @Fokko and @r39132 to address the gap of [AIRFLOW-2882]
[GitHub] ChengzhiZhao commented on issue #3858: [AIRFLOW-2929] Add get and set for pool class in models
ChengzhiZhao commented on issue #3858: [AIRFLOW-2929] Add get and set for pool class in models URL: https://github.com/apache/incubator-airflow/pull/3858#issuecomment-419500933 @feng-tao thanks for reviewing this, I would need some suggestions here: 1. Since these are refactoring changes, the tests/core.py should cover the unit tests for them, for example, https://github.com/apache/incubator-airflow/blob/master/tests/core.py#L1493. Should I add more unit tests? 2. I can remove the pool class from the api client, but my question is should we continue using `airflow.api.common.experimental` for pool or have all methods in the Pool class like we did for `Variable`?
[jira] [Commented] (AIRFLOW-2997) Support for clustered tables in Bigquery hooks/operators
[ https://issues.apache.org/jira/browse/AIRFLOW-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607341#comment-16607341 ] ASF GitHub Bot commented on AIRFLOW-2997: - Fokko closed pull request #3838: [AIRFLOW-2997] Support for Bigquery clustered tables URL: https://github.com/apache/incubator-airflow/pull/3838 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/contrib/hooks/bigquery_hook.py b/airflow/contrib/hooks/bigquery_hook.py index 44ecd49e9e..245e3a8233 100644 --- a/airflow/contrib/hooks/bigquery_hook.py +++ b/airflow/contrib/hooks/bigquery_hook.py @@ -496,7 +496,8 @@ def run_query(self, schema_update_options=(), priority='INTERACTIVE', time_partitioning=None, - api_resource_configs=None): + api_resource_configs=None, + cluster_fields=None): """ Executes a BigQuery SQL query. Optionally persists results in a BigQuery table. See here: @@ -565,8 +566,12 @@ def run_query(self, expiration as per API specifications. Note that 'field' is not available in conjunction with dataset.table$partition. :type time_partitioning: dict - +:param cluster_fields: Request that the result of this query be stored sorted +by one or more columns. This is only available in combination with +time_partitioning. The order of columns given determines the sort order. 
+:type cluster_fields: list of str """ + if not api_resource_configs: api_resource_configs = self.api_resource_configs else: @@ -631,6 +636,9 @@ def run_query(self, 'tableId': destination_table, } +if cluster_fields: +cluster_fields = {'fields': cluster_fields} + query_param_list = [ (sql, 'query', None, str), (priority, 'priority', 'INTERACTIVE', str), @@ -641,7 +649,8 @@ def run_query(self, (maximum_bytes_billed, 'maximumBytesBilled', None, float), (time_partitioning, 'timePartitioning', {}, dict), (schema_update_options, 'schemaUpdateOptions', None, tuple), -(destination_dataset_table, 'destinationTable', None, dict) +(destination_dataset_table, 'destinationTable', None, dict), +(cluster_fields, 'clustering', None, dict), ] for param_tuple in query_param_list: @@ -856,7 +865,8 @@ def run_load(self, allow_jagged_rows=False, schema_update_options=(), src_fmt_configs=None, - time_partitioning=None): + time_partitioning=None, + cluster_fields=None): """ Executes a BigQuery load command to load data from Google Cloud Storage to BigQuery. See here: @@ -920,6 +930,10 @@ def run_load(self, expiration as per API specifications. Note that 'field' is not available in conjunction with dataset.table$partition. :type time_partitioning: dict +:param cluster_fields: Request that the result of this load be stored sorted +by one or more columns. This is only available in combination with +time_partitioning. The order of columns given determines the sort order. 
+:type cluster_fields: list of str """ # bigquery only allows certain source formats @@ -983,6 +997,9 @@ def run_load(self, 'timePartitioning': time_partitioning }) +if cluster_fields: +configuration['load'].update({'clustering': {'fields': cluster_fields}}) + if schema_fields: configuration['load']['schema'] = {'fields': schema_fields} diff --git a/airflow/contrib/operators/bigquery_operator.py b/airflow/contrib/operators/bigquery_operator.py index b0c0ce2d6e..025af034ad 100644 --- a/airflow/contrib/operators/bigquery_operator.py +++ b/airflow/contrib/operators/bigquery_operator.py @@ -100,6 +100,10 @@ class BigQueryOperator(BaseOperator): expiration as per API specifications. Note that 'field' is not available in conjunction with dataset.table$partition. :type time_partitioning: dict +:param cluster_fields: Request that the result of this query be stored sorted +by one or more columns. This is only available in conjunction with +time_partitioning. The order of columns given determines the sort order. +:type cluster_fields: list of str """ template_fields = ('bql', 'sql', 'destination_dataset_table', 'labels') @@
[GitHub] Fokko commented on issue #3838: [AIRFLOW-2997] Support for Bigquery clustered tables
Fokko commented on issue #3838: [AIRFLOW-2997] Support for Bigquery clustered tables URL: https://github.com/apache/incubator-airflow/pull/3838#issuecomment-419497532 Thanks @chronitis for the PR. Thanks @tswast @kaxil for reviewing!
[GitHub] Fokko closed pull request #3838: [AIRFLOW-2997] Support for Bigquery clustered tables
Fokko closed pull request #3838: [AIRFLOW-2997] Support for Bigquery clustered tables URL: https://github.com/apache/incubator-airflow/pull/3838 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/contrib/hooks/bigquery_hook.py b/airflow/contrib/hooks/bigquery_hook.py index 44ecd49e9e..245e3a8233 100644 --- a/airflow/contrib/hooks/bigquery_hook.py +++ b/airflow/contrib/hooks/bigquery_hook.py @@ -496,7 +496,8 @@ def run_query(self, schema_update_options=(), priority='INTERACTIVE', time_partitioning=None, - api_resource_configs=None): + api_resource_configs=None, + cluster_fields=None): """ Executes a BigQuery SQL query. Optionally persists results in a BigQuery table. See here: @@ -565,8 +566,12 @@ def run_query(self, expiration as per API specifications. Note that 'field' is not available in conjunction with dataset.table$partition. :type time_partitioning: dict - +:param cluster_fields: Request that the result of this query be stored sorted +by one or more columns. This is only available in combination with +time_partitioning. The order of columns given determines the sort order. 
+:type cluster_fields: list of str """ + if not api_resource_configs: api_resource_configs = self.api_resource_configs else: @@ -631,6 +636,9 @@ def run_query(self, 'tableId': destination_table, } +if cluster_fields: +cluster_fields = {'fields': cluster_fields} + query_param_list = [ (sql, 'query', None, str), (priority, 'priority', 'INTERACTIVE', str), @@ -641,7 +649,8 @@ def run_query(self, (maximum_bytes_billed, 'maximumBytesBilled', None, float), (time_partitioning, 'timePartitioning', {}, dict), (schema_update_options, 'schemaUpdateOptions', None, tuple), -(destination_dataset_table, 'destinationTable', None, dict) +(destination_dataset_table, 'destinationTable', None, dict), +(cluster_fields, 'clustering', None, dict), ] for param_tuple in query_param_list: @@ -856,7 +865,8 @@ def run_load(self, allow_jagged_rows=False, schema_update_options=(), src_fmt_configs=None, - time_partitioning=None): + time_partitioning=None, + cluster_fields=None): """ Executes a BigQuery load command to load data from Google Cloud Storage to BigQuery. See here: @@ -920,6 +930,10 @@ def run_load(self, expiration as per API specifications. Note that 'field' is not available in conjunction with dataset.table$partition. :type time_partitioning: dict +:param cluster_fields: Request that the result of this load be stored sorted +by one or more columns. This is only available in combination with +time_partitioning. The order of columns given determines the sort order. 
+:type cluster_fields: list of str """ # bigquery only allows certain source formats @@ -983,6 +997,9 @@ def run_load(self, 'timePartitioning': time_partitioning }) +if cluster_fields: +configuration['load'].update({'clustering': {'fields': cluster_fields}}) + if schema_fields: configuration['load']['schema'] = {'fields': schema_fields} diff --git a/airflow/contrib/operators/bigquery_operator.py b/airflow/contrib/operators/bigquery_operator.py index b0c0ce2d6e..025af034ad 100644 --- a/airflow/contrib/operators/bigquery_operator.py +++ b/airflow/contrib/operators/bigquery_operator.py @@ -100,6 +100,10 @@ class BigQueryOperator(BaseOperator): expiration as per API specifications. Note that 'field' is not available in conjunction with dataset.table$partition. :type time_partitioning: dict +:param cluster_fields: Request that the result of this query be stored sorted +by one or more columns. This is only available in conjunction with +time_partitioning. The order of columns given determines the sort order. +:type cluster_fields: list of str """ template_fields = ('bql', 'sql', 'destination_dataset_table', 'labels') @@ -127,6 +131,7 @@ def __init__(self, priority='INTERACTIVE', time_partitioning=None, api_resource_configs=None, + cluster_fields=None, *args,
[jira] [Closed] (AIRFLOW-2997) Support for clustered tables in Bigquery hooks/operators
[ https://issues.apache.org/jira/browse/AIRFLOW-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong closed AIRFLOW-2997. - Resolution: Fixed > Support for clustered tables in Bigquery hooks/operators > > > Key: AIRFLOW-2997 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2997 > Project: Apache Airflow > Issue Type: Improvement > Components: gcp >Reporter: Gordon Ball >Priority: Minor > Fix For: 1.10.1 > > > Bigquery support for clustered tables was added (at GCP "Beta" level) on > 2018-07-30. This feature allows load or table-creating query operations to > request that data be stored sorted by a subset of columns, allowing more > efficient (and potentially cheaper) subsequent queries. > Support for specifying fields to cluster on should be added to at least the > bigquery hook, load-from-GCS operator and query operator. > Documentation: https://cloud.google.com/bigquery/docs/clustered-tables -- This message was sent by Atlassian JIRA (v7.6.3#76005)
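The job-configuration shape the merged change produces can be sketched as follows. This is a hedged reconstruction of how `cluster_fields` is folded into a BigQuery job config — mirroring the `{'clustering': {'fields': [...]}}` wrapping shown in the diff — with a hypothetical `build_load_config` helper, not the hook's actual code:

```python
def build_load_config(destination_table, time_partitioning=None,
                      cluster_fields=None):
    """Assemble a (partial) BigQuery load-job configuration dict."""
    config = {'load': {'destinationTable': destination_table}}
    if time_partitioning:
        config['load']['timePartitioning'] = time_partitioning
    if cluster_fields:
        # Clustering is only available together with time partitioning,
        # and is expressed as {'fields': [...]} under the 'clustering' key.
        config['load']['clustering'] = {'fields': cluster_fields}
    return config


cfg = build_load_config(
    {'projectId': 'my-project', 'datasetId': 'my_dataset',
     'tableId': 'my_table'},
    time_partitioning={'type': 'DAY'},
    cluster_fields=['country', 'city'])
```

The order of `cluster_fields` matters: it determines the sort order of the clustered table.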
[jira] [Created] (AIRFLOW-3026) running initdb with default parameters will cause unhandled exception in models.py
Shawn Dion created AIRFLOW-3026: --- Summary: running initdb with default parameters will cause unhandled exception in models.py Key: AIRFLOW-3026 URL: https://issues.apache.org/jira/browse/AIRFLOW-3026 Project: Apache Airflow Issue Type: Bug Components: models Affects Versions: 1.10.0, 1.9.0 Reporter: Shawn Dion the default fernet key is: fernet_key = cryptography_not_found_storing_passwords_in_plain_text when running airflow initdb: {code:java} File "/home/usr/.local/lib/python3.6/site-packages/airflow/models.py", line 160, in get_fernet _fernet = Fernet(configuration.conf.get('core', 'FERNET_KEY').encode('utf-8')) File "/usr/local/lib64/python3.6/site-packages/cryptography/fernet.py", line 34, in __init__ key = base64.urlsafe_b64decode(key) File "/usr/lib64/python3.6/base64.py", line 133, in urlsafe_b64decode return b64decode(s) File "/usr/lib64/python3.6/base64.py", line 87, in b64decode return binascii.a2b_base64(s) binascii.Error: Incorrect padding {code} TypeError and ValueError were handled in V1.9 but this error, binascii.Error, is not handled correctly / at all.
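The failure is reproducible with the standard library alone: the placeholder default is not valid urlsafe base64 (its length is not a multiple of 4), so decoding raises `binascii.Error` before `Fernet` ever sees a key. A sketch of the reproduction and of the wider exception handling the report asks for — `load_fernet_key` is a hypothetical helper, not Airflow's code:

```python
import base64
import binascii

PLACEHOLDER = b'cryptography_not_found_storing_passwords_in_plain_text'


def load_fernet_key(raw_key):
    """Return the decoded key bytes, or None if the value is not valid base64."""
    try:
        return base64.urlsafe_b64decode(raw_key)
    except (TypeError, ValueError, binascii.Error):
        # Catch binascii.Error explicitly so the placeholder default is
        # handled even on interpreters where it is not a ValueError subclass.
        return None


# The placeholder fails with "Incorrect padding" and is turned into None:
assert load_fernet_key(PLACEHOLDER) is None

# A properly encoded 32-byte key round-trips cleanly:
valid = base64.urlsafe_b64encode(b'\x00' * 32)
decoded = load_fernet_key(valid)
```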
[GitHub] XD-DENG opened a new pull request #3863: [AIRFLOW-XXX] Refine web UI authentication-related docs
XD-DENG opened a new pull request #3863: [AIRFLOW-XXX] Refine web UI authentication-related docs URL: https://github.com/apache/incubator-airflow/pull/3863 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-XXX - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: In Airflow 1.10 we already provide the older web UI and the new FAB-based UI at the same time, but the documentation does not differentiate between them well. For example, - the doc https://airflow.apache.org/security.html#password applies only to the old web UI, but this is not highlighted. - the command line tool `create_user` is only for the new FAB-based UI, which is not highlighted either. This may be confusing to users, especially given not everyone is aware of the existence of two UIs. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: Doc change only ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1.
Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-1917) print() from python operators end up with extra new line
[ https://issues.apache.org/jira/browse/AIRFLOW-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607102#comment-16607102 ] ASF GitHub Bot commented on AIRFLOW-1917: - ckljohn opened a new pull request #3862: [AIRFLOW-1917] Remove extra newline character from log URL: https://github.com/apache/incubator-airflow/pull/3862 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-1917 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: Remove extra newline character from log. ### Tests - [X] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: Trivial change. ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
> print() from python operators end up with extra new line > > > Key: AIRFLOW-1917 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1917 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: 1.9.0 >Reporter: Ash Berlin-Taylor >Priority: Major > Fix For: 1.10.1 > > > If I have the following as the callable for a PythonOperator: > {code} > def print_stuff(ti, **kwargs): > print("Hi from", __file__) > print("Hi 2 from", __file__) > {code} > I see the following in the log files > {noformat} > [2017-12-13 10:45:53,901] {logging_mixin.py:84} INFO - Hi from > /usr/local/airflow/dags/example/csv_to_parquet.py > [2017-12-13 10:45:53,902] {logging_mixin.py:84} INFO - Hi 2 from > /usr/local/airflow/dags/example/csv_to_parquet.py > [2017-12-13 10:45:53,905] {base_task_runner.py:98} INFO - Subtask: > [2017-12-13 10:45:53,904] {python_operator.py:90} INFO - Done. Returned value > was: None > {noformat} > Note the extra blank lines.
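The blank lines arise because `print()` writes its message and the trailing `"\n"` as separate `write()` calls on the redirected stream, while the logging handler already appends its own newline. A minimal sketch of the fix (a stand-in for Airflow's `LoggingMixin` stream redirection, not the actual class):

```python
import logging

class StreamToLogger:
    """File-like object that forwards print() output to a logger,
    dropping trailing newlines so no empty log records are emitted."""

    def __init__(self, logger, level=logging.INFO):
        self.logger = logger
        self.level = level

    def write(self, message):
        # print("x") calls write("x") and then write("\n"); logging adds
        # its own newline, so strip it and skip whitespace-only writes.
        message = message.rstrip("\n")
        if message:
            self.logger.log(self.level, message)

    def flush(self):
        # Nothing is buffered, but file-like objects need a flush().
        pass
```

With `sys.stdout` replaced by such an object, each `print()` produces exactly one log record and no empty follow-up line.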
[GitHub] XD-DENG closed pull request #3861: [AIRFLOW-XXX] Fix doc of command line interface
XD-DENG closed pull request #3861: [AIRFLOW-XXX] Fix doc of command line interface URL: https://github.com/apache/incubator-airflow/pull/3861 This is a PR merged from a forked repository; as GitHub hides the original diff on merge, it is reproduced below for provenance: diff --git a/airflow/bin/cli.py b/airflow/bin/cli.py index 4ff1ae3679..4fa0f3a5ea 100644 --- a/airflow/bin/cli.py +++ b/airflow/bin/cli.py @@ -1457,7 +1457,7 @@ class CLIFactory(object): 'subdir': Arg( ("-sd", "--subdir"), "File location or directory from which to look for the dag", -default=settings.DAGS_FOLDER), +default="[AIRFLOW_HOME]/dags"), 'start_date': Arg( ("-s", "--start_date"), "Override start_date YYYY-MM-DD", type=parsedate), @@ -1874,7 +1874,7 @@ class CLIFactory(object): "If reset_dag_run option is used," " backfill will first prompt users whether airflow " "should clear all the previous dag_run and task_instances " -"within the backfill date range." +"within the backfill date range. " "If rerun_failed_tasks is used, backfill " "will auto re-run the previous failed task instances" " within the backfill date range.",
[GitHub] XD-DENG commented on issue #3861: [AIRFLOW-XXX] Fix doc of command line interface
XD-DENG commented on issue #3861: [AIRFLOW-XXX] Fix doc of command line interface URL: https://github.com/apache/incubator-airflow/pull/3861#issuecomment-419405297 Thanks @kaxil for checking further. I'll close this PR then. Regarding **issue 2** (the missing space), could you fix that in your new PR as well? Thanks!
[GitHub] kaxil commented on issue #3861: [AIRFLOW-XXX] Fix doc of command line interface
kaxil commented on issue #3861: [AIRFLOW-XXX] Fix doc of command line interface URL: https://github.com/apache/incubator-airflow/pull/3861#issuecomment-419398764 Good catch @XD-DENG. I wonder how long this issue went unnoticed. We have this issue in the **1.9.0** and **1.8.2** docs as well: 1.9.0: https://airflow.readthedocs.io/en/1.9.0/cli.html#Named%20Arguments_repeat4 1.8.2: https://airflow-fork-k1.readthedocs.io/en/v1-8-stable/cli.html#Named%20Arguments_repeat1 ![image](https://user-images.githubusercontent.com/8811558/45214222-c6327e00-b291-11e8-8a8f-15bd65137d6e.png) I fear we have had this bug since the start. @ashb you are right, changing this will cause issues. I will create a new PR to take care of this.
[GitHub] XD-DENG commented on a change in pull request #3861: [AIRFLOW-XXX] Fix doc of command line interface
XD-DENG commented on a change in pull request #3861: [AIRFLOW-XXX] Fix doc of command line interface URL: https://github.com/apache/incubator-airflow/pull/3861#discussion_r215888710 ## File path: airflow/bin/cli.py ## @@ -1457,7 +1457,7 @@ class CLIFactory(object): 'subdir': Arg( ("-sd", "--subdir"), "File location or directory from which to look for the dag", -default=settings.DAGS_FOLDER), +default="[AIRFLOW_HOME]/dags"), Review comment: You're right, I didn't realize this. I'll leave it to @kaxil to decide how he would like to fix this (please feel free to close this PR if you're going to fix it separately). But we do need to fix this: `Default: /Users/kaxil/airflow/dags` in the documentation is very confusing and misleading to readers.
[GitHub] chronitis commented on issue #3838: [AIRFLOW-2997] Support for Bigquery clustered tables
chronitis commented on issue #3838: [AIRFLOW-2997] Support for Bigquery clustered tables URL: https://github.com/apache/incubator-airflow/pull/3838#issuecomment-419369853 @kaxil squashed into a single commit. @tswast removed the client-side validation logic. There appears to be a transient failure on `py27-backend_mysql`; all the other database/python variants succeeded.
[GitHub] ashb commented on a change in pull request #3861: [AIRFLOW-XXX] Fix doc of command line interface
ashb commented on a change in pull request #3861: [AIRFLOW-XXX] Fix doc of command line interface URL: https://github.com/apache/incubator-airflow/pull/3861#discussion_r215887440 ## File path: airflow/bin/cli.py ## @@ -1457,7 +1457,7 @@ class CLIFactory(object): 'subdir': Arg( ("-sd", "--subdir"), "File location or directory from which to look for the dag", -default=settings.DAGS_FOLDER), +default="[AIRFLOW_HOME]/dags"), Review comment: Unfortunately, changing this default actually changes the value that ends up in `args.subdir` - so this change would break a lot of things. It might be possible to change `$AIRFLOW_HOME` when rendering the docs, or to change how this param is displayed. @kaxil How are the docs built when we render them? Would it maybe be worth setting something like `export AIRFLOW_HOME=\$AIRFLOW_HOME` so that these defaults make sense? (That might well break other things.)
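The tension ashb describes can be reproduced with plain `argparse`: a default computed at import time (the doc builder's real path) leaks into generated docs, while replacing it with a display string changes `args.subdir`. One way out, sketched below with illustrative names and similar in spirit to the placeholder-plus-expansion approach Airflow eventually adopted in `cli.py`, is to declare the placeholder as the default and expand it only when the value is used:

```python
import argparse
import os

AIRFLOW_HOME = os.environ.get("AIRFLOW_HOME", os.path.expanduser("~/airflow"))
DAGS_PLACEHOLDER = "[AIRFLOW_HOME]/dags"

parser = argparse.ArgumentParser(prog="airflow")
parser.add_argument(
    "-sd", "--subdir",
    # Declaring the placeholder string as the default means generated docs
    # show a stable "[AIRFLOW_HOME]/dags" instead of a machine-specific path.
    default=DAGS_PLACEHOLDER,
    help="File location or directory from which to look for the dag",
)

def process_subdir(subdir):
    """Expand the placeholder into the real path at use time."""
    if subdir:
        subdir = subdir.replace(DAGS_PLACEHOLDER,
                                os.path.join(AIRFLOW_HOME, "dags"))
        subdir = os.path.abspath(os.path.expanduser(subdir))
    return subdir
```

Every consumer of `args.subdir` must then call `process_subdir`, which is the cost of keeping the documented default readable.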
[GitHub] XD-DENG opened a new pull request #3861: [AIRFLOW-XXX] Fix doc of command line interface
XD-DENG opened a new pull request #3861: [AIRFLOW-XXX] Fix doc of command line interface URL: https://github.com/apache/incubator-airflow/pull/3861 Make sure you have checked _all_ steps below. ### Jira - [X] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-XXX - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [X] Here are some details about my PR, including screenshots of any UI changes: There are two issues in the documentation of the command line interface. **Issue 1** https://user-images.githubusercontent.com/11539188/45207953-01e23d80-b2bc-11e8-92a7-d08e7d5c54bf.png The default value of `--subdir` is generated dynamically, so the DAGs folder of the machine that compiled the docs ended up in the final documentation (this appeared 14 times in https://airflow.apache.org/cli.html). I believe this should be "hardcoded" as `[AIRFLOW_HOME]/dags`. **Issue 2** A minor typo (a space is missing). https://user-images.githubusercontent.com/11539188/45208164-89c84780-b2bc-11e8-9ae9-4f72f48e8b37.png ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: Only doc is changed ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1.
Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
[jira] [Commented] (AIRFLOW-3025) Allow to specify dns and dns-search parameters for DockerOperator
[ https://issues.apache.org/jira/browse/AIRFLOW-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606815#comment-16606815 ] Konrad Gołuchowski commented on AIRFLOW-3025: - Added PR here: https://github.com/apache/incubator-airflow/pull/3860 > Allow to specify dns and dns-search parameters for DockerOperator > - > > Key: AIRFLOW-3025 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3025 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Konrad Gołuchowski >Assignee: Konrad Gołuchowski >Priority: Minor > > Docker allows to specify dns and dns-search options when starting a > container. It would be useful to enable DockerOperator to use these two > options.
[GitHub] kodieg opened a new pull request #3860: [AIRFLOW-3025] Enable specifying dns and dns_search options for DockerOperator
kodieg opened a new pull request #3860: [AIRFLOW-3025] Enable specifying dns and dns_search options for DockerOperator URL: https://github.com/apache/incubator-airflow/pull/3860 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-3025 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [ ] Here are some details about my PR, including screenshots of any UI changes: ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
[jira] [Created] (AIRFLOW-3025) Allow to specify dns and dns-search parameters for DockerOperator
Konrad Gołuchowski created AIRFLOW-3025: --- Summary: Allow to specify dns and dns-search parameters for DockerOperator Key: AIRFLOW-3025 URL: https://issues.apache.org/jira/browse/AIRFLOW-3025 Project: Apache Airflow Issue Type: Improvement Reporter: Konrad Gołuchowski Assignee: Konrad Gołuchowski Docker allows to specify dns and dns-search options when starting a container. It would be useful to enable DockerOperator to use these two options.
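docker-py already exposes both options: `APIClient.create_host_config` accepts `dns` (a list of nameservers) and `dns_search` (a list of search domains), so the operator mainly needs to thread the new arguments through. A sketch of that plumbing (the helper function is illustrative; only the docker-py parameter names are real):

```python
def build_host_config_kwargs(dns=None, dns_search=None, **extra):
    """Collect optional DNS settings into the kwargs a DockerOperator
    would forward to docker-py's APIClient.create_host_config()."""
    kwargs = dict(extra)
    if dns is not None:
        kwargs["dns"] = dns                # e.g. ["8.8.8.8"]
    if dns_search is not None:
        kwargs["dns_search"] = dns_search  # e.g. ["example.com"]
    return kwargs
```

Passing the options only when set keeps the container's default DNS behaviour unchanged for existing users of the operator.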