[jira] [Commented] (AIRFLOW-1006) Move configuration templates to separate files
[ https://issues.apache.org/jira/browse/AIRFLOW-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15934050#comment-15934050 ] ASF subversion and git services commented on AIRFLOW-1006: -- Commit b586bd6123ff43b0e1a885489f06dc965d22e705 in incubator-airflow's branch refs/heads/master from [~jlowin] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=b586bd6 ] [AIRFLOW-1006] Add config_templates to MANIFEST Without this line, the config templates are not included when Airflow is installed Closes #2173 from jlowin/speedup-2 > Move configuration templates to separate files > -- > > Key: AIRFLOW-1006 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1006 > Project: Apache Airflow > Issue Type: Improvement > Components: configuration >Affects Versions: Airflow 1.8 >Reporter: Jeremiah Lowin >Assignee: Jeremiah Lowin >Priority: Minor > Fix For: 1.9.0 > > > Currently both the default and test configuration templates are just strings > inside configuration.py. This makes them difficult to work with. It would be > much better to expose them as separate files, "default_airflow.cfg" and > "default_test.cfg", to make it clear they are distinct config templates. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
incubator-airflow git commit: [AIRFLOW-1006] Add config_templates to MANIFEST
Repository: incubator-airflow Updated Branches: refs/heads/master 8de850162 -> b586bd612 [AIRFLOW-1006] Add config_templates to MANIFEST Without this line, the config templates are not included when Airflow is installed Closes #2173 from jlowin/speedup-2 Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/b586bd61 Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/b586bd61 Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/b586bd61 Branch: refs/heads/master Commit: b586bd6123ff43b0e1a885489f06dc965d22e705 Parents: 8de8501 Author: Jeremiah Lowin Authored: Mon Mar 20 23:48:08 2017 -0400 Committer: Jeremiah Lowin Committed: Mon Mar 20 23:48:08 2017 -0400 -- MANIFEST.in | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/b586bd61/MANIFEST.in -- diff --git a/MANIFEST.in b/MANIFEST.in index 717b077..69ccafe 100644 --- a/MANIFEST.in +++ b/MANIFEST.in @@ -21,4 +21,4 @@ graft airflow/www/static include airflow/alembic.ini graft scripts/systemd graft scripts/upstart - +graft airflow/config_templates
[jira] [Resolved] (AIRFLOW-1006) Move configuration templates to separate files
[ https://issues.apache.org/jira/browse/AIRFLOW-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremiah Lowin resolved AIRFLOW-1006. - Resolution: Fixed Issue resolved by pull request #2173 [https://github.com/apache/incubator-airflow/pull/2173] > Move configuration templates to separate files > -- > > Key: AIRFLOW-1006 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1006 > Project: Apache Airflow > Issue Type: Improvement > Components: configuration >Affects Versions: Airflow 1.8 >Reporter: Jeremiah Lowin >Assignee: Jeremiah Lowin >Priority: Minor > Fix For: 1.9.0 > > > Currently both the default and test configuration templates are just strings > inside configuration.py. This makes them difficult to work with. It would be > much better to expose them as separate files, "default_airflow.cfg" and > "default_test.cfg", to make it clear they are distinct config templates. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (AIRFLOW-1019) active_dagruns shouldn't include paused DAGs
Dan Davydov created AIRFLOW-1019: Summary: active_dagruns shouldn't include paused DAGs Key: AIRFLOW-1019 URL: https://issues.apache.org/jira/browse/AIRFLOW-1019 Project: Apache Airflow Issue Type: Bug Reporter: Dan Davydov Priority: Critical Since 1.8.0 Airflow resets orphaned tasks (tasks that are in the DB but not in the executor's memory). The problem is that Airflow counts dagruns in paused DAGs as running as long as the dagrun's state is running. Instead we should join against non-paused DAGs everywhere we calculate active dagruns (e.g. in _process_task_instances in the Scheduler class in jobs.py). If there are enough paused DAGs, this brings the scheduler to a halt, especially on scheduler restarts.
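The join proposed above can be sketched with plain SQL. The sqlite schema below is a minimal stand-in for Airflow's dag and dag_run tables (illustrative only, not the actual patch; in the scheduler this would be a SQLAlchemy join against DagModel, but the predicate is the same):

```python
import sqlite3

# Minimal stand-in for Airflow's dag / dag_run tables (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dag (dag_id TEXT PRIMARY KEY, is_paused INTEGER);
    CREATE TABLE dag_run (dag_id TEXT, state TEXT);
    INSERT INTO dag VALUES ('etl', 0), ('old_report', 1);
    INSERT INTO dag_run VALUES ('etl', 'running'), ('old_report', 'running');
""")

# Current behavior: the run of the paused DAG still counts as active.
naive = conn.execute(
    "SELECT COUNT(*) FROM dag_run WHERE state = 'running'").fetchone()[0]

# Proposed behavior: join against non-paused DAGs when counting active runs.
fixed = conn.execute("""
    SELECT COUNT(*) FROM dag_run dr
    JOIN dag d ON d.dag_id = dr.dag_id
    WHERE dr.state = 'running' AND d.is_paused = 0
""").fetchone()[0]

print(naive, fixed)  # 2 1
```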
[incubator-airflow] Git Push Summary
Repository: incubator-airflow Updated Tags: refs/tags/1.8.0-docs [created] a86c7674d
[1/2] incubator-airflow-site git commit: v1.8.0
Repository: incubator-airflow-site Updated Branches: refs/heads/asf-site 5e5740122 -> 69cff4922 http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/69cff492/searchindex.js -- diff --git a/searchindex.js b/searchindex.js index 625cb26..121f5a8 100644 [diff of searchindex.js omitted: the regenerated, minified Sphinx search index, truncated in the archive]
[2/2] incubator-airflow-site git commit: v1.8.0
v1.8.0 Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/commit/69cff492 Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/tree/69cff492 Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/diff/69cff492 Branch: refs/heads/asf-site Commit: 69cff49228f32d88fcdf3e92808ad51ab3438d8d Parents: 5e57401 Author: Maxime Beauchemin Authored: Mon Mar 20 17:04:37 2017 -0700 Committer: Maxime Beauchemin Committed: Mon Mar 20 17:04:37 2017 -0700 -- .../contrib/executors/mesos_executor.html | 2 +- _modules/airflow/models.html| 134 --- _modules/airflow/operators/sensors.html | 2 +- _modules/mysql_hook.html| 15 +-- _modules/mysql_operator.html| 8 +- _modules/sensors.html | 2 +- _sources/concepts.rst.txt | 3 +- _sources/configuration.rst.txt | 35 - code.html | 27 ++-- concepts.html | 3 +- configuration.html | 28 genindex.html | 2 + index.html | 1 - objects.inv | Bin 2147 -> 2159 bytes searchindex.js | 2 +- 15 files changed, 123 insertions(+), 141 deletions(-) -- http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/69cff492/_modules/airflow/contrib/executors/mesos_executor.html -- diff --git a/_modules/airflow/contrib/executors/mesos_executor.html b/_modules/airflow/contrib/executors/mesos_executor.html index dbca37d..311d1a8 100644 --- a/_modules/airflow/contrib/executors/mesos_executor.html +++ b/_modules/airflow/contrib/executors/mesos_executor.html @@ -331,7 +331,7 @@ except KeyError: # The map may not contain an item if the framework re-registered after a failover. # Discard these tasks.
-logging.warning("Unrecognised task key %s" % update.task_id.value) +logging.warn("Unrecognised task key %s" % update.task_id.value) return if update.state == mesos_pb2.TASK_FINISHED: http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/69cff492/_modules/airflow/models.html -- diff --git a/_modules/airflow/models.html b/_modules/airflow/models.html index a94686b..0b043ea 100644 --- a/_modules/airflow/models.html +++ b/_modules/airflow/models.html @@ -632,7 +632,7 @@ def paused_dags(self): session = settings.Session() dag_ids = [dp.dag_id for dp in session.query(DagModel).filter( -DagModel.is_paused.__eq__(True))] +DagModel.is_paused.is_(True))] session.commit() session.close() return dag_ids @@ -1161,6 +1161,7 @@ self.end_date = ti.end_date self.try_number = ti.try_number self.hostname = ti.hostname +self.pid = ti.pid else: self.state = None @@ -1452,19 +1453,20 @@ verbose=True) if not runnable and not mark_success: -if self.state != State.QUEUED: -# If a task's dependencies are met but it can't be run yet then queue it -# instead -self.state = State.QUEUED -msg = "Queuing attempt {attempt} of {total}".format( -attempt=self.try_number % (task.retries + 1) + 1, -total=task.retries + 1) -logging.info(hr + msg + hr) - -self.queued_dttm = datetime.now() -msg = "Queuing into pool {}".format(self.pool) -logging.info(msg) -session.merge(self) +# FIXME: we might have hit concurrency limits, which means we probably +# have been running prematurely. This should be handled in the +# scheduling mechanism. +self.state = State.NONE +msg = ("FIXME: Rescheduling due to concurrency limits reached at task" + " runtime. Attempt {attempt} of {total}. State set to NONE.").format( +attempt=self.try_number % (task.retries + 1) + 1, +total=task.retries + 1) +logging.warning(hr + msg + hr) + +self.queued_dttm = datetime.now() +msg = "Queuing into pool {}".format(self.pool) +logging.info(msg) +session.merge(self)
[jira] [Created] (AIRFLOW-1018) Scheduler DAG processes can not log to stdout
Vincent Poulain created AIRFLOW-1018: Summary: Scheduler DAG processes can not log to stdout Key: AIRFLOW-1018 URL: https://issues.apache.org/jira/browse/AIRFLOW-1018 Project: Apache Airflow Issue Type: Bug Affects Versions: Airflow 1.8 Environment: Airflow 1.8.0 Reporter: Vincent Poulain Each DAG has its own log file for the scheduler, and we can specify the directory with the child_process_log_directory param. Unfortunately we cannot point it at a device by specifying /dev/stdout, for example. That is very useful when we execute Airflow in a container. When we specify /dev/stdout it raises: "OSError: [Errno 20] Not a directory: '/dev/stdout/2017-03-19'"
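Until the scheduler supports stream targets directly, the shape of a workaround is to special-case stream paths before building the per-date subdirectory. The function name and parameters below are illustrative, not Airflow's actual API:

```python
import logging
import os
import sys

def child_log_handler(log_directory, date_str, filename):
    """Return a log handler for a scheduler child process.

    Hypothetical sketch: if the configured directory is really a stream
    (/dev/stdout or '-'), log to the stream directly instead of trying to
    create '<log_directory>/<date_str>/', which is what triggers the
    "[Errno 20] Not a directory" error reported above.
    """
    if log_directory in ("-", "/dev/stdout"):
        return logging.StreamHandler(sys.stdout)
    target_dir = os.path.join(log_directory, date_str)
    os.makedirs(target_dir, exist_ok=True)
    return logging.FileHandler(os.path.join(target_dir, filename))
```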
[jira] [Created] (AIRFLOW-1017) get_task_instance should return None instead of throwing an exception for non-existent TIs
Dan Davydov created AIRFLOW-1017: Summary: get_task_instance should return None instead of throwing an exception for non-existent TIs Key: AIRFLOW-1017 URL: https://issues.apache.org/jira/browse/AIRFLOW-1017 Project: Apache Airflow Issue Type: Bug Reporter: Dan Davydov Assignee: Dan Davydov Priority: Critical We were seeing errors in our scheduler like the following due to this issue: File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2796, in one raise orm_exc.NoResultFound("No row was found for one()")
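The fix amounts to looking the row up with something that yields None for a missing row (SQLAlchemy's .first(), or .one_or_none() in later versions) rather than .one(), which raises NoResultFound. A dependency-free sketch of the intended contract:

```python
def get_task_instance(task_instances, task_id, execution_date):
    """Return the matching task instance, or None if it does not exist.

    Pure-Python illustration of the proposed contract: callers get None
    for a non-existent TI instead of having to catch NoResultFound.
    """
    for ti in task_instances:
        if ti["task_id"] == task_id and ti["execution_date"] == execution_date:
            return ti
    return None
```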
[jira] [Created] (AIRFLOW-1016) Allow HTTP HEAD request method on HTTPSensor
msempere created AIRFLOW-1016: - Summary: Allow HTTP HEAD request method on HTTPSensor Key: AIRFLOW-1016 URL: https://issues.apache.org/jira/browse/AIRFLOW-1016 Project: Apache Airflow Issue Type: Improvement Reporter: msempere Assignee: msempere Priority: Minor HTTPSensor hardcodes the HTTP request method to `GET`, but there are cases where the `HEAD` method is needed to act as a sensor. This is useful when we just need to retrieve some metadata, not the complete body of that particular request, and that metadata is enough for our sensor.
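What a parameterized check could look like, sketched with only the standard library (names and signatures below are illustrative, not HTTPSensor's real interface):

```python
import urllib.error
import urllib.request

def build_sensor_request(url, method="GET"):
    """Build the HTTP request; 'method' is the knob the ticket asks for."""
    return urllib.request.Request(url, method=method)

def head_poke(url, timeout=5):
    """Poke with HEAD: any 2xx status is success, and no body is fetched."""
    try:
        with urllib.request.urlopen(build_sensor_request(url, "HEAD"),
                                    timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except urllib.error.URLError:
        return False
```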
[jira] [Updated] (AIRFLOW-983) Make trigger rules more explicit regarding success vs skipped
[ https://issues.apache.org/jira/browse/AIRFLOW-983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dan Davydov updated AIRFLOW-983: Priority: Critical (was: Blocker) > Make trigger rules more explicit regarding success vs skipped > - > > Key: AIRFLOW-983 > URL: https://issues.apache.org/jira/browse/AIRFLOW-983 > Project: Apache Airflow > Issue Type: Improvement > Components: dependencies >Reporter: Daniel Huang >Priority: Critical > > Since AIRFLOW-719, the trigger rules all_success/one_success include both > success and skipped states. > We should probably make ALL_SUCCESS strictly success (again) and add a > separate ALL_SUCCESS_OR_SKIPPED/ALL_FAILED_OR_SKIPPED. ALL_SUCCESS_OR_SKIPPED > may be a more appropriate default trigger rule as well. Otherwise, we need to > note in LatestOnly/ShortCircuit/Branch operators of the appropriate trigger > rule to use there. > Some previous discussion in > https://github.com/apache/incubator-airflow/pull/1961 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
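The distinction the ticket draws can be pinned down in a few lines; the function names below follow the proposed rule names, not Airflow's actual TriggerRule constants:

```python
def all_success_strict(upstream_states):
    """ALL_SUCCESS as proposed: skipped upstreams do NOT satisfy the rule."""
    return all(s == "success" for s in upstream_states)

def all_success_or_skipped(upstream_states):
    """The proposed separate rule: skipped upstreams also satisfy it."""
    return all(s in ("success", "skipped") for s in upstream_states)

def one_success(upstream_states):
    """At least one upstream finished with an actual success."""
    return any(s == "success" for s in upstream_states)
```

Under these definitions, tasks downstream of LatestOnly/ShortCircuit/Branch operators would need all_success_or_skipped to keep their current behavior.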
[jira] [Updated] (AIRFLOW-983) Make trigger rules more explicit regarding success vs skipped
[ https://issues.apache.org/jira/browse/AIRFLOW-983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dan Davydov updated AIRFLOW-983: Priority: Blocker (was: Major) > Make trigger rules more explicit regarding success vs skipped > - > > Key: AIRFLOW-983 > URL: https://issues.apache.org/jira/browse/AIRFLOW-983 > Project: Apache Airflow > Issue Type: Improvement > Components: dependencies >Reporter: Daniel Huang >Priority: Blocker > > Since AIRFLOW-719, the trigger rules all_success/one_success include both > success and skipped states. > We should probably make ALL_SUCCESS strictly success (again) and add a > separate ALL_SUCCESS_OR_SKIPPED/ALL_FAILED_OR_SKIPPED. ALL_SUCCESS_OR_SKIPPED > may be a more appropriate default trigger rule as well. Otherwise, we need to > note in LatestOnly/ShortCircuit/Branch operators of the appropriate trigger > rule to use there. > Some previous discussion in > https://github.com/apache/incubator-airflow/pull/1961 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (AIRFLOW-1015) TreeView displayed over task instances
Ruslan Dautkhanov created AIRFLOW-1015: -- Summary: TreeView displayed over task instances Key: AIRFLOW-1015 URL: https://issues.apache.org/jira/browse/AIRFLOW-1015 Project: Apache Airflow Issue Type: Bug Components: ui Affects Versions: Airflow 1.8 Reporter: Ruslan Dautkhanov Attachments: TreeView-bug.png See screenshot: !TreeView-bug.png! It would be nice if the first TI's horizontal offset automatically shifted to the right, depending on how many tasks are in the longest branch to the left.
[jira] [Resolved] (AIRFLOW-999) Support for Redis database
[ https://issues.apache.org/jira/browse/AIRFLOW-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Wiedmer resolved AIRFLOW-999. Resolution: Fixed Fix Version/s: 1.9.0 Issue resolved by pull request #2165 [https://github.com/apache/incubator-airflow/pull/2165] > Support for Redis database > -- > > Key: AIRFLOW-999 > URL: https://issues.apache.org/jira/browse/AIRFLOW-999 > Project: Apache Airflow > Issue Type: Improvement > Components: db >Reporter: msempere >Assignee: msempere >Priority: Minor > Labels: features > Fix For: 1.9.0 > > > Currently Airflow doesn't offer support for Redis DB. > The idea is to create a Hook to connect to it and offer a minimal > functionality. > So the proposal is to create a sensor that monitor for a Redis key existence. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
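The merged change adds a redis_hook and a redis_key_sensor; the essence of the key-existence check can be sketched without a Redis server (class names below are illustrative, not the merged code — the real sensor goes through RedisHook to a redis.StrictRedis client):

```python
class RedisKeySensorSketch:
    """poke() returns True once the watched key exists in Redis.

    'client' is anything exposing exists(key); in the merged code this is
    a redis.StrictRedis connection obtained via the hook.
    """

    def __init__(self, client, key):
        self.client = client
        self.key = key

    def poke(self):
        return bool(self.client.exists(self.key))


# Dependency-free demo: any object with exists(key) can stand in for Redis.
class _FakeRedis:
    def __init__(self, data):
        self.data = data

    def exists(self, key):
        return key in self.data

ready = RedisKeySensorSketch(_FakeRedis({"job:done": "1"}), "job:done").poke()
missing = RedisKeySensorSketch(_FakeRedis({}), "job:done").poke()
print(ready, missing)  # True False
```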
[jira] [Commented] (AIRFLOW-1) Migrate GitHub code to Apache git
[ https://issues.apache.org/jira/browse/AIRFLOW-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933224#comment-15933224 ] ASF subversion and git services commented on AIRFLOW-1: --- Commit 8de85016265443987a0e0fff406e996d421dc9d6 in incubator-airflow's branch refs/heads/master from [~msempere] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=8de8501 ] [AIRFLOW-999] Add support for Redis database This PR includes a redis_hook and a redis_key_sensor to enable checking for key existence in redis. It also updates the documentation and add the relevant unit tests. - [x] Opened a PR on Github - [x] My PR addresses the following Airflow JIRA issues: - https://issues.apache.org/jira/browse/AIRFLOW-999 - [x] The PR title references the JIRA issues. For example, "[AIRFLOW-1] My Airflow PR" - [x] My PR adds unit tests - [ ] __OR__ my PR does not need testing for this extremely good reason: - [x] Here are some details about my PR: - [ ] Here are screenshots of any UI changes, if appropriate: - [x] Each commit subject references a JIRA issue. For example, "[AIRFLOW-1] Add new feature" - [x] Multiple commits addressing the same JIRA issue have been squashed - [x] My commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git- commit/)": 1. Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. Body explains "what" and "why", not "how" Closes #2165 from msempere/AIRFLOW-999/support- for-redis-database > Migrate GitHub code to Apache git > - > > Key: AIRFLOW-1 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1 > Project: Apache Airflow > Issue Type: Improvement > Components: project-management >Reporter: Maxime Beauchemin >Assignee: Maxime Beauchemin > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (AIRFLOW-999) Support for Redis database
[ https://issues.apache.org/jira/browse/AIRFLOW-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933222#comment-15933222 ] ASF subversion and git services commented on AIRFLOW-999: - Commit 8de85016265443987a0e0fff406e996d421dc9d6 in incubator-airflow's branch refs/heads/master from [~msempere] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=8de8501 ] [AIRFLOW-999] Add support for Redis database This PR includes a redis_hook and a redis_key_sensor to enable checking for key existence in redis. It also updates the documentation and add the relevant unit tests. - [x] Opened a PR on Github - [x] My PR addresses the following Airflow JIRA issues: - https://issues.apache.org/jira/browse/AIRFLOW-999 - [x] The PR title references the JIRA issues. For example, "[AIRFLOW-1] My Airflow PR" - [x] My PR adds unit tests - [ ] __OR__ my PR does not need testing for this extremely good reason: - [x] Here are some details about my PR: - [ ] Here are screenshots of any UI changes, if appropriate: - [x] Each commit subject references a JIRA issue. For example, "[AIRFLOW-1] Add new feature" - [x] Multiple commits addressing the same JIRA issue have been squashed - [x] My commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git- commit/)": 1. Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. Body explains "what" and "why", not "how" Closes #2165 from msempere/AIRFLOW-999/support- for-redis-database > Support for Redis database > -- > > Key: AIRFLOW-999 > URL: https://issues.apache.org/jira/browse/AIRFLOW-999 > Project: Apache Airflow > Issue Type: Improvement > Components: db >Reporter: msempere >Assignee: msempere >Priority: Minor > Labels: features > > Currently Airflow doesn't offer support for Redis DB. 
> The idea is to create a Hook to connect to it and offer a minimal > functionality. > So the proposal is to create a sensor that monitor for a Redis key existence. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
incubator-airflow git commit: [AIRFLOW-999] Add support for Redis database
Repository: incubator-airflow Updated Branches: refs/heads/master 23a16f7ad -> 8de850162 [AIRFLOW-999] Add support for Redis database This PR includes a redis_hook and a redis_key_sensor to enable checking for key existence in redis. It also updates the documentation and add the relevant unit tests. - [x] Opened a PR on Github - [x] My PR addresses the following Airflow JIRA issues: - https://issues.apache.org/jira/browse/AIRFLOW-999 - [x] The PR title references the JIRA issues. For example, "[AIRFLOW-1] My Airflow PR" - [x] My PR adds unit tests - [ ] __OR__ my PR does not need testing for this extremely good reason: - [x] Here are some details about my PR: - [ ] Here are screenshots of any UI changes, if appropriate: - [x] Each commit subject references a JIRA issue. For example, "[AIRFLOW-1] Add new feature" - [x] Multiple commits addressing the same JIRA issue have been squashed - [x] My commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git- commit/)": 1. Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. 
Body explains "what" and "why", not "how" Closes #2165 from msempere/AIRFLOW-999/support- for-redis-database Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/8de85016 Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/8de85016 Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/8de85016 Branch: refs/heads/master Commit: 8de85016265443987a0e0fff406e996d421dc9d6 Parents: 23a16f7 Author: MSempere Authored: Mon Mar 20 11:10:55 2017 -0700 Committer: Arthur Wiedmer Committed: Mon Mar 20 11:11:31 2017 -0700 -- airflow/contrib/hooks/redis_hook.py | 92 airflow/contrib/sensors/redis_key_sensor.py | 46 airflow/models.py | 4 ++ airflow/utils/db.py | 5 ++ docs/installation.rst | 2 + setup.py| 2 + tests/contrib/hooks/test_redis_hook.py | 46 tests/contrib/sensors/redis_sensor.py | 64 + 8 files changed, 261 insertions(+) -- http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/8de85016/airflow/contrib/hooks/redis_hook.py -- diff --git a/airflow/contrib/hooks/redis_hook.py b/airflow/contrib/hooks/redis_hook.py new file mode 100644 index 000..936eff8 --- /dev/null +++ b/airflow/contrib/hooks/redis_hook.py @@ -0,0 +1,92 @@ +# -*- coding: utf-8 -*- +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
+
+"""
+RedisHook module
+"""
+
+import logging
+
+from redis import StrictRedis
+
+from airflow.exceptions import AirflowException
+from airflow.hooks.base_hook import BaseHook
+
+
+class RedisHook(BaseHook):
+    """
+    Hook to interact with Redis database
+    """
+    def __init__(self, redis_conn_id='redis_default'):
+        """
+        Prepares hook to connect to a Redis database.
+
+        :param conn_id: the name of the connection that has the parameters
+        we need to connect to Redis.
+        """
+        self.redis_conn_id = redis_conn_id
+        self.client = None
+        conn = self.get_connection(self.redis_conn_id)
+        self.host = conn.host
+        self.port = int(conn.port)
+        self.password = conn.password
+        self.db = int(conn.extra_dejson.get('db', 0))
+        self.logger = logging.getLogger(__name__)
+        self.logger.debug(
+            '''Connection "{conn}":
+            \thost: {host}
+            \tport: {port}
+            \textra: {extra}
+            '''.format(
+                conn=self.redis_conn_id,
+                host=self.host,
+                port=self.port,
+                extra=conn.extra_dejson
+            )
+        )
+
+    def get_conn(self):
+        """
+        Returns a Redis connection.
+        """
+        if not self.client:
+            self.logger.debug(
+                'generating
[jira] [Commented] (AIRFLOW-1011) Task Instance Results not stored for SubDAG Tasks
[ https://issues.apache.org/jira/browse/AIRFLOW-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15932997#comment-15932997 ] Vijay Krishna Ramesh commented on AIRFLOW-1011: --- Ah just to clarify, the behavior I detailed above is for a normally scheduled dag run, not a backfill or manually triggered run via trigger_dag. I do recall some weirdness with backfill and sub dag task instances after upgrading to the 1.8 RC4 but I don't have specific notes or logs saved to see if it was the same issue you are seeing. > Task Instance Results not stored for SubDAG Tasks > - > > Key: AIRFLOW-1011 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1011 > Project: Apache Airflow > Issue Type: Bug > Components: backfill, subdag >Affects Versions: Airflow 1.8 >Reporter: Joe Schmid >Priority: Critical > Attachments: 1-TopLevelDAGTaskInstancesShownCorrectly.png, > 2-ZoomedSubDAG-NoTaskInstances-v1.8.png, > 3-ZoomedSubDAG-TaskInstances-v1.7.1.3.png, test_subdag.py > > > In previous Airflow versions, results for tasks executed as a subdag were > written as rows to task_instances. In Airflow 1.8 only rows for tasks inside > the top-level DAG (non-subdag tasks) seem to get written to the database. > This results in being unable to check the status of task instances inside the > subdag from the UI, check the logs for those task instances from the UI, etc. > Attached is a simple test DAG that exhibits the issue along with screenshots > showing the UI differences between v1.8 and v1.7.1.3. > Note that if the DAG is run via backfill from command line (e.g. "airflow > backfill Test_SubDAG -s 2017-03-18 -e 2017-03-18") the task instances show up > successfully. > Also, we're using CeleryExecutor and not specifying a different executor for > our subdags. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (AIRFLOW-1011) Task Instance Results not stored for SubDAG Tasks
[ https://issues.apache.org/jira/browse/AIRFLOW-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15932685#comment-15932685 ] Joe Schmid edited comment on AIRFLOW-1011 at 3/20/17 2:13 PM: --
Vijay, what you are describing is NOT what I'm observing. I just confirmed by doing:
1. airflow resetdb
2. Confirm there are no rows in task_instances (SELECT * FROM task_instances)
3. Trigger Test_SubDAG
4. Confirm the DAG finished executing successfully
5. SELECT * FROM task_instances only returns 3 rows and no rows for any of the SubDAG task instances
||task_id|dag_id|execution_date|start_date|end_date|duration|state|try_number|hostname|unixname|job_id|pool|queue|priority_weight|operator|queued_dttm|pid||
|DAG_Task1|Test_SubDAG|2017-03-20 14:01:22.193546|2017-03-20 14:01:49.452930|2017-03-20 14:01:49.508669|0.055739|success|1|5afc1ff6f68b|root|3||default|3|DummyOperator|2017-03-20 14:01:39.841692|19188|
|DAG_Task2|Test_SubDAG|2017-03-20 14:01:22.193546|2017-03-20 14:03:49.826690|2017-03-20 14:03:49.968047|0.141357|success|1|5afc1ff6f68b|root|6||default|1|DummyOperator|2017-03-20 14:03:41.486849|19606|
|SubDagOp|Test_SubDAG|2017-03-20 14:01:22.193546|2017-03-20 14:02:48.214548|2017-03-20 14:02:48.399452|0.184904|success|1|5afc1ff6f68b|root|4||default|2|SubDagOperator|2017-03-20 14:02:40.413805|19392|
In fact, it looks like the backfill for the subdag is not actually running. The log of the SubDagOperator task instance shows that it immediately logs "Backfill done. Exiting."
right after it starts executing:
{code:none}
[2017-03-20 14:02:43,919] {models.py:167} INFO - Filling up the DagBag from /usr/local/airflow/dags/common/util/test_subdag.py
[2017-03-20 14:02:44,245] {base_task_runner.py:112} INFO - Running: ['bash', '-c', 'airflow run Test_SubDAG SubDagOp 2017-03-20T14:01:22.193546 --job_id 4 --raw -sd DAGS_FOLDER/common/util/test_subdag.py']
[2017-03-20 14:02:46,661] {base_task_runner.py:95} INFO - Subtask: [2017-03-20 14:02:46,656] {__init__.py:57} INFO - Using executor CeleryExecutor
[2017-03-20 14:02:47,842] {base_task_runner.py:95} INFO - Subtask: [2017-03-20 14:02:47,841] {models.py:167} INFO - Filling up the DagBag from /usr/local/airflow/dags/common/util/test_subdag.py
[2017-03-20 14:02:48,214] {base_task_runner.py:95} INFO - Subtask: [2017-03-20 14:02:48,214] {models.py:1126} INFO - Dependencies all met for
[2017-03-20 14:02:48,230] {base_task_runner.py:95} INFO - Subtask: [2017-03-20 14:02:48,229] {models.py:1126} INFO - Dependencies all met for
[2017-03-20 14:02:48,230] {base_task_runner.py:95} INFO - Subtask: [2017-03-20 14:02:48,230] {models.py:1318} INFO -
[2017-03-20 14:02:48,230] {base_task_runner.py:95} INFO - Subtask:
[2017-03-20 14:02:48,230] {base_task_runner.py:95} INFO - Subtask: Starting attempt 1 of 1
[2017-03-20 14:02:48,230] {base_task_runner.py:95} INFO - Subtask:
[2017-03-20 14:02:48,230] {base_task_runner.py:95} INFO - Subtask:
[2017-03-20 14:02:48,265] {base_task_runner.py:95} INFO - Subtask: [2017-03-20 14:02:48,265] {models.py:1342} INFO - Executingon 2017-03-20 14:01:22.193546
[2017-03-20 14:02:48,371] {base_task_runner.py:95} INFO - Subtask: [2017-03-20 14:02:48,370] {jobs.py:2023} INFO - Backfill done. Exiting.
[2017-03-20 14:02:48,433] {base_task_runner.py:95} INFO - Subtask: /usr/local/lib/python3.5/site-packages/airflow/ti_deps/deps/base_ti_dep.py:94: PendingDeprecationWarning: generator '_get_dep_statuses' raised StopIteration
[2017-03-20 14:02:48,433] {base_task_runner.py:95} INFO - Subtask: for dep_status in self._get_dep_statuses(ti, session, dep_context):
[2017-03-20 14:02:49,275] {jobs.py:2083} INFO - Task exited with return code 0
{code}
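The telltale detail in the log above is that "Backfill done. Exiting." appears roughly a tenth of a second after the "Executing" line, i.e. the subdag backfill scheduled nothing. For readers comparing their own task logs, a throwaway sketch (not an Airflow API; all names here are made up) for measuring that gap:

```python
import re
from datetime import datetime

# Matches the leading "[YYYY-MM-DD HH:MM:SS,mmm]" timestamp on each log line.
TS = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]")

def first_timestamp(log_text, marker):
    # Timestamp of the first log line containing `marker`, or None.
    for line in log_text.splitlines():
        if marker in line:
            match = TS.search(line)
            if match:
                return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
    return None

def backfill_gap_seconds(log_text):
    # Seconds between "Executing" and "Backfill done. Exiting."; a value near
    # zero suggests the backfill found no task instances to run at all.
    start = first_timestamp(log_text, "Executing")
    done = first_timestamp(log_text, "Backfill done. Exiting.")
    if start is None or done is None:
        return None
    return (done - start).total_seconds()
```

Applied to the log above (Executing at 14:02:48,265; Backfill done at 14:02:48,371) this gives a gap of about 0.1 seconds, far too short for the subdag's tasks to have run.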
[jira] [Commented] (AIRFLOW-1011) Task Instance Results not stored for SubDAG Tasks
[ https://issues.apache.org/jira/browse/AIRFLOW-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15932685#comment-15932685 ] Joe Schmid commented on AIRFLOW-1011: -
Vijay, what you are describing is NOT what I'm observing. I just confirmed by doing:
1. airflow resetdb
2. Confirm there are no rows in task_instances (SELECT * FROM task_instances)
3. Trigger Test_SubDAG
4. Confirm the DAG finished executing successfully
5. SELECT * FROM task_instances only returns 3 rows and no rows for any of the SubDAG task instances
```
task_id,dag_id,execution_date,start_date,end_date,duration,state,try_number,hostname,unixname,job_id,pool,queue,priority_weight,operator,queued_dttm,pid
DAG_Task1,Test_SubDAG,2017-03-20 14:01:22.193546,2017-03-20 14:01:49.452930,2017-03-20 14:01:49.508669,0.055739,success,1,5afc1ff6f68b,root,3,,default,3,DummyOperator,2017-03-20 14:01:39.841692,19188
DAG_Task2,Test_SubDAG,2017-03-20 14:01:22.193546,2017-03-20 14:03:49.826690,2017-03-20 14:03:49.968047,0.141357,success,1,5afc1ff6f68b,root,6,,default,1,DummyOperator,2017-03-20 14:03:41.486849,19606
SubDagOp,Test_SubDAG,2017-03-20 14:01:22.193546,2017-03-20 14:02:48.214548,2017-03-20 14:02:48.399452,0.184904,success,1,5afc1ff6f68b,root,4,,default,2,SubDagOperator,2017-03-20 14:02:40.413805,19392
```
In fact, it looks like the backfill for the subdag is not actually running.
Log of the SubDagOperator task instance:
```
[2017-03-20 14:02:43,919] {models.py:167} INFO - Filling up the DagBag from /usr/local/airflow/dags/common/util/test_subdag.py
[2017-03-20 14:02:44,245] {base_task_runner.py:112} INFO - Running: ['bash', '-c', 'airflow run Test_SubDAG SubDagOp 2017-03-20T14:01:22.193546 --job_id 4 --raw -sd DAGS_FOLDER/common/util/test_subdag.py']
[2017-03-20 14:02:46,661] {base_task_runner.py:95} INFO - Subtask: [2017-03-20 14:02:46,656] {__init__.py:57} INFO - Using executor CeleryExecutor
[2017-03-20 14:02:47,842] {base_task_runner.py:95} INFO - Subtask: [2017-03-20 14:02:47,841] {models.py:167} INFO - Filling up the DagBag from /usr/local/airflow/dags/common/util/test_subdag.py
[2017-03-20 14:02:48,214] {base_task_runner.py:95} INFO - Subtask: [2017-03-20 14:02:48,214] {models.py:1126} INFO - Dependencies all met for
[2017-03-20 14:02:48,230] {base_task_runner.py:95} INFO - Subtask: [2017-03-20 14:02:48,229] {models.py:1126} INFO - Dependencies all met for
[2017-03-20 14:02:48,230] {base_task_runner.py:95} INFO - Subtask: [2017-03-20 14:02:48,230] {models.py:1318} INFO -
[2017-03-20 14:02:48,230] {base_task_runner.py:95} INFO - Subtask:
[2017-03-20 14:02:48,230] {base_task_runner.py:95} INFO - Subtask: Starting attempt 1 of 1
[2017-03-20 14:02:48,230] {base_task_runner.py:95} INFO - Subtask:
[2017-03-20 14:02:48,230] {base_task_runner.py:95} INFO - Subtask:
[2017-03-20 14:02:48,265] {base_task_runner.py:95} INFO - Subtask: [2017-03-20 14:02:48,265] {models.py:1342} INFO - Executingon 2017-03-20 14:01:22.193546
[2017-03-20 14:02:48,371] {base_task_runner.py:95} INFO - Subtask: [2017-03-20 14:02:48,370] {jobs.py:2023} INFO - Backfill done. Exiting.
[2017-03-20 14:02:48,433] {base_task_runner.py:95} INFO - Subtask: /usr/local/lib/python3.5/site-packages/airflow/ti_deps/deps/base_ti_dep.py:94: PendingDeprecationWarning: generator '_get_dep_statuses' raised StopIteration
[2017-03-20 14:02:48,433] {base_task_runner.py:95} INFO - Subtask: for dep_status in self._get_dep_statuses(ti, session, dep_context):
[2017-03-20 14:02:49,275] {jobs.py:2083} INFO - Task exited with return code 0
```
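Step 5 of the reproduction above can also be scripted. The sketch below is illustration only: it assumes a SQLite-backed metadata DB and a table named task_instance (the comment above writes task_instances; use whatever name your metadata DB actually has). Grouping by dag_id makes the missing subdag rows obvious, since subdag task instances would appear under a dag_id like Test_SubDAG.SubDagOp:

```python
import sqlite3

def task_rows_per_dag(db_path):
    # Count task-instance rows per dag_id in the metadata DB. With the bug
    # described above, only the parent DAG's dag_id shows up; no
    # "Test_SubDAG.<subdag>" entries ever appear.
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT dag_id, COUNT(*) FROM task_instance GROUP BY dag_id"
        ).fetchall()
    finally:
        conn.close()
    return dict(rows)
```

On the run described above this would return {'Test_SubDAG': 3} with no Test_SubDAG.* keys at all.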
[jira] [Reopened] (AIRFLOW-1006) Move configuration templates to separate files
[ https://issues.apache.org/jira/browse/AIRFLOW-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremiah Lowin reopened AIRFLOW-1006: - -- This message was sent by Atlassian JIRA (v6.3.15#6346)