[jira] [Work started] (AIRFLOW-3412) Worker pods are not being deleted after termination
[ https://issues.apache.org/jira/browse/AIRFLOW-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on AIRFLOW-3412 started by Viktor. --- > Worker pods are not being deleted after termination > --- > > Key: AIRFLOW-3412 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3412 > Project: Apache Airflow > Issue Type: Bug > Components: executor, kubernetes >Affects Versions: 1.10.0 >Reporter: Viktor >Assignee: Viktor >Priority: Major > Fix For: 1.10.2 > > > When using KubernetesExecutor multiple pods are spawned for tasks. > When their job is done they are not deleted automatically even if you specify > *delete_worker_pods=true* in the Airflow configuration and RBAC is properly > configured to allow the scheduler to delete pods. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
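For reference, the setting named in the report lives in the `[kubernetes]` section of `airflow.cfg`; the snippet below is an illustrative fragment (only `delete_worker_pods` is taken from the issue itself):

```ini
[kubernetes]
# When True, the KubernetesExecutor is expected to clean up worker pods
# once their task finishes; this issue reports that cleanup not happening
# even with RBAC allowing the scheduler to delete pods.
delete_worker_pods = True
```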
[GitHub] XD-DENG commented on issue #2635: [AIRFLOW-1561] Fix scheduler to pick up example DAGs without other DAGs
XD-DENG commented on issue #2635: [AIRFLOW-1561] Fix scheduler to pick up example DAGs without other DAGs URL: https://github.com/apache/incubator-airflow/pull/2635#issuecomment-443115477 The weird thing is that only the two `backend-sqlite` tests are affected. The time needed by the `mysql`/`postgres` tests remains roughly the same. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] ashb commented on issue #2635: [AIRFLOW-1561] Fix scheduler to pick up example DAGs without other DAGs
ashb commented on issue #2635: [AIRFLOW-1561] Fix scheduler to pick up example DAGs without other DAGs URL: https://github.com/apache/incubator-airflow/pull/2635#issuecomment-443113751 In many tests we probably need to stop disabling the loading of all DAGs, and instead disable only specific DAGs.
[GitHub] KevinYang21 commented on issue #4253: [AIRFLOW-3414] Fix reload_module in DagFileProcessorAgent
KevinYang21 commented on issue #4253: [AIRFLOW-3414] Fix reload_module in DagFileProcessorAgent URL: https://github.com/apache/incubator-airflow/pull/4253#issuecomment-443084658 @feng-tao lol I'm oncall too :D Don't worry about it, we got plenty of committers ;) Ty for being caring.
[GitHub] feng-tao commented on issue #4253: [AIRFLOW-3414] Fix reload_module in DagFileProcessorAgent
feng-tao commented on issue #4253: [AIRFLOW-3414] Fix reload_module in DagFileProcessorAgent URL: https://github.com/apache/incubator-airflow/pull/4253#issuecomment-443084443 @KevinYang21, I am oncall this week. Will take a look next week if this PR is still open.
[GitHub] XD-DENG commented on issue #2635: [AIRFLOW-1561] Fix scheduler to pick up example DAGs without other DAGs
XD-DENG commented on issue #2635: [AIRFLOW-1561] Fix scheduler to pick up example DAGs without other DAGs URL: https://github.com/apache/incubator-airflow/pull/2635#issuecomment-44307 Hi @Fokko @kaxil, it seems the Travis CI time needed for the two `backend-sqlite` tests increased from ~30 minutes to ~40 minutes since this PR was merged. May be worth checking.
[GitHub] codecov-io edited a comment on issue #4253: [AIRFLOW-3414] Fix reload_module in DagFileProcessorAgent
codecov-io edited a comment on issue #4253: [AIRFLOW-3414] Fix reload_module in DagFileProcessorAgent URL: https://github.com/apache/incubator-airflow/pull/4253#issuecomment-442794666
# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4253?src=pr=h1) Report
> Merging [#4253](https://codecov.io/gh/apache/incubator-airflow/pull/4253?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/d1d612e5c78c89487a05ab3cc86f7d3472ac2a5d?src=pr=desc) will **decrease** coverage by `<.01%`.
> The diff coverage is `85.71%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/4253/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4253?src=pr=tree)

```diff
@@            Coverage Diff             @@
##           master    #4253      +/-   ##
==========================================
- Coverage   78.07%   78.07%    -0.01%
==========================================
  Files         201      201
  Lines       16455    16458       +3
==========================================
+ Hits        12848    12850       +2
- Misses       3607     3608       +1
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/4253?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/settings.py](https://codecov.io/gh/apache/incubator-airflow/pull/4253/diff?src=pr=tree#diff-YWlyZmxvdy9zZXR0aW5ncy5weQ==) | `80.41% <100%> (ø)` | :arrow_up: |
| [airflow/logging\_config.py](https://codecov.io/gh/apache/incubator-airflow/pull/4253/diff?src=pr=tree#diff-YWlyZmxvdy9sb2dnaW5nX2NvbmZpZy5weQ==) | `97.56% <100%> (+0.06%)` | :arrow_up: |
| [airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/incubator-airflow/pull/4253/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==) | `59.67% <66.66%> (+0.14%)` | :arrow_up: |
| [airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/4253/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=) | `92.29% <0%> (-0.05%)` | :arrow_down: |

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4253?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4253?src=pr=footer). Last update [d1d612e...eecc75a](https://codecov.io/gh/apache/incubator-airflow/pull/4253?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] KevinYang21 commented on a change in pull request #4253: [AIRFLOW-3414] Fix reload_module in DagFileProcessorAgent
KevinYang21 commented on a change in pull request #4253: [AIRFLOW-3414] Fix reload_module in DagFileProcessorAgent URL: https://github.com/apache/incubator-airflow/pull/4253#discussion_r237725457 ## File path: airflow/logging_config.py ## @@ -73,7 +73,7 @@ def configure_logging(): validate_logging_config(logging_config) -return logging_config +return logging_config, logging_class_path Review comment: @XD-DENG Thank you for reviewing. I actually don't think the return value is used anywhere; I was playing it safe by keeping the original return value. Now that you mention it, I think we can just remove that return value, since we can always add it back if it is needed in the future.
[GitHub] XD-DENG commented on a change in pull request #4253: [AIRFLOW-3414] Fix reload_module in DagFileProcessorAgent
XD-DENG commented on a change in pull request #4253: [AIRFLOW-3414] Fix reload_module in DagFileProcessorAgent URL: https://github.com/apache/incubator-airflow/pull/4253#discussion_r237724308 ## File path: airflow/logging_config.py ## @@ -73,7 +73,7 @@ def configure_logging(): validate_logging_config(logging_config) -return logging_config +return logging_config, logging_class_path Review comment: Thanks @KevinYang21! Is there any specific reason that we need to return `logging_config`? -- **Before this PR**: `configure_logging()` is invoked in `airflow/settings.py`, but it seems what it returns is not used (its return value is not assigned to any variable, https://github.com/apache/incubator-airflow/pull/4253/files#diff-535b21d0d7b5e33649e0eb521b41f157L264). **After this PR**: `logging_class_path` is returned by `configure_logging()` and then used in `airflow/utils/dag_processing.py`, but `logging_config` is still not used anywhere. -- So I'm wondering whether we really need to include `logging_config` in the return values. Kindly let me know if I missed anything (quite likely). Thanks!
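To make the return-value discussion concrete, here is a simplified, self-contained stand-in for the change under review. It is not the real `airflow/logging_config.py` (the real function resolves the class path from configuration and validates the resulting dict); the values below are placeholders:

```python
# Simplified stand-in for the reviewed change, not Airflow's actual code.
# After the PR, configure_logging() returns the logging class path alongside
# the config dict, so the DagFileProcessorAgent can later re-import the
# *custom* logging module instead of reloading the default one.

def configure_logging():
    # In Airflow these come from the logging config settings; placeholders here.
    logging_class_path = "airflow_local_settings.DEFAULT_LOGGING_CONFIG"
    logging_config = {"version": 1, "disable_existing_loggers": False}
    # validate_logging_config(logging_config) would run here in the real module.
    return logging_config, logging_class_path

# A caller that only needs the class path can ignore the dict,
# which is the reviewers' point about logging_config being unused:
_, logging_class_path = configure_logging()
print(logging_class_path)
```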
[jira] [Closed] (AIRFLOW-3424) Task not getting retried when using Kubernetes Executor
[ https://issues.apache.org/jira/browse/AIRFLOW-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand closed AIRFLOW-3424. -- Resolution: Invalid > Task not getting retried when using Kubernetes Executor > --- > > Key: AIRFLOW-3424 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3424 > Project: Apache Airflow > Issue Type: Bug > Components: kubernetes >Affects Versions: 1.10.1 >Reporter: Anand >Priority: Major > Attachments: sample_dag.py > > > Steps to reproduce- > Use airflow K8 executor and deploy server and scheduler on K8. > # Create a dag with task having retries (dag attached) > # Enable and trigger the dag. > # Scheduler spins up a new pod for the task. > # Kill the Scheduler and then Worker Pod, check that task instance state is > marked failed/up_for_retry in DB. > # Restart the scheduler. > # The failed task is not picked up again. > Expected- Failed task should be retried.
[jira] [Updated] (AIRFLOW-3424) Task not getting retried when using Kubernetes Executor
[ https://issues.apache.org/jira/browse/AIRFLOW-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand updated AIRFLOW-3424: --- Description: Steps to reproduce- Use airflow K8 executor and deploy server and scheduler on K8. # Create a dag with task having retries (dag attached) # Enable and trigger the dag. # Scheduler spins up a new pod for the task. # Kill the Scheduler and then Worker Pod, check that task instance state is marked failed/up_for_retry in DB. # Restart the scheduler. # The failed task is not picked up again. Expected- Failed task should be retried. was: Steps to reproduce- Use airflow K8 executor and deploy server and scheduler on K8. # Create a dag with task having retries (dag attached) # Enable and trigger the dag. # Scheduler spins up a new pod for the task. # Kill the Scheduler and then Worker Pod, check that task instance state is marked failed in DB. # Restart the scheduler. # The failed task is not picked up again. Expected- Failed task should be retried. > Task not getting retried when using Kubernetes Executor > --- > > Key: AIRFLOW-3424 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3424 > Project: Apache Airflow > Issue Type: Bug > Components: kubernetes >Affects Versions: 1.10.1 >Reporter: Anand >Priority: Major > Attachments: sample_dag.py > > > Steps to reproduce- > Use airflow K8 executor and deploy server and scheduler on K8. > # Create a dag with task having retries (dag attached) > # Enable and trigger the dag. > # Scheduler spins up a new pod for the task. > # Kill the Scheduler and then Worker Pod, check that task instance state is > marked failed/up_for_retry in DB. > # Restart the scheduler. > # The failed task is not picked up again. > Expected- Failed task should be retried.
[jira] [Updated] (AIRFLOW-3424) Task not getting retried when using Kubernetes Executor
[ https://issues.apache.org/jira/browse/AIRFLOW-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand updated AIRFLOW-3424: --- Description: Steps to reproduce- Use airflow K8 executor and deploy server and scheduler on K8. # Create a dag with task having retries and retry delay as 5 minutes(attached) # Enable and trigger the dag. # Scheduler spins up a new pod for the task. # Kill the Scheduler and then Worker Pod, check that task instance state is marked failed in DB. # Restart the scheduler. # The failed task is not picked up again. Expected- Failed task should be retried. was: Steps to reproduce- Use airflow K8 executor and deploy server and scheduler on K8. # Create a dag with task having retries and retry delay as 5 minutes(attached) # Enable and trigger the dag. # Scheduler spins up a new pod for the task. # Kill the Worker Pod, check that task instance is marked failed in DB. # Restart the scheduler within retry delay period. # The failed task is not picked up again. Expected- Failed task should be retried. > Task not getting retried when using Kubernetes Executor > --- > > Key: AIRFLOW-3424 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3424 > Project: Apache Airflow > Issue Type: Bug > Components: kubernetes >Affects Versions: 1.10.1 >Reporter: Anand >Priority: Major > Attachments: sample_dag.py > > > Steps to reproduce- > Use airflow K8 executor and deploy server and scheduler on K8. > # Create a dag with task having retries and retry delay as 5 > minutes(attached) > # Enable and trigger the dag. > # Scheduler spins up a new pod for the task. > # Kill the Scheduler and then Worker Pod, check that task instance state is > marked failed in DB. > # Restart the scheduler. > # The failed task is not picked up again. > Expected- Failed task should be retried.
[jira] [Updated] (AIRFLOW-3424) Task not getting retried when using Kubernetes Executor
[ https://issues.apache.org/jira/browse/AIRFLOW-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand updated AIRFLOW-3424: --- Description: Steps to reproduce- Use airflow K8 executor and deploy server and scheduler on K8. # Create a dag with task having retries (dag attached) # Enable and trigger the dag. # Scheduler spins up a new pod for the task. # Kill the Scheduler and then Worker Pod, check that task instance state is marked failed in DB. # Restart the scheduler. # The failed task is not picked up again. Expected- Failed task should be retried. was: Steps to reproduce- Use airflow K8 executor and deploy server and scheduler on K8. # Create a dag with task having retries and retry delay as 5 minutes(attached) # Enable and trigger the dag. # Scheduler spins up a new pod for the task. # Kill the Scheduler and then Worker Pod, check that task instance state is marked failed in DB. # Restart the scheduler. # The failed task is not picked up again. Expected- Failed task should be retried. > Task not getting retried when using Kubernetes Executor > --- > > Key: AIRFLOW-3424 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3424 > Project: Apache Airflow > Issue Type: Bug > Components: kubernetes >Affects Versions: 1.10.1 >Reporter: Anand >Priority: Major > Attachments: sample_dag.py > > > Steps to reproduce- > Use airflow K8 executor and deploy server and scheduler on K8. > # Create a dag with task having retries (dag attached) > # Enable and trigger the dag. > # Scheduler spins up a new pod for the task. > # Kill the Scheduler and then Worker Pod, check that task instance state is > marked failed in DB. > # Restart the scheduler. > # The failed task is not picked up again. > Expected- Failed task should be retried.
[jira] [Updated] (AIRFLOW-3414) reload_module not working with custom logging class
[ https://issues.apache.org/jira/browse/AIRFLOW-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Yang updated AIRFLOW-3414: Issue Type: Bug (was: Improvement) > reload_module not working with custom logging class > --- > > Key: AIRFLOW-3414 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3414 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: 1.10.2 >Reporter: Kevin Yang >Assignee: Kevin Yang >Priority: Major > > If using a custom logging class, the reload_module in dag_processing.py will > fail because it will try to reload the default logging class, which was not > loaded in the first place.
[GitHub] KevinYang21 commented on issue #4253: [AIRFLOW-3414] Fix reload_module in DagFileProcessorAgent
KevinYang21 commented on issue #4253: [AIRFLOW-3414] Fix reload_module in DagFileProcessorAgent URL: https://github.com/apache/incubator-airflow/pull/4253#issuecomment-443021672 @Fokko @ashb @feng-tao @XD-DENG @kaxil @saguziel @aoen @YingboWang PTAL
[jira] [Created] (AIRFLOW-3424) Task not getting retried when using Kubernetes Executor
Anand created AIRFLOW-3424: -- Summary: Task not getting retried when using Kubernetes Executor Key: AIRFLOW-3424 URL: https://issues.apache.org/jira/browse/AIRFLOW-3424 Project: Apache Airflow Issue Type: Bug Components: kubernetes Affects Versions: 1.10.1 Reporter: Anand Attachments: sample_dag.py Steps to reproduce- Use airflow K8 executor and deploy server and scheduler on K8. # Create a dag with task having retries and retry delay as 5 minutes (attached) # Enable and trigger the dag. # Scheduler spins up a new pod for the task. # Kill the Worker Pod, check that task instance is marked failed in DB. # Restart the scheduler within retry delay period. # The failed task is not picked up again. Expected- Failed task should be retried.
[jira] [Created] (AIRFLOW-3423) Fix unauthenticated access for mongo hook
Marcin Szymanski created AIRFLOW-3423: - Summary: Fix unauthenticated access for mongo hook Key: AIRFLOW-3423 URL: https://issues.apache.org/jira/browse/AIRFLOW-3423 Project: Apache Airflow Issue Type: Bug Affects Versions: 1.10.1 Reporter: Marcin Szymanski Assignee: Marcin Szymanski
[jira] [Commented] (AIRFLOW-3422) Infinite loops during springtime DST transitions on python 3.6
[ https://issues.apache.org/jira/browse/AIRFLOW-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703822#comment-16703822 ] Ash Berlin-Taylor commented on AIRFLOW-3422: Dang :( Does it work if a different timezone is used? What version of Pendulum and pytz do you have installed please? > Infinite loops during springtime DST transitions on python 3.6 > -- > > Key: AIRFLOW-3422 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3422 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: 1.10.1 >Reporter: Till Heistermann >Priority: Major > > Automatic DST transitions can cause dags to be stuck in an infinite loop, if > they happen to be scheduled in the "skipped" hour during a springtime DST > transition. > The fix introduced in https://issues.apache.org/jira/browse/AIRFLOW-3277 does > not seem to work for python 3.6, only for 3.5 and 2.7. > Example to reproduce (current master, python 3.6): > {code:python} > import pendulum > from datetime import datetime > from airflow.utils.timezone import make_aware > from airflow.models import DAG > nsw = pendulum.Timezone.load("Australia/Sydney") > dt = make_aware(datetime(2018, 10, 3, 2, 30), nsw) > dag = DAG("id", schedule_interval="30 2 * * *", start_date=dt) > dt = dag.following_schedule(dt); print(dt) > dt = dag.following_schedule(dt); print(dt) > dt = dag.following_schedule(dt); print(dt) > dt = dag.following_schedule(dt); print(dt) > dt = dag.following_schedule(dt); print(dt) > dt = dag.following_schedule(dt); print(dt) > dt = dag.following_schedule(dt); print(dt) > dt = dag.following_schedule(dt); print(dt) > dt = dag.following_schedule(dt); print(dt) > {code}
[GitHub] rcorre commented on issue #3546: AIRFLOW-2664: Support filtering dag runs by id prefix in API.
rcorre commented on issue #3546: AIRFLOW-2664: Support filtering dag runs by id prefix in API. URL: https://github.com/apache/incubator-airflow/pull/3546#issuecomment-442988770 @xnuinside sorry for the delay, I've been slammed at work but I do intend to follow up.
[GitHub] aoen commented on issue #3925: [AIRFLOW-3091]Deactivate inactive DAGs after collecting DagBag
aoen commented on issue #3925: [AIRFLOW-3091]Deactivate inactive DAGs after collecting DagBag URL: https://github.com/apache/incubator-airflow/pull/3925#issuecomment-442986622 Closed for the reasons mentioned, I had a diff to fix this but I'm having trouble finding it.
[jira] [Commented] (AIRFLOW-3091) Deactivate inactive DAG in collecting DagBag
[ https://issues.apache.org/jira/browse/AIRFLOW-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703795#comment-16703795 ] ASF GitHub Bot commented on AIRFLOW-3091: - aoen closed pull request #3925: [AIRFLOW-3091]Deactivate inactive DAGs after collecting DagBag URL: https://github.com/apache/incubator-airflow/pull/3925 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/airflow/models.py b/airflow/models.py
index 1e4949e563..6c40fc5c72 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -272,6 +272,7 @@ def __init__(
                 'example_dags')
             self.collect_dags(example_dag_folder)
             self.collect_dags(dag_folder)
+            self.deactivate_inactive_dags()

     def size(self):
         """

> Deactivate inactive DAG in collecting DagBag > > > Key: AIRFLOW-3091 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3091 > Project: Apache Airflow > Issue Type: Improvement > Components: ui, webserver >Affects Versions: 1.8.1, 1.8.2, 1.9.0, 1.10.0 >Reporter: Yi Wei >Assignee: Yi Wei >Priority: Minor > > Airflow still displays all DAGs in admin UI, despite the fact that many DAGs > are already inactive. We should clean them up to avoid potential confusion.
[GitHub] aoen closed pull request #3925: [AIRFLOW-3091]Deactivate inactive DAGs after collecting DagBag
aoen closed pull request #3925: [AIRFLOW-3091]Deactivate inactive DAGs after collecting DagBag URL: https://github.com/apache/incubator-airflow/pull/3925 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/airflow/models.py b/airflow/models.py
index 1e4949e563..6c40fc5c72 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -272,6 +272,7 @@ def __init__(
                 'example_dags')
             self.collect_dags(example_dag_folder)
             self.collect_dags(dag_folder)
+            self.deactivate_inactive_dags()

     def size(self):
         """
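The one-line diff above hooks a cleanup call into DagBag construction. An illustrative, self-contained stand-in for that idea (not Airflow's actual `DagBag`; the in-memory set simulates the DB state the real method updates):

```python
# Illustrative stand-in for the closed PR's idea, not Airflow's real code:
# after collecting DAG files, any DAG recorded as active that no longer
# appears in the collected set is flagged inactive, so the UI stops
# listing DAGs whose files were removed.

class DagBag:
    def __init__(self, collected_dag_ids, db_active_dag_ids):
        self.dags = {dag_id: object() for dag_id in collected_dag_ids}
        self.db_active_dags = set(db_active_dag_ids)  # simulated DB state
        self.deactivate_inactive_dags()               # the line the PR adds

    def deactivate_inactive_dags(self):
        # Anything "active in the DB" but absent from the freshly collected
        # DAGs gets deactivated.
        self.db_active_dags &= set(self.dags)

bag = DagBag(["etl_daily"], ["etl_daily", "old_removed_dag"])
print(sorted(bag.db_active_dags))
```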
[jira] [Updated] (AIRFLOW-3422) Infinite loops during springtime DST transitions on python 3.6
[ https://issues.apache.org/jira/browse/AIRFLOW-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Heistermann updated AIRFLOW-3422: -- Component/s: scheduler > Infinite loops during springtime DST transitions on python 3.6 > -- > > Key: AIRFLOW-3422 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3422 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: 1.10.1 >Reporter: Till Heistermann >Priority: Major > > Automatic DST transitions can cause dags to be stuck in an infinite loop, if > they happen to be scheduled in the "skipped" hour during a springtime DST > transition. > The fix introduced in https://issues.apache.org/jira/browse/AIRFLOW-3277 does > not seem to work for python 3.6, only for 3.5 and 2.7. > Example to reproduce (current master, python 3.6): > {code:python} > import pendulum > from datetime import datetime > from airflow.utils.timezone import make_aware > from airflow.models import DAG > nsw = pendulum.Timezone.load("Australia/Sydney") > dt = make_aware(datetime(2018, 10, 3, 2, 30), nsw) > dag = DAG("id", schedule_interval="30 2 * * *", start_date=dt) > dt = dag.following_schedule(dt); print(dt) > dt = dag.following_schedule(dt); print(dt) > dt = dag.following_schedule(dt); print(dt) > dt = dag.following_schedule(dt); print(dt) > dt = dag.following_schedule(dt); print(dt) > dt = dag.following_schedule(dt); print(dt) > dt = dag.following_schedule(dt); print(dt) > dt = dag.following_schedule(dt); print(dt) > dt = dag.following_schedule(dt); print(dt) > {code}
[jira] [Updated] (AIRFLOW-3421) CSS issue on Airflow 1.10 Tree view UI
[ https://issues.apache.org/jira/browse/AIRFLOW-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adityan updated AIRFLOW-3421: - Description: After upgrading to airflow 1.10, hovering the mouse over the rectangles in tree view causes the tooltip to pop up much higher than it used to. In the screenshot attached, the right lowermost rectangle is the one being hovered over. Once you scroll down on the tree view, the tooltip starts floating up. Things I have tried to fix this behavior: 1. Change the css themes in webserver_config.py and restart the web server 2. Inspected the tooltip in Chrome; it seems to be a dynamically generated CSS class. The CSS class controlling this behavior seems to be the same in Airflow 1.9. *NOTE: This issue only happens when rbac is set to True in airflow.cfg. If you turn off rbac, then this issue doesn't occur. Also, the dag needs to be sufficiently large (vertically) so that you need to scroll with your mouse for this issue to occur.* was: After upgrading to airflow 1.10, hovering the mouse over the rectangles in tree view, causes the tooltip to popup much higher than it used to be. In the screenshot attached, the right lowermost rectangle is the one that is being hovered over. Once you scroll down on the tree view, the tooltip starts floating up. Things I have tried to fix this behavior: 1. Change the css themes in webserver_config.py and restart the web server 2. Inspected the tooltip in Chrome, it seems to be a dynamically generated CSS class. The CSS class controlling this behavior seem to be the same in Airflow 1.9.
> CSS issue on Airflow 1.10 Tree view UI > -- > > Key: AIRFLOW-3421 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3421 > Project: Apache Airflow > Issue Type: Bug > Components: webserver >Affects Versions: 1.10.0, 1.10.1 >Reporter: Adityan >Priority: Minor > Labels: beginner > Attachments: Screen Shot 2018-11-07 at 1.27.05 PM.png > > > After upgrading to airflow 1.10, hovering the mouse over the rectangles in > tree view causes the tooltip to pop up much higher than it used to. In the > screenshot attached, the right lowermost rectangle is the one being > hovered over. > > Once you scroll down on the tree view, the tooltip starts floating up. > > Things I have tried to fix this behavior: > 1. Change the css themes in webserver_config.py and restart the web server > 2. Inspected the tooltip in Chrome; it seems to be a dynamically generated > CSS class. The CSS class controlling this behavior seems to be the same in > Airflow 1.9. > > *NOTE: This issue only happens when rbac is set to True in airflow.cfg. If > you turn off rbac, then this issue doesn't occur. Also, the dag needs to be > sufficiently large (vertically) so that you need to scroll with your mouse > for this issue to occur.*
[jira] [Created] (AIRFLOW-3422) Infinite loops during springtime DST transitions on python 3.6
Till Heistermann created AIRFLOW-3422: - Summary: Infinite loops during springtime DST transitions on python 3.6 Key: AIRFLOW-3422 URL: https://issues.apache.org/jira/browse/AIRFLOW-3422 Project: Apache Airflow Issue Type: Bug Affects Versions: 1.10.1 Reporter: Till Heistermann Automatic DST transitions can cause dags to be stuck in an infinite loop if they happen to be scheduled in the "skipped" hour during a springtime DST transition. The fix introduced in https://issues.apache.org/jira/browse/AIRFLOW-3277 does not seem to work for python 3.6, only for 3.5 and 2.7. Example to reproduce (current master, python 3.6): {code:python} import pendulum from datetime import datetime from airflow.utils.timezone import make_aware from airflow.models import DAG nsw = pendulum.Timezone.load("Australia/Sydney") dt = make_aware(datetime(2018, 10, 3, 2, 30), nsw) dag = DAG("id", schedule_interval="30 2 * * *", start_date=dt) dt = dag.following_schedule(dt); print(dt) dt = dag.following_schedule(dt); print(dt) dt = dag.following_schedule(dt); print(dt) dt = dag.following_schedule(dt); print(dt) dt = dag.following_schedule(dt); print(dt) dt = dag.following_schedule(dt); print(dt) dt = dag.following_schedule(dt); print(dt) dt = dag.following_schedule(dt); print(dt) dt = dag.following_schedule(dt); print(dt) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
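The reproduction above hinges on a wall-clock time that does not exist on the transition day: in 2018, Australia/Sydney skipped 02:00–03:00 on 7 October, so the "30 2 * * *" schedule has no valid tick that day. As a minimal, Airflow-independent sketch (this is not the fix from AIRFLOW-3277, just an illustration of the underlying gap), the skipped hour can be detected with the stdlib zoneinfo module (Python 3.9+) using the PEP 495 fold attribute:

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # stdlib since Python 3.9

SYDNEY = ZoneInfo("Australia/Sydney")

def local_time_exists(dt: datetime) -> bool:
    """Return True if this aware datetime's wall-clock time actually occurs.

    Per PEP 495, a time that falls in a DST "spring forward" gap resolves to
    different UTC offsets for fold=0 and fold=1, so comparing the two offsets
    detects the skipped hour.
    """
    return dt.utcoffset() == dt.replace(fold=1).utcoffset()

# 02:30 existed on 6 Oct 2018, but was skipped on 7 Oct 2018 when
# Australia/Sydney jumped from 02:00 directly to 03:00.
before_gap = datetime(2018, 10, 6, 2, 30, tzinfo=SYDNEY)
in_gap = datetime(2018, 10, 7, 2, 30, tzinfo=SYDNEY)
```

A scheduler that walks cron ticks through such a gap must explicitly skip or shift the nonexistent tick; a naive "advance until the next tick is later" loop can spin forever, which matches the behaviour reported here.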
[jira] [Comment Edited] (AIRFLOW-3415) Imports become null when triggering dagruns in a loop
[ https://issues.apache.org/jira/browse/AIRFLOW-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703594#comment-16703594 ] Yuri Bendana edited comment on AIRFLOW-3415 at 11/29/18 7:33 PM: - [~ashb], I added it. was (Author: ybendana): [~ashb], I attached it. > Imports become null when triggering dagruns in a loop > - > > Key: AIRFLOW-3415 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3415 > Project: Apache Airflow > Issue Type: Bug > Components: DagRun >Affects Versions: 1.10.1 > Environment: CentOS 7 >Reporter: Yuri Bendana >Priority: Minor > > When triggering dagruns in a loop, the imported references become null on the > second iteration. Here is an example > [gist|https://gist.github.com/ybendana/3bc0791fe00b099be04aca47a8d524c9]. For > the purposes here, you can ignore the DagRunSensor task. On the first > iteration the 'sleeper' dag gets triggered but on the second iteration I see a > {noformat} > TypeError: 'NoneType' object is not callable{noformat} > To workaround this, I have to copy the import (in this case trigger_dag) > inside the loop. 
> Here is the stacktrace: > {code:java} > [2018-11-28 18:23:42,492] {{models.py:1789}} INFO - All retries failed; > marking task as FAILED > [2018-11-28 18:23:42,517] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 Traceback (most recent call last): > [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 File "/opt/venv/sgmo/bin/airflow", line 32, in > [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 args.func(args) > [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 File > "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/utils/cli.py", line 74, > in wrapper > [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 return f(*args, **kwargs) > [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 File > "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/bin/cli.py", line 490, in > run > [2018-11-28 18:23:42,519] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 _run(args, dag, ti) > [2018-11-28 18:23:42,519] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 File > "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/bin/cli.py", line 406, in > _run > [2018-11-28 18:23:42,519] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 pool=args.pool, > [2018-11-28 18:23:42,519] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 File > "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/utils/db.py", line 74, in > wrapper > [2018-11-28 18:23:42,519] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 return func(*args, **kwargs) > [2018-11-28 18:23:42,520] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 File > "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/models.py", line 1659, in > _run_raw_task > [2018-11-28 18:23:42,520] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 result = 
task_copy.execute(context=context) > [2018-11-28 18:23:42,520] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 File > "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/operators/python_operator.py", > line 95, in execute > [2018-11-28 18:23:42,520] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 return_value = self.execute_callable() > [2018-11-28 18:23:42,520] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 File > "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/operators/python_operator.py", > line 100, in execute_callable > [2018-11-28 18:23:42,521] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 return self.python_callable(*self.op_args, **self.op_kwargs) > [2018-11-28 18:23:42,521] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 File "/opt/airflow/dags/sleepersensor.py", line 11, in > triggersleeper > [2018-11-28 18:23:42,521] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 tdr = trigger_dag('sleeper') > [2018-11-28 18:23:42,521] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 TypeError: 'NoneType' object is not callable > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
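The workaround the reporter describes (copying the import into the loop body) can be sketched generically. The following is a hypothetical stand-in illustration only: `json.dumps` plays the role of the imported `trigger_dag` callable, and the loop simulates the reported symptom of the module-level binding becoming None, since this report does not identify what actually clears the name inside Airflow:

```python
# Stand-in for "from airflow... import trigger_dag" -- json.dumps is used as
# the imported callable so the example is self-contained and runnable.
from json import dumps as trigger_dag  # hypothetical stand-in, not the Airflow API

def trigger_in_loop(n: int) -> list:
    """Call the imported callable n times, re-importing it when it turns None."""
    global trigger_dag
    results = []
    for i in range(n):
        if trigger_dag is None:
            # The reporter's workaround: re-import inside the loop so the
            # cleared module-level binding is refreshed before the next call.
            from json import dumps as _fresh
            trigger_dag = _fresh
        results.append(trigger_dag({"run": i}))
        trigger_dag = None  # simulate the binding being nulled between iterations
    return results
```

Without the re-import guard, the second iteration would raise exactly the reported `TypeError: 'NoneType' object is not callable`.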
[jira] [Created] (AIRFLOW-3421) CSS issue on Airflow 1.10 Tree view UI
Adityan created AIRFLOW-3421: Summary: CSS issue on Airflow 1.10 Tree view UI Key: AIRFLOW-3421 URL: https://issues.apache.org/jira/browse/AIRFLOW-3421 Project: Apache Airflow Issue Type: Bug Components: webserver Affects Versions: 1.10.1, 1.10.0 Reporter: Adityan Attachments: Screen Shot 2018-11-07 at 1.27.05 PM.png After upgrading to airflow 1.10, hovering the mouse over the rectangles in tree view causes the tooltip to pop up much higher than it used to be. In the screenshot attached, the right lowermost rectangle is the one that is being hovered over. Once you scroll down on the tree view, the tooltip starts floating up. Things I have tried to fix this behavior: 1. Change the css themes in webserver_config.py and restart the web server 2. Inspected the tooltip in Chrome; it seems to be a dynamically generated CSS class. The CSS class controlling this behavior seems to be the same in Airflow 1.9. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-3415) Imports become null when triggering dagruns in a loop
[ https://issues.apache.org/jira/browse/AIRFLOW-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuri Bendana updated AIRFLOW-3415: -- Description: When triggering dagruns in a loop, the imported references become null on the second iteration. Here is an example [gist|https://gist.github.com/ybendana/3bc0791fe00b099be04aca47a8d524c9]. For the purposes here, you can ignore the DagRunSensor task. On the first iteration the 'sleeper' dag gets triggered but on the second iteration I see a {noformat} TypeError: 'NoneType' object is not callable{noformat} To workaround this, I have to copy the import (in this case trigger_dag) inside the loop. Here is the stacktrace: {code:java} [2018-11-28 18:23:42,492] {{models.py:1789}} INFO - All retries failed; marking task as FAILED [2018-11-28 18:23:42,517] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 Traceback (most recent call last): [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/bin/airflow", line 32, in [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 args.func(args) [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/utils/cli.py", line 74, in wrapper [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 return f(*args, **kwargs) [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/bin/cli.py", line 490, in run [2018-11-28 18:23:42,519] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 _run(args, dag, ti) [2018-11-28 18:23:42,519] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/bin/cli.py", line 406, in _run [2018-11-28 18:23:42,519] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 pool=args.pool, 
[2018-11-28 18:23:42,519] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/utils/db.py", line 74, in wrapper [2018-11-28 18:23:42,519] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 return func(*args, **kwargs) [2018-11-28 18:23:42,520] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/models.py", line 1659, in _run_raw_task [2018-11-28 18:23:42,520] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 result = task_copy.execute(context=context) [2018-11-28 18:23:42,520] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/operators/python_operator.py", line 95, in execute [2018-11-28 18:23:42,520] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 return_value = self.execute_callable() [2018-11-28 18:23:42,520] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/operators/python_operator.py", line 100, in execute_callable [2018-11-28 18:23:42,521] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 return self.python_callable(*self.op_args, **self.op_kwargs) [2018-11-28 18:23:42,521] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/airflow/dags/sleepersensor.py", line 11, in triggersleeper [2018-11-28 18:23:42,521] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 tdr = trigger_dag('sleeper') [2018-11-28 18:23:42,521] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 TypeError: 'NoneType' object is not callable {code} was: When triggering dagruns in a loop, the imported references become null on the second iteration. Here is an example [gist|[https://gist.github.com/ybendana/3bc0791fe00b099be04aca47a8d524c9]]. {{[Atlassian|http://atlassian.com] }}For the purposes here, you can ignore the DagRunSensor task. 
On the first iteration the 'sleeper' dag gets triggered but on the second iteration I see a {noformat} TypeError: 'NoneType' object is not callable{noformat} To workaround this, I have to copy the import (in this case trigger_dag) inside the loop. Here is the stacktrace: {code:java} [2018-11-28 18:23:42,492] {{models.py:1789}} INFO - All retries failed; marking task as FAILED [2018-11-28 18:23:42,517] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 Traceback (most recent call last): [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/bin/airflow", line 32, in [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 args.func(args) [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File
[GitHub] ksolan opened a new pull request #4257: Ksolan/pendulum 2.x
ksolan opened a new pull request #4257: Ksolan/pendulum 2.x URL: https://github.com/apache/incubator-airflow/pull/4257 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-XXX - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [ ] Here are some details about my PR, including screenshots of any UI changes: ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [ ] Passes `flake8` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (AIRFLOW-3415) Imports become null when triggering dagruns in a loop
[ https://issues.apache.org/jira/browse/AIRFLOW-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuri Bendana updated AIRFLOW-3415: -- Description: When triggering dagruns in a loop, the imported references become null on the second iteration. Here is an example [gist|[https://gist.github.com/ybendana/3bc0791fe00b099be04aca47a8d524c9]]. {{[Atlassian|http://atlassian.com] }}For the purposes here, you can ignore the DagRunSensor task. On the first iteration the 'sleeper' dag gets triggered but on the second iteration I see a {noformat} TypeError: 'NoneType' object is not callable{noformat} To workaround this, I have to copy the import (in this case trigger_dag) inside the loop. Here is the stacktrace: {code:java} [2018-11-28 18:23:42,492] {{models.py:1789}} INFO - All retries failed; marking task as FAILED [2018-11-28 18:23:42,517] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 Traceback (most recent call last): [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/bin/airflow", line 32, in [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 args.func(args) [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/utils/cli.py", line 74, in wrapper [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 return f(*args, **kwargs) [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/bin/cli.py", line 490, in run [2018-11-28 18:23:42,519] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 _run(args, dag, ti) [2018-11-28 18:23:42,519] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/bin/cli.py", line 406, in _run [2018-11-28 18:23:42,519] {{base_task_runner.py:101}} INFO - Job 
1354170: Subtask t1 pool=args.pool, [2018-11-28 18:23:42,519] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/utils/db.py", line 74, in wrapper [2018-11-28 18:23:42,519] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 return func(*args, **kwargs) [2018-11-28 18:23:42,520] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/models.py", line 1659, in _run_raw_task [2018-11-28 18:23:42,520] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 result = task_copy.execute(context=context) [2018-11-28 18:23:42,520] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/operators/python_operator.py", line 95, in execute [2018-11-28 18:23:42,520] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 return_value = self.execute_callable() [2018-11-28 18:23:42,520] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/operators/python_operator.py", line 100, in execute_callable [2018-11-28 18:23:42,521] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 return self.python_callable(*self.op_args, **self.op_kwargs) [2018-11-28 18:23:42,521] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/airflow/dags/sleepersensor.py", line 11, in triggersleeper [2018-11-28 18:23:42,521] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 tdr = trigger_dag('sleeper') [2018-11-28 18:23:42,521] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 TypeError: 'NoneType' object is not callable {code} was: When triggering dagruns in a loop, the imported references become null on the second iteration. Here is an example [gist|[https://gist.github.com/ybendana/3bc0791fe00b099be04aca47a8d524c9]]. For the purposes here, you can ignore the DagRunSensor task. 
On the first iteration the 'sleeper' dag gets triggered but on the second iteration I see a {noformat} TypeError: 'NoneType' object is not callable{noformat} To workaround this, I have to copy the import (in this case trigger_dag) inside the loop. Here is the stacktrace: {code:java} [2018-11-28 18:23:42,492] {{models.py:1789}} INFO - All retries failed; marking task as FAILED [2018-11-28 18:23:42,517] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 Traceback (most recent call last): [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/bin/airflow", line 32, in [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 args.func(args) [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File
[jira] [Updated] (AIRFLOW-3415) Imports become null when triggering dagruns in a loop
[ https://issues.apache.org/jira/browse/AIRFLOW-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuri Bendana updated AIRFLOW-3415: -- Description: When triggering dagruns in a loop, the imported references become null on the second iteration. Here is an example [gist|[https://gist.github.com/ybendana/3bc0791fe00b099be04aca47a8d524c9]]. For the purposes here, you can ignore the DagRunSensor task. On the first iteration the 'sleeper' dag gets triggered but on the second iteration I see a {noformat} TypeError: 'NoneType' object is not callable{noformat} To workaround this, I have to copy the import (in this case trigger_dag) inside the loop. Here is the stacktrace: {code:java} [2018-11-28 18:23:42,492] {{models.py:1789}} INFO - All retries failed; marking task as FAILED [2018-11-28 18:23:42,517] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 Traceback (most recent call last): [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/bin/airflow", line 32, in [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 args.func(args) [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/utils/cli.py", line 74, in wrapper [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 return f(*args, **kwargs) [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/bin/cli.py", line 490, in run [2018-11-28 18:23:42,519] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 _run(args, dag, ti) [2018-11-28 18:23:42,519] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/bin/cli.py", line 406, in _run [2018-11-28 18:23:42,519] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 pool=args.pool, 
[2018-11-28 18:23:42,519] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/utils/db.py", line 74, in wrapper [2018-11-28 18:23:42,519] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 return func(*args, **kwargs) [2018-11-28 18:23:42,520] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/models.py", line 1659, in _run_raw_task [2018-11-28 18:23:42,520] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 result = task_copy.execute(context=context) [2018-11-28 18:23:42,520] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/operators/python_operator.py", line 95, in execute [2018-11-28 18:23:42,520] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 return_value = self.execute_callable() [2018-11-28 18:23:42,520] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/operators/python_operator.py", line 100, in execute_callable [2018-11-28 18:23:42,521] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 return self.python_callable(*self.op_args, **self.op_kwargs) [2018-11-28 18:23:42,521] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/airflow/dags/sleepersensor.py", line 11, in triggersleeper [2018-11-28 18:23:42,521] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 tdr = trigger_dag('sleeper') [2018-11-28 18:23:42,521] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 TypeError: 'NoneType' object is not callable {code} was: When triggering dagruns in a loop, the imported references become null on the second iteration. Here is an example [ gist | [https://gist.github.com/ybendana/3bc0791fe00b099be04aca47a8d524c9] ]. For the purposes here, you can ignore the DagRunSensor task. 
On the first iteration the 'sleeper' dag gets triggered but on the second iteration I see a {noformat} TypeError: 'NoneType' object is not callable{noformat} To workaround this, I have to copy the import (in this case trigger_dag) inside the loop. Here is the stacktrace: {code:java} [2018-11-28 18:23:42,492] {{models.py:1789}} INFO - All retries failed; marking task as FAILED [2018-11-28 18:23:42,517] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 Traceback (most recent call last): [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/bin/airflow", line 32, in [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 args.func(args) [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/utils/cli.py", line 74,
[jira] [Updated] (AIRFLOW-3415) Imports become null when triggering dagruns in a loop
[ https://issues.apache.org/jira/browse/AIRFLOW-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuri Bendana updated AIRFLOW-3415: -- Attachment: (was: triggerdag_stacktrace.txt) > Imports become null when triggering dagruns in a loop > - > > Key: AIRFLOW-3415 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3415 > Project: Apache Airflow > Issue Type: Bug > Components: DagRun >Affects Versions: 1.10.1 > Environment: CentOS 7 >Reporter: Yuri Bendana >Priority: Minor > > When triggering dagruns in a loop, the imported references become null on the > second iteration. Here is an example [ gist | > [https://gist.github.com/ybendana/3bc0791fe00b099be04aca47a8d524c9] ]. For > the purposes here, you can ignore the DagRunSensor task. On the first > iteration the 'sleeper' dag gets triggered but on the second iteration I see a > {noformat} > TypeError: 'NoneType' object is not callable{noformat} > To workaround this, I have to copy the import (in this case trigger_dag) > inside the loop. 
> Here is the stacktrace: > {code:java} > [2018-11-28 18:23:42,492] {{models.py:1789}} INFO - All retries failed; > marking task as FAILED > [2018-11-28 18:23:42,517] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 Traceback (most recent call last): > [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 File "/opt/venv/sgmo/bin/airflow", line 32, in > [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 args.func(args) > [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 File > "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/utils/cli.py", line 74, > in wrapper > [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 return f(*args, **kwargs) > [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 File > "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/bin/cli.py", line 490, in > run > [2018-11-28 18:23:42,519] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 _run(args, dag, ti) > [2018-11-28 18:23:42,519] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 File > "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/bin/cli.py", line 406, in > _run > [2018-11-28 18:23:42,519] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 pool=args.pool, > [2018-11-28 18:23:42,519] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 File > "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/utils/db.py", line 74, in > wrapper > [2018-11-28 18:23:42,519] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 return func(*args, **kwargs) > [2018-11-28 18:23:42,520] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 File > "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/models.py", line 1659, in > _run_raw_task > [2018-11-28 18:23:42,520] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 result = 
task_copy.execute(context=context) > [2018-11-28 18:23:42,520] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 File > "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/operators/python_operator.py", > line 95, in execute > [2018-11-28 18:23:42,520] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 return_value = self.execute_callable() > [2018-11-28 18:23:42,520] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 File > "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/operators/python_operator.py", > line 100, in execute_callable > [2018-11-28 18:23:42,521] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 return self.python_callable(*self.op_args, **self.op_kwargs) > [2018-11-28 18:23:42,521] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 File "/opt/airflow/dags/sleepersensor.py", line 11, in > triggersleeper > [2018-11-28 18:23:42,521] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 tdr = trigger_dag('sleeper') > [2018-11-28 18:23:42,521] {{base_task_runner.py:101}} INFO - Job 1354170: > Subtask t1 TypeError: 'NoneType' object is not callable > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-3415) Imports become null when triggering dagruns in a loop
[ https://issues.apache.org/jira/browse/AIRFLOW-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuri Bendana updated AIRFLOW-3415: -- Description: When triggering dagruns in a loop, the imported references become null on the second iteration. Here is an example [ gist | [https://gist.github.com/ybendana/3bc0791fe00b099be04aca47a8d524c9] ]. For the purposes here, you can ignore the DagRunSensor task. On the first iteration the 'sleeper' dag gets triggered but on the second iteration I see a {noformat} TypeError: 'NoneType' object is not callable{noformat} To workaround this, I have to copy the import (in this case trigger_dag) inside the loop. Here is the stacktrace: {code:java} [2018-11-28 18:23:42,492] {{models.py:1789}} INFO - All retries failed; marking task as FAILED [2018-11-28 18:23:42,517] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 Traceback (most recent call last): [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/bin/airflow", line 32, in [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 args.func(args) [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/utils/cli.py", line 74, in wrapper [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 return f(*args, **kwargs) [2018-11-28 18:23:42,518] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/bin/cli.py", line 490, in run [2018-11-28 18:23:42,519] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 _run(args, dag, ti) [2018-11-28 18:23:42,519] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/bin/cli.py", line 406, in _run [2018-11-28 18:23:42,519] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 pool=args.pool, 
[2018-11-28 18:23:42,519] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/utils/db.py", line 74, in wrapper [2018-11-28 18:23:42,519] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 return func(*args, **kwargs) [2018-11-28 18:23:42,520] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/models.py", line 1659, in _run_raw_task [2018-11-28 18:23:42,520] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 result = task_copy.execute(context=context) [2018-11-28 18:23:42,520] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/operators/python_operator.py", line 95, in execute [2018-11-28 18:23:42,520] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 return_value = self.execute_callable() [2018-11-28 18:23:42,520] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/venv/sgmo/lib/python2.7/site-packages/airflow/operators/python_operator.py", line 100, in execute_callable [2018-11-28 18:23:42,521] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 return self.python_callable(*self.op_args, **self.op_kwargs) [2018-11-28 18:23:42,521] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 File "/opt/airflow/dags/sleepersensor.py", line 11, in triggersleeper [2018-11-28 18:23:42,521] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 tdr = trigger_dag('sleeper') [2018-11-28 18:23:42,521] {{base_task_runner.py:101}} INFO - Job 1354170: Subtask t1 TypeError: 'NoneType' object is not callable {code} was: When triggering dagruns in a loop, the imported references become null on the second iteration. Here is an example [ gist | [https://gist.github.com/ybendana/3bc0791fe00b099be04aca47a8d524c9] ]. For the purposes here, you can ignore the DagRunSensor task. 
On the first iteration the 'sleeper' dag gets triggered but on the second iteration I see a {noformat} TypeError: 'NoneType' object is not callable{noformat} To workaround this, I have to copy the import (in this case trigger_dag) inside the loop. > Imports become null when triggering dagruns in a loop > - > > Key: AIRFLOW-3415 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3415 > Project: Apache Airflow > Issue Type: Bug > Components: DagRun >Affects Versions: 1.10.1 > Environment: CentOS 7 >Reporter: Yuri Bendana >Priority: Minor > Attachments: triggerdag_stacktrace.txt > > > When triggering dagruns in a loop, the imported references become null on the > second iteration. Here is an example [ gist | >
[jira] [Commented] (AIRFLOW-3415) Imports become null when triggering dagruns in a loop
[ https://issues.apache.org/jira/browse/AIRFLOW-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703594#comment-16703594 ] Yuri Bendana commented on AIRFLOW-3415: --- [~ashb], I attached it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-3415) Imports become null when triggering dagruns in a loop
[ https://issues.apache.org/jira/browse/AIRFLOW-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuri Bendana updated AIRFLOW-3415: -- Attachment: triggerdag_stacktrace.txt
[jira] [Created] (AIRFLOW-3420) Kubernetes executor sync floods the logs
Daniel Mateus Pires created AIRFLOW-3420: Summary: Kubernetes executor sync floods the logs Key: AIRFLOW-3420 URL: https://issues.apache.org/jira/browse/AIRFLOW-3420 Project: Apache Airflow Issue Type: Bug Reporter: Daniel Mateus Pires Hi there, I'm using Airflow on Kubernetes with the Kubernetes Executor and I'm getting flooded by: [2018-11-29 17:17:23,427] \{kubernetes_executor.py:570} INFO - self.running: {('', '', datetime.datetime(2018, 11, 29, 17, 6, 32, 935655, tzinfo=)) This logging happens at a very high frequency (I think I'm getting in the order of 1000s of entries per second coming just from this line) Could we move this logging to a place that is called less frequently or set the LEVEL to DEBUG ? To give you more context, it renders my logs quite hard to browse / or even get through kubectl, I'm suspecting it is the cause of my scheduler instance pod being Evicted every few hours for DiskPressure on the imagefs Those logs come from [https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/executors/kubernetes_executor.py#L593]
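Until the log line moves to DEBUG, one possible mitigation is to raise the threshold on just that module's logger. This is a minimal sketch; the logger name is an assumption derived from the module path linked above, and a real deployment would set this in its logging configuration rather than inline.

```python
import logging

# Assumed logger name, derived from the module path
# airflow/contrib/executors/kubernetes_executor.py; verify in your install.
EXECUTOR_LOGGER = "airflow.contrib.executors.kubernetes_executor"

def quiet_executor_sync_logs(level=logging.WARNING):
    # Drop per-sync INFO chatter (like the self.running line above)
    # without touching the rest of Airflow's logging.
    logger = logging.getLogger(EXECUTOR_LOGGER)
    logger.setLevel(level)
    return logger
```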
[GitHub] codecov-io commented on issue #4256: [AIRFLOW-3417] Use the platformVersion only for the FARGATE launch type
codecov-io commented on issue #4256: [AIRFLOW-3417] Use the platformVersion only for the FARGATE launch type URL: https://github.com/apache/incubator-airflow/pull/4256#issuecomment-442901660 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4256?src=pr=h1) Report > Merging [#4256](https://codecov.io/gh/apache/incubator-airflow/pull/4256?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/d1d612e5c78c89487a05ab3cc86f7d3472ac2a5d?src=pr=desc) will **not change** coverage. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/4256/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4256?src=pr=tree)

```diff
@@           Coverage Diff            @@
##           master    #4256   +/-   ##
=======================================
  Coverage   78.07%   78.07%
=======================================
  Files         201      201
  Lines       16455    16455
=======================================
  Hits        12848    12848
  Misses       3607     3607
```

-- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4256?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4256?src=pr=footer). Last update [d1d612e...a59e0e8](https://codecov.io/gh/apache/incubator-airflow/pull/4256?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (AIRFLOW-3419) S3_hook.select_key is broken on Python3
[ https://issues.apache.org/jira/browse/AIRFLOW-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor updated AIRFLOW-3419: --- Summary: S3_hook.select_key is broken on Python3 (was: S3_hook.select_key is broken) > S3_hook.select_key is broken on Python3 > --- > > Key: AIRFLOW-3419 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3419 > Project: Apache Airflow > Issue Type: Bug > Components: boto3, hooks >Affects Versions: 1.10.1 >Reporter: Maria Rebelka >Priority: Major > > Hello, > Using select_key throws an error: > {quote}text = S3Hook('aws_conn').select_key(key='my_key', > bucket_name='my_bucket', > expression='SELECT * FROM S3Object s', > expression_type='SQL', > input_serialization={'JSON': \{'Type': > 'DOCUMENT'}}, > output_serialization={'JSON': {}}){quote} > Traceback (most recent call last): > {quote} File "db.py", line 31, in > output_serialization={'JSON': {}}) > File "/usr/local/lib/python3.4/site-packages/airflow/hooks/S3_hook.py", > line 262, in select_key > for event in response['Payload'] > TypeError: sequence item 0: expected str instance, bytes found{quote} > Seems that the problem is in this line: > S3_hook.py, line 262: return ''.join(event['Records']['Payload'] > which probably should be return > ''.join(event['Records']['Payload'].decode('utf-8') > From example in Amazon blog: > https://aws.amazon.com/blogs/aws/s3-glacier-select/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-3419) S3_hook.select_key is broken
Maria Rebelka created AIRFLOW-3419: -- Summary: S3_hook.select_key is broken Key: AIRFLOW-3419 URL: https://issues.apache.org/jira/browse/AIRFLOW-3419 Project: Apache Airflow Issue Type: Bug Components: boto3, hooks Affects Versions: 1.10.1 Reporter: Maria Rebelka Hello, Using select_key throws an error: {quote}text = S3Hook('aws_conn').select_key(key='my_key', bucket_name='my_bucket', expression='SELECT * FROM S3Object s', expression_type='SQL', input_serialization={'JSON': \{'Type': 'DOCUMENT'}}, output_serialization={'JSON': {}}){quote} Traceback (most recent call last): {quote} File "db.py", line 31, in output_serialization={'JSON': {}}) File "/usr/local/lib/python3.4/site-packages/airflow/hooks/S3_hook.py", line 262, in select_key for event in response['Payload'] TypeError: sequence item 0: expected str instance, bytes found{quote} Seems that the problem is in this line: S3_hook.py, line 262: return ''.join(event['Records']['Payload'] which probably should be return ''.join(event['Records']['Payload'].decode('utf-8') >From example in Amazon blog: https://aws.amazon.com/blogs/aws/s3-glacier-select/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
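The one-line fix proposed in the report can be factored out and checked in isolation. The helper below is a hedged sketch that mirrors the joining expression quoted from S3_hook.py line 262, not the actual hook code: on Python 3 each event's payload arrives as bytes, so every chunk must be decoded before `''.join(...)`.

```python
def join_select_payload(events, encoding='utf-8'):
    # Mirrors the reported fix for S3_hook.select_key: decode each
    # bytes chunk before joining, instead of joining bytes into a str.
    return ''.join(
        event['Records']['Payload'].decode(encoding)
        for event in events
        if 'Records' in event  # S3 Select streams also emit Stats/End events
    )
```

With boto3's `select_object_content`, the `events` iterable here corresponds to `response['Payload']` in the traceback above.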
[01/38] incubator-airflow-site git commit: Docs from 1.10.1
Repository: incubator-airflow-site Updated Branches: refs/heads/asf-site 7d4d76286 -> 1f06fa0e0 http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/security.html -- diff --git a/security.html b/security.html index 6bf00b5..fa4fa71 100644 --- a/security.html +++ b/security.html @@ -206,6 +206,13 @@ SSH tunnels. It is however possible to switch on authentication by either using one of the supplied backends or creating your own. Be sure to checkout Experimental Rest API for securing the API. + +Note +Airflow uses the config parser of Python. This config parser interpolates +'%'-signs. Make sure escape any % signs in your config file (but not +environment variables) as %%, otherwise Airflow might leak these +passwords on a config parser exception to a log. + Web Authentication¶ @@ -244,8 +251,7 @@ Type help, copyright& LDAP¶ To turn on LDAP authentication configure your airflow.cfg as follows. Please note that the example uses -an encrypted connection to the ldap server as you probably do not want passwords be readable on the network level. -It is however possible to configure without encryption if you really want to. +an encrypted connection to the ldap server as we do not want passwords be readable on the network level. Additionally, if you are using Active Directory, and are not explicitly specifying an OU that your users are in, you will need to change search_scope to 'SUBTREE'. Valid search_scope options can be found in the http://ldap3.readthedocs.org/searches.html?highlight=search_scope;>ldap3 Documentation @@ -439,7 +445,7 @@ backend. In order to setup an application: Google Authentication¶ The Google authentication backend can be used to authenticate users -against Google using OAuth2. You must specify the email domains to restrict +against Google using OAuth2. You must specify the domains to restrict login, separated with a comma, to only members of those domains. 
[webserver] authenticate = True @@ -488,10 +494,10 @@ standard port 443, you'll need to configure that too. Be aware that super user Enable CeleryExecutor with SSL. Ensure you properly generate client and server certs and keys. [celery] -CELERY_SSL_ACTIVE = True -CELERY_SSL_KEY = path to key -CELERY_SSL_CERT = path to cert -CELERY_SSL_CACERT = path to cacert +ssl_active = True +ssl_key = path to key +ssl_cert = path to cert +ssl_cacert = path to cacert @@ -560,20 +566,13 @@ not set. - -var DOCUMENTATION_OPTIONS = { -URL_ROOT:'./', -VERSION:'', -LANGUAGE:'None', -COLLAPSE_INDEX:false, -FILE_SUFFIX:'.html', -HAS_SOURCE: true, -SOURCELINK_SUFFIX: '.txt' -}; - - - - + + + + + + + http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/start.html -- diff --git a/start.html b/start.html index b5fdcdb..1846ed2 100644 --- a/start.html +++ b/start.html @@ -253,20 +253,13 @@ airflow backfill example_bash_operator -s 2015-01-01 -e < - -var DOCUMENTATION_OPTIONS = { -URL_ROOT:'./', -VERSION:'', -LANGUAGE:'None', -COLLAPSE_INDEX:false, -FILE_SUFFIX:'.html', -HAS_SOURCE: true, -SOURCELINK_SUFFIX: '.txt' -}; - - - - + + + + + + + http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/timezone.html -- diff --git a/timezone.html b/timezone.html index 587f7d6..7fc2526 100644 --- a/timezone.html +++ b/timezone.html @@ -266,6 +266,10 @@ recommended to use pendulum for this, but pytz (to be print(dag.timezone) # Timezone [Europe/Amsterdam] +Please note that while it is possible to set a start_date and end_date for Tasks always the DAG timezone +or global timezone (in that order) will be used to calculate the next execution date. Upon first encounter +the start date or end date will be converted to UTC using the timezone associated with start_date or end_date, +then for calculations this timezone information will be disregarded. 
Templates¶ Airflow returns time zone aware datetimes in templates, but does not convert them to local time so they remain in UTC. @@ -331,20 +335,13 @@ be taken into account. - -var DOCUMENTATION_OPTIONS = { -URL_ROOT:'./', -VERSION:'', -LANGUAGE:'None', -COLLAPSE_INDEX:false, -FILE_SUFFIX:'.html', -HAS_SOURCE: true, -SOURCELINK_SUFFIX: '.txt' -}; - - - - + + + + + +
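The security-doc note in the diff above (escape `%` as `%%`) can be checked directly with Python's own `configparser`, which is what Airflow's config loading builds on. This is a standalone illustration, not Airflow code; the section and option names are made up.

```python
from configparser import ConfigParser, InterpolationSyntaxError

def read_password(cfg_text):
    # Parse an airflow.cfg-style snippet with default ('%'-interpolating)
    # behaviour and return the password option.
    parser = ConfigParser()
    parser.read_string(cfg_text)
    return parser.get("core", "password")

# Escaped form: '%%' reads back as a literal '%'.
ok = read_password("[core]\npassword = p%%40ss\n")  # -> "p%40ss"

# Unescaped form: a bare '%' makes the parser raise, and the exception
# text carries the offending value, which is how a password can leak
# into a log on a config parser exception, as the note warns.
try:
    read_password("[core]\npassword = p%40ss\n")
except InterpolationSyntaxError as exc:
    leaked = "%40ss" in str(exc)
```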
[11/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/cli.html -- diff --git a/cli.html b/cli.html index 7db5974..88a75ad 100644 --- a/cli.html +++ b/cli.html @@ -91,20 +91,24 @@ Command Line Interface Positional Arguments Sub-commands: -resetdb +version +initdb +upgradedb +delete_dag +Positional Arguments Named Arguments -render -Positional Arguments +task_state +Positional Arguments Named Arguments -variables +list_dags Named Arguments -connections +resetdb Named Arguments @@ -112,95 +116,91 @@ Named Arguments -pause -Positional Arguments +webserver Named Arguments -task_failed_deps -Positional Arguments +pool Named Arguments -version -trigger_dag -Positional Arguments +scheduler Named Arguments -initdb -test -Positional Arguments +serve_logs +clear +Positional Arguments Named Arguments -unpause -Positional Arguments +trigger_dag +Positional Arguments Named Arguments -dag_state -Positional Arguments +test +Positional Arguments Named Arguments -run -Positional Arguments +connections Named Arguments -list_tasks -Positional Arguments +worker Named Arguments -backfill -Positional Arguments +kerberos +Positional Arguments Named Arguments -list_dags +pause +Positional Arguments Named Arguments -kerberos -Positional Arguments +task_failed_deps +Positional Arguments Named Arguments -worker +render +Positional Arguments Named Arguments -webserver +run +Positional Arguments Named Arguments -flower +list_tasks +Positional Arguments Named Arguments -scheduler +backfill +Positional Arguments Named Arguments -task_state -Positional Arguments +dag_state +Positional Arguments Named Arguments -pool +variables Named Arguments -serve_logs -clear -Positional Arguments +flower Named Arguments -upgradedb -delete_dag +unpause Positional Arguments Named Arguments @@ -288,7 +288,7 @@ many types of operation on a DAG, starting services, and supporting development and testing. 
usage: airflow [-h] - {resetdb,render,variables,connections,create_user,pause,task_failed_deps,version,trigger_dag,initdb,test,unpause,dag_state,run,list_tasks,backfill,list_dags,kerberos,worker,webserver,flower,scheduler,task_state,pool,serve_logs,clear,upgradedb,delete_dag} + {version,initdb,upgradedb,delete_dag,task_state,list_dags,resetdb,create_user,webserver,pool,scheduler,serve_logs,clear,trigger_dag,test,connections,worker,kerberos,pause,task_failed_deps,render,run,list_tasks,backfill,dag_state,variables,flower,unpause} ... @@ -300,7 +300,7 @@ development and testing. subcommand -Possible choices: resetdb, render, variables, connections, create_user, pause, task_failed_deps, version, trigger_dag, initdb, test, unpause, dag_state, run, list_tasks, backfill, list_dags, kerberos, worker, webserver, flower, scheduler, task_state, pool, serve_logs, clear, upgradedb, delete_dag +Possible choices: version, initdb, upgradedb, delete_dag, task_state, list_dags, resetdb, create_user, webserver, pool, scheduler, serve_logs, clear, trigger_dag, test, connections, worker, kerberos, pause, task_failed_deps, render, run, list_tasks, backfill, dag_state, variables, flower, unpause sub-command help @@ -308,12 +308,45 @@ development and testing. Sub-commands:¶ - -resetdb¶ -Burn down and rebuild the metadata database -airflow resetdb [-h] [-y] + +version¶ +Show the version +airflow version [-h] + + + + +initdb¶ +Initialize the metadata database +airflow initdb [-h] + + + + +upgradedb¶ +Upgrade the metadata database to latest version +airflow upgradedb [-h] + + + + +delete_dag¶ +Delete all DB records related to the specified DAG +airflow delete_dag [-h] [-y] dag_id + +Positional Arguments¶ + + + + + +dag_id +The id of the dag + + + Named Arguments¶ @@ -329,14 +362,14 @@ development and testing. 
- -render¶ -Render a task instance's template(s) -airflow render [-h] [-sd SUBDIR] dag_id task_id execution_date + +task_state¶ +Get the status of a task instance +airflow task_state [-h] [-sd SUBDIR] dag_id task_id execution_date - -Positional Arguments¶ + +Positional Arguments¶ @@ -361,18 +394,17 @@ development and testing. -sd, --subdir -File location or directory from which to look for the dag -Default: /Users/kaxil/airflow/dags +File location or directory from which to look for the dag. Defaults to '[AIRFLOW_HOME]/dags' where [AIRFLOW_HOME] is the value you set for 'AIRFLOW_HOME' config you set in 'airflow.cfg' +Default: '[AIRFLOW_HOME]/dags' - -variables¶ -CRUD operations on variables -airflow variables [-h] [-s KEY VAL] [-g KEY] [-j] [-d VAL] [-i FILEPATH] - [-e FILEPATH] [-x KEY] + +list_dags¶ +List all the DAGs +airflow list_dags [-h] [-sd SUBDIR] [-r] @@ -382,40 +414,23 @@ development
[08/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/genindex.html -- diff --git a/genindex.html b/genindex.html index 71dfe50..f5fd1b9 100644 --- a/genindex.html +++ b/genindex.html @@ -215,12 +215,12 @@ are_dependents_done() (airflow.models.TaskInstance method) - await() (airflow.contrib.hooks.gcp_dataproc_hook.DataProcHook method) - AWSBatchOperator (class in airflow.contrib.operators.awsbatch_operator), [1] AwsDynamoDBHook (class in airflow.contrib.hooks.aws_dynamodb_hook) + AwsFirehoseHook (class in airflow.contrib.hooks.aws_firehose_hook) + AwsHook (class in airflow.contrib.hooks.aws_hook) AwsLambdaHook (class in airflow.contrib.hooks.aws_lambda_hook) @@ -255,6 +255,8 @@ BigQueryCreateExternalTableOperator (class in airflow.contrib.operators.bigquery_operator), [1] + BigQueryDeleteDatasetOperator (class in airflow.contrib.operators.bigquery_operator), [1] + BigQueryGetDataOperator (class in airflow.contrib.operators.bigquery_get_data), [1] BigQueryHook (class in airflow.contrib.hooks.bigquery_hook), [1] @@ -307,6 +309,8 @@ CassandraHook (class in airflow.contrib.hooks.cassandra_hook) + CassandraRecordSensor (class in airflow.contrib.sensors.cassandra_sensor) + CassandraToGoogleCloudStorageOperator (class in airflow.contrib.operators.cassandra_to_gcs) CeleryExecutor (class in airflow.executors.celery_executor) @@ -343,6 +347,16 @@ check_response() (airflow.hooks.http_hook.HttpHook method) + check_s3_url() (airflow.contrib.hooks.sagemaker_hook.SageMakerHook method) + + check_status() (airflow.contrib.hooks.sagemaker_hook.SageMakerHook method) + + check_training_config() (airflow.contrib.hooks.sagemaker_hook.SageMakerHook method) + + check_training_status_with_log() (airflow.contrib.hooks.sagemaker_hook.SageMakerHook method) + + check_tuning_config() (airflow.contrib.hooks.sagemaker_hook.SageMakerHook method) + CheckOperator (class in airflow.operators.check_operator) clear() (airflow.models.BaseOperator method), [1] @@ -367,20 +381,38 @@ 
CloudantHook (class in airflow.contrib.hooks.cloudant_hook) - cluster_status() (airflow.contrib.hooks.redshift_hook.RedshiftHook method), [1] + CloudSqlHook (class in airflow.contrib.hooks.gcp_sql_hook) - collect_dags() (airflow.models.DagBag method) + CloudSqlInstanceCreateOperator (class in airflow.contrib.operators.gcp_sql_operator) + + CloudSqlInstanceDatabaseCreateOperator (class in airflow.contrib.operators.gcp_sql_operator) + + CloudSqlInstanceDatabaseDeleteOperator (class in airflow.contrib.operators.gcp_sql_operator) + + CloudSqlInstanceDatabasePatchOperator (class in airflow.contrib.operators.gcp_sql_operator) + + CloudSqlInstanceDeleteOperator (class in airflow.contrib.operators.gcp_sql_operator) + + CloudSqlInstancePatchOperator (class in airflow.contrib.operators.gcp_sql_operator) + + cluster_status() (airflow.contrib.hooks.redshift_hook.RedshiftHook method), [1] + collect_dags() (airflow.models.DagBag method) + command() (airflow.models.TaskInstance method) command_as_list() (airflow.models.TaskInstance method) commit() (airflow.contrib.hooks.datastore_hook.DatastoreHook method), [1] + concurrency (airflow.models.DAG attribute) + concurrency_reached (airflow.models.DAG attribute) + configure_s3_resources() (airflow.contrib.hooks.sagemaker_hook.SageMakerHook method) + Connection (class in airflow.models) construct_api_call_params() (airflow.operators.slack_operator.SlackAPIOperator method) @@ -403,16 +435,24 @@ copy_expert() (airflow.hooks.postgres_hook.PostgresHook method) + copy_object() (airflow.hooks.S3_hook.S3Hook method), [1] + create() (airflow.models.DagStat static method) create_bucket() (airflow.contrib.hooks.gcs_hook.GoogleCloudStorageHook method), [1] + + +(airflow.hooks.S3_hook.S3Hook method), [1] + create_cluster() (airflow.contrib.hooks.gcp_container_hook.GKEClusterHook method), [1] create_cluster_snapshot() (airflow.contrib.hooks.redshift_hook.RedshiftHook method), [1] create_dagrun() (airflow.models.DAG method) + create_database() 
(airflow.contrib.hooks.gcp_sql_hook.CloudSqlHook method) + create_directory() (airflow.contrib.hooks.azure_fileshare_hook.AzureFileShareHook method), [1] @@ -421,17 +461,33 @@ (airflow.contrib.hooks.sftp_hook.SFTPHook method) + create_endpoint() (airflow.contrib.hooks.sagemaker_hook.SageMakerHook method) + + create_endpoint_config() (airflow.contrib.hooks.sagemaker_hook.SageMakerHook method) + + create_instance()
[02/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/searchindex.js -- diff --git a/searchindex.js b/searchindex.js index 1a79f0c..59f653f 100644 --- a/searchindex.js +++ b/searchindex.js (regenerated Sphinx search index; minified single-line JavaScript diff omitted)
[18/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/models.html -- diff --git a/_modules/airflow/models.html b/_modules/airflow/models.html index 6860ce6..49046b8 100644 --- a/_modules/airflow/models.html +++ b/_modules/airflow/models.html @@ -202,6 +202,7 @@ import logging import numbers import os +import pendulum import pickle import re import signal @@ -222,7 +223,9 @@ from sqlalchemy.ext.declarative import declarative_base, declared_attr from sqlalchemy.orm import reconstructor, relationship, synonym -from croniter import croniter +from croniter import ( +croniter, CroniterBadCronError, CroniterBadDateError, CroniterNotAlphaError +) import six from airflow import settings, utils @@ -302,6 +305,8 @@ :raises: AirflowException if theres a problem trying to load Fernet global _fernet +log = LoggingMixin().log + if _fernet: return _fernet try: @@ -310,18 +315,26 @@ InvalidFernetToken = InvalidToken except BuiltinImportError: -LoggingMixin().log.warn(cryptography not found - values will not be stored -encrypted., -exc_info=1) +log.warning( +cryptography not found - values will not be stored encrypted. +) _fernet = NullFernet() return _fernet try: -_fernet = Fernet(configuration.conf.get(core, FERNET_KEY).encode(utf-8)) -_fernet.is_encrypted = True -return _fernet +fernet_key = configuration.conf.get(core, FERNET_KEY) +if not fernet_key: +log.warning( +empty cryptography key - values will not be stored encrypted. 
+) +_fernet = NullFernet() +else: +_fernet = Fernet(fernet_key.encode(utf-8)) +_fernet.is_encrypted = True except (ValueError, TypeError) as ve: -raise AirflowException(Could not create Fernet object: {}.format(ve)) +raise AirflowException(Could not create Fernet object: {}.format(ve)) + +return _fernet # Used by DAG context_managers @@ -496,7 +509,8 @@ return found_dags mods = [] -if not zipfile.is_zipfile(filepath): +is_zipfile = zipfile.is_zipfile(filepath) +if not is_zipfile: if safe_mode and os.path.isfile(filepath): with open(filepath, rb) as f: content = f.read() @@ -568,13 +582,23 @@ if isinstance(dag, DAG): if not dag.full_filepath: dag.full_filepath = filepath -if dag.fileloc != filepath: +if dag.fileloc != filepath and not is_zipfile: dag.fileloc = filepath try: dag.is_subdag = False self.bag_dag(dag, parent_dag=dag, root_dag=dag) +if isinstance(dag._schedule_interval, six.string_types): +croniter(dag._schedule_interval) found_dags.append(dag) found_dags += dag.subdags +except (CroniterBadCronError, +CroniterBadDateError, +CroniterNotAlphaError) as cron_e: +self.log.exception(Failed to bag_dag: %s, dag.full_filepath) +self.import_errors[dag.full_filepath] = \ +Invalid Cron expression: + str(cron_e) +self.file_last_changed[dag.full_filepath] = \ +file_last_changed_on_disk except AirflowDagCycleException as cycle_exception: self.log.exception(Failed to bag_dag: %s, dag.full_filepath) self.import_errors[dag.full_filepath] = str(cycle_exception) @@ -669,10 +693,12 @@ Given a file path or a folder, this method looks for python modules, imports them and adds them to the dagbag collection. -Note that if a .airflowignore file is found while processing, -the directory, it will behaves much like a .gitignore does, +Note that if a ``.airflowignore`` file is found while processing +the directory, it will behave much like a ``.gitignore``, ignoring files that match any of the regex patterns specified -in the file. 
**Note**: The patterns in .airflowignore are treated as +in the file. + +**Note**: The patterns in .airflowignore are treated as un-anchored regexes, not shell-like glob patterns. start_dttm = timezone.utcnow() @@ -1268,10 +1294,10 @@ BASE_URL = configuration.conf.get(webserver, BASE_URL) if
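The `get_fernet` change in the models.py diff above boils down to: a missing key (or a missing cryptography package) now yields a do-nothing `NullFernet` instead of a crash. Below is a simplified, dependency-free sketch of that control flow; the real code lives in airflow/models.py and uses `cryptography.fernet.Fernet`, for which `fernet_cls` stands in here.

```python
class NullFernet:
    # Fallback used when no usable key is configured: values pass
    # through unencrypted, mirroring the diff's warning path.
    is_encrypted = False

    def encrypt(self, data):
        return data

    def decrypt(self, data):
        return data


def get_fernet(fernet_key, fernet_cls=None):
    # fernet_cls stands in for cryptography.fernet.Fernet so the
    # sketch stays importable without the cryptography package.
    if not fernet_key:
        # "empty cryptography key - values will not be stored encrypted."
        return NullFernet()
    fernet = fernet_cls(fernet_key.encode("utf-8"))
    fernet.is_encrypted = True
    return fernet
```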
[36/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/hooks/databricks_hook.html -- diff --git a/_modules/airflow/contrib/hooks/databricks_hook.html b/_modules/airflow/contrib/hooks/databricks_hook.html index 0d21a4a..0493901 100644 --- a/_modules/airflow/contrib/hooks/databricks_hook.html +++ b/_modules/airflow/contrib/hooks/databricks_hook.html @@ -185,6 +185,7 @@ from airflow.hooks.base_hook import BaseHook from requests import exceptions as requests_exceptions from requests.auth import AuthBase +from time import sleep from airflow.utils.log.logging_mixin import LoggingMixin @@ -193,6 +194,9 @@ except ImportError: import urlparse +RESTART_CLUSTER_ENDPOINT = (POST, api/2.0/clusters/restart) +START_CLUSTER_ENDPOINT = (POST, api/2.0/clusters/start) +TERMINATE_CLUSTER_ENDPOINT = (POST, api/2.0/clusters/delete) SUBMIT_RUN_ENDPOINT = (POST, api/2.0/jobs/runs/submit) GET_RUN_ENDPOINT = (GET, api/2.0/jobs/runs/get) @@ -208,7 +212,8 @@ self, databricks_conn_id=databricks_default, timeout_seconds=180, -retry_limit=3): +retry_limit=3, +retry_delay=1.0): :param databricks_conn_id: The name of the databricks connection to use. :type databricks_conn_id: string @@ -218,12 +223,17 @@ :param retry_limit: The number of times to retry the connection in case of service outages. :type retry_limit: int +:param retry_delay: The number of seconds to wait between retries (it +might be a floating point number). 
+:type retry_delay: float self.databricks_conn_id = databricks_conn_id self.databricks_conn = self.get_connection(databricks_conn_id) self.timeout_seconds = timeout_seconds -assert retry_limit >= 1, "Retry limit must be greater than equal to 1" +if retry_limit < 1: +raise ValueError("Retry limit must be greater than equal to 1") self.retry_limit = retry_limit +self.retry_delay = retry_delay def _parse_host(self, host): @@ -278,7 +288,8 @@ else: raise AirflowException("Unexpected HTTP Method: " + method) -for attempt_num in range(1, self.retry_limit + 1): +attempt_num = 1 +while True: try: response = request_func( url, @@ -286,21 +297,29 @@ auth=auth, headers=USER_AGENT_HEADER, timeout=self.timeout_seconds) -if response.status_code == requests.codes.ok: -return response.json() -else: +response.raise_for_status() +return response.json() +except requests_exceptions.RequestException as e: +if not _retryable_error(e): # In this case, the user probably made a mistake. # Don't retry. raise AirflowException("Response: {0}, Status Code: {1}".format( -response.content, response.status_code)) -except (requests_exceptions.ConnectionError, -requests_exceptions.Timeout) as e: -self.log.error( -"Attempt %s API Request to Databricks failed with reason: %s", -attempt_num, e -) -raise AirflowException(("API requests to Databricks failed {} times. " + -"Giving up.").format(self.retry_limit)) +e.response.content, e.response.status_code)) + +self._log_request_error(attempt_num, e) + +if attempt_num == self.retry_limit: +raise AirflowException((API requests to Databricks failed {} times. 
+ +Giving up.).format(self.retry_limit)) + +attempt_num += 1 +sleep(self.retry_delay) + +def _log_request_error(self, attempt_num, error): +self.log.error( +Attempt %s API Request to Databricks failed with reason: %s, +attempt_num, error +) [docs] def submit_run(self, json): @@ -331,7 +350,22 @@ def cancel_run(self, run_id): json = {run_id: run_id} -self._do_api_call(CANCEL_RUN_ENDPOINT, json) +self._do_api_call(CANCEL_RUN_ENDPOINT, json) + +def restart_cluster(self, json): +self._do_api_call(RESTART_CLUSTER_ENDPOINT, json) + +def start_cluster(self, json): +self._do_api_call(START_CLUSTER_ENDPOINT, json) + +def terminate_cluster(self, json): +self._do_api_call(TERMINATE_CLUSTER_ENDPOINT, json) + + +def _retryable_error(exception): +return isinstance(exception, requests_exceptions.ConnectionError) \ +or
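The reworked retry loop in this commit replaces a bounded for-loop with a while-loop, an explicit attempt counter, a fixed delay between attempts, and a predicate that separates retryable transport failures from caller mistakes. A standalone sketch of that control flow (function names and exception types here are illustrative, not the hook's actual API):

```python
import time


def is_retryable(exc):
    # Treat transient transport problems as retryable; anything else
    # (e.g. an error the caller caused) should surface immediately.
    return isinstance(exc, (ConnectionError, TimeoutError))


def call_with_retries(func, retry_limit=3, retry_delay=1.0, sleep=time.sleep):
    """Call func(), retrying retryable errors up to retry_limit attempts
    with a fixed delay between attempts."""
    if retry_limit < 1:
        raise ValueError('Retry limit must be greater than equal to 1')
    attempt_num = 1
    while True:
        try:
            return func()
        except Exception as e:
            if not is_retryable(e):
                raise  # caller error: do not retry
            if attempt_num == retry_limit:
                raise RuntimeError(
                    'API requests failed {} times. Giving up.'.format(retry_limit))
            attempt_num += 1
            sleep(retry_delay)
```

A fixed delay keeps the sketch close to the committed code; an exponential backoff would be a drop-in change to the `sleep(retry_delay)` line.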
[04/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/integration.html
--
diff --git a/integration.html b/integration.html
index 326ae1e..12a5e30 100644
--- a/integration.html
+++ b/integration.html
@@ -159,6 +159,20 @@
 BigQueryHook
+Cloud SQL
+Cloud SQL Operators
+Cloud SQL Hook
+
+Compute Engine
+Compute Engine Operators
+
+Cloud Functions
+Cloud Functions Operators
+Cloud Functions Hook
+
 Cloud DataFlow
 DataFlow Operators
 DataFlowHook
@@ -322,6 +336,15 @@
 flexibility.
+To ensure that Airflow generates URLs with the correct scheme when
+running behind a TLS-terminating proxy, you should configure the proxy
+to set the X-Forwarded-Proto header, and enable the ProxyFix
+middleware in your airflow.cfg:
+enable_proxy_fix = True
+
+Note: you should only enable the ProxyFix middleware when running
+Airflow behind a trusted proxy (AWS ELB, nginx, etc.).

 Azure: Microsoft Azure¶
@@ -503,6 +526,37 @@
 using a SAS token by adding {'sas_token': 'YOUR_TOKEN'}.
+
+delete_file(container_name, blob_name, is_prefix=False, ignore_if_missing=False, **kwargs)[source]¶
+Delete a file from Azure Blob Storage.
+
+Parameters:
+container_name (str) – Name of the container.
+blob_name (str) – Name of the blob.
+is_prefix (bool) – If blob_name is a prefix, delete all matching files
+ignore_if_missing – if True, then return success even if the
+blob does not exist.
+:type ignore_if_missing: bool
+:param kwargs: Optional keyword arguments that
+BlockBlobService.create_blob_from_path() takes.

 get_conn()[source]¶
 Return the BlockBlobService object.
@@ -994,6 +1048,14 @@
 Operators are in the contrib section.
+
+execute(context)[source]¶
+This is the main method to derive when creating an operator.
+Context is the same dictionary used as when rendering jinja templates.
+Refer to get_template_context for more context.
@@ -1020,6 +1082,14 @@
 emr_connection extra. (templated)
+
+execute(context)[source]¶
+This is the main method to derive when creating an operator.
+Context is the same dictionary used as when rendering jinja templates.
+Refer to get_template_context for more context.
@@ -1042,6 +1112,14 @@
 emr_connection extra. (templated)
+
+execute(context)[source]¶
+This is the main method to derive when creating an operator.
+Context is the same dictionary used as when rendering jinja templates.
+Refer to get_template_context for more context.
@@ -1127,6 +1205,79 @@
 Overrides for this config may be passed as the job_flow_overrides.
+
+copy_object(source_bucket_key, dest_bucket_key, source_bucket_name=None, dest_bucket_name=None, source_version_id=None)[source]¶
+Creates a copy of an object that is already stored in S3.
+Note: the S3 connection used here needs to have access to both
+source and destination bucket/key.
+
+Parameters:
+source_bucket_key (str) – The key of the source object.
+It can be either full s3:// style url or relative path from root level.
+When it's specified as a full s3:// url, please omit source_bucket_name.
+dest_bucket_key (str) – The key of the object to copy to.
+The convention to specify dest_bucket_key is the same
+as source_bucket_key.
+source_bucket_name (str) – Name of the S3 bucket where the source object is in.
+It should be omitted when source_bucket_key is provided as a full s3:// url.
+dest_bucket_name (str) – Name of the S3 bucket to where the object is copied.
+It should be omitted when dest_bucket_key is provided as a full s3:// url.
+source_version_id (str) – Version ID of the source object (OPTIONAL)
+
+create_bucket(bucket_name, region_name=None)[source]¶
+Creates an Amazon S3 bucket.
+
+Parameters:
+bucket_name (str) – The name of the bucket
+region_name (str) – The name of the aws region in which to create the bucket.
+
+delete_objects(bucket, keys)[source]¶
+
+Parameters:
+bucket (str) – Name of the bucket in which you are going to delete object(s)
+keys (str or list) – The key(s) to delete from S3 bucket.
+When keys is a string, it's supposed to be the key name of
+the single object to delete.
+When keys is a list, it's supposed to be the list of the
+keys to delete.

 get_bucket(bucket_name)[source]¶
 Returns a boto3.S3.Bucket object
@@ -1268,6 +1419,29 @@
 by S3 and will be stored in an encrypted form while at rest in S3.
+
+load_file_obj(file_obj, key, bucket_name=None, replace=False, encrypt=False)[source]¶
+Loads a file object to S3
+
+Parameters:
+file_obj (file-like object) – The file-like object to set as the content for the S3 key.
+key (str) – S3 key that will point to the file
+bucket_name (str) – Name of the bucket in which to store the file
+replace (bool) – A flag that indicates
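The `copy_object` documentation above fixes a convention: a key may be given either as a relative path together with an explicit bucket name, or as a full s3:// URL, in which case the separate bucket argument must be omitted. A sketch of how such a convention can be resolved (an illustration of the rule, not the hook's actual implementation):

```python
from urllib.parse import urlparse


def resolve_s3_key(key, bucket_name=None):
    """Return (bucket, key), accepting either a relative key plus an
    explicit bucket, or a full s3:// style URL with no bucket argument."""
    if key.startswith('s3://'):
        if bucket_name is not None:
            raise ValueError('Omit bucket_name when the key is a full s3:// URL')
        parsed = urlparse(key)
        # netloc is the bucket; the path (minus its leading slash) is the key
        return parsed.netloc, parsed.path.lstrip('/')
    if bucket_name is None:
        raise ValueError('bucket_name is required for a relative key')
    return bucket_name, key
```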
[32/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/hooks/sagemaker_hook.html
--
diff --git a/_modules/airflow/contrib/hooks/sagemaker_hook.html b/_modules/airflow/contrib/hooks/sagemaker_hook.html
new file mode 100644
index 000..d501097
--- /dev/null
+++ b/_modules/airflow/contrib/hooks/sagemaker_hook.html
@@ -0,0 +1,972 @@
+airflow.contrib.hooks.sagemaker_hook — Airflow Documentation
+
+Source code for airflow.contrib.hooks.sagemaker_hook
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import tarfile
+import tempfile
+import time
+import os
+import collections
+
+import botocore.config
+from botocore.exceptions import ClientError
+
+from airflow.exceptions import AirflowException
+from airflow.contrib.hooks.aws_hook import AwsHook
+from airflow.hooks.S3_hook import S3Hook
+from airflow.utils import timezone
+
+
+class LogState(object):
+    STARTING = 1
+    WAIT_IN_PROGRESS = 2
+    TAILING = 3
+    JOB_COMPLETE = 4
+    COMPLETE = 5
+
+
+# Position is a tuple that includes the last read timestamp and the number of items that were read
+# at that time. This is used to figure out which event to start with on the next read.
+Position = collections.namedtuple('Position', ['timestamp', 'skip'])
+
+
+def argmin(arr, f):
+    """Return the index, i, in arr that minimizes f(arr[i])"""
+    m = None
+    i = None
+    for idx, item in enumerate(arr):
+        if item is not None:
+            if m is None or f(item) < m:
+                m = f(item)
+                i = idx
+    return i
+
+
+def secondary_training_status_changed(current_job_description, prev_job_description):
+    """
+    Returns true if training job's secondary status message has changed.
+
+    :param current_job_description: Current job description, returned from DescribeTrainingJob call.
+    :type current_job_description: dict
+    :param prev_job_description: Previous job description, returned from DescribeTrainingJob call.
+    :type prev_job_description: dict
+
+    :return: Whether the secondary status message of a training job changed or not.
+    """
+    current_secondary_status_transitions = current_job_description.get('SecondaryStatusTransitions')
+    if current_secondary_status_transitions is None or len(current_secondary_status_transitions) == 0:
+        return False
+
+    prev_job_secondary_status_transitions = prev_job_description.get('SecondaryStatusTransitions') \
+        if prev_job_description is not None else None
+
+    last_message = prev_job_secondary_status_transitions[-1]['StatusMessage'] \
+        if prev_job_secondary_status_transitions is not None \
+        and len(prev_job_secondary_status_transitions) > 0 else ''
+
+    message = current_job_description['SecondaryStatusTransitions'][-1]['StatusMessage']
+
+    return message != last_message
+
+
+def secondary_training_status_message(job_description, prev_description):
+    """
+    Returns a string
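The change-detection rule in `secondary_training_status_changed` compares the newest `StatusMessage` in the current description against the newest one in the previous description. The rule can be exercised in isolation with a minimal re-implementation; the dictionary shapes below mimic DescribeTrainingJob output and are illustrative only:

```python
def status_message_changed(current, prev):
    """True when the latest secondary status message differs from the
    previous description's latest message (empty when there is no previous)."""
    transitions = current.get('SecondaryStatusTransitions')
    if not transitions:
        return False
    prev_transitions = prev.get('SecondaryStatusTransitions') if prev else None
    last_message = prev_transitions[-1]['StatusMessage'] if prev_transitions else ''
    return transitions[-1]['StatusMessage'] != last_message
```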
[16/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/operators/mssql_operator.html
--
diff --git a/_modules/airflow/operators/mssql_operator.html b/_modules/airflow/operators/mssql_operator.html
index 8dcdfe3..91504b6 100644
--- a/_modules/airflow/operators/mssql_operator.html
+++ b/_modules/airflow/operators/mssql_operator.html
@@ -211,12 +211,12 @@
         self.autocommit = autocommit
         self.database = database

-    def execute(self, context):
+[docs]    def execute(self, context):
         self.log.info('Executing: %s', self.sql)
         hook = MsSqlHook(mssql_conn_id=self.mssql_conn_id,
                          schema=self.database)
         hook.run(self.sql, autocommit=self.autocommit,
-                 parameters=self.parameters)
+                 parameters=self.parameters)
@@ -247,20 +247,13 @@
-var DOCUMENTATION_OPTIONS = {
-    URL_ROOT: '../../../',
-    VERSION: '',
-    LANGUAGE: 'None',
-    COLLAPSE_INDEX: false,
-    FILE_SUFFIX: '.html',
-    HAS_SOURCE: true,
-    SOURCELINK_SUFFIX: '.txt'
-};

http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/operators/mssql_to_hive.html
--
diff --git a/_modules/airflow/operators/mssql_to_hive.html b/_modules/airflow/operators/mssql_to_hive.html
index 7c90d6c..6cf4178 100644
--- a/_modules/airflow/operators/mssql_to_hive.html
+++ b/_modules/airflow/operators/mssql_to_hive.html
@@ -267,7 +267,7 @@
         }
         return d[mssql_type] if mssql_type in d else 'STRING'

-    def execute(self, context):
+[docs]    def execute(self, context):
         hive = HiveCliHook(hive_cli_conn_id=self.hive_cli_conn_id)
         mssql = MsSqlHook(mssql_conn_id=self.mssql_conn_id)
@@ -297,7 +297,7 @@
             partition=self.partition,
             delimiter=self.delimiter,
             recreate=self.recreate,
-            tblproperties=self.tblproperties)
+            tblproperties=self.tblproperties)
@@ -328,20 +328,13 @@
-var DOCUMENTATION_OPTIONS = {
-    URL_ROOT: '../../../',
-    VERSION: '',
-    LANGUAGE: 'None',
-    COLLAPSE_INDEX: false,
-    FILE_SUFFIX: '.html',
-    HAS_SOURCE: true,
-    SOURCELINK_SUFFIX: '.txt'
-};
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/operators/mysql_operator.html
--
diff --git a/_modules/airflow/operators/mysql_operator.html b/_modules/airflow/operators/mysql_operator.html
index 82fcb45..62adbf6 100644
--- a/_modules/airflow/operators/mysql_operator.html
+++ b/_modules/airflow/operators/mysql_operator.html
@@ -212,14 +212,14 @@
         self.parameters = parameters
         self.database = database

-    def execute(self, context):
+[docs]    def execute(self, context):
         self.log.info('Executing: %s', self.sql)
         hook = MySqlHook(mysql_conn_id=self.mysql_conn_id,
                          schema=self.database)
         hook.run(
             self.sql,
             autocommit=self.autocommit,
-            parameters=self.parameters)
+            parameters=self.parameters)
@@ -250,20 +250,13 @@
-var DOCUMENTATION_OPTIONS = {
-    URL_ROOT: '../../../',
-    VERSION: '',
-    LANGUAGE: 'None',
-    COLLAPSE_INDEX: false,
-    FILE_SUFFIX: '.html',
-    HAS_SOURCE: true,
-    SOURCELINK_SUFFIX: '.txt'
-};

http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/operators/mysql_to_hive.html
--
diff --git a/_modules/airflow/operators/mysql_to_hive.html b/_modules/airflow/operators/mysql_to_hive.html
index 0add38c..e635a0a 100644
--- a/_modules/airflow/operators/mysql_to_hive.html
+++ b/_modules/airflow/operators/mysql_to_hive.html
@@ -274,7 +274,7 @@
         }
         return d[mysql_type] if mysql_type in d else 'STRING'

-    def execute(self, context):
+[docs]    def execute(self, context):
         hive = HiveCliHook(hive_cli_conn_id=self.hive_cli_conn_id)
         mysql = MySqlHook(mysql_conn_id=self.mysql_conn_id)
@@ -300,7 +300,7 @@
             partition=self.partition,
             delimiter=self.delimiter,
             recreate=self.recreate,
-
[25/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/operators/gcs_download_operator.html
--
diff --git a/_modules/airflow/contrib/operators/gcs_download_operator.html b/_modules/airflow/contrib/operators/gcs_download_operator.html
index a02f0a2..3154d79 100644
--- a/_modules/airflow/contrib/operators/gcs_download_operator.html
+++ b/_modules/airflow/contrib/operators/gcs_download_operator.html
@@ -232,7 +232,7 @@
         self.google_cloud_storage_conn_id = google_cloud_storage_conn_id
         self.delegate_to = delegate_to

-    def execute(self, context):
+[docs]    def execute(self, context):
         self.log.info('Executing download: %s, %s, %s', self.bucket,
                       self.object, self.filename)
         hook = GoogleCloudStorageHook(
@@ -249,7 +249,7 @@
             raise RuntimeError(
                 'The size of the downloaded file is too large to push to XCom!'
             )
-        self.log.debug(file_bytes)
+        self.log.debug(file_bytes)
@@ -280,20 +280,13 @@
-var DOCUMENTATION_OPTIONS = {
-    URL_ROOT: '../../../../',
-    VERSION: '',
-    LANGUAGE: 'None',
-    COLLAPSE_INDEX: false,
-    FILE_SUFFIX: '.html',
-    HAS_SOURCE: true,
-    SOURCELINK_SUFFIX: '.txt'
-};

http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/operators/gcs_list_operator.html
--
diff --git a/_modules/airflow/contrib/operators/gcs_list_operator.html b/_modules/airflow/contrib/operators/gcs_list_operator.html
index bacdd21..5d2f44d 100644
--- a/_modules/airflow/contrib/operators/gcs_list_operator.html
+++ b/_modules/airflow/contrib/operators/gcs_list_operator.html
@@ -238,7 +238,7 @@
         self.google_cloud_storage_conn_id = google_cloud_storage_conn_id
         self.delegate_to = delegate_to

-    def execute(self, context):
+[docs]    def execute(self, context):
         hook = GoogleCloudStorageHook(
             google_cloud_storage_conn_id=self.google_cloud_storage_conn_id,
@@ -250,7 +250,7 @@
         return hook.list(bucket=self.bucket,
                          prefix=self.prefix,
-                         delimiter=self.delimiter)
+                         delimiter=self.delimiter)
@@ -281,20 +281,13 @@
-var DOCUMENTATION_OPTIONS = {
-    URL_ROOT: '../../../../',
-    VERSION: '',
-    LANGUAGE: 'None',
-    COLLAPSE_INDEX: false,
-    FILE_SUFFIX: '.html',
-    HAS_SOURCE: true,
-    SOURCELINK_SUFFIX: '.txt'
-};

http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/operators/gcs_operator.html
--
diff --git a/_modules/airflow/contrib/operators/gcs_operator.html b/_modules/airflow/contrib/operators/gcs_operator.html
index dcdee63..e563104 100644
--- a/_modules/airflow/contrib/operators/gcs_operator.html
+++ b/_modules/airflow/contrib/operators/gcs_operator.html
@@ -265,7 +265,7 @@
         self.google_cloud_storage_conn_id = google_cloud_storage_conn_id
         self.delegate_to = delegate_to

-    def execute(self, context):
+[docs]    def execute(self, context):
         if self.labels is not None:
             self.labels.update(
                 {'airflow-version': 'v' + version.replace('.', '-').replace('+', '-')}
@@ -280,7 +280,7 @@
             storage_class=self.storage_class,
             location=self.location,
             project_id=self.project_id,
-            labels=self.labels)
+            labels=self.labels)
@@ -311,20 +311,13 @@
-var DOCUMENTATION_OPTIONS = {
-    URL_ROOT: '../../../../',
-    VERSION: '',
-    LANGUAGE: 'None',
-    COLLAPSE_INDEX: false,
-    FILE_SUFFIX: '.html',
-    HAS_SOURCE: true,
-    SOURCELINK_SUFFIX: '.txt'
-};

http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/operators/gcs_to_bq.html
--
diff --git a/_modules/airflow/contrib/operators/gcs_to_bq.html b/_modules/airflow/contrib/operators/gcs_to_bq.html
index 4dabc6a..34bafd1 100644
--- a/_modules/airflow/contrib/operators/gcs_to_bq.html
+++
[20/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/sensors/sagemaker_training_sensor.html
--
diff --git a/_modules/airflow/contrib/sensors/sagemaker_training_sensor.html b/_modules/airflow/contrib/sensors/sagemaker_training_sensor.html
new file mode 100644
index 000..2f649d3
--- /dev/null
+++ b/_modules/airflow/contrib/sensors/sagemaker_training_sensor.html
@@ -0,0 +1,313 @@
+airflow.contrib.sensors.sagemaker_training_sensor — Airflow Documentation
+
+Source code for airflow.contrib.sensors.sagemaker_training_sensor
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import time
+
+from airflow.contrib.hooks.sagemaker_hook import SageMakerHook, LogState
+from airflow.contrib.sensors.sagemaker_base_sensor import SageMakerBaseSensor
+from airflow.utils.decorators import apply_defaults
+
+
+[docs]class SageMakerTrainingSensor(SageMakerBaseSensor):
+    """
+    Asks for the state of the training state until it reaches a terminal state.
+    If it fails the sensor errors, failing the task.
+
+    :param job_name: name of the SageMaker training job to check the state of
+    :type job_name: str
+    :param print_log: if the operator should print the cloudwatch log
+    :type print_log: bool
+    """
+
+    template_fields = ['job_name']
+    template_ext = ()
+
+    @apply_defaults
+    def __init__(self,
+                 job_name,
+                 print_log=True,
+                 *args,
+                 **kwargs):
+        super(SageMakerTrainingSensor, self).__init__(*args, **kwargs)
+        self.job_name = job_name
+        self.print_log = print_log
+        self.positions = {}
+        self.stream_names = []
+        self.instance_count = None
+        self.state = None
+        self.last_description = None
+        self.last_describe_job_call = None
+        self.log_resource_inited = False
+
+    def init_log_resource(self, hook):
+        description = hook.describe_training_job(self.job_name)
+        self.instance_count = description['ResourceConfig']['InstanceCount']
+
+        status = description['TrainingJobStatus']
+        job_already_completed = status not in self.non_terminal_states()
+        self.state = LogState.TAILING if not job_already_completed else LogState.COMPLETE
+        self.last_description = description
+        self.last_describe_job_call = time.time()
+        self.log_resource_inited = True
+
+    def non_terminal_states(self):
+        return SageMakerHook.non_terminal_states
+
+    def failed_states(self):
+        return SageMakerHook.failed_states
+
+    def get_sagemaker_response(self):
+        sagemaker_hook = SageMakerHook(aws_conn_id=self.aws_conn_id)
+        if self.print_log:
+            if not self.log_resource_inited:
+                self.init_log_resource(sagemaker_hook)
+            self.state, self.last_description, self.last_describe_job_call = \
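`SageMakerTrainingSensor` ultimately polls `DescribeTrainingJob` until the job status leaves the non-terminal set, and errors out when a failed status is seen. That generic poll-until-terminal pattern can be sketched with an injected describe function; this is a simplification for illustration (the real sensor is also responsible for tailing CloudWatch logs between polls):

```python
def poll_until_terminal(describe, non_terminal, failed,
                        sleep=lambda s: None, poke_interval=0, max_pokes=100):
    """Poll describe() until the status leaves non_terminal; raise on a
    failed status, mirroring the sensor's error behaviour."""
    for _ in range(max_pokes):
        status = describe()
        if status in failed:
            raise RuntimeError('job ended in failed state: %s' % status)
        if status not in non_terminal:
            return status  # terminal, successful state
        sleep(poke_interval)
    raise RuntimeError('gave up waiting for a terminal state')
```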
[35/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/hooks/gcp_dataflow_hook.html
--
diff --git a/_modules/airflow/contrib/hooks/gcp_dataflow_hook.html b/_modules/airflow/contrib/hooks/gcp_dataflow_hook.html
index 808d22f..cf99744 100644
--- a/_modules/airflow/contrib/hooks/gcp_dataflow_hook.html
+++ b/_modules/airflow/contrib/hooks/gcp_dataflow_hook.html
@@ -178,6 +178,7 @@
 # specific language governing permissions and limitations
 # under the License.
 import json
+import re
 import select
 import subprocess
 import time
@@ -194,12 +195,13 @@

 class _DataflowJob(LoggingMixin):
-    def __init__(self, dataflow, project_number, name, location, poll_sleep=10):
+    def __init__(self, dataflow, project_number, name, location, poll_sleep=10,
+                 job_id=None):
         self._dataflow = dataflow
         self._project_number = project_number
         self._job_name = name
         self._job_location = location
-        self._job_id = None
+        self._job_id = job_id
         self._job = self._get_job()
         self._poll_sleep = poll_sleep
@@ -207,7 +209,7 @@
         jobs = self._dataflow.projects().locations().jobs().list(
             projectId=self._project_number,
             location=self._job_location
-        ).execute()
+        ).execute(num_retries=5)
         for job in jobs['jobs']:
             if job['name'] == self._job_name:
                 self._job_id = job['id']
@@ -215,13 +217,15 @@
         return None

     def _get_job(self):
-        if self._job_name:
+        if self._job_id:
+            job = self._dataflow.projects().locations().jobs().get(
+                projectId=self._project_number,
+                location=self._job_location,
+                jobId=self._job_id).execute(num_retries=5)
+        elif self._job_name:
             job = self._get_job_id_from_name()
         else:
-            job = self._dataflow.projects().jobs().get(
-                projectId=self._project_number,
-                jobId=self._job_id
-            ).execute()
+            raise Exception('Missing both dataflow job ID and name.')

         if job and 'currentState' in job:
             self.log.info(
@@ -284,36 +288,50 @@
     def _line(self, fd):
         if fd == self._proc.stderr.fileno():
-            lines = self._proc.stderr.readlines()
-            for line in lines:
+            line = b''.join(self._proc.stderr.readlines())
+            if line:
                 self.log.warning(line[:-1])
-            if lines:
-                return lines[-1]
+            return line
         if fd == self._proc.stdout.fileno():
-            line = self._proc.stdout.readline()
+            line = b''.join(self._proc.stdout.readlines())
+            if line:
+                self.log.info(line[:-1])
             return line

     @staticmethod
     def _extract_job(line):
-        if line is not None:
-            if line.startswith('Submitted job: '):
-                return line[15:-1]
+        # Job id info: https://goo.gl/SE29y9.
+        job_id_pattern = re.compile(
+            b'.*console.cloud.google.com/dataflow.*/jobs/([a-z|0-9|A-Z|\-|\_]+).*')
+        matched_job = job_id_pattern.search(line or '')
+        if matched_job:
+            return matched_job.group(1).decode()

     def wait_for_done(self):
         reads = [self._proc.stderr.fileno(), self._proc.stdout.fileno()]
         self.log.info('Start waiting for DataFlow process to complete.')
-        while self._proc.poll() is None:
+        job_id = None
+        # Make sure logs are processed regardless whether the subprocess is
+        # terminated.
+        process_ends = False
+        while True:
             ret = select.select(reads, [], [], 5)
             if ret is not None:
                 for fd in ret[0]:
                     line = self._line(fd)
                     if line:
-                        self.log.debug(line[:-1])
+                        job_id = job_id or self._extract_job(line)
             else:
                 self.log.info('Waiting for DataFlow process to complete.')
+            if process_ends:
+                break
+            if self._proc.poll() is not None:
+                # Mark process completion but allows its outputs to be consumed.
+                process_ends = True
         if self._proc.returncode is not 0:
             raise Exception('DataFlow failed with return code {}'.format(
                 self._proc.returncode))
+        return job_id

 [docs]class DataFlowHook(GoogleCloudBaseHook):
@@ -327,7 +345,7 @@
 [docs]    def get_conn(self):
-        Returns a Google Cloud Storage service object.
+        Returns a Google Cloud Dataflow service object.
         http_authorized = self._authorize()
         return build(
@@ -338,9 +356,10 @@
         variables = self._set_variables(variables)
         cmd = command_prefix +
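The new `_extract_job` above recognizes the Dataflow console URL that the runner prints to its logs and captures the job id segment after `/jobs/`. The same pattern can be exercised directly; the log line used below is fabricated for illustration:

```python
import re

# Same pattern as in the hook: capture the id that follows /jobs/ in a
# console.cloud.google.com/dataflow URL found anywhere in a log line.
JOB_ID_PATTERN = re.compile(
    rb'.*console.cloud.google.com/dataflow.*/jobs/([a-z|0-9|A-Z|\-|\_]+).*')


def extract_job_id(line):
    """Return the Dataflow job id found in a bytes log line, or None."""
    matched = JOB_ID_PATTERN.search(line or b'')
    return matched.group(1).decode() if matched else None
```

Because the caller keeps the first non-None result (`job_id = job_id or self._extract_job(line)`), only the first console URL seen in the output determines the returned job id.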
[30/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/hooks/wasb_hook.html
--
diff --git a/_modules/airflow/contrib/hooks/wasb_hook.html b/_modules/airflow/contrib/hooks/wasb_hook.html
index e98b5cf..13007c9 100644
--- a/_modules/airflow/contrib/hooks/wasb_hook.html
+++ b/_modules/airflow/contrib/hooks/wasb_hook.html
@@ -179,6 +179,7 @@
 # under the License.
 #
+from airflow import AirflowException
 from airflow.hooks.base_hook import BaseHook
 from azure.storage.blob import BlockBlobService
@@ -308,7 +309,47 @@
         return self.connection.get_blob_to_text(container_name,
                                                 blob_name,
-                                                **kwargs).content
+                                                **kwargs).content
+
+[docs]    def delete_file(self, container_name, blob_name, is_prefix=False,
+                    ignore_if_missing=False, **kwargs):
+        """
+        Delete a file from Azure Blob Storage.
+
+        :param container_name: Name of the container.
+        :type container_name: str
+        :param blob_name: Name of the blob.
+        :type blob_name: str
+        :param is_prefix: If blob_name is a prefix, delete all matching files
+        :type is_prefix: bool
+        :param ignore_if_missing: if True, then return success even if the
+            blob does not exist.
+        :type ignore_if_missing: bool
+        :param kwargs: Optional keyword arguments that
+            `BlockBlobService.create_blob_from_path()` takes.
+        :type kwargs: object
+        """
+        if is_prefix:
+            blobs_to_delete = [
+                blob.name for blob in self.connection.list_blobs(
+                    container_name, prefix=blob_name, **kwargs
+                )
+            ]
+        elif self.check_for_blob(container_name, blob_name):
+            blobs_to_delete = [blob_name]
+        else:
+            blobs_to_delete = []
+
+        if not ignore_if_missing and len(blobs_to_delete) == 0:
+            raise AirflowException('Blob(s) not found: {}'.format(blob_name))
+
+        for blob_uri in blobs_to_delete:
+            self.log.info('Deleting blob: ' + blob_uri)
+            self.connection.delete_blob(container_name,
+                                        blob_uri,
+                                        delete_snapshots='include',
+                                        **kwargs)
@@ -339,20 +380,13 @@
-var DOCUMENTATION_OPTIONS = {
-    URL_ROOT: '../../../../',
-    VERSION: '',
-    LANGUAGE: 'None',
-    COLLAPSE_INDEX: false,
-    FILE_SUFFIX: '.html',
-    HAS_SOURCE: true,
-    SOURCELINK_SUFFIX: '.txt'
-};

http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/hooks/winrm_hook.html
--
diff --git a/_modules/airflow/contrib/hooks/winrm_hook.html b/_modules/airflow/contrib/hooks/winrm_hook.html
index c89917c..5496cb2 100644
--- a/_modules/airflow/contrib/hooks/winrm_hook.html
+++ b/_modules/airflow/contrib/hooks/winrm_hook.html
@@ -326,20 +326,13 @@
-var DOCUMENTATION_OPTIONS = {
-    URL_ROOT: '../../../../',
-    VERSION: '',
-    LANGUAGE: 'None',
-    COLLAPSE_INDEX: false,
-    FILE_SUFFIX: '.html',
-    HAS_SOURCE: true,
-    SOURCELINK_SUFFIX: '.txt'
-};

http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/kubernetes/secret.html
--
diff --git a/_modules/airflow/contrib/kubernetes/secret.html b/_modules/airflow/contrib/kubernetes/secret.html
index 4d77038..6f75ac3 100644
--- a/_modules/airflow/contrib/kubernetes/secret.html
+++ b/_modules/airflow/contrib/kubernetes/secret.html
@@ -230,20 +230,13 @@
-var DOCUMENTATION_OPTIONS = {
-    URL_ROOT: '../../../../',
-    VERSION: '',
-    LANGUAGE: 'None',
-    COLLAPSE_INDEX: false,
-    FILE_SUFFIX: '.html',
-    HAS_SOURCE: true,
-    SOURCELINK_SUFFIX: '.txt'
-};
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/operators/awsbatch_operator.html
--
diff --git a/_modules/airflow/contrib/operators/awsbatch_operator.html
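The new `delete_file` in the wasb hook first resolves `blob_name` into a concrete list of blobs to delete (all prefix matches, the single blob if it exists, or nothing) and only then performs the deletions. That resolution step can be sketched against a plain list of names instead of a real `BlockBlobService` client; the helper name below is illustrative:

```python
def resolve_blobs(existing, blob_name, is_prefix=False, ignore_if_missing=False):
    """Return the blob names to delete, raising when nothing matches and
    missing blobs are not ignored (mirrors the hook's resolution logic)."""
    if is_prefix:
        # prefix mode: every blob whose name starts with blob_name
        to_delete = [name for name in existing if name.startswith(blob_name)]
    elif blob_name in existing:
        to_delete = [blob_name]
    else:
        to_delete = []

    if not ignore_if_missing and not to_delete:
        raise ValueError('Blob(s) not found: {}'.format(blob_name))
    return to_delete
```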
[38/38] incubator-airflow-site git commit: Docs from 1.10.1
Docs from 1.10.1

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/commit/1f06fa0e
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/tree/1f06fa0e
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/diff/1f06fa0e

Branch: refs/heads/asf-site
Commit: 1f06fa0e0ead70d24f3c87326734aec89930207d
Parents: 7d4d762
Author: Ash Berlin-Taylor
Authored: Thu Nov 29 15:57:25 2018 +
Committer: Ash Berlin-Taylor
Committed: Thu Nov 29 15:57:25 2018 +
--
 _images/airflow.gif                             | Bin 622963 -> 622832 bytes
 .../contrib/executors/mesos_executor.html       | 548 ---
 .../contrib/hooks/aws_dynamodb_hook.html        |  21 +-
 .../contrib/hooks/aws_firehose_hook.html        | 267 ++
 _modules/airflow/contrib/hooks/aws_hook.html    | 104 +-
 .../airflow/contrib/hooks/aws_lambda_hook.html  |  21 +-
 .../contrib/hooks/azure_data_lake_hook.html     |  21 +-
 .../contrib/hooks/azure_fileshare_hook.html     |  21 +-
 .../airflow/contrib/hooks/bigquery_hook.html    | 139 +-
 .../airflow/contrib/hooks/cassandra_hook.html   |  21 +-
 .../airflow/contrib/hooks/cloudant_hook.html    |  21 +-
 .../airflow/contrib/hooks/databricks_hook.html  |  87 +-
 .../airflow/contrib/hooks/datadog_hook.html     |  21 +-
 .../airflow/contrib/hooks/datastore_hook.html   |  21 +-
 .../contrib/hooks/discord_webhook_hook.html     |  21 +-
 _modules/airflow/contrib/hooks/emr_hook.html    |  36 +-
 _modules/airflow/contrib/hooks/fs_hook.html     |  21 +-
 _modules/airflow/contrib/hooks/ftp_hook.html    |  91 +-
 .../contrib/hooks/gcp_api_base_hook.html        |  23 +-
 .../contrib/hooks/gcp_container_hook.html       |  25 +-
 .../contrib/hooks/gcp_dataflow_hook.html        | 128 +-
 .../contrib/hooks/gcp_dataproc_hook.html        |  21 +-
 .../contrib/hooks/gcp_function_hook.html        | 406 +++
 .../contrib/hooks/gcp_mlengine_hook.html        |  32 +-
 .../airflow/contrib/hooks/gcp_pubsub_hook.html  |  21 +-
 .../airflow/contrib/hooks/gcp_sql_hook.html     | 474 +++
 _modules/airflow/contrib/hooks/gcs_hook.html    |  60 +-
 _modules/airflow/contrib/hooks/imap_hook.html   | 491 +++
 .../airflow/contrib/hooks/jenkins_hook.html     |  21 +-
 _modules/airflow/contrib/hooks/jira_hook.html   |  21 +-
 _modules/airflow/contrib/hooks/mongo_hook.html  |  21 +-
 _modules/airflow/contrib/hooks/pinot_hook.html  |  21 +-
 _modules/airflow/contrib/hooks/qubole_hook.html |  21 +-
 _modules/airflow/contrib/hooks/redis_hook.html  |  21 +-
 .../airflow/contrib/hooks/redshift_hook.html    |  21 +-
 .../airflow/contrib/hooks/sagemaker_hook.html   | 972 ++
 .../airflow/contrib/hooks/segment_hook.html     |  21 +-
 _modules/airflow/contrib/hooks/sftp_hook.html   |  98 +-
 .../contrib/hooks/slack_webhook_hook.html       |  21 +-
 .../airflow/contrib/hooks/snowflake_hook.html   |  21 +-
 .../airflow/contrib/hooks/spark_jdbc_hook.html  |  21 +-
 .../airflow/contrib/hooks/spark_sql_hook.html   |  21 +-
 .../contrib/hooks/spark_submit_hook.html        |  21 +-
 _modules/airflow/contrib/hooks/sqoop_hook.html  |  21 +-
 _modules/airflow/contrib/hooks/ssh_hook.html    | 315 +-
 .../airflow/contrib/hooks/vertica_hook.html     |  21 +-
 _modules/airflow/contrib/hooks/wasb_hook.html   |  64 +-
 _modules/airflow/contrib/hooks/winrm_hook.html  |  21 +-
 _modules/airflow/contrib/kubernetes/secret.html |  21 +-
 .../contrib/operators/awsbatch_operator.html    |  47 +-
 .../operators/bigquery_check_operator.html      |  71 +-
 .../contrib/operators/bigquery_get_data.html    |  25 +-
 .../contrib/operators/bigquery_operator.html    | 202 +-
 .../bigquery_table_delete_operator.html         |  25 +-
 .../contrib/operators/bigquery_to_bigquery.html |  33 +-
 .../contrib/operators/bigquery_to_gcs.html      |  34 +-
 .../contrib/operators/cassandra_to_gcs.html     |  25 +-
 .../contrib/operators/databricks_operator.html  |  37 +-
 .../contrib/operators/dataflow_operator.html    | 104 +-
 .../contrib/operators/dataproc_operator.html    | 130 +-
 .../operators/datastore_export_operator.html    |  25 +-
 .../operators/datastore_import_operator.html    |  25 +-
 .../operators/discord_webhook_operator.html     |  21 +-
 .../contrib/operators/druid_operator.html       |  25 +-
 .../airflow/contrib/operators/ecs_operator.html |  52 +-
 .../operators/emr_add_steps_operator.html       |  25 +-
 .../operators/emr_create_job_flow_operator.html |  25 +-
 .../emr_terminate_job_flow_operator.html        |  25 +-
 .../airflow/contrib/operators/file_to_gcs.html  |  34 +-
 .../airflow/contrib/operators/file_to_wasb.html |  21 +-
 .../contrib/operators/gcp_compute_operator.html | 394 +++
 .../operators/gcp_container_operator.html       |  29 +-
 .../operators/gcp_function_operator.html        | 522 +++
[23/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/operators/sagemaker_endpoint_operator.html
--
diff --git a/_modules/airflow/contrib/operators/sagemaker_endpoint_operator.html b/_modules/airflow/contrib/operators/sagemaker_endpoint_operator.html
new file mode 100644
index 000..563ec94
--- /dev/null
+++ b/_modules/airflow/contrib/operators/sagemaker_endpoint_operator.html
@@ -0,0 +1,362 @@
[rendered-page chrome trimmed: title, navigation menu and breadcrumbs for the airflow.contrib.operators.sagemaker_endpoint_operator docs page]
+Source code for airflow.contrib.operators.sagemaker_endpoint_operator
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.
See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.contrib.hooks.aws_hook import AwsHook
+from airflow.contrib.operators.sagemaker_base_operator import SageMakerBaseOperator
+from airflow.utils.decorators import apply_defaults
+from airflow.exceptions import AirflowException
+
+
+[docs]class SageMakerEndpointOperator(SageMakerBaseOperator):
+    """
+    Create a SageMaker endpoint.
+
+    This operator returns the ARN of the endpoint created in Amazon SageMaker.
+
+    :param config:
+        The configuration necessary to create an endpoint.
+
+        If you need to create a SageMaker endpoint based on an existing SageMaker model and an
+        existing SageMaker endpoint config:
+
+            config = endpoint_configuration
+
+        If you need to create all of: SageMaker model, SageMaker endpoint-config and SageMaker
+        endpoint:
+
+            config = {
+                "Model": model_configuration,
+                "EndpointConfig": endpoint_config_configuration,
+                "Endpoint": endpoint_configuration
+            }
+
+        For details of the configuration parameter of model_configuration, see:
+        https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_model
+
+        For details of the configuration parameter of endpoint_config_configuration, see:
+        https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint_config
+
+        For details of the configuration parameter of endpoint_configuration, see:
+        https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint
+    :type config: dict
+    :param aws_conn_id: The AWS connection ID to use.
+    :type aws_conn_id: str
+    :param wait_for_completion: Whether the operator should wait until the endpoint creation finishes.
+    :type wait_for_completion: bool
+    :param check_interval: If wait is set to True, this is the time interval, in seconds, that this
+        operation waits before polling the status of the endpoint creation.
+    :type check_interval: int
+    :param max_ingestion_time: If wait is set to True, this operation fails if the endpoint
+        creation doesn't finish
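To make the two accepted config shapes concrete, here is a minimal sketch in plain Python; every name and value (my-endpoint, my-model, and so on) is illustrative and not taken from the source.

```python
# Illustrative sketch of the two config shapes the docstring above
# describes. All names and values here are hypothetical.

# Case 1: the SageMaker model and endpoint config already exist, so the
# operator only receives the endpoint configuration itself.
endpoint_configuration = {
    "EndpointName": "my-endpoint",            # hypothetical
    "EndpointConfigName": "my-endpoint-cfg",  # hypothetical
}
config_existing = endpoint_configuration

# Case 2: create the model, the endpoint config, and the endpoint in one go.
config_full = {
    "Model": {"ModelName": "my-model"},                           # model_configuration
    "EndpointConfig": {"EndpointConfigName": "my-endpoint-cfg"},  # endpoint_config_configuration
    "Endpoint": endpoint_configuration,                           # endpoint_configuration
}

# The full form is keyed by exactly these three sections.
assert set(config_full) == {"Model", "EndpointConfig", "Endpoint"}
```

The inner dictionaries would carry whatever fields the linked boto3 create_model / create_endpoint_config / create_endpoint calls expect.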
[GitHub] ron819 commented on issue #2526: [AIRFLOW-1514] Truncate large logs in webui, add raw log link
ron819 commented on issue #2526: [AIRFLOW-1514] Truncate large logs in webui, add raw log link URL: https://github.com/apache/incubator-airflow/pull/2526#issuecomment-442886854 Can this be modified for the RBAC UI? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[05/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/howto/write-logs.html
--
diff --git a/howto/write-logs.html b/howto/write-logs.html
index 7be99f8..79c91b0 100644
--- a/howto/write-logs.html
+++ b/howto/write-logs.html
@@ -197,7 +197,7 @@
 directory. In addition, users can supply a remote location for storing logs and
 log backups in cloud storage.
-In the Airflow Web UI, local logs take precedence over remote logs. If local logs
+In the Airflow Web UI, local logs take precedance over remote logs. If local logs
 can not be found or accessed, the remote logs will be displayed. Note that logs
 are only sent to remote storage once a task completes (including failure). In other
 words, remote logs for running tasks are unavailable. Logs are stored in the log
@@ -269,59 +269,22 @@ Follow the steps below to enable Azure Blob Storage logging.
 Writing Logs to Google Cloud Storage¶
 Follow the steps below to enable Google Cloud Storage logging.
-Airflow's logging system requires a custom .py file to be located in the PYTHONPATH, so that it's importable from Airflow. Start by creating a directory to store the config file. $AIRFLOW_HOME/config is recommended.
-Create empty files called $AIRFLOW_HOME/config/log_config.py and $AIRFLOW_HOME/config/__init__.py.
-Copy the contents of airflow/config_templates/airflow_local_settings.py into the log_config.py file that was just created in the step above.
-Customize the following portions of the template:
-# Add this variable to the top of the file. Note the trailing slash.
-GCS_LOG_FOLDER = 'gs://<bucket where logs should be persisted>/'
-# Rename DEFAULT_LOGGING_CONFIG to LOGGING_CONFIG
-LOGGING_CONFIG = ...
-# Add a GCSTaskHandler to the 'handlers' block of the LOGGING_CONFIG variable
-'gcs.task': {
-    'class': 'airflow.utils.log.gcs_task_handler.GCSTaskHandler',
-    'formatter': 'airflow.task',
-    'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
-    'gcs_log_folder': GCS_LOG_FOLDER,
-    'filename_template': FILENAME_TEMPLATE,
-},
-# Update the airflow.task and airflow.task_runner blocks to be 'gcs.task' instead of 'file.task'.
-'loggers': {
-    'airflow.task': {
-        'handlers': ['gcs.task'],
-        ...
-    },
-    'airflow.task_runner': {
-        'handlers': ['gcs.task'],
-        ...
-    },
-    'airflow': {
-        'handlers': ['console'],
-        ...
-    },
-}
+To enable this feature, airflow.cfg must be configured as in this
+example:
+[core]
+# Airflow can store logs remotely in AWS S3, Google Cloud Storage or Elastic Search.
+# Users must supply an Airflow connection id that provides access to the storage
+# location. If remote_logging is set to true, see UPDATING.md for additional
+# configuration requirements.
+remote_logging = True
+remote_base_log_folder = gs://my-bucket/path/to/logs
+remote_log_conn_id = MyGCSConn
-Make sure a Google Cloud Platform connection hook has been defined in Airflow. The hook should have read and write access to the Google Cloud Storage bucket defined above in GCS_LOG_FOLDER.
+Install the gcp_api package first, like so: pip install apache-airflow[gcp_api].
-Update $AIRFLOW_HOME/airflow.cfg to contain:
-task_log_reader = gcs.task
-logging_config_class = log_config.LOGGING_CONFIG
-remote_log_conn_id = <name of the Google cloud platform hook>
+Make sure a Google Cloud Platform connection hook has been defined in Airflow. The hook should have read and write access to the Google Cloud Storage bucket defined above in remote_base_log_folder.
 Restart the Airflow webserver and scheduler, and trigger (or wait for) a new task execution.
@@ -340,12 +303,6 @@ Follow the steps below to enable Azure Blob Storage logging.
 Note the top line that says it's reading from the remote log file.
-Please be aware that if you were persisting logs to Google Cloud Storage
-using the old-style airflow.cfg configuration method, the old logs will no
-longer be visible in the Airflow UI, though they'll still exist in Google
-Cloud Storage. This is a backwards incompatible change. If you are unhappy
-with it, you can change the FILENAME_TEMPLATE to reflect the old-style
-log filename format.
@@ -387,20 +344,13 @@ log filename format.
[page-chrome diff trimmed: DOCUMENTATION_OPTIONS script block replaced]
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/index.html
--
diff --git a/index.html b/index.html
index 099c047..c8a8049 100644
--- a/index.html
+++ b/index.html
@@ -270,6 +270,17 @@ unit of work and
[14/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_sources/installation.rst.txt
--
diff --git a/_sources/installation.rst.txt b/_sources/installation.rst.txt
index 6d32c07..5faca5e 100644
--- a/_sources/installation.rst.txt
+++ b/_sources/installation.rst.txt
@@ -1,3 +1,20 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+   or more contributor license agreements.  See the NOTICE file
+   distributed with this work for additional information
+   regarding copyright ownership.  The ASF licenses this file
+   to you under the Apache License, Version 2.0 (the
+   "License"); you may not use this file except in compliance
+   with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+   software distributed under the License is distributed on an
+   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+   KIND, either express or implied.  See the License for the
+   specific language governing permissions and limitations
+   under the License.
+
 Installation
@@ -14,7 +31,7 @@ You can also install Airflow with support for extra features like ``s3`` or ``postgres``:
 .. code-block:: bash
-    pip install apache-airflow[postgres,s3]
+    pip install "apache-airflow[s3, postgres]"
.. note:: GPL dependency
@@ -41,67 +58,66 @@ Here's the list of the subpackages and what they enable:
 +---+--+-+
 | subpackage| install command | enables |
 +===+==+=+
-| all | ``pip install apache-airflow[all]`` | All Airflow features known to man |
-+---+--+-+
-| all_dbs | ``pip install apache-airflow[all_dbs]`` | All databases integrations |
+| all | ``pip install apache-airflow[all]`` | All Airflow features known to man |
 +---+--+-+
-| async | ``pip install apache-airflow[async]``| Async worker classes for Gunicorn |
+| all_dbs | ``pip install apache-airflow[all_dbs]`` | All databases integrations |
 +---+--+-+
-| celery| ``pip install apache-airflow[celery]`` | CeleryExecutor |
+| async| ``pip install apache-airflow[async]``| Async worker classes for gunicorn |
 +---+--+-+
-| cloudant | ``pip install apache-airflow[cloudant]`` | Cloudant hook |
+| devel| ``pip install apache-airflow[devel]``| Minimum dev tools requirements |
 +---+--+-+
-| crypto| ``pip install apache-airflow[crypto]`` | Encrypt connection passwords in metadata db |
+| devel_hadoop | ``pip install apache-airflow[devel_hadoop]`` | Airflow + dependencies on the Hadoop stack |
 +---+--+-+
-| devel | ``pip install apache-airflow[devel]``| Minimum dev tools requirements |
+| celery | ``pip install apache-airflow[celery]`` | CeleryExecutor |
 +---+--+-+
-| devel_hadoop | ``pip install apache-airflow[devel_hadoop]`` | Airflow + dependencies on the Hadoop stack |
+| crypto | ``pip install apache-airflow[crypto]`` | Encrypt connection passwords in metadata db |
 +---+--+-+
-| druid | ``pip install apache-airflow[druid]``| Druid related operators & hooks |
+| druid| ``pip install apache-airflow[druid]``| Druid.io related operators & hooks |
 +---+--+-+
-| gcp_api | ``pip install
[34/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/hooks/gcp_sql_hook.html
--
diff --git a/_modules/airflow/contrib/hooks/gcp_sql_hook.html b/_modules/airflow/contrib/hooks/gcp_sql_hook.html
new file mode 100644
index 000..9428e74
--- /dev/null
+++ b/_modules/airflow/contrib/hooks/gcp_sql_hook.html
@@ -0,0 +1,474 @@
[rendered-page chrome trimmed: title, navigation menu and breadcrumbs for the airflow.contrib.hooks.gcp_sql_hook docs page]
+Source code for airflow.contrib.hooks.gcp_sql_hook
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import time
+from googleapiclient.discovery import build
+
+from airflow import AirflowException
+from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
+
+# Number of retries - used by googleapiclient method calls to perform retries
+# For requests that are retriable
+NUM_RETRIES = 5
+
+# Time to sleep between active checks of the operation results
+TIME_TO_SLEEP_IN_SECONDS = 1
+
+
+class CloudSqlOperationStatus:
+    PENDING = "PENDING"
+    RUNNING = "RUNNING"
+    DONE = "DONE"
+    UNKNOWN = "UNKNOWN"
+
+
+# noinspection PyAbstractClass
+[docs]class CloudSqlHook(GoogleCloudBaseHook):
+    """
+    Hook for Google Cloud SQL APIs.
+    """
+    _conn = None
+
+    def __init__(self,
+                 api_version,
+                 gcp_conn_id="google_cloud_default",
+                 delegate_to=None):
+        super(CloudSqlHook, self).__init__(gcp_conn_id, delegate_to)
+        self.api_version = api_version
+
+[docs]    def get_conn(self):
+        """
+        Retrieves connection to Cloud SQL.
+
+        :return: Google Cloud SQL services object.
+        :rtype: dict
+        """
+        if not self._conn:
+            http_authorized = self._authorize()
+            self._conn = build("sqladmin", self.api_version,
+                               http=http_authorized, cache_discovery=False)
+        return self._conn
+
+[docs]    def get_instance(self, project_id, instance):
+        """
+        Retrieves a resource containing information about a Cloud SQL instance.
+
+        :param project_id: Project ID of the project that contains the instance.
+        :type project_id: str
+        :param instance: Database instance ID. This does not include the project ID.
+        :type instance: str
+        :return: A Cloud SQL instance resource.
+        :rtype: dict
+        """
+        return self.get_conn().instances().get(
+            project=project_id,
+            instance=instance
+        ).execute(num_retries=NUM_RETRIES)
+
+[docs]    def create_instance(self, project_id, body):
+        """
+        Creates a new Cloud SQL instance.
+
+        :param project_id: Project ID of the project to which the newly created
+            Cloud SQL instances should belong.
+        :type project_id: str
+        :param body: Body required by the Cloud SQL insert API, as described in
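The NUM_RETRIES and TIME_TO_SLEEP_IN_SECONDS constants suggest the hook's polling style: sleep between active status checks until an operation reports DONE. Here is a pure-Python sketch of that loop; wait_for_done and the fake operation are illustrative stand-ins, not part of the hook's API.

```python
import time

# Constants mirroring the hook source quoted above.
NUM_RETRIES = 5
TIME_TO_SLEEP_IN_SECONDS = 1


def wait_for_done(get_operation, sleep_seconds=0):
    """Poll get_operation() until it reports DONE, mirroring the
    CloudSqlOperationStatus states (PENDING/RUNNING/DONE/UNKNOWN)."""
    while True:
        status = get_operation()
        if status == "DONE":
            return status
        # Sleep between active checks (TIME_TO_SLEEP_IN_SECONDS in the hook).
        time.sleep(sleep_seconds)


# Fake operation that is PENDING, then RUNNING, then DONE.
_states = iter(["PENDING", "RUNNING", "DONE"])
result = wait_for_done(lambda: next(_states))
assert result == "DONE"
```

In the real hook the operation status would come from a googleapiclient call executed with num_retries=NUM_RETRIES.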
[09/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/concepts.html
--
diff --git a/concepts.html b/concepts.html
index 58f47df..87be121 100644
--- a/concepts.html
+++ b/concepts.html
@@ -123,6 +123,7 @@
 Packaged dags
+.airflowignore
 Data Profiling
@@ -469,7 +470,7 @@ execution parallelism is only limited to the executor's setting.
 Connections¶
 The connection information to external systems is stored in the Airflow
-metadata database and managed in the UI (Menu - Admin - Connections)
+metadata database and managed in the UI (Menu - Admin - Connections).
 A conn_id is defined there and hostname / login / password / schema
 information attached to it. Airflow pipelines can simply refer to the
 centrally managed conn_id without having to hard code any of this
@@ -479,15 +480,6 @@ is the case, and when the hooks uses the BaseHook,
 Airflow will choose one connection randomly, allowing for some basic load
 balancing and fault tolerance when used in conjunction with retries.
-Airflow also has the ability to reference connections via environment
-variables from the operating system. But it only supports URI format. If you
-need to specify extra for your connection, please use web UI.
-If connections with the same conn_id are defined in both Airflow metadata
-database and environment variables, only the one in environment variables
-will be referenced by Airflow (for example, given conn_id postgres_master,
-Airflow will search for AIRFLOW_CONN_POSTGRES_MASTER
-in environment variables first and directly reference it if found,
-before it starts to search in metadata database).
 Many hooks have a default conn_id, where operators using that hook do not
 need to supply an explicit connection ID. For example, the default
 conn_id for the PostgresHook is
@@ -496,14 +488,14 @@ need to supply an explicit connection ID. For example, the default
 Queues¶
-When using the CeleryExecutor, the celery queues that tasks are sent to
+When using the CeleryExecutor, the Celery queues that tasks are sent to
 can be specified. queue is an attribute of BaseOperator, so any
 task can be assigned to any queue. The default queue for the environment
 is defined in the airflow.cfg's celery -> default_queue. This defines
 the queue that tasks get assigned to when not specified, as well as which
 queue Airflow workers listen to when started.
 Workers can listen to one or multiple queues of tasks. When a worker is
-started (using the command airflow worker), a set of comma delimited
+started (using the command airflow worker), a set of comma-delimited
 queue names can be specified (e.g. airflow worker -q spark). This worker
 will then only pick up tasks wired to the specified queue(s).
 This can be useful if you need specialized workers, either from a
@@ -950,6 +942,28 @@ to be available on the system if a module needs those. In other words only
 pure python modules can be packaged.
+
+.airflowignore¶
+A .airflowignore file specifies the directories or files in DAG_FOLDER
+that Airflow should intentionally ignore. Each line in .airflowignore
+specifies a regular expression pattern, and directories or files whose names
+(not DAG id) match any of the patterns would be ignored (under the hood,
+re.findall() is used to match the pattern). Overall it works like a
+.gitignore file.
+.airflowignore file should be put in your DAG_FOLDER.
+For example, you can prepare a .airflowignore file with contents
+project_a
+tenant_[\d]
+
+Then files like "project_a_dag_1.py", "TESTING_project_a.py", "tenant_1.py",
+"project_a/dag_1.py", and "tenant_1/dag_1.py" in your DAG_FOLDER would be ignored
+(If a directory's name matches any of the patterns, this directory and all its subfolders
+would not be scanned by Airflow at all. This improves efficiency of DAG finding).
+The scope of a .airflowignore file is the directory it is in plus all its subfolders.
+You can also prepare .airflowignore file for a subfolder in DAG_FOLDER and it
+would only be applicable for that subfolder.
@@ -990,20 +1004,13 @@ pure python modules can be packaged.
[page-chrome diff trimmed: DOCUMENTATION_OPTIONS script block replaced]
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/faq.html
--
diff --git a/faq.html b/faq.html
index d2bf58f..0f8bd57 100644
--- a/faq.html
+++ b/faq.html
@@ -373,20 +373,13 @@ performs the actual work
[page-chrome diff trimmed: DOCUMENTATION_OPTIONS script block replaced]
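The matching rule documented above (each .airflowignore line is a regular expression, applied with re.findall()) can be reproduced in a few lines; is_ignored is an illustrative helper, not an Airflow function.

```python
import re

# Patterns as they would appear, one per line, in a .airflowignore file.
ignore_patterns = ["project_a", r"tenant_[\d]"]


def is_ignored(path, patterns):
    """A path is ignored when re.findall() finds any pattern in it,
    matching the behaviour described in the docs diff above."""
    return any(re.findall(p, path) for p in patterns)


# The examples given in the documentation all match:
assert is_ignored("project_a_dag_1.py", ignore_patterns)
assert is_ignored("TESTING_project_a.py", ignore_patterns)
assert is_ignored("tenant_1.py", ignore_patterns)
assert is_ignored("project_a/dag_1.py", ignore_patterns)
assert is_ignored("tenant_1/dag_1.py", ignore_patterns)
assert not is_ignored("my_dag.py", ignore_patterns)
```

Because re.findall() searches anywhere in the name rather than anchoring at the start, substring hits like "TESTING_project_a.py" are ignored too, which is exactly what the documented examples show.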
[10/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/code.html
--
diff --git a/code.html b/code.html
index 4208c5a..9c74cb2 100644
--- a/code.html
+++ b/code.html
@@ -101,20 +101,20 @@
 BaseOperator
 BaseSensorOperator
 Core Operators
-Operators
+Operators
 Sensors
 Community-contributed Operators
-Operators
-Sensors
+Operators
+Sensors
 Macros
 Default Variables
-Macros
+Macros
 Models
@@ -220,7 +220,7 @@ to understand the primitive features that can be leveraged in your DAGs.
-class airflow.models.BaseOperator(task_id, owner='Airflow', email=None, email_on_retry=True, email_on_failure=True, retries=0, retry_delay=datetime.timedelta(0, 300), retry_exponential_backoff=False, max_retry_delay=None, start_date=None, end_date=None, schedule_interval=None, depends_on_past=False, wait_for_downstream=False, dag=None, params=None, default_args=None, adhoc=False, priority_weight=1, weight_rule=u'downstream', queue='default', pool=None, sla=None, execution_timeout=None, on_failure_callback=None, on_success_callback=None, on_retry_callback=None, trigger_rule=u'all_success', resources=None, run_as_user=None, task_concurrency=None, executor_config=None, inlets=None, outlets=None, *args, **kwargs)[source]¶
+class airflow.models.BaseOperator(task_id, owner='Airflow', email=None, email_on_retry=True, email_on_failure=True, retries=0, retry_delay=datetime.timedelta(0, 300), retry_exponential_backoff=False, max_retry_delay=None, start_date=None, end_date=None, schedule_interval=None, depends_on_past=False, wait_for_downstream=False, dag=None, params=None, default_args=None, adhoc=False, priority_weight=1, weight_rule='downstream', queue='default', pool=None, sla=None, execution_timeout=None, on_failure_callback=None, on_success_callback=None, on_retry_callback=None, trigger_rule='all_success', resources=None, run_as_user=None, task_concurrency=None, executor_config=None, inlets=None, outlets=None, *args, **kwargs)[source]¶
 Bases: airflow.utils.log.logging_mixin.LoggingMixin
 Abstract base class for all operators. Since operators create objects that
 become nodes in the dag, BaseOperator contains many recursive methods for
@@ -326,7 +326,7 @@ of this task fails.
 a context dictionary is passed as a single
 parameter to this function. Context contains references to related
 objects to the task instance and is documented under the macros
 section of the API.
-on_retry_callback – much like the on_failure_callback except
+on_retry_callback (callable) – much like the on_failure_callback except
 that it is executed when retries occur.
 on_success_callback (callable) – much like the on_failure_callback
 except that it is executed when the task succeeds.
@@ -344,17 +344,17 @@ Resources constructor) to their values.
 runs across execution_dates
 executor_config (dict) – Additional task-level configuration parameters that are
 interpreted by a specific executor. Parameters are namespaced by the name of
-executor.
-``example: to run this task in a specific docker container through
-the KubernetesExecutor
-MyOperator(...,
-    executor_config={
-        "KubernetesExecutor":
-            {"image": "myCustomDockerImage"}
-    }
-)``
+executor.
+Example: to run this task in a specific docker container through
+the KubernetesExecutor
+MyOperator(...,
+    executor_config={
+        "KubernetesExecutor":
+            {"image": "myCustomDockerImage"}
+    }
+)
@@ -363,7 +363,7 @@ MyOperator(...,
-clear(**kwargs)[source]¶
+clear(start_date=None, end_date=None, upstream=False, downstream=False, session=None)[source]¶
 Clears the state of task instances associated with the task, following
 the parameters specified.
@@ -446,7 +446,7 @@ ghost processes behind.
-post_execute(context, *args, **kwargs)[source]¶
+post_execute(context, result=None)[source]¶
 This hook is triggered right after self.execute() is called.
 It is passed the execution context and any results returned by the
 operator.
@@ -454,7 +454,7 @@ operator.
-pre_execute(context, *args, **kwargs)[source]¶
+pre_execute(context)[source]¶
 This hook is triggered right before self.execute() is called.
@@ -519,7 +519,7 @@ task.
-xcom_pull(context, task_ids=None, dag_id=None, key=u'return_value', include_prior_dates=None)[source]¶
+xcom_pull(context, task_ids=None, dag_id=None, key='return_value', include_prior_dates=None)[source]¶
 See TaskInstance.xcom_pull()
@@ -540,7 +540,7 @@ attributes.
 class airflow.sensors.base_sensor_operator.BaseSensorOperator(poke_interval=60, timeout=604800, soft_fail=False, *args, **kwargs)[source]¶
-Bases: airflow.models.BaseOperator, airflow.models.SkipMixin
+Bases: airflow.models.BaseOperator, airflow.models.SkipMixin
 Sensor operators are derived from this class and inherit these attributes.
 Sensor operators keep executing at a time interval and
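The poke_interval and timeout parameters of BaseSensorOperator imply the usual sensor loop: call poke() every poke_interval seconds and fail once timeout elapses. A simplified pure-Python sketch follows; run_sensor is an illustrative stand-in, and the real class additionally handles soft_fail and other modes.

```python
import time


def run_sensor(poke, poke_interval=60, timeout=604800,
               clock=time.monotonic, sleep=time.sleep):
    """Call poke() every poke_interval seconds until it returns True,
    raising once more than timeout seconds have elapsed. Defaults mirror
    the BaseSensorOperator signature quoted above."""
    started = clock()
    while not poke():
        if clock() - started > timeout:
            raise RuntimeError("sensor timed out")
        sleep(poke_interval)
    return True


# Fake poke that succeeds on the third try; a no-op sleep keeps it instant.
_tries = iter([False, False, True])
assert run_sensor(lambda: next(_tries), poke_interval=0, sleep=lambda s: None)
```

Injecting clock and sleep keeps the sketch testable without real waiting.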
[17/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/operators/dagrun_operator.html
--
diff --git a/_modules/airflow/operators/dagrun_operator.html b/_modules/airflow/operators/dagrun_operator.html
index 3a3036c..1d0c4ec 100644
--- a/_modules/airflow/operators/dagrun_operator.html
+++ b/_modules/airflow/operators/dagrun_operator.html
@@ -227,7 +227,7 @@
         self.trigger_dag_id = trigger_dag_id
         self.execution_date = execution_date
-    def execute(self, context):
+[docs]    def execute(self, context):
         dro = DagRunOrder(run_id='trig__' + timezone.utcnow().isoformat())
         if self.python_callable is not None:
             dro = self.python_callable(context, dro)
@@ -238,7 +238,7 @@
                 execution_date=self.execution_date,
                 replace_microseconds=False)
         else:
-            self.log.info('Criteria not met, moving on')
+            self.log.info('Criteria not met, moving on')
@@ -269,20 +269,13 @@
[page-chrome diff trimmed: DOCUMENTATION_OPTIONS script block replaced]
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/operators/docker_operator.html
--
diff --git a/_modules/airflow/operators/docker_operator.html b/_modules/airflow/operators/docker_operator.html
index 3ecbd9f..f5e63d1 100644
--- a/_modules/airflow/operators/docker_operator.html
+++ b/_modules/airflow/operators/docker_operator.html
@@ -204,6 +204,7 @@
     be provided with the parameter ``docker_conn_id``.
     :param image: Docker image from which to create the container.
+        If image tag is omitted, "latest" will be used.
     :type image: str
     :param api_version: Remote API version. Set to ``auto`` to automatically
         detect the server's version.
@@ -214,12 +215,16 @@
         This value gets multiplied with 1024.
         See https://docs.docker.com/engine/reference/run/#cpu-share-constraint
     :type cpus: float
+    :param dns: Docker custom DNS servers
+    :type dns: list of strings
+    :param dns_search: Docker custom DNS search domain
+    :type dns_search: list of strings
     :param docker_url: URL of the host running the docker daemon.
         Default is unix://var/run/docker.sock
     :type docker_url: str
     :param environment: Environment variables to set in the container. (templated)
     :type environment: dict
-    :param force_pull: Pull the docker image on every run. Default is false.
+    :param force_pull: Pull the docker image on every run. Default is False.
     :type force_pull: bool
     :param mem_limit: Maximum amount of memory the container can use.
         Either a float value, which represents the limit in bytes,
@@ -260,6 +265,9 @@
     :type xcom_all: bool
     :param docker_conn_id: ID of the Airflow connection to use
     :type docker_conn_id: str
+    :param shm_size: Size of ``/dev/shm`` in bytes. The size must be
+        greater than 0. If omitted uses system default.
+    :type shm_size: int

     template_fields = ('command', 'environment',)
     template_ext = ('.sh', '.bash',)
@@ -288,6 +296,9 @@
                  xcom_push=False,
                  xcom_all=False,
                  docker_conn_id=None,
+                 dns=None,
+                 dns_search=None,
+                 shm_size=None,
                  *args,
                  **kwargs):
@@ -295,6 +306,8 @@
         self.api_version = api_version
         self.command = command
         self.cpus = cpus
+        self.dns = dns
+        self.dns_search = dns_search
         self.docker_url = docker_url
         self.environment = environment or {}
         self.force_pull = force_pull
@@ -313,7 +326,7 @@
         self.xcom_push_flag = xcom_push
         self.xcom_all = xcom_all
         self.docker_conn_id = docker_conn_id
-        self.shm_size = kwargs.get('shm_size')
+        self.shm_size = shm_size
         self.cli = None
         self.container = None
@@ -326,7 +339,7 @@
             tls=self.__get_tls_config()
         )
-    def execute(self, context):
+[docs]    def execute(self, context):
         self.log.info('Starting docker container from image %s', self.image)
         tls_config = self.__get_tls_config()
@@ -340,18 +353,12 @@
             tls=tls_config
         )
-        if ':' not in self.image:
-            image = self.image + ':latest'
-        else:
-            image = self.image
-
-        if self.force_pull or len(self.cli.images(name=image)) == 0:
-            self.log.info('Pulling docker
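The hunk above removes the inline tag-defaulting code while the new docstring line still promises that a missing image tag falls back to "latest". That behaviour amounts to a one-line check; default_tag is an illustrative helper, and note that the naive ':' test the removed code used would also be satisfied by a registry port.

```python
def default_tag(image):
    """Append :latest when the image reference carries no tag,
    mirroring the removed `if ':' not in self.image` logic above.
    Caveat: 'myregistry:5000/ubuntu' contains ':' and is left alone,
    which is exactly how naive the original check was."""
    return image if ":" in image else image + ":latest"


assert default_tag("ubuntu") == "ubuntu:latest"
assert default_tag("ubuntu:18.04") == "ubuntu:18.04"
```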
[22/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/operators/sagemaker_tuning_operator.html -- diff --git a/_modules/airflow/contrib/operators/sagemaker_tuning_operator.html b/_modules/airflow/contrib/operators/sagemaker_tuning_operator.html new file mode 100644 index 000..5534a66 --- /dev/null +++ b/_modules/airflow/contrib/operators/sagemaker_tuning_operator.html @@ -0,0 +1,310 @@ + + + + + + + + + + + airflow.contrib.operators.sagemaker_tuning_operator Airflow Documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Airflow + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Project +License +Quick Start +Installation +Tutorial +How-to Guides +UI / Screenshots +Concepts +Data Profiling +Command Line Interface +Scheduling Triggers +Plugins +Security +Time zones +Experimental Rest API +Integration +Lineage +FAQ +API Reference + + + + + + + + + + + + + + + Airflow + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Docs + + Module code + + airflow.contrib.operators.sagemaker_tuning_operator + + + + + + + + + + + + http://schema.org/Article;> + + + Source code for airflow.contrib.operators.sagemaker_tuning_operator +# -*- coding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# License); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. 
See the License for the +# specific language governing permissions and limitations +# under the License. + +from airflow.contrib.hooks.aws_hook import AwsHook +from airflow.contrib.operators.sagemaker_base_operator import SageMakerBaseOperator +from airflow.utils.decorators import apply_defaults +from airflow.exceptions import AirflowException + + +[docs]class SageMakerTuningOperator(SageMakerBaseOperator): + +Initiate a SageMaker hyperparameter tuning job. + +This operator returns the ARN of the tuning job created in Amazon SageMaker. + +:param config: The configuration necessary to start a tuning job (templated). + +For details of the configuration parameter, see: + https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_hyper_parameter_tuning_job +:type config: dict +:param aws_conn_id: The AWS connection ID to use. +:type aws_conn_id: str +:param wait_for_completion: Set to True to wait until the tuning job finishes. +:type wait_for_completion: bool +:param check_interval: If wait is set to True, the time interval, in seconds, +that this operation waits to check the status of the tuning job. +:type check_interval: int +:param max_ingestion_time: If wait is set to True, the operation fails +if the tuning job doesn't finish within max_ingestion_time seconds. If you +set this parameter to None, the operation does not time out. 
+:type max_ingestion_time: int + # noqa: E501 + +integer_fields = [ +['HyperParameterTuningJobConfig', 'ResourceLimits', 'MaxNumberOfTrainingJobs'], +['HyperParameterTuningJobConfig', 'ResourceLimits', 'MaxParallelTrainingJobs'], +['TrainingJobDefinition', 'ResourceConfig', 'InstanceCount'], +['TrainingJobDefinition', 'ResourceConfig', 'VolumeSizeInGB'], +['TrainingJobDefinition', 'StoppingCondition', 'MaxRuntimeInSeconds'] +] + +@apply_defaults +def __init__(self, + config, + wait_for_completion=True, + check_interval=30, + max_ingestion_time=None, + *args, **kwargs): +super(SageMakerTuningOperator, self).__init__(config=config, + *args, **kwargs) +self.config = config +
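The integer_fields list above tells the base operator which templated config values must be cast back to int before the request is sent to SageMaker, since templated values often arrive as strings. As a rough illustration of that idea, here is a self-contained sketch with hypothetical helper names; it is not Airflow's actual implementation:

```python
# Hypothetical sketch: apply an "integer_fields" list to a nested config
# dict, coercing every value addressed by a key path to int, in place.
INTEGER_FIELDS = [
    ["HyperParameterTuningJobConfig", "ResourceLimits", "MaxNumberOfTrainingJobs"],
    ["HyperParameterTuningJobConfig", "ResourceLimits", "MaxParallelTrainingJobs"],
]

def coerce_integer_fields(config, integer_fields):
    """Cast every value addressed by a path in integer_fields to int, in place."""
    for path in integer_fields:
        node = config
        # Walk down to the dict holding the target key; skip missing branches.
        for key in path[:-1]:
            if not isinstance(node, dict) or key not in node:
                node = None
                break
            node = node[key]
        if isinstance(node, dict) and path[-1] in node:
            node[path[-1]] = int(node[path[-1]])
    return config

config = {
    "HyperParameterTuningJobConfig": {
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": "500",  # arrived as a string after templating
            "MaxParallelTrainingJobs": "10",
        }
    }
}
coerce_integer_fields(config, INTEGER_FIELDS)
```

A path whose branch is absent from the config is simply skipped, which fits the fact that most fields in the boto3 request body are optional.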
[29/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/operators/bigquery_to_gcs.html -- diff --git a/_modules/airflow/contrib/operators/bigquery_to_gcs.html b/_modules/airflow/contrib/operators/bigquery_to_gcs.html index 9052db6..87a2a29 100644 --- a/_modules/airflow/contrib/operators/bigquery_to_gcs.html +++ b/_modules/airflow/contrib/operators/bigquery_to_gcs.html @@ -215,8 +215,12 @@ For this to work, the service account making the request must have domain-wide delegation enabled. :type delegate_to: string +:param labels: a dictionary containing labels for the job/query, +passed to BigQuery +:type labels: dict -template_fields = ('source_project_dataset_table', 'destination_cloud_storage_uris') +template_fields = ('source_project_dataset_table', + 'destination_cloud_storage_uris', 'labels') template_ext = ('.sql',) ui_color = '#e4e6f0' @@ -230,6 +234,7 @@ print_header=True, bigquery_conn_id='bigquery_default', delegate_to=None, + labels=None, *args, **kwargs): super(BigQueryToCloudStorageOperator, self).__init__(*args, **kwargs) @@ -241,8 +246,9 @@ self.print_header = print_header self.bigquery_conn_id = bigquery_conn_id self.delegate_to = delegate_to +self.labels = labels -def execute(self, context): +[docs] def execute(self, context): self.log.info('Executing extract of %s into: %s', self.source_project_dataset_table, self.destination_cloud_storage_uris) @@ -256,7 +262,8 @@ self.compression, self.export_format, self.field_delimiter, -self.print_header) +self.print_header, +self.labels) @@ -287,20 +294,13 @@ - -var DOCUMENTATION_OPTIONS = { -URL_ROOT:'../../../../', -VERSION:'', -LANGUAGE:'None', -COLLAPSE_INDEX:false, -FILE_SUFFIX:'.html', -HAS_SOURCE: true, -SOURCELINK_SUFFIX: '.txt' -}; - - - - + + + + + + + http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/operators/cassandra_to_gcs.html -- diff --git a/_modules/airflow/contrib/operators/cassandra_to_gcs.html 
b/_modules/airflow/contrib/operators/cassandra_to_gcs.html index a53bc3d..73b5b34 100644 --- a/_modules/airflow/contrib/operators/cassandra_to_gcs.html +++ b/_modules/airflow/contrib/operators/cassandra_to_gcs.html @@ -287,7 +287,7 @@ 'VarcharType': 'STRING', } -def execute(self, context): +[docs] def execute(self, context): cursor = self._query_cassandra() files_to_upload = self._write_local_data_files(cursor) @@ -306,7 +306,7 @@ file_handle.close() # Close all sessions and connection associated with this Cassandra cluster -self.hook.shutdown_cluster() +self.hook.shutdown_cluster() def _query_cassandra(self): @@ -544,20 +544,13 @@ - -var DOCUMENTATION_OPTIONS = { -URL_ROOT:'../../../../', -VERSION:'', -LANGUAGE:'None', -COLLAPSE_INDEX:false, -FILE_SUFFIX:'.html', -HAS_SOURCE: true, -SOURCELINK_SUFFIX: '.txt' -}; - - - - + + + + + + + http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/operators/databricks_operator.html -- diff --git a/_modules/airflow/contrib/operators/databricks_operator.html b/_modules/airflow/contrib/operators/databricks_operator.html index f401636..51c13d7 100644 --- a/_modules/airflow/contrib/operators/databricks_operator.html +++ b/_modules/airflow/contrib/operators/databricks_operator.html @@ -307,6 +307,9 @@ :param databricks_retry_limit: Amount of times to retry if the Databricks backend is unreachable. Its value must be greater than or equal to 1. :type databricks_retry_limit: int +:param databricks_retry_delay: Number of seconds to wait between retries (it +might be a floating point number). +:type databricks_retry_delay: float :param do_xcom_push: Whether we should push run_id and run_page_url to xcom. :type do_xcom_push: boolean @@ -329,6 +332,7 @@ databricks_conn_id='databricks_default', polling_period_seconds=30, databricks_retry_limit=3,
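The databricks_retry_limit/databricks_retry_delay pair documented above is the classic bounded-retry-with-fixed-delay pattern. A generic, self-contained sketch of that pattern follows; the function names are hypothetical and this is not the DatabricksHook's actual code:

```python
import time

def call_with_retries(fn, retry_limit=3, retry_delay=1.0, sleep=time.sleep):
    """Call fn(), making at most retry_limit attempts in total and sleeping
    retry_delay seconds between attempts. Illustrative only."""
    if retry_limit < 1:
        raise ValueError("retry_limit must be greater than or equal to 1")
    for attempt in range(1, retry_limit + 1):
        try:
            return fn()
        except ConnectionError:
            if attempt == retry_limit:
                raise  # out of attempts: surface the last error
            sleep(retry_delay)

calls = []
def flaky():
    # Simulate a backend that is unreachable twice, then recovers.
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("backend unreachable")
    return "ok"

result = call_with_retries(flaky, retry_limit=3, retry_delay=0.0)
```

With retry_limit=3 the flaky call fails twice, succeeds on the third attempt, and the final error would only propagate once all attempts are exhausted.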
[33/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/hooks/imap_hook.html -- diff --git a/_modules/airflow/contrib/hooks/imap_hook.html b/_modules/airflow/contrib/hooks/imap_hook.html new file mode 100644 index 000..c9d9e48 --- /dev/null +++ b/_modules/airflow/contrib/hooks/imap_hook.html @@ -0,0 +1,491 @@ + Source code for airflow.contrib.hooks.imap_hook +# -*- coding: utf-8 -*- +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import email +import imaplib +import os +import re + +from airflow import LoggingMixin +from airflow.hooks.base_hook import BaseHook + + +[docs]class ImapHook(BaseHook): + +This hook connects to a mail server by using the imap protocol. 
+ +:param imap_conn_id: The connection id that contains the information + used to authenticate the client. + The default value is 'imap_default'. +:type imap_conn_id: str + + +def __init__(self, imap_conn_id='imap_default'): +super(ImapHook, self).__init__(imap_conn_id) +self.conn = self.get_connection(imap_conn_id) +self.mail_client = imaplib.IMAP4_SSL(self.conn.host) + +def __enter__(self): +self.mail_client.login(self.conn.login, self.conn.password) +return self + +def __exit__(self, exc_type, exc_val, exc_tb): +self.mail_client.logout() + +[docs] def has_mail_attachment(self, name, mail_folder='INBOX', check_regex=False): + +Checks the mail folder for mails containing attachments with the given name. + +:param name: The name of the attachment that will be searched for. +:type name: str +:param mail_folder: The mail folder where to look at. +The default value is 'INBOX'. +:type mail_folder: str +:param check_regex: Checks the name for a regular expression. +The default value is False. +:type check_regex: bool +:returns: True if there is an attachment with the given name and False if not. +:rtype: bool + +mail_attachments = self._retrieve_mails_attachments_by_name(name, mail_folder, +check_regex, +latest_only=True) +return len(mail_attachments) > 0 + +[docs] def retrieve_mail_attachments(self, name, mail_folder='INBOX', check_regex=False, + latest_only=False): + +Retrieves mail's attachments in the mail folder by its name. + +:param name: The name of the attachment that will be downloaded. +:type name: str +:param mail_folder: The mail folder where to look at. +The default value is 'INBOX'. +:type mail_folder: str +:param check_regex: Checks the name for a regular expression. +The default value is False. +:type check_regex: bool +:param latest_only: If set to True it will only retrieve +
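The check_regex flag above switches attachment-name matching from an exact comparison to a regular-expression search. A minimal, self-contained sketch of that matching logic using only the standard library (a simplified stand-in for what ImapHook does internally, not its actual code):

```python
import email.message
import re

def attachment_matches(part, name, check_regex=False):
    """Return True if a MIME part is an attachment whose filename matches
    `name` (exact match, or regex search when check_regex=True)."""
    filename = part.get_filename()
    if filename is None:
        return False  # not an attachment with a filename
    if check_regex:
        return re.search(name, filename) is not None
    return filename == name

# Build a message with one CSV attachment to match against.
msg = email.message.EmailMessage()
msg["Subject"] = "nightly report"
msg.set_content("see attached")
msg.add_attachment("1,2,3\n", filename="report_2018-11-30.csv")

regex_hits = [p for p in msg.iter_attachments()
              if attachment_matches(p, r"report_.*\.csv", check_regex=True)]
exact_hits = [p for p in msg.iter_attachments()
              if attachment_matches(p, "report.csv")]
```

The regex form matches the dated filename while the exact form does not, which is precisely the difference the check_regex parameter controls.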
[21/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/operators/vertica_to_hive.html -- diff --git a/_modules/airflow/contrib/operators/vertica_to_hive.html b/_modules/airflow/contrib/operators/vertica_to_hive.html index 6e6f8c4..24b843e 100644 --- a/_modules/airflow/contrib/operators/vertica_to_hive.html +++ b/_modules/airflow/contrib/operators/vertica_to_hive.html @@ -268,7 +268,7 @@ } return d[vertica_type] if vertica_type in d else 'STRING' -def execute(self, context): +[docs] def execute(self, context): hive = HiveCliHook(hive_cli_conn_id=self.hive_cli_conn_id) vertica = VerticaHook(vertica_conn_id=self.vertica_conn_id) @@ -297,7 +297,7 @@ create=self.create, partition=self.partition, delimiter=self.delimiter, -recreate=self.recreate) +recreate=self.recreate) @@ -328,20 +328,13 @@ - -var DOCUMENTATION_OPTIONS = { -URL_ROOT:'../../../../', -VERSION:'', -LANGUAGE:'None', -COLLAPSE_INDEX:false, -FILE_SUFFIX:'.html', -HAS_SOURCE: true, -SOURCELINK_SUFFIX: '.txt' -}; - - - - + + + + + + + http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/operators/winrm_operator.html -- diff --git a/_modules/airflow/contrib/operators/winrm_operator.html b/_modules/airflow/contrib/operators/winrm_operator.html index fa546c3..ec9a1f2 100644 --- a/_modules/airflow/contrib/operators/winrm_operator.html +++ b/_modules/airflow/contrib/operators/winrm_operator.html @@ -223,7 +223,7 @@ self.timeout = timeout self.do_xcom_push = do_xcom_push -def execute(self, context): +[docs] def execute(self, context): try: if self.ssh_conn_id and not self.winrm_hook: self.log.info('hook not found, creating') @@ -268,7 +268,7 @@ except Exception as e: raise AirflowException('WinRM operator error: {0}'.format(str(e))) -return True +return True @@ -299,20 +299,13 @@ - -var DOCUMENTATION_OPTIONS = { -URL_ROOT:'../../../../', -VERSION:'', -LANGUAGE:'None', -COLLAPSE_INDEX:false, -FILE_SUFFIX:'.html', -HAS_SOURCE: 
true, -SOURCELINK_SUFFIX: '.txt' -}; - - - - + + + + + + + http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/sensors/aws_redshift_cluster_sensor.html -- diff --git a/_modules/airflow/contrib/sensors/aws_redshift_cluster_sensor.html b/_modules/airflow/contrib/sensors/aws_redshift_cluster_sensor.html index 84012a0..55215d1 100644 --- a/_modules/airflow/contrib/sensors/aws_redshift_cluster_sensor.html +++ b/_modules/airflow/contrib/sensors/aws_redshift_cluster_sensor.html @@ -241,20 +241,13 @@ - -var DOCUMENTATION_OPTIONS = { -URL_ROOT:'../../../../', -VERSION:'', -LANGUAGE:'None', -COLLAPSE_INDEX:false, -FILE_SUFFIX:'.html', -HAS_SOURCE: true, -SOURCELINK_SUFFIX: '.txt' -}; - - - - + + + + + + + http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/sensors/bash_sensor.html -- diff --git a/_modules/airflow/contrib/sensors/bash_sensor.html b/_modules/airflow/contrib/sensors/bash_sensor.html index f9858d8..b3c1619 100644 --- a/_modules/airflow/contrib/sensors/bash_sensor.html +++ b/_modules/airflow/contrib/sensors/bash_sensor.html @@ -284,20 +284,13 @@ - -var DOCUMENTATION_OPTIONS = { -URL_ROOT:'../../../../', -VERSION:'', -LANGUAGE:'None', -COLLAPSE_INDEX:false, -FILE_SUFFIX:'.html', -HAS_SOURCE: true, -SOURCELINK_SUFFIX: '.txt' -}; - - - - + + + + + + + http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/sensors/bigquery_sensor.html -- diff --git a/_modules/airflow/contrib/sensors/bigquery_sensor.html b/_modules/airflow/contrib/sensors/bigquery_sensor.html index
[31/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/hooks/segment_hook.html -- diff --git a/_modules/airflow/contrib/hooks/segment_hook.html b/_modules/airflow/contrib/hooks/segment_hook.html index ccb8fd0..fb4396f 100644 --- a/_modules/airflow/contrib/hooks/segment_hook.html +++ b/_modules/airflow/contrib/hooks/segment_hook.html @@ -281,20 +281,13 @@ - -var DOCUMENTATION_OPTIONS = { -URL_ROOT:'../../../../', -VERSION:'', -LANGUAGE:'None', -COLLAPSE_INDEX:false, -FILE_SUFFIX:'.html', -HAS_SOURCE: true, -SOURCELINK_SUFFIX: '.txt' -}; - - - - + + + + + + + http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/hooks/sftp_hook.html -- diff --git a/_modules/airflow/contrib/hooks/sftp_hook.html b/_modules/airflow/contrib/hooks/sftp_hook.html index efc5485..84a9fe8 100644 --- a/_modules/airflow/contrib/hooks/sftp_hook.html +++ b/_modules/airflow/contrib/hooks/sftp_hook.html @@ -182,11 +182,14 @@ import pysftp import logging import datetime -from airflow.hooks.base_hook import BaseHook +from airflow.contrib.hooks.ssh_hook import SSHHook -[docs]class SFTPHook(BaseHook): +[docs]class SFTPHook(SSHHook): +This hook is inherited from SSH hook. Please refer to SSH hook for the input +arguments. + Interact with SFTP. Aims to be interchangeable with FTPHook. Pitfalls: - In contrast with FTPHook describe_directory only returns size, type and @@ -200,32 +203,74 @@ Errors that may occur throughout but should be handled downstream. 
-def __init__(self, ftp_conn_id='sftp_default'): -self.ftp_conn_id = ftp_conn_id +def __init__(self, ftp_conn_id='sftp_default', *args, **kwargs): +kwargs['ssh_conn_id'] = ftp_conn_id +super(SFTPHook, self).__init__(*args, **kwargs) + +self.conn = None +self.private_key_pass = None + +# Fail for unverified hosts, unless this is explicitly allowed +self.no_host_key_check = False + +if self.ssh_conn_id is not None: +conn = self.get_connection(self.ssh_conn_id) +if conn.extra is not None: +extra_options = conn.extra_dejson +if 'private_key_pass' in extra_options: +self.private_key_pass = extra_options.get('private_key_pass', None) + +# For backward compatibility +# TODO: remove in Airflow 2.1 +import warnings +if 'ignore_hostkey_verification' in extra_options: +warnings.warn( +'Extra option `ignore_hostkey_verification` is deprecated.' +'Please use `no_host_key_check` instead.' +'This option will be removed in Airflow 2.1', +DeprecationWarning, +stacklevel=2, +) +self.no_host_key_check = str( +extra_options['ignore_hostkey_verification'] +).lower() == 'true' + +if 'no_host_key_check' in extra_options: +self.no_host_key_check = str( +extra_options['no_host_key_check']).lower() == 'true' + +if 'private_key' in extra_options: +warnings.warn( +'Extra option `private_key` is deprecated.' +'Please use `key_file` instead.' +'This option will be removed in Airflow 2.1', +DeprecationWarning, +stacklevel=2, +) +self.key_file = extra_options.get('private_key') [docs] def get_conn(self): Returns an SFTP connection object if self.conn is None: -params = self.get_connection(self.ftp_conn_id) cnopts = pysftp.CnOpts() -if ('ignore_hostkey_verification' in params.extra_dejson and -params.extra_dejson['ignore_hostkey_verification']): +if self.no_host_key_check: cnopts.hostkeys = None +cnopts.compression = self.compress conn_params = { -'host': params.host, -'port': params.port, -'username': params.login, +'host': self.remote_host, +'port': self.port, +'username': self.username, 'cnopts': cnopts } -if params.password is not None: -conn_params['password'] = params.password -if 'private_key' in
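The extra-options handling in the SFTPHook diff above boils down to: default to strict host-key checking, honour the deprecated `ignore_hostkey_verification` key with a warning, and let the new `no_host_key_check` key win when both are present. A self-contained sketch of just that resolution logic (a hypothetical helper, not Airflow's verbatim code):

```python
import warnings

def parse_no_host_key_check(extra_options):
    """Resolve host-key checking from a connection's extra options,
    accepting the deprecated `ignore_hostkey_verification` as a fallback."""
    no_host_key_check = False  # fail for unverified hosts by default
    if "ignore_hostkey_verification" in extra_options:
        warnings.warn(
            "Extra option `ignore_hostkey_verification` is deprecated. "
            "Please use `no_host_key_check` instead.",
            DeprecationWarning, stacklevel=2)
        no_host_key_check = str(
            extra_options["ignore_hostkey_verification"]).lower() == "true"
    if "no_host_key_check" in extra_options:
        # The new key, when present, overrides the deprecated one.
        no_host_key_check = str(
            extra_options["no_host_key_check"]).lower() == "true"
    return no_host_key_check

with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    legacy = parse_no_host_key_check({"ignore_hostkey_verification": True})
    overridden = parse_no_host_key_check(
        {"ignore_hostkey_verification": True, "no_host_key_check": "false"})
strict = parse_no_host_key_check({})
```

The str(...).lower() == "true" comparison mirrors the diff's handling of extras that may arrive as booleans or as strings from the connection's JSON extra field.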
[28/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/operators/datastore_export_operator.html -- diff --git a/_modules/airflow/contrib/operators/datastore_export_operator.html b/_modules/airflow/contrib/operators/datastore_export_operator.html index 915c753..2237184 100644 --- a/_modules/airflow/contrib/operators/datastore_export_operator.html +++ b/_modules/airflow/contrib/operators/datastore_export_operator.html @@ -245,7 +245,7 @@ self.overwrite_existing = overwrite_existing self.xcom_push = xcom_push -def execute(self, context): +[docs] def execute(self, context): self.log.info(Exporting data to Cloud Storage bucket + self.bucket) if self.overwrite_existing and self.namespace: @@ -268,7 +268,7 @@ raise AirflowException(Operation failed: result={}.format(result)) if self.xcom_push: -return result +return result @@ -299,20 +299,13 @@ - -var DOCUMENTATION_OPTIONS = { -URL_ROOT:'../../../../', -VERSION:'', -LANGUAGE:'None', -COLLAPSE_INDEX:false, -FILE_SUFFIX:'.html', -HAS_SOURCE: true, -SOURCELINK_SUFFIX: '.txt' -}; - - - - + + + + + + + http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/operators/datastore_import_operator.html -- diff --git a/_modules/airflow/contrib/operators/datastore_import_operator.html b/_modules/airflow/contrib/operators/datastore_import_operator.html index c7d84b4..8feddfb 100644 --- a/_modules/airflow/contrib/operators/datastore_import_operator.html +++ b/_modules/airflow/contrib/operators/datastore_import_operator.html @@ -239,7 +239,7 @@ self.polling_interval_in_seconds = polling_interval_in_seconds self.xcom_push = xcom_push -def execute(self, context): +[docs] def execute(self, context): self.log.info(Importing data from Cloud Storage bucket %s, self.bucket) ds_hook = DatastoreHook(self.datastore_conn_id, self.delegate_to) result = ds_hook.import_from_storage_bucket(bucket=self.bucket, @@ -256,7 +256,7 @@ raise AirflowException(Operation 
failed: result={}.format(result)) if self.xcom_push: -return result +return result @@ -287,20 +287,13 @@ - -var DOCUMENTATION_OPTIONS = { -URL_ROOT:'../../../../', -VERSION:'', -LANGUAGE:'None', -COLLAPSE_INDEX:false, -FILE_SUFFIX:'.html', -HAS_SOURCE: true, -SOURCELINK_SUFFIX: '.txt' -}; - - - - + + + + + + + http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/operators/discord_webhook_operator.html -- diff --git a/_modules/airflow/contrib/operators/discord_webhook_operator.html b/_modules/airflow/contrib/operators/discord_webhook_operator.html index 3122f2a..c480587 100644 --- a/_modules/airflow/contrib/operators/discord_webhook_operator.html +++ b/_modules/airflow/contrib/operators/discord_webhook_operator.html @@ -287,20 +287,13 @@ - -var DOCUMENTATION_OPTIONS = { -URL_ROOT:'../../../../', -VERSION:'', -LANGUAGE:'None', -COLLAPSE_INDEX:false, -FILE_SUFFIX:'.html', -HAS_SOURCE: true, -SOURCELINK_SUFFIX: '.txt' -}; - - - - + + + + + + + http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/operators/druid_operator.html -- diff --git a/_modules/airflow/contrib/operators/druid_operator.html b/_modules/airflow/contrib/operators/druid_operator.html index dd9de81..7941dc6 100644 --- a/_modules/airflow/contrib/operators/druid_operator.html +++ b/_modules/airflow/contrib/operators/druid_operator.html @@ -214,13 +214,13 @@ separators=(,, : ) ) -def execute(self, context): +[docs] def execute(self, context): hook = DruidHook( druid_ingest_conn_id=self.conn_id, max_ingestion_time=self.max_ingestion_time ) self.log.info(Sumitting %s, self.index_spec_str) -hook.submit_indexing_job(self.index_spec_str) +hook.submit_indexing_job(self.index_spec_str) @@ -251,20 +251,13 @@ - -
[24/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/operators/pubsub_operator.html -- diff --git a/_modules/airflow/contrib/operators/pubsub_operator.html b/_modules/airflow/contrib/operators/pubsub_operator.html index 44fa17e..38a471e 100644 --- a/_modules/airflow/contrib/operators/pubsub_operator.html +++ b/_modules/airflow/contrib/operators/pubsub_operator.html @@ -250,12 +250,12 @@ self.gcp_conn_id = gcp_conn_id self.delegate_to = delegate_to -def execute(self, context): +[docs] def execute(self, context): hook = PubSubHook(gcp_conn_id=self.gcp_conn_id, delegate_to=self.delegate_to) hook.create_topic(self.project, self.topic, - fail_if_exists=self.fail_if_exists) + fail_if_exists=self.fail_if_exists) [docs]class PubSubSubscriptionCreateOperator(BaseOperator): @@ -358,14 +358,14 @@ self.gcp_conn_id = gcp_conn_id self.delegate_to = delegate_to -def execute(self, context): +[docs] def execute(self, context): hook = PubSubHook(gcp_conn_id=self.gcp_conn_id, delegate_to=self.delegate_to) return hook.create_subscription( self.topic_project, self.topic, self.subscription, self.subscription_project, self.ack_deadline_secs, -self.fail_if_exists) +self.fail_if_exists) [docs]class PubSubTopicDeleteOperator(BaseOperator): @@ -434,12 +434,12 @@ self.gcp_conn_id = gcp_conn_id self.delegate_to = delegate_to -def execute(self, context): +[docs] def execute(self, context): hook = PubSubHook(gcp_conn_id=self.gcp_conn_id, delegate_to=self.delegate_to) hook.delete_topic(self.project, self.topic, - fail_if_not_exists=self.fail_if_not_exists) + fail_if_not_exists=self.fail_if_not_exists) [docs]class PubSubSubscriptionDeleteOperator(BaseOperator): @@ -510,12 +510,12 @@ self.gcp_conn_id = gcp_conn_id self.delegate_to = delegate_to -def execute(self, context): +[docs] def execute(self, context): hook = PubSubHook(gcp_conn_id=self.gcp_conn_id, delegate_to=self.delegate_to) hook.delete_subscription(self.project, self.subscription, - 
fail_if_not_exists=self.fail_if_not_exists) + fail_if_not_exists=self.fail_if_not_exists) [docs]class PubSubPublishOperator(BaseOperator): @@ -588,10 +588,10 @@ self.topic = topic self.messages = messages -def execute(self, context): +[docs] def execute(self, context): hook = PubSubHook(gcp_conn_id=self.gcp_conn_id, delegate_to=self.delegate_to) -hook.publish(self.project, self.topic, self.messages) +hook.publish(self.project, self.topic, self.messages) @@ -622,20 +622,13 @@ - -var DOCUMENTATION_OPTIONS = { -URL_ROOT:'../../../../', -VERSION:'', -LANGUAGE:'None', -COLLAPSE_INDEX:false, -FILE_SUFFIX:'.html', -HAS_SOURCE: true, -SOURCELINK_SUFFIX: '.txt' -}; - - - - + + + + + + + http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/operators/qubole_check_operator.html -- diff --git a/_modules/airflow/contrib/operators/qubole_check_operator.html b/_modules/airflow/contrib/operators/qubole_check_operator.html index 0d54850..4a9c63e 100644 --- a/_modules/airflow/contrib/operators/qubole_check_operator.html +++ b/_modules/airflow/contrib/operators/qubole_check_operator.html @@ -245,12 +245,12 @@ self.on_failure_callback = QuboleCheckHook.handle_failure_retry self.on_retry_callback = QuboleCheckHook.handle_failure_retry -def execute(self, context=None): +[docs] def execute(self, context=None): try: self.hook = self.get_hook(context=context) super(QuboleCheckOperator, self).execute(context=context) except AirflowException as e: -handle_airflow_exception(e, self.get_hook()) +handle_airflow_exception(e, self.get_hook()) def get_db_hook(self): return self.get_hook() @@ -332,12 +332,12 @@ self.on_failure_callback = QuboleCheckHook.handle_failure_retry self.on_retry_callback = QuboleCheckHook.handle_failure_retry -def execute(self, context=None): +[docs] def execute(self, context=None): try: self.hook =
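The PubSub operators above share a common contract: creation is idempotent by default, and fail_if_exists / fail_if_not_exists turn the silent no-op into an error. A toy in-memory sketch of that contract for topic creation (the registry and exception are hypothetical stand-ins for the real PubSubHook, shown only to illustrate the flag's semantics):

```python
class PubSubTopicCreateError(Exception):
    pass

def create_topic(topics, project, topic, fail_if_exists=False):
    """Create `topic` under `project` in an in-memory set, mirroring the
    fail_if_exists contract of PubSubTopicCreateOperator."""
    full_name = "projects/{}/topics/{}".format(project, topic)
    if full_name in topics:
        if fail_if_exists:
            raise PubSubTopicCreateError("Topic already exists: " + full_name)
        return full_name  # idempotent no-op by default
    topics.add(full_name)
    return full_name

topics = set()
first = create_topic(topics, "my-project", "events")
second = create_topic(topics, "my-project", "events")  # no error by default
try:
    create_topic(topics, "my-project", "events", fail_if_exists=True)
    raised = False
except PubSubTopicCreateError:
    raised = True
```

Running the same operator twice is therefore safe unless the flag is set, which is what makes these operators usable in retried task instances.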
[27/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/operators/gcp_function_operator.html -- diff --git a/_modules/airflow/contrib/operators/gcp_function_operator.html b/_modules/airflow/contrib/operators/gcp_function_operator.html new file mode 100644 index 000..b84716a --- /dev/null +++ b/_modules/airflow/contrib/operators/gcp_function_operator.html @@ -0,0 +1,522 @@ + Source code for airflow.contrib.operators.gcp_function_operator +# -*- coding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. 
See the License for the +# specific language governing permissions and limitations +# under the License. +import re + +from googleapiclient.errors import HttpError + +from airflow import AirflowException +from airflow.contrib.utils.gcp_field_validator import GcpBodyFieldValidator, \ +GcpFieldValidationException +from airflow.version import version +from airflow.models import BaseOperator +from airflow.contrib.hooks.gcp_function_hook import GcfHook +from airflow.utils.decorators import apply_defaults + + +def _validate_available_memory_in_mb(value): +if int(value) <= 0: +raise GcpFieldValidationException("The available memory has to be greater than 0") + + +def _validate_max_instances(value): +if int(value) <= 0: +raise GcpFieldValidationException( +"The max instances parameter has to be greater than 0") + + +CLOUD_FUNCTION_VALIDATION = [ +dict(name="name", regexp="^.+$"), +dict(name="description", regexp="^.+$", optional=True), +dict(name="entryPoint", regexp=r"^.+$", optional=True), +dict(name="runtime", regexp=r"^.+$", optional=True), +dict(name="timeout", regexp=r"^.+$", optional=True), +dict(name="availableMemoryMb", custom_validation=_validate_available_memory_in_mb, + optional=True), +dict(name="labels", optional=True), +dict(name="environmentVariables", optional=True), +dict(name="network", regexp=r"^.+$", optional=True), +dict(name="maxInstances", optional=True, custom_validation=_validate_max_instances), + +dict(name="source_code", type="union", fields=[ +dict(name="sourceArchiveUrl", regexp=r"^.+$"), +dict(name="sourceRepositoryUrl", regexp=r"^.+$", api_version="v1beta2"), +dict(name="sourceRepository", type="dict", fields=[ +dict(name="url", regexp=r"^.+$") +]), +dict(name="sourceUploadUrl") +]), + +dict(name="trigger", type="union", fields=[ +dict(name="httpsTrigger", type="dict", fields=[ +# This dict should be empty at input (url is added at output) +]), +dict(name="eventTrigger", type="dict", fields=[ +dict(name="eventType", regexp=r"^.+$"), +dict(name="resource", regexp=r"^.+$"), +dict(name="service", regexp=r"^.+$", optional=True), 
+dict(name="failurePolicy", type="dict", optional=True, fields=[ +dict(name="retry", type="dict", optional=True) +]) +]) +]), +] +
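The CLOUD_FUNCTION_VALIDATION spec above is declarative: each entry names a field, marks it optional or required, and may attach a regexp or a custom validation callable. A stripped-down sketch of how such a spec can be evaluated against a request body (this covers only the name/optional/regexp subset, not the union and nested-dict handling of the real GcpBodyFieldValidator):

```python
import re

class FieldValidationError(Exception):
    pass

def validate_body(body, spec):
    """Validate a request body against a minimal declarative spec:
    required fields must be present, and values must match any regexp."""
    for field in spec:
        name = field["name"]
        if name not in body:
            if not field.get("optional", False):
                raise FieldValidationError("Missing required field: " + name)
            continue  # optional field absent: nothing to check
        regexp = field.get("regexp")
        if regexp is not None and not re.match(regexp, str(body[name])):
            raise FieldValidationError(
                "Field {!r} does not match {!r}".format(name, regexp))

spec = [
    dict(name="name", regexp=r"^.+$"),
    dict(name="description", regexp=r"^.+$", optional=True),
]

validate_body({"name": "helloWorld"}, spec)  # required field present: passes

errors = []
for body in ({"description": "missing name"}, {"name": ""}):
    try:
        validate_body(body, spec)
    except FieldValidationError as exc:
        errors.append(str(exc))
```

Keeping the rules as data rather than code is what lets the operator support multiple API versions (note the api_version="v1beta2" entry) with one validator.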
[07/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/howto/executor/use-celery.html -- diff --git a/howto/executor/use-celery.html b/howto/executor/use-celery.html index d29401f..a28a857 100644 --- a/howto/executor/use-celery.html +++ b/howto/executor/use-celery.html @@ -216,7 +216,7 @@ to start a Flower web server. Make sure to use a database backed result backend Make sure to set a visibility timeout in [celery_broker_transport_options] that exceeds the ETA of your longest running task -Tasks can and consume resources, make sure your worker as enough resources to run worker_concurrency tasks +Tasks can consume resources. Make sure your worker has enough resources to run worker_concurrency tasks @@ -258,20 +258,13 @@ to start a Flower web server. - -var DOCUMENTATION_OPTIONS = { -URL_ROOT:'../../', -VERSION:'', -LANGUAGE:'None', -COLLAPSE_INDEX:false, -FILE_SUFFIX:'.html', -HAS_SOURCE: true, -SOURCELINK_SUFFIX: '.txt' -}; - - - - + + + + + + + http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/howto/executor/use-dask.html -- diff --git a/howto/executor/use-dask.html b/howto/executor/use-dask.html index bc18ef5..bb7433d 100644 --- a/howto/executor/use-dask.html +++ b/howto/executor/use-dask.html @@ -244,20 +244,13 @@ warning will be raised but the task will be submitted to the cluster. 
- -var DOCUMENTATION_OPTIONS = { -URL_ROOT:'../../', -VERSION:'', -LANGUAGE:'None', -COLLAPSE_INDEX:false, -FILE_SUFFIX:'.html', -HAS_SOURCE: true, -SOURCELINK_SUFFIX: '.txt' -}; - - - - + + + + + + + http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/howto/executor/use-mesos.html -- diff --git a/howto/executor/use-mesos.html b/howto/executor/use-mesos.html index ba905ea..f958feb 100644 --- a/howto/executor/use-mesos.html +++ b/howto/executor/use-mesos.html @@ -283,20 +283,13 @@ For any queries/bugs on MesosExecutor, please contact -var DOCUMENTATION_OPTIONS = { -URL_ROOT:'../../', -VERSION:'', -LANGUAGE:'None', -COLLAPSE_INDEX:false, -FILE_SUFFIX:'.html', -HAS_SOURCE: true, -SOURCELINK_SUFFIX: '.txt' -}; - - - - + + + + + + + http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/howto/index.html -- diff --git a/howto/index.html b/howto/index.html index 4189bc4..6147610 100644 --- a/howto/index.html +++ b/howto/index.html @@ -259,20 +259,13 @@ configuring an Airflow environment. 
- -var DOCUMENTATION_OPTIONS = { -URL_ROOT:'../', -VERSION:'', -LANGUAGE:'None', -COLLAPSE_INDEX:false, -FILE_SUFFIX:'.html', -HAS_SOURCE: true, -SOURCELINK_SUFFIX: '.txt' -}; - - - - + + + + + + + http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/howto/initialize-database.html -- diff --git a/howto/initialize-database.html b/howto/initialize-database.html index b03d757..21bd6de 100644 --- a/howto/initialize-database.html +++ b/howto/initialize-database.html @@ -249,20 +249,13 @@ airflow initdb - -var DOCUMENTATION_OPTIONS = { -URL_ROOT:'../', -VERSION:'', -LANGUAGE:'None', -COLLAPSE_INDEX:false, -FILE_SUFFIX:'.html', -HAS_SOURCE: true, -SOURCELINK_SUFFIX: '.txt' -}; - - - - + + + + + + + http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/howto/manage-connections.html -- diff --git a/howto/manage-connections.html b/howto/manage-connections.html index 0cedef4..0859080 100644 --- a/howto/manage-connections.html +++ b/howto/manage-connections.html @@ -94,6 +94,7 @@ Creating a Connection with Environment Variables Connection Types Google Cloud Platform +MySQL @@ -191,7 +192,7 @@ Managing Connections¶ Airflow needs to know how to connect to your environment. Information such as
[03/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/kubernetes.html
diff --git a/kubernetes.html b/kubernetes.html
index 9d0ca02..a9bd534 100644
--- a/kubernetes.html
+++ b/kubernetes.html
@@ -163,9 +163,11 @@
 Kubernetes Executor¶
 The kubernetes executor is introduced in Apache Airflow 1.10.0. The Kubernetes executor will create a new pod for every task instance.
-Example helm charts are available at scripts/ci/kubernetes/kube/{airflow,volumes,postgres}.yaml in the source distribution. The volumes are optional and depend on your configuration. There are two volumes available:
-- Dags: by storing all the dags onto the persistent disks, all the workers can read the dags from there. Another option is using git-sync: before starting the container, a git pull of the dags repository will be performed and used throughout the lifecycle of the pod/
-- Logs: by storing the logs onto a persistent disk, all the logs will be available for all the workers and the webserver itself. If you don't configure this, the logs will be lost after the worker pods shuts down. Another option is to use S3/GCS/etc to store the logs.
+Example helm charts are available at scripts/ci/kubernetes/kube/{airflow,volumes,postgres}.yaml in the source distribution. The volumes are optional and depend on your configuration. There are two volumes available:
+
+Dags: by storing all the dags onto the persistent disks, all the workers can read the dags from there. Another option is using git-sync: before starting the container, a git pull of the dags repository will be performed and used throughout the lifecycle of the pod.
+Logs: by storing the logs onto a persistent disk, all the logs will be available for all the workers and the webserver itself. If you don't configure this, the logs will be lost after the worker pod shuts down. Another option is to use S3/GCS/etc to store the logs.
 Kubernetes Operator¶
@@ -237,6 +239,14 @@
     }
 }

+tolerations = [
+    {
+        'key': "key",
+        'operator': 'Equal',
+        'value': 'value'
+    }
+]
+
 k = KubernetesPodOperator(namespace='default',
                           image="ubuntu:16.04",
                           cmds=["bash", "-cx"],
@@ -247,13 +257,16 @@
                           volume_mounts=[volume_mount],
                           name="test",
                           task_id="task",
-                          affinity=affinity
+                          affinity=affinity,
+                          is_delete_operator_pod=True,
+                          hostnetwork=False,
+                          tolerations=tolerations
                           )

-class airflow.contrib.operators.kubernetes_pod_operator.KubernetesPodOperator(namespace, image, name, cmds=None, arguments=None, volume_mounts=None, volumes=None, env_vars=None, secrets=None, in_cluster=False, cluster_context=None, labels=None, startup_timeout_seconds=120, get_logs=True, image_pull_policy='IfNotPresent', annotations=None, resources=None, affinity=None, config_file=None, xcom_push=False, *args, **kwargs)[source]¶
+class airflow.contrib.operators.kubernetes_pod_operator.KubernetesPodOperator(namespace, image, name, cmds=None, arguments=None, volume_mounts=None, volumes=None, env_vars=None, secrets=None, in_cluster=False, cluster_context=None, labels=None, startup_timeout_seconds=120, get_logs=True, image_pull_policy='IfNotPresent', annotations=None, resources=None, affinity=None, config_file=None, xcom_push=False, node_selectors=None, image_pull_secrets=None, service_account_name='default', is_delete_operator_pod=False, hostnetwork=False, tolerations=None, *args, **kwargs)[source]¶
 Bases: airflow.models.BaseOperator
 Execute a task in a Kubernetes Pod
@@ -281,10 +294,12 @@
 They can be exposed as environment vars or files in a volume.
 Ignored when in_cluster is True. If None, current-context is used.
 get_logs (bool) – get the stdout of the container as logs of the tasks
 affinity (dict) – A dict containing a group of affinity scheduling rules
+node_selectors (dict) – A dict containing a group of scheduling rules
 config_file (str) – The path to the Kubernetes config file
 xcom_push (bool) – If xcom_push is True, the content of the file /airflow/xcom/return.json in the container will also be pushed to an
 XCom when the container completes.
+tolerations – Kubernetes tolerations
@@ -296,12 +311,22 @@
 XCom when the container completes.
+:type list of tolerations
+
+
+execute(context)[source]¶
+This is the main method to derive when creating an operator.
+Context is the same dictionary used as when rendering jinja templates.
+Refer to get_template_context for more context.
+
+
 class airflow.contrib.kubernetes.secret.Secret(deploy_type, deploy_target, secret, key)[source]¶
-Defines Kubernetes Secret Volume
+Bases: object
+Defines Kubernetes
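To make the new 1.10.1 parameters in the diff above concrete (`tolerations`, `is_delete_operator_pod`, `hostnetwork`), here is a hedged, Airflow-free sketch of how the keyword arguments fit together. `build_pod_kwargs` is an illustrative helper, not part of the operator's API; the toleration dicts use the key/operator/value shape from the Kubernetes pod spec:

```python
# Sketch: assembling KubernetesPodOperator-style kwargs with the new
# 1.10.1 parameters. Plain dicts only -- no Airflow import required.

def build_pod_kwargs(namespace, image, delete_pod=True, tolerations=None):
    """Collect pod settings; optional keys are added only when provided."""
    kwargs = {
        "namespace": namespace,
        "image": image,
        "is_delete_operator_pod": delete_pod,  # clean up the worker pod on exit
        "hostnetwork": False,
    }
    if tolerations:  # same shape as the Kubernetes API: key/operator/value
        kwargs["tolerations"] = tolerations
    return kwargs

tolerations = [{"key": "key", "operator": "Equal", "value": "value"}]
pod_kwargs = build_pod_kwargs("default", "ubuntu:16.04", tolerations=tolerations)
```

In the real operator these would be passed directly as keyword arguments, as the documentation example above shows.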
[19/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/hooks/druid_hook.html
diff --git a/_modules/airflow/hooks/druid_hook.html b/_modules/airflow/hooks/druid_hook.html
index f8d8597..825021e 100644
--- a/_modules/airflow/hooks/druid_hook.html
+++ b/_modules/airflow/hooks/druid_hook.html
@@ -198,7 +198,8 @@
 which accepts index jobs
 :type druid_ingest_conn_id: string
 :param timeout: The interval between polling
-the Druid job for the status of the ingestion job
+the Druid job for the status of the ingestion job.
+Must be greater than or equal to 1
 :type timeout: int
 :param max_ingestion_time: The maximum ingestion time before assuming the job failed
 :type max_ingestion_time: int
@@ -214,6 +215,9 @@
 self.max_ingestion_time = max_ingestion_time
 self.header = {'content-type': 'application/json'}

+if self.timeout < 1:
+    raise ValueError("Druid timeout should be equal or greater than 1")
+
 def get_conn_url(self):
     conn = self.get_connection(self.druid_ingest_conn_id)
     host = conn.host
@@ -225,7 +229,8 @@
 def submit_indexing_job(self, json_index_spec):
     url = self.get_conn_url()
-    req_index = requests.post(url, data=json_index_spec, headers=self.header)
+    self.log.info("Druid ingestion spec: {}".format(json_index_spec))
+    req_index = requests.post(url, json=json_index_spec, headers=self.header)
     if (req_index.status_code != 200):
         raise AirflowException("Did not get 200 when submitting the Druid job to {}".format(url))
@@ -233,6 +238,7 @@
     req_json = req_index.json()
     # Wait until the job is completed
     druid_task_id = req_json['task']
+    self.log.info("Druid indexing task-id: {}".format(druid_task_id))

     running = True
@@ -242,8 +248,6 @@
     self.log.info("Job still running for %s seconds...", sec)

-    sec = sec + 1
-
     if self.max_ingestion_time and sec > self.max_ingestion_time:
         # ensure that the job gets killed if the max ingestion time is exceeded
         requests.post("{0}/{1}/shutdown".format(url, druid_task_id))
@@ -252,6 +256,8 @@
     time.sleep(self.timeout)

+    sec = sec + self.timeout
+
     status = req_status.json()['status']['status']
     if status == 'RUNNING':
         running = True
@@ -348,20 +354,13 @@
(same inline DOCUMENTATION_OPTIONS script hunk removed as in the other regenerated pages)

http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/hooks/hdfs_hook.html
diff --git a/_modules/airflow/hooks/hdfs_hook.html b/_modules/airflow/hooks/hdfs_hook.html
index 64e7162..b2068ae 100644
@@ -290,20 +290,13 @@ (same hunk)

http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/hooks/hive_hooks.html
diff --git a/_modules/airflow/hooks/hive_hooks.html b/_modules/airflow/hooks/hive_hooks.html
index ff22edf..f854ebd 100644
@@ -1069,20 +1069,13 @@ (same hunk)

http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/hooks/http_hook.html
diff --git a/_modules/airflow/hooks/http_hook.html b/_modules/airflow/hooks/http_hook.html
index 46340b7..769856e 100644
---
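The druid_hook change above moves the elapsed-time counter so it advances by `timeout` seconds per iteration instead of by 1, and rejects timeouts below 1. A minimal stand-alone sketch of that polling pattern, with `check_status` standing in for the Druid task-status request (the function and its name are illustrative, not the hook's API):

```python
import time

def poll_until_done(check_status, timeout, max_ingestion_time=None, sleep=time.sleep):
    """Poll check_status() every `timeout` seconds until it stops returning
    'RUNNING'; give up once roughly max_ingestion_time seconds have elapsed."""
    if timeout < 1:
        # mirrors the validation added in the patch above
        raise ValueError("timeout should be equal or greater than 1")
    sec = 0
    while True:
        status = check_status()
        if status != "RUNNING":
            return status
        if max_ingestion_time and sec > max_ingestion_time:
            raise RuntimeError("max ingestion time exceeded")
        sleep(timeout)
        sec += timeout  # the fix above: advance by `timeout`, not by 1
```

The `sleep` parameter is injected only so the loop can be exercised without real waiting.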
[15/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/sensors/hdfs_sensor.html
diff --git a/_modules/airflow/sensors/hdfs_sensor.html b/_modules/airflow/sensors/hdfs_sensor.html
index fdfec6d..2fb801d 100644
@@ -306,20 +306,13 @@ (inline DOCUMENTATION_OPTIONS script hunk removed, same as in the other regenerated pages)

http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/sensors/hive_partition_sensor.html
diff --git a/_modules/airflow/sensors/hive_partition_sensor.html b/_modules/airflow/sensors/hive_partition_sensor.html
index d70cbb5..4df1553 100644
@@ -264,20 +264,13 @@ (same hunk)

http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/sensors/http_sensor.html
diff --git a/_modules/airflow/sensors/http_sensor.html b/_modules/airflow/sensors/http_sensor.html
index 7fbbe5d..f5c5c8c 100644
@@ -281,20 +281,13 @@ (same hunk)

http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/sensors/metastore_partition_sensor.html
diff --git a/_modules/airflow/sensors/metastore_partition_sensor.html b/_modules/airflow/sensors/metastore_partition_sensor.html
index b9d57e6..fefe40f 100644
@@ -272,20 +272,13 @@ (same hunk)

http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/sensors/named_hive_partition_sensor.html
diff --git a/_modules/airflow/sensors/named_hive_partition_sensor.html b/_modules/airflow/sensors/named_hive_partition_sensor.html
index 96d0246..06fcb20 100644
@@ -293,20 +293,13 @@ (same hunk)

http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/sensors/s3_key_sensor.html
diff --git a/_modules/airflow/sensors/s3_key_sensor.html b/_modules/airflow/sensors/s3_key_sensor.html
index 67e32a3..1fe36d8 100644
@@ -270,20 +270,13 @@ (same hunk, truncated)
[26/38] incubator-airflow-site git commit: Docs from 1.10.1
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/1f06fa0e/_modules/airflow/contrib/operators/gcp_sql_operator.html
diff --git a/_modules/airflow/contrib/operators/gcp_sql_operator.html b/_modules/airflow/contrib/operators/gcp_sql_operator.html
new file mode 100644
index 000..62c01b8
--- /dev/null
+++ b/_modules/airflow/contrib/operators/gcp_sql_operator.html
@@ -0,0 +1,737 @@
+airflow.contrib.operators.gcp_sql_operator Airflow Documentation
+
+Source code for airflow.contrib.operators.gcp_sql_operator
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from googleapiclient.errors import HttpError
+
+from airflow import AirflowException
+from airflow.contrib.hooks.gcp_sql_hook import CloudSqlHook
+from airflow.contrib.utils.gcp_field_validator import GcpBodyFieldValidator
+from airflow.models import BaseOperator
+from airflow.utils.decorators import apply_defaults
+
+SETTINGS = 'settings'
+SETTINGS_VERSION = 'settingsVersion'
+
+CLOUD_SQL_VALIDATION = [
+    dict(name="name", allow_empty=False),
+    dict(name="settings", type="dict", fields=[
+        dict(name="tier", allow_empty=False),
+        dict(name="backupConfiguration", type="dict", fields=[
+            dict(name="binaryLogEnabled", optional=True),
+            dict(name="enabled", optional=True),
+            dict(name="replicationLogArchivingEnabled", optional=True),
+            dict(name="startTime", allow_empty=False, optional=True)
+        ], optional=True),
+        dict(name="activationPolicy", allow_empty=False, optional=True),
+        dict(name="authorizedGaeApplications", type="list", optional=True),
+        dict(name="crashSafeReplicationEnabled", optional=True),
+        dict(name="dataDiskSizeGb", optional=True),
+        dict(name="dataDiskType", allow_empty=False, optional=True),
+        dict(name="databaseFlags", type="list", optional=True),
+        dict(name="ipConfiguration", type="dict", fields=[
+            dict(name="authorizedNetworks", type="list", fields=[
+                dict(name="expirationTime", optional=True),
+                dict(name="name", allow_empty=False, optional=True),
+                dict(name="value", allow_empty=False, optional=True)
+            ], optional=True),
+            dict(name="ipv4Enabled", optional=True),
+            dict(name="privateNetwork", allow_empty=False, optional=True),
+            dict(name="requireSsl", optional=True),
+        ], optional=True),
+        dict(name="locationPreference", type="dict", fields=[
+            dict(name="followGaeApplication", allow_empty=False, optional=True),
+            dict(name="zone", allow_empty=False, optional=True),
+        ], optional=True),
+        dict(name="maintenanceWindow", type="dict", fields=[
+            dict(name="hour", optional=True),
+            dict(name="day", optional=True),
+            dict(name="updateTrack", allow_empty=False, optional=True),
+        ], optional=True),
+        dict(name="pricingPlan", allow_empty=False,
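The nested `dict(...)` entries above form a declarative field specification consumed by `GcpBodyFieldValidator`. To illustrate the idea (this is a toy validator, not Airflow's implementation; field semantics are assumed from the spec keys shown above):

```python
def validate(body, spec):
    """Toy request-body validator. Each spec entry is a dict with keys:
    name, optional, allow_empty, type, and fields (for nested dicts)."""
    for field in spec:
        name = field["name"]
        if name not in body:
            if not field.get("optional", False):
                raise ValueError("missing required field: %s" % name)
            continue
        value = body[name]
        if field.get("allow_empty") is False and value == "":
            raise ValueError("field %s must not be empty" % name)
        if field.get("type") == "dict":
            validate(value, field.get("fields", []))  # recurse into nested spec

spec = [
    dict(name="name", allow_empty=False),
    dict(name="settings", type="dict", fields=[
        dict(name="tier", allow_empty=False),
        dict(name="dataDiskSizeGb", optional=True),
    ]),
]

# A body matching the spec passes silently; a malformed one raises ValueError.
validate({"name": "db1", "settings": {"tier": "db-n1-standard-1"}}, spec)
```

The declarative style keeps the accepted request-body shape readable in one place instead of scattering checks through the operator.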
[GitHub] ron819 commented on issue #2362: Make FileSensor detect file or folder
ron819 commented on issue #2362: Make FileSensor detect file or folder
URL: https://github.com/apache/incubator-airflow/pull/2362#issuecomment-442878797

This PR is no longer needed. The current FileSensor returns True only if the file exists in the directory or a subdirectory.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] jmcarp commented on issue #4232: [AIRFLOW-3390] Add connections to api.
jmcarp commented on issue #4232: [AIRFLOW-3390] Add connections to api.
URL: https://github.com/apache/incubator-airflow/pull/4232#issuecomment-442874552

@XD-DENG: thanks for taking a look! I updated the docstrings like you suggested and added the new views to `www_rbac`.
[GitHub] jmcarp commented on a change in pull request #4232: [AIRFLOW-3390] Add connections to api.
jmcarp commented on a change in pull request #4232: [AIRFLOW-3390] Add connections to api. URL: https://github.com/apache/incubator-airflow/pull/4232#discussion_r237532310 ## File path: airflow/api/common/experimental/connection.py ## @@ -0,0 +1,77 @@ +# -*- coding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+
+from airflow.exceptions import AirflowBadRequest, ConnectionNotFound
+from airflow.models import Connection
+from airflow.utils.db import provide_session
+
+
+@provide_session
+def get_connection(conn_id, session=None):
+    """Get connection by a given ID."""
+    if not (conn_id and conn_id.strip()):
+        raise AirflowBadRequest("Connection ID shouldn't be empty")
+
+    connection = session.query(Connection).filter_by(conn_id=conn_id).first()
+    if connection is None:
+        raise ConnectionNotFound("Connection '%s' doesn't exist" % conn_id)
+
+    return connection
+
+
+@provide_session
+def get_connections(session=None):
+    """Get all connections."""
+    return session.query(Connection).all()
+
+
+@provide_session
+def create_connection(conn_id, session=None, **kwargs):
+    """Create a connection with given parameters."""
+    if not (conn_id and conn_id.strip()):
+        raise AirflowBadRequest("Connection ID shouldn't be empty")
+
+    session.expire_on_commit = False
+    connection = session.query(Connection).filter_by(conn_id=conn_id).first()
+    if connection is None:
+        connection = Connection(conn_id=conn_id, **kwargs)
+        session.add(connection)
+    else:
+        for key, value in kwargs.items():
+            setattr(connection, key, value)
+
+    session.commit()
+
+    return connection
+
+
+@provide_session
+def delete_connection(conn_id, session=None):

Review comment: Auth works the same for connections as for dags, pools, etc.: all views in the `endpoints` modules are decorated with `requires_authentication`.
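The `create_connection` helper above implements upsert semantics: insert when the ID is absent, update the existing record otherwise. A sketch of that pattern in isolation, with a plain dict standing in for the SQLAlchemy session (the helper name and store are illustrative, not the PR's API):

```python
def upsert_connection(store, conn_id, **kwargs):
    """Insert-or-update, mirroring create_connection above.
    `store` is a dict standing in for the database session."""
    if not (conn_id and conn_id.strip()):
        raise ValueError("Connection ID shouldn't be empty")
    conn = store.get(conn_id)
    if conn is None:
        conn = dict(conn_id=conn_id, **kwargs)
        store[conn_id] = conn          # insert a new record
    else:
        conn.update(kwargs)            # update existing fields in place
    return conn

store = {}
upsert_connection(store, "my_db", host="localhost", port=5432)
upsert_connection(store, "my_db", port=5433)  # second call updates, not duplicates
```

This keeps the endpoint idempotent: repeated creates for the same ID converge on one record.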
[jira] [Commented] (AIRFLOW-3239) Test discovery partial fails due to incorrect name of the test files
[ https://issues.apache.org/jira/browse/AIRFLOW-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703345#comment-16703345 ] ASF GitHub Bot commented on AIRFLOW-3239: - ashb closed pull request #4255: [AIRFLOW-3239] Fix/refine tests in api/common/experimental/ URL: https://github.com/apache/incubator-airflow/pull/4255 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/tests/api/common/experimental/mark_tasks.py b/tests/api/common/experimental/test_mark_tasks.py similarity index 97% rename from tests/api/common/experimental/mark_tasks.py rename to tests/api/common/experimental/test_mark_tasks.py index 304e261b98..9afe31c951 100644 --- a/tests/api/common/experimental/mark_tasks.py +++ b/tests/api/common/experimental/test_mark_tasks.py @@ -20,7 +20,7 @@ import unittest from datetime import datetime -from airflow import models +from airflow import configuration, models from airflow.api.common.experimental.mark_tasks import ( set_state, _create_dagruns, set_dag_run_state_to_success, set_dag_run_state_to_failed, set_dag_run_state_to_running) @@ -31,12 +31,14 @@ DEV_NULL = "/dev/null" +configuration.load_test_config() + class TestMarkTasks(unittest.TestCase): def setUp(self): self.dagbag = models.DagBag(include_examples=True) -self.dag1 = self.dagbag.dags['test_example_bash_operator'] +self.dag1 = self.dagbag.dags['example_bash_operator'] self.dag2 = self.dagbag.dags['example_subdag_operator'] self.execution_dates = [days_ago(2), days_ago(1)] @@ -195,6 +197,11 @@ def test_mark_tasks_past(self): self.verify_state(self.dag1, [task.task_id], self.execution_dates, State.SUCCESS, snapshot) +# TODO: this skipIf should be removed once a fixing solution is found later +# We skip it here because this test case is working with 
Postgres & SQLite +# but not with MySQL +@unittest.skipIf('mysql' in configuration.conf.get('core', 'sql_alchemy_conn'), + "Flaky with MySQL") def test_mark_tasks_subdag(self): # set one task to success towards end of scheduled dag runs task = self.dag2.get_task("section-1") @@ -217,15 +224,15 @@ def test_mark_tasks_subdag(self): class TestMarkDAGRun(unittest.TestCase): def setUp(self): self.dagbag = models.DagBag(include_examples=True) -self.dag1 = self.dagbag.dags['test_example_bash_operator'] +self.dag1 = self.dagbag.dags['example_bash_operator'] self.dag2 = self.dagbag.dags['example_subdag_operator'] -self.execution_dates = [days_ago(3), days_ago(2), days_ago(1)] +self.execution_dates = [days_ago(2), days_ago(1), days_ago(0)] self.session = Session() def _set_default_task_instance_states(self, dr): -if dr.dag_id != 'test_example_bash_operator': +if dr.dag_id != 'example_bash_operator': return # success task dr.get_task_instance('runme_0').set_state(State.SUCCESS, self.session) @@ -510,6 +517,7 @@ def tearDown(self): self.session.query(models.TaskInstance).delete() self.session.query(models.DagStat).delete() self.session.commit() +self.session.close() if __name__ == '__main__': diff --git a/tests/api/common/experimental/test_pool.py b/tests/api/common/experimental/test_pool.py index e5efa2c676..fac1e2e71b 100644 --- a/tests/api/common/experimental/test_pool.py +++ b/tests/api/common/experimental/test_pool.py @@ -28,7 +28,6 @@ class TestPool(unittest.TestCase): def setUp(self): -super(TestPool, self).setUp() self.session = settings.Session() self.pools = [] for i in range(2): @@ -46,7 +45,6 @@ def tearDown(self): self.session.query(models.Pool).delete() self.session.commit() self.session.close() -super(TestPool, self).tearDown() def test_get_pool(self): pool = pool_api.get_pool(name=self.pools[0].pool, session=self.session) diff --git a/tests/api/common/experimental/trigger_dag_tests.py b/tests/api/common/experimental/test_trigger_dag.py similarity index 100% 
rename from tests/api/common/experimental/trigger_dag_tests.py rename to tests/api/common/experimental/test_trigger_dag.py

> Test discovery partial fails due to incorrect name
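The patch above skips a backend-specific flaky test with `unittest.skipIf` keyed on the configured connection string. The same pattern in miniature (the connection string here is an illustrative stand-in for `configuration.conf.get('core', 'sql_alchemy_conn')`):

```python
import unittest

# Illustrative stand-in for the configured sql_alchemy_conn value.
SQL_ALCHEMY_CONN = "mysql://airflow@localhost:3306/airflow"

class TestSubdagMarking(unittest.TestCase):
    # Same pattern as the patch above: skip tests known to be flaky
    # on a particular backend, based on the connection string.
    @unittest.skipIf("mysql" in SQL_ALCHEMY_CONN, "Flaky with MySQL")
    def test_mark_tasks_subdag(self):
        self.assertTrue(True)
```

The skip is evaluated at class-definition time, so the decision reflects whichever backend the test run is configured against.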
[GitHub] ashb closed pull request #4255: [AIRFLOW-3239] Fix/refine tests in api/common/experimental/
ashb closed pull request #4255: [AIRFLOW-3239] Fix/refine tests in api/common/experimental/
URL: https://github.com/apache/incubator-airflow/pull/4255
[GitHub] codecov-io commented on issue #4255: [AIRFLOW-3239] Fix/refine tests in api/common/experimental/
codecov-io commented on issue #4255: [AIRFLOW-3239] Fix/refine tests in api/common/experimental/
URL: https://github.com/apache/incubator-airflow/pull/4255#issuecomment-442871391

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4255?src=pr=h1) Report
> Merging [#4255](https://codecov.io/gh/apache/incubator-airflow/pull/4255?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/3fede98eab6145ed9c1deea3fe93f0b9b7f322e4?src=pr=desc) will **increase** coverage by `0.26%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/4255/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4255?src=pr=tree)

```diff
@@            Coverage Diff             @@
##           master    #4255      +/-   ##
==========================================
+ Coverage   77.81%   78.07%   +0.26%
==========================================
  Files         201      201
  Lines       16455    16455
==========================================
+ Hits        12805    12848      +43
+ Misses       3650     3607      -43
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/4255?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/4255/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=) | `92.33% <0%> (+0.08%)` | :arrow_up: |
| [airflow/utils/timezone.py](https://codecov.io/gh/apache/incubator-airflow/pull/4255/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy90aW1lem9uZS5weQ==) | `91.3% <0%> (+2.17%)` | :arrow_up: |
| [airflow/api/common/experimental/mark\_tasks.py](https://codecov.io/gh/apache/incubator-airflow/pull/4255/diff?src=pr=tree#diff-YWlyZmxvdy9hcGkvY29tbW9uL2V4cGVyaW1lbnRhbC9tYXJrX3Rhc2tzLnB5) | `97.65% <0%> (+31.25%)` | :arrow_up: |

--

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4255?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4255?src=pr=footer).
Last update [3fede98...c61e03f](https://codecov.io/gh/apache/incubator-airflow/pull/4255?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[jira] [Commented] (AIRFLOW-3417) Use the platformVersion only for the FARGATE launch type
[ https://issues.apache.org/jira/browse/AIRFLOW-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703284#comment-16703284 ]

ASF GitHub Bot commented on AIRFLOW-3417:

zingorn opened a new pull request #4256: [AIRFLOW-3417] Use the platformVersion only for the FARGATE launch type
URL: https://github.com/apache/incubator-airflow/pull/4256

Make sure you have checked _all_ steps below.

### Jira
- [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-3417
  - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\]; code changes always need a Jira issue.

### Description
- [x] Here are some details about my PR, including screenshots of any UI changes:
  The current implementation passes the platformVersion parameter all the time, and we get an exception.

### Tests
- [x] My PR updates the following unit tests: `test_ecs_operator.py`

### Commits
- [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation
- [x] In case of new functionality, my PR adds documentation that describes how to use it.
  - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

### Code Quality
- [x] Passes `flake8`

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Use the platformVersion only for the FARGATE launch type
>
> Key: AIRFLOW-3417
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3417
> Project: Apache Airflow
> Issue Type: Bug
> Components: aws
> Affects Versions: 2.0.0
> Reporter: Alexander Kovalenko
> Assignee: Alexander Kovalenko
> Priority: Major
>
> By default an ECS container should be run with the EC2 launch type.
> The current implementation passes the {{platformVersion}} parameter all the
> time, and we get an exception:
> {code:java}
> botocore.errorfactory.InvalidParameterException: An error occurred
> (InvalidParameterException) when calling the RunTask operation: The platform
> version must be null when specifying an EC2 launch type.{code}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
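The constraint this fix encodes can be sketched as a small helper. This is a hedged illustration, not the actual change in PR #4256: the function name `build_run_task_kwargs` and its parameters are hypothetical; only the rule it implements (the ECS RunTask API rejects `platformVersion` for the EC2 launch type) comes from the issue.

```python
def build_run_task_kwargs(task_definition, launch_type, platform_version=None):
    """Build the kwargs for an ECS RunTask call (e.g. boto3's ecs_client.run_task).

    The RunTask API requires platformVersion to be absent for EC2 tasks,
    so it is only included for the FARGATE launch type.
    """
    kwargs = {
        "taskDefinition": task_definition,
        "launchType": launch_type,
    }
    # Only FARGATE tasks may carry a platform version; sending it with an
    # EC2 launch type raises InvalidParameterException.
    if launch_type == "FARGATE" and platform_version is not None:
        kwargs["platformVersion"] = platform_version
    return kwargs
```

For example, `build_run_task_kwargs("my-task", "EC2", "LATEST")` omits `platformVersion` entirely, while the same call with `"FARGATE"` includes it.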
[GitHub] zingorn opened a new pull request #4256: [AIRFLOW-3417] Use the platformVersion only for the FARGATE launch type
zingorn opened a new pull request #4256: [AIRFLOW-3417] Use the platformVersion only for the FARGATE launch type
URL: https://github.com/apache/incubator-airflow/pull/4256

Make sure you have checked _all_ steps below.

### Jira
- [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-3417
  - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\]; code changes always need a Jira issue.

### Description
- [x] Here are some details about my PR, including screenshots of any UI changes:
  The current implementation passes the platformVersion parameter all the time, and we get an exception.

### Tests
- [x] My PR updates the following unit tests: `test_ecs_operator.py`

### Commits
- [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation
- [x] In case of new functionality, my PR adds documentation that describes how to use it.
  - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

### Code Quality
- [x] Passes `flake8`
[GitHub] XD-DENG commented on issue #4255: [AIRFLOW-3239] Fix/refine tests in api/common/experimental/
XD-DENG commented on issue #4255: [AIRFLOW-3239] Fix/refine tests in api/common/experimental/
URL: https://github.com/apache/incubator-airflow/pull/4255#issuecomment-442856476

Hi @ashb @Fokko PTAL. Cheers
[jira] [Commented] (AIRFLOW-3239) Test discovery partially fails due to incorrect names of the test files
[ https://issues.apache.org/jira/browse/AIRFLOW-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703277#comment-16703277 ]

ASF GitHub Bot commented on AIRFLOW-3239:

XD-DENG opened a new pull request #4255: [AIRFLOW-3239] Fix/refine tests in api/common/experimental/
URL: https://github.com/apache/incubator-airflow/pull/4255

### Jira
- https://issues.apache.org/jira/browse/AIRFLOW-3239

### Description
This is a follow-up on JIRA 3239, to **further fix the unit tests**. Earlier related PRs: https://github.com/apache/incubator-airflow/pull/4074, https://github.com/apache/incubator-airflow/pull/4131.

What is done in this PR:
1. Fix (test_)trigger_dag.py by correcting its name, so it can be discovered by the CI.
2. Fix (test_)mark_tasks.py:
   - 2-1. Properly name the file, so it can be discovered by the CI.
   - 2-2. Correct the name of the sample DAG.
   - 2-3. Correct the range of sample execution_dates (the earlier range conflicts with the start_date of the sample DAG, which fails the test case).
   - 2-4. Skip `test_mark_tasks_subdag` when running with MySQL. Something seems to be wrong with `airflow.api.common.experimental.mark_tasks.set_state`: the corresponding test case (process subdags with `downstream=True`) works on Postgres & SQLite, but fails on MySQL with `(1062, "Duplicate entry '110' for key 'PRIMARY'")` (example: https://travis-ci.org/XD-DENG/incubator-airflow/jobs/461200901). A TODO note is added to remind us to fix it for MySQL later.
3. Remove unnecessary lines in test_pool.py.

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:

### Commits
- [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Code Quality
- [x] Passes `flake8`

> Test discovery partially fails due to incorrect names of the test files
>
> Key: AIRFLOW-3239
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3239
> Project: Apache Airflow
> Issue Type: Bug
> Components: tests
> Reporter: Xiaodong DENG
> Assignee: Xiaodong DENG
> Priority: Major
> Fix For: 2.0.0
>
> In PR https://github.com/apache/incubator-airflow/pull/4049, I have fixed
> the incorrect names of some test files (resulting in partial failure of test
> discovery).
> There are some other scripts with this issue.
[GitHub] XD-DENG opened a new pull request #4255: [AIRFLOW-3239] Fix/refine tests in api/common/experimental/
XD-DENG opened a new pull request #4255: [AIRFLOW-3239] Fix/refine tests in api/common/experimental/
URL: https://github.com/apache/incubator-airflow/pull/4255

### Jira
- https://issues.apache.org/jira/browse/AIRFLOW-3239

### Description
This is a follow-up on JIRA 3239, to **further fix the unit tests**. Earlier related PRs: https://github.com/apache/incubator-airflow/pull/4074, https://github.com/apache/incubator-airflow/pull/4131.

What is done in this PR:
1. Fix (test_)trigger_dag.py by correcting its name, so it can be discovered by the CI.
2. Fix (test_)mark_tasks.py:
   - 2-1. Properly name the file, so it can be discovered by the CI.
   - 2-2. Correct the name of the sample DAG.
   - 2-3. Correct the range of sample execution_dates (the earlier range conflicts with the start_date of the sample DAG, which fails the test case).
   - 2-4. Skip `test_mark_tasks_subdag` when running with MySQL. Something seems to be wrong with `airflow.api.common.experimental.mark_tasks.set_state`: the corresponding test case (process subdags with `downstream=True`) works on Postgres & SQLite, but fails on MySQL with `(1062, "Duplicate entry '110' for key 'PRIMARY'")` (example: https://travis-ci.org/XD-DENG/incubator-airflow/jobs/461200901). A TODO note is added to remind us to fix it for MySQL later.
3. Remove unnecessary lines in test_pool.py.

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:

### Commits
- [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Code Quality
- [x] Passes `flake8`
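The renames above matter because test runners only collect files matching a discovery pattern. A minimal illustration (the file list is hypothetical; `test*.py` is unittest's default discovery pattern, and nose behaves similarly):

```python
import fnmatch

# Default discovery pattern used by unittest's test loader: only files
# matching "test*.py" are collected, so "trigger_dag.py" is silently skipped
# while "test_trigger_dag.py" is picked up by CI.
PATTERN = "test*.py"

candidate_files = [
    "trigger_dag.py",       # missed by discovery (wrong name)
    "test_trigger_dag.py",  # discovered after the rename
    "mark_tasks.py",        # missed by discovery (wrong name)
    "test_mark_tasks.py",   # discovered after the rename
]

discovered = [f for f in candidate_files if fnmatch.fnmatch(f, PATTERN)]
# → ['test_trigger_dag.py', 'test_mark_tasks.py']
```

A mis-named test file therefore fails silently: the suite still passes because its tests were never run, which is exactly the partial-discovery failure AIRFLOW-3239 describes.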
[jira] [Updated] (AIRFLOW-3417) Use the platformVersion only for the FARGATE launch type
[ https://issues.apache.org/jira/browse/AIRFLOW-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Kovalenko updated AIRFLOW-3417:

Summary: Use the platformVersion only for the FARGATE launch type (was: The platform version should be provided when ECS runs with the EC2 launch type)

> Use the platformVersion only for the FARGATE launch type
>
> Key: AIRFLOW-3417
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3417
> Project: Apache Airflow
> Issue Type: Bug
> Components: aws
> Affects Versions: 2.0.0
> Reporter: Alexander Kovalenko
> Assignee: Alexander Kovalenko
> Priority: Major
>
> By default an ECS container should be run with the EC2 launch type.
> The current implementation passes the {{platformVersion}} parameter all the
> time, and we get an exception:
> {code:java}
> botocore.errorfactory.InvalidParameterException: An error occurred
> (InvalidParameterException) when calling the RunTask operation: The platform
> version must be null when specifying an EC2 launch type.{code}
[jira] [Created] (AIRFLOW-3417) The platform version should be provided when ECS runs with the EC2 launch type
Alexander Kovalenko created AIRFLOW-3417:

Summary: The platform version should be provided when ECS runs with the EC2 launch type
Key: AIRFLOW-3417
URL: https://issues.apache.org/jira/browse/AIRFLOW-3417
Project: Apache Airflow
Issue Type: Bug
Components: aws
Affects Versions: 2.0.0
Reporter: Alexander Kovalenko

By default an ECS container should be run with the EC2 launch type.
The current implementation passes the {{platformVersion}} parameter all the time, and we get an exception:
{code:java}
botocore.errorfactory.InvalidParameterException: An error occurred
(InvalidParameterException) when calling the RunTask operation: The platform
version must be null when specifying an EC2 launch type.{code}
[jira] [Assigned] (AIRFLOW-3417) The platform version should be provided when ECS runs with the EC2 launch type
[ https://issues.apache.org/jira/browse/AIRFLOW-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Kovalenko reassigned AIRFLOW-3417:

Assignee: Alexander Kovalenko

> The platform version should be provided when ECS runs with the EC2 launch type
>
> Key: AIRFLOW-3417
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3417
> Project: Apache Airflow
> Issue Type: Bug
> Components: aws
> Affects Versions: 2.0.0
> Reporter: Alexander Kovalenko
> Assignee: Alexander Kovalenko
> Priority: Major
>
> By default an ECS container should be run with the EC2 launch type.
> The current implementation passes the {{platformVersion}} parameter all the
> time, and we get an exception:
> {code:java}
> botocore.errorfactory.InvalidParameterException: An error occurred
> (InvalidParameterException) when calling the RunTask operation: The platform
> version must be null when specifying an EC2 launch type.{code}
[GitHub] tal181 commented on issue #4244: [AIRFLOW-3403] Create Athena sensor
tal181 commented on issue #4244: [AIRFLOW-3403] Create Athena sensor
URL: https://github.com/apache/incubator-airflow/pull/4244#issuecomment-442798769

@XD-DENG it seems that the tests are passing, but the build is failing because:

```
ERROR: nsenter build failed, log: cat nsenter.build.log
Unable to find image 'ubuntu:14.04' locally
14.04: Pulling from library/ubuntu
docker: Get https://registry-1.docker.io/v2/library/ubuntu/manifests/sha256:296c2904734ac0f13f3ab7265eeafb2efc33f085eeb87c875d28c360cec18700: EOF.
See 'docker run --help'.
```

Can you please take a look?
[GitHub] codecov-io commented on issue #4253: [WIP][AIRFLOW-3414] Fix reload_module in DagFileProcessorAgent
codecov-io commented on issue #4253: [WIP][AIRFLOW-3414] Fix reload_module in DagFileProcessorAgent
URL: https://github.com/apache/incubator-airflow/pull/4253#issuecomment-442794666

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4253?src=pr=h1) Report
> Merging [#4253](https://codecov.io/gh/apache/incubator-airflow/pull/4253?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/3fede98eab6145ed9c1deea3fe93f0b9b7f322e4?src=pr=desc) will **increase** coverage by `0.01%`.
> The diff coverage is `85.71%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/4253/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4253?src=pr=tree)

```diff
@@            Coverage Diff             @@
##           master    #4253      +/-   ##
==========================================
+ Coverage   77.81%   77.82%   +0.01%
==========================================
  Files         201      201
  Lines       16455    16458       +3
==========================================
+ Hits        12805    12809       +4
+ Misses       3650     3649       -1
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/4253?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/settings.py](https://codecov.io/gh/apache/incubator-airflow/pull/4253/diff?src=pr=tree#diff-YWlyZmxvdy9zZXR0aW5ncy5weQ==) | `80.41% <100%> (ø)` | :arrow_up: |
| [airflow/logging\_config.py](https://codecov.io/gh/apache/incubator-airflow/pull/4253/diff?src=pr=tree#diff-YWlyZmxvdy9sb2dnaW5nX2NvbmZpZy5weQ==) | `97.56% <100%> (+0.06%)` | :arrow_up: |
| [airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/incubator-airflow/pull/4253/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==) | `59.67% <66.66%> (+0.14%)` | :arrow_up: |
| [airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/4253/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=) | `92.29% <0%> (+0.04%)` | :arrow_up: |

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4253?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4253?src=pr=footer). Last update [3fede98...2b4e59d](https://codecov.io/gh/apache/incubator-airflow/pull/4253?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[jira] [Updated] (AIRFLOW-3416) CloudSqlQueryOperator with sql proxy does not work with Python 3.x
[ https://issues.apache.org/jira/browse/AIRFLOW-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor updated AIRFLOW-3416: --- Fix Version/s: 1.10.2 > CloudSqlQueryOperator with sql proxy does not work with Python 3.x > -- > > Key: AIRFLOW-3416 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3416 > Project: Apache Airflow > Issue Type: Bug > Components: contrib >Affects Versions: 1.10.1 >Reporter: Jarek Potiuk >Assignee: Jarek Potiuk >Priority: Major > Fix For: 1.10.2 > > > There are compatibility issues with Python 3.x for CloudSQLoperator. Output > of cloud_sql_proxy binary is parsed and the output in Python3 is bytes rather > than string so several "in"s raise an exception in Python 3. It needs > explicit decode('utf-8') -- This message was sent by Atlassian JIRA (v7.6.3#76005)
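The bytes-versus-str issue behind this fix can be reproduced without Airflow or GCP at all. Below is an illustrative sketch (the echoed text is a made-up stand-in for cloud_sql_proxy output, not the proxy's actual log line): under Python 3 a subprocess pipe yields `bytes`, so a `"..." in output` check written for Python 2 raises `TypeError` until the output is explicitly decoded.

```python
import subprocess

# Simulate reading a line of subprocess output (stand-in for cloud_sql_proxy).
proc = subprocess.Popen(
    ["echo", "Ready for new connections"],
    stdout=subprocess.PIPE,
)
raw = proc.stdout.readline()  # bytes under Python 3, e.g. b'Ready for new connections\n'
proc.wait()

# "Ready" in raw  →  TypeError: a bytes-like object is required, not 'str'
line = raw.decode("utf-8")    # the explicit decode('utf-8') the fix adds
ready = "Ready for new connections" in line
```

This is why the failure only surfaced on Python 3: on Python 2 the pipe yields `str`, so the same `in` check silently works.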
[GitHub] sprzedwojski commented on a change in pull request #4251: [AIRFLOW-2440] Add Google Cloud SQL import/export operator
sprzedwojski commented on a change in pull request #4251: [AIRFLOW-2440] Add Google Cloud SQL import/export operator
URL: https://github.com/apache/incubator-airflow/pull/4251#discussion_r237425378

File path: airflow/contrib/hooks/gcp_sql_hook.py

```diff
@@ -254,6 +254,54 @@ def delete_database(self, project, instance, database):
         operation_name = response["name"]
         return self._wait_for_operation_to_complete(project, operation_name)

+    def export_instance(self, project_id, instance_id, body):
+        """
+        Exports data from a Cloud SQL instance to a Cloud Storage bucket as a SQL dump
+        or CSV file.
+
+        :param project_id: Project ID of the project where the instance exists.
+        :type project_id: str
+        :param instance_id: Name of the Cloud SQL instance. This does not include the
+            project ID.
+        :type instance_id: str
+        :param body: The request body, as described in
+            https://cloud.google.com/sql/docs/mysql/admin-api/v1beta4/instances/export#request-body
+        :type body: dict
+        :return: True if the operation succeeded, raises an error otherwise
+        :rtype: bool
+        """
+        response = self.get_conn().instances().export(
```

Review comment: Thanks, it's a valid point. However, looking at other methods in this hook, and also in other GCP-related hooks we've created recently, we don't do `try`/`except` anywhere. Therefore I thought that maybe, for the sake of coherence, we could merge this "as is", and then I'd make a separate PR correcting this in all the GCP hooks?
[jira] [Commented] (AIRFLOW-3415) Imports become null when triggering dagruns in a loop
[ https://issues.apache.org/jira/browse/AIRFLOW-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702910#comment-16702910 ]

Ash Berlin-Taylor commented on AIRFLOW-3415:

Could you include the stack trace too please?

> Imports become null when triggering dagruns in a loop
>
> Key: AIRFLOW-3415
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3415
> Project: Apache Airflow
> Issue Type: Bug
> Components: DagRun
> Affects Versions: 1.10.1
> Environment: CentOS 7
> Reporter: Yuri Bendana
> Priority: Minor
>
> When triggering dagruns in a loop, the imported references become null on the
> second iteration. Here is an example [gist|https://gist.github.com/ybendana/3bc0791fe00b099be04aca47a8d524c9]. For
> the purposes here, you can ignore the DagRunSensor task. On the first
> iteration the 'sleeper' dag gets triggered, but on the second iteration I see a
> {noformat}
> TypeError: 'NoneType' object is not callable{noformat}
> To work around this, I have to copy the import (in this case trigger_dag)
> inside the loop.
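The reporter's workaround has the following shape. This sketch is hypothetical: it uses `json.dumps` as a stand-in for Airflow's `trigger_dag` import, since the point is only where the import statement is placed, not what it imports.

```python
# Workaround shape from the issue: import inside the loop body instead of
# once at module level. Re-importing re-binds the name on every iteration,
# so a later iteration cannot see a clobbered (None) module-level reference.
for i in range(3):
    from json import dumps  # stand-in for airflow's trigger_dag import

    payload = dumps({"run": i})
```

A module-level `from airflow.api.common.experimental.trigger_dag import trigger_dag` would instead be bound once, which is where the reporter observed the reference becoming `None` on the second iteration.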
[GitHub] codecov-io commented on issue #4254: [AIRFLOW-3416] Fixes Python 3 compatibility with CloudSqlQueryOperator
codecov-io commented on issue #4254: [AIRFLOW-3416] Fixes Python 3 compatibility with CloudSqlQueryOperator
URL: https://github.com/apache/incubator-airflow/pull/4254#issuecomment-442764081

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4254?src=pr=h1) Report
> Merging [#4254](https://codecov.io/gh/apache/incubator-airflow/pull/4254?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/3fede98eab6145ed9c1deea3fe93f0b9b7f322e4?src=pr=desc) will **not change** coverage.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/4254/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4254?src=pr=tree)

```diff
@@           Coverage Diff           @@
##           master    #4254   +/-   ##
=======================================
  Coverage   77.81%   77.81%
=======================================
  Files         201      201
  Lines       16455    16455
=======================================
  Hits        12805    12805
  Misses       3650     3650
```

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4254?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4254?src=pr=footer). Last update [3fede98...22bea28](https://codecov.io/gh/apache/incubator-airflow/pull/4254?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] ron819 commented on issue #2581: AIRFLOW-743 - Adding LDAP features (bind user and username template)
ron819 commented on issue #2581: AIRFLOW-743 - Adding LDAP features (bind user and username template)
URL: https://github.com/apache/incubator-airflow/pull/2581#issuecomment-442755368

@ehochmuth can you rebase?
[jira] [Commented] (AIRFLOW-3416) CloudSqlQueryOperator with sql proxy does not work with Python 3.x
[ https://issues.apache.org/jira/browse/AIRFLOW-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702864#comment-16702864 ]

ASF GitHub Bot commented on AIRFLOW-3416:

potiuk opened a new pull request #4254: [AIRFLOW-3416] Fixes Python 3 compatibility with CloudSqlQueryOperator
URL: https://github.com/apache/incubator-airflow/pull/4254

Make sure you have checked _all_ steps below.

### Jira
- [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-3416

### Description
- [x] Here are some details about my PR, including screenshots of any UI changes:
  Added several missing decodes when reading output from the running subprocess (cloud_sql_proxy). This fixes Python 3 compatibility.

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
  The problem with testing this issue is that it really needs integration-level testing with GCP, something I proposed in https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-4+Support+for+Integration+Tests. We have a working proof of concept using Cloud Build and our own GCP project (we will soon share it with the community to see what they think about it). This problem was actually detected by our automated integration testing, then fixed and tested in Python 3.5 and Python 3.6 environments.
  Here are some results of those tests:
  - Tests succeeding in Python 2.7 but failing on both 3.5 and 3.6: https://storage.googleapis.com/polidea-airflow-builds/944c09c7-fbb5-4ae4-b069-7c2590cd1e7e/index.html
  - Tests succeeding in all 3 environments after adding decodes: https://storage.googleapis.com/polidea-airflow-builds/5843e2a9-f086-448d-bdc3-14646ebb7f5d/index.html

### Commits
- [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation
- [x] In case of new functionality, my PR adds documentation that describes how to use it. No new functionality.

### Code Quality
- [x] Passes `flake8`

> CloudSqlQueryOperator with sql proxy does not work with Python 3.x
>
> Key: AIRFLOW-3416
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3416
> Project: Apache Airflow
> Issue Type: Bug
> Components: contrib
> Affects Versions: 1.10.1
> Reporter: Jarek Potiuk
> Assignee: Jarek Potiuk
> Priority: Major
>
> There are compatibility issues with Python 3.x for CloudSQLoperator. Output
> of the cloud_sql_proxy binary is parsed, and in Python 3 the output is bytes rather
> than string, so several "in" checks raise an exception in Python 3. It needs an
> explicit decode('utf-8').
[GitHub] potiuk opened a new pull request #4254: [AIRFLOW-3416] Fixes Python 3 compatibility with CloudSqlQueryOperator
potiuk opened a new pull request #4254: [AIRFLOW-3416] Fixes Python 3 compatibility with CloudSqlQueryOperator
URL: https://github.com/apache/incubator-airflow/pull/4254

Make sure you have checked _all_ steps below.

### Jira
- [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-3416

### Description
- [x] Here are some details about my PR, including screenshots of any UI changes:
  Added several missing decodes when reading output from the running subprocess (cloud_sql_proxy). This fixes Python 3 compatibility.

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
  The problem with testing this issue is that it really needs integration-level testing with GCP, something I proposed in https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-4+Support+for+Integration+Tests. We have a working proof of concept using Cloud Build and our own GCP project (we will soon share it with the community to see what they think about it). This problem was actually detected by our automated integration testing, then fixed and tested in Python 3.5 and Python 3.6 environments.
  Here are some results of those tests:
  - Tests succeeding in Python 2.7 but failing on both 3.5 and 3.6: https://storage.googleapis.com/polidea-airflow-builds/944c09c7-fbb5-4ae4-b069-7c2590cd1e7e/index.html
  - Tests succeeding in all 3 environments after adding decodes: https://storage.googleapis.com/polidea-airflow-builds/5843e2a9-f086-448d-bdc3-14646ebb7f5d/index.html

### Commits
- [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation
- [x] In case of new functionality, my PR adds documentation that describes how to use it. No new functionality.

### Code Quality
- [x] Passes `flake8`
[jira] [Updated] (AIRFLOW-3416) CloudSqlQueryOperator with sql proxy does not work with Python 3.x
[ https://issues.apache.org/jira/browse/AIRFLOW-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Potiuk updated AIRFLOW-3416: -- Issue Type: Bug (was: New Feature) > CloudSqlQueryOperator with sql proxy does not work with Python 3.x > -- > > Key: AIRFLOW-3416 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3416 > Project: Apache Airflow > Issue Type: Bug > Components: contrib >Affects Versions: 1.10.1 >Reporter: Jarek Potiuk >Assignee: Jarek Potiuk >Priority: Major > > There are compatibility issues with Python 3.x for CloudSQLoperator. Output > of cloud_sql_proxy binary is parsed and the output in Python3 is bytes rather > than string so several "in"s raise an exception in Python 3. It needs > explicit decode('utf-8') -- This message was sent by Atlassian JIRA (v7.6.3#76005)