[jira] [Closed] (AIRFLOW-2942) PyPI not updated to 1.10
[ https://issues.apache.org/jira/browse/AIRFLOW-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias closed AIRFLOW-2942. - Resolution: Done I'm closing this myself, as PyPI ([https://pypi.org/project/apache-airflow/]) has been updated with version 1.10. Thanks > PyPI not updated to 1.10 > > > Key: AIRFLOW-2942 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2942 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: 1.10 >Reporter: Matthias >Priority: Major > > According to > [https://cwiki.apache.org/confluence/display/AIRFLOW/Announcements], airflow > 1.10 has been out since August 20th. > PyPI is still serving 1.9.0 ([https://pypi.org/project/apache-airflow/]) as > of today. > > Installing via `pip install apache-airflow==1.10.0` does not work, as that > release is not available on PyPI. > output: > ``` > Collecting apache-airflow==1.10.0 > Could not find a version that satisfies the requirement > apache-airflow==1.10.0 (from versions: 1.8.1, 1.8.2rc1, 1.8.2, 1.9.0) > No matching distribution found for apache-airflow==1.10.0 > ``` > > The fix should be trivial: pushing the new release to PyPI.
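For anyone verifying which releases PyPI actually serves, its public JSON API can be queried directly. A minimal sketch (the endpoint and the `releases` field follow PyPI's documented JSON layout; everything else is illustrative):

```python
# Check whether PyPI now serves the 1.10.0 release of apache-airflow.
import json
from urllib.request import urlopen

with urlopen("https://pypi.org/pypi/apache-airflow/json") as resp:
    data = json.load(resp)

print("1.10.0" in data["releases"])  # True once the release is pushed
```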
[jira] [Commented] (AIRFLOW-2965) Add CLI command to find the next dag run.
[ https://issues.apache.org/jira/browse/AIRFLOW-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594518#comment-16594518 ] jack commented on AIRFLOW-2965: --- [~sanand] I have no knowledge of how to implement this. Anyone is welcome to take it and submit a PR :) > Add CLI command to find the next dag run. > - > > Key: AIRFLOW-2965 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2965 > Project: Apache Airflow > Issue Type: Task >Reporter: jack >Priority: Minor > Fix For: 1.10.1 > > > I have a dag with the following properties: > {code:java} > dag = DAG( > dag_id='mydag', > default_args=args, > schedule_interval='0 1 * * *', > max_active_runs=1, > catchup=False){code} > > > This runs great. > Last run is: 2018-08-26 01:00 (start date is 2018-08-27 01:00) > > Now it's 2018-08-27 17:55 and I decided to change my dag to: > > {code:java} > dag = DAG( > dag_id='mydag', > default_args=args, > schedule_interval='0 23 * * *', > max_active_runs=1, > catchup=False){code} > > Now, I have no idea when the next dag run will be. > Will it be today at 23:00? I can't be sure when the cycle is complete. I'm > not even sure that this change will do what I wish. > I'm sure you guys are experts and can answer this question, but most of us > wouldn't know. > > The scheduler knows when the dag is available for running. All > I'm asking is to take that knowledge and create a CLI command to which I can > give the dag_id and which will tell me the next date/hour at which my dag will be > runnable.
[jira] [Commented] (AIRFLOW-2965) Add CLI command to find the next dag run.
[ https://issues.apache.org/jira/browse/AIRFLOW-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594394#comment-16594394 ] Siddharth Anand commented on AIRFLOW-2965: -- This is a good ask! I recall it was requested a long time ago and somehow never made it to reality! Would you like to take a crack at it? You can also send a note to the dev list to see if someone else would like to take it on! -s > Add CLI command to find the next dag run. > - > > Key: AIRFLOW-2965 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2965 > Project: Apache Airflow > Issue Type: Task >Reporter: jack >Priority: Minor > Fix For: 1.10.1 > > > I have a dag with the following properties: > {code:java} > dag = DAG( > dag_id='mydag', > default_args=args, > schedule_interval='0 1 * * *', > max_active_runs=1, > catchup=False){code} > > > This runs great. > Last run is: 2018-08-26 01:00 (start date is 2018-08-27 01:00) > > Now it's 2018-08-27 17:55 and I decided to change my dag to: > > {code:java} > dag = DAG( > dag_id='mydag', > default_args=args, > schedule_interval='0 23 * * *', > max_active_runs=1, > catchup=False){code} > > Now, I have no idea when the next dag run will be. > Will it be today at 23:00? I can't be sure when the cycle is complete. I'm > not even sure that this change will do what I wish. > I'm sure you guys are experts and can answer this question, but most of us > wouldn't know. > > The scheduler knows when the dag is available for running. All > I'm asking is to take that knowledge and create a CLI command to which I can > give the dag_id and which will tell me the next date/hour at which my dag will be > runnable.
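The requested computation can be approximated outside Airflow with croniter (already an Airflow dependency). A rough sketch of the idea, not the actual CLI: it assumes a plain cron schedule_interval and that the scheduler only creates a run once its interval has fully elapsed.

{code:python}
# Hypothetical helper approximating "when is my dag next runnable?".
from datetime import datetime
from croniter import croniter

def next_dag_run(schedule_interval, last_execution_date):
    # The next execution_date on the (possibly new) schedule...
    nxt = croniter(schedule_interval, last_execution_date).get_next(datetime)
    # ...but the scheduler creates that run only after its interval ends.
    created = croniter(schedule_interval, nxt).get_next(datetime)
    return nxt, created

# Last execution_date was 2018-08-26 01:00; schedule changed to '0 23 * * *'.
print(next_dag_run('0 23 * * *', datetime(2018, 8, 26, 1, 0)))
# -> (2018-08-26 23:00, 2018-08-27 23:00): the run appears at 23:00 on the 27th.
{code}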
[jira] [Closed] (AIRFLOW-2962) PyPI Package not published for Airflow 1.10.0
[ https://issues.apache.org/jira/browse/AIRFLOW-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Anand closed AIRFLOW-2962. Resolution: Fixed > PyPI Package not published for Airflow 1.10.0 > - > > Key: AIRFLOW-2962 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2962 > Project: Apache Airflow > Issue Type: Bug > Components: ci >Affects Versions: 1.10.0, 1.10 > Environment: Python 3 > Docker >Reporter: Arihant >Assignee: Siddharth Anand >Priority: Major > Labels: CI, build > > In the Airflow > [announcements|https://cwiki.apache.org/confluence/display/AIRFLOW/Announcements#Announcements-Aug20,2018] > it was mentioned that the package is available via PyPI: > PyPI: [https://pypi.python.org/pypi/apache-airflow] (Run `pip install > apache-airflow`) > > But the release version is not available there.
[jira] [Commented] (AIRFLOW-2962) PyPI Package not published for Airflow 1.10.0
[ https://issues.apache.org/jira/browse/AIRFLOW-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594388#comment-16594388 ] Siddharth Anand commented on AIRFLOW-2962: -- This works now! (venv) sianand@LM-SJN-21002367:~/airflow_release $ pip install apache-airflow ... (venv) sianand@LM-SJN-21002367:~/airflow_release $ airflow version [2018-08-27 18:36:16,686] {__init__.py:51} INFO - Using executor SequentialExecutor /Users/sianand/airflow_release/venv/lib/python3.6/site-packages/airflow/bin/cli.py:1595: DeprecationWarning: The celeryd_concurrency option in [celery] has been renamed to worker_concurrency - the old setting has been used, but please update your config. default=conf.get('celery', 'worker_concurrency')), [Airflow ASCII-art banner] v1.10.0 > PyPI Package not published for Airflow 1.10.0 > - > > Key: AIRFLOW-2962 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2962 > Project: Apache Airflow > Issue Type: Bug > Components: ci >Affects Versions: 1.10.0, 1.10 > Environment: Python 3 > Docker >Reporter: Arihant >Assignee: Siddharth Anand >Priority: Major > Labels: CI, build > > In the Airflow > [announcements|https://cwiki.apache.org/confluence/display/AIRFLOW/Announcements#Announcements-Aug20,2018] > it was mentioned that the package is available via PyPI: > PyPI: [https://pypi.python.org/pypi/apache-airflow] (Run `pip install > apache-airflow`) > > But the release version is not available there.
[jira] [Assigned] (AIRFLOW-2962) PyPI Package not published for Airflow 1.10.0
[ https://issues.apache.org/jira/browse/AIRFLOW-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Anand reassigned AIRFLOW-2962: Assignee: Siddharth Anand > PyPI Package not published for Airflow 1.10.0 > - > > Key: AIRFLOW-2962 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2962 > Project: Apache Airflow > Issue Type: Bug > Components: ci >Affects Versions: 1.10.0, 1.10 > Environment: Python 3 > Docker >Reporter: Arihant >Assignee: Siddharth Anand >Priority: Major > Labels: CI, build > > In the Airflow > [announcements|https://cwiki.apache.org/confluence/display/AIRFLOW/Announcements#Announcements-Aug20,2018] > it was mentioned that the package is available via PyPI: > PyPI: [https://pypi.python.org/pypi/apache-airflow] (Run `pip install > apache-airflow`) > > But the release version is not available there.
[GitHub] r39132 edited a comment on issue #3656: [AIRFLOW-2803] Fix all ESLint issues
r39132 edited a comment on issue #3656: [AIRFLOW-2803] Fix all ESLint issues URL: https://github.com/apache/incubator-airflow/pull/3656#issuecomment-416415615 @tedmiston Please resolve conflicts and squash your commits. @verdan thx for your feedback. It would be great if you could do one more pass after the rebase & squash! Also, can you provide steps that I can run to verify the fixes? Are the steps you outline above (i.e. `npm run lint`) still valid? Thx for your work on this btw!
[GitHub] r39132 commented on issue #3656: [AIRFLOW-2803] Fix all ESLint issues
r39132 commented on issue #3656: [AIRFLOW-2803] Fix all ESLint issues URL: https://github.com/apache/incubator-airflow/pull/3656#issuecomment-416415615 @tedmiston Please resolve conflicts and squash your commits. @verdan thx for your feedback. It would be great if you could do one more pass after the rebase & squash!
[GitHub] cHYzZQo commented on issue #3660: [AIRFLOW-2817] Force explicit choice on GPL dependency
cHYzZQo commented on issue #3660: [AIRFLOW-2817] Force explicit choice on GPL dependency URL: https://github.com/apache/incubator-airflow/pull/3660#issuecomment-416403083 another option might be to have a bracket invocation that can be used, e.g.: ``` airflow[gpl] ``` or ``` airflow[non-gpl] ```
[GitHub] cHYzZQo commented on issue #3660: [AIRFLOW-2817] Force explicit choice on GPL dependency
cHYzZQo commented on issue #3660: [AIRFLOW-2817] Force explicit choice on GPL dependency URL: https://github.com/apache/incubator-airflow/pull/3660#issuecomment-416400378 This is such an anti-pattern IMO: making the package install broken by default. It's also a backward-incompatible change that breaks everyone's current install scripts. I'd request you reconsider. If we really can't use a GPL package by default, print a warning and use the non-GPL option. Packages shouldn't just refuse to install by default.
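The bracket invocation suggested above maps naturally onto setuptools extras. A hedged sketch of how the choice could be expressed; the extras names and backend packages here are illustrative, not Airflow's actual setup.py:

```python
# Illustrative setup.py fragment: gate the GPL dependency behind extras so
# `pip install apache-airflow[gpl]` or `[non-gpl]` makes the choice explicit.
from setuptools import setup

setup(
    name="apache-airflow",
    # ... core install_requires, without the contested transliteration dep ...
    extras_require={
        "gpl": ["unidecode"],           # GPL-licensed backend
        "non-gpl": ["text-unidecode"],  # permissively licensed alternative
    },
)
```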
[jira] [Commented] (AIRFLOW-2964) Lazy generation of the job description
[ https://issues.apache.org/jira/browse/AIRFLOW-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594200#comment-16594200 ] Victor Jimenez commented on AIRFLOW-2964: - That makes sense. I will post a question there. > Lazy generation of the job description > -- > > Key: AIRFLOW-2964 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2964 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib, operators >Affects Versions: Airflow 1.9.0, 1.10 >Reporter: Victor Jimenez >Priority: Major > > When instantiating a {{DatabricksSubmitRunOperator}}, users need to pass the > description of the job that will later be executed on Databricks. > The job description is only needed at execution time (when the hook is > called). However, the {{json}} parameter must already contain the full job > description when the operator is constructed. This presents a problem if > computing the job description requires expensive operations (e.g., > querying a database): the expensive operation will be invoked every single > time the DAG is reprocessed (which may happen quite frequently). > It would be good to have a mechanism equivalent to the {{python_callable}} > parameter of the {{PythonOperator}}. That way, users could pass a function > that generates the job description only when the operator is actually > executed. > [~andrewmchen] Is there any other way to do this? If not, does this sound > reasonable? I can create a PR implementing this proposal.
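A minimal sketch of the mechanism being proposed (a hypothetical subclass, not merged code): accept a callable and resolve it only when the operator actually executes.

{code:python}
from airflow.contrib.operators.databricks_operator import DatabricksSubmitRunOperator

class LazyDatabricksSubmitRunOperator(DatabricksSubmitRunOperator):
    """Hypothetical variant whose job description is built at execution time."""

    def __init__(self, json_callable, **kwargs):
        self._json_callable = json_callable
        # Cheap placeholder; the real description is produced in execute().
        super(LazyDatabricksSubmitRunOperator, self).__init__(json={}, **kwargs)

    def execute(self, context):
        # The expensive work (e.g. a database query) happens only here,
        # not on every DAG-file parse by the scheduler.
        self.json = self._json_callable()
        return super(LazyDatabricksSubmitRunOperator, self).execute(context)
{code}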
[GitHub] codecov-io edited a comment on issue #3806: [AIRFLOW-2956] added kubernetes tolerations to kubernetes pod operator
codecov-io edited a comment on issue #3806: [AIRFLOW-2956] added kubernetes tolerations to kubernetes pod operator URL: https://github.com/apache/incubator-airflow/pull/3806#issuecomment-416340975 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3806?src=pr&el=h1) Report > Merging [#3806](https://codecov.io/gh/apache/incubator-airflow/pull/3806?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/1801baefe44f361010c23e6ec4ee8b8569eab82d?src=pr&el=desc) will **not change** coverage. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3806/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3806?src=pr&el=tree)

```diff
@@           Coverage Diff           @@
##           master    #3806   +/-   ##
=======================================
  Coverage   77.41%   77.41%
=======================================
  Files         203      203
  Lines       15810    15810
=======================================
  Hits        12239    12239
  Misses       3571     3571
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3806?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [airflow/operators/\_\_init\_\_.py](https://codecov.io/gh/apache/incubator-airflow/pull/3806/diff?src=pr&el=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvX19pbml0X18ucHk=) | `61.11% <ø> (ø)` | :arrow_up: |
| [airflow/sensors/http\_sensor.py](https://codecov.io/gh/apache/incubator-airflow/pull/3806/diff?src=pr&el=tree#diff-YWlyZmxvdy9zZW5zb3JzL2h0dHBfc2Vuc29yLnB5) | `96.42% <ø> (ø)` | :arrow_up: |

-- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3806?src=pr&el=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3806?src=pr&el=footer). Last update [1801bae...e6f5f22](https://codecov.io/gh/apache/incubator-airflow/pull/3806?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] shenni119 closed pull request #3812: editing zendesk_hook.py https://citybase.atlassian.net/browse/DS-78
shenni119 closed pull request #3812: editing zendesk_hook.py https://citybase.atlassian.net/browse/DS-78 URL: https://github.com/apache/incubator-airflow/pull/3812 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance (indentation reconstructed; the archive stripped the leading whitespace):

```diff
diff --git a/airflow/hooks/zendesk_hook.py b/airflow/hooks/zendesk_hook.py
index 3cf8353344..2579d7f30b 100644
--- a/airflow/hooks/zendesk_hook.py
+++ b/airflow/hooks/zendesk_hook.py
@@ -78,7 +78,7 @@ def call(self, path, query=None, get_all_pages=True, side_loading=False):
         next_page = results['next_page']
         if side_loading:
             keys += query['include'].split(',')
-        results = {key: results[key] for key in keys}
+            results = {key: results[key] for key in keys}
 
         if get_all_pages:
             while next_page is not None:
```
[GitHub] shenni119 opened a new pull request #3812: editing zendesk_hook.py https://citybase.atlassian.net/browse/DS-78
shenni119 opened a new pull request #3812: editing zendesk_hook.py https://citybase.atlassian.net/browse/DS-78 URL: https://github.com/apache/incubator-airflow/pull/3812 **Description:** When I use the zendesk_hook.py file in incubator-airflow/airflow/hooks, I get an error: "KeyError: 'search'". The error is due to line 81 in the zendesk_hook file, "results = {key: results[key] for key in keys}". I believe line 81 needs an extra tab, because it currently runs regardless of whether side_loading is set to true; it should only run when side_loading is set to true (see the reconstructed snippet below). **Tests:** My code runs once the tab has been added. ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
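For clarity, the control flow the PR intends (indentation reconstructed from the description above; GitHub's rendering loses the leading whitespace):

```python
# Inside ZendeskHook.call(): build the side-loaded result dict only when
# side-loading is actually requested.
if side_loading:
    keys += query['include'].split(',')
    results = {key: results[key] for key in keys}  # now inside the if-block
```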
[GitHub] codecov-io edited a comment on issue #3570: [AIRFLOW-2709] Improve error handling in Databricks hook
codecov-io edited a comment on issue #3570: [AIRFLOW-2709] Improve error handling in Databricks hook URL: https://github.com/apache/incubator-airflow/pull/3570#issuecomment-401965037 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3570?src=pr&el=h1) Report > :exclamation: No coverage uploaded for pull request base (`master@fc10f7e`). [Click here to learn what that means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit). > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3570/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3570?src=pr&el=tree)

```diff
@@            Coverage Diff            @@
##             master    #3570   +/-   ##
=========================================
  Coverage          ?   77.41%
=========================================
  Files             ?      203
  Lines             ?    15810
  Branches          ?        0
=========================================
  Hits              ?    12239
  Misses            ?     3571
  Partials          ?        0
```

-- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3570?src=pr&el=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3570?src=pr&el=footer). Last update [fc10f7e...fdec2d0](https://codecov.io/gh/apache/incubator-airflow/pull/3570?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[jira] [Commented] (AIRFLOW-2964) Lazy generation of the job description
[ https://issues.apache.org/jira/browse/AIRFLOW-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594159#comment-16594159 ] Andrew Chen commented on AIRFLOW-2964: -- My guess is no. Maybe we should post this as a question to the dev mailing list? d...@airflow.incubator.apache.org > Lazy generation of the job description > -- > > Key: AIRFLOW-2964 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2964 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib, operators >Affects Versions: Airflow 1.9.0, 1.10 >Reporter: Victor Jimenez >Priority: Major > > When instantiating a {{DatabricksSubmitRunOperator}}, users need to pass the > description of the job that will later be executed on Databricks. > The job description is only needed at execution time (when the hook is > called). However, the {{json}} parameter must already contain the full job > description when the operator is constructed. This presents a problem if > computing the job description requires expensive operations (e.g., > querying a database): the expensive operation will be invoked every single > time the DAG is reprocessed (which may happen quite frequently). > It would be good to have a mechanism equivalent to the {{python_callable}} > parameter of the {{PythonOperator}}. That way, users could pass a function > that generates the job description only when the operator is actually > executed. > [~andrewmchen] Is there any other way to do this? If not, does this sound > reasonable? I can create a PR implementing this proposal.
[GitHub] justinholmes commented on issue #3806: [AIRFLOW-2956] added kubernetes tolerations to kubernetes pod operator
justinholmes commented on issue #3806: [AIRFLOW-2956] added kubernetes tolerations to kubernetes pod operator URL: https://github.com/apache/incubator-airflow/pull/3806#issuecomment-416347018 Will try to add a few example tests as well.
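A hedged usage sketch of the feature under review (the `tolerations` parameter name comes from the PR title, and the dict shape mirrors Kubernetes' toleration spec; details may differ from what is finally merged):

```python
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

run_on_tainted_node = KubernetesPodOperator(
    task_id="tolerant-pod",
    name="tolerant-pod",
    namespace="default",
    image="ubuntu:16.04",
    cmds=["bash", "-cx"],
    arguments=["echo hello"],
    # Matches a node taint such as: kubectl taint nodes node1 dedicated=airflow:NoSchedule
    tolerations=[{
        "key": "dedicated",
        "operator": "Equal",
        "value": "airflow",
        "effect": "NoSchedule",
    }],
)
```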
[GitHub] andrewmchen commented on issue #3570: [AIRFLOW-2709] Improve error handling in Databricks hook
andrewmchen commented on issue #3570: [AIRFLOW-2709] Improve error handling in Databricks hook URL: https://github.com/apache/incubator-airflow/pull/3570#issuecomment-416346693 If this is ready @betabandido, could we get some help getting this merged? Would you be the right person @Fokko?
[jira] [Comment Edited] (AIRFLOW-2964) Lazy generation of the job description
[ https://issues.apache.org/jira/browse/AIRFLOW-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594146#comment-16594146 ] Victor Jimenez edited comment on AIRFLOW-2964 at 8/27/18 7:40 PM: -- I am not familiar at all with [XCom|https://airflow.apache.org/concepts.html#xcoms], but do you think using this feature might be a way to solve this issue? was (Author: betabandido): I am not familiar at all with {{XCom}}, but do you think using this feature might be a way to solve this issue? > Lazy generation of the job description > -- > > Key: AIRFLOW-2964 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2964 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib, operators >Affects Versions: Airflow 1.9.0, 1.10 >Reporter: Victor Jimenez >Priority: Major > > When instantiating a {{DatabricksSubmitRunOperator}}, users need to pass the > description of the job that will later be executed on Databricks. > The job description is only needed at execution time (when the hook is > called). However, the {{json}} parameter must already contain the full job > description when the operator is constructed. This presents a problem if > computing the job description requires expensive operations (e.g., > querying a database): the expensive operation will be invoked every single > time the DAG is reprocessed (which may happen quite frequently). > It would be good to have a mechanism equivalent to the {{python_callable}} > parameter of the {{PythonOperator}}. That way, users could pass a function > that generates the job description only when the operator is actually > executed. > [~andrewmchen] Is there any other way to do this? If not, does this sound > reasonable? I can create a PR implementing this proposal.
[jira] [Comment Edited] (AIRFLOW-2964) Lazy generation of the job description
[ https://issues.apache.org/jira/browse/AIRFLOW-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594146#comment-16594146 ] Victor Jimenez edited comment on AIRFLOW-2964 at 8/27/18 7:37 PM: -- I am not familiar at all with {{XCom}}, but do you think using this feature might be a way to solve this issue? was (Author: betabandido): I am not familiar at all with `XCom`, but do you think using this feature might be a way to solve this issue? > Lazy generation of the job description > -- > > Key: AIRFLOW-2964 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2964 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib, operators >Affects Versions: Airflow 1.9.0, 1.10 >Reporter: Victor Jimenez >Priority: Major > > When instantiating a {{DatabricksSubmitRunOperator}}, users need to pass the > description of the job that will later be executed on Databricks. > The job description is only needed at execution time (when the hook is > called). However, the {{json}} parameter must already contain the full job > description when the operator is constructed. This presents a problem if > computing the job description requires expensive operations (e.g., > querying a database): the expensive operation will be invoked every single > time the DAG is reprocessed (which may happen quite frequently). > It would be good to have a mechanism equivalent to the {{python_callable}} > parameter of the {{PythonOperator}}. That way, users could pass a function > that generates the job description only when the operator is actually > executed. > [~andrewmchen] Is there any other way to do this? If not, does this sound > reasonable? I can create a PR implementing this proposal.
[jira] [Commented] (AIRFLOW-2964) Lazy generation of the job description
[ https://issues.apache.org/jira/browse/AIRFLOW-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594146#comment-16594146 ] Victor Jimenez commented on AIRFLOW-2964: - I am not familiar at all with `XCom`, but do you think using this feature might be a way to solve this issue? > Lazy generation of the job description > -- > > Key: AIRFLOW-2964 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2964 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib, operators >Affects Versions: Airflow 1.9.0, 1.10 >Reporter: Victor Jimenez >Priority: Major > > When instantiating a {{DatabricksSubmitRunOperator}}, users need to pass the > description of the job that will later be executed on Databricks. > The job description is only needed at execution time (when the hook is > called). However, the {{json}} parameter must already contain the full job > description when the operator is constructed. This presents a problem if > computing the job description requires expensive operations (e.g., > querying a database): the expensive operation will be invoked every single > time the DAG is reprocessed (which may happen quite frequently). > It would be good to have a mechanism equivalent to the {{python_callable}} > parameter of the {{PythonOperator}}. That way, users could pass a function > that generates the job description only when the operator is actually > executed. > [~andrewmchen] Is there any other way to do this? If not, does this sound > reasonable? I can create a PR implementing this proposal.
[GitHub] betabandido commented on a change in pull request #3570: [AIRFLOW-2709] Improve error handling in Databricks hook
betabandido commented on a change in pull request #3570: [AIRFLOW-2709] Improve error handling in Databricks hook URL: https://github.com/apache/incubator-airflow/pull/3570#discussion_r213084704 ## File path: airflow/contrib/hooks/databricks_hook.py ##

```diff
@@ -175,6 +190,12 @@ def cancel_run(self, run_id):
         self._do_api_call(CANCEL_RUN_ENDPOINT, json)
 
+def _retryable_error(exception):
+    return type(exception) == requests_exceptions.ConnectionError \
```

Review comment: @andrewmchen Oh, that is completely right! Otherwise derived exceptions won't be treated as retryable. Thanks for spotting that. I made a small change to the tests to cover this case too.
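The shape of the agreed fix, sketched (only the type check visible in the diff excerpt is shown; the PR's full condition continues past it):

```python
from requests import exceptions as requests_exceptions

def _retryable_error(exception):
    # isinstance() also matches subclasses (e.g. requests' ProxyError derives
    # from ConnectionError), so derived transient errors retry as well.
    return isinstance(exception, requests_exceptions.ConnectionError)
```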
[jira] [Closed] (AIRFLOW-2967) Beeline commands do not have quoted urls
[ https://issues.apache.org/jira/browse/AIRFLOW-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris McLennon closed AIRFLOW-2967. --- Resolution: Fixed > Beeline commands do not have quoted urls > > > Key: AIRFLOW-2967 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2967 > Project: Apache Airflow > Issue Type: Bug >Reporter: Chris McLennon >Assignee: Chris McLennon >Priority: Major > > When HiveHook uses beeline with Kerberos auth, it doesn't quote the > connection URL, leading to an error. It should quote the connection URL. > > Example: > Airflow runs this command, which fails: > {code:java} > beeline -u > jdbc:hive2://hiveserver.mycompany.com:1/default;principal=hive/hiveserver.mycompany@mycompany.com; > -f /tmp/airflow_hiveop_OwZO45/tmpOW_gZ6{code} > The command should be run as: > {code:java} > beeline -u > "jdbc:hive2://hiveserver.mycompany.com:1/default;principal=hive/hiveserver.mycompany@mycompany.com;" > -f /tmp/airflow_hiveop_OwZO45/tmpOW_gZ6{code}
[jira] [Commented] (AIRFLOW-2967) Beeline commands do not have quoted urls
[ https://issues.apache.org/jira/browse/AIRFLOW-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594105#comment-16594105 ] Chris McLennon commented on AIRFLOW-2967: - Actually, never mind. I just saw this has already been implemented for a future release: https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/hive_hooks.py#L139 > Beeline commands do not have quoted urls > > > Key: AIRFLOW-2967 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2967 > Project: Apache Airflow > Issue Type: Bug >Reporter: Chris McLennon >Assignee: Chris McLennon >Priority: Major > > When HiveHook uses beeline with Kerberos auth, it doesn't quote the > connection URL, leading to an error. It should quote the connection URL. > > Example: > Airflow runs this command, which fails: > {code:java} > beeline -u > jdbc:hive2://hiveserver.mycompany.com:1/default;principal=hive/hiveserver.mycompany@mycompany.com; > -f /tmp/airflow_hiveop_OwZO45/tmpOW_gZ6{code} > The command should be run as: > {code:java} > beeline -u > "jdbc:hive2://hiveserver.mycompany.com:1/default;principal=hive/hiveserver.mycompany@mycompany.com;" > -f /tmp/airflow_hiveop_OwZO45/tmpOW_gZ6{code}
[jira] [Created] (AIRFLOW-2967) Beeline commands do not have quoted urls
Chris McLennon created AIRFLOW-2967: --- Summary: Beeline commands do not have quoted urls Key: AIRFLOW-2967 URL: https://issues.apache.org/jira/browse/AIRFLOW-2967 Project: Apache Airflow Issue Type: Bug Reporter: Chris McLennon Assignee: Chris McLennon When HiveHook uses beeline with Kerberos auth, it doesn't quote the connection URL, leading to an error. It should quote the connection URL. Example: Airflow runs this command, which fails: {code:java} beeline -u jdbc:hive2://hiveserver.mycompany.com:1/default;principal=hive/hiveserver.mycompany@mycompany.com; -f /tmp/airflow_hiveop_OwZO45/tmpOW_gZ6{code} The command should be run as: {code:java} beeline -u "jdbc:hive2://hiveserver.mycompany.com:1/default;principal=hive/hiveserver.mycompany@mycompany.com;" -f /tmp/airflow_hiveop_OwZO45/tmpOW_gZ6{code}
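The underlying problem is shell interpretation: unquoted, each `;` in the JDBC URL ends the beeline command. A sketch of the quoting fix, mirroring the hive_hooks.py change linked above (host, port and principal here are placeholder values, and the exact code in master may differ):

{code:python}
# Build the beeline invocation with the JDBC URL quoted so the embedded
# semicolons survive the shell.
jdbc_url = 'jdbc:hive2://{host}:{port}/{schema}'.format(
    host='hiveserver.example.com', port=10000, schema='default')
jdbc_url += ';principal=hive/hiveserver.example.com@EXAMPLE.COM'

cmd_extra = ['-u', '"{}"'.format(jdbc_url)]  # quoted URL, as in the linked fix
{code}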
[GitHub] justinholmes commented on issue #3806: [AIRFLOW-2956] added kubernetes tolerations to kubernetes pod operator
justinholmes commented on issue #3806: [AIRFLOW-2956] added kubernetes tolerations to kubernetes pod operator URL: https://github.com/apache/incubator-airflow/pull/3806#issuecomment-416328208 Sure.
[GitHub] tedmiston commented on issue #3808: [AIRFLOW-2957] Remove obsolete sensor references
tedmiston commented on issue #3808: [AIRFLOW-2957] Remove obsolete sensor references URL: https://github.com/apache/incubator-airflow/pull/3808#issuecomment-416314704 @Fokko @r39132 Thank you for catching this one I missed in #3760!
[jira] [Commented] (AIRFLOW-2964) Lazy generation of the job description
[ https://issues.apache.org/jira/browse/AIRFLOW-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16593997#comment-16593997 ] Andrew Chen commented on AIRFLOW-2964: -- This certainly makes a lot of sense to me, especially if we can maintain the current behavior (use the callable only if it is defined). That being said, maybe it's a good idea to get some input from the community on whether or not this is the right way to do things. A quick spot check of other operators suggests they don't support this sort of lazy generation of parameters. > Lazy generation of the job description > -- > > Key: AIRFLOW-2964 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2964 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib, operators >Affects Versions: Airflow 1.9.0, 1.10 >Reporter: Victor Jimenez >Priority: Major > > When instantiating a {{DatabricksSubmitRunOperator}}, users need to pass the > description of the job that will later be executed on Databricks. > The job description is only needed at execution time (when the hook is > called). However, the {{json}} parameter must already contain the full job > description when the operator is constructed. This presents a problem if > computing the job description requires expensive operations (e.g., > querying a database): the expensive operation will be invoked every single > time the DAG is reprocessed (which may happen quite frequently). > It would be good to have a mechanism equivalent to the {{python_callable}} > parameter of the {{PythonOperator}}. That way, users could pass a function > that generates the job description only when the operator is actually > executed. > [~andrewmchen] Is there any other way to do this? If not, does this sound > reasonable? I can create a PR implementing this proposal.
[GitHub] andrewmchen commented on a change in pull request #3570: [AIRFLOW-2709] Improve error handling in Databricks hook
andrewmchen commented on a change in pull request #3570: [AIRFLOW-2709] Improve error handling in Databricks hook URL: https://github.com/apache/incubator-airflow/pull/3570#discussion_r213048323 ## File path: airflow/contrib/hooks/databricks_hook.py ##

```diff
@@ -175,6 +190,12 @@ def cancel_run(self, run_id):
         self._do_api_call(CANCEL_RUN_ENDPOINT, json)
 
+def _retryable_error(exception):
+    return type(exception) == requests_exceptions.ConnectionError \
```

Review comment: nit: I think we should prefer `isinstance()`. See https://stackoverflow.com/questions/1549801/what-are-the-differences-between-type-and-isinstance
[GitHub] andscoop commented on a change in pull request #3796: [AIRFLOW-2824] - Add config to disable default conn creation
andscoop commented on a change in pull request #3796: [AIRFLOW-2824] - Add config to disable default conn creation URL: https://github.com/apache/incubator-airflow/pull/3796#discussion_r213043144 ## File path: airflow/utils/db.py ##

```diff
@@ -286,6 +284,16 @@ def initdb(rbac=False):
             conn_id='cassandra_default', conn_type='cassandra',
             host='cassandra', port=9042))
+
+
+def initdb(rbac=False):
+    session = settings.Session()
+
+    from airflow import models
```

Review comment: @feng-tao It is to prevent a circular import.
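The deferred-import pattern in a nutshell (a minimal illustration, not the PR's full diff): because airflow/__init__.py itself imports airflow.utils.db, a module-level `from airflow import models` in db.py would create a cycle; importing inside the function delays resolution until initdb() is actually called, when both modules are fully loaded.

```python
# airflow/utils/db.py (sketch; `settings` is imported at the top of db.py)
def initdb(rbac=False):
    from airflow import models  # deferred import: breaks the cycle with airflow/__init__.py
    session = settings.Session()
    # ... create the default connections via models.Connection ...
```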
[jira] [Created] (AIRFLOW-2966) KubernetesExecutor + namespace quotas kills scheduler if the pod can't be launched
John Hofman created AIRFLOW-2966: Summary: KubernetesExecutor + namespace quotas kills scheduler if the pod can't be launched Key: AIRFLOW-2966 URL: https://issues.apache.org/jira/browse/AIRFLOW-2966 Project: Apache Airflow Issue Type: Bug Components: scheduler Affects Versions: 1.10 Environment: Kubernetes 1.9.8 Reporter: John Hofman When running Airflow in Kubernetes with the KubernetesExecutor and resource quotas set on the namespace Airflow is deployed in, the scheduler gets an ApiException if it tries to launch a pod that exceeds the namespace limits, and the exception crashes the scheduler. This stack trace is an example of the ApiException from the kubernetes client:

{code:java}
[2018-08-27 09:51:08,516] {pod_launcher.py:58} ERROR - Exception when attempting to create Namespaced Pod.
Traceback (most recent call last):
  File "/src/apache-airflow/airflow/contrib/kubernetes/pod_launcher.py", line 55, in run_pod_async
    resp = self._client.create_namespaced_pod(body=req, namespace=pod.namespace)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 6057, in create_namespaced_pod
    (data) = self.create_namespaced_pod_with_http_info(namespace, body, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 6142, in create_namespaced_pod_with_http_info
    collection_formats=collection_formats)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 321, in call_api
    _return_http_data_only, collection_formats, _preload_content, _request_timeout)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 155, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 364, in request
    body=body)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 266, in POST
    body=body)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 222, in request
    raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'b00e2cbb-bdb2-41f3-8090-824aee79448c', 'Content-Type': 'application/json', 'Date': 'Mon, 27 Aug 2018 09:51:08 GMT', 'Content-Length': '410'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"podname-ec366e89ef934d91b2d3ffe96234a725\" is forbidden: exceeded quota: compute-resources, requested: limits.memory=4Gi, used: limits.memory=6508Mi, limited: limits.memory=10Gi","reason":"Forbidden","details":{"name":"podname-ec366e89ef934d91b2d3ffe96234a725","kind":"pods"},"code":403}
{code}

I would expect the scheduler to catch the exception and at least mark the task as failed, or better yet retry the task later.
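One possible mitigation, sketched against the pod_launcher.py frame in the trace above (hypothetical handling, not the project's actual fix): catch the ApiException at pod-creation time and turn a 403 into a task-level failure instead of letting it propagate into the scheduler loop.

{code:python}
from airflow.exceptions import AirflowException
from kubernetes.client.rest import ApiException

def run_pod_async(self, pod):
    req = self.kube_req_factory.create(pod)  # request-factory name assumed
    try:
        return self._client.create_namespaced_pod(body=req, namespace=pod.namespace)
    except ApiException as e:
        if e.status == 403:
            # Quota exceeded: fail (or requeue) this task instead of
            # crashing the whole scheduler.
            self.log.warning('Pod creation forbidden: %s', e.reason)
            raise AirflowException('Pod could not be scheduled: %s' % e.reason)
        raise
{code}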
[36/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/operators/awsbatch_operator.html -- diff --git a/_modules/airflow/contrib/operators/awsbatch_operator.html (new file, 403 lines) [Adds the generated documentation page for airflow.contrib.operators.awsbatch_operator: rendered HTML of the module source, including the AWSBatchOperator class, its docstring, and parameter documentation (job_name, job_definition, queue, overrides, max_retries, aws_conn_id, region_name); site-template navigation markup omitted.]
[09/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/operators/postgres_operator.html

diff --git a/_modules/airflow/operators/postgres_operator.html b/_modules/airflow/operators/postgres_operator.html
new file mode 100644
index 000..dc0669e
--- /dev/null
+++ b/_modules/airflow/operators/postgres_operator.html

New rendered module page "airflow.operators.postgres_operator" (Sphinx theme navigation, sidebar and footer markup omitted). Source code shown on the page:

    # -*- coding: utf-8 -*-
    #
    # Licensed to the Apache Software Foundation (ASF) under one or more
    # contributor license agreements. See the NOTICE file distributed with
    # this work for additional information regarding copyright ownership.
    # Licensed under the Apache License, Version 2.0;
    # see http://www.apache.org/licenses/LICENSE-2.0
    from airflow.hooks.postgres_hook import PostgresHook
    from airflow.models import BaseOperator
    from airflow.utils.decorators import apply_defaults


    class PostgresOperator(BaseOperator):
        """
        Executes sql code in a specific Postgres database

        :param postgres_conn_id: reference to a specific postgres database
        :type postgres_conn_id: string
        :param sql: the sql code to be executed
        :type sql: Can receive a str representing a sql statement,
            a list of str (sql statements), or a reference to a template file.
            Template references are recognized by str ending in '.sql'
        :param database: name of the database which overwrites the one defined
            in the connection
        :type database: string
        """

        template_fields = ('sql',)
        template_ext = ('.sql',)
        ui_color = '#ededed'

        @apply_defaults
        def __init__(
                self, sql,
                postgres_conn_id='postgres_default', autocommit=False,
                parameters=None,
                database=None,
                *args, **kwargs):
            super(PostgresOperator, self).__init__(*args, **kwargs)
            self.sql = sql
            self.postgres_conn_id = postgres_conn_id
            self.autocommit = autocommit
            self.parameters = parameters
            self.database = database

        def execute(self, context):
            self.log.info('Executing: %s', self.sql)
            self.hook = PostgresHook(postgres_conn_id=self.postgres_conn_id,
                                     schema=self.database)
            self.hook.run(self.sql, self.autocommit, parameters=self.parameters)
            for output in self.hook.conn.notices:
                self.log.info(output)
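For context, a minimal usage sketch of the operator documented above. Hedged: this assumes a running Airflow 1.10 installation with a `postgres_default` connection configured; the DAG id, table name and SQL are illustrative only.

```
from datetime import datetime

from airflow import DAG
from airflow.operators.postgres_operator import PostgresOperator

dag = DAG(dag_id='postgres_demo', start_date=datetime(2018, 8, 1),
          schedule_interval='@daily')

# A single statement is passed here; a list of statements or a path to a
# templated '*.sql' file would be accepted as well.
create_table = PostgresOperator(
    task_id='create_table',
    sql='CREATE TABLE IF NOT EXISTS demo (id SERIAL PRIMARY KEY)',
    postgres_conn_id='postgres_default',
    dag=dag,
)
```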
[33/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/operators/dataproc_operator.html

diff --git a/_modules/airflow/contrib/operators/dataproc_operator.html b/_modules/airflow/contrib/operators/dataproc_operator.html
index d49719e..f26e459 100644
--- a/_modules/airflow/contrib/operators/dataproc_operator.html
+++ b/_modules/airflow/contrib/operators/dataproc_operator.html

Updated rendered module page "airflow.contrib.operators.dataproc_operator" (theme markup omitted). In the sidebar, "Configuration" is renamed to "How-to Guides", and "Time zones" and "Lineage" entries are added. In the displayed source, the short Apache License header is replaced by the full ASF contributor-agreement header, new imports are added (ntpath, os, re, uuid, datetime.timedelta, GoogleCloudStorageHook, AirflowException, airflow.utils.timezone), and the DataprocClusterCreateOperator docstring gains detailed parameter documentation:

    class DataprocClusterCreateOperator(BaseOperator):
        """
        Create a new cluster on Google Cloud Dataproc. The operator will wait
        until the creation is successful or an error occurs in the creation
        process. [...] Most of the configuration parameters detailed in the
        link are available as a parameter to this operator.

        :param cluster_name: The name of the DataProc cluster to create.
        :type cluster_name: string
        :param project_id: The ID of the google cloud project in which
            to create the cluster
        :type project_id: string
        :param num_workers: The # of workers to spin up
        :type num_workers: int
        :param storage_bucket: The storage bucket to use, setting to None lets
            dataproc generate a custom one for you
        :type storage_bucket: string
        :param init_actions_uris: List of GCS uris containing
            dataproc initialization scripts
        :type init_actions_uris: list[string]
        :param init_action_timeout: Amount of time executable scripts in
            init_actions_uris has to complete
        :type init_action_timeout: string
        :param metadata: dict of key-value google compute engine metadata
            entries to add to all instances
        :type metadata: dict
        :param image_version: the version of software inside the Dataproc cluster
        :type image_version: string
        :param properties: dict of properties to set on
            config files (e.g. spark-defaults.conf), see
            https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.clusters#SoftwareConfig
        :type properties: dict
        :param master_machine_type: Compute engine machine type to use for the
            master node
        :type master_machine_type: string
        :param master_disk_size: Disk size for the master node
        :type master_disk_size: int
        :param worker_machine_type: Compute engine machine type to use for the
            worker nodes
        :type worker_machine_type: string
        :param worker_disk_size: Disk size for the worker nodes
        :type worker_disk_size: int
        :param num_preemptible_workers: The # of preemptible worker
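A minimal usage sketch for this operator. Hedged: the project id, cluster name, zone and machine type are placeholders; it assumes an Airflow 1.10 environment with a default Google Cloud connection configured, and that `zone` is a required argument in this version.

```
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.dataproc_operator import DataprocClusterCreateOperator

dag = DAG(dag_id='dataproc_demo', start_date=datetime(2018, 8, 1),
          schedule_interval=None)

# Spins up a small cluster; the operator blocks until the cluster is
# created successfully or creation fails.
create_cluster = DataprocClusterCreateOperator(
    task_id='create_cluster',
    cluster_name='etl-cluster',
    project_id='my-gcp-project',
    num_workers=2,
    zone='europe-west1-b',
    worker_machine_type='n1-standard-2',
    dag=dag,
)
```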
[42/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/hooks/pinot_hook.html

diff --git a/_modules/airflow/contrib/hooks/pinot_hook.html b/_modules/airflow/contrib/hooks/pinot_hook.html
new file mode 100644
index 000..a34b554
--- /dev/null
+++ b/_modules/airflow/contrib/hooks/pinot_hook.html

New rendered module page "airflow.contrib.hooks.pinot_hook" (theme markup omitted). Source code shown on the page:

    import six

    from pinotdb import connect

    from airflow.hooks.dbapi_hook import DbApiHook


    class PinotDbApiHook(DbApiHook):
        """
        Connect to pinot db (https://github.com/linkedin/pinot) to issue pql
        """
        conn_name_attr = 'pinot_broker_conn_id'
        default_conn_name = 'pinot_broker_default'
        supports_autocommit = False

        def __init__(self, *args, **kwargs):
            super(PinotDbApiHook, self).__init__(*args, **kwargs)

        def get_conn(self):
            """
            Establish a connection to pinot broker through pinot dbapi.
            """
            conn = self.get_connection(self.pinot_broker_conn_id)
            pinot_broker_conn = connect(
                host=conn.host,
                port=conn.port,
                path=conn.extra_dejson.get('endpoint', '/pql'),
                scheme=conn.extra_dejson.get('schema', 'http')
            )
            self.log.info('Get the connection to pinot '
                          'broker on {host}'.format(host=conn.host))
            return pinot_broker_conn

        def get_uri(self):
            """
            Get the connection uri for pinot broker.

            e.g: http://localhost:9000/pql
            """
            conn = self.get_connection(getattr(self, self.conn_name_attr))
            host = conn.host
            if conn.port is not None:
                host += ':{port}'.format(port=conn.port)
            conn_type = 'http' if not conn.conn_type else conn.conn_type
            endpoint = conn.extra_dejson.get('endpoint', 'pql')
            return '{conn_type}://{host}/{endpoint}'.format(
                conn_type=conn_type, host=host, endpoint=endpoint)

        def get_records(self, sql):
            """
            Executes the sql and returns a set of records.

            :param sql: the sql statement to be executed (str) or a list of
                sql statements to execute
            :type sql: str
            """
            if six.PY2:
                sql = sql.encode('utf-8')

            with self.get_conn() as cur:
                cur.execute(sql)
                return cur.fetchall()

        def get_first(self, sql):
            """
            Executes the sql and returns the first resulting row.

            :param sql: the sql statement to be executed (str) or a list of
                sql statements to execute
            :type sql: str or list
            """
            if six.PY2:
                sql = sql.encode('utf-8')

            with self.get_conn() as cur:
                cur.execute(sql)
                return cur.fetchone()
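A short usage sketch of the hook above. Hedged: it assumes an Airflow metadata database with a `pinot_broker_default` connection pointing at a Pinot broker, and the table/column names are made up for illustration.

```
from airflow.contrib.hooks.pinot_hook import PinotDbApiHook

# Falls back to the 'pinot_broker_default' connection when no
# pinot_broker_conn_id is passed.
hook = PinotDbApiHook()
rows = hook.get_records('SELECT playerName FROM baseballStats LIMIT 5')
for row in rows:
    print(row)
```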
[18/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/executors/local_executor.html

diff --git a/_modules/airflow/executors/local_executor.html b/_modules/airflow/executors/local_executor.html
index cfbf241..24436b4 100644
--- a/_modules/airflow/executors/local_executor.html
+++ b/_modules/airflow/executors/local_executor.html

Updated rendered module page "airflow.executors.local_executor" (theme markup omitted; the same sidebar changes as in the other pages apply). The displayed source replaces the short license header with the full ASF header, drops the module-level PARALLELISM read from configuration, and adds a module docstring plus worker classes:

    """
    LocalExecutor runs tasks by spawning processes in a controlled fashion in
    different modes. Given that BaseExecutor has the option to receive a
    `parallelism` parameter to limit the number of processes spawned, when this
    parameter is `0` the number of processes that LocalExecutor can spawn is
    unlimited.

    The following strategies are implemented:

    1. Unlimited Parallelism (self.parallelism == 0): In this strategy,
    LocalExecutor will spawn a process every time `execute_async` is called,
    that is, every task submitted to the LocalExecutor will be executed in its
    own process. Once the task is executed and the result stored in the
    `result_queue`, the process terminates. There is no need for a `task_queue`
    in this approach, since as soon as a task is received a new process will be
    allocated to the task. Processes used in this strategy are of class
    LocalWorker.

    2. Limited Parallelism (self.parallelism > 0): In this strategy, the
    LocalExecutor spawns the number of processes equal to the value of
    `self.parallelism` at `start` time, using a `task_queue` to coordinate the
    ingestion of tasks and the work distribution among the workers, which will
    take a task as soon as they are ready. During the lifecycle of the
    LocalExecutor, the worker processes are running waiting for tasks; once the
    LocalExecutor receives the call to shut down the executor, a poison token
    is sent to the workers to terminate them. Processes used in this strategy
    are of class QueuedLocalWorker.

    Arguably, `SequentialExecutor` could be thought of as a LocalExecutor with
    limited parallelism of just 1 worker, i.e. `self.parallelism = 1`. This
    option could lead to the unification of the executor implementations,
    running locally, into just one `LocalExecutor` with multiple modes.
    """

    import multiprocessing
    import subprocess

    from builtins import range

    from airflow.executors.base_executor import BaseExecutor
    from airflow.utils.log.logging_mixin import LoggingMixin
    from airflow.utils.state import State


    class LocalWorker(multiprocessing.Process, LoggingMixin):
        """
        LocalWorker Process implementation to run airflow commands. Executes
        the given command and puts the result into a result queue when done,
        terminating execution.
        """

        def __init__(self, result_queue):
            """
            :param result_queue: the queue to store result states tuples (key, State)
            :type result_queue: multiprocessing.Queue
            """
            super(LocalWorker, self).__init__()
            self.daemon = True
            self.result_queue = result_queue
            self.key = None
            self.command = None

        def execute_work(self, key, command):
            """
            Executes
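The limited-parallelism strategy described in that docstring is essentially a fixed pool of queue-draining workers shut down by poison pills. A self-contained sketch of the pattern using plain multiprocessing (not Airflow's actual classes; the `square` task stands in for running an airflow command):

```
import multiprocessing


def square(x):
    return x * x


def worker(task_queue, result_queue):
    # Drain tasks until the poison pill (None) arrives, mirroring the
    # shutdown protocol QueuedLocalWorker uses.
    while True:
        task = task_queue.get()
        if task is None:
            break
        key, arg = task
        result_queue.put((key, square(arg)))


if __name__ == '__main__':
    tasks = multiprocessing.Queue()
    results = multiprocessing.Queue()
    pool = [multiprocessing.Process(target=worker, args=(tasks, results))
            for _ in range(2)]  # "parallelism" == 2
    for p in pool:
        p.start()
    for i in range(4):
        tasks.put((i, i))
    for _ in pool:
        tasks.put(None)  # one poison pill per worker
    for p in pool:
        p.join()
    for _ in range(4):
        print(results.get())
```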
[35/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/operators/bigquery_operator.html

diff --git a/_modules/airflow/contrib/operators/bigquery_operator.html b/_modules/airflow/contrib/operators/bigquery_operator.html
index 5a74ba1..4055654 100644
--- a/_modules/airflow/contrib/operators/bigquery_operator.html
+++ b/_modules/airflow/contrib/operators/bigquery_operator.html

Updated rendered module page "airflow.contrib.operators.bigquery_operator" (theme markup omitted; same sidebar changes). The displayed source replaces the short license header with the full ASF header, adds imports (json, GoogleCloudStorageHook, _parse_gcs_url) and extends the BigQueryOperator parameter documentation:

    :param create_disposition: Specifies whether the job is allowed to create
        new tables. (default: 'CREATE_IF_NEEDED')
    :type create_disposition: string
    :param allow_large_results: Whether to allow large results.
    :type allow_large_results: boolean
    :param flatten_results: If true and query uses legacy SQL dialect, flattens
        all nested and repeated fields in the query results. ``allow_large_results``
        must be ``true`` if this is set to ``false``. For standard SQL queries, this
        flag is ignored and results are never flattened.
    :type flatten_results: boolean
    :param bigquery_conn_id: reference to a specific BigQuery hook.
    :type bigquery_conn_id: string
    :param delegate_to: The account to impersonate, if any. [...]
    :type udf_config: list
    :param use_legacy_sql: Whether to use legacy SQL (true) or standard SQL (false).
    :type use_legacy_sql: boolean
    :param maximum_billing_tier: Positive integer that serves as a multiplier
        of the basic price. Defaults to None, in which case it uses the value
        set in the project.
    :type maximum_billing_tier: integer
    :param maximum_bytes_billed: Limits the bytes billed for this job.
        Queries that will have bytes billed beyond this limit will fail
        (without incurring a charge). If unspecified, this will be
        set to your project default.
    :type maximum_bytes_billed: float
    :param schema_update_options: Allows the schema of the destination
        table to be updated as a side effect of the load job.
    :type schema_update_options: tuple
    :param query_params: a dictionary containing query parameter types and
        values, passed to BigQuery.
    :type query_params: dict

The class attributes and constructor are updated accordingly (template_ext spacing tweaked, a flatten_results argument added):

    template_fields = ('bql', 'destination_dataset_table')
    template_ext = ('.sql', )
    ui_color = '#e4f0e8'

    @apply_defaults
    def __init__(self,
                 ...
                 destination_dataset_table=False,
                 write_disposition='WRITE_EMPTY',
                 allow_large_results=False,
                 flatten_results=False,
                 bigquery_conn_id='bigquery_default',
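A minimal usage sketch for BigQueryOperator in this release. Hedged: the project, dataset and table names are invented; in 1.10.0 the query is passed as `bql` (as the `template_fields` above show), and a `bigquery_default` connection is assumed.

```
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.bigquery_operator import BigQueryOperator

dag = DAG(dag_id='bq_demo', start_date=datetime(2018, 8, 1),
          schedule_interval='@daily')

# Aggregates a source table into a destination table, overwriting it
# on each run via WRITE_TRUNCATE.
aggregate = BigQueryOperator(
    task_id='aggregate_events',
    bql='SELECT owner, COUNT(*) AS n FROM [my_project:logs.events] GROUP BY owner',
    destination_dataset_table='analytics.daily_event_counts',
    write_disposition='WRITE_TRUNCATE',
    use_legacy_sql=True,
    bigquery_conn_id='bigquery_default',
    dag=dag,
)
```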
[40/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/hooks/spark_jdbc_hook.html

diff --git a/_modules/airflow/contrib/hooks/spark_jdbc_hook.html b/_modules/airflow/contrib/hooks/spark_jdbc_hook.html
new file mode 100644
index 000..c22b1e5
--- /dev/null
+++ b/_modules/airflow/contrib/hooks/spark_jdbc_hook.html

New rendered module page "airflow.contrib.hooks.spark_jdbc_hook" (theme markup omitted). Source code shown on the page:

    import os
    from airflow.contrib.hooks.spark_submit_hook import SparkSubmitHook
    from airflow.exceptions import AirflowException


    class SparkJDBCHook(SparkSubmitHook):
        """
        This hook extends the SparkSubmitHook specifically for performing data
        transfers to/from JDBC-based databases with Apache Spark.

        :param spark_app_name: Name of the job (default airflow-spark-jdbc)
        :type spark_app_name: str
        :param spark_conn_id: Connection id as configured in Airflow administration
        :type spark_conn_id: str
        :param spark_conf: Any additional Spark configuration properties
        :type spark_conf: dict
        :param spark_py_files: Additional python files used (.zip, .egg, or .py)
        :type spark_py_files: str
        :param spark_files: Additional files to upload to the container running
            the job
        :type spark_files: str
        :param spark_jars: Additional jars to upload and add to the driver and
            executor classpath
        :type spark_jars: str
        :param num_executors: number of executors to run. This should be set so
            as to manage the number of connections made with the JDBC database
        :type num_executors: int
        :param executor_cores: Number of cores per executor
        :type executor_cores: int
        :param executor_memory: Memory per executor (e.g. 1000M, 2G)
        :type executor_memory: str
        :param driver_memory: Memory allocated to the driver (e.g. 1000M, 2G)
        :type driver_memory: str
        :param verbose: Whether to pass the verbose flag to spark-submit for
            debugging
        :type verbose: bool
        :param keytab: Full path to the file that contains the keytab
        :type keytab: str
        :param principal: The name of the kerberos principal used for keytab
        :type principal: str
        :param cmd_type: Which way the data should flow. 2 possible values:
            spark_to_jdbc: data written by spark from metastore to jdbc
            jdbc_to_spark: data written by spark from jdbc to metastore
        :type cmd_type: str
        :param jdbc_table: The name of the JDBC table
        :type jdbc_table: str
        :param jdbc_conn_id: Connection id used for connection to JDBC database
        :type jdbc_conn_id: str
        :param jdbc_driver: Name of the JDBC driver to use
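A usage sketch for the hook above. Heavily hedged: the connection ids, table names and the `submit_jdbc_job()` call reflect my reading of this hook's API in 1.10 and should be checked against the full source; spark-submit must be available on the worker's PATH.

```
from airflow.contrib.hooks.spark_jdbc_hook import SparkJDBCHook

# Pull a JDBC table into the Spark metastore (cmd_type='jdbc_to_spark').
# 'postgres_default' and the table names are illustrative.
hook = SparkJDBCHook(
    cmd_type='jdbc_to_spark',
    jdbc_table='public.orders',
    jdbc_conn_id='postgres_default',
    jdbc_driver='org.postgresql.Driver',
    metastore_table='orders_raw',
    num_executors=2,
)
hook.submit_jdbc_job()
```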
[51/51] [partial] incubator-airflow-site git commit: 1.10.0
1.10.0

Project:   http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/repo
Commit:    http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/commit/11437c14
Tree:      http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/tree/11437c14
Diff:      http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/diff/11437c14
Branch:    refs/heads/asf-site
Commit:    11437c14a3607f6b85b6610dc776f55d78e7d767
Parents:   28a3eb6
Author:    Kaxil Naik
Authored:  Mon Aug 27 17:22:22 2018 +0100
Committer: Kaxil Naik
Committed: Mon Aug 27 17:22:22 2018 +0100

Diffstat:
 _images/connection_create.png                    | Bin 0 -> 41547 bytes
 _images/connection_edit.png                      | Bin 0 -> 53636 bytes
 _images/connections.png                          | Bin 93057 -> 48442 bytes
 _modules/S3_hook.html                            | 489 -
 .../contrib/executors/mesos_executor.html        |  89 +-
 .../contrib/hooks/aws_dynamodb_hook.html         | 300 +
 _modules/airflow/contrib/hooks/aws_hook.html     | 410 +
 .../airflow/contrib/hooks/aws_lambda_hook.html   | 302 +
 .../airflow/contrib/hooks/bigquery_hook.html     | 889 +-
 .../airflow/contrib/hooks/databricks_hook.html   | 462 +
 .../airflow/contrib/hooks/datadog_hook.html      | 375 +
 .../airflow/contrib/hooks/datastore_hook.html    |  52 +-
 .../contrib/hooks/discord_webhook_hook.html      | 375 +
 _modules/airflow/contrib/hooks/emr_hook.html     |  35 +-
 _modules/airflow/contrib/hooks/fs_hook.html      | 281 +
 _modules/airflow/contrib/hooks/ftp_hook.html     | 494 +
 .../contrib/hooks/gcp_api_base_hook.html         | 379 +
 .../contrib/hooks/gcp_dataflow_hook.html         | 196 +-
 .../contrib/hooks/gcp_dataproc_hook.html         | 463 +
 .../contrib/hooks/gcp_mlengine_hook.html         |  20 +-
 .../airflow/contrib/hooks/gcp_pubsub_hook.html   | 519 +
 _modules/airflow/contrib/hooks/gcs_hook.html     | 315 +-
 .../airflow/contrib/hooks/jenkins_hook.html      | 283 +
 _modules/airflow/contrib/hooks/jira_hook.html    | 319 +
 _modules/airflow/contrib/hooks/pinot_hook.html   | 340 +
 _modules/airflow/contrib/hooks/qubole_hook.html  | 449 +
 _modules/airflow/contrib/hooks/redis_hook.html   | 328 +
 .../airflow/contrib/hooks/redshift_hook.html     | 348 +
 _modules/airflow/contrib/hooks/sftp_hook.html    | 404 +
 .../contrib/hooks/slack_webhook_hook.html        | 364 +
 .../airflow/contrib/hooks/spark_jdbc_hook.html   | 481 +
 .../airflow/contrib/hooks/spark_sql_hook.html    | 396 +
 .../contrib/hooks/spark_submit_hook.html         | 799 ++
 _modules/airflow/contrib/hooks/sqoop_hook.html   | 580 +
 _modules/airflow/contrib/hooks/ssh_hook.html     | 470 +
 .../airflow/contrib/hooks/vertica_hook.html      | 288 +
 _modules/airflow/contrib/hooks/wasb_hook.html    |  94 +-
 _modules/airflow/contrib/kubernetes/secret.html  | 276 +
 .../contrib/operators/awsbatch_operator.html     | 403 +
 .../operators/bigquery_check_operator.html       |  37 +-
 .../contrib/operators/bigquery_get_data.html     | 351 +
 .../contrib/operators/bigquery_operator.html     | 399 +-
 .../bigquery_table_delete_operator.html          | 301 +
 .../contrib/operators/bigquery_to_bigquery.html  |  39 +-
 .../contrib/operators/bigquery_to_gcs.html       |  39 +-
 .../contrib/operators/databricks_operator.html   |  45 +-
 .../contrib/operators/dataflow_operator.html     | 241 +-
 .../contrib/operators/dataproc_operator.html     | 866 +-
 .../operators/datastore_export_operator.html     | 344 +
 .../operators/datastore_import_operator.html     | 332 +
 .../operators/discord_webhook_operator.html      | 333 +
 .../airflow/contrib/operators/ecs_operator.html  |  31 +-
 .../operators/emr_add_steps_operator.html        |  33 +-
 .../operators/emr_create_job_flow_operator.html  |  33 +-
 .../emr_terminate_job_flow_operator.html         |  31 +-
 .../airflow/contrib/operators/file_to_gcs.html   | 310 +
 .../airflow/contrib/operators/file_to_wasb.html  |  35 +-
 .../operators/gcs_download_operator.html         |  56 +-
 .../contrib/operators/gcs_list_operator.html     | 326 +
 .../airflow/contrib/operators/gcs_operator.html  | 357 +
 .../airflow/contrib/operators/gcs_to_bq.html     | 288 +-
 .../airflow/contrib/operators/gcs_to_gcs.html    | 365 +
 .../airflow/contrib/operators/gcs_to_s3.html     | 347 +
 .../contrib/operators/hipchat_operator.html      |  35 +-
 .../operators/jenkins_job_trigger_operator.html  | 484 +
 .../contrib/operators/jira_operator.html         | 329 +
 .../operators/kubernetes_pod_operator.html       | 362 +
 .../contrib/operators/mlengine_operator.html     | 260 +-
 .../airflow/contrib/operators/mysql_to_gcs.html  | 524 +
 .../operators/postgres_to_gcs_operator.html      | 481 +
 .../contrib/operators/pubsub_operator.html       | 669 ++
 .../contrib/operators/qubole_operator.html       | 411 +
 .../contrib/operators/s3_list_operator.
[10/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/operators/http_operator.html

diff --git a/_modules/airflow/operators/http_operator.html b/_modules/airflow/operators/http_operator.html
new file mode 100644
index 000..5890dde
--- /dev/null
+++ b/_modules/airflow/operators/http_operator.html

New rendered module page "airflow.operators.http_operator" (theme markup omitted). Source code shown on the page:

    from airflow.exceptions import AirflowException
    from airflow.hooks.http_hook import HttpHook
    from airflow.models import BaseOperator
    from airflow.utils.decorators import apply_defaults


    class SimpleHttpOperator(BaseOperator):
        """
        Calls an endpoint on an HTTP system to execute an action

        :param http_conn_id: The connection to run the sensor against
        :type http_conn_id: string
        :param endpoint: The relative part of the full url
        :type endpoint: string
        :param method: The HTTP method to use, default = POST
        :type method: string
        :param data: The data to pass. POST-data in POST/PUT and params
            in the URL for a GET request.
        :type data: For POST/PUT, depends on the content-type parameter,
            for GET a dictionary of key/value string pairs
        :param headers: The HTTP headers to be added to the GET request
        :type headers: a dictionary of string key/value pairs
        :param response_check: A check against the 'requests' response object.
            Returns True for 'pass' and False otherwise.
        :type response_check: A lambda or defined function.
        :param extra_options: Extra options for the 'requests' library, see the
            'requests' documentation (options to modify timeout, ssl, etc.)
        :type extra_options: A dictionary of options, where key is string and
            value depends on the option that's being modified.
        """

        template_fields = ('endpoint', 'data',)
        template_ext = ()
        ui_color = '#f4a460'

        @apply_defaults
        def __init__(self,
                     endpoint,
                     method='POST',
                     data=None,
                     headers=None,
                     response_check=None,
                     extra_options=None,
                     xcom_push=False,
                     http_conn_id='http_default', *args, **kwargs):
            """
            If xcom_push is True, response of an HTTP request will also
            be pushed to an XCom.
            """
            super(SimpleHttpOperator, self).__init__(*args, **kwargs)
            self.http_conn_id = http_conn_id
            self.method = method
            self.endpoint = endpoint
            self.headers = headers or {}
            self.data = data or {}
            self.response_check = response_check
            self.extra_options = extra_options or {}
            self.xcom_push_flag = xcom_push
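A minimal usage sketch for this operator. Hedged: the connection id, endpoint and payload are illustrative; an `http_default` connection pointing at the target service is assumed.

```
import json
from datetime import datetime

from airflow import DAG
from airflow.operators.http_operator import SimpleHttpOperator

dag = DAG(dag_id='http_demo', start_date=datetime(2018, 8, 1),
          schedule_interval=None)

# POSTs a JSON payload; the task fails unless the service answers 200.
post_event = SimpleHttpOperator(
    task_id='post_event',
    http_conn_id='http_default',
    endpoint='api/v1/events',
    data=json.dumps({'event': 'ping'}),
    headers={'Content-Type': 'application/json'},
    response_check=lambda response: response.status_code == 200,
    dag=dag,
)
```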
[48/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/hooks/databricks_hook.html

diff --git a/_modules/airflow/contrib/hooks/databricks_hook.html b/_modules/airflow/contrib/hooks/databricks_hook.html
new file mode 100644
index 000..00ae02f
--- /dev/null
+++ b/_modules/airflow/contrib/hooks/databricks_hook.html

New rendered module page "airflow.contrib.hooks.databricks_hook" (theme markup omitted). Source code shown on the page:

    import requests

    from airflow import __version__
    from airflow.exceptions import AirflowException
    from airflow.hooks.base_hook import BaseHook
    from requests import exceptions as requests_exceptions
    from requests.auth import AuthBase

    from airflow.utils.log.logging_mixin import LoggingMixin

    try:
        from urllib import parse as urlparse
    except ImportError:
        import urlparse


    SUBMIT_RUN_ENDPOINT = ('POST', 'api/2.0/jobs/runs/submit')
    GET_RUN_ENDPOINT = ('GET', 'api/2.0/jobs/runs/get')
    CANCEL_RUN_ENDPOINT = ('POST', 'api/2.0/jobs/runs/cancel')
    USER_AGENT_HEADER = {'user-agent': 'airflow-{v}'.format(v=__version__)}


    class DatabricksHook(BaseHook, LoggingMixin):
        """
        Interact with Databricks.
        """
        def __init__(
                self,
                databricks_conn_id='databricks_default',
                timeout_seconds=180,
                retry_limit=3):
            """
            :param databricks_conn_id: The name of the databricks connection to use.
            :type databricks_conn_id: string
            :param timeout_seconds: The amount of time in seconds the requests
                library will wait before timing out.
            :type timeout_seconds: int
            :param retry_limit: The number of times to retry the connection in
                case of service outages.
            :type retry_limit: int
            """
            self.databricks_conn_id = databricks_conn_id
            self.databricks_conn = self.get_connection(databricks_conn_id)
            self.timeout_seconds = timeout_seconds
            assert retry_limit >= 1, 'Retry limit must be greater than or equal to 1'
            self.retry_limit = retry_limit

        def _parse_host(self, host):
            """
            The purpose of this function is to be robust to improper connections
            settings provided by users, specifically in the host field.

            For example -- when users supply ``https://xx.cloud.databricks.com``
            as the host, we must strip out the protocol to get the host:

                h = DatabricksHook()
                assert h._parse_host('https://xx.cloud.databricks.com') == \
                    'xx.cloud.databricks.com'

            In the case where users supply the correct ``xx.cloud.databricks.com``
            as the host, this function is a no-op:

                assert h._parse_host('xx.cloud.databricks.com') ==
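The host-normalization trick `_parse_host` describes boils down to keeping only the network location when a full URL was pasted into the host field. A standalone sketch of the same idea (not the hook's exact code):

```
try:
    from urllib import parse as urlparse  # Python 3
except ImportError:
    import urlparse  # Python 2, mirroring the hook's own import


def parse_host(host):
    # If the value parses as a URL with a scheme, keep only the netloc;
    # otherwise the value is already a bare host and passes through.
    parsed = urlparse.urlparse(host)
    return parsed.netloc if parsed.netloc else host


assert parse_host('https://xx.cloud.databricks.com') == 'xx.cloud.databricks.com'
assert parse_host('xx.cloud.databricks.com') == 'xx.cloud.databricks.com'
```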
[38/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/hooks/sqoop_hook.html

diff --git a/_modules/airflow/contrib/hooks/sqoop_hook.html b/_modules/airflow/contrib/hooks/sqoop_hook.html
new file mode 100644
index 000..4563030
--- /dev/null
+++ b/_modules/airflow/contrib/hooks/sqoop_hook.html

New rendered module page "airflow.contrib.hooks.sqoop_hook" (theme markup omitted). Source code shown on the page:

    """
    This module contains a sqoop 1.x hook
    """
    import subprocess

    from airflow.exceptions import AirflowException
    from airflow.hooks.base_hook import BaseHook
    from airflow.utils.log.logging_mixin import LoggingMixin
    from copy import deepcopy


    class SqoopHook(BaseHook, LoggingMixin):
        """
        This hook is a wrapper around the sqoop 1 binary. To be able to use the
        hook it is required that 'sqoop' is in the PATH.

        Additional arguments that can be passed via the 'extra' JSON field of
        the sqoop connection:

        * job_tracker: Job tracker local|jobtracker:port.
        * namenode: Namenode.
        * lib_jars: Comma separated jar files to include in the classpath.
        * files: Comma separated files to be copied to the map reduce cluster.
        * archives: Comma separated archives to be unarchived on the compute
          machines.
        * password_file: Path to file containing the password.

        :param conn_id: Reference to the sqoop connection.
        :type conn_id: str
        :param verbose: Set sqoop to verbose.
        :type verbose: bool
        :param num_mappers: Number of map tasks to import in parallel.
        :type num_mappers: int
        :param properties: Properties to set via the -D argument
        :type properties: dict
        """

        def __init__(self, conn_id='sqoop_default', verbose=False,
                     num_mappers=None, hcatalog_database=None,
                     hcatalog_table=None, properties=None):
            # No mutable types in the default parameters
            self.conn = self.get_connection(conn_id)
            connection_parameters = self.conn.extra_dejson
            self.job_tracker = connection_parameters.get('job_tracker', None)
            self.namenode = connection_parameters.get('namenode', None)
            self.libjars = connection_parameters.get('libjars', None)
            self.files = connection_parameters.get('files', None)
            self.archives = connection_parameters.get('archives', None)
            self.password_file = connection_parameters.get('password_file', None)
            self.hcatalog_database = hcatalog_database
            self.hcatalog_table = hcatalog_table
            self.verbose = verbose
            self.num_mappers = num_mappers
            self.properties = properties or {}
            self.log.info('Using connection to: {}:{}/{}'.format(
                self.conn.host, self.conn.port, self.conn.schema))
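A short usage sketch of the hook. Hedged: it assumes the `sqoop` binary is on the worker's PATH, a `sqoop_default` connection exists in the Airflow metadata DB, and `import_table` is called with the argument names used by this hook in 1.10; the table and target directory are illustrative.

```
from airflow.contrib.hooks.sqoop_hook import SqoopHook

# Imports a relational table into HDFS with four parallel map tasks.
hook = SqoopHook(conn_id='sqoop_default', verbose=True, num_mappers=4)
hook.import_table(table='employees', target_dir='/tmp/employees')
```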
[47/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/hooks/discord_webhook_hook.html

diff --git a/_modules/airflow/contrib/hooks/discord_webhook_hook.html b/_modules/airflow/contrib/hooks/discord_webhook_hook.html
new file mode 100644
index 000..1115b8c
--- /dev/null
+++ b/_modules/airflow/contrib/hooks/discord_webhook_hook.html

New rendered module page "airflow.contrib.hooks.discord_webhook_hook" (theme markup omitted). Source code shown on the page:

    import json
    import re

    from airflow.hooks.http_hook import HttpHook
    from airflow.exceptions import AirflowException


    class DiscordWebhookHook(HttpHook):
        """
        This hook allows you to post messages to Discord using incoming webhooks.
        Takes a Discord connection ID with a default relative webhook endpoint. The
        default endpoint can be overridden using the webhook_endpoint parameter
        (https://discordapp.com/developers/docs/resources/webhook).

        Each Discord webhook can be pre-configured to use a specific username and
        avatar_url. You can override these defaults in this hook.

        :param http_conn_id: Http connection ID with host as "https://discord.com/api/"
            and default webhook endpoint in the extra field in the form of
            {"webhook_endpoint": "webhooks/{webhook.id}/{webhook.token}"}
        :type http_conn_id: str
        :param webhook_endpoint: Discord webhook endpoint in the form of
            "webhooks/{webhook.id}/{webhook.token}"
        :type webhook_endpoint: str
        :param message: The message you want to send to your Discord channel
            (max 2000 characters)
        :type message: str
        :param username: Override the default username of the webhook
        :type username: str
        :param avatar_url: Override the default avatar of the webhook
        :type avatar_url: str
        :param tts: Is a text-to-speech message
        :type tts: bool
        :param proxy: Proxy to use to make the Discord webhook call
        :type proxy: str
        """

        def __init__(self,
                     http_conn_id=None,
                     webhook_endpoint=None,
                     message='',
                     username=None,
                     avatar_url=None,
                     tts=False,
                     proxy=None,
                     *args,
                     **kwargs):
            super(DiscordWebhookHook, self).__init__(*args, **kwargs)
            self.http_conn_id = http_conn_id
            self.webhook_endpoint = self._get_webhook_endpoint(http_conn_id,
                                                               webhook_endpoint)
            self.message = message
            self.username = username
            self.avatar_url = avatar_url
            self.tts = tts
            self.proxy = proxy
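A usage sketch of the hook. Hedged: `discord_default` is a hypothetical connection whose host is the Discord API base URL and whose extras carry a `webhook_endpoint`, per the docstring above; the message and username are invented.

```
from airflow.contrib.hooks.discord_webhook_hook import DiscordWebhookHook

# Posts a short status message through the pre-configured webhook.
hook = DiscordWebhookHook(
    http_conn_id='discord_default',
    message='Nightly ETL finished',
    username='airflow-bot',
)
hook.execute()  # builds the payload and POSTs it to the webhook endpoint
```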
[30/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/operators/gcs_to_bq.html

diff --git a/_modules/airflow/contrib/operators/gcs_to_bq.html b/_modules/airflow/contrib/operators/gcs_to_bq.html
index 1d61268..db2b756 100644
--- a/_modules/airflow/contrib/operators/gcs_to_bq.html
+++ b/_modules/airflow/contrib/operators/gcs_to_bq.html

Updated rendered module page "airflow.contrib.operators.gcs_to_bq" (theme markup omitted; same sidebar changes). The displayed source replaces the short license header with the full ASF header and expands the GoogleCloudStorageToBigQueryOperator docstring:

    class GoogleCloudStorageToBigQueryOperator(BaseOperator):
        """
        Loads files from Google cloud storage into BigQuery.

        The schema to be used for the BigQuery table may be specified in one of
        two ways. You may either directly pass the schema fields in, or you may
        point the operator to a Google cloud storage object name. The object in
        Google cloud storage must be a JSON file with the schema fields in it.

        :param bucket: The bucket to load from.
        :type bucket: string
        :param source_objects: List of Google cloud storage URIs to load from.
            If source_format is 'DATASTORE_BACKUP', the list must only contain
            a single URI.
        :type source_objects: list
        :param destination_project_dataset_table: The dotted
            (<project>.)<dataset>.<table> BigQuery table to load data into. If
            <project> is not included, project will be the project defined in
            the connection json.
        :type destination_project_dataset_table: string
        :param schema_fields: If set, the schema field list as defined here:
            https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load
            Should not be set when source_format is 'DATASTORE_BACKUP'.
        :type schema_fields: list
        :param schema_object: If set, a GCS object path pointing to a .json
            file that contains the schema for the table.
        :type schema_object: string
        :param source_format: File format to export.
        :type source_format: string
        :param compression: [Optional] The compression type of the data source.
            Possible values include GZIP and NONE. The default value is NONE.
            This setting is ignored for Google Cloud Bigtable, Google Cloud
            Datastore backups and Avro formats.
        :type compression: string
        :param create_disposition: The create disposition if the table doesn't
            exist.
        :type create_disposition: string
        :param skip_leading_rows: Number of rows to skip when loading from a CSV.
        :type skip_leading_rows: int
        :param write_disposition: The write disposition if the table already
            exists.
        :type write_disposition: string
        :param field_delimiter: The delimiter to use when loading from a CSV.
        :type field_delimiter: string
        :param max_bad_records: The maximum number of bad records that BigQuery
            can ignore when running the job.
        :type max_bad_records: int
        :param quote_character: The value that is used to quote data sections
            in a CSV file.
        :type quote_character: string
        :param ignore_unknown_values: [Optional] Indicates if BigQuery should
            allow extra values that are not represented in the table schema.
            If true, the extra values are ignored. If false, records with
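A minimal usage sketch for this operator. Hedged: the bucket, object paths and destination table are invented; a default Google Cloud connection and an existing schema JSON object in the bucket are assumed.

```
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator

dag = DAG(dag_id='gcs_to_bq_demo', start_date=datetime(2018, 8, 1),
          schedule_interval='@daily')

# Loads CSVs matching a wildcard into BigQuery, reading the table schema
# from a JSON object in the same bucket.
load_sales = GoogleCloudStorageToBigQueryOperator(
    task_id='load_sales',
    bucket='my-bucket',
    source_objects=['data/sales-*.csv'],
    destination_project_dataset_table='analytics.sales',
    schema_object='schemas/sales.json',
    source_format='CSV',
    skip_leading_rows=1,
    write_disposition='WRITE_TRUNCATE',
    dag=dag,
)
```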
[32/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/operators/datastore_export_operator.html

diff --git a/_modules/airflow/contrib/operators/datastore_export_operator.html b/_modules/airflow/contrib/operators/datastore_export_operator.html
new file mode 100644
index 000..94e2d02
--- /dev/null
+++ b/_modules/airflow/contrib/operators/datastore_export_operator.html

New rendered module page "airflow.contrib.operators.datastore_export_operator" (theme markup omitted). Source code shown on the page:

    from airflow.contrib.hooks.datastore_hook import DatastoreHook
    from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook
    from airflow.exceptions import AirflowException
    from airflow.models import BaseOperator
    from airflow.utils.decorators import apply_defaults


    class DatastoreExportOperator(BaseOperator):
        """
        Export entities from Google Cloud Datastore to Cloud Storage

        :param bucket: name of the cloud storage bucket to back up data
        :type bucket: string
        :param namespace: optional namespace path in the specified Cloud
            Storage bucket to back up data. If this namespace does not exist
            in GCS, it will be created.
        :type namespace: str
        :param datastore_conn_id: the name of the Datastore connection id to use
        :type datastore_conn_id: string
        :param cloud_storage_conn_id: the name of the cloud storage connection
            id to force-write backup
        :type cloud_storage_conn_id: string
        :param delegate_to: The account to impersonate, if any.
            For this to work, the service account making the request must have
            domain-wide delegation enabled.
        :type delegate_to: string
        :param entity_filter: description of what data from the project is
            included in the export, refer to
            https://cloud.google.com/datastore/docs/reference/rest/Shared.Types/EntityFilter
        :type entity_filter: dict
        :param labels: client-assigned labels for cloud storage
        :type labels: dict
        :param polling_interval_in_seconds: number of seconds to wait before
            polling for execution status again
        :type polling_interval_in_seconds: int
        :param overwrite_existing: if the storage bucket + namespace is not
            empty, it will be emptied prior to exports. This enables
            overwriting existing backups.
        :type overwrite_existing: bool
        :param xcom_push: push operation name to xcom for reference
        :type xcom_push: bool
        """

        @apply_defaults
        def __init__(self,
                     bucket,
                     namespace=None,
                     datastore_conn_id='google_cloud_default',
                     cloud_storage_conn_id='google_cloud_default',
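A minimal usage sketch for this operator. Hedged: the bucket and namespace are invented, and the `google_cloud_default` connection with Datastore and GCS permissions is assumed.

```
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.datastore_export_operator import DatastoreExportOperator

dag = DAG(dag_id='datastore_backup', start_date=datetime(2018, 8, 1),
          schedule_interval='@daily')

# Exports all entities to gs://my-backup-bucket/nightly, emptying any
# previous backup under that namespace first.
backup = DatastoreExportOperator(
    task_id='backup_datastore',
    bucket='my-backup-bucket',
    namespace='nightly',
    overwrite_existing=True,
    dag=dag,
)
```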
[08/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/operators/python_operator.html

diff --git a/_modules/airflow/operators/python_operator.html b/_modules/airflow/operators/python_operator.html
new file mode 100644
index 000..fb1e29d
--- /dev/null
+++ b/_modules/airflow/operators/python_operator.html

New rendered module page "airflow.operators.python_operator" (theme markup omitted). Source code shown on the page:

    from builtins import str
    import dill
    import inspect
    import os
    import pickle
    import subprocess
    import sys
    import types

    from airflow.exceptions import AirflowException
    from airflow.models import BaseOperator, SkipMixin
    from airflow.utils.decorators import apply_defaults
    from airflow.utils.file import TemporaryDirectory

    from textwrap import dedent


    class PythonOperator(BaseOperator):
        """
        Executes a Python callable

        :param python_callable: A reference to an object that is callable
        :type python_callable: python callable
        :param op_kwargs: a dictionary of keyword arguments that will get
            unpacked in your function
        :type op_kwargs: dict
        :param op_args: a list of positional arguments that will get unpacked
            when calling your callable
        :type op_args: list
        :param provide_context: if set to true, Airflow will pass a set of
            keyword arguments that can be used in your function. This set of
            kwargs correspond exactly to what you can use in your jinja
            templates. For this to work, you need to define `**kwargs` in your
            function header.
        :type provide_context: bool
        :param templates_dict: a dictionary where the values are templates that
            will get templated by the Airflow engine sometime between
            ``__init__`` and ``execute`` takes place and are made available
            in your callable's context after the template has been applied
        :type templates_dict: dict of str
        :param templates_exts: a list of file extensions to resolve while
            processing templated fields, for examples ``['.sql', '.hql']``
        :type templates_exts: list(str)
        """
        template_fields = ('templates_dict',)
        template_ext = tuple()
        ui_color = '#ffefeb'

        @apply_defaults
        def __init__(
                self,
                python_callable,
                op_args=None,
                op_kwargs=None,
                provide_context=False,
                templates_dict=None,
                templates_exts=None,
                *args, **kwargs):
            super(PythonOperator, self).__init__(*args, **kwargs)
            if not callable(python_callable):
                raise AirflowException('`python_callable` param must be callable')
            self.python_callable = python_callable
            self.op_args = op_args or
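A minimal usage sketch showing `provide_context` in action, as described in the docstring above (the DAG id and callable are illustrative):

```
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

dag = DAG(dag_id='python_demo', start_date=datetime(2018, 8, 1),
          schedule_interval='@daily')


def print_context(ds, **kwargs):
    # With provide_context=True the callable receives the same kwargs
    # that are available in jinja templates, e.g. 'ds'.
    print('Execution date is ' + ds)
    return 'done'


print_the_date = PythonOperator(
    task_id='print_the_date',
    python_callable=print_context,
    provide_context=True,
    dag=dag,
)
```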
[41/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/hooks/redshift_hook.html -- diff --git a/_modules/airflow/contrib/hooks/redshift_hook.html b/_modules/airflow/contrib/hooks/redshift_hook.html new file mode 100644 index 000..314fd9e --- /dev/null +++ b/_modules/airflow/contrib/hooks/redshift_hook.html @@ -0,0 +1,348 @@ + + + + + + + + + + + airflow.contrib.hooks.redshift_hook Airflow Documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Airflow + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Project +License +Quick Start +Installation +Tutorial +How-to Guides +UI / Screenshots +Concepts +Data Profiling +Command Line Interface +Scheduling Triggers +Plugins +Security +Time zones +Experimental Rest API +Integration +Lineage +FAQ +API Reference + + + + + + + + + + + + + + + Airflow + + + + + + + + + + + + + + + + + + + + + + + + + + + + Docs + + Module code + + airflow.contrib.hooks.redshift_hook + + + + + + + + + + + + + + http://schema.org/Article;> + + + Source code for airflow.contrib.hooks.redshift_hook +# -*- coding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# License); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
from airflow.contrib.hooks.aws_hook import AwsHook


class RedshiftHook(AwsHook):
    """
    Interact with AWS Redshift, using the boto3 library
    """
    def get_conn(self):
        return self.get_client_type('redshift')

    # TODO: Wrap create_cluster_snapshot
    def cluster_status(self, cluster_identifier):
        """
        Return status of a cluster

        :param cluster_identifier: unique identifier of a cluster
        :type cluster_identifier: str
        """
        conn = self.get_conn()
        try:
            response = conn.describe_clusters(
                ClusterIdentifier=cluster_identifier)['Clusters']
            return response[0]['ClusterStatus'] if response else None
        except conn.exceptions.ClusterNotFoundFault:
            return 'cluster_not_found'

    def delete_cluster(
            self,
            cluster_identifier,
            skip_final_cluster_snapshot=True,
            final_cluster_snapshot_identifier=''):
        """
        Delete a cluster and optionally create a snapshot

        :param cluster_identifier: unique identifier of a cluster
        :type cluster_identifier: str
        :param skip_final_cluster_snapshot: determines cluster snapshot creation
        :type skip_final_cluster_snapshot: bool
        :param final_cluster_snapshot_identifier: name of final cluster snapshot
        :type final_cluster_snapshot_identifier: str
        """
        response = self.get_conn().delete_cluster(
            ClusterIdentifier=cluster_identifier,
            SkipFinalClusterSnapshot=skip_final_cluster_snapshot,
            FinalClusterSnapshotIdentifier=final_cluster_snapshot_identifier
        )
        return response['Cluster'] if response['Cluster'] else None

    def describe_cluster_snapshots(self, cluster_identifier):
        """
        Gets a list of snapshots for a cluster

        :param cluster_identifier: unique identifier of a cluster
        :type cluster_identifier: str
        """
        response = self.get_conn().describe_cluster_snapshots(
            ClusterIdentifier=cluster_identifier
        )
        if 'Snapshots' not in response:
            return None
        snapshots = response['Snapshots']
        snapshots = filter(lambda x: x['Status'], snapshots)
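For reference, a minimal usage sketch of the hook above (not part of this commit; the connection ID and cluster identifier are illustrative assumptions):

{code:python}
from airflow.contrib.hooks.redshift_hook import RedshiftHook

# Assumes an 'aws_default' Airflow connection with Redshift permissions.
hook = RedshiftHook(aws_conn_id='aws_default')
status = hook.cluster_status('analytics-cluster')  # hypothetical identifier
if status == 'cluster_not_found':
    print('No such cluster')
else:
    print('Cluster state: {}'.format(status))
{code}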
[07/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/operators/s3_file_transform_operator.html
--
diff --git a/_modules/airflow/operators/s3_file_transform_operator.html b/_modules/airflow/operators/s3_file_transform_operator.html
index 8db7bc2..366400e 100644
--- a/_modules/airflow/operators/s3_file_transform_operator.html
+++ b/_modules/airflow/operators/s3_file_transform_operator.html
@@ -91,7 +91,7 @@
 Quick Start
 Installation
 Tutorial
-Configuration
+How-to Guides
 UI / Screenshots
 Concepts
 Data Profiling
@@ -99,8 +99,10 @@
 Scheduling Triggers
 Plugins
 Security
+Time zones
 Experimental Rest API
 Integration
+Lineage
 FAQ
 API Reference
@@ -169,17 +171,22 @@ Source code for airflow.operators.s3_file_transform_operator
 # -*- coding: utf-8 -*-
 #
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.

 from tempfile import NamedTemporaryFile
 import subprocess
@@ -200,10 +207,13 @@
 The locations of the source and the destination files in the local
 filesystem is provided as an first and second arguments to the
 transformation script. The transformation script is expected to read the
-data from source , transform it and write the output to the local
+data from source, transform it and write the output to the local
 destination file. The operator then takes over control and uploads the
 local destination file to S3.

+S3 Select is also available to filter the source contents. Users can
+omit the transformation script if S3 Select expression is specified.
+
 :param source_s3_key: The key to be retrieved from S3
 :type source_s3_key: str
 :param source_aws_conn_id: source s3 connection
@@ -216,6 +226,8 @@
 :type replace: bool
 :param transform_script: location of the executable transformation script
 :type transform_script: str
+:param select_expression: S3 Select expression
+:type select_expression: str

 template_fields = ('source_s3_key', 'dest_s3_key')

@@ -227,7 +239,8 @@
         self,
         source_s3_key,
         dest_s3_key,
-        transform_script,
+        transform_script=None,
+        select_expression=None,
         source_aws_conn_id='aws_default',
         dest_aws_conn_id='aws_default',
         replace=False,
@@ -239,34 +252,54 @@
         self.dest_aws_conn_id = dest_aws_conn_id
         self.replace = replace
         self.transform_script = transform_script
+        self.select_expression = select_expression

     def execute(self, context):
+        if self.transform_script is None and self.select_expression is None:
+            raise AirflowException(
+                "Either transform_script or select_expression must be specified")
+
         source_s3 = S3Hook(aws_conn_id=self.source_aws_conn_id)
         dest_s3 = S3Hook(aws_conn_id=self.dest_aws_conn_id)
+
         self.log.info("Downloading source S3 file %s", self.source_s3_key)
         if not source_s3.check_for_key(self.source_s3_key):
-            raise AirflowException("The source key {0} does not exist".format(self.source_s3_key))
+            raise AirflowException(
+                "The source key {0} does not exist".format(self.source_s3_key))
         source_s3_key_object = source_s3.get_key(self.source_s3_key)
-        with NamedTemporaryFile("w") as f_source, NamedTemporaryFile("w") as f_dest:
+
+        with NamedTemporaryFile("wb") as f_source, NamedTemporaryFile("wb") as f_dest:
             self.log.info(
                 "Dumping S3 file %s contents to local file %s",
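A minimal sketch of the new S3 Select path introduced by this diff (not part of the commit; bucket names, keys, and the expression are illustrative assumptions):

{code:python}
from datetime import datetime

from airflow import DAG
from airflow.operators.s3_file_transform_operator import S3FileTransformOperator

dag = DAG(dag_id='s3_select_example',
          start_date=datetime(2018, 1, 1),
          schedule_interval=None)

# With select_expression set, transform_script can be omitted: the source
# object is filtered server-side by S3 Select before upload.
filter_csv = S3FileTransformOperator(
    task_id='filter_customers',
    source_s3_key='s3://my-bucket/in/customers.csv',
    dest_s3_key='s3://my-bucket/out/customers-filtered.csv',
    select_expression="SELECT s._1, s._2 FROM S3Object s",
    replace=True,
    dag=dag)
{code}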
[01/51] [partial] incubator-airflow-site git commit: 1.10.0
Repository: incubator-airflow-site
Updated Branches:
  refs/heads/asf-site 28a3eb600 -> 11437c14a

http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/bigquery_hook.html
--
diff --git a/_modules/bigquery_hook.html b/_modules/bigquery_hook.html
deleted file mode 100644
index e58a93b..000
--- a/_modules/bigquery_hook.html
+++ /dev/null
@@ -1,1279 +0,0 @@

Removed source code for the old top-level bigquery_hook module page:

"""
This module contains a BigQuery Hook, as well as a very basic PEP 249
implementation for BigQuery.
"""

import time

from apiclient.discovery import build, HttpError
from googleapiclient import errors
from builtins import range
from pandas_gbq.gbq import GbqConnector, \
    _parse_data as gbq_parse_data, \
    _check_google_client_version as gbq_check_google_client_version, \
    _test_google_api_imports as gbq_test_google_api_imports
from pandas.tools.merge import concat
from past.builtins import basestring

from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
from airflow.hooks.dbapi_hook import DbApiHook
from airflow.utils.log.logging_mixin import LoggingMixin


class BigQueryHook(GoogleCloudBaseHook, DbApiHook, LoggingMixin):
    """
    Interact with BigQuery. This hook uses the Google Cloud Platform
    connection.
    """
    conn_name_attr = 'bigquery_conn_id'

    def __init__(self,
                 bigquery_conn_id='bigquery_default',
                 delegate_to=None):
        super(BigQueryHook, self).__init__(
            conn_id=bigquery_conn_id,
            delegate_to=delegate_to)

    def get_conn(self):
        """
        Returns a BigQuery PEP 249 connection object.
        """
        service = self.get_service()
        project = self._get_field('project')
        return BigQueryConnection(service=service, project_id=project)

    def get_service(self):
        """
        Returns a BigQuery service object.
        """
        http_authorized = self._authorize()
        return build('bigquery', 'v2', http=http_authorized)

    def insert_rows(self, table, rows, target_fields=None, commit_every=1000):
        """
        Insertion is currently unsupported. Theoretically, you could use
        BigQuery's streaming API to insert rows into a table, but this hasn't
        been implemented.
        """
        raise NotImplementedError()

    def get_pandas_df(self, bql, parameters=None, dialect='legacy'):
        """
        Returns a Pandas DataFrame for the results produced by a BigQuery
        query.
        The DbApiHook method must be overridden because Pandas
        doesn't support PEP 249 connections, except for SQLite. See:

        https://github.com/pydata/pandas/blob/master/pandas/io/sql.py#L447
        https://github.com/pydata/pandas/issues/6900

        :param bql: The BigQuery SQL to execute.
        :type bql: string
        :param parameters: The parameters to render the SQL query with (not
            used, leave to override superclass method)
        :type parameters: mapping or iterable
[45/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/hooks/gcp_dataproc_hook.html
--
diff --git a/_modules/airflow/contrib/hooks/gcp_dataproc_hook.html b/_modules/airflow/contrib/hooks/gcp_dataproc_hook.html
new file mode 100644
index 000..4ca7edd
--- /dev/null
+++ b/_modules/airflow/contrib/hooks/gcp_dataproc_hook.html
@@ -0,0 +1,463 @@

Source code for airflow.contrib.hooks.gcp_dataproc_hook:
import time
import uuid

from apiclient.discovery import build

from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
from airflow.utils.log.logging_mixin import LoggingMixin


class _DataProcJob(LoggingMixin):
    def __init__(self, dataproc_api, project_id, job, region='global'):
        self.dataproc_api = dataproc_api
        self.project_id = project_id
        self.region = region
        self.job = dataproc_api.projects().regions().jobs().submit(
            projectId=self.project_id,
            region=self.region,
            body=job).execute()
        self.job_id = self.job['reference']['jobId']
        self.log.info(
            'DataProc job %s is %s',
            self.job_id, str(self.job['status']['state'])
        )

    def wait_for_done(self):
        while True:
            self.job = self.dataproc_api.projects().regions().jobs().get(
                projectId=self.project_id,
                region=self.region,
                jobId=self.job_id).execute(num_retries=5)
            if 'ERROR' == self.job['status']['state']:
                print(str(self.job))
                self.log.error('DataProc job %s has errors', self.job_id)
                self.log.error(self.job['status']['details'])
                self.log.debug(str(self.job))
                return False
            if 'CANCELLED' == self.job['status']['state']:
                print(str(self.job))
                self.log.warning('DataProc job %s is cancelled', self.job_id)
                if 'details' in self.job['status']:
                    self.log.warning(self.job['status']['details'])
                self.log.debug(str(self.job))
                return False
            if 'DONE' == self.job['status']['state']:
                return True
            self.log.debug(
                'DataProc job %s is %s',
                self.job_id, str(self.job['status']['state'])
            )
            time.sleep(5)

    def raise_error(self, message=None):
        if 'ERROR' == self.job['status']['state']:
            if message is None:
                message = 'Google DataProc job has error'
            raise Exception(message + ': ' + str(self.job['status']['details']))

    def get(self):
        return self.job


class _DataProcJobBuilder:
    def
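The polling loop above follows a generic poll-until-terminal pattern. A minimal standalone sketch of that pattern (not part of this commit; `get_state` is a hypothetical callable you supply):

{code:python}
import time

TERMINAL_STATES = {'DONE', 'ERROR', 'CANCELLED'}  # states checked by _DataProcJob above


def wait_for_done(get_state, poll_interval=5):
    # Poll until the job reaches a terminal state, mirroring
    # _DataProcJob.wait_for_done; returns True only on DONE.
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state == 'DONE'
        time.sleep(poll_interval)
{code}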
[21/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/sensors/emr_job_flow_sensor.html
--
diff --git a/_modules/airflow/contrib/sensors/emr_job_flow_sensor.html b/_modules/airflow/contrib/sensors/emr_job_flow_sensor.html
new file mode 100644
index 000..4fcbbc9
--- /dev/null
+++ b/_modules/airflow/contrib/sensors/emr_job_flow_sensor.html
@@ -0,0 +1,288 @@

Source code for airflow.contrib.sensors.emr_job_flow_sensor:

from airflow.contrib.hooks.emr_hook import EmrHook
from airflow.contrib.sensors.emr_base_sensor import EmrBaseSensor
from airflow.utils import apply_defaults


class EmrJobFlowSensor(EmrBaseSensor):
    """
    Asks for the state of the JobFlow until it reaches a terminal state.
    If it fails the sensor errors, failing the task.

    :param job_flow_id: job_flow_id to check the state of
    :type job_flow_id: string
    """

    NON_TERMINAL_STATES = ['STARTING', 'BOOTSTRAPPING', 'RUNNING',
                           'WAITING', 'TERMINATING']
    FAILED_STATE = ['TERMINATED_WITH_ERRORS']
    template_fields = ['job_flow_id']
    template_ext = ()

    @apply_defaults
    def __init__(self,
                 job_flow_id,
                 *args,
                 **kwargs):
        super(EmrJobFlowSensor, self).__init__(*args, **kwargs)
        self.job_flow_id = job_flow_id

    def get_emr_response(self):
        emr = EmrHook(aws_conn_id=self.aws_conn_id).get_conn()

        self.log.info('Poking cluster %s', self.job_flow_id)
        return emr.describe_cluster(ClusterId=self.job_flow_id)

    def state_from_response(self, response):
        return response['Cluster']['Status']['State']
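A minimal usage sketch of the sensor above (not part of this commit; the DAG, connection ID, and job flow ID are illustrative assumptions):

{code:python}
from datetime import datetime

from airflow import DAG
from airflow.contrib.sensors.emr_job_flow_sensor import EmrJobFlowSensor

dag = DAG(dag_id='emr_wait_example',
          start_date=datetime(2018, 1, 1),
          schedule_interval=None)

# Pokes DescribeCluster until the job flow leaves NON_TERMINAL_STATES;
# the task fails if the cluster lands in TERMINATED_WITH_ERRORS.
wait_for_emr = EmrJobFlowSensor(
    task_id='wait_for_emr',
    job_flow_id='j-EXAMPLEID',
    aws_conn_id='aws_default',
    dag=dag)
{code}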
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/sensors/emr_step_sensor.html
--
diff --git a/_modules/airflow/contrib/sensors/emr_step_sensor.html
[44/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/hooks/gcp_pubsub_hook.html
--
diff --git a/_modules/airflow/contrib/hooks/gcp_pubsub_hook.html b/_modules/airflow/contrib/hooks/gcp_pubsub_hook.html
new file mode 100644
index 000..19713d1
--- /dev/null
+++ b/_modules/airflow/contrib/hooks/gcp_pubsub_hook.html
@@ -0,0 +1,519 @@

Source code for airflow.contrib.hooks.gcp_pubsub_hook:

from uuid import uuid4

from apiclient.discovery import build
from apiclient import errors

from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook


def _format_subscription(project, subscription):
    return 'projects/{}/subscriptions/{}'.format(project, subscription)


def _format_topic(project, topic):
    return 'projects/{}/topics/{}'.format(project, topic)


class PubSubException(Exception):
    pass


class PubSubHook(GoogleCloudBaseHook):
    """Hook for accessing Google Pub/Sub.

    The GCP project against which actions are applied is determined by
    the project embedded in the Connection referenced by gcp_conn_id.
    """

    def __init__(self, gcp_conn_id='google_cloud_default', delegate_to=None):
        super(PubSubHook, self).__init__(gcp_conn_id, delegate_to=delegate_to)

    def get_conn(self):
        """Returns a Pub/Sub service object.

        :rtype: apiclient.discovery.Resource
        """
        http_authorized = self._authorize()
        return build('pubsub', 'v1', http=http_authorized)

    def publish(self, project, topic, messages):
        """Publishes messages to a Pub/Sub topic.

        :param project: the GCP project ID in which to publish
        :type project: string
        :param topic: the Pub/Sub topic to which to publish; do not
            include the ``projects/{project}/topics/`` prefix.
        :type topic: string
        :param messages: messages to publish; if the data field in a
            message is set, it should already be base64 encoded.
        :type messages: list of PubSub messages; see
            http://cloud.google.com/pubsub/docs/reference/rest/v1/PubsubMessage
        """
        body = {'messages': messages}
        full_topic = _format_topic(project, topic)
        request = self.get_conn().projects().topics().publish(
            topic=full_topic, body=body)
        try:
            request.execute()
        except errors.HttpError as e:
            raise PubSubException(
                'Error publishing to topic {}'.format(full_topic), e)

    def create_topic(self, project, topic, fail_if_exists=False):
        """Creates a Pub/Sub topic, if it does not already exist.

        :param project: the GCP project ID in which to
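A minimal publishing sketch using the hook above (not part of this commit; the project ID, topic, and connection ID are illustrative assumptions):

{code:python}
from base64 import b64encode

from airflow.contrib.hooks.gcp_pubsub_hook import PubSubHook

hook = PubSubHook(gcp_conn_id='google_cloud_default')
# Per the docstring above, the 'data' field must already be base64 encoded.
messages = [{'data': b64encode(b'hello world').decode('ascii')}]
hook.publish('my-gcp-project', 'my-topic', messages)
{code}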
[20/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/sensors/hdfs_sensor.html
--
diff --git a/_modules/airflow/contrib/sensors/hdfs_sensor.html b/_modules/airflow/contrib/sensors/hdfs_sensor.html
new file mode 100644
index 000..df9ecaa
--- /dev/null
+++ b/_modules/airflow/contrib/sensors/hdfs_sensor.html
@@ -0,0 +1,313 @@

Source code for airflow.contrib.sensors.hdfs_sensor:

from airflow.sensors.hdfs_sensor import HdfsSensor


class HdfsSensorRegex(HdfsSensor):
    def __init__(self,
                 regex,
                 *args,
                 **kwargs):
        super(HdfsSensorRegex, self).__init__(*args, **kwargs)
        self.regex = regex

    def poke(self, context):
        """
        poke matching files in a directory with self.regex

        :return: Bool depending on the search criteria
        """
        sb = self.hook(self.hdfs_conn_id).get_conn()
        self.log.info(
            'Poking for {self.filepath} to be a directory '
            'with files matching {self.regex.pattern}.'
            .format(**locals())
        )
        result = [f for f in sb.ls([self.filepath], include_toplevel=False) if
                  f['file_type'] == 'f' and
                  self.regex.match(f['path'].replace('%s/' % self.filepath, ''))]
        result = self.filter_for_ignored_ext(result, self.ignored_ext,
                                             self.ignore_copying)
        result = self.filter_for_filesize(result, self.file_size)
        return bool(result)


class HdfsSensorFolder(HdfsSensor):
    def __init__(self,
                 be_empty=False,
                 *args,
                 **kwargs):
        super(HdfsSensorFolder, self).__init__(*args, **kwargs)
        self.be_empty = be_empty

    def poke(self, context):
        """
        poke for a non empty directory

        :return: Bool depending on the search criteria
        """
        sb = self.hook(self.hdfs_conn_id).get_conn()
        result = [f for f in sb.ls([self.filepath], include_toplevel=True)]
        result = self.filter_for_ignored_ext(result, self.ignored_ext,
                                             self.ignore_copying)
        result = self.filter_for_filesize(result, self.file_size)
        if self.be_empty:
            self.log.info('Poking for filepath {self.filepath} to a empty directory'
                          .format(**locals()))
            return len(result) == 1 and result[0]['path'] == self.filepath
        else:
            self.log.info('Poking for filepath {self.filepath} to a non empty directory'
                          .format(**locals()))
            result.pop(0)
            return
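A minimal usage sketch of the regex variant above (not part of this commit; the DAG, HDFS path, pattern, and connection ID are illustrative assumptions):

{code:python}
import re
from datetime import datetime

from airflow import DAG
from airflow.contrib.sensors.hdfs_sensor import HdfsSensorRegex

dag = DAG(dag_id='hdfs_regex_example',
          start_date=datetime(2018, 1, 1),
          schedule_interval=None)

# poke() uses regex.match and regex.pattern, so a compiled pattern is expected.
wait_for_parts = HdfsSensorRegex(
    task_id='wait_for_part_files',
    filepath='/data/incoming',
    regex=re.compile(r'part-\d{5}'),
    hdfs_conn_id='hdfs_default',
    dag=dag)
{code}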
[25/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/operators/s3_to_gcs_operator.html
--
diff --git a/_modules/airflow/contrib/operators/s3_to_gcs_operator.html b/_modules/airflow/contrib/operators/s3_to_gcs_operator.html
new file mode 100644
index 000..754fcb2
--- /dev/null
+++ b/_modules/airflow/contrib/operators/s3_to_gcs_operator.html
@@ -0,0 +1,425 @@

Source code for airflow.contrib.operators.s3_to_gcs_operator:

from tempfile import NamedTemporaryFile

from airflow.contrib.hooks.gcs_hook import (GoogleCloudStorageHook,
                                            _parse_gcs_url)
from airflow.contrib.operators.s3_list_operator import S3ListOperator
from airflow.exceptions import AirflowException
from airflow.hooks.S3_hook import S3Hook
from airflow.utils.decorators import apply_defaults


class S3ToGoogleCloudStorageOperator(S3ListOperator):
    """
    Synchronizes an S3 key, possibly a prefix, with a Google Cloud Storage
    destination path.

    :param bucket: The S3 bucket where to find the objects.
    :type bucket: string
    :param prefix: Prefix string which filters objects whose name begin with
        such prefix.
    :type prefix: string
    :param delimiter: the delimiter marks key hierarchy.
    :type delimiter: string
    :param aws_conn_id: The source S3 connection
    :type aws_conn_id: string
    :param dest_gcs_conn_id: The destination connection ID to use
        when connecting to Google Cloud Storage.
    :type dest_gcs_conn_id: string
    :param dest_gcs: The destination Google Cloud Storage bucket and prefix
        where you want to store the files.
    :type dest_gcs: string
    :param delegate_to: The account to impersonate, if any.
        For this to work, the service account making the request must have
        domain-wide delegation enabled.
    :type delegate_to: string
    :param replace: Whether you want to replace existing destination files
        or not.
    :type replace: bool

    **Example**:
    .. code-block:: python

       s3_to_gcs_op = S3ToGoogleCloudStorageOperator(
            task_id='s3_to_gcs_example',
            bucket='my-s3-bucket',
            prefix='data/customers-201804',
            dest_gcs_conn_id='google_cloud_default',
            dest_gcs='gs://my.gcs.bucket/some/customers/',
            replace=False,
            dag=my_dag)

    Note that ``bucket``, ``prefix``, ``delimiter`` and ``dest_gcs`` are
    templated, so you can use variables in them if you wish.
    """

    template_fields = ('bucket', 'prefix', 'delimiter', 'dest_gcs')
    ui_color = '#e09411'

    @apply_defaults
    def __init__(self,
                 bucket,
[14/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/hooks/slack_hook.html
--
diff --git a/_modules/airflow/hooks/slack_hook.html b/_modules/airflow/hooks/slack_hook.html
new file mode 100644
index 000..f331546
--- /dev/null
+++ b/_modules/airflow/hooks/slack_hook.html
@@ -0,0 +1,296 @@

Source code for airflow.hooks.slack_hook:

from slackclient import SlackClient
from airflow.hooks.base_hook import BaseHook
from airflow.exceptions import AirflowException


class SlackHook(BaseHook):
    """
    Interact with Slack, using slackclient library.
    """

    def __init__(self, token=None, slack_conn_id=None):
        """
        Takes both Slack API token directly and connection that has Slack API token.

        If both supplied, Slack API token will be used.

        :param token: Slack API token
        :type token: string
        :param slack_conn_id: connection that has Slack API token in the password field
        :type slack_conn_id: string
        """
        self.token = self.__get_token(token, slack_conn_id)

    def __get_token(self, token, slack_conn_id):
        if token is not None:
            return token
        elif slack_conn_id is not None:
            conn = self.get_connection(slack_conn_id)

            if not getattr(conn, 'password', None):
                raise AirflowException('Missing token(password) in Slack connection')
            return conn.password
        else:
            raise AirflowException('Cannot get token: '
                                   'No valid Slack token nor slack_conn_id supplied.')

    def call(self, method, api_params):
        sc = SlackClient(self.token)
        rc = sc.api_call(method, **api_params)

        if not rc['ok']:
            msg = 'Slack API call failed ({})'.format(rc['error'])
            raise AirflowException(msg)
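A minimal usage sketch of the hook above (not part of this commit; the connection ID and channel are illustrative assumptions):

{code:python}
from airflow.hooks.slack_hook import SlackHook

# Assumes a 'slack_default' connection whose password field holds the API token.
hook = SlackHook(slack_conn_id='slack_default')
hook.call('chat.postMessage', {
    'channel': '#data-alerts',
    'text': 'Nightly DAG finished',
})
{code}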
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/hooks/sqlite_hook.html
[04/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/sensors/hdfs_sensor.html
--
diff --git a/_modules/airflow/sensors/hdfs_sensor.html b/_modules/airflow/sensors/hdfs_sensor.html
new file mode 100644
index 000..7747a54
--- /dev/null
+++ b/_modules/airflow/sensors/hdfs_sensor.html
@@ -0,0 +1,352 @@

Source code for airflow.sensors.hdfs_sensor:
import re
import sys
from builtins import str

from airflow import settings
from airflow.hooks.hdfs_hook import HDFSHook
from airflow.sensors.base_sensor_operator import BaseSensorOperator
from airflow.utils.decorators import apply_defaults
from airflow.utils.log.logging_mixin import LoggingMixin


class HdfsSensor(BaseSensorOperator):
    """
    Waits for a file or folder to land in HDFS
    """
    template_fields = ('filepath',)
    ui_color = settings.WEB_COLORS['LIGHTBLUE']

    @apply_defaults
    def __init__(self,
                 filepath,
                 hdfs_conn_id='hdfs_default',
                 ignored_ext=['_COPYING_'],
                 ignore_copying=True,
                 file_size=None,
                 hook=HDFSHook,
                 *args,
                 **kwargs):
        super(HdfsSensor, self).__init__(*args, **kwargs)
        self.filepath = filepath
        self.hdfs_conn_id = hdfs_conn_id
        self.file_size = file_size
        self.ignored_ext = ignored_ext
        self.ignore_copying = ignore_copying
        self.hook = hook

    @staticmethod
    def filter_for_filesize(result, size=None):
        """
        Will test the filepath result and test if its size is at least self.filesize

        :param result: a list of dicts returned by Snakebite ls
        :param size: the file size in MB a file should be at least to trigger True
        :return: (bool) depending on the matching criteria
        """
        if size:
            log = LoggingMixin().log
            log.debug(
                'Filtering for file size >= %s in files: %s',
                size, map(lambda x: x['path'], result)
            )
            size *= settings.MEGABYTE
            result = [x for x in result if x['length'] >= size]
            log.debug('HdfsSensor.poke: after size filter result is %s', result)
        return result

    @staticmethod
    def filter_for_ignored_ext(result, ignored_ext, ignore_copying):
        """
        Will filter if instructed to do so the result to remove matching criteria

        :param result: (list) of dicts returned by Snakebite ls
        :param ignored_ext: (list) of ignored extensions
        :param ignore_copying: (bool) shall we ignore ?
        :return: (list) of dicts which were not removed
        """
        if ignore_copying:
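A minimal usage sketch of the base sensor above (not part of this commit; the DAG, HDFS path, and connection ID are illustrative assumptions):

{code:python}
from datetime import datetime

from airflow import DAG
from airflow.sensors.hdfs_sensor import HdfsSensor

dag = DAG(dag_id='hdfs_file_example',
          start_date=datetime(2018, 1, 1),
          schedule_interval=None)

# Waits for the file to land; file_size is in MB, and _COPYING_ files
# are ignored by default, per the filters shown above.
wait_for_export = HdfsSensor(
    task_id='wait_for_export',
    filepath='/data/exports/daily.csv',
    hdfs_conn_id='hdfs_default',
    file_size=1,
    dag=dag)
{code}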
[19/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/sensors/redis_key_sensor.html
--
diff --git a/_modules/airflow/contrib/sensors/redis_key_sensor.html b/_modules/airflow/contrib/sensors/redis_key_sensor.html
new file mode 100644
index 000..55e26cd
--- /dev/null
+++ b/_modules/airflow/contrib/sensors/redis_key_sensor.html
@@ -0,0 +1,282 @@

Source code for airflow.contrib.sensors.redis_key_sensor:

from airflow.contrib.hooks.redis_hook import RedisHook
from airflow.sensors.base_sensor_operator import BaseSensorOperator
from airflow.utils.decorators import apply_defaults


class RedisKeySensor(BaseSensorOperator):
    """
    Checks for the existence of a key in a Redis database
    """
    template_fields = ('key',)
    ui_color = '#f0eee4'

    @apply_defaults
    def __init__(self, key, redis_conn_id, *args, **kwargs):
        """
        Create a new RedisKeySensor

        :param key: The key to be monitored
        :type key: string
        :param redis_conn_id: The connection ID to use when connecting to Redis DB.
        :type redis_conn_id: string
        """
        super(RedisKeySensor, self).__init__(*args, **kwargs)
        self.redis_conn_id = redis_conn_id
        self.key = key

    def poke(self, context):
        self.log.info('Sensor check existence of key: %s', self.key)
        return RedisHook(self.redis_conn_id).key_exists(self.key)
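A minimal usage sketch of the sensor above (not part of this commit; the DAG, key, and connection ID are illustrative assumptions):

{code:python}
from datetime import datetime

from airflow import DAG
from airflow.contrib.sensors.redis_key_sensor import RedisKeySensor

dag = DAG(dag_id='redis_wait_example',
          start_date=datetime(2018, 1, 1),
          schedule_interval=None)

wait_for_flag = RedisKeySensor(
    task_id='wait_for_ready_flag',
    key='jobs:nightly:ready',
    redis_conn_id='redis_default',
    dag=dag)
{code}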
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/sensors/sftp_sensor.html
--
diff --git a/_modules/airflow/contrib/sensors/sftp_sensor.html b/_modules/airflow/contrib/sensors/sftp_sensor.html
new file mode 100644
index 000..1df8d4f
--- /dev/null
+++ b/_modules/airflow/contrib/sensors/sftp_sensor.html
@@ -0,0 +1,287 @@
[49/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/hooks/bigquery_hook.html
--
diff --git a/_modules/airflow/contrib/hooks/bigquery_hook.html b/_modules/airflow/contrib/hooks/bigquery_hook.html
index 1926c79..aff2ebd 100644
--- a/_modules/airflow/contrib/hooks/bigquery_hook.html
+++ b/_modules/airflow/contrib/hooks/bigquery_hook.html
@@ -91,7 +91,7 @@
 Quick Start
 Installation
 Tutorial
-Configuration
+How-to Guides
 UI / Screenshots
 Concepts
 Data Profiling
@@ -99,8 +99,10 @@
 Scheduling Triggers
 Plugins
 Security
+Time zones
 Experimental Rest API
 Integration
+Lineage
 FAQ
 API Reference
@@ -169,39 +171,45 @@ Source code for airflow.contrib.hooks.bigquery_hook
 # -*- coding: utf-8 -*-
 #
 (the short Apache license header is replaced by the full ASF header,
 the same change as in the s3_file_transform_operator diff above)
 """
 This module contains a BigQuery Hook, as well as a very basic PEP 249
 implementation for BigQuery.
 """
 import time
-
-from apiclient.discovery import build, HttpError
-from googleapiclient import errors
 from builtins import range
-from pandas_gbq.gbq import GbqConnector, \
-    _parse_data as gbq_parse_data, \
-    _check_google_client_version as gbq_check_google_client_version, \
-    _test_google_api_imports as gbq_test_google_api_imports
-from pandas.tools.merge import concat
+
 from past.builtins import basestring

+from airflow import AirflowException
 from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
 from airflow.hooks.dbapi_hook import DbApiHook
 from airflow.utils.log.logging_mixin import LoggingMixin
+from apiclient.discovery import HttpError, build
+from googleapiclient import errors
+from pandas_gbq.gbq import \
+    _check_google_client_version as gbq_check_google_client_version
+from pandas_gbq import read_gbq
+from pandas_gbq.gbq import \
+    _test_google_api_imports as gbq_test_google_api_imports
+from pandas_gbq.gbq import GbqConnector


 class BigQueryHook(GoogleCloudBaseHook, DbApiHook, LoggingMixin):
@@ -213,10 +221,11 @@
     def __init__(self,
                  bigquery_conn_id='bigquery_default',
-                 delegate_to=None):
+                 delegate_to=None,
+                 use_legacy_sql=True):
         super(BigQueryHook, self).__init__(
-            conn_id=bigquery_conn_id,
-            delegate_to=delegate_to)
+            gcp_conn_id=bigquery_conn_id, delegate_to=delegate_to)
+        self.use_legacy_sql = use_legacy_sql

     def get_conn(self):
@@ -224,7 +233,10 @@
         service = self.get_service()
         project = self._get_field('project')
-        return BigQueryConnection(service=service, project_id=project)
+        return BigQueryConnection(
+            service=service,
+            project_id=project,
+            use_legacy_sql=self.use_legacy_sql)

     def get_service(self):
@@ -241,7 +253,7 @@
         raise NotImplementedError()

-    def get_pandas_df(self, bql, parameters=None, dialect='legacy'):
+    def get_pandas_df(self, bql, parameters=None, dialect=None):
         """
         Returns a Pandas DataFrame for the results produced by a BigQuery
         query. The DbApiHook method must be overridden because Pandas
@@ -252,35 +264,31 @@
         :param bql: The BigQuery SQL to execute.
         :type bql: string
-        :param parameters: The parameters to render the SQL query with (not used, leave to override superclass method)
+        :param parameters: The parameters to render the SQL query with (not
+            used, leave to override superclass method)
         :type
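A minimal sketch of the hook after this change (not part of the commit; the connection ID, project, and table are illustrative assumptions):

{code:python}
from airflow.contrib.hooks.bigquery_hook import BigQueryHook

# use_legacy_sql is now a hook-level default; dialect can still override it
# per query, per the new get_pandas_df signature.
hook = BigQueryHook(bigquery_conn_id='bigquery_default', use_legacy_sql=False)
df = hook.get_pandas_df(
    'SELECT name, COUNT(*) AS c '
    'FROM `my-project.my_dataset.my_table` GROUP BY name',
    dialect='standard')
print(df.head())
{code}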
[43/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/hooks/gcs_hook.html
--
diff --git a/_modules/airflow/contrib/hooks/gcs_hook.html b/_modules/airflow/contrib/hooks/gcs_hook.html
index 6f3bc49..5db94f8 100644
--- a/_modules/airflow/contrib/hooks/gcs_hook.html
+++ b/_modules/airflow/contrib/hooks/gcs_hook.html
@@ -91,7 +91,7 @@
 Quick Start
 Installation
 Tutorial
-Configuration
+How-to Guides
 UI / Screenshots
 Concepts
 Data Profiling
@@ -99,8 +99,10 @@
 Scheduling Triggers
 Plugins
 Security
+Time zones
 Experimental Rest API
 Integration
+Lineage
 FAQ
 API Reference
@@ -169,23 +171,31 @@ Source code for airflow.contrib.hooks.gcs_hook
 # -*- coding: utf-8 -*-
 #
 (the short Apache license header is replaced by the full ASF header,
 the same change as in the s3_file_transform_operator diff above)
 #
 from apiclient.discovery import build
 from apiclient.http import MediaFileUpload
 from googleapiclient import errors

 from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
+from airflow.exceptions import AirflowException
+
+import re


 class GoogleCloudStorageHook(GoogleCloudBaseHook):
@@ -195,7 +205,7 @@
     def __init__(self,
-                 google_cloud_storage_conn_id='google_cloud_storage_default',
+                 google_cloud_storage_conn_id='google_cloud_default',
                  delegate_to=None):
         super(GoogleCloudStorageHook, self).__init__(google_cloud_storage_conn_id,
                                                      delegate_to)
@@ -207,7 +217,6 @@
         http_authorized = self._authorize()
         return build('storage', 'v1', http=http_authorized)

-    # pylint:disable=redefined-builtin
     def copy(self, source_bucket, source_object, destination_bucket=None,
              destination_object=None):
@@ -217,10 +226,10 @@
         destination_bucket or destination_object can be omitted, in which case
         source bucket/object is used, but not both.

-        :param bucket: The bucket of the object to copy from.
-        :type bucket: string
-        :param object: The object to copy.
-        :type object: string
+        :param source_bucket: The bucket of the object to copy from.
+        :type source_bucket: string
+        :param source_object: The object to copy.
+        :type source_object: string
         :param destination_bucket: The destination of the object to copied to.
             Can be omitted; then the same bucket is used.
         :type destination_bucket: string
@@ -252,9 +261,60 @@
             return False
         raise

+    def rewrite(self, source_bucket, source_object, destination_bucket,
+                destination_object=None):
+        """
+        Has the same functionality as copy, except that will work on files
+        over 5 TB, as well as when copying between locations and/or storage
+        classes.
+
+        destination_object can be omitted, in which case source_object is used.
+
+        :param source_bucket: The bucket of the object to copy from.
+        :type source_bucket: string
+        :param source_object: The object to copy.
+        :type source_object: string
+        :param destination_bucket: The destination of the object to copied to.
+        :type destination_bucket: string
+        :param destination_object: The (renamed) path of the object if given.
+            Can be omitted; then the same name is used.
+        """
+        destination_object = destination_object or source_object
+        if (source_bucket == destination_bucket and
+                source_object == destination_object):
+            raise ValueError(
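A minimal sketch of the new rewrite method (not part of the commit; bucket and object names are illustrative assumptions):

{code:python}
from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook

hook = GoogleCloudStorageHook(
    google_cloud_storage_conn_id='google_cloud_default')
# Unlike copy(), rewrite() also works on objects over 5 TB and across
# locations/storage classes; destination_object defaults to source_object.
hook.rewrite(source_bucket='src-bucket',
             source_object='backups/big-file.bin',
             destination_bucket='dst-bucket')
{code}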
[37/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/hooks/ssh_hook.html
--
diff --git a/_modules/airflow/contrib/hooks/ssh_hook.html b/_modules/airflow/contrib/hooks/ssh_hook.html
new file mode 100644
index 000..08f1831
--- /dev/null
+++ b/_modules/airflow/contrib/hooks/ssh_hook.html
@@ -0,0 +1,470 @@

Source code for airflow.contrib.hooks.ssh_hook:

# -*- coding: utf-8 -*-
#
# Copyright 2012-2015 Spotify AB
# Ported to Airflow by Bolke de Bruin
#
# (standard ASF license header follows)

import getpass
import os

import paramiko
from paramiko.config import SSH_PORT

from contextlib import contextmanager
from airflow.exceptions import AirflowException
from airflow.hooks.base_hook import BaseHook
from airflow.utils.log.logging_mixin import LoggingMixin


class SSHHook(BaseHook, LoggingMixin):
    """
    Hook for ssh remote execution using Paramiko.
    ref: https://github.com/paramiko/paramiko
    This hook also lets you create ssh tunnel and serve as basis for SFTP file transfer

    :param ssh_conn_id: connection id from airflow Connections from where all the
        required parameters can be fetched like username, password or key_file.
        Though the priority is given to the params passed during init
    :type ssh_conn_id: str
    :param remote_host: remote host to connect
    :type remote_host: str
    :param username: username to connect to the remote_host
    :type username: str
    :param password: password of the username to connect to the remote_host
    :type password: str
    :param key_file: key file to use to connect to the remote_host.
    :type key_file: str
    :param port: port of remote host to connect (Default is paramiko SSH_PORT)
    :type port: int
    :param timeout: timeout for the attempt to connect to the remote_host.
    :type timeout: int
    :param keepalive_interval: send a keepalive packet to remote host every
        keepalive_interval seconds
    :type keepalive_interval: int
    """

    def __init__(self,
                 ssh_conn_id=None,
                 remote_host=None,
                 username=None,
                 password=None,
                 key_file=None,
                 port=SSH_PORT,
                 timeout=10,
                 keepalive_interval=30
                 ):
        super(SSHHook, self).__init__(ssh_conn_id)
        self.ssh_conn_id = ssh_conn_id
        self.remote_host = remote_host
        self.username = username
        self.password = password
        self.key_file = key_file
        self.timeout = timeout
        self.keepalive_interval = keepalive_interval
        # Default values, overridable from Connection
        self.compress = True
        self.no_host_key_check = True
        self.client = None
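A minimal usage sketch (not part of this commit; the connection ID is an illustrative assumption, and it assumes get_conn() returns a connected paramiko SSHClient, which is not shown in the excerpt above):

{code:python}
from airflow.contrib.hooks.ssh_hook import SSHHook

hook = SSHHook(ssh_conn_id='ssh_default', timeout=10, keepalive_interval=30)
client = hook.get_conn()  # assumed: a connected paramiko.SSHClient
stdin, stdout, stderr = client.exec_command('uptime')
print(stdout.read())
{code}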
[23/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/operators/sqoop_operator.html
--
diff --git a/_modules/airflow/contrib/operators/sqoop_operator.html b/_modules/airflow/contrib/operators/sqoop_operator.html
new file mode 100644
index 000..31fafec
--- /dev/null
+++ b/_modules/airflow/contrib/operators/sqoop_operator.html
@@ -0,0 +1,467 @@

Source code for airflow.contrib.operators.sqoop_operator:

"""
This module contains a sqoop 1 operator
"""
import os
import signal

from airflow.contrib.hooks.sqoop_hook import SqoopHook
from airflow.exceptions import AirflowException
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults


class SqoopOperator(BaseOperator):
    """
    Execute a Sqoop job.
    Documentation for Apache Sqoop can be found here:
    https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html
    """

    template_fields = ('conn_id', 'cmd_type', 'table', 'query', 'target_dir',
                       'file_type', 'columns', 'split_by', 'where', 'export_dir',
                       'input_null_string', 'input_null_non_string', 'staging_table',
                       'enclosed_by', 'escaped_by', 'input_fields_terminated_by',
                       'input_lines_terminated_by', 'input_optionally_enclosed_by',
                       'properties', 'extra_import_options', 'driver',
                       'extra_export_options', 'hcatalog_database', 'hcatalog_table',)
    ui_color = '#7D8CA4'

    @apply_defaults
    def __init__(self,
                 conn_id='sqoop_default',
                 cmd_type='import',
                 table=None,
                 query=None,
                 target_dir=None,
                 append=None,
                 file_type='text',
                 columns=None,
                 num_mappers=None,
                 split_by=None,
                 where=None,
                 export_dir=None,
                 input_null_string=None,
                 input_null_non_string=None,
                 staging_table=None,
                 clear_staging_table=False,
                 enclosed_by=None,
                 escaped_by=None,
                 input_fields_terminated_by=None,
                 input_lines_terminated_by=None,
                 input_optionally_enclosed_by=None,
                 batch=False,
                 direct=False,
                 driver=None,
                 verbose=False,
                 relaxed_isolation=False,
                 properties=None,
                 hcatalog_database=None,
                 hcatalog_table=None,
                 create_hcatalog_table=False,
                 extra_import_options=None,
                 extra_export_options=None,
                 *args,
                 **kwargs):
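A minimal import-job sketch using the operator above (not part of this commit; the DAG, connection ID, table, and HDFS target are illustrative assumptions):

{code:python}
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.sqoop_operator import SqoopOperator

dag = DAG(dag_id='sqoop_example',
          start_date=datetime(2018, 1, 1),
          schedule_interval=None)

import_orders = SqoopOperator(
    task_id='sqoop_import_orders',
    conn_id='sqoop_default',
    cmd_type='import',
    table='orders',
    target_dir='/data/orders',
    num_mappers=4,
    dag=dag)
{code}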
[50/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/hooks/aws_dynamodb_hook.html
--
diff --git a/_modules/airflow/contrib/hooks/aws_dynamodb_hook.html b/_modules/airflow/contrib/hooks/aws_dynamodb_hook.html
new file mode 100644
index 000..68c9921
--- /dev/null
+++ b/_modules/airflow/contrib/hooks/aws_dynamodb_hook.html
@@ -0,0 +1,300 @@

Source code for airflow.contrib.hooks.aws_dynamodb_hook

+from airflow.exceptions import AirflowException
+from airflow.contrib.hooks.aws_hook import AwsHook
+
+
+class AwsDynamoDBHook(AwsHook):
+    """
+    Interact with AWS DynamoDB.
+
+    :param table_keys: partition key and sort key
+    :type table_keys: list
+    :param table_name: target DynamoDB table
+    :type table_name: str
+    :param region_name: aws region name (example: us-east-1)
+    :type region_name: str
+    """
+
+    def __init__(self, table_keys=None, table_name=None, region_name=None, *args, **kwargs):
+        self.table_keys = table_keys
+        self.table_name = table_name
+        self.region_name = region_name
+        super(AwsDynamoDBHook, self).__init__(*args, **kwargs)
+
+    def get_conn(self):
+        self.conn = self.get_resource_type('dynamodb', self.region_name)
+        return self.conn
+
+    def write_batch_data(self, items):
+        """
+        Write batch items to dynamodb table with provisioned throughput capacity.
+        """
+        dynamodb_conn = self.get_conn()
+
+        try:
+            table = dynamodb_conn.Table(self.table_name)
+
+            with table.batch_writer(overwrite_by_pkeys=self.table_keys) as batch:
+                for item in items:
+                    batch.put_item(Item=item)
+            return True
+        except Exception as general_error:
+            raise AirflowException(
+                'Failed to insert items in dynamodb, error: {error}'.format(
+                    error=str(general_error)
+                )
+            )
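For reference, a minimal usage sketch of the hook above (not part of the commit; the table, key, and region are hypothetical):

from airflow.contrib.hooks.aws_dynamodb_hook import AwsDynamoDBHook

hook = AwsDynamoDBHook(
    table_keys=['event_id'],   # passed to batch_writer(overwrite_by_pkeys=...) to de-duplicate
    table_name='events',
    region_name='us-east-1')
hook.write_batch_data([{'event_id': '1', 'payload': 'hello'}])  # returns True on success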
[31/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/operators/file_to_gcs.html
--
diff --git a/_modules/airflow/contrib/operators/file_to_gcs.html b/_modules/airflow/contrib/operators/file_to_gcs.html
new file mode 100644
index 000..1be0683
--- /dev/null
+++ b/_modules/airflow/contrib/operators/file_to_gcs.html
@@ -0,0 +1,310 @@

Source code for airflow.contrib.operators.file_to_gcs

+from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook
+from airflow.models import BaseOperator
+from airflow.utils.decorators import apply_defaults
+
+
+class FileToGoogleCloudStorageOperator(BaseOperator):
+    """
+    Uploads a file to Google Cloud Storage
+
+    :param src: Path to the local file
+    :type src: string
+    :param dst: Destination path within the specified bucket
+    :type dst: string
+    :param bucket: The bucket to upload to
+    :type bucket: string
+    :param google_cloud_storage_conn_id: The Airflow connection ID to upload with
+    :type google_cloud_storage_conn_id: string
+    :param mime_type: The mime-type string
+    :type mime_type: string
+    :param delegate_to: The account to impersonate, if any
+    :type delegate_to: string
+    """
+    template_fields = ('src', 'dst', 'bucket')
+
+    @apply_defaults
+    def __init__(self,
+                 src,
+                 dst,
+                 bucket,
+                 google_cloud_storage_conn_id='google_cloud_default',
+                 mime_type='application/octet-stream',
+                 delegate_to=None,
+                 *args,
+                 **kwargs):
+        super(FileToGoogleCloudStorageOperator, self).__init__(*args, **kwargs)
+        self.src = src
+        self.dst = dst
+        self.bucket = bucket
+        self.google_cloud_storage_conn_id = google_cloud_storage_conn_id
+        self.mime_type = mime_type
+        self.delegate_to = delegate_to
+
+    def execute(self, context):
+        """
+        Uploads the file to Google Cloud Storage
+        """
+        hook = GoogleCloudStorageHook(
+            google_cloud_storage_conn_id=self.google_cloud_storage_conn_id,
+            delegate_to=self.delegate_to)
+
+        hook.upload(
+            bucket=self.bucket,
+            object=self.dst,
+            mime_type=self.mime_type,
+            filename=self.src)
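For reference, a minimal usage sketch of the operator above (not part of the commit; paths, bucket, and the surrounding `dag` are hypothetical):

from airflow.contrib.operators.file_to_gcs import FileToGoogleCloudStorageOperator

upload_report = FileToGoogleCloudStorageOperator(
    task_id='upload_report',
    src='/tmp/report.csv',      # local file
    dst='reports/report.csv',   # object path within the bucket
    bucket='my-bucket',
    mime_type='text/csv',
    dag=dag)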
[12/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/operators/bash_operator.html
--
diff --git a/_modules/airflow/operators/bash_operator.html b/_modules/airflow/operators/bash_operator.html
new file mode 100644
index 000..d6d7dc6
--- /dev/null
+++ b/_modules/airflow/operators/bash_operator.html
@@ -0,0 +1,376 @@

Source code for airflow.operators.bash_operator

+from builtins import bytes
+import os
+import signal
+from subprocess import Popen, STDOUT, PIPE
+from tempfile import gettempdir, NamedTemporaryFile
+
+from airflow import configuration as conf
+from airflow.exceptions import AirflowException
+from airflow.models import BaseOperator
+from airflow.utils.decorators import apply_defaults
+from airflow.utils.file import TemporaryDirectory
+
+
+# These variables are required in cases when BashOperator tasks use airflow specific code,
+# e.g. they import packages in the airflow context, and the possibility of impersonation
+# gives no guarantee that these variables are available in the impersonated environment.
+# Hence, we need to propagate them in the Bash script used as a wrapper of commands in
+# this BashOperator.
+PYTHONPATH_VAR = 'PYTHONPATH'
+AIRFLOW_HOME_VAR = 'AIRFLOW_HOME'
+
+
+class BashOperator(BaseOperator):
+    """
+    Execute a Bash script, command or set of commands.
+
+    :param bash_command: The command, set of commands or reference to a
+        bash script (must be '.sh') to be executed.
+    :type bash_command: string
+    :param xcom_push: If xcom_push is True, the last line written to stdout
+        will also be pushed to an XCom when the bash command completes.
+    :type xcom_push: bool
+    :param env: If env is not None, it must be a mapping that defines the
+        environment variables for the new process; these are used instead
+        of inheriting the current process environment, which is the default
+        behavior. (templated)
+    :type env: dict
+    :param output_encoding: output encoding of bash command
+    :type output_encoding: str
+    """
+    template_fields = ('bash_command', 'env')
+    template_ext = ('.sh', '.bash',)
+    ui_color = '#f0ede4'
+
+    @apply_defaults
+    def __init__(
+            self,
+            bash_command,
+            xcom_push=False,
+            env=None,
+            output_encoding='utf-8',
+            *args, **kwargs):
+
+        super(BashOperator, self).__init__(*args, **kwargs)
+        self.bash_command = bash_command
+        self.env = env
+        self.xcom_push_flag = xcom_push
+        self.output_encoding = output_encoding
+
+    def execute(self, context):
+        """
+        Execute the bash command in a temporary directory
+        which will be cleaned afterwards
+        """
+        self.log.info("Tmp dir
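For reference, a minimal usage sketch of the operator above (not part of the commit; the surrounding `dag` is assumed):

from airflow.operators.bash_operator import BashOperator

print_date = BashOperator(
    task_id='print_date',
    bash_command='date',
    xcom_push=True,   # the last line of stdout is pushed to XCom on completion
    dag=dag)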
[29/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/operators/jenkins_job_trigger_operator.html
--
diff --git a/_modules/airflow/contrib/operators/jenkins_job_trigger_operator.html b/_modules/airflow/contrib/operators/jenkins_job_trigger_operator.html
new file mode 100644
index 000..1bb3657
--- /dev/null
+++ b/_modules/airflow/contrib/operators/jenkins_job_trigger_operator.html
@@ -0,0 +1,484 @@

Source code for airflow.contrib.operators.jenkins_job_trigger_operator

+import time
+import socket
+import json
+from airflow.exceptions import AirflowException
+from airflow.models import BaseOperator
+from airflow.utils.decorators import apply_defaults
+from airflow.contrib.hooks.jenkins_hook import JenkinsHook
+import jenkins
+from jenkins import JenkinsException
+from six.moves.urllib.request import Request, urlopen
+from six.moves.urllib.error import HTTPError, URLError
+
+try:
+    basestring
+except NameError:
+    basestring = str  # For python3 compatibility
+
+
+# TODO Use jenkins_urlopen instead when it will be available
+# in the stable python-jenkins version (> 0.4.15)
+def jenkins_request_with_headers(jenkins_server, req, add_crumb=True):
+    """
+    We need to get the headers in addition to the body answer
+    to get the location from them.
+    This function is just a copy of the one present in the python-jenkins
+    library, with only the return call changed.
+    :param jenkins_server: The server to query
+    :param req: The request to execute
+    :param add_crumb: Boolean to indicate if it should add crumb to the request
+    :return: a dict with the response body and headers
+    """
+    try:
+        if jenkins_server.auth:
+            req.add_header('Authorization', jenkins_server.auth)
+        if add_crumb:
+            jenkins_server.maybe_add_crumb(req)
+        response = urlopen(req, timeout=jenkins_server.timeout)
+        response_body = response.read()
+        response_headers = response.info()
+        if response_body is None:
+            raise jenkins.EmptyResponseException(
+                'Error communicating with server[%s]: '
+                'empty response' % jenkins_server.server)
+        return {'body': response_body.decode('utf-8'), 'headers': response_headers}
+    except HTTPError as e:
+        # Jenkins's funky authentication means it's nigh impossible to
+        # distinguish errors.
+        if e.code in [401, 403, 500]:
+            # six.moves.urllib.error.HTTPError provides a reason
+            # attribute for all python versions except for ver 2.6.
+            # Falling back to HTTPError.msg since it contains the
+            # same info as reason.
+            raise JenkinsException(
+                'Error in request.
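The JenkinsJobTriggerOperator itself is defined further down this module (cut off in this excerpt). A minimal usage sketch; the parameter names are assumptions based on this 1.10-era module, and the job and connection values are hypothetical:

from airflow.contrib.operators.jenkins_job_trigger_operator import JenkinsJobTriggerOperator

trigger_build = JenkinsJobTriggerOperator(
    task_id='trigger_build',
    jenkins_connection_id='jenkins_default',  # assumed parameter name
    job_name='my-jenkins-job',
    parameters={'param1': 'value1'},          # build parameters, assumed to accept a dict
    dag=dag)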
[24/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/operators/spark_jdbc_operator.html
--
diff --git a/_modules/airflow/contrib/operators/spark_jdbc_operator.html b/_modules/airflow/contrib/operators/spark_jdbc_operator.html
new file mode 100644
index 000..5a9ddab
--- /dev/null
+++ b/_modules/airflow/contrib/operators/spark_jdbc_operator.html
@@ -0,0 +1,449 @@

Source code for airflow.contrib.operators.spark_jdbc_operator

+from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator
+from airflow.contrib.hooks.spark_jdbc_hook import SparkJDBCHook
+from airflow.utils.decorators import apply_defaults
+
+
+class SparkJDBCOperator(SparkSubmitOperator):
+    """
+    This operator extends the SparkSubmitOperator specifically for performing data
+    transfers to/from JDBC-based databases with Apache Spark. As with the
+    SparkSubmitOperator, it assumes that the "spark-submit" binary is available on the
+    PATH.
+
+    :param spark_app_name: Name of the job (default airflow-spark-jdbc)
+    :type spark_app_name: str
+    :param spark_conn_id: Connection id as configured in Airflow administration
+    :type spark_conn_id: str
+    :param spark_conf: Any additional Spark configuration properties
+    :type spark_conf: dict
+    :param spark_py_files: Additional python files used (.zip, .egg, or .py)
+    :type spark_py_files: str
+    :param spark_files: Additional files to upload to the container running the job
+    :type spark_files: str
+    :param spark_jars: Additional jars to upload and add to the driver and
+        executor classpath
+    :type spark_jars: str
+    :param num_executors: Number of executors to run. This should be set so as to manage
+        the number of connections made with the JDBC database
+    :type num_executors: int
+    :param executor_cores: Number of cores per executor
+    :type executor_cores: int
+    :param executor_memory: Memory per executor (e.g. 1000M, 2G)
+    :type executor_memory: str
+    :param driver_memory: Memory allocated to the driver (e.g. 1000M, 2G)
+    :type driver_memory: str
+    :param verbose: Whether to pass the verbose flag to spark-submit for debugging
+    :type verbose: bool
+    :param keytab: Full path to the file that contains the keytab
+    :type keytab: str
+    :param principal: The name of the kerberos principal used for keytab
+    :type principal: str
+    :param cmd_type: Which way the data should flow. 2 possible values:
+        spark_to_jdbc: data written by spark from metastore to jdbc
+        jdbc_to_spark: data written by spark from jdbc to metastore
+    :type cmd_type: str
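A minimal usage sketch of the operator above (not part of the commit). Parameters beyond those visible in this excerpt (jdbc_table, jdbc_conn_id, jdbc_driver, metastore_table) are assumptions, and all values are hypothetical:

from airflow.contrib.operators.spark_jdbc_operator import SparkJDBCOperator

jdbc_to_spark = SparkJDBCOperator(
    task_id='jdbc_to_spark',
    cmd_type='jdbc_to_spark',             # see the cmd_type doc above
    spark_conn_id='spark_default',
    jdbc_conn_id='postgres_default',      # assumed parameter name
    jdbc_driver='org.postgresql.Driver',  # assumed parameter name
    jdbc_table='public.users',            # assumed parameter name
    metastore_table='users_copy',         # assumed parameter name
    num_executors=2,
    dag=dag)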
[26/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/operators/pubsub_operator.html
--
diff --git a/_modules/airflow/contrib/operators/pubsub_operator.html b/_modules/airflow/contrib/operators/pubsub_operator.html
new file mode 100644
index 000..99a540f
--- /dev/null
+++ b/_modules/airflow/contrib/operators/pubsub_operator.html
@@ -0,0 +1,669 @@

Source code for airflow.contrib.operators.pubsub_operator

+from airflow.contrib.hooks.gcp_pubsub_hook import PubSubHook
+from airflow.models import BaseOperator
+from airflow.utils.decorators import apply_defaults
+
+
+class PubSubTopicCreateOperator(BaseOperator):
+    """Create a PubSub topic.
+
+    By default, if the topic already exists, this operator will
+    not cause the DAG to fail. ::
+
+        with DAG('successful DAG') as dag:
+            (
+                dag
+                >> PubSubTopicCreateOperator(project='my-project',
+                                             topic='my_new_topic')
+                >> PubSubTopicCreateOperator(project='my-project',
+                                             topic='my_new_topic')
+            )
+
+    The operator can be configured to fail if the topic already exists. ::
+
+        with DAG('failing DAG') as dag:
+            (
+                dag
+                >> PubSubTopicCreateOperator(project='my-project',
+                                             topic='my_new_topic')
+                >> PubSubTopicCreateOperator(project='my-project',
+                                             topic='my_new_topic',
+                                             fail_if_exists=True)
+            )
+
+    Both ``project`` and ``topic`` are templated so you can use
+    variables in them.
+    """
+    template_fields = ['project', 'topic']
+    ui_color = '#0273d4'
+
+    @apply_defaults
+    def __init__(
+            self,
+            project,
+            topic,
+            fail_if_exists=False,
+            gcp_conn_id='google_cloud_default',
+            delegate_to=None,
+            *args,
+            **kwargs):
+        """
+        :param project: the GCP project ID where the topic will be created
+        :type project: string
+        :param topic: the topic to create. Do not include the
+            full topic path. In other words, instead of
+            ``projects/{project}/topics/{topic}``, provide only
+            ``{topic}``. (templated)
+        :type topic: string
+        :param gcp_conn_id: The connection ID to use connecting to
+            Google Cloud Platform.
+        :type gcp_conn_id: string
+        :param delegate_to: The account to impersonate, if any.
+            For this to work, the service account making the request
[39/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/hooks/spark_submit_hook.html
--
diff --git a/_modules/airflow/contrib/hooks/spark_submit_hook.html b/_modules/airflow/contrib/hooks/spark_submit_hook.html
new file mode 100644
index 000..6903f5f
--- /dev/null
+++ b/_modules/airflow/contrib/hooks/spark_submit_hook.html
@@ -0,0 +1,799 @@

Source code for airflow.contrib.hooks.spark_submit_hook

+import os
+import subprocess
+import re
+import time
+
+from airflow.hooks.base_hook import BaseHook
+from airflow.exceptions import AirflowException
+from airflow.utils.log.logging_mixin import LoggingMixin
+from airflow.contrib.kubernetes import kube_client
+from kubernetes.client.rest import ApiException
+
+
+class SparkSubmitHook(BaseHook, LoggingMixin):
+    """
+    This hook is a wrapper around the spark-submit binary to kick off a
+    spark-submit job. It requires that the "spark-submit" binary is in the
+    PATH or that spark_home is supplied.
+
+    :param conf: Arbitrary Spark configuration properties
+    :type conf: dict
+    :param conn_id: The connection id as configured in Airflow administration. When an
+        invalid connection_id is supplied, it will default to yarn.
+    :type conn_id: str
+    :param files: Upload additional files to the executor running the job, separated by a
+        comma. Files will be placed in the working directory of each executor.
+        For example, serialized objects.
+    :type files: str
+    :param py_files: Additional python files used by the job, can be .zip, .egg or .py.
+    :type py_files: str
+    :param driver_classpath: Additional, driver-specific, classpath settings.
+    :type driver_classpath: str
+    :param jars: Submit additional jars to upload and place them in executor classpath.
+    :type jars: str
+    :param java_class: the main class of the Java application
+    :type java_class: str
+    :param packages: Comma-separated list of maven coordinates of jars to include on the
+        driver and executor classpaths
+    :type packages: str
+    :param exclude_packages: Comma-separated list of maven coordinates of jars to exclude
+        while resolving the dependencies provided in packages
+    :type exclude_packages: str
+    :param repositories: Comma-separated list of additional remote repositories to search
+        for the maven coordinates given with packages
+    :type repositories: str
+    :param total_executor_cores: (Standalone & Mesos only) Total cores for all executors
+        (Default: all the available cores on the worker)
+    :type total_executor_cores: int
+    :param executor_cores: (Standalone, YARN and Kubernetes only) Number of
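A minimal usage sketch of the hook above (not part of the commit); the submit() call is not visible in this excerpt, so its signature is an assumption, and the application path is hypothetical:

from airflow.contrib.hooks.spark_submit_hook import SparkSubmitHook

hook = SparkSubmitHook(
    conn_id='spark_default',
    executor_cores=2,
    executor_memory='2G')
hook.submit(application='/path/to/app.py')  # assumed: blocks while spark-submit runs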
[16/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/hooks/hdfs_hook.html
--
diff --git a/_modules/airflow/hooks/hdfs_hook.html b/_modules/airflow/hooks/hdfs_hook.html
new file mode 100644
index 000..664c043
--- /dev/null
+++ b/_modules/airflow/hooks/hdfs_hook.html
@@ -0,0 +1,335 @@

Source code for airflow.hooks.hdfs_hook

+from airflow.hooks.base_hook import BaseHook
+from airflow import configuration
+
+try:
+    snakebite_imported = True
+    from snakebite.client import Client, HAClient, Namenode, AutoConfigClient
+except ImportError:
+    snakebite_imported = False
+
+from airflow.exceptions import AirflowException
+
+
+class HDFSHookException(AirflowException):
+    pass
+
+
+class HDFSHook(BaseHook):
+    """
+    Interact with HDFS. This class is a wrapper around the snakebite library.
+
+    :param hdfs_conn_id: Connection id to fetch connection info
+    :type hdfs_conn_id: string
+    :param proxy_user: effective user for HDFS operations
+    :type proxy_user: string
+    :param autoconfig: use snakebite's automatically configured client
+    :type autoconfig: bool
+    """
+    def __init__(self, hdfs_conn_id='hdfs_default', proxy_user=None,
+                 autoconfig=False):
+        if not snakebite_imported:
+            raise ImportError(
+                'This HDFSHook implementation requires snakebite, but '
+                'snakebite is not compatible with Python 3 '
+                '(as of August 2015). Please use Python 2 if you require '
+                'this hook -- or help by submitting a PR!')
+        self.hdfs_conn_id = hdfs_conn_id
+        self.proxy_user = proxy_user
+        self.autoconfig = autoconfig
+
+    def get_conn(self):
+        """
+        Returns a snakebite HDFSClient object.
+        """
+        # When using HAClient, proxy_user must be the same, so it is ok to
+        # always take the first.
+        effective_user = self.proxy_user
+        autoconfig = self.autoconfig
+        use_sasl = configuration.conf.get('core', 'security') == 'kerberos'
+
+        try:
+            connections = self.get_connections(self.hdfs_conn_id)
+
+            if not effective_user:
+                effective_user = connections[0].login
+            if not autoconfig:
+                autoconfig = connections[0].extra_dejson.get('autoconfig',
+                                                             False)
+            hdfs_namenode_principal = connections[0].extra_dejson.get(
+                'hdfs_namenode_principal')
+        except AirflowException:
+            if not autoconfig:
+                raise
+
+        if autoconfig:
+            # will read config info from $HADOOP_HOME conf files
+            client = AutoConfigClient(effective_user=effective_user,
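A minimal usage sketch of the hook above (not part of the commit); the path is hypothetical and the client API is snakebite's (Python 2 only, per the ImportError above):

from airflow.hooks.hdfs_hook import HDFSHook

hook = HDFSHook(hdfs_conn_id='hdfs_default')
client = hook.get_conn()                         # a snakebite client
print([f['path'] for f in client.ls(['/tmp'])])  # snakebite's ls takes a list of paths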
[34/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/operators/databricks_operator.html
--
diff --git a/_modules/airflow/contrib/operators/databricks_operator.html b/_modules/airflow/contrib/operators/databricks_operator.html
index cb38349..c39f769 100644
--- a/_modules/airflow/contrib/operators/databricks_operator.html
+++ b/_modules/airflow/contrib/operators/databricks_operator.html
@@ -169,17 +171,22 @@
 Source code for airflow.contrib.operators.databricks_operator
 # -*- coding: utf-8 -*-
 #
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
 #
@@ -190,6 +197,10 @@
 from airflow.models import BaseOperator
 
 
+XCOM_RUN_ID_KEY = 'run_id'
+XCOM_RUN_PAGE_URL_KEY = 'run_page_url'
+
+
 class DatabricksSubmitRunOperator(BaseOperator):
     """
     Submits a Spark job run to Databricks using the
@@ -307,6 +318,8 @@
     :param databricks_retry_limit: Amount of times retry if the Databricks backend is
         unreachable. Its value must be greater than or equal to 1.
     :type databricks_retry_limit: int
+    :param do_xcom_push: Whether we should push run_id and run_page_url to xcom.
+    :type do_xcom_push: boolean
@@ -327,6 +340,7 @@
             databricks_conn_id='databricks_default',
             polling_period_seconds=30,
             databricks_retry_limit=3,
+            do_xcom_push=False,
             **kwargs):
         """
         Creates a new ``DatabricksSubmitRunOperator``.
@@ -356,6 +370,7 @@
         self.json = self._deep_string_coerce(self.json)
         # This variable will be used in case our task gets killed.
         self.run_id = None
+        self.do_xcom_push = do_xcom_push
 
     def _deep_string_coerce(self, content, json_path='json'):
@@ -393,8 +408,12 @@
     def execute(self, context):
         hook = self.get_hook()
         self.run_id = hook.submit_run(self.json)
-        run_page_url = hook.get_run_page_url(self.run_id)
+        if self.do_xcom_push:
+            context['ti'].xcom_push(key=XCOM_RUN_ID_KEY, value=self.run_id)
         self.log.info('Run submitted with run_id: %s', self.run_id)
+        run_page_url = hook.get_run_page_url(self.run_id)
+        if self.do_xcom_push:
+            context['ti'].xcom_push(key=XCOM_RUN_PAGE_URL_KEY, value=run_page_url)
         self._log_run_page_url(run_page_url)
         while True:
             run_state = hook.get_run_state(self.run_id)

http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/operators/dataflow_operator.html
--
diff --git a/_modules/airflow/contrib/operators/dataflow_operator.html b/_modules/airflow/contrib/operators/dataflow_operator.html
index 6807e24..d5d4041 100644
--- a/_modules/airflow/contrib/operators/dataflow_operator.html
+++ b/_modules/airflow/contrib/operators/dataflow_operator.html
@@ -169,25 +171,31 @@
 Source code for
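Given the do_xcom_push change to DatabricksSubmitRunOperator above, a downstream task can read the pushed values; a sketch (the task_id 'submit_run' is hypothetical):

def report_run(**context):
    ti = context['ti']
    run_id = ti.xcom_pull(task_ids='submit_run', key='run_id')              # XCOM_RUN_ID_KEY
    run_page_url = ti.xcom_pull(task_ids='submit_run', key='run_page_url')  # XCOM_RUN_PAGE_URL_KEY
    print(run_id, run_page_url)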
[05/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/operators/slack_operator.html
--
diff --git a/_modules/airflow/operators/slack_operator.html b/_modules/airflow/operators/slack_operator.html
new file mode 100644
index 000..40152da
--- /dev/null
+++ b/_modules/airflow/operators/slack_operator.html
@@ -0,0 +1,366 @@

Source code for airflow.operators.slack_operator

+import json
+
+from airflow.models import BaseOperator
+from airflow.utils.decorators import apply_defaults
+from airflow.hooks.slack_hook import SlackHook
+from airflow.exceptions import AirflowException
+
+
+class SlackAPIOperator(BaseOperator):
+    """
+    Base Slack Operator.
+    The SlackAPIPostOperator is derived from this operator.
+    In the future, additional Slack API Operators will be derived from this class as well.
+
+    :param slack_conn_id: Slack connection ID whose password is the Slack API token
+    :type slack_conn_id: string
+    :param token: Slack API token (https://api.slack.com/web)
+    :type token: string
+    :param method: The Slack API Method to Call (https://api.slack.com/methods)
+    :type method: string
+    :param api_params: API Method call parameters (https://api.slack.com/methods)
+    :type api_params: dict
+    """
+
+    @apply_defaults
+    def __init__(self,
+                 slack_conn_id=None,
+                 token=None,
+                 method=None,
+                 api_params=None,
+                 *args, **kwargs):
+        super(SlackAPIOperator, self).__init__(*args, **kwargs)
+
+        if token is None and slack_conn_id is None:
+            raise AirflowException('No valid Slack token nor slack_conn_id supplied.')
+        if token is not None and slack_conn_id is not None:
+            raise AirflowException('Cannot determine Slack credential '
+                                   'when both token and slack_conn_id are supplied.')
+
+        self.token = token
+        self.slack_conn_id = slack_conn_id
+
+        self.method = method
+        self.api_params = api_params
+
+    def construct_api_call_params(self):
+        """
+        Used by the execute function. Allows templating on the source fields
+        of the api_call_params dict before construction.
+
+        Override in child classes.
+        Each SlackAPIOperator child class is responsible for having a
+        construct_api_call_params function which sets self.api_call_params with
+        a dict of API call parameters (https://api.slack.com/methods)
+        """
+        pass
+
+    def execute(self, **kwargs):
+        """
+        SlackAPIOperator calls will not fail even if the call is unsuccessful.
+        It should not prevent a DAG from completing in success.
+        """
+        if not self.api_params:
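A minimal usage sketch of the SlackAPIPostOperator mentioned above (its constructor is not in this excerpt, so channel/text are assumed parameter names; the connection is hypothetical):

from airflow.operators.slack_operator import SlackAPIPostOperator

post_message = SlackAPIPostOperator(
    task_id='post_message',
    slack_conn_id='slack_default',  # connection whose password is the Slack API token
    channel='#airflow',             # assumed parameter name
    text='DAG finished',            # assumed parameter name
    dag=dag)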
[06/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/operators/sensors.html
--
diff --git a/_modules/airflow/operators/sensors.html b/_modules/airflow/operators/sensors.html
deleted file mode 100644
index 7414daf..000
--- a/_modules/airflow/operators/sensors.html
+++ /dev/null
@@ -1,929 +0,0 @@

Source code for airflow.operators.sensors (removed in this commit)

-from __future__ import print_function
-from future import standard_library
-
-from airflow.utils.log.logging_mixin import LoggingMixin
-
-standard_library.install_aliases()
-from builtins import str
-from past.builtins import basestring
-
-from datetime import datetime
-from urllib.parse import urlparse
-from time import sleep
-import re
-import sys
-
-from airflow import settings
-from airflow.exceptions import AirflowException, AirflowSensorTimeout, AirflowSkipException
-from airflow.models import BaseOperator, TaskInstance
-from airflow.hooks.base_hook import BaseHook
-from airflow.hooks.hdfs_hook import HDFSHook
-from airflow.hooks.http_hook import HttpHook
-from airflow.utils.state import State
-from airflow.utils.decorators import apply_defaults
-
-
-class BaseSensorOperator(BaseOperator):
-    """
-    Sensor operators are derived from this class and inherit these attributes.
-
-    Sensor operators keep executing at a time interval and succeed when
-    a criteria is met, and fail if and when they time out.
-
-    :param soft_fail: Set to true to mark the task as SKIPPED on failure
-    :type soft_fail: bool
-    :param poke_interval: Time in seconds that the job should wait in
-        between each try
-    :type poke_interval: int
-    :param timeout: Time, in seconds before the task times out and fails.
-    :type timeout: int
-    """
-    ui_color = '#e6f1f2'
-
-    @apply_defaults
-    def __init__(
-            self,
-            poke_interval=60,
-            timeout=60*60*24*7,
-            soft_fail=False,
-            *args, **kwargs):
-        super(BaseSensorOperator, self).__init__(*args, **kwargs)
-        self.poke_interval = poke_interval
-        self.soft_fail = soft_fail
-        self.timeout = timeout
-
-    def poke(self, context):
-        """
-        Function that the sensors defined while deriving this class should
-        override.
-        """
-        raise AirflowException('Override me.')
-
-    def execute(self, context):
-        started_at = datetime.utcnow()
-        while not self.poke(context):
-            if (datetime.utcnow() - started_at).total_seconds() > self.timeout:
-                if self.soft_fail:
-                    raise AirflowSkipException('Snap. Time is OUT.')
-                else:
-                    raise AirflowSensorTimeout('Snap. Time is OUT.')
-            sleep(self.poke_interval)
-        self.log.info('Success criteria met. Exiting.')
-
-
-class SqlSensor(BaseSensorOperator):
-    """
-    Runs a sql statement until a criteria is met. It will keep trying while
-    sql returns no row, or if the first cell is in (0, '0', '').
-
-    :param conn_id: The connection to
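The poke()/execute() loop above is the extension point for custom sensors; a minimal sketch against the 1.10 import path (airflow.sensors.base_sensor_operator, which replaces the module deleted here), with a hypothetical file path:

import os
from airflow.sensors.base_sensor_operator import BaseSensorOperator
from airflow.utils.decorators import apply_defaults

class LocalFileSensor(BaseSensorOperator):
    """Succeeds once a local file exists (illustrative only)."""

    @apply_defaults
    def __init__(self, filepath, *args, **kwargs):
        super(LocalFileSensor, self).__init__(*args, **kwargs)
        self.filepath = filepath

    def poke(self, context):
        # Called every poke_interval seconds until it returns True or timeout is hit
        return os.path.exists(self.filepath)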
[02/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/sensors/web_hdfs_sensor.html
--
diff --git a/_modules/airflow/sensors/web_hdfs_sensor.html b/_modules/airflow/sensors/web_hdfs_sensor.html
new file mode 100644
index 000..32711ea
--- /dev/null
+++ b/_modules/airflow/sensors/web_hdfs_sensor.html
@@ -0,0 +1,279 @@

Source code for airflow.sensors.web_hdfs_sensor

+from airflow.sensors.base_sensor_operator import BaseSensorOperator
+from airflow.utils.decorators import apply_defaults
+
+
+class WebHdfsSensor(BaseSensorOperator):
+    """
+    Waits for a file or folder to land in HDFS
+    """
+    template_fields = ('filepath',)
+
+    @apply_defaults
+    def __init__(self,
+                 filepath,
+                 webhdfs_conn_id='webhdfs_default',
+                 *args,
+                 **kwargs):
+        super(WebHdfsSensor, self).__init__(*args, **kwargs)
+        self.filepath = filepath
+        self.webhdfs_conn_id = webhdfs_conn_id
+
+    def poke(self, context):
+        from airflow.hooks.webhdfs_hook import WebHDFSHook
+        c = WebHDFSHook(self.webhdfs_conn_id)
+        self.log.info('Poking for file {self.filepath}'.format(**locals()))
+        return c.check_for_path(hdfs_path=self.filepath)
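A minimal usage sketch of the sensor above (not part of the commit; the path and `dag` are hypothetical):

from airflow.sensors.web_hdfs_sensor import WebHdfsSensor

wait_for_success = WebHdfsSensor(
    task_id='wait_for_success',
    filepath='/data/input/_SUCCESS',
    webhdfs_conn_id='webhdfs_default',
    poke_interval=120,   # inherited from BaseSensorOperator
    dag=dag)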
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/bash_operator.html
--
diff --git a/_modules/bash_operator.html b/_modules/bash_operator.html
deleted file mode 100644
index 3b6f9ef..000
--- a/_modules/bash_operator.html
+++ /dev/null
[28/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/operators/mlengine_operator.html
--
diff --git a/_modules/airflow/contrib/operators/mlengine_operator.html b/_modules/airflow/contrib/operators/mlengine_operator.html
index b322476..c7743c2 100644
--- a/_modules/airflow/contrib/operators/mlengine_operator.html
+++ b/_modules/airflow/contrib/operators/mlengine_operator.html
@@ -184,77 +186,17 @@
 # limitations under the License.
 
 import re
-from airflow import settings
+from apiclient import errors
+
 from airflow.contrib.hooks.gcp_mlengine_hook import MLEngineHook
 from airflow.exceptions import AirflowException
 from airflow.operators import BaseOperator
 from airflow.utils.decorators import apply_defaults
-from apiclient import errors
-
 from airflow.utils.log.logging_mixin import LoggingMixin
 
 log = LoggingMixin().log
 
 
-def _create_prediction_input(project_id,
-                             region,
-                             data_format,
-                             input_paths,
-                             output_path,
-                             model_name=None,
-                             version_name=None,
-                             uri=None,
-                             max_worker_count=None,
-                             runtime_version=None):
-    """
-    Create the batch prediction input from the given parameters.
-
-    Args:
-        A subset of arguments documented in __init__ method of class
-        MLEngineBatchPredictionOperator
-
-    Returns:
-        A dictionary representing the predictionInput object as documented
-        in https://cloud.google.com/ml-engine/reference/rest/v1/projects.jobs.
-
-    Raises:
-        ValueError: if a unique model/version origin cannot be determined.
-    """
-    prediction_input = {
-        'dataFormat': data_format,
-        'inputPaths': input_paths,
-        'outputPath': output_path,
-        'region': region
-    }
-
-    if uri:
-        if model_name or version_name:
-            log.error(
-                'Ambiguous model origin: Both uri and model/version name are provided.'
-            )
-            raise ValueError('Ambiguous model origin.')
-        prediction_input['uri'] = uri
-    elif model_name:
-        origin_name = 'projects/{}/models/{}'.format(project_id, model_name)
-        if not version_name:
-            prediction_input['modelName'] = origin_name
-        else:
-            prediction_input['versionName'] = \
-                origin_name + '/versions/{}'.format(version_name)
-    else:
-        log.error(
-            'Missing model origin: Batch prediction expects a model, '
-            'a model & version combination, or a URI to a savedModel.')
-        raise ValueError('Missing model origin.')
-
-    if max_worker_count:
-        prediction_input['maxWorkerCount'] = max_worker_count
-    if runtime_version:
-        prediction_input['runtimeVersion'] = runtime_version
-
-    return prediction_input
-
-
 def _normalize_mlengine_job_id(job_id):
     """
     Replaces invalid MLEngine job_id characters with '_'.
@@ -268,10 +210,27 @@
     Returns:
         A valid job_id representation.
     """
-    match = re.search(r'\d', job_id)
+
+    # Add a prefix when a job_id starts with a digit or a template
+    match = re.search(r'\d|\{{2}', job_id)
     if match and match.start() is 0:
-        job_id = 'z_{}'.format(job_id)
-    return re.sub('[^0-9a-zA-Z]+', '_', job_id)
+        job = 'z_{}'.format(job_id)
+    else:
+        job = job_id
+
+    # Clean up bad characters except templates
+    tracker = 0
+    cleansed_job_id = ''
+    for m in re.finditer(r'\{{2}.+?\}{2}', job):
+        cleansed_job_id += re.sub(r'[^0-9a-zA-Z]+', '_',
+                                  job[tracker:m.start()])
+        cleansed_job_id += job[m.start():m.end()]
+        tracker = m.end()
+
+    # Clean up last substring or the full string if no templates
+    cleansed_job_id += re.sub(r'[^0-9a-zA-Z]+', '_', job[tracker:])
+
+    return cleansed_job_id
 
 
 class MLEngineBatchPredictionOperator(BaseOperator):
@@ -289,14 +248,20 @@
     In options 2 and 3, both model and version name should contain the
     minimal identifier. For instance, call
+
+    ::
+
         MLEngineBatchPredictionOperator(
             ...,
             model_name='my_model',
             version_name='my_version',
             ...)
+
     if the desired model version is
     "projects/my_project/models/my_model/versions/my_version".
 
+    See https://cloud.google.com/ml-engine/reference/rest/v1/projects.jobs
+    for further documentation on the parameters.
+
     :param project_id: The Google Cloud project
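Expected behavior of the new _normalize_mlengine_job_id, per the hunk above (inputs are illustrative):

_normalize_mlengine_job_id('01-batch.predict')   # -> 'z_01_batch_predict' (prefixed, punctuation collapsed)
_normalize_mlengine_job_id('predict_{{ ds }}')   # -> 'predict_{{ ds }}'   (the template span is kept intact)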
[03/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/sensors/named_hive_partition_sensor.html
--
diff --git a/_modules/airflow/sensors/named_hive_partition_sensor.html b/_modules/airflow/sensors/named_hive_partition_sensor.html
new file mode 100644
index 000..b199984
--- /dev/null
+++ b/_modules/airflow/sensors/named_hive_partition_sensor.html
@@ -0,0 +1,339 @@

Source code for airflow.sensors.named_hive_partition_sensor

+from past.builtins import basestring
+
+from airflow.sensors.base_sensor_operator import BaseSensorOperator
+from airflow.utils.decorators import apply_defaults
+
+
+class NamedHivePartitionSensor(BaseSensorOperator):
+    """
+    Waits for a set of partitions to show up in Hive.
+
+    :param partition_names: List of fully qualified names of the
+        partitions to wait for. A fully qualified name is of the
+        form ``schema.table/pk1=pv1/pk2=pv2``, for example,
+        default.users/ds=2016-01-01. This is passed as is to the metastore
+        Thrift client ``get_partitions_by_name`` method. Note that
+        you cannot use logical or comparison operators as in
+        HivePartitionSensor.
+    :type partition_names: list of strings
+    :param metastore_conn_id: reference to the metastore thrift service
+        connection id
+    :type metastore_conn_id: str
+    """
+
+    template_fields = ('partition_names',)
+    ui_color = '#8d99ae'
+
+    @apply_defaults
+    def __init__(self,
+                 partition_names,
+                 metastore_conn_id='metastore_default',
+                 poke_interval=60 * 3,
+                 hook=None,
+                 *args,
+                 **kwargs):
+        super(NamedHivePartitionSensor, self).__init__(
+            poke_interval=poke_interval, *args, **kwargs)
+
+        if isinstance(partition_names, basestring):
+            raise TypeError('partition_names must be an array of strings')
+
+        self.metastore_conn_id = metastore_conn_id
+        self.partition_names = partition_names
+        self.hook = hook
+        if self.hook and metastore_conn_id != 'metastore_default':
+            self.log.warning('A hook was passed but a non default '
+                             'metastore_conn_id='
+                             '{} was used'.format(metastore_conn_id))
+
+    @staticmethod
+    def parse_partition_name(partition):
+        first_split = partition.split('.', 1)
+        if len(first_split) == 1:
+            schema = 'default'
+            table_partition = max(first_split)  # poor man first
+        else:
+            schema, table_partition = first_split
+        second_split = table_partition.split('/', 1)
+        if len(second_split) == 1:
+            raise
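A minimal usage sketch of the sensor above (not part of the commit); partition_names is templated, and the table and `dag` are hypothetical:

from airflow.sensors.named_hive_partition_sensor import NamedHivePartitionSensor

wait_for_partition = NamedHivePartitionSensor(
    task_id='wait_for_partition',
    partition_names=['default.users/ds={{ ds }}'],
    metastore_conn_id='metastore_default',
    dag=dag)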
[15/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/hooks/oracle_hook.html
--
diff --git a/_modules/airflow/hooks/oracle_hook.html b/_modules/airflow/hooks/oracle_hook.html
new file mode 100644
index 000..3683348
--- /dev/null
+++ b/_modules/airflow/hooks/oracle_hook.html
@@ -0,0 +1,382 @@

Source code for airflow.hooks.oracle_hook

+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import cx_Oracle
+
+from airflow.hooks.dbapi_hook import DbApiHook
+from builtins import str
+from past.builtins import basestring
+from datetime import datetime
+import numpy
+
+
+class OracleHook(DbApiHook):
+    """
+    Interact with Oracle SQL.
+    """
+    conn_name_attr = 'oracle_conn_id'
+    default_conn_name = 'oracle_default'
+    supports_autocommit = False
+
+    def get_conn(self):
+        """
+        Returns an Oracle connection object.
+        Optional parameters for using a custom DSN connection (instead of using
+        a server alias from tnsnames.ora). The dsn (data source name) is the TNS
+        entry (from the Oracle names server or tnsnames.ora file) or is a string
+        like the one returned from makedsn().
+
+        :param dsn: the host address for the Oracle server
+        :param service_name: the db_unique_name of the database that you are
+            connecting to (CONNECT_DATA part of TNS)
+
+        You can set these parameters in the extra fields of your connection
+        as in ``{ "dsn":"some.host.address", "service_name":"some.service.name" }``
+        """
+        conn = self.get_connection(self.oracle_conn_id)
+        dsn = conn.extra_dejson.get('dsn', None)
+        sid = conn.extra_dejson.get('sid', None)
+        mod = conn.extra_dejson.get('module', None)
+
+        service_name = conn.extra_dejson.get('service_name', None)
+        if dsn and sid and not service_name:
+            dsn = cx_Oracle.makedsn(dsn, conn.port, sid)
+            conn = cx_Oracle.connect(conn.login, conn.password, dsn=dsn)
+        elif dsn and service_name and not sid:
+            dsn = cx_Oracle.makedsn(dsn, conn.port, service_name=service_name)
+            conn = cx_Oracle.connect(conn.login, conn.password, dsn=dsn)
+        else:
+            conn = cx_Oracle.connect(conn.login, conn.password, conn.host)
+
+        if mod is not None:
+            conn.module = mod
+
+        return conn
+
+    def insert_rows(self, table, rows, target_fields=None, commit_every=1000):
+        """
+        A generic way to insert a set of tuples into a table;
+        the whole set of inserts is treated as one transaction.
+        Changes from standard DbApiHook implementation:
+        - Oracle SQL queries in cx_Oracle can not be terminated with a semicolon (;)
+        - Replace NaN values with NULL using numpy.nan_to_num (not using is_nan() because of input types
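The dsn / sid / service_name branches in `get_conn` above are driven entirely by the connection's `extra` JSON. A hedged sketch of a connection definition that would take the `service_name` branch; every value here is a placeholder, not a real endpoint:

```python
from airflow.models import Connection

# Hypothetical Oracle connection; host, credentials, and service name are made up.
conn = Connection(
    conn_id='oracle_default',
    conn_type='oracle',
    host='some.host.address',
    login='scott',
    password='tiger',
    port=1521,
    # dsn + service_name (and no sid) selects the second branch of get_conn(),
    # which builds the DSN via:
    #   cx_Oracle.makedsn('some.host.address', 1521, service_name='some.service.name')
    extra='{"dsn": "some.host.address", "service_name": "some.service.name"}',
)
```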
[22/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/sensors/aws_redshift_cluster_sensor.html
--
diff --git a/_modules/airflow/contrib/sensors/aws_redshift_cluster_sensor.html b/_modules/airflow/contrib/sensors/aws_redshift_cluster_sensor.html
new file mode 100644
index 000..16a47e3
--- /dev/null
+++ b/_modules/airflow/contrib/sensors/aws_redshift_cluster_sensor.html
@@ -0,0 +1,287 @@

Source code for airflow.contrib.sensors.aws_redshift_cluster_sensor

+# -*- coding: utf-8 -*-
+#
+# (Apache License, Version 2.0 header, identical to the one in oracle_hook above)
+
+from airflow.contrib.hooks.redshift_hook import RedshiftHook
+from airflow.sensors.base_sensor_operator import BaseSensorOperator
+from airflow.utils.decorators import apply_defaults
+
+
+class AwsRedshiftClusterSensor(BaseSensorOperator):
+    """
+    Waits for a Redshift cluster to reach a specific status.
+
+    :param cluster_identifier: The identifier for the cluster being pinged.
+    :type cluster_identifier: str
+    :param target_status: The cluster status desired.
+    :type target_status: str
+    """
+    template_fields = ('cluster_identifier', 'target_status')
+
+    @apply_defaults
+    def __init__(self,
+                 cluster_identifier,
+                 target_status='available',
+                 aws_conn_id='aws_default',
+                 *args,
+                 **kwargs):
+        super(AwsRedshiftClusterSensor, self).__init__(*args, **kwargs)
+        self.cluster_identifier = cluster_identifier
+        self.target_status = target_status
+        self.aws_conn_id = aws_conn_id
+
+    def poke(self, context):
+        self.log.info('Poking for status : {self.target_status}\n'
+                      'for cluster {self.cluster_identifier}'.format(**locals()))
+        hook = RedshiftHook(aws_conn_id=self.aws_conn_id)
+        return hook.cluster_status(self.cluster_identifier) == self.target_status
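A hedged usage sketch of the sensor above inside a DAG; the DAG id, cluster identifier, and timings are placeholders, not values from the source:

```python
from datetime import datetime

from airflow import DAG
from airflow.contrib.sensors.aws_redshift_cluster_sensor import AwsRedshiftClusterSensor

# Illustrative DAG that waits until a (hypothetical) cluster is available.
dag = DAG(dag_id='example_redshift_wait', start_date=datetime(2018, 8, 1),
          schedule_interval=None)

wait_for_cluster = AwsRedshiftClusterSensor(
    task_id='wait_for_cluster',
    cluster_identifier='my-redshift-cluster',  # placeholder
    target_status='available',
    poke_interval=60,   # poke_interval and timeout come from BaseSensorOperator
    timeout=60 * 60,    # give up after an hour
    dag=dag,
)
```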
[27/51] [partial] incubator-airflow-site git commit: 1.10.0
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/11437c14/_modules/airflow/contrib/operators/mysql_to_gcs.html
--
diff --git a/_modules/airflow/contrib/operators/mysql_to_gcs.html b/_modules/airflow/contrib/operators/mysql_to_gcs.html
new file mode 100644
index 000..47346f5
--- /dev/null
+++ b/_modules/airflow/contrib/operators/mysql_to_gcs.html
@@ -0,0 +1,524 @@

Source code for airflow.contrib.operators.mysql_to_gcs

+# -*- coding: utf-8 -*-
+#
+# (Apache License, Version 2.0 header, identical to the one in oracle_hook above)
+
+import sys
+import json
+import time
+import base64
+
+from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook
+from airflow.hooks.mysql_hook import MySqlHook
+from airflow.models import BaseOperator
+from airflow.utils.decorators import apply_defaults
+from datetime import date, datetime
+from decimal import Decimal
+from MySQLdb.constants import FIELD_TYPE
+from tempfile import NamedTemporaryFile
+from six import string_types
+
+PY3 = sys.version_info[0] == 3
+
+
+class MySqlToGoogleCloudStorageOperator(BaseOperator):
+    """
+    Copy data from MySQL to Google cloud storage in JSON format.
+    """
+    template_fields = ('sql', 'bucket', 'filename', 'schema_filename', 'schema')
+    template_ext = ('.sql',)
+    ui_color = '#a0e08c'
+
+    @apply_defaults
+    def __init__(self,
+                 sql,
+                 bucket,
+                 filename,
+                 schema_filename=None,
+                 approx_max_file_size_bytes=1900000000,
+                 mysql_conn_id='mysql_default',
+                 google_cloud_storage_conn_id='google_cloud_default',
+                 schema=None,
+                 delegate_to=None,
+                 *args,
+                 **kwargs):
+        """
+        :param sql: The SQL to execute on the MySQL table.
+        :type sql: string
+        :param bucket: The bucket to upload to.
+        :type bucket: string
+        :param filename: The filename to use as the object name when uploading
+            to Google cloud storage. A {} should be specified in the filename
+            to allow the operator to inject file numbers in cases where the
+            file is split due to size.
+        :type filename: string
+        :param schema_filename: If set, the filename to use as the object name
+            when uploading a .json file containing the BigQuery schema fields
+            for the table that was dumped from MySQL.
+        :type schema_filename: string
+        :param approx_max_file_size_bytes: This operator supports the ability
+            to split large table dumps into multiple files (see notes in the
+            filename param docs above). Google cloud storage allows for files
+            to be a maximum of 4GB. This param allows developers to specify the
+            file size of the splits.
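A hedged usage sketch of the operator above; the DAG id, SQL, bucket, and object names are placeholders, not values from the source:

```python
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.mysql_to_gcs import MySqlToGoogleCloudStorageOperator

dag = DAG(dag_id='example_mysql_to_gcs', start_date=datetime(2018, 8, 1),
          schedule_interval='@daily')

export_users = MySqlToGoogleCloudStorageOperator(
    task_id='export_users',
    sql='SELECT * FROM users',
    bucket='my-gcs-bucket',
    # {} is replaced with a file counter when the dump is split into
    # multiple objects because of approx_max_file_size_bytes.
    filename='exports/users/{}.json',
    schema_filename='exports/users/schema.json',
    mysql_conn_id='mysql_default',
    google_cloud_storage_conn_id='google_cloud_default',
    dag=dag,
)
```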
[GitHub] kaxil removed a comment on issue #3811: [AIRFLOW-2961] Refactor tests.BackfillJobTest.test_backfill_examples test
kaxil removed a comment on issue #3811: [AIRFLOW-2961] Refactor tests.BackfillJobTest.test_backfill_examples test URL: https://github.com/apache/incubator-airflow/pull/3811#issuecomment-416267000 @Fokko Hey, hope you had a long weekend, would you be able to upload the 1.10.0 artifacts to PyPi? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] kaxil commented on issue #3811: [AIRFLOW-2961] Refactor tests.BackfillJobTest.test_backfill_examples test
kaxil commented on issue #3811: [AIRFLOW-2961] Refactor tests.BackfillJobTest.test_backfill_examples test URL: https://github.com/apache/incubator-airflow/pull/3811#issuecomment-416267713 @Fokko Let me know if you are busy, I will upload then :) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] kaxil closed pull request #3811: [AIRFLOW-2961] Refactor tests.BackfillJobTest.test_backfill_examples test
kaxil closed pull request #3811: [AIRFLOW-2961] Refactor tests.BackfillJobTest.test_backfill_examples test URL: https://github.com/apache/incubator-airflow/pull/3811 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/tests/jobs.py b/tests/jobs.py index fd3a96a4d8..f9c07b96c9 100644 --- a/tests/jobs.py +++ b/tests/jobs.py @@ -61,6 +61,7 @@ from airflow import configuration configuration.load_test_config() +logger = logging.getLogger(__name__) try: from unittest import mock @@ -194,23 +195,21 @@ def test_backfill_multi_dates(self): def test_backfill_examples(self): """ Test backfilling example dags -""" -# some DAGs really are just examples... but try to make them work! -skip_dags = [ -'example_http_operator', -'example_twitter_dag', -'example_trigger_target_dag', -'example_trigger_controller_dag', # tested above -'test_utils', # sleeps forever -'example_kubernetes_executor', # requires kubernetes cluster -'example_kubernetes_operator' # requires kubernetes cluster -] +Try to backfill some of the example dags. Be carefull, not all dags are suitable +for doing this. For example, a dag that sleeps forever, or does not have a +schedule won't work here since you simply can't backfill them. +""" +include_dags = { +'example_branch_operator', +'example_bash_operator', +'example_skip_dag', +'latest_only' +} -logger = logging.getLogger('BackfillJobTest.test_backfill_examples') dags = [ dag for dag in self.dagbag.dags.values() -if 'example_dags' in dag.full_filepath and dag.dag_id not in skip_dags +if 'example_dags' in dag.full_filepath and dag.dag_id in include_dags ] for dag in dags: @@ -218,6 +217,11 @@ def test_backfill_examples(self): start_date=DEFAULT_DATE, end_date=DEFAULT_DATE) +# Make sure that we have the dags that we want to test available +# in the example_dags folder, if this assertion fails, one of the +# dags in the include_dags array isn't available anymore +self.assertEqual(len(include_dags), len(dags)) + for i, dag in enumerate(sorted(dags, key=lambda d: d.dag_id)): logger.info('*** Running example DAG #{}: {}'.format(i, dag.dag_id)) job = BackfillJob( This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] kaxil commented on issue #3811: [AIRFLOW-2961] Refactor tests.BackfillJobTest.test_backfill_examples test
kaxil commented on issue #3811: [AIRFLOW-2961] Refactor tests.BackfillJobTest.test_backfill_examples test URL: https://github.com/apache/incubator-airflow/pull/3811#issuecomment-416267000 @Fokko Hey, hope you had a long weekend, would you be able to upload the 1.10.0 artifacts to PyPi? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (AIRFLOW-2965) Add CLI command to find the next dag run.
[ https://issues.apache.org/jira/browse/AIRFLOW-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jack updated AIRFLOW-2965:
--
    Description: 
I have a dag with the following properties:
{code:java}
dag = DAG(
    dag_id='mydag',
    default_args=args,
    schedule_interval='0 1 * * *',
    max_active_runs=1,
    catchup=False){code}
This runs great.
Last run is: 2018-08-26 01:00 (start date is 2018-08-27 01:00).

Now it's 2018-08-27 17:55 and I have decided to change my dag to:
{code:java}
dag = DAG(
    dag_id='mydag',
    default_args=args,
    schedule_interval='0 23 * * *',
    max_active_runs=1,
    catchup=False){code}
Now, I have no idea when the next dag run will be. Will it be today at 23:00? I can't be sure when the cycle is complete. I'm not even sure that this change will do what I wish. I'm sure you guys are experts and can answer this question, but most of us wouldn't know.

The scheduler knows when the dag is available for running. All I'm asking is to take that knowledge and create a CLI command to which I give the dag_id and it tells me the next date/hour at which my dag will be runnable.

  was:
I have a dag with the following properties:
{code:java}
dag = DAG(
    dag_id='mydag',
    default_args=args,
    schedule_interval='0 1 * * *',
    max_active_runs=1,
    catchup=False){code}
This runs great.
Last run is: 2018-08-26 01:00 (start date is 2018-08-27 01:00)

Now it's 2018-08-27 17:55 I decided to change my dag to:
{code:java}
dag = DAG(
    dag_id='mydag',
    default_args=args,
    schedule_interval='0 23 * * *',
    max_active_runs=1,
    catchup=False){code}
Now, I have no idea if the dag when will be the next run. Will it be today at 23:00? I can't be sure when the cycle is complete. I'm sure you guys are expert and you can answer this question but most of us wouldn't know.

The scheduler has the knowledge when the dag is available for running. All I'm asking is to take that knowledge and create a CLI command that I will give the dag_id and it will tell me the next date/hour which my dag will be runnable.


> Add CLI command to find the next dag run.
> -
>
>                 Key: AIRFLOW-2965
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2965
>             Project: Apache Airflow
>          Issue Type: Task
>            Reporter: jack
>            Priority: Minor
>             Fix For: 1.10.1
>
> I have a dag with the following properties:
> {code:java}
> dag = DAG(
>     dag_id='mydag',
>     default_args=args,
>     schedule_interval='0 1 * * *',
>     max_active_runs=1,
>     catchup=False){code}
> This runs great.
> Last run is: 2018-08-26 01:00 (start date is 2018-08-27 01:00).
>
> Now it's 2018-08-27 17:55 and I have decided to change my dag to:
> {code:java}
> dag = DAG(
>     dag_id='mydag',
>     default_args=args,
>     schedule_interval='0 23 * * *',
>     max_active_runs=1,
>     catchup=False){code}
> Now, I have no idea when the next dag run will be. Will it be today at
> 23:00? I can't be sure when the cycle is complete. I'm not even sure that
> this change will do what I wish.
> I'm sure you guys are experts and can answer this question, but most of us
> wouldn't know.
>
> The scheduler knows when the dag is available for running. All I'm asking
> is to take that knowledge and create a CLI command to which I give the
> dag_id and it tells me the next date/hour at which my dag will be runnable.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (AIRFLOW-2965) Add CLI command to find the next dag run.
jack created AIRFLOW-2965:
-
             Summary: Add CLI command to find the next dag run.
                 Key: AIRFLOW-2965
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2965
             Project: Apache Airflow
          Issue Type: Task
            Reporter: jack
             Fix For: 1.10.1


I have a dag with the following properties:
{code:java}
dag = DAG(
    dag_id='mydag',
    default_args=args,
    schedule_interval='0 1 * * *',
    max_active_runs=1,
    catchup=False){code}
This runs great.
Last run is: 2018-08-26 01:00 (start date is 2018-08-27 01:00).

Now it's 2018-08-27 17:55 and I have decided to change my dag to:
{code:java}
dag = DAG(
    dag_id='mydag',
    default_args=args,
    schedule_interval='0 23 * * *',
    max_active_runs=1,
    catchup=False){code}
Now, I have no idea when the next dag run will be. Will it be today at 23:00? I can't be sure when the cycle is complete. I'm sure you guys are experts and can answer this question, but most of us wouldn't know.

The scheduler knows when the dag is available for running. All I'm asking is to take that knowledge and create a CLI command to which I give the dag_id and it tells me the next date/hour at which my dag will be runnable.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
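For reference, the computation being requested is essentially what [croniter|https://github.com/kiorky/croniter], the library Airflow's scheduler uses for cron schedules, already provides. A minimal sketch of the raw next-fire-time arithmetic, reusing the dates from the example above:

{code:java}
from datetime import datetime

from croniter import croniter

# After switching schedule_interval to '0 23 * * *' at 2018-08-27 17:55,
# the next cron fire time is:
it = croniter('0 23 * * *', datetime(2018, 8, 27, 17, 55))
print(it.get_next(datetime))  # 2018-08-27 23:00:00
{code}

Note that this is only the cron arithmetic: Airflow triggers a run once the interval it covers has closed, so the run stamped 2018-08-27 23:00 would actually start a day later. A CLI command would still need to combine this with the DAG's start date and latest execution date.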
[GitHub] Fokko commented on a change in pull request #3811: [AIRFLOW-2961] Refactor tests.BackfillJobTest.test_backfill_examples test
Fokko commented on a change in pull request #3811: [AIRFLOW-2961] Refactor tests.BackfillJobTest.test_backfill_examples test URL: https://github.com/apache/incubator-airflow/pull/3811#discussion_r213001968 ## File path: tests/jobs.py ## @@ -194,30 +195,32 @@ def test_backfill_multi_dates(self): def test_backfill_examples(self): """ Test backfilling example dags -""" -# some DAGs really are just examples... but try to make them work! -skip_dags = [ -'example_http_operator', -'example_twitter_dag', -'example_trigger_target_dag', -'example_trigger_controller_dag', # tested above -'test_utils', # sleeps forever -'example_kubernetes_executor', # requires kubernetes cluster -'example_kubernetes_operator' # requires kubernetes cluster -] +Try to backfill some of the example dags. Be carefull, not all dags are suitable +for doing this. For example, a dag that sleeps forever, or does not have a +schedule won't work here since you simply can't backfill them. +""" +include_dags = { +'example_branch_operator', +'example_bash_operator', +'example_skip_dag', +'latest_only' +} -logger = logging.getLogger('BackfillJobTest.test_backfill_examples') dags = [ dag for dag in self.dagbag.dags.values() -if 'example_dags' in dag.full_filepath and dag.dag_id not in skip_dags +if 'example_dags' in dag.full_filepath and dag.dag_id in include_dags ] for dag in dags: dag.clear( start_date=DEFAULT_DATE, end_date=DEFAULT_DATE) +# Make sure that we have all dags the dags that we want to test available Review comment: Thanks, fixed This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] kaxil commented on a change in pull request #3811: [AIRFLOW-2961] Refactor tests.BackfillJobTest.test_backfill_examples test
kaxil commented on a change in pull request #3811: [AIRFLOW-2961] Refactor tests.BackfillJobTest.test_backfill_examples test URL: https://github.com/apache/incubator-airflow/pull/3811#discussion_r212984027 ## File path: tests/jobs.py ## @@ -194,30 +195,32 @@ def test_backfill_multi_dates(self): def test_backfill_examples(self): """ Test backfilling example dags -""" -# some DAGs really are just examples... but try to make them work! -skip_dags = [ -'example_http_operator', -'example_twitter_dag', -'example_trigger_target_dag', -'example_trigger_controller_dag', # tested above -'test_utils', # sleeps forever -'example_kubernetes_executor', # requires kubernetes cluster -'example_kubernetes_operator' # requires kubernetes cluster -] +Try to backfill some of the example dags. Be carefull, not all dags are suitable +for doing this. For example, a dag that sleeps forever, or does not have a +schedule won't work here since you simply can't backfill them. +""" +include_dags = { +'example_branch_operator', +'example_bash_operator', +'example_skip_dag', +'latest_only' +} -logger = logging.getLogger('BackfillJobTest.test_backfill_examples') dags = [ dag for dag in self.dagbag.dags.values() -if 'example_dags' in dag.full_filepath and dag.dag_id not in skip_dags +if 'example_dags' in dag.full_filepath and dag.dag_id in include_dags ] for dag in dags: dag.clear( start_date=DEFAULT_DATE, end_date=DEFAULT_DATE) +# Make sure that we have all dags the dags that we want to test available Review comment: Nit: `all dags the dags that we want to` to `all the dags that we want to` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] verdan commented on issue #3656: [AIRFLOW-2803] Fix all ESLint issues
verdan commented on issue #3656: [AIRFLOW-2803] Fix all ESLint issues URL: https://github.com/apache/incubator-airflow/pull/3656#issuecomment-416202515 looks great!! I believe this one is ready to be finalized. nice work @tedmiston This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] codecov-io edited a comment on issue #3790: [AIRFLOW-2994] Fix command status check in Qubole Check operator
codecov-io edited a comment on issue #3790: [AIRFLOW-2994] Fix command status check in Qubole Check operator
URL: https://github.com/apache/incubator-airflow/pull/3790#issuecomment-416196863

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3790?src=pr=h1) Report
> Merging [#3790](https://codecov.io/gh/apache/incubator-airflow/pull/3790?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/6afd38d79b2439ca5e3bb349c5d2007ad09aa27f?src=pr=desc) will **decrease** coverage by `<.01%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3790/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3790?src=pr=tree)

```diff
@@            Coverage Diff            @@
##           master    #3790      +/-   ##
==========================================
- Coverage   77.41%    77.4%    -0.01%
==========================================
  Files         203      203
  Lines       15810    15810
==========================================
- Hits        12239    12238        -1
- Misses       3571     3572        +1
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3790?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3790/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=) | `88.74% <0%> (-0.05%)` | :arrow_down: |

--

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3790?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3790?src=pr=footer). Last update [6afd38d...2750e5d](https://codecov.io/gh/apache/incubator-airflow/pull/3790?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] codecov-io commented on issue #3790: [AIRFLOW-2994] Fix command status check in Qubole Check operator
codecov-io commented on issue #3790: [AIRFLOW-2994] Fix command status check in Qubole Check operator
URL: https://github.com/apache/incubator-airflow/pull/3790#issuecomment-416196863

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3790?src=pr=h1) Report
> Merging [#3790](https://codecov.io/gh/apache/incubator-airflow/pull/3790?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/6afd38d79b2439ca5e3bb349c5d2007ad09aa27f?src=pr=desc) will **decrease** coverage by `<.01%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3790/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3790?src=pr=tree)

```diff
@@            Coverage Diff            @@
##           master    #3790      +/-   ##
==========================================
- Coverage   77.41%    77.4%    -0.01%
==========================================
  Files         203      203
  Lines       15810    15810
==========================================
- Hits        12239    12238        -1
- Misses       3571     3572        +1
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3790?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3790/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=) | `88.74% <0%> (-0.05%)` | :arrow_down: |

--

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3790?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3790?src=pr=footer). Last update [6afd38d...2750e5d](https://codecov.io/gh/apache/incubator-airflow/pull/3790?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (AIRFLOW-2964) Lazy generation of the job description
[ https://issues.apache.org/jira/browse/AIRFLOW-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Victor Jimenez updated AIRFLOW-2964: Description: When instantiating a {{DatabricksSubmitRunOperator}} users need to pass the description of the job that will later be executed on Databricks. The job description is only needed at execution time (when the hook is called). However, the {{json}} parameter must already have the full job description when constructing the operator. This may present a problem if computing the job description needs to execute expensive operations (e.g., querying a database). The expensive operation will be invoked every single time the DAG is reprocessed (which may happen quite frequently). It would be good to have an equivalent mechanism to the {{python_callable}} parameter in the {{PythonOperator}}. In this way, users could pass a function that would generate the job description only when the operator is actually executed. [~andrewmchen] Is there any other way to do this? If not, does this sound reasonable? I can create a PR implementing this proposal. was: When instantiating a {{DatabricksSubmitRunOperator}} users need to pass the description of the job that will later be executed on Databricks. The job description is only needed at execution time (when the hook is called). However, the {{json}} parameter must already have the full job description when constructing the operator. This may present a problem if computing the job description needs to execute expensive operations (e.g., querying a database). The expensive operation will be invoked every single time the DAG is reprocessed (which may happen quite frequently). It would be good to have an equivalent mechanism to the {{python_callable}} parameter in the {{PythonOperator}}. In this way, users could pass a function that would generate the job description only when the operator is actually executed. > Lazy generation of the job description > -- > > Key: AIRFLOW-2964 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2964 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib, operators >Affects Versions: Airflow 1.9.0, 1.10 >Reporter: Victor Jimenez >Priority: Major > > When instantiating a {{DatabricksSubmitRunOperator}} users need to pass the > description of the job that will later be executed on Databricks. > The job description is only needed at execution time (when the hook is > called). However, the {{json}} parameter must already have the full job > description when constructing the operator. This may present a problem if > computing the job description needs to execute expensive operations (e.g., > querying a database). The expensive operation will be invoked every single > time the DAG is reprocessed (which may happen quite frequently). > It would be good to have an equivalent mechanism to the {{python_callable}} > parameter in the {{PythonOperator}}. In this way, users could pass a function > that would generate the job description only when the operator is actually > executed. > [~andrewmchen] Is there any other way to do this? If not, does this sound > reasonable? I can create a PR implementing this proposal. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-2964) Lazy generation of the job description
[ https://issues.apache.org/jira/browse/AIRFLOW-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Victor Jimenez updated AIRFLOW-2964: Description: When instantiating a {{DatabricksSubmitRunOperator }}users need to pass the description of the job that will later be executed on Databricks. The job description is only needed at execution time (when the hook is called). However, the {{json}} parameter must already have the full job description when constructing the operator. This may present a problem if computing the job description needs to execute expensive operations (e.g., querying a database). The expensive operation will be invoked every single time the DAG is reprocessed (which may happen quite frequently). It would be good to have an equivalent mechanism to the {{python_callable}} parameter in the {{PythonOperator}}. In this way, users could pass a function that would generate the job description only when the operator is actually executed. was: When instantiating a DatabricksSubmitRunOperator users need to pass the description of the job that will later be executed on Databricks. The job description is only needed at execution time (when the hook is called). However, the `json` parameter must already have the full job description when constructing the operator. This may present a problem if computing the job description needs to execute expensive operations (e.g., querying a database). The expensive operation will be invoked every single time the DAG is reprocessed (which may happen quite frequently). It would be good to have an equivalent mechanism to the `python_callable` parameter in the `PythonOperator`. In this way, users could pass a function that would generate the job description only when the operator is actually executed. > Lazy generation of the job description > -- > > Key: AIRFLOW-2964 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2964 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib, operators >Affects Versions: Airflow 1.9.0, 1.10 >Reporter: Victor Jimenez >Priority: Major > > When instantiating a {{DatabricksSubmitRunOperator }}users need to pass the > description of the job that will later be executed on Databricks. > The job description is only needed at execution time (when the hook is > called). However, the {{json}} parameter must already have the full job > description when constructing the operator. This may present a problem if > computing the job description needs to execute expensive operations (e.g., > querying a database). The expensive operation will be invoked every single > time the DAG is reprocessed (which may happen quite frequently). > It would be good to have an equivalent mechanism to the {{python_callable}} > parameter in the {{PythonOperator}}. In this way, users could pass a function > that would generate the job description only when the operator is actually > executed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-2964) Lazy generation of the job description
[ https://issues.apache.org/jira/browse/AIRFLOW-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Victor Jimenez updated AIRFLOW-2964: Description: When instantiating a DatabricksSubmitRunOperator users need to pass the description of the job that will later be executed on Databricks. The job description is only needed at execution time (when the hook is called). However, the `json` parameter must already have the full job description when constructing the operator. This may present a problem if computing the job description needs to execute expensive operations (e.g., querying a database). The expensive operation will be invoked every single time the DAG is reprocessed (which may happen quite frequently). It would be good to have an equivalent mechanism to the `python_callable` parameter in the `PythonOperator`. In this way, users could pass a function that would generate the job description only when the operator is actually executed. was: When instantiating a `DatabricksSubmitRunOperator` users need to pass the description of the job that will later be executed on Databricks. The job description is only needed at execution time (when the hook is called). However, the `json` parameter must already have the full job description when constructing the operator. This may present a problem if computing the job description needs to execute expensive operations (e.g., querying a database). The expensive operation will be invoked every single time the DAG is reprocessed (which may happen quite frequently). It would be good to have an equivalent mechanism to the `python_callable` parameter in the `PythonOperator`. In this way, users could pass a function that would generate the job description only when the operator is actually executed. > Lazy generation of the job description > -- > > Key: AIRFLOW-2964 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2964 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib, operators >Affects Versions: Airflow 1.9.0, 1.10 >Reporter: Victor Jimenez >Priority: Major > > When instantiating a DatabricksSubmitRunOperator users need to pass the > description of the job that will later be executed on Databricks. > The job description is only needed at execution time (when the hook is > called). However, the `json` parameter must already have the full job > description when constructing the operator. This may present a problem if > computing the job description needs to execute expensive operations (e.g., > querying a database). The expensive operation will be invoked every single > time the DAG is reprocessed (which may happen quite frequently). > It would be good to have an equivalent mechanism to the `python_callable` > parameter in the `PythonOperator`. In this way, users could pass a function > that would generate the job description only when the operator is actually > executed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2964) Lazy generation of the job description
Victor Jimenez created AIRFLOW-2964: --- Summary: Lazy generation of the job description Key: AIRFLOW-2964 URL: https://issues.apache.org/jira/browse/AIRFLOW-2964 Project: Apache Airflow Issue Type: Improvement Components: contrib, operators Affects Versions: Airflow 1.9.0, 1.10 Reporter: Victor Jimenez When instantiating a `DatabricksSubmitRunOperator` users need to pass the description of the job that will later be executed on Databricks. The job description is only needed at execution time (when the hook is called). However, the `json` parameter must already have the full job description when constructing the operator. This may present a problem if computing the job description needs to execute expensive operations (e.g., querying a database). The expensive operation will be invoked every single time the DAG is reprocessed (which may happen quite frequently). It would be good to have an equivalent mechanism to the `python_callable` parameter in the `PythonOperator`. In this way, users could pass a function that would generate the job description only when the operator is actually executed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
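One possible shape for this proposal, sketched as a hypothetical subclass; this is not an existing Airflow API, and the class name and the bare-bones handling of {{self.json}} are illustrative only:

{code:java}
from airflow.contrib.operators.databricks_operator import DatabricksSubmitRunOperator


class LazyDatabricksSubmitRunOperator(DatabricksSubmitRunOperator):
    """Hypothetical variant that accepts a zero-argument callable which
    builds the job description only when the task runs, mirroring the
    python_callable pattern of PythonOperator."""

    def __init__(self, json_callable, **kwargs):
        # Cheap placeholder at DAG-parse time; the real job description
        # is produced in execute(), once per task run.
        super(LazyDatabricksSubmitRunOperator, self).__init__(json={}, **kwargs)
        self.json_callable = json_callable

    def execute(self, context):
        # The expensive work (e.g. querying a database) happens here
        # instead of on every DAG file parse.
        self.json = self.json_callable()
        return super(LazyDatabricksSubmitRunOperator, self).execute(context)
{code}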
[jira] [Updated] (AIRFLOW-2963) Error parsing AIRFLOW_CONN_ URI
[ https://issues.apache.org/jira/browse/AIRFLOW-2963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leonardo de Campos Almeida updated AIRFLOW-2963:

    Fix Version/s:     (was: 1.9.0)

> Error parsing AIRFLOW_CONN_ URI
> ---
>
>                 Key: AIRFLOW-2963
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2963
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: boto3, configuration
>    Affects Versions: 1.9.0
>            Reporter: Leonardo de Campos Almeida
>            Priority: Minor
>              Labels: easyfix
>
> I'm using the environment variable AIRFLOW_CONN_ to define my connection to
> AWS, but my AWS secret access key has a slash in it.
> e.g.:
> {code:java}
> s3://login:pass/word@bucket
> {code}
> The problem is that the method *BaseHook._get_connection_from_env* doesn't
> accept this URI as a valid URI. When it finds the /, it assumes that the
> path starts there, so it returns:
> * host: login
> * port: pass
> * path: word
> and ignores the rest, so I get an error, because pass is not a valid port
> number.
> So I tried to pass the URI quoted:
> {code:java}
> s3://login:pass%2Fword@bucket
> {code}
> But then the values are not unquoted correctly, and the AwsHook tries to
> use pass%2Fword as the secret access key.
> I took a look at the method that parses the URI, and it only unquotes the
> host, manually.
> {code:java}
> def parse_from_uri(self, uri):
>     temp_uri = urlparse(uri)
>     hostname = temp_uri.hostname or ''
>     if '%2f' in hostname:
>         hostname = hostname.replace('%2f', '/').replace('%2F', '/')
>     conn_type = temp_uri.scheme
>     if conn_type == 'postgresql':
>         conn_type = 'postgres'
>     self.conn_type = conn_type
>     self.host = hostname
>     self.schema = temp_uri.path[1:]
>     self.login = temp_uri.username
>     self.password = temp_uri.password
>     self.port = temp_uri.port
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (AIRFLOW-2963) Error parsing AIRFLOW_CONN_ URI
Leonardo de Campos Almeida created AIRFLOW-2963:
---
             Summary: Error parsing AIRFLOW_CONN_ URI
                 Key: AIRFLOW-2963
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2963
             Project: Apache Airflow
          Issue Type: Bug
          Components: boto3, configuration
    Affects Versions: 1.9.0
            Reporter: Leonardo de Campos Almeida
             Fix For: 1.9.0


I'm using the environment variable AIRFLOW_CONN_ to define my connection to AWS, but my AWS secret access key has a slash in it.
e.g.:
{code:java}
s3://login:pass/word@bucket
{code}
The problem is that the method *BaseHook._get_connection_from_env* doesn't accept this URI as a valid URI. When it finds the /, it assumes that the path starts there, so it returns:
* host: login
* port: pass
* path: word

and ignores the rest, so I get an error, because pass is not a valid port number.

So I tried to pass the URI quoted:
{code:java}
s3://login:pass%2Fword@bucket
{code}
But then the values are not unquoted correctly, and the AwsHook tries to use pass%2Fword as the secret access key.

I took a look at the method that parses the URI, and it only unquotes the host, manually:
{code:java}
def parse_from_uri(self, uri):
    temp_uri = urlparse(uri)
    hostname = temp_uri.hostname or ''
    if '%2f' in hostname:
        hostname = hostname.replace('%2f', '/').replace('%2F', '/')
    conn_type = temp_uri.scheme
    if conn_type == 'postgresql':
        conn_type = 'postgres'
    self.conn_type = conn_type
    self.host = hostname
    self.schema = temp_uri.path[1:]
    self.login = temp_uri.username
    self.password = temp_uri.password
    self.port = temp_uri.port
{code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
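A minimal sketch of the behaviour being asked for, using only the standard library: percent-encode the slash when building the URI, and unquote the credentials when parsing, which {{parse_from_uri}} above does only for the host. The connection values are made up:

{code:java}
try:
    from urllib.parse import quote, unquote, urlparse  # Python 3
except ImportError:
    from urllib import quote, unquote  # Python 2
    from urlparse import urlparse

# Building the URI: encode the slash so it survives URI parsing.
secret = 'pass/word'
uri = 's3://login:{}@bucket'.format(quote(secret, safe=''))
print(uri)  # s3://login:pass%2Fword@bucket

# Parsing: urlparse leaves credentials percent-encoded, so they must be
# unquoted explicitly, which is the step BaseHook currently skips.
parsed = urlparse(uri)
print(unquote(parsed.username))  # login
print(unquote(parsed.password))  # pass/word
print(parsed.hostname)           # bucket
{code}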
[GitHub] XD-DENG commented on issue #3809: [AIRFLOW-2959] Make HTTPSensor doc clearer
XD-DENG commented on issue #3809: [AIRFLOW-2959] Make HTTPSensor doc clearer URL: https://github.com/apache/incubator-airflow/pull/3809#issuecomment-416164634 Thanks @msumit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services