[GitHub] [airflow] mik-laj opened a new pull request #5817: [AIRFLOW-5148] Add Google Analytics to the Airflow doc website (#5763)

2019-08-14 Thread GitBox
mik-laj opened a new pull request #5817: [AIRFLOW-5148] Add Google Analytics to 
the Airflow doc website (#5763)
URL: https://github.com/apache/airflow/pull/5817
 
 
   # The discussion begins in PR: https://github.com/apache/airflow/pull/5763
   
   
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation, you can prepend your 
commit with \[AIRFLOW-XXX\]; code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   
   > Asked by @aijamalnk 
   > 
   > Note from her:
   > 
   > > I've looked at Google Analytics for the Airflow site, and I noticed that:
   > > - The https://airflow.readthedocs.io/en/latest/ site has the GA code set up.
   > > - The https://airflow.apache.org site does NOT have the GA code set up.
   > > So the data that we're getting on GA is not complete.
   > > It would be really helpful to fix it soon, before we start revamping the 
website, so that we can understand the changes in user behavior (I am signing a 
contract with a vendor next week).
   > 
   
   The previous PR was merged and then reverted, so I am creating this PR again 
with the same change so it can be discussed. Any comment on PR #5763 sends an 
email to all the committers across all Apache projects, so please do not comment 
there. @kaxil has locked the conversation, but locking still allows anyone with 
write access to the repo to comment on it.
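
   For reference, a minimal sketch of how a Google Analytics tracking ID is 
typically wired into a Sphinx site via the `sphinx_rtd_theme` `analytics_id` 
option; this is an assumption for illustration (with a placeholder ID), not 
necessarily how this PR injects the snippet:

```python
# docs/conf.py -- illustrative sketch only; the tracking ID is a placeholder.
html_theme = 'sphinx_rtd_theme'
html_theme_options = {
    'analytics_id': 'UA-XXXXXXX-X',  # theme renders the GA tracking snippet
}
```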
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what they do
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] aijamalnk edited a comment on issue #5763: [AIRFLOW-5148] Add Google Analytics to the Airflow doc website

2019-08-14 Thread GitBox
aijamalnk edited a comment on issue #5763: [AIRFLOW-5148] Add Google Analytics 
to the Airflow doc website
URL: https://github.com/apache/airflow/pull/5763#issuecomment-520946607
 
 
   Hi @mik-laj, 
   As Kaxil said, it was my request and I can answer all of your questions. 
   
   First of all, no one at Google (except for me, but I am wearing my Apache & 
Airflow hat here) has access to GA. And be assured I used only some screenshots 
(about the number of visitors per day), without any personally identifying info, 
to make the case for getting funding for the website development from Google. 
These statistics are valuable because we are investing in the website and we 
want to make sure that the new UX/UI works and brings value to Airflow users. 
They are also important for improving the most valuable pages of the 
documentation (including measuring the effects of the Season of Docs effort) 
and for seeing how user behavior changes over time. Without GA, I don't think 
we can get a similar amount of fine-grained information about pageviews, 
navigation of the website, and time spent on each page (not counting 
geographical interest, which can help us fund meetups around the world).
   
   Besides me, Sid, Kaxil and Ash currently have access to GA; all of them are 
Project Management Committee members, i.e. "someone from Apache" as you say. 
But I am also sending an email to add everyone in the PMC. 
   
   Regarding GDPR, it seems we can comply with it if we don't track any user 
login info (which we don't), so we don't strictly have to display any notice 
[1]. The practice in other Apache projects is to add the following notice to 
the website:
   
   -
   WEBSITE USAGE PRIVACY POLICY
   Information about your use of this website is collected using server access 
logs and a tracking cookie. The collected information consists of the following:
   - The IP address from which you access the website;
   - The type of browser and operating system you use to access our site;
   - The date and time you access our site;
   - The pages you visit; and
   - The addresses of pages from where you followed a link to our site.
   Part of this information is gathered using a tracking cookie set by the 
Google Analytics service and handled by Google as described in their privacy 
policy. See your browser documentation for instructions on how to disable the 
cookie if you prefer not to share this data with Google.
   We use the gathered information to help us make our site more useful to 
visitors and to better understand how and when our site is used. We do not 
track or collect personally identifiable information or associate gathered data 
with any personally identifying information from other sources.
   By using this website, you consent to the collection of this data in the 
manner and for the purpose described above.
   The ASF welcomes your questions or comments regarding this Privacy Policy. 
Send them to d...@airflow.apache.org
   -
   And lastly, I strongly -1 the alternative that you suggested. I don't see 
the point of abandoning the 'standard' web analytics tool, which we can 
configure to be GDPR compliant, in favor of a far less popular project that I 
personally don't know how to use and have no bandwidth to learn. 
   
   [1] https://www.cookiebot.com/en/google-analytics-gdpr/
   [2] https://www.apache.org/foundation/policies/privacy.html
   [3] https://activemq.apache.org/privacy-policy.html
   [4] https://mahout.apache.org/general/privacy-policy
   
   ---
   
   # Please stop commenting on this PR.
   # Discussion moved to https://github.com/apache/airflow/pull/5817


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-5148) Add Google Analytics to the Airflow doc website

2019-08-14 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/AIRFLOW-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906971#comment-16906971 ]

ASF GitHub Bot commented on AIRFLOW-5148:
-

mik-laj commented on pull request #5817: [AIRFLOW-5148] Add Google Analytics to 
the Airflow doc website (#5763)
URL: https://github.com/apache/airflow/pull/5817
 
 
   # The discussion begins in PR: https://github.com/apache/airflow/pull/5763
   
   
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation, you can prepend your 
commit with \[AIRFLOW-XXX\]; code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   
   > Asked by @aijamalnk 
   > 
   > Note from her:
   > 
   > > I've looked at Google Analytics for the Airflow site, and I noticed that:
   > > - The https://airflow.readthedocs.io/en/latest/ site has the GA code set up.
   > > - The https://airflow.apache.org site does NOT have the GA code set up.
   > > So the data that we're getting on GA is not complete.
   > > It would be really helpful to fix it soon, before we start revamping the 
website, so that we can understand the changes in user behavior (I am signing a 
contract with a vendor next week).
   > 
   
   The previous PR was merged and then reverted, so I am creating this PR again 
with the same change so it can be discussed. Any comment on PR #5763 sends an 
email to all the committers across all Apache projects, so please do not comment 
there. @kaxil has locked the conversation, but locking still allows anyone with 
write access to the repo to comment on it.
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what they do
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Google Analytics to the Airflow doc website
> ---
>
> Key: AIRFLOW-5148
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5148
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: documentation
>Affects Versions: 1.10.2, 1.10.3, 1.10.4
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Asked by [~aizhamal] 
> Note from her:
> {noformat}
> I've looked at Google Analytics for the Airflow site, and I noticed that:
> -The https://airflow.readthedocs.io/en/latest/ site has the GA code set up.
> - The https://airflow.apache.org site does NOT have the GA code set up.
> So the data that we're getting on GA is not complete. 
> It would be really helpful to fix it soon, before we start revamping the 
> website to understand the changes user behavior (I am signing a contract with 
> a vendor next week)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[GitHub] [airflow] tkaymak commented on a change in pull request #5685: [AIRFLOW-5072] gcs_hook's download() method should download only once

2019-08-14 Thread GitBox
tkaymak commented on a change in pull request #5685: [AIRFLOW-5072] gcs_hook's 
download() method should download only once
URL: https://github.com/apache/airflow/pull/5685#discussion_r313721113
 
 

 ##
 File path: airflow/contrib/hooks/gcs_hook.py
 ##
 @@ -172,8 +172,9 @@ def download(self, bucket_name, object_name, filename=None):
         if filename:
             blob.download_to_filename(filename)
             self.log.info('File downloaded to %s', filename)
-
-        return blob.download_as_string()
+            return filename
 
 Review comment:
   @mik-laj I've updated the docstring


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] haoliang7 commented on issue #5495: [AIRFLOW-4858] Deprecate "Historical convenience functions" in conf

2019-08-14 Thread GitBox
haoliang7 commented on issue #5495: [AIRFLOW-4858] Deprecate "Historical 
convenience functions" in conf
URL: https://github.com/apache/airflow/pull/5495#issuecomment-521128165
 
 
   @ashb The old config methods are removed. Please review my last commit.
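
   For readers following the thread, a minimal sketch of the kind of 
module-level convenience wrapper being removed, assuming the 
`airflow.configuration.conf` object; this is illustrative, not the exact code 
from the PR:

```python
# Sketch of a deprecated module-level wrapper around the global conf object.
import warnings

from airflow.configuration import conf


def get(section, key, **kwargs):
    """Historical convenience wrapper; callers should use conf.get() instead."""
    warnings.warn(
        "configuration.get() is deprecated; use configuration.conf.get()",
        DeprecationWarning,
        stacklevel=2,
    )
    return conf.get(section, key, **kwargs)
```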


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] ashb commented on a change in pull request #5789: [AIRFLOW-4222] Add cli autocomplete for bash & zsh

2019-08-14 Thread GitBox
ashb commented on a change in pull request #5789: [AIRFLOW-4222] Add cli 
autocomplete for bash & zsh
URL: https://github.com/apache/airflow/pull/5789#discussion_r313771624
 
 

 ##
 File path: docs/howto/cli-completion.rst
 ##
 @@ -0,0 +1,42 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+Set Up Bash/Zsh Completion
+==========================
+
+When using bash (or ``zsh``) as your shell, ``airflow`` can use
+`argcomplete `_ for auto-completion.
+
+For global activation of all argcomplete enabled python applications run:
+
+.. code-block:: bash
+
+  sudo activate-global-python-argcomplete
+
+For permanent (but not global) airflow activation, use:
+
+.. code-block:: bash
+
+  register-python-argcomplete airflow >> ~/.bashrc
+
+For one-time activation of argcomplete for airflow only, use:
+
+.. code-block:: bash
+
+  eval "$(register-python-argcomplete airflow)"
 
 Review comment:
   Oh that damn logging whenever we `import airflow` :(
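
   For context, a minimal sketch of the standard argcomplete wiring in an 
argparse entry point (illustrative, not Airflow's actual `cli` module); 
anything printed while the module is imported, such as the logging lamented 
above, ends up in the completion stream:

```python
# PYTHON_ARGCOMPLETE_OK  (marker argcomplete looks for in the entry point)
import argparse

import argcomplete

parser = argparse.ArgumentParser(prog="airflow")
parser.add_argument("subcommand", choices=["backfill", "list_dags", "webserver"])
argcomplete.autocomplete(parser)  # must run before parse_args()
args = parser.parse_args()
```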


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] mik-laj merged pull request #5546: [AIRFLOW-4908] BigQuery Hooks/Operators for update_dataset, patch_dataset, get_dataset

2019-08-14 Thread GitBox
mik-laj merged pull request #5546: [AIRFLOW-4908] BigQuery Hooks/Operators for 
update_dataset, patch_dataset, get_dataset
URL: https://github.com/apache/airflow/pull/5546
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-4908) Implement BigQuery Hooks/Operators for update_dataset, patch_dataset and get_dataset

2019-08-14 Thread ASF subversion and git services (JIRA)


[ https://issues.apache.org/jira/browse/AIRFLOW-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907068#comment-16907068 ]

ASF subversion and git services commented on AIRFLOW-4908:
--

Commit 09b9610bee9f49270a9f3add6b45c1c6437c1914 in airflow's branch 
refs/heads/master from Ryan Yuan
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=09b9610 ]

[AIRFLOW-4908] Implement BigQuery Hooks/Operators for update_dataset, 
patch_dataset and get_dataset (#5546)

Implement BigQuery Hooks/Operators for update_dataset, patch_dataset and 
get_dataset

> Implement BigQuery Hooks/Operators for update_dataset, patch_dataset and 
> get_dataset
> 
>
> Key: AIRFLOW-4908
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4908
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: gcp
>Affects Versions: 2.0.0
>Reporter: Ryan Yuan
>Assignee: Ryan Yuan
>Priority: Critical
>
> To create a BigQuery sink for GCP Stackdriver Logging, I have to assign 
> `WRITER` access to group `cloud-l...@google.com` to access BQ dataset. 
> However, current BigQueryHook doesn't support updating/patching dataset.
> Reference: 
> [https://googleapis.github.io/google-cloud-python/latest/logging/usage.html#export-to-bigquery]
> Implement GCP Stackdriver Logging: 
> https://issues.apache.org/jira/browse/AIRFLOW-4779
> While BigQueryHook is missing update_dataset and patch_dataset, it does have 
> get_dataset, but there is no operator for it.
>  
> Features to be implemented:
> BigQueryBaseCursor.patch_dataset
> BigQueryBaseCursor.update_dataset
> BigQueryPatchDatasetOperator
> BigQueryUpdateDatasetOperator
> BigQueryGetDatasetOperator



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (AIRFLOW-4908) Implement BigQuery Hooks/Operators for update_dataset, patch_dataset and get_dataset

2019-08-14 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/AIRFLOW-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907067#comment-16907067 ]

ASF GitHub Bot commented on AIRFLOW-4908:
-

mik-laj commented on pull request #5546: [AIRFLOW-4908] BigQuery 
Hooks/Operators for update_dataset, patch_dataset, get_dataset
URL: https://github.com/apache/airflow/pull/5546
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Implement BigQuery Hooks/Operators for update_dataset, patch_dataset and 
> get_dataset
> 
>
> Key: AIRFLOW-4908
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4908
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: gcp
>Affects Versions: 2.0.0
>Reporter: Ryan Yuan
>Assignee: Ryan Yuan
>Priority: Critical
>
> To create a BigQuery sink for GCP Stackdriver Logging, I have to assign 
> `WRITER` access to group `cloud-l...@google.com` to access BQ dataset. 
> However, current BigQueryHook doesn't support updating/patching dataset.
> Reference: 
> [https://googleapis.github.io/google-cloud-python/latest/logging/usage.html#export-to-bigquery]
> Implement GCP Stackdriver Logging: 
> https://issues.apache.org/jira/browse/AIRFLOW-4779
> While BigQueryHook is missing update_dataset and patch_dataset, it does have 
> get_dataset, but there is no operator for it.
>  
> Features to be implemented:
> BigQueryBaseCursor.patch_dataset
> BigQueryBaseCursor.update_dataset
> BigQueryPatchDatasetOperator
> BigQueryUpdateDatasetOperator
> BigQueryGetDatasetOperator



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (AIRFLOW-5212) Getting the error "ERROR - Failed to bag_dag"

2019-08-14 Thread Theepan Subramani (JIRA)
Theepan Subramani created AIRFLOW-5212:
--

 Summary: Getting the error "ERROR - Failed to bag_dag"
 Key: AIRFLOW-5212
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5212
 Project: Apache Airflow
  Issue Type: Bug
  Components: DagRun
Affects Versions: 1.10.2
Reporter: Theepan Subramani
 Attachments: sand.py





--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[GitHub] [airflow] BasPH commented on a change in pull request #5815: [AIRFLOW-5210] Make finding template files more efficient

2019-08-14 Thread GitBox
BasPH commented on a change in pull request #5815: [AIRFLOW-5210] Make finding 
template files more efficient
URL: https://github.com/apache/airflow/pull/5815#discussion_r313817805
 
 

 ##
 File path: airflow/models/baseoperator.py
 ##
 @@ -717,26 +717,27 @@ def prepare_template(self):
 
     def resolve_template_files(self):
         # Getting the content of files for template_field / template_ext
-        for attr in self.template_fields:
-            content = getattr(self, attr, None)
-            if content is None:
-                continue
-            elif isinstance(content, str) and \
-                    any([content.endswith(ext) for ext in self.template_ext]):
-                env = self.get_template_env()
-                try:
-                    setattr(self, attr, env.loader.get_source(env, content)[0])
-                except Exception as e:
-                    self.log.exception(e)
-            elif isinstance(content, list):
-                env = self.dag.get_template_env()
-                for i in range(len(content)):
-                    if isinstance(content[i], str) and \
-                            any([content[i].endswith(ext) for ext in self.template_ext]):
-                        try:
-                            content[i] = env.loader.get_source(env, content[i])[0]
-                        except Exception as e:
-                            self.log.exception(e)
+        if self.template_ext:
 
 Review comment:
   The way I read it, this method iterates over all `template_fields` and if it 
finds a field value with an extension listed in `template_exts`, runs 
`env.loader.get_source(...)` and sets the result on the specific field.
   
   It indeed scans over all `template_fields` even if `template_exts` is empty, 
which doesn't make sense, so I think this is a valid change.
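
   A minimal stand-alone sketch of the guard being discussed (hypothetical 
field names; the real logic lives on `BaseOperator`):

```python
class FakeOperator:
    template_fields = ("bash_command", "env")  # attributes to scan
    template_ext = ()  # nothing registered, so no value can ever match

    def resolve_template_files(self):
        if not self.template_ext:
            return  # early exit: skips the per-field scan entirely
        for attr in self.template_fields:
            content = getattr(self, attr, None)
            if isinstance(content, str) and content.endswith(tuple(self.template_ext)):
                # the real code loads the file via env.loader.get_source(...)
                setattr(self, attr, "<resolved file contents>")
```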


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] acordiner commented on issue #5489: [AIRFLOW-4843] Allow orchestration via Docker Swarm (SwarmOperator)

2019-08-14 Thread GitBox
acordiner commented on issue #5489: [AIRFLOW-4843] Allow orchestration via 
Docker Swarm (SwarmOperator)
URL: https://github.com/apache/airflow/pull/5489#issuecomment-521226654
 
 
   This is great! Any chance to add the ability to pass extra arguments to the 
TaskTemplate and ContainerSpec? For example, it would be handy to be able to 
specify labels and placement constraints.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] dossett commented on issue #5419: [AIRFLOW-XXXX] Update pydoc of mlengine_operator

2019-08-14 Thread GitBox
dossett commented on issue #5419: [AIRFLOW-XXXX] Update pydoc of 
mlengine_operator
URL: https://github.com/apache/airflow/pull/5419#issuecomment-521474769
 
 
   Thanks @mik-laj, comment updated


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] kaxil commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability

2019-08-14 Thread GitBox
kaxil commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] 
Persisting serialized DAG in DB for webserver scalability
URL: https://github.com/apache/airflow/pull/5743#discussion_r314102570
 
 

 ##
 File path: airflow/models/serialized_dag.py
 ##
 @@ -0,0 +1,155 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Serialzed DAG table in database."""
+
+import hashlib
+from typing import Any, Dict, List, Optional, TYPE_CHECKING
+from sqlalchemy import Column, Index, Integer, String, Text, and_
+from sqlalchemy.sql import exists
+
+from airflow.models.base import Base, ID_LEN
+from airflow.utils import db, timezone
+from airflow.utils.sqlalchemy import UtcDateTime
+
+
+if TYPE_CHECKING:
+from airflow.dag.serialization.serialized_dag import SerializedDAG  # 
noqa: F401, E501; # pylint: disable=cyclic-import
+from airflow.models import DAG  # noqa: F401; # pylint: 
disable=cyclic-import
+
+
+class SerializedDagModel(Base):
+"""A table for serialized DAGs.
+
+serialized_dag table is a snapshot of DAG files synchronized by scheduler.
+This feature is controlled by:
+[core] dagcached = False: enable this feature
+[core] dagcached_min_update_interval = 30 (s):
+serialized DAGs are updated in DB when a file gets processed by 
scheduler,
+to reduce DB write rate, there is a minimal interval of updating 
serialized DAGs.
+[scheduler] dag_dir_list_interval = 300 (s):
+interval of deleting serialized DAGs in DB when the files are 
deleted, suggest
+to use a smaller interval such as 60
+
+It is used by webserver to load dagbags when dagcached=True. Because 
reading from
+database is lightweight compared to importing from files, it solves the 
webserver
+scalability issue.
+"""
+__tablename__ = 'serialized_dag'
+
+dag_id = Column(String(ID_LEN), primary_key=True)
+fileloc = Column(String(2000))
+# The max length of fileloc exceeds the limit of indexing.
+fileloc_hash = Column(Integer)
+data = Column(Text)
+last_updated = Column(UtcDateTime)
+
+__table_args__ = (
+Index('idx_fileloc_hash', fileloc_hash, unique=False),
+)
+
+def __init__(self, dag):
+from airflow.dag.serialization import Serialization
+
+self.dag_id = dag.dag_id
+self.fileloc = dag.full_filepath
+self.fileloc_hash = SerializedDagModel.dag_fileloc_hash(self.fileloc)
+self.data = Serialization.to_json(dag)
 
 Review comment:
   > Either here, or inside to_json we should ensure that the JSON blob is 
valid - I want to minimize the chance of writing "odd"/invalid data in to our 
DB.
   
   Done in 
https://github.com/apache/airflow/pull/5743/commits/977a2fe3fd244bc3f366a1228324f8b3c58f30ac
 . WDYT - is that OK?
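
   For readers skimming the thread, a minimal sketch of the kind of round-trip 
check being discussed, using a hypothetical helper around the serialized blob; 
the commit linked above may implement it differently:

```python
import json


def validated_json_blob(blob: str) -> str:
    # Parse the serialized DAG before persisting it, so a truncated or
    # otherwise invalid payload fails loudly here instead of landing as
    # "odd" data in the serialized_dag table.
    json.loads(blob)  # raises json.JSONDecodeError on bad input
    return blob
```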


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] pgagnon opened a new pull request #5824: [AIRFLOW-5215] Add sidecar containers support to Pod class

2019-08-14 Thread GitBox
pgagnon opened a new pull request #5824: [AIRFLOW-5215] Add sidecar containers 
support to Pod class
URL: https://github.com/apache/airflow/pull/5824
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [X] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation, you can prepend your 
commit with \[AIRFLOW-XXX\]; code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   
   - [X] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Adds a `sidecar_containers` argument to `Pod`, allowing users to pass a list 
of sidecar container definitions to add to the Pod. This is notably useful with 
the pod mutation hook.
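
   A minimal sketch of the intended use, assuming the new `sidecar_containers` 
argument lands on `Pod` as proposed; the sidecar definition shown is 
hypothetical:

```python
# Hypothetical pod mutation hook (airflow_local_settings.py) appending a
# logging sidecar to every pod Airflow launches.
def pod_mutation_hook(pod):
    existing = getattr(pod, "sidecar_containers", None) or []
    pod.sidecar_containers = existing + [
        {
            "name": "log-shipper",            # hypothetical sidecar container
            "image": "example/log-shipper:latest",
            "volumeMounts": [{"name": "logs", "mountPath": "/var/log/airflow"}],
        }
    ]
```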
   
   ### Tests
   
   - [X] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   - `test_extract_sidecar_containers`.
   
   ### Commits
   
   - [X] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [X] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what they do
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
   `Pod`'s docstring is currently not up to date. Will address in a subsequent 
PR.
   
   ### Code Quality
   
   - [X] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] pgagnon commented on issue #5824: [AIRFLOW-5215] Add sidecar containers support to Pod class

2019-08-14 Thread GitBox
pgagnon commented on issue #5824: [AIRFLOW-5215] Add sidecar containers support 
to Pod class
URL: https://github.com/apache/airflow/pull/5824#issuecomment-521460427
 
 
   @ashb @dimberman 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-5215) Add sidecar container support to Pod object

2019-08-14 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/AIRFLOW-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907695#comment-16907695 ]

ASF GitHub Bot commented on AIRFLOW-5215:
-

pgagnon commented on pull request #5824: [AIRFLOW-5215] Add sidecar containers 
support to Pod class
URL: https://github.com/apache/airflow/pull/5824
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [X] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation, you can prepend your 
commit with \[AIRFLOW-XXX\]; code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   
   - [X] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Adds a `sidecar_containers` argument to `Pod`, allowing users to pass a list 
of sidecar container definitions to add to the Pod. This is notably useful with 
the pod mutation hook.
   
   ### Tests
   
   - [X] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   - `test_extract_sidecar_containers`.
   
   ### Commits
   
   - [X] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [X] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what they do
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
   `Pod`'s docstring is currently not up to date. Will address in a subsequent 
PR.
   
   ### Code Quality
   
   - [X] Passes `flake8`
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add sidecar container support to Pod object
> ---
>
> Key: AIRFLOW-5215
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5215
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: scheduler
>Affects Versions: 2.0.0
>Reporter: Philippe Gagnon
>Assignee: Philippe Gagnon
>Priority: Major
>
> Add sidecar container support to Pod object.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[GitHub] [airflow] darrenleeweber opened a new pull request #5825: [AIRFLOW-5218] less polling for AWS Batch status

2019-08-14 Thread GitBox
darrenleeweber opened a new pull request #5825: [AIRFLOW-5218] less polling for 
AWS Batch status
URL: https://github.com/apache/airflow/pull/5825
 
 
   ### Jira
   
   - [x] My PR addresses the following [Airflow Jira]
   - https://issues.apache.org/jira/browse/AIRFLOW-5218
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   - a small increase in the backoff factor could avoid excessive polling
   - avoid the AWS API throttle limits for highly concurrent tasks
   
   ### Tests
   
   - [ ] My PR does not need testing for this extremely good reason:
   - it's the smallest possible change that might address the issue
   - the change does not impact any public API
   - if there are tests on the polling interval (or should be), LMK
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines
   - it's just one commit
   - the commit message is succinct, LMK if you want it amended
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - no changes required to documentation
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-5218) AWS Batch Operator - status polling too often, esp. for high concurrency

2019-08-14 Thread Darren Weber (JIRA)


[ https://issues.apache.org/jira/browse/AIRFLOW-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907774#comment-16907774 ]

Darren Weber commented on AIRFLOW-5218:
---

There is something weird in the polling logs. The timestamps in the logs 
indicate that the retry polling interval is not what the log says it will be; 
the log reports the retry attempt count as the number of seconds to wait 
(which it is not).
{noformat}
[2019-08-15 02:33:57,163] {awsbatch_operator.py:103} INFO - AWS Batch Job 
started: ...
[2019-08-15 02:33:57,166] {awsbatch_operator.py:137} INFO - AWS Batch retry in 
the next 0 seconds
[2019-08-15 02:33:58,284] {awsbatch_operator.py:137} INFO - AWS Batch retry in 
the next 1 seconds
[2019-08-15 02:33:59,412] {awsbatch_operator.py:137} INFO - AWS Batch retry in 
the next 2 seconds 
[2019-08-15 02:34:00,568] {awsbatch_operator.py:137} INFO - AWS Batch retry in 
the next 3 seconds 
[2019-08-15 02:34:01,866] {awsbatch_operator.py:137} INFO - AWS Batch retry in 
the next 4 seconds 
[2019-08-15 02:34:03,140] {awsbatch_operator.py:137} INFO - AWS Batch retry in 
the next 5 seconds 
[2019-08-15 02:34:04,695] {awsbatch_operator.py:137} INFO - AWS Batch retry in 
the next 6 seconds 
[2019-08-15 02:34:06,165] {awsbatch_operator.py:137} INFO - AWS Batch retry in 
the next 7 seconds 
[2019-08-15 02:34:07,764] {awsbatch_operator.py:137} INFO - AWS Batch retry in 
the next 8 seconds 
[2019-08-15 02:34:09,514] {awsbatch_operator.py:137} INFO - AWS Batch retry in 
the next 9 seconds
[2019-08-15 02:34:11,440] {awsbatch_operator.py:137} INFO - AWS Batch retry in 
the next 10 seconds
{noformat}
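
A minimal sketch, using the {{1 + pow(retries * 0.1, 2)}} backoff quoted in the 
issue description below, showing that the logged "retry in the next N seconds" 
tracks the retry counter while the actual pause grows far more slowly:
{code:python}
# Reproduce the mismatch: the log prints the retry counter, but the pause
# computed from the quoted backoff formula is a different number.
for retries in range(11):
    pause = 1 + pow(retries * 0.1, 2)  # formula quoted in this issue
    print(f"log says: retry in the next {retries} seconds; "
          f"actual pause: {pause:.2f} seconds")
{code}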

> AWS Batch Operator - status polling too often, esp. for high concurrency
> 
>
> Key: AIRFLOW-5218
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5218
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, contrib
>Affects Versions: 1.10.4
>Reporter: Darren Weber
>Assignee: Darren Weber
>Priority: Major
>
> The AWS Batch Operator attempts to use a boto3 feature that is not available 
> and has not been merged in years, see
>  - [https://github.com/boto/botocore/pull/1307]
>  - see also [https://github.com/broadinstitute/cromwell/issues/4303]
> This is a curious case of premature optimization. So, in the meantime, this 
> means that the fallback is the exponential backoff routine for the status 
> checks on the batch job. Unfortunately, when the concurrency of Airflow jobs 
> is very high (100's of tasks), this fallback polling hits the AWS Batch API 
> too hard and the AWS API throttle throws an error, which fails the Airflow 
> task, simply because the status is polled too frequently.
> Check the output from the retry algorithm, e.g. within the first 10 retries, 
> the status of an AWS batch job is checked about 10 times at a rate that is 
> approx 1 retry/sec. When an Airflow instance is running 10's or 100's of 
> concurrent batch jobs, this hits the API too frequently and crashes the 
> Airflow task (plus it occupies a worker in too much busy work).
> {code:java}
> In [4]: [1 + pow(retries * 0.1, 2) for retries in range(20)] 
>  Out[4]: 
>  [1.0,
>  1.01,
>  1.04,
>  1.09,
>  1.1601,
>  1.25,
>  1.36,
>  1.4902,
>  1.6401,
>  1.81,
>  2.0,
>  2.21,
>  2.4404,
>  2.6904,
>  2.9604,
>  3.25,
>  3.5605,
>  3.8906,
>  4.24,
>  4.61]{code}
> Possible solutions are to introduce an initial sleep (say 60 sec?) right 
> after issuing the request, so that the batch job has some time to spin up. 
> The job progresses through a few phases before it gets to the RUNNING state 
> and polling for each phase of that sequence might help. Since batch jobs tend 
> to be long-running jobs (rather than near-real time jobs), it might help to 
> issue less frequent polls when it's in the RUNNING state. Something on the 
> order of 10's seconds might be reasonable for batch jobs? Maybe the class 
> could expose a parameter for the rate of polling (or a callable)?



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[GitHub] [airflow] potiuk merged pull request #5777: [AIRFLOW-5161] Static checks are run automatically in pre-commit hooks

2019-08-14 Thread GitBox
potiuk merged pull request #5777: [AIRFLOW-5161] Static checks are run 
automatically in pre-commit hooks
URL: https://github.com/apache/airflow/pull/5777
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-5161) Add pre-commit hooks to run static checks for only changed files

2019-08-14 Thread ASF subversion and git services (JIRA)


[ https://issues.apache.org/jira/browse/AIRFLOW-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907710#comment-16907710 ]

ASF subversion and git services commented on AIRFLOW-5161:
--

Commit 70e937a8d8ff308a9fb9055ceb7ef2c034200b36 in airflow's branch 
refs/heads/master from Jarek Potiuk
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=70e937a ]

[AIRFLOW-5161] Static checks are run automatically in pre-commit hooks (#5777)



> Add pre-commit hooks to run static checks for only changed files
> 
>
> Key: AIRFLOW-5161
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5161
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ci
>Affects Versions: 2.0.0
>Reporter: Jarek Potiuk
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (AIRFLOW-5161) Add pre-commit hooks to run static checks for only changed files

2019-08-14 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/AIRFLOW-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907709#comment-16907709 ]

ASF GitHub Bot commented on AIRFLOW-5161:
-

potiuk commented on pull request #5777: [AIRFLOW-5161] Static checks are run 
automatically in pre-commit hooks
URL: https://github.com/apache/airflow/pull/5777
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add pre-commit hooks to run static checks for only changed files
> 
>
> Key: AIRFLOW-5161
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5161
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ci
>Affects Versions: 2.0.0
>Reporter: Jarek Potiuk
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (AIRFLOW-5218) AWS Batch Operator - status polling too often, esp. for high concurrency

2019-08-14 Thread Darren Weber (JIRA)


[ https://issues.apache.org/jira/browse/AIRFLOW-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907749#comment-16907749 ]

Darren Weber edited comment on AIRFLOW-5218 at 8/15/19 2:04 AM:


Even bumping the backoff factor from `0.1` to `0.3` might help, e.g.
{code:java}
from datetime import datetime
from time import sleep

for retries in range(10):
    pause = 1 + pow(retries * 0.3, 2)
    print(f"{datetime.now()}: retry ({retries:04d}) sleeping for {pause:6.2f} sec")
    sleep(pause)

2019-08-14 19:02:58.745923: retry () sleeping for 1.00 sec
2019-08-14 19:02:59.747635: retry (0001) sleeping for 1.09 sec
2019-08-14 19:03:00.840129: retry (0002) sleeping for 1.36 sec
2019-08-14 19:03:02.202734: retry (0003) sleeping for 1.81 sec
2019-08-14 19:03:04.015686: retry (0004) sleeping for 2.44 sec
2019-08-14 19:03:06.458972: retry (0005) sleeping for 3.25 sec
2019-08-14 19:03:09.713452: retry (0006) sleeping for 4.24 sec
2019-08-14 19:03:13.954253: retry (0007) sleeping for 5.41 sec
2019-08-14 19:03:19.368445: retry (0008) sleeping for 6.76 sec
2019-08-14 19:03:26.135600: retry (0009) sleeping for 8.29 sec

{code}


was (Author: dazza):
Even bumping the backoff factor from `0.1` to `0.3` might help, e.g.
{code}
from datetime import datetime
from time import sleep

In [18]: for i in [1 + pow(retries * 0.3, 2) for retries in range(10)]:
    ...:     print(f"{datetime.now()}: sleeping for {i}")
    ...:     sleep(i)
    ...:

2019-08-14 18:52:01.688705: sleeping for 1.0
2019-08-14 18:52:02.690385: sleeping for 1.09
2019-08-14 18:52:03.781384: sleeping for 1.3599
2019-08-14 18:52:05.144492: sleeping for 1.8098
2019-08-14 18:52:06.956547: sleeping for 2.44
2019-08-14 18:52:09.401454: sleeping for 3.25
2019-08-14 18:52:12.652212: sleeping for 4.239
2019-08-14 18:52:16.897060: sleeping for 5.41
2019-08-14 18:52:22.313692: sleeping for 6.76
2019-08-14 18:52:29.082087: sleeping for 8.29
{code}

> AWS Batch Operator - status polling too often, esp. for high concurrency
> 
>
> Key: AIRFLOW-5218
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5218
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, contrib
>Affects Versions: 1.10.4
>Reporter: Darren Weber
>Priority: Major
>
> The AWS Batch Operator attempts to use a boto3 feature that is not available 
> and has not been merged in years, see
>  - [https://github.com/boto/botocore/pull/1307]
>  - see also [https://github.com/broadinstitute/cromwell/issues/4303]
> This is a curious case of premature optimization. So, in the meantime, this 
> means that the fallback is the exponential backoff routine for the status 
> checks on the batch job. Unfortunately, when the concurrency of Airflow jobs 
> is very high (100's of tasks), this fallback polling hits the AWS Batch API 
> too hard and the AWS API throttle throws an error, which fails the Airflow 
> task, simply because the status is polled too frequently.
> Check the output from the retry algorithm, e.g. within the first 10 retries, 
> the status of an AWS batch job is checked about 10 times at a rate that is 
> approx 1 retry/sec. When an Airflow instance is running 10's or 100's of 
> concurrent batch jobs, this hits the API too frequently and crashes the 
> Airflow task (plus it occupies a worker in too much busy work).
> {code:java}
> In [4]: [1 + pow(retries * 0.1, 2) for retries in range(20)] 
>  Out[4]: 
>  [1.0,
>  1.01,
>  1.04,
>  1.09,
>  1.1601,
>  1.25,
>  1.36,
>  1.4902,
>  1.6401,
>  1.81,
>  2.0,
>  2.21,
>  2.4404,
>  2.6904,
>  2.9604,
>  3.25,
>  3.5605,
>  3.8906,
>  4.24,
>  4.61]{code}
> Possible solutions are to introduce an initial sleep (say 60 sec?) right 
> after issuing the request, so that the batch job has some time to spin up. 
> The job progresses through a few phases before it gets to the RUNNING state 
> and polling for each phase of that sequence might help. Since batch jobs tend 
> to be long-running jobs (rather than near-real time jobs), it might help to 
> issue less frequent polls when it's in the RUNNING state. Something on the 
> order of 10's seconds might be reasonable for batch jobs? Maybe the class 
> could expose a parameter for the rate of polling (or a callable)?



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (AIRFLOW-5219) Alarm if the task is not executed within the expected time range.

2019-08-14 Thread huangyan (JIRA)
huangyan created AIRFLOW-5219:
-

 Summary: Alarm if the task is not executed within the expected 
time range.
 Key: AIRFLOW-5219
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5219
 Project: Apache Airflow
  Issue Type: New Feature
  Components: DAG
Affects Versions: 1.10.4
Reporter: huangyan
Assignee: huangyan
 Fix For: 1.10.4


When using Airflow, a user has an expected time range for a task. Beyond this 
range, the user expects to get an alert instead of having the task executed 
directly.

They may not want the task to be executed automatically, preferring instead to 
run it manually after analyzing the cause.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[GitHub] [airflow] pgagnon commented on issue #5824: [AIRFLOW-5215] Add sidecar containers support to Pod class

2019-08-14 Thread GitBox
pgagnon commented on issue #5824: [AIRFLOW-5215] Add sidecar containers support 
to Pod class
URL: https://github.com/apache/airflow/pull/5824#issuecomment-521466853
 
 
   Test failure seems unrelated.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] derrick-mink-sp opened a new pull request #5826: Sailpoint internal/pod aliases

2019-08-14 Thread GitBox
derrick-mink-sp opened a new pull request #5826: Sailpoint internal/pod aliases
URL: https://github.com/apache/airflow/pull/5826
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [X] My PR addresses the following [Airflow Jira]
 - https://issues.apache.org/jira/browse/AIRFLOW-5221
   
   ### Description
   
   - [X] Here are some details about my PR, including screenshots of any UI 
changes:
 - This PR will give users the ability to add DNS entries to their 
Kubernetes pods via hostAliases (see the sketch below).
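
   A minimal sketch of the Kubernetes hostAliases concept this PR exposes, 
using the `kubernetes` Python client for illustration; how the Airflow pod 
object accepts these entries is defined by the PR itself, and the hostname/IP 
shown are hypothetical:

```python
from kubernetes.client import V1HostAlias

# Extra /etc/hosts entry injected into the pod: resolve a (hypothetical)
# internal hostname to a fixed IP.
host_aliases = [
    V1HostAlias(ip="10.0.0.5", hostnames=["internal.example.com"]),
]
```
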
   ### Tests
   
   - [X] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
 tests/minikube/test_kubernetes_pod_operator.py
- test_host_aliases
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I have 
squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [X] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what they do
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] mik-laj merged pull request #5776: [AIRFLOW-XXX] Group references in one section

2019-08-14 Thread GitBox
mik-laj merged pull request #5776: [AIRFLOW-XXX] Group references in one section
URL: https://github.com/apache/airflow/pull/5776
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] mik-laj opened a new pull request #5823: [AIRFLOW-XXX] Create "Using the CLI" page

2019-08-14 Thread GitBox
mik-laj opened a new pull request #5823: [AIRFLOW-XXX] Create "Using the CLI" 
page
URL: https://github.com/apache/airflow/pull/5823
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what they do
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (AIRFLOW-5217) Fix Pod docstring

2019-08-14 Thread Philippe Gagnon (JIRA)
Philippe Gagnon created AIRFLOW-5217:


 Summary: Fix Pod docstring
 Key: AIRFLOW-5217
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5217
 Project: Apache Airflow
  Issue Type: Improvement
  Components: executors
Affects Versions: 2.0.0
Reporter: Philippe Gagnon
Assignee: Philippe Gagnon


{{Pod}} class docstring is currently out of date with regards to its 
{{__init__}} method's docstring.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (AIRFLOW-5217) Fix Pod docstring

2019-08-14 Thread Philippe Gagnon (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philippe Gagnon updated AIRFLOW-5217:
-
Description: {{Pod}} class docstring is currently out of date with regards 
to its {{__init__}} method's arguments.  (was: {{Pod}} class docstring is 
currently out of date with regards to its {{__init__}} method's docstring.)

> Fix Pod docstring
> -
>
> Key: AIRFLOW-5217
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5217
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: executors
>Affects Versions: 2.0.0
>Reporter: Philippe Gagnon
>Assignee: Philippe Gagnon
>Priority: Minor
>
> {{Pod}} class docstring is currently out of date with regards to its 
> {{__init__}} method's arguments.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[GitHub] [airflow] marcusianlevine commented on issue #5811: [AIRFLOW-5207] Fix Mark Success and Failure views

2019-08-14 Thread GitBox
marcusianlevine commented on issue #5811: [AIRFLOW-5207] Fix Mark Success and 
Failure views
URL: https://github.com/apache/airflow/pull/5811#issuecomment-521479241
 
 
   Never mind - this turned out to be an issue with one of our dynamic DAG plugins.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] marcusianlevine closed pull request #5811: [AIRFLOW-5207] Fix Mark Success and Failure views

2019-08-14 Thread GitBox
marcusianlevine closed pull request #5811: [AIRFLOW-5207] Fix Mark Success and 
Failure views
URL: https://github.com/apache/airflow/pull/5811
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-5207) Mark Success and Mark Failed views error out due to DAG reassignment

2019-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907716#comment-16907716
 ] 

ASF GitHub Bot commented on AIRFLOW-5207:
-

marcusianlevine commented on pull request #5811: [AIRFLOW-5207] Fix Mark 
Success and Failure views
URL: https://github.com/apache/airflow/pull/5811
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Mark Success and Mark Failed views error out due to DAG reassignment
> 
>
> Key: AIRFLOW-5207
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5207
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui
>Affects Versions: 1.10.4
>Reporter: Marcus Levine
>Assignee: Marcus Levine
>Priority: Major
> Fix For: 1.10.5
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When trying to clear a task after upgrading to 1.10.4, I get the following 
> traceback:
> {code:java}
> File "/usr/local/lib/python3.7/site-packages/airflow/www/views.py", line 1451, in failed
> future, past, State.FAILED)
> File "/usr/local/lib/python3.7/site-packages/airflow/www/views.py", line 1396, in _mark_task_instance_state
> task.dag = dag
> File "/usr/local/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 509, in dag
> "The DAG assigned to {} can not be changed.".format(self))
> airflow.exceptions.AirflowException: The DAG assigned to  can not be changed.{code}
> This should be a simple fix by either dropping the offending line, or if it 
> is required to keep things working, just set the private attribute instead:
> {code:java}
> task._dag = dag
> {code}
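
For context, a stripped-down sketch of why {{task.dag = dag}} raises while 
{{task._dag = dag}} does not - the class below is illustrative, not Airflow's 
actual code; BaseOperator's {{dag}} setter guards against reassignment and 
{{_dag}} is the backing field:

{code}
class Op:
    def __init__(self):
        self._dag = None

    @property
    def dag(self):
        return self._dag

    @dag.setter
    def dag(self, dag):
        # The public setter refuses to swap in a different DAG.
        if self._dag is not None and self._dag is not dag:
            raise Exception("The DAG assigned to {} can not be changed.".format(self))
        self._dag = dag


op = Op()
op.dag = "dag_a"   # first assignment passes the guard
op._dag = "dag_b"  # writing the backing field bypasses the guard
{code}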



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Closed] (AIRFLOW-5207) Mark Success and Mark Failed views error out due to DAG reassignment

2019-08-14 Thread Marcus Levine (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Levine closed AIRFLOW-5207.
--
Resolution: Not A Problem

This turned out to be an issue with one of our plugins

> Mark Success and Mark Failed views error out due to DAG reassignment
> 
>
> Key: AIRFLOW-5207
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5207
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui
>Affects Versions: 1.10.4
>Reporter: Marcus Levine
>Assignee: Marcus Levine
>Priority: Major
> Fix For: 1.10.5
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When trying to clear a task after upgrading to 1.10.4, I get the following 
> traceback:
> {code:java}
> File "/usr/local/lib/python3.7/site-packages/airflow/www/views.py", line 1451, in failed
> future, past, State.FAILED)
> File "/usr/local/lib/python3.7/site-packages/airflow/www/views.py", line 1396, in _mark_task_instance_state
> task.dag = dag
> File "/usr/local/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 509, in dag
> "The DAG assigned to {} can not be changed.".format(self))
> airflow.exceptions.AirflowException: The DAG assigned to  can not be changed.{code}
> This should be a simple fix by either dropping the offending line, or if it 
> is required to keep things working, just set the private attribute instead:
> {code:java}
> task._dag = dag
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[GitHub] [airflow] potiuk commented on issue #5786: [AIRFLOW-5170] Fix encoding pragmas, consistent licences for python files and related pylint fixes

2019-08-14 Thread GitBox
potiuk commented on issue #5786:  [AIRFLOW-5170] Fix encoding pragmas, 
consistent licences for python files and related pylint fixes
URL: https://github.com/apache/airflow/pull/5786#issuecomment-521509287
 
 
   @ashb @dimberman @Fokko  -> this is the first additional set of checks (for 
python files) added after merging the pylint/mypy/flake checks in pre-commit. 
It will make our python code much more consistent (and fixes/disables a lot of 
pylint errors). We also have a script that can refresh pylint_todo.txt


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (AIRFLOW-5170) Add static checks for encoding pragma, consistent licences for python files and related pylint fixes

2019-08-14 Thread Jarek Potiuk (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jarek Potiuk updated AIRFLOW-5170:
--
Description: 
Automated checks for the encoding pragma and consistent licence headers can be 
added for python files.

Since we have pylint checks added in pre-commit, we should also make sure to 
fix all pylint-related issues in all the changed python files.

  was:Automated check for encoding pragma can be easily added. Since we have 
pylint checks in pre-commits added we should also make sure to fix all pylint 
related changes however.

Summary: Add static checks for encoding pragma, consistent licences for 
python files and related pylint fixes  (was: Add static checks for encoding 
pragma (and related pylint fixes))

> Add static checks for encoding pragma, consistent licences for python files 
> and related pylint fixes
> 
>
> Key: AIRFLOW-5170
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5170
> Project: Apache Airflow
>  Issue Type: Sub-task
>  Components: ci
>Affects Versions: 2.0.0
>Reporter: Jarek Potiuk
>Assignee: Jarek Potiuk
>Priority: Major
>
> Automated checks for the encoding pragma and consistent licence headers can 
> be added for python files.
> Since we have pylint checks added in pre-commit, we should also make sure to 
> fix all pylint-related issues in all the changed python files.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (AIRFLOW-3333) New features enable transferring of files or data from GCS to a SFTP remote path and SFTP to GCS path.

2019-08-14 Thread Kamil Bregula (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907620#comment-16907620
 ] 

Kamil Bregula commented on AIRFLOW-3333:


[~pulinpathneja] Any progress? Maybe I can help in some way.

> New features enable transferring of files or data from GCS to a SFTP remote 
> path and SFTP to GCS path. 
> ---
>
> Key: AIRFLOW-3333
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3333
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: contrib, gcp
>Reporter: Pulin Pathneja
>Assignee: Pulin Pathneja
>Priority: Major
>
> New features enable transferring of files or data from GCS (Google Cloud 
> Storage) to an SFTP remote path and from SFTP to a GCS (Google Cloud Storage) 
> path. 
>   
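
For context, a minimal sketch of the requested transfer composed from the 
existing contrib hooks; the connection ids, bucket and paths are illustrative 
placeholders, not the proposed operator's API:

{code}
from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook
from airflow.contrib.hooks.sftp_hook import SFTPHook


def gcs_to_sftp(bucket, object_name, remote_path, tmp_file="/tmp/transfer.bin"):
    # Stage the GCS object locally, then push it to the SFTP server.
    gcs = GoogleCloudStorageHook(google_cloud_storage_conn_id="google_cloud_default")
    gcs.download(bucket=bucket, object=object_name, filename=tmp_file)
    sftp = SFTPHook(ftp_conn_id="sftp_default")
    sftp.store_file(remote_full_path=remote_path, local_full_path=tmp_file)
{code}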



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (AIRFLOW-5218) AWS Batch Operator - status polling too often, esp. for high concurrency

2019-08-14 Thread Darren Weber (JIRA)
Darren Weber created AIRFLOW-5218:
-

 Summary: AWS Batch Operator - status polling too often, esp. for 
high concurrency
 Key: AIRFLOW-5218
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5218
 Project: Apache Airflow
  Issue Type: Improvement
  Components: aws, contrib
Affects Versions: 1.10.4
Reporter: Darren Weber


The AWS Batch Operator attempts to use a boto3 feature that is not available 
and has not been merged in years, see

- https://github.com/boto/botocore/pull/1307
- see also https://github.com/broadinstitute/cromwell/issues/4303

This is a curious case of premature optimization.  So, in the meantime, this 
means that the fallback is the exponential backoff routine for the status 
checks on the batch job.  Unfortunately, when the concurrency of Airflow jobs 
is very high (100's of tasks), this fallback polling hits the AWS Batch API too 
hard and the AWS API throttle throws an error, which fails the Airflow task, 
simply because the status is polled too frequently.

Check the output from the retry algorithm, e.g. within the first 10 retries, 
the status of an AWS batch job is checked about 10 times at a rate that is 
approx 1 retry/sec.  When an Airflow instance is running 10's or 100's of 
concurrent batch jobs, this hits the API too frequently and crashes the Airflow 
task (plus it occupies a worker in too much busy work).

In [4]: [1 + pow(retries * 0.1, 2) for retries in range(20)]
Out[4]: 
[1.0,
 1.01,
 1.04,
 1.09,
 1.1601,
 1.25,
 1.36,
 1.4902,
 1.6401,
 1.81,
 2.0,
 2.21,
 2.4404,
 2.6904,
 2.9604,
 3.25,
 3.5605,
 3.8906,
 4.24,
 4.61]


Possible solutions are to introduce an initial sleep (say 60 sec?) right after 
issuing the request, so that the batch job has some time to spin up.  The job 
progresses through a few phases before it gets to the RUNNING state, and 
polling for each phase of that sequence might help.  Since batch jobs tend to 
be long-running jobs (rather than near-real time jobs), it might help to issue 
less frequent polls when it's in the RUNNING state.  Something on the order of 
10's of seconds might be reasonable for batch jobs?  Maybe the class could 
expose a parameter for the rate of polling (or a callable)?
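
A minimal sketch of the last suggestion, assuming a hypothetical 
{{get_status}} callable (in practice a wrapper around boto3's 
{{describe_jobs}}); the names and defaults are illustrative, not the 
operator's API:

{code}
import time


def poll_batch_status(get_status, max_retries=20, backoff_factor=0.3,
                      initial_delay=60.0):
    # Give the batch job time to spin up before the first status check.
    time.sleep(initial_delay)
    for retries in range(max_retries):
        status = get_status()
        if status in ("SUCCEEDED", "FAILED"):
            return status
        # The same quadratic curve as the fallback, scaled by a tunable factor.
        time.sleep(1 + pow(retries * backoff_factor, 2))
    raise RuntimeError("job did not reach a terminal state")
{code}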




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (AIRFLOW-5218) AWS Batch Operator - status polling too often, esp. for high concurrency

2019-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907762#comment-16907762
 ] 

ASF GitHub Bot commented on AIRFLOW-5218:
-

darrenleeweber commented on pull request #5825: [AIRFLOW-5218] less polling for 
AWS Batch status
URL: https://github.com/apache/airflow/pull/5825
 
 
   ### Jira
   
   - [x] My PR addresses the following [Airflow Jira]
   - https://issues.apache.org/jira/browse/AIRFLOW-5218
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   - a small increase in the backoff factor could avoid excessive polling
   - avoid the AWS API throttle limits for highly concurrent tasks
   
   ### Tests
   
   - [ ] My PR does not need testing for this extremely good reason:
   - it's the smallest possible change that might address the issue
   - the change does not impact any public API
   - if there are tests on the polling interval (or should be), LMK
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines
   - it's just one commit
   - the commit message is succinct, LMK if you want it amended
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - no changes required to documentation
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> AWS Batch Operator - status polling too often, esp. for high concurrency
> 
>
> Key: AIRFLOW-5218
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5218
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, contrib
>Affects Versions: 1.10.4
>Reporter: Darren Weber
>Priority: Major
>
> The AWS Batch Operator attempts to use a boto3 feature that is not available 
> and has not been merged in years, see
>  - [https://github.com/boto/botocore/pull/1307]
>  - see also [https://github.com/broadinstitute/cromwell/issues/4303]
> This is a curious case of premature optimization. So, in the meantime, this 
> means that the fallback is the exponential backoff routine for the status 
> checks on the batch job. Unfortunately, when the concurrency of Airflow jobs 
> is very high (100's of tasks), this fallback polling hits the AWS Batch API 
> too hard and the AWS API throttle throws an error, which fails the Airflow 
> task, simply because the status is polled too frequently.
> Check the output from the retry algorithm, e.g. within the first 10 retries, 
> the status of an AWS batch job is checked about 10 times at a rate that is 
> approx 1 retry/sec. When an Airflow instance is running 10's or 100's of 
> concurrent batch jobs, this hits the API too frequently and crashes the 
> Airflow task (plus it occupies a worker in too much busy work).
> {code:java}
> In [4]: [1 + pow(retries * 0.1, 2) for retries in range(20)] 
>  Out[4]: 
>  [1.0,
>  1.01,
>  1.04,
>  1.09,
>  1.1601,
>  1.25,
>  1.36,
>  1.4902,
>  1.6401,
>  1.81,
>  2.0,
>  2.21,
>  2.4404,
>  2.6904,
>  2.9604,
>  3.25,
>  3.5605,
>  3.8906,
>  4.24,
>  4.61]{code}
> Possible solutions are to introduce an initial sleep (say 60 sec?) right 
> after issuing the request, so that the batch job has some time to spin up. 
> The job progresses through a few phases before it gets to the RUNNING state, 
> and polling for each phase of that sequence might help. Since batch jobs tend 
> to be long-running jobs (rather than near-real time jobs), it might help to 
> issue less frequent polls when it's in the RUNNING state. Something on the 
> order of 10's of seconds might be reasonable for batch jobs? Maybe the class 
> could expose a parameter for the rate of polling (or a callable)?



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (AIRFLOW-5218) AWS Batch Operator - status polling too often, esp. for high concurrency

2019-08-14 Thread Darren Weber (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907749#comment-16907749
 ] 

Darren Weber edited comment on AIRFLOW-5218 at 8/15/19 2:15 AM:


PR at [https://github.com/apache/airflow/pull/5825] applies the following 
suggestion.

Even bumping the backoff factor from `0.1` to `0.3` might help, e.g.
{code:java}
from datetime import datetime
from time import sleep

for retries in range(10):
    pause = 1 + pow(retries * 0.3, 2)
    print(f"{datetime.now()}: retry ({retries:04d}) sleeping for {pause:6.2f} sec")
    sleep(pause)

2019-08-14 19:02:58.745923: retry (0000) sleeping for 1.00 sec
2019-08-14 19:02:59.747635: retry (0001) sleeping for 1.09 sec
2019-08-14 19:03:00.840129: retry (0002) sleeping for 1.36 sec
2019-08-14 19:03:02.202734: retry (0003) sleeping for 1.81 sec
2019-08-14 19:03:04.015686: retry (0004) sleeping for 2.44 sec
2019-08-14 19:03:06.458972: retry (0005) sleeping for 3.25 sec
2019-08-14 19:03:09.713452: retry (0006) sleeping for 4.24 sec
2019-08-14 19:03:13.954253: retry (0007) sleeping for 5.41 sec
2019-08-14 19:03:19.368445: retry (0008) sleeping for 6.76 sec
2019-08-14 19:03:26.135600: retry (0009) sleeping for 8.29 sec

{code}


was (Author: dazza):
Even bumping the backoff factor from `0.1` to `0.3` might help, e.g.
{code:java}
from datetime import datetime
from time import sleep

for retries in range(10):
    pause = 1 + pow(retries * 0.3, 2)
    print(f"{datetime.now()}: retry ({retries:04d}) sleeping for {pause:6.2f} sec")
    sleep(pause)

2019-08-14 19:02:58.745923: retry (0000) sleeping for 1.00 sec
2019-08-14 19:02:59.747635: retry (0001) sleeping for 1.09 sec
2019-08-14 19:03:00.840129: retry (0002) sleeping for 1.36 sec
2019-08-14 19:03:02.202734: retry (0003) sleeping for 1.81 sec
2019-08-14 19:03:04.015686: retry (0004) sleeping for 2.44 sec
2019-08-14 19:03:06.458972: retry (0005) sleeping for 3.25 sec
2019-08-14 19:03:09.713452: retry (0006) sleeping for 4.24 sec
2019-08-14 19:03:13.954253: retry (0007) sleeping for 5.41 sec
2019-08-14 19:03:19.368445: retry (0008) sleeping for 6.76 sec
2019-08-14 19:03:26.135600: retry (0009) sleeping for 8.29 sec

{code}

> AWS Batch Operator - status polling too often, esp. for high concurrency
> 
>
> Key: AIRFLOW-5218
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5218
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, contrib
>Affects Versions: 1.10.4
>Reporter: Darren Weber
>Priority: Major
>
> The AWS Batch Operator attempts to use a boto3 feature that is not available 
> and has not been merged in years, see
>  - [https://github.com/boto/botocore/pull/1307]
>  - see also [https://github.com/broadinstitute/cromwell/issues/4303]
> This is a curious case of premature optimization. So, in the meantime, this 
> means that the fallback is the exponential backoff routine for the status 
> checks on the batch job. Unfortunately, when the concurrency of Airflow jobs 
> is very high (100's of tasks), this fallback polling hits the AWS Batch API 
> too hard and the AWS API throttle throws an error, which fails the Airflow 
> task, simply because the status is polled too frequently.
> Check the output from the retry algorithm, e.g. within the first 10 retries, 
> the status of an AWS batch job is checked about 10 times at a rate that is 
> approx 1 retry/sec. When an Airflow instance is running 10's or 100's of 
> concurrent batch jobs, this hits the API too frequently and crashes the 
> Airflow task (plus it occupies a worker in too much busy work).
> {code:java}
> In [4]: [1 + pow(retries * 0.1, 2) for retries in range(20)] 
>  Out[4]: 
>  [1.0,
>  1.01,
>  1.04,
>  1.09,
>  1.1601,
>  1.25,
>  1.36,
>  1.4902,
>  1.6401,
>  1.81,
>  2.0,
>  2.21,
>  2.4404,
>  2.6904,
>  2.9604,
>  3.25,
>  3.5605,
>  3.8906,
>  4.24,
>  4.61]{code}
> Possible solutions are to introduce an initial sleep (say 60 sec?) right 
> after issuing the request, so that the batch job has some time to spin up. 
> The job progresses through a few phases before it gets to the RUNNING state, 
> and polling for each phase of that sequence might help. Since batch jobs tend 
> to be long-running jobs (rather than near-real time jobs), it might help to 
> issue less frequent polls when it's in the RUNNING state. Something on the 
> order of 10's of seconds might be reasonable for batch jobs? Maybe the class 
> could expose a parameter for the rate of polling (or a callable)?



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (AIRFLOW-5218) AWS Batch Operator - status polling too often, esp. for high concurrency

2019-08-14 Thread Darren Weber (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Darren Weber reassigned AIRFLOW-5218:
-

Assignee: Darren Weber

> AWS Batch Operator - status polling too often, esp. for high concurrency
> 
>
> Key: AIRFLOW-5218
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5218
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, contrib
>Affects Versions: 1.10.4
>Reporter: Darren Weber
>Assignee: Darren Weber
>Priority: Major
>
> The AWS Batch Operator attempts to use a boto3 feature that is not available 
> and has not been merged in years, see
>  - [https://github.com/boto/botocore/pull/1307]
>  - see also [https://github.com/broadinstitute/cromwell/issues/4303]
> This is a curious case of premature optimization. So, in the meantime, this 
> means that the fallback is the exponential backoff routine for the status 
> checks on the batch job. Unfortunately, when the concurrency of Airflow jobs 
> is very high (100's of tasks), this fallback polling hits the AWS Batch API 
> too hard and the AWS API throttle throws an error, which fails the Airflow 
> task, simply because the status is polled too frequently.
> Check the output from the retry algorithm, e.g. within the first 10 retries, 
> the status of an AWS batch job is checked about 10 times at a rate that is 
> approx 1 retry/sec. When an Airflow instance is running 10's or 100's of 
> concurrent batch jobs, this hits the API too frequently and crashes the 
> Airflow task (plus it occupies a worker in too much busy work).
> {code:java}
> In [4]: [1 + pow(retries * 0.1, 2) for retries in range(20)] 
>  Out[4]: 
>  [1.0,
>  1.01,
>  1.04,
>  1.09,
>  1.1601,
>  1.25,
>  1.36,
>  1.4902,
>  1.6401,
>  1.81,
>  2.0,
>  2.21,
>  2.4404,
>  2.6904,
>  2.9604,
>  3.25,
>  3.5605,
>  3.8906,
>  4.24,
>  4.61]{code}
> Possible solutions are to introduce an initial sleep (say 60 sec?) right 
> after issuing the request, so that the batch job has some time to spin up. 
> The job progresses through a few phases before it gets to the RUNNING state, 
> and polling for each phase of that sequence might help. Since batch jobs tend 
> to be long-running jobs (rather than near-real time jobs), it might help to 
> issue less frequent polls when it's in the RUNNING state. Something on the 
> order of 10's of seconds might be reasonable for batch jobs? Maybe the class 
> could expose a parameter for the rate of polling (or a callable)?



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (AIRFLOW-5218) AWS Batch Operator - status polling too often, esp. for high concurrency

2019-08-14 Thread Darren Weber (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907749#comment-16907749
 ] 

Darren Weber commented on AIRFLOW-5218:
---

Even bumping the backoff factor from `0.1` to `0.3` might help, e.g.
{code}
from datetime import datetime
from time import sleep

In [18]: for i in [1 + pow(retries * 0.3, 2) for retries in range(10)]:
    ...:     print(f"{datetime.now()}: sleeping for {i}")
    ...:     sleep(i)
    ...:

2019-08-14 18:52:01.688705: sleeping for 1.0
2019-08-14 18:52:02.690385: sleeping for 1.09
2019-08-14 18:52:03.781384: sleeping for 1.3599
2019-08-14 18:52:05.144492: sleeping for 1.8098
2019-08-14 18:52:06.956547: sleeping for 2.44
2019-08-14 18:52:09.401454: sleeping for 3.25
2019-08-14 18:52:12.652212: sleeping for 4.239
2019-08-14 18:52:16.897060: sleeping for 5.41
2019-08-14 18:52:22.313692: sleeping for 6.76
2019-08-14 18:52:29.082087: sleeping for 8.29
{code}

> AWS Batch Operator - status polling too often, esp. for high concurrency
> 
>
> Key: AIRFLOW-5218
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5218
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, contrib
>Affects Versions: 1.10.4
>Reporter: Darren Weber
>Priority: Major
>
> The AWS Batch Operator attempts to use a boto3 feature that is not available 
> and has not been merged in years, see
> - https://github.com/boto/botocore/pull/1307
> - see also https://github.com/broadinstitute/cromwell/issues/4303
> This is a curious case of premature optimization.  So, in the meantime, this 
> means that the fallback is the exponential backoff routine for the status 
> checks on the batch job.  Unfortunately, when the concurrency of Airflow jobs 
> is very high (100's of tasks), this fallback polling hits the AWS Batch API 
> too hard and the AWS API throttle throws an error, which fails the Airflow 
> task, simply because the status is polled too frequently.
> Check the output from the retry algorithm, e.g. within the first 10 retries, 
> the status of an AWS batch job is checked about 10 times at a rate that is 
> approx 1 retry/sec.  When an Airflow instance is running 10's or 100's of 
> concurrent batch jobs, this hits the API too frequently and crashes the 
> Airflow task (plus it occupies a worker in too much busy work).
> In [4]: [1 + pow(retries * 0.1, 2) for retries in range(20)]
> Out[4]: 
> [1.0,
>  1.01,
>  1.04,
>  1.09,
>  1.1601,
>  1.25,
>  1.36,
>  1.4902,
>  1.6401,
>  1.81,
>  2.0,
>  2.21,
>  2.4404,
>  2.6904,
>  2.9604,
>  3.25,
>  3.5605,
>  3.8906,
>  4.24,
>  4.61]
> Possible solutions are to introduce an initial sleep (say 60 sec?) right 
> after issuing the request, so that the batch job has some time to spin up.  
> The job progresses through a few phases before it gets to the RUNNING state, 
> and polling for each phase of that sequence might help.  Since batch jobs 
> tend to be long-running jobs (rather than near-real time jobs), it might help 
> to issue less frequent polls when it's in the RUNNING state.  Something on 
> the order of 10's of seconds might be reasonable for batch jobs?  Maybe the 
> class could expose a parameter for the rate of polling (or a callable)?



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (AIRFLOW-5218) AWS Batch Operator - status polling too often, esp. for high concurrency

2019-08-14 Thread Darren Weber (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Darren Weber updated AIRFLOW-5218:
--
Description: 
The AWS Batch Operator attempts to use a boto3 feature that is not available 
and has not been merged in years, see
 - [https://github.com/boto/botocore/pull/1307]
 - see also [https://github.com/broadinstitute/cromwell/issues/4303]

This is a curious case of premature optimization. So, in the meantime, this 
means that the fallback is the exponential backoff routine for the status 
checks on the batch job. Unfortunately, when the concurrency of Airflow jobs is 
very high (100's of tasks), this fallback polling hits the AWS Batch API too 
hard and the AWS API throttle throws an error, which fails the Airflow task, 
simply because the status is polled too frequently.

Check the output from the retry algorithm, e.g. within the first 10 retries, 
the status of an AWS batch job is checked about 10 times at a rate that is 
approx 1 retry/sec. When an Airflow instance is running 10's or 100's of 
concurrent batch jobs, this hits the API too frequently and crashes the Airflow 
task (plus it occupies a worker in too much busy work).
{code:java}
In [4]: [1 + pow(retries * 0.1, 2) for retries in range(20)] 
 Out[4]: 
 [1.0,
 1.01,
 1.04,
 1.09,
 1.1601,
 1.25,
 1.36,
 1.4902,
 1.6401,
 1.81,
 2.0,
 2.21,
 2.4404,
 2.6904,
 2.9604,
 3.25,
 3.5605,
 3.8906,
 4.24,
 4.61]{code}
Possible solutions are to introduce an initial sleep (say 60 sec?) right after 
issuing the request, so that the batch job has some time to spin up. The job 
progresses through a few phases before it gets to the RUNNING state, and 
polling for each phase of that sequence might help. Since batch jobs tend to be 
long-running jobs (rather than near-real time jobs), it might help to issue 
less frequent polls when it's in the RUNNING state. Something on the order of 
10's of seconds might be reasonable for batch jobs? Maybe the class could 
expose a parameter for the rate of polling (or a callable)?

  was:
The AWS Batch Operator attempts to use a boto3 feature that is not available 
and has not been merged in years, see

- https://github.com/boto/botocore/pull/1307
- see also https://github.com/broadinstitute/cromwell/issues/4303

This is a curious case of premature optimization.  So, in the meantime, this 
means that the fallback is the exponential backoff routine for the status 
checks on the batch job.  Unfortunately, when the concurrency of Airflow jobs 
is very high (100's of tasks), this fallback polling hits the AWS Batch API too 
hard and the AWS API throttle throws an error, which fails the Airflow task, 
simply because the status is polled too frequently.

Check the output from the retry algorithm, e.g. within the first 10 retries, 
the status of an AWS batch job is checked about 10 times at a rate that is 
approx 1 retry/sec.  When an Airflow instance is running 10's or 100's of 
concurrent batch jobs, this hits the API too frequently and crashes the Airflow 
task (plus it occupies a worker in too much busy work).

In [4]: [1 + pow(retries * 0.1, 2) for retries in range(20)]
Out[4]: 
[1.0,
 1.01,
 1.04,
 1.09,
 1.1601,
 1.25,
 1.36,
 1.4902,
 1.6401,
 1.81,
 2.0,
 2.21,
 2.4404,
 2.6904,
 2.9604,
 3.25,
 3.5605,
 3.8906,
 4.24,
 4.61]


Possible solutions are to introduce an initial sleep (say 60 sec?) right after 
issuing the request, so that the batch job has some time to spin up.  The job 
progresses through a few phases before it gets to the RUNNING state, and 
polling for each phase of that sequence might help.  Since batch jobs tend to 
be long-running jobs (rather than near-real time jobs), it might help to issue 
less frequent polls when it's in the RUNNING state.  Something on the order of 
10's of seconds might be reasonable for batch jobs?  Maybe the class could 
expose a parameter for the rate of polling (or a callable)?



> AWS Batch Operator - status polling too often, esp. for high concurrency
> 
>
> Key: AIRFLOW-5218
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5218
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, contrib
>Affects Versions: 1.10.4
>Reporter: Darren Weber
>Priority: Major
>
> The AWS Batch Operator attempts to use a boto3 feature that is not available 
> and has not been merged in years, see
>  - [https://github.com/boto/botocore/pull/1307]
>  - see also 

[jira] [Created] (AIRFLOW-5221) Add host alias support to the KubernetesPodOperator

2019-08-14 Thread Derrick Mink (JIRA)
Derrick Mink created AIRFLOW-5221:
-

 Summary: Add host alias support to the KubernetesPodOperator
 Key: AIRFLOW-5221
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5221
 Project: Apache Airflow
  Issue Type: Improvement
  Components: operators
Affects Versions: 1.10.4
Reporter: Derrick Mink
Assignee: Derrick Mink


[https://kubernetes.io/docs/concepts/services-networking/add-entries-to-pod-etc-hosts-with-host-aliases/]

The only way to manage DNS entries for Kubernetes pods is through host 
aliases. 
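
For reference, a sketch of the Kubernetes-side object this maps to - the 
kubernetes python client's {{V1HostAlias}} carries the ip/hostnames pair from 
the linked page; how the operator exposes it is up to the PR:

{code}
from kubernetes.client import V1HostAlias

# One /etc/hosts entry injected into the pod: an ip plus its hostnames.
aliases = [V1HostAlias(ip="10.0.0.2", hostnames=["internal-db.local"])]
{code}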



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (AIRFLOW-5220) Easy form to create airflow dags

2019-08-14 Thread huangyan (JIRA)
huangyan created AIRFLOW-5220:
-

 Summary: Easy form to create airflow dags
 Key: AIRFLOW-5220
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5220
 Project: Apache Airflow
  Issue Type: New Feature
  Components: DAG, database
Affects Versions: 1.10.5
Reporter: huangyan
Assignee: huangyan


Airflow has a high barrier to entry: the user must write a Python DAG file. 
However, many users don't write Python; they want to create DAGs directly from 
forms.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[GitHub] [airflow] potiuk commented on a change in pull request #5807: [AIRFLOW-5204] Shellcheck + common licence in shell files

2019-08-14 Thread GitBox
potiuk commented on a change in pull request #5807:  [AIRFLOW-5204] Shellcheck 
+ common licence in shell files
URL: https://github.com/apache/airflow/pull/5807#discussion_r314178750
 
 

 ##
 File path: airflow/example_dags/entrypoint.sh
 ##
 @@ -1,20 +1,20 @@
-# -*- coding: utf-8 -*-
+#!/usr/bin/env bash
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
 #
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
+#http://www.apache.org/licenses/LICENSE-2.0
 #
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
+#  Unless required by applicable law or agreed to in writing,
+#  software distributed under the License is distributed on an
+#  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+#  KIND, either express or implied.  See the License for the
+#  specific language governing permissions and limitations
+#  under the License.
 
-["/bin/bash", "-c", "/bin/sleep 30; /bin/mv {{params.source_location}}/{{ 
ti.xcom_pull('view_file') }} {{params.target_location}}; /bin/echo 
'{{params.target_location}}/{{ ti.xcom_pull('view_file') }}';"]
+# TODO: Uncomment this code when we start using it
+#[ "/bin/bash", "-c", "/bin/sleep 30; /bin/mv {{params.source_location}}/{{ 
ti.xcom_pull('view_file') }} {{params.target_location}}; /bin/echo 
'{{params.target_location}}/{{ ti.xcom_pull('view_file') }}';" ]  # shellcheck 
disable=SC1073,SC1072,SC1035
 
 Review comment:
   This is a problematic implementation of DockerOperator with regard to the 
command. The command can be either a string or an array. It can be templated, 
and it can also be a file with a .bash or .sh extension. In this case the 
Python array was stored in a file with a .sh extension - that was valid from 
the DockerOperator point of view (see docker_copy_data.py), but it makes little 
sense to store an array in a .sh file. Those tests in docker_copy_data.py were 
anyhow commented out, with a suggestion to uncomment them if you want to run 
your own testing.
   
   Rather than commenting it out, I simply moved the array to 
docker_copy_data.py and removed entrypoint.sh.
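   
   For illustration, a minimal sketch of passing the templated array directly 
as the DockerOperator `command` (the image, paths and params below are 
placeholders, not the values from docker_copy_data.py):
   
       from datetime import datetime
       
       from airflow import DAG
       from airflow.operators.docker_operator import DockerOperator
       
       with DAG("docker_copy_sketch", start_date=datetime(2019, 8, 1),
                schedule_interval=None) as dag:
           move_file = DockerOperator(
               task_id="move_file",
               image="bash:4.4",
               # `command` accepts a templated list; no .sh file is needed
               command=["/bin/bash", "-c",
                        "sleep 30; mv {{ params.source }}/data.txt {{ params.target }}/"],
               params={"source": "/tmp/in", "target": "/tmp/out"},
           )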


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] potiuk commented on issue #5808: [AIRFLOW-5205] Check xml files with xmllint + Licenses

2019-08-14 Thread GitBox
potiuk commented on issue #5808:  [AIRFLOW-5205] Check xml files with xmllint + 
Licenses
URL: https://github.com/apache/airflow/pull/5808#issuecomment-521523483
 
 
   Made the PR standalone (not depending on a series of PRs).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] potiuk commented on a change in pull request #5808: [AIRFLOW-5205] Check xml files with xmllint + Licenses

2019-08-14 Thread GitBox
potiuk commented on a change in pull request #5808:  [AIRFLOW-5205] Check xml 
files with xmllint + Licenses
URL: https://github.com/apache/airflow/pull/5808#discussion_r314181813
 
 

 ##
 File path: airflow/_vendor/slugify/slugify.py
 ##
 @@ -1,3 +1,6 @@
+# -*- coding: utf-8 -*-
+# pylint: skip-file
+"""Slugify !"""
 
 Review comment:
   Removed in the first commit.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-5161) Add pre-commit hooks to run static checks for only changed files

2019-08-14 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907824#comment-16907824
 ] 

ASF subversion and git services commented on AIRFLOW-5161:
--

Commit df4dc31ea109b4a6b832a9d6b3a4d54e1efd6e5a in airflow's branch 
refs/heads/v1-10-test from Jarek Potiuk
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=df4dc31 ]

[AIRFLOW-5161] Static checks are run automatically in pre-commit hooks (#5777)


> Add pre-commit hooks to run static checks for only changed files
> 
>
> Key: AIRFLOW-5161
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5161
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ci
>Affects Versions: 2.0.0
>Reporter: Jarek Potiuk
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (AIRFLOW-5161) Add pre-commit hooks to run static checks for only changed files

2019-08-14 Thread Jarek Potiuk (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jarek Potiuk resolved AIRFLOW-5161.
---
   Resolution: Fixed
Fix Version/s: 1.10.5

> Add pre-commit hooks to run static checks for only changed files
> 
>
> Key: AIRFLOW-5161
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5161
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ci
>Affects Versions: 2.0.0
>Reporter: Jarek Potiuk
>Priority: Major
> Fix For: 1.10.5
>
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (AIRFLOW-5218) AWS Batch Operator - status polling too often, esp. for high concurrency

2019-08-14 Thread Darren Weber (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Darren Weber updated AIRFLOW-5218:
--
Description: 
The AWS Batch Operator attempts to use a boto3 feature that is not available 
and has not been merged in years, see
 - [https://github.com/boto/botocore/pull/1307]
 - see also [https://github.com/broadinstitute/cromwell/issues/4303]

This is a curious case of premature optimization. So, in the meantime, this 
means that the fallback is the exponential backoff routine for the status 
checks on the batch job. Unfortunately, when the concurrency of Airflow jobs is 
very high (100's of tasks), this fallback polling hits the AWS Batch API too 
hard and the AWS API throttle throws an error, which fails the Airflow task, 
simply because the status is polled too frequently.

Check the output from the retry algorithm, e.g. within the first 10 retries, 
the status of an AWS batch job is checked about 10 times at a rate that is 
approx 1 retry/sec. When an Airflow instance is running 10's or 100's of 
concurrent batch jobs, this hits the API too frequently and crashes the Airflow 
task (plus it occupies a worker in too much busy work).
{code:java}
In [4]: [1 + pow(retries * 0.1, 2) for retries in range(20)] 
 Out[4]: 
 [1.0,
 1.01,
 1.04,
 1.09,
 1.1601,
 1.25,
 1.36,
 1.4902,
 1.6401,
 1.81,
 2.0,
 2.21,
 2.4404,
 2.6904,
 2.9604,
 3.25,
 3.5605,
 3.8906,
 4.24,
 4.61]{code}
Possible solutions are to introduce an initial sleep (say 60 sec?) right after 
issuing the request, so that the batch job has some time to spin up. The job 
progresses through a few phases before it gets to the RUNNING state, and 
polling for each phase of that sequence might help. Since batch jobs tend to be 
long-running jobs (rather than near-real time jobs), it might help to issue 
less frequent polls when it's in the RUNNING state. Something on the order of 
10's of seconds might be reasonable for batch jobs? Maybe the class could 
expose a parameter for the rate of polling (or a callable)?

 

Another option is to use something like the sensor-poke approach, with 
rescheduling, e.g.

- 
[https://github.com/apache/airflow/blob/master/airflow/sensors/base_sensor_operator.py#L117]
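
A minimal sketch of that approach, assuming a hypothetical {{get_status}} 
callable; {{mode='reschedule'}} and {{poke_interval}} are real 
BaseSensorOperator parameters, while the sensor class itself is illustrative:

{code}
from airflow.sensors.base_sensor_operator import BaseSensorOperator
from airflow.utils.decorators import apply_defaults


class BatchJobSensor(BaseSensorOperator):
    @apply_defaults
    def __init__(self, get_status, *args, **kwargs):
        super(BatchJobSensor, self).__init__(*args, **kwargs)
        self.get_status = get_status  # zero-arg callable returning the job state

    def poke(self, context):
        # Returning False hands the worker slot back until the next poke.
        return self.get_status() == "SUCCEEDED"


# wait = BatchJobSensor(task_id="wait_for_batch", get_status=my_status_fn,
#                       mode="reschedule", poke_interval=30)
{code}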

 

  was:
The AWS Batch Operator attempts to use a boto3 feature that is not available 
and has not been merged in years, see
 - [https://github.com/boto/botocore/pull/1307]
 - see also [https://github.com/broadinstitute/cromwell/issues/4303]

This is a curious case of premature optimization. So, in the meantime, this 
means that the fallback is the exponential backoff routine for the status 
checks on the batch job. Unfortunately, when the concurrency of Airflow jobs is 
very high (100's of tasks), this fallback polling hits the AWS Batch API too 
hard and the AWS API throttle throws an error, which fails the Airflow task, 
simply because the status is polled too frequently.

Check the output from the retry algorithm, e.g. within the first 10 retries, 
the status of an AWS batch job is checked about 10 times at a rate that is 
approx 1 retry/sec. When an Airflow instance is running 10's or 100's of 
concurrent batch jobs, this hits the API too frequently and crashes the Airflow 
task (plus it occupies a worker in too much busy work).
{code:java}
In [4]: [1 + pow(retries * 0.1, 2) for retries in range(20)] 
 Out[4]: 
 [1.0,
 1.01,
 1.04,
 1.09,
 1.1601,
 1.25,
 1.36,
 1.4902,
 1.6401,
 1.81,
 2.0,
 2.21,
 2.4404,
 2.6904,
 2.9604,
 3.25,
 3.5605,
 3.8906,
 4.24,
 4.61]{code}
Possible solutions are to introduce an initial sleep (say 60 sec?) right after 
issuing the request, so that the batch job has some time to spin up. The job 
progresses through a few phases before it gets to the RUNNING state, and 
polling for each phase of that sequence might help. Since batch jobs tend to be 
long-running jobs (rather than near-real time jobs), it might help to issue 
less frequent polls when it's in the RUNNING state. Something on the order of 
10's of seconds might be reasonable for batch jobs? Maybe the class could 
expose a parameter for the rate of polling (or a callable)?


> AWS Batch Operator - status polling too often, esp. for high concurrency
> 
>
> Key: AIRFLOW-5218
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5218
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, contrib
>Affects Versions: 1.10.4
>Reporter: Darren Weber
>Assignee: Darren Weber
>Priority: Major
>
> The AWS Batch Operator attempts to use a boto3 feature that is not available 
> and has not been merged 

[GitHub] [airflow] potiuk commented on issue #5807: [AIRFLOW-5204] Shellcheck + common licences in shell files

2019-08-14 Thread GitBox
potiuk commented on issue #5807:  [AIRFLOW-5204] Shellcheck + common licences 
in shell files
URL: https://github.com/apache/airflow/pull/5807#issuecomment-521519887
 
 
   Again - another set of checks, this time for shell files (shellcheck + 
shebangs/executable bits + licenses).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] potiuk commented on issue #5790: [AIRFLOW-5180] Added static checks (yamllint) + auto-licences for yaml

2019-08-14 Thread GitBox
potiuk commented on issue #5790:  [AIRFLOW-5180] Added static checks (yamllint) 
+ auto-licences for yaml
URL: https://github.com/apache/airflow/pull/5790#issuecomment-521513998
 
 
   Part of the static checks dealing with YAML (yamllint + consistent 
licenses). Removed the chain of depending commits.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Assigned] (AIRFLOW-5176) Add integration with Azure Data Explorer

2019-08-14 Thread Michael Spector (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Spector reassigned AIRFLOW-5176:


Assignee: (was: Michael Spector)

> Add integration with Azure Data Explorer
> 
>
> Key: AIRFLOW-5176
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5176
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: hooks, operators
>Affects Versions: 1.10.4, 2.0.0
>Reporter: Michael Spector
>Priority: Major
>
> Add a hook and an operator that allow sending queries to Azure Data Explorer 
> (Kusto) cluster.
> ADX (Azure Data Explorer) is relatively new but very promising analytics data 
> store / data processing offering in Azure.
> PR: https://github.com/apache/airflow/pull/5785
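
For context, a sketch of the underlying client such a hook would presumably 
wrap, using the azure-kusto-data package of that era; the cluster URL and 
database below are placeholders:

{code}
from azure.kusto.data.request import KustoClient, KustoConnectionStringBuilder

kcsb = KustoConnectionStringBuilder.with_aad_device_authentication(
    "https://mycluster.westeurope.kusto.windows.net")
client = KustoClient(kcsb)
response = client.execute("mydatabase", "MyTable | take 10")
for row in response.primary_results[0]:
    print(row)
{code}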



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[GitHub] [airflow] mik-laj opened a new pull request #5822: [AIRFLOW-4758] Add GcsToGDriveOperator

2019-08-14 Thread GitBox
mik-laj opened a new pull request #5822: [AIRFLOW-4758] Add GcsToGDriveOperator
URL: https://github.com/apache/airflow/pull/5822
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-4758
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what they do
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] matwerber1 commented on issue #4068: [AIRFLOW-2310]: Add AWS Glue Job Compatibility to Airflow

2019-08-14 Thread GitBox
matwerber1 commented on issue #4068: [AIRFLOW-2310]: Add AWS Glue Job 
Compatibility to Airflow
URL: https://github.com/apache/airflow/pull/4068#issuecomment-521402915
 
 
   I see the merge failed from what is (hopefully?) a small conflict - can we 
get eyes on this? Can I help?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-4758) Add GoogleCloudStorageToGoogleDrive Operator

2019-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907597#comment-16907597
 ] 

ASF GitHub Bot commented on AIRFLOW-4758:
-

mik-laj commented on pull request #5822: [AIRFLOW-4758] Add GcsToGDriveOperator
URL: https://github.com/apache/airflow/pull/5822
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-4758
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what they do
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add GoogleCloudStorageToGoogleDrive Operator
> 
>
> Key: AIRFLOW-4758
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4758
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: gcp, operators
>Affects Versions: 1.10.3
>Reporter: jack
>Priority: Major
>
> Add Operators:
> GoogleCloudStorageToGoogleDrive
> GoogleDriveToGoogleCloudStorage
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (AIRFLOW-4758) Add GoogleCloudStorageToGoogleDrive Operator

2019-08-14 Thread Kamil Bregula (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907598#comment-16907598
 ] 

Kamil Bregula commented on AIRFLOW-4758:


I have created an operator that copies data from GCS to GDrive. Writing an 
operator that copies directories between GDrive and GCS will not be easy, 
because GDrive stores files in a graph: the directory structure may contain 
cycles. It is possible to write an operator that copies a single file from 
GDrive, but its usability would be very limited. 

What do you think?
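
A minimal sketch of the cycle problem described above ({{list_children}} is a 
hypothetical helper standing in for a Drive API files().list() call; this is 
not the operator's real code):

{code:python}
def walk_drive_folder(list_children, folder_id, visited=None):
    """Yield ids of non-folder files reachable from folder_id exactly once.

    Drive folders form a graph that may contain cycles, so the visited set
    is what keeps a naive recursive copy from looping forever.
    """
    if visited is None:
        visited = set()
    if folder_id in visited:  # already expanded: a cycle or a diamond, skip
        return
    visited.add(folder_id)
    for child in list_children(folder_id):
        if child["mimeType"] == "application/vnd.google-apps.folder":
            yield from walk_drive_folder(list_children, child["id"], visited)
        else:
            yield child["id"]
{code}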

> Add GoogleCloudStorageToGoogleDrive Operator
> 
>
> Key: AIRFLOW-4758
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4758
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: gcp, operators
>Affects Versions: 1.10.3
>Reporter: jack
>Priority: Major
>
> Add Operators:
> GoogleCloudStorageToGoogleDrive
> GoogleDriveToGoogleCloudStorage
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[GitHub] [airflow] ashb commented on a change in pull request #5668: [AIRFLOW-4316] support setting kubernetes_environment_variables config section from env var

2019-08-14 Thread GitBox
ashb commented on a change in pull request #5668: [AIRFLOW-4316] support 
setting kubernetes_environment_variables config section from env var
URL: https://github.com/apache/airflow/pull/5668#discussion_r313769536
 
 

 ##
 File path: tests/operators/test_email_operator.py
 ##
 @@ -57,6 +57,6 @@ def _run_as_operator(self, **kwargs):
 task.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE)
 
 def test_execute(self):
-with conf_vars({('email', 'EMAIL_BACKEND'): 
'tests.operators.test_email_operator.send_email_test'}):
+with conf_vars({('email', 'email_backend'): 
'tests.operators.test_email_operator.send_email_test'}):
 
 Review comment:
   Just to confirm: we are setting these as lowercase because that is how it is 
defined in the config, but reading this as `conf.get('email', 'EMAIL_BACKEND')` 
still works, right?
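
   A quick self-contained check of the case behaviour in question: Airflow's 
config parser builds on the standard library ConfigParser, whose default 
`optionxform` lower-cases option names on both write and lookup (a sketch, 
not Airflow code):

   ```python
   from configparser import ConfigParser

   parser = ConfigParser()
   parser.read_string("[email]\nemail_backend = tests.send_email_test\n")
   # Both spellings resolve to the same lower-cased option key.
   assert parser.get("email", "EMAIL_BACKEND") == "tests.send_email_test"
   assert parser.get("email", "email_backend") == "tests.send_email_test"
   ```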


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-5209) Fix Documentation build

2019-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907091#comment-16907091
 ] 

ASF GitHub Bot commented on AIRFLOW-5209:
-

kaxil commented on pull request #5814: [AIRFLOW-5209] Bump Sphinx version to 
fix doc build
URL: https://github.com/apache/airflow/pull/5814
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Fix Documentation build
> ---
>
> Key: AIRFLOW-5209
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5209
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: dependencies
>Affects Versions: 1.10.4
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Major
> Fix For: 1.10.5
>
>
> Currently, if you try to build on master or 1.10.4 it fails with the 
> following error:
> {noformat}
> ...
> AttributeError: 'bool' object has no attribute 'split'
> {noformat}

[jira] [Commented] (AIRFLOW-5209) Fix Documentation build

2019-08-14 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907092#comment-16907092
 ] 

ASF subversion and git services commented on AIRFLOW-5209:
--

Commit 34fbd029f70c9f51de93ab3d3bfc6c72bb8d5bf3 in airflow's branch 
refs/heads/master from Kaxil Naik
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=34fbd02 ]

[AIRFLOW-5209] Bump Sphinx version to fix doc build (#5814)



> Fix Documentation build
> ---
>
> Key: AIRFLOW-5209
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5209
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: dependencies
>Affects Versions: 1.10.4
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Major
> Fix For: 1.10.5
>
>
> Currently, if you try to build on master or 1.10.4 it fails with the 
> following error:
> {noformat}
> ...
> AttributeError: 'bool' object has no attribute 'split'
> {noformat}

[jira] [Resolved] (AIRFLOW-5209) Fix Documentation build

2019-08-14 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-5209.
-
Resolution: Fixed

> Fix Documentation build
> ---
>
> Key: AIRFLOW-5209
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5209
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: dependencies
>Affects Versions: 1.10.4
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Major
> Fix For: 1.10.5
>
>
> Currently, if you try to build on master or 1.10.4 it fails with the 
> following error:
> {noformat}
>   File 
> "/home/docs/checkouts/readthedocs.org/user_builds/airflow/envs/latest/lib/python3.7/site-packages/docutils/statemachine.py",
>  line 460, in check_line
> return method(match, context, next_state)
>   File 
> "/home/docs/checkouts/readthedocs.org/user_builds/airflow/envs/latest/lib/python3.7/site-packages/docutils/parsers/rst/states.py",
>  line 2753, in underline
> self.section(title, source, style, lineno - 1, messages)
>   File 
> "/home/docs/checkouts/readthedocs.org/user_builds/airflow/envs/latest/lib/python3.7/site-packages/docutils/parsers/rst/states.py",
>  line 327, in section
> self.new_subsection(title, lineno, messages)
>   File 
> "/home/docs/checkouts/readthedocs.org/user_builds/airflow/envs/latest/lib/python3.7/site-packages/docutils/parsers/rst/states.py",
>  line 395, in new_subsection
> node=section_node, match_titles=True)
>   File 
> "/home/docs/checkouts/readthedocs.org/user_builds/airflow/envs/latest/lib/python3.7/site-packages/docutils/parsers/rst/states.py",
>  line 282, in nested_parse
> node=node, match_titles=match_titles)
>   File 
> "/home/docs/checkouts/readthedocs.org/user_builds/airflow/envs/latest/lib/python3.7/site-packages/docutils/parsers/rst/states.py",
>  line 196, in run
> results = StateMachineWS.run(self, input_lines, input_offset)
>   File 
> "/home/docs/checkouts/readthedocs.org/user_builds/airflow/envs/latest/lib/python3.7/site-packages/docutils/statemachine.py",
>  line 239, in run
> context, state, transitions)
>   File 
> "/home/docs/checkouts/readthedocs.org/user_builds/airflow/envs/latest/lib/python3.7/site-packages/docutils/statemachine.py",
>  line 460, in check_line
> return method(match, context, next_state)
>   File 
> "/home/docs/checkouts/readthedocs.org/user_builds/airflow/envs/latest/lib/python3.7/site-packages/docutils/parsers/rst/states.py",
>  line 2326, in explicit_markup
> nodelist, blank_finish = self.explicit_construct(match)
>   File 
> "/home/docs/checkouts/readthedocs.org/user_builds/airflow/envs/latest/lib/python3.7/site-packages/docutils/parsers/rst/states.py",
>  line 2338, in explicit_construct
> return method(self, expmatch)
>   File 
> "/home/docs/checkouts/readthedocs.org/user_builds/airflow/envs/latest/lib/python3.7/site-packages/docutils/parsers/rst/states.py",
>  line 2081, in directive
> directive_class, match, type_name, option_presets)
>   File 
> "/home/docs/checkouts/readthedocs.org/user_builds/airflow/envs/latest/lib/python3.7/site-packages/docutils/parsers/rst/states.py",
>  line 2130, in run_directive
> result = directive_instance.run()
>   File 
> "/home/docs/checkouts/readthedocs.org/user_builds/airflow/envs/latest/lib/python3.7/site-packages/sphinx/ext/autodoc/directive.py",
>  line 121, in run
> documenter_options = process_documenter_options(doccls, self.config, 
> self.options)
>   File 
> "/home/docs/checkouts/readthedocs.org/user_builds/airflow/envs/latest/lib/python3.7/site-packages/sphinx/ext/autodoc/directive.py",
>  line 73, in process_documenter_options
> return Options(assemble_option_dict(options.items(), 
> documenter.option_spec))
>   File 
> "/home/docs/checkouts/readthedocs.org/user_builds/airflow/envs/latest/lib/python3.7/site-packages/docutils/utils/__init__.py",
>  line 328, in assemble_option_dict
> options[name] = convertor(value)
>   File 
> "/home/docs/checkouts/readthedocs.org/user_builds/airflow/envs/latest/lib/python3.7/site-packages/sphinx/ext/autodoc/__init__.py",
>  line 82, in members_option
> return [x.strip() for x in arg.split(',')]
> AttributeError: 'bool' object has no attribute 'split'
> Exception occurred:
>   File 
> "/home/docs/checkouts/readthedocs.org/user_builds/airflow/envs/latest/lib/python3.7/site-packages/sphinx/ext/autodoc/__init__.py",
>  line 82, in members_option
> return [x.strip() for x in arg.split(',')]
> AttributeError: 'bool' object has no attribute 'split'
> {noformat}
> Our doc build on RTD also fails with the same error: 
> https://readthedocs.org/projects/airflow/builds/9511663/
> This happens when the Sphinx version is < 2.
> Using the latest Sphinx version solves this for us.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[GitHub] [airflow] kaxil merged pull request #5814: [AIRFLOW-5209] Bump Sphinx version to fix doc build

2019-08-14 Thread GitBox
kaxil merged pull request #5814: [AIRFLOW-5209] Bump Sphinx version to fix doc 
build
URL: https://github.com/apache/airflow/pull/5814
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] kaxil commented on a change in pull request #5815: [AIRFLOW-5210] Make finding template files more efficient

2019-08-14 Thread GitBox
kaxil commented on a change in pull request #5815: [AIRFLOW-5210] Make finding 
template files more efficient
URL: https://github.com/apache/airflow/pull/5815#discussion_r313819287
 
 

 ##
 File path: airflow/models/baseoperator.py
 ##
 @@ -717,26 +717,27 @@ def prepare_template(self):
 
 def resolve_template_files(self):
 # Getting the content of files for template_field / template_ext
-for attr in self.template_fields:
-content = getattr(self, attr, None)
-if content is None:
-continue
-elif isinstance(content, str) and \
-any([content.endswith(ext) for ext in self.template_ext]):
-env = self.get_template_env()
-try:
-setattr(self, attr, env.loader.get_source(env, content)[0])
-except Exception as e:
-self.log.exception(e)
-elif isinstance(content, list):
-env = self.dag.get_template_env()
-for i in range(len(content)):
-if isinstance(content[i], str) and \
-any([content[i].endswith(ext) for ext in 
self.template_ext]):
-try:
-content[i] = env.loader.get_source(env, 
content[i])[0]
-except Exception as e:
-self.log.exception(e)
+if self.template_ext:
 
 Review comment:
   Cool, I get it
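
   A minimal sketch of the short-circuit under discussion (illustrative names, 
not the exact Airflow code): when an operator declares no `template_ext`, no 
field can name a template file, so the whole scan can be skipped. Jinja 
rendering of plain template strings happens later, in `render_templates`, so 
the guard only affects loading file contents.

   ```python
   def resolve_template_files(task, load_source):
       # load_source is a hypothetical callable that reads a template file's
       # text, standing in for env.loader.get_source(...).
       if not task.template_ext:  # nothing can end in '.sql', '.sh', ...
           return
       for attr in task.template_fields:
           content = getattr(task, attr, None)
           if isinstance(content, str) and content.endswith(tuple(task.template_ext)):
               setattr(task, attr, load_source(content))
   ```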


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] mik-laj commented on issue #5565: [AIRFLOW-4899] Fix get_dataset_list from bigquery hook to return next…

2019-08-14 Thread GitBox
mik-laj commented on issue #5565: [AIRFLOW-4899] Fix get_dataset_list from 
bigquery hook to return next…
URL: https://github.com/apache/airflow/pull/5565#issuecomment-521171013
 
 
   @benjamingrenier flake8 is sad ;_; 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Resolved] (AIRFLOW-4908) Implement BigQuery Hooks/Operators for update_dataset, patch_dataset and get_dataset

2019-08-14 Thread Kamil Bregula (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Bregula resolved AIRFLOW-4908.

   Resolution: Fixed
Fix Version/s: 1.10.5

> Implement BigQuery Hooks/Operators for update_dataset, patch_dataset and 
> get_dataset
> 
>
> Key: AIRFLOW-4908
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4908
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: gcp
>Affects Versions: 2.0.0
>Reporter: Ryan Yuan
>Assignee: Ryan Yuan
>Priority: Critical
> Fix For: 1.10.5
>
>
> To create a BigQuery sink for GCP Stackdriver Logging, I have to assign 
> `WRITER` access to group `cloud-l...@google.com` to access BQ dataset. 
> However, current BigQueryHook doesn't support updating/patching dataset.
> Reference: 
> [https://googleapis.github.io/google-cloud-python/latest/logging/usage.html#export-to-bigquery]
> Implement GCP Stackdriver Logging: 
> https://issues.apache.org/jira/browse/AIRFLOW-4779
> BigQueryHook is missing update_dataset and patch_dataset; it does have 
> get_dataset, but there is no operator for it.
>  
> Features to be implemented:
> BigQueryBaseCursor.patch_dataset
> BigQueryBaseCursor.update_dataset
> BigQueryPatchDatasetOperator
> BigQueryUpdateDatasetOperator
> BigQueryGetDatasetOperator
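
A hedged sketch of the underlying call such hook methods would wrap, using the 
google-cloud-bigquery client directly (dataset id and group address are 
illustrative, not taken from this issue):

{code:python}
from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset("my-project.my_dataset")  # hypothetical dataset

# Grant a group WRITER access, then push only the changed field.
entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry("WRITER", "groupByEmail", "some-group@example.com"))
dataset.access_entries = entries
dataset = client.update_dataset(dataset, ["access_entries"])
{code}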



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (AIRFLOW-5133) Keep original env state in provide_gcp_credential_file

2019-08-14 Thread Kamil Bregula (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Bregula resolved AIRFLOW-5133.

   Resolution: Fixed
Fix Version/s: 1.10.5

> Keep original env state in provide_gcp_credential_file
> --
>
> Key: AIRFLOW-5133
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5133
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: gcp
>Affects Versions: 1.10.3
>Reporter: Kamil Bregula
>Priority: Major
> Fix For: 1.10.5
>
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (AIRFLOW-5133) Keep original env state in provide_gcp_credential_file

2019-08-14 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907077#comment-16907077
 ] 

ASF subversion and git services commented on AIRFLOW-5133:
--

Commit 877e42d8847c379a0817aa844704583f22e8be27 in airflow's branch 
refs/heads/master from Kamil Breguła
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=877e42d ]

[AIRFLOW-5133] Keep original env state in provide_gcp_credential_file (#5747)



> Keep original env state in provide_gcp_credential_file
> --
>
> Key: AIRFLOW-5133
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5133
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: gcp
>Affects Versions: 1.10.3
>Reporter: Kamil Bregula
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (AIRFLOW-5133) Keep original env state in provide_gcp_credential_file

2019-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907076#comment-16907076
 ] 

ASF GitHub Bot commented on AIRFLOW-5133:
-

mik-laj commented on pull request #5747: [AIRFLOW-5133] Keep original env state 
in provide_gcp_credential_file
URL: https://github.com/apache/airflow/pull/5747
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Keep original env state in provide_gcp_credential_file
> --
>
> Key: AIRFLOW-5133
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5133
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: gcp
>Affects Versions: 1.10.3
>Reporter: Kamil Bregula
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[GitHub] [airflow] mik-laj merged pull request #5747: [AIRFLOW-5133] Keep original env state in provide_gcp_credential_file

2019-08-14 Thread GitBox
mik-laj merged pull request #5747: [AIRFLOW-5133] Keep original env state in 
provide_gcp_credential_file
URL: https://github.com/apache/airflow/pull/5747
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (AIRFLOW-5212) Getting the error "ERROR - Failed to bag_dag"

2019-08-14 Thread Theepan Subramani (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Theepan Subramani updated AIRFLOW-5212:
---
Description: 
The error does not contain any details on why the DAG is failing to run. Due to 
this error, the DAG does not start running.

Without any hint, we are not sure how to fix the Python script used to 
create the DAG. Any help would be much appreciated.

> Getting the error "ERROR - Failed to bag_dag"
> -
>
> Key: AIRFLOW-5212
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5212
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DagRun
>Affects Versions: 1.10.2
>Reporter: Theepan Subramani
>Priority: Major
> Attachments: sand.py
>
>
> The error does not contain any details on why the DAG is failing to run. Due 
> to this error, the DAG does not start running. 
> Without any hint, we are not sure how to fix the Python script used to 
> create the DAG. Any help would be much appreciated.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[GitHub] [airflow] BasPH commented on issue #5815: [AIRFLOW-5210] Make finding template files more efficient

2019-08-14 Thread GitBox
BasPH commented on issue #5815: [AIRFLOW-5210] Make finding template files more 
efficient
URL: https://github.com/apache/airflow/pull/5815#issuecomment-521200790
 
 
   @danfrankj LGTM; the k8s CI step failed. I believe it's a bit buggy, so I 
restarted it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] mik-laj commented on issue #5566: [AIRFLOW-4935] Add method in the bigquery hook to list tables in a dataset

2019-08-14 Thread GitBox
mik-laj commented on issue #5566: [AIRFLOW-4935] Add method in the bigquery 
hook to list tables in a dataset
URL: https://github.com/apache/airflow/pull/5566#issuecomment-521171398
 
 
   Can you rebase, please?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] mik-laj merged pull request #5816: [AIRFLOW-5211] Add pass_value to template_fields for BigQueryValueCheckOperator

2019-08-14 Thread GitBox
mik-laj merged pull request #5816: [AIRFLOW-5211] Add pass_value to 
template_fields for BigQueryValueCheckOperator
URL: https://github.com/apache/airflow/pull/5816
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-5211) Add pass_value to template_fields -- BigQueryValueCheckOperator

2019-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907081#comment-16907081
 ] 

ASF GitHub Bot commented on AIRFLOW-5211:
-

mik-laj commented on pull request #5816: [AIRFLOW-5211] Add pass_value to 
template_fields for BigQueryValueCheckOperator
URL: https://github.com/apache/airflow/pull/5816
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add pass_value to template_fields -- BigQueryValueCheckOperator
> ---
>
> Key: AIRFLOW-5211
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5211
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Affects Versions: 1.10.4
>Reporter: Damon Liao
>Assignee: Damon Liao
>Priority: Minor
> Fix For: 1.10.4, 1.10.5
>
>
> There are use cases for filling *pass_value* from *XCom* when using 
> *BigQueryValueCheckOperator*, so add pass_value to template_fields.
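
A hedged usage sketch of what the templated field enables (task ids and query 
are illustrative):

{code:python}
from airflow.contrib.operators.bigquery_check_operator import (
    BigQueryValueCheckOperator,
)

# pass_value is rendered at run time, so it can be pulled from XCom.
check = BigQueryValueCheckOperator(
    task_id="check_row_count",
    sql="SELECT COUNT(*) FROM `my-project.my_dataset.my_table`",
    pass_value="{{ ti.xcom_pull(task_ids='compute_expected_count') }}",
    use_legacy_sql=False,
)
{code}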



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (AIRFLOW-5211) Add pass_value to template_fields -- BigQueryValueCheckOperator

2019-08-14 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907082#comment-16907082
 ] 

ASF subversion and git services commented on AIRFLOW-5211:
--

Commit 7935e9378c555acb88b462a9459d87294d07c5e7 in airflow's branch 
refs/heads/master from damon09...@gmail.com
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=7935e93 ]

[AIRFLOW-5211] Add pass_value to template_fields for BigQueryValueCheckOperator 
(#5816)



> Add pass_value to template_fields -- BigQueryValueCheckOperator
> ---
>
> Key: AIRFLOW-5211
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5211
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Affects Versions: 1.10.4
>Reporter: Damon Liao
>Assignee: Damon Liao
>Priority: Minor
> Fix For: 1.10.4, 1.10.5
>
>
> There are use cases for filling *pass_value* from *XCom* when using 
> *BigQueryValueCheckOperator*, so add pass_value to template_fields.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[GitHub] [airflow] kaxil commented on a change in pull request #5815: [AIRFLOW-5210] Make finding template files more efficient

2019-08-14 Thread GitBox
kaxil commented on a change in pull request #5815: [AIRFLOW-5210] Make finding 
template files more efficient
URL: https://github.com/apache/airflow/pull/5815#discussion_r313788497
 
 

 ##
 File path: airflow/models/baseoperator.py
 ##
 @@ -717,26 +717,27 @@ def prepare_template(self):
 
 def resolve_template_files(self):
 # Getting the content of files for template_field / template_ext
-for attr in self.template_fields:
-content = getattr(self, attr, None)
-if content is None:
-continue
-elif isinstance(content, str) and \
-any([content.endswith(ext) for ext in self.template_ext]):
-env = self.get_template_env()
-try:
-setattr(self, attr, env.loader.get_source(env, content)[0])
-except Exception as e:
-self.log.exception(e)
-elif isinstance(content, list):
-env = self.dag.get_template_env()
-for i in range(len(content)):
-if isinstance(content[i], str) and \
-any([content[i].endswith(ext) for ext in 
self.template_ext]):
-try:
-content[i] = env.loader.get_source(env, 
content[i])[0]
-except Exception as e:
-self.log.exception(e)
+if self.template_ext:
 
 Review comment:
   Wouldn't this stop rendering of template fields as well?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] ashb commented on a change in pull request #5672: [AIRFLOW-5056] Add argument to filter mails in ImapHook and related operators

2019-08-14 Thread GitBox
ashb commented on a change in pull request #5672: [AIRFLOW-5056] Add argument 
to filter mails in ImapHook and related operators
URL: https://github.com/apache/airflow/pull/5672#discussion_r313820298
 
 

 ##
 File path: airflow/contrib/hooks/imap_hook.py
 ##
 @@ -30,72 +34,105 @@ class ImapHook(BaseHook):
 """
 This hook connects to a mail server by using the imap protocol.
 
+.. note:: Please call this Hook as context manager via `with`
+to automatically open and close the connection to the mail server.
+
 :param imap_conn_id: The connection id that contains the information used 
to authenticate the client.
 :type imap_conn_id: str
 """
 
 def __init__(self, imap_conn_id='imap_default'):
 super().__init__(imap_conn_id)
-self.conn = self.get_connection(imap_conn_id)
-self.mail_client = imaplib.IMAP4_SSL(self.conn.host)
+self.imap_conn_id = imap_conn_id
+self.mail_client = None
 
 def __enter__(self):
-self.mail_client.login(self.conn.login, self.conn.password)
-return self
+return self.get_conn()
 
 def __exit__(self, exc_type, exc_val, exc_tb):
 self.mail_client.logout()
 
-def has_mail_attachment(self, name, mail_folder='INBOX', 
check_regex=False):
+def get_conn(self):
+"""
+Login to the mail server.
+
+.. note:: Please call this Hook as context manager via `with`
+to automatically open and close the connection to the mail server.
+
+:return: an authorized ImapHook object.
+:rtype: ImapHook
+"""
+
+if not self.mail_client:
+conn = self.get_connection(self.imap_conn_id)
+self.mail_client = imaplib.IMAP4_SSL(conn.host)
+self.mail_client.login(conn.login, conn.password)
+
+return self
+
+def has_mail_attachment(self, name, check_regex=False, 
mail_folder='INBOX', mail_filter='All'):
 
 Review comment:
   You are likely the heaviest user of this, so we'll just merge this and make 
it 2.0 only
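
   A hedged usage sketch of the context-manager pattern the new docstring asks 
for (connection id and attachment name are illustrative):

   ```python
   from airflow.contrib.hooks.imap_hook import ImapHook

   # __enter__ calls get_conn() and __exit__ logs out, so the connection is
   # opened and closed automatically.
   with ImapHook(imap_conn_id='imap_default') as hook:
       found = hook.has_mail_attachment('report.csv', check_regex=False,
                                        mail_folder='INBOX', mail_filter='All')
   ```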


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] potiuk commented on issue #5777: [AIRFLOW-5161] Static checks are run automatically in pre-commit hooks

2019-08-14 Thread GitBox
potiuk commented on issue #5777: [AIRFLOW-5161] Static checks are run 
automatically in pre-commit hooks
URL: https://github.com/apache/airflow/pull/5777#issuecomment-521238624
 
 
   Thanks @Fokko -> There are more checks to come; the whole idea is that we 
can add many more static checks (I have the follow-up PRs adding them), so it 
would be difficult to maintain such a list. It's better to have one job 
covering all the checks (the overhead of starting the job will then be much 
smaller). We will run all the checks here except the docs build (not really 
applicable for incremental pre-commit checks) and pylint (not applicable until 
we finish the pylint introduction and get rid of pylint_todo.txt). So I prefer 
to keep that name.
   
   @dimberman - are you happy with my fixes to the docs? I would love to merge 
that one so that I can better handle the follow-up changes with more static 
checks and get them added one by one. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Resolved] (AIRFLOW-5210) Resolving Template Files for large DAGs hurts performance

2019-08-14 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-5210.

   Resolution: Fixed
Fix Version/s: 1.10.5

> Resolving Template Files for large DAGs hurts performance 
> --
>
> Key: AIRFLOW-5210
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5210
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG
>Affects Versions: 1.10.4
>Reporter: Daniel Frank
>Priority: Major
> Fix For: 1.10.5
>
>
> During task execution,  "resolve_template_files" runs for all tasks in a 
> given DAG. For large DAGs this takes a long time and is not necessary for 
> tasks that do not use the template_ext field 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[GitHub] [airflow] feluelle commented on a change in pull request #5672: [AIRFLOW-5056] Add argument to filter mails in ImapHook and related operators

2019-08-14 Thread GitBox
feluelle commented on a change in pull request #5672: [AIRFLOW-5056] Add 
argument to filter mails in ImapHook and related operators
URL: https://github.com/apache/airflow/pull/5672#discussion_r313879721
 
 

 ##
 File path: airflow/contrib/hooks/imap_hook.py
 ##
 @@ -30,72 +34,105 @@ class ImapHook(BaseHook):
 """
 This hook connects to a mail server by using the imap protocol.
 
+.. note:: Please call this Hook as context manager via `with`
+to automatically open and close the connection to the mail server.
+
 :param imap_conn_id: The connection id that contains the information used 
to authenticate the client.
 :type imap_conn_id: str
 """
 
 def __init__(self, imap_conn_id='imap_default'):
 super().__init__(imap_conn_id)
-self.conn = self.get_connection(imap_conn_id)
-self.mail_client = imaplib.IMAP4_SSL(self.conn.host)
+self.imap_conn_id = imap_conn_id
+self.mail_client = None
 
 def __enter__(self):
-self.mail_client.login(self.conn.login, self.conn.password)
-return self
+return self.get_conn()
 
 def __exit__(self, exc_type, exc_val, exc_tb):
 self.mail_client.logout()
 
-def has_mail_attachment(self, name, mail_folder='INBOX', 
check_regex=False):
+def get_conn(self):
+"""
+Login to the mail server.
+
+.. note:: Please call this Hook as context manager via `with`
+to automatically open and close the connection to the mail server.
+
+:return: an authorized ImapHook object.
+:rtype: ImapHook
+"""
+
+if not self.mail_client:
+conn = self.get_connection(self.imap_conn_id)
+self.mail_client = imaplib.IMAP4_SSL(conn.host)
+self.mail_client.login(conn.login, conn.password)
+
+return self
+
+def has_mail_attachment(self, name, check_regex=False, 
mail_folder='INBOX', mail_filter='All'):
 
 Review comment:
   Okay.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] ashb commented on issue #5711: [AIRFLOW-4161] BigQuery to Mysql Operator

2019-08-14 Thread GitBox
ashb commented on issue #5711: [AIRFLOW-4161] BigQuery to Mysql Operator
URL: https://github.com/apache/airflow/pull/5711#issuecomment-521260415
 
 
   Could do that. It's probably an exception that won't ever bite us again 
anyway.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (AIRFLOW-5213) DockerOperator failing when the docker default logging drivers are other than 'journald','json-file'

2019-08-14 Thread venkata Bonu (JIRA)
venkata Bonu created AIRFLOW-5213:
-

 Summary: DockerOperator failing when the docker default logging 
drivers are other than 'journald','json-file'
 Key: AIRFLOW-5213
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5213
 Project: Apache Airflow
  Issue Type: Bug
  Components: DAG, operators
Affects Versions: 1.10.4
Reporter: venkata Bonu
Assignee: venkata Bonu
 Attachments: Screen Shot 2019-08-14 at 7.10.01 AM.png

Background:

Docker can be configured with multiple logging drivers:
 * syslog
 * local
 * json-file
 * journald
 * gelf
 * fluentd
 * awslogs
 * splunk
 * etwlogs
 * gcplogs
 * logentries

But reading Docker logs is supported only with the drivers local, json-file, 
and journald.

Docker documentation: 
[https://docs.docker.com/config/containers/logging/configure/]

Description:

When Docker is configured with a logging driver other than local, json-file, 
or journald, Airflow tasks that use DockerOperator fail with the error

_docker.errors.APIError: 501 Server Error: Not Implemented ("configured logging 
driver does not support reading")_

The issue is in the lines of code below, where the operator reads the logs by 
attaching to the container.

{code:python}
line = ''
for line in self.cli.attach(container=self.container['Id'], stdout=True,
                            stderr=True, stream=True):
    line = line.strip()
    if hasattr(line, 'decode'):
        line = line.decode('utf-8')
    self.log.info(line)
{code}
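
A hedged sketch of one possible mitigation (helper name is illustrative; this 
is not a merged fix): catch the APIError and log a warning instead of failing 
the task when the configured driver cannot be read back.

{code:python}
from docker.errors import APIError

def read_container_logs(cli, container, log):
    """Stream container output; tolerate drivers that cannot be read back."""
    try:
        for line in cli.attach(container=container['Id'], stdout=True,
                               stderr=True, stream=True):
            line = line.strip()
            if hasattr(line, 'decode'):
                line = line.decode('utf-8')
            log.info(line)
    except APIError as err:
        # e.g. 501 Not Implemented: "configured logging driver does not
        # support reading"
        log.warning("Cannot stream container logs: %s", err)
{code}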

 

 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[GitHub] [airflow] ashb commented on a change in pull request #5701: [AIRFLOW-5088][AIP-24] Add DAG serialization using JSON

2019-08-14 Thread GitBox
ashb commented on a change in pull request #5701: [AIRFLOW-5088][AIP-24] Add 
DAG serialization using JSON
URL: https://github.com/apache/airflow/pull/5701#discussion_r313867143
 
 

 ##
 File path: airflow/dag/serialization.py
 ##
 @@ -0,0 +1,250 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""DAG serialization with JSON."""
+
+import json
+import logging
+
+import datetime
+import dateutil.parser
+import pendulum
+
+from airflow import models
+from airflow.www.utils import get_python_source
+
+
+# JSON primitive types.
+_primitive_types = (int, bool, float, str)
+
+_datetime_types = (datetime.datetime, datetime.date, datetime.time)
+
+# Object types that are always excluded.
+# TODO(coufon): not needed if _dag_included_fields and _op_included_fields are 
customized.
+_excluded_types = (logging.Logger, models.connection.Connection, type)
+
+# Stringified DAGs and operators contain exactly these fields.
+# TODO(coufon): to customize included fields and keep only necessary fields.
+_dag_included_fields = list(vars(models.DAG(dag_id='test')).keys())
+_op_included_fields = list(vars(models.BaseOperator(task_id='test')).keys()) + 
[
+'_dag', 'ui_color', 'ui_fgcolor', 'template_fields']
+
+# Encoding constants.
+TYPE = '__type'
+CLASS = '__class'
+VAR = '__var'
+
+# Supported types; primitives and lists are not encoded.
+DAG = 'dag'
+OP = 'operator'
+DATETIME = 'datetime'
+TIMEDELTA = 'timedelta'
+TIMEZONE = 'timezone'
+DICT = 'dict'
+SET = 'set'
+TUPLE = 'tuple'
+
+# Constants.
+BASE_OPERATOR_CLASS = 'BaseOperator'
+# Serialization failure returns 'failed'.
+FAILED = 'failed'
+
+
+def _is_primitive(x):
+"""Primitive types."""
+return x is None or isinstance(x, _primitive_types)
+
+
+def _is_excluded(x):
+"""Types excluded from serialization.
+
+TODO(coufon): not needed if _dag_included_fields and _op_included_fields 
are customized.
+"""
+return x is None or isinstance(x, _excluded_types)
+
+
+def _serialize_object(x, visited_dags, included_fields):
+"""Helper fn to serialize an object as a JSON dict."""
+new_x = {}
+for k in included_fields:
+# None is ignored in serialized form and is added back in 
deserialization.
+v = getattr(x, k, None)
+if not _is_excluded(v):
+new_x[k] = _serialize(v, visited_dags)
+return new_x
+
+
+def _serialize_dag(x, visited_dags):
+"""Serialize a DAG."""
+if x.dag_id in visited_dags:
+return {TYPE: DAG, VAR: str(x.dag_id)}
+
+new_x = {TYPE: DAG}
+visited_dags[x.dag_id] = new_x
+new_x[VAR] = _serialize_object(
+x, visited_dags, included_fields=_dag_included_fields)
+return new_x
+
+
+def _serialize_operator(x, visited_dags):
+"""Serialize an operator."""
+return _encode(
+_serialize_object(
+x, visited_dags, included_fields=_op_included_fields),
+type_=OP,
+class_=x.__class__.__name__
+)
+
+
+def _encode(x, type_, class_=None):
+"""Encode data by a JSON dict."""
+return ({VAR: x, TYPE: type_} if class_ is None
+else {VAR: x, TYPE: type_, CLASS: class_})
+
+
+def _serialize(x, visited_dags):  # pylint: disable=too-many-return-statements
+"""Helper function of depth first search for serialization.
+
+visited_dags stores DAGs that are being stringified or have already been
+stringified, for:
+  (1) preventing an infinite loop caused by task.dag, task._dag, and 
dag.parent_dag;
+  (2) replacing the fields in (1) with serialized counterparts.
+
+The serialization protocol is:
+  (1) keeping JSON supported types: primitives, dict, list;
+  (2) encoding other types as {TYPE: 'foo', VAR: 'bar'}; the deserialization
+  step decodes VAR according to TYPE;
+  (3) Operator has a special field CLASS to record the original class
+  name for displaying in UI.
+"""
+try:
+if _is_primitive(x):
+return x
+elif isinstance(x, dict):
+return _encode({k: _serialize(v, visited_dags) for k, v in 
x.items()}, type_=DICT)
+elif isinstance(x, list):
+return [_serialize(v, visited_dags) for v in x]
+elif 

[GitHub] [airflow] ashb commented on a change in pull request #5701: [AIRFLOW-5088][AIP-24] Add DAG serialization using JSON

2019-08-14 Thread GitBox
ashb commented on a change in pull request #5701: [AIRFLOW-5088][AIP-24] Add 
DAG serialization using JSON
URL: https://github.com/apache/airflow/pull/5701#discussion_r313866659
 
 

 ##
 File path: airflow/dag/serialization.py
 ##
 @@ -0,0 +1,250 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""DAG serialization with JSON."""
+
+import json
+import logging
+
+import datetime
+import dateutil.parser
+import pendulum
+
+from airflow import models
+from airflow.www.utils import get_python_source
+
+
+# JSON primitive types.
+_primitive_types = (int, bool, float, str)
 
 Review comment:
   Oh yes, good point. I was thinking in terms of "python primitives" :)
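
   A quick illustration of the distinction (standard library only): bytes is a 
Python primitive but not a JSON one, so only the four types above pass through 
json.dumps unchanged.

   ```python
   import json

   for value in (1, True, 1.5, "s", b"raw"):
       try:
           json.dumps(value)
           print(type(value).__name__, "-> JSON ok")
       except TypeError:
           print(type(value).__name__, "-> not JSON-serializable")
   ```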


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] ashb commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability

2019-08-14 Thread GitBox
ashb commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] 
Persisting serialized DAG in DB for webserver scalability
URL: https://github.com/apache/airflow/pull/5743#discussion_r313870289
 
 

 ##
 File path: airflow/api/common/experimental/delete_dag.py
 ##
 @@ -45,6 +49,11 @@ def delete_dag(dag_id: str, keep_records_in_log: bool = 
True, session=None) -> i
 raise DagFileExists("Dag id {} is still in DagBag. "
 "Remove the DAG file first: {}".format(dag_id, 
dag.fileloc))
 
+# Scheduler removes DAGs without files from serialized_dag table every 
dag_dir_list_interval.
+# There may be a lag, so explicitly removes serialized DAG here.
+if DAGCACHED_ENABLED and SerializedDagModel.has_dag(dag_id):
+SerializedDagModel.remove_dag(dag_id)
 
 Review comment:
   ```suggestion
   SerializedDagModel.remove_dag(dag_id, session=session)
   ```
   
   May need to extend that method to take a session argument, but this way it 
all happens inside the same DB txn.
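
   A hedged sketch of the pattern being suggested (function body illustrative, 
not the PR's final code): `provide_session` supplies a fresh session and 
commits it unless the caller passes an existing one, in which case the delete 
joins the caller's transaction.

   ```python
   from airflow.utils.db import provide_session

   @provide_session
   def remove_dag(dag_id, session=None):
       from airflow.models import SerializedDagModel
       session.query(SerializedDagModel).filter(
           SerializedDagModel.dag_id == dag_id).delete()
   ```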


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Work started] (AIRFLOW-5179) Top level __init__.py breaks imports

2019-08-14 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-5179 started by Ash Berlin-Taylor.
--
> Top level __init__.py breaks imports
> 
>
> Key: AIRFLOW-5179
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5179
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.0.0
>Reporter: Cedrik Neumann
>Assignee: Ash Berlin-Taylor
>Priority: Blocker
>
> The recent commit 
> [3724c2aaf4cfee4a60f6c7231777bfb256090c7c|https://github.com/apache/airflow/commit/3724c2aaf4cfee4a60f6c7231777bfb256090c7c]
>  to master introduced a {{__init__.py}} file in the project root folder, 
> which basically breaks all imports in local development ({{pip install -e 
> .}}) as it turns the project root into a package.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[GitHub] [airflow] kaxil commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability

2019-08-14 Thread GitBox
kaxil commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] 
Persisting serialized DAG in DB for webserver scalability
URL: https://github.com/apache/airflow/pull/5743#discussion_r313873973
 
 

 ##
 File path: tests/models/test_serialized_dag.py
 ##
 @@ -0,0 +1,106 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Unit tests for SerializedDagModel."""
+
+import unittest
+
+from airflow import example_dags as example_dags_module
+from airflow.dag.serialization.serialized_dag import SerializedDAG
+from airflow.models import DagBag
+from airflow.models import SerializedDagModel as SDM
+from airflow.utils import db
+
+
+# FIXME: it is defined in tests/dags/test_dag_serialization.py as well.
+# To move it to a shared module.
+def make_example_dags(module):
+"""Loads DAGs from a module for test."""
+dagbag = DagBag(module.__path__[0])
+return dagbag.dags
+
+
+# FIXME: move it to airflow/utils/db.py if needed.
+def clear_db_serialized_dags():
+with db.create_session() as session:
+session.query(SDM).delete()
+
+
+class SerializedDagModelTest(unittest.TestCase):
+"""Unit tests for SerializedDagModel."""
+
+def setUp(self):
+clear_db_serialized_dags()
+
+def tearDown(self):
+clear_db_serialized_dags()
+
+def test_dag_fileloc_hash(self):
+"""Verifies the correctness of hashing file path."""
+self.assertTrue(SDM.dag_fileloc_hash('/airflow/dags/test_dag.py') == 
60791)
+
+def _write_example_dags(self):
+example_dags = make_example_dags(example_dags_module)
+for dag in example_dags.values():
+SDM.write_dag(dag)
+return example_dags
+
+def test_write_dag(self):
+"""DAGs can be written into database."""
+example_dags = self._write_example_dags()
+
+with db.create_session() as session:
+for dag in example_dags.values():
+self.assertTrue(SDM.has_dag(dag.dag_id))
+result = session.query(
+SDM.fileloc, SDM.data).filter(SDM.dag_id == 
dag.dag_id).one()
+
+self.assertTrue(result.fileloc == dag.full_filepath)
+# Verifies JSON schema.
+SerializedDAG.validate_json(result.data)
+
+def test_read_dags(self):
+"""DAGs can be read from database."""
+example_dags = self._write_example_dags()
+serialized_dags = SDM.read_all_dags()
+self.assertTrue(len(example_dags) == len(serialized_dags))
+for dag_id, dag in example_dags.items():
+serialized_dag = serialized_dags[dag_id]
+
+self.assertTrue(serialized_dag.dag_id == dag.dag_id)
+self.assertTrue(set(serialized_dag.task_dict) == 
set(dag.task_dict))
+
+def test_remove_dags(self):
+"""DAGs can be removed from database."""
+example_dags_list = list(self._write_example_dags().values())
+print("1", example_dags_list)
+# Tests removing by dag_id.
+dag_removed_by_id = example_dags_list[0]
+SDM.remove_dag(dag_removed_by_id.dag_id)
+self.assertFalse(SDM.has_dag(dag_removed_by_id.dag_id))
+
+# Tests removing by file path.
+dag_removed_by_file = example_dags_list[1]
+example_dag_files = list([dag.full_filepath for dag in 
example_dags_list])
+print("2", dag_removed_by_file)
+print("3", example_dag_files)
+example_dag_files.remove(dag_removed_by_file.full_filepath)
+print("4", example_dag_files)
 
 Review comment:
   Temporary - to remove


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-5179) Top level __init__.py breaks imports

2019-08-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907260#comment-16907260
 ] 

ASF GitHub Bot commented on AIRFLOW-5179:
-

ashb commented on pull request #5818: [AIRFLOW-5179] Remove top level 
__init__.py
URL: https://github.com/apache/airflow/pull/5818
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   
   - [ ] The recent commit 3724c2aa (#5711) to master introduced a __init__.py 
file in
   the project root folder, which basically breaks all imports in local
   development (`pip install -e .`) as it turns the project root into a
   package.
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what they do
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
   ### Code Quality
   
   - [ ] Passes `flake8`
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Top level __init__.py breaks imports
> 
>
> Key: AIRFLOW-5179
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5179
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.0.0
>Reporter: Cedrik Neumann
>Priority: Blocker
>
> The recent commit 
> [3724c2aaf4cfee4a60f6c7231777bfb256090c7c|https://github.com/apache/airflow/commit/3724c2aaf4cfee4a60f6c7231777bfb256090c7c]
>  to master introduced a {{__init__.py}} file in the project root folder, 
> which basically breaks all imports in local development ({{pip install -e 
> .}}) as it turns the project root into a package.
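
A quick way to see the failure mode locally is a minimal diagnostic sketch (illustrative, not part of the PR): after `pip install -e .`, ask Python where the top-level `airflow` package actually resolves from. With the offending root-level __init__.py present, the import either fails or points somewhere unexpected.

import importlib.util

# Report where the top-level `airflow` package resolves from. A stray
# __init__.py in the repository root turns the root itself into a package,
# which is exactly the breakage described in this issue.
spec = importlib.util.find_spec("airflow")
if spec is None:
    print("airflow is not importable - the editable install is broken")
else:
    print("airflow resolves to:", spec.origin)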



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[GitHub] [airflow] kaxil commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability

2019-08-14 Thread GitBox
kaxil commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] 
Persisting serialized DAG in DB for webserver scalability
URL: https://github.com/apache/airflow/pull/5743#discussion_r313873900
 
 

 ##
 File path: tests/models/test_serialized_dag.py
 ##
 @@ -0,0 +1,106 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Unit tests for SerializedDagModel."""
+
+import unittest
+
+from airflow import example_dags as example_dags_module
+from airflow.dag.serialization.serialized_dag import SerializedDAG
+from airflow.models import DagBag
+from airflow.models import SerializedDagModel as SDM
+from airflow.utils import db
+
+
+# FIXME: this is also defined in tests/dags/test_dag_serialization.py;
+# move it to a shared module.
+def make_example_dags(module):
+    """Loads DAGs from a module for testing."""
+    dagbag = DagBag(module.__path__[0])
+    return dagbag.dags
+
+
+# FIXME: move it to airflow/utils/db.py if needed.
+def clear_db_serialized_dags():
+    with db.create_session() as session:
+        session.query(SDM).delete()
+
+
+class SerializedDagModelTest(unittest.TestCase):
+    """Unit tests for SerializedDagModel."""
+
+    def setUp(self):
+        clear_db_serialized_dags()
+
+    def tearDown(self):
+        clear_db_serialized_dags()
+
+    def test_dag_fileloc_hash(self):
+        """Verifies the correctness of hashing a file path."""
+        self.assertTrue(SDM.dag_fileloc_hash('/airflow/dags/test_dag.py') == 60791)
+
+    def _write_example_dags(self):
+        example_dags = make_example_dags(example_dags_module)
+        for dag in example_dags.values():
+            SDM.write_dag(dag)
+        return example_dags
+
+    def test_write_dag(self):
+        """DAGs can be written into the database."""
+        example_dags = self._write_example_dags()
+
+        with db.create_session() as session:
+            for dag in example_dags.values():
+                self.assertTrue(SDM.has_dag(dag.dag_id))
+                result = session.query(
+                    SDM.fileloc, SDM.data).filter(SDM.dag_id == dag.dag_id).one()
+
+                self.assertTrue(result.fileloc == dag.full_filepath)
+                # Verifies the JSON schema.
+                SerializedDAG.validate_json(result.data)
+
+    def test_read_dags(self):
+        """DAGs can be read from the database."""
+        example_dags = self._write_example_dags()
+        serialized_dags = SDM.read_all_dags()
+        self.assertTrue(len(example_dags) == len(serialized_dags))
+        for dag_id, dag in example_dags.items():
+            serialized_dag = serialized_dags[dag_id]
+
+            self.assertTrue(serialized_dag.dag_id == dag.dag_id)
+            self.assertTrue(set(serialized_dag.task_dict) == set(dag.task_dict))
+
+    def test_remove_dags(self):
+        """DAGs can be removed from the database."""
+        example_dags_list = list(self._write_example_dags().values())
+        print("1", example_dags_list)
 
 Review comment:
   Temporary - to remove


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
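
The quoted test pins SDM.dag_fileloc_hash('/airflow/dags/test_dag.py') to 60791. As a reading aid, here is a generic sketch of the pattern under test - deriving a small, deterministic integer from a file path so it can serve as an indexed lookup key. This is illustrative only and is NOT Airflow's dag_fileloc_hash algorithm (which is what actually yields 60791):

import hashlib

def path_hash(fileloc: str, bits: int = 31) -> int:
    # Derive a deterministic integer that fits in `bits` bits from the path.
    digest = hashlib.sha1(fileloc.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % (2 ** bits)

# Stable across runs and processes, unlike Python's built-in hash(),
# which is randomized per process via PYTHONHASHSEED.
print(path_hash("/airflow/dags/test_dag.py"))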


[GitHub] [airflow] kaxil commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability

2019-08-14 Thread GitBox
kaxil commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] 
Persisting serialized DAG in DB for webserver scalability
URL: https://github.com/apache/airflow/pull/5743#discussion_r313873931
 
 

 ##
 File path: tests/models/test_serialized_dag.py
 ##
 @@ -0,0 +1,106 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Unit tests for SerializedDagModel."""
+
+import unittest
+
+from airflow import example_dags as example_dags_module
+from airflow.dag.serialization.serialized_dag import SerializedDAG
+from airflow.models import DagBag
+from airflow.models import SerializedDagModel as SDM
+from airflow.utils import db
+
+
+# FIXME: this is also defined in tests/dags/test_dag_serialization.py;
+# move it to a shared module.
+def make_example_dags(module):
+    """Loads DAGs from a module for testing."""
+    dagbag = DagBag(module.__path__[0])
+    return dagbag.dags
+
+
+# FIXME: move it to airflow/utils/db.py if needed.
+def clear_db_serialized_dags():
+    with db.create_session() as session:
+        session.query(SDM).delete()
+
+
+class SerializedDagModelTest(unittest.TestCase):
+    """Unit tests for SerializedDagModel."""
+
+    def setUp(self):
+        clear_db_serialized_dags()
+
+    def tearDown(self):
+        clear_db_serialized_dags()
+
+    def test_dag_fileloc_hash(self):
+        """Verifies the correctness of hashing a file path."""
+        self.assertTrue(SDM.dag_fileloc_hash('/airflow/dags/test_dag.py') == 60791)
+
+    def _write_example_dags(self):
+        example_dags = make_example_dags(example_dags_module)
+        for dag in example_dags.values():
+            SDM.write_dag(dag)
+        return example_dags
+
+    def test_write_dag(self):
+        """DAGs can be written into the database."""
+        example_dags = self._write_example_dags()
+
+        with db.create_session() as session:
+            for dag in example_dags.values():
+                self.assertTrue(SDM.has_dag(dag.dag_id))
+                result = session.query(
+                    SDM.fileloc, SDM.data).filter(SDM.dag_id == dag.dag_id).one()
+
+                self.assertTrue(result.fileloc == dag.full_filepath)
+                # Verifies the JSON schema.
+                SerializedDAG.validate_json(result.data)
+
+    def test_read_dags(self):
+        """DAGs can be read from the database."""
+        example_dags = self._write_example_dags()
+        serialized_dags = SDM.read_all_dags()
+        self.assertTrue(len(example_dags) == len(serialized_dags))
+        for dag_id, dag in example_dags.items():
+            serialized_dag = serialized_dags[dag_id]
+
+            self.assertTrue(serialized_dag.dag_id == dag.dag_id)
+            self.assertTrue(set(serialized_dag.task_dict) == set(dag.task_dict))
+
+    def test_remove_dags(self):
+        """DAGs can be removed from the database."""
+        example_dags_list = list(self._write_example_dags().values())
+        print("1", example_dags_list)
+        # Tests removing by dag_id.
+        dag_removed_by_id = example_dags_list[0]
+        SDM.remove_dag(dag_removed_by_id.dag_id)
+        self.assertFalse(SDM.has_dag(dag_removed_by_id.dag_id))
+
+        # Tests removing by file path.
+        dag_removed_by_file = example_dags_list[1]
+        example_dag_files = [dag.full_filepath for dag in example_dags_list]
+        print("2", dag_removed_by_file)
 
 Review comment:
   Temporary - to remove


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] kaxil commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability

2019-08-14 Thread GitBox
kaxil commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] 
Persisting serialized DAG in DB for webserver scalability
URL: https://github.com/apache/airflow/pull/5743#discussion_r313873949
 
 

 ##
 File path: tests/models/test_serialized_dag.py
 ##
 @@ -0,0 +1,106 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Unit tests for SerializedDagModel."""
+
+import unittest
+
+from airflow import example_dags as example_dags_module
+from airflow.dag.serialization.serialized_dag import SerializedDAG
+from airflow.models import DagBag
+from airflow.models import SerializedDagModel as SDM
+from airflow.utils import db
+
+
+# FIXME: this is also defined in tests/dags/test_dag_serialization.py;
+# move it to a shared module.
+def make_example_dags(module):
+    """Loads DAGs from a module for testing."""
+    dagbag = DagBag(module.__path__[0])
+    return dagbag.dags
+
+
+# FIXME: move it to airflow/utils/db.py if needed.
+def clear_db_serialized_dags():
+    with db.create_session() as session:
+        session.query(SDM).delete()
+
+
+class SerializedDagModelTest(unittest.TestCase):
+    """Unit tests for SerializedDagModel."""
+
+    def setUp(self):
+        clear_db_serialized_dags()
+
+    def tearDown(self):
+        clear_db_serialized_dags()
+
+    def test_dag_fileloc_hash(self):
+        """Verifies the correctness of hashing a file path."""
+        self.assertTrue(SDM.dag_fileloc_hash('/airflow/dags/test_dag.py') == 60791)
+
+    def _write_example_dags(self):
+        example_dags = make_example_dags(example_dags_module)
+        for dag in example_dags.values():
+            SDM.write_dag(dag)
+        return example_dags
+
+    def test_write_dag(self):
+        """DAGs can be written into the database."""
+        example_dags = self._write_example_dags()
+
+        with db.create_session() as session:
+            for dag in example_dags.values():
+                self.assertTrue(SDM.has_dag(dag.dag_id))
+                result = session.query(
+                    SDM.fileloc, SDM.data).filter(SDM.dag_id == dag.dag_id).one()
+
+                self.assertTrue(result.fileloc == dag.full_filepath)
+                # Verifies the JSON schema.
+                SerializedDAG.validate_json(result.data)
+
+    def test_read_dags(self):
+        """DAGs can be read from the database."""
+        example_dags = self._write_example_dags()
+        serialized_dags = SDM.read_all_dags()
+        self.assertTrue(len(example_dags) == len(serialized_dags))
+        for dag_id, dag in example_dags.items():
+            serialized_dag = serialized_dags[dag_id]
+
+            self.assertTrue(serialized_dag.dag_id == dag.dag_id)
+            self.assertTrue(set(serialized_dag.task_dict) == set(dag.task_dict))
+
+    def test_remove_dags(self):
+        """DAGs can be removed from the database."""
+        example_dags_list = list(self._write_example_dags().values())
+        print("1", example_dags_list)
+        # Tests removing by dag_id.
+        dag_removed_by_id = example_dags_list[0]
+        SDM.remove_dag(dag_removed_by_id.dag_id)
+        self.assertFalse(SDM.has_dag(dag_removed_by_id.dag_id))
+
+        # Tests removing by file path.
+        dag_removed_by_file = example_dags_list[1]
+        example_dag_files = [dag.full_filepath for dag in example_dags_list]
+        print("2", dag_removed_by_file)
+        print("3", example_dag_files)
 
 Review comment:
   Temporary - to remove


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] ashb opened a new pull request #5818: [AIRFLOW-5179] Remove top level __init__.py

2019-08-14 Thread GitBox
ashb opened a new pull request #5818: [AIRFLOW-5179] Remove top level 
__init__.py
URL: https://github.com/apache/airflow/pull/5818
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation, you can prepend your 
commit with \[AIRFLOW-XXX\]; code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   
   - [ ] The recent commit 3724c2aa (#5711) to master introduced a __init__.py 
file in
   the project root folder, which basically breaks all imports in local
   development (`pip install -e .`) as it turns the project root into a
   package.
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what they do
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
   ### Code Quality
   
   - [ ] Passes `flake8`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] danfrankj commented on issue #5815: [AIRFLOW-5210] Make finding template files more efficient

2019-08-14 Thread GitBox
danfrankj commented on issue #5815: [AIRFLOW-5210] Make finding template files 
more efficient
URL: https://github.com/apache/airflow/pull/5815#issuecomment-521260026
 
 
   You're welcome! Thank you guys. Happy DAG'n


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-5179) Top level __init__.py breaks imports

2019-08-14 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907291#comment-16907291
 ] 

ASF subversion and git services commented on AIRFLOW-5179:
--

Commit 4e03d2390fc77e6a911fb97d8585fad482c589a6 in airflow's branch 
refs/heads/master from Ash Berlin-Taylor
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=4e03d23 ]

[AIRFLOW-5179] Remove top level __init__.py (#5818)

The recent commit 3724c2aa to master introduced a __init__.py file in
the project root folder, which basically breaks all imports in local
development (`pip install -e .`) as it turns the project root into a
package.

[ci skip]

> Top level __init__.py breaks imports
> 
>
> Key: AIRFLOW-5179
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5179
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.0.0
>Reporter: Cedrik Neumann
>Assignee: Ash Berlin-Taylor
>Priority: Blocker
>
> The recent commit 
> [3724c2aaf4cfee4a60f6c7231777bfb256090c7c|https://github.com/apache/airflow/commit/3724c2aaf4cfee4a60f6c7231777bfb256090c7c]
>  to master introduced a {{__init__.py}} file in the project root folder, 
> which basically breaks all imports in local development ({{pip install -e 
> .}}) as it turns the project root into a package.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (AIRFLOW-5179) Top level __init__.py breaks imports

2019-08-14 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-5179.

Resolution: Fixed

> Top level __init__.py breaks imports
> 
>
> Key: AIRFLOW-5179
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5179
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.0.0
>Reporter: Cedrik Neumann
>Assignee: Ash Berlin-Taylor
>Priority: Blocker
>
> The recent commit 
> [3724c2aaf4cfee4a60f6c7231777bfb256090c7c|https://github.com/apache/airflow/commit/3724c2aaf4cfee4a60f6c7231777bfb256090c7c]
>  to master introduced a {{__init__.py}} file in the project root folder, 
> which basically breaks all imports in local development ({{pip install -e 
> .}}) as it turns the project root into a package.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[GitHub] [airflow] potiuk commented on issue #5808: [AIRFLOW-5205] Check xml files depends on AIRFLOW-5161, AIRFLOW-5170, AIRFLOW-5180, AIRFLOW-5204,

2019-08-14 Thread GitBox
potiuk commented on issue #5808:  [AIRFLOW-5205] Check xml files depends on  
AIRFLOW-5161,  AIRFLOW-5170,  AIRFLOW-5180,  AIRFLOW-5204, 
URL: https://github.com/apache/airflow/pull/5808#issuecomment-521240322
 
 
   @fokko @feluelle: apologies for that. I split those changes into smaller, 
fairly independent commits (hence the "depends on" in the PR title), since it 
was much easier to work on them as a series. They all depend on #5777, and once 
I merge that one (thanks @fokko for the approval), I will change those 
commits/PRs to make them fully independent of each other.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-5210) Resolving Template Files for large DAGs hurts performance

2019-08-14 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907254#comment-16907254
 ] 

ASF subversion and git services commented on AIRFLOW-5210:
--

Commit eeac82318a6440b2d65f9a35b3437b91813945f4 in airflow's branch 
refs/heads/master from Daniel Frank
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=eeac823 ]

[AIRFLOW-5210] Make finding template files more efficient (#5815)

For large DAGs, iterating over template fields to find template files can be 
time intensive.
Save this time for tasks that do not specify a template file extension.

> Resolving Template Files for large DAGs hurts performance 
> --
>
> Key: AIRFLOW-5210
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5210
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG
>Affects Versions: 1.10.4
>Reporter: Daniel Frank
>Priority: Major
>
> During task execution,  "resolve_template_files" runs for all tasks in a 
> given DAG. For large DAGs this takes a long time and is not necessary for 
> tasks that do not use the template_ext field 
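
The merged fix amounts to an early exit. A condensed sketch of the idea (the attribute names template_fields and template_ext mirror the Airflow operator API, but the body is illustrative rather than the actual patch):

def resolve_template_files(task):
    # Fast path: a task that declares no template file extensions cannot
    # reference template files, so skip the per-field scan entirely.
    if not task.template_ext:
        return
    for field in task.template_fields:
        content = getattr(task, field, None)
        # Only strings ending in a registered extension name a template file.
        if isinstance(content, str) and content.endswith(tuple(task.template_ext)):
            pass  # ...load and render the file via the task's Jinja environment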



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

