[jira] [Created] (AIRFLOW-5871) Stopping/Clearing a running Airflow instance doesn't actually terminate the job.

2019-11-07 Thread Vasudha Putta (Jira)
Vasudha Putta created AIRFLOW-5871:
--

 Summary: Stopping/Clearing a running Airflow instance doesn't actually 
terminate the job.
 Key: AIRFLOW-5871
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5871
 Project: Apache Airflow
  Issue Type: Bug
  Components: scheduler
Affects Versions: 1.10.1
Reporter: Vasudha Putta


Hi Team,

When I change the state of a running task instance to cleared/failed, it doesn't 
completely terminate the underlying job. I tried using PythonOperator and 
BashOperator. The job connects to Oracle and executes a package. Even 
terminating/killing the Airflow job process won't terminate the Oracle sessions. 
This is a problem because whenever we need to compile the package, we have to 
stop the DAGs, mark the existing DAG runs as cleared, and then kill the Oracle 
sessions manually. Is there a way to cleanly stop DAG runs in Airflow?
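
One possible workaround, as a minimal sketch only: Airflow operators expose an 
on_kill() hook that is invoked when Airflow itself terminates the task process, so 
a custom operator can close its own Oracle session there. The cx_Oracle usage and 
the operator below are assumptions for illustration, not an official recommendation.

{code:python}
import cx_Oracle  # assumed driver; adjust to your environment
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults


class OraclePackageOperator(BaseOperator):
    """Hypothetical operator that runs an Oracle package and cleans up on kill."""

    @apply_defaults
    def __init__(self, user, password, dsn, plsql, *args, **kwargs):
        super(OraclePackageOperator, self).__init__(*args, **kwargs)
        self.user = user
        self.password = password
        self.dsn = dsn
        self.plsql = plsql
        self._conn = None

    def execute(self, context):
        # Keep a handle to the connection so on_kill() can reach it.
        self._conn = cx_Oracle.connect(self.user, self.password, self.dsn)
        cursor = self._conn.cursor()
        cursor.execute(self.plsql)
        self._conn.commit()

    def on_kill(self):
        # Called when the task is killed; cancel the in-flight call and close
        # the session so Oracle does not keep executing the package.
        if self._conn is not None:
            self._conn.cancel()
            self._conn.close()
{code}

Note that on_kill() only runs when Airflow terminates the task process itself; 
sessions left behind by a process killed with kill -9 still have to be cleaned up 
on the Oracle side.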

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (AIRFLOW-5860) Add the field `dagrun_id` to the response of experimental API on trigger.

2019-11-07 Thread Anton Kumpan (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969784#comment-16969784
 ] 

Anton Kumpan edited comment on AIRFLOW-5860 at 11/8/19 3:17 AM:


I believe this is a duplicate of 

https://issues.apache.org/jira/browse/AIRFLOW-5590

 

Also, even after adding dag_id to the response, it will not be very useful, because 
currently the date is used as the identifier. That is not really convenient, so I have 
https://issues.apache.org/jira/browse/AIRFLOW-5593 as a suggestion. You can work 
on it if you want.


was (Author: akumpan):
I believe this is a duplicate of 

https://issues.apache.org/jira/browse/AIRFLOW-5590

 

Also, even after adding dag_id to the response, it is not useful, because 
currently the date is used as the identifier. That is not really convenient, so I have 
https://issues.apache.org/jira/browse/AIRFLOW-5593 as a suggestion. You can work 
on it if you want.

> Add the field `dagrun_id` to the response of experimental API on trigger.
> -
>
> Key: AIRFLOW-5860
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5860
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: api, cli
>Affects Versions: 1.10.6
>Reporter: Douglas Mendez
>Priority: Major
>
> We are using the experimental REST API to automate Airflow DAGs, triggering 
> DAGs from one of our microservices. It would be great for us to have the 
> dagrun_id within the API response so we can keep track of each run with ease.
>  
>  
> *Response example*
>  
> {code:java}
> // code placeholder
> {
>     'execution_date': execution_date, 
>     'message': message, 
>     'dagrun_id': dagrun_id
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (AIRFLOW-5860) Add the field `dagrun_id` to the response of experimental API on trigger.

2019-11-07 Thread Anton Kumpan (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969784#comment-16969784
 ] 

Anton Kumpan edited comment on AIRFLOW-5860 at 11/8/19 3:16 AM:


I believe this is a duplicate of 

https://issues.apache.org/jira/browse/AIRFLOW-5590

 

Also, even after adding dag_id to the response, it is not useful, because 
currently the date is used as the identifier. That is not really convenient, so I have 
https://issues.apache.org/jira/browse/AIRFLOW-5593 as a suggestion. You can work 
on it if you want.


was (Author: akumpan):
I believe this is a duplicate of 

https://issues.apache.org/jira/browse/AIRFLOW-5590

> Add the field `dagrun_id` to the response of experimental API on trigger.
> -
>
> Key: AIRFLOW-5860
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5860
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: api, cli
>Affects Versions: 1.10.6
>Reporter: Douglas Mendez
>Priority: Major
>
> We are using the experimental REST API to automate Airflow DAGs, triggering 
> DAGs from one of our microservices. It would be great for us to have the 
> dagrun_id within the API response so we can keep track of each run with ease.
>  
>  
> *Response example*
>  
> {code:java}
> // code placeholder
> {
>     'execution_date': execution_date, 
>     'message': message, 
>     'dagrun_id': dagrun_id
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow-site] mik-laj opened a new pull request #143: Add version selector

2019-11-07 Thread GitBox
mik-laj opened a new pull request #143: Add version selector
URL: https://github.com/apache/airflow-site/pull/143
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] aKumpan commented on issue #6256: [AIRFLOW-5590] Add run_id to trigger DAG run API response

2019-11-07 Thread GitBox
aKumpan commented on issue #6256: [AIRFLOW-5590] Add run_id to trigger DAG run 
API response
URL: https://github.com/apache/airflow/pull/6256#issuecomment-551365641
 
 
   @XD-DENG hi, could you please merge this?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (AIRFLOW-5590) Add 'run_id' to trigger DAG run API response as a field

2019-11-07 Thread Anton Kumpan (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kumpan updated AIRFLOW-5590:
--
External issue URL: https://github.com/apache/airflow/pull/6256

> Add 'run_id' to trigger DAG run API response as a field
> ---
>
> Key: AIRFLOW-5590
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5590
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 1.10.5
>Reporter: Anton Kumpan
>Assignee: Anton Kumpan
>Priority: Major
>
> Idea is the same as in AIRFLOW-4482
> Currently API response looks like this:
> {code:json}
> {
> 'execution_date': '2019-05-08T07:03:09+00:00', 
> 'message': 'Created  manual__2019-05-08T07:03:09+00:00, externally triggered: True>'
> }
> {code}
>  
> It would be nice to add run_id as a separate field, so that response will 
> look like:
> {code:json}
> {
> 'run_id': 'manual__2019-05-08T07:03:09+00:00',
> 'execution_date': '2019-05-08T07:03:09+00:00', 
> 'message': 'Created  triggered: True>'
> }
> {code}
>  
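
For illustration, a rough sketch of what the change amounts to on the server side. 
The handler shape below is simplified and hypothetical, not the actual Airflow 
source; it only relies on DagRun exposing run_id and execution_date:

{code:python}
from flask import jsonify

def build_trigger_response(dag_run):
    # Sketch: return run_id alongside the existing fields so callers can
    # track the new run without parsing it out of the message string.
    return jsonify(
        run_id=dag_run.run_id,
        execution_date=dag_run.execution_date.isoformat(),
        message="Created {}".format(dag_run),
    )
{code}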



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5860) Add the field `dagrun_id` to the response of experimental API on trigger.

2019-11-07 Thread Anton Kumpan (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969784#comment-16969784
 ] 

Anton Kumpan commented on AIRFLOW-5860:
---

I believe this is a duplicate of 

https://issues.apache.org/jira/browse/AIRFLOW-5590

> Add the field `dagrun_id` to the response of experimental API on trigger.
> -
>
> Key: AIRFLOW-5860
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5860
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: api, cli
>Affects Versions: 1.10.6
>Reporter: Douglas Mendez
>Priority: Major
>
> We are using the experimental REST API to automate Airflow DAGs, triggering 
> DAGs from one of our microservices. It would be great for us to have the 
> dagrun_id within the API response so we can keep track of each run with ease.
>  
>  
> *Response example*
>  
> {code:java}
> // code placeholder
> {
>     'execution_date': execution_date, 
>     'message': message, 
>     'dagrun_id': dagrun_id
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] aKumpan commented on issue #6514: AIRFLOW-5590: Add dagrun_id to the response of experimental API on trigger.

2019-11-07 Thread GitBox
aKumpan commented on issue #6514: AIRFLOW-5590: Add dagrun_id to the response 
of experimental API on trigger.
URL: https://github.com/apache/airflow/pull/6514#issuecomment-551365190
 
 
   I believe you are duplicating:
   https://github.com/apache/airflow/pull/6256


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] codecov-io commented on issue #6519: [AIRFLOW-5869] BugFix: Creating DagRuns fails for Deserialized tasks …

2019-11-07 Thread GitBox
codecov-io commented on issue #6519: [AIRFLOW-5869] BugFix: Creating DagRuns 
fails for Deserialized tasks …
URL: https://github.com/apache/airflow/pull/6519#issuecomment-551353595
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/6519?src=pr=h1) 
Report
   > Merging 
[#6519](https://codecov.io/gh/apache/airflow/pull/6519?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/49f8be798a113af4ae26cad4ac2df1113d923539?src=pr=desc)
 will **decrease** coverage by `0.43%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/6519/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/6519?src=pr=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #6519      +/-   ##
   ==========================================
   - Coverage   83.98%   83.54%   -0.44%
   ==========================================
     Files         635      635
     Lines       36722    36725       +3
   ==========================================
   - Hits        30840    30683     -157
   - Misses       5882     6042     +160
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/6519?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/serialization/serialized\_dag.py](https://codecov.io/gh/apache/airflow/pull/6519/diff?src=pr=tree#diff-YWlyZmxvdy9zZXJpYWxpemF0aW9uL3NlcmlhbGl6ZWRfZGFnLnB5)
 | `96% <100%> (+0.16%)` | :arrow_up: |
   | 
[airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/6519/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==)
 | `44.44% <0%> (-55.56%)` | :arrow_down: |
   | 
[airflow/executors/sequential\_executor.py](https://codecov.io/gh/apache/airflow/pull/6519/diff?src=pr=tree#diff-YWlyZmxvdy9leGVjdXRvcnMvc2VxdWVudGlhbF9leGVjdXRvci5weQ==)
 | `47.61% <0%> (-52.39%)` | :arrow_down: |
   | 
[airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/6519/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==)
 | `52.94% <0%> (-47.06%)` | :arrow_down: |
   | 
[airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/6519/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==)
 | `45.25% <0%> (-46.72%)` | :arrow_down: |
   | 
[airflow/kubernetes/kube\_client.py](https://codecov.io/gh/apache/airflow/pull/6519/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL2t1YmVfY2xpZW50LnB5)
 | `33.33% <0%> (-41.67%)` | :arrow_down: |
   | 
[...rflow/contrib/operators/kubernetes\_pod\_operator.py](https://codecov.io/gh/apache/airflow/pull/6519/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZF9vcGVyYXRvci5weQ==)
 | `70.14% <0%> (-28.36%)` | :arrow_down: |
   | 
[airflow/utils/log/colored\_log.py](https://codecov.io/gh/apache/airflow/pull/6519/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9sb2cvY29sb3JlZF9sb2cucHk=)
 | `81.81% <0%> (-11.37%)` | :arrow_down: |
   | 
[airflow/utils/sqlalchemy.py](https://codecov.io/gh/apache/airflow/pull/6519/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9zcWxhbGNoZW15LnB5)
 | `86.44% <0%> (-6.78%)` | :arrow_down: |
   | 
[airflow/executors/\_\_init\_\_.py](https://codecov.io/gh/apache/airflow/pull/6519/diff?src=pr=tree#diff-YWlyZmxvdy9leGVjdXRvcnMvX19pbml0X18ucHk=)
 | `63.26% <0%> (-4.09%)` | :arrow_down: |
   | ... and [4 
more](https://codecov.io/gh/apache/airflow/pull/6519/diff?src=pr=tree-more) 
| |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/6519?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/6519?src=pr=footer). 
Last update 
[49f8be7...8c5e408](https://codecov.io/gh/apache/airflow/pull/6519?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (AIRFLOW-5870) Allow infinite pool for tasks

2019-11-07 Thread Alex Guziel (Jira)
Alex Guziel created AIRFLOW-5870:


 Summary: Allow infinite pool for tasks
 Key: AIRFLOW-5870
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5870
 Project: Apache Airflow
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 1.10.6
Reporter: Alex Guziel
Assignee: Alex Guziel


Pools currently cannot be given an infinite size. Infinite-sized pools can make the 
scheduler's slot-counting queries much cheaper.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] saguziel opened a new pull request #6520: [AIRFLOW-5870] Allow -1 for infinite pool size

2019-11-07 Thread GitBox
saguziel opened a new pull request #6520: [AIRFLOW-5870] Allow -1 for infinite 
pool size
URL: https://github.com/apache/airflow/pull/6520
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-5870
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   Adds the ability to create pools with size=-1, allowing unlimited task usage 
and letting used_slots return without a DB query.
   
   To contextualize this change: we saw spiky DB load because computing used slots 
has to query all RUNNING task instances, and each task instance that starts running 
needs to run this query, leading to an n^2 problem. A rough sketch of the idea 
follows below.
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
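   
   For illustration only, and not the actual diff in this PR - a simplified, 
hypothetical stand-in for the pool model showing how a -1 sentinel lets the slot 
checks short-circuit:
   
   ```python
   # Rough sketch: treat pool size -1 as "infinite" so slot checks can return
   # immediately instead of counting RUNNING task instances in the database.
   INFINITE_POOL_SIZE = -1

   class Pool:  # simplified stand-in for airflow.models.Pool
       def __init__(self, slots, used_slots_query=None):
           self.slots = slots
           # Hypothetical callable standing in for the expensive DB count.
           self._used_slots_query = used_slots_query or (lambda: 0)

       def used_slots(self):
           if self.slots == INFINITE_POOL_SIZE:
               return 0  # skip the DB query entirely
           return self._used_slots_query()

       def open_slots(self):
           if self.slots == INFINITE_POOL_SIZE:
               return float("inf")  # an infinite pool always has room
           return self.slots - self.used_slots()
   ```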
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-5870) Allow infinite pool for tasks

2019-11-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969728#comment-16969728
 ] 

ASF GitHub Bot commented on AIRFLOW-5870:
-

saguziel commented on pull request #6520: [AIRFLOW-5870] Allow -1 for infinite 
pool size
URL: https://github.com/apache/airflow/pull/6520
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-5870
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   Adds the ability to create pools with size=-1, allowing unlimited task usage 
and letting used_slots return without a DB query.
   
   To contextualize this change: we saw spiky DB load because computing used slots 
has to query all RUNNING task instances, and each task instance that starts running 
needs to run this query, leading to an n^2 problem. 
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Allow infinite pool for tasks
> -
>
> Key: AIRFLOW-5870
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5870
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 1.10.6
>Reporter: Alex Guziel
>Assignee: Alex Guziel
>Priority: Major
>
> Pools currently cannot be given an infinite size. Infinite-sized pools can make 
> the scheduler's slot-counting queries much cheaper.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow-site] mik-laj opened a new pull request #142: Create indexes in an automagical way

2019-11-07 Thread GitBox
mik-laj opened a new pull request #142: Create indexes in an automagical way
URL: https://github.com/apache/airflow-site/pull/142
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow-site] mik-laj opened a new pull request #141: Add cleanup command

2019-11-07 Thread GitBox
mik-laj opened a new pull request #141: Add cleanup command
URL: https://github.com/apache/airflow-site/pull/141
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow-site] mik-laj opened a new pull request #140: Fix process killing by CTRL + C

2019-11-07 Thread GitBox
mik-laj opened a new pull request #140: Fix process killing by CTRL + C
URL: https://github.com/apache/airflow-site/pull/140
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (AIRFLOW-5869) Creating DagRuns fails for Deserialized tasks with no start_date

2019-11-07 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-5869:

Fix Version/s: 1.10.7

> Creating DagRuns fails for Deserialized tasks with no start_date
> 
>
> Key: AIRFLOW-5869
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5869
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.0.0
>Reporter: Kaxil Naik
>Priority: Major
> Fix For: 1.10.7
>
>
> Deserialized operators do not always have start_date set. 
> That, for instance, breaks triggering dags.
> See the code from DAG.create_dagrun():
> {code:python}
> run = DagRun(...)
> session.add(run)
> session.commit()
> run.dag = self
> run.verify_integrity(session=session) # this validation fails because 
> run assumes that all operators have start_date set
> run.refresh_from_db()
> {code}
> One of the optimisations 
> (https://github.com/coufon/airflow/commit/b5ee858f44f55818c589cf2c8bf3866fa5d50e30)
>  we did as part of DAG Serialization was to not store dates in tasks if they 
> have a matching date (start_date or end_date) in the DAG. Unfortunately, when 
> triggering a DAG containing such tasks, it fails on DagRun.run.verify_integrity.
> The fix is to add the start_date when deserializing the operator.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] codecov-io commented on issue #6518: [AIRFLOW-5782] Migrate AWS Lambda to /providers/aws [AIP-21]

2019-11-07 Thread GitBox
codecov-io commented on issue #6518: [AIRFLOW-5782] Migrate AWS Lambda to 
/providers/aws [AIP-21]
URL: https://github.com/apache/airflow/pull/6518#issuecomment-551332733
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/6518?src=pr=h1) 
Report
   > Merging 
[#6518](https://codecov.io/gh/apache/airflow/pull/6518?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/49f8be798a113af4ae26cad4ac2df1113d923539?src=pr=desc)
 will **decrease** coverage by `0.86%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/6518/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/6518?src=pr=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #6518      +/-   ##
   ==========================================
   - Coverage   83.98%   83.11%   -0.87%
   ==========================================
     Files         635      636       +1
     Lines       36722    36726       +4
   ==========================================
   - Hits        30840    30525     -315
   - Misses       5882     6201     +319
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/6518?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/contrib/hooks/aws\_lambda\_hook.py](https://codecov.io/gh/apache/airflow/pull/6518/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL2hvb2tzL2F3c19sYW1iZGFfaG9vay5weQ==)
 | `100% <100%> (ø)` | :arrow_up: |
   | 
[airflow/providers/aws/hooks/lambda\_function.py](https://codecov.io/gh/apache/airflow/pull/6518/diff?src=pr=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvYXdzL2hvb2tzL2xhbWJkYV9mdW5jdGlvbi5weQ==)
 | `100% <100%> (ø)` | |
   | 
[airflow/operators/postgres\_operator.py](https://codecov.io/gh/apache/airflow/pull/6518/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvcG9zdGdyZXNfb3BlcmF0b3IucHk=)
 | `0% <0%> (-100%)` | :arrow_down: |
   | 
[airflow/operators/mysql\_operator.py](https://codecov.io/gh/apache/airflow/pull/6518/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvbXlzcWxfb3BlcmF0b3IucHk=)
 | `0% <0%> (-100%)` | :arrow_down: |
   | 
[airflow/operators/mysql\_to\_hive.py](https://codecov.io/gh/apache/airflow/pull/6518/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvbXlzcWxfdG9faGl2ZS5weQ==)
 | `0% <0%> (-100%)` | :arrow_down: |
   | 
[airflow/operators/generic\_transfer.py](https://codecov.io/gh/apache/airflow/pull/6518/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvZ2VuZXJpY190cmFuc2Zlci5weQ==)
 | `0% <0%> (-100%)` | :arrow_down: |
   | 
[airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/6518/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==)
 | `44.44% <0%> (-55.56%)` | :arrow_down: |
   | 
[airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/6518/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==)
 | `52.94% <0%> (-47.06%)` | :arrow_down: |
   | 
[airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/6518/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==)
 | `45.25% <0%> (-46.72%)` | :arrow_down: |
   | 
[airflow/kubernetes/kube\_client.py](https://codecov.io/gh/apache/airflow/pull/6518/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL2t1YmVfY2xpZW50LnB5)
 | `33.33% <0%> (-41.67%)` | :arrow_down: |
   | ... and [18 
more](https://codecov.io/gh/apache/airflow/pull/6518/diff?src=pr=tree-more) 
| |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/6518?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/6518?src=pr=footer). 
Last update 
[49f8be7...f60107d](https://codecov.io/gh/apache/airflow/pull/6518?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] codecov-io edited a comment on issue #6518: [AIRFLOW-5782] Migrate AWS Lambda to /providers/aws [AIP-21]

2019-11-07 Thread GitBox
codecov-io edited a comment on issue #6518: [AIRFLOW-5782] Migrate AWS Lambda 
to /providers/aws [AIP-21]
URL: https://github.com/apache/airflow/pull/6518#issuecomment-551332733
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/6518?src=pr=h1) 
Report
   > Merging 
[#6518](https://codecov.io/gh/apache/airflow/pull/6518?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/49f8be798a113af4ae26cad4ac2df1113d923539?src=pr=desc)
 will **decrease** coverage by `0.28%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/6518/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/6518?src=pr=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #6518      +/-   ##
   ==========================================
   - Coverage   83.98%   83.69%   -0.29%
   ==========================================
     Files         635      636       +1
     Lines       36722    36726       +4
   ==========================================
   - Hits        30840    30737     -103
   - Misses       5882     5989     +107
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/6518?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/contrib/hooks/aws\_lambda\_hook.py](https://codecov.io/gh/apache/airflow/pull/6518/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL2hvb2tzL2F3c19sYW1iZGFfaG9vay5weQ==)
 | `100% <100%> (ø)` | :arrow_up: |
   | 
[airflow/providers/aws/hooks/lambda\_function.py](https://codecov.io/gh/apache/airflow/pull/6518/diff?src=pr=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvYXdzL2hvb2tzL2xhbWJkYV9mdW5jdGlvbi5weQ==)
 | `100% <100%> (ø)` | |
   | 
[airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/6518/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==)
 | `44.44% <0%> (-55.56%)` | :arrow_down: |
   | 
[airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/6518/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==)
 | `52.94% <0%> (-47.06%)` | :arrow_down: |
   | 
[airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/6518/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==)
 | `45.25% <0%> (-46.72%)` | :arrow_down: |
   | 
[airflow/kubernetes/kube\_client.py](https://codecov.io/gh/apache/airflow/pull/6518/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL2t1YmVfY2xpZW50LnB5)
 | `33.33% <0%> (-41.67%)` | :arrow_down: |
   | 
[...rflow/contrib/operators/kubernetes\_pod\_operator.py](https://codecov.io/gh/apache/airflow/pull/6518/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZF9vcGVyYXRvci5weQ==)
 | `70.14% <0%> (-28.36%)` | :arrow_down: |
   | 
[airflow/jobs/local\_task\_job.py](https://codecov.io/gh/apache/airflow/pull/6518/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzL2xvY2FsX3Rhc2tfam9iLnB5)
 | `90% <0%> (+5%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/6518?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/6518?src=pr=footer). 
Last update 
[49f8be7...f60107d](https://codecov.io/gh/apache/airflow/pull/6518?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] codecov-io edited a comment on issue #6518: [AIRFLOW-5782] Migrate AWS Lambda to /providers/aws [AIP-21]

2019-11-07 Thread GitBox
codecov-io edited a comment on issue #6518: [AIRFLOW-5782] Migrate AWS Lambda 
to /providers/aws [AIP-21]
URL: https://github.com/apache/airflow/pull/6518#issuecomment-551332733
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/6518?src=pr=h1) 
Report
   > Merging 
[#6518](https://codecov.io/gh/apache/airflow/pull/6518?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/49f8be798a113af4ae26cad4ac2df1113d923539?src=pr=desc)
 will **decrease** coverage by `0.37%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/6518/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/6518?src=pr=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #6518      +/-   ##
   ==========================================
   - Coverage   83.98%    83.6%   -0.38%
   ==========================================
     Files         635      636       +1
     Lines       36722    36726       +4
   ==========================================
   - Hits        30840    30706     -134
   - Misses       5882     6020     +138
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/6518?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/contrib/hooks/aws\_lambda\_hook.py](https://codecov.io/gh/apache/airflow/pull/6518/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL2hvb2tzL2F3c19sYW1iZGFfaG9vay5weQ==)
 | `100% <100%> (ø)` | :arrow_up: |
   | 
[airflow/providers/aws/hooks/lambda\_function.py](https://codecov.io/gh/apache/airflow/pull/6518/diff?src=pr=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvYXdzL2hvb2tzL2xhbWJkYV9mdW5jdGlvbi5weQ==)
 | `100% <100%> (ø)` | |
   | 
[airflow/operators/postgres\_operator.py](https://codecov.io/gh/apache/airflow/pull/6518/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvcG9zdGdyZXNfb3BlcmF0b3IucHk=)
 | `0% <0%> (-100%)` | :arrow_down: |
   | 
[airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/6518/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==)
 | `44.44% <0%> (-55.56%)` | :arrow_down: |
   | 
[airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/6518/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==)
 | `52.94% <0%> (-47.06%)` | :arrow_down: |
   | 
[airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/6518/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==)
 | `45.25% <0%> (-46.72%)` | :arrow_down: |
   | 
[airflow/kubernetes/kube\_client.py](https://codecov.io/gh/apache/airflow/pull/6518/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL2t1YmVfY2xpZW50LnB5)
 | `33.33% <0%> (-41.67%)` | :arrow_down: |
   | 
[...rflow/contrib/operators/kubernetes\_pod\_operator.py](https://codecov.io/gh/apache/airflow/pull/6518/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZF9vcGVyYXRvci5weQ==)
 | `70.14% <0%> (-28.36%)` | :arrow_down: |
   | 
[airflow/utils/sqlalchemy.py](https://codecov.io/gh/apache/airflow/pull/6518/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9zcWxhbGNoZW15LnB5)
 | `91.52% <0%> (-1.7%)` | :arrow_down: |
   | 
[airflow/hooks/dbapi\_hook.py](https://codecov.io/gh/apache/airflow/pull/6518/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9kYmFwaV9ob29rLnB5)
 | `89.83% <0%> (-1.7%)` | :arrow_down: |
   | ... and [4 
more](https://codecov.io/gh/apache/airflow/pull/6518/diff?src=pr=tree-more) 
| |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/6518?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/6518?src=pr=footer). 
Last update 
[49f8be7...f60107d](https://codecov.io/gh/apache/airflow/pull/6518?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-5869) Creating DagRuns fails for Deserialized tasks with no start_date

2019-11-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969678#comment-16969678
 ] 

ASF GitHub Bot commented on AIRFLOW-5869:
-

kaxil commented on pull request #6519: [AIRFLOW-5869] BugFix: Creating DagRuns 
fails for Deserialized tasks …
URL: https://github.com/apache/airflow/pull/6519
 
 
   …with no start_date
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
 - https://issues.apache.org/jira/browse/AIRFLOW-5869
   
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   Deserialized operators do not always have start_date set.
   
   That, for instance, breaks triggering dags.
   
   See the code from DAG.create_dagrun():
   ```
   run = DagRun(...)
   session.add(run)
   session.commit()
   
   run.dag = self
   run.verify_integrity(session=session) # this validation fails 
because run assumes that all operators have start_date set
   run.refresh_from_db()
   ```
   One of the optimisations 
(https://github.com/coufon/airflow/commit/b5ee858f44f55818c589cf2c8bf3866fa5d50e30)
 we did as part of DAG Serialization was to not store dates in tasks if they 
have a matching date (start_date or end_date) in the DAG. Unfortunately, when 
triggering a DAG containing such tasks, it fails on DagRun.run.verify_integrity.
   
   The fix is to add the start_date when deserializing the operator.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   * `test_deserialization_start_date`
   * `test_deserialization_end_date `
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Creating DagRuns fails for Deserialized tasks with no start_date
> 
>
> Key: AIRFLOW-5869
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5869
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.0.0
>Reporter: Kaxil Naik
>Priority: Major
>
> Deserialized operators do not always have start_date set. 
> That, for instance, breaks triggering dags.
> See the code from DAG.create_dagrun():
> {code:python}
> run = DagRun(...)
> session.add(run)
> session.commit()
> run.dag = self
> run.verify_integrity(session=session) # this validation fails because 
> run assumes that all operators have start_date set
> run.refresh_from_db()
> {code}
> One of the optimisation 
> (https://github.com/coufon/airflow/commit/b5ee858f44f55818c589cf2c8bf3866fa5d50e30)
>  we did as part of DAG Serialization was to not store dates in tasks if they 
> have a matching date (start_date or end_date) in DAG. Unfortunately, when 
> triggering DAG containing such tasks, it fails on DagRun.run.verify_integrity.
> The fix is to add the start_date when deserializing the operator.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] kaxil opened a new pull request #6519: [AIRFLOW-5869] BugFix: Creating DagRuns fails for Deserialized tasks …

2019-11-07 Thread GitBox
kaxil opened a new pull request #6519: [AIRFLOW-5869] BugFix: Creating DagRuns 
fails for Deserialized tasks …
URL: https://github.com/apache/airflow/pull/6519
 
 
   …with no start_date
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
 - https://issues.apache.org/jira/browse/AIRFLOW-5869
   
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   Deserialized operators do not always have start_date set.
   
   That, for instance, breaks triggering dags.
   
   See the code from DAG.create_dagrun():
   ```
   run = DagRun(...)
   session.add(run)
   session.commit()
   
   run.dag = self
   run.verify_integrity(session=session) # this validation fails 
because run assumes that all operators have start_date set
   run.refresh_from_db()
   ```
   One of the optimisations 
(https://github.com/coufon/airflow/commit/b5ee858f44f55818c589cf2c8bf3866fa5d50e30)
 we did as part of DAG Serialization was to not store dates in tasks if they 
have a matching date (start_date or end_date) in the DAG. Unfortunately, when 
triggering a DAG containing such tasks, it fails on DagRun.run.verify_integrity.
   
   The fix is to add the start_date when deserializing the operator.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   * `test_deserialization_start_date`
   * `test_deserialization_end_date `
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (AIRFLOW-5869) Creating DagRuns fails for Deserialized tasks with no start_date

2019-11-07 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-5869:

Summary: Creating DagRuns fails for Deserialized tasks with no start_date  
(was: Deserialized Operators have no start_date)

> Creating DagRuns fails for Deserialized tasks with no start_date
> 
>
> Key: AIRFLOW-5869
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5869
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.0.0
>Reporter: Kaxil Naik
>Priority: Major
>
> Deserialized operators do not always have start_date set. 
> That, for instance, breaks triggering dags.
> See the code from DAG.create_dagrun():
> {code:python}
> run = DagRun(...)
> session.add(run)
> session.commit()
> run.dag = self
> run.verify_integrity(session=session) # this validation fails because 
> run assumes that all operators have start_date set
> run.refresh_from_db()
> {code}
> One of the optimisations 
> (https://github.com/coufon/airflow/commit/b5ee858f44f55818c589cf2c8bf3866fa5d50e30)
>  we did as part of DAG Serialization was to not store dates in tasks if they 
> have a matching date (start_date or end_date) in the DAG. Unfortunately, when 
> triggering a DAG containing such tasks, it fails on DagRun.run.verify_integrity.
> The fix is to add the start_date when deserializing the operator.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (AIRFLOW-5869) Deserialized Operators have no start_date

2019-11-07 Thread Kaxil Naik (Jira)
Kaxil Naik created AIRFLOW-5869:
---

 Summary: Deserialized Operators have no start_date
 Key: AIRFLOW-5869
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5869
 Project: Apache Airflow
  Issue Type: Bug
  Components: core
Affects Versions: 2.0.0
Reporter: Kaxil Naik


Deserialized operators do not always have start_date set. 

That, for instance, breaks triggering dags.

See the code from DAG.create_dagrun():

{code:python}
run = DagRun(...)
session.add(run)
session.commit()

run.dag = self
run.verify_integrity(session=session) # this validation fails because 
run assumes that all operators have start_date set
run.refresh_from_db()
{code}

One of the optimisations 
(https://github.com/coufon/airflow/commit/b5ee858f44f55818c589cf2c8bf3866fa5d50e30)
 we did as part of DAG Serialization was to not store dates in tasks if they 
have a matching date (start_date or end_date) in the DAG. Unfortunately, when 
triggering a DAG containing such tasks, it fails on DagRun.run.verify_integrity.

The fix is to add the start_date when deserializing the operator.
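
A sketch of the fix direction, simplified for illustration (the real change lives in 
the DAG serialization code; the helper below is hypothetical):

{code:python}
# Sketch: when an operator comes back from the serialized DAG without dates,
# fall back to the owning DAG's dates so DagRun.verify_integrity never sees a
# task with start_date == None.
def attach_dag_dates(task, dag):
    if task.start_date is None:
        task.start_date = dag.start_date
    if task.end_date is None:
        task.end_date = dag.end_date
    return task
{code}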




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] ratb3rt commented on issue #6502: [AIRFLOW-5786] Migrate AWS SNS to /providers/aws

2019-11-07 Thread GitBox
ratb3rt commented on issue #6502: [AIRFLOW-5786] Migrate AWS SNS to 
/providers/aws
URL: https://github.com/apache/airflow/pull/6502#issuecomment-551319400
 
 
   Migrated to new paths


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-5867) [--debug] unit_test_mode configuration interpreted as str instead of bool

2019-11-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969636#comment-16969636
 ] 

ASF GitHub Bot commented on AIRFLOW-5867:
-

rvanasa commented on pull request #6517: [AIRFLOW-5867] Fix reloading when 
using webserver --debug command
URL: https://github.com/apache/airflow/pull/6517
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [--debug] unit_test_mode configuration interpreted as str instead of bool
> -
>
> Key: AIRFLOW-5867
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5867
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli
>Affects Versions: 1.10.6
>Reporter: Ryan Vandersmith
>Assignee: Ryan Vandersmith
>Priority: Minor
>  Labels: easyfix, pull-request-available
> Fix For: 1.10.7
>
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> Relevant StackOverflow question:
> [https://stackoverflow.com/questions/58366469/what-is-an-efficient-way-to-develop-airflow-plugins-without-restarting-the-web]
>  
> The `--debug` CLI argument appears to have unintended functionality:
> [https://github.com/apache/airflow/blob/master/airflow/bin/cli.py#L980]
> {code:python}
> if args.debug:
> print(
> "Starting the web server on port {0} and host {1}.".format(
> args.port, args.hostname))
> app, _ = create_app(None, testing=conf.get('core', 'unit_test_mode'))
> app.run(debug=True, use_reloader=not app.config['TESTING'],
> port=args.port, host=args.hostname,
> ssl_context=(ssl_cert, ssl_key) if ssl_cert and ssl_key else 
> None)
> {code}
> Because `testing` and consequently `app.config['TESTING']` are provided as a 
> `str` object, the reloader is only enabled when the `unit_test_mode` property 
> is an empty string. 
> A very clean fix exists (line 979):
> {code:java}
> app, _ = create_app(None, testing=conf.getboolean('core', 'unit_test_mode'))
> {code}
> I will submit a pull request with the above change immediately after opening 
> this issue. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5867) [--debug] unit_test_mode configuration interpreted as str instead of bool

2019-11-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969637#comment-16969637
 ] 

ASF GitHub Bot commented on AIRFLOW-5867:
-

rvanasa commented on pull request #6517: [AIRFLOW-5867] Fix reloading when 
using webserver --debug command
URL: https://github.com/apache/airflow/pull/6517
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-5867
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   The `--debug` CLI argument appears to have unintended functionality:
   
   https://github.com/apache/airflow/blob/master/airflow/bin/cli.py#L980
   
   Because `testing` and consequently `app.config['TESTING']` are provided as a 
`str` object, the reloader is only enabled when the `unit_test_mode` property 
is an empty string. 
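   
   A quick illustration of the underlying Python behaviour (plain Python, not 
Airflow-specific code):
   
   ```python
   # conf.get() returns a string, and any non-empty string is truthy, so the
   # "not app.config['TESTING']" check almost always evaluates to False.
   testing = "False"        # what conf.get('core', 'unit_test_mode') yields
   print(bool(testing))     # True  -> TESTING is treated as enabled
   print(not testing)       # False -> use_reloader ends up disabled
   
   # conf.getboolean() parses the string into a real bool:
   testing = False
   print(not testing)       # True  -> reloader enabled as intended
   ```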
   
   Relevant StackOverflow question:
   
https://stackoverflow.com/questions/58366469/what-is-an-efficient-way-to-develop-airflow-plugins-without-restarting-the-web
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   This bugfix restores intended functionality. 
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [--debug] unit_test_mode configuration interpreted as str instead of bool
> -
>
> Key: AIRFLOW-5867
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5867
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli
>Affects Versions: 1.10.6
>Reporter: Ryan Vandersmith
>Assignee: Ryan Vandersmith
>Priority: Minor
>  Labels: easyfix, pull-request-available
> Fix For: 1.10.7
>
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> Relevant StackOverflow question:
> [https://stackoverflow.com/questions/58366469/what-is-an-efficient-way-to-develop-airflow-plugins-without-restarting-the-web]
>  
> The `--debug` CLI argument appears to have unintended functionality:
> [https://github.com/apache/airflow/blob/master/airflow/bin/cli.py#L980]
> {code:python}
> if args.debug:
> print(
> "Starting the web server on port {0} and host {1}.".format(
> args.port, args.hostname))
> app, _ = create_app(None, testing=conf.get('core', 'unit_test_mode'))
> app.run(debug=True, use_reloader=not app.config['TESTING'],
> port=args.port, host=args.hostname,
> ssl_context=(ssl_cert, ssl_key) if ssl_cert and ssl_key else 
> None)
> {code}
> Because `testing` and consequently `app.config['TESTING']` are provided as a 
> `str` object, the reloader is only enabled when the `unit_test_mode` property 
> is an empty string.

[GitHub] [airflow] rvanasa opened a new pull request #6517: [AIRFLOW-5867] Fix reloading when using webserver --debug command

2019-11-07 Thread GitBox
rvanasa opened a new pull request #6517: [AIRFLOW-5867] Fix reloading when 
using webserver --debug command
URL: https://github.com/apache/airflow/pull/6517
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-5867
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   The `--debug` CLI argument appears to have unintended functionality:
   
   https://github.com/apache/airflow/blob/master/airflow/bin/cli.py#L980
   
   Because `testing` and consequently `app.config['TESTING']` are provided as a 
`str` object, the reloader is only enabled when the `unit_test_mode` property 
is an empty string. 
   
   Relevant StackOverflow question:
   
https://stackoverflow.com/questions/58366469/what-is-an-efficient-way-to-develop-airflow-plugins-without-restarting-the-web
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   This bugfix restores intended functionality. 
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] rvanasa closed pull request #6517: [AIRFLOW-5867] Fix reloading when using webserver --debug command

2019-11-07 Thread GitBox
rvanasa closed pull request #6517: [AIRFLOW-5867] Fix reloading when using 
webserver --debug command
URL: https://github.com/apache/airflow/pull/6517
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-5782) Migrate AWS Lambda to /providers/aws [AIP-21]

2019-11-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969627#comment-16969627
 ] 

ASF GitHub Bot commented on AIRFLOW-5782:
-

shcherbin commented on pull request #6518: [AIRFLOW-5782] Migrate AWS Lambda to 
/providers/aws [AIP-21]
URL: https://github.com/apache/airflow/pull/6518
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW-5782) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-5782
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   This PR migrates the AWS Lambda hook to /providers/aws/hooks
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Migrate AWS Lambda to /providers/aws [AIP-21]
> -
>
> Key: AIRFLOW-5782
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5782
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws
>Affects Versions: 2.0.0
>Reporter: Bas Harenslak
>Assignee: Cyril Shcherbin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] shcherbin opened a new pull request #6518: [AIRFLOW-5782] Migrate AWS Lambda to /providers/aws [AIP-21]

2019-11-07 Thread GitBox
shcherbin opened a new pull request #6518: [AIRFLOW-5782] Migrate AWS Lambda to 
/providers/aws [AIP-21]
URL: https://github.com/apache/airflow/pull/6518
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW-5782) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-5782
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   This PR migrates the AWS Lambda hook to /providers/aws/hooks
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (AIRFLOW-5868) Should not deepcopy shallow_copy attributes

2019-11-07 Thread Ping Zhang (Jira)
Ping Zhang created AIRFLOW-5868:
---

 Summary: Should not deepcopy shallow_copy attributes
 Key: AIRFLOW-5868
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5868
 Project: Apache Airflow
  Issue Type: Bug
  Components: operators
Affects Versions: 1.10.4
Reporter: Ping Zhang
Assignee: Ping Zhang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (AIRFLOW-5782) Migrate AWS Lambda to /providers/aws [AIP-21]

2019-11-07 Thread Cyril Shcherbin (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cyril Shcherbin reassigned AIRFLOW-5782:


Assignee: Cyril Shcherbin

> Migrate AWS Lambda to /providers/aws [AIP-21]
> -
>
> Key: AIRFLOW-5782
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5782
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws
>Affects Versions: 2.0.0
>Reporter: Bas Harenslak
>Assignee: Cyril Shcherbin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-4547) Negative priority_weight should not be permitted

2019-11-07 Thread Alex Abraham (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969606#comment-16969606
 ] 

Alex Abraham commented on AIRFLOW-4547:
---

I actually have a use case where this is the desired behavior:

I have 50 tasks that send commands to a Spark server to run a job on some data. 
Before I send the command to the server, I want to make sure the data exists, so 
I created a task before each Spark task that checks whether the files exist and 
skips the Spark task if they don't. The problem I was running into is that 
Airflow made sure every single exists operator finished before submitting the 
first Spark job. If you set the exists operator to a priority_weight of -1, the 
Spark jobs execute first since they have a higher priority :)
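
For illustration, here is a minimal sketch of that pattern under the 1.10.x import paths; the task ids and callables are placeholders, not from the original comment:

{code:python}
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # 1.10.x import path

dag = DAG('spark_with_exists_checks',
          start_date=datetime(2019, 11, 1),
          schedule_interval=None)

# Hypothetical existence check: given a lower priority_weight so that queued
# Spark submissions are picked up by the executor first.
check = PythonOperator(
    task_id='check_input_exists',
    python_callable=lambda: True,   # placeholder for the real file-existence check
    priority_weight=-1,             # currently accepted; this issue proposes rejecting negatives
    dag=dag,
)

# Hypothetical Spark submission task with the default priority_weight of 1.
submit = PythonOperator(
    task_id='submit_spark_job',
    python_callable=lambda: None,   # placeholder for the real Spark submit call
    dag=dag,
)

check >> submit
{code}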

> Negative priority_weight should not be permitted
> 
>
> Key: AIRFLOW-4547
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4547
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: 1.10.3
>Reporter: Teresa Martyny
>Priority: Major
> Fix For: 2.0.0
>
>
> Airflow allows a dev to assign a negative priority_weight to a task. However, 
> Airflow also computes the effective priority_weight on its own in 
> models.py#priority_weight_total (line 2796), so a negative value makes the 
> final priority_weight wrong. Airflow should raise an error if an operator is 
> assigned a negative priority_weight at any point. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5817) Improve BigQuery operators idempotency

2019-11-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969523#comment-16969523
 ] 

ASF GitHub Bot commented on AIRFLOW-5817:
-

kaxil commented on pull request #6470: [AIRFLOW-5817] Improve BigQuery 
operators idempotency
URL: https://github.com/apache/airflow/pull/6470
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve BigQuery operators idempotency
> --
>
> Key: AIRFLOW-5817
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5817
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: gcp
>Affects Versions: 2.0.0
>Reporter: Tomasz Urbaszek
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] kaxil merged pull request #6470: [AIRFLOW-5817] Improve BigQuery operators idempotency

2019-11-07 Thread GitBox
kaxil merged pull request #6470: [AIRFLOW-5817] Improve BigQuery operators 
idempotency
URL: https://github.com/apache/airflow/pull/6470
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-5817) Improve BigQuery operators idempotency

2019-11-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969524#comment-16969524
 ] 

ASF subversion and git services commented on AIRFLOW-5817:
--

Commit 49f8be798a113af4ae26cad4ac2df1113d923539 in airflow's branch 
refs/heads/master from Tomek
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=49f8be7 ]

[AIRFLOW-5817] Improve BigQuery operators idempotency (#6470)



> Improve BigQuery operators idempotency
> --
>
> Key: AIRFLOW-5817
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5817
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: gcp
>Affects Versions: 2.0.0
>Reporter: Tomasz Urbaszek
>Priority: Major
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (AIRFLOW-5817) Improve BigQuery operators idempotency

2019-11-07 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-5817.
-
Fix Version/s: 2.0.0
   Resolution: Fixed

> Improve BigQuery operators idempotency
> --
>
> Key: AIRFLOW-5817
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5817
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: gcp
>Affects Versions: 2.0.0
>Reporter: Tomasz Urbaszek
>Priority: Major
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (AIRFLOW-5867) [--debug] unit_test_mode configuration interpreted as str instead of bool

2019-11-07 Thread Ryan Vandersmith (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Vandersmith updated AIRFLOW-5867:
--
Description: 
Relevant StackOverflow question:

[https://stackoverflow.com/questions/58366469/what-is-an-efficient-way-to-develop-airflow-plugins-without-restarting-the-web]

 

The `--debug` CLI argument appears to have unintended functionality:

[https://github.com/apache/airflow/blob/master/airflow/bin/cli.py#L980]
{code:python}
if args.debug:
print(
"Starting the web server on port {0} and host {1}.".format(
args.port, args.hostname))
app, _ = create_app(None, testing=conf.get('core', 'unit_test_mode'))
app.run(debug=True, use_reloader=not app.config['TESTING'],
port=args.port, host=args.hostname,
ssl_context=(ssl_cert, ssl_key) if ssl_cert and ssl_key else 
None)
{code}
Because `testing` and consequently `app.config['TESTING']` are provided as a 
`str` object, the reloader is only enabled when the `unit_test_mode` property 
is an empty string. 

A very clean fix exists (line 979):
{code:java}
app, _ = create_app(None, testing=conf.getboolean('core', 'unit_test_mode'))
{code}
I will submit a pull request with the above change immediately after opening 
this issue. 

  was:
The `--debug` CLI argument appears to have unintended functionality:

[https://github.com/apache/airflow/blob/master/airflow/bin/cli.py#L980]
{code:python}
if args.debug:
print(
"Starting the web server on port {0} and host {1}.".format(
args.port, args.hostname))
app, _ = create_app(None, testing=conf.get('core', 'unit_test_mode'))
app.run(debug=True, use_reloader=not app.config['TESTING'],
port=args.port, host=args.hostname,
ssl_context=(ssl_cert, ssl_key) if ssl_cert and ssl_key else 
None)
{code}
Because `testing` and consequently `app.config['TESTING']` are provided as a 
`str` object, the reloader is only enabled when the `unit_test_mode` property 
is an empty string. 

A very clean fix exists (line 979):
{code:java}
app, _ = create_app(None, testing=conf.getboolean('core', 'unit_test_mode'))
{code}
I will submit a pull request with the above change immediately after opening 
this issue. 


> [--debug] unit_test_mode configuration interpreted as str instead of bool
> -
>
> Key: AIRFLOW-5867
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5867
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli
>Affects Versions: 1.10.6
>Reporter: Ryan Vandersmith
>Assignee: Ryan Vandersmith
>Priority: Minor
>  Labels: easyfix, pull-request-available
> Fix For: 1.10.7
>
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> Relevant StackOverflow question:
> [https://stackoverflow.com/questions/58366469/what-is-an-efficient-way-to-develop-airflow-plugins-without-restarting-the-web]
>  
> The `--debug` CLI argument appears to have unintended functionality:
> [https://github.com/apache/airflow/blob/master/airflow/bin/cli.py#L980]
> {code:python}
> if args.debug:
> print(
> "Starting the web server on port {0} and host {1}.".format(
> args.port, args.hostname))
> app, _ = create_app(None, testing=conf.get('core', 'unit_test_mode'))
> app.run(debug=True, use_reloader=not app.config['TESTING'],
> port=args.port, host=args.hostname,
> ssl_context=(ssl_cert, ssl_key) if ssl_cert and ssl_key else 
> None)
> {code}
> Because `testing` and consequently `app.config['TESTING']` are provided as a 
> `str` object, the reloader is only enabled when the `unit_test_mode` property 
> is an empty string. 
> A very clean fix exists (line 979):
> {code:java}
> app, _ = create_app(None, testing=conf.getboolean('core', 'unit_test_mode'))
> {code}
> I will submit a pull request with the above change immediately after opening 
> this issue. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] kaxil commented on a change in pull request #6515: [AIRFLOW-XXX] GSoD: How to make DAGs production ready

2019-11-07 Thread GitBox
kaxil commented on a change in pull request #6515: [AIRFLOW-XXX] GSoD: How to 
make DAGs production ready
URL: https://github.com/apache/airflow/pull/6515#discussion_r343833274
 
 

 ##
 File path: docs/howto/dags-in-production.rst
 ##
 @@ -0,0 +1,247 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+Getting a DAG ready for production
+==
+
+
+Running Airflow in production is seamless. It comes bundled with all the 
plugins and configs
+necessary to run most of the DAGs. However, you can come across certain 
pitfalls, which can cause occasional errors.
+Let's the steps you need to follow to avoid these pitfalls.
+
+Writing a DAG
+^^
+Creating a new DAG in Airflow is quite simple. However, there are many things 
that you need to take care of
+to ensure the DAG run or failure does not produce unexpected results.
+
+Creating a task
+---
+
+You should treat tasks in Airflow equivalent to transactions in a database. It 
implies that you should never produce
+incomplete results from your tasks. An example is not to produce incomplete 
data in ``HDFS`` or ``S3`` at the end of a task.
+
+Airflow retries a task if it fails. Thus, the tasks should produce the same 
outcome on every re-run.
+Some of the ways you can avoid producing a different result -
+
+* Don't use INSERT during a task re-run, an INSERT statement might lead to 
duplicate rows in your database.
+  Replace it with UPSERT.
+* Read and write in a specific partition. Never read the latest available data 
in a task. 
+  Someone may update the input data between re-runs, which results in 
different outputs. 
+  A better way is to read the input data from a specific partition. You can 
use ``execution_date`` as a partition. 
+  You should follow this partitioning method while writing data in S3/HDFS, as 
well.
+* The python datetime ``now()`` function gives the current datetime object. 
+  This function should never be used inside a task, especially to do the 
critical computation, as it leads to different outcomes on each run. 
+  It's fine to use it, for example, to generate a temporary log.
+
+
+Deleting a task
+
+
+Never delete a task from a DAG. In case of deletion, the historical 
information of the task disappears from the Airflow UI. 
+It is advised to create a new DAG in case the tasks need to be deleted.
+
+
+Communication
+--
+
+Airflow executes tasks of a DAG in different directories, which can even be 
present 
+on different servers in case you are using :doc:`Kubernetes executor 
<../executor/kubernetes>` or :doc:`Celery executor <../executor/celery>`. 
+Therefore, you should not store any file or config in the local filesystem — 
for example, a task that downloads the JAR file that the next task executes.
+
+Always use XCom to communicate small messages between tasks or S3/HDFS to 
communicate large messages/files.
+
+The tasks should also not store any authentication parameters such as 
passwords or token inside them. 
+Always use :ref:`Connections ` to store data securely in 
Airflow backend and retrieve them using a unique connection id.
+
+
+.. note::
+
+Don't write any critical code outside the tasks. The code outside the 
tasks runs every time airflow parses the DAG, which happens every second by 
default.
+
+You should also avoid repeating arguments such as connection_id or S3 
paths using default_args. It helps you to avoid mistakes while passing 
arguments.
+
+
+
+Testing a DAG
+^
+
+Airflow users should treat DAGs as production level code. The DAGs should have 
various tests to ensure that it produces expected results.
+You can write a wide variety of tests for a DAG. Let's take a look at some of 
them.
+
+DAG Loader Test
+---
+
+This test should ensure that your DAG doesn't contain a piece of code that 
raises error while loading.
+No additional code needs to be written by the user to run this test.
+
+.. code::
+
+ python your-dag-file.py
+
+Running the above command without any error ensures your DAG doesn't contain 
any uninstalled dependency, syntax errors, etc. 
+
+You can look into :ref:`Testing a DAG ` for details on how to 

[jira] [Commented] (AIRFLOW-5867) [--debug] unit_test_mode configuration interpreted as str instead of bool

2019-11-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969521#comment-16969521
 ] 

ASF GitHub Bot commented on AIRFLOW-5867:
-

rvanasa commented on pull request #6517: [AIRFLOW-5867] Fix webserver 
unit_test_mode data type
URL: https://github.com/apache/airflow/pull/6517
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-5867
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   The `--debug` CLI argument appears to have unintended functionality:
   
   https://github.com/apache/airflow/blob/master/airflow/bin/cli.py#L980
   
   Because `testing` and consequently `app.config['TESTING']` are provided as a 
`str` object, the reloader is only enabled when the `unit_test_mode` property 
is an empty string. 
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   This bugfix restores intended functionality. 
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [--debug] unit_test_mode configuration interpreted as str instead of bool
> -
>
> Key: AIRFLOW-5867
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5867
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli
>Affects Versions: 1.10.6
>Reporter: Ryan Vandersmith
>Assignee: Ryan Vandersmith
>Priority: Minor
>  Labels: easyfix, pull-request-available
> Fix For: 1.10.7
>
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> The `--debug` CLI argument appears to have unintended functionality:
> [https://github.com/apache/airflow/blob/master/airflow/bin/cli.py#L980]
> {code:python}
> if args.debug:
> print(
> "Starting the web server on port {0} and host {1}.".format(
> args.port, args.hostname))
> app, _ = create_app(None, testing=conf.get('core', 'unit_test_mode'))
> app.run(debug=True, use_reloader=not app.config['TESTING'],
> port=args.port, host=args.hostname,
> ssl_context=(ssl_cert, ssl_key) if ssl_cert and ssl_key else 
> None)
> {code}
> Because `testing` and consequently `app.config['TESTING']` are provided as a 
> `str` object, the reloader is only enabled when the `unit_test_mode` property 
> is an empty string. 
> A very clean fix exists (line 979):
> {code:java}
> app, _ = create_app(None, testing=conf.getboolean('core', 'unit_test_mode'))
> {code}
> I will submit a pull request with the above change immediately after opening 
> this issue. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] kaxil commented on a change in pull request #6515: [AIRFLOW-XXX] GSoD: How to make DAGs production ready

2019-11-07 Thread GitBox
kaxil commented on a change in pull request #6515: [AIRFLOW-XXX] GSoD: How to 
make DAGs production ready
URL: https://github.com/apache/airflow/pull/6515#discussion_r343831995
 
 

 ##
 File path: docs/howto/dags-in-production.rst
 ##
 @@ -0,0 +1,247 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+Getting a DAG ready for production
+==
+
+
+Running Airflow in production is seamless. It comes bundled with all the 
plugins and configs
+necessary to run most of the DAGs. However, you can come across certain 
pitfalls, which can cause occasional errors.
+Let's the steps you need to follow to avoid these pitfalls.
+
+Writing a DAG
+^^
+Creating a new DAG in Airflow is quite simple. However, there are many things 
that you need to take care of
+to ensure the DAG run or failure does not produce unexpected results.
+
+Creating a task
+---
+
+You should treat tasks in Airflow equivalent to transactions in a database. It 
implies that you should never produce
+incomplete results from your tasks. An example is not to produce incomplete 
data in ``HDFS`` or ``S3`` at the end of a task.
+
+Airflow retries a task if it fails. Thus, the tasks should produce the same 
outcome on every re-run.
+Some of the ways you can avoid producing a different result -
+
+* Don't use INSERT during a task re-run, an INSERT statement might lead to 
duplicate rows in your database.
+  Replace it with UPSERT.
+* Read and write in a specific partition. Never read the latest available data 
in a task. 
+  Someone may update the input data between re-runs, which results in 
different outputs. 
+  A better way is to read the input data from a specific partition. You can 
use ``execution_date`` as a partition. 
+  You should follow this partitioning method while writing data in S3/HDFS, as 
well.
+* The python datetime ``now()`` function gives the current datetime object. 
+  This function should never be used inside a task, especially to do the 
critical computation, as it leads to different outcomes on each run. 
+  It's fine to use it, for example, to generate a temporary log.
+
+
+Deleting a task
+
+
+Never delete a task from a DAG. In case of deletion, the historical 
information of the task disappears from the Airflow UI. 
+It is advised to create a new DAG in case the tasks need to be deleted.
+
+
+Communication
+--
+
+Airflow executes tasks of a DAG in different directories, which can even be 
present 
+on different servers in case you are using :doc:`Kubernetes executor 
<../executor/kubernetes>` or :doc:`Celery executor <../executor/celery>`. 
+Therefore, you should not store any file or config in the local filesystem — 
for example, a task that downloads the JAR file that the next task executes.
+
+Always use XCom to communicate small messages between tasks or S3/HDFS to 
communicate large messages/files.
+
+The tasks should also not store any authentication parameters such as 
passwords or token inside them. 
+Always use :ref:`Connections ` to store data securely in 
Airflow backend and retrieve them using a unique connection id.
+
+
+.. note::
+
+Don't write any critical code outside the tasks. The code outside the 
tasks runs every time airflow parses the DAG, which happens every second by 
default.
+
+You should also avoid repeating arguments such as connection_id or S3 
paths using default_args. It helps you to avoid mistakes while passing 
arguments.
+
+
+
+Testing a DAG
+^
+
+Airflow users should treat DAGs as production level code. The DAGs should have 
various tests to ensure that it produces expected results.
+You can write a wide variety of tests for a DAG. Let's take a look at some of 
them.
+
+DAG Loader Test
+---
+
+This test should ensure that your DAG doesn't contain a piece of code that 
raises error while loading.
 
 Review comment:
   ```suggestion
   This test should ensure that your DAG does not contain a piece of code that 
raises error while loading.
   ```
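
   On the DAG Loader Test described in the hunk above, a minimal pytest-style sketch could look like the following; the `dags/` folder path is an assumption for illustration, not something the reviewed doc specifies:

   ```python
   # Minimal DAG loader test sketch: import every DAG file and fail if loading raises.
   import glob
   import importlib.util

   def test_dag_files_import_cleanly():
       for path in glob.glob('dags/*.py'):
           spec = importlib.util.spec_from_file_location('dag_under_test', path)
           module = importlib.util.module_from_spec(spec)
           spec.loader.exec_module(module)  # raises on syntax errors or missing dependencies
   ```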


This is an automated message from the Apache Git Service.
To respond to the message, 

[GitHub] [airflow] rvanasa opened a new pull request #6517: [AIRFLOW-5867] Fix webserver unit_test_mode data type

2019-11-07 Thread GitBox
rvanasa opened a new pull request #6517: [AIRFLOW-5867] Fix webserver 
unit_test_mode data type
URL: https://github.com/apache/airflow/pull/6517
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-5867
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   The `--debug` CLI argument appears to have unintended functionality:
   
   https://github.com/apache/airflow/blob/master/airflow/bin/cli.py#L980
   
   Because `testing` and consequently `app.config['TESTING']` are provided as a 
`str` object, the reloader is only enabled when the `unit_test_mode` property 
is an empty string. 
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   This bugfix restores intended functionality. 
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] kaxil commented on a change in pull request #6515: [AIRFLOW-XXX] GSoD: How to make DAGs production ready

2019-11-07 Thread GitBox
kaxil commented on a change in pull request #6515: [AIRFLOW-XXX] GSoD: How to 
make DAGs production ready
URL: https://github.com/apache/airflow/pull/6515#discussion_r343830368
 
 

 ##
 File path: docs/howto/dags-in-production.rst
 ##
 @@ -0,0 +1,247 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+Getting a DAG ready for production
+==
+
+
+Running Airflow in production is seamless. It comes bundled with all the 
plugins and configs
+necessary to run most of the DAGs. However, you can come across certain 
pitfalls, which can cause occasional errors.
+Let's the steps you need to follow to avoid these pitfalls.
 
 Review comment:
   Something missing ??
   
   ```suggestion
   Let's look at the steps you need to follow to avoid these pitfalls.
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (AIRFLOW-5867) [--debug] unit_test_mode configuration interpreted as str instead of bool

2019-11-07 Thread Ryan Vandersmith (Jira)
Ryan Vandersmith created AIRFLOW-5867:
-

 Summary: [--debug] unit_test_mode configuration interpreted as str 
instead of bool
 Key: AIRFLOW-5867
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5867
 Project: Apache Airflow
  Issue Type: Bug
  Components: cli
Affects Versions: 1.10.6
Reporter: Ryan Vandersmith
Assignee: Ryan Vandersmith
 Fix For: 1.10.7


The `--debug` CLI argument appears to have unintended functionality:

[https://github.com/apache/airflow/blob/master/airflow/bin/cli.py#L980]
{code:python}
if args.debug:
print(
"Starting the web server on port {0} and host {1}.".format(
args.port, args.hostname))
app, _ = create_app(None, testing=conf.get('core', 'unit_test_mode'))
app.run(debug=True, use_reloader=not app.config['TESTING'],
port=args.port, host=args.hostname,
ssl_context=(ssl_cert, ssl_key) if ssl_cert and ssl_key else 
None)
{code}
Because `testing` and consequently `app.config['TESTING']` are provided as a 
`str` object, the reloader is only enabled when the `unit_test_mode` property 
is an empty string. 

A very clean fix exists (line 979):
{code:java}
app, _ = create_app(None, testing=conf.getboolean('core', 'unit_test_mode'))
{code}
I will submit a pull request with the above change immediately after opening 
this issue. 
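
As a standalone illustration of the underlying problem (plain Python, not Airflow code): any non-empty string is truthy, so negating the raw config string never enables the reloader, while negating a parsed boolean does:

{code:python}
# conf.get() returns the raw configuration value as a string, e.g. 'False'.
testing = 'False'
print(not testing)   # False -> use_reloader stays off even though unit_test_mode is disabled

# conf.getboolean() parses the value into an actual bool.
testing = False
print(not testing)   # True -> reloader enabled as intended
{code}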



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] codecov-io edited a comment on issue #6230: [AIRFLOW-5413] Allow K8S worker pod to be configured from JSON/YAML file

2019-11-07 Thread GitBox
codecov-io edited a comment on issue #6230: [AIRFLOW-5413] Allow K8S worker pod 
to be configured from JSON/YAML file
URL: https://github.com/apache/airflow/pull/6230#issuecomment-537654940
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/6230?src=pr=h1) 
Report
   > Merging 
[#6230](https://codecov.io/gh/apache/airflow/pull/6230?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/07e80e5cf9e0291a0684530bf897ea6235f4f17f?src=pr=desc)
 will **decrease** coverage by `<.01%`.
   > The diff coverage is `92.17%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/6230/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/6230?src=pr=tree)
   
   ```diff
   @@Coverage Diff @@
   ##   master#6230  +/-   ##
   ==
   - Coverage   83.93%   83.92%   -0.01% 
   ==
 Files 635  635  
 Lines   3671636786  +70 
   ==
   + Hits3081630872  +56 
   - Misses   5900 5914  +14
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/6230?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/kubernetes/secret.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3NlY3JldC5weQ==)
 | `93.61% <100%> (ø)` | :arrow_up: |
   | 
[airflow/kubernetes/worker\_configuration.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3dvcmtlcl9jb25maWd1cmF0aW9uLnB5)
 | `96.42% <100%> (+0.02%)` | :arrow_up: |
   | 
[...rflow/contrib/operators/kubernetes\_pod\_operator.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZF9vcGVyYXRvci5weQ==)
 | `98.52% <100%> (+0.02%)` | :arrow_up: |
   | 
[airflow/kubernetes/pod\_generator.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9nZW5lcmF0b3IucHk=)
 | `92.96% <91.39%> (-1.74%)` | :arrow_down: |
   | 
[airflow/executors/kubernetes\_executor.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy9leGVjdXRvcnMva3ViZXJuZXRlc19leGVjdXRvci5weQ==)
 | `58.37% <91.66%> (-0.62%)` | :arrow_down: |
   | 
[airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==)
 | `58.15% <0%> (+0.16%)` | :arrow_up: |
   | 
[airflow/hooks/postgres\_hook.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9wb3N0Z3Jlc19ob29rLnB5)
 | `92.85% <0%> (ø)` | :arrow_up: |
   | 
[airflow/hooks/dbapi\_hook.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9kYmFwaV9ob29rLnB5)
 | `89.83% <0%> (ø)` | :arrow_up: |
   | 
[airflow/utils/sqlalchemy.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9zcWxhbGNoZW15LnB5)
 | `91.52% <0%> (ø)` | :arrow_up: |
   | ... and [1 
more](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree-more) 
| |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/6230?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/6230?src=pr=footer). 
Last update 
[07e80e5...68cd4dd](https://codecov.io/gh/apache/airflow/pull/6230?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] codecov-io edited a comment on issue #6230: [AIRFLOW-5413] Allow K8S worker pod to be configured from JSON/YAML file

2019-11-07 Thread GitBox
codecov-io edited a comment on issue #6230: [AIRFLOW-5413] Allow K8S worker pod 
to be configured from JSON/YAML file
URL: https://github.com/apache/airflow/pull/6230#issuecomment-537654940
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/6230?src=pr=h1) 
Report
   > Merging 
[#6230](https://codecov.io/gh/apache/airflow/pull/6230?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/07e80e5cf9e0291a0684530bf897ea6235f4f17f?src=pr=desc)
 will **decrease** coverage by `<.01%`.
   > The diff coverage is `92.17%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/6230/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/6230?src=pr=tree)
   
   ```diff
   @@Coverage Diff @@
   ##   master#6230  +/-   ##
   ==
   - Coverage   83.93%   83.92%   -0.01% 
   ==
 Files 635  635  
 Lines   3671636786  +70 
   ==
   + Hits3081630872  +56 
   - Misses   5900 5914  +14
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/6230?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/kubernetes/secret.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3NlY3JldC5weQ==)
 | `93.61% <100%> (ø)` | :arrow_up: |
   | 
[airflow/kubernetes/worker\_configuration.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3dvcmtlcl9jb25maWd1cmF0aW9uLnB5)
 | `96.42% <100%> (+0.02%)` | :arrow_up: |
   | 
[...rflow/contrib/operators/kubernetes\_pod\_operator.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZF9vcGVyYXRvci5weQ==)
 | `98.52% <100%> (+0.02%)` | :arrow_up: |
   | 
[airflow/kubernetes/pod\_generator.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9nZW5lcmF0b3IucHk=)
 | `92.96% <91.39%> (-1.74%)` | :arrow_down: |
   | 
[airflow/executors/kubernetes\_executor.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy9leGVjdXRvcnMva3ViZXJuZXRlc19leGVjdXRvci5weQ==)
 | `58.37% <91.66%> (-0.62%)` | :arrow_down: |
   | 
[airflow/jobs/backfill\_job.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzL2JhY2tmaWxsX2pvYi5weQ==)
 | `89.9% <0%> (-1.53%)` | :arrow_down: |
   | 
[airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==)
 | `58.15% <0%> (+0.16%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/6230?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/6230?src=pr=footer). 
Last update 
[07e80e5...68cd4dd](https://codecov.io/gh/apache/airflow/pull/6230?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] codecov-io edited a comment on issue #6230: [AIRFLOW-5413] Allow K8S worker pod to be configured from JSON/YAML file

2019-11-07 Thread GitBox
codecov-io edited a comment on issue #6230: [AIRFLOW-5413] Allow K8S worker pod 
to be configured from JSON/YAML file
URL: https://github.com/apache/airflow/pull/6230#issuecomment-537654940
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/6230?src=pr=h1) 
Report
   > Merging 
[#6230](https://codecov.io/gh/apache/airflow/pull/6230?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/07e80e5cf9e0291a0684530bf897ea6235f4f17f?src=pr=desc)
 will **increase** coverage by `0.07%`.
   > The diff coverage is `92.17%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/6230/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/6230?src=pr=tree)
   
   ```diff
   @@Coverage Diff@@
   ##   master   #6230  +/-   ##
   =
   + Coverage   83.93% 84%   +0.07% 
   =
 Files 635 635  
 Lines   36716   36786  +70 
   =
   + Hits30816   30903  +87 
   + Misses   59005883  -17
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/6230?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/kubernetes/secret.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3NlY3JldC5weQ==)
 | `93.61% <100%> (ø)` | :arrow_up: |
   | 
[airflow/kubernetes/worker\_configuration.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3dvcmtlcl9jb25maWd1cmF0aW9uLnB5)
 | `96.42% <100%> (+0.02%)` | :arrow_up: |
   | 
[...rflow/contrib/operators/kubernetes\_pod\_operator.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZF9vcGVyYXRvci5weQ==)
 | `98.52% <100%> (+0.02%)` | :arrow_up: |
   | 
[airflow/kubernetes/pod\_generator.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9nZW5lcmF0b3IucHk=)
 | `92.96% <91.39%> (-1.74%)` | :arrow_down: |
   | 
[airflow/executors/kubernetes\_executor.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy9leGVjdXRvcnMva3ViZXJuZXRlc19leGVjdXRvci5weQ==)
 | `58.37% <91.66%> (-0.62%)` | :arrow_down: |
   | 
[airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==)
 | `58.15% <0%> (+0.16%)` | :arrow_up: |
   | 
[airflow/hooks/postgres\_hook.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9wb3N0Z3Jlc19ob29rLnB5)
 | `94.28% <0%> (+1.42%)` | :arrow_up: |
   | 
[airflow/hooks/dbapi\_hook.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9kYmFwaV9ob29rLnB5)
 | `91.52% <0%> (+1.69%)` | :arrow_up: |
   | 
[airflow/utils/sqlalchemy.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9zcWxhbGNoZW15LnB5)
 | `93.22% <0%> (+1.69%)` | :arrow_up: |
   | ... and [1 
more](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree-more) 
| |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/6230?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/6230?src=pr=footer). 
Last update 
[07e80e5...68cd4dd](https://codecov.io/gh/apache/airflow/pull/6230?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] codecov-io edited a comment on issue #6230: [AIRFLOW-5413] Allow K8S worker pod to be configured from JSON/YAML file

2019-11-07 Thread GitBox
codecov-io edited a comment on issue #6230: [AIRFLOW-5413] Allow K8S worker pod 
to be configured from JSON/YAML file
URL: https://github.com/apache/airflow/pull/6230#issuecomment-537654940
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/6230?src=pr=h1) 
Report
   > Merging 
[#6230](https://codecov.io/gh/apache/airflow/pull/6230?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/07e80e5cf9e0291a0684530bf897ea6235f4f17f?src=pr=desc)
 will **increase** coverage by `0.07%`.
   > The diff coverage is `92.17%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/6230/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/6230?src=pr=tree)
   
   ```diff
   @@Coverage Diff@@
   ##   master   #6230  +/-   ##
   =
   + Coverage   83.93% 84%   +0.07% 
   =
 Files 635 635  
 Lines   36716   36786  +70 
   =
   + Hits30816   30903  +87 
   + Misses   59005883  -17
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/6230?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/kubernetes/secret.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3NlY3JldC5weQ==)
 | `93.61% <100%> (ø)` | :arrow_up: |
   | 
[airflow/kubernetes/worker\_configuration.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3dvcmtlcl9jb25maWd1cmF0aW9uLnB5)
 | `96.42% <100%> (+0.02%)` | :arrow_up: |
   | 
[...rflow/contrib/operators/kubernetes\_pod\_operator.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZF9vcGVyYXRvci5weQ==)
 | `98.52% <100%> (+0.02%)` | :arrow_up: |
   | 
[airflow/kubernetes/pod\_generator.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9nZW5lcmF0b3IucHk=)
 | `92.96% <91.39%> (-1.74%)` | :arrow_down: |
   | 
[airflow/executors/kubernetes\_executor.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy9leGVjdXRvcnMva3ViZXJuZXRlc19leGVjdXRvci5weQ==)
 | `58.37% <91.66%> (-0.62%)` | :arrow_down: |
   | 
[airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==)
 | `58.15% <0%> (+0.16%)` | :arrow_up: |
   | 
[airflow/hooks/postgres\_hook.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9wb3N0Z3Jlc19ob29rLnB5)
 | `94.28% <0%> (+1.42%)` | :arrow_up: |
   | 
[airflow/hooks/dbapi\_hook.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9kYmFwaV9ob29rLnB5)
 | `91.52% <0%> (+1.69%)` | :arrow_up: |
   | 
[airflow/utils/sqlalchemy.py](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9zcWxhbGNoZW15LnB5)
 | `93.22% <0%> (+1.69%)` | :arrow_up: |
   | ... and [1 
more](https://codecov.io/gh/apache/airflow/pull/6230/diff?src=pr=tree-more) 
| |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/6230?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/6230?src=pr=footer). 
Last update 
[07e80e5...68cd4dd](https://codecov.io/gh/apache/airflow/pull/6230?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-5819) AWSBatchOperator has invalid default value for array_properties

2019-11-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969513#comment-16969513
 ] 

ASF subversion and git services commented on AIRFLOW-5819:
--

Commit 63ba1c58423e0f060f1d253fa4ea1504760b5120 in airflow's branch 
refs/heads/master from Domantas Jurkus
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=63ba1c5 ]

[AIRFLOW-5819] Update AWSBatchOperator default value (#6473)



> AWSBatchOperator has invalid default value for array_properties
> ---
>
> Key: AIRFLOW-5819
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5819
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: 1.10.6
>Reporter: Dom
>Priority: Major
> Fix For: 1.10.7
>
>
> We upgraded our Airflow version from 1.10.3 to 1.10.6 and saw our 
> AWSBatchOperator throwing the following error (easily fixed by changing the 
> default param from None to {}):
> {code:java}
>  [2019-10-30 13:10:15.392] INFO:airflow.task.operators.execute:93 Running AWS 
> Batch Job - Job definition: batch-jobdef-1 - on queue AwsBatch-batch-queue
> [2019-10-30 13:10:15.393] INFO:airflow.task.operators.execute:95 
> AWSBatchOperator overrides: {'command': ['--start_datetime', '2019-10-29']}
> [2019-10-30 13:10:15.433] INFO:airflow.task.operators.execute:121 AWS Batch 
> Job has failed executed
> [2019-10-30 13:10:15.445] ERROR:airflow.task.handle_failure:1058 Parameter 
> validation failed:
> Invalid type for parameter arrayProperties, value: None, type: <class 'NoneType'>, valid types: <class 'dict'>
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.7/site-packages/airflow/contrib/operators/awsbatch_operator.py",
>  line 108, in execute
> containerOverrides=self.overrides)
>   File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 357, 
> in _api_call
> return self._make_api_call(operation_name, kwargs)
>   File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 634, 
> in _make_api_call
> api_params, operation_model, context=request_context)
>   File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 682, 
> in _convert_to_request_dict
> api_params, operation_model)
>   File "/usr/local/lib/python3.7/site-packages/botocore/validate.py", line 
> 297, in serialize_to_request
> raise ParamValidationError(report=report.generate_report())
> botocore.exceptions.ParamValidationError: Parameter validation failed:
> Invalid type for parameter arrayProperties, value: None, type: <class 'NoneType'>, valid types: <class 'dict'>
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 
> 930, in _run_raw_task
> result = task_copy.execute(context=context)
>   File 
> "/usr/local/lib/python3.7/site-packages/airflow/contrib/operators/awsbatch_operator.py",
>  line 122, in execute
> raise AirflowException(e)
> airflow.exceptions.AirflowException: Parameter validation failed:
> Invalid type for parameter arrayProperties, value: None, type: <class 'NoneType'>, valid types: <class 'dict'>
> [2019-10-30 13:10:15.447] INFO:airflow.task.handle_failure:1087 All retries 
> failed; marking task as FAILED{code}
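
For context, here is a hedged sketch (not the Airflow source) of why forwarding a None default fails boto3's validation and how falling back to an empty dict avoids it; the job names below are placeholders:

{code:python}
import boto3

def submit_batch_job(overrides, array_properties=None):
    # boto3 validates arrayProperties as a dict, so passing through a None
    # default raises ParamValidationError before any request is sent.
    client = boto3.client('batch')
    return client.submit_job(
        jobName='example-job',                    # placeholder values for illustration
        jobQueue='example-queue',
        jobDefinition='example-jobdef',
        arrayProperties=array_properties or {},   # fall back to {} (the issue's suggested default)
        containerOverrides=overrides,
    )
{code}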



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] kaxil merged pull request #6473: [AIRFLOW-5819] Update AWSBatchOperator default value

2019-11-07 Thread GitBox
kaxil merged pull request #6473: [AIRFLOW-5819] Update AWSBatchOperator default 
value
URL: https://github.com/apache/airflow/pull/6473
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Resolved] (AIRFLOW-5819) AWSBatchOperator has invalid default value for array_properties

2019-11-07 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-5819.
-
Fix Version/s: 1.10.7
   Resolution: Fixed

> AWSBatchOperator has invalid default value for array_properties
> ---
>
> Key: AIRFLOW-5819
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5819
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: 1.10.6
>Reporter: Dom
>Priority: Major
> Fix For: 1.10.7
>
>
> We upgraded our Airflow version from 1.10.3 to 1.10.6 and saw our 
> AWSBatchOperator throwing the following error (easily fixed by changing the 
> default param from None to {}):
> {code:java}
>  [2019-10-30 13:10:15.392] INFO:airflow.task.operators.execute:93 Running AWS 
> Batch Job - Job definition: batch-jobdef-1 - on queue AwsBatch-batch-queue
> [2019-10-30 13:10:15.393] INFO:airflow.task.operators.execute:95 
> AWSBatchOperator overrides: {'command': ['--start_datetime', '2019-10-29']}
> [2019-10-30 13:10:15.433] INFO:airflow.task.operators.execute:121 AWS Batch 
> Job has failed executed
> [2019-10-30 13:10:15.445] ERROR:airflow.task.handle_failure:1058 Parameter 
> validation failed:
> Invalid type for parameter arrayProperties, value: None, type: <class 'NoneType'>, valid types: <class 'dict'>
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.7/site-packages/airflow/contrib/operators/awsbatch_operator.py",
>  line 108, in execute
> containerOverrides=self.overrides)
>   File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 357, 
> in _api_call
> return self._make_api_call(operation_name, kwargs)
>   File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 634, 
> in _make_api_call
> api_params, operation_model, context=request_context)
>   File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 682, 
> in _convert_to_request_dict
> api_params, operation_model)
>   File "/usr/local/lib/python3.7/site-packages/botocore/validate.py", line 
> 297, in serialize_to_request
> raise ParamValidationError(report=report.generate_report())
> botocore.exceptions.ParamValidationError: Parameter validation failed:
> Invalid type for parameter arrayProperties, value: None, type: <class 'NoneType'>, valid types: <class 'dict'>
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 
> 930, in _run_raw_task
> result = task_copy.execute(context=context)
>   File 
> "/usr/local/lib/python3.7/site-packages/airflow/contrib/operators/awsbatch_operator.py",
>  line 122, in execute
> raise AirflowException(e)
> airflow.exceptions.AirflowException: Parameter validation failed:
> Invalid type for parameter arrayProperties, value: None, type: <class 'NoneType'>, valid types: <class 'dict'>
> [2019-10-30 13:10:15.447] INFO:airflow.task.handle_failure:1087 All retries 
> failed; marking task as FAILED{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5819) AWSBatchOperator has invalid default value for array_properties

2019-11-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969512#comment-16969512
 ] 

ASF GitHub Bot commented on AIRFLOW-5819:
-

kaxil commented on pull request #6473: [AIRFLOW-5819] Update AWSBatchOperator 
default value
URL: https://github.com/apache/airflow/pull/6473
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> AWSBatchOperator has invalid default value for array_properties
> ---
>
> Key: AIRFLOW-5819
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5819
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: 1.10.6
>Reporter: Dom
>Priority: Major
>
> We upgraded our Airflow version from 1.10.3 to 1.10.6 and saw our 
> AWSBatchOperator throwing the following error (easily fixed by changing the 
> default param from None to {}):
> {code:java}
>  [2019-10-30 13:10:15.392] INFO:airflow.task.operators.execute:93 Running AWS 
> Batch Job - Job definition: batch-jobdef-1 - on queue AwsBatch-batch-queue
> [2019-10-30 13:10:15.393] INFO:airflow.task.operators.execute:95 
> AWSBatchOperator overrides: {'command': ['--start_datetime', '2019-10-29']}
> [2019-10-30 13:10:15.433] INFO:airflow.task.operators.execute:121 AWS Batch 
> Job has failed executed
> [2019-10-30 13:10:15.445] ERROR:airflow.task.handle_failure:1058 Parameter 
> validation failed:
> Invalid type for parameter arrayProperties, value: None, type: <class 'NoneType'>, valid types: <class 'dict'>
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.7/site-packages/airflow/contrib/operators/awsbatch_operator.py",
>  line 108, in execute
> containerOverrides=self.overrides)
>   File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 357, 
> in _api_call
> return self._make_api_call(operation_name, kwargs)
>   File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 634, 
> in _make_api_call
> api_params, operation_model, context=request_context)
>   File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 682, 
> in _convert_to_request_dict
> api_params, operation_model)
>   File "/usr/local/lib/python3.7/site-packages/botocore/validate.py", line 
> 297, in serialize_to_request
> raise ParamValidationError(report=report.generate_report())
> botocore.exceptions.ParamValidationError: Parameter validation failed:
> Invalid type for parameter arrayProperties, value: None, type: <class 'NoneType'>, valid types: <class 'dict'>
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 
> 930, in _run_raw_task
> result = task_copy.execute(context=context)
>   File 
> "/usr/local/lib/python3.7/site-packages/airflow/contrib/operators/awsbatch_operator.py",
>  line 122, in execute
> raise AirflowException(e)
> airflow.exceptions.AirflowException: Parameter validation failed:
> Invalid type for parameter arrayProperties, value: None, type: <class 'NoneType'>, valid types: <class 'dict'>
> [2019-10-30 13:10:15.447] INFO:airflow.task.handle_failure:1087 All retries 
> failed; marking task as FAILED{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] TobKed commented on issue #6511: [AIRFLOW-XXX] Improve the PubSub documentation

2019-11-07 Thread GitBox
TobKed commented on issue #6511: [AIRFLOW-XXX] Improve the PubSub documentation
URL: https://github.com/apache/airflow/pull/6511#issuecomment-551215099
 
 
   > In case you fix a typo in the documentation you can prepend your commit with [AIRFLOW-XXX], code changes always need a Jira issue.
   
   > What did you mean by prepend, was it supposed to be pretend?
   
   It means that the commit should have `[AIRFLOW-XXX]` at the beginning of the message. I think the word `prepend` is correct here. Am I right, @mschickensoup?
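   
   For example, a documentation-only change like this PR can keep the placeholder subject `[AIRFLOW-XXX] Improve the PubSub documentation`, whereas a code change needs a real ticket reference, e.g. `[AIRFLOW-5718] Add SFTPToGoogleCloudStorageOperator`.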


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (AIRFLOW-5866) Task_instance table too large causing issues?

2019-11-07 Thread t oo (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated AIRFLOW-5866:
--
Description: 
mysql rds metastore - db.m5.large instance, 5.7.26 version

 

task_instance table has 2,848,160 rows

dag_run table has 22768 rows

dag table has 23 rows

log table has 17,916,891 rows

 

airflow 1.10.3, using LocalExecutor, python 2.7, single ec2 m5.4xlarge, 
parallelism set to 45 (ie max 45 tasks at once). Just using externally 
triggered dags, no SLAs. No subdags/backfills. 4 gunicorn workers. Using 
dynamic dags

 

Everything was fine until yesterday, with around 300 dag runs every day. Now today these issues appear all of a sudden (no code change, environment change, etc.). I suspect the task_instance table has gotten too big and is causing scheduler and mysql issues.

 

1.

'Recent tasks' are showing blank on the web ui home page. 
admin/airflow/task_stats fails to display with 504 error after few mins but 
dag_stats endpoint shows dags are in running state

 

2.

dag_runs are stuck in running state > 20 hrs, seems no new tasks are being run 
(they are stuck in scheduled/queued state)

 

I then tried terminating the EC2 and getting a new one, the dagruns and tasks 
would then start finishing but then after few hours got into same situation as 
points 1/2 above. I believe certain dag ids (with many tasks) are hitting the 
issue, will know m

 

 

You can see notable change in the graphs just after 4th november midnight (that 
is when the issue started). Around 30 dagruns (yes there are diff 
execution_dates running for same dagid at same time) start around 11pm each 
night.

 

Scheduler/webserver pids have remained up the entire time, no ec2 autoheals 
happened

 

 

scheduler log shows:

[2019-11-07 12:25:18,287] \{jobs.py:1185} INFO - DAG daga has 29/45 running and 
queued tasks
 [2019-11-07 12:25:18,287] \{jobs.py:1185} INFO - DAG daga has 30/45 running 
and queued tasks
 [2019-11-07 12:25:18,287] \{jobs.py:1185} INFO - DAG daga has 31/45 running 
and queued tasks
 [2019-11-07 12:25:18,287] \{jobs.py:1185} INFO - DAG daga has 32/45 running 
and queued tasks
 [2019-11-07 12:25:18,287] \{jobs.py:1185} INFO - DAG daga has 33/45 running 
and queued tasks
 [2019-11-07 12:25:18,287] \{jobs.py:1185} INFO - DAG daga has 34/45 running 
and queued tasks
 [2019-11-07 12:25:18,287] \{jobs.py:1185} INFO - DAG daga has 35/45 running 
and queued tasks
 [2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG dage has 0/45 running and 
queued tasks
 [2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG dage has 0/45 running and 
queued tasks
 [2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG daga has 36/45 running 
and queued tasks
 [2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG daga has 36/45 running 
and queued tasks
 [2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG daga has 36/45 running 
and queued tasks
 [2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG daga has 36/45 running 
and queued tasks
 [2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG daga has 36/45 running 
and queued tasks
 [2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG daga has 36/45 running 
and queued tasks
 [2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG daga has 36/45 running 
and queued tasks
 [2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG daga has 36/45 running 
and queued tasks
 [2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG daga has 36/45 running 
and queued tasks
 [2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG daga has 36/45 running 
and queued tasks
 [2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 36/45 running 
and queued tasks
 [2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 36/45 running 
and queued tasks
 [2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 37/45 running 
and queued tasks
 [2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 38/45 running 
and queued tasks
 [2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 39/45 running 
and queued tasks
 [2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 40/45 running 
and queued tasks
 [2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 41/45 running 
and queued tasks
 [2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 42/45 running 
and queued tasks
 [2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 43/45 running 
and queued tasks
 [2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 44/45 running 
and queued tasks
 [2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 45/45 running 
and queued tasks
 [2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 45/45 running 
and queued tasks
 [2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 45/45 running 
and queued tasks
 [2019-11-07 12:25:21,437] \{jobs.py:1185} INFO - DAG dage has 0/45 running and 
queued tasks
 [2019-11-07 12:25:42,193] 
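
One way to gather the table row counts quoted above is through Airflow's own metadata models; the following is only a sketch and assumes an Airflow 1.10.x installation configured against the same MySQL metastore:

{code:python}
# Sketch: count rows in the main Airflow metadata tables to check whether
# task_instance/log have grown unusually large.
from airflow.settings import Session
from airflow.models import DagModel, DagRun, Log, TaskInstance

session = Session()
try:
    for model in (TaskInstance, DagRun, DagModel, Log):
        print(model.__tablename__, session.query(model).count())
finally:
    session.close()
{code}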

[jira] [Updated] (AIRFLOW-5866) Task_instance table too large causing issues?

2019-11-07 Thread t oo (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated AIRFLOW-5866:
--
Description: 
mysql rds metastore - db.m5.large instance, 5.7.26 version

 

task_instance table has 2,848,160 rows

dag_run table has 22768 rows

dag table has 23 rows

log table has 17,916,891 rows

 

airflow 1.10.3, using LocalExecutor, python 2.7, single ec2 m5.4xlarge, 
parallelism set to 45 (ie max 45 tasks at once). Just using externally 
triggered dags, no SLAs. No subdags/backfills. 4 gunicorn workers. Using 
dynamic dags

 

Everything was fine until yesterday, with around 300 dag runs every day. Now today these issues appear all of a sudden (no code change, environment change, etc.). I suspect the task_instance table has gotten too big and is causing scheduler and mysql issues.

 

1.

'Recent tasks' are showing blank on the web ui home page. 
admin/airflow/task_stats fails to display with 504 error after few mins but 
dag_stats endpoint shows dags are in running state

 

2.

dag_runs are stuck in running state > 20 hrs, seems no new tasks are being run 
(they are stuck in scheduled/queued state)

 

I then tried terminating the EC2 and getting a new one, the dagruns and tasks 
would then start finishing but then after few hours got into same situation as 
points 1/2 above. I believe certain dag ids (with many tasks) are hitting the 
issue, will know m

 

 

You can see notable change in the graphs just after 4th november midnight (that 
is when the issue started). Around 30 dagruns (yes there are diff 
execution_dates running for same dagid at same time) start around 11pm each 
night.

 

Scheduler/webserver pids have remained up the entire time, no ec2 autoheals 
happened

 

 

scheduler log shows:

[2019-11-07 12:25:18,287] \{jobs.py:1185} INFO - DAG daga has 29/45 running and 
queued tasks
 [2019-11-07 12:25:18,287] \{jobs.py:1185} INFO - DAG daga has 30/45 running 
and queued tasks
 [2019-11-07 12:25:18,287] \{jobs.py:1185} INFO - DAG daga has 31/45 running 
and queued tasks
 [2019-11-07 12:25:18,287] \{jobs.py:1185} INFO - DAG daga has 32/45 running 
and queued tasks
 [2019-11-07 12:25:18,287] \{jobs.py:1185} INFO - DAG daga has 33/45 running 
and queued tasks
 [2019-11-07 12:25:18,287] \{jobs.py:1185} INFO - DAG daga has 34/45 running 
and queued tasks
 [2019-11-07 12:25:18,287] \{jobs.py:1185} INFO - DAG daga has 35/45 running 
and queued tasks
 [2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG dage has 0/45 running and 
queued tasks
 [2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG dage has 0/45 running and 
queued tasks
 [2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG daga has 36/45 running 
and queued tasks
 [2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG daga has 36/45 running 
and queued tasks
 [2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG daga has 36/45 running 
and queued tasks
 [2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG daga has 36/45 running 
and queued tasks
 [2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG daga has 36/45 running 
and queued tasks
 [2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG daga has 36/45 running 
and queued tasks
 [2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG daga has 36/45 running 
and queued tasks
 [2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG daga has 36/45 running 
and queued tasks
 [2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG daga has 36/45 running 
and queued tasks
 [2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG daga has 36/45 running 
and queued tasks
 [2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 36/45 running 
and queued tasks
 [2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 36/45 running 
and queued tasks
 [2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 37/45 running 
and queued tasks
 [2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 38/45 running 
and queued tasks
 [2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 39/45 running 
and queued tasks
 [2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 40/45 running 
and queued tasks
 [2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 41/45 running 
and queued tasks
 [2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 42/45 running 
and queued tasks
 [2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 43/45 running 
and queued tasks
 [2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 44/45 running 
and queued tasks
 [2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 45/45 running 
and queued tasks
 [2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 45/45 running 
and queued tasks
 [2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 45/45 running 
and queued tasks
 [2019-11-07 12:25:21,437] \{jobs.py:1185} INFO - DAG dage has 0/45 running and 
queued tasks
 [2019-11-07 12:25:42,193] 

[jira] [Updated] (AIRFLOW-5866) Task_instance table too large causing issues?

2019-11-07 Thread t oo (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated AIRFLOW-5866:
--
Description: 
mysql rds metastore - db.m5.large instance, 5.7.26 version

 

task_instance table has 2,848,160 rows

dag_run table has 22768 rows

dag table has 23 rows

log table has 17,916,891 rows

 

airflow 1.10.3, using LocalExecutor, python 2.7, single ec2 m5.4xlarge, 
parallelism set to 45 (ie max 45 tasks at once). Just using externally 
triggered dags, no SLAs. No subdags/backfills. 4 gunicorn workers. Using 
dynamic dags

 

Everything was fine until yesterday, with around 300 dag runs every day. Now today these issues appear all of a sudden (no code change, environment change, etc.). I suspect the task_instance table has gotten too big and is causing scheduler and mysql issues.

 

1.

'Recent tasks' are showing blank on the web ui home page. 
admin/airflow/task_stats fails to display with 504 error after few mins but 
dag_stats endpoint shows dags are in running state

 

2.

dag_runs are stuck in running state > 20 hrs, seems no new tasks are being run 
(they are stuck in scheduled/queued state)

 

I then tried terminating the EC2 and getting a new one, the dagruns and tasks 
would then start finishing but then after few hours got into same situation as 
points 1/2 above. I believe certain dag ids (with many tasks) are hitting the 
issue, will know m

 

 

You can see notable change in the graphs just after 4th november midnight (that 
is when the issue started). Around 30 dagruns (yes there are diff 
execution_dates running for same dagid at same time) start around 11pm each 
night.

 

Scheduler/webserver pids have remained up the entire time, no ec2 autoheals 
happened

 

 

scheduler log shows:

[2019-11-07 12:25:18,287] \{jobs.py:1185} INFO - DAG daga has 29/45 running and 
queued tasks
[2019-11-07 12:25:18,287] \{jobs.py:1185} INFO - DAG daga has 30/45 running and 
queued tasks
[2019-11-07 12:25:18,287] \{jobs.py:1185} INFO - DAG daga has 31/45 running and 
queued tasks
[2019-11-07 12:25:18,287] \{jobs.py:1185} INFO - DAG daga has 32/45 running and 
queued tasks
[2019-11-07 12:25:18,287] \{jobs.py:1185} INFO - DAG daga has 33/45 running and 
queued tasks
[2019-11-07 12:25:18,287] \{jobs.py:1185} INFO - DAG daga has 34/45 running and 
queued tasks
[2019-11-07 12:25:18,287] \{jobs.py:1185} INFO - DAG daga has 35/45 running and 
queued tasks
[2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG dage has 0/45 running and 
queued tasks
[2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG dage has 0/45 running and 
queued tasks
[2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG daga has 36/45 running and 
queued tasks
[2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG daga has 36/45 running and 
queued tasks
[2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG daga has 36/45 running and 
queued tasks
[2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG daga has 36/45 running and 
queued tasks
[2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG daga has 36/45 running and 
queued tasks
[2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG daga has 36/45 running and 
queued tasks
[2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG daga has 36/45 running and 
queued tasks
[2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG daga has 36/45 running and 
queued tasks
[2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG daga has 36/45 running and 
queued tasks
[2019-11-07 12:25:21,435] \{jobs.py:1185} INFO - DAG daga has 36/45 running and 
queued tasks
[2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 36/45 running and 
queued tasks
[2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 36/45 running and 
queued tasks
[2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 37/45 running and 
queued tasks
[2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 38/45 running and 
queued tasks
[2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 39/45 running and 
queued tasks
[2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 40/45 running and 
queued tasks
[2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 41/45 running and 
queued tasks
[2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 42/45 running and 
queued tasks
[2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 43/45 running and 
queued tasks
[2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 44/45 running and 
queued tasks
[2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 45/45 running and 
queued tasks
[2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 45/45 running and 
queued tasks
[2019-11-07 12:25:21,436] \{jobs.py:1185} INFO - DAG daga has 45/45 running and 
queued tasks
[2019-11-07 12:25:21,437] \{jobs.py:1185} INFO - DAG dage has 0/45 running and 
queued tasks
[2019-11-07 12:25:42,193] \{jobs.py:1185} INFO - DAG dagb has 0/45 

[jira] [Updated] (AIRFLOW-5866) Task_instance table too large causing issues?

2019-11-07 Thread t oo (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated AIRFLOW-5866:
--
Description: 
mysql rds metastore - db.m5.large instance, 5.7.26 version

 

task_instance table has 2,848,160 rows

dag_run table has 22768 rows

dag table has 23 rows

log table has 17,916,891 rows

 

airflow 1.10.3, using LocalExecutor, python 2.7, single ec2 m5.4xlarge, 
parallelism set to 45 (ie max 45 tasks at once). Just using externally 
triggered dags, no SLAs. No subdags/backfills. 4 gunicorn workers. Using 
dynamic dags

 

Everything was fine until yesterday, with around 300 dag runs every day. Now today these issues appear all of a sudden (no code change, environment change, etc.). I suspect the task_instance table has gotten too big and is causing scheduler and mysql issues.

 

1.

'Recent tasks' are showing blank on the web ui home page. 
admin/airflow/task_stats fails to display with 504 error after few mins but 
dag_stats endpoint shows dags are in running state

 

2.

dag_runs are stuck in running state > 20 hrs, seems no new tasks are being run 
(they are stuck in scheduled/queued state)

 

I then tried terminating the EC2 and getting a new one, the dagruns and tasks 
would then start finishing but then after few hours got into same situation as 
points 1/2 above. I believe certain dag ids (with many tasks) are hitting the 
issue, will know m

 

 

You can see notable change in the graphs just after 4th november midnight (that 
is when the issue started). Around 30 dagruns (yes there are diff 
execution_dates running for same dagid at same time) start around 11pm each 
night.

 

Scheduler/webserver pids have remained up the entire time, no ec2 autoheals 
happened

 

  was:
mysql rds metastore - db.m5.large instance, 5.7.26 version

 

task_instance table has 2,848,160 rows

dag_run table has 22768 rows

dag table has 23 rows

log table has 17,916,891 rows

 

airflow 1.10.3, using LocalExecutor, python 2.7, single ec2 m5.4xlarge, 
parallelism set to 45 (ie max 45 tasks at once). Just using externally 
triggered dags, no SLAs. No subdags/backfills. 4 gunicorn workers. Using 
dynamic dags

 

Everything was fine until yesterday, with around 300 dag runs every day. Now today these issues appear all of a sudden (no code change, environment change, etc.). I suspect the task_instance table has gotten too big and is causing scheduler and mysql issues.

 

1.

'Recent tasks' are showing blank on the web ui home page. 
admin/airflow/task_stats fails to display with 504 error after few mins but 
dag_stats endpoint shows dags are in running state

 

2.

dag_runs are stuck in running state > 20 hrs, seems no new tasks are being run 
(they are stuck in scheduled/queued state)

 

I then tried terminating the EC2 and getting a new one, the dagruns and tasks 
would then start finishing but then after few hours got into same situation as 
points 1/2 above. I believe certain dag ids (with many tasks) are hitting the 
issue, will know m

 

 

You can see notable change in the graphs just after 4th november midnight (that 
is when the issue started). Around 30 dagruns (yes there are diff 
execution_dates running for same dagid at same time) start around 11pm each 
night.

 

 


> Task_instance table too large causing issues?
> -
>
> Key: AIRFLOW-5866
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5866
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: database, scheduler
>Affects Versions: 1.10.3
>Reporter: t oo
>Priority: Major
> Attachments: EC2's `ps -ef pipe wc -l` count.png, Mysql 
> queuedepth.png, mysql cpu_util.png, mysql dbconnections.png, mysql read 
> latency.png, mysql write IOPS.png, mysql write latency.png
>
>
> mysql rds metastore - db.m5.large instance, 5.7.26 version
>  
> task_instance table has 2,848,160 rows
> dag_run table has 22768 rows
> dag table has 23 rows
> log table has 17,916,891 rows
>  
> airflow 1.10.3, using LocalExecutor, python 2.7, single ec2 m5.4xlarge, 
> parallelism set to 45 (ie max 45 tasks at once). Just using externally 
> triggered dags, no SLAs. No subdags/backfills. 4 gunicorn workers. Using 
> dynamic dags
>  
> Everything was fine until yesterday, around 300 dag runs every day. Now today 
> these issues appear all of a sudden (no code change, environment change.etc). 
> I suspect the task_instance table has gotten too big and causing scheduler 
> and mysql issues.
>  
> 1.
> 'Recent tasks' are showing blank on the web ui home page. 
> admin/airflow/task_stats fails to display with 504 error after few mins but 
> dag_stats endpoint shows dags are in running state
>  
> 2.
> dag_runs are stuck in running state > 20 hrs, seems no new tasks are being 
> run (they are stuck in scheduled/queued state)
>  
> I then tried 

[jira] [Updated] (AIRFLOW-5866) Task_instance table too large causing issues?

2019-11-07 Thread t oo (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated AIRFLOW-5866:
--
Attachment: EC2's `ps -ef pipe wc -l` count.png

> Task_instance table too large causing issues?
> -
>
> Key: AIRFLOW-5866
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5866
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: database, scheduler
>Affects Versions: 1.10.3
>Reporter: t oo
>Priority: Major
> Attachments: EC2's `ps -ef pipe wc -l` count.png, Mysql 
> queuedepth.png, mysql cpu_util.png, mysql dbconnections.png, mysql read 
> latency.png, mysql write IOPS.png, mysql write latency.png
>
>
> mysql rds metastore - db.m5.large instance, 5.7.26 version
>  
> task_instance table has 2,848,160 rows
> dag_run table has 22768 rows
> dag table has 23 rows
> log table has 17,916,891 rows
>  
> airflow 1.10.3, using LocalExecutor, python 2.7, single ec2 m5.4xlarge, 
> parallelism set to 45 (ie max 45 tasks at once). Just using externally 
> triggered dags, no SLAs. No subdags/backfills. 4 gunicorn workers. Using 
> dynamic dags
>  
> Everything was fine until yesterday, around 300 dag runs every day. Now today 
> these issues appear all of a sudden (no code change, environment change.etc). 
> I suspect the task_instance table has gotten too big and causing scheduler 
> and mysql issues.
>  
> 1.
> 'Recent tasks' are showing blank on the web ui home page. 
> admin/airflow/task_stats fails to display with 504 error after few mins but 
> dag_stats endpoint shows dags are in running state
>  
> 2.
> dag_runs are stuck in running state > 20 hrs, seems no new tasks are being 
> run (they are stuck in scheduled/queued state)
>  
> I then tried terminating the EC2 and getting a new one, the dagruns and tasks 
> would then start finishing but then after few hours got into same situation 
> as points 1/2 above. I believe certain dag ids (with many tasks) are hitting 
> the issue, will know m
>  
>  
> You can see notable change in the graphs just after 4th november midnight 
> (that is when the issue started). Around 30 dagruns (yes there are diff 
> execution_dates running for same dagid at same time) start around 11pm each 
> night.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (AIRFLOW-5866) Task_instance table too large causing issues?

2019-11-07 Thread t oo (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated AIRFLOW-5866:
--
Attachment: mysql cpu_util.png

> Task_instance table too large causing issues?
> -
>
> Key: AIRFLOW-5866
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5866
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: database, scheduler
>Affects Versions: 1.10.3
>Reporter: t oo
>Priority: Major
> Attachments: Mysql queuedepth.png, mysql cpu_util.png, mysql 
> dbconnections.png, mysql read latency.png, mysql write IOPS.png, mysql write 
> latency.png
>
>
> mysql rds metastore - db.m5.large instance, 5.7.26 version
>  
> task_instance table has 2,848,160 rows
> dag_run table has 22768 rows
> dag table has 23 rows
> log table has 17,916,891 rows
>  
> airflow 1.10.3, using LocalExecutor, python 2.7, single ec2 m5.4xlarge, 
> parallelism set to 45 (ie max 45 tasks at once). Just using externally 
> triggered dags, no SLAs. No subdags/backfills. 4 gunicorn workers. Using 
> dynamic dags
>  
> Everything was fine until yesterday, around 300 dag runs every day. Now today 
> these issues appear all of a sudden (no code change, environment change.etc). 
> I suspect the task_instance table has gotten too big and causing scheduler 
> and mysql issues.
>  
> 1.
> 'Recent tasks' are showing blank on the web ui home page. 
> admin/airflow/task_stats fails to display with 504 error after few mins but 
> dag_stats endpoint shows dags are in running state
>  
> 2.
> dag_runs are stuck in running state > 20 hrs, seems no new tasks are being 
> run (they are stuck in scheduled/queued state)
>  
> I then tried terminating the EC2 and getting a new one, the dagruns and tasks 
> would then start finishing but then after few hours got into same situation 
> as points 1/2 above. I believe certain dag ids (with many tasks) are hitting 
> the issue, will know m
>  
>  
> You can see notable change in the graphs just after 4th november midnight 
> (that is when the issue started). Around 30 dagruns (yes there are diff 
> execution_dates running for same dagid at same time) start around 11pm each 
> night.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (AIRFLOW-5866) Task_instance table too large causing issues?

2019-11-07 Thread t oo (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated AIRFLOW-5866:
--
Attachment: mysql write latency.png

> Task_instance table too large causing issues?
> -
>
> Key: AIRFLOW-5866
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5866
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: database, scheduler
>Affects Versions: 1.10.3
>Reporter: t oo
>Priority: Major
> Attachments: Mysql queuedepth.png, mysql dbconnections.png, mysql 
> read latency.png, mysql write IOPS.png, mysql write latency.png
>
>
> mysql rds metastore - db.m5.large instance, 5.7.26 version
>  
> task_instance table has 2,848,160 rows
> dag_run table has 22768 rows
> dag table has 23 rows
> log table has 17,916,891 rows
>  
> airflow 1.10.3, using LocalExecutor, python 2.7, single ec2 m5.4xlarge, 
> parallelism set to 45 (ie max 45 tasks at once). Just using externally 
> triggered dags, no SLAs. No subdags/backfills. 4 gunicorn workers. Using 
> dynamic dags
>  
> Everything was fine until yesterday, around 300 dag runs every day. Now today 
> these issues appear all of a sudden (no code change, environment change.etc). 
> I suspect the task_instance table has gotten too big and causing scheduler 
> and mysql issues.
>  
> 1.
> 'Recent tasks' are showing blank on the web ui home page. 
> admin/airflow/task_stats fails to display with 504 error after few mins but 
> dag_stats endpoint shows dags are in running state
>  
> 2.
> dag_runs are stuck in running state > 20 hrs, seems no new tasks are being 
> run (they are stuck in scheduled/queued state)
>  
> I then tried terminating the EC2 and getting a new one, the dagruns and tasks 
> would then start finishing but then after few hours got into same situation 
> as points 1/2 above. I believe certain dag ids (with many tasks) are hitting 
> the issue, will know m
>  
>  
> You can see notable change in the graphs just after 4th november midnight 
> (that is when the issue started). Around 30 dagruns (yes there are diff 
> execution_dates running for same dagid at same time) start around 11pm each 
> night.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (AIRFLOW-5866) Task_instance table too large causing issues?

2019-11-07 Thread t oo (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated AIRFLOW-5866:
--
Attachment: mysql read latency.png

> Task_instance table too large causing issues?
> -
>
> Key: AIRFLOW-5866
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5866
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: database, scheduler
>Affects Versions: 1.10.3
>Reporter: t oo
>Priority: Major
> Attachments: Mysql queuedepth.png, mysql dbconnections.png, mysql 
> read latency.png, mysql write IOPS.png
>
>
> mysql rds metastore - db.m5.large instance, 5.7.26 version
>  
> task_instance table has 2,848,160 rows
> dag_run table has 22768 rows
> dag table has 23 rows
> log table has 17,916,891 rows
>  
> airflow 1.10.3, using LocalExecutor, python 2.7, single ec2 m5.4xlarge, 
> parallelism set to 45 (ie max 45 tasks at once). Just using externally 
> triggered dags, no SLAs. No subdags/backfills. 4 gunicorn workers. Using 
> dynamic dags
>  
> Everything was fine until yesterday, around 300 dag runs every day. Now today 
> these issues appear all of a sudden (no code change, environment change.etc). 
> I suspect the task_instance table has gotten too big and causing scheduler 
> and mysql issues.
>  
> 1.
> 'Recent tasks' are showing blank on the web ui home page. 
> admin/airflow/task_stats fails to display with 504 error after few mins but 
> dag_stats endpoint shows dags are in running state
>  
> 2.
> dag_runs are stuck in running state > 20 hrs, seems no new tasks are being 
> run (they are stuck in scheduled/queued state)
>  
> I then tried terminating the EC2 and getting a new one, the dagruns and tasks 
> would then start finishing but then after few hours got into same situation 
> as points 1/2 above. I believe certain dag ids (with many tasks) are hitting 
> the issue, will know m
>  
>  
> You can see notable change in the graphs just after 4th november midnight 
> (that is when the issue started). Around 30 dagruns (yes there are diff 
> execution_dates running for same dagid at same time) start around 11pm each 
> night.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (AIRFLOW-5866) Task_instance table too large causing issues?

2019-11-07 Thread t oo (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated AIRFLOW-5866:
--
Description: 
mysql rds metastore - db.m5.large instance, 5.7.26 version

 

task_instance table has 2,848,160 rows

dag_run table has 22768 rows

dag table has 23 rows

log table has 17,916,891 rows

 

airflow 1.10.3, using LocalExecutor, python 2.7, single ec2 m5.4xlarge, 
parallelism set to 45 (ie max 45 tasks at once). Just using externally 
triggered dags, no SLAs. No subdags/backfills. 4 gunicorn workers. Using 
dynamic dags

 

Everything was fine until yesterday, with around 300 dag runs every day. Now today these issues appear all of a sudden (no code change, environment change, etc.). I suspect the task_instance table has gotten too big and is causing scheduler and mysql issues.

 

1.

'Recent tasks' are showing blank on the web ui home page. 
admin/airflow/task_stats fails to display with 504 error after few mins but 
dag_stats endpoint shows dags are in running state

 

2.

dag_runs are stuck in running state > 20 hrs, seems no new tasks are being run 
(they are stuck in scheduled/queued state)

 

I then tried terminating the EC2 and getting a new one, the dagruns and tasks 
would then start finishing but then after few hours got into same situation as 
points 1/2 above. I believe certain dag ids (with many tasks) are hitting the 
issue, will know m

 

 

You can see notable change in the graphs just after 4th november midnight (that 
is when the issue started). Around 30 dagruns (yes there are diff 
execution_dates running for same dagid at same time) start around 11pm each 
night.

 

 

  was:
mysql rds metastore - db.m5.large instance, 5.7.26 version

 

task_instance table has 2,848,160 rows

dag_run table has 22768 rows

dag table has 23 rows

log table has 17,916,891 rows

 

airflow 1.10.3, using LocalExecutor, python 2.7, single ec2 m5.4xlarge, 
parallelism set to 45 (ie max 45 tasks at once). Just using externally 
triggered dags, no SLAs. No subdags/backfills. 4 gunicorn workers. Using 
dynamic dags

 

Everything was fine until yesterday, with around 300 dag runs every day. Now today these issues appear all of a sudden (no code change, environment change, etc.). I suspect the task_instance table has gotten too big and is causing scheduler and mysql issues.

 

1.

'Recent tasks' are showing blank on the web ui home page. 
admin/airflow/task_stats fails to display with 504 error after few mins but 
dag_stats endpoint shows dags are in running state

 

2.

dag_runs are stuck in running state > 20 hrs, seems no new tasks are being run 
(they are stuck in scheduled/queued state)

 

I then tried terminating the EC2 and getting a new one, the dagruns and tasks 
would then start finishing but then after few hours got into same situation as 
points 1/2 above. I believe certain dag ids (with many tasks) are hitting the 
issue, will know m

 

 


> Task_instance table too large causing issues?
> -
>
> Key: AIRFLOW-5866
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5866
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: database, scheduler
>Affects Versions: 1.10.3
>Reporter: t oo
>Priority: Major
> Attachments: Mysql queuedepth.png, mysql dbconnections.png, mysql 
> write IOPS.png
>
>
> mysql rds metastore - db.m5.large instance, 5.7.26 version
>  
> task_instance table has 2,848,160 rows
> dag_run table has 22768 rows
> dag table has 23 rows
> log table has 17,916,891 rows
>  
> airflow 1.10.3, using LocalExecutor, python 2.7, single ec2 m5.4xlarge, 
> parallelism set to 45 (ie max 45 tasks at once). Just using externally 
> triggered dags, no SLAs. No subdags/backfills. 4 gunicorn workers. Using 
> dynamic dags
>  
> Everything was fine until yesterday, around 300 dag runs every day. Now today 
> these issues appear all of a sudden (no code change, environment change.etc). 
> I suspect the task_instance table has gotten too big and causing scheduler 
> and mysql issues.
>  
> 1.
> 'Recent tasks' are showing blank on the web ui home page. 
> admin/airflow/task_stats fails to display with 504 error after few mins but 
> dag_stats endpoint shows dags are in running state
>  
> 2.
> dag_runs are stuck in running state > 20 hrs, seems no new tasks are being 
> run (they are stuck in scheduled/queued state)
>  
> I then tried terminating the EC2 and getting a new one, the dagruns and tasks 
> would then start finishing but then after few hours got into same situation 
> as points 1/2 above. I believe certain dag ids (with many tasks) are hitting 
> the issue, will know m
>  
>  
> You can see notable change in the graphs just after 4th november midnight 
> (that is when the issue started). Around 30 dagruns (yes there are diff 
> execution_dates running for same 

[GitHub] [airflow] TobKed commented on a change in pull request #6393: [AIRFLOW-5718] Add SFTPToGoogleCloudStorageOperator

2019-11-07 Thread GitBox
TobKed commented on a change in pull request #6393: [AIRFLOW-5718] Add 
SFTPToGoogleCloudStorageOperator
URL: https://github.com/apache/airflow/pull/6393#discussion_r343803655
 
 

 ##
 File path: airflow/operators/sftp_to_gcs.py
 ##
 @@ -0,0 +1,178 @@
+#
 
 Review comment:
   I've moved it to `providers/google/cloud`. All related files as well.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (AIRFLOW-5866) Task_instance table too large causing issues?

2019-11-07 Thread t oo (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated AIRFLOW-5866:
--
Attachment: mysql dbconnections.png

> Task_instance table too large causing issues?
> -
>
> Key: AIRFLOW-5866
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5866
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: database, scheduler
>Affects Versions: 1.10.3
>Reporter: t oo
>Priority: Major
> Attachments: Mysql queuedepth.png, mysql dbconnections.png, mysql 
> write IOPS.png
>
>
> mysql rds metastore - db.m5.large instance, 5.7.26 version
>  
> task_instance table has 2,848,160 rows
> dag_run table has 22768 rows
> dag table has 23 rows
> log table has 17,916,891 rows
>  
> airflow 1.10.3, using LocalExecutor, python 2.7, single ec2 m5.4xlarge, 
> parallelism set to 45 (ie max 45 tasks at once). Just using externally 
> triggered dags, no SLAs. No subdags/backfills. 4 gunicorn workers. Using 
> dynamic dags
>  
> Everything was fine until yesterday, around 300 dag runs every day. Now today 
> these issues appear all of a sudden (no code change, environment change.etc). 
> I suspect the task_instance table has gotten too big and causing scheduler 
> and mysql issues.
>  
> 1.
> 'Recent tasks' are showing blank on the web ui home page. 
> admin/airflow/task_stats fails to display with 504 error after few mins but 
> dag_stats endpoint shows dags are in running state
>  
> 2.
> dag_runs are stuck in running state > 20 hrs, seems no new tasks are being 
> run (they are stuck in scheduled/queued state)
>  
> I then tried terminating the EC2 and getting a new one, the dagruns and tasks 
> would then start finishing but then after few hours got into same situation 
> as points 1/2 above. I believe certain dag ids (with many tasks) are hitting 
> the issue, will know m
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (AIRFLOW-5866) Task_instance table too large causing issues?

2019-11-07 Thread t oo (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated AIRFLOW-5866:
--
Attachment: mysql write IOPS.png

> Task_instance table too large causing issues?
> -
>
> Key: AIRFLOW-5866
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5866
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: database, scheduler
>Affects Versions: 1.10.3
>Reporter: t oo
>Priority: Major
> Attachments: Mysql queuedepth.png, mysql write IOPS.png
>
>
> mysql rds metastore - db.m5.large instance, 5.7.26 version
>  
> task_instance table has 2,848,160 rows
> dag_run table has 22768 rows
> dag table has 23 rows
> log table has 17,916,891 rows
>  
> airflow 1.10.3, using LocalExecutor, python 2.7, single ec2 m5.4xlarge, 
> parallelism set to 45 (ie max 45 tasks at once). Just using externally 
> triggered dags, no SLAs. No subdags/backfills. 4 gunicorn workers. Using 
> dynamic dags
>  
> Everything was fine until yesterday, around 300 dag runs every day. Now today 
> these issues appear all of a sudden (no code change, environment change.etc). 
> I suspect the task_instance table has gotten too big and causing scheduler 
> and mysql issues.
>  
> 1.
> 'Recent tasks' are showing blank on the web ui home page. 
> admin/airflow/task_stats fails to display with 504 error after few mins but 
> dag_stats endpoint shows dags are in running state
>  
> 2.
> dag_runs are stuck in running state > 20 hrs, seems no new tasks are being 
> run (they are stuck in scheduled/queued state)
>  
> I then tried terminating the EC2 and getting a new one, the dagruns and tasks 
> would then start finishing but then after few hours got into same situation 
> as points 1/2 above. I believe certain dag ids (with many tasks) are hitting 
> the issue, will know m
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] codecov-io edited a comment on issue #6473: [AIRFLOW-5819] Update AWSBatchOperator default value

2019-11-07 Thread GitBox
codecov-io edited a comment on issue #6473: [AIRFLOW-5819] Update 
AWSBatchOperator default value
URL: https://github.com/apache/airflow/pull/6473#issuecomment-548057865
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/6473?src=pr=h1) 
Report
   > Merging 
[#6473](https://codecov.io/gh/apache/airflow/pull/6473?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/58060d3d64a47cd348560a0fb7f821fde2eb08c1?src=pr=desc)
 will **decrease** coverage by `0.2%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/6473/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/6473?src=pr=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #6473      +/-   ##
   ==========================================
   - Coverage   83.82%   83.62%   -0.21%
   ==========================================
     Files         635      635
     Lines       36657    36716      +59
   ==========================================
   - Hits        30727    30702      -25
   - Misses       5930     6014      +84
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/6473?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/contrib/operators/awsbatch\_operator.py](https://codecov.io/gh/apache/airflow/pull/6473/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9hd3NiYXRjaF9vcGVyYXRvci5weQ==)
 | `78.4% <100%> (ø)` | :arrow_up: |
   | 
[airflow/operators/postgres\_operator.py](https://codecov.io/gh/apache/airflow/pull/6473/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvcG9zdGdyZXNfb3BlcmF0b3IucHk=)
 | `0% <0%> (-100%)` | :arrow_down: |
   | 
[airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/6473/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==)
 | `44.44% <0%> (-55.56%)` | :arrow_down: |
   | 
[airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/6473/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==)
 | `52.94% <0%> (-47.06%)` | :arrow_down: |
   | 
[airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/6473/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==)
 | `45.25% <0%> (-46.72%)` | :arrow_down: |
   | 
[airflow/kubernetes/kube\_client.py](https://codecov.io/gh/apache/airflow/pull/6473/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL2t1YmVfY2xpZW50LnB5)
 | `33.33% <0%> (-41.67%)` | :arrow_down: |
   | 
[...rflow/contrib/operators/kubernetes\_pod\_operator.py](https://codecov.io/gh/apache/airflow/pull/6473/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZF9vcGVyYXRvci5weQ==)
 | `70.14% <0%> (-28.36%)` | :arrow_down: |
   | 
[airflow/hooks/postgres\_hook.py](https://codecov.io/gh/apache/airflow/pull/6473/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9wb3N0Z3Jlc19ob29rLnB5)
 | `92.85% <0%> (-3.64%)` | :arrow_down: |
   | 
[airflow/operators/subdag\_operator.py](https://codecov.io/gh/apache/airflow/pull/6473/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvc3ViZGFnX29wZXJhdG9yLnB5)
 | `94.89% <0%> (-1.88%)` | :arrow_down: |
   | 
[airflow/jobs/backfill\_job.py](https://codecov.io/gh/apache/airflow/pull/6473/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzL2JhY2tmaWxsX2pvYi5weQ==)
 | `89.9% <0%> (-1.53%)` | :arrow_down: |
   | ... and [34 
more](https://codecov.io/gh/apache/airflow/pull/6473/diff?src=pr=tree-more) 
| |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/6473?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/6473?src=pr=footer). 
Last update 
[58060d3...3eb4cb3](https://codecov.io/gh/apache/airflow/pull/6473?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (AIRFLOW-5866) Task_instance table too large causing issues?

2019-11-07 Thread t oo (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated AIRFLOW-5866:
--
Attachment: Mysql queuedepth.png

> Task_instance table too large causing issues?
> -
>
> Key: AIRFLOW-5866
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5866
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: database, scheduler
>Affects Versions: 1.10.3
>Reporter: t oo
>Priority: Major
> Attachments: Mysql queuedepth.png
>
>
> mysql rds metastore - db.m5.large instance, 5.7.26 version
>  
> task_instance table has 2,848,160 rows
> dag_run table has 22768 rows
> dag table has 23 rows
> log table has 17,916,891 rows
>  
> airflow 1.10.3, using LocalExecutor, python 2.7, single ec2 m5.4xlarge, 
> parallelism set to 45 (ie max 45 tasks at once). Just using externally 
> triggered dags, no SLAs. No subdags/backfills. 4 gunicorn workers. Using 
> dynamic dags
>  
> Everything was fine until yesterday, around 300 dag runs every day. Now today 
> these issues appear all of a sudden (no code change, environment change.etc). 
> I suspect the task_instance table has gotten too big and causing scheduler 
> and mysql issues.
>  
> 1.
> 'Recent tasks' are showing blank on the web ui home page. 
> admin/airflow/task_stats fails to display with 504 error after few mins but 
> dag_stats endpoint shows dags are in running state
>  
> 2.
> dag_runs are stuck in running state > 20 hrs, seems no new tasks are being 
> run (they are stuck in scheduled/queued state)
>  
> I then tried terminating the EC2 and getting a new one, the dagruns and tasks 
> would then start finishing but then after few hours got into same situation 
> as points 1/2 above. I believe certain dag ids (with many tasks) are hitting 
> the issue, will know m
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] codecov-io edited a comment on issue #6473: [AIRFLOW-5819] Update AWSBatchOperator default value

2019-11-07 Thread GitBox
codecov-io edited a comment on issue #6473: [AIRFLOW-5819] Update 
AWSBatchOperator default value
URL: https://github.com/apache/airflow/pull/6473#issuecomment-548057865
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/6473?src=pr=h1) 
Report
   > Merging 
[#6473](https://codecov.io/gh/apache/airflow/pull/6473?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/58060d3d64a47cd348560a0fb7f821fde2eb08c1?src=pr=desc)
 will **decrease** coverage by `0.2%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/6473/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/6473?src=pr=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #6473      +/-   ##
   ==========================================
   - Coverage   83.82%   83.62%   -0.21%
   ==========================================
     Files         635      635
     Lines       36657    36716      +59
   ==========================================
   - Hits        30727    30702      -25
   - Misses       5930     6014      +84
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/6473?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/contrib/operators/awsbatch\_operator.py](https://codecov.io/gh/apache/airflow/pull/6473/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9hd3NiYXRjaF9vcGVyYXRvci5weQ==)
 | `78.4% <100%> (ø)` | :arrow_up: |
   | 
[airflow/operators/postgres\_operator.py](https://codecov.io/gh/apache/airflow/pull/6473/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvcG9zdGdyZXNfb3BlcmF0b3IucHk=)
 | `0% <0%> (-100%)` | :arrow_down: |
   | 
[airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/6473/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==)
 | `44.44% <0%> (-55.56%)` | :arrow_down: |
   | 
[airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/6473/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==)
 | `52.94% <0%> (-47.06%)` | :arrow_down: |
   | 
[airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/6473/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==)
 | `45.25% <0%> (-46.72%)` | :arrow_down: |
   | 
[airflow/kubernetes/kube\_client.py](https://codecov.io/gh/apache/airflow/pull/6473/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL2t1YmVfY2xpZW50LnB5)
 | `33.33% <0%> (-41.67%)` | :arrow_down: |
   | 
[...rflow/contrib/operators/kubernetes\_pod\_operator.py](https://codecov.io/gh/apache/airflow/pull/6473/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZF9vcGVyYXRvci5weQ==)
 | `70.14% <0%> (-28.36%)` | :arrow_down: |
   | 
[airflow/hooks/postgres\_hook.py](https://codecov.io/gh/apache/airflow/pull/6473/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9wb3N0Z3Jlc19ob29rLnB5)
 | `92.85% <0%> (-3.64%)` | :arrow_down: |
   | 
[airflow/operators/subdag\_operator.py](https://codecov.io/gh/apache/airflow/pull/6473/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvc3ViZGFnX29wZXJhdG9yLnB5)
 | `94.89% <0%> (-1.88%)` | :arrow_down: |
   | 
[airflow/jobs/backfill\_job.py](https://codecov.io/gh/apache/airflow/pull/6473/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzL2JhY2tmaWxsX2pvYi5weQ==)
 | `89.9% <0%> (-1.53%)` | :arrow_down: |
   | ... and [34 
more](https://codecov.io/gh/apache/airflow/pull/6473/diff?src=pr=tree-more) 
| |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/6473?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/6473?src=pr=footer). 
Last update 
[58060d3...3eb4cb3](https://codecov.io/gh/apache/airflow/pull/6473?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (AIRFLOW-5866) Task_instance table too large causing issues?

2019-11-07 Thread t oo (Jira)
t oo created AIRFLOW-5866:
-

 Summary: Task_instance table too large causing issues?
 Key: AIRFLOW-5866
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5866
 Project: Apache Airflow
  Issue Type: Bug
  Components: database, scheduler
Affects Versions: 1.10.3
Reporter: t oo


MySQL RDS metastore: db.m5.large instance, version 5.7.26

 

task_instance table has 2,848,160 rows

dag_run table has 22768 rows

dag table has 23 rows

log table has 17,916,891 rows

 

Airflow 1.10.3, using LocalExecutor, Python 2.7, single EC2 m5.4xlarge, 
parallelism set to 45 (i.e. max 45 tasks at once). Just using externally 
triggered DAGs, no SLAs. No subdags/backfills. 4 gunicorn workers. Using 
dynamic DAGs.

 

Everything was fine until yesterday, with around 300 DAG runs every day. Now today 
these issues appeared all of a sudden (no code change, environment change, etc.). I 
suspect the task_instance table has gotten too big and is causing scheduler and 
MySQL issues.

 

1.

'Recent tasks' shows blank on the web UI home page. The 
admin/airflow/task_stats endpoint fails with a 504 error after a few minutes, but the 
dag_stats endpoint shows DAGs in the running state

 

2.

dag_runs are stuck in the running state for > 20 hrs; it seems no new tasks are being 
run (they are stuck in the scheduled/queued state)

 

I then tried terminating the EC2 instance and getting a new one; the DAG runs and 
tasks would then start finishing, but after a few hours it got into the same 
situation as points 1/2 above. I believe certain DAG ids (with many tasks) are 
hitting the issue, will know m
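
For anyone hitting the same growth problem, a minimal sketch of one way to check 
table sizes and prune old rows directly against the metadata database. The 
connection URL, retention window and batch size below are illustrative 
assumptions, not values from this report; back up the database before deleting 
anything.

{code:python}
# Illustrative sketch only: inspect and prune Airflow metadata tables.
# Assumes SQLAlchemy + PyMySQL and a placeholder metadata DB URL.
from datetime import datetime, timedelta

from sqlalchemy import create_engine, text

engine = create_engine("mysql+pymysql://airflow:PASSWORD@metadata-host/airflow")
cutoff = datetime.utcnow() - timedelta(days=90)  # assumed retention window

with engine.begin() as conn:
    # print current row counts of the tables mentioned above
    for table in ("task_instance", "dag_run", "log"):
        rows = conn.execute(text("SELECT COUNT(*) FROM " + table)).scalar()
        print("{}: {}".format(table, rows))
    # delete old audit rows in small batches to keep the transaction manageable
    conn.execute(
        text("DELETE FROM log WHERE dttm < :cutoff LIMIT 10000"),
        {"cutoff": cutoff},
    )
{code}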

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] TobKed commented on a change in pull request #6393: [AIRFLOW-5718] Add SFTPToGoogleCloudStorageOperator

2019-11-07 Thread GitBox
TobKed commented on a change in pull request #6393: [AIRFLOW-5718] Add 
SFTPToGoogleCloudStorageOperator
URL: https://github.com/apache/airflow/pull/6393#discussion_r343784244
 
 

 ##
 File path: docs/howto/operator/gcp/sftp_to_gcs.rst
 ##
 @@ -0,0 +1,105 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+
+SFTP to Google Cloud Storage Transfer Operator
+==
+
+Google has a service `Google Cloud Storage 
`__. This service is
+used to store large amounts of data from various applications.
+SFTP (SSH File Transfer Protocol) is a secure file transfer protocol.
+It runs over the SSH protocol. It supports the full security and 
authentication functionality of SSH.
+
+
+.. contents::
+  :depth: 1
+  :local:
+
+Prerequisite Tasks
+^^
+
+.. include:: _partials/prerequisite_tasks.rst
+
+.. _howto/operator:SFTPToGoogleCloudStorageOperator:
+
+Operator
+
+
+Transferring files between SFTP and Google Cloud Storage is performed with the
+:class:`~airflow.operators.sftp_to_gcs.SFTPToGoogleCloudStorageOperator` 
operator.
+
+Use :ref:`Jinja templating ` with
+:template-fields:`airflow.operators.sftp_to_gcs.SFTPToGoogleCloudStorageOperator`
+to define values dynamically.
+
+Copying single files
+
+
+The following Operator copies a single file.
+
+.. exampleinclude:: ../../../../airflow/example_dags/example_sftp_to_gcs.py
+:language: python
+:dedent: 4
+:start-after: [START howto_operator_sftp_to_gcs_copy_single_file]
+:end-before: [END howto_operator_sftp_to_gcs_copy_single_file]
+
+Moving a single file
+
+
+To move the file, use the ``move_object`` parameter. Once the file is copied to 
Google Cloud Storage,
+the original file is deleted from the SFTP server.
+The ``destination_path`` parameter defines the full path of the file in the 
bucket.
+
+.. exampleinclude:: ../../../../airflow/example_dags/example_sftp_to_gcs.py
+:language: python
+:dedent: 4
+:start-after: [START 
howto_operator_sftp_to_gcs_move_single_file_destination]
+:end-before: [END howto_operator_sftp_to_gcs_move_single_file_destination]
+
+
+Copying directory
+-
+
+Use the ``wildcard`` in ``source_path`` parameter to copy the directory.
 
 Review comment:
   Sure. Information will be added. It is important to mention it.
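
   For readers following along, here is a rough usage sketch assembled from the 
   operator and parameters quoted in this PR's docs and tests. The DAG id, 
   schedule, bucket name and connection ids are placeholders, not values from the PR.

```python
# Rough sketch of wiring the new operator into a DAG; names below are placeholders.
from airflow import DAG
from airflow.operators.sftp_to_gcs import SFTPToGoogleCloudStorageOperator
from airflow.utils.dates import days_ago

with DAG("example_sftp_to_gcs_usage", start_date=days_ago(1), schedule_interval=None) as dag:
    copy_dir = SFTPToGoogleCloudStorageOperator(
        task_id="sftp_dir_to_gcs",
        source_path="main_dir/csv/*",        # wildcard copies everything under the directory
        destination_bucket="example-bucket",
        destination_path="destination_dir/",
        move_object=False,                   # True would delete the SFTP originals after upload
        sftp_conn_id="sftp_default",
        gcp_conn_id="google_cloud_default",
    )
```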


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] TobKed commented on a change in pull request #6393: [AIRFLOW-5718] Add SFTPToGoogleCloudStorageOperator

2019-11-07 Thread GitBox
TobKed commented on a change in pull request #6393: [AIRFLOW-5718] Add 
SFTPToGoogleCloudStorageOperator
URL: https://github.com/apache/airflow/pull/6393#discussion_r343782483
 
 

 ##
 File path: tests/operators/test_sftp_to_gcs.py
 ##
 @@ -0,0 +1,213 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import os
+import unittest
+
+from airflow.exceptions import AirflowException
+from airflow.operators.sftp_to_gcs import SFTPToGoogleCloudStorageOperator
+from tests.compat import mock
+
+TASK_ID = "test-gcs-to-sftp-operator"
+GCP_CONN_ID = "GCP_CONN_ID"
+SFTP_CONN_ID = "SFTP_CONN_ID"
+DELEGATE_TO = "DELEGATE_TO"
+
+DEFAULT_MIME_TYPE = "application/octet-stream"
+
+TEST_BUCKET = "test-bucket"
+SOURCE_OBJECT_WILDCARD_FILENAME = "main_dir/test_object*.json"
+SOURCE_OBJECT_NO_WILDCARD = "main_dir/test_object3.json"
+SOURCE_OBJECT_MULTIPLE_WILDCARDS = "main_dir/csv/*/test_*.csv"
+
+SOURCE_FILES_LIST = [
+"main_dir/test_object1.txt",
+"main_dir/test_object2.txt",
+"main_dir/test_object3.json",
+"main_dir/sub_dir/test_object1.txt",
+"main_dir/sub_dir/test_object2.txt",
+"main_dir/sub_dir/test_object3.json",
+]
+
+DESTINATION_PATH_DIR = "destination_dir"
+DESTINATION_PATH_FILE = "destination_dir/copy.txt"
+
+
+# pylint: disable=unused-argument
+class TestSFTPToGoogleCloudStorageOperator(unittest.TestCase):
+@mock.patch("airflow.operators.sftp_to_gcs.GoogleCloudStorageHook")
+@mock.patch("airflow.operators.sftp_to_gcs.SFTPHook")
+def test_execute_copy_single_file(self, sftp_hook, gcs_hook):
+task = SFTPToGoogleCloudStorageOperator(
+task_id=TASK_ID,
+source_path=SOURCE_OBJECT_NO_WILDCARD,
+destination_bucket=TEST_BUCKET,
+destination_path=DESTINATION_PATH_FILE,
+move_object=False,
+gcp_conn_id=GCP_CONN_ID,
+sftp_conn_id=SFTP_CONN_ID,
+delegate_to=DELEGATE_TO,
+)
+task.execute(None)
+gcs_hook.assert_called_once_with(
+gcp_conn_id=GCP_CONN_ID, delegate_to=DELEGATE_TO
+)
+sftp_hook.assert_called_once_with(SFTP_CONN_ID)
+
+sftp_hook.return_value.retrieve_file.assert_called_once_with(
+os.path.join(SOURCE_OBJECT_NO_WILDCARD), mock.ANY
+)
+
+gcs_hook.return_value.upload.assert_called_once_with(
+bucket_name=TEST_BUCKET,
+object_name=DESTINATION_PATH_FILE,
+filename=mock.ANY,
+mime_type=DEFAULT_MIME_TYPE,
+)
+
+sftp_hook.return_value.delete_file.assert_not_called()
+
+@mock.patch("airflow.operators.sftp_to_gcs.GoogleCloudStorageHook")
+@mock.patch("airflow.operators.sftp_to_gcs.SFTPHook")
+def test_execute_move_single_file(self, sftp_hook, gcs_hook):
+task = SFTPToGoogleCloudStorageOperator(
+task_id=TASK_ID,
+source_path=SOURCE_OBJECT_NO_WILDCARD,
+destination_bucket=TEST_BUCKET,
+destination_path=DESTINATION_PATH_FILE,
+move_object=True,
+gcp_conn_id=GCP_CONN_ID,
+sftp_conn_id=SFTP_CONN_ID,
+delegate_to=DELEGATE_TO,
+)
+task.execute(None)
+gcs_hook.assert_called_once_with(
+gcp_conn_id=GCP_CONN_ID, delegate_to=DELEGATE_TO
+)
+sftp_hook.assert_called_once_with(SFTP_CONN_ID)
+
+sftp_hook.return_value.retrieve_file.assert_called_once_with(
+os.path.join(SOURCE_OBJECT_NO_WILDCARD), mock.ANY
+)
+
+gcs_hook.return_value.upload.assert_called_once_with(
+bucket_name=TEST_BUCKET,
+object_name=DESTINATION_PATH_FILE,
+filename=mock.ANY,
+mime_type=DEFAULT_MIME_TYPE,
+)
+
+sftp_hook.return_value.delete_file.assert_called_once_with(
+SOURCE_OBJECT_NO_WILDCARD
+)
+
+@mock.patch("airflow.operators.sftp_to_gcs.GoogleCloudStorageHook")
+@mock.patch("airflow.operators.sftp_to_gcs.SFTPHook")
+def test_execute_copy_with_wildcard(self, sftp_hook, gcs_hook):
+sftp_hook.return_value.get_tree_map.return_value = [
+

[GitHub] [airflow] kaxil commented on a change in pull request #6515: [AIRFLOW-XXX] GSoD: How to make DAGs production ready

2019-11-07 Thread GitBox
kaxil commented on a change in pull request #6515: [AIRFLOW-XXX] GSoD: How to 
make DAGs production ready
URL: https://github.com/apache/airflow/pull/6515#discussion_r343755387
 
 

 ##
 File path: docs/howto/dags-in-production.rst
 ##
 @@ -0,0 +1,247 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+Getting a DAG ready for production
+==
+
+
+Running Airflow in production is seamless. It comes bundled with all the 
plugins and configs
+necessary to run most of the DAGs. However, you can come across certain 
pitfalls, which can cause occasional errors.
+Let's look at the steps you need to follow to avoid these pitfalls.
+
+Writing a DAG
+^^
+Creating a new DAG in Airflow is quite simple. However, there are many things 
that you need to take care of
+to ensure the DAG run or failure does not produce unexpected results.
+
+Creating a task
+--
+
+You should treat tasks in Airflow equivalent to transactions in a database. It 
implies that you should never produce
+incomplete results from your tasks. An example is not to produce incomplete 
data in ``HDFS`` or ``S3`` at the end of a task.
+
+Airflow retries a task if it fails. Thus, the tasks should produce the same 
outcome on every re-run.
+Some of the ways you can avoid producing a different result -
+
+* Don't use INSERT during a task re-run, an INSERT statement might lead to 
duplicate rows in your database.
+  Replace it with UPSERT.
+* Read and write in a specific partition. Never read the latest available data 
in a task. 
+  Someone may update the input data between re-runs, which results in 
different outputs. 
+  A better way is to read the input data from a specific partition. You can 
use ``execution_date`` as a partition. 
+  You should follow this partitioning method while writing data in S3/HDFS, as 
well.
+* The python datetime ``now()`` function gives the current datetime object. 
+  This function should never be used inside a task, especially to do the 
critical computation, as it leads to different outcomes on each run. 
+  It's fine to use it, for example, to generate a temporary log.
+
+
+Deleting a task
+
+
+Never delete a task from a DAG. In case of deletion, the historical 
information of the task disappears from the Airflow UI. 
+It is advised to create a new DAG in case the tasks need to be deleted.
+
+
+Communication
+--
+
+Airflow executes tasks of a DAG in different directories, which can even be 
present 
+on different servers in case you are using :doc:`Kubernetes executor 
<../executor/kubernetes>` or :doc:`Celery executor <../executor/celery>`. 
+Therefore, you should not store any file or config in the local filesystem — 
for example, a task that downloads the JAR file that the next task executes.
+
+Always use XCom to communicate small messages between tasks or S3/HDFS to 
communicate large messages/files.
+
+The tasks should also not store any authentication parameters such as 
passwords or token inside them. 
+Always use :ref:`Connections ` to store data securely in 
Airflow backend and retrieve them using a unique connection id.
+
+
+.. note::
+
+Don't write any critical code outside the tasks. The code outside the 
tasks runs every time airflow parses the DAG, which happens every second by 
default.
+
+You should also avoid repeating arguments such as connection_id or S3 
paths using default_args. It helps you to avoid mistakes while passing 
arguments.
+
+
+
+Testing a DAG
+^
+
+Airflow users should treat DAGs as production level code. The DAGs should have 
various tests to ensure that they produce expected results.
+You can write a wide variety of tests for a DAG. Let's take a look at some of 
them.
+
+DAG Loader Test
+---
+
+This test should ensure that your DAG doesn't contain a piece of code that 
raises an error while loading.
+No additional code needs to be written by the user to run this test.
+
+.. code::
+
+ python your-dag-file.py
+
+Running the above command without any error ensures your DAG doesn't contain 
any uninstalled dependency, syntax errors, etc. 
+
+You can look into :ref:`Testing a DAG ` for details on how 
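
As a companion to the idempotency advice quoted above, a minimal sketch that 
derives the partition from ``execution_date`` and upserts instead of inserting, 
so a re-run overwrites the same partition. The table, connection id and 
MySQL-style upsert syntax are assumptions for illustration.

```python
# Sketch of an idempotent, partition-aware task; all names are illustrative.
from airflow import DAG
from airflow.hooks.mysql_hook import MySqlHook
from airflow.operators.python_operator import PythonOperator
from airflow.utils.dates import days_ago

def load_partition(**context):
    ds = context["ds"]  # execution date (YYYY-MM-DD) doubles as the partition key
    hook = MySqlHook(mysql_conn_id="warehouse_db")  # hypothetical connection id
    # re-running the task replaces the same day's row instead of inserting a duplicate
    hook.run(
        "INSERT INTO daily_totals (ds, total) "
        "SELECT %s, COALESCE(SUM(amount), 0) FROM staging_events WHERE event_date = %s "
        "ON DUPLICATE KEY UPDATE total = VALUES(total)",
        parameters=(ds, ds),
    )

with DAG("example_idempotent_load", start_date=days_ago(1), schedule_interval="@daily") as dag:
    PythonOperator(
        task_id="load_daily_totals",
        python_callable=load_partition,
        provide_context=True,
    )
```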

[GitHub] [airflow] kaxil commented on a change in pull request #6515: [AIRFLOW-XXX] GSoD: How to make DAGs production ready

2019-11-07 Thread GitBox
kaxil commented on a change in pull request #6515: [AIRFLOW-XXX] GSoD: How to 
make DAGs production ready
URL: https://github.com/apache/airflow/pull/6515#discussion_r343755456
 
 

 ##
 File path: docs/howto/dags-in-production.rst
 ##
 @@ -0,0 +1,247 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+Getting a DAG ready for production
+==
+
+
+Running Airflow in production is seamless. It comes bundled with all the 
plugins and configs
+necessary to run most of the DAGs. However, you can come across certain 
pitfalls, which can cause occasional errors.
+Let's look at the steps you need to follow to avoid these pitfalls.
+
+Writing a DAG
+^^
+Creating a new DAG in Airflow is quite simple. However, there are many things 
that you need to take care of
+to ensure the DAG run or failure does not produce unexpected results.
+
+Creating a task
+--
+
+You should treat tasks in Airflow equivalent to transactions in a database. It 
implies that you should never produce
+incomplete results from your tasks. An example is not to produce incomplete 
data in ``HDFS`` or ``S3`` at the end of a task.
+
+Airflow retries a task if it fails. Thus, the tasks should produce the same 
outcome on every re-run.
+Some of the ways you can avoid producing a different result -
+
+* Don't use INSERT during a task re-run, an INSERT statement might lead to 
duplicate rows in your database.
+  Replace it with UPSERT.
+* Read and write in a specific partition. Never read the latest available data 
in a task. 
+  Someone may update the input data between re-runs, which results in 
different outputs. 
+  A better way is to read the input data from a specific partition. You can 
use ``execution_date`` as a partition. 
+  You should follow this partitioning method while writing data in S3/HDFS, as 
well.
+* The python datetime ``now()`` function gives the current datetime object. 
+  This function should never be used inside a task, especially to do the 
critical computation, as it leads to different outcomes on each run. 
+  It's fine to use it, for example, to generate a temporary log.
+
+
+Deleting a task
+
+
+Never delete a task from a DAG. In case of deletion, the historical 
information of the task disappears from the Airflow UI. 
+It is advised to create a new DAG in case the tasks need to be deleted.
+
+
+Communication
+--
+
+Airflow executes tasks of a DAG in different directories, which can even be 
present 
+on different servers in case you are using :doc:`Kubernetes executor 
<../executor/kubernetes>` or :doc:`Celery executor <../executor/celery>`. 
+Therefore, you should not store any file or config in the local filesystem — 
for example, a task that downloads the JAR file that the next task executes.
+
+Always use XCom to communicate small messages between tasks or S3/HDFS to 
communicate large messages/files.
+
+The tasks should also not store any authentication parameters such as 
passwords or token inside them. 
+Always use :ref:`Connections ` to store data securely in 
Airflow backend and retrieve them using a unique connection id.
+
+
+.. note::
+
+Don't write any critical code outside the tasks. The code outside the 
tasks runs every time airflow parses the DAG, which happens every second by 
default.
+
+You should also avoid repeating arguments such as connection_id or S3 
paths using default_args. It helps you to avoid mistakes while passing 
arguments.
+
+
+
+Testing a DAG
+^
+
+Airflow users should treat DAGs as production level code. The DAGs should have 
various tests to ensure that they produce expected results.
+You can write a wide variety of tests for a DAG. Let's take a look at some of 
them.
+
+DAG Loader Test
+---
+
+This test should ensure that your DAG doesn't contain a piece of code that 
raises an error while loading.
+No additional code needs to be written by the user to run this test.
+
+.. code::
+
+ python your-dag-file.py
+
+Running the above command without any error ensures your DAG doesn't contain 
any uninstalled dependency, syntax errors, etc. 
+
+You can look into :ref:`Testing a DAG ` for details on how 
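
To make the communication advice quoted above concrete, a small sketch that 
hands off only a pointer via XCom while the large artifact stays in object 
storage. The task ids and the S3 path are made up for illustration.

```python
# Sketch: pass a small message (an S3 path) between tasks via XCom.
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.utils.dates import days_ago

def produce(**context):
    # the return value is pushed to XCom automatically
    return "s3://example-bucket/output/{}/part-0000.parquet".format(context["ds"])

def consume(**context):
    path = context["ti"].xcom_pull(task_ids="produce_path")
    print("downstream task will read:", path)

with DAG("example_xcom_handoff", start_date=days_ago(1), schedule_interval=None) as dag:
    t1 = PythonOperator(task_id="produce_path", python_callable=produce, provide_context=True)
    t2 = PythonOperator(task_id="use_path", python_callable=consume, provide_context=True)
    t1 >> t2
```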

[GitHub] [airflow] kaxil commented on a change in pull request #6515: [AIRFLOW-XXX] GSoD: How to make DAGs production ready

2019-11-07 Thread GitBox
kaxil commented on a change in pull request #6515: [AIRFLOW-XXX] GSoD: How to 
make DAGs production ready
URL: https://github.com/apache/airflow/pull/6515#discussion_r343754810
 
 

 ##
 File path: docs/howto/dags-in-production.rst
 ##
 @@ -0,0 +1,247 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+Getting a DAG ready for production
+==
+
+
+Running Airflow in production is seamless. It comes bundled with all the 
plugins and configs
+necessary to run most of the DAGs. However, you can come across certain 
pitfalls, which can cause occasional errors.
+Let's look at the steps you need to follow to avoid these pitfalls.
+
+Writing a DAG
+^^
+Creating a new DAG in Airflow is quite simple. However, there are many things 
that you need to take care of
+to ensure the DAG run or failure does not produce unexpected results.
+
+Creating a task
+--
+
+You should treat tasks in Airflow equivalent to transactions in a database. It 
implies that you should never produce
+incomplete results from your tasks. An example is not to produce incomplete 
data in ``HDFS`` or ``S3`` at the end of a task.
+
+Airflow retries a task if it fails. Thus, the tasks should produce the same 
outcome on every re-run.
+Some of the ways you can avoid producing a different result -
+
+* Don't use INSERT during a task re-run, an INSERT statement might lead to 
duplicate rows in your database.
+  Replace it with UPSERT.
+* Read and write in a specific partition. Never read the latest available data 
in a task. 
+  Someone may update the input data between re-runs, which results in 
different outputs. 
+  A better way is to read the input data from a specific partition. You can 
use ``execution_date`` as a partition. 
+  You should follow this partitioning method while writing data in S3/HDFS, as 
well.
+* The python datetime ``now()`` function gives the current datetime object. 
+  This function should never be used inside a task, especially to do the 
critical computation, as it leads to different outcomes on each run. 
+  It's fine to use it, for example, to generate a temporary log.
+
+
+Deleting a task
+
+
+Never delete a task from a DAG. In case of deletion, the historical 
information of the task disappears from the Airflow UI. 
+It is advised to create a new DAG in case the tasks need to be deleted.
+
+
+Communication
+--
+
+Airflow executes tasks of a DAG in different directories, which can even be 
present 
+on different servers in case you are using :doc:`Kubernetes executor 
<../executor/kubernetes>` or :doc:`Celery executor <../executor/celery>`. 
+Therefore, you should not store any file or config in the local filesystem — 
for example, a task that downloads the JAR file that the next task executes.
+
+Always use XCom to communicate small messages between tasks or S3/HDFS to 
communicate large messages/files.
+
+The tasks should also not store any authentication parameters such as 
passwords or token inside them. 
+Always use :ref:`Connections ` to store data securely in 
Airflow backend and retrieve them using a unique connection id.
+
+
+.. note::
+
+Don't write any critical code outside the tasks. The code outside the 
tasks runs every time airflow parses the DAG, which happens every second by 
default.
+
+You should also avoid repeating arguments such as connection_id or S3 
paths using default_args. It helps you to avoid mistakes while passing 
arguments.
+
+
+
+Testing a DAG
+^
+
+Airflow users should treat DAGs as production level code. The DAGs should have 
various tests to ensure that they produce expected results.
+You can write a wide variety of tests for a DAG. Let's take a look at some of 
them.
+
+DAG Loader Test
+---
+
+This test should ensure that your DAG doesn't contain a piece of code that 
raises an error while loading.
+No additional code needs to be written by the user to run this test.
+
+.. code::
+
+ python your-dag-file.py
+
+Running the above command without any error ensures your DAG doesn't contain 
any uninstalled dependency, syntax errors, etc. 
+
+You can look into :ref:`Testing a DAG ` for details on how 
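
The point above about never hard-coding passwords can be illustrated with a hook 
that resolves an Airflow Connection at runtime; the connection id and query 
below are assumptions.

```python
# Sketch: credentials come from the Airflow Connection, not from the DAG file.
from airflow.hooks.postgres_hook import PostgresHook

def count_rows():
    hook = PostgresHook(postgres_conn_id="my_postgres")  # hypothetical connection id
    # the hook looks up host/login/password from the Airflow backend by connection id
    return hook.get_first("SELECT COUNT(*) FROM some_table")[0]
```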

[GitHub] [airflow] kaxil commented on a change in pull request #6515: [AIRFLOW-XXX] GSoD: How to make DAGs production ready

2019-11-07 Thread GitBox
kaxil commented on a change in pull request #6515: [AIRFLOW-XXX] GSoD: How to 
make DAGs production ready
URL: https://github.com/apache/airflow/pull/6515#discussion_r343753269
 
 

 ##
 File path: docs/howto/dags-in-production.rst
 ##
 @@ -0,0 +1,247 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+Getting a DAG ready for production
+==
+
+
+Running Airflow in production is seamless. It comes bundled with all the 
plugins and configs
+necessary to run most of the DAGs. However, you can come across certain 
pitfalls, which can cause occasional errors.
+Let's look at the steps you need to follow to avoid these pitfalls.
+
+Writing a DAG
+^^
+Creating a new DAG in Airflow is quite simple. However, there are many things 
that you need to take care of
+to ensure the DAG run or failure does not produce unexpected results.
+
+Creating a task
+--
 
 Review comment:
   ```diff
   - Creating a task
   - --
   + Creating a task
   + ---
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] kaxil commented on a change in pull request #6515: [AIRFLOW-XXX] GSoD: How to make DAGs production ready

2019-11-07 Thread GitBox
kaxil commented on a change in pull request #6515: [AIRFLOW-XXX] GSoD: How to 
make DAGs production ready
URL: https://github.com/apache/airflow/pull/6515#discussion_r343755288
 
 

 ##
 File path: docs/howto/dags-in-production.rst
 ##
 @@ -0,0 +1,247 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+Getting a DAG ready for production
+==
+
+
+Running Airflow in production is seamless. It comes bundled with all the 
plugins and configs
+necessary to run most of the DAGs. However, you can come across certain 
pitfalls, which can cause occasional errors.
+Let's look at the steps you need to follow to avoid these pitfalls.
+
+Writing a DAG
+^^
+Creating a new DAG in Airflow is quite simple. However, there are many things 
that you need to take care of
+to ensure the DAG run or failure does not produce unexpected results.
+
+Creating a task
+--
+
+You should treat tasks in Airflow equivalent to transactions in a database. It 
implies that you should never produce
+incomplete results from your tasks. An example is not to produce incomplete 
data in ``HDFS`` or ``S3`` at the end of a task.
+
+Airflow retries a task if it fails. Thus, the tasks should produce the same 
outcome on every re-run.
+Some of the ways you can avoid producing a different result -
+
+* Don't use INSERT during a task re-run, an INSERT statement might lead to 
duplicate rows in your database.
+  Replace it with UPSERT.
+* Read and write in a specific partition. Never read the latest available data 
in a task. 
+  Someone may update the input data between re-runs, which results in 
different outputs. 
+  A better way is to read the input data from a specific partition. You can 
use ``execution_date`` as a partition. 
+  You should follow this partitioning method while writing data in S3/HDFS, as 
well.
+* The python datetime ``now()`` function gives the current datetime object. 
+  This function should never be used inside a task, especially to do the 
critical computation, as it leads to different outcomes on each run. 
+  It's fine to use it, for example, to generate a temporary log.
+
+
+Deleting a task
+
+
+Never delete a task from a DAG. In case of deletion, the historical 
information of the task disappears from the Airflow UI. 
+It is advised to create a new DAG in case the tasks need to be deleted.
+
+
+Communication
+--
+
+Airflow executes tasks of a DAG in different directories, which can even be 
present 
+on different servers in case you are using :doc:`Kubernetes executor 
<../executor/kubernetes>` or :doc:`Celery executor <../executor/celery>`. 
+Therefore, you should not store any file or config in the local filesystem — 
for example, a task that downloads the JAR file that the next task executes.
+
+Always use XCom to communicate small messages between tasks or S3/HDFS to 
communicate large messages/files.
+
+The tasks should also not store any authentication parameters such as 
passwords or token inside them. 
+Always use :ref:`Connections ` to store data securely in 
Airflow backend and retrieve them using a unique connection id.
+
+
+.. note::
+
+Don't write any critical code outside the tasks. The code outside the 
tasks runs every time airflow parses the DAG, which happens every second by 
default.
+
+You should also avoid repeating arguments such as connection_id or S3 
paths using default_args. It helps you to avoid mistakes while passing 
arguments.
+
+
+
+Testing a DAG
+^
+
+Airflow users should treat DAGs as production level code. The DAGs should have 
various tests to ensure that they produce expected results.
+You can write a wide variety of tests for a DAG. Let's take a look at some of 
them.
+
+DAG Loader Test
+---
+
+This test should ensure that your DAG doesn't contain a piece of code that 
raises an error while loading.
+No additional code needs to be written by the user to run this test.
+
+.. code::
+
+ python your-dag-file.py
+
+Running the above command without any error ensures your DAG doesn't contain 
any uninstalled dependency, syntax errors, etc. 
+
+You can look into :ref:`Testing a DAG ` for details on how 
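
The DAG Loader Test described above can also be written as a unit test so it 
runs in CI alongside other checks; the dag folder path below is an assumption.

```python
# Sketch of a DAG integrity test: fails if any DAG file cannot be imported.
import unittest

from airflow.models import DagBag

class TestDagIntegrity(unittest.TestCase):
    def test_no_import_errors(self):
        dag_bag = DagBag(dag_folder="dags/", include_examples=False)
        # syntax errors and missing dependencies surface as import errors
        self.assertEqual(dag_bag.import_errors, {})

if __name__ == "__main__":
    unittest.main()
```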

[GitHub] [airflow] kaxil commented on a change in pull request #6515: [AIRFLOW-XXX] GSoD: How to make DAGs production ready

2019-11-07 Thread GitBox
kaxil commented on a change in pull request #6515: [AIRFLOW-XXX] GSoD: How to 
make DAGs production ready
URL: https://github.com/apache/airflow/pull/6515#discussion_r343753683
 
 

 ##
 File path: docs/howto/dags-in-production.rst
 ##
 @@ -0,0 +1,247 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+Getting a DAG ready for production
+==
+
+
+Running Airflow in production is seamless. It comes bundled with all the 
plugins and configs
+necessary to run most of the DAGs. However, you can come across certain 
pitfalls, which can cause occasional errors.
+Let's look at the steps you need to follow to avoid these pitfalls.
+
+Writing a DAG
+^^
+Creating a new DAG in Airflow is quite simple. However, there are many things 
that you need to take care of
+to ensure the DAG run or failure does not produce unexpected results.
+
+Creating a task
+--
+
+You should treat tasks in Airflow equivalent to transactions in a database. It 
implies that you should never produce
+incomplete results from your tasks. An example is not to produce incomplete 
data in ``HDFS`` or ``S3`` at the end of a task.
+
+Airflow retries a task if it fails. Thus, the tasks should produce the same 
outcome on every re-run.
+Some of the ways you can avoid producing a different result -
+
+* Don't use INSERT during a task re-run, an INSERT statement might lead to 
duplicate rows in your database.
+  Replace it with UPSERT.
+* Read and write in a specific partition. Never read the latest available data 
in a task. 
+  Someone may update the input data between re-runs, which results in 
different outputs. 
+  A better way is to read the input data from a specific partition. You can 
use ``execution_date`` as a partition. 
+  You should follow this partitioning method while writing data in S3/HDFS, as 
well.
+* The python datetime ``now()`` function gives the current datetime object. 
+  This function should never be used inside a task, especially to do the 
critical computation, as it leads to different outcomes on each run. 
+  It's fine to use it, for example, to generate a temporary log.
+
+
+Deleting a task
+
+
+Never delete a task from a DAG. In case of deletion, the historical 
information of the task disappears from the Airflow UI. 
+It is advised to create a new DAG in case the tasks need to be deleted.
+
+
+Communication
+--
+
+Airflow executes tasks of a DAG in different directories, which can even be 
present 
+on different servers in case you are using :doc:`Kubernetes executor 
<../executor/kubernetes>` or :doc:`Celery executor <../executor/celery>`. 
+Therefore, you should not store any file or config in the local filesystem — 
for example, a task that downloads the JAR file that the next task executes.
+
+Always use XCom to communicate small messages between tasks or S3/HDFS to 
communicate large messages/files.
+
+The tasks should also not store any authentication parameters such as 
passwords or token inside them. 
+Always use :ref:`Connections ` to store data securely in 
Airflow backend and retrieve them using a unique connection id.
+
+
+.. note::
+
+Don't write any critical code outside the tasks. The code outside the 
tasks runs every time airflow parses the DAG, which happens every second by 
default.
+
+You should also avoid repeating arguments such as connection_id or S3 
paths using default_args. It helps you to avoid mistakes while passing 
arguments.
+
+
+
+Testing a DAG
+^
+
+Airflow users should treat DAGs as production level code. The DAGs should have 
various tests to ensure that they produce expected results.
+You can write a wide variety of tests for a DAG. Let's take a look at some of 
them.
+
+DAG Loader Test
+---
+
+This test should ensure that your DAG doesn't contain a piece of code that 
raises an error while loading.
+No additional code needs to be written by the user to run this test.
+
+.. code::
+
+ python your-dag-file.py
+
+Running the above command without any error ensures your DAG doesn't contain 
any uninstalled dependency, syntax errors, etc. 
+
+You can look into :ref:`Testing a DAG ` for details on how 

[GitHub] [airflow] kaxil commented on a change in pull request #6515: [AIRFLOW-XXX] GSoD: How to make DAGs production ready

2019-11-07 Thread GitBox
kaxil commented on a change in pull request #6515: [AIRFLOW-XXX] GSoD: How to 
make DAGs production ready
URL: https://github.com/apache/airflow/pull/6515#discussion_r343734138
 
 

 ##
 File path: docs/howto/dags-in-production.rst
 ##
 @@ -0,0 +1,247 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+Getting a DAG ready for production
+==
+
+
+Running Airflow in production is seamless. It comes bundled with all the 
plugins and configs
+necessary to run most of the DAGs. However, you can come across certain 
pitfalls, which can cause occasional errors.
+Let's look at the steps you need to follow to avoid these pitfalls.
+
+Writing a DAG
+^^
+Creating a new DAG in Airflow is quite simple. However, there are many things 
that you need to take care of
+to ensure the DAG run or failure does not produce unexpected results.
+
+Creating a task
+--
+
+You should treat tasks in Airflow equivalent to transactions in a database. It 
implies that you should never produce
+incomplete results from your tasks. An example is not to produce incomplete 
data in ``HDFS`` or ``S3`` at the end of a task.
+
+Airflow retries a task if it fails. Thus, the tasks should produce the same 
outcome on every re-run.
+Some of the ways you can avoid producing a different result -
+
+* Don't use INSERT during a task re-run, an INSERT statement might lead to 
duplicate rows in your database.
+  Replace it with UPSERT.
+* Read and write in a specific partition. Never read the latest available data 
in a task. 
+  Someone may update the input data between re-runs, which results in 
different outputs. 
+  A better way is to read the input data from a specific partition. You can 
use ``execution_date`` as a partition. 
+  You should follow this partitioning method while writing data in S3/HDFS, as 
well.
+* The python datetime ``now()`` function gives the current datetime object. 
+  This function should never be used inside a task, especially to do the 
critical computation, as it leads to different outcomes on each run. 
+  It's fine to use it, for example, to generate a temporary log.
+
+
+Deleting a task
+
+
+Never delete a task from a DAG. In case of deletion, the historical 
information of the task disappears from the Airflow UI. 
+It is advised to create a new DAG in case the tasks need to be deleted.
+
+
+Communication
+--
+
+Airflow executes tasks of a DAG in different directories, which can even be 
present 
+on different servers in case you are using :doc:`Kubernetes executor 
<../executor/kubernetes>` or :doc:`Celery executor <../executor/celery>`. 
+Therefore, you should not store any file or config in the local filesystem — 
for example, a task that downloads the JAR file that the next task executes.
+
+Always use XCom to communicate small messages between tasks or S3/HDFS to 
communicate large messages/files.
+
+The tasks should also not store any authentication parameters such as 
passwords or token inside them. 
+Always use :ref:`Connections ` to store data securely in 
Airflow backend and retrieve them using a unique connection id.
+
+
+.. note::
+
+Don't write any critical code outside the tasks. The code outside the 
tasks runs every time airflow parses the DAG, which happens every second by 
default.
+
+You should also avoid repeating arguments such as connection_id or S3 
paths using default_args. It helps you to avoid mistakes while passing 
arguments.
+
+
+
+Testing a DAG
+^
+
+Airflow users should treat DAGs as production level code. The DAGs should have 
various tests to ensure that they produce expected results.
+You can write a wide variety of tests for a DAG. Let's take a look at some of 
them.
+
+DAG Loader Test
+---
+
+This test should ensure that your DAG doesn't contain a piece of code that 
raises an error while loading.
+No additional code needs to be written by the user to run this test.
+
+.. code::
+
+ python your-dag-file.py
+
+Running the above command without any error ensures your DAG doesn't contain 
any uninstalled dependency, syntax errors, etc. 
+
+You can look into :ref:`Testing a DAG ` for details on how 

[GitHub] [airflow] kaxil commented on a change in pull request #6515: [AIRFLOW-XXX] GSoD: How to make DAGs production ready

2019-11-07 Thread GitBox
kaxil commented on a change in pull request #6515: [AIRFLOW-XXX] GSoD: How to 
make DAGs production ready
URL: https://github.com/apache/airflow/pull/6515#discussion_r343733936
 
 

 ##
 File path: docs/howto/dags-in-production.rst
 ##
 @@ -0,0 +1,247 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+Getting a DAG ready for production
+==
+
+
+Running Airflow in production is seamless. It comes bundled with all the 
plugins and configs
+necessary to run most of the DAGs. However, you can come across certain 
pitfalls, which can cause occasional errors.
+Let's look at the steps you need to follow to avoid these pitfalls.
+
+Writing a DAG
+^^
+Creating a new DAG in Airflow is quite simple. However, there are many things 
that you need to take care of
+to ensure the DAG run or failure does not produce unexpected results.
+
+Creating a task
+--
+
+You should treat tasks in Airflow equivalent to transactions in a database. It 
implies that you should never produce
+incomplete results from your tasks. An example is not to produce incomplete 
data in ``HDFS`` or ``S3`` at the end of a task.
+
+Airflow retries a task if it fails. Thus, the tasks should produce the same 
outcome on every re-run.
+Some of the ways you can avoid producing a different result -
+
+* Don't use INSERT during a task re-run, an INSERT statement might lead to 
duplicate rows in your database.
+  Replace it with UPSERT.
+* Read and write in a specific partition. Never read the latest available data 
in a task. 
+  Someone may update the input data between re-runs, which results in 
different outputs. 
+  A better way is to read the input data from a specific partition. You can 
use ``execution_date`` as a partition. 
+  You should follow this partitioning method while writing data in S3/HDFS, as 
well.
+* The python datetime ``now()`` function gives the current datetime object. 
+  This function should never be used inside a task, especially to do the 
critical computation, as it leads to different outcomes on each run. 
+  It's fine to use it, for example, to generate a temporary log.
+
+
+Deleting a task
+
+
+Never delete a task from a DAG. In case of deletion, the historical 
information of the task disappears from the Airflow UI. 
+It is advised to create a new DAG in case the tasks need to be deleted.
+
+
+Communication
+--
+
+Airflow executes tasks of a DAG in different directories, which can even be 
present 
+on different servers in case you are using :doc:`Kubernetes executor 
<../executor/kubernetes>` or :doc:`Celery executor <../executor/celery>`. 
+Therefore, you should not store any file or config in the local filesystem — 
for example, a task that downloads the JAR file that the next task executes.
+
+Always use XCom to communicate small messages between tasks or S3/HDFS to 
communicate large messages/files.
+
+The tasks should also not store any authentication parameters such as 
passwords or token inside them. 
+Always use :ref:`Connections ` to store data securely in 
Airflow backend and retrieve them using a unique connection id.
+
+
+.. note::
+
+Don't write any critical code outside the tasks. The code outside the 
tasks runs every time airflow parses the DAG, which happens every second by 
default.
+
+You should also avoid repeating arguments such as connection_id or S3 
paths using default_args. It helps you to avoid mistakes while passing 
arguments.
+
+
+
+Testing a DAG
+^
+
+Airflow users should treat DAGs as production level code. The DAGs should have 
various tests to ensure that they produce expected results.
+You can write a wide variety of tests for a DAG. Let's take a look at some of 
them.
+
+DAG Loader Test
+---
+
+This test should ensure that your DAG doesn't contain a piece of code that 
raises an error while loading.
+No additional code needs to be written by the user to run this test.
+
+.. code::
+
+ python your-dag-file.py
+
+Running the above command without any error ensures your DAG doesn't contain 
any uninstalled dependency, syntax errors, etc. 
+
+You can look into :ref:`Testing a DAG ` for details on how 

[jira] [Commented] (AIRFLOW-5704) Docker scripts for kind kubernetes tests can be improved

2019-11-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969396#comment-16969396
 ] 

ASF GitHub Bot commented on AIRFLOW-5704:
-

potiuk commented on pull request #6516: [AIRFLOW-5704] Improve Kind Kubernetes 
scripts for local testing
URL: https://github.com/apache/airflow/pull/6516
 
 
   * Fixed problem that Kubernetes tests were testing latest master
 rather than what came from the local sources.
   * Moved Kubernetes scripts to 'in_container' dir where they belong now
   * Kubernetes tests are now better suited for running locally
   * Kubernetes cluster is not deleted until environment is stopped
   * Kubernetes image is built outside of the container and passed as .tar
   * Kubectl version name is corrected in the Dockerfile
   * Kubernetes Version can be used to select the Kubernetes version
   * Running kubernetes scripts is now easy in Breeze
   * Instructions on how to run Kubernetes tests are updated
   * Better flags in Breeze are used to run Kubernetes environment/tests
   * The old "bare" environment is replaced by --no-deps switch
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Docker scripts for kind kubernetes tests can be improved
> 
>
> Key: AIRFLOW-5704
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5704
> Project: Apache Airflow
>  Issue Type: Sub-task
>  Components: ci
>Affects Versions: 2.0.0
>Reporter: Jarek Potiuk
>Assignee: Jarek Potiuk
>Priority: Major
>
> The docker CI image for kind tests can be improved
>  
>  * Kubernetes Version and all the installation of docker + kubectl + kind can 
> be added back
>  * Running kubernetes scripts should be possible from within breeze without 
> special "kubernetes" environment
>  * --env breeze switch should be removed
>  * "bare" environment should be replaced by --no-deps switch
>  * ENV variable should disappear
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] potiuk opened a new pull request #6516: [AIRFLOW-5704] Improve Kind Kubernetes scripts for local testing

2019-11-07 Thread GitBox
potiuk opened a new pull request #6516: [AIRFLOW-5704] Improve Kind Kubernetes 
scripts for local testing
URL: https://github.com/apache/airflow/pull/6516
 
 
   * Fixed problem that Kubernetes tests were testing latest master
 rather than what came from the local sources.
   * Moved Kubernetes scripts to 'in_container' dir where they belong now
   * Kubernetes tests are now better suited for running locally
   * Kubernetes cluster is not deleted until environment is stopped
   * Kubernetes image is built outside of the container and passed as .tar
   * Kubectl version name is corrected in the Dockerfile
   * Kubernetes Version can be used to select the Kubernetes version
   * Running kubernetes scripts is now easy in Breeze
   * Instructions on how to run Kubernetes tests are updated
   * Better flags in Breeze are used to run Kubernetes environment/tests
   * The old "bare" environment is replaced by --no-deps switch
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Comment Edited] (AIRFLOW-1753) Can't install on windows 10

2019-11-07 Thread Jon Hanson (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969391#comment-16969391
 ] 

Jon Hanson edited comment on AIRFLOW-1753 at 11/7/19 4:28 PM:
--

Even if you get past that, the same script (daemon.py) fails on {{import 
resource}}. The entire daemon.py script looks very unix-specific, so I think it 
would need to be rewritten wholesale for Windows.


was (Author: jonhanson):
Even if you get past that, the same script (daemon.py) fails on {{import 
resource}}. The entire daemon.py script looks very unix-specific, so I think 
the it would need to be rewritten wholesale for Windows.

> Can't install on windows 10
> ---
>
> Key: AIRFLOW-1753
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1753
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Lakshman Udayakantha
>Priority: Major
>
> When I installed airflow using "pip install airflow command" two errors pop 
> up.
> 1.  link.exe failed with exit status 1158
> 2.\x86_amd64\\cl.exe' failed with exit status 2
> The first issue can be solved by referring to 
> https://stackoverflow.com/questions/43858836/python-installing-clarifai-vs14-0-link-exe-failed-with-exit-status-1158/44563421#44563421.
> But the second issue is still there; there was no solution to be found by googling either. 
> How can this issue be prevented so that airflow can be installed on Windows 10 x64?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-1753) Can't install on windows 10

2019-11-07 Thread Jon Hanson (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969391#comment-16969391
 ] 

Jon Hanson commented on AIRFLOW-1753:
-

Even if you get past that, the same script (daemon.py) fails on {{import 
resource}}. The entire daemon.py script looks very unix-specific, so I think it 
would need to be rewritten wholesale for Windows.

> Can't install on windows 10
> ---
>
> Key: AIRFLOW-1753
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1753
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Lakshman Udayakantha
>Priority: Major
>
> When I installed Airflow using the "pip install airflow" command, two errors 
> popped up:
> 1. link.exe failed with exit status 1158
> 2. \x86_amd64\\cl.exe' failed with exit status 2
> The first issue can be solved by referring to 
> https://stackoverflow.com/questions/43858836/python-installing-clarifai-vs14-0-link-exe-failed-with-exit-status-1158/44563421#44563421.
> But the second issue is still there, and googling turned up no solution 
> either. How can that issue be prevented so Airflow can be installed on 
> Windows 10 x64?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-1753) Can't install on windows 10

2019-11-07 Thread jack (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969386#comment-16969386
 ] 

jack commented on AIRFLOW-1753:
---

pwd is a built-in module (it comes with the Python installation) that is 
available only on Unix-like operating systems.
For Windows, maybe winpwd could work.
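
A minimal sketch of a portable fallback, assuming only the current user name is 
needed ({{getpass}} is in the standard library; winpwd is not used here):

{code:python}
# Minimal sketch: pwd exists only on Unix-like systems, which is why
# daemon.py's module-level "import pwd" raises ModuleNotFoundError on Windows.
try:
    import os
    import pwd  # Unix-only

    def current_user():
        return pwd.getpwuid(os.getuid()).pw_name
except ImportError:
    import getpass  # standard-library, cross-platform fallback

    def current_user():
        return getpass.getuser()


print(current_user())
{code}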

> Can't install on windows 10
> ---
>
> Key: AIRFLOW-1753
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1753
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Lakshman Udayakantha
>Priority: Major
>
> When I installed Airflow using the "pip install airflow" command, two errors 
> popped up:
> 1. link.exe failed with exit status 1158
> 2. \x86_amd64\\cl.exe' failed with exit status 2
> The first issue can be solved by referring to 
> https://stackoverflow.com/questions/43858836/python-installing-clarifai-vs14-0-link-exe-failed-with-exit-status-1158/44563421#44563421.
> But the second issue is still there, and googling turned up no solution 
> either. How can that issue be prevented so Airflow can be installed on 
> Windows 10 x64?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (AIRFLOW-1753) Can't install on windows 10

2019-11-07 Thread Jon Hanson (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969376#comment-16969376
 ] 

Jon Hanson edited comment on AIRFLOW-1753 at 11/7/19 4:12 PM:
--

[~ash]

If you get it to build, and then run {{airflow initdb}}, it will fail as it's 
using daemon.py, which is importing two Unix-only modules (pwd & resource):

 

{{File "J:\python\python38\lib\site-packages\daemon\daemon.py", line 25, in 
}}
 {{import pwd}}
 {{ModuleNotFoundError: No module named 'pwd'}}


was (Author: jonhanson):
@[~ash]

If you get it to build, and then run {{airflow initdb}}, it will fail as it's 
using daemon.py, which is importing two Unix-only modules (pwd & resource):

 

{{File "J:\python\python38\lib\site-packages\daemon\daemon.py", line 25, in 
}}
 {{import pwd}}
 {{ModuleNotFoundError: No module named 'pwd'}}

> Can't install on windows 10
> ---
>
> Key: AIRFLOW-1753
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1753
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Lakshman Udayakantha
>Priority: Major
>
> When I installed Airflow using the "pip install airflow" command, two errors 
> popped up:
> 1. link.exe failed with exit status 1158
> 2. \x86_amd64\\cl.exe' failed with exit status 2
> The first issue can be solved by referring to 
> https://stackoverflow.com/questions/43858836/python-installing-clarifai-vs14-0-link-exe-failed-with-exit-status-1158/44563421#44563421.
> But the second issue is still there, and googling turned up no solution 
> either. How can that issue be prevented so Airflow can be installed on 
> Windows 10 x64?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (AIRFLOW-1753) Can't install on windows 10

2019-11-07 Thread Jon Hanson (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969376#comment-16969376
 ] 

Jon Hanson edited comment on AIRFLOW-1753 at 11/7/19 4:12 PM:
--

@[~ash]

If you get it to build, and then run {{airflow initdb}}, it will fail as it's 
using daemon.py, which is importing two Unix-only modules (pwd & resource):

 

{{File "J:\python\python38\lib\site-packages\daemon\daemon.py", line 25, in 
}}
 {{import pwd}}
 {{ModuleNotFoundError: No module named 'pwd'}}


was (Author: jonhanson):
If you get it to build, and then run {{airflow initdb}}, it will fail as it's 
using daemon.py, which is importing two Unix-only modules (pwd & resource):

 

{{File "J:\python\python38\lib\site-packages\daemon\daemon.py", line 25, in 
}}
{{import pwd}}
{{ModuleNotFoundError: No module named 'pwd'}}

> Can't install on windows 10
> ---
>
> Key: AIRFLOW-1753
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1753
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Lakshman Udayakantha
>Priority: Major
>
> When I installed Airflow using the "pip install airflow" command, two errors 
> popped up:
> 1. link.exe failed with exit status 1158
> 2. \x86_amd64\\cl.exe' failed with exit status 2
> The first issue can be solved by referring to 
> https://stackoverflow.com/questions/43858836/python-installing-clarifai-vs14-0-link-exe-failed-with-exit-status-1158/44563421#44563421.
> But the second issue is still there, and googling turned up no solution 
> either. How can that issue be prevented so Airflow can be installed on 
> Windows 10 x64?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (AIRFLOW-1753) Can't install on windows 10

2019-11-07 Thread Jon Hanson (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969376#comment-16969376
 ] 

Jon Hanson edited comment on AIRFLOW-1753 at 11/7/19 4:11 PM:
--

If you get it to build, and then run {{airflow initdb}}, it will fail as it's 
using daemon.py, which is importing two Unix-only modules (pwd & resource):

 

{{File "J:\python\python38\lib\site-packages\daemon\daemon.py", line 25, in 
}}
{{import pwd}}
{{ModuleNotFoundError: No module named 'pwd'}}


was (Author: jonhanson):
If you get it to build, and then run {{airflow initdb}}, it will fail as it's 
using daemon.py, which is importing two Unix-only modules (pwd & resource):

 

{{ File "J:\python\python38\lib\site-packages\daemon\daemon.py", line 25, in 
}}
{{ import pwd}}
{{ModuleNotFoundError: No module named 'pwd'}}

> Can't install on windows 10
> ---
>
> Key: AIRFLOW-1753
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1753
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Lakshman Udayakantha
>Priority: Major
>
> When I installed Airflow using the "pip install airflow" command, two errors 
> popped up:
> 1. link.exe failed with exit status 1158
> 2. \x86_amd64\\cl.exe' failed with exit status 2
> The first issue can be solved by referring to 
> https://stackoverflow.com/questions/43858836/python-installing-clarifai-vs14-0-link-exe-failed-with-exit-status-1158/44563421#44563421.
> But the second issue is still there, and googling turned up no solution 
> either. How can that issue be prevented so Airflow can be installed on 
> Windows 10 x64?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-1753) Can't install on windows 10

2019-11-07 Thread Jon Hanson (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969376#comment-16969376
 ] 

Jon Hanson commented on AIRFLOW-1753:
-

If you get it to build, and then run {{airflow initdb}}, it will fail as it's 
using daemon.py, which is importing two Unix-only modules (pwd & resource):

 

{{ File "J:\python\python38\lib\site-packages\daemon\daemon.py", line 25, in 
}}
{{ import pwd}}
{{ModuleNotFoundError: No module named 'pwd'}}

> Can't install on windows 10
> ---
>
> Key: AIRFLOW-1753
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1753
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Lakshman Udayakantha
>Priority: Major
>
> When I installed Airflow using the "pip install airflow" command, two errors 
> popped up:
> 1. link.exe failed with exit status 1158
> 2. \x86_amd64\\cl.exe' failed with exit status 2
> The first issue can be solved by referring to 
> https://stackoverflow.com/questions/43858836/python-installing-clarifai-vs14-0-link-exe-failed-with-exit-status-1158/44563421#44563421.
> But the second issue is still there, and googling turned up no solution 
> either. How can that issue be prevented so Airflow can be installed on 
> Windows 10 x64?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] KKcorps commented on issue #6515: [AIRFLOW-XXX] GSoD: How to make DAGs production ready

2019-11-07 Thread GitBox
KKcorps commented on issue #6515: [AIRFLOW-XXX] GSoD: How to make DAGs 
production ready
URL: https://github.com/apache/airflow/pull/6515#issuecomment-551142274
 
 
   Surge UI - 
http://airflow.kharekartik-prod-dag.surge.sh/howto/dags-in-production.html


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] KKcorps opened a new pull request #6515: [AIRFLOW-XXX] GSoD: How to make DAGs production ready

2019-11-07 Thread GitBox
KKcorps opened a new pull request #6515: [AIRFLOW-XXX] GSoD: How to make DAGs 
production ready
URL: https://github.com/apache/airflow/pull/6515
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation, you can prepend your 
commit with \[AIRFLOW-XXX\]; code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what they do
 - If you implement backwards incompatible changes, please leave a note in 
the [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] baolsen commented on issue #6512: [AIRFLOW-5824] Added AWS DataSync Hook and Operator

2019-11-07 Thread GitBox
baolsen commented on issue #6512: [AIRFLOW-5824] Added AWS DataSync Hook and 
Operator
URL: https://github.com/apache/airflow/pull/6512#issuecomment-551119608
 
 
   I split up the monolithic DataSync operator into separate operators for 
Create, Get, Update, Delete, and Execute. This provides more fine-grained 
control when working with AWS DataSync and works better for users in 
read-only scenarios, or in scenarios where new Tasks _must_ be created every time.
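
   A rough sketch of how the split operators could be chained in a DAG; the 
operator names, import path, and arguments below are hypothetical placeholders 
rather than the exact ones added in this PR:

```python
# Hypothetical sketch only: the operator names, import path, and arguments
# below are placeholders and may not match what this PR actually adds.
from datetime import datetime

from airflow import DAG
# Hypothetical import path for the split DataSync operators:
from airflow.contrib.operators.aws_datasync_operator import (
    AWSDataSyncCreateTaskOperator,
    AWSDataSyncTaskOperator,
)

with DAG("datasync_example",
         start_date=datetime(2019, 11, 1),
         schedule_interval=None) as dag:

    create_task = AWSDataSyncCreateTaskOperator(   # hypothetical: create a DataSync Task
        task_id="create_datasync_task",
        source_location_uri="smb://fileserver/share",         # placeholder
        destination_location_uri="s3://example-bucket/path",  # placeholder
    )

    run_task = AWSDataSyncTaskOperator(            # hypothetical: start and monitor the Task
        task_id="run_datasync_task",
    )

    create_task >> run_task
```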


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow-site] mik-laj closed pull request #124: Suggest a change button

2019-11-07 Thread GitBox
mik-laj closed pull request #124: Suggest a change button
URL: https://github.com/apache/airflow-site/pull/124
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow-site] mik-laj commented on issue #106: [depends on #124] Roadmap

2019-11-07 Thread GitBox
mik-laj commented on issue #106: [depends on #124] Roadmap
URL: https://github.com/apache/airflow-site/pull/106#issuecomment-551118740
 
 
   Superseded by: https://github.com/apache/airflow-site/pull/138


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow-site] mik-laj closed pull request #100: [depends on #91] Docs

2019-11-07 Thread GitBox
mik-laj closed pull request #100: [depends on #91] Docs
URL: https://github.com/apache/airflow-site/pull/100
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow-site] mik-laj commented on issue #125: [depends on #106] Feature/tracking progress

2019-11-07 Thread GitBox
mik-laj commented on issue #125: [depends on #106] Feature/tracking progress
URL: https://github.com/apache/airflow-site/pull/125#issuecomment-551118782
 
 
   Superseded by: https://github.com/apache/airflow-site/pull/138


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow-site] mik-laj edited a comment on issue #100: [depends on #91] Docs

2019-11-07 Thread GitBox
mik-laj edited a comment on issue #100: [depends on #91] Docs
URL: https://github.com/apache/airflow-site/pull/100#issuecomment-551118349
 
 
   Superseded by: https://github.com/apache/airflow-site/pull/138


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow-site] mik-laj closed pull request #106: [depends on #124] Roadmap

2019-11-07 Thread GitBox
mik-laj closed pull request #106: [depends on #124] Roadmap
URL: https://github.com/apache/airflow-site/pull/106
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow-site] mik-laj commented on issue #124: Suggest a change button

2019-11-07 Thread GitBox
mik-laj commented on issue #124: Suggest a change button
URL: https://github.com/apache/airflow-site/pull/124#issuecomment-551119041
 
 
   Superseded by: https://github.com/apache/airflow-site/pull/138


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow-site] mik-laj closed pull request #125: [depends on #106] Feature/tracking progress

2019-11-07 Thread GitBox
mik-laj closed pull request #125: [depends on #106] Feature/tracking progress
URL: https://github.com/apache/airflow-site/pull/125
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow-site] mik-laj commented on issue #100: [depends on #91] Docs

2019-11-07 Thread GitBox
mik-laj commented on issue #100: [depends on #91] Docs
URL: https://github.com/apache/airflow-site/pull/100#issuecomment-551118349
 
 
   Superseded by: https://github.com/apache/airflow-site/pull/138


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

