[jira] [Commented] (AIRFLOW-3561) Improve some views

2018-12-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16728013#comment-16728013
 ] 

ASF GitHub Bot commented on AIRFLOW-3561:
-

ffinfo commented on pull request #4368: AIRFLOW-3561 - improve queries
URL: https://github.com/apache/incubator-airflow/pull/4368
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-3561\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3561
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [ ] Passes `flake8`
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve some views
> --
>
>     Key: AIRFLOW-3561
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3561
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Reporter: Peter van 't Hof
>Assignee: Peter van 't Hof
>Priority: Minor
>
> Some views interact with the dag bag even though it is not needed for the 
> query.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3561) Improve some views

2018-12-23 Thread Peter van 't Hof (JIRA)
Peter van 't Hof created AIRFLOW-3561:
-

 Summary: Improve some views
 Key: AIRFLOW-3561
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3561
 Project: Apache Airflow
  Issue Type: Improvement
  Components: webserver
Reporter: Peter van 't Hof
Assignee: Peter van 't Hof


Some views interact with the dag bag even though it is not needed for the query.
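As a hedged illustration of the kind of change this implies (the actual patch is in PR #4368 above), a view can often answer its question straight from the metadata database without loading the DagBag:

{code:python}
# Illustrative sketch only, not code from the PR. DagRun and
# provide_session are real Airflow names; last_run is a hypothetical helper.
from airflow.models import DagRun
from airflow.utils.db import provide_session

@provide_session
def last_run(dag_id, session=None):
    # A plain DB query needs no DagBag parsing at all.
    return (session.query(DagRun)
            .filter(DagRun.dag_id == dag_id)
            .order_by(DagRun.execution_date.desc())
            .first())
{code}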



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3551) Improve BashOperator Test Coverage

2018-12-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727999#comment-16727999
 ] 

ASF GitHub Bot commented on AIRFLOW-3551:
-

feluelle commented on pull request #4367: [AIRFLOW-3551] Improve BashOperator 
Test Coverage
URL: https://github.com/apache/incubator-airflow/pull/4367
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3551
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   - adds test case for xcom_push=True
   - refactoring
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve BashOperator Test Coverage
> --
>
>     Key: AIRFLOW-3551
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3551
> Project: Apache Airflow
>  Issue Type: Test
>Reporter: Felix Uellendall
>Assignee: Felix Uellendall
>Priority: Minor
>
> The current tests for the `BashOperator` are not covering
> * pre_exec
> * xcom_push_flag



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3551) Improve BashOperator Test Coverage

2018-12-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727998#comment-16727998
 ] 

ASF GitHub Bot commented on AIRFLOW-3551:
-

feluelle commented on pull request #4366: [AIRFLOW-3551] Improve BashOperator 
Test Coverage
URL: https://github.com/apache/incubator-airflow/pull/4366
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve BashOperator Test Coverage
> --
>
> Key: AIRFLOW-3551
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3551
> Project: Apache Airflow
>  Issue Type: Test
>Reporter: Felix Uellendall
>Assignee: Felix Uellendall
>Priority: Minor
>
> The current tests for the `BashOperator` are not covering
> * pre_exec
> * xcom_push_flag



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3552) Add ImapToS3TransferOperator

2018-12-23 Thread Felix Uellendall (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727995#comment-16727995
 ] 

Felix Uellendall commented on AIRFLOW-3552:
---

Sorry. I referenced the wrong id by mistake.

> Add ImapToS3TransferOperator
> 
>
> Key: AIRFLOW-3552
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3552
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Felix Uellendall
>Assignee: Felix Uellendall
>Priority: Major
>
> This operator transfers mail attachments from a mail server to an Amazon S3 
> bucket.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3552) Add ImapToS3TransferOperator

2018-12-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727993#comment-16727993
 ] 

ASF GitHub Bot commented on AIRFLOW-3552:
-

feluelle commented on pull request #4366: [AIRFLOW-3552] Improve BashOperator 
Test Coverage
URL: https://github.com/apache/incubator-airflow/pull/4366
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3551
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   - adds test case for xcom_push=True
   - refactoring
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add ImapToS3TransferOperator
> 
>
>     Key: AIRFLOW-3552
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3552
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Felix Uellendall
>Assignee: Felix Uellendall
>Priority: Major
>
> This operator transfers mail attachments from a mail server to an Amazon S3 
> bucket.
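For a sense of the data flow (not the PR's hook-based implementation), here is a hedged sketch using plain imaplib and boto3; the host, bucket and prefix are placeholders:

{code:python}
# Illustrative sketch of the transfer: pull attachments over IMAP and
# upload each to S3. All names and arguments here are assumptions.
import email
import imaplib
import boto3

def transfer_attachments(host, user, password, bucket, prefix):
    conn = imaplib.IMAP4_SSL(host)
    conn.login(user, password)
    conn.select('INBOX')
    _, data = conn.search(None, 'ALL')
    s3 = boto3.client('s3')
    for num in data[0].split():
        _, msg_data = conn.fetch(num, '(RFC822)')
        msg = email.message_from_bytes(msg_data[0][1])
        for part in msg.walk():
            filename = part.get_filename()
            if filename:  # only parts that are actual attachments
                s3.put_object(Bucket=bucket, Key=prefix + filename,
                              Body=part.get_payload(decode=True))
    conn.logout()
{code}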



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3551) Improve BashOperator Test Coverage

2018-12-23 Thread Felix Uellendall (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727991#comment-16727991
 ] 

Felix Uellendall commented on AIRFLOW-3551:
---

You probably mean the `pre_execute` function from the BaseOperator. I mean the 
`pre_exec` function (see above).

> Improve BashOperator Test Coverage
> --
>
> Key: AIRFLOW-3551
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3551
> Project: Apache Airflow
>  Issue Type: Test
>Reporter: Felix Uellendall
>Assignee: Felix Uellendall
>Priority: Minor
>
> The current tests for the `BashOperator` are not covering
> * pre_exec
> * xcom_push_flag



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AIRFLOW-3551) Improve BashOperator Test Coverage

2018-12-23 Thread Felix Uellendall (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727987#comment-16727987
 ] 

Felix Uellendall edited comment on AIRFLOW-3551 at 12/23/18 3:52 PM:
-

I think it is not really testable because it is an inner function and I cannot 
directly access it.
{code:python}
def pre_exec():
    # Restore default signal disposition and invoke setsid
    for sig in ('SIGPIPE', 'SIGXFZ', 'SIGXFSZ'):
        if hasattr(signal, sig):
            signal.signal(getattr(signal, sig), signal.SIG_DFL)
    os.setsid()

self.log.info('Running command: %s', self.bash_command)
sub_process = Popen(
    ['bash', tmp_file.name],
    stdout=PIPE,
    stderr=STDOUT,
    cwd=tmp_dir,
    env=self.env,
    preexec_fn=pre_exec)
{code}


was (Author: feluelle):
I think it is not really testable because it is an inner function and I cannot 
directly access it.
{code:python}
def pre_exec():
    # Restore default signal disposition and invoke setsid
    for sig in ('SIGPIPE', 'SIGXFZ', 'SIGXFSZ'):
        if hasattr(signal, sig):
            signal.signal(getattr(signal, sig), signal.SIG_DFL)
    os.setsid()

self.log.info('Running command: %s', self.bash_command)
sub_process = Popen(
    ['bash', tmp_file.name],
    stdout=PIPE,
    stderr=STDOUT,
    cwd=tmp_dir,
    env=self.env,
    preexec_fn=pre_exec)
{code}

If we move it outside, we can test it. Do you think the `pre_exec` function 
should be a public function in the `BaseOperator`?


> Improve BashOperator Test Coverage
> --
>
> Key: AIRFLOW-3551
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3551
> Project: Apache Airflow
>  Issue Type: Test
>Reporter: Felix Uellendall
>Assignee: Felix Uellendall
>Priority: Minor
>
> The current tests for the `BashOperator` are not covering
> * pre_exec
> * xcom_push_flag



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3551) Improve BashOperator Test Coverage

2018-12-23 Thread Felix Uellendall (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727987#comment-16727987
 ] 

Felix Uellendall commented on AIRFLOW-3551:
---

I think it is not really testable because it is an inner function and I cannot 
directly access it.
{code:python}
def pre_exec():
    # Restore default signal disposition and invoke setsid
    for sig in ('SIGPIPE', 'SIGXFZ', 'SIGXFSZ'):
        if hasattr(signal, sig):
            signal.signal(getattr(signal, sig), signal.SIG_DFL)
    os.setsid()

self.log.info('Running command: %s', self.bash_command)
sub_process = Popen(
    ['bash', tmp_file.name],
    stdout=PIPE,
    stderr=STDOUT,
    cwd=tmp_dir,
    env=self.env,
    preexec_fn=pre_exec)
{code}

If we move it outside, we can test it. Do you think the `pre_exec` function 
should be a public function in the `BaseOperator`?
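A hedged sketch of the idea being discussed: if `pre_exec` were hoisted to module level, it could be unit-tested by patching its side effects. The test below is illustrative, not part of the eventual PR:

{code:python}
import os
import signal
from unittest import mock

def pre_exec():
    # Copied from the snippet above, hoisted out of execute().
    for sig in ('SIGPIPE', 'SIGXFZ', 'SIGXFSZ'):
        if hasattr(signal, sig):
            signal.signal(getattr(signal, sig), signal.SIG_DFL)
    os.setsid()

def test_pre_exec():
    with mock.patch('os.setsid') as setsid, \
            mock.patch('signal.signal') as sig:
        pre_exec()
        setsid.assert_called_once_with()
        assert sig.call_count >= 1  # SIGPIPE exists on every POSIX platform
{code}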


> Improve BashOperator Test Coverage
> --
>
> Key: AIRFLOW-3551
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3551
> Project: Apache Airflow
>  Issue Type: Test
>Reporter: Felix Uellendall
>Assignee: Felix Uellendall
>Priority: Minor
>
> The current tests for the `BashOperator` are not covering
> * pre_exec
> * xcom_push_flag



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-32) Remove deprecated features prior to releasing Airflow 2.0

2018-12-23 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-32?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727965#comment-16727965
 ] 

jack commented on AIRFLOW-32:
-

All tasks in this ticket have been merged.

Can it be closed?

> Remove deprecated features prior to releasing Airflow 2.0
> -
>
> Key: AIRFLOW-32
> URL: https://issues.apache.org/jira/browse/AIRFLOW-32
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Jeremiah Lowin
>Priority: Major
>  Labels: deprecated
> Fix For: 2.0.0
>
>
> A number of features have been marked for deprecation in Airflow 2.0. They 
> need to be deleted prior to release. 
> Usually the error message or comments will mention Airflow 2.0 with either a 
> #TODO or #FIXME.
> Tracking list (not necessarily complete!):
> JIRA:
> AIRFLOW-31
> AIRFLOW-200
> GitHub:
> https://github.com/airbnb/airflow/pull/1137/files#diff-1c2404a3a60f829127232842250ff406R233
> https://github.com/airbnb/airflow/pull/1219
> https://github.com/airbnb/airflow/pull/1285



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2319) Table "dag_run" has (bad) second index on (dag_id, execution_date)

2018-12-23 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727962#comment-16727962
 ] 

jack commented on AIRFLOW-2319:
---

[~akoeltringer] I think officially there are only 3: SQLite, PostgreSQL and 
MySQL, as these are the only DBs being tested with Travis.

> Table "dag_run" has (bad) second index on (dag_id, execution_date)
> --
>
> Key: AIRFLOW-2319
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2319
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DagRun
>Affects Versions: 1.9.0
>Reporter: Andreas Költringer
>Priority: Major
>
> Inserting DagRun's via {{airflow.api.common.experimental.trigger_dag}} 
> (multiple rows with the same {{(dag_id, execution_date)}}) raised the 
> following error:
> {code:java}
> {models.py:1644} ERROR - No row was found for one(){code}
> This is weird as the {{session.add()}} and {{session.commit()}} is right 
> before {{run.refresh_from_db()}} in {{models.DAG.create_dagrun()}}.
> Manually inspecting the database revealed that there is an extra index with 
> {{unique}} constraint on the columns {{(dag_id, execution_date)}}:
> {code:java}
> sqlite> .schema dag_run
> CREATE TABLE dag_run (
>     id INTEGER NOT NULL, 
>     dag_id VARCHAR(250), 
>     execution_date DATETIME, 
>     state VARCHAR(50), 
>     run_id VARCHAR(250), 
>     external_trigger BOOLEAN, conf BLOB, end_date DATETIME, start_date 
> DATETIME, 
>     PRIMARY KEY (id), 
>     UNIQUE (dag_id, execution_date), 
>     UNIQUE (dag_id, run_id), 
>     CHECK (external_trigger IN (0, 1))
> );
> CREATE INDEX dag_id_state ON dag_run (dag_id, state);{code}
> (On SQLite it's a unique constraint, on MariaDB it's also an index)
> The {{DagRun}} class in {{models.py}} does not reflect this, however it is in 
> [migrations/versions/1b38cef5b76e_add_dagrun.py|https://github.com/apache/incubator-airflow/blob/master/airflow/migrations/versions/1b38cef5b76e_add_dagrun.py#L42]
> I looked for other migrations correcting this, but could not find any. As this 
> is not reflected in the model, I guess this is a bug?
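If this is confirmed as a bug, a corrective migration might look like the hedged sketch below; no such migration exists at the time of this thread, the constraint name varies per backend, and SQLite would need batch_alter_table:

{code:python}
# Hypothetical Alembic migration; revision ids and constraint name are
# placeholders, not real Airflow migrations.
from alembic import op

revision = 'deadbeef0001'
down_revision = '1b38cef5b76e'

def upgrade():
    op.drop_constraint('dag_run_dag_id_execution_date_key', 'dag_run',
                       type_='unique')

def downgrade():
    op.create_unique_constraint('dag_run_dag_id_execution_date_key',
                                'dag_run', ['dag_id', 'execution_date'])
{code}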



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3045) Duplicate entry error with MySQL when update task_instances

2018-12-23 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727960#comment-16727960
 ] 

jack commented on AIRFLOW-3045:
---

Shouldn't this break any MySQL back-end installed since 1.10.0?

> Duplicate entry error with MySQL when update task_instances
> ---
>
> Key: AIRFLOW-3045
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3045
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: database
>Affects Versions: 1.10.0
>Reporter: Haotian Wu
>Assignee: Haotian Wu
>Priority: Major
>
> h3. How to reproduce
> # Set up apache-airflow==1.10.0 with MySQL, bring up both webserver and 
> scheduler.
> # Add a DAG; it becomes running but none of the tasks will actually be 
> executed.
> # Manually trigger another run for the same dag, airflow scheduler will crash 
> with error {{sqlalchemy.exc.IntegrityError: 
> (_mysql_exceptions.IntegrityError) (1062, "Duplicate entry 
> 'xxx-yy--MM-DD ...' for key 'PRIMARY'")}}
> h3. The Reason
> In Airflow 1.10.0, the execution_date field of task_instance changed from 
> DateTime to Timestamp. However, in MySQL the first Timestamp column in a 
> table is declared with an {{ON UPDATE CURRENT_TIMESTAMP}} clause, so the 
> table will look like below after {{airflow initdb}}:
> | Field | Type | Null | Key | Default | Extra |
> | task_id | varchar(250) | NO | PRI | NULL | |
> | dag_id | varchar(250) | NO | PRI | NULL | |
> | execution_date | timestamp(6) | NO | PRI | CURRENT_TIMESTAMP(6) | on update CURRENT_TIMESTAMP(6) |
> # When a task_instance is updated from state NULL to state "scheduled", its 
> execution_date is also reset to the current timestamp automatically.
> # A task_instance is linked to a given dag_run by the same execution_date, so 
> a changed execution_date means the task_instance is no longer linked to any 
> known dag_run.
> # If there is more than one dag_run for the same dag, multiple task_instance 
> rows with the same key will be "unlinked" from their dag_run. The Airflow 
> scheduler will try to update them to state NULL and thus try to update them 
> to the same primary key.
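One mitigation consistent with this report is to insist on MySQL's explicit_defaults_for_timestamp mode, which stops MySQL from attaching the implicit ON UPDATE clause. A hedged sketch of such a pre-flight check (placeholder DSN, illustrative error text; not necessarily the actual fix for this ticket):

{code:python}
import sqlalchemy as sa

engine = sa.create_engine('mysql://user:pass@host/airflow')  # placeholder
with engine.connect() as conn:
    enabled = conn.execute(
        sa.text('SELECT @@explicit_defaults_for_timestamp')).scalar()
    if not enabled:
        raise RuntimeError(
            'Enable explicit_defaults_for_timestamp=1 in my.cnf so TIMESTAMP '
            'columns do not get ON UPDATE CURRENT_TIMESTAMP implicitly.')
{code}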



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2609) Fix small issue with the BranchPythonOperator. It currently is skipping tasks it should not.

2018-12-23 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727957#comment-16727957
 ] 

jack commented on AIRFLOW-2609:
---

This ticket needs to stay open. The PR hasn't been merged yet:

https://github.com/apache/incubator-airflow/pull/3530

> Fix small issue with the BranchPythonOperator. It currently is skipping tasks 
> it should not.
> 
>
> Key: AIRFLOW-2609
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2609
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Reporter: Sandro Luck
>Assignee: Sandro Luck
>Priority: Minor
>
> Current behavior: when you branch from A, e.g. the BranchPythonOperator A '->' 
> (B or C), and you also have B '->' C, then C will be skipped even though it's 
> a downstream task of B. Desired behavior: only skip downstream tasks which are 
> not downstream of the branch taken.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3047) HiveCliHook does not work properly with Beeline

2018-12-23 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727950#comment-16727950
 ] 

jack commented on AIRFLOW-3047:
---

[~vladglinskiy] can you submit PR for this?

> HiveCliHook does not work properly with Beeline
> ---
>
> Key: AIRFLOW-3047
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3047
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hive_hooks, hooks
>Affects Versions: 1.10.0
>Reporter: Vladislav Glinskiy
>Priority: Major
>
> A simple _HiveOperator_ does not work properly when the _hive_cli_default_ 
> connection is configured to use _Beeline_.
>  
> *Steps to reproduce:* 
> 1. Setup Hive/HiveServer2 and Airflow environment with _beeline_ in _PATH_
> 2. Create test _datetimes_ table
> As example:
> {code:java}
> CREATE EXTERNAL TABLE datetimes (
> datetimes STRING)
> STORED AS PARQUET
> LOCATION '/opt/apps/datetimes';{code}
>  
> 3. Edit _hive_cli_default_ connection:
> {code:java}
> airflow connections --delete --conn_id hive_cli_default
> airflow connections --add --conn_id hive_cli_default --conn_type hive_cli 
> --conn_host $HOST --conn_port 1 --conn_schema default --conn_login 
> $CONN_LOGIN --conn_password $CONN_PASSWORD --conn_extra "{\"use_beeline\": 
> true, \"auth\": \"null;user=$HS_USER;password=$HS_PASSWORD\"}"
> {code}
> Set variables according to your environment.
>  
> 4. Create a simple DAG:
> {code:java}
> """
> ###
> Sample DAG, which declares a single Hive task.
> """
> import datetime
> import airflow
> from airflow import DAG
> from airflow.operators.hive_operator import HiveOperator
> from datetime import timedelta
> default_args = {
>     'owner': 'airflow',
>     'depends_on_past': False,
>     'start_date': airflow.utils.dates.days_ago(0, hour=0, minute=0, second=1),
>     'email': ['airf...@example.com'],
>     'email_on_failure': False,
>     'email_on_retry': False,
>     'retries': 1,
>     'retry_delay': timedelta(minutes=5),
>     'provide_context': True
> }
> dag = DAG(
>     'hive_task_dag',
>     default_args=default_args,
>     description='Single task DAG',
>     schedule_interval=timedelta(minutes=15))
> insert_current_datetime = HiveOperator(
>     task_id='insert_current_datetime_task',
>     hql="insert into table datetimes values ('" +
>         datetime.datetime.now().strftime("%I:%M%p on %B %d, %Y") + "');",
>     dag=dag)
> dag.doc_md = __doc__
> {code}
>  
> 5. Trigger DAG execution. Ensure that DAG completes successfully.
> 6. Check _datetimes_ table. It will be empty.
>  
> As it turned out, the issue is caused by an invalid temporary script file. The 
> problem is fixed if we add a new-line character at the end of the script.
> So, a possible fix is to change:
> *hive_hooks.py:182*
> {code:java}
> if schema:
>     hql = "USE {schema};\n{hql}".format(**locals())
> {code}
> to
> {code:java}
> if schema:
>     hql = "USE {schema};\n{hql}\n".format(**locals())
> {code}
> I don't know how this affects _hive shell_ queries since it was tested only 
> against _beeline_.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3243) UI task and dag clear feature cannot pick up dag parameters

2018-12-23 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727944#comment-16727944
 ] 

jack commented on AIRFLOW-3243:
---

I don't think the max_active_runs is enforced when clearing tasks.

> UI task and dag clear feature cannot pick up dag parameters
> ---
>
> Key: AIRFLOW-3243
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3243
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: chengningzhang
>Priority: Major
>
> Hi, 
>     I hit an issue with the airflow UI dag and task "clear" feature. When I 
> clear tasks from the UI, the dag parameters are not picked up by the cleared 
> tasks.
>     For example, I have "max_active_runs=1" in my dag parameters, but when I 
> manually clear tasks, this parameter is not picked up. The same cleared tasks 
> with different schedule times will run in parallel. 
>    Is there a way we can improve this, as we may want to backfill some data and 
> just clear past tasks from the airflow UI. 
>  
> Thanks,
> Chengning



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3017) 404 error when opening log in the Web UI

2018-12-23 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727939#comment-16727939
 ] 

jack commented on AIRFLOW-3017:
---

This seems like a local issue in the old UI, which is deprecated as of 2.0.0.

> 404 error when opening log in the Web UI
> 
>
> Key: AIRFLOW-3017
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3017
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Affects Versions: 1.10.0
>Reporter: Victor
>Priority: Major
>
> I opened the logs of one of the tasks of a DAG and saw the following error in 
> the console of my browser:
> GET https://AIRFLOW:8080/admin/admin/admin/js/form-1.0.0.js net::ERR_ABORTED 
> 404
> I suppose there is a typo somewhere in the code…



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2842) GCS rsync operator

2018-12-23 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727842#comment-16727842
 ] 

jack commented on AIRFLOW-2842:
---

[~dlamblin] This can be achieved with the BashOperator, but you could say that 
about anything.

In any case, having an operator for this can make life easier (you don't need 
to manage separate connection files with credentials, etc.).

> GCS rsync operator
> --
>
> Key: AIRFLOW-2842
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2842
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Vikram Oberoi
>Priority: Major
>
> The GoogleCloudStorageToGoogleCloudStorageOperator supports copying objects 
> from one bucket to another using a wildcard.
> As long as you don't delete anything in the source bucket, the destination 
> bucket will end up synchronized on every run.
> However, each object gets copied over even if it exists at the destination, 
> which makes this operation inefficient, time-consuming, and potentially 
> costly.
> I'd love an operator that behaves like `gsutil rsync` for when I need to 
> synchronize two buckets, supporting `gsutil rsync -d` behavior as well.
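The BashOperator workaround mentioned in the comment above might look like this hedged sketch (bucket names are placeholders, `dag` is assumed to be defined, and gcloud credentials must already be available on the worker):

{code:python}
from airflow.operators.bash_operator import BashOperator

# Illustrative only: shell out to gsutil rsync, including -d to mirror
# deletions, as the request describes.
sync_buckets = BashOperator(
    task_id='gcs_rsync',
    bash_command='gsutil rsync -d -r gs://source-bucket gs://dest-bucket',
    dag=dag)
{code}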



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3288) Add SNS integration

2018-12-23 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727827#comment-16727827
 ] 

jack commented on AIRFLOW-3288:
---

[~ashb] This was merged. The ticket can be closed.

> Add SNS integration
> ---
>
> Key: AIRFLOW-3288
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3288
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Reporter: Szymon Bilinski
>Assignee: Szymon Bilinski
>Priority: Major
>
> I'd like to propose a new {{contrib}} hook and a basic operator for 
> publishing *Amazon SNS* notifications.
> Motivation: 
> - Useful for integrating various Amazon services and pretty general in 
> nature: 
> -- AWS SQS
> -- AWS Lambda
> -- E-mail
> -- ... 
> - A similar functionality already 
> [exists|https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/example_dags/example_pubsub_flow.py]
>  for GCP (i.e. Pub/Sub integration).
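For orientation, the operation such a hook would wrap is a single publish call in boto3; this hedged sketch uses a placeholder topic ARN and is not the contrib hook's own API:

{code:python}
import boto3

# Illustrative sketch of the underlying AWS call.
sns = boto3.client('sns')
sns.publish(
    TopicArn='arn:aws:sns:us-east-1:123456789012:my-topic',  # placeholder
    Message='Task finished')
{code}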



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3558) Have tox flake8 skip ignored and hidden directories

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727826#comment-16727826
 ] 

ASF GitHub Bot commented on AIRFLOW-3558:
-

bolkedebruin commented on pull request #4361: [AIRFLOW-3558] Improve default 
tox flake8 excludes
URL: https://github.com/apache/incubator-airflow/pull/4361
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Have tox flake8 skip ignored and hidden directories
> ---
>
> Key: AIRFLOW-3558
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3558
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: holdenk
>Assignee: holdenk
>Priority: Trivial
>
> By default, if you run tox with the flake8 target in Airflow it checks all of 
> the directories, including .eggs, env, etc., all of which are ignored by 
> our gitignore but caught by flake8, giving a bunch of errors for non-Airflow 
> code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3356) Scheduler gets stuck for certain DAGs

2018-12-22 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727825#comment-16727825
 ] 

jack commented on AIRFLOW-3356:
---

I'm having the same issue on 1.9 as well.

Some tasks are stuck on running even though they are not (tasks that take 1-4 
minutes to execute get stuck for hours)... Only clearing them solves this, and 
then they are re-scheduled.

> Scheduler gets stuck for certain DAGs
> -
>
> Key: AIRFLOW-3356
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3356
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.9.0
>Reporter: John Smodic
>Priority: Critical
>
> I observe the scheduler getting stuck for certain DAGs:
> Nov 15 19:11:48 ip-172-16-13-120 python3.6[1319]: File Path PID Runtime Last 
> Runtime Last Run
> Nov 15 19:11:48 ip-172-16-13-120 python3.6[1319]: 
> /home/ubuntu/airflow/dags/stuck_dag.py 14241 *19977.55s* 1.05s 
> 2018-11-15T13:38:47
> Nov 15 19:11:48 ip-172-16-13-120 python3.6[1319]: 
> /home/ubuntu/airflow/dags/not_stuck_dag.py 19906 0.05s 1.05s 
> 2018-11-15T19:11:44
>  
> The "Runtime" of the stuck DAG's scheduling process is huge and I can't tell 
> what it's doing. There's no mention of that DAG in the scheduler logs 
> otherwise.
>  
> The mapped process looks like this:
> ubuntu 14241 0.0 0.3 371132 63232 ? S 13:38 0:00 /usr/bin/python3.6 
> /usr/local/bin/airflow scheduler
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1684) Branching based on XCOM variable

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727813#comment-16727813
 ] 

ASF GitHub Bot commented on AIRFLOW-1684:
-

eladkal commented on pull request #4365: [AIRFLOW-1684] - Branching based on 
XCom variable (Docs)
URL: https://github.com/apache/incubator-airflow/pull/4365
 
 
   Elaborate how to use branching with XComs
   
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW-1684) issues and references 
them in the PR title.
   
   ### Description
   
   - Elaborate how to use branching with XComs
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Branching based on XCOM variable
> 
>
> Key: AIRFLOW-1684
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1684
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: xcom
>Affects Versions: 1.7.0
> Environment: Centos 7, Airflow1.7
>Reporter: Virendhar Sivaraman
>Assignee: Elad
>Priority: Major
>
> I would like to branch my dag based on an XCOM variable.
> Steps:
> 1. Populate XCOM in bash
> 2. Pull the XCOM variable in a BranchPythonOperator and branch based on the 
> XCOM variable
> I've tried the documentation and researched on the internet - I haven't been 
> successful.
> This feature will be helpful if it's not available yet.
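A hedged sketch of the pattern the PR documents: a BashOperator pushes a value via XCom and a BranchPythonOperator pulls it to choose a branch. Task ids are illustrative and `dag` is assumed to be defined; the PR's actual wording may differ:

{code:python}
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import BranchPythonOperator

push = BashOperator(
    task_id='push',
    bash_command='echo "branch_a"',  # last line of stdout is pushed to XCom
    xcom_push=True,
    dag=dag)

def choose(**context):
    value = context['ti'].xcom_pull(task_ids='push')
    return 'branch_a' if value == 'branch_a' else 'branch_b'

branch = BranchPythonOperator(
    task_id='branch',
    python_callable=choose,
    provide_context=True,
    dag=dag)

push >> branch
{code}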



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-1684) Branching based on XCOM variable

2018-12-22 Thread Elad (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elad reassigned AIRFLOW-1684:
-

Assignee: Elad

> Branching based on XCOM variable
> 
>
> Key: AIRFLOW-1684
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1684
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: xcom
>Affects Versions: 1.7.0
> Environment: Centos 7, Airflow1.7
>Reporter: Virendhar Sivaraman
>Assignee: Elad
>Priority: Major
>
> I would like to branch my dag based on an XCOM variable.
> Steps:
> 1. Populate XCOM in bash
> 2. Pull the XCOM variable in a BranchPythonOperator and branch based on the 
> XCOM variable
> I've tried the documentation and researched on the internet - I haven't been 
> successful.
> This feature will be helpful if it's not available yet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3550) GKEClusterHook doesn't use gcp_conn_id

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727807#comment-16727807
 ] 

ASF GitHub Bot commented on AIRFLOW-3550:
-

jmcarp commented on pull request #4364: [AIRFLOW-3550] Standardize GKE hook.
URL: https://github.com/apache/incubator-airflow/pull/4364
 
 
   Refactor `GKEClusterHook` to subclass `GoogleCloudBaseHook` and
   authenticate with connection credentials.
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3550
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Refactor `GKEClusterHook` to subclass `GoogleCloudBaseHook` and authenticate 
with connection credentials.
   
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> GKEClusterHook doesn't use gcp_conn_id
> --
>
>     Key: AIRFLOW-3550
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3550
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Affects Versions: 1.10.0, 1.10.1
>Reporter: Wilson Lian
>Priority: Major
>
> The hook doesn't inherit from GoogleCloudBaseHook. API calls are made using 
> the default service account (if present).
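A hedged sketch of the refactor the PR describes: subclass the GCP base hook so API calls use the connection's credentials rather than the ambient default service account. Method bodies here are illustrative, not the PR's code:

{code:python}
from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
from google.cloud import container_v1

class GKEClusterHookSketch(GoogleCloudBaseHook):
    def __init__(self, gcp_conn_id='google_cloud_default', delegate_to=None):
        super(GKEClusterHookSketch, self).__init__(gcp_conn_id, delegate_to)

    def get_client(self):
        # _get_credentials comes from the base hook and reads the
        # connection configured by gcp_conn_id.
        return container_v1.ClusterManagerClient(
            credentials=self._get_credentials())
{code}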



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3560) Add Sensor that polls until a day of the week

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727764#comment-16727764
 ] 

ASF GitHub Bot commented on AIRFLOW-3560:
-

kaxil commented on pull request #4363: [AIRFLOW-3560] Add WeekEnd & DayOfWeek 
Sensors
URL: https://github.com/apache/incubator-airflow/pull/4363
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3560
   
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
One of the use-cases we had is that we wanted to run certain tasks only on 
weekends or certain days of the week. Along the way, I have seen more people 
requiring the same.
   
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   * `DayOfWeekSensorTests`
   * `WeekEndSensorTests`
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Sensor that polls until a day of the week
> -
>
>     Key: AIRFLOW-3560
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3560
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Minor
> Fix For: 1.10.2
>
>
> One of the use-cases we have is that we want to run certain tasks only on weekends.
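A hedged sketch of the sensor idea (the PR's real class and parameter names may differ): poke until the current day matches a target weekday.

{code:python}
from datetime import datetime

from airflow.sensors.base_sensor_operator import BaseSensorOperator
from airflow.utils.decorators import apply_defaults

class DayOfWeekSensorSketch(BaseSensorOperator):
    """Illustrative only: succeed once today is the requested weekday."""

    @apply_defaults
    def __init__(self, week_day, *args, **kwargs):
        super(DayOfWeekSensorSketch, self).__init__(*args, **kwargs)
        self.week_day = week_day  # e.g. 'Saturday'

    def poke(self, context):
        return datetime.now().strftime('%A') == self.week_day
{code}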



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3560) Add Sensor that polls until a day of the week

2018-12-22 Thread Kaxil Naik (JIRA)
Kaxil Naik created AIRFLOW-3560:
---

 Summary: Add Sensor that polls until a day of the week
 Key: AIRFLOW-3560
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3560
 Project: Apache Airflow
  Issue Type: New Feature
Reporter: Kaxil Naik
Assignee: Kaxil Naik
 Fix For: 1.10.2


One of the use-cases we have is that we want to run certain tasks only on weekends.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1077) Subdags can deadlock

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727537#comment-16727537
 ] 

ASF GitHub Bot commented on AIRFLOW-1077:
-

stale[bot] closed pull request #2367: [AIRFLOW-1077] Warn about subdag deadlock 
case
URL: https://github.com/apache/incubator-airflow/pull/2367
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/docs/concepts.rst b/docs/concepts.rst
index 33a6ea44c7..56fbd2a531 100644
--- a/docs/concepts.rst
+++ b/docs/concepts.rst
@@ -457,10 +457,10 @@ Not like this, where the join task is skipped
 
 .. image:: img/branch_bad.png
 
-SubDAGs
+SubDags
 ===
 
-SubDAGs are perfect for repeating patterns. Defining a function that returns a
+SubDags are perfect for repeating patterns. Defining a function that returns a
 DAG object is a nice design pattern when using Airflow.
 
 Airbnb uses the *stage-check-exchange* pattern when loading data. Data is 
staged
@@ -472,13 +472,13 @@ As another example, consider the following DAG:
 
 .. image:: img/subdag_before.png
 
-We can combine all of the parallel ``task-*`` operators into a single SubDAG,
+We can combine all of the parallel ``task-*`` operators into a single SubDag,
 so that the resulting DAG resembles the following:
 
 .. image:: img/subdag_after.png
 
-Note that SubDAG operators should contain a factory method that returns a DAG
-object. This will prevent the SubDAG from being treated like a separate DAG in
+Note that SubDag operators should contain a factory method that returns a DAG
+object. This will prevent the SubDag from being treated like a separate DAG in
 the main UI. For example:
 
 .. code:: python
@@ -503,7 +503,7 @@ the main UI. For example:
 
 return dag
 
-This SubDAG can then be referenced in your main DAG file:
+This SubDag can then be referenced in your main DAG file:
 
 .. code:: python
 
@@ -531,29 +531,36 @@ This SubDAG can then be referenced in your main DAG file:
   )
 
 You can zoom into a SubDagOperator from the graph view of the main DAG to show
-the tasks contained within the SubDAG:
+the tasks contained within the SubDag:
 
 .. image:: img/subdag_zoom.png
 
-Some other tips when using SubDAGs:
+Some other tips when using SubDags:
 
--  by convention, a SubDAG's ``dag_id`` should be prefixed by its parent and
+-  by convention, a SubDag's ``dag_id`` should be prefixed by its parent and
a dot. As in ``parent.child``
--  share arguments between the main DAG and the SubDAG by passing arguments to
-   the SubDAG operator (as demonstrated above)
--  SubDAGs must have a schedule and be enabled. If the SubDAG's schedule is
-   set to ``None`` or ``@once``, the SubDAG will succeed without having done
+-  share arguments between the main DAG and the SubDag by passing arguments to
+   the SubDag operator (as demonstrated above)
+-  SubDags must have a schedule and be enabled. If the SubDag's schedule is
+   set to ``None`` or ``@once``, the SubDag will succeed without having done
anything
 -  clearing a SubDagOperator also clears the state of the tasks within
 -  marking success on a SubDagOperator does not affect the state of the tasks
within
--  refrain from using ``depends_on_past=True`` in tasks within the SubDAG as
+-  refrain from using ``depends_on_past=True`` in tasks within the SubDag as
this can be confusing
--  it is possible to specify an executor for the SubDAG. It is common to use
-   the SequentialExecutor if you want to run the SubDAG in-process and
+-  it is possible to specify an executor for the SubDag. It is common to use
+   the SequentialExecutor if you want to run the SubDag in-process and
effectively limit its parallelism to one. Using LocalExecutor can be
problematic as it may over-subscribe your worker, running multiple tasks in
a single slot
+-  do not create more SubDags than your concurrency limit or the scheduler
+   will deadlock. Each SubDag counts towards your concurrency limit. For
+   example, if you have a concurrency limit of 16 and you have 25 SubDags,
+   the 16 SubDags will be scheduled, effectively blocking any of the tasks
+   within the given SubDags. You can work around this by setting the SubDag's
+   executor to SequentialExecutor. This allows multiple SubDags to run
+   concurrently without locking the tasks within the SubDag
 
 See ``airflow/example_dags`` for a demonstration.
 


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Updated] (AIRFLOW-3150) Make execution_date a template field in TriggerDagRunOperator

2018-12-22 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-3150:

Fix Version/s: 1.10.2

> Make execution_date a template field in TriggerDagRunOperator
> -
>
> Key: AIRFLOW-3150
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3150
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Reporter: Kyle Hamlin
>Assignee: Kaxil Naik
>Priority: Minor
>  Labels: easy-fix
> Fix For: 1.10.2
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3559) Add missing options to DatadogHook

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727533#comment-16727533
 ] 

ASF GitHub Bot commented on AIRFLOW-3559:
-

jmcarp opened a new pull request #4362: [AIRFLOW-3559] Add missing options to 
DatadogHook.
URL: https://github.com/apache/incubator-airflow/pull/4362
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3559
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Adds missing arguments to `DatadogHook`.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   Backfills missing tests for `DatadogHook`
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add missing options to DatadogHook
> --
>
>     Key: AIRFLOW-3559
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3559
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Josh Carp
>Assignee: Josh Carp
>Priority: Trivial
>
> The DataDog hook is missing a few options for creating events and metrics. 
> I'll add those options and backfill unit tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3559) Add missing options to DatadogHook

2018-12-22 Thread Josh Carp (JIRA)
Josh Carp created AIRFLOW-3559:
--

 Summary: Add missing options to DatadogHook
 Key: AIRFLOW-3559
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3559
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Josh Carp
Assignee: Josh Carp


The DataDog hook is missing a few options for creating events and metrics. I'll 
add those options and backfill unit tests.
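For context, the hook wraps the datadog client's event and metric calls; a hedged sketch of the kind of options meant here follows. Which specific options the PR adds is not stated in this thread, and the keys are placeholders:

{code:python}
from datadog import api, initialize

initialize(api_key='...', app_key='...')  # placeholder keys

# Options such as host, tags and type/alert_type are pass-throughs to the
# datadog client.
api.Metric.send(metric='my.metric', points=42,
                host='worker-1', tags=['env:prod'], type='gauge')
api.Event.create(title='backfill done', text='all partitions loaded',
                 tags=['env:prod'], alert_type='info')
{code}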



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3558) Have tox flake8 skip ignored and hidden directories

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727526#comment-16727526
 ] 

ASF GitHub Bot commented on AIRFLOW-3558:
-

holdenk opened a new pull request #4361: [AIRFLOW-3558] Improve default tox 
flake8 excludes
URL: https://github.com/apache/incubator-airflow/pull/4361
 
 
   Right now our gitignore skips a bunch of temporary Python directories
   but our flake8 config will still test against them, leading to
   unnecessary error messages. This changes the excludes
   to skip the common directories that can cause false flake8 failures.
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ X ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [ X ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Right now our gitignore skips a bunch of temporary Python directories
   but our flake8 config will still test against them, leading to
   unnecessary error messages. This changes the excludes
   to skip the common directories that can cause false flake8 failures.
   
   This should not impact end users.
   
   ### Tests
   
   - [ X ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   The existing flake8 env still runs from tox
   
   ### Commits
   
   - [ X ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ X ] In case of new functionality, my PR adds documentation that 
describes how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   No new functionality.
   
   ### Code Quality
   
   - [ X ] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Have tox flake8 skip ignored and hidden directories
> ---
>
>     Key: AIRFLOW-3558
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3558
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: holdenk
>Assignee: holdenk
>Priority: Trivial
>
> By default, if you run tox with the flake8 target in Airflow it checks all of 
> the directories, including .eggs, env, etc., all of which are ignored by 
> our gitignore but caught by flake8, giving a bunch of errors for non-Airflow 
> code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3558) Have tox flake8 skip ignored and hidden directories

2018-12-22 Thread holdenk (JIRA)
holdenk created AIRFLOW-3558:


 Summary: Have tox flake8 skip ignored and hidden directories
 Key: AIRFLOW-3558
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3558
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: holdenk
Assignee: holdenk


By default, if you run tox with the flake8 target in Airflow it checks all of 
the directories, including .eggs, env, etc., all of which are ignored by our 
gitignore but caught by flake8, giving a bunch of errors for non-Airflow code.
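A hedged sketch of the kind of exclude list this implies for the flake8 section of tox.ini; the exact directories in the merged change may differ:

{code}
[flake8]
exclude = .git,.eggs,.tox,*.egg-info,env,venv,build,dist
{code}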



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1191) Contrib Spark Submit hook should permit override of spark-submit cmd

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727522#comment-16727522
 ] 

ASF GitHub Bot commented on AIRFLOW-1191:
-

holdenk opened a new pull request #4360: [AIRFLOW-1191] Simplify override of 
spark submit command
URL: https://github.com/apache/incubator-airflow/pull/4360
 
 
   This will better support distros which ship spark 1 & 2 ( and eventually 3)
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ X ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [ X ] Here are some details about my PR, including screenshots of any UI 
changes:
   
Adds a spark_binary param to the spark submit operator to allow folks to 
more easily configure the operator to use a different binary, as is needed for 
some distros of the Hadoop ecosystem which ship multiple versions of Spark.
   
   ### Tests
   
   - [ X ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   Updates the existing test_spark_submit_operator to check for override
   
   ### Commits
   
   - [ X ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ X ] In case of new functionality, my PR adds documentation that 
describes how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   Docstring updated.
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Contrib Spark Submit hook should permit override of spark-submit cmd
> 
>
>     Key: AIRFLOW-1191
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1191
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib, hooks
>Affects Versions: 1.8.1
> Environment: Cloudera based Spark parcel
>Reporter: Vianney FOUCAULT
>Assignee: Vianney FOUCAULT
>Priority: Major
> Fix For: 1.10.0
>
>
> Using a Cloudera-based cluster with the Spark 2 parcel, which renames 
> spark-submit to spark2-submit.
> It should be possible to change the spark-submit cmd without specifying an 
> env var.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-850) Airflow should support a general purpose PythonSensor

2018-12-22 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-850.

   Resolution: Fixed
Fix Version/s: 1.10.2

Resolved by https://github.com/apache/incubator-airflow/pull/4349

> Airflow should support a general purpose PythonSensor
> -
>
> Key: AIRFLOW-850
> URL: https://issues.apache.org/jira/browse/AIRFLOW-850
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Affects Versions: 1.8.0
>Reporter: Daniel Gies
>Assignee: Daniel Gies
>Priority: Major
> Fix For: 1.10.2
>
>
> Today I found myself trying to use a sensor to postpone execution until data 
> for the current execution date appeared in a file.  It occurred to me that 
> having a general purpose PythonSensor would allow developers to use the 
> sensor paradigm with arbitrary code.
> We should add a PythonSensor to the core sensors module which takes a 
> python_callable and optional args like the PythonOperator does.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-850) Airflow should support a general purpose PythonSensor

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727516#comment-16727516
 ] 

ASF GitHub Bot commented on AIRFLOW-850:


kaxil closed pull request #4349: [AIRFLOW-850] Add a PythonSensor
URL: https://github.com/apache/incubator-airflow/pull/4349
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/sensors/python_sensor.py b/airflow/contrib/sensors/python_sensor.py
new file mode 100644
index 00..68bc7497ea
--- /dev/null
+++ b/airflow/contrib/sensors/python_sensor.py
@@ -0,0 +1,81 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.sensors.base_sensor_operator import BaseSensorOperator
+from airflow.utils.decorators import apply_defaults
+
+
+class PythonSensor(BaseSensorOperator):
+    """
+    Waits for a Python callable to return True.
+
+    User could put input argument in templates_dict
+    e.g. templates_dict = {'start_ds': 1970}
+    and access the argument by calling `kwargs['templates_dict']['start_ds']`
+    in the callable
+
+    :param python_callable: A reference to an object that is callable
+    :type python_callable: python callable
+    :param op_kwargs: a dictionary of keyword arguments that will get unpacked
+        in your function
+    :type op_kwargs: dict
+    :param op_args: a list of positional arguments that will get unpacked when
+        calling your callable
+    :type op_args: list
+    :param provide_context: if set to true, Airflow will pass a set of
+        keyword arguments that can be used in your function. This set of
+        kwargs correspond exactly to what you can use in your jinja
+        templates. For this to work, you need to define `**kwargs` in your
+        function header.
+    :type provide_context: bool
+    :param templates_dict: a dictionary where the values are templates that
+        will get templated by the Airflow engine sometime between
+        ``__init__`` and ``execute`` takes place and are made available
+        in your callable's context after the template has been applied.
+    :type templates_dict: dict of str
+    """
+
+    template_fields = ('templates_dict',)
+    template_ext = tuple()
+
+    @apply_defaults
+    def __init__(
+            self,
+            python_callable,
+            op_args=None,
+            op_kwargs=None,
+            provide_context=False,
+            templates_dict=None,
+            *args, **kwargs):
+        super(PythonSensor, self).__init__(*args, **kwargs)
+        self.python_callable = python_callable
+        self.op_args = op_args or []
+        self.op_kwargs = op_kwargs or {}
+        self.provide_context = provide_context
+        self.templates_dict = templates_dict
+
+    def poke(self, context):
+        if self.provide_context:
+            context.update(self.op_kwargs)
+            context['templates_dict'] = self.templates_dict
+            self.op_kwargs = context
+
+        self.log.info("Poking callable: " + str(self.python_callable))
+        return_value = self.python_callable(*self.op_args, **self.op_kwargs)
+        return bool(return_value)
diff --git a/docs/code.rst b/docs/code.rst
index 61414ecbd6..e890adffec 100644
--- a/docs/code.rst
+++ b/docs/code.rst
@@ -256,6 +256,7 @@ Sensors
 .. autoclass:: airflow.contrib.sensors.imap_attachment_sensor.ImapAttachmentSensor
 .. autoclass:: airflow.contrib.sensors.jira_sensor.JiraSensor
 .. autoclass:: airflow.contrib.sensors.pubsub_sensor.PubSubPullSensor
+.. autoclass:: airflow.contrib.sensors.python_sensor.PythonSensor
 .. autoclass:: airflow.contrib.sensors.qubole_sensor.QuboleSensor
 .. autoclass:: airflow.contrib.sensors.redis_key_sensor.RedisKeySensor
 .. autoclass:: airflow.contrib.sensors.sagemaker_base_sensor.SageMakerBaseSensor
diff --git a/tests/contrib/sensors/test_python_sensor.py 
b/tests/cont
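
For orientation, a minimal usage sketch of the PythonSensor added above (the 
DAG id, task id, and callable are illustrative assumptions, not part of the 
patch):

```python
from datetime import datetime

from airflow import DAG
from airflow.contrib.sensors.python_sensor import PythonSensor


def data_available(**kwargs):
    # Poked repeatedly; the sensor succeeds once this returns a truthy value.
    return kwargs['templates_dict']['start_ds'] == '1970'


dag = DAG(dag_id='python_sensor_example', start_date=datetime(2018, 1, 1))

wait_for_data = PythonSensor(
    task_id='wait_for_data',
    python_callable=data_available,
    provide_context=True,
    templates_dict={'start_ds': '1970'},
    poke_interval=60,
    dag=dag,
)
```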

[jira] [Commented] (AIRFLOW-3557) Various typos

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727513#comment-16727513
 ] 

ASF GitHub Bot commented on AIRFLOW-3557:
-

kaxil closed pull request #4357: [AIRFLOW-3557] Fix various typos
URL: https://github.com/apache/incubator-airflow/pull/4357
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/CHANGELOG.txt b/CHANGELOG.txt
index abb0563d71..98a1103792 100644
--- a/CHANGELOG.txt
+++ b/CHANGELOG.txt
@@ -24,7 +24,7 @@ Improvements:
 [AIRFLOW-2622] Add "confirm=False" option to SFTPOperator
 [AIRFLOW-2662] support affinity & nodeSelector policies for kubernetes executor/operator
 [AIRFLOW-2709] Improve error handling in Databricks hook
-[AIRFLOW-2723] Update lxml dependancy to >= 4.0.
+[AIRFLOW-2723] Update lxml dependency to >= 4.0.
 [AIRFLOW-2763] No precheck mechanism in place during worker initialisation for the connection to metadata database
 [AIRFLOW-2789] Add ability to create single node cluster to DataprocClusterCreateOperator
 [AIRFLOW-2797] Add ability to create Google Dataproc cluster with custom image
@@ -269,7 +269,7 @@ AIRFLOW 1.10.0, 2018-08-03
 [AIRFLOW-2429] Make Airflow flake8 compliant
 [AIRFLOW-2491] Resolve flask version conflict
 [AIRFLOW-2484] Remove duplicate key in MySQL to GCS Op
-[ARIFLOW-2458] Add cassandra-to-gcs operator
+[AIRFLOW-2458] Add cassandra-to-gcs operator
 [AIRFLOW-2477] Improve time units for task duration and landing times charts for RBAC UI
 [AIRFLOW-2474] Only import snakebite if using py2
 [AIRFLOW-48] Parse connection uri querystring
@@ -1504,7 +1504,7 @@ AIRFLOW 1.8.0, 2017-03-12
 [AIRFLOW-784] Pin funcsigs to 1.0.0
 [AIRFLOW-624] Fix setup.py to not import airflow.version as version
 [AIRFLOW-779] Task should fail with specific message when deleted
-[AIRFLOW-778] Fix completey broken MetastorePartitionSensor
+[AIRFLOW-778] Fix completely broken MetastorePartitionSensor
 [AIRFLOW-739] Set pickle_info log to debug
 [AIRFLOW-771] Make S3 logs append instead of clobber
 [AIRFLOW-773] Fix flaky datetime addition in api test
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 556a5d847b..2a60f1dc3c 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -166,10 +166,10 @@ There are three ways to setup an Apache Airflow development environment.
   tox -e py35-backend_mysql
   ```
 
-  If you wish to run individual tests inside of docker enviroment you can do as follows:
+  If you wish to run individual tests inside of Docker environment you can do as follows:
 
   ```bash
-# From the container (with your desired enviroment) with druid hook
+# From the container (with your desired environment) with druid hook
 tox -e py35-backend_mysql -- tests/hooks/test_druid_hook.py
  ```
 
diff --git a/airflow/contrib/hooks/bigquery_hook.py b/airflow/contrib/hooks/bigquery_hook.py
index 5cab013b28..30a16305db 100644
--- a/airflow/contrib/hooks/bigquery_hook.py
+++ b/airflow/contrib/hooks/bigquery_hook.py
@@ -1594,7 +1594,7 @@ def insert_all(self, project_id, dataset_id, table_id,
                 self.log.info('All row(s) inserted successfully: {}:{}.{}'.format(
                     dataset_project_id, dataset_id, table_id))
             else:
-                error_msg = '{} insert error(s) occured: {}:{}.{}. Details: {}'.format(
+                error_msg = '{} insert error(s) occurred: {}:{}.{}. Details: {}'.format(
                     len(resp['insertErrors']),
                     dataset_project_id, dataset_id, table_id, resp['insertErrors'])
                 if fail_on_error:
diff --git a/airflow/contrib/hooks/emr_hook.py b/airflow/contrib/hooks/emr_hook.py
index f9fd3f04de..fcdf4ac848 100644
--- a/airflow/contrib/hooks/emr_hook.py
+++ b/airflow/contrib/hooks/emr_hook.py
@@ -23,7 +23,7 @@
 
 class EmrHook(AwsHook):
     """
-    Interact with AWS EMR. emr_conn_id is only neccessary for using the
+    Interact with AWS EMR. emr_conn_id is only necessary for using the
     create_job_flow method.
     """
 
diff --git a/airflow/executors/celery_executor.py b/airflow/executors/celery_executor.py
index 98ce6efba7..10694ea4b7 100644
--- a/airflow/executors/celery_executor.py
+++ b/airflow/executors/celery_executor.py
@@ -74,7 +74,7 @@ def execute_command(command_to_exec):
 
 class ExceptionWithTraceback(object):
     """
-    Wrapper class used to propogate exceptions to parent processes from subprocesses.
+    Wrapper class used to propagate exceptions to parent processes from subprocesses.
     :param exception: The exception to wrap
     :type exception: Exception
     :param traceback: The stacktrace to wrap
diff -

[jira] [Resolved] (AIRFLOW-3557) Various typos

2018-12-22 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-3557.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

Resolved by https://github.com/apache/incubator-airflow/pull/4357

> Various typos
> -
>
> Key: AIRFLOW-3557
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3557
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Bas Harenslak
>Priority: Major
> Fix For: 2.0.0
>
>
> Fix various typos, checked with 
> [misspell|https://github.com/client9/misspell].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3150) Make execution_date a template field in TriggerDagRunOperator

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727510#comment-16727510
 ] 

ASF GitHub Bot commented on AIRFLOW-3150:
-

kaxil opened a new pull request #4359: [AIRFLOW-3150] Make execution_date 
templated in TriggerDagRunOperator
URL: https://github.com/apache/incubator-airflow/pull/4359
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3150
   
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
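
   As an illustration of what templating enables (a sketch; the task id, the 
target DAG id, and the surrounding DAG object are assumptions, not taken from 
this PR):

   ```python
   from airflow.operators.dagrun_operator import TriggerDagRunOperator

   # With execution_date accepting a templated string, the triggered run
   # can inherit the triggering run's execution date via Jinja.
   trigger = TriggerDagRunOperator(
       task_id='trigger_target_dag',
       trigger_dag_id='target_dag',
       execution_date='{{ execution_date }}',
       dag=dag,
   )
   ```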
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   * `test_trigger_dagrun_with_str_execution_date`
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Make execution_date a template field in TriggerDagRunOperator
> -
>
>     Key: AIRFLOW-3150
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3150
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Reporter: Kyle Hamlin
>Assignee: Kaxil Naik
>    Priority: Minor
>  Labels: easy-fix
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3556) Add a "cross join" function for setting dependencies between two lists of tasks

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727313#comment-16727313
 ] 

ASF GitHub Bot commented on AIRFLOW-3556:
-

BasPH opened a new pull request #4356: [AIRFLOW-3556] Add cross join set 
downstream function
URL: https://github.com/apache/incubator-airflow/pull/4356
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3556
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Add function to set "cross join style" downstream dependencies between two 
list of tasks. For example:
   
   ```
   cross_downstream(from_tasks=[t1, t2, t3], to_tasks=[t4, t5, t6])
   
   Sets dependencies:
   t1 --> t4
  \ /
   t2 -X> t5
  / \
   t3 --> t6
   
   Equivalent to:
   t1.set_downstream(t4)
   t1.set_downstream(t5)
   t1.set_downstream(t6)
   t2.set_downstream(t4)
   t2.set_downstream(t5)
   t2.set_downstream(t6)
   t3.set_downstream(t4)
   t3.set_downstream(t5)
   t3.set_downstream(t6)
   ```
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   HelpersTest.test_cross_downstream()
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add a "cross join" function for setting dependencies between two lists of 
> tasks
> ---
>
> Key: AIRFLOW-3556
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3556
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Bas Harenslak
>Priority: Major
>
> Similar to airflow.utils.helpers.chain(), it would be useful to have a helper 
> function that sets downstream dependencies in a cross join fashion between 
> two lists of tasks.
> For example:
> {code}
> cross_downstream(from_tasks=[t1, t2, t3], to_tasks=[t4, t5, t6])
> Sets dependencies:
> t1 --> t4
>\ /
> t2 -X> t5
>/ \
> t3 --> t6
> Equivalent to:
> t1.set_downstream(t4)
> t1.set_downstream(t5)
> t1.set_downstream(t6)
> t2.set_downstream(t4)
> t2.set_downstream(t5)
> t2.set_downstream(t6)
> t3.set_downstream(t4)
> t3.set_downstream(t5)
> t3.set_downstream(t6){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3557) Various typos

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727311#comment-16727311
 ] 

ASF GitHub Bot commented on AIRFLOW-3557:
-

BasPH opened a new pull request #4357: [AIRFLOW-3557] Fix various typos
URL: https://github.com/apache/incubator-airflow/pull/4357
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3557
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   
   Check source code with [misspell](https://github.com/client9/misspell) and 
fixed various typos.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   No changes to code.
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   No new functionality.
   
   ### Code Quality
   
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Various typos
> -
>
>     Key: AIRFLOW-3557
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3557
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Bas Harenslak
>Priority: Major
>
> Fix various typos, checked with 
> [misspell|https://github.com/client9/misspell].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3556) Add a "cross join" function for setting dependencies between two lists of tasks

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727310#comment-16727310
 ] 

ASF GitHub Bot commented on AIRFLOW-3556:
-

BasPH closed pull request #4356: [AIRFLOW-3556] Add cross join set downstream 
function
URL: https://github.com/apache/incubator-airflow/pull/4356
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/utils/helpers.py b/airflow/utils/helpers.py
index 328147c1cf..5f8c88879c 100644
--- a/airflow/utils/helpers.py
+++ b/airflow/utils/helpers.py
@@ -169,6 +169,37 @@ def chain(*tasks):
         up_task.set_downstream(down_task)
 
 
+def cross_downstream(from_tasks, to_tasks):
+    """
+    Set downstream dependencies for all tasks in from_tasks to all tasks in to_tasks.
+    E.g.: cross_downstream(from_tasks=[t1, t2, t3], to_tasks=[t4, t5, t6])
+    Is equivalent to:
+
+    t1 --> t4
+       \ /
+    t2 -X> t5
+       / \
+    t3 --> t6
+
+    t1.set_downstream(t4)
+    t1.set_downstream(t5)
+    t1.set_downstream(t6)
+    t2.set_downstream(t4)
+    t2.set_downstream(t5)
+    t2.set_downstream(t6)
+    t3.set_downstream(t4)
+    t3.set_downstream(t5)
+    t3.set_downstream(t6)
+
+    :param from_tasks: List of tasks to start from.
+    :type from_tasks: List[airflow.models.BaseOperator]
+    :param to_tasks: List of tasks to set as downstream dependencies.
+    :type to_tasks: List[airflow.models.BaseOperator]
+    """
+    for task in from_tasks:
+        task.set_downstream(to_tasks)
+
+
 def pprinttable(rows):
     """Returns a pretty ascii table from tuples
 
diff --git a/tests/utils/test_helpers.py b/tests/utils/test_helpers.py
index 4cb3e1a1fc..837a79acba 100644
--- a/tests/utils/test_helpers.py
+++ b/tests/utils/test_helpers.py
@@ -20,11 +20,16 @@
 import logging
 import multiprocessing
 import os
-import psutil
 import signal
 import time
 import unittest
+from datetime import datetime
+
+import psutil
+import six
 
+from airflow import DAG
+from airflow.operators.dummy_operator import DummyOperator
 from airflow.utils import helpers
 
 
@@ -210,6 +215,16 @@ def test_is_container(self):
         # Pass an object that is not iter nor a string.
         self.assertFalse(helpers.is_container(10))
 
+    def test_cross_downstream(self):
+        """Test if all dependencies between tasks are all set correctly."""
+        dag = DAG(dag_id="test_dag", start_date=datetime.now())
+        start_tasks = [DummyOperator(task_id="t{i}".format(i=i), dag=dag) for i in range(1, 4)]
+        end_tasks = [DummyOperator(task_id="t{i}".format(i=i), dag=dag) for i in range(4, 7)]
+        helpers.cross_downstream(from_tasks=start_tasks, to_tasks=end_tasks)
+
+        for start_task in start_tasks:
+            six.assertCountEqual(self, start_task.get_direct_relatives(upstream=False), end_tasks)
+
 
 if __name__ == '__main__':
     unittest.main()


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add a "cross join" function for setting dependencies between two lists of 
> tasks
> ---
>
> Key: AIRFLOW-3556
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3556
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Bas Harenslak
>Priority: Major
>
> Similar to airflow.utils.helpers.chain(), it would be useful to have a helper 
> function that sets downstream dependencies in a cross join fashion between 
> two lists of tasks.
> For example:
> {code}
> cross_downstream(from_tasks=[t1, t2, t3], to_tasks=[t4, t5, t6])
> Sets dependencies:
> t1 --> t4
>\ /
> t2 -X> t5
>/ \
> t3 --> t6
> Equivalent to:
> t1.set_downstream(t4)
> t1.set_downstream(t5)
> t1.set_downstream(t6)
> t2.set_downstream(t4)
> t2.set_downstream(t5)
> t2.set_downstream(t6)
> t3.set_downstream(t4)
> t3.set_downstream(t5)
> t3.set_downstream(t6){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3557) Various typos

2018-12-22 Thread Bas Harenslak (JIRA)
Bas Harenslak created AIRFLOW-3557:
--

 Summary: Various typos
 Key: AIRFLOW-3557
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3557
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Bas Harenslak


Fix various typos, checked with [misspell|https://github.com/client9/misspell].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3153) send dag last_run to statsd

2018-12-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727178#comment-16727178
 ] 

ASF GitHub Bot commented on AIRFLOW-3153:
-

stale[bot] closed pull request #3997: [AIRFLOW-3153] send dag last_run to statsd
URL: https://github.com/apache/incubator-airflow/pull/3997
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/jobs.py b/airflow/jobs.py
index da1089d690..94ec4458d8 100644
--- a/airflow/jobs.py
+++ b/airflow/jobs.py
@@ -580,7 +580,8 @@ def __init__(
         self.using_sqlite = False
         if 'sqlite' in conf.get('core', 'sql_alchemy_conn'):
             if self.max_threads > 1:
-                self.log.error("Cannot use more than 1 thread when using sqlite. Setting max_threads to 1")
+                self.log.error("Cannot use more than 1 thread when using sqlite. "
+                               "Setting max_threads to 1")
                 self.max_threads = 1
             self.using_sqlite = True
 
@@ -1026,7 +1027,8 @@ def _change_state_for_tis_without_dagrun(self,
 
         if tis_changed > 0:
             self.log.warning(
-                "Set %s task instances to state=%s as their associated DagRun was not in RUNNING state",
+                "Set %s task instances to state=%s "
+                "as their associated DagRun was not in RUNNING state",
                 tis_changed, new_state
             )
 
@@ -1201,7 +1203,8 @@ def _find_executable_task_instances(self, simple_dag_bag, states, session=None):
                           " this task has been reached.", task_instance)
                 continue
             else:
-                task_concurrency_map[(task_instance.dag_id, task_instance.task_id)] += 1
+                task_concurrency_map[(task_instance.dag_id,
+                                      task_instance.task_id)] += 1
 
             if self.executor.has_task(task_instance):
                 self.log.debug(
@@ -1505,6 +1508,8 @@ def _log_file_processing_stats(self,
                                "Last Run"]
 
         rows = []
+        dags_folder = conf.get('core', 'dags_folder').rstrip(os.sep)
+
         for file_path in known_file_paths:
             last_runtime = processor_manager.get_last_runtime(file_path)
             processor_pid = processor_manager.get_pid(file_path)
@@ -1513,6 +1518,16 @@ def _log_file_processing_stats(self,
                            if processor_start_time else None)
             last_run = processor_manager.get_last_finish_time(file_path)
 
+            file_name = file_path[len(dags_folder) + 1:]
+            dag_name = os.path.splitext(file_name)[0].replace(os.sep, '.')
+            if last_runtime is not None:
+                Stats.gauge('last_runtime.{}'.format(dag_name), last_runtime)
+            if last_run is not None:
+                unixtime = last_run.strftime("%s")
+                seconds_ago = (timezone.utcnow() - last_run).total_seconds()
+                Stats.gauge('last_run.unixtime.{}'.format(dag_name), unixtime)
+                Stats.gauge('last_run.seconds_ago.{}'.format(dag_name), seconds_ago)
+
             rows.append((file_path,
                          processor_pid,
                          runtime,


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> send dag last_run to statsd
> ---
>
> Key: AIRFLOW-3153
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3153
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Tao Feng
>Assignee: Tao Feng
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3556) Add a "cross join" function for setting dependencies between two lists of tasks

2018-12-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727090#comment-16727090
 ] 

ASF GitHub Bot commented on AIRFLOW-3556:
-

BasPH opened a new pull request #4356: [AIRFLOW-3556] Add cross join set 
dependency function
URL: https://github.com/apache/incubator-airflow/pull/4356
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3556
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Add function to set "cross join style" downstream dependencies between two 
list of tasks. For example:
   
   ```
   cross_downstream(from_tasks=[t1, t2, t3], to_tasks=[t4, t5, t6])
   
   Sets dependencies:
   t1 --> t4
  \ /
   t2 -X> t5
  / \
   t3 --> t6
   
   Equivalent to:
   t1.set_downstream(t4)
   t1.set_downstream(t5)
   t1.set_downstream(t6)
   t2.set_downstream(t4)
   t2.set_downstream(t5)
   t2.set_downstream(t6)
   t3.set_downstream(t4)
   t3.set_downstream(t5)
   t3.set_downstream(t6)
   ```
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   HelpersTest.test_cross_downstream()
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add a "cross join" function for setting dependencies between two lists of 
> tasks
> ---
>
> Key: AIRFLOW-3556
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3556
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Bas Harenslak
>Priority: Major
>
> Similar to airflow.utils.helpers.chain(), it would be useful to have a helper 
> function that sets downstream dependencies in a cross join fashion between 
> two lists of tasks.
> For example:
> {code}
> cross_downstream(from_tasks=[t1, t2, t3], to_tasks=[t4, t5, t6])
> Sets dependencies:
> t1 --> t4
>\ /
> t2 -X> t5
>/ \
> t3 --> t6
> Equivalent to:
> t1.set_downstream(t4)
> t1.set_downstream(t5)
> t1.set_downstream(t6)
> t2.set_downstream(t4)
> t2.set_downstream(t5)
> t2.set_downstream(t6)
> t3.set_downstream(t4)
> t3.set_downstream(t5)
> t3.set_downstream(t6){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3556) Add a "cross join" function for setting dependencies between two lists of tasks

2018-12-21 Thread Bas Harenslak (JIRA)
Bas Harenslak created AIRFLOW-3556:
--

 Summary: Add a "cross join" function for setting dependencies 
between two lists of tasks
 Key: AIRFLOW-3556
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3556
 Project: Apache Airflow
  Issue Type: New Feature
Reporter: Bas Harenslak


Similar to airflow.utils.helpers.chain(), it would be useful to have a helper 
function that sets downstream dependencies in a cross join fashion between two 
lists of tasks.

For example:
{code}
cross_downstream(from_tasks=[t1, t2, t3], to_tasks=[t4, t5, t6])

Sets dependencies:
t1 --> t4
   \ /
t2 -X> t5
   / \
t3 --> t6

Equivalent to:
t1.set_downstream(t4)
t1.set_downstream(t5)
t1.set_downstream(t6)
t2.set_downstream(t4)
t2.set_downstream(t5)
t2.set_downstream(t6)
t3.set_downstream(t4)
t3.set_downstream(t5)
t3.set_downstream(t6){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3526) Error is thrown while converting Lables to Key in K8 executor

2018-12-21 Thread Chengzhi Zhao (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727018#comment-16727018
 ] 

Chengzhi Zhao commented on AIRFLOW-3526:


We did upgrade to 1.10.1; we are also using the k8s executor and had a similar 
issue. We had to roll back to 1.10.0. I was wondering if it is related to 
[https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/kubernetes/worker_configuration.py#L219-L224]
 which is missing `try_number`.

> Error is thrown while converting Lables to Key in K8 executor
> -
>
> Key: AIRFLOW-3526
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3526
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: executor, kubernetes, scheduler
>Affects Versions: 1.10.1
>Reporter: raman
>Priority: Major
>
> The following error is thrown continuously in the
> _labels_to_key(self, labels) function in kubernetes_executor:
> {kubernetes_executor.py:455} WARNING - Error while converting labels to key; 
> labels: {u'execution_date': '2018-12-11T00_00_00_plus_00_00', 
> u'airflow-worker': 'ba2dfe4e-d503-490f-ad49-60d7f48efeb6', u'task_id': 
> 'test_task_1', u'dag_id': 'perf_test_500_10_92'}; exception: 'try_number'



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3446) Add operators for Google Cloud BigTable

2018-12-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726969#comment-16726969
 ] 

ASF GitHub Bot commented on AIRFLOW-3446:
-

DariuszAniszewski opened a new pull request #4354: [AIRFLOW-3446] Add Google 
Cloud BigTable operators
URL: https://github.com/apache/incubator-airflow/pull/4354
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following 
[AIRFLOW-3446](https://issues.apache.org/jira/browse/AIRFLOW-3446/) issues and 
references them in the PR title. 
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   The new operators allow:
   * creating and deleting instance
   * creating and deleting table
   * updating cluster
   * waiting for table replication (sensor)
   
   
   ### Tests
   
   - [x] My PR adds the following unit tests:
   * tests/contrib/operators/test_gcp_bigtable_operator.py
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add operators for Google Cloud BigTable
> ---
>
> Key: AIRFLOW-3446
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3446
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Dariusz Aniszewski
>Assignee: Dariusz Aniszewski
>Priority: Major
>
> Proposed operators:
>  * BigTableInstanceCreateOperator
>  * BigTableInstanceDeleteOperator
>  * BigTableTableCreateOperator
>  * BigTableTableDeleteOperator
>  * BigTableClusterUpdateOperator
>  * BigTableTableWaitForReplicationSensor



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3480) Google Cloud Spanner Instance Database Deploy/Update/Delete

2018-12-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726905#comment-16726905
 ] 

ASF GitHub Bot commented on AIRFLOW-3480:
-

potiuk opened a new pull request #4353: [AIRFLOW-3480] Added Database 
Deploy/Update/Delete operators
URL: https://github.com/apache/incubator-airflow/pull/4353
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/AIRFLOW-3480) issue and 
references them in the PR title. 
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Added Database Deploy/Update/Delete operators for Google Cloud Spanner
   
   ### Tests
   
   - [x] My PR adds the following unit tests:
   
   - test_database_create
   - test_database_create_with_pre_existing_db
   - test_database_create_ex_if_param_missing(parameterised)
   - test_database_update
   - test_database_update_ex_if_param_missing(parameterised)
   - test_database_update_ex_if_database_not_exist
   - test_database_delete
   - test_database_delete_exits_and_succeeds_if_database_does_not_exist
   - test_database_delete_ex_if_param_missing (parameterised)

   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Google Cloud Spanner Instance Database Deploy/Update/Delete
> ---
>
> Key: AIRFLOW-3480
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3480
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: gcp
>Reporter: Jarek Potiuk
>Assignee: Jarek Potiuk
>Priority: Minor
>
> We need to have operators to implement Instance management operations:
>  * InstanceDeploy (create database if it does not exist, succeed if already 
> created)
>  * Update (run update_ddl method changing database structure)
>  * Delete (delete the database)
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3551) Improve BashOperator Test Coverage

2018-12-21 Thread Felix Uellendall (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726900#comment-16726900
 ] 

Felix Uellendall commented on AIRFLOW-3551:
---

I can't find it in the BaseOperator's implementation, so at the moment it is 
BashOperator specific.

Do you know how I can access the return value of the execute function when the 
execute function is not explicitly called, the way it is in the test case? One 
possibility is sketched below.
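
One way that might work (a sketch, assuming the test runs against a working 
metadata DB; the DAG and task names are illustrative): run the task through a 
TaskInstance and read the pushed value back from XCom instead of calling 
execute() yourself.

{code}
from datetime import datetime

from airflow import DAG
from airflow.models import TaskInstance
from airflow.operators.bash_operator import BashOperator

DEFAULT_DATE = datetime(2018, 1, 1)

dag = DAG(dag_id='test_bash_xcom', start_date=DEFAULT_DATE)
op = BashOperator(task_id='echo', bash_command='echo hello',
                  xcom_push=True, dag=dag)

ti = TaskInstance(task=op, execution_date=DEFAULT_DATE)
ti.run(ignore_ti_state=True)

# With xcom_push=True, execute() returns the last line of stdout and
# run() stores that return value as the task's XCom.
assert ti.xcom_pull(task_ids='echo') == 'hello'
{code}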

> Improve BashOperator Test Coverage
> --
>
> Key: AIRFLOW-3551
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3551
> Project: Apache Airflow
>  Issue Type: Test
>Reporter: Felix Uellendall
>Assignee: Felix Uellendall
>Priority: Minor
>
> The current tests for the `BashOperator` are not covering
> * pre_exec
> * xcom_push_flag



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3555) Remove lxml dependency

2018-12-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726898#comment-16726898
 ] 

ASF GitHub Bot commented on AIRFLOW-3555:
-

jcao219 opened a new pull request #4352: [AIRFLOW-3555] Remove lxml dependency
URL: https://github.com/apache/incubator-airflow/pull/4352
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3555
   
   ### Description
   
   - [x] The lxml dependency is no longer needed except for when running tests.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason: Dependency clean up
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove lxml dependency
> --
>
>     Key: AIRFLOW-3555
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3555
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: dependencies
>Affects Versions: 1.9.0, 1.10.0, 1.10.1
>Reporter: Jimmy Cao
>Assignee: Jimmy Cao
>Priority: Major
>
> In this PR: 
> [https://github.com/apache/incubator-airflow/pull/1712/files#diff-948e87b4f8f644b3ad8c7950958df033]
>  lxml was added to airflow/www/views.py, and then in this following PR: 
> [https://github.com/apache/incubator-airflow/pull/1722]  the lxml package was 
> added to the list of core dependencies.
> However, months later in this commit: 
> [https://github.com/apache/incubator-airflow/commit/1accb54ff561b8d745277308447dd6f9d3e9f8d5#diff-948e87b4f8f644b3ad8c7950958df033]
>  the lxml import was removed from airflow/www/views.py so it is no longer 
> needed except in the devel extras because it's still used in tests.
> It should be removed from the install_requires list.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2716) Replace new Python 3.7 keywords

2018-12-21 Thread Edward Capriolo (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726882#comment-16726882
 ] 

Edward Capriolo commented on AIRFLOW-2716:
--

I have 1.10.1 and ran into this with Python 3.7:

  File "/Users/edwardcapriolo/Documents/airflow/venv/lib/python3.7/site-packages/tenacity/__init__.py", line 352
    from tenacity.async import AsyncRetrying
                      ^
SyntaxError: invalid syntax

> Replace new Python 3.7 keywords
> ---
>
> Key: AIRFLOW-2716
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2716
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Jacob Hayes
>Assignee: Jacob Hayes
>Priority: Major
> Fix For: 1.10.1, 2.0.0
>
>
> Python 3.7 added `async` and `await` as reserved keywords, so they need to be 
> replaced with alternative names.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3554) Remove contrib folder from being omitted by code cov

2018-12-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726772#comment-16726772
 ] 

ASF GitHub Bot commented on AIRFLOW-3554:
-

feluelle opened a new pull request #4351: [AIRFLOW-3554] Remove contrib folder 
from code cov omit list
URL: https://github.com/apache/incubator-airflow/pull/4351
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3554
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Currently the `contrib` folder is not being processed by codecov.
   That means that contributors won't see code coverage for code implemented 
in this folder.
   To generally improve code/test coverage for this project, I would recommend 
enabling coverage for the `contrib` folder, too.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove contrib folder from being omitted by code cov
> 
>
>     Key: AIRFLOW-3554
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3554
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Felix Uellendall
>Assignee: Felix Uellendall
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3551) Improve BashOperator Test Coverage

2018-12-21 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726770#comment-16726770
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3551:


pre_exec is not specific to the BashOperator, but part of the BaseOperator. 
Testing it would be good though :)

> Improve BashOperator Test Coverage
> --
>
> Key: AIRFLOW-3551
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3551
> Project: Apache Airflow
>  Issue Type: Test
>Reporter: Felix Uellendall
>Assignee: Felix Uellendall
>Priority: Minor
>
> The current tests for the `BashOperator` are not covering
> * pre_exec
> * xcom_push_flag



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-549) Scheduler child logs are created out of normal location

2018-12-21 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-549.
---
Resolution: Fixed

The logging config was massively reworked around 1.9.0 so I'm saying this will 
not be an issue anymore.

> Scheduler child logs are created out of normal location
> ---
>
> Key: AIRFLOW-549
> URL: https://issues.apache.org/jira/browse/AIRFLOW-549
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Bolke de Bruin
>Assignee: Paul Yang
>Priority: Major
>
> The new scheduler has child processes logging to their own log files. The 
> location of these log files is set outside of the CLI-configurable 
> locations, making it inconsistent with other log configurations in Airflow. 
> In addition, the log files are by default created in /tmp, which is a 
> non-standard location for log files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3545) Can't use Prometheus or other pull based instrumentation systems to monitor Tasks launched on Kubernetes

2018-12-21 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726763#comment-16726763
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3545:


Right now this is not possible, and due to the heavy use of (sub)processes by 
Airflow this isn't a trivial change.

I'd suggest taking a look at https://github.com/prometheus/statsd_exporter

> Can't use Prometheus or other pull based instrumentation systems to monitor 
> Tasks launched on Kubernetes
> 
>
> Key: AIRFLOW-3545
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3545
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Aditya Vishwakarma
>Priority: Major
>
> Prometheus, which is a common way to instrument services on Kubernetes, uses 
> a pull-based mechanism to fetch metrics. This involves a service exposing a 
> `/metrics` endpoint, which is scraped every 30 seconds by Prometheus to 
> collect metrics.
> This requires a port to be specified in the generated Pod config. Something 
> like below.
> {code:java}
> // Sample Pod Spec
> apiVersion: v1
> kind: Job
> metadata:
>   name: batch-job
> spec:
>   ports:
>   - name: metrics
> port: 9091 # port to fetch metrics from
> protocol: TCP
> targetPort: 9091
> {code}
> Currently KubernetesPodOperator doesn't have any options to open ports like 
> this.
> Is it possible to have an option to do this?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3527) Cloud SQL proxy with UNIX sockets might lead to too long socket path

2018-12-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726752#comment-16726752
 ] 

ASF GitHub Bot commented on AIRFLOW-3527:
-

potiuk opened a new pull request #4350: [AIRFLOW-3527] Cloud SQL Proxy has 
shorter path for UNIX socket
URL: https://github.com/apache/incubator-airflow/pull/4350
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW-3527)
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   There is a limitation on UNIX socket path length, as described in
   
   http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_un.h.html#tag_13_67_04
   
   Cloud SQL Proxy uses a generated path, which can exceed this limit,
   especially in the case of POSTGRES connections (POSTGRES appends a few
   characters of its own). The error returned by sqlproxy in this case is
   pretty vague ("invalid path"), which makes the problem hard for the user
   to diagnose.
   
   This commit fixes it in two ways:
   * it makes it less likely that the path length will be exceeded, by
   generating a shorter random string for the socket directory (sketched below).
   * it raises an error in case the calculated path is too long.
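   A minimal sketch of the approach (hypothetical names, not the actual hook
   code):
   
{code:python}
import os
import random
import string

# Linux limits sun_path to 108 bytes (including the trailing NUL).
UNIX_PATH_MAX = 108

def make_socket_dir(base="/tmp"):
    # An 8-character random suffix keeps the directory name short.
    suffix = "".join(random.choice(string.ascii_lowercase) for _ in range(8))
    return os.path.join(base, "cloudsql-" + suffix)

def validate_socket_path(socket_dir, instance, postgres=False):
    candidate = os.path.join(socket_dir, instance)
    if postgres:
        # Postgres clients append ".s.PGSQL.<port>" inside the directory.
        candidate = os.path.join(candidate, ".s.PGSQL.5432")
    if len(candidate) > UNIX_PATH_MAX:
        raise ValueError(
            "UNIX socket path %r exceeds %d characters; use a shorter "
            "instance name or switch to TCP" % (candidate, UNIX_PATH_MAX))
{code}
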
   ### Tests
   
   - [x] My PR adds the following unit tests:
   CloudSqlQueryValidationTest:
   * test_create_operator_with_too_long_unix_socket_path
   * test_create_operator_with_not_too_long_unix_socket_path
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] No documentation update is needed.
   
   ### Code Quality
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Cloud SQL proxy with UNIX sockets might lead to too long socket path
> 
>
> Key: AIRFLOW-3527
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3527
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Jarek Potiuk
>Priority: Major
>
> Currently Cloud SQL Proxy with UNIX sockets creates the proxy dir in 
> /tmp/\{UDID1}/folder, which in the case of postgres and long instance names 
> might lead to too long a UNIX socket name (the socket path length is 
> limited to 108 characters on Linux). 
> [http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_un.h.html#tag_13_67_04]
> When the instance name is long enough, this produces too long a path (the 
> margin turns out to be fairly small - instance names can often exceed 20-30 
> characters) and a cryptic "invalid path name" error. Therefore we need to:
> 1) generate the path with a shorter random prefix. 8 characters should be 
> random enough, and we can check whether the generated path already exists 
> and generate another one if that's the case.
> 2) fail validation in case the generated path is too long, and propose a 
> solution (shorter names or switching to TCP).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3554) Remove contrib folder from being omitted by code cov

2018-12-21 Thread Felix Uellendall (JIRA)
Felix Uellendall created AIRFLOW-3554:
-

 Summary: Remove contrib folder from being omitted by code cov
 Key: AIRFLOW-3554
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3554
 Project: Apache Airflow
  Issue Type: New Feature
Reporter: Felix Uellendall
Assignee: Felix Uellendall






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3553) Microseconds in manually triggered tasks break "mark as success"

2018-12-21 Thread Jozef Fekiac (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jozef Fekiac updated AIRFLOW-3553:
--
Description: 
 

If a user wants to mark success on a dagRun with microseconds (i.e. one 
triggered manually from the GUI), the user can't mark tasks as success.

 

In 1.9, replacing microseconds is the default behaviour as per the code, but 

[https://github.com/apache/incubator-airflow/blob/v1-9-stable/airflow/www/views.py#L915]

doesn't strip the microseconds when setting run_id,

propagating it to

[https://github.com/apache/incubator-airflow/blob/v1-9-stable/airflow/api/common/experimental/trigger_dag.py#L37]

so the run_id is never altered: it keeps its microseconds, which disables the 
"mark as success" function in the GUI.

  was:
 

If a user wants to mark success on a dagRun with microseconds (i.e. one 
triggered manually from the GUI), the user can't mark tasks as success.

 

In 1.9, replacing microseconds is the default behaviour as per the code, but 

[https://github.com/apache/incubator-airflow/blob/v1-9-stable/airflow/www/views.py#L915]

doesn't strip the microseconds when setting run_id, resulting in

[https://github.com/apache/incubator-airflow/blob/v1-9-stable/airflow/api/common/experimental/trigger_dag.py#L37]

receiving a run_id with microseconds, which disables the "mark as success" 
function.


> Microseconds in manually triggered tasks break "mark as success" 
> -
>
> Key: AIRFLOW-3553
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3553
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.9.0, 1.10.1, 2.0.0
>Reporter: Jozef Fekiac
>Priority: Major
>
>  
> If a user wants to mark success on a dagRun with microseconds (i.e. one 
> triggered manually from the GUI), the user can't mark tasks as success.
>  
> In 1.9, replacing microseconds is the default behaviour as per the code, but 
> [https://github.com/apache/incubator-airflow/blob/v1-9-stable/airflow/www/views.py#L915]
> doesn't strip the microseconds when setting run_id,
> propagating it to
> [https://github.com/apache/incubator-airflow/blob/v1-9-stable/airflow/api/common/experimental/trigger_dag.py#L37]
> so the run_id is never altered: it keeps its microseconds, which disables 
> the "mark as success" function in the GUI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3553) Microseconds in manually triggered tasks break "mark as success"

2018-12-21 Thread Jozef Fekiac (JIRA)
Jozef Fekiac created AIRFLOW-3553:
-

 Summary: Microseconds in manually triggered tasks break "mark as 
success" 
 Key: AIRFLOW-3553
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3553
 Project: Apache Airflow
  Issue Type: Bug
Affects Versions: 1.10.1, 1.9.0, 2.0.0
Reporter: Jozef Fekiac


 

If a user wants to mark success on a dagRun with microseconds (i.e. one 
triggered manually from the GUI), the user can't mark tasks as success.

 

In 1.9, replacing microseconds is the default behaviour as per the code, but 

[https://github.com/apache/incubator-airflow/blob/v1-9-stable/airflow/www/views.py#L915]

doesn't strip the microseconds when setting run_id, resulting in

[https://github.com/apache/incubator-airflow/blob/v1-9-stable/airflow/api/common/experimental/trigger_dag.py#L37]

receiving a run_id with microseconds, which disables the "mark as success" 
function.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3552) Add ImapToS3TransferOperator

2018-12-21 Thread Felix Uellendall (JIRA)
Felix Uellendall created AIRFLOW-3552:
-

 Summary: Add ImapToS3TransferOperator
 Key: AIRFLOW-3552
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3552
 Project: Apache Airflow
  Issue Type: New Feature
Reporter: Felix Uellendall
Assignee: Felix Uellendall


This operator transfers mail attachments from a mail server to an Amazon S3 
bucket.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3551) Improve BashOperator Test Coverage

2018-12-21 Thread Felix Uellendall (JIRA)
Felix Uellendall created AIRFLOW-3551:
-

 Summary: Improve BashOperator Test Coverage
 Key: AIRFLOW-3551
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3551
 Project: Apache Airflow
  Issue Type: Test
Reporter: Felix Uellendall
Assignee: Felix Uellendall


The current tests for the `BashOperator` do not cover:
* pre_exec
* xcom_push_flag
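
Illustrative only - the kind of test this ticket asks for, assuming the 
1.10-era BashOperator API (`bash_command`, the `xcom_push` flag, and 
execute() returning the last line of output when the flag is set):

{code:python}
import unittest
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

class BashOperatorXComTest(unittest.TestCase):
    def test_echo_is_pushed_to_xcom(self):
        dag = DAG('test_bash_xcom', start_date=datetime(2018, 1, 1))
        task = BashOperator(
            task_id='echo',
            bash_command='echo hello',
            xcom_push=True,
            dag=dag)
        # With xcom_push=True, execute() returns the last line written to
        # stdout, which Airflow then pushes to XCom.
        self.assertEqual(task.execute(context={}), 'hello')
{code}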



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3527) Cloud SQL proxy with UNIX sockets might lead to too long socket path

2018-12-21 Thread Jarek Potiuk (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jarek Potiuk updated AIRFLOW-3527:
--
Description: 
Currently Cloud SQL Proxy with UNIX sockets creates the proxy dir in 
/tmp/\{UDID1}/folder, which in the case of postgres and long instance names 
might lead to too long a UNIX socket name (the socket path length is limited 
to around 108 characters on Linux). 
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_un.h.html#tag_13_67_04

When the instance name is long enough, this produces too long a path (the 
margin turns out to be fairly small - instance names can often exceed 20-30 
characters) and a cryptic "invalid path name" error. Therefore we need to:

1) generate the path with a shorter random prefix. 8 characters should be 
random enough, and we can check whether the generated path already exists and 
generate another one if that's the case.

2) fail validation in case the generated path is too long, and propose a 
solution (shorter names or switching to TCP).

  was:
Currently Cloud SQL Proxy with UNIX sockets creates the proxy dir in 
/tmp/\{UDID1}/folder, which in the case of postgres and long instance names 
might lead to too long a UNIX socket name (the socket path length is limited 
to around 108 characters on Linux).

When the instance name is long enough, this produces too long a path (the 
margin turns out to be fairly small - instance names can often exceed 20-30 
characters) and a cryptic "invalid path name" error. Therefore we need to:

1) generate the path with a shorter random prefix. 8 characters should be 
random enough, and we can check whether the generated path already exists and 
generate another one if that's the case.

2) fail validation in case the generated path is too long, and propose a 
solution (shorter names or switching to TCP).


> Cloud SQL proxy with UNIX sockets might lead to too long socket path
> 
>
> Key: AIRFLOW-3527
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3527
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Jarek Potiuk
>Priority: Major
>
> Currently Cloud SQL Proxy with UNIX sockets creates the proxy dir in 
> /tmp/\{UDID1}/folder, which in the case of postgres and long instance names 
> might lead to too long a UNIX socket name (the socket path length is 
> limited to around 108 characters on Linux). 
> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_un.h.html#tag_13_67_04
> When the instance name is long enough, this produces too long a path (the 
> margin turns out to be fairly small - instance names can often exceed 20-30 
> characters) and a cryptic "invalid path name" error. Therefore we need to:
> 1) generate the path with a shorter random prefix. 8 characters should be 
> random enough, and we can check whether the generated path already exists 
> and generate another one if that's the case.
> 2) fail validation in case the generated path is too long, and propose a 
> solution (shorter names or switching to TCP).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3527) Cloud SQL proxy with UNIX sockets might lead to too long socket path

2018-12-21 Thread Jarek Potiuk (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jarek Potiuk updated AIRFLOW-3527:
--
Description: 
Currently Cloud SQL Proxy with UNIX sockets creates the proxy dir in 
/tmp/\{UDID1}/folder, which in the case of postgres and long instance names 
might lead to too long a UNIX socket name (the socket path length is limited 
to 108 characters on Linux). 
[http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_un.h.html#tag_13_67_04]

When the instance name is long enough, this produces too long a path (the 
margin turns out to be fairly small - instance names can often exceed 20-30 
characters) and a cryptic "invalid path name" error. Therefore we need to:

1) generate the path with a shorter random prefix. 8 characters should be 
random enough, and we can check whether the generated path already exists and 
generate another one if that's the case.

2) fail validation in case the generated path is too long, and propose a 
solution (shorter names or switching to TCP).

  was:
Currently Cloud SQL Proxy with UNIX sockets creates the proxy dir in 
/tmp/\{UDID1}/folder, which in the case of postgres and long instance names 
might lead to too long a UNIX socket name (the socket path length is limited 
to around 108 characters on Linux). 
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_un.h.html#tag_13_67_04

When the instance name is long enough, this produces too long a path (the 
margin turns out to be fairly small - instance names can often exceed 20-30 
characters) and a cryptic "invalid path name" error. Therefore we need to:

1) generate the path with a shorter random prefix. 8 characters should be 
random enough, and we can check whether the generated path already exists and 
generate another one if that's the case.

2) fail validation in case the generated path is too long, and propose a 
solution (shorter names or switching to TCP).


> Cloud SQL proxy with UNIX sockets might lead to too long socket path
> 
>
> Key: AIRFLOW-3527
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3527
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Jarek Potiuk
>Priority: Major
>
> Currently Cloud SQL Proxy with UNIX sockets creates the proxy dir in 
> /tmp/\{UDID1}/folder, which in the case of postgres and long instance names 
> might lead to too long a UNIX socket name (the socket path length is 
> limited to 108 characters on Linux). 
> [http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_un.h.html#tag_13_67_04]
> When the instance name is long enough, this produces too long a path (the 
> margin turns out to be fairly small - instance names can often exceed 20-30 
> characters) and a cryptic "invalid path name" error. Therefore we need to:
> 1) generate the path with a shorter random prefix. 8 characters should be 
> random enough, and we can check whether the generated path already exists 
> and generate another one if that's the case.
> 2) fail validation in case the generated path is too long, and propose a 
> solution (shorter names or switching to TCP).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-850) Airflow should support a general purpose PythonSensor

2018-12-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726474#comment-16726474
 ] 

ASF GitHub Bot commented on AIRFLOW-850:


feng-tao closed pull request #2058: [AIRFLOW-850] Add a PythonSensor
URL: https://github.com/apache/incubator-airflow/pull/2058
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/operators/sensors.py b/airflow/operators/sensors.py
index 44a97e00c1..bf02335a95 100644
--- a/airflow/operators/sensors.py
+++ b/airflow/operators/sensors.py
@@ -679,3 +679,57 @@ def poke(self, context):
             raise ae
 
         return True
+
+class PythonSensor(BaseSensorOperator):
+    """
+    Waits for a Python callable to return True
+
+    :param python_callable: A reference to an object that is callable
+    :type python_callable: python callable
+    :param op_kwargs: a dictionary of keyword arguments that will get unpacked
+        in your function
+    :type op_kwargs: dict
+    :param op_args: a list of positional arguments that will get unpacked when
+        calling your callable
+    :type op_args: list
+    :param provide_context: if set to true, Airflow will pass a set of
+        keyword arguments that can be used in your function. This set of
+        kwargs correspond exactly to what you can use in your jinja
+        templates. For this to work, you need to define `**kwargs` in your
+        function header.
+    :type provide_context: bool
+    :param templates_dict: a dictionary where the values are templates that
+        will get templated by the Airflow engine sometime between
+        ``__init__`` and ``execute`` takes place and are made available
+        in your callable's context after the template has been applied
+    :type templates_dict: dict of str
+    """
+
+    template_fields = ('templates_dict',)
+    template_ext = tuple()
+
+    def __init__(
+            self,
+            python_callable,
+            op_args=None,
+            op_kwargs=None,
+            provide_context=False,
+            templates_dict=None,
+            *args, **kwargs):
+        super(PythonSensor, self).__init__(*args, **kwargs)
+        self.python_callable = python_callable
+        self.op_args = op_args or []
+        self.op_kwargs = op_kwargs or {}
+        self.provide_context = provide_context
+        self.templates_dict = templates_dict
+
+
+    def poke(self, context):
+        if self.provide_context:
+            context.update(self.op_kwargs)
+            context['templates_dict'] = self.templates_dict
+            self.op_kwargs = context
+
+        logging.info("Poking callable: " + str(self.python_callable))
+        return_value = self.python_callable(*self.op_args, **self.op_kwargs)
+        return bool(return_value)
diff --git a/tests/operators/sensors.py b/tests/operators/sensors.py
index e77216b580..2633e4c41b 100644
--- a/tests/operators/sensors.py
+++ b/tests/operators/sensors.py
@@ -22,7 +22,7 @@
 from datetime import datetime, timedelta
 
 from airflow import DAG, configuration
-from airflow.operators.sensors import HttpSensor, BaseSensorOperator, HdfsSensor
+from airflow.operators.sensors import HttpSensor, BaseSensorOperator, HdfsSensor, PythonSensor
 from airflow.utils.decorators import apply_defaults
 from airflow.exceptions import (AirflowException,
                                 AirflowSensorTimeout,
@@ -181,3 +181,38 @@ def test_legacy_file_does_not_exists(self):
         # Then
         with self.assertRaises(AirflowSensorTimeout):
             task.execute(None)
+
+class PythonSensorTests(unittest.TestCase):
+
+    def setUp(self):
+        configuration.load_test_config()
+        args = {
+            'owner': 'airflow',
+            'start_date': DEFAULT_DATE
+        }
+        dag = DAG(TEST_DAG_ID, default_args=args)
+        self.dag = dag
+
+    def test_python_sensor_true(self):
+        t = PythonSensor(
+            task_id='python_sensor_check_true',
+            python_callable=lambda: True,
+            dag=self.dag)
+        t.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE, ignore_ti_state=True)
+
+    def test_python_sensor_false(self):
+        t = PythonSensor(
+            task_id='python_sensor_check_false',
+            timeout=1,
+            python_callable=lambda: False,
+            dag=self.dag)
+        with self.assertRaises(AirflowSensorTimeout):
+            t.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE, ignore_ti_state=True)
+
+    def test_python_sensor_raise(self):
+        t = PythonSensor(
+            task_id='python_sensor_check_raise',
+            python_callable=lambda: 1/0,
+            dag=s

[jira] [Commented] (AIRFLOW-850) Airflow should support a general purpose PythonSensor

2018-12-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726472#comment-16726472
 ] 

ASF GitHub Bot commented on AIRFLOW-850:


feng-tao opened a new pull request #4349: [AIRFLOW-850] Add a PythonSensor
URL: https://github.com/apache/incubator-airflow/pull/4349
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-850
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   A general-purpose PythonSensor which allows an arbitrary Python callable to 
delay task execution until the callable returns True. This is based on a stale 
PR (https://github.com/apache/incubator-airflow/pull/2058).
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   yes
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Airflow should support a general purpose PythonSensor
> -
>
> Key: AIRFLOW-850
> URL: https://issues.apache.org/jira/browse/AIRFLOW-850
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Affects Versions: 1.8.0
>Reporter: Daniel Gies
>Assignee: Daniel Gies
>Priority: Major
>
> Today I found myself trying to use a sensor to postpone execution until data 
> for the current execution date appeared in a file.  It occurred to me that 
> having a general purpose PythonSensor would allow developers to use the 
> sensor paradigm with arbitrary code.
> We should add a PythonSensor to the core sensors module which takes a 
> python_callable and optional args like the PythonOperator does.
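
A hypothetical usage example of the proposed sensor, matching the PythonSensor 
code shown in the diffs above (the callable and file path are made up for 
illustration):

{code:python}
import os

from airflow.operators.sensors import PythonSensor  # module per the PR diff

def data_ready(ds, **kwargs):
    # `ds` is the execution date, injected because provide_context=True.
    return os.path.exists('/data/input/%s.csv' % ds)

wait_for_data = PythonSensor(
    task_id='wait_for_data',
    python_callable=data_ready,
    provide_context=True,
    poke_interval=60,
    dag=dag)  # assumes a `dag` object defined elsewhere
{code}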



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3550) GKEClusterHook doesn't use gcp_conn_id

2018-12-20 Thread Wilson Lian (JIRA)
Wilson Lian created AIRFLOW-3550:


 Summary: GKEClusterHook doesn't use gcp_conn_id
 Key: AIRFLOW-3550
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3550
 Project: Apache Airflow
  Issue Type: Bug
  Components: contrib
Affects Versions: 1.10.1, 1.10.0
Reporter: Wilson Lian


The hook doesn't inherit from GoogleCloudBaseHook. API calls are made using the 
default service account (if present).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3549) Please don't log the content of the downloaded gcs file

2018-12-20 Thread joyce chan (JIRA)
joyce chan created AIRFLOW-3549:
---

 Summary: Please don't log the content of the downloaded gcs file
 Key: AIRFLOW-3549
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3549
 Project: Apache Airflow
  Issue Type: Bug
  Components: gcp
Affects Versions: 1.10.1
Reporter: joyce chan


This line prints the whole content of the file downloaded from GCS to the 
log, but our files are sometimes huge, so everything gets saved to the log 
unnecessarily.

https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/gcs_download_operator.py#L91
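
A sketch of the requested behaviour (names are illustrative, not the actual 
operator code): log a summary of the downloaded object rather than its payload.

{code:python}
import logging

log = logging.getLogger(__name__)

def log_download(bucket, object_name, file_bytes):
    # Log object coordinates and size instead of the file contents.
    log.info("Downloaded gs://%s/%s (%d bytes)",
             bucket, object_name, len(file_bytes))
{code}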



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-3549) Please don't log the content of the downloaded Google Storage file

2018-12-20 Thread joyce chan (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

joyce chan closed AIRFLOW-3549.
---
Resolution: Fixed

> Please don't log the content of the downloaded Google Storage file
> --
>
> Key: AIRFLOW-3549
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3549
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: gcp
>Affects Versions: 1.10.1
>Reporter: joyce chan
>Priority: Major
>
> This line prints the whole content of the file downloaded from GCS to the 
> log, but our files are sometimes huge, so everything gets saved to the log 
> unnecessarily.
> https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/gcs_download_operator.py#L91



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3549) Please don't log the content of the downloaded Google Storage file

2018-12-20 Thread joyce chan (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

joyce chan updated AIRFLOW-3549:

Summary: Please don't log the content of the downloaded Google Storage file 
 (was: Please don't log the content of the downloaded gcs file)

> Please don't log the content of the downloaded Google Storage file
> --
>
> Key: AIRFLOW-3549
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3549
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: gcp
>Affects Versions: 1.10.1
>Reporter: joyce chan
>Priority: Major
>
> This line prints the whole content of the file downloaded from GCS to the 
> log, but our files are sometimes huge, so everything gets saved to the log 
> unnecessarily.
> https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/gcs_download_operator.py#L91



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3445) MariaDB explicit_defaults_for_timestamp = 1 Does not work.

2018-12-20 Thread Gerardo (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726178#comment-16726178
 ] 

Gerardo commented on AIRFLOW-3445:
--

This has become a major problem for our team. We followed the patch suggested 
above; however, we've observed some strange behavior when triggering a DagRun 
from the interface:

Logs:

WARNING - Set 1 task instances to state=None as their associated Dag Run is not 
in running state
(https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L1023)

It will repeat this warning ad infinitum until the dag is canceled. Looking at 
the task instances table in our MariaDB reveals that new task instances are 
filling up the table, each with the UTC now() for the time.

According to the above file 
(https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L1023),
 if a task instance cannot successfully be joined onto a dag run with the same 
ID and execution_date, its state is set to None, thereby canceling our DagRun.

It's unclear to me why the first TaskInstance would have a different 
execution_date than our dag run, but clearly it must. I have a strong suspicion 
that the task instance is being created in the database with a NULL 
execution_date, which MariaDB will fill with the current timestamp. See: 
execution_date=sa.text('CURRENT_TIMESTAMP(6)') in Roger/Feng Lu's patch.

What else could be going wrong here? 

> MariaDB explicit_defaults_for_timestamp = 1 Does not work.
> --
>
> Key: AIRFLOW-3445
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3445
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: database
>Affects Versions: 1.10.1
> Environment: Hosted VM on the Google Cloud Platform, Compute Engine:
> Machine type: n1-standard-2 (2 vCPUs, 7.5 GB memory)
> Operating System  CentOS
>Reporter: Conor Molloy
>Priority: Blocker
> Fix For: 1.10.2
>
>
> Running into an issue when running {{airflow upgradedb}} going from 1.9 -> 
> 1.10.1:
> {code:java}
> sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (1193, 
> "Unknown system variable 'explicit_defaults_for_timestamp'") [SQL: 'SELECT 
> @@explicit_defaults_for_timestamp']{code}
> I saw this link on the airflow website:
> [https://airflow.readthedocs.io/en/stable/faq.html#how-to-fix-exception-global-variable-explicit-defaults-for-timestamp-needs-to-be-on-1]
> Here it says you can set
> {code:java}
> explicit_defaults_for_timestamp = 1{code}
> in the _my.cnf_ file. However, I am using MariaDB, and when I add this to 
> the _my.cnf_ file, the
> {noformat}
> mariadb.service{noformat}
> fails to start up. Has anyone else come across this issue?
>  
> The output from
> {code:java}
> SHOW VARIABLES like '%version%'{code}
> was
> {code:java}
> +-------------------------+----------------------+
> | Variable_name           | Value                |
> +-------------------------+----------------------+
> | innodb_version          | 5.5.59-MariaDB-38.11 |
> | protocol_version        | 10                   |
> | slave_type_conversions  |                      |
> | version                 | 5.5.60-MariaDB       |
> | version_comment         | MariaDB Server       |
> | version_compile_machine | x86_64               |
> | version_compile_os      | Linux                |
> +-------------------------+----------------------+{code}
> MariaDB does not have this variable, as it is a MySQL-only feature.
> [https://mariadb.com/kb/en/library/system-variable-differences-between-mariadb-100-and-mysql-56/]
> There may need to be a check for MariaDB before upgrading, as mentioned by 
> Ash in this Slack thread: 
> [https://apache-airflow.slack.com/archives/CCQB40SQJ/p1543918149008100]
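
A hedged sketch of the guard suggested at the end of the report: only query 
explicit_defaults_for_timestamp on genuine MySQL, since MariaDB of this 
vintage does not have the variable (SQLAlchemy 1.x API assumed, DSN is 
hypothetical):

{code:python}
from sqlalchemy import create_engine

def needs_timestamp_check(engine):
    # MariaDB reports itself in VERSION(), e.g. "5.5.60-MariaDB".
    version = engine.execute("SELECT VERSION()").scalar()
    return "mariadb" not in version.lower()

# engine = create_engine("mysql://user:pass@host/airflow")  # hypothetical DSN
{code}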



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3548) Tasks instances doesn't reuse mysql connection

2018-12-20 Thread Rami Darwish (JIRA)
Rami Darwish created AIRFLOW-3548:
-

 Summary: Tasks instances doesn't reuse mysql connection
 Key: AIRFLOW-3548
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3548
 Project: Apache Airflow
  Issue Type: Improvement
  Components: worker
Affects Versions: 1.10.1
Reporter: Rami Darwish


Task instances keep opening a new connection to MySQL every 
"job_heartbeat_sec" interval. Ideally, a task should open one connection for 
its whole life span, until it has finished executing. It seems to ignore 
sql_alchemy_pool_enabled = True.

We're using Airflow 1.10.1, MySQL 5.7, SQLAlchemy 1.1.18, Python 2.7.12.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3163) Add set table description operator to BigQuery operators

2018-12-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725931#comment-16725931
 ] 

ASF GitHub Bot commented on AIRFLOW-3163:
-

stale[bot] closed pull request #4003: [AIRFLOW-3163] add operator to enable 
setting table description in BigQuery table
URL: https://github.com/apache/incubator-airflow/pull/4003
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/hooks/bigquery_hook.py b/airflow/contrib/hooks/bigquery_hook.py
index dd77df1283..ccbb36dbd4 100644
--- a/airflow/contrib/hooks/bigquery_hook.py
+++ b/airflow/contrib/hooks/bigquery_hook.py
@@ -135,6 +135,34 @@ def table_exists(self, project_id, dataset_id, table_id):
                 return False
             raise
 
+    def set_table_description(self, dataset_id, table_id, description, project_id=None):
+        """
+        Sets the description for the given table
+
+        :param project_id: The Google cloud project in which to look for the
+            table. The connection supplied to the hook must provide access to
+            the specified project.
+        :type project_id: string
+        :param dataset_id: The name of the dataset in which to look for the
+            table.
+        :type dataset_id: string
+        :param table_id: The name of the table to set the description for.
+        :type table_id: string
+        :param description: The description to set
+        :type description: string
+        """
+        service = self.get_service()
+        project_id = project_id if project_id is not None else self._get_field('project')
+        table = service.tables().get(
+            projectId=project_id, datasetId=dataset_id,
+            tableId=table_id).execute()
+        table['description'] = description
+        service.tables().patch(
+            projectId=project_id,
+            datasetId=dataset_id,
+            tableId=table_id,
+            body=table).execute()
+
 
 class BigQueryPandasConnector(GbqConnector):
     """
diff --git a/airflow/contrib/operators/bigquery_operator.py b/airflow/contrib/operators/bigquery_operator.py
index 9386e57c07..1ad19a7aa0 100644
--- a/airflow/contrib/operators/bigquery_operator.py
+++ b/airflow/contrib/operators/bigquery_operator.py
@@ -629,3 +629,57 @@ def execute(self, context):
             project_id=self.project_id,
             dataset_id=self.dataset_id,
             dataset_reference=self.dataset_reference)
+
+
+class BigQuerySetTableDescriptionOperator(BaseOperator):
+    """
+    This operator is called to set the description on a table
+
+    :param project_id: The Google cloud project in which to look for the
+        table. The connection supplied must provide access to
+        the specified project.
+    :type project_id: string
+    :param dataset_id: The name of the dataset in which to look for the
+        table.
+    :type dataset_id: string
+    :param table_id: The name of the table to set the description for.
+    :type table_id: string
+    :param description: The description to set
+    :type description: string
+    :param bigquery_conn_id: The connection ID to use when
+        connecting to BigQuery.
+    :type bigquery_conn_id: string
+    :param delegate_to: The account to impersonate, if any. For this to
+        work, the service account making the request must have domain-wide
+        delegation enabled.
+    :type delegate_to: string
+    """
+    template_fields = ('project_id', 'dataset_id', 'table_id', 'description')
+    ui_color = '#f0eee4'
+
+    @apply_defaults
+    def __init__(self,
+                 project_id=None,
+                 dataset_id=None,
+                 table_id=None,
+                 description=None,
+                 bigquery_conn_id='bigquery_default',
+                 delegate_to=None,
+                 *args,
+                 **kwargs):
+        super(BigQuerySetTableDescriptionOperator, self).__init__(*args, **kwargs)
+        self.project_id = project_id
+        self.dataset_id = dataset_id
+        self.table_id = table_id
+        self.description = description
+        self.bigquery_conn_id = bigquery_conn_id
+        self.delegate_to = delegate_to
+
+    def execute(self, context):
+        bq_hook = BigQueryHook(
+            bigquery_conn_id=self.bigquery_conn_id,
+            delegate_to=self.delegate_to)
+        bq_hook.set_table_description(project_id=self.project_id,
+                                      dataset_id=self.dataset_id,
+                                      table_id=self.ta

[jira] [Resolved] (AIRFLOW-3426) Correct references to Python version tested (3.4 -> 3.5)

2018-12-20 Thread Bryant Biggs (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryant Biggs resolved AIRFLOW-3426.
---
Resolution: Fixed

Merged as complete

> Correct references to Python version tested (3.4 -> 3.5)
> 
>
> Key: AIRFLOW-3426
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3426
> Project: Apache Airflow
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 1.9.0, 1.10.0, 1.10.1
> Environment: All
>Reporter: Bryant Biggs
>Assignee: Bryant Biggs
>Priority: Trivial
>  Labels: documentation, easyfix, newbie
> Fix For: 1.10.1
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The current CI tests on Travis use Python version 2.7 and 3.5, however 
> throughout the documentation there are still references to using/supporting 
> 3.4. To better match what is actually supported, the 3.4 references should be 
> replaced with what is actually being tested, 3.5



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-3426) Correct references to Python version tested (3.4 -> 3.5)

2018-12-20 Thread Bryant Biggs (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryant Biggs closed AIRFLOW-3426.
-

> Correct references to Python version tested (3.4 -> 3.5)
> 
>
> Key: AIRFLOW-3426
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3426
> Project: Apache Airflow
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 1.9.0, 1.10.0, 1.10.1
> Environment: All
>Reporter: Bryant Biggs
>Assignee: Bryant Biggs
>Priority: Trivial
>  Labels: documentation, easyfix, newbie
> Fix For: 1.10.1
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The current CI tests on Travis use Python version 2.7 and 3.5, however 
> throughout the documentation there are still references to using/supporting 
> 3.4. To better match what is actually supported, the 3.4 references should be 
> replaced with what is actually being tested, 3.5



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3426) Correct references to Python version tested (3.4 -> 3.5)

2018-12-20 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725867#comment-16725867
 ] 

jack commented on AIRFLOW-3426:
---

This was merged. Ticket can be closed.

> Correct references to Python version tested (3.4 -> 3.5)
> 
>
> Key: AIRFLOW-3426
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3426
> Project: Apache Airflow
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 1.9.0, 1.10.0, 1.10.1
> Environment: All
>Reporter: Bryant Biggs
>Assignee: Bryant Biggs
>Priority: Trivial
>  Labels: documentation, easyfix, newbie
> Fix For: 1.10.1
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The current CI tests on Travis use Python version 2.7 and 3.5, however 
> throughout the documentation there are still references to using/supporting 
> 3.4. To better match what is actually supported, the 3.4 references should be 
> replaced with what is actually being tested, 3.5



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2937) HttpHook doesn't respect the URI scheme when the connection is defined via Environment Variable

2018-12-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725810#comment-16725810
 ] 

ASF GitHub Bot commented on AIRFLOW-2937:
-

stale[bot] closed pull request #3783: [AIRFLOW-2937] Support HTTPS in Http 
connection form environment variables
URL: https://github.com/apache/incubator-airflow/pull/3783
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/hooks/http_hook.py b/airflow/hooks/http_hook.py
index c449fe0c15..359e588a96 100644
--- a/airflow/hooks/http_hook.py
+++ b/airflow/hooks/http_hook.py
@@ -62,7 +62,7 @@ def get_conn(self, headers=None):
             self.base_url = conn.host
         else:
             # schema defaults to HTTP
-            schema = conn.schema if conn.schema else "http"
+            schema = conn.conn_type if conn.conn_type else conn.schema if conn.schema else "http"
             self.base_url = schema + "://" + conn.host
 
         if conn.port:


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> HttpHook doesn't respect the URI scheme when the connection is defined via 
> Environment Variable
> ---
>
> Key: AIRFLOW-2937
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2937
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Matt Chapman
>Assignee: Matt Chapman
>Priority: Major
>
> AIRFLOW-645 almost solved this, but not quite.
> I believe AIRFLOW-2841 is another misguided attempt at solving this problem, 
> and shows that this is an issue for other users.
> The core issue is that the HttpHook confusingly mixes up the ideas of 'URI 
> scheme' and 'Database schema.' 
> I'm submitting a patch that fixes the issue while maintaining backward 
> compatibility, but does not solve the core confusion, which I suggest should 
> be addressed in the next major release.
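
An illustration of the trigger, assuming 1.10-era behaviour: a connection 
defined via environment variable with an https URI ends up with conn_type 
"https" but an empty schema, so a hook that builds base_url from conn.schema 
silently falls back to plain http (which the patch above addresses by 
preferring conn_type).

{code:python}
import os

# Connections can be defined as AIRFLOW_CONN_<CONN_ID> URIs.
os.environ["AIRFLOW_CONN_MY_HTTP"] = "https://example.com"

from airflow.hooks.base_hook import BaseHook

conn = BaseHook.get_connection("my_http")
print(conn.conn_type, conn.host, conn.schema)  # e.g. https example.com <empty>
{code}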



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3458) Refactor: Move Connection out of models.py

2018-12-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725802#comment-16725802
 ] 

ASF GitHub Bot commented on AIRFLOW-3458:
-

Fokko closed pull request #4335: [AIRFLOW-3458] Move models.Connection into 
separate file
URL: https://github.com/apache/incubator-airflow/pull/4335
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/bin/cli.py b/airflow/bin/cli.py
index cd414d2821..143e2b34aa 100644
--- a/airflow/bin/cli.py
+++ b/airflow/bin/cli.py
@@ -33,6 +33,8 @@
 import argparse
 from builtins import input
 from collections import namedtuple
+
+from airflow.models.connection import Connection
 from airflow.utils.timezone import parse as parsedate
 import json
 from tabulate import tabulate
@@ -55,8 +57,7 @@
 from airflow.exceptions import AirflowException, AirflowWebServerTimeout
 from airflow.executors import GetDefaultExecutor
 from airflow.models import (DagModel, DagBag, TaskInstance,
-                            DagPickle, DagRun, Variable, DagStat,
-                            Connection, DAG)
+                            DagPickle, DagRun, Variable, DagStat, DAG)
 
 from airflow.ti_deps.dep_context import (DepContext, SCHEDULER_DEPS)
 from airflow.utils import cli as cli_utils
diff --git a/airflow/contrib/executors/mesos_executor.py b/airflow/contrib/executors/mesos_executor.py
index 0609d71cf2..7aae91e6d4 100644
--- a/airflow/contrib/executors/mesos_executor.py
+++ b/airflow/contrib/executors/mesos_executor.py
@@ -80,7 +80,7 @@ def registered(self, driver, frameworkId, masterInfo):
         if configuration.conf.getboolean('mesos', 'CHECKPOINT') and \
                 configuration.conf.get('mesos', 'FAILOVER_TIMEOUT'):
             # Import here to work around a circular import error
-            from airflow.models import Connection
+            from airflow.models.connection import Connection
 
             # Update the Framework ID in the database.
             session = Session()
@@ -253,7 +253,7 @@ def start(self):
 
         if configuration.conf.get('mesos', 'FAILOVER_TIMEOUT'):
             # Import here to work around a circular import error
-            from airflow.models import Connection
+            from airflow.models.connection import Connection
 
             # Query the database to get the ID of the Mesos Framework, if available.
             conn_id = FRAMEWORK_CONNID_PREFIX + framework.name
diff --git a/airflow/contrib/hooks/gcp_sql_hook.py b/airflow/contrib/hooks/gcp_sql_hook.py
index 1581637e0d..9872746b7b 100644
--- a/airflow/contrib/hooks/gcp_sql_hook.py
+++ b/airflow/contrib/hooks/gcp_sql_hook.py
@@ -34,7 +34,7 @@
 import requests
 from googleapiclient.discovery import build
 
-from airflow import AirflowException, LoggingMixin, models
+from airflow import AirflowException, LoggingMixin
 from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
 
 # Number of retries - used by googleapiclient method calls to perform retries
@@ -42,7 +42,7 @@
 from airflow.hooks.base_hook import BaseHook
 from airflow.hooks.mysql_hook import MySqlHook
 from airflow.hooks.postgres_hook import PostgresHook
-from airflow.models import Connection
+from airflow.models.connection import Connection
 from airflow.utils.db import provide_session
 
 NUM_RETRIES = 5
@@ -457,8 +457,8 @@ def _download_sql_proxy_if_needed(self):
 
     @provide_session
     def _get_credential_parameters(self, session):
-        connection = session.query(models.Connection). \
-            filter(models.Connection.conn_id == self.gcp_conn_id).first()
+        connection = session.query(Connection). \
+            filter(Connection.conn_id == self.gcp_conn_id).first()
         session.expunge_all()
         if GCP_CREDENTIALS_KEY_PATH in connection.extra_dejson:
             credential_params = [
@@ -851,8 +851,8 @@ def delete_connection(self, session=None):
         decorator).
         """
         self.log.info("Deleting connection {}".format(self.db_conn_id))
-        connection = session.query(models.Connection).filter(
-            models.Connection.conn_id == self.db_conn_id)[0]
+        connection = session.query(Connection).filter(
+            Connection.conn_id == self.db_conn_id)[0]
         session.delete(connection)
         session.commit()
 
diff --git a/airflow/hooks/base_hook.py b/airflow/hooks/base_hook.py
index ef44f6469d..c1283e3fb4 100644
--- a/airflow/hooks/base_hook.py
+++ b/airflow/hooks/base_hook.py
@@ -25,7 +25,7 @@
 import os
 import random
 
-from airflow.models import Connection
+from airflow.models.connection import Connection
 from airflow.exception

[jira] [Closed] (AIRFLOW-3458) Refactor: Move Connection out of models.py

2018-12-20 Thread Fokko Driesprong (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong closed AIRFLOW-3458.
-
Resolution: Fixed

> Refactor: Move Connection out of models.py
> --
>
> Key: AIRFLOW-3458
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3458
> Project: Apache Airflow
>  Issue Type: Task
>  Components: models
>Affects Versions: 1.10.1
>Reporter: Fokko Driesprong
>Assignee: Bas Harenslak
>Priority: Major
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3458) Refactor: Move Connection out of models.py

2018-12-20 Thread Fokko Driesprong (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725804#comment-16725804
 ] 

Fokko Driesprong commented on AIRFLOW-3458:
---

[~ashb] The models.py is huge, and needs to be split into the models package.

> Refactor: Move Connection out of models.py
> --
>
> Key: AIRFLOW-3458
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3458
> Project: Apache Airflow
>  Issue Type: Task
>  Components: models
>Affects Versions: 1.10.1
>Reporter: Fokko Driesprong
>Assignee: Bas Harenslak
>Priority: Major
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-3458) Refactor: Move Connection out of models.py

2018-12-20 Thread Fokko Driesprong (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong reassigned AIRFLOW-3458:
-

Assignee: Bas Harenslak

> Refactor: Move Connection out of models.py
> --
>
> Key: AIRFLOW-3458
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3458
> Project: Apache Airflow
>  Issue Type: Task
>  Components: models
>Affects Versions: 1.10.1
>Reporter: Fokko Driesprong
>Assignee: Bas Harenslak
>Priority: Major
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3254) BigQueryGetDataOperator to support reading query from SQL file

2018-12-20 Thread jack (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jack updated AIRFLOW-3254:
--
Affects Version/s: (was: 1.10.0)
   1.10.1

> BigQueryGetDataOperator to support reading query from SQL file
> --
>
> Key: AIRFLOW-3254
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3254
> Project: Apache Airflow
>  Issue Type: Improvement
>Affects Versions: 1.10.1
>Reporter: jack
>Assignee: Kaxil Naik
>Priority: Minor
> Fix For: 1.10.2
>
>
> As discussed with [~Fokko] on Slack:
> Currently the BigQueryGetDataOperator only supports reading a query provided 
> directly as:
>  
> {code:java}
> sql = 'SELECT ID FROM TABLE'
> {code}
>  
> It does not support reading the query from a SQL file, which can be annoying 
> as queries are sometimes quite large.
> This behavior is supported by other operators like 
> MySqlToGoogleCloudStorageOperator:
> {code:python}
> dag = DAG(
>     dag_id='Import',
>     default_args=args,
>     schedule_interval='*/5 * * * *',
>     max_active_runs=1,
>     catchup=False,
>     template_searchpath=['/home/.../airflow/…/sql/Import']
> )
> 
> importop = MySqlToGoogleCloudStorageOperator(
>     task_id='import',
>     mysql_conn_id='MySQL_con',
>     google_cloud_storage_conn_id='gcp_con',
>     provide_context=True,
>     sql='importop.sql',
>     params={'table_name': TABLE_NAME},
>     bucket=GCS_BUCKET_ID,
>     filename=file_name_orders,
>     dag=dag)
> {code}
>  
> If anyone can pick it up it would be great :)
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2143) Try number displays incorrect values in the web UI

2018-12-20 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725686#comment-16725686
 ] 

jack commented on AIRFLOW-2143:
---

I see this also

> Try number displays incorrect values in the web UI
> --
>
> Key: AIRFLOW-2143
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2143
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: James Davidheiser
>Priority: Minor
> Attachments: adhoc_query.png, task_instance_page.png
>
>
> This was confusing us a lot in our task runs - in the database, a task that 
> ran is marked as 1 try.  However, when we view it in the UI, it shows as 2 
> tries in several places.  These include:
>  * Task Instance Details (i.e. 
> [https://airflow/task?execution_date=xxx_id=xxx_id=xxx])
>  * Task instance browser (/admin/taskinstance/)
>  * Task Tries graph (/admin/airflow/tries)
> Notably, it is correctly shown as 1 try in the log filenames, on the log 
> viewer page (admin/airflow/log?execution_date=), and some other places.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-644) Issue with past runs when using starttime as datetime.now()

2018-12-20 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725680#comment-16725680
 ] 

jack commented on AIRFLOW-644:
--

It's bad practice to set 'start_date': datetime.now() - see the example below.
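
A minimal sketch of the recommended alternative (the date is illustrative):

{code:python}
# Preferred pattern: a static start_date in the past instead of
# datetime.now(), so scheduled runs are deterministic.
from datetime import datetime

default_args = {
    'owner': 'dwh',
    'depends_on_past': True,
    'start_date': datetime(2016, 11, 1),  # fixed, not datetime.now()
}
{code}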

> Issue with past runs when using starttime as datetime.now()
> ---
>
> Key: AIRFLOW-644
> URL: https://issues.apache.org/jira/browse/AIRFLOW-644
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DagRun
>Reporter: Puneeth Potu
>Priority: Major
>
> Hi, we used the following snippet in the dag parameters:
> {code:python}
> default_args = {
>     'owner': 'dwh',
>     'depends_on_past': True,
>     'wait_for_downstream': True,
>     'start_date': datetime.now(),
> }
> {code}
> When datetime.now() is used with frequency @daily, I see the last 5 runs in 
> my graph view, and the dag status of all the previous runs is "FAILED".
> When datetime.now() is used with frequency @monthly, I see the last 14 runs 
> in my graph view, and the dag status of all the previous runs is "FAILED".
> When datetime.now() is used with frequency @weekly, I see the last 53 runs 
> in my graph view, and the dag status of all the previous runs is "FAILED".
> For monthly and weekly it is not showing either the current week or month. I 
> activated my Dags today (11/22/2016).
> I see weekly runs populated from (2015-11-15 to 2016-11-13), and I don't see 
> 2016-11-20, which is the latest.
> I see monthly runs populated from (2015-09-01 to 2016-10-01), and I don't see 
> 2016-11-01, which is the latest.
> Please advise if this is the expected behavior.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-549) Scheduler child logs are created out of normal location

2018-12-20 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725682#comment-16725682
 ] 

jack commented on AIRFLOW-549:
--

[~bolke] is this still an issue?

> Scheduler child logs are created out of normal location
> ---
>
> Key: AIRFLOW-549
> URL: https://issues.apache.org/jira/browse/AIRFLOW-549
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Bolke de Bruin
>Assignee: Paul Yang
>Priority: Major
>
> The new scheduler has its child processes logging to their own log files. The 
> locations of these log files are set outside of the CLI-configurable 
> locations, making them inconsistent with other log configurations in Airflow. 
> In addition, the log files are created in /tmp by default, which is a 
> non-standard location for log files. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1288) Bad owners field in DAGs breaks Airflow front page

2018-12-19 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725610#comment-16725610
 ] 

jack commented on AIRFLOW-1288:
---

Did you set your owner to be a list instead of string?

> Bad owners field in DAGs breaks Airflow front page
> --
>
> Key: AIRFLOW-1288
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1288
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Reporter: Dan Davydov
>Priority: Major
>
> DAGs that have owners set to a bad value break the front page of the 
> webserver with an error like below. Instead these should just cause import 
> errors for the specific dags in question.
> {code}
> Ooops.
>   / (  ()   )  \___
>  /( (  (  )   _))  )   )\
>(( (   )()  )   (   )  )
>  ((/  ( _(   )   (   _) ) (  () )  )
> ( (  ( (_)   (((   )  .((_ ) .  )_
>( (  )(  (  ))   ) . ) (   )
>   (  (   (  (   ) (  _  ( _) ).  ) . ) ) ( )
>   ( (  (   ) (  )   (  )) ) _)(   )  )  )
>  ( (  ( \ ) ((_  ( ) ( )  )   ) )  )) ( )
>   (  (   (  (   (_ ( ) ( _)  ) (  )  )   )
>  ( (  ( (  (  ) (_  )  ) )  _)   ) _( ( )
>   ((  (   )(( _)   _) _(_ (  (_ )
>(_((__(_(__(( ( ( |  ) ) ) )_))__))_)___)
>((__)\\||lll|l||///  \_))
> (   /(/ (  )  ) )\   )
>   (( ( ( | | ) ) )\   )
>(   /(| / ( )) ) ) )) )
>  ( ( _(|)_) )
>   (  ||\(|(|)|/|| )
> (|(||(||))
>   ( //|/l|||)|\\ \ )
> (/ / //  /|//\\  \ \  \ _)
> ---
> Node: i-0dbbddfb63fb2cfbc.inst.aws.airbnb.com
> ---
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1817, in 
> wsgi_app
> response = self.full_dispatch_request()
>   File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1477, in 
> full_dispatch_request
> rv = self.handle_user_exception(e)
>   File 
> "/usr/local/lib/python2.7/dist-packages/newrelic/hooks/framework_flask.py", 
> line 103, in _nr_wrapper_Flask_handle_exception_
> return wrapped(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1381, in 
> handle_user_exception
> reraise(exc_type, exc_value, tb)
>   File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1475, in 
> full_dispatch_request
> rv = self.dispatch_request()
>   File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1461, in 
> dispatch_request
> return self.view_functions[rule.endpoint](**req.view_args)
>   File 
> "/usr/local/lib/python2.7/dist-packages/newrelic/hooks/framework_flask.py", 
> line 40, in _nr_wrapper_handler_
> return wrapped(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/flask_admin/base.py", line 68, 
> in inner
> return self._run_view(f, *args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/flask_admin/base.py", line 
> 367, in _run_view
> return fn(self, *args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/flask_login.py", line 758, in 
> decorated_view
> return func(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/www/views.py", line 
> 1909, in index
> all_dag_ids=all_dag_ids)
>   File "/usr/local/lib/python2.7/dist-packages/flask_admin/base.py", line 
> 307, in render
> return render_template(template, **kwargs)
>   File 
> "/usr/local/lib/python2.7/dist-packages/newrelic/api/function_trace.py", line 
> 110, in literal_wrapper
> return wrapped(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/flask/templating.py", line 
> 128, in render_template
> context, ctx.app)
>   File "/usr/local/lib/python2.7/dist-packages/flask/templating.py", line 
> 110, in _render
> rv = template.render(context

[jira] [Commented] (AIRFLOW-1322) Cannot mark task as success

2018-12-19 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725607#comment-16725607
 ] 

jack commented on AIRFLOW-1322:
---

[~Fokko] is this still an issue?

> Cannot mark task as success
> ---
>
> Key: AIRFLOW-1322
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1322
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Fokko Driesprong
>Priority: Major
>
> Hi guys,
> I've noticed that when I trigger a new job using the UI, I'm not able to
> `Mark Successful`. When I trigger a job using the CLI, this option does
> appear. I suspect that the jobs are not properly created when a job is
> triggered from the UI.
> I want to look into it when I have more time.
> Cheers, Fokko



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1362) Paused dag restarted on upgrading airflow from 1.8.0 to 1.8.1

2018-12-19 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725609#comment-16725609
 ] 

jack commented on AIRFLOW-1362:
---

Did it happen when you performed the upgrade in your test environment?

> Paused dag restarted on upgrading airflow from 1.8.0 to 1.8.1
> -
>
> Key: AIRFLOW-1362
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1362
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: user_airflow
>Priority: Major
>
> Recently we upgraded Airflow from 1.8.0 to 1.8.1. The upgrade went fine, but
> once I restarted the web server and scheduler, all paused DAGs restarted
> automatically and ran multiple runs from the date they had been stopped. This
> messed up most of our users' data and we had to clean it up manually. How can
> we prevent this from happening in future upgrades?
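
Not part of the original report, but one common mitigation (assuming Airflow
1.8+, where the scheduler backfills missed schedule intervals by default) is
to disable catch-up, either per DAG or globally via catchup_by_default = False
in the [scheduler] section of airflow.cfg. A minimal sketch:

{code:python}
# Minimal sketch, assuming Airflow >= 1.8: with catchup=False the scheduler
# creates a run only for the most recent schedule interval when a DAG is
# unpaused, instead of backfilling every interval missed while it was paused.
from datetime import datetime

from airflow import DAG

dag = DAG(
    dag_id='my_dag',
    start_date=datetime(2017, 1, 1),
    schedule_interval='@daily',
    catchup=False,
)
{code}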



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3547) Jinja templating is not enabled for some SparkSubmitOperator parameters.

2018-12-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725598#comment-16725598
 ] 

ASF GitHub Bot commented on AIRFLOW-3547:
-

thesuperzapper opened a new pull request #4347: [AIRFLOW-3547] Fixed Jinja 
templating in SparkSubmitOperator
URL: https://github.com/apache/incubator-airflow/pull/4347
 
 
   This is a minor change to allow Jinja templating in parameters where it 
makes sense for SparkSubmitOperator.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Jinja templating is not enabled for some SparkSubmitOperator parameters.
> 
>
> Key: AIRFLOW-3547
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3547
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Affects Versions: 1.10.1
>Reporter: Mathew
>Assignee: Mathew
>Priority: Minor
>
> SparkSubmitOperator currently only supports Jinja templating in its 'name',
> 'application_args' and 'packages' parameters. This is problematic, as a user
> might want to do something like:
> {code:python}
> application="{{ dag.folder }}/spark_code.py"
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3547) Jinja templating is not enabled for some SparkSubmitOperator parameters.

2018-12-19 Thread Mathew (JIRA)
Mathew created AIRFLOW-3547:
---

 Summary: Jinja templating is not enabled for some 
SparkSubmitOperator parameters.
 Key: AIRFLOW-3547
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3547
 Project: Apache Airflow
  Issue Type: Improvement
  Components: contrib
Affects Versions: 1.10.1
Reporter: Mathew
Assignee: Mathew


SparkSubmitOperator currently only supports Jinja templating in its 'name',
'application_args' and 'packages' parameters. This is problematic, as a user
might want to do something like:


{code:python}
application="{{ dag.folder }}/spark_code.py"
{code}
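
For background, a minimal sketch of the mechanism involved (an illustrative
subclass, not the actual change merged for this issue): Airflow renders Jinja
templates only for the attribute names an operator lists in template_fields,
so enabling templating for 'application' essentially means listing it there.

{code:python}
# Illustrative sketch only: Airflow's BaseOperator renders Jinja templates for
# every attribute named in template_fields, so listing 'application' is what
# lets a value like "{{ dag.folder }}/spark_code.py" resolve at runtime.
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator


class TemplatedSparkSubmitOperator(SparkSubmitOperator):
    template_fields = ('application', 'name', 'application_args', 'packages')
{code}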




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-3546) Typo in jobs.py logs

2018-12-19 Thread Tao Feng (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Feng closed AIRFLOW-3546.
-
Resolution: Fixed

> Typo in jobs.py logs
> 
>
> Key: AIRFLOW-3546
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3546
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Stan Kudrow
>Assignee: Stan Kudrow
>Priority: Trivial
>
> PR: https://github.com/apache/incubator-airflow/pull/4346



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3546) Typo in jobs.py logs

2018-12-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725383#comment-16725383
 ] 

ASF GitHub Bot commented on AIRFLOW-3546:
-

feng-tao closed pull request #4346: [AIRFLOW-3546] Fix typos in jobs.py logs
URL: https://github.com/apache/incubator-airflow/pull/4346
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/jobs.py b/airflow/jobs.py
index e60f135972..8472ecd383 100644
--- a/airflow/jobs.py
+++ b/airflow/jobs.py
@@ -1213,7 +1213,7 @@ def _find_executable_task_instances(self, simple_dag_bag, states, session=None):
         task_instance_str = "\n\t".join(
             ["{}".format(x) for x in executable_tis])
         self.log.info(
-            "Setting the follow tasks to queued state:\n\t%s", task_instance_str)
+            "Setting the following tasks to queued state:\n\t%s", task_instance_str)
         # so these dont expire on commit
         for ti in executable_tis:
             copy_dag_id = ti.dag_id
@@ -1408,7 +1408,7 @@ def _change_state_for_tasks_failed_to_execute(self, session):
             ["{}".format(x) for x in tis_to_set_to_scheduled])
 
         session.commit()
-        self.log.info("Set the follow tasks to scheduled state:\n\t{}"
+        self.log.info("Set the following tasks to scheduled state:\n\t{}"
                       .format(task_instance_str))
 
     def _process_dags(self, dagbag, dags, tis_out):


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Typo in jobs.py logs
> 
>
>     Key: AIRFLOW-3546
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3546
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Stan Kudrow
>Assignee: Stan Kudrow
>Priority: Trivial
>
> PR: https://github.com/apache/incubator-airflow/pull/4346



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3546) Typo in jobs.py logs

2018-12-19 Thread Stan Kudrow (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stan Kudrow updated AIRFLOW-3546:
-
Description: PR: https://github.com/apache/incubator-airflow/pull/4346

> Typo in jobs.py logs
> 
>
> Key: AIRFLOW-3546
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3546
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Stan Kudrow
>Assignee: Stan Kudrow
>Priority: Trivial
>
> PR: https://github.com/apache/incubator-airflow/pull/4346



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work stopped] (AIRFLOW-3546) Typo in jobs.py logs

2018-12-19 Thread Stan Kudrow (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-3546 stopped by Stan Kudrow.

> Typo in jobs.py logs
> 
>
> Key: AIRFLOW-3546
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3546
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Stan Kudrow
>Assignee: Stan Kudrow
>Priority: Trivial
>
> PR: https://github.com/apache/incubator-airflow/pull/4346



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work started] (AIRFLOW-3546) Typo in jobs.py logs

2018-12-19 Thread Stan Kudrow (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-3546 started by Stan Kudrow.

> Typo in jobs.py logs
> 
>
> Key: AIRFLOW-3546
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3546
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Stan Kudrow
>Assignee: Stan Kudrow
>Priority: Trivial
>
> PR: https://github.com/apache/incubator-airflow/pull/4346



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-3398) Google Cloud Spanner instance database query operator

2018-12-19 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-3398.
-
   Resolution: Fixed
 Assignee: (was: Szymon Przedwojski)
Fix Version/s: 1.10.2

Resolved by https://github.com/apache/incubator-airflow/pull/4314

> Google Cloud Spanner instance database query operator
> -
>
> Key: AIRFLOW-3398
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3398
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: gcp
>Reporter: Szymon Przedwojski
>Priority: Minor
> Fix For: 1.10.2
>
>
> Creating an operator to enable executing arbitrary SQL in a Transaction in 
> Cloud Spanner.
> https://googleapis.github.io/google-cloud-python/latest/spanner/index.html#executing-arbitrary-sql-in-a-transaction
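
For reference, a minimal sketch of the client-library pattern such an operator
wraps, based on the linked google-cloud-python documentation rather than the
merged operator code; the project, instance and database IDs are placeholders.

{code:python}
# Sketch: execute arbitrary DML inside a Cloud Spanner transaction using the
# google-cloud-spanner client library.
from google.cloud import spanner

client = spanner.Client(project='example-project')
database = client.instance('testinstance').database('db1')


def _delete_rows(transaction):
    # execute_update runs DML within the transaction passed in by the library
    transaction.execute_update("DELETE FROM my_table2 WHERE true")


database.run_in_transaction(_delete_rows)
{code}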



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3398) Google Cloud Spanner instance database query operator

2018-12-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725360#comment-16725360
 ] 

ASF GitHub Bot commented on AIRFLOW-3398:
-

kaxil closed pull request #4314: [AIRFLOW-3398] Google Cloud Spanner instance 
database query operator
URL: https://github.com/apache/incubator-airflow/pull/4314
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/example_dags/example_gcp_spanner.py b/airflow/contrib/example_dags/example_gcp_spanner.py
index dd8b8c52b9..cec3dcb855 100644
--- a/airflow/contrib/example_dags/example_gcp_spanner.py
+++ b/airflow/contrib/example_dags/example_gcp_spanner.py
@@ -18,18 +18,18 @@
 # under the License.
 
 """
-Example Airflow DAG that creates, updates and deletes a Cloud Spanner instance.
+Example Airflow DAG that creates, updates, queries and deletes a Cloud Spanner instance.
 
 This DAG relies on the following environment variables
-* PROJECT_ID - Google Cloud Platform project for the Cloud Spanner instance.
-* INSTANCE_ID - Cloud Spanner instance ID.
-* CONFIG_NAME - The name of the instance's configuration. Values are of the form
+* SPANNER_PROJECT_ID - Google Cloud Platform project for the Cloud Spanner instance.
+* SPANNER_INSTANCE_ID - Cloud Spanner instance ID.
+* SPANNER_CONFIG_NAME - The name of the instance's configuration. Values are of the form
     projects//instanceConfigs/.
     See also:
         https://cloud.google.com/spanner/docs/reference/rest/v1/projects.instanceConfigs#InstanceConfig
         https://cloud.google.com/spanner/docs/reference/rest/v1/projects.instanceConfigs/list#google.spanner.admin.instance.v1.InstanceAdmin.ListInstanceConfigs
-* NODE_COUNT - Number of nodes allocated to the instance.
-* DISPLAY_NAME - The descriptive name for this instance as it appears in UIs.
+* SPANNER_NODE_COUNT - Number of nodes allocated to the instance.
+* SPANNER_DISPLAY_NAME - The descriptive name for this instance as it appears in UIs.
     Must be unique per project and between 4 and 30 characters in length.
 """
 
@@ -38,15 +38,17 @@
 import airflow
 from airflow import models
 from airflow.contrib.operators.gcp_spanner_operator import \
-    CloudSpannerInstanceDeployOperator, CloudSpannerInstanceDeleteOperator
+    CloudSpannerInstanceDeployOperator, CloudSpannerInstanceDatabaseQueryOperator, \
+    CloudSpannerInstanceDeleteOperator
 
 # [START howto_operator_spanner_arguments]
-PROJECT_ID = os.environ.get('PROJECT_ID', 'example-project')
-INSTANCE_ID = os.environ.get('INSTANCE_ID', 'testinstance')
-CONFIG_NAME = os.environ.get('CONFIG_NAME',
+PROJECT_ID = os.environ.get('SPANNER_PROJECT_ID', 'example-project')
+INSTANCE_ID = os.environ.get('SPANNER_INSTANCE_ID', 'testinstance')
+DB_ID = os.environ.get('SPANNER_DB_ID', 'db1')
+CONFIG_NAME = os.environ.get('SPANNER_CONFIG_NAME',
                              'projects/example-project/instanceConfigs/eur3')
-NODE_COUNT = os.environ.get('NODE_COUNT', '1')
-DISPLAY_NAME = os.environ.get('DISPLAY_NAME', 'Test Instance')
+NODE_COUNT = os.environ.get('SPANNER_NODE_COUNT', '1')
+DISPLAY_NAME = os.environ.get('SPANNER_DISPLAY_NAME', 'Test Instance')
 # [END howto_operator_spanner_arguments]
 
 default_args = {
@@ -80,6 +82,24 @@
     task_id='spanner_instance_update_task'
 )
 
+# [START howto_operator_spanner_query]
+spanner_instance_query = CloudSpannerInstanceDatabaseQueryOperator(
+    project_id=PROJECT_ID,
+    instance_id=INSTANCE_ID,
+    database_id='db1',
+    query="DELETE FROM my_table2 WHERE true",
+    task_id='spanner_instance_query'
+)
+# [END howto_operator_spanner_query]
+
+spanner_instance_query2 = CloudSpannerInstanceDatabaseQueryOperator(
+    project_id=PROJECT_ID,
+    instance_id=INSTANCE_ID,
+    database_id='db1',
+    query="example_gcp_spanner.sql",
+    task_id='spanner_instance_query2'
+)
+
 # [START howto_operator_spanner_delete]
 spanner_instance_delete_task = CloudSpannerInstanceDeleteOperator(
     project_id=PROJECT_ID,
@@ -89,4 +109,5 @@
 # [END howto_operator_spanner_delete]
 
 spanner_instance_create_task >> spanner_instance_update_task \
+    >> spanner_instance_query >> spanner_instance_query2 \
     >> spanner_instance_delete_task
diff --git a/airflow/contrib/example_dags/example_gcp_spanner.sql b/airflow/contrib/example_dags/example_gcp_spanner.sql
new file mode
index 00..5d5f238022
--- /dev/null
+++ b/airflow/contrib/example_dags/example_gcp_spanner.sql
@@ -0,0 +1,3 @@
+INSERT my_table2 (id, name) VALUES (7, 'Seven');
+

[jira] [Commented] (AIRFLOW-3546) Typo in jobs.py logs

2018-12-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725352#comment-16725352
 ] 

ASF GitHub Bot commented on AIRFLOW-3546:
-

stankud opened a new pull request #4346: [AIRFLOW-3546] Fix typos in jobs.py 
logs
URL: https://github.com/apache/incubator-airflow/pull/4346
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW-3546) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3546
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   Scheduler logs the following:
   ```
   ...INFO - Setting the follow tasks to queued state:
   ```
   This PR changes `follow` to `following`.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   This is a trivial change which doesn't need tests
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Typo in jobs.py logs
> 
>
>     Key: AIRFLOW-3546
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3546
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Stan Kudrow
>    Assignee: Stan Kudrow
>Priority: Trivial
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3546) Typo in jobs.py logs

2018-12-19 Thread Stan Kudrow (JIRA)
Stan Kudrow created AIRFLOW-3546:


 Summary: Typo in jobs.py logs
 Key: AIRFLOW-3546
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3546
 Project: Apache Airflow
  Issue Type: Bug
  Components: scheduler
Reporter: Stan Kudrow
Assignee: Stan Kudrow






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3545) Can't use Prometheus or other pull based instrumentation systems to monitor Tasks launched on Kubernetes

2018-12-19 Thread Aditya Vishwakarma (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Vishwakarma updated AIRFLOW-3545:

Description: 
Prometheus, which is a common way to instrument services on Kubernetes, uses a
pull-based mechanism to fetch metrics. This involves a service exposing a
`/metrics` endpoint, which Prometheus scrapes every 30 seconds to collect
metrics.

This requires a port to be specified in the generated Pod config. Something 
like below.
{code:yaml}
# Sample Pod Spec
apiVersion: v1
kind: Job
metadata:
  name: batch-job
spec:
  ports:
  - name: metrics
port: 9091 # port to fetch metrics from
protocol: TCP
targetPort: 9091
{code}
Currently KubernetesPodOperator doesn't have any options to open ports like 
this.

Is it possible to have an option to do this?

  was:
Prometheus, which is a common way to instrument services on Kubernetes, uses a 
pull based mechanism to fetch metrics. Which generally involves a service 
exposing a `/metrics` endpoints. This endpoint is scraped every 30 secs by 
prometheus to collect metrics. 

This requires a port to be specified in the generated Pod config. Something 
like below.
{code:java}
// Sample Pod Spec
apiVersion: v1
kind: Job
metadata:
  name: batch-job
spec:
  ports:
  - name: metrics
port: 9091 # port to fetch metrics from
protocol: TCP
targetPort: 9091
{code}
Currently KubernetesPodOperator doesn't have any options to open ports like 
this.

Is it possible to have an option to do this?


> Can't use Prometheus or other pull based instrumentation systems to monitor 
> Tasks launched on Kubernetes
> 
>
> Key: AIRFLOW-3545
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3545
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Aditya Vishwakarma
>Priority: Major
>
> Prometheus, which is a common way to instrument services on Kubernetes, uses
> a pull-based mechanism to fetch metrics. This involves a service exposing a
> `/metrics` endpoint, which Prometheus scrapes every 30 seconds to collect
> metrics.
> This requires a port to be specified in the generated Pod config. Something 
> like below.
> {code:yaml}
> # Sample Pod Spec
> apiVersion: v1
> kind: Job
> metadata:
>   name: batch-job
> spec:
>   ports:
>   - name: metrics
> port: 9091 # port to fetch metrics from
> protocol: TCP
> targetPort: 9091
> {code}
> Currently KubernetesPodOperator doesn't have any options to open ports like 
> this.
> Is it possible to have an option to do this?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3545) Can't use Prometheus or other pull based instrumentation systems to monitor Tasks launched on Kubernetes

2018-12-19 Thread Aditya Vishwakarma (JIRA)
Aditya Vishwakarma created AIRFLOW-3545:
---

 Summary: Can't use Prometheus or other pull based instrumentation 
systems to monitor Tasks launched on Kubernetes
 Key: AIRFLOW-3545
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3545
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Aditya Vishwakarma


Prometheus, which is a common way to instrument services on Kubernetes, uses a
pull-based mechanism to fetch metrics. This generally involves a service
exposing a `/metrics` endpoint, which Prometheus scrapes every 30 seconds to
collect metrics.

This requires a port to be specified in the generated Pod config. Something 
like below.
{code:yaml}
# Sample Pod Spec
apiVersion: v1
kind: Job
metadata:
  name: batch-job
spec:
  ports:
  - name: metrics
port: 9091 # port to fetch metrics from
protocol: TCP
targetPort: 9091
{code}
Currently KubernetesPodOperator doesn't have any options to open ports like 
this.

Is it possible to have an option to do this?
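
A purely hypothetical sketch of what the requested option could look like on
KubernetesPodOperator; the ports argument shown below does not exist at the
time of this report and is exactly the feature being asked for.

{code:python}
# Hypothetical sketch only: 'ports' is NOT an existing KubernetesPodOperator
# argument here; it illustrates the requested option of exposing a
# containerPort (e.g. 9091) that Prometheus could scrape.
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

batch_job = KubernetesPodOperator(
    task_id='batch_job',
    name='batch-job',
    namespace='default',
    image='example/batch-job:latest',
    ports=[{'name': 'metrics', 'containerPort': 9091, 'protocol': 'TCP'}],
)
{code}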



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


  1   2   3   4   5   6   7   8   9   10   >