[jira] [Commented] (AIRFLOW-5138) Spurious DeprecationWarnings issued for configuration

2019-08-07 Thread Jonathan Lange (JIRA)


[ https://issues.apache.org/jira/browse/AIRFLOW-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902697#comment-16902697 ]

Jonathan Lange commented on AIRFLOW-5138:
-----------------------------------------

The warnings are likely due to 
https://github.com/apache/airflow/blob/0be39219cd058ba7d50cdf34b2cc46513f4f5ab3/airflow/configuration.py#L39-L43

It's very strange to have a library override user-provided warning settings.
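For context, the linked lines appear to reset the process-wide warning filters at import time. A minimal, library-independent sketch of why that surfaces warnings a user has explicitly silenced (the filter calls below are illustrative, not Airflow's exact code):

```python
import warnings

# A user's test setup: silence all DeprecationWarnings.
warnings.simplefilter("ignore", DeprecationWarning)

# What a library that overrides warning settings at import time might do.
# filterwarnings() prepends its rule, so it wins over the user's "ignore".
warnings.filterwarnings(action="default", category=DeprecationWarning)

with warnings.catch_warnings(record=True) as caught:
    warnings.warn("Specifying airflow_home is deprecated", DeprecationWarning)

# The warning surfaces despite the user's "ignore" filter.
assert len(caught) == 1
```

This is why the warnings show up in pytest runs even when the user never asked for them: the library's prepended filter takes precedence.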

> Spurious DeprecationWarnings issued for configuration
> -----------------------------------------------------
>
> Key: AIRFLOW-5138
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5138
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: configuration
>Affects Versions: 1.10.3, 1.10.4
>Reporter: Jonathan Lange
>Priority: Minor
>
> If you have tests that import airflow, and run them with pytest, you'll see 
> warnings like this:
> {code}
> /Users/jml/.pyenv/versions/3.7.4/envs/memflow/lib/python3.7/site-packages/airflow/configuration.py:599
>   
> /Users/jml/.pyenv/versions/3.7.4/envs/memflow/lib/python3.7/site-packages/airflow/configuration.py:599:
>  DeprecationWarning: Specifying airflow_home in the config file is 
> deprecated. As you have left it at the default value you should remove the 
> setting from your airflow.cfg and suffer no change in behaviour.
> category=DeprecationWarning,
> /Users/jml/.pyenv/versions/3.7.4/envs/memflow/lib/python3.7/site-packages/airflow/config_templates/airflow_local_settings.py:65
>   
> /Users/jml/.pyenv/versions/3.7.4/envs/memflow/lib/python3.7/site-packages/airflow/config_templates/airflow_local_settings.py:65:
>  DeprecationWarning: The elasticsearch_host option in [elasticsearch] has 
> been renamed to host - the old setting has been used, but please update your 
> config.
> ELASTICSEARCH_HOST = conf.get('elasticsearch', 'HOST')
> /Users/jml/.pyenv/versions/3.7.4/envs/memflow/lib/python3.7/site-packages/airflow/config_templates/airflow_local_settings.py:67
>   
> /Users/jml/.pyenv/versions/3.7.4/envs/memflow/lib/python3.7/site-packages/airflow/config_templates/airflow_local_settings.py:67:
>  DeprecationWarning: The elasticsearch_log_id_template option in 
> [elasticsearch] has been renamed to log_id_template - the old setting has 
> been used, but please update your config.
> ELASTICSEARCH_LOG_ID_TEMPLATE = conf.get('elasticsearch', 
> 'LOG_ID_TEMPLATE')
> /Users/jml/.pyenv/versions/3.7.4/envs/memflow/lib/python3.7/site-packages/airflow/config_templates/airflow_local_settings.py:69
>   
> /Users/jml/.pyenv/versions/3.7.4/envs/memflow/lib/python3.7/site-packages/airflow/config_templates/airflow_local_settings.py:69:
>  DeprecationWarning: The elasticsearch_end_of_log_mark option in 
> [elasticsearch] has been renamed to end_of_log_mark - the old setting has 
> been used, but please update your config.
> ELASTICSEARCH_END_OF_LOG_MARK = conf.get('elasticsearch', 
> 'END_OF_LOG_MARK')
> {code}
> These warnings occur even if you don't have a config file present. They are 
> very distracting, and I can't figure out what action I should take to get rid 
> of them.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[GitHub] [airflow] houqp commented on issue #5731: [AIRFLOW-5117] support refreshing EKS api tokens

2019-08-07 Thread GitBox
houqp commented on issue #5731: [AIRFLOW-5117] support refreshing EKS api tokens
URL: https://github.com/apache/airflow/pull/5731#issuecomment-519360218
 
 
   @ashb tests added, ready for review :)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-4568) The ExternalTaskSensor should be configurable to raise an Airflow Exception in case the poked external task reaches a disallowed state, such as f.i. failed

2019-08-07 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/AIRFLOW-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902664#comment-16902664 ]

ASF GitHub Bot commented on AIRFLOW-4568:
-----------------------------------------

michaelmdeng commented on pull request #5755: [AIRFLOW-4568] Add 
unallowed_states to ExternalTaskSensor
URL: https://github.com/apache/airflow/pull/5755
 
 
   
   
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-4568
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   Also possibly a dupe of [AIRFLOW-104]?
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Often we want an ExternalTaskSensor not to behave as a sensor (poke until a 
condition is satisfied), but instead as a mirror of a task in another DAG 
(succeed if the external task succeeds, fail if it fails).
   
   This change adds the `unallowed_states` parameter to the ExternalTaskSensor. 
If the external task/DAG is found to be in `unallowed_states`, the sensor fails 
immediately instead of the previous behavior of returning false and continuing 
to poke until timeout.
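A minimal sketch of the proposed semantics (the class and exception names here are illustrative, not the PR's actual implementation):

```python
class MirrorSensorSketch:
    """Poke-logic sketch: succeed on allowed states, fail fast on
    unallowed states instead of poking until timeout."""

    def __init__(self, allowed_states, unallowed_states):
        self.allowed_states = set(allowed_states)
        self.unallowed_states = set(unallowed_states)

    def poke(self, external_state):
        if external_state in self.unallowed_states:
            # Previously the sensor would just return False here and keep
            # poking; the proposed change raises immediately instead.
            raise RuntimeError(
                "External task in unallowed state: %s" % external_state)
        return external_state in self.allowed_states
```

With this sketch, `poke("running")` returns False (keep poking), `poke("success")` returns True, and `poke("failed")` raises instead of waiting for the sensor timeout.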
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   Additional tests in tests/sensors/test_external_task_sensor.py
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
   ### Code Quality
   
   - [x] Passes `flake8`
   
 



> The ExternalTaskSensor should be configurable to raise an Airflow Exception 
> in case the poked external task reaches a disallowed state, such as f.i. 
> failed
> ---------------------------------------------------------------------------
>
> Key: AIRFLOW-4568
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4568
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Affects Versions: 1.10.3
>Reporter: ddluke
>Priority: Minor
>
> _As an engineer, I would like to have the behavior of the ExternalTaskSensor 
> changed_
> _So that it fails in case the poked external_task_id fails_
> *Therefore*
>  * I suggest extending the behavior of the sensor to optionally also query 
> the TaskInstance for disallowed states and raise an AirflowException if 
> found. Currently, if the poked external task reaches a failed state, the 
> sensor continues to poke and does not terminate
> *Acceptance Criteria (from my pov)*
>  * The class interface for ExternalTaskSensor is extended with an additional 
> parameter, disallowed_states, which is an Optional List of 
> airflow.utils.state.State
>  * The poke method is expanded to count the number of rows from TaskInstance 
> which met the filter criteria dag_id, task_id, disallowed_states and 
> 

[GitHub] [airflow] michaelmdeng opened a new pull request #5755: [AIRFLOW-4568] Add unallowed_states to ExternalTaskSensor

2019-08-07 Thread GitBox
michaelmdeng opened a new pull request #5755: [AIRFLOW-4568] Add 
unallowed_states to ExternalTaskSensor
URL: https://github.com/apache/airflow/pull/5755
 
 
   
   
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-4568
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   Also possibly a dupe of [AIRFLOW-104]?
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Often we want an ExternalTaskSensor not to behave as a sensor (poke until a 
condition is satisfied), but instead as a mirror of a task in another DAG 
(succeed if the external task succeeds, fail if it fails).
   
   This change adds the `unallowed_states` parameter to the ExternalTaskSensor. 
If the external task/DAG is found to be in `unallowed_states`, the sensor fails 
immediately instead of the previous behavior of returning false and continuing 
to poke until timeout.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   Additional tests in tests/sensors/test_external_task_sensor.py
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
   ### Code Quality
   
   - [x] Passes `flake8`
   




[GitHub] [airflow] milton0825 commented on a change in pull request #5743: [AIRFLOW-5088] Persisting serialized DAG in DB for webserver scalability

2019-08-07 Thread GitBox
milton0825 commented on a change in pull request #5743: [AIRFLOW-5088] 
Persisting serialized DAG in DB for webserver scalability
URL: https://github.com/apache/airflow/pull/5743#discussion_r311841043
 
 

 ##
 File path: airflow/models/serialized_dag.py
 ##
 @@ -0,0 +1,143 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Serialized DAG table in database."""
+
+import hashlib
+from typing import Any, Dict, List, Optional, TYPE_CHECKING
+from sqlalchemy import Column, Index, Integer, String, Text, and_
+from sqlalchemy.sql import exists
+
+from airflow.models.base import Base, ID_LEN
+from airflow.utils import timezone
+from airflow.utils.db import provide_session
+from airflow.utils.sqlalchemy import UtcDateTime
+
+
+if TYPE_CHECKING:
+from airflow.dag.serialization.serialized_dag import SerializedDAG  # 
noqa: F401, E501; # pylint: disable=cyclic-import
+from airflow.models import DAG  # noqa: F401; # pylint: 
disable=cyclic-import
+
+
+class SerializedDagModel(Base):
+"""A database table for serialized DAGs."""
+
+__tablename__ = 'serialized_dag'
+
+dag_id = Column(String(ID_LEN), primary_key=True)
+fileloc = Column(String(2000))
+# The max length of fileloc exceeds the limit of indexing.
+fileloc_hash = Column(Integer)
+data = Column(Text)
+last_updated = Column(UtcDateTime)
+
+__table_args__ = (
+Index('idx_fileloc_hash', fileloc_hash, unique=False),
+)
+
+def __init__(self, dag):
+from airflow.dag.serialization import Serialization
+
+self.dag_id = dag.dag_id
+self.fileloc = dag.full_filepath
+self.fileloc_hash = SerializedDagModel.dag_fileloc_hash(self.fileloc)
+self.data = Serialization.to_json(dag)
+self.last_updated = timezone.utcnow()
+
+@staticmethod
+def dag_fileloc_hash(full_filepath: str) -> int:
+"""Hashing file location for indexing.
+
+:param full_filepath: full filepath of DAG file
+:return: hashed full_filepath
+"""
+# Truncates hash to 4 bytes.
+# TODO(coufon): hashing is needed because the length of fileloc is 
2000 as
+# an Airflow convention, which is over the limit of indexing. If we can
+return int(0xFFFFFFFF & int(
+hashlib.sha1(full_filepath.encode('utf-8')).hexdigest(), 16))
+
+@classmethod
+@provide_session
+def write_dag(cls, dag: 'DAG', min_update_interval: Optional[int] = None, 
session=None):
+"""Serializes a DAG and writes it into database.
+
+:param dag: a DAG to be written into database
+:param min_update_interval: minimal interval in seconds to update 
serialized DAG
+"""
+if min_update_interval is not None:
+result = session.query(cls.last_updated).filter(
+cls.dag_id == dag.dag_id).first()
+if result is not None and (
+timezone.utcnow() - result.last_updated).total_seconds() < 
min_update_interval:
+return
+session.merge(cls(dag))
+session.commit()
 
 Review comment:
   A handy func you can use is:
   
https://github.com/apache/airflow/blob/d5ad0761fd0b33cb89258ff6924c608c3e086680/airflow/utils/db.py#L33-L45




[GitHub] [airflow] milton0825 commented on a change in pull request #5743: [AIRFLOW-5088] Persisting serialized DAG in DB for webserver scalability

2019-08-07 Thread GitBox
milton0825 commented on a change in pull request #5743: [AIRFLOW-5088] 
Persisting serialized DAG in DB for webserver scalability
URL: https://github.com/apache/airflow/pull/5743#discussion_r311840968
 
 

 ##
 File path: airflow/models/serialized_dag.py
 ##
 @@ -0,0 +1,143 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Serialized DAG table in database."""
+
+import hashlib
+from typing import Any, Dict, List, Optional, TYPE_CHECKING
+from sqlalchemy import Column, Index, Integer, String, Text, and_
+from sqlalchemy.sql import exists
+
+from airflow.models.base import Base, ID_LEN
+from airflow.utils import timezone
+from airflow.utils.db import provide_session
+from airflow.utils.sqlalchemy import UtcDateTime
+
+
+if TYPE_CHECKING:
+from airflow.dag.serialization.serialized_dag import SerializedDAG  # 
noqa: F401, E501; # pylint: disable=cyclic-import
+from airflow.models import DAG  # noqa: F401; # pylint: 
disable=cyclic-import
+
+
+class SerializedDagModel(Base):
+"""A database table for serialized DAGs."""
+
+__tablename__ = 'serialized_dag'
+
+dag_id = Column(String(ID_LEN), primary_key=True)
+fileloc = Column(String(2000))
+# The max length of fileloc exceeds the limit of indexing.
+fileloc_hash = Column(Integer)
+data = Column(Text)
+last_updated = Column(UtcDateTime)
+
+__table_args__ = (
+Index('idx_fileloc_hash', fileloc_hash, unique=False),
+)
+
+def __init__(self, dag):
+from airflow.dag.serialization import Serialization
+
+self.dag_id = dag.dag_id
+self.fileloc = dag.full_filepath
+self.fileloc_hash = SerializedDagModel.dag_fileloc_hash(self.fileloc)
+self.data = Serialization.to_json(dag)
+self.last_updated = timezone.utcnow()
+
+@staticmethod
+def dag_fileloc_hash(full_filepath: str) -> int:
+"""Hashing file location for indexing.
+
+:param full_filepath: full filepath of DAG file
+:return: hashed full_filepath
+"""
+# Truncates hash to 4 bytes.
+# TODO(coufon): hashing is needed because the length of fileloc is 
2000 as
+# an Airflow convention, which is over the limit of indexing. If we can
+return int(0xFFFFFFFF & int(
+hashlib.sha1(full_filepath.encode('utf-8')).hexdigest(), 16))
+
+@classmethod
+@provide_session
+def write_dag(cls, dag: 'DAG', min_update_interval: Optional[int] = None, 
session=None):
+"""Serializes a DAG and writes it into database.
+
+:param dag: a DAG to be written into database
+:param min_update_interval: minimal interval in seconds to update 
serialized DAG
+"""
+if min_update_interval is not None:
+result = session.query(cls.last_updated).filter(
+cls.dag_id == dag.dag_id).first()
+if result is not None and (
+timezone.utcnow() - result.last_updated).total_seconds() < 
min_update_interval:
+return
+session.merge(cls(dag))
+session.commit()
 
 Review comment:
   Should we do a `session.rollback()` in case we encounter an exception?
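The commit-or-rollback pattern being suggested can be sketched as a context manager (this mirrors the idea of the `create_session` helper referenced in the sibling comment; the names and details here are illustrative, not Airflow's exact code):

```python
from contextlib import contextmanager


@contextmanager
def session_scope(session_factory):
    """Commit on success, roll back on any exception, always close."""
    session = session_factory()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()
```

The `merge`/`commit` in `write_dag` could then run inside `with session_scope(...) as session:`, so a failed merge rolls back instead of leaving the session in a dirty state.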




[GitHub] [airflow] milton0825 commented on a change in pull request #5743: [AIRFLOW-5088] Persisting serialized DAG in DB for webserver scalability

2019-08-07 Thread GitBox
milton0825 commented on a change in pull request #5743: [AIRFLOW-5088] 
Persisting serialized DAG in DB for webserver scalability
URL: https://github.com/apache/airflow/pull/5743#discussion_r311839965
 
 

 ##
 File path: airflow/migrations/versions/d38e04c12aa2_add_serialized_dag_table.py
 ##
 @@ -0,0 +1,50 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+# 
+#   http://www.apache.org/licenses/LICENSE-2.0
+# 
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""add serialized_dag table
+
+Revision ID: d38e04c12aa2
+Revises: 6e96a59344a4
+Create Date: 2019-08-01 14:39:35.616417
+
+"""
+
+# revision identifiers, used by Alembic.
+revision = 'd38e04c12aa2'
+down_revision = '6e96a59344a4'
+branch_labels = None
+depends_on = None
+
+from alembic import op
+import sqlalchemy as sa
+
+
+def upgrade():
+op.create_table('serialized_dag',
 
 Review comment:
   Any reason why we need a new table instead of extending the existing `dag` 
table?




[GitHub] [airflow] milton0825 commented on a change in pull request #5743: [AIRFLOW-5088] Persisting serialized DAG in DB for webserver scalability

2019-08-07 Thread GitBox
milton0825 commented on a change in pull request #5743: [AIRFLOW-5088] 
Persisting serialized DAG in DB for webserver scalability
URL: https://github.com/apache/airflow/pull/5743#discussion_r311838201
 
 

 ##
 File path: airflow/models/dagbag.py
 ##
 @@ -416,6 +427,19 @@ def collect_dags(
  format(dag_names),
  file_stat.duration)
 
+def collect_dags_from_db(self):
+"""Collects DAGs from database."""
+start_dttm = timezone.utcnow()
+# DAG post-pcocessing steps such as self.bag_dag and croniter are not 
needed as
+# they are done by scheduler before serialization.
+# The dagbag contains all rows in serialized_dag table. Deleted DAGs 
are deleted
+# from the table by the scheduler job.
+self.log.info("Filling up the DagBag from database")
+self.dags = SerializedDagModel.read_all_dags()
+Stats.gauge(
 
 Review comment:
   Can we use `Stats.timing` here? And you can just pass `timedelta` in.
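For illustration, statsd-style timing clients typically accept either a duration in milliseconds or a `timedelta`. A minimal stand-in (not Airflow's actual `Stats` class) showing why passing a `timedelta` is convenient:

```python
from datetime import timedelta


class StatsSketch:
    """Minimal stand-in for a statsd-style client's timing() method."""

    def __init__(self):
        self.timings = {}

    def timing(self, stat, delta):
        # Accept either a timedelta or a plain number of milliseconds.
        if isinstance(delta, timedelta):
            delta = delta.total_seconds() * 1000.0
        self.timings[stat] = delta


stats = StatsSketch()
# The caller can pass the elapsed timedelta directly, no manual conversion:
stats.timing("collect_db_dags", timedelta(seconds=1.5))
```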




[GitHub] [airflow] milton0825 commented on issue #5663: [AIRFLOW-4940] Add DynamoDB to S3 operator

2019-08-07 Thread GitBox
milton0825 commented on issue #5663: [AIRFLOW-4940] Add DynamoDB to S3 operator
URL: https://github.com/apache/airflow/pull/5663#issuecomment-519338511
 
 
   PTAL @kaxil @potiuk 




[GitHub] [airflow] milton0825 removed a comment on issue #5663: [AIRFLOW-4940] Add DynamoDB to S3 operator

2019-08-07 Thread GitBox
milton0825 removed a comment on issue #5663: [AIRFLOW-4940] Add DynamoDB to S3 
operator
URL: https://github.com/apache/airflow/pull/5663#issuecomment-516189476
 
 
   PTAL @ashb @feng-tao @potiuk 




[GitHub] [airflow] coufon commented on a change in pull request #5743: [AIRFLOW-5088] Persisting serialized DAG in DB for webserver scalability

2019-08-07 Thread GitBox
coufon commented on a change in pull request #5743: [AIRFLOW-5088] Persisting 
serialized DAG in DB for webserver scalability
URL: https://github.com/apache/airflow/pull/5743#discussion_r311824524
 
 

 ##
 File path: airflow/models/dag.py
 ##
 @@ -52,6 +53,8 @@
 from airflow.utils.sqlalchemy import UtcDateTime, Interval
 from airflow.utils.state import State
 
+DAGCACHED_ENABLED = configuration.getboolean('core', 'dagcached', 
fallback=True)
 
 Review comment:
   Done.




[GitHub] [airflow] lsowen commented on issue #5754: handle elasticsearch `json_format` as a boolean

2019-08-07 Thread GitBox
lsowen commented on issue #5754: handle elasticsearch `json_format` as a boolean
URL: https://github.com/apache/airflow/pull/5754#issuecomment-519325234
 
 
   Looks like a spurious test failure: `Error: Invalid or corrupt jarfile 
/tmp/apache-rat-0.12.jar`




[GitHub] [airflow] lsowen opened a new pull request #5754: handle elasticsearch `json_format` as a boolean

2019-08-07 Thread GitBox
lsowen opened a new pull request #5754: handle elasticsearch `json_format` as a 
boolean
URL: https://github.com/apache/airflow/pull/5754
 
 
   Is a boolean, but is read as a string, which means it is _always_ considered 
True, even though the default config value is `False`.
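The underlying gotcha is plain Python string truthiness: any non-empty string, including `"False"`, is truthy. A short sketch (the `to_bool` helper below is illustrative of what a `getboolean`-style accessor does, not the PR's actual code):

```python
# A boolean option read back as a raw string from config:
json_format = "False"

# Non-empty strings are always truthy, so this branch is always taken.
assert bool(json_format) is True


def to_bool(value: str) -> bool:
    """Parse a config string the way a getboolean-style accessor would."""
    return value.strip().lower() in ("1", "t", "true", "yes", "y")


# Parsing the string rather than testing its truthiness fixes the bug.
assert to_bool("False") is False
assert to_bool("True") is True
```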
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   




[jira] [Created] (AIRFLOW-5141) KubernetesPodOperator to print resource utilization at end of job

2019-08-07 Thread Chris McLennon (JIRA)
Chris McLennon created AIRFLOW-5141:
---

 Summary: KubernetesPodOperator to print resource utilization at 
end of job
 Key: AIRFLOW-5141
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5141
 Project: Apache Airflow
  Issue Type: Improvement
  Components: operators
Affects Versions: 1.10.4
Reporter: Chris McLennon


It would be fantastic if the KubernetesPodOperator printed out a pod's max 
CPU/mem after it completes. This would give users a better idea of how many 
resources they should be requesting for their jobs.





[GitHub] [airflow] ryanyuan commented on issue #5751: [AIRFLOW-5136] Fix Bug with Incorrect template_fields in DataProc{*} …

2019-08-07 Thread GitBox
ryanyuan commented on issue #5751: [AIRFLOW-5136] Fix Bug with Incorrect 
template_fields in DataProc{*} …
URL: https://github.com/apache/airflow/pull/5751#issuecomment-519307273
 
 
   @kaxil @potiuk @ashb Any ideas on why the changes in dataproc_operator.py 
appear in my GCP DLP commit?




[GitHub] [airflow] KevinYang21 commented on issue #5589: [AIRFLOW-4956] Fix LocalTaskJob heartbeat log spamming

2019-08-07 Thread GitBox
KevinYang21 commented on issue #5589: [AIRFLOW-4956] Fix LocalTaskJob heartbeat 
log spamming
URL: https://github.com/apache/airflow/pull/5589#issuecomment-519284250
 
 
   @ashb seems like the two most recent commits are passing master CI. Is 
master CI fixed already?




[jira] [Created] (AIRFLOW-5140) missing type annotation errors reported by dmypy

2019-08-07 Thread QP Hou (JIRA)
QP Hou created AIRFLOW-5140:
---

 Summary: missing type annotation errors reported by dmypy
 Key: AIRFLOW-5140
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5140
 Project: Apache Airflow
  Issue Type: Bug
  Components: ci
Affects Versions: 2.0.0
Reporter: QP Hou
Assignee: QP Hou


dmypy reports the following errors:

 
{code:bash}
$ dmypy run -- --follow-imports=error airflow tests
tests/core.py:2188: error: Need type annotation for 'HDFSHook'
tests/core.py:2189: error: Need type annotation for 'snakebite'
airflow/__init__.py:42: error: Need type annotation for 'login'
airflow/api/auth/backend/default.py:22: error: Need type annotation for 
'CLIENT_AUTH'
airflow/api/auth/backend/deny_all.py:23: error: Need type annotation for 
'CLIENT_AUTH'
airflow/contrib/example_dags/example_gcp_speech.py:60: error: Need type 
annotation for 'SOURCE_LANGUAGE'
airflow/contrib/hooks/gcp_cloud_build_hook.py:50: error: Need type annotation 
for '_conn'
airflow/contrib/hooks/gcp_compute_hook.py:50: error: Need type annotation for 
'_conn'
airflow/contrib/hooks/gcp_function_hook.py:41: error: Need type annotation for 
'_conn'
airflow/contrib/hooks/gcp_sql_hook.py:722: error: Need type annotation for 
'_conn'
airflow/contrib/hooks/gcp_translate_hook.py:35: error: Need type annotation for 
'_client'
airflow/contrib/hooks/gcs_hook.py:41: error: Need type annotation for '_conn'
airflow/contrib/operators/awsbatch_operator.py:62: error: Need type annotation 
for 'client'
airflow/contrib/operators/awsbatch_operator.py:63: error: Need type annotation 
for 'arn'
airflow/contrib/operators/ecs_operator.py:61: error: Need type annotation for 
'client'
airflow/contrib/operators/ecs_operator.py:62: error: Need type annotation for 
'arn'
airflow/contrib/operators/gcp_function_operator.py:229: error: Need type 
annotation for 'upload_function'
airflow/executors/__init__.py:28: error: Need type annotation for 
'DEFAULT_EXECUTOR'
airflow/hooks/dbapi_hook.py:41: error: Need type annotation for 'connector'
airflow/models/crypto.py:49: error: Need type annotation for '_fernet'
airflow/security/kerberos.py:25: error: Need type annotation for 
'NEED_KRB181_WORKAROUND'
airflow/settings.py:66: error: Need type annotation for 'SQL_ALCHEMY_CONN'
airflow/settings.py:67: error: Need type annotation for 'DAGS_FOLDER'
airflow/settings.py:68: error: Need type annotation for 'PLUGINS_FOLDER'
airflow/settings.py:69: error: Need type annotation for 'LOGGING_CLASS_PATH'
airflow/settings.py:71: error: Need type annotation for 'engine'
airflow/settings.py:72: error: Need type annotation for 'Session'
airflow/settings.py:312: error: Need type annotation for 'CONTEXT_MANAGER_DAG'
airflow/utils/state.py:28: error: Need type annotation for 'NONE'
airflow/www/app.py:39: error: Need type annotation for 'appbuilder'
tests/contrib/hooks/test_databricks_hook.py:72: error: Need type annotation for 
'RESULT_STATE'
tests/contrib/hooks/test_gcp_cloud_build_hook.py:49: error: Need type 
annotation for 'hook'
tests/contrib/hooks/test_gcp_cloud_build_hook.py:122: error: Need type 
annotation for 'hook'
tests/contrib/hooks/test_gcp_cloud_build_hook.py:191: error: Need type 
annotation for 'hook'
tests/contrib/utils/gcp_authenticator.py:57: error: Need type annotation for 
'original_account'
tests/test_utils/reset_warning_registry.py:45: error: Need type annotation for 
'_pattern'
tests/test_utils/reset_warning_registry.py:48: error: Need type annotation for 
'_backup'
{code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[GitHub] [airflow] codecov-io edited a comment on issue #5413: [AIRFLOW-4690] Make tests/api Pylint compatible

2019-08-07 Thread GitBox
codecov-io edited a comment on issue #5413: [AIRFLOW-4690] Make tests/api 
Pylint compatible
URL: https://github.com/apache/airflow/pull/5413#issuecomment-501480502
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/5413?src=pr=h1) 
Report
   > Merging 
[#5413](https://codecov.io/gh/apache/airflow/pull/5413?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/0d1da8c31743701b4ea8b209bb594e14c3d62983?src=pr=desc)
 will **increase** coverage by `0.09%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/5413/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/5413?src=pr=tree)
   
   ```diff
    @@            Coverage Diff             @@
    ##           master    #5413      +/-   ##
    ==========================================
    + Coverage   79.93%   80.02%   +0.09%
    ==========================================
      Files         498      498
      Lines       32188    32244     +56
    ==========================================
    + Hits        25728    25802     +74
    + Misses       6460     6442     -18
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/5413?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/models/dagrun.py](https://codecov.io/gh/apache/airflow/pull/5413/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvZGFncnVuLnB5)
 | `96.61% <100%> (+0.15%)` | :arrow_up: |
   | 
[airflow/models/dag.py](https://codecov.io/gh/apache/airflow/pull/5413/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvZGFnLnB5)
 | `91.64% <0%> (-0.2%)` | :arrow_down: |
   | 
[airflow/jobs/backfill\_job.py](https://codecov.io/gh/apache/airflow/pull/5413/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzL2JhY2tmaWxsX2pvYi5weQ==)
 | `91.46% <0%> (+0.05%)` | :arrow_up: |
   | 
[airflow/models/dagbag.py](https://codecov.io/gh/apache/airflow/pull/5413/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvZGFnYmFnLnB5)
 | `92.55% <0%> (+0.46%)` | :arrow_up: |
   | 
[airflow/models/taskinstance.py](https://codecov.io/gh/apache/airflow/pull/5413/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvdGFza2luc3RhbmNlLnB5)
 | `93.35% <0%> (+0.49%)` | :arrow_up: |
   | 
[airflow/jobs/scheduler\_job.py](https://codecov.io/gh/apache/airflow/pull/5413/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzL3NjaGVkdWxlcl9qb2IucHk=)
 | `74.42% <0%> (+2.49%)` | :arrow_up: |
   | 
[airflow/operators/subdag\_operator.py](https://codecov.io/gh/apache/airflow/pull/5413/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvc3ViZGFnX29wZXJhdG9yLnB5)
 | `96.77% <0%> (+6.45%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/5413?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/5413?src=pr=footer). 
Last update 
[0d1da8c...cdab7d5](https://codecov.io/gh/apache/airflow/pull/5413?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] BasPH commented on issue #5413: [AIRFLOW-4690] Make tests/api Pylint compatible

2019-08-07 Thread GitBox
BasPH commented on issue #5413: [AIRFLOW-4690] Make tests/api Pylint compatible
URL: https://github.com/apache/airflow/pull/5413#issuecomment-519271539
 
 
   @potiuk done!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (AIRFLOW-5139) ElasticSearchTaskRunner does not allow for custom ES configurations

2019-08-07 Thread Daniel Imberman (JIRA)
Daniel Imberman created AIRFLOW-5139:


 Summary: ElasticSearchTaskRunner does not allow for custom ES 
configurations
 Key: AIRFLOW-5139
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5139
 Project: Apache Airflow
  Issue Type: Task
  Components: logging
Affects Versions: 1.10.4
Reporter: Daniel Imberman
Assignee: Daniel Imberman


In [the ES 
task-handler|https://github.com/apache/airflow/blob/master/airflow/utils/log/es_task_handler.py#L70],
 the "ElasticSearch" object only takes "host" as an argument. This means that 
users have no way to turn on SSL, assign certs, or perform any of the other 
configuration necessary to match their ES environment.
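One possible fix can be sketched as follows (a hedged outline, not the actual handler code): merge a configurable dict of client options with the host before constructing the client, so SSL, certs, auth, etc. can be passed through. The `es_kwargs` parameter is a hypothetical name:

```python
def build_es_client_kwargs(host, es_kwargs=None):
    """Merge the configured host with extra client options.

    `es_kwargs` is a hypothetical dict of options (e.g. use_ssl,
    ca_certs, http_auth) that the handler would forward to the
    elasticsearch.Elasticsearch constructor along with the host.
    """
    kwargs = {"hosts": [host]}
    kwargs.update(es_kwargs or {})
    return kwargs
```

The handler could then call `Elasticsearch(**build_es_client_kwargs(host, es_kwargs))` instead of `Elasticsearch([host])`, leaving existing behaviour unchanged when no extra options are configured.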



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[GitHub] [airflow] codecov-io edited a comment on issue #5740: [AIRFLOW-5087] Display task/dag run stats on UI so that users can debug more easily

2019-08-07 Thread GitBox
codecov-io edited a comment on issue #5740: [AIRFLOW-5087] Display task/dag run 
stats on UI so that users can debug more easily
URL: https://github.com/apache/airflow/pull/5740#issuecomment-519257117
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/5740?src=pr=h1) 
Report
   > Merging 
[#5740](https://codecov.io/gh/apache/airflow/pull/5740?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/0d1da8c31743701b4ea8b209bb594e14c3d62983?src=pr=desc)
 will **increase** coverage by `0.09%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/5740/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/5740?src=pr=tree)
   
   ```diff
    @@            Coverage Diff             @@
    ##           master    #5740      +/-   ##
    ==========================================
    + Coverage   79.93%   80.02%   +0.09%
    ==========================================
      Files         498      498
      Lines       32188    32259     +71
    ==========================================
    + Hits        25728    25816     +88
    + Misses       6460     6443     -17
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/5740?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/www/views.py](https://codecov.io/gh/apache/airflow/pull/5740/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvdmlld3MucHk=)
 | `75.44% <100%> (+0.28%)` | :arrow_up: |
   | 
[airflow/models/dag.py](https://codecov.io/gh/apache/airflow/pull/5740/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvZGFnLnB5)
 | `91.72% <100%> (-0.11%)` | :arrow_down: |
   | 
[airflow/jobs/backfill\_job.py](https://codecov.io/gh/apache/airflow/pull/5740/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzL2JhY2tmaWxsX2pvYi5weQ==)
 | `91.15% <0%> (-0.26%)` | :arrow_down: |
   | 
[airflow/models/dagbag.py](https://codecov.io/gh/apache/airflow/pull/5740/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvZGFnYmFnLnB5)
 | `92.55% <0%> (+0.46%)` | :arrow_up: |
   | 
[airflow/models/taskinstance.py](https://codecov.io/gh/apache/airflow/pull/5740/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvdGFza2luc3RhbmNlLnB5)
 | `93.35% <0%> (+0.49%)` | :arrow_up: |
   | 
[airflow/jobs/scheduler\_job.py](https://codecov.io/gh/apache/airflow/pull/5740/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzL3NjaGVkdWxlcl9qb2IucHk=)
 | `74.42% <0%> (+2.49%)` | :arrow_up: |
   | 
[airflow/operators/subdag\_operator.py](https://codecov.io/gh/apache/airflow/pull/5740/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvc3ViZGFnX29wZXJhdG9yLnB5)
 | `96.77% <0%> (+6.45%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/5740?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/5740?src=pr=footer). 
Last update 
[0d1da8c...f2ec462](https://codecov.io/gh/apache/airflow/pull/5740?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] codecov-io commented on issue #5740: [AIRFLOW-5087] Display task/dag run stats on UI so that users can debug more easily

2019-08-07 Thread GitBox
codecov-io commented on issue #5740: [AIRFLOW-5087] Display task/dag run stats 
on UI so that users can debug more easily
URL: https://github.com/apache/airflow/pull/5740#issuecomment-519257113
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/5740?src=pr=h1) 
Report
   > Merging 
[#5740](https://codecov.io/gh/apache/airflow/pull/5740?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/0d1da8c31743701b4ea8b209bb594e14c3d62983?src=pr=desc)
 will **increase** coverage by `0.09%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/5740/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/5740?src=pr=tree)
   
   ```diff
    @@            Coverage Diff             @@
    ##           master    #5740      +/-   ##
    ==========================================
    + Coverage   79.93%   80.02%   +0.09%
    ==========================================
      Files         498      498
      Lines       32188    32259     +71
    ==========================================
    + Hits        25728    25816     +88
    + Misses       6460     6443     -17
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/5740?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/www/views.py](https://codecov.io/gh/apache/airflow/pull/5740/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvdmlld3MucHk=)
 | `75.44% <100%> (+0.28%)` | :arrow_up: |
   | 
[airflow/models/dag.py](https://codecov.io/gh/apache/airflow/pull/5740/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvZGFnLnB5)
 | `91.72% <100%> (-0.11%)` | :arrow_down: |
   | 
[airflow/jobs/backfill\_job.py](https://codecov.io/gh/apache/airflow/pull/5740/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzL2JhY2tmaWxsX2pvYi5weQ==)
 | `91.15% <0%> (-0.26%)` | :arrow_down: |
   | 
[airflow/models/dagbag.py](https://codecov.io/gh/apache/airflow/pull/5740/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvZGFnYmFnLnB5)
 | `92.55% <0%> (+0.46%)` | :arrow_up: |
   | 
[airflow/models/taskinstance.py](https://codecov.io/gh/apache/airflow/pull/5740/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvdGFza2luc3RhbmNlLnB5)
 | `93.35% <0%> (+0.49%)` | :arrow_up: |
   | 
[airflow/jobs/scheduler\_job.py](https://codecov.io/gh/apache/airflow/pull/5740/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzL3NjaGVkdWxlcl9qb2IucHk=)
 | `74.42% <0%> (+2.49%)` | :arrow_up: |
   | 
[airflow/operators/subdag\_operator.py](https://codecov.io/gh/apache/airflow/pull/5740/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvc3ViZGFnX29wZXJhdG9yLnB5)
 | `96.77% <0%> (+6.45%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/5740?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/5740?src=pr=footer). 
Last update 
[0d1da8c...f2ec462](https://codecov.io/gh/apache/airflow/pull/5740?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] codecov-io commented on issue #5740: [AIRFLOW-5087] Display task/dag run stats on UI so that users can debug more easily

2019-08-07 Thread GitBox
codecov-io commented on issue #5740: [AIRFLOW-5087] Display task/dag run stats 
on UI so that users can debug more easily
URL: https://github.com/apache/airflow/pull/5740#issuecomment-519257117
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/5740?src=pr=h1) 
Report
   > Merging 
[#5740](https://codecov.io/gh/apache/airflow/pull/5740?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/0d1da8c31743701b4ea8b209bb594e14c3d62983?src=pr=desc)
 will **increase** coverage by `0.09%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/5740/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/5740?src=pr=tree)
   
   ```diff
    @@            Coverage Diff             @@
    ##           master    #5740      +/-   ##
    ==========================================
    + Coverage   79.93%   80.02%   +0.09%
    ==========================================
      Files         498      498
      Lines       32188    32259     +71
    ==========================================
    + Hits        25728    25816     +88
    + Misses       6460     6443     -17
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/5740?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/www/views.py](https://codecov.io/gh/apache/airflow/pull/5740/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvdmlld3MucHk=)
 | `75.44% <100%> (+0.28%)` | :arrow_up: |
   | 
[airflow/models/dag.py](https://codecov.io/gh/apache/airflow/pull/5740/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvZGFnLnB5)
 | `91.72% <100%> (-0.11%)` | :arrow_down: |
   | 
[airflow/jobs/backfill\_job.py](https://codecov.io/gh/apache/airflow/pull/5740/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzL2JhY2tmaWxsX2pvYi5weQ==)
 | `91.15% <0%> (-0.26%)` | :arrow_down: |
   | 
[airflow/models/dagbag.py](https://codecov.io/gh/apache/airflow/pull/5740/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvZGFnYmFnLnB5)
 | `92.55% <0%> (+0.46%)` | :arrow_up: |
   | 
[airflow/models/taskinstance.py](https://codecov.io/gh/apache/airflow/pull/5740/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvdGFza2luc3RhbmNlLnB5)
 | `93.35% <0%> (+0.49%)` | :arrow_up: |
   | 
[airflow/jobs/scheduler\_job.py](https://codecov.io/gh/apache/airflow/pull/5740/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzL3NjaGVkdWxlcl9qb2IucHk=)
 | `74.42% <0%> (+2.49%)` | :arrow_up: |
   | 
[airflow/operators/subdag\_operator.py](https://codecov.io/gh/apache/airflow/pull/5740/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvc3ViZGFnX29wZXJhdG9yLnB5)
 | `96.77% <0%> (+6.45%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/5740?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/5740?src=pr=footer). 
Last update 
[0d1da8c...f2ec462](https://codecov.io/gh/apache/airflow/pull/5740?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] codecov-io commented on issue #5752: [AIRFLOW-5045] Add ability to create Google Dataproc cluster with cus…

2019-08-07 Thread GitBox
codecov-io commented on issue #5752: [AIRFLOW-5045] Add ability to create 
Google Dataproc cluster with cus…
URL: https://github.com/apache/airflow/pull/5752#issuecomment-519249045
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/5752?src=pr=h1) 
Report
   > Merging 
[#5752](https://codecov.io/gh/apache/airflow/pull/5752?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/0d1da8c31743701b4ea8b209bb594e14c3d62983?src=pr=desc)
 will **decrease** coverage by `0.64%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/5752/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/5752?src=pr=tree)
   
   ```diff
    @@            Coverage Diff             @@
    ##           master    #5752      +/-   ##
    ==========================================
    - Coverage   79.93%   79.28%   -0.65%
    ==========================================
      Files         498      498
      Lines       32188    32190      +2
    ==========================================
    - Hits        25728    25523    -205
    - Misses       6460     6667    +207
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/5752?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/contrib/operators/dataproc\_operator.py](https://codecov.io/gh/apache/airflow/pull/5752/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9kYXRhcHJvY19vcGVyYXRvci5weQ==)
 | `86.26% <100%> (+0.07%)` | :arrow_up: |
   | 
[airflow/operators/postgres\_operator.py](https://codecov.io/gh/apache/airflow/pull/5752/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvcG9zdGdyZXNfb3BlcmF0b3IucHk=)
 | `0% <0%> (-100%)` | :arrow_down: |
   | 
[airflow/operators/mysql\_operator.py](https://codecov.io/gh/apache/airflow/pull/5752/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvbXlzcWxfb3BlcmF0b3IucHk=)
 | `0% <0%> (-100%)` | :arrow_down: |
   | 
[airflow/operators/mysql\_to\_hive.py](https://codecov.io/gh/apache/airflow/pull/5752/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvbXlzcWxfdG9faGl2ZS5weQ==)
 | `0% <0%> (-100%)` | :arrow_down: |
   | 
[airflow/operators/generic\_transfer.py](https://codecov.io/gh/apache/airflow/pull/5752/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvZ2VuZXJpY190cmFuc2Zlci5weQ==)
 | `0% <0%> (-100%)` | :arrow_down: |
   | 
[airflow/executors/celery\_executor.py](https://codecov.io/gh/apache/airflow/pull/5752/diff?src=pr=tree#diff-YWlyZmxvdy9leGVjdXRvcnMvY2VsZXJ5X2V4ZWN1dG9yLnB5)
 | `40.74% <0%> (-35.56%)` | :arrow_down: |
   | 
[airflow/utils/log/wasb\_task\_handler.py](https://codecov.io/gh/apache/airflow/pull/5752/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9sb2cvd2FzYl90YXNrX2hhbmRsZXIucHk=)
 | `32.87% <0%> (-9.59%)` | :arrow_down: |
   | 
[airflow/utils/sqlalchemy.py](https://codecov.io/gh/apache/airflow/pull/5752/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9zcWxhbGNoZW15LnB5)
 | `73.25% <0%> (-5.82%)` | :arrow_down: |
   | 
[airflow/utils/log/es\_task\_handler.py](https://codecov.io/gh/apache/airflow/pull/5752/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9sb2cvZXNfdGFza19oYW5kbGVyLnB5)
 | `87.15% <0%> (-4.59%)` | :arrow_down: |
   | 
[airflow/hooks/dbapi\_hook.py](https://codecov.io/gh/apache/airflow/pull/5752/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9kYmFwaV9ob29rLnB5)
 | `85.08% <0%> (-3.51%)` | :arrow_down: |
   | ... and [8 
more](https://codecov.io/gh/apache/airflow/pull/5752/diff?src=pr=tree-more) 
| |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/5752?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/5752?src=pr=footer). 
Last update 
[0d1da8c...b925684](https://codecov.io/gh/apache/airflow/pull/5752?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] BasPH commented on issue #5112: [AIRFLOW-4209][AIP3-STEP13] Replace imp by importlib

2019-08-07 Thread GitBox
BasPH commented on issue #5112: [AIRFLOW-4209][AIP3-STEP13] Replace imp by 
importlib
URL: https://github.com/apache/airflow/pull/5112#issuecomment-519244452
 
 
   @zhongjiajie are you still working on this?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (AIRFLOW-5088) To implement DAG JSON serialization and DB persistence for webserver scalability improvement

2019-08-07 Thread Zhou Fang (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhou Fang updated AIRFLOW-5088:
---
Summary: To implement DAG JSON serialization and DB persistence for 
webserver scalability improvement  (was: To implement DAG serialization using 
JSON for DB persistence)

> To implement DAG JSON serialization and DB persistence for webserver 
> scalability improvement
> 
>
> Key: AIRFLOW-5088
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5088
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: DAG, webserver
>Affects Versions: 1.10.5
>Reporter: Zhou Fang
>Assignee: Zhou Fang
>Priority: Major
>
> Created this issue to start implementing DAG serialization using JSON and 
> persistence in the DB. The serialized DAG will be used in the webserver to 
> solve the webserver scalability issue.
>  
> The implementation is based on AIP-24: 
> [https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-24+DAG+Persistence+in+DB+using+JSON+for+Airflow+Webserver+and+%28optional%29+Scheduler]
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (AIRFLOW-5138) Spurious DeprecationWarnings issued for configuration

2019-08-07 Thread Jonathan Lange (JIRA)
Jonathan Lange created AIRFLOW-5138:
---

 Summary: Spurious DeprecationWarnings issued for configuration
 Key: AIRFLOW-5138
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5138
 Project: Apache Airflow
  Issue Type: Bug
  Components: configuration
Affects Versions: 1.10.4, 1.10.3
Reporter: Jonathan Lange


If you have tests that import airflow, and run them with pytest, you'll see 
errors like this:

{code}
/Users/jml/.pyenv/versions/3.7.4/envs/memflow/lib/python3.7/site-packages/airflow/configuration.py:599
  
/Users/jml/.pyenv/versions/3.7.4/envs/memflow/lib/python3.7/site-packages/airflow/configuration.py:599:
 DeprecationWarning: Specifying airflow_home in the config file is deprecated. 
As you have left it at the default value you should remove the setting from 
your airflow.cfg and suffer no change in behaviour.
category=DeprecationWarning,

/Users/jml/.pyenv/versions/3.7.4/envs/memflow/lib/python3.7/site-packages/airflow/config_templates/airflow_local_settings.py:65
  
/Users/jml/.pyenv/versions/3.7.4/envs/memflow/lib/python3.7/site-packages/airflow/config_templates/airflow_local_settings.py:65:
 DeprecationWarning: The elasticsearch_host option in [elasticsearch] has been 
renamed to host - the old setting has been used, but please update your config.
ELASTICSEARCH_HOST = conf.get('elasticsearch', 'HOST')

/Users/jml/.pyenv/versions/3.7.4/envs/memflow/lib/python3.7/site-packages/airflow/config_templates/airflow_local_settings.py:67
  
/Users/jml/.pyenv/versions/3.7.4/envs/memflow/lib/python3.7/site-packages/airflow/config_templates/airflow_local_settings.py:67:
 DeprecationWarning: The elasticsearch_log_id_template option in 
[elasticsearch] has been renamed to log_id_template - the old setting has been 
used, but please update your config.
ELASTICSEARCH_LOG_ID_TEMPLATE = conf.get('elasticsearch', 'LOG_ID_TEMPLATE')

/Users/jml/.pyenv/versions/3.7.4/envs/memflow/lib/python3.7/site-packages/airflow/config_templates/airflow_local_settings.py:69
  
/Users/jml/.pyenv/versions/3.7.4/envs/memflow/lib/python3.7/site-packages/airflow/config_templates/airflow_local_settings.py:69:
 DeprecationWarning: The elasticsearch_end_of_log_mark option in 
[elasticsearch] has been renamed to end_of_log_mark - the old setting has been 
used, but please update your config.
ELASTICSEARCH_END_OF_LOG_MARK = conf.get('elasticsearch', 'END_OF_LOG_MARK')

{code}

These errors occur even if you don't have a config file present. They are very 
distracting, and I can't figure out what action I should take to get rid of 
them.
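One workaround for a test suite is to filter the warnings by source module, e.g. from a `conftest.py` (a hedged sketch; note that airflow itself may override warning filters at import time, so the filter may need to be installed after importing airflow):

```python
# conftest.py -- a sketch for suppressing airflow's configuration
# DeprecationWarnings in a pytest suite. Assumes the filter survives
# airflow's own warning-filter manipulation at import time.
import warnings


def suppress_airflow_deprecations():
    # Ignore DeprecationWarnings raised from any module under "airflow".
    warnings.filterwarnings(
        "ignore",
        category=DeprecationWarning,
        module=r"airflow(\..*)?",
    )
```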




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (AIRFLOW-4509) SubDagOperator using scheduler instead of backfill

2019-08-07 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902410#comment-16902410
 ] 

ASF subversion and git services commented on AIRFLOW-4509:
--

Commit 0be39219cd058ba7d50cdf34b2cc46513f4f5ab3 in airflow's branch 
refs/heads/master from Chao-Han Tsai
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=0be3921 ]

[AIRFLOW-4509] SubDagOperator using scheduler instead of backfill (#5498)

Change SubDagOperator to use Airflow scheduler to schedule
tasks in subdags instead of backfill.

In the past, SubDagOperator relied on the backfill scheduler
to schedule tasks in subdags. Tasks in the parent DAG
were scheduled via the Airflow scheduler while tasks in
a subdag were scheduled via backfill, which complicated
the scheduling logic and made the two scheduling code
paths difficult to maintain.

This PR simplifies how tasks in subdags are scheduled.
SubDagOperator is responsible for creating a DagRun for the subdag
and waiting until all the tasks in the subdag finish. The Airflow
scheduler picks up the DagRun created by SubDagOperator and
creates and schedules the tasks accordingly.
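The flow this commit describes can be sketched as follows (a hedged outline, not the actual Airflow code; `create_dagrun` and `get_state` stand in for the real Airflow APIs):

```python
import time

TERMINAL_STATES = {"success", "failed"}


def run_subdag(create_dagrun, get_state, poke_interval=5):
    """Create a DagRun for the subdag, then wait for the scheduler.

    The operator no longer runs a backfill itself: it only creates the
    run and polls its state while the main scheduler schedules the tasks.
    """
    run_id = create_dagrun()              # the scheduler picks this run up
    while get_state(run_id) not in TERMINAL_STATES:
        time.sleep(poke_interval)         # just wait; no task scheduling here
    return get_state(run_id)
```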

> SubDagOperator using scheduler instead of backfill
> --
>
> Key: AIRFLOW-4509
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4509
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: scheduler
>Affects Versions: 1.10.3
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
> Fix For: 1.10.5
>
>
> Make SubDagOperator use Airflow scheduler instead of backfill to schedule 
> tasks.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (AIRFLOW-4509) SubDagOperator using scheduler instead of backfill

2019-08-07 Thread Jarek Potiuk (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jarek Potiuk resolved AIRFLOW-4509.
---
   Resolution: Fixed
Fix Version/s: (was: 1.10.5)
   2.0.0

> SubDagOperator using scheduler instead of backfill
> --
>
> Key: AIRFLOW-4509
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4509
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: scheduler
>Affects Versions: 1.10.3
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
> Fix For: 2.0.0
>
>
> Make SubDagOperator use Airflow scheduler instead of backfill to schedule 
> tasks.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (AIRFLOW-4509) SubDagOperator using scheduler instead of backfill

2019-08-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902409#comment-16902409
 ] 

ASF GitHub Bot commented on AIRFLOW-4509:
-

potiuk commented on pull request #5498: [AIRFLOW-4509] SubDagOperator using 
scheduler instead of backfill
URL: https://github.com/apache/airflow/pull/5498
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> SubDagOperator using scheduler instead of backfill
> --
>
> Key: AIRFLOW-4509
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4509
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: scheduler
>Affects Versions: 1.10.3
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
> Fix For: 1.10.5
>
>
> Make SubDagOperator use Airflow scheduler instead of backfill to schedule 
> tasks.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[GitHub] [airflow] potiuk merged pull request #5498: [AIRFLOW-4509] SubDagOperator using scheduler instead of backfill

2019-08-07 Thread GitBox
potiuk merged pull request #5498: [AIRFLOW-4509] SubDagOperator using scheduler 
instead of backfill
URL: https://github.com/apache/airflow/pull/5498
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] coufon commented on a change in pull request #5701: [AIRFLOW-5088][AIP-24] Add DAG serialization using JSON

2019-08-07 Thread GitBox
coufon commented on a change in pull request #5701: [AIRFLOW-5088][AIP-24] Add 
DAG serialization using JSON
URL: https://github.com/apache/airflow/pull/5701#discussion_r311701670
 
 

 ##
 File path: airflow/www/utils.py
 ##
 @@ -374,6 +376,30 @@ def get_chart_height(dag):
 return 600 + len(dag.tasks) * 10
 
 
+def get_python_source(x: Union[Callable, str]) -> str:
+"""
+Helper function to get Python source (or not), preventing exceptions
+"""
+if isinstance(x, str):
+return x
+source_code = None
+if isinstance(x, functools.partial):
+source_code = inspect.getsource(x.func)
+if source_code is None:
+try:
+source_code = inspect.getsource(x)
+except TypeError:
+pass
+if source_code is None:
+try:
+source_code = inspect.getsource(x.__call__)
 
 Review comment:
   This fn was copied from Airflow 1.10.4 (it was removed from master)
   https://github.com/apache/airflow/blob/1.10.4/airflow/www/utils.py#L407
   
   I ran a test to confirm this fallback is for a class that implements 
__call__:
   
   import inspect
   
   class A:
       def __call__(self, x):
           return x + 1
   
   a = A()
   print(inspect.getsource(a.__call__))  # it works
   print(inspect.getsource(a))           # it does not
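For reference, here is a self-contained sketch of the whole fallback chain from the diff above. The diff is truncated after the last `inspect.getsource(x.__call__)` call, so the final `except`/`return` are assumptions; the demo inputs (a string and the builtin `len`, which has no Python source) are illustrative.

```python
import functools
import inspect
from typing import Callable, Optional, Union


def get_python_source(x: Union[Callable, str]) -> Optional[str]:
    """Best-effort source lookup, mirroring the helper in the diff.

    The trailing except/return are cut off in the diff, so their exact
    form here is an assumption.
    """
    if isinstance(x, str):
        return x
    source_code = None
    if isinstance(x, functools.partial):
        source_code = inspect.getsource(x.func)
    if source_code is None:
        try:
            source_code = inspect.getsource(x)
        except TypeError:
            pass
    if source_code is None:
        try:
            # Last resort: a class instance may expose source via __call__.
            source_code = inspect.getsource(x.__call__)
        except TypeError:
            pass
    return source_code


# Strings pass through unchanged; builtins have no Python source, so every
# fallback raises TypeError and the helper ends up returning None.
print(get_python_source("print('hi')"))  # print('hi')
print(get_python_source(len))            # None
```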


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-4686) Make dags Pylint compatible

2019-08-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902393#comment-16902393
 ] 

ASF GitHub Bot commented on AIRFLOW-4686:
-

feluelle commented on pull request #5753: [AIRFLOW-4686] Make dags Pylint 
compatible
URL: https://github.com/apache/airflow/pull/5753
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-4686
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
   ### Code Quality
   
   - [x] Passes `flake8`
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Make dags Pylint compatible
> ---
>
> Key: AIRFLOW-4686
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4686
> Project: Apache Airflow
>  Issue Type: Sub-task
>  Components: ci
>Affects Versions: 2.0.0
>Reporter: Bas Harenslak
>Assignee: Felix Uellendall
>Priority: Major
>
> Fix all Pylint messages in dags. To start; running scripts/ci/ci_pylint.sh on 
> master should produce no messages. (1) Remove the files mentioned in your 
> issue from the blacklist. (2) Run scripts/ci/ci_pylint.sh to see all messages 
> on the no longer blacklisted files. (3) Fix all messages and create PR.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[GitHub] [airflow] feluelle opened a new pull request #5753: [AIRFLOW-4686] Make dags Pylint compatible

2019-08-07 Thread GitBox
feluelle opened a new pull request #5753: [AIRFLOW-4686] Make dags Pylint 
compatible
URL: https://github.com/apache/airflow/pull/5753
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-4686
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
   ### Code Quality
   
   - [x] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] potiuk commented on issue #5751: [AIRFLOW-5136] Fix Bug with Incorrect template_fields in DataProc{*} …

2019-08-07 Thread GitBox
potiuk commented on issue #5751: [AIRFLOW-5136] Fix Bug with Incorrect 
template_fields in DataProc{*} …
URL: https://github.com/apache/airflow/pull/5751#issuecomment-519211989
 
 
   Yeah. @OmerJog is right. We will add tests for that for all GCP operators. 
We are in the process of unifying all GCP operators so I added 
https://issues.apache.org/jira/browse/AIRFLOW-5137 to cover that.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (AIRFLOW-5137) Add templated fields tests for all GCP operators

2019-08-07 Thread Jarek Potiuk (JIRA)
Jarek Potiuk created AIRFLOW-5137:
-

 Summary: Add templated fields tests for all GCP operators
 Key: AIRFLOW-5137
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5137
 Project: Apache Airflow
  Issue Type: Improvement
  Components: gcp
Affects Versions: 1.10.4, 2.0.0
Reporter: Jarek Potiuk
Assignee: Jarek Potiuk


We should make sure that all gcp operators have tests for templated fields.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[GitHub] [airflow] coufon commented on a change in pull request #5743: [AIRFLOW-5088] persisting serialized DAG in DB for webserver scalability

2019-08-07 Thread GitBox
coufon commented on a change in pull request #5743: [AIRFLOW-5088] persisting 
serialized DAG in DB for webserver scalability
URL: https://github.com/apache/airflow/pull/5743#discussion_r311694023
 
 

 ##
 File path: airflow/models/dag.py
 ##
 @@ -52,6 +53,8 @@
 from airflow.utils.sqlalchemy import UtcDateTime, Interval
 from airflow.utils.state import State
 
+DAGCACHED_ENABLED = configuration.getboolean('core', 'dagcached', 
fallback=True)
 
 Review comment:
   Good point. I will change this


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-5045) Add ability to create Google Dataproc cluster with custom image from a different project

2019-08-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902371#comment-16902371
 ] 

ASF GitHub Bot commented on AIRFLOW-5045:
-

idralyuk commented on pull request #5752: [AIRFLOW-5045] Add ability to create 
Google Dataproc cluster with cus…
URL: https://github.com/apache/airflow/pull/5752
 
 
   …tom image from a different project
   
   ### Jira
   
   https://issues.apache.org/jira/browse/AIRFLOW-5045
   
   ### Description
   
   [AIRFLOW-2797](https://github.com/apache/airflow/pull/3871) added 
functionality to create Dataproc clusters with custom images. Unfortunately the 
images can only be defined in the current project. This PR adds ability to 
define a custom image project.
   
   For more info see: 
https://cloud.google.com/dataproc/docs/guides/dataproc-images
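   The core of the change can be sketched as a URI-resolution helper: pick 
the image's project from the new parameter when given, otherwise fall back 
to the cluster's own project. The helper name and URI layout below are 
illustrative, not the PR's actual code.
   
```python
def custom_image_uri(project_id, custom_image, custom_image_project_id=None):
    """Resolve a Dataproc custom image URI (illustrative sketch).

    custom_image_project_id, when given, lets the image live in a
    different project than the cluster itself.
    """
    image_project = custom_image_project_id or project_id
    return ('https://www.googleapis.com/compute/beta/projects/'
            '{}/global/images/{}'.format(image_project, custom_image))


# Same project (previous behaviour) vs. a shared image project (new option).
print(custom_image_uri('my-cluster-project', 'my-image'))
print(custom_image_uri('my-cluster-project', 'my-image',
                       custom_image_project_id='shared-images-project'))
```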
   
   ### Tests
   
   One test added:
   
   it checks whether the custom image project id was set correctly
   
   ### Commits
   
   - [X] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [X] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
   ### Code Quality
   
   - [X] Passes `flake8`
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add ability to create Google Dataproc cluster with custom image from a 
> different project
> 
>
> Key: AIRFLOW-5045
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5045
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib, gcp, operators
>Affects Versions: 1.10.3
>Reporter: Igor
>Assignee: Igor
>Priority: Minor
>
> Custom image support has been added for Dataproc by 
> https://issues.apache.org/jira/browse/AIRFLOW-2797.
> Unfortunately the code assumes that the images come from the same project.
> It would be useful to add a new parameter 'custom_image_project_id' to 
> specify the source project of the image.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[GitHub] [airflow] idralyuk opened a new pull request #5752: [AIRFLOW-5045] Add ability to create Google Dataproc cluster with cus…

2019-08-07 Thread GitBox
idralyuk opened a new pull request #5752: [AIRFLOW-5045] Add ability to create 
Google Dataproc cluster with cus…
URL: https://github.com/apache/airflow/pull/5752
 
 
   …tom image from a different project
   
   ### Jira
   
   https://issues.apache.org/jira/browse/AIRFLOW-5045
   
   ### Description
   
   [AIRFLOW-2797](https://github.com/apache/airflow/pull/3871) added 
functionality to create Dataproc clusters with custom images. Unfortunately the 
images can only be defined in the current project. This PR adds ability to 
define a custom image project.
   
   For more info see: 
https://cloud.google.com/dataproc/docs/guides/dataproc-images
   
   ### Tests
   
   One test added:
   
   it checks whether the custom image project id was set correctly
   
   ### Commits
   
   - [X] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [X] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
   ### Code Quality
   
   - [X] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] OmerJog commented on issue #5751: [AIRFLOW-5136] Fix Bug with Incorrect template_fields in DataProc{*} …

2019-08-07 Thread GitBox
OmerJog commented on issue #5751: [AIRFLOW-5136] Fix Bug with Incorrect 
template_fields in DataProc{*} …
URL: https://github.com/apache/airflow/pull/5751#issuecomment-519204660
 
 
   @kaxil I think we are missing a test like this:
   
https://github.com/apache/airflow/blob/master/tests/contrib/operators/test_gcp_compute_operator.py#L71-L93
   to confirm that templated fields are working.
   Such a test would have prevented the bad cherry-pick, wouldn't it?
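   The pattern such a test checks can be shown without Airflow: declare 
`template_fields` on an operator, render them against a context, and assert 
the values changed. This toy version uses `string.Template` in place of 
Airflow's Jinja rendering; all names are illustrative.
   
```python
import string


class ToyOperator:
    # Only the fields listed here get rendered, which is exactly what a
    # templated-fields test pins down.
    template_fields = ('cluster_name', 'region')

    def __init__(self, cluster_name, region, num_workers):
        self.cluster_name = cluster_name
        self.region = region
        self.num_workers = num_workers


def render_template_fields(op, context):
    """Mimic the render pass Airflow applies before execution."""
    for field in op.template_fields:
        value = getattr(op, field)
        if isinstance(value, str):
            setattr(op, field, string.Template(value).substitute(context))


op = ToyOperator(cluster_name='cluster-$ds', region='$region', num_workers=2)
render_template_fields(op, {'ds': '2019-08-07', 'region': 'us-central1'})

print(op.cluster_name)  # cluster-2019-08-07
print(op.region)        # us-central1
print(op.num_workers)   # 2 (not a template field, untouched)
```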
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] OmerJog edited a comment on issue #5751: [AIRFLOW-5136] Fix Bug with Incorrect template_fields in DataProc{*} …

2019-08-07 Thread GitBox
OmerJog edited a comment on issue #5751: [AIRFLOW-5136] Fix Bug with Incorrect 
template_fields in DataProc{*} …
URL: https://github.com/apache/airflow/pull/5751#issuecomment-519204660
 
 
   @kaxil I think we are missing a test like this:
   
https://github.com/apache/airflow/blob/master/tests/contrib/operators/test_gcp_compute_operator.py#L71-L93
   to confirm that templated fields are working and actual templating is being 
done
   Such a test would have prevented the bad cherry-pick, wouldn't it?
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Assigned] (AIRFLOW-4686) Make dags Pylint compatible

2019-08-07 Thread Felix Uellendall (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Uellendall reassigned AIRFLOW-4686:
-

Assignee: Felix Uellendall

> Make dags Pylint compatible
> ---
>
> Key: AIRFLOW-4686
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4686
> Project: Apache Airflow
>  Issue Type: Sub-task
>  Components: ci
>Affects Versions: 2.0.0
>Reporter: Bas Harenslak
>Assignee: Felix Uellendall
>Priority: Major
>
> Fix all Pylint messages in dags. To start; running scripts/ci/ci_pylint.sh on 
> master should produce no messages. (1) Remove the files mentioned in your 
> issue from the blacklist. (2) Run scripts/ci/ci_pylint.sh to see all messages 
> on the no longer blacklisted files. (3) Fix all messages and create PR.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (AIRFLOW-5045) Add ability to create Google Dataproc cluster with custom image from a different project

2019-08-07 Thread Igor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor updated AIRFLOW-5045:
--
Description: 
Custom image support has been added for Dataproc by 
https://issues.apache.org/jira/browse/AIRFLOW-2797.

Unfortunately the code assumes that the images come from the same project.

It would be useful to add a new parameter 'custom_image_project_id' to specify 
the source project of the image.

  was:
Thank you for adding support for custom images in Dataproc 
(https://issues.apache.org/jira/browse/AIRFLOW-2797).

Unfortunately the code assumes that the images come from the same project.

Would it be possible to add a new parameter 'image_project' to specify the 
source project of the image?


> Add ability to create Google Dataproc cluster with custom image from a 
> different project
> 
>
> Key: AIRFLOW-5045
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5045
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib, gcp, operators
>Affects Versions: 1.10.3
>Reporter: Igor
>Assignee: Igor
>Priority: Minor
>
> Custom image support has been added for Dataproc by 
> https://issues.apache.org/jira/browse/AIRFLOW-2797.
> Unfortunately the code assumes that the images come from the same project.
> It would be useful to add a new parameter 'custom_image_project_id' to 
> specify the source project of the image.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work started] (AIRFLOW-5045) Add ability to create Google Dataproc cluster with custom image from a different project

2019-08-07 Thread Igor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-5045 started by Igor.
-
> Add ability to create Google Dataproc cluster with custom image from a 
> different project
> 
>
> Key: AIRFLOW-5045
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5045
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib, gcp, operators
>Affects Versions: 1.10.3
>Reporter: Igor
>Assignee: Igor
>Priority: Minor
>
> Thank you for adding support for custom images in Dataproc 
> (https://issues.apache.org/jira/browse/AIRFLOW-2797).
> Unfortunately the code assumes that the images come from the same project.
> Would it be possible to add a new parameter 'image_project' to specify the 
> source project of the image?



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (AIRFLOW-5045) Add ability to create Google Dataproc cluster with custom image from a different project

2019-08-07 Thread Igor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor reassigned AIRFLOW-5045:
-

Assignee: Igor  (was: Jarosław Śmietanka)

> Add ability to create Google Dataproc cluster with custom image from a 
> different project
> 
>
> Key: AIRFLOW-5045
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5045
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib, gcp, operators
>Affects Versions: 1.10.3
>Reporter: Igor
>Assignee: Igor
>Priority: Minor
>
> Thank you for adding support for custom images in Dataproc 
> (https://issues.apache.org/jira/browse/AIRFLOW-2797).
> Unfortunately the code assumes that the images come from the same project.
> Would it be possible to add a new parameter 'image_project' to specify the 
> source project of the image?



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[GitHub] [airflow] OmerJog commented on a change in pull request #5647: [AIRFLOW-3628] Add smtp_mime_from

2019-08-07 Thread GitBox
OmerJog commented on a change in pull request #5647: [AIRFLOW-3628] Add 
smtp_mime_from
URL: https://github.com/apache/airflow/pull/5647#discussion_r311681398
 
 

 ##
 File path: airflow/config_templates/default_airflow.cfg
 ##
 @@ -349,6 +349,7 @@ smtp_ssl = False
 smtp_port = 25
 smtp_mail_from = airf...@example.com
 
+smtp_mime_from = airflow
 
 Review comment:
   @mik-laj  i agree. Applied a fix
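   To illustrate what the option buys: `smtp_mail_from` is the sender 
address, while the proposed `smtp_mime_from` supplies a separate display 
name for the MIME `From:` header. The address and name below are 
illustrative, not taken from a real config.
   
```python
from email.mime.text import MIMEText
from email.utils import formataddr

# Values a config like the diff above might provide (illustrative).
smtp_mime_from = 'airflow'
smtp_mail_from = 'airflow@example.com'

msg = MIMEText('job done')
# formataddr combines display name and address into one header value.
msg['From'] = formataddr((smtp_mime_from, smtp_mail_from))
print(msg['From'])  # airflow <airflow@example.com>
```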


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] kaxil commented on a change in pull request #5743: [AIRFLOW-5088] persisting serialized DAG in DB for webserver scalability

2019-08-07 Thread GitBox
kaxil commented on a change in pull request #5743: [AIRFLOW-5088] persisting 
serialized DAG in DB for webserver scalability
URL: https://github.com/apache/airflow/pull/5743#discussion_r311660129
 
 

 ##
 File path: airflow/models/dag.py
 ##
 @@ -52,6 +53,8 @@
 from airflow.utils.sqlalchemy import UtcDateTime, Interval
 from airflow.utils.state import State
 
+DAGCACHED_ENABLED = configuration.getboolean('core', 'dagcached', 
fallback=True)
 
 Review comment:
   WDYT?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] kaxil commented on issue #5743: [AIRFLOW-5088] persisting serialized DAG in DB for webserver scalability

2019-08-07 Thread GitBox
kaxil commented on issue #5743: [AIRFLOW-5088] persisting serialized DAG in DB 
for webserver scalability
URL: https://github.com/apache/airflow/pull/5743#issuecomment-519182354
 
 
   This PR is a follow up of https://github.com/apache/airflow/pull/5701


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] blcksrx commented on issue #5724: [AIRFLOW-5113] Support icon url in slack web hook

2019-08-07 Thread GitBox
blcksrx commented on issue #5724: [AIRFLOW-5113] Support icon url in slack web 
hook
URL: https://github.com/apache/airflow/pull/5724#issuecomment-519178761
 
 
   @potiuk Hi! Following up on this PR, could you add the @alopeyk company to 
the list of companies that use Airflow?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] kaxil commented on a change in pull request #5743: [AIRFLOW-5088] persisting serialized DAG in DB for webserver scalability

2019-08-07 Thread GitBox
kaxil commented on a change in pull request #5743: [AIRFLOW-5088] persisting 
serialized DAG in DB for webserver scalability
URL: https://github.com/apache/airflow/pull/5743#discussion_r311652074
 
 

 ##
 File path: airflow/models/dag.py
 ##
 @@ -52,6 +53,8 @@
 from airflow.utils.sqlalchemy import UtcDateTime, Interval
 from airflow.utils.state import State
 
+DAGCACHED_ENABLED = configuration.getboolean('core', 'dagcached', 
fallback=True)
 
 Review comment:
   We should probably define this setting in one place (maybe in 
`settings.py`) and import it from there.
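The suggestion above could look like the following sketch: evaluate the flag once in `settings.py` and import it elsewhere, instead of calling `configuration.getboolean()` in every module. The `_config` dict and `getboolean` helper below are stand-ins for `airflow.configuration`, not Airflow's actual code.

```python
# Hedged sketch of the centralized-setting suggestion; _config and
# getboolean stand in for airflow.configuration.

_config = {"core": {"dagcached": "True"}}  # would come from airflow.cfg

def getboolean(section, key, fallback=False):
    """Minimal stand-in for configuration.getboolean()."""
    value = _config.get(section, {}).get(key)
    if value is None:
        return fallback
    return value.strip().lower() in ("true", "t", "1", "yes")

# Evaluated once at import time; other modules then do
#   from settings import DAGCACHED_ENABLED
DAGCACHED_ENABLED = getboolean("core", "dagcached", fallback=True)
```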




[GitHub] [airflow] feluelle commented on issue #5335: [AIRFLOW-4588] Add GoogleDiscoveryApiHook and GoogleApiToS3Transfer

2019-08-07 Thread GitBox
feluelle commented on issue #5335: [AIRFLOW-4588] Add GoogleDiscoveryApiHook 
and GoogleApiToS3Transfer
URL: https://github.com/apache/airflow/pull/5335#issuecomment-519173006
 
 
   @potiuk Agree, a note about it in `integration.rst` would be nice.
   ```
   When accessing Google's services you can choose between two integrated 
options.
   
   The first and recommended option is to use the dedicated hooks and 
operators that each integrate a single, specific service, but do so very 
reliably. You can identify them by the **gcp** prefix in the file name.
   
   The second option is to use the **GoogleDiscoveryApiHook** to request data 
from any Google service registered in the 
[Google API Explorer](https://developers.google.com/apis-explorer/#p/) - a 
deprecated but more generic approach that avoids writing custom service logic.
   
   For more information on which libraries these two options use, see below:
   ```
   plus "side note" (what @mik-laj just wrote)
   > Google provides two types of API libraries:
   > 
   > * The Google APIs Python Client is a client library that is compatible 
with **all Google APIs**.
   > 
   > * The google-cloud-* libraries are built specifically for **the Google 
Cloud Platform** and provide the recommended way to integrate Google Cloud 
APIs. They offer a much better experience for programmers.
   > 
   > 
   > It is worth noting that the Discovery library supports all Google 
services, including those less popular among our clients. The google-cloud-* 
libraries support **only** Google Cloud Platform services. I do not know of 
other libraries that integrate with the Google API. Both libraries have 
identical authorization mechanisms, but they differ in implementation, e.g. 
the Google APIs Python Client uses HTTPS only, while the google-cloud-* 
libraries mostly use protobuf and only rarely HTTPS.




[jira] [Updated] (AIRFLOW-4931) Add KMS Encryption Configuration to BigQuery Hook and Operators

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-4931:

Fix Version/s: (was: 1.10.4)
   2.0.0

> Add KMS Encryption Configuration to BigQuery Hook and Operators
> ---
>
> Key: AIRFLOW-4931
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4931
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: gcp
>Affects Versions: 1.10.3
>Reporter: Ryan Yuan
>Assignee: Ryan Yuan
>Priority: Critical
> Fix For: 2.0.0
>
>
> One of the clients requires adding KMS encryption on BigQuery tables. 
> Reference:
> [https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#ExternalDataConfiguration]



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (AIRFLOW-4547) Negative priority_weight should not be permitted

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-4547:

Fix Version/s: (was: 1.10.4)
   2.0.0

> Negative priority_weight should not be permitted
> 
>
> Key: AIRFLOW-4547
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4547
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: 1.10.3
>Reporter: Teresa Martyny
>Priority: Major
> Fix For: 2.0.0
>
>
> Airflow allows a dev to assign a negative priority_weight to a task. However, 
> Airflow computes the effective priority_weight on its own in 
> models.py#priority_weight_total (line 2796), so the final priority_weight 
> ends up wrong. Airflow should raise an error if an operator is assigned a 
> negative priority_weight at any point. 
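A minimal sketch of the validation being requested: reject a negative `priority_weight` at construction time. `SimpleOperator` below is a hypothetical stand-in, not Airflow's `BaseOperator`.

```python
# Hedged sketch of validating priority_weight at construction time;
# SimpleOperator is illustrative, not Airflow's BaseOperator.

class SimpleOperator:
    def __init__(self, task_id, priority_weight=1):
        if priority_weight < 0:
            raise ValueError(
                "priority_weight must be non-negative, got %d" % priority_weight
            )
        self.task_id = task_id
        self.priority_weight = priority_weight

    def priority_weight_total(self, downstream_weights):
        # Mimics the downstream summing Airflow performs -- the math that
        # makes negative weights produce surprising totals.
        return self.priority_weight + sum(downstream_weights)
```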





[jira] [Updated] (AIRFLOW-4309) Delete dag doesn't delete everything

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-4309:

Fix Version/s: (was: 1.10.4)
   2.0.0

> Delete dag doesn't delete everything
> 
>
> Key: AIRFLOW-4309
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4309
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui
>Affects Versions: 1.10.3
>Reporter: lovk korm
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: file1.PNG
>
>
> I uploaded a broken DAG. A message regarding it appeared in the UI, as it 
> should.
> Then I deleted the DAG using the UI button.
> The DAG is gone from the UI, but a warning about the broken DAG keeps 
> appearing. It seems like the delete doesn't clean up everything; I can't get 
> rid of this message.
>  
> edit:
> Also note the counting bug (marked in yellow in the attached image).





[jira] [Updated] (AIRFLOW-4484) CeleryExecutor#sync times out when fetching celery task state

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-4484:

Fix Version/s: (was: 1.10.4)
   2.0.0

> CeleryExecutor#sync times out when fetching celery task state
> 
>
> Key: AIRFLOW-4484
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4484
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: celery
>Affects Versions: 1.10.3
>Reporter: Teresa Martyny
>Priority: Major
> Fix For: 2.0.0
>
>
> We're seeing quite a few process timeouts raised during our core pipeline 
> run. We traced them back to CeleryExecutor#sync calling 
> #fetch_celery_task_state and timing out here:
>  
> airflow/utils/timeout.py:
> {code:python}
> def handle_timeout(self, signum, frame):
>  self.log.error("Process timed out, PID: %s", str(os.getpid()))
>  raise AirflowTaskTimeout(self.error_message)
> {code}
> called from here: airflow/executors/celery_executor.py#fetch_celery_task_state
> {code:python}
> with timeout(seconds=2):
>  # Accessing state property of celery task will make actual network request
>  # to get the current state of the task.
>  res = (celery_task[0], celery_task[1].state) 
> {code}
> along with an AirflowTaskTimeout raising here: 
> airflow/executors/celery_executor.py#heartbeat
> {code:python}
> if isinstance(result, ExceptionWithTraceback): 
> self.log.error(
> CELERY_SEND_ERR_MSG_HEADER + ":{}\n{}\n".format(
> result.exception, result.traceback))
> {code}
>  
> This code was introduced in this new feature:
> https://issues.apache.org/jira/browse/AIRFLOW-2761
> Are there configuration settings that we need to set as a result of this new 
> code to avoid these timeouts?
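For context, the timeout utility quoted above raises on a signal. The sketch below shows the general `SIGALRM`-based mechanism that `airflow/utils/timeout.py` implements; the class and exception names here are illustrative, not Airflow's exact code.

```python
import os
import signal

# Hedged sketch of a SIGALRM-based timeout context manager, in the style
# of airflow/utils/timeout.py. Unix-only, since it relies on signal.alarm.

class TaskTimeout(Exception):
    pass

class timeout:
    def __init__(self, seconds=1, error_message="Timeout"):
        self.seconds = seconds
        self.error_message = error_message

    def handle_timeout(self, signum, frame):
        # Mirrors the handler in the stack trace above: log/raise on alarm.
        raise TaskTimeout("%s, PID: %s" % (self.error_message, os.getpid()))

    def __enter__(self):
        signal.signal(signal.SIGALRM, self.handle_timeout)
        signal.alarm(self.seconds)

    def __exit__(self, exc_type, exc_value, tb):
        signal.alarm(0)  # cancel any pending alarm on exit
```

Any network request inside `with timeout(seconds=2):` that takes longer than two seconds raises `TaskTimeout`, which matches the behaviour the reporter traced.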





[jira] [Updated] (AIRFLOW-4315) Improve Airflow's Experimental API for all kinds of monitoring requirements

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-4315:

Fix Version/s: (was: 1.10.4)
   2.0.0

> Improve Airflow's Experimental API for all kinds of monitoring requirements
> --
>
> Key: AIRFLOW-4315
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4315
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 1.10.3
>Reporter: Ravi Agarwal
>Assignee: Ravi Agarwal
>Priority: Major
> Fix For: 2.0.0
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> *Goal*
>  
> We want to contribute to Airflow’s experimental API by extending the set of 
> endpoints to enable monitoring of DAGs. To achieve this, we would like 
> Airflow to have the following capabilities
> A) Get the list of all DAGs in airflow
> B) Get the details for a particular DAG identified by a dag_id
> C) Get list of tasks for a dag with their sequence details for a particular 
> DAG identified by a dag_id
>  D) Get details for a task identified by dag_id and task_id.
> Historic Results for DAG Run:
>  E) Get details of all dag_runs for a particular DAG identified by a dag_id
> F) Get details of a specific dag_run for a particular DAG identified by 
> dag_id and execution_date
> G) Get details of all dag_runs filtered by various parameters like state, 
> execution time, execution interval etc.
> Historic Results for Task Instance:
>  -H) Get list of task_instances for a dag_run of a particular DAG identified 
> by a dag_id and execution_date-
>  I) Get details of a task_instance for a dag_run of a DAG identified by 
> dag_id, execution_date and task_id
> J) Get details of all task_instances filtered by various parameters like 
> state, execution interval etc.
> Logs Monitoring:
>  K) Get logs pertaining to a particular task_instance identified by dag_id, 
> execution_date and task_id
> h1. What already exists in Airflow's experimental API and changes which would 
> improve the usability:
>  
> E1.
>     GET /api/experimental/dags//dag_runs
>     GET /api/experimental/dags//dag_runs?state=
>        Returns a list of Dag Runs for a specific DAG ID.
>  
> This satisfies requirement ‘D’, but it would be good to add more filters to 
> this endpoint, such as filtering dag_runs that ran within a given time 
> interval, before or after a given time, or with states NOT equal to a given 
> state.
>  
>     Proposal -
>    GET /api/experimental/dags//dag_runs?state_not_equal=    
>    GET /api/experimental/dags//dag_runs?execution_before=   
>    GET /api/experimental/dags//dag_runs?execution_after=   
> E2.
>     GET /api/experimental/dags//dag_runs/
> Returns a JSON with a dag_run’s public instance variables. The format for the 
>  is expected to be “-mm-DDTHH:MM:SS”, for example: 
> “2016-11-16T11:34:15”.
>  
> This endpoint is a good candidate to satisfy requirement ‘E’, but it returns 
> nothing except the state of the identified dag_run. It should return far 
> more details about the dag_run than just its state.
>  
>     Proposal :
>  
> Modify the response object of this endpoint to return same details as 
> [/dags//dag_runs] returns for each object in it's list
>  
> E3.
>     GET /api/experimental/dags//tasks/   
> Returns info for a task.
>  
> This endpoint satisfies requirement ‘K’ and can be improved by adding 
> details on which tasks are upstream and downstream of the identified one; 
> information about the operator type would also be useful in the response.
>  
>     Proposal :
>  
>    Add operator type, list of upstream tasks & list of downstream tasks 
> to the response object to increase usability.
>  
> E4.
>     GET 
> /api/experimental/dags//dag_runs//tasks/ 
>  
> Returns a JSON with a task instance’s public instance variables. The format 
> for the  is expected to be “-mm-DDTHH:MM:SS”, for 
> example: “2016-11-16T11:34:15”.
>  
> This endpoint satisfies requirement ‘G’ and can be improved by adding 
> details on which tasks were executed upstream and downstream. From a 
> monitoring perspective, it would also be useful to return the number of 
> attempts already made if the task is running, or the total number of 
> attempts if it failed or succeeded.
>  
>     Proposal :
>  
> Add list of upstream task_instances, list of downstream task_instances, and 
> number of attempts to the response object to increase usability.
>  
> E5.
>     GET /api/experimental/latest_runs
> Returns the latest DagRun for each DAG formatted for the UI.
>  
>     This endpoint satisfies the requirement ‘F’ partially.
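The filter parameters proposed in E1 could be composed into request URLs along these lines. `dag_runs_url` and the base URL below are hypothetical helpers for illustration, not part of Airflow's experimental API.

```python
from urllib.parse import urlencode

# Hedged sketch: compose the proposed filter parameters
# (state_not_equal, execution_before, execution_after) into URLs.
# BASE and dag_runs_url are illustrative, not Airflow code.

BASE = "http://localhost:8080/api/experimental"

def dag_runs_url(dag_id, **filters):
    """Build a dag_runs URL, appending any filter query parameters."""
    url = "%s/dags/%s/dag_runs" % (BASE, dag_id)
    if filters:
        # Sort for a deterministic query string.
        url += "?" + urlencode(sorted(filters.items()))
    return url

url = dag_runs_url("my_dag", state_not_equal="success")
# -> .../dags/my_dag/dag_runs?state_not_equal=success
```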
> h1. Proposed New Endpoints 

[jira] [Updated] (AIRFLOW-4424) Scheduler does not terminate after num_runs when executor is KubernetesExecutor

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-4424:

Fix Version/s: (was: 1.10.4)
   2.0.0

> Scheduler does not terminate after num_runs when executor is 
> KubernetesExecutor
> ---
>
> Key: AIRFLOW-4424
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4424
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: executors, scheduler
>Affects Versions: 1.10.3
> Environment: EKS, deployed with stable airflow helm chart
>Reporter: Brian Nutt
>Priority: Blocker
>  Labels: kubernetes
> Fix For: 2.0.0
>
>
> When using an executor like the CeleryExecutor with num_runs set on the 
> scheduler, the scheduler pod restarts after num_runs have completed. After 
> switching to the KubernetesExecutor, the scheduler logs:
> [2019-04-26 19:20:43,562] \{{kubernetes_executor.py:770}} INFO - Shutting 
> down Kubernetes executor
> However, the scheduler process does not complete, so the scheduler pod never 
> restarts and runs num_runs again. This forced a rollback to the 
> CeleryExecutor, because with num_runs set to -1 the scheduler builds up tons 
> of defunct processes, eventually preventing tasks from being scheduled as 
> the underlying nodes run out of file descriptors.
>  





[jira] [Updated] (AIRFLOW-3954) Ensure airflow.api.auth.backend.default works with connexion

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-3954:

Fix Version/s: (was: 1.10.4)
   2.0.0

> Ensure airflow.api.auth.backend.default works with connexion
> 
>
> Key: AIRFLOW-3954
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3954
> Project: Apache Airflow
>  Issue Type: Sub-task
>  Components: api, authentication
>Affects Versions: 1.10.2
>Reporter: Drew Sonne
>Priority: Major
>  Labels: aip-13, openapi
> Fix For: 2.0.0
>
>
> Make sure all the existing test cases for the default auth method pass, and 
> perform some smoke tests to ensure the authentication works in practice.





[jira] [Updated] (AIRFLOW-3953) Refactor API route handlers to use Connexion

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-3953:

Fix Version/s: (was: 1.10.4)
   2.0.0

> Refactor API route handlers to use Connexion
> 
>
> Key: AIRFLOW-3953
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3953
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 1.10.2
>Reporter: Drew Sonne
>Priority: Major
>  Labels: aip-13, openapi
> Fix For: 2.0.0
>
>
> All existing routes under {{/api}} (e.g. in 
> {{airflow/www/api/experimental/endpoints}}) should be refactored to be 
> handled by the [connexion|https://connexion.readthedocs.io/en/latest/] 
> library.
> Where possible, this should be done while affecting the existing codebase as 
> little as possible. There is some bridging code already written to handle 
> this: 
> https://github.com/apache/airflow/pull/4640/files#diff-859f51803a6d3ee3746285e5abe016bbR90





[jira] [Updated] (AIRFLOW-4724) Make params dict to be templated for operators

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-4724:

Fix Version/s: (was: 1.10.4)
   2.0.0

> Make params dict to be templated for operators
> --
>
> Key: AIRFLOW-4724
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4724
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core, operators
>Affects Versions: 1.10.3
>Reporter: jack
>Priority: Major
> Fix For: 2.0.0
>
>
> Currently using params dict as:
> {code:java}
> EXEC_TIMESEPOCH =  "{{ execution_date.strftime('%s') }}"
> gcs_export_uri_template_filename = 'product_dwh-' + EXEC_TIMESEPOCH + '.csv'
> upload_file_ftp_op = BashOperator(
>    task_id='upload_file_ftp_task',
>    params={'filename':gcs_export_uri_template_filename},
>    bash_command="python3.6 /home/ubuntu/airflow/scripts/ranker.py  '{{ 
> params.filename }}'  " ,
>   dag=dag)
> {code}
> Gives:
> {code:java}
> python3.6 /home/ubuntu/airflow/scripts/ranker.py  'product_dwh-{{ 
> execution_date.strftime('%s') }}.csv'{code}
>  
> The BaseOperator says:
> {code:java}
> self.params = params or {} # Available in templates!{code}
> [https://github.com/apache/airflow/blob/master/airflow/models/baseoperator.py#L343]
> But as you can see above the code wasn't templated as expected.
>  
> I worked-around this by not using params dict as:
> {code:java}
> cmd = """python3.6 /home/ubuntu/airflow/scripts/ranker.py 'product_dwh-{{ 
> execution_date.strftime('%s') }}.csv' """
> upload_file_ftp_op = BashOperator(
>     task_id='upload_file_ftp_task',
>     bash_command = cmd,
>     dag=dag){code}
> This code works perfectly, and while it's simpler and better, the first 
> snippet should still have worked.
>  
> A discussion about this has been on slack:
> [https://apache-airflow.slack.com/archives/CCQ7EGB1P/p1559134166151100]
>  
> Since Slack doesn't save history forever, [~feluelle], [~dlamblin], if you 
> have something to comment, please post it here so there is a reference for 
> whoever picks this one up.
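A plausible explanation, sketched below: Airflow renders only the attributes listed in an operator's `template_fields` (for `BashOperator` that covers `bash_command`, not the values stored inside `params`), so a template string stored as a param value is substituted verbatim. The sketch uses `str.format` as a dependency-free stand-in for Jinja, and `FakeOperator` is hypothetical, not Airflow's code.

```python
# Hedged sketch of template_fields rendering; FakeOperator is illustrative
# and str.format stands in for Jinja.

class FakeOperator:
    template_fields = ("bash_command",)

    def __init__(self, bash_command, params=None):
        self.bash_command = bash_command
        self.params = params or {}

    def render(self, context):
        # Render only the declared template fields, as Airflow does;
        # param values are passed through untouched.
        for field in self.template_fields:
            raw = getattr(self, field)
            setattr(self, field, raw.format(params=self.params, **context))

op = FakeOperator(
    # Jinja would write {{ params.filename }}; format syntax keeps the
    # sketch dependency-free.
    bash_command="python ranker.py '{params[filename]}'",
    # The param value itself contains a template marker...
    params={"filename": "product_dwh-{ds}.csv"},
)
op.render({"ds": "2019-08-07"})
# ...and that marker survives unrendered, matching the reported behaviour:
# op.bash_command == "python ranker.py 'product_dwh-{ds}.csv'"
```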





[jira] [Updated] (AIRFLOW-4734) Upsert functionality for PostgresHook.insert_rows()

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-4734:

Fix Version/s: (was: 1.10.4)
   2.0.0

> Upsert functionality for PostgresHook.insert_rows()
> ---
>
> Key: AIRFLOW-4734
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4734
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Affects Versions: 1.10.3
>Reporter: William Tran
>Assignee: William Tran
>Priority: Minor
>  Labels: features
> Fix For: 2.0.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> PostgresHook's parent class, DbApiHook, implements upsert in its 
> insert_rows() method with the replace=True flag. However, the underlying 
> generated SQL is specific to MySQL's "REPLACE INTO" syntax and is not 
> applicable to Postgres.
> I'd like to override this method in PostgresHook to implement the "INSERT ... 
> ON CONFLICT DO UPDATE" syntax (new since Postgres 9.5)
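The proposed override might generate SQL along these lines. `build_upsert_sql` and its parameters are illustrative, not the actual `PostgresHook` API.

```python
# Hedged sketch of generating Postgres upsert SQL
# (INSERT ... ON CONFLICT DO UPDATE, available since Postgres 9.5).

def build_upsert_sql(table, columns, conflict_columns):
    """Generate an upsert statement with %s placeholders for execute()."""
    cols = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    conflict = ", ".join(conflict_columns)
    # Non-key columns take the incoming value via the EXCLUDED pseudo-table.
    updates = ", ".join(
        "%s = EXCLUDED.%s" % (c, c) for c in columns if c not in conflict_columns
    )
    return (
        "INSERT INTO %s (%s) VALUES (%s) ON CONFLICT (%s) DO UPDATE SET %s"
        % (table, cols, placeholders, conflict, updates)
    )

sql = build_upsert_sql("users", ["id", "email"], ["id"])
# INSERT INTO users (id, email) VALUES (%s, %s)
#   ON CONFLICT (id) DO UPDATE SET email = EXCLUDED.email
```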





[jira] [Updated] (AIRFLOW-4462) MSSQL backend broken

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-4462:

Fix Version/s: (was: 1.10.4)
   2.0.0

> MSSQL backend broken
> 
>
> Key: AIRFLOW-4462
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4462
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: database
>Affects Versions: 1.10.1, 1.10.3
>Reporter: Bijay Deo
>Assignee: Bijay Deo
>Priority: Major
>  Labels: patch, pull-request-available
> Fix For: 2.0.0
>
>
> Airflow DAG trigger doesn't work on MSSQL Azure or MSSQL 2017 (tested); 
> other versions of MSSQL likely have the same issue. Basically, Airflow can't 
> be used on MSSQL without fixing this issue: just click a number of manual 
> triggers and it will fail.
> A DAG trigger works only when the execution_date is missing the milliseconds 
> part; when the millisecond part is non-zero up to the last digit, it fails 
> to trigger.
> The problem is that the execution_date input from pyodbc carries microsecond 
> precision where it is compared for equality on the task_instance table. 
> Since execution_date is stored with millisecond precision in the DB (SQL 
> Server also rounds this value), the equality fails on certain values, e.g. 
> 2019-04-28 00:34:26.517, 2019-04-20T18:51:35.033
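The precision mismatch can be sketched like this; `round_to_ms` below is an illustrative normalization, not Airflow's actual fix.

```python
from datetime import datetime

# Hedged sketch: the DB keeps execution_date at millisecond (or coarser)
# precision, while pyodbc sends back the full microsecond value, so the
# equality filter on task_instance misses.

def round_to_ms(dt):
    """Truncate a datetime to millisecond precision before comparing."""
    return dt.replace(microsecond=(dt.microsecond // 1000) * 1000)

sent = datetime(2019, 4, 23, 21, 49, 17, 42277)  # microsecond precision
stored = round_to_ms(sent)                       # what the DB retained
assert sent != stored                # the raw equality comparison fails
assert round_to_ms(sent) == stored   # normalizing both sides restores it
```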
> Stack trace below:
> Connected to pydev debugger (build 191.6605.12)
> [2019-04-23 21:49:13,361] \{settings.py:182} INFO - settings.configure_orm(): 
> Using pool settings. pool_size=5, pool_recycle=1800, pid=78750
> [2019-04-23 21:49:13,523] \{__init__.py:51} INFO - Using executor 
> SequentialExecutor
> 2019-04-23 21:49:17,318 INFO sqlalchemy.engine.base.Engine SELECT 
> CAST(SERVERPROPERTY('ProductVersion') AS VARCHAR)
> [2019-04-23 21:49:17,318] \{log.py:110} INFO - SELECT 
> CAST(SERVERPROPERTY('ProductVersion') AS VARCHAR)
> 2019-04-23 21:49:17,319 INFO sqlalchemy.engine.base.Engine ()
> [2019-04-23 21:49:17,319] \{log.py:110} INFO - ()
> 2019-04-23 21:49:17,324 INFO sqlalchemy.engine.base.Engine SELECT 
> schema_name()
> [2019-04-23 21:49:17,324] \{log.py:110} INFO - SELECT schema_name()
> 2019-04-23 21:49:17,324 INFO sqlalchemy.engine.base.Engine ()
> [2019-04-23 21:49:17,324] \{log.py:110} INFO - ()
> 2019-04-23 21:49:17,329 INFO sqlalchemy.engine.base.Engine SELECT CAST('test 
> plain returns' AS VARCHAR(60)) AS anon_1
> [2019-04-23 21:49:17,329] \{log.py:110} INFO - SELECT CAST('test plain 
> returns' AS VARCHAR(60)) AS anon_1
> 2019-04-23 21:49:17,329 INFO sqlalchemy.engine.base.Engine ()
> [2019-04-23 21:49:17,329] \{log.py:110} INFO - ()
> 2019-04-23 21:49:17,332 INFO sqlalchemy.engine.base.Engine SELECT CAST('test 
> unicode returns' AS NVARCHAR(60)) AS anon_1
> [2019-04-23 21:49:17,332] \{log.py:110} INFO - SELECT CAST('test unicode 
> returns' AS NVARCHAR(60)) AS anon_1
> 2019-04-23 21:49:17,333 INFO sqlalchemy.engine.base.Engine ()
> [2019-04-23 21:49:17,333] \{log.py:110} INFO - ()
> 2019-04-23 21:49:17,337 INFO sqlalchemy.engine.base.Engine SELECT 1
> [2019-04-23 21:49:17,337] \{log.py:110} INFO - SELECT 1
> 2019-04-23 21:49:17,337 INFO sqlalchemy.engine.base.Engine ()
> [2019-04-23 21:49:17,337] \{log.py:110} INFO - ()
> 2019-04-23 21:49:17,340 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
> [2019-04-23 21:49:17,340] \{log.py:110} INFO - BEGIN (implicit)
> 2019-04-23 21:49:17,344 INFO sqlalchemy.engine.base.Engine INSERT INTO log 
> (dttm, dag_id, task_id, event, execution_date, owner, extra) OUTPUT 
> inserted.id VALUES (?, ?, ?, ?, ?, ?, ?)
> [2019-04-23 21:49:17,344] \{log.py:110} INFO - INSERT INTO log (dttm, dag_id, 
> task_id, event, execution_date, owner, extra) OUTPUT inserted.id VALUES (?, 
> ?, ?, ?, ?, ?, ?)
> 2019-04-23 21:49:17,344 INFO sqlalchemy.engine.base.Engine 
> (datetime.datetime(2019, 4, 24, 4, 49, 17, 42277, tzinfo=), 
> 'tutorial', 'print_date', 'cli_run',  [2019-04-20T18:51:35.033000+00:00]>, 'admin', '\{"host_name": 
> "Bijays-MBP.hsd1.ca.comcast.net", "full_command": 
> "[\'/Users/admin/Documents/Development/Java/airflow/airflow/bin/airflow\', 
> \'run\', \ ... (18 characters truncated) ... 
> dmin/Documents/Development/Java/airflow/airflow/example_dags/tutorial.py\', 
> \'--local\', \'tutorial\', \'print_date\', \'2019-04-20 18:51:35.033\']"}')
> [2019-04-23 21:49:17,344] \{log.py:110} INFO - (datetime.datetime(2019, 4, 
> 24, 4, 49, 17, 42277, tzinfo=), 'tutorial', 'print_date', 
> 'cli_run', , 'admin', 
> '\{"host_name": "Bijays-MBP.hsd1.ca.comcast.net", "full_command": 
> "[\'/Users/admin/Documents/Development/Java/airflow/airflow/bin/airflow\', 
> \'run\', \ ... (18 characters truncated) ... 
> dmin/Documents/Development/Java/airflow/airflow/example_dags/tutorial.py\', 
> \'--local\', \'tutorial\', \'print_date\', \'2019-04-20 18:51:35.033\']"}')
> 2019-04-23 21:49:17,351 INFO 

[jira] [Updated] (AIRFLOW-4064) Edit DAG in FAB UI

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-4064:

Fix Version/s: (was: 1.10.4)
   2.0.0

> Edit DAG in FAB UI
> --
>
> Key: AIRFLOW-4064
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4064
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: api, DAG, ui
>Affects Versions: 1.10.2
>Reporter: Ruslan Fialkovsky
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: 2.png
>
>
> The FAB UI doesn't have an "edit" button.
> I have the "Admin" role with all permissions.
> !2.png!





[jira] [Updated] (AIRFLOW-4113) Unpin boto library

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-4113:

Fix Version/s: (was: 1.10.4)

> Unpin boto library
> --
>
> Key: AIRFLOW-4113
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4113
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws
>Affects Versions: 1.10.2
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
> Fix For: 2.0.0
>
>
> Right now we're pinning boto to <1.8 because of an underlying issue in moto. 
> That issue has been solved, so it would be nice to update to a more recent 
> version of boto.





[jira] [Updated] (AIRFLOW-3942) Allow customisable FAB views and menulinks

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-3942:

Fix Version/s: (was: 1.10.4)
   2.0.0

> Allow customisable FAB views and menulinks
> --
>
> Key: AIRFLOW-3942
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3942
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: plugins
>Affects Versions: 1.10.2
>Reporter: Drew Sonne
>Assignee: Drew Sonne
>Priority: Major
> Fix For: 2.0.0
>
>
> Currently, only a subset of the options for flask_appbuilder_views and 
> flask_appbuilder_menu_links is passed down to FAB. This change exposes all 
> the arguments of {{appbuilder.add_link}} and {{appbuilder.add_view}} to the 
> plugin author.





[jira] [Updated] (AIRFLOW-4355) Externally triggered DAG is marked as 'success' even if a task has been 'removed'!

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-4355:

Fix Version/s: (was: 1.10.4)
   2.0.0

> Externally triggered DAG is marked as 'success' even if a task has been 
> 'removed'!
> --
>
> Key: AIRFLOW-4355
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4355
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG, DagRun, scheduler
>Affects Versions: 1.10.3
>Reporter: t oo
>Priority: Blocker
>  Labels: dynamic
> Fix For: 2.0.0
>
> Attachments: dag_success_even_if_task_removed.png, treeview.png
>
>
> note: all my dags are purely externally triggered
> *Issue:* The DAG has 5 parallel tasks that ran successfully and 1 final task 
> that somehow got the 'removed' state (prior dag runs had the 'failed' state) 
> and never ran successfully, yet the DAG still shows success!
>  
> *Command ran* (note that previous commands like airflow trigger_dag -e 
> 20190412 qsr_coremytbl were run before and failed for valid reason (ie python 
> task failing) ):
> airflow trigger_dag -e 20190412T08:00 qsr_coremytbl --conf '\{"hourstr":"08"}'
>  
> *some logs on prior instance of airflow (ec2 was autohealed):*
> [2019-04-18 08:29:40,678] \{logging_mixin.py:95} INFO - [2019-04-18 
> 08:29:40,678] {__init__.py:4897} WARNING - Failed to get task ' qsr_coremytbl.REPAIR_HIVE_schemeh.mytbl 2019-04-12 08:00:00+00:00 [None]>' 
> for dag ''. Marking it as removed.
>  [2019-04-18 08:29:43,582] \{logging_mixin.py:95} INFO - [2019-04-18 
> 08:29:43,582] {__init__.py:4906} INFO - Restoring task ' qsr_coremytbl.REPAIR_HIVE_schemeh.mytbl 2019-04-12 08:00:00+00:00 [removed]>' 
> which was previously removed from DAG ''
>  [2019-04-18 08:29:43,618] \{jobs.py:1787} INFO - Creating / updating 
>  08:00:00+00:00 [scheduled]> in ORM
>  [2019-04-18 08:29:43,676] \{logging_mixin.py:95} INFO - [2019-04-18 
> 08:29:43,676] {__init__.py:4897} WARNING - Failed to get task ' qsr_coremytbl.REPAIR_HIVE_schemeh.mytbl 2019-04-12 08:00:00+00:00 
> [scheduled]>' for dag ''. Marking it as removed.
>  
> *some logs on newer ec2:*
> [myuser@host logs]$ grep -i hive -R * | sed 's#[0-9]#x#g' | sort | uniq -c | 
> grep -v 'airflow-webserver-access.log'
>  2 audit/airflow-audit.log:-xx-xx xx:xx:xx.xx  qsr_coremytbl 
> REPAIR_HIVE_schemeh.mytbl log -xx-xx xx:xx:xx.xx rsawyerx 
> [('execution_date', u'-xx-xxTxx:xx:xx+xx:xx'), ('task_id', 
> u'REPAIR_HIVE_schemeh.mytbl'), ('dag_id', u'qsr_coremytbl')]
>  1 audit/airflow-audit.log:-xx-xx xx:xx:xx.xx  qsr_coremytbl 
> REPAIR_HIVE_schemeh.mytbl log -xx-xx xx:xx:xx.xx rsawyerx 
> [('execution_date', u'-xx-xxTxx:xx:xx+xx:xx'), ('task_id', 
> u'REPAIR_HIVE_schemeh.mytbl'), ('dag_id', u'qsr_coremytbl'), ('format', 
> u'json')]
>  1 audit/airflow-audit.log:-xx-xx xx:xx:xx.xx  qsr_coremytbl 
> REPAIR_HIVE_schemeh.mytbl rendered -xx-xx xx:xx:xx.xx rsawyerx 
> [('execution_date', u'-xx-xxTxx:xx:xx+xx:xx'), ('task_id', 
> u'REPAIR_HIVE_schemeh.mytbl'), ('dag_id', u'qsr_coremytbl')]
>  1 audit/airflow-audit.log:-xx-xx xx:xx:xx.xx  qsr_coremytbl 
> REPAIR_HIVE_schemeh.mytbl task -xx-xx xx:xx:xx.xx rsawyerx 
> [('execution_date', u'-xx-xxTxx:xx:xx+xx:xx'), ('task_id', 
> u'REPAIR_HIVE_schemeh.mytbl'), ('dag_id', u'qsr_coremytbl')]
>  1 scheduler/latest/qsr_dag_generation.py.log:[-xx-xx xx:xx:xx,xxx] 
> \{jobs.py:} INFO - Creating / updating  qsr_coremytbl.REPAIR_HIVE_schemeh.mytbl -xx-xx xx:xx:xx+xx:xx 
> [scheduled]> in ORM
>  71 scheduler/latest/qsr_dag_generation.py.log:[-xx-xx xx:xx:xx,xxx] 
> \{logging_mixin.py:xx} INFO - [-xx-xx xx:xx:xx,xxx] {__init__.py:} 
> INFO - Restoring task ' -xx-xx xx:xx:xx+xx:xx [removed]>' which was previously removed from DAG 
> ''
>  1 scheduler/-xx-xx/qsr_dag_generation.py.log:[-xx-xx xx:xx:xx,xxx] 
> \{jobs.py:} INFO - Creating / updating  qsr_coremytbl.REPAIR_HIVE_schemeh.mytbl -xx-xx xx:xx:xx+xx:xx 
> [scheduled]> in ORM
>  71 scheduler/-xx-xx/qsr_dag_generation.py.log:[-xx-xx xx:xx:xx,xxx] 
> \{logging_mixin.py:xx} INFO - [-xx-xx xx:xx:xx,xxx] {__init__.py:} 
> INFO - Restoring task ' -xx-xx xx:xx:xx+xx:xx [removed]>' which was previously removed from DAG 
> ''
>  
> mysql> *select * from task_instance where task_id like '%REP%';#*
>  
> 

[jira] [Updated] (AIRFLOW-4316) Keys/Variable names in the 'kubernetes_environment_variables' section are transformed to lower case

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-4316:

Fix Version/s: (was: 1.10.4)
   2.0.0

> Keys/Variable names in the 'kubernetes_environment_variables' section are 
> transformed to lower case
> ---
>
> Key: AIRFLOW-4316
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4316
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: configuration, contrib
>Affects Versions: 1.10.3
>Reporter: Vlad Precup
>Assignee: QP Hou
>Priority: Major
>  Labels: configuration, kubernetes
> Fix For: 2.0.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> All the environment variable names declared under the 
> kubernetes_environment_variables section are converted to lowercase in the 
> newly generated pods when running Airflow with the KubernetesExecutor in 
> version 1.10.3.
> The behavior is not aligned with the comments in the documentation / code 
> comments in the default config file, airflow.cfg.
> Investigation helper:
> Since we are treating all the sections uniformly in the configuration 
> component, all the keys are lowered (i.e. `configuration.py`, line 342 ->`key 
> = env_var.replace(section_prefix, '').lower()`)
> Please also write (a) unit test(s) for this special case.
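The lowering described above can be sketched in isolation. This is a hedged illustration, not Airflow's actual configuration code: the helper name `parse_env_vars` and the `preserve_case` flag are hypothetical, and the env-var prefix is only an example of the `AIRFLOW__SECTION__KEY` convention.

```python
def parse_env_vars(environ,
                   section_prefix='AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__',
                   preserve_case=True):
    """Extract variable names from config env keys.

    With preserve_case=False this mirrors the lowering behaviour the
    report describes (key = env_var.replace(section_prefix, '').lower());
    with preserve_case=True the original casing survives, which is what
    Kubernetes env-var names need.
    """
    result = {}
    for key, value in environ.items():
        if not key.startswith(section_prefix):
            continue  # not part of the kubernetes_environment_variables section
        name = key[len(section_prefix):]
        result[name if preserve_case else name.lower()] = value
    return result
```

A unit test for this special case would assert that a mixed-case name such as `JAVA_HOME` survives parsing unchanged.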



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (AIRFLOW-4176) [security] webui shows password - admin/log/?flt1_extra_contains=conn_password

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-4176:

Fix Version/s: (was: 1.10.4)
   2.0.0

> [security] webui shows password - admin/log/?flt1_extra_contains=conn_password
> --
>
> Key: AIRFLOW-4176
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4176
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security, ui
>Affects Versions: 1.10.2
>Reporter: t oo
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: airf.png
>
>
> First, set up the hive_cli connection:

> source /home/ec2-user/venv/bin/activate; airflow connections -a --conn_id 
> query_hive --conn_type hive_cli --conn_host domainhere --conn_port 1 
> --conn_schema default --conn_extra "\{\"use_beeline\":\"true\", 
> \"ssl-options\":\"ssl=true;sslTrustStore=path-${RUNTIME_ENV}.jks;trustStorePassword=${QUERY_JKS_PASW}\"}"
>  --conn_login ${QUERY_HIVE_USER} --conn_password ${QUERY_HIVE_PASW}
>  
> On the webui navigate to domain/admin/log/?flt1_extra_contains=conn_password
> and you will be able to see cleartext user and password!
> see attachment



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (AIRFLOW-4415) skip status stops propagation randomly.

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-4415:

Fix Version/s: (was: 1.10.4)
   2.0.0

> skip status stops propagation randomly.
> ---
>
> Key: AIRFLOW-4415
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4415
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 1.8.1
>Reporter: Feng Mao
>Priority: Major
> Fix For: 2.0.0
>
>
> Issue: skip status randomly stops propagating to downstream tasks, with the 
> dag run status marked as failed.
> The issue is present in version 1.8.1.
> Version 1.8.0 contained a temporary fix that was removed after that version:
> https://github.com/apache/airflow/commit/4077c6de297566a4c598065867a9a27324ae6eb1
> https://github.com/apache/airflow/commit/92965e8275c6f2ec2282ad46c09950bab10c1cb2
>  
> Root cause:
>   In each loop, the scheduler evaluates every dag and all of its task 
> dependencies, round by round.
>   Each round's evaluation happens twice: once with flag_upstream_failed = 
> false and once with true.
>  
>   The dag run update method marks the dag run as deadlocked, which stops the 
> dag and all of its tasks from being processed further.
>   https://github.com/apache/airflow/blob/1.8.1/airflow/models.py#L4184
>   This happens in no_dependencies_met: the all_success trigger rule misses 
> the skipped status check and marks the task as failed when its upstream has 
> only skipped tasks.
>   https://github.com/apache/airflow/blob/1.8.1/airflow/models.py#L4152
>   
> https://github.com/apache/airflow/blob/1.8.1/airflow/ti_deps/deps/trigger_rule_dep.py#L165
>  
>   Each dag update checks all of its task deps and sends ready tasks to run 
> in the context of flag_upstream_failed=false (the default),
>   https://github.com/apache/airflow/blob/1.8.1/airflow/models.py#L4156
>   which does not handle skip status propagation.
>  
>   After the dag update, the dag checks all of its task deps and sends ready 
> tasks to run in the context of flag_upstream_failed=true,
>   https://github.com/apache/airflow/blob/1.8.1/airflow/jobs.py#L904
>   which does handle skip status propagation.
>   
> https://github.com/apache/airflow/blob/1.8.1/airflow/ti_deps/deps/trigger_rule_dep.py#L138
>  
>   Two points explain why the dag update can detect a deadlock.
>   Skip status propagation relies on detecting skipped upstreams (which 
> happens asynchronously, by other nodes writing to the db).
>   If the tasks being evaluated do not follow topological order (the order is 
> effectively random, driven by priority_weight), it takes multiple loop 
> rounds to propagate the skip status to all downstream tasks.
>   Depending on how close to topological order the tasks are fetched, the 
> propagation may reach further or less far.
>  
>   The deadlock detection can be avoided only when the following conditions 
> happen at the same time:
>   1. The skipping task (the short-circuit operator's async process) updates 
> the db with the skipped task status right after the dag update 
> (flag_upstream_failed=false) and before the dag task checks 
> (flag_upstream_failed=true) in the scheduler process.
>   2. The dag task checks (flag_upstream_failed=true) fetch/evaluate all 
> tasks in topological order, so the skip status can propagate in one 
> evaluation round.
>  
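The missing skipped-status check in the all_success trigger rule can be sketched in a few lines. This is a hedged, standalone model of the rule's decision, not Airflow's trigger_rule_dep code; the function name and state strings are illustrative.

```python
def all_success_met(upstream_states):
    """Model of an all_success trigger-rule evaluation that accounts for
    skipped upstreams. Treating skipped as failure (the 1.8.1 behaviour
    described above) deadlocks a branch whose upstreams were skipped;
    returning 'skipped' instead lets the status propagate downstream."""
    if any(s in ('failed', 'upstream_failed') for s in upstream_states):
        return 'upstream_failed'
    if all(s == 'success' for s in upstream_states):
        return 'runnable'
    if all(s in ('success', 'skipped') for s in upstream_states):
        return 'skipped'  # propagate the skip instead of deadlocking
    return 'waiting'      # some upstreams still running/queued
```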



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (AIRFLOW-4099) Run button inside DAG page

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-4099:

Fix Version/s: (was: 1.10.4)
   2.0.0

> Run button inside DAG page
> --
>
> Key: AIRFLOW-4099
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4099
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ui
>Affects Versions: 1.10.2
>Reporter: Gonçalo Costa
>Assignee: Gonçalo Costa
>Priority: Trivial
> Fix For: 2.0.0
>
>
> We feel that it would be more functional to have a "Run" button inside the 
> DAG pages themselves instead of going back to the main page to run them. 
> This would also be useful for running a single SubDAG without the need to 
> look for it inside a parent DAG.
> Being able to run a DAG from its own page is faster and more intuitive.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (AIRFLOW-5115) S3KeySensor template_fields for bucket_name & bucket_key do not support Jinja variables

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-5115:

Fix Version/s: (was: 1.10.4)
   2.0.0

> S3KeySensor template_fields for bucket_name & bucket_key do not support Jinja 
> variables
> ---
>
> Key: AIRFLOW-5115
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5115
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: aws
>Affects Versions: 1.9.0
>Reporter: Dmitriy Synkov
>Assignee: Dmitriy Synkov
>Priority: Minor
>  Labels: easyfix, patch
> Fix For: 2.0.0
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> In all Airflow operators (which inherit from {{BaseOperator}}) there is a 
> {{template_fields}} attribute defining ["which fields will get 
> jinjafied"|https://github.com/apache/airflow/blob/master/airflow/models/baseoperator.py#L218-L219].
>  For the {{S3KeySensor}} op specifically, these are {{template_fields = 
> ('bucket_key', 'bucket_name')}}.
> The {{bucket_key}} kwarg, however, has some input validation: the 
> {{bucket_key}} needs to begin with the S3 protocol {{s3://}}. This exception 
> is thrown by the 
> [constructor|https://github.com/apache/airflow/blob/master/airflow/sensors/s3_key_sensor.py#L71-L74],
>  which makes it impossible to use Jinja strings as an arg to {{bucket_key}}, 
> since these don't get rendered in the scope of the DAG {{*.py}} file itself. 
> Below is an example; I'm using Airflow 1.9.0 with Python 3.5.3:
> Given the below DAG code, where "my_s3_key" is 
> {{s3://bucket/prefix/object.txt:}}
> {code:java}
> dag = DAG('sample_dag', start_date=datetime(2019, 8, 1, 12, 15))
> s3_variable_sensor = S3KeySensor(
> task_id='s3_variable_sensor',
> bucket_key=Variable.get('my_s3_key'),
> dag=dag
> )
> s3_jinja_sensor = S3KeySensor(
> task_id='s3_jinja_sensor',
> bucket_key="{{ var.value.my_s3_key }}",
> dag=dag
> )
> {code}
> Executing the first task will run just fine while the next task will throw 
> the following exception:
> {code:java}
> airflow.exceptions.AirflowException: Please provide a bucket_name.
> {code}
> This ticket is to propose a code change that will move input validation out 
> of the constructor to allow for Jinja-templated strings to be passed into 
> both {{bucket_name}} and {{bucket_key}}.
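The proposed change can be sketched as follows. This is a hypothetical stand-in class, not the real S3KeySensor: validation of the `s3://` prefix is deferred from `__init__` to `poke()`, so a Jinja string survives construction and is only checked after template rendering.

```python
class S3KeySensorSketch:
    """Sketch of a sensor whose bucket_key validation runs at poke time,
    after templates like "{{ var.value.my_s3_key }}" have been rendered,
    rather than in the constructor."""
    template_fields = ('bucket_key', 'bucket_name')

    def __init__(self, bucket_key, bucket_name=None):
        # No validation here: bucket_key may still be an unrendered template.
        self.bucket_key = bucket_key
        self.bucket_name = bucket_name

    def poke(self):
        # By poke time templates have been rendered; validate and split now.
        if self.bucket_name is None:
            if not self.bucket_key.startswith('s3://'):
                raise ValueError('Please provide a bucket_name.')
            _, _, rest = self.bucket_key.partition('s3://')
            self.bucket_name, _, self.bucket_key = rest.partition('/')
        return True  # the real sensor would call S3Hook.check_for_key here
```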



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (AIRFLOW-4003) Extract all example scripts to separate files in doc

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-4003:

Fix Version/s: (was: 1.10.4)
   2.0.0

> Extract all example scripts to separate files in doc
> 
>
> Key: AIRFLOW-4003
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4003
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 1.10.2
>Reporter: Chen Tong
>Assignee: Chen Tong
>Priority: Major
> Fix For: 2.0.0
>
>
> By extracting the example Python scripts, we could run unit tests on them 
> and verify their correctness easily.
>  
> {quote} 
> Files can be stored in the {{airflow/example_dags}} directory or 
> {{airflow/contrib/example_dags}} directory. Files from this directory can be 
> automatically tested to confirm their correctness.
> Example:
> {code}
> .. literalinclude:: ../../airflow/example_dags/example_python_operator.py 
> :language: python 
> :start-after: [START howto_operator_python_kwargs] 
> :end-before: [END howto_operator_python_kwargs] 
> {code}
> Source: 
> [https://raw.githubusercontent.com/apache/airflow/master/docs/howto/operator.rst]
> Other scripts are stored in py files.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (AIRFLOW-3886) Add bulk insert feature to db hooks.

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-3886:

Fix Version/s: (was: 1.10.4)
   2.0.0

> Add bulk insert feature to db hooks.
> 
>
> Key: AIRFLOW-3886
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3886
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Reporter: Srinivasa Garlapati
>Assignee: Srinivasa Garlapati
>Priority: Minor
> Fix For: 2.0.0
>
>
> Airflow's dbapi_hook doesn't support inserting multiple rows at a time; 
> inserting one row at a time is tedious.
> Right now Airflow only supports single inserts, as below:
> *INSERT INTO tbl_name (a,b,c) VALUES (1,2,3)*
> It would be much faster if multiple inserts were joined together:
> *{{INSERT INTO tbl_name (a,b,c) VALUES (1,2,3), (4,5,6), (7,8,9);}}*
>  
>  
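A bulk insert of this shape can be sketched with a single parameterised statement plus `executemany`, shown here against sqlite3 for a self-contained demo. The helper name `insert_rows` is hypothetical, not the proposed hook API.

```python
import sqlite3

def insert_rows(conn, table, columns, rows):
    """Sketch of a bulk-insert helper: one parameterised
    INSERT ... VALUES statement executed with executemany,
    instead of one round-trip per row."""
    placeholders = ', '.join('?' for _ in columns)
    sql = 'INSERT INTO {} ({}) VALUES ({})'.format(
        table, ', '.join(columns), placeholders)
    with conn:  # commit on success
        conn.executemany(sql, rows)

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE tbl_name (a INTEGER, b INTEGER, c INTEGER)')
insert_rows(conn, 'tbl_name', ('a', 'b', 'c'),
            [(1, 2, 3), (4, 5, 6), (7, 8, 9)])
```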



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (AIRFLOW-4426) Allow to limit the size of airflow-scheduler.out

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-4426:

Fix Version/s: (was: 1.10.4)
   2.0.0

> Allow to limit the size of airflow-scheduler.out 
> -
>
> Key: AIRFLOW-4426
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4426
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core
>Reporter: jack
>Priority: Major
> Fix For: 2.0.0
>
>
> My airflow-scheduler.out has grown to more than 5 GB in just 3 days.
> Needless to say, this is not ideal.
> Some of us run Airflow on local machines and can't have files growing 
> without limit; storage space is limited.
> There should be a way to limit the size of the file (circular writing). 
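Python's standard library already supports this kind of circular writing. A hedged sketch, caching the file at ~1 MB with 3 rotated backups; the path is illustrative only, not where Airflow actually writes airflow-scheduler.out.

```python
import logging
import logging.handlers
import os
import tempfile

# Cap the file at roughly 1 MB and keep 3 rotated backups; older
# content is discarded, so total disk use stays bounded.
log_path = os.path.join(tempfile.mkdtemp(), 'airflow-scheduler.out')
handler = logging.handlers.RotatingFileHandler(
    log_path, maxBytes=1_000_000, backupCount=3)
logger = logging.getLogger('scheduler-sketch')
logger.addHandler(handler)
logger.warning('scheduler heartbeat')
```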



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (AIRFLOW-5082) add subject in aws sns hook

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-5082:

Fix Version/s: (was: 1.10.4)
   2.0.0

> add subject in aws sns hook
> ---
>
> Key: AIRFLOW-5082
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5082
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws
>Affects Versions: 1.10.4
>Reporter: MOHAMMD SHAKEEL SHAIK
>Priority: Major
> Fix For: 2.0.0
>
>
> When sending an SNS notification to AWS, the subject is an optional field. 
> If we don't send a Subject, AWS adds the default SNS email subject, "*AWS 
> Notification Message*". If anyone wants a different subject, they should be 
> able to pass a Subject parameter to the AWS SNS hook. 
>  
> It would remain optional.
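The optional-subject behaviour can be sketched as follows. This is a hedged illustration of how a hook could build its publish call; the helper name is hypothetical, though the `TopicArn`/`Message`/`Subject` keys mirror boto3's `SNS.Client.publish` parameters.

```python
def build_publish_kwargs(topic_arn, message, subject=None):
    """Only include Subject when the caller supplies one, so AWS falls
    back to its default "AWS Notification Message" otherwise."""
    kwargs = {'TopicArn': topic_arn, 'Message': message}
    if subject is not None:
        kwargs['Subject'] = subject
    return kwargs
```

The hook would then call `client.publish(**build_publish_kwargs(...))`.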



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (AIRFLOW-3804) MySqlToGoogleCloudStorageOperator success when it should fail

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-3804:

Fix Version/s: (was: 1.10.4)
   2.0.0

> MySqlToGoogleCloudStorageOperator success when it should fail
> -
>
> Key: AIRFLOW-3804
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3804
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: gcp, operators
>Reporter: jack
>Priority: Major
> Fix For: 2.0.0
>
>
> Testing the following query on MySqlToGoogleCloudStorageOperator.
>  
> {code:java}
> SELECT * FROM table where modifiedTS>2000-01-01 00:00:00 and modifiedTS<= 
> 2019-02-04 13:55:21{code}
>  
> The operator runs smoothly and reports success, so Airflow continues to 
> execute the operator's downstream tasks.
> However, this query is invalid.
> Running it on MySQL gives:
>  
> {code:java}
> Error Code: 1064. You have an error in your SQL syntax; check the manual that 
> corresponds to your MySQL server version for the right syntax to use near 
> '00:00:00  and modifiedTS<= 2019-02-04 13:55:21' at line 11{code}
>  
> The operator should have *FAILED* when running this query, since it has a 
> syntax error.
> There is probably a problem with how this operator treats the result of this 
> query, confusing it with the valid result of no rows returned. 
>  
> Not sure if it's related, but I'm running the query from a SQL file using 
> the filename option of the operator.
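Independently of the operator bug, the syntax error itself comes from the unquoted datetime literals. A hedged sketch, using sqlite3 for a self-contained demo: passing the datetimes as bind parameters avoids the quoting problem entirely, and a genuine syntax error then raises instead of being swallowed.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (modifiedTS TEXT)')
conn.execute("INSERT INTO t VALUES ('2019-01-15 10:00:00')")

# Datetimes as bind parameters: no embedded literals, no 1064-style error.
rows = conn.execute(
    'SELECT * FROM t WHERE modifiedTS > ? AND modifiedTS <= ?',
    ('2000-01-01 00:00:00', '2019-02-04 13:55:21')).fetchall()
```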



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (AIRFLOW-4074) Cannot put labels on Cloud Dataproc jobs

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-4074:

Fix Version/s: (was: 1.10.4)
   2.0.0

> Cannot put labels on Cloud Dataproc jobs
> 
>
> Key: AIRFLOW-4074
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4074
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: operators
>Affects Versions: 1.10.2
>Reporter: Andri Renardi Lauw
>Priority: Major
>  Labels: dataproc, gcp
> Fix For: 2.0.0
>
>
> Hi,
> After looking at 
> [https://github.com/apache/airflow/blob/master/airflow/contrib/operators/dataproc_operator.py]
> and PySparkDataprocOperator in particular, I realize that there is no way to 
> put labels on the invoked Dataproc jobs, unlike when submitting through the 
> GCP console or command-line job submission.
> Is it possible to add this functionality?
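The requested functionality amounts to attaching a `labels` map to the job request body. A hedged sketch, not the real operator API: the helper name is hypothetical, though the `pysparkJob`/`labels` keys follow the shape of the Dataproc REST `jobs.submit` body.

```python
def build_dataproc_job(main_python_file, labels=None):
    """Attach user-supplied labels to a PySpark job request body,
    mirroring what console / gcloud submission already allow."""
    job = {'pysparkJob': {'mainPythonFileUri': main_python_file}}
    if labels:
        job['labels'] = dict(labels)  # copy so the caller's dict isn't shared
    return {'job': job}
```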



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (AIRFLOW-3955) Ensure airflow.api.auth.backend.deny_all works with connexion

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-3955:

Fix Version/s: (was: 1.10.4)
   2.0.0

> Ensure airflow.api.auth.backend.deny_all works with connexion
> -
>
> Key: AIRFLOW-3955
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3955
> Project: Apache Airflow
>  Issue Type: Sub-task
>  Components: api, authentication
>Affects Versions: 1.10.2
>Reporter: Drew Sonne
>Priority: Major
>  Labels: aip-13, openapi
> Fix For: 2.0.0
>
>
> Make sure all the existing test cases for the deny_all auth method pass, and 
> perform some smoke tests to ensure the authentication works in practice.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (AIRFLOW-3957) Ensure CLI works with connexion API

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-3957:

Fix Version/s: (was: 1.10.4)
   2.0.0

> Ensure CLI works with connexion API
> ---
>
> Key: AIRFLOW-3957
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3957
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: api, cli
>Affects Versions: 1.10.2
>Reporter: Drew Sonne
>Priority: Major
>  Labels: aip-13, openapi
> Fix For: 2.0.0
>
>
> Ensure all the existing CLI commands which access airflow through the API 
> function correctly.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (AIRFLOW-4815) Add runAsGroup to securityContext of Kubernetes Executor

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-4815:

Fix Version/s: (was: 1.10.4)
   2.0.0

> Add runAsGroup to securityContext of Kubernetes Executor
> 
>
> Key: AIRFLOW-4815
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4815
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: executors
>Affects Versions: 1.10.3
>Reporter: jack
>Priority: Major
> Fix For: 2.0.0
>
>
> https://issues.apache.org/jira/browse/AIRFLOW-3274 added {{runAsUser}} and 
> {{fsGroup}}, but not {{runAsGroup}}.
>  
> Info and example:
> [https://kubernetes.io/docs/tasks/configure-pod-container/security-context/]
>  
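The extension can be sketched as a small builder for the pod's securityContext dict. A hedged illustration (the function name and keyword arguments are hypothetical), though the `runAsUser`/`fsGroup`/`runAsGroup` field names follow the Kubernetes PodSecurityContext spec.

```python
def build_security_context(run_as_user=None, fs_group=None, run_as_group=None):
    """Extend the executor's securityContext with runAsGroup alongside
    the existing runAsUser / fsGroup keys; omit unset fields entirely."""
    ctx = {}
    if run_as_user is not None:
        ctx['runAsUser'] = run_as_user
    if fs_group is not None:
        ctx['fsGroup'] = fs_group
    if run_as_group is not None:
        ctx['runAsGroup'] = run_as_group
    return ctx
```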



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (AIRFLOW-4485) All tasks stop running when using reschedule mode due to some tasks having negative a try_number

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-4485:

Fix Version/s: (was: 1.10.4)
   2.0.0

> All tasks stop running when using reschedule mode due to some tasks having 
> negative a try_number
> 
>
> Key: AIRFLOW-4485
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4485
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.10.3
>Reporter: Teresa Martyny
>Priority: Major
> Fix For: 2.0.0
>
>
> When we use reschedule mode for our sensors, about an hour into our core 
> pipeline running, the following happens:
> 1. Negative try_number: We begin to see on the Scheduler `Executor reports 
> execution of [task info here] exited with status success for try_number -1`; 
>  this then continues to decrement until it reaches try_number -4. With 
> each run, -4 is the point at which the following steps proceed to 
> play out:
> 2. We see a spike(and then stop) in this error message on the Scheduler: 
> `ERROR - Executor reports task instance {} finished ({}) although the task 
> says its {}. Was the task killed externally?` coming from 
> `airflow/jobs.py#_process_executor_events`
> 3. Sometimes followed by a few instances of the error on a single Worker: 
> `Celery command failed` coming from 
> `airflow/executors/celery_executor.py#execute_command`
> 4. Followed on the Worker by one instance of the error: `ZeroDivisionError` 
> 5. Followed by a spike in `ZeroDivisionError` on the Scheduler originating 
> from `airflow/models/__init__.py#next_retry_datetime` line 1183
> 6. The pipeline then grinds to a halt. Tasks sit in a scheduled state in the 
> scheduler, celery won't touch them. If try_numbers go negative, but never 
> make it to negative 4, it doesn't grind to a halt. 
>  
> We identified that the reschedule mode decrements the try_number in 
> `airflow/models/__init__.py#_handle_reschedule` 
> We did not identify why it never re-increments the `try_number` again to 
> ostensibly do what the code is attempting: use the same `try_number` and 
> write to the same log file.
> When we switched the sensors to use poke instead all of the above problems 
> stopped. 
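The intent the report describes, reusing the same try_number across reschedules so pokes share a log file, without letting it drift negative, can be sketched like this. A hedged model only: `task_instance` is a plain dict stand-in, not Airflow's TaskInstance, and `handle_reschedule` is a hypothetical name.

```python
def handle_reschedule(task_instance):
    """Mark a task up-for-reschedule while clamping try_number at 1, so
    repeated reschedules keep logging to the same try's file instead of
    producing the -1 .. -4 drift described above."""
    task_instance['state'] = 'up_for_reschedule'
    # Clamp rather than decrement: the next poke reuses this try.
    task_instance['try_number'] = max(task_instance['try_number'], 1)
    return task_instance
```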



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (AIRFLOW-4526) KubernetesPodOperator gets stuck in Running state when get_logs is set to True and there is a long gap without logs from pod

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-4526:

Fix Version/s: (was: 1.10.4)
   2.0.0

> KubernetesPodOperator gets stuck in Running state when get_logs is set to 
> True and there is a long gap without logs from pod
> 
>
> Key: AIRFLOW-4526
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4526
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
> Environment: Azure Kubernetes Service cluster with Airflow based on 
> puckel/docker-airflow
>Reporter: Christian Lellmann
>Priority: Major
>  Labels: kubernetes
> Fix For: 2.0.0
>
>
> When the `get_logs` parameter of the KubernetesPodOperator is set to True, 
> the operator task gets stuck in the Running state if the pod run by the task 
> (in_cluster mode) writes some logs, stops writing for a longer time (a few 
> minutes), and then continues. The continued logging isn't fetched anymore 
> and the pod state isn't checked anymore, so the completion of the pod isn't 
> recognized and the task never finishes.
>  
> Assumption:
> In the `monitor_pod` method of the pod launcher 
> ([https://github.com/apache/airflow/blob/master/airflow/kubernetes/pod_launcher.py#L97])
>  the `read_namespaced_pod_log` method of the kubernetes client gets stuck in 
> the `follow=True` stream 
> ([https://github.com/apache/airflow/blob/master/airflow/kubernetes/pod_launcher.py#L108]),
>  probably because after a period without logs from the pod the method stops 
> forwarding the subsequent logs.
> So the `pod_launcher` never checks the pod state afterwards 
> ([https://github.com/apache/airflow/blob/master/airflow/kubernetes/pod_launcher.py#L118])
>  and doesn't recognize the completed state -> the task stays in Running.
> When the `get_logs` parameter is disabled, everything works because the log 
> stream is skipped.
>  
> Suggestion:
> Poll the logs actively, without the `follow` parameter set to True, in 
> parallel with the pod state checking.
> That way it's possible to fetch the logs without the described connection 
> problem and simultaneously check the pod state, so the terminal states of 
> the pods are reliably recognized.
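The suggested loop can be sketched with the kubernetes-client calls replaced by callables, so it runs standalone. A hedged model: `read_pod_phase` and `read_new_logs` are stand-ins for `read_namespaced_pod_status` / `read_namespaced_pod_log` (without `follow=True`), not real client calls.

```python
import time

def monitor_pod(read_pod_phase, read_new_logs, handle_line, poll_interval=0.0):
    """Poll logs and pod phase in one loop, so a pod that goes quiet for
    minutes is still seen reaching a terminal phase."""
    while True:
        for line in read_new_logs():   # non-following log fetch
            handle_line(line)
        phase = read_pod_phase()
        if phase in ('Succeeded', 'Failed'):
            return phase               # terminal state always detected
        time.sleep(poll_interval)
```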



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (AIRFLOW-3956) Ensure airflow.api.auth.backend.kerberos_auth works with connexion

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-3956:

Fix Version/s: (was: 1.10.4)
   2.0.0

> Ensure airflow.api.auth.backend.kerberos_auth works with connexion
> --
>
> Key: AIRFLOW-3956
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3956
> Project: Apache Airflow
>  Issue Type: Sub-task
>  Components: api, authentication
>Affects Versions: 1.10.2
>Reporter: Drew Sonne
>Priority: Major
>  Labels: aip-13, openapi
> Fix For: 2.0.0
>
>
> Make sure all the existing test cases for the kerberos_auth auth method pass, 
> and perform some smoke tests to ensure the authentication works in practice.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (AIRFLOW-4810) Bump supported mysqlclient to <1.5

2019-08-07 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-4810:

Fix Version/s: (was: 1.10.4)
   2.0.0

> Bump supported mysqlclient to <1.5
> --
>
> Key: AIRFLOW-4810
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4810
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.10.3
>Reporter: Roster
>Priority: Major
> Fix For: 2.0.0
>
>
> Version 1.4.x was introduced in Jan 2019; 
> we should support it if we can.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (AIRFLOW-5136) Fix Bug with Incorrect template_fields in DataProc{*} Operators

2019-08-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902196#comment-16902196
 ] 

ASF GitHub Bot commented on AIRFLOW-5136:
-

kaxil commented on pull request #5751: [AIRFLOW-5136] Fix Bug with Incorrect 
template_fields in DataProc{*} …
URL: https://github.com/apache/airflow/pull/5751
 
 
   …Operators
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-5136
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Looks like a bad cherry-pick: 
https://github.com/apache/airflow/commit/a87310ba695ce7903ef4176df3d749d73eb32b73
   
   ```
   Traceback (most recent call last):
   File 
"/usr/local/virtualenv/airflow/local/lib/python2.7/site-packages/airflow/models/dagbag.py",
 line 389, in collect_dags safe_mode=safe_mode)
   File 
"/usr/local/virtualenv/airflow/local/lib/python2.7/site-packages/airflow/models/dagbag.py",
 line 253, in process_file self.bag_dag(dag, parent_dag=dag, root_dag=dag)
   File 
"/usr/local/virtualenv/airflow/local/lib/python2.7/site-packages/airflow/models/dagbag.py",
 line 339, in bag_dag self.bag_dag(subdag, parent_dag=dag, root_dag=root_dag)
   File 
"/usr/local/virtualenv/airflow/local/lib/python2.7/site-packages/airflow/models/dagbag.py",
 line 326, in bag_dag dag.resolve_template_files()
   File 
"/usr/local/virtualenv/airflow/local/lib/python2.7/site-packages/airflow/models/dag.py",
 line 706, in resolve_template_files t.resolve_template_files()
   File 
"/usr/local/virtualenv/airflow/local/lib/python2.7/site-packages/airflow/models/baseoperator.py",
 line 699, in resolve_template_files content = getattr(self, attr)
   AttributeError: 'DataProcPySparkOperator' object has no attribute 
'dataproc_pyspark_jars'
   ```
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to a appropriate release
   
   ### Code Quality
   
   - [x] Passes `flake8`
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Fix Bug with Incorrect template_fields in DataProc{*} Operators
> ---
>
> Key: AIRFLOW-5136
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5136
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: gcp
>Affects Versions: 1.10.4
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Major
> Fix For: 1.10.5
>
>
> {code:python}
> Traceback (most recent call last):
> File 
> "/usr/local/virtualenv/airflow/local/lib/python2.7/site-packages/airflow/models/dagbag.py",
>  line 389, in collect_dags safe_mode=safe_mode)
> File 
> "/usr/local/virtualenv/airflow/local/lib/python2.7/site-packages/airflow/models/dagbag.py",
>  line 253, in process_file self.bag_dag(dag, parent_dag=dag, root_dag=dag)
> File 
> "/usr/local/virtualenv/airflow/local/lib/python2.7/site-packages/airflow/models/dagbag.py",
>  line 339, in bag_dag self.bag_dag(subdag, parent_dag=dag, root_dag=root_dag)
> File 
> "/usr/local/virtualenv/airflow/local/lib/python2.7/site-packages/airflow/models/dagbag.py",
>  line 326, in bag_dag dag.resolve_template_files()
> File 
> 

[GitHub] [airflow] kaxil commented on issue #5751: [AIRFLOW-5136] Fix Bug with Incorrect template_fields in DataProc{*} …

2019-08-07 Thread GitBox
kaxil commented on issue #5751: [AIRFLOW-5136] Fix Bug with Incorrect 
template_fields in DataProc{*} …
URL: https://github.com/apache/airflow/pull/5751#issuecomment-519165597
 
 
   Targeted towards `1.10-test` branch


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] kaxil opened a new pull request #5751: [AIRFLOW-5136] Fix Bug with Incorrect template_fields in DataProc{*} …

2019-08-07 Thread GitBox
kaxil opened a new pull request #5751: [AIRFLOW-5136] Fix Bug with Incorrect 
template_fields in DataProc{*} …
URL: https://github.com/apache/airflow/pull/5751
 
 
   …Operators
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-5136
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   Looks like a bad cherry-pick: 
https://github.com/apache/airflow/commit/a87310ba695ce7903ef4176df3d749d73eb32b73
   
   ```
   Traceback (most recent call last):
   File 
"/usr/local/virtualenv/airflow/local/lib/python2.7/site-packages/airflow/models/dagbag.py",
 line 389, in collect_dags safe_mode=safe_mode)
   File 
"/usr/local/virtualenv/airflow/local/lib/python2.7/site-packages/airflow/models/dagbag.py",
 line 253, in process_file self.bag_dag(dag, parent_dag=dag, root_dag=dag)
   File 
"/usr/local/virtualenv/airflow/local/lib/python2.7/site-packages/airflow/models/dagbag.py",
 line 339, in bag_dag self.bag_dag(subdag, parent_dag=dag, root_dag=root_dag)
   File 
"/usr/local/virtualenv/airflow/local/lib/python2.7/site-packages/airflow/models/dagbag.py",
 line 326, in bag_dag dag.resolve_template_files()
   File 
"/usr/local/virtualenv/airflow/local/lib/python2.7/site-packages/airflow/models/dag.py",
 line 706, in resolve_template_files t.resolve_template_files()
   File 
"/usr/local/virtualenv/airflow/local/lib/python2.7/site-packages/airflow/models/baseoperator.py",
 line 699, in resolve_template_files content = getattr(self, attr)
   AttributeError: 'DataProcPySparkOperator' object has no attribute 
'dataproc_pyspark_jars'
   ```
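   As a hedged aside, the failure mode in the traceback can be reproduced in miniature (simplified classes, not Airflow's actual code): `resolve_template_files` walks `template_fields` and calls `getattr` for each name, so a `template_fields` entry with no matching instance attribute raises `AttributeError` at DAG-bagging time.

```python
# Minimal sketch (hypothetical classes, NOT Airflow's real code) of the bug:
# resolve_template_files() iterates template_fields and getattr()s each name,
# so a template_fields entry with no matching attribute raises AttributeError.

class BaseOperatorSketch:
    template_fields = ()

    def resolve_template_files(self):
        for attr in self.template_fields:
            content = getattr(self, attr)  # raises if attr was never set
            print(attr, "->", content)


class DataProcPySparkOperatorSketch(BaseOperatorSketch):
    # Bug pattern: template_fields names 'dataproc_pyspark_jars', but
    # __init__ stores the value under a different attribute name.
    template_fields = ("main", "dataproc_pyspark_jars")

    def __init__(self, main, jars):
        self.main = main
        self.dataproc_jars = jars  # mismatch with template_fields


op = DataProcPySparkOperatorSketch(main="job.py", jars=["gs://bucket/dep.jar"])
try:
    op.resolve_template_files()
except AttributeError as exc:
    print(exc)  # ... object has no attribute 'dataproc_pyspark_jars'
```

   The fix here is simply to make the names in `template_fields` agree with the attributes the constructor actually sets.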
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and classes in the PR contain docstrings 
that explain what they do
 - If you implement backwards-incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
   ### Code Quality
   
   - [x] Passes `flake8`
   




[GitHub] [airflow] feluelle commented on issue #5335: [AIRFLOW-4588] Add GoogleDiscoveryApiHook and GoogleApiToS3Transfer

2019-08-07 Thread GitBox
feluelle commented on issue #5335: [AIRFLOW-4588] Add GoogleDiscoveryApiHook 
and GoogleApiToS3Transfer
URL: https://github.com/apache/airflow/pull/5335#issuecomment-519163270
 
 
   @kaxil You are totally right, we should limit it. I only used it for some 
APIs whose responses were quite small.
   
   I could also implement a way to pass a JSONPath expression as an argument so 
that only specific response data is pushed to XCom, for example 
`'foo[*].baz'` (see https://github.com/kennknowles/python-jsonpath-rw). What 
do you think? (The downside is that it would add another package; maybe I can 
implement it without an external dependency.)
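
   A hedged sketch of that idea, filtering an API response with a minimal 
`'foo[*].baz'`-style path before pushing it to XCom. (The real PR may end up 
using jsonpath-rw; this hand-rolled matcher is purely illustrative and only 
supports `key` and `key[*]` segments.)

```python
# Illustrative only: a tiny 'foo[*].baz'-style path matcher, so a large API
# response can be reduced to just the fields worth pushing to XCom.
# Supports plain keys ('a.b') and list wildcards ('foo[*]'); nothing else.

def extract(data, path):
    """Return the values selected by a dotted path like 'foo[*].baz'."""
    results = [data]
    for segment in path.split("."):
        next_results = []
        for item in results:
            if segment.endswith("[*]"):
                key = segment[:-3]
                next_results.extend(item.get(key, []))  # fan out over a list
            else:
                next_results.append(item[segment])      # descend one key
        results = next_results
    return results


response = {"foo": [{"baz": 1, "big": "..."}, {"baz": 2, "big": "..."}]}
print(extract(response, "foo[*].baz"))  # [1, 2]
```

   Only the extracted values would then be passed to `xcom_push`, keeping the 
stored payload small even for verbose APIs.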




[jira] [Commented] (AIRFLOW-5136) Fix Bug with Incorrect template_fields in DataProc{*} Operators

2019-08-07 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902192#comment-16902192
 ] 

ASF subversion and git services commented on AIRFLOW-5136:
--

Commit 3f64cc8344659890c4e6127dd577b98484b49ac3 in airflow's branch 
refs/heads/v1-10-test from kaxil
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=3f64cc8 ]

[AIRFLOW-5136] Fix Bug with Incorrect template_fields in DataProc{*} Operators


> Fix Bug with Incorrect template_fields in DataProc{*} Operators
> ---
>
> Key: AIRFLOW-5136
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5136
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: gcp
>Affects Versions: 1.10.4
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Major
> Fix For: 1.10.5
>
>
> {code:python}
> Traceback (most recent call last):
> File 
> "/usr/local/virtualenv/airflow/local/lib/python2.7/site-packages/airflow/models/dagbag.py",
>  line 389, in collect_dags safe_mode=safe_mode)
> File 
> "/usr/local/virtualenv/airflow/local/lib/python2.7/site-packages/airflow/models/dagbag.py",
>  line 253, in process_file self.bag_dag(dag, parent_dag=dag, root_dag=dag)
> File 
> "/usr/local/virtualenv/airflow/local/lib/python2.7/site-packages/airflow/models/dagbag.py",
>  line 339, in bag_dag self.bag_dag(subdag, parent_dag=dag, root_dag=root_dag)
> File 
> "/usr/local/virtualenv/airflow/local/lib/python2.7/site-packages/airflow/models/dagbag.py",
>  line 326, in bag_dag dag.resolve_template_files()
> File 
> "/usr/local/virtualenv/airflow/local/lib/python2.7/site-packages/airflow/models/dag.py",
>  line 706, in resolve_template_files t.resolve_template_files()
> File 
> "/usr/local/virtualenv/airflow/local/lib/python2.7/site-packages/airflow/models/baseoperator.py",
>  line 699, in resolve_template_files content = getattr(self, attr)
> AttributeError: 'DataProcPySparkOperator' object has no attribute 
> 'dataproc_pyspark_jars'
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (AIRFLOW-5136) Fix Bug with Incorrect template_fields in DataProc{*} Operators

2019-08-07 Thread Kaxil Naik (JIRA)
Kaxil Naik created AIRFLOW-5136:
---

 Summary: Fix Bug with Incorrect template_fields in DataProc{*} 
Operators
 Key: AIRFLOW-5136
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5136
 Project: Apache Airflow
  Issue Type: Bug
  Components: gcp
Affects Versions: 1.10.4
Reporter: Kaxil Naik
Assignee: Kaxil Naik
 Fix For: 1.10.5



{code:python}
Traceback (most recent call last):
File 
"/usr/local/virtualenv/airflow/local/lib/python2.7/site-packages/airflow/models/dagbag.py",
 line 389, in collect_dags safe_mode=safe_mode)
File 
"/usr/local/virtualenv/airflow/local/lib/python2.7/site-packages/airflow/models/dagbag.py",
 line 253, in process_file self.bag_dag(dag, parent_dag=dag, root_dag=dag)
File 
"/usr/local/virtualenv/airflow/local/lib/python2.7/site-packages/airflow/models/dagbag.py",
 line 339, in bag_dag self.bag_dag(subdag, parent_dag=dag, root_dag=root_dag)
File 
"/usr/local/virtualenv/airflow/local/lib/python2.7/site-packages/airflow/models/dagbag.py",
 line 326, in bag_dag dag.resolve_template_files()
File 
"/usr/local/virtualenv/airflow/local/lib/python2.7/site-packages/airflow/models/dag.py",
 line 706, in resolve_template_files t.resolve_template_files()
File 
"/usr/local/virtualenv/airflow/local/lib/python2.7/site-packages/airflow/models/baseoperator.py",
 line 699, in resolve_template_files content = getattr(self, attr)
AttributeError: 'DataProcPySparkOperator' object has no attribute 
'dataproc_pyspark_jars'
{code}






[jira] [Resolved] (AIRFLOW-5132) Add tests for fallback_to_default_project_id

2019-08-07 Thread Jarek Potiuk (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jarek Potiuk resolved AIRFLOW-5132.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Add tests for fallback_to_default_project_id
> 
>
> Key: AIRFLOW-5132
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5132
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: gcp
>Affects Versions: 1.10.3
>Reporter: Kamil Bregula
>Priority: Major
> Fix For: 2.0.0
>
>






[jira] [Commented] (AIRFLOW-5132) Add tests for fallback_to_default_project_id

2019-08-07 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902187#comment-16902187
 ] 

ASF subversion and git services commented on AIRFLOW-5132:
--

Commit 0d1da8c31743701b4ea8b209bb594e14c3d62983 in airflow's branch 
refs/heads/master from Kamil Breguła
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=0d1da8c ]

[AIRFLOW-5132] Add tests for fallback_to_default_project_id (#5746)



> Add tests for fallback_to_default_project_id
> 
>
> Key: AIRFLOW-5132
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5132
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: gcp
>Affects Versions: 1.10.3
>Reporter: Kamil Bregula
>Priority: Major
>






[jira] [Commented] (AIRFLOW-5132) Add tests for fallback_to_default_project_id

2019-08-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902186#comment-16902186
 ] 

ASF GitHub Bot commented on AIRFLOW-5132:
-

potiuk commented on pull request #5746: [AIRFLOW-5132] Add tests for 
fallback_to_default_project_id
URL: https://github.com/apache/airflow/pull/5746
 
 
   
 



> Add tests for fallback_to_default_project_id
> 
>
> Key: AIRFLOW-5132
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5132
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: gcp
>Affects Versions: 1.10.3
>Reporter: Kamil Bregula
>Priority: Major
>






[GitHub] [airflow] potiuk merged pull request #5746: [AIRFLOW-5132] Add tests for fallback_to_default_project_id

2019-08-07 Thread GitBox
potiuk merged pull request #5746: [AIRFLOW-5132] Add tests for 
fallback_to_default_project_id
URL: https://github.com/apache/airflow/pull/5746
 
 
   




[GitHub] [airflow] codecov-io commented on issue #5746: [AIRFLOW-5132] Add tests for fallback_to_default_project_id

2019-08-07 Thread GitBox
codecov-io commented on issue #5746: [AIRFLOW-5132] Add tests for 
fallback_to_default_project_id
URL: https://github.com/apache/airflow/pull/5746#issuecomment-519158801
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/5746?src=pr=h1) 
Report
   > Merging 
[#5746](https://codecov.io/gh/apache/airflow/pull/5746?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/9f14bd8e430b05dc4a3e24fc041807ce01c59321?src=pr=desc)
 will **decrease** coverage by `0.26%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/5746/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/5746?src=pr=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #5746      +/-   ##
   ==========================================
   - Coverage   79.95%   79.69%   -0.27%     
   ==========================================
     Files         497      498       +1     
     Lines       32078    32126      +48     
   ==========================================
   - Hits        25649    25603      -46     
   - Misses       6429     6523      +94
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/5746?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/operators/mysql\_operator.py](https://codecov.io/gh/apache/airflow/pull/5746/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvbXlzcWxfb3BlcmF0b3IucHk=)
 | `0% <0%> (-100%)` | :arrow_down: |
   | 
[airflow/operators/mysql\_to\_hive.py](https://codecov.io/gh/apache/airflow/pull/5746/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvbXlzcWxfdG9faGl2ZS5weQ==)
 | `0% <0%> (-100%)` | :arrow_down: |
   | 
[airflow/utils/sqlalchemy.py](https://codecov.io/gh/apache/airflow/pull/5746/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9zcWxhbGNoZW15LnB5)
 | `74.41% <0%> (-4.66%)` | :arrow_down: |
   | 
[airflow/hooks/hive\_hooks.py](https://codecov.io/gh/apache/airflow/pull/5746/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9oaXZlX2hvb2tzLnB5)
 | `75.82% <0%> (-1.79%)` | :arrow_down: |
   | 
[airflow/models/connection.py](https://codecov.io/gh/apache/airflow/pull/5746/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvY29ubmVjdGlvbi5weQ==)
 | `63.88% <0%> (-1.12%)` | :arrow_down: |
   | 
[airflow/hooks/dbapi\_hook.py](https://codecov.io/gh/apache/airflow/pull/5746/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9kYmFwaV9ob29rLnB5)
 | `87.71% <0%> (-0.88%)` | :arrow_down: |
   | 
[...irflow/contrib/operators/gcp\_container\_operator.py](https://codecov.io/gh/apache/airflow/pull/5746/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9nY3BfY29udGFpbmVyX29wZXJhdG9yLnB5)
 | `95.83% <0%> (ø)` | :arrow_up: |
   | 
[...ow/contrib/operators/bigquery\_to\_mysql\_operator.py](https://codecov.io/gh/apache/airflow/pull/5746/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9iaWdxdWVyeV90b19teXNxbF9vcGVyYXRvci5weQ==)
 | `72.91% <0%> (ø)` | |
   | 
[airflow/models/taskinstance.py](https://codecov.io/gh/apache/airflow/pull/5746/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvdGFza2luc3RhbmNlLnB5)
 | `93.35% <0%> (+0.49%)` | :arrow_up: |
   | 
[airflow/contrib/hooks/gcp\_api\_base\_hook.py](https://codecov.io/gh/apache/airflow/pull/5746/diff?src=pr=tree#diff-YWlyZmxvdy9jb250cmliL2hvb2tzL2djcF9hcGlfYmFzZV9ob29rLnB5)
 | `86.32% <0%> (+0.85%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/5746?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/5746?src=pr=footer). 
Last update 
[9f14bd8...8110f19](https://codecov.io/gh/apache/airflow/pull/5746?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[jira] [Resolved] (AIRFLOW-5123) Normalize *_conn_id parameter in GCS operators

2019-08-07 Thread Jarek Potiuk (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jarek Potiuk resolved AIRFLOW-5123.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Normalize *_conn_id parameter in GCS operators
> --
>
> Key: AIRFLOW-5123
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5123
> Project: Apache Airflow
>  Issue Type: Sub-task
>  Components: gcp, operators
>Affects Versions: 1.10.3
>Reporter: Kamil Bregula
>Priority: Major
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

