[jira] [Commented] (AIRFLOW-3164) verify certificate of LDAP server

2018-11-27 Thread Bolke de Bruin (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16701011#comment-16701011
 ] 

Bolke de Bruin commented on AIRFLOW-3164:
-

I agree that the backwards compatibility could have been handled better. We 
probably targeted 2.0 around the time this happened, but instead went for 
1.10.1.

However, we have given you the option: if you want to be insecure, go ahead, 
copy the auth backend and disable the check. It's not a big change. If big 
companies can find the time to maintain an insecure LDAP setup, they can also 
delegate some time to add this to their setup (oh, and my big company is a good 
client of yours if I'm correct ;) ).
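
A minimal sketch of that option, assuming the classic (non-RBAC) UI; 
my_company_ldap_auth is a hypothetical copy of 
airflow/contrib/auth/backends/ldap_auth.py with the certificate check relaxed:

{code}
# airflow.cfg
[webserver]
authenticate = True
# point the webserver at your own copied auth backend on the PYTHONPATH
auth_backend = my_company_ldap_auth
{code}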

> verify certificate of LDAP server
> -
>
> Key: AIRFLOW-3164
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3164
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Bolke de Bruin
>Priority: Blocker
> Fix For: 1.10.1
>
>
> Currently we don't verify the certificate of the LDAP server; this can lead 
> to security incidents.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3164) verify certificate of LDAP server

2018-11-27 Thread Bolke de Bruin (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700940#comment-16700940
 ] 

Bolke de Bruin commented on AIRFLOW-3164:
-

Because the user in this respect can't really be trusted. We have already had 
reports of people leaving their Airflow installations wide open.

We give you the choice of implementing your own auth backend, but then you are 
really on your own.

On your note about FAB's usage in Airflow: FAB indeed still supports non-TLS, 
but we should maybe consider suggesting a patch that disables it. You have 
plenty of time to test it without being required to use it, and you are not 
required to upgrade if you don't want to. We just don't want to maintain two 
UIs side by side.

 

Long story short: enable TLS on your LDAP server. It is not hard to do and it 
is best practice. There is no reason not to.
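
For reference, a verified LDAP bind with the ldap3 library looks roughly like 
this (host, bind DN and CA bundle path are placeholders):

{code:python}
import ssl
from ldap3 import Connection, Server, Tls

# Require certificate validation against a trusted CA bundle (placeholder path).
tls = Tls(validate=ssl.CERT_REQUIRED, ca_certs_file="/etc/ssl/certs/ca-bundle.crt")

# use_ssl=True gives LDAPS; conn.start_tls() would be the STARTTLS variant.
server = Server("ldap.example.com", use_ssl=True, tls=tls)
conn = Connection(server, user="cn=airflow,dc=example,dc=com", password="***")
if not conn.bind():
    raise RuntimeError("LDAP bind failed: %s" % conn.result)
{code}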

> verify certificate of LDAP server
> -
>
> Key: AIRFLOW-3164
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3164
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Bolke de Bruin
>Priority: Blocker
> Fix For: 1.10.1
>
>
> Currently we don't verify the certificate of the LDAP server; this can lead 
> to security incidents.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3277) Invalid timezone transition handling for cron schedules

2018-10-30 Thread Bolke de Bruin (JIRA)
Bolke de Bruin created AIRFLOW-3277:
---

 Summary: Invalid timezone transition handling for cron schedules
 Key: AIRFLOW-3277
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3277
 Project: Apache Airflow
  Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Bolke de Bruin
 Fix For: 1.10.1


`following_schedule` converts to naive time by using the local time zone. In 
case of a DST transition, say 3AM -> 2AM ("summer time" to "winter time"), we 
generate datetimes that can overlap with earlier schedules. Therefore a DAG 
that should run every 5 minutes will not do so if it has already seen that 
schedule.

We should not convert to naive time but keep UTC.
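
To illustrate the ambiguity (a minimal sketch using the stdlib zoneinfo module, 
purely for illustration; Europe/Amsterdam is an example zone):

{code:python}
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; used here only to illustrate

ams = ZoneInfo("Europe/Amsterdam")

# On 2018-10-28 clocks go from 03:00 CEST back to 02:00 CET, so 02:30 occurs twice.
first = datetime(2018, 10, 28, 2, 30, fold=0, tzinfo=ams)   # 02:30 CEST = 00:30 UTC
second = datetime(2018, 10, 28, 2, 30, fold=1, tzinfo=ams)  # 02:30 CET  = 01:30 UTC
print(first.astimezone(timezone.utc), second.astimezone(timezone.utc))

# A naive local datetime cannot distinguish the two, so a schedule computed in
# naive local time can land on a wall-clock value it has already produced.
print(first.replace(tzinfo=None) == second.replace(tzinfo=None))  # True
{code}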



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2064) Polish timezone implementation

2018-10-30 Thread Bolke de Bruin (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669257#comment-16669257
 ] 

Bolke de Bruin commented on AIRFLOW-2064:
-

[~phani8996] that is a setting in airflow.cfg (default_timezone = XXX, which 
defaults to UTC)
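
For example (a sketch of the relevant airflow.cfg section; the non-UTC value is 
just an illustration):

{code}
[core]
# Timezone applied to naive datetimes; defaults to utc.
default_timezone = utc
# or an IANA zone name, e.g.:
# default_timezone = Europe/Amsterdam
{code}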

> Polish timezone implementation
> --
>
> Key: AIRFLOW-2064
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2064
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Bolke de Bruin
>Assignee: Marcus Rehm
>Priority: Blocker
> Fix For: 1.10.0
>
>
> Couple of things are left over after moving to time zone support:
>  
>  # End_dates within dags should be converted to UTC by using the time zone of 
> start_date if naive
>  # Task instances that are instantiated without timezone information for 
> their execution_date should have it converted to UTC by using the DAG's 
> timezone or the configured default
>  # Some doc polishing
>  # Tests should be added that cover more of the edge cases



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3036) Upgrading to Airflow 1.10 not possible using GCP Cloud SQL for MYSQL

2018-10-30 Thread Bolke de Bruin (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669256#comment-16669256
 ] 

Bolke de Bruin commented on AIRFLOW-3036:
-

Google is testing an alternative. Meanwhile you could try to replace

`cur = conn.execute("SELECT @@explicit_defaults_for_timestamp")`

in airflow/migrations/versions/0e2a74e0fc9f_add_time_zone_awareness.py

by

`cur = conn.execute("SET explicit_defaults_for_timestamp=1")`
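
A sketch of what that local edit could look like inside the migration's 
upgrade() function (the surrounding code is paraphrased from the check implied 
by the exception in the traceback below, so details may differ from the actual 
file):

{code:python}
from alembic import op

conn = op.get_bind()
if conn.dialect.name == "mysql":
    # Original check (paraphrased): refuse to run unless the global is already on.
    # cur = conn.execute("SELECT @@explicit_defaults_for_timestamp")
    # if cur.fetchall()[0][0] == 0:
    #     raise Exception("Global variable explicit_defaults_for_timestamp "
    #                     "needs to be on (1) for mysql")

    # Suggested workaround for Cloud SQL: enable it for this session instead.
    conn.execute("SET explicit_defaults_for_timestamp=1")
{code}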

 

 

> Upgrading to Airflow 1.10 not possible using GCP Cloud SQL for MYSQL
> 
>
> Key: AIRFLOW-3036
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3036
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core, db
>Affects Versions: 1.10.0
> Environment: Google Cloud Platform, Google Kubernetes Engine, Airflow 
> 1.10 on Debian Stretch, Google Cloud SQL MySQL
>Reporter: Smith Mathieu
>Priority: Blocker
>  Labels: 1.10, google, google-cloud-sql
>
> The upgrade path to airflow 1.10 seems impossible for users of MySQL in 
> Google's Cloud SQL service given new mysql requirements for 1.10.
>  
> When executing "airflow upgradedb"
> ```
>  INFO [alembic.runtime.migration] Running upgrade d2ae31099d61 -> 
> 0e2a74e0fc9f, Add time zone awareness
>  Traceback (most recent call last):
>  File "/usr/local/bin/airflow", line 32, in 
>  args.func(args)
>  File "/usr/local/lib/python3.6/site-packages/airflow/bin/cli.py", line 1002, 
> in initdb
>  db_utils.initdb(settings.RBAC)
>  File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 92, 
> in initdb
>  upgradedb()
>  File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 346, 
> in upgradedb
>  command.upgrade(config, 'heads')
>  File "/usr/local/lib/python3.6/site-packages/alembic/command.py", line 174, 
> in upgrade
>  script.run_env()
>  File "/usr/local/lib/python3.6/site-packages/alembic/script/base.py", line 
> 416, in run_env
>  util.load_python_file(self.dir, 'env.py')
>  File "/usr/local/lib/python3.6/site-packages/alembic/util/pyfiles.py", line 
> 93, in load_python_file
>  module = load_module_py(module_id, path)
>  File "/usr/local/lib/python3.6/site-packages/alembic/util/compat.py", line 
> 68, in load_module_py
>  module_id, path).load_module(module_id)
>  File "", line 399, in 
> _check_name_wrapper
>  File "", line 823, in load_module
>  File "", line 682, in load_module
>  File "", line 265, in _load_module_shim
>  File "", line 684, in _load
>  File "", line 665, in _load_unlocked
>  File "", line 678, in exec_module
>  File "", line 219, in _call_with_frames_removed
>  File "/usr/local/lib/python3.6/site-packages/airflow/migrations/env.py", 
> line 91, in 
>  run_migrations_online()
>  File "/usr/local/lib/python3.6/site-packages/airflow/migrations/env.py", 
> line 86, in run_migrations_online
>  context.run_migrations()
>  File "", line 8, in run_migrations
>  File 
> "/usr/local/lib/python3.6/site-packages/alembic/runtime/environment.py", line 
> 807, in run_migrations
>  self.get_context().run_migrations(**kw)
>  File "/usr/local/lib/python3.6/site-packages/alembic/runtime/migration.py", 
> line 321, in run_migrations
>  step.migration_fn(**kw)
>  File 
> "/usr/local/lib/python3.6/site-packages/airflow/migrations/versions/0e2a74e0fc9f_add_time_zone_awareness.py",
>  line 46, in upgrade
>  raise Exception("Global variable explicit_defaults_for_timestamp needs to be 
> on (1) for mysql")
>  Exception: Global variable explicit_defaults_for_timestamp needs to be on 
> (1) for mysql
>  ```
>   
> Reading documentation for upgrading to airflow 1.10, it seems the requirement 
> for explicit_defaults_for_timestamp=1 was intentional. 
>  
> However,  MySQL on Google Cloud SQL does not support configuring this 
> variable and it is off by default. Users of MySQL and Cloud SQL do not have 
> an upgrade path to 1.10. Alas, so close to the mythical Kubernetes Executor.
> In GCP, Cloud SQL is _the_ hosted MySQL solution. 
> [https://cloud.google.com/sql/docs/mysql/flags]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3165) Document use of interpolation by ConfigParser

2018-10-05 Thread Bolke de Bruin (JIRA)
Bolke de Bruin created AIRFLOW-3165:
---

 Summary: Document use of interpolation by ConfigParser
 Key: AIRFLOW-3165
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3165
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Bolke de Bruin


The config parser interpolates '%' in variables. This can lead to issues when 
specifying passwords. As we can't disable interpolation on a per-variable 
basis, we need to document that people should not use a % sign in their 
passwords.
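
A short illustration of the failure mode and the usual %% escape (the 
connection string is just an example):

{code:python}
import configparser

cfg = "[core]\nsql_alchemy_conn = mysql://user:p%ssword@localhost/airflow\n"

parser = configparser.ConfigParser()
try:
    parser.read_string(cfg)
    parser.get("core", "sql_alchemy_conn")
except configparser.InterpolationSyntaxError as exc:
    print("bare % breaks interpolation:", exc)

# Workaround: escape % as %% (or simply avoid % in passwords, as the docs should say).
escaped = configparser.ConfigParser()
escaped.read_string(cfg.replace("p%ssword", "p%%ssword"))
print(escaped.get("core", "sql_alchemy_conn"))  # mysql://user:p%ssword@localhost/airflow
{code}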



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3164) verify certificate of LDAP server

2018-10-05 Thread Bolke de Bruin (JIRA)
Bolke de Bruin created AIRFLOW-3164:
---

 Summary: verify certificate of LDAP server
 Key: AIRFLOW-3164
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3164
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Bolke de Bruin


Currently we don't verify the certificate of the LDAP server; this can lead to 
security incidents.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-1260) FLOWER XSS Vulnerability

2018-09-03 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin closed AIRFLOW-1260.
---
Resolution: Invalid

Please report this to Celery; Flower is not an Airflow component.

> FLOWER XSS Vulnerability
> 
>
> Key: AIRFLOW-1260
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1260
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: Airflow 1.7.1.3
>Reporter: Camille TOLSA
>Priority: Critical
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> The affected functions are WorkerQueueAddConsumer() and 
> WorkerQueueCancelConsumer() in the flower/static/js/flower.js file. The use 
> of the .html() function instead of .text() allows script execution.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2984) Cannot convert naive_datetime when task has a naive start_date/end_date

2018-08-30 Thread Bolke de Bruin (JIRA)
Bolke de Bruin created AIRFLOW-2984:
---

 Summary: Cannot convert naive_datetime when task has a naive 
start_date/end_date
 Key: AIRFLOW-2984
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2984
 Project: Apache Airflow
  Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Bolke de Bruin
 Fix For: 1.10.1


A task can have a start_date / end_date separate from the DAG's.
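
A sketch of the conversion that should happen for such naive task dates, 
assuming the owning DAG's timezone is known (zoneinfo and the Europe/Amsterdam 
zone are used purely for illustration):

{code:python}
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+, illustration only

def task_date_to_utc(naive, dag_tz="Europe/Amsterdam"):
    """Interpret a naive task start_date/end_date in the DAG's timezone, return UTC."""
    return naive.replace(tzinfo=ZoneInfo(dag_tz)).astimezone(timezone.utc)

print(task_date_to_utc(datetime(2018, 8, 30, 9, 0)))  # 2018-08-30 07:00:00+00:00
{code}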



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance

2018-08-12 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin closed AIRFLOW-2870.
---
   Resolution: Fixed
Fix Version/s: 1.10.0

> Migrations fail when upgrading from below 
> cc1e65623dc7_add_max_tries_column_to_task_instance
> 
>
> Key: AIRFLOW-2870
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2870
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: George Leslie-Waksman
>Priority: Blocker
> Fix For: 1.10.0
>
>
> Running migrations from below 
> cc1e65623dc7_add_max_tries_column_to_task_instance.py fail with:
> {noformat}
> INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
> INFO  [alembic.runtime.migration] Will assume transactional DDL.
> INFO  [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> 
> cc1e65623dc7, add max tries column to task instance
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", 
> line 1182, in _execute_context
> context)
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", 
> line 470, in do_execute
> cursor.execute(statement, parameters)
> psycopg2.ProgrammingError: column task_instance.executor_config does not exist
> LINE 1: ...ued_dttm, task_instance.pid AS task_instance_pid, task_insta...
> {noformat}
> The failure is occurring because 
> cc1e65623dc7_add_max_tries_column_to_task_instance.py imports TaskInstance 
> from the current code version, which has changes to the task_instance table 
> that are not expected by the migration.
> Specifically, 27c6a30d7c24_add_executor_config_to_task_instance.py adds an 
> executor_config column that does not exist as of when 
> cc1e65623dc7_add_max_tries_column_to_task_instance.py is run.
> It is worth noting that this will not be observed for new installs because 
> the migration branches on table existence/non-existence at a point that will 
> hide the issue from new installs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2888) Do not use Shell=True and bash to launch tasks

2018-08-11 Thread Bolke de Bruin (JIRA)
Bolke de Bruin created AIRFLOW-2888:
---

 Summary: Do not use Shell=True and bash to launch tasks
 Key: AIRFLOW-2888
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2888
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Bolke de Bruin


Using shell=True is a security risk and there is no need to use bash
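
A minimal illustration of the difference (echo stands in for the task command):

{code:python}
import subprocess

# With shell=True the string is handed to /bin/sh, so shell metacharacters in
# task parameters (the ";" here) are executed as separate commands.
subprocess.call("echo hello; echo injected", shell=True)

# With an argument list there is no shell involved: the whole string is passed
# to echo as a single, inert argument.
subprocess.call(["echo", "hello; echo injected"])
{code}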



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance

2018-08-08 Thread Bolke de Bruin (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572906#comment-16572906
 ] 

Bolke de Bruin edited comment on AIRFLOW-2870 at 8/8/18 9:05 AM:
-

Or use with_entities; trying that. It's a very annoying migration. DagBags can 
be huge.


was (Author: bolke):
or use with_entities, trying that

> Migrations fail when upgrading from below 
> cc1e65623dc7_add_max_tries_column_to_task_instance
> 
>
> Key: AIRFLOW-2870
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2870
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: George Leslie-Waksman
>Priority: Blocker
>
> Running migrations from below 
> cc1e65623dc7_add_max_tries_column_to_task_instance.py fail with:
> {noformat}
> INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
> INFO  [alembic.runtime.migration] Will assume transactional DDL.
> INFO  [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> 
> cc1e65623dc7, add max tries column to task instance
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", 
> line 1182, in _execute_context
> context)
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", 
> line 470, in do_execute
> cursor.execute(statement, parameters)
> psycopg2.ProgrammingError: column task_instance.executor_config does not exist
> LINE 1: ...ued_dttm, task_instance.pid AS task_instance_pid, task_insta...
> {noformat}
> The failure is occurring because 
> cc1e65623dc7_add_max_tries_column_to_task_instance.py imports TaskInstance 
> from the current code version, which has changes to the task_instance table 
> that are not expected by the migration.
> Specifically, 27c6a30d7c24_add_executor_config_to_task_instance.py adds an 
> executor_config column that does not exist as of when 
> cc1e65623dc7_add_max_tries_column_to_task_instance.py is run.
> It is worth noting that this will not be observed for new installs because 
> the migration branches on table existence/non-existence at a point that will 
> hide the issue from new installs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance

2018-08-08 Thread Bolke de Bruin (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572906#comment-16572906
 ] 

Bolke de Bruin commented on AIRFLOW-2870:
-

or use with_entities, trying that

> Migrations fail when upgrading from below 
> cc1e65623dc7_add_max_tries_column_to_task_instance
> 
>
> Key: AIRFLOW-2870
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2870
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: George Leslie-Waksman
>Priority: Blocker
>
> Running migrations from below 
> cc1e65623dc7_add_max_tries_column_to_task_instance.py fail with:
> {noformat}
> INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
> INFO  [alembic.runtime.migration] Will assume transactional DDL.
> INFO  [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> 
> cc1e65623dc7, add max tries column to task instance
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", 
> line 1182, in _execute_context
> context)
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", 
> line 470, in do_execute
> cursor.execute(statement, parameters)
> psycopg2.ProgrammingError: column task_instance.executor_config does not exist
> LINE 1: ...ued_dttm, task_instance.pid AS task_instance_pid, task_insta...
> {noformat}
> The failure is occurring because 
> cc1e65623dc7_add_max_tries_column_to_task_instance.py imports TaskInstance 
> from the current code version, which has changes to the task_instance table 
> that are not expected by the migration.
> Specifically, 27c6a30d7c24_add_executor_config_to_task_instance.py adds an 
> executor_config column that does not exist as of when 
> cc1e65623dc7_add_max_tries_column_to_task_instance.py is run.
> It is worth noting that this will not be observed for new installs because 
> the migration branches on table existence/non-existence at a point that will 
> hide the issue from new installs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance

2018-08-08 Thread Bolke de Bruin (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572876#comment-16572876
 ] 

Bolke de Bruin commented on AIRFLOW-2870:
-

Gotcha. That is the weakness of using the ORM in Alembic. Column loading might 
be an option, as we do not need the full model. Or, instead of using the 
database as the reference, use the DagBag as the reference and update by using 
direct SQL.
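
A sketch of that idea (column projection plus direct SQL; max_tries = 0 is only 
a placeholder, the real value would come from the DagBag/task definition):

{code:python}
from alembic import op
from sqlalchemy import text
from sqlalchemy.orm import sessionmaker

from airflow.models import TaskInstance  # current model, used only for column projection

connection = op.get_bind()
session = sessionmaker(bind=connection)()

# with_entities() narrows the SELECT to the listed columns, so the query never
# touches columns added by later revisions (such as executor_config).
rows = (session.query(TaskInstance)
        .with_entities(TaskInstance.dag_id,
                       TaskInstance.task_id,
                       TaskInstance.execution_date))

for dag_id, task_id, execution_date in rows:
    connection.execute(
        text("UPDATE task_instance SET max_tries = :mt "
             "WHERE dag_id = :d AND task_id = :t AND execution_date = :e"),
        {"mt": 0, "d": dag_id, "t": task_id, "e": execution_date},
    )
{code}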

> Migrations fail when upgrading from below 
> cc1e65623dc7_add_max_tries_column_to_task_instance
> 
>
> Key: AIRFLOW-2870
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2870
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: George Leslie-Waksman
>Priority: Blocker
>
> Running migrations from below 
> cc1e65623dc7_add_max_tries_column_to_task_instance.py fail with:
> {noformat}
> INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
> INFO  [alembic.runtime.migration] Will assume transactional DDL.
> INFO  [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> 
> cc1e65623dc7, add max tries column to task instance
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", 
> line 1182, in _execute_context
> context)
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", 
> line 470, in do_execute
> cursor.execute(statement, parameters)
> psycopg2.ProgrammingError: column task_instance.executor_config does not exist
> LINE 1: ...ued_dttm, task_instance.pid AS task_instance_pid, task_insta...
> {noformat}
> The failure is occurring because 
> cc1e65623dc7_add_max_tries_column_to_task_instance.py imports TaskInstance 
> from the current code version, which has changes to the task_instance table 
> that are not expected by the migration.
> Specifically, 27c6a30d7c24_add_executor_config_to_task_instance.py adds an 
> executor_config column that does not exist as of when 
> cc1e65623dc7_add_max_tries_column_to_task_instance.py is run.
> It is worth noting that this will not be observed for new installs because 
> the migration branches on table existence/non-existence at a point that will 
> hide the issue from new installs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1617) XSS Vulnerability in Variable endpoint

2018-08-07 Thread Bolke de Bruin (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571789#comment-16571789
 ] 

Bolke de Bruin commented on AIRFLOW-1617:
-

Yes it does.

> XSS Vulnerability in Variable endpoint
> --
>
> Key: AIRFLOW-1617
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1617
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Affects Versions: 1.8.2
>Reporter: Bolke de Bruin
>Priority: Critical
>  Labels: security
> Fix For: 1.9.0
>
>
> Variable view has an XSS vulnerability when the Variable template does not 
> exist. The input is returned to the user as is, without escaping.
> Original report by Seth Long. CVE is pending



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2859) DateTimes returned from the database are not converted to UTC

2018-08-06 Thread Bolke de Bruin (JIRA)
Bolke de Bruin created AIRFLOW-2859:
---

 Summary: DateTimes returned from the database are not converted to 
UTC
 Key: AIRFLOW-2859
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2859
 Project: Apache Airflow
  Issue Type: Bug
  Components: database
Reporter: Bolke de Bruin
 Fix For: 1.10.0


This is due to the fact that sqlalchemy-utcdatetime does not convert to UTC 
when the database returns datetimes with tzinfo.
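
The missing conversion would look roughly like this as a SQLAlchemy 
TypeDecorator (a sketch, not the actual sqlalchemy-utcdatetime code):

{code:python}
from datetime import timezone
from sqlalchemy.types import DateTime, TypeDecorator

class UtcDateTime(TypeDecorator):
    """Store tz-aware datetimes and normalize whatever the database returns to UTC."""
    impl = DateTime(timezone=True)

    def process_bind_param(self, value, dialect):
        if value is not None and value.tzinfo is not None:
            value = value.astimezone(timezone.utc)
        return value

    def process_result_value(self, value, dialect):
        # The step this issue says is missing: if the driver hands back a
        # datetime with a non-UTC tzinfo, convert it rather than trusting it.
        if value is not None and value.tzinfo is not None:
            value = value.astimezone(timezone.utc)
        return value
{code}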



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2807) Add support for External ID when using STS Assume Role

2018-07-28 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2807.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request #3647
[https://github.com/apache/incubator-airflow/pull/3647]

> Add support for External ID when using STS Assume Role
> --
>
> Key: AIRFLOW-2807
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2807
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, boto3, hooks
>Affects Versions: 1.10.1
>Reporter: Vojtech Vondra
>Priority: Minor
> Fix For: 2.0.0
>
>
> Currently the role assumption method works only if the granting account does 
> not specify an External ID. The external ID is used to solve the confused 
> deputy problem. When using the AWS hook to export data to multiple customers, 
> it's good security practice to use the external ID.
>  Documentation: 
> https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html
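
For reference, the STS call described above looks roughly like this with boto3 
(role ARN and external ID are placeholders):

{code:python}
import boto3

sts = boto3.client("sts")
response = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/airflow-export",  # placeholder
    RoleSessionName="airflow",
    ExternalId="customer-supplied-external-id",  # the optional parameter this adds
)
credentials = response["Credentials"]  # AccessKeyId / SecretAccessKey / SessionToken
{code}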



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2817) Force explicit choice on GPL dependency

2018-07-28 Thread Bolke de Bruin (JIRA)
Bolke de Bruin created AIRFLOW-2817:
---

 Summary: Force explicit choice on GPL dependency
 Key: AIRFLOW-2817
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2817
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Bolke de Bruin


A more explicit choice on the GPL dependency was required by the IPMC.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2816) Remove github copyright clause as it shouldnt be there

2018-07-28 Thread Bolke de Bruin (JIRA)
Bolke de Bruin created AIRFLOW-2816:
---

 Summary: Remove github copyright clause as it shouldnt be there
 Key: AIRFLOW-2816
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2816
 Project: Apache Airflow
  Issue Type: Sub-task
Reporter: Bolke de Bruin


Probably a copy-paste error.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2815) Notice cannot contain "onwards" and needs to be specific

2018-07-28 Thread Bolke de Bruin (JIRA)
Bolke de Bruin created AIRFLOW-2815:
---

 Summary: Notice cannot contain "onwards" and needs to be specific
 Key: AIRFLOW-2815
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2815
 Project: Apache Airflow
  Issue Type: Sub-task
Reporter: Bolke de Bruin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-2812) Fix error in Updating.md for upgrading to 1.10

2018-07-28 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin closed AIRFLOW-2812.
---
   Resolution: Fixed
Fix Version/s: 1.10.0

> Fix error in Updating.md for upgrading to 1.10
> --
>
> Key: AIRFLOW-2812
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2812
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.10
>Reporter: Nick Hughes
>Assignee: Nick Hughes
>Priority: Minor
> Fix For: 1.10.0
>
>
> The directions in Updating.md under Logging Configuration have too many 
> brackets, which results in errors:
>  * {{log_filename_template}}
>  * {{log_processor_filename_template}}{{}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2766) Replicated *Base date* in each tab on DAG detail view

2018-07-26 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2766.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request #3615
[https://github.com/apache/incubator-airflow/pull/3615]

> Replicated *Base date* in each tab on DAG detail view
> -
>
> Key: AIRFLOW-2766
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2766
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Verdan Mahmood
>Assignee: Verdan Mahmood
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: Screen Shot 2018-07-18 at 1.57.22 PM.png
>
>
> At the moment, most of the tabs on the DAG detail page have a *Base date* 
> field which applies only to that particular tab/view. From the end user's 
> perspective this is confusing, as a user expects to select a date and then 
> see all views based on that date.
> The idea is to move the *Base date* outside the tabs, so that each tab then 
> uses the *Base date* set at the global level. Users will then only need to 
> select the base date once and every tab will reflect that date. 
> Please see the attached screenshot for reference.
> Alternatively, the second approach could be to make sure the Base date chosen 
> in each tab is kept and doesn't change on page refresh.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2776) Tree view JSON is unnecessarily large

2018-07-26 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2776.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request #3620
[https://github.com/apache/incubator-airflow/pull/3620]

> Tree view JSON is unnecessarily large
> -
>
> Key: AIRFLOW-2776
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2776
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Abdul Nimeri
>Assignee: Abdul Nimeri
>Priority: Minor
> Fix For: 2.0.0
>
>
> The tree view generates JSON that can be massive for bigger DAGs –– some of 
> our tree views at Stripe have tens of MBs of JSON.
> The [generated JSON is 
> prettified|https://github.com/apache/incubator-airflow/blob/52c745da71a6da798f7322956967b5e818b56e48/airflow/www/views.py#L1480],
>  which both takes up more CPU time during serialization and slows down 
> everything else that uses it. We patched this on Stripe's fork by removing 
> all whitespace and saw a big speedup for bigger tree views. Considering the 
> JSON is only meant to be consumed programmatically, this is probably an easy 
> win.
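
A minimal sketch of the whitespace removal described above (compact separators 
instead of an indent):

{code:python}
import json

data = {"dag_id": "example",
        "instances": [{"task_id": "t%d" % i, "state": "success"} for i in range(1000)]}

pretty = json.dumps(data, indent=4)                 # prettified, as the view does today
compact = json.dumps(data, separators=(",", ":"))   # no indentation, no spaces

print(len(pretty), len(compact))  # the compact form is noticeably smaller
{code}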



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2777) dag.sub_dag(...) speedups

2018-07-26 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2777.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request #3621
[https://github.com/apache/incubator-airflow/pull/3621]

> dag.sub_dag(...) speedups
> -
>
> Key: AIRFLOW-2777
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2777
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Abdul Nimeri
>Assignee: Abdul Nimeri
>Priority: Minor
> Fix For: 2.0.0
>
>
> dag.sub_dag(...) currently works by first deep copying the entire DAG and 
> then filtering down to the appropriate tasks. That can be slow, since deep 
> copying a big DAG takes a while; specifically, copying over all the tasks is 
> the bottleneck.
> This can be made a lot faster by instead only copying over the filtered-down 
> tasks.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2765) Set default mime_charset in email_operator.EmailOperator to UTF-8, to match email utils

2018-07-26 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2765.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request #3627
[https://github.com/apache/incubator-airflow/pull/3627]

> Set default mime_charset in email_operator.EmailOperator to UTF-8, to match 
> email utils
> ---
>
> Key: AIRFLOW-2765
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2765
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Reporter: Jeffrey Payne
>Assignee: Jeffrey Payne
>Priority: Major
> Fix For: 2.0.0
>
>
> Assuming we would want both {{EmailOperator()}} and 
> {{utils.email.send_email()}} to use the same default value for the 
> {{mime_charset}} parameter.  The default for {{utils.email.send_email()}} was 
> recently changed to {{UTF-8}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-2783) Integrate `eslint` with a decent set of rules

2018-07-26 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin closed AIRFLOW-2783.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Integrate `eslint` with a decent set of rules
> -
>
> Key: AIRFLOW-2783
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2783
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Verdan Mahmood
>Priority: Major
> Fix For: 2.0.0
>
>
> To make sure that the javascript code is standardized and well written, we 
> should integrate eslint with a decent set of rules, so that everyone follows 
> those rules while writing the JS. 
> https://eslint.org/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2691) Make Airflow's JS code (and dependencies) manageable via npm and webpack

2018-07-22 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2691.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request #3572
[https://github.com/apache/incubator-airflow/pull/3572]

> Make Airflow's JS code (and dependencies) manageable via npm and webpack
> 
>
> Key: AIRFLOW-2691
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2691
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Verdan Mahmood
>Assignee: Verdan Mahmood
>Priority: Major
> Fix For: 2.0.0
>
>
> Airflow's JS code is hard to maintain and upgrade. The dependencies are 
> locally vendored files, making it hard to upgrade versions. 
> Make sure Airflow uses *npm* and *webpack* for dependency management. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2779) Verify and correct licenses

2018-07-21 Thread Bolke de Bruin (JIRA)
Bolke de Bruin created AIRFLOW-2779:
---

 Summary: Verify and correct licenses
 Key: AIRFLOW-2779
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2779
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Bolke de Bruin
 Fix For: 1.10.1, 2.0.0


1. /airflow/security/utils.py
2. ./airflow/security/kerberos.py
3. ./airflow/www_rbac/static/jqClock.min.js
4. ./airflow/www/static/bootstrap3-typeahead.min.js
5. ./apache-airflow-1.10.0rc2+incubating/scripts/ci/flake8_diff.sh
6. https://www.apache.org/legal/resolved.html#optional
7. ./docs/license.rst
8. airflow/contrib/auth/backends/google_auth.py
9. /airflow/contrib/auth/backends/github_enterprise_auth.py
10. /airflow/contrib/hooks/ssh_hook.py
11. /airflow/minihivecluster.py

These files [1][2] seem to be 3rd-party ALv2-licensed files that refer to a 
NOTICE file; the information in that NOTICE file (at the very least the 
copyright info) should be in your NOTICE file. This should also be noted in 
LICENSE.

LICENSE is:
- missing jQuery clock [3] and typeahead [4]; as they are ALv2 it's not 
required to list them, but it's a good idea to do so.
- missing the license for this [5]
- this file [7] oddly has © 2016 GitHub, Inc. at the bottom of it

 * Year in NOTICE is not correct: "2016 and onwards" isn't valid, as copyright 
has an expiry date.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2737) Restore original license header

2018-07-21 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2737.
-
   Resolution: Fixed
Fix Version/s: (was: 2.0.0)
   1.10.0

Issue resolved by pull request #3591
[https://github.com/apache/incubator-airflow/pull/3591]

> Restore original license header
> ---
>
> Key: AIRFLOW-2737
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2737
> Project: Apache Airflow
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: Stefan Seelmann
>Assignee: Stefan Seelmann
>Priority: Major
> Fix For: 1.10.0
>
>
> The original license header in airflow/api/auth/backend/kerberos_auth.py was 
> replaced with the AL. It should be restored.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-2729) .airflowignore is not being respected

2018-07-13 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin closed AIRFLOW-2729.
---
Resolution: Fixed

> .airflowignore is not being respected
> -
>
> Key: AIRFLOW-2729
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2729
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.10
>Reporter: James Meickle
>Assignee: Ash Berlin-Taylor
>Priority: Minor
> Fix For: 1.10.0
>
>
> I have a repo that in 1.10 is giving airflowignore errors that did not exist 
> in 1.9. I have a DAG repo with the following .airflowignore:
> {{airflow@XXX:~$ ls -la /home/airflow/airflow/dags/airflow-tasks/}}
> {{total 172}}
> {{drwxr-xr-x 6 airflow airflow 4096 Jul 9 18:48 .}}
> {{drwxrwxr-x 3 airflow airflow 4096 Jul 9 18:48 ..}}
> {{-rw-r--r-- 1 airflow airflow 13 Jul 9 16:20 .airflowignore}}
> {{airflow@airflow-core-i-063df3268720e58fd:~$ cat 
> /home/airflow/airflow/dags/airflow-tasks/.airflowignore}}
> {{submodules/*}}
> However, the submoduled repository is being scanned for DAGs anyways, 
> including the test suite. Note the paths in the section below:
>  
> {{Jul 09 18:52:01 airflow_web-stdout.log: [2018-07-09 18:52:01,814] 
> \{{models.py:351}} DEBUG - Importing 
> /home/airflow/airflow/dags/airflow-tasks/submodules/quantflow/quantflow/operators/zipline_operators.py}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: [2018-07-09 18:52:01,817] 
> \{{models.py:351}} DEBUG - Importing 
> /home/airflow/airflow/dags/airflow-tasks/submodules/quantflow/tests/operators/test_sqs_operators.py}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: [2018-07-09 18:52:01,818] 
> \{{models.py:365}} ERROR - Failed to import: 
> /home/airflow/airflow/dags/airflow-tasks/submodules/quantflow/tests/operators/test_sqs_operators.py}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: Traceback (most recent call last):}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: File 
> "/home/airflow/virtualenvs/airflow/lib/python3.5/site-packages/airflow/models.py",
>  line 362, in process_file}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: m = imp.load_source(mod_name, 
> filepath)}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: File 
> "/home/airflow/virtualenvs/airflow/lib/python3.5/imp.py", line 172, in 
> load_source}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: module = _load(spec)}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: File " importlib._bootstrap>", line 693, in _load}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: File " importlib._bootstrap>", line 673, in _load_unlocked}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: File " importlib._bootstrap_external>", line 665, in exec_module}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: File " importlib._bootstrap>", line 222, in _call_with_frames_removed}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: File 
> "/home/airflow/airflow/dags/airflow-tasks/submodules/quantflow/tests/operators/test_sqs_operators.py",
>  line 6, in }}
> {{Jul 09 18:52:01 airflow_web-stdout.log: from moto import mock_sqs}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: ImportError: No module named 
> 'moto'}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: [2018-07-09 18:52:01,821] 
> \{{models.py:351}} DEBUG - Importing 
> /home/airflow/airflow/dags/airflow-tasks/submodules/quantflow/tests/operators/test_zipline_operators.py}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: [2018-07-09 18:52:01,822] 
> \{{models.py:365}} ERROR - Failed to import: 
> /home/airflow/airflow/dags/airflow-tasks/submodules/quantflow/tests/operators/test_zipline_operators.py}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: Traceback (most recent call last):}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: File 
> "/home/airflow/virtualenvs/airflow/lib/python3.5/site-packages/airflow/models.py",
>  line 362, in process_file}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: m = imp.load_source(mod_name, 
> filepath)}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: File 
> "/home/airflow/virtualenvs/airflow/lib/python3.5/imp.py", line 172, in 
> load_source}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: module = _load(spec)}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: File " importlib._bootstrap>", line 693, in _load}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: File " importlib._bootstrap>", line 673, in _load_unlocked}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: File " importlib._bootstrap_external>", line 665, in exec_module}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: File " importlib._bootstrap>", line 222, in _call_with_frames_removed}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: File 
> "/home/airflow/airflow/dags/airflow-tasks/submodules/quantflow/tests/operators/test_zipline_operators.py",
>  line 6, in }}
> {{Jul 09 18:52:01 airflow_web-stdout.log: from freezegun import freeze_time}}
> {{Jul 09 18:52:01 

[jira] [Updated] (AIRFLOW-2729) .airflowignore is not being respected

2018-07-13 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated AIRFLOW-2729:

Fix Version/s: 1.10.0

> .airflowignore is not being respected
> -
>
> Key: AIRFLOW-2729
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2729
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.10
>Reporter: James Meickle
>Assignee: Ash Berlin-Taylor
>Priority: Minor
> Fix For: 1.10.0
>
>
> I have a repo that in 1.10 is giving airflowignore errors that did not exist 
> in 1.9. I have a DAG repo with the following .airflowignore:
> {{airflow@XXX:~$ ls -la /home/airflow/airflow/dags/airflow-tasks/}}
> {{total 172}}
> {{drwxr-xr-x 6 airflow airflow 4096 Jul 9 18:48 .}}
> {{drwxrwxr-x 3 airflow airflow 4096 Jul 9 18:48 ..}}
> {{-rw-r--r-- 1 airflow airflow 13 Jul 9 16:20 .airflowignore}}
> {{airflow@airflow-core-i-063df3268720e58fd:~$ cat 
> /home/airflow/airflow/dags/airflow-tasks/.airflowignore}}
> {{submodules/*}}
> However, the submoduled repository is being scanned for DAGs anyways, 
> including the test suite. Note the paths in the section below:
>  
> {{Jul 09 18:52:01 airflow_web-stdout.log: [2018-07-09 18:52:01,814] 
> \{{models.py:351}} DEBUG - Importing 
> /home/airflow/airflow/dags/airflow-tasks/submodules/quantflow/quantflow/operators/zipline_operators.py}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: [2018-07-09 18:52:01,817] 
> \{{models.py:351}} DEBUG - Importing 
> /home/airflow/airflow/dags/airflow-tasks/submodules/quantflow/tests/operators/test_sqs_operators.py}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: [2018-07-09 18:52:01,818] 
> \{{models.py:365}} ERROR - Failed to import: 
> /home/airflow/airflow/dags/airflow-tasks/submodules/quantflow/tests/operators/test_sqs_operators.py}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: Traceback (most recent call last):}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: File 
> "/home/airflow/virtualenvs/airflow/lib/python3.5/site-packages/airflow/models.py",
>  line 362, in process_file}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: m = imp.load_source(mod_name, 
> filepath)}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: File 
> "/home/airflow/virtualenvs/airflow/lib/python3.5/imp.py", line 172, in 
> load_source}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: module = _load(spec)}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: File " importlib._bootstrap>", line 693, in _load}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: File " importlib._bootstrap>", line 673, in _load_unlocked}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: File " importlib._bootstrap_external>", line 665, in exec_module}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: File " importlib._bootstrap>", line 222, in _call_with_frames_removed}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: File 
> "/home/airflow/airflow/dags/airflow-tasks/submodules/quantflow/tests/operators/test_sqs_operators.py",
>  line 6, in }}
> {{Jul 09 18:52:01 airflow_web-stdout.log: from moto import mock_sqs}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: ImportError: No module named 
> 'moto'}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: [2018-07-09 18:52:01,821] 
> \{{models.py:351}} DEBUG - Importing 
> /home/airflow/airflow/dags/airflow-tasks/submodules/quantflow/tests/operators/test_zipline_operators.py}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: [2018-07-09 18:52:01,822] 
> \{{models.py:365}} ERROR - Failed to import: 
> /home/airflow/airflow/dags/airflow-tasks/submodules/quantflow/tests/operators/test_zipline_operators.py}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: Traceback (most recent call last):}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: File 
> "/home/airflow/virtualenvs/airflow/lib/python3.5/site-packages/airflow/models.py",
>  line 362, in process_file}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: m = imp.load_source(mod_name, 
> filepath)}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: File 
> "/home/airflow/virtualenvs/airflow/lib/python3.5/imp.py", line 172, in 
> load_source}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: module = _load(spec)}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: File " importlib._bootstrap>", line 693, in _load}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: File " importlib._bootstrap>", line 673, in _load_unlocked}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: File " importlib._bootstrap_external>", line 665, in exec_module}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: File " importlib._bootstrap>", line 222, in _call_with_frames_removed}}
> {{Jul 09 18:52:01 airflow_web-stdout.log: File 
> "/home/airflow/airflow/dags/airflow-tasks/submodules/quantflow/tests/operators/test_zipline_operators.py",
>  line 6, in }}
> {{Jul 09 18:52:01 airflow_web-stdout.log: from freezegun import freeze_time}}
> {{Jul 09 18:52:01 

[jira] [Updated] (AIRFLOW-1729) Ignore whole directories in .airflowignore

2018-07-13 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated AIRFLOW-1729:

Fix Version/s: (was: 2.0.0)
   1.10.0

> Ignore whole directories in .airflowignore
> --
>
> Key: AIRFLOW-1729
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1729
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core
>Affects Versions: Airflow 2.0
>Reporter: Cedric Hourcade
>Assignee: Ash Berlin-Taylor
>Priority: Minor
> Fix For: 1.10.0
>
>
> The .airflowignore file allows preventing files from being scanned for DAGs. 
> But even if we blacklist a full directory, {{os.walk}} will still go through 
> it no matter how deep it is and skip files one by one, which can be an issue 
> when you keep around big .git or virtualenv directories.
> I suggest to add something like:
> {code}
> dirs[:] = [d for d in dirs if not any([re.findall(p, os.path.join(root, d)) 
> for p in patterns])]
> {code}
> to prune the directories here: 
> https://github.com/apache/incubator-airflow/blob/cfc2f73c445074e1e09d6ef6a056cd2b33a945da/airflow/utils/dag_processing.py#L208-L209
>  and in {{list_py_file_paths}}
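
As a runnable sketch of the pruning suggested above (paths and patterns are 
examples):

{code:python}
import os
import re

def list_py_files(dag_folder, patterns):
    """Yield .py files under dag_folder, pruning whole directories that match
    an .airflowignore pattern so os.walk never descends into them."""
    for root, dirs, files in os.walk(dag_folder, followlinks=True):
        # Prune in place: os.walk only descends into whatever is left in `dirs`.
        dirs[:] = [d for d in dirs
                   if not any(re.findall(p, os.path.join(root, d)) for p in patterns)]
        for f in files:
            if f.endswith(".py"):
                yield os.path.join(root, f)

for path in list_py_files("/home/airflow/airflow/dags", [r"\.git", r"submodules"]):
    print(path)
{code}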



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-1729) Ignore whole directories in .airflowignore

2018-07-13 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin closed AIRFLOW-1729.
---
Resolution: Fixed

> Ignore whole directories in .airflowignore
> --
>
> Key: AIRFLOW-1729
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1729
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core
>Affects Versions: Airflow 2.0
>Reporter: Cedric Hourcade
>Assignee: Ash Berlin-Taylor
>Priority: Minor
> Fix For: 1.10.0
>
>
> The .airflowignore file allows preventing files from being scanned for DAGs. 
> But even if we blacklist a full directory, {{os.walk}} will still go through 
> it no matter how deep it is and skip files one by one, which can be an issue 
> when you keep around big .git or virtualenv directories.
> I suggest to add something like:
> {code}
> dirs[:] = [d for d in dirs if not any([re.findall(p, os.path.join(root, d)) 
> for p in patterns])]
> {code}
> to prune the directories here: 
> https://github.com/apache/incubator-airflow/blob/cfc2f73c445074e1e09d6ef6a056cd2b33a945da/airflow/utils/dag_processing.py#L208-L209
>  and in {{list_py_file_paths}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2739) Airflow crashes on startup if LC_ALL env isnt set to utf-8

2018-07-13 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2739.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3593
[https://github.com/apache/incubator-airflow/pull/3593]

> Airflow crashes on startup if LC_ALL env isnt set to utf-8
> --
>
> Key: AIRFLOW-2739
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2739
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: configuration
>Affects Versions: 1.10.0
> Environment: Python 3.6.0, Ubuntu 14.04.5 LTS 
>Reporter: Carl Johan Gustavsson
>Assignee: Carl Johan Gustavsson
>Priority: Major
> Fix For: 1.10.0
>
>
> When running Airflow 1.10.0 RC1 without the LC_ALL environment variable set, 
> Airflow crashes on start with the following trace:
>  
> {code:java}
> Jul 10 08:50:33 hostname supervisord: airflow-webserver-01 Traceback (most 
> recent call last):
> Jul 10 08:50:33 hostname supervisord: airflow-webserver-01   File 
> "/opt/virtualenv/tictail/pipeline/bin/airflow", line 21, in 
> Jul 10 08:50:33 hostname supervisord: airflow-webserver-01     from airflow 
> import configuration
> Jul 10 08:50:33 hostname supervisord: airflow-webserver-01   File 
> "/opt/virtualenv/tictail/pipeline/lib/python3.6/site-packages/airflow/__init__.py",
>  line 35, in 
> Jul 10 08:50:33 hostname supervisord: airflow-webserver-01     from airflow 
> import configuration as conf
> Jul 10 08:50:33 hostname supervisord: airflow-webserver-01   File 
> "/opt/virtualenv/tictail/pipeline/lib/python3.6/site-packages/airflow/configuration.py",
>  line 106, in 
> Jul 10 08:50:33 hostname supervisord: airflow-webserver-01     DEFAULT_CONFIG 
> = f.read()
> Jul 10 08:50:33 hostname supervisord: airflow-webserver-01   File 
> "/opt/virtualenv/tictail/pipeline/lib/python3.6/encodings/ascii.py", line 26, 
> in decode
> Jul 10 08:50:33 hostname supervisord: airflow-webserver-01     return 
> codecs.ascii_decode(input, self.errors)[0]
> Jul 10 08:50:33 hostname supervisord: airflow-webserver-01 
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 20770: 
> ordinal not in range(128)
> {code}
> This is because `config_templates/default_airflow.cfg` contains a non-ASCII 
> character, and if LC_ALL isn't set to `en_US.UTF-8` or similar, Python will 
> assume the file is ASCII.
>  
> The solution would be to always open the config files as UTF-8, regardless of 
> the LC_ALL environment variable.
>  
> This worked up until 
> [https://github.com/apache/incubator-airflow/commit/16bae5634df24132b37eb752fe816f51bf7e83ca]
>  it seems.
>  
> Python versions affected, 3.4.0, 3.5.5, 3.6.0
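
A sketch of the proposed fix, forcing UTF-8 where the packaged default config 
is read (DEFAULT_CONFIG_PATH is a placeholder for whatever 
airflow/configuration.py actually uses):

{code:python}
from io import open  # same as the builtin open on Python 3; gives `encoding=` on Python 2

# Passing an explicit encoding makes the read independent of LC_ALL and the
# locale's default codec. DEFAULT_CONFIG_PATH is a placeholder name.
with open(DEFAULT_CONFIG_PATH, encoding="utf-8") as f:
    DEFAULT_CONFIG = f.read()
{code}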



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2730) Airflow 1.9.0+ Web UI Fails to Load in IE11

2018-07-10 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated AIRFLOW-2730:

Affects Version/s: 1.9.0

> Airflow 1.9.0+ Web UI Fails to Load in IE11
> ---
>
> Key: AIRFLOW-2730
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2730
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui
>Affects Versions: 1.9.0
>Reporter: Cameron Yick
>Priority: Minor
>
> As a developer, I would like to use Airflow in enterprise environments where 
> IE11 is the only browser available.
> Presently, the admin view doesn't load because some of the inlined javascript 
> on the page uses ES6 features, like the array spread operator. 
> Fixing this change will become a lot easier after AIRFLOW-2691 goes through, 
> because transpilation could happen as part of the build process.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2730) Airflow 1.9.0+ Web UI Fails to Load in IE11

2018-07-10 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated AIRFLOW-2730:

Priority: Critical  (was: Blocker)

> Airflow 1.9.0+ Web UI Fails to Load in IE11
> ---
>
> Key: AIRFLOW-2730
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2730
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui
>Reporter: Cameron Yick
>Priority: Critical
>
> As a developer, I would like to use Airflow in enterprise environments where 
> IE11 is the only browser available.
> Presently, the admin view doesn't load because some of the inlined javascript 
> on the page uses ES6 features, like the array spread operator. 
> Fixing this change will become a lot easier after AIRFLOW-2691 goes through, 
> because transpilation could happen as part of the build process.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2730) Airflow 1.9.0+ Web UI Fails to Load in IE11

2018-07-10 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated AIRFLOW-2730:

Priority: Minor  (was: Critical)

> Airflow 1.9.0+ Web UI Fails to Load in IE11
> ---
>
> Key: AIRFLOW-2730
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2730
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui
>Reporter: Cameron Yick
>Priority: Minor
>
> As a developer, I would like to use Airflow in enterprise environments where 
> IE11 is the only browser available.
> Presently, the admin view doesn't load because some of the inlined javascript 
> on the page uses ES6 features, like the array spread operator. 
> Fixing this change will become a lot easier after AIRFLOW-2691 goes through, 
> because transpilation could happen as part of the build process.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2710) Configuration documentation is misleading to the uninitiated

2018-07-08 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2710.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3574
[https://github.com/apache/incubator-airflow/pull/3574]

> Configuration documentation is misleading to the uninitiated
> 
>
> Key: AIRFLOW-2710
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2710
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Matthew Thorley
>Priority: Major
> Fix For: 1.10.0
>
>
> The documentation on the configuration page 
> [https://airflow.apache.org/configuration.html] is slightly misleading. 
> It reads:
> 2. Generate fernet_key, using this code snippet below. fernet_key must be a 
> base64-encoded 32-byte key.
> from cryptography.fernet import Fernet fernet_key= Fernet.generate_key() 
> print(fernet_key) # your fernet_key, keep it in secured place!
>  3. Replace {{airflow.cfg}} fernet_key value with the one from step 2.
> The value returned in step one is something like 
> _b'K_8Yv52REP1qsa7OPupKYJe_CzngMI_KqwfM-2qAyVs='_
> which led me to believe the config was supposed to be
> fernet_key = b'K_8Yv52REP1qsa7OPupKYJe_CzngMI_KqwfM-2qAyVs='
> when in fact it should be
> fernet_key = K_8Yv52REP1qsa7OPupKYJe_CzngMI_KqwfM-2qAyVs=
> I assumed the config parser needed to know it was a byte string and would 
> handle the value correctly. After wasting 30 minutes I was able to figure out 
> the solution, and probably could have arrived at it sooner had I been more 
> familiar with Python.
> But I recommend changing the docs as below to avoid confusion for other new 
> users.
>  
>  _from cryptography.fernet import Fernet fernet_key= Fernet.generate_key() 
> print(fernet_key.decode()) # your fernet_key, keep it in secured place!_
>   
>  
> I'll submit a pr for this shortly.
>  
>  
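
The recommended snippet from the description, cleaned up (Fernet.generate_key() 
returns bytes, so decode before pasting the value into airflow.cfg):

{code:python}
from cryptography.fernet import Fernet

fernet_key = Fernet.generate_key()  # bytes, e.g. b'K_8Yv5...='
print(fernet_key.decode())          # paste this string (without the b'...') into airflow.cfg
{code}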



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2499) Dockerised CI pipeline

2018-06-27 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated AIRFLOW-2499:

Fix Version/s: 2.0.0

> Dockerised CI pipeline
> --
>
> Key: AIRFLOW-2499
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2499
> Project: Apache Airflow
>  Issue Type: Test
>  Components: ci, tests
>Affects Versions: 1.10.0, 1.10
>Reporter: Gerardo Curiel
>Assignee: Gerardo Curiel
>Priority: Major
>  Labels: ci, docker, travis-ci
> Fix For: 2.0.0
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> PR: https://github.com/apache/incubator-airflow/pull/3393
> Currently, running unit tests is a difficult process. Airflow tests depend on 
> many external services and other custom setup, which makes it hard for 
> contributors to work on this codebase. CI builds have also been 
> unreliable[0][1][2][3], and it is hard to reproduce the causes. Having 
> contributors trying to emulate the build environment every time makes it 
> easier to get to an "it works on my machine" sort of situation.
> This PR implements a dockerised version of the current build pipeline. This 
> setup has a few advantages:
>  * TravisCI tests are reproducible locally
>  * The same build setup can be used to create a local development environment 
> (there's a request for it [4])
>  
> *Implementation details*
>  * I'm using Docker Compose for the container orchestration and configuration.
>  * MySQL, PostgreSQL, OpenLDAP, krb5 and rabbitmq are now services running 
> inside their own containers
>  * I created a separate repo, called incubator-airflow-ci[5] (TravisCI build 
> here[6]), where a base image with all dependencies is built. In this case, 
> I'm following the same pattern the CouchDB[7] project follows
>  * Hadoop, Hive and MiniCluster were moved to this base image
>  * The current TravisCI pipeline lives here[8]. A few tests are still 
> failing. It's still WIP.
>  
> *References*
> [0] https://issues.apache.org/jira/browse/AIRFLOW-671
>  [1] https://issues.apache.org/jira/browse/AIRFLOW-968
>  [2] https://issues.apache.org/jira/browse/AIRFLOW-2157
>  [3] https://issues.apache.org/jira/browse/AIRFLOW-2272
>  [4] https://issues.apache.org/jira/browse/AIRFLOW-1042 
>  [5] [https://github.com/gerardo/incubator-airflow-ci]
>  [6] [https://travis-ci.org/gerardo/incubator-airflow-ci]
>  [7] [https://travis-ci.org/apache/couchdb-ci]
>  [8] [https://travis-ci.org/gerardo/incubator-airflow]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2499) Dockerised CI pipeline

2018-06-27 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated AIRFLOW-2499:

Fix Version/s: (was: 1.10)
   (was: 1.10.0)

> Dockerised CI pipeline
> --
>
> Key: AIRFLOW-2499
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2499
> Project: Apache Airflow
>  Issue Type: Test
>  Components: ci, tests
>Affects Versions: 1.10.0, 1.10
>Reporter: Gerardo Curiel
>Assignee: Gerardo Curiel
>Priority: Major
>  Labels: ci, docker, travis-ci
> Fix For: 2.0.0
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> PR: https://github.com/apache/incubator-airflow/pull/3393
> Currently, running unit tests is a difficult process. Airflow tests depend on 
> many external services and other custom setup, which makes it hard for 
> contributors to work on this codebase. CI builds have also been 
> unreliable[0][1][2][3], and it is hard to reproduce the causes. Having 
> contributors trying to emulate the build environment every time makes it 
> easier to get to an "it works on my machine" sort of situation.
> This PR implements a dockerised version of the current build pipeline. This 
> setup has a few advantages:
>  * TravisCI tests are reproducible locally
>  * The same build setup can be used to create a local development environment 
> (there's a request for it [4])
>  
> *Implementation details*
>  * I'm using Docker Compose for the container orchestration and configuration.
>  * MySQL, PostgreSQL, OpenLDAP, krb5 and rabbitmq are now services running 
> inside their own containers
>  * I created a separate repo, called incubator-airflow-ci[5] (TravisCI build 
> here[6]), where a base image with all dependencies is built. In this case, 
> I'm following the same pattern the CouchDB[7] project follows
>  * Hadoop, Hive and MiniCluster were moved to this base image
>  * The current TravisCI pipeline lives here[8]. A few tests are still 
> failing. It's still WIP.
>  
> *References*
> [0] https://issues.apache.org/jira/browse/AIRFLOW-671
>  [1] https://issues.apache.org/jira/browse/AIRFLOW-968
>  [2] https://issues.apache.org/jira/browse/AIRFLOW-2157
>  [3] https://issues.apache.org/jira/browse/AIRFLOW-2272
>  [4] https://issues.apache.org/jira/browse/AIRFLOW-1042 
>  [5] [https://github.com/gerardo/incubator-airflow-ci]
>  [6] [https://travis-ci.org/gerardo/incubator-airflow-ci]
>  [7] [https://travis-ci.org/apache/couchdb-ci]
>  [8] [https://travis-ci.org/gerardo/incubator-airflow]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2606) Test needed to ensure database schema always match SQLAlchemy model types

2018-06-27 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated AIRFLOW-2606:

Fix Version/s: (was: 1.10.0)
   2.0.0

> Test needed to ensure database schema always match SQLAlchemy model types
> -
>
> Key: AIRFLOW-2606
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2606
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Joy Gao
>Assignee: Stefan Seelmann
>Priority: Major
> Fix For: 2.0.0
>
>
> An issue was discovered by [this 
> PR|https://github.com/apache/incubator-airflow/pull/3492#issuecomment-396815203]
>  where the database schema does not match its corresponding SQLAlchemy model 
> declaration. We should add a generic unit test for this to prevent similar bugs 
> from occurring in the future. (Alternatively, we can add the policing logic 
> to the `airflow upgradedb` command so each migration can do the check.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2615) Webserver parent not using cached app

2018-06-27 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated AIRFLOW-2615:

Affects Version/s: 1.10.0

> Webserver parent not using cached app
> -
>
> Key: AIRFLOW-2615
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2615
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
> Fix For: 2.0.0
>
>
> From what I can tell, the app cached 
> [here|https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L790]
>  attempts to cache the app for later use, likely to avoid the expensive 
> DagBag() creation. Before I dive into the problem of the webserver parsing 
> everything in one process, I was hoping this cached app would save me some 
> time. However, it seems that every subprocess spun up by gunicorn creates its 
> own DagBag() right after it is spawned, which makes sense since the cached app 
> is not shared with the subprocesses (and I doubt it can be). If what I 
> observed is true, why do we cache the app at all in the parent process?
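A minimal illustration (not Airflow code) of the point above: an in-memory cache built in the parent CLI process is not visible to separately spawned worker interpreters, so each gunicorn worker has to rebuild its own DagBag. All names below are purely illustrative.

{code}
import subprocess
import sys

# Parent process (think: the `airflow webserver` CLI) builds something expensive
# and keeps it in memory.
expensive_cache = {"dagbag": "pretend this took a long time to build"}

# A separately spawned interpreter (think: a gunicorn worker) starts with a fresh
# address space, so it cannot see the parent's in-memory cache and has to build
# its own copy from scratch.
subprocess.run([sys.executable, "-c", "print('worker starts with cache:', {})"])
print("parent still holds:", expensive_cache)
{code}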



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2615) Webserver parent not using cached app

2018-06-27 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated AIRFLOW-2615:

Fix Version/s: (was: 1.10.0)
   2.0.0

> Webserver parent not using cached app
> -
>
> Key: AIRFLOW-2615
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2615
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
> Fix For: 2.0.0
>
>
> From what I can tell, the app cached 
> [here|https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L790]
>  attempts to cache the app for later use, likely to avoid the expensive 
> DagBag() creation. Before I dive into the problem of the webserver parsing 
> everything in one process, I was hoping this cached app would save me some 
> time. However, it seems that every subprocess spun up by gunicorn creates its 
> own DagBag() right after it is spawned, which makes sense since the cached app 
> is not shared with the subprocesses (and I doubt it can be). If what I 
> observed is true, why do we cache the app at all in the parent process?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2650) SchedulerJob is never marked as succeded

2018-06-27 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2650.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3525
[https://github.com/apache/incubator-airflow/pull/3525]

> SchedulerJob is never marked as succeded
> 
>
> Key: AIRFLOW-2650
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2650
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Ash Berlin-Taylor
>Priority: Major
> Fix For: 1.10.0
>
>
> The SchedulerJob rows are only ever marked success if {{SchedulerJob.run}}
> finishes cleanly -- this will only happen when max_runs or run_duration is 
> specified. Given that this is not recommended anymore, SchedulerJobs are never 
> marked as anything other than running for most people (I have over 400 
> "running" jobs in my system).
> Noticed by [~jackjack10] in Gitter.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2650) SchedulerJob is never marked as succeded

2018-06-27 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated AIRFLOW-2650:

Fix Version/s: (was: 1.10.0)

> SchedulerJob is never marked as succeded
> 
>
> Key: AIRFLOW-2650
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2650
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Ash Berlin-Taylor
>Priority: Major
>
> The SchedulerJob rows are only ever marked success if {{SchedulerJob.run}}
> finishes cleanly -- this will only happen when max_runs or run_duration is 
> specified. Given that this is not recommended anymore, SchedulerJobs are never 
> marked as anything other than running for most people (I have over 400 
> "running" jobs in my system).
> Noticed by [~jackjack10] in Gitter.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2679) GoogleCloudStorageToBigQueryOperator to support UPSERT

2018-06-27 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated AIRFLOW-2679:

Fix Version/s: (was: 1.10)
   (was: 1.10.0)

> GoogleCloudStorageToBigQueryOperator to support UPSERT
> --
>
> Key: AIRFLOW-2679
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2679
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: jack
>Priority: Major
>
> Currently the GoogleCloudStorageToBigQueryOperator supports incremental load 
> using *max_id_key*.
>  
> However, many systems actually need "UPSERT" semantics: if the row exists, 
> update it; if not, insert/copy it.
> Currently the operator assumes that we only need to insert new data; it can't 
> handle updates. Most of the time data is not static, it changes with time: 
> yesterday an order status was NEW, today it's PROCESSING, tomorrow it's SENT, 
> and in a month it will be REFUNDED, etc.
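As a hedged illustration of the requested behaviour, a MERGE statement (as in standard SQL dialects such as BigQuery's) expresses the "update if exists, otherwise insert" semantics; the project, dataset, table and column names below are hypothetical, and the operator itself does not run such a statement today.

{code}
# Hypothetical example: upsert staged rows into a target table keyed on order_id.
merge_sql = """
MERGE `project.dataset.orders` AS target
USING `project.dataset.orders_staging` AS source
ON target.order_id = source.order_id
WHEN MATCHED THEN
  UPDATE SET target.status = source.status
WHEN NOT MATCHED THEN
  INSERT (order_id, status) VALUES (source.order_id, source.status)
"""
# The operator only supports append/incremental loads via max_id_key today, so a
# MERGE like this would have to run as a follow-up query step.
print(merge_sql)
{code}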



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2635) Add discription column to UI

2018-06-27 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated AIRFLOW-2635:

Priority: Minor  (was: Critical)

> Add discription column to UI
> 
>
> Key: AIRFLOW-2635
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2635
> Project: Apache Airflow
>  Issue Type: Wish
>Affects Versions: 1.9.0, 1.10
>Reporter: jack
>Priority: Minor
>  Labels: newbie
> Fix For: 2.0.0
>
>
> Currently in the UI we see the dag_id of the DAG.
> When many users use the UI it's not always clear what the DAG does.
> It would be extremely helpful if the UI showed a short description of the DAG.
>  
> Example:
> DAG: CostReport    Description: Update Athena with costs    Schedule: 
> [0 4 * * *|http://172.26.9.161:8080/admin/dagrun/?flt2_dag_id_equals=OrenReports]
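For reference, a minimal sketch of where such a description could come from on the DAG definition side, assuming the DAG constructor accepts a description keyword; the dag_id, description and schedule are taken from the example above:

{code}
from datetime import datetime

from airflow import DAG

# Assumes the DAG constructor's `description` keyword; the wish above is about
# surfacing this text as a column in the DAG list view.
dag = DAG(
    dag_id="CostReport",
    description="Update Athena with costs",
    schedule_interval="0 4 * * *",
    start_date=datetime(2018, 1, 1),
)
{code}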



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2635) Add discription column to UI

2018-06-27 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated AIRFLOW-2635:

Fix Version/s: (was: 1.10)
   (was: 1.10.0)
   (was: Airflow 2.0)

> Add discription column to UI
> 
>
> Key: AIRFLOW-2635
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2635
> Project: Apache Airflow
>  Issue Type: Wish
>Affects Versions: 1.9.0, 1.10
>Reporter: jack
>Priority: Minor
>  Labels: newbie
> Fix For: 2.0.0
>
>
> Currently in the UI we see the dag_id of the DAG.
> When many users use the UI it's not always clear what the DAG does.
> It would be extremely helpful if the UI showed a short description of the DAG.
>  
> Example:
> DAG: CostReport    Description: Update Athena with costs    Schedule: 
> [0 4 * * *|http://172.26.9.161:8080/admin/dagrun/?flt2_dag_id_equals=OrenReports]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2668) Missing cryptogrpahy dependency on airflow initdb call

2018-06-27 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2668.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3550
[https://github.com/apache/incubator-airflow/pull/3550]

> Missing cryptogrpahy dependency on airflow initdb call
> --
>
> Key: AIRFLOW-2668
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2668
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: dependencies
>Affects Versions: 1.10.0
>Reporter: Nicholas Pezolano
>Priority: Minor
> Fix For: 1.10.0
>
>
> The cryptography package looks to be required now for `airflow initdb` calls 
> on a fresh install from master, as of commit 
> 702a57ec5a96d159105c4f5ca76ddd2229eb2f44.
> $ airflow initdb
>  Traceback (most recent call last):
>  File "/home/n/git/airflow_testing/env/bin/airflow", line 6, in 
>  exec(compile(open(__file__).read(), __file__, 'exec'))
>  File "/home/n/git/incubator-airflow/airflow/bin/airflow", line 21, in 
> 
>  from airflow import configuration
>  File "/home/n/git/incubator-airflow/airflow/__init__.py", line 37, in 
> 
>  from airflow.models import DAG
>  File "/home/n/git/incubator-airflow/airflow/models.py", line 31, in 
>  import cryptography
>  ImportError: No module named cryptography
>  
> Steps to reproduce:
> {code}
> git clone https://github.com/apache/incubator-airflow
> virtualenv env
> . env/bin/activate
> pip install -e .[s3]
> airflow initdb
> {code}
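A minimal sketch of the usual pattern for making such a dependency optional; the helper name is hypothetical and this is an illustration of the idea, not the actual patch.

{code}
# Degrade gracefully when the optional `cryptography` package is not installed,
# instead of failing at import time.
try:
    from cryptography.fernet import Fernet
    HAS_CRYPTO = True
except ImportError:
    Fernet = None
    HAS_CRYPTO = False


def get_fernet_or_fail():
    # Hypothetical helper name; raises a clear error only when encryption is
    # actually needed, not on `airflow initdb`.
    if not HAS_CRYPTO:
        raise RuntimeError("cryptography is not installed; install the "
                           "apache-airflow[crypto] extra to enable fernet encryption")
    return Fernet
{code}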



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2556) Reduce time spent on unit tests

2018-06-03 Thread Bolke de Bruin (JIRA)
Bolke de Bruin created AIRFLOW-2556:
---

 Summary: Reduce time spent on unit tests
 Key: AIRFLOW-2556
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2556
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Bolke de Bruin


Unit tests are taking up way too much time. This costs time and also money 
from the Apache Foundation. We need to reduce this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2557) Reduce time spent in S3 tests

2018-06-03 Thread Bolke de Bruin (JIRA)
Bolke de Bruin created AIRFLOW-2557:
---

 Summary: Reduce time spent in S3 tests
 Key: AIRFLOW-2557
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2557
 Project: Apache Airflow
  Issue Type: Sub-task
Reporter: Bolke de Bruin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2554) Inlets and outlets should be availabe in templates by their fully_qualified name or name

2018-06-02 Thread Bolke de Bruin (JIRA)
Bolke de Bruin created AIRFLOW-2554:
---

 Summary: Inlets and outlets should be availabe in templates by 
their fully_qualified name or name
 Key: AIRFLOW-2554
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2554
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Bolke de Bruin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2425) Maintain lineage and make available in other systems

2018-05-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2425.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3321
[https://github.com/apache/incubator-airflow/pull/3321]

> Maintain lineage and make available in other systems
> 
>
> Key: AIRFLOW-2425
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2425
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Bolke de Bruin
>Priority: Major
> Fix For: 1.10.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2457) Upgrade FAB version in setup.py to support timezone

2018-05-12 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2457.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3349
[https://github.com/apache/incubator-airflow/pull/3349]

> Upgrade FAB version in setup.py to support timezone
> ---
>
> Key: AIRFLOW-2457
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2457
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.10
>Reporter: Joy Gao
>Assignee: Joy Gao
>Priority: Major
> Fix For: 1.10.0
>
>
> FAB 1.9.6 doesn't support datetime with timezones, upgrade to 1.10.0 will fix 
> this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2441) Fix bugs in HiveCliHook.load_df

2018-05-10 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2441.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3334
[https://github.com/apache/incubator-airflow/pull/3334]

> Fix bugs in HiveCliHook.load_df
> ---
>
> Key: AIRFLOW-2441
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2441
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hive_hooks, hooks
>Reporter: Kengo Seki
>Assignee: Kengo Seki
>Priority: Major
> Fix For: 1.10.0
>
>
> {{HiveCliHook.load_df}} has some bugs and doesn't work for now.
> 1. Executing it fails as follows:
> {code}
> In [1]: import pandas as pd
> In [2]: df = pd.DataFrame({"c": ["foo", "bar", "baz"]})
> In [3]: from airflow.hooks.hive_hooks import HiveCliHook
> In [4]: hook = HiveCliHook()
> [2018-05-08 06:38:19,211] {base_hook.py:85} INFO - Using connection to: 
> localhost
> In [5]: hook.load_df(df, "t")
> (snip)
> TypeError: "delimiter" must be string, not unicode
> {code}
> To solve this, "delimiter" parameter should be encoded by "encoding" 
> parameter. The latter is declared but unused for now.
> 2. For small dataset, it loads an empty file into Hive:
> {code}
> In [1]: import pandas as pd
>...: df = pd.DataFrame({"c": ["foo", "bar", "baz"]})
>...: from airflow.hooks.hive_hooks import HiveCliHook
>...: hook = HiveCliHook()
>...: hook.load_df(df, "t")
>...:
> (snip)
> [2018-05-08 20:46:48,883] {hive_hooks.py:231} INFO - Loading data to table 
> default.t
> [2018-05-08 20:46:49,448] {hive_hooks.py:231} INFO - Table default.t stats: 
> [numFiles=1, numRows=0, totalSize=0, rawDataSize=0]
> {code}
> {code}
> hive> SELECT count(*) FROM t;
> (snip)
> OK
> 0
> Time taken: 4.962 seconds, Fetched: 1 row(s)
> {code}
> This is because the file contents are still in the buffer when the LOAD DATA 
> statement is executed. The buffer should be flushed, just like {{HiveCliHook.run_cli}} does.
> 3. Even with fixes for #1 and #2, unexpected data is loaded into Hive:
> {code}
> In [1]: import pandas as pd
>...: df = pd.DataFrame({"c": ["foo", "bar", "baz"]})
>...: from airflow.hooks.hive_hooks import HiveCliHook
>...: hook = HiveCliHook()
>...: hook.load_df(df, "t")
>...:
> (snip)
> [2018-05-08 20:57:17,467] {hive_hooks.py:231} INFO - Loading data to table 
> default.t
> [2018-05-08 20:57:18,163] {hive_hooks.py:231} INFO - Table default.t stats: 
> [numFiles=1, numRows=0, totalSize=21, rawDataSize=0]
> {code}
> {code}
> hive> SELECT * FROM t;
> OK
> 0
> 1
> 2
> Time taken: 2.317 seconds, Fetched: 4 row(s)
> {code}
> This is because {{pandas.DataFrame.to_csv}} writes the row index to the file 
> by default.
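A minimal sketch of the three fixes described above (delimiter encoding, flushing before LOAD DATA, and dropping the row index), assuming only pandas; this is an illustration, not the actual patch.

{code}
import tempfile

import pandas as pd


def dataframe_to_tmp_csv(df, delimiter=","):
    """Write a DataFrame to a temp CSV the way load_df needs it (sketch only)."""
    # Fix 1 (Python 2): the delimiter would additionally need to be encoded with
    # the declared `encoding` before being handed to the csv writer.
    f = tempfile.NamedTemporaryFile(mode="w", suffix=".csv", delete=False)
    df.to_csv(f, sep=delimiter, header=False,
              index=False)   # Fix 3: don't write the row index as a data column
    f.flush()                # Fix 2: flush before LOAD DATA reads the file
    return f.name


print(dataframe_to_tmp_csv(pd.DataFrame({"c": ["foo", "bar", "baz"]})))
{code}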



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2358) make kubernetes examples installed optionally

2018-05-10 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2358.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3315
[https://github.com/apache/incubator-airflow/pull/3315]

> make kubernetes examples installed optionally
> -
>
> Key: AIRFLOW-2358
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2358
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: examples
>Affects Versions: Airflow 2.0, 1.10.0
>Reporter: Ruslan Dautkhanov
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: kubernetes
> Fix For: 1.10.0
>
>
> Is it possible to make kubernetes examples installed optionally?
>  
> We don't use Kubernetes, and a bare Airflow install fills the logs with the following:
>  
> {quote}2018-04-22 19:49:04,718 ERROR - Failed to import: 
> /opt/airflow/airflow-20180420/src/apache-airflow/airflow/example_dags/example_kubernetes_operator.py
> Traceback (most recent call last):
>   File "/opt/airflow/airflow-20180420/src/apache-airflow/airflow/models.py", 
> line 300, in process_file
>     m = imp.load_source(mod_name, filepath)
>   File 
> "/opt/airflow/airflow-20180420/src/apache-airflow/airflow/example_dags/example_kubernetes_operator.py",
>  line 19, in <module>
>     from airflow.contrib.operators.kubernetes_pod_operator import 
> KubernetesPodOperator
>   File 
> "/opt/airflow/airflow-20180420/src/apache-airflow/airflow/contrib/operators/kubernetes_pod_operator.py",
>  line 21, in <module>
>     from airflow.contrib.kubernetes import kube_client, pod_generator, 
> pod_launcher
>   File 
> "/opt/airflow/airflow-20180420/src/apache-airflow/airflow/contrib/kubernetes/pod_launcher.py",
>  line 25, in <module>
>     from kubernetes import watch
> ImportError: No module named kubernetes{quote}
>  
> It would be great to make the examples conditional on which modules are 
> installed when they have external dependencies.
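A minimal sketch of the pattern the example DAG file could use so that a missing optional dependency does not spam the logs (illustration only; disabling examples via the load_examples setting is the other obvious workaround).

{code}
# Only define the Kubernetes example DAG when the optional dependency is importable.
try:
    from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator
    HAS_KUBERNETES = True
except ImportError:
    HAS_KUBERNETES = False

if HAS_KUBERNETES:
    pass  # build the example DAG here
{code}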



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2436) Remove cli_logger in initdb

2018-05-10 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2436.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3330
[https://github.com/apache/incubator-airflow/pull/3330]

> Remove cli_logger in initdb
> ---
>
> Key: AIRFLOW-2436
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2436
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Jin Hyuk Chang
>Assignee: Jin Hyuk Chang
>Priority: Minor
> Fix For: 1.10.0
>
>
> For the initdb operation, the database is not there yet, but cli_logger tries to 
> write an audit log into the database and currently logs an error message (it 
> only logs the error, it does not fail).
> As initdb is a one-time bootstrap operation for Airflow, I think it's fine to 
> remove cli_logger on initdb.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2444) Remove unused option(include_adhoc) in cli backfill command

2018-05-10 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2444.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3337
[https://github.com/apache/incubator-airflow/pull/3337]

> Remove unused option(include_adhoc) in cli backfill command
> ---
>
> Key: AIRFLOW-2444
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2444
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Tao Feng
>Assignee: Tao Feng
>Priority: Major
> Fix For: 1.10.0
>
>
> include_adhoc was removed in this PR 
> (https://github.com/apache/incubator-airflow/pull/1667/files). The 
> option currently doesn't do anything, so just remove it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2447) Fix TestHiveMetastoreHook to run all cases

2018-05-10 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2447.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3341
[https://github.com/apache/incubator-airflow/pull/3341]

> Fix TestHiveMetastoreHook to run all cases
> --
>
> Key: AIRFLOW-2447
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2447
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hive_hooks, tests
>Reporter: Kengo Seki
>Assignee: Kengo Seki
>Priority: Minor
> Fix For: 1.10.0
>
>
> TestHiveMetastoreHook has a method called {{get_databases}} but it isn't 
> executed since its name doesn't start with {{test_}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2445) Allow templating for kubernetes operator

2018-05-10 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2445.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3338
[https://github.com/apache/incubator-airflow/pull/3338]

> Allow templating for kubernetes operator
> 
>
> Key: AIRFLOW-2445
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2445
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Sergio B
>Priority: Major
> Fix For: 1.10.0
>
>
> Allow templating in command, arguments and environment variables
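A hedged sketch of what the requested templating enables, assuming the operator's cmds, arguments and env_vars parameters and the standard {{ ds }} macro:

{code}
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

dag = DAG("k8s_templating_example", start_date=datetime(2018, 1, 1),
          schedule_interval=None)

# Jinja templates in the arguments and environment variables are rendered per
# task instance once those fields are part of the operator's template_fields.
task = KubernetesPodOperator(
    task_id="templated_pod",
    name="templated-pod",
    namespace="default",
    image="python:3.6",
    cmds=["python", "-c"],
    arguments=["print('processing {{ ds }}')"],
    env_vars={"EXECUTION_DATE": "{{ ds }}"},
    dag=dag,
)
{code}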



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2425) Maintain lineage and make available in other systems

2018-05-05 Thread Bolke de Bruin (JIRA)
Bolke de Bruin created AIRFLOW-2425:
---

 Summary: Maintain lineage and make available in other systems
 Key: AIRFLOW-2425
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2425
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Bolke de Bruin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2381) Fix the Flaky ApiPasswordTests

2018-04-27 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2381.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3269
[https://github.com/apache/incubator-airflow/pull/3269]

> Fix the Flaky ApiPasswordTests
> --
>
> Key: AIRFLOW-2381
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2381
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Fokko Driesprong
>Priority: Major
> Fix For: 1.10.0
>
>
> Currently the ApiPasswordTests fail because the DAG is not available in the 
> database. I believe this is an issue with the different tests running in 
> parallel.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2336) Update hive_hook dependencies so that it can work with Python 3

2018-04-25 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2336.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3239
[https://github.com/apache/incubator-airflow/pull/3239]

> Update hive_hook dependencies so that it can work with Python 3
> ---
>
> Key: AIRFLOW-2336
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2336
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks
>Affects Versions: Airflow 1.9.0
>Reporter: Giovanni Lanzani
>Assignee: Giovanni Lanzani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> I already have a new version of the Hive metastore thrift client out. I'm 
> updating it and I will update Airflow accordingly (without changing the API).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2369) Fix failing GCS test

2018-04-24 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2369.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3260
[https://github.com/apache/incubator-airflow/pull/3260]

> Fix failing GCS test
> 
>
> Key: AIRFLOW-2369
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2369
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
> Fix For: 1.10.0
>
>
> The version was hardcoded



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2365) Fix autocommit test issue with SQLite

2018-04-24 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2365.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

> Fix autocommit test issue with SQLite
> -
>
> Key: AIRFLOW-2365
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2365
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Arthur Wiedmer
>Assignee: Arthur Wiedmer
>Priority: Major
> Fix For: 1.10.0
>
>
> In a previous PR, I added a check for an autocommit attribute which fails for 
> SQLite. Correcting the tests now.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2367) High POSTGRES DB CPU utilization

2018-04-23 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448812#comment-16448812
 ] 

Bolke de Bruin commented on AIRFLOW-2367:
-

You really need to provide more metrics and configuration details. The 
scheduler can be really busy when you have a lot of DAGs, and worker memory also 
affects your performance. In other words, let a DBA have a look at what you are 
doing and report back.

> High POSTGRES DB CPU utilization
> 
>
> Key: AIRFLOW-2367
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2367
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 2.0, 1.9.0
>Reporter: John Arnold
>Priority: Major
> Attachments: cpu.png
>
>
> We are seeing steady-state 70-90% CPU utilization. It feels like a missing-index 
> kind of problem: our TPS rate is really low, I'm not seeing any long-running 
> queries, connection counts are reasonable (low hundreds), and locks also look 
> reasonable (not many exclusive/write locks).
> We shut down the webserver and it doesn't go away, so it doesn't seem to be 
> in that part of the code. My guess is that either the scheduler has an 
> inefficient query, or the (Celery) executor code path does.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2068) Mesos Executor should allow specifying optional Docker image before running command

2018-04-23 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2068.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3008
[https://github.com/apache/incubator-airflow/pull/3008]

> Mesos Executor should allow specifying optional Docker image before running 
> command
> ---
>
> Key: AIRFLOW-2068
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2068
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: executor
>Affects Versions: Airflow 1.8
>Reporter: Agraj Mangal
>Assignee: Agraj Mangal
>Priority: Major
> Fix For: 1.10.0
>
>
> In its current form, MesosExecutor schedules tasks on Mesos slaves as plain 
> airflow commands, assuming that the slaves already have Airflow installed and 
> configured on them. This assumption goes against the Mesos philosophy of 
> having a heterogeneous cluster.
> Since Mesos provides an option to pull a Docker image before running the 
> actual task/command, this improvement changes mesos_executor.py so that an 
> optional Docker image containing Airflow can be specified and pulled on the 
> slaves before the actual airflow command runs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-1652) Push DatabricksRunSubmitOperator metadata into XCOM

2018-04-23 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-1652.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #2641
[https://github.com/apache/incubator-airflow/pull/2641]

> Push DatabricksRunSubmitOperator metadata into XCOM
> ---
>
> Key: AIRFLOW-1652
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1652
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Andrew Chen
>Priority: Major
> Fix For: 1.10.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-775) AutoCommit in jdbc hook seems not to turn off if set to false

2018-04-23 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-775.

   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3257
[https://github.com/apache/incubator-airflow/pull/3257]

> AutoCommit in jdbc hook seems not to turn off if set to false
> -
>
> Key: AIRFLOW-775
> URL: https://issues.apache.org/jira/browse/AIRFLOW-775
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: db, hooks
>Reporter: Daniel Lamblin
>Priority: Major
> Fix For: 1.10.0
>
>
> If I use JdbcHook and run with autocommit=false, I still get exceptions when 
> the commit is made, because autocommit mode is on by default and apparently 
> was not set to off.
> This can be worked around by appending ;autocommit=false to the connection 
> host, but that doesn't seem like the intended behavior when passing 
> autocommit=False to the hook's methods.
> The JdbcHook does not seem to have a constructor that could take the JDBC 
> driver, location, host, schema, port, username, and password and work without 
> a set connection id, so working around this in code isn't too straightforward 
> either.
> [2017-01-19 19:03:22,728] {models.py:1286} ERROR - 
> org.netezza.error.NzSQLException: The connection object is in auto-commit mode
> Traceback (most recent call last):
>   File 
> "/Users/daniellamblin/VEnvs/airflow-dags/lib/python2.7/site-packages/airflow/models.py",
>  line 1242, in run
> result = task_copy.execute(context=context)
>   File 
> "/Users/daniellamblin/VEnvs/airflow-dags/lib/python2.7/site-packages/airflow/operators/python_operator.py",
>  line 66, in execute
> return_value = self.python_callable(*self.op_args, **self.op_kwargs)
>   File 
> "/Users/daniellamblin/airflow/dags/dpds/dpds_go_pda_dwd_sku_and_dwd_hist_up_sku_grade.py",
>  line 356, in stage_to_update_tables
> hook.run(sql=sql, autocommit=False)
>   File 
> "/Users/daniellamblin/VEnvs/airflow-dags/lib/python2.7/site-packages/airflow/hooks/dbapi_hook.py",
>  line 134, in run
> conn.commit()
>   File 
> "/Users/daniellamblin/VEnvs/airflow-dags/lib/python2.7/site-packages/jaydebeapi/__init__.py",
>  line 391, in commit
> _handle_sql_exception()
>   File 
> "/Users/daniellamblin/VEnvs/airflow-dags/lib/python2.7/site-packages/jaydebeapi/__init__.py",
>  line 148, in _handle_sql_exception_jpype
> reraise(exc_type, exc_info[1], exc_info[2])
>   File 
> "/Users/daniellamblin/VEnvs/airflow-dags/lib/python2.7/site-packages/jaydebeapi/__init__.py",
>  line 389, in commit
> self.jconn.commit()
> DatabaseError: org.netezza.error.NzSQLException: The connection object is in 
> auto-commit mode
> [2017-01-19 19:03:22,730] {models.py:1306} INFO - Marking task as FAILED.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-766) Skip conn.commit() when in Auto-commit

2018-04-23 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-766.

   Resolution: Fixed
Fix Version/s: (was: Airflow 2.0)
   1.10.0

Issue resolved by pull request #3257
[https://github.com/apache/incubator-airflow/pull/3257]

> Skip conn.commit() when in Auto-commit
> --
>
> Key: AIRFLOW-766
> URL: https://issues.apache.org/jira/browse/AIRFLOW-766
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: db
>Affects Versions: Airflow 2.0
> Environment: Airflow 2.0, IBM Netezza
>Reporter: Pfubar.k
>Assignee: Pfubar.k
>Priority: Major
>  Labels: easyfix
> Fix For: 1.10.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Some JDBC drivers fail when using DbApiHook.run() or DbApiHook.insert_rows().
> I'm using IBM Netezza. When auto-commit mode is on, I get this error message.
> {code}
> NzSQLException: The connection object is in auto-commit mode
> {code}
> conn.commit() needs to be called only when auto-commit mode is off.
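A minimal sketch of the guard described above on a plain DB-API connection; the attribute names are assumptions about the driver, not Airflow code.

{code}
def run_statements(conn, statements, autocommit=False):
    """Sketch of the guard on a DB-API connection (not the actual hook code)."""
    if hasattr(conn, "autocommit"):
        # Assumption: the driver exposes autocommit as a settable attribute.
        conn.autocommit = autocommit
    cur = conn.cursor()
    for statement in statements:
        cur.execute(statement)
    cur.close()
    # Only commit when not in auto-commit mode; some drivers (e.g. the Netezza
    # JDBC driver) raise "The connection object is in auto-commit mode" otherwise.
    if not getattr(conn, "autocommit", False):
        conn.commit()
{code}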



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2364) The autocommit flag can be set on a connection which does not support it.

2018-04-23 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2364.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3257
[https://github.com/apache/incubator-airflow/pull/3257]

> The autocommit flag can be set on a connection which does not support it.
> -
>
> Key: AIRFLOW-2364
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2364
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Arthur Wiedmer
>Assignee: Arthur Wiedmer
>Priority: Minor
> Fix For: 1.10.0
>
>
> We could just add a logging warning when the method is invoked.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2234) Enable insert_rows for PrestoHook

2018-04-23 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2234.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3146
[https://github.com/apache/incubator-airflow/pull/3146]

> Enable insert_rows for PrestoHook
> -
>
> Key: AIRFLOW-2234
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2234
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Reporter: Kengo Seki
>Assignee: Kengo Seki
>Priority: Major
> Fix For: 1.10.0
>
>
> PrestoHook.insert_rows() raises NotImplementedError for now.
> But [Presto 0.126+ allows specifying column names in INSERT 
> queries|https://prestodb.io/docs/current/release/release-0.126.html], so we 
> can leverage DbApiHook.insert_rows() almost as is.
> I think there is no reason to keep it disabled.
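A hedged usage sketch for once insert_rows() is enabled, relying on the DbApiHook.insert_rows(table, rows, target_fields, commit_every) signature; the connection id, table and columns are hypothetical.

{code}
from airflow.hooks.presto_hook import PrestoHook

# Hypothetical connection id, table and columns.
hook = PrestoHook(presto_conn_id="presto_default")
rows = [("2018-04-23", 42), ("2018-04-24", 17)]
hook.insert_rows(
    table="analytics.daily_counts",
    rows=rows,
    target_fields=["ds", "cnt"],  # Presto 0.126+ accepts explicit column names
    commit_every=1000,
)
{code}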



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2357) Use a persisten volume for the Kubernetes logs

2018-04-23 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2357.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3252
[https://github.com/apache/incubator-airflow/pull/3252]

> Use a persisten volume for the Kubernetes logs
> --
>
> Key: AIRFLOW-2357
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2357
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Fokko Driesprong
>Priority: Major
> Fix For: 1.10.0
>
>
> Right now, when a pod exits, the log is lost forever because it is still in 
> the container. By mounting a persistent volume we can easily fix this, and it 
> allows us to have logs on our local machines (minikube). In production you 
> might want to write your logs to GCS/S3/etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2270) Subdag backfill spins on removed tasks

2018-04-23 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2270.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

> Subdag backfill spins on removed tasks
> --
>
> Key: AIRFLOW-2270
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2270
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Winston Huang
>Priority: Major
> Fix For: 1.10.0
>
>
> My understanding is that subdag operators execute via a backfill job which 
> runs in a loop, maintaining the state of the associated tasks and breaking 
> only once all pending tasks have been exhausted: 
> [https://github.com/apache/incubator-airflow/blob/64206615a790c90893d5836da8d2f7159bda23ac/airflow/jobs.py#L2159]
>  
> The issue is that this task instance status is initialized by this method 
> [https://github.com/apache/incubator-airflow/blob/64206615a790c90893d5836da8d2f7159bda23ac/airflow/jobs.py#L2075],
>  which may include tasks with {{state = State.REMOVED}}, i.e. tasks that were 
> previously instantiated in the database but removed from the dag definition. 
> Hence, the task will be missing from this list 
> [https://github.com/apache/incubator-airflow/blob/64206615a790c90893d5836da8d2f7159bda23ac/airflow/jobs.py#L2168]
>  but will exist in {{ti_status.to_run}}. This causes the backfill job to loop 
> indefinitely, since it considers those removed tasks to be pending but 
> doesn't attempt to run them.
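A minimal sketch of the kind of guard described above; apart from State.REMOVED, the helper name is hypothetical and this is not the actual backfill code.

{code}
from airflow.utils.state import State


def runnable_task_instances(task_instances):
    """Drop task instances whose tasks were removed from the DAG definition, so
    the backfill loop does not wait forever on work it will never schedule."""
    return [ti for ti in task_instances if ti.state != State.REMOVED]
{code}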



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-1623) Clearing task in UI does not trigger on_kill method in operator

2018-04-11 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-1623.
-
Resolution: Fixed

Issue resolved by pull request #3204
[https://github.com/apache/incubator-airflow/pull/3204]

> Clearing task in UI does not trigger on_kill method in operator
> ---
>
> Key: AIRFLOW-1623
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1623
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: Airflow 2.0
>Reporter: Richard Penman
>Priority: Major
> Fix For: 1.10.0
>
>
> When a task is cleared in the UI it doesn't call the [operator's on_kill() 
> method|https://github.com/apache/incubator-airflow/blob/b2e1753f5b74ad1b6e0889f7b784ce69623c95ce/airflow/models.py#L2380]
>  to clean up the task. Apparently this is meant to be handled in the 
> [LocalTaskJob.on_kill()|https://github.com/apache/incubator-airflow/blob/b2e1753f5b74ad1b6e0889f7b784ce69623c95ce/airflow/jobs.py#L2512]
>  method, but it currently is not.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2305) Fix CI failure caused by AIRFLOW-2027

2018-04-09 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2305.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3205
[https://github.com/apache/incubator-airflow/pull/3205]

> Fix CI failure caused by AIRFLOW-2027
> -
>
> Key: AIRFLOW-2305
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2305
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ci, tests
>Reporter: Kengo Seki
>Assignee: Kengo Seki
>Priority: Critical
> Fix For: 1.10.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2027) Only trigger sleep in scheduler after all files have parsed

2018-04-09 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2027.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #2986
[https://github.com/apache/incubator-airflow/pull/2986]

> Only trigger sleep in scheduler after all files have parsed
> ---
>
> Key: AIRFLOW-2027
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2027
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Dan Davydov
>Assignee: Dan Davydov
>Priority: Major
> Fix For: 1.10.0
>
>
> The scheduler loop sleeps for 1 second on every iteration unnecessarily. Remove 
> this sleep to slightly speed up scheduling, and instead sleep once all files 
> have been parsed. It can add up since it happens on every scheduler loop, which 
> runs (# of DAGs to parse / scheduler parallelism) times.
> Also remove the unnecessarily increased file-processing interval in tests, 
> which slows them down.
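A minimal sketch of the restructured loop with hypothetical helper names: sleep once per full pass over the DAG files instead of on every iteration.

{code}
import time


def scheduler_loop(dag_file_paths, process_file, loop_interval=1.0):
    """Hypothetical sketch: the per-iteration sleep moves to after the whole
    batch of DAG files has been parsed."""
    while True:
        for path in dag_file_paths:
            process_file(path)        # no sleep between individual files
        time.sleep(loop_interval)     # single sleep after the whole pass
{code}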



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2295) Correct license headers

2018-04-06 Thread Bolke de Bruin (JIRA)
Bolke de Bruin created AIRFLOW-2295:
---

 Summary: Correct license headers
 Key: AIRFLOW-2295
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2295
 Project: Apache Airflow
  Issue Type: Sub-task
Reporter: Bolke de Bruin
 Fix For: 1.10.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2294) Fix missing sub dependency licenses

2018-04-06 Thread Bolke de Bruin (JIRA)
Bolke de Bruin created AIRFLOW-2294:
---

 Summary: Fix missing sub dependency licenses
 Key: AIRFLOW-2294
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2294
 Project: Apache Airflow
  Issue Type: Sub-task
Reporter: Bolke de Bruin
Assignee: Bolke de Bruin
 Fix For: 1.10.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-2252) Add instructions to avoid GPL dependency 'unidecode'

2018-04-06 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin closed AIRFLOW-2252.
---
Resolution: Fixed

> Add instructions to avoid GPL dependency 'unidecode'
> 
>
> Key: AIRFLOW-2252
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2252
> Project: Apache Airflow
>  Issue Type: Sub-task
>Reporter: Bolke de Bruin
>Priority: Major
> Fix For: 1.10.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2290) Include CVE references in changelog

2018-04-06 Thread Bolke de Bruin (JIRA)
Bolke de Bruin created AIRFLOW-2290:
---

 Summary: Include CVE references in changelog
 Key: AIRFLOW-2290
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2290
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Bolke de Bruin
 Fix For: 1.10.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2289) Add additional quick start to INSTALL

2018-04-06 Thread Bolke de Bruin (JIRA)
Bolke de Bruin created AIRFLOW-2289:
---

 Summary: Add additional quick start to INSTALL
 Key: AIRFLOW-2289
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2289
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Bolke de Bruin
 Fix For: 1.10.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2288) Source tarball should not extract to root

2018-04-06 Thread Bolke de Bruin (JIRA)
Bolke de Bruin created AIRFLOW-2288:
---

 Summary: Source tarball should not extract to root
 Key: AIRFLOW-2288
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2288
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Bolke de Bruin
 Fix For: 1.10.0


The src tarball extracting to the current directory was surprising.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2287) Missing and incorrect license headers

2018-04-06 Thread Bolke de Bruin (JIRA)
Bolke de Bruin created AIRFLOW-2287:
---

 Summary: Missing and incorrect license headers
 Key: AIRFLOW-2287
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2287
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Bolke de Bruin
 Fix For: 1.10.0


* a few files are missing licenses, like docs/Makefile
 * please fix the year in the NOTICE ("2016 and onwards" makes it a little hard 
to work out when copyright would expire)
 * LICENSE is OK but some license texts are missing, i.e. Bootstrap Toggle, 
normalize.css, parallel.js. Note that in order to comply with the terms of the 
licenses the full text of each license MUST be included.
 * also note that ace and d3 are under a BSD 3-clause license, not BSD 2-clause
 * A large number of files are missing the correct ASF header (see below).
 ** Re incorrect headers, not perfect but it shows the scope of the issue:
 *** find . -name "*.*" -exec grep "contributor license" {} \; -print | wc
 *** find . -name "*.*" -exec grep "http://www.apache.org/licenses/LICENSE-2.0" {} \; -print | wc



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-1430) GPL licensing issues

2018-03-27 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-1430.
-
Resolution: Fixed

Issue resolved by pull request #3160
[https://github.com/apache/incubator-airflow/pull/3160]

> GPL licensing issues
> 
>
> Key: AIRFLOW-1430
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1430
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.8.1, 1.9.0, 1.8.2
>Reporter: Florian Jetter
>Priority: Critical
> Fix For: 1.10.0
>
>
> The current requirements require the install of the GPLv2 licensed package 
> {{Unidecode}} .
> This is a violation of the ASF licensing conditions, cf.
> https://www.apache.org/legal/resolved.html#category-x
> https://www.apache.org/legal/resolved.html#prohibited
>  The requirement comes in via:
> {{python-nvd3}} https://github.com/areski/python-nvd3
>  -> {{python-slugify}} https://github.com/un33k/python-slugify
> -> {{Unidecode}} https://github.com/avian2/unidecode



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-1430) GPL licensing issues

2018-03-25 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated AIRFLOW-1430:

Fix Version/s: 1.10.0

> GPL licensing issues
> 
>
> Key: AIRFLOW-1430
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1430
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.8.1, 1.9.0, 1.8.2
>Reporter: Florian Jetter
>Priority: Critical
> Fix For: 1.10.0
>
>
> The current requirements require the install of the GPLv2 licensed package 
> {{Unidecode}} .
> This is a violation of the ASF licensing conditions, cf.
> https://www.apache.org/legal/resolved.html#category-x
> https://www.apache.org/legal/resolved.html#prohibited
>  The requirement comes in via:
> {{python-nvd3}} https://github.com/areski/python-nvd3
>  -> {{python-slugify}} https://github.com/un33k/python-slugify
> -> {{Unidecode}} https://github.com/avian2/unidecode



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2252) Add instructions to avoid GPL dependency 'unidecode'

2018-03-25 Thread Bolke de Bruin (JIRA)
Bolke de Bruin created AIRFLOW-2252:
---

 Summary: Add instructions to avoid GPL dependency 'unidecode'
 Key: AIRFLOW-2252
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2252
 Project: Apache Airflow
  Issue Type: Sub-task
Reporter: Bolke de Bruin
 Fix For: 1.10.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1430) GPL licensing issues

2018-03-25 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412934#comment-16412934
 ] 

Bolke de Bruin commented on AIRFLOW-1430:
-

This has become a non-issue. I have created a PR with python-slugify which 
allows for an alternative dependency (`text-unidecode`) that is APL compatible. 

As soon as this has bubbled up in the releases (python-slugify, python-nvd3), we 
can include instructions on how to exclude the GPL code if required.

> GPL licensing issues
> 
>
> Key: AIRFLOW-1430
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1430
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.8.1, 1.9.0, 1.8.2
>Reporter: Florian Jetter
>Priority: Critical
>
> The current requirements require the install of the GPLv2 licensed package 
> {{Unidecode}} .
> This is a violation of the ASF licensing conditions, cf.
> https://www.apache.org/legal/resolved.html#category-x
> https://www.apache.org/legal/resolved.html#prohibited
>  The requirement comes in via:
> {{python-nvd3}} https://github.com/areski/python-nvd3
>  -> {{python-slugify}} https://github.com/un33k/python-slugify
> -> {{Unidecode}} https://github.com/avian2/unidecode



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2060) Clear task raise a TypeError - timezone related

2018-03-24 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2060.
-
   Resolution: Fixed
Fix Version/s: 1.9.1
   1.10.0

Issue resolved by pull request #3154
[https://github.com/apache/incubator-airflow/pull/3154]

> Clear task raise a TypeError - timezone related
> ---
>
> Key: AIRFLOW-2060
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2060
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Fabrice Dossin
>Assignee: Fabrice Dossin
>Priority: Major
> Fix For: 1.10.0, 1.9.1
>
> Attachments: simple_dag_timezone.py
>
>
> Hello,
> Please find attached a simple dag.
> Run it:
>  airflow backfill simple_dag_timezone -s "2018-01-25T00:00:00+01:00" -e 
> "2018-01-25T00:00:00+01:00"
> Then, from the webserver, clearing the task throws an exception:
> {code:java}
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1988, in 
> wsgi_app
> response = self.full_dispatch_request()
>   File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1641, in 
> full_dispatch_request
> rv = self.handle_user_exception(e)
>   File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1544, in 
> handle_user_exception
> reraise(exc_type, exc_value, tb)
>   File "/usr/local/lib/python3.5/dist-packages/flask/_compat.py", line 33, in 
> reraise
> raise value
>   File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1639, in 
> full_dispatch_request
> rv = self.dispatch_request()
>   File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1625, in 
> dispatch_request
> return self.view_functions[rule.endpoint](**req.view_args)
>   File "/usr/local/lib/python3.5/dist-packages/flask_admin/base.py", line 69, 
> in inner
> return self._run_view(f, *args, **kwargs)
>   File "/usr/local/lib/python3.5/dist-packages/flask_admin/base.py", line 
> 368, in _run_view
> return fn(self, *args, **kwargs)
>   File "/usr/local/lib/python3.5/dist-packages/flask_login.py", line 755, in 
> decorated_view
> return func(*args, **kwargs)
>   File "/usr/local/lib/python3.5/dist-packages/airflow/www/utils.py", line 
> 264, in wrapper
> return f(*args, **kwargs)
>   File "/usr/local/lib/python3.5/dist-packages/airflow/www/utils.py", line 
> 311, in wrapper
> return f(*args, **kwargs)
>   File "/usr/local/lib/python3.5/dist-packages/airflow/www/views.py", line 
> 1018, in clear
> include_upstream=upstream)
>   File "/usr/local/lib/python3.5/dist-packages/airflow/models.py", line 3697, 
> in sub_dag
> dag = copy.deepcopy(self)
>   File "/usr/lib/python3.5/copy.py", line 166, in deepcopy
> y = copier(memo)
>   File "/usr/local/lib/python3.5/dist-packages/airflow/models.py", line 3682, 
> in __deepcopy__
> setattr(result, k, copy.deepcopy(v, memo))
>   File "/usr/lib/python3.5/copy.py", line 155, in deepcopy
> y = copier(x, memo)
>   File "/usr/lib/python3.5/copy.py", line 243, in _deepcopy_dict
> y[deepcopy(key, memo)] = deepcopy(value, memo)
>   File "/usr/lib/python3.5/copy.py", line 166, in deepcopy
> y = copier(memo)
>   File "/usr/local/lib/python3.5/dist-packages/airflow/models.py", line 2498, 
> in __deepcopy__
> setattr(result, k, copy.deepcopy(v, memo))
>   File "/usr/lib/python3.5/copy.py", line 182, in deepcopy
> y = _reconstruct(x, rv, 1, memo)
>   File "/usr/lib/python3.5/copy.py", line 291, in _reconstruct
> args = deepcopy(args, memo)
>   File "/usr/lib/python3.5/copy.py", line 155, in deepcopy
> y = copier(x, memo)
>   File "/usr/lib/python3.5/copy.py", line 223, in _deepcopy_tuple
> y = [deepcopy(a, memo) for a in x]
>   File "/usr/lib/python3.5/copy.py", line 223, in 
> y = [deepcopy(a, memo) for a in x]
>   File "/usr/lib/python3.5/copy.py", line 182, in deepcopy
> y = _reconstruct(x, rv, 1, memo)
>   File "/usr/lib/python3.5/copy.py", line 292, in _reconstruct
> y = callable(*args)
> TypeError: __init__() takes 1 positional argument but 6 were given
> {code}
>  
>  
> I dug into this a lot but was only able to tell that the deepcopy was failing on a 
> datetime... I have not been able to find a solution.
>  
> on airflow/models.py", line 2498, in __deepcopy__:
> {code:java}
> setattr(result, k, copy.deepcopy(v, memo))
> {code}
> the k is the default_args dict.
> It fails on the start_date.
> I did not manage to reproduce it outside of airflow
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-85) Create DAGs UI

2018-03-23 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-85.
---
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3015
[https://github.com/apache/incubator-airflow/pull/3015]

> Create DAGs UI
> --
>
> Key: AIRFLOW-85
> URL: https://issues.apache.org/jira/browse/AIRFLOW-85
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security, ui
>Reporter: Chris Riccomini
>Assignee: Joy Gao
>Priority: Major
> Fix For: 1.10.0
>
>
> Airflow currently provides only an {{/admin}} UI interface for the webapp. 
> This UI provides three distinct roles:
>  * Admin
>  * Data profiler
>  * None
> In addition, Airflow currently provides the ability to log in, either via a 
> secure proxy front-end, or via LDAP/Kerberos, within the webapp.
> We run Airflow with LDAP authentication enabled. This helps us control access 
> to the UI. However, there is insufficient granularity within the UI. We would 
> like to be able to grant users the ability to:
>  # View their DAGs, but no one else's.
>  # Control their DAGs, but no one else's.
> This is not possible right now. You can take away the ability to access the 
> connections and data profiling tabs, but users can still see all DAGs, as 
> well as control the state of the DB by clearing any DAG status, etc.
>  
> (From Airflow-1443)
> The authentication capabilities in the [RBAC design 
> proposal|https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+RBAC+proposal]
>  introduces a significant amount of work that is otherwise already built-in 
> in existing frameworks.
> Per [community 
> discussion|https://www.mail-archive.com/dev@airflow.incubator.apache.org/msg02946.html],
>  Flask-AppBuilder (FAB) is the best fit for Airflow as a foundation to 
> implementing RBAC. This will support integration with different 
> authentication backends out-of-the-box, and generate permissions for views 
> and ORM models that will simplify view-level and dag-level access control.
> This implies modifying the current flask views, and deprecating the current 
> Flask-Admin in favor of FAB's crud.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2034) mixup between %s and {} when using str.format

2018-02-25 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2034.
-
   Resolution: Fixed
Fix Version/s: (was: 1.9.0)
   1.10.0

Issue resolved by pull request #2976
[https://github.com/apache/incubator-airflow/pull/2976]

> mixup between %s and {} when using str.format
> -
>
> Key: AIRFLOW-2034
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2034
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks
>Affects Versions: 1.9.0
>Reporter: knil-sama
>Assignee: knil-sama
>Priority: Trivial
>  Labels: easyfix
> Fix For: 1.10.0
>
>
> The convention is to use .format for string formatting outside logging, and lazy 
> %s-style formatting inside logging calls.
>  See the comment in the related issue:
>  #[https://github.com/apache/incubator-airflow/pull/2823/files]
> But some code didn't follow this correctly.
> Problematic cases can be identified with the following command line:
> {{grep -r '%s' ./* | grep '\.format('}}
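> A minimal illustration of the mixup (hypothetical snippet, not taken from the Airflow codebase):
> {code:python}
> import logging
>
> name = "my_task"
>
> # Broken: a %s placeholder mixed into a str.format call is never substituted
> msg = "Running %s with retries={}".format(3)
>
> # Fixed: use {} placeholders consistently outside logging
> msg = "Running {} with retries={}".format(name, 3)
>
> # Logging keeps lazy %s-style formatting so the string is only built if emitted
> logging.getLogger(__name__).info("Running %s with retries=%s", name, 3)
> {code}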



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2102) Add custom_args to Sendgrid personalizations

2018-02-25 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2102.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3035
[https://github.com/apache/incubator-airflow/pull/3035]

> Add custom_args to Sendgrid personalizations
> 
>
> Key: AIRFLOW-2102
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2102
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: contrib
>Reporter: Marcin Szymanski
>Assignee: Marcin Szymanski
>Priority: Major
> Fix For: 1.10.0
>
>
> Add support for {{custom_args}} in personalizations
> [https://sendgrid.com/docs/Classroom/Send/v3_Mail_Send/personalizations.html]
> {{custom_args}} should be passed via {{kwargs}} since other email backends don't 
> support them.
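> A minimal sketch of the intended usage (the exact forwarding through {{kwargs}} is an assumption based on this description, not the merged code):
> {code:python}
> from airflow.utils.email import send_email
>
> # With the SendGrid backend configured, extra keyword arguments such as
> # custom_args are expected to be forwarded into the personalization.
> send_email(
>     to='ops@example.com',
>     subject='Pipeline finished',
>     html_content='<p>Done</p>',
>     custom_args={'dag_id': 'example_dag', 'run_id': 'manual__2018-02-25'},
> )
> {code}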



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-1053) HiveOperator: unicode character in HQL query produces "UnicodeEncodeError: 'ascii' codec can't encode character ..."

2018-02-25 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-1053.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3053
[https://github.com/apache/incubator-airflow/pull/3053]

> HiveOperator: unicode character in HQL query produces "UnicodeEncodeError: 
> 'ascii' codec can't encode character ..."
> 
>
> Key: AIRFLOW-1053
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1053
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: 1.8.0
>Reporter: Tomas Kafka
>Priority: Minor
>  Labels: utf-8
> Fix For: 1.10.0
>
> Attachments: airflow-hive-utf.py
>
>
> Run an attached DAG, for example as:
> {quote}
> airflow test airflow-hive-sample-utf utf-snowman 2017-01-01
> {quote}
> Important part:
> {quote}
> unicode_snowman = unichr(0x2603)
> op_test_select = HiveOperator(
>     task_id='utf-snowman',
>     hql='select \'' + unicode_snowman + '\' as utf_text;',
>     dag=dag)
> {quote}
> It should return a single row with a unicode snowman, but instead ends with 
> error:
> {quote}
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u2603' in 
> position 8: ordinal not in range(128)
> {quote}
> The same applies for unicode characters in external .hql files.
> Why is it a problem? Not because of snowmen, but I need to replace some 
> unicode chars in a Hive ETL query.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2127) Airflow's Alembic migrations globally disable logging

2018-02-25 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2127.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3059
[https://github.com/apache/incubator-airflow/pull/3059]

> Airflow's Alembic migrations globally disable logging
> -
>
> Key: AIRFLOW-2127
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2127
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: db
>Reporter: Matt Davis
>Priority: Major
> Fix For: 1.10.0
>
>
> When running Airflow's 
> {{[upgradedb|https://github.com/apache/incubator-airflow/blob/fc26cade87e181f162ecc8391ae16dccbe6f29c4/airflow/utils/db.py#L295]}},
>  
> {{[resetdb|https://github.com/apache/incubator-airflow/blob/fc26cade87e181f162ecc8391ae16dccbe6f29c4/airflow/utils/db.py#L311]}},
>  and 
> {{[initdb|https://github.com/apache/incubator-airflow/blob/fc26cade87e181f162ecc8391ae16dccbe6f29c4/airflow/utils/db.py#L83]}}
>  functions logging is disabled thereafter for all but the 
> {{sqlalchemy.engine}} and {{alembic}} loggers. This is caused by [this 
> usage|https://github.com/apache/incubator-airflow/blob/fc26cade87e181f162ecc8391ae16dccbe6f29c4/airflow/migrations/env.py#L28]
>  of Python's {{fileConfig}} function, which by default disables all loggers 
> that aren't part of the supplied configuration. (See [Python 2 
> docs|https://docs.python.org/2/library/logging.config.html#logging.config.fileConfig]
>  and [Python 3 
> docs|https://docs.python.org/3/library/logging.config.html#logging.config.fileConfig].)
>  This can be fixed by adding {{disable_existing_loggers=False}} to the call 
> of {{fileConfig}}.
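> A minimal sketch of the proposed change in {{env.py}} (the surrounding Alembic boilerplate is assumed; only the keyword argument is the point):
> {code:python}
> from logging.config import fileConfig
>
> from alembic import context
>
> config = context.config
>
> # disable_existing_loggers=False keeps loggers created before the
> # migration ran (application or test loggers) enabled.
> fileConfig(config.config_file_name, disable_existing_loggers=False)
> {code}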
> This has affected us at Clover Health because we use these database utility 
> functions in some of our tooling, and particularly our _tests_ of the 
> tooling. Having all logging disabled in the midst of our tests makes it more 
> difficult to test our use of logging in completely unrelated parts of our 
> codebase.
> As an example, we were trying to use [pytest's caplog 
> feature|https://docs.pytest.org/en/latest/logging.html#caplog-fixture], but 
> were unable to do so with logging globally disabled by {{fileConfig}}. Here's 
> an example of a test that fails with {{disable_existing_loggers=True}} (the 
> default), but passes with {{disable_existing_loggers=False}}.
> {code}
> import logging
>
> import pytest
>
> import airflow.utils.db as af_db
>
> LOGGER = logging.getLogger(__name__)
>
>
> @pytest.fixture(autouse=True)
> def resetdb():
>     af_db.resetdb()
>
>
> def test_caplog(caplog):
>     LOGGER.info('LINE 1')
>     assert caplog.record_tuples
>     assert 'LINE 1' in caplog.text
> {code}
> I'll submit a pull request shortly to add {{disable_existing_loggers=False}} 
> to Airflow's 
> {{[env.py|https://github.com/apache/incubator-airflow/blob/fc26cade87e181f162ecc8391ae16dccbe6f29c4/airflow/migrations/env.py#L28]}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2146) Initialize default Google BigQuery Connection with valid conn_type & Fix broken DBApiHook

2018-02-25 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2146.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3073
[https://github.com/apache/incubator-airflow/pull/3073]

> Initialize default Google BigQuery Connection with valid conn_type & Fix 
> broken DBApiHook
> -
>
> Key: AIRFLOW-2146
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2146
> Project: Apache Airflow
>  Issue Type: Task
>  Components: contrib, gcp
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Major
> Fix For: 1.10.0
>
>
> `airflow initdb` creates a connection with conn_id='bigquery_default' and 
> conn_type='bigquery'. However, bigquery is not a valid conn_type, according 
> to models.Connection._types, and BigQuery connections should use the 
> google_cloud_platform conn_type.
> Also, as [renanleme|https://github.com/renanleme] mentioned 
> [here|https://github.com/apache/incubator-airflow/pull/3031#issuecomment-368132910],
>  the DAGs he created break when using `get_records()` from 
> BigQueryHook, which extends DbApiHook.
> *Error Log*:
> {code}
> Traceback (most recent call last):
>   File "/src/apache-airflow/airflow/models.py", line 1519, in _run_raw_task
> result = task_copy.execute(context=context)
>   File "/airflow/dags/lib/operators/test_operator.py", line 21, in execute
> records = self._get_db_hook(self.source_conn_id).get_records(self.sql)
>   File "/src/apache-airflow/airflow/hooks/base_hook.py", line 92, in 
> get_records
> raise NotImplementedError()
> {code}
> *Dag*:
> {code:python}
> from datetime import datetime
>
> from airflow import DAG
> from lib.operators.test_operator import TestOperator
>
> default_args = {
>     'depends_on_past': False,
>     'start_date': datetime(2018, 2, 21),
> }
>
> dag = DAG(
>     'test_dag',
>     default_args=default_args,
>     schedule_interval='0 6 * * *'
> )
>
> sql = '''
> SELECT id from YOUR_BIGQUERY_TABLE limit 10
> '''
>
> compare_grouped_event = TestOperator(
>     task_id='test_operator',
>     source_conn_id='gcp_airflow',
>     sql=sql,
>     dag=dag
> )
> {code}
> *Operator*:
> {code:python}
> from airflow.hooks.base_hook import BaseHook
> from airflow.models import BaseOperator
> from airflow.utils.decorators import apply_defaults
>
>
> class TestOperator(BaseOperator):
>
>     @apply_defaults
>     def __init__(
>             self,
>             sql,
>             source_conn_id=None,
>             *args, **kwargs):
>         super(TestOperator, self).__init__(*args, **kwargs)
>         self.sql = sql
>         self.source_conn_id = source_conn_id
>
>     def execute(self, context=None):
>         records = self._get_db_hook(self.source_conn_id).get_records(self.sql)
>         self.log.info('Fetched records from source')
>
>     @staticmethod
>     def _get_db_hook(conn_id):
>         return BaseHook.get_hook(conn_id=conn_id)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2087) Scheduler Report shows incorrect "Total task number"

2018-02-25 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2087.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3074
[https://github.com/apache/incubator-airflow/pull/3074]

> Scheduler Report shows incorrect "Total task number"
> 
>
> Key: AIRFLOW-2087
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2087
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.8, 1.9.0
>Reporter: Daniel Lamblin
>Assignee: Tao Feng
>Priority: Trivial
> Fix For: 1.10.0
>
>
> [https://github.com/apache/incubator-airflow/blob/4751abf8acad766cb576ecfe3a333d68cc693b8c/airflow/models.py#L479]
> This line is printing the same "Total task number" as "Number of DAGs" in the 
> cli tool `airflow list_dags -r`.
> E.g., some output:
> {{---}}
> {{DagBag loading stats for /pang/service/airflow/dags}}
> {{---}}
> {{Number of DAGs: 1143}}
> {{Total task number: 1143}}
> {{DagBag parsing time: 24.900074}}
> {{}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2089) Add on kill for SparkSubmit in Standalone Cluster

2018-02-16 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2089.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3023
[https://github.com/apache/incubator-airflow/pull/3023]

> Add on kill for SparkSubmit in Standalone Cluster
> -
>
> Key: AIRFLOW-2089
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2089
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Milan van der Meer
>Assignee: Milan van der Meer
>Priority: Major
> Fix For: 1.10.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2113) Address missing DagRun callbacks

2018-02-16 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2113.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3038
[https://github.com/apache/incubator-airflow/pull/3038]

> Address missing DagRun callbacks
> 
>
> Key: AIRFLOW-2113
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2113
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Alan Ma
>Assignee: Alan Ma
>Priority: Critical
> Fix For: 1.10.0
>
>
> This originally arose from the missing notification from the on_failure and 
> on_success callback at the dag level. The stack trace is as follows:
> {code:java}
> [2018-02-07 07:00:08,145] \{models.py:2984} DagFileProcessor172 INFO - 
> Executing dag callback function: 
>  .GeneralNotifyFailed instance at 0x7fec9d8ad368>
> [2018-02-07 07:00:08,161] \{models.py:168} DagFileProcessor172 INFO - Filling 
> up the DagBag from /home/charon/.virtualenvs/airflow/airflow_home/dags
> Dag: , paused: False
> Dag: , paused: False
> Dag: , paused: False
> Dag: , paused: False
> Dag: , paused: False
> [2018-02-07 07:00:12,103] \{jobs.py:354} DagFileProcessor172 ERROR - Got an 
> exception! Propagating...
> Traceback (most recent call last):
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py",
>  line 346, in helper
> pickle_dags)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py",
>  line 53, in wrapper
> result = func(*args, **kwargs)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py",
>  line 1586, in process_file
> self._process_dags(dagbag, dags, ti_keys_to_schedule)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py",
>  line 1175, in _process_dags
> dag_run = self.create_dag_run(dag)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py",
>  line 53, in wrapper
> result = func(*args, **kwargs)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/jobs.py",
>  line 747, in create_dag_run
> dag.handle_callback(dr, success=False, reason='dagrun_timeout', 
> session=session)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/utils/db.py",
>  line 53, in wrapper
> result = func(*args, **kwargs)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/airflow/models.py",
>  line 2990, in handle_callback
> d = dagrun.dag or DagBag().get_dag(dag_id=dagrun.dag_id)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py",
>  line 237, in __get__
> return self.impl.get(instance_state(instance), dict_)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py",
>  line 579, in get
> value = state._load_expired(state, passive)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/state.py",
>  line 592, in _load_expired
> self.manager.deferred_scalar_loader(self, toload)
> File 
> "/home/charon/.virtualenvs/airflow/local/lib/python2.7/site-packages/sqlalchemy/orm/loading.py",
>  line 644, in load_scalar_attributes
> (state_str(state)))
> DetachedInstanceError: Instance  is not bound to a 
> Session; attribute refresh operation cannot proceed
> [2018-02-07 07:00:31,003] \{jobs.py:343} DagFileProcessor208 INFO - Started 
> process (PID=7813) to work on 
> /home/charon/.virtualenvs/airflow/airflow_home/dags/charon-airflow/dags/inapp_vendor_sku_breakdown.py
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2094) Jinjafy project_id, region & zone in DataProc{*} Operators

2018-02-09 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2094.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3027
[https://github.com/apache/incubator-airflow/pull/3027]

> Jinjafy project_id, region & zone in DataProc{*} Operators
> --
>
> Key: AIRFLOW-2094
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2094
> Project: Apache Airflow
>  Issue Type: Task
>  Components: contrib, gcp
>Reporter: Kaxil Naik
>Priority: Minor
> Fix For: 1.10.0
>
>
> The project_id, region, and zone in DataProc{*} Operators are not jinjafied. 
> If we template them, we can use Airflow Variables to supply default project_id, 
> region, and zone values.
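> A sketch of what this would enable (operator and parameter names are assumed from the contrib DataProc operators, not taken from a merged change):
> {code:python}
> from datetime import datetime
>
> from airflow import DAG
> from airflow.contrib.operators.dataproc_operator import DataprocClusterCreateOperator
>
> dag = DAG('dataproc_template_example', start_date=datetime(2018, 1, 1), schedule_interval=None)
>
> create_cluster = DataprocClusterCreateOperator(
>     task_id='create_cluster',
>     cluster_name='etl-{{ ds_nodash }}',
>     # Once project_id/region/zone are templated, defaults can come from Airflow Variables
>     project_id='{{ var.value.gcp_project_id }}',
>     region='{{ var.value.gcp_region }}',
>     zone='{{ var.value.gcp_zone }}',
>     num_workers=2,
>     dag=dag,
> )
> {code}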



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2077) S3 list_objects_v2 is a paged response, need to fetch all pages

2018-02-08 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-2077.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3012
[https://github.com/apache/incubator-airflow/pull/3012]

> S3 list_objects_v2 is a paged response, need to fetch all pages
> ---
>
> Key: AIRFLOW-2077
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2077
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: boto3
>Reporter: Niels Zeilemaker
>Assignee: Niels Zeilemaker
>Priority: Minor
> Fix For: 1.10.0
>
>
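> For reference, a minimal sketch of fetching every page with a boto3 paginator (illustrative only, not the Airflow hook code itself):
> {code:python}
> import boto3
>
> client = boto3.client('s3')
> paginator = client.get_paginator('list_objects_v2')
>
> keys = []
> # list_objects_v2 returns at most 1000 keys per call; the paginator
> # follows the continuation token until all pages are consumed.
> for page in paginator.paginate(Bucket='my-bucket', Prefix='data/'):
>     keys.extend(obj['Key'] for obj in page.get('Contents', []))
> {code}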




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

