[jira] [Updated] (AIRFLOW-3109) Default user permission should contain 'can_clear'

2018-09-24 Thread Joy Gao (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao updated AIRFLOW-3109:
-
Description: The default user role is missing the 'can_clear' permission, which 
allows users to clear DAG runs.  (was: There's a bug in the default user 
permission: 'clear' should have been 'can_clear', as FAB automatically prepends 
model permissions with the 'can_' prefix.)

> Default user permission should contain 'can_clear'
> --
>
> Key: AIRFLOW-3109
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3109
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Joy Gao
>Assignee: Joy Gao
>Priority: Major
>
> The default user role is missing the 'can_clear' permission, which allows 
> users to clear DAG runs.
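
To illustrate the 'can_' behavior the description refers to, here is a minimal Flask-AppBuilder sketch (the view below is hypothetical, for illustration only):
{code:python}
# Methods exposed on a Flask-AppBuilder view are registered as permissions
# named 'can_<method>', so a role must hold 'can_clear', not 'clear', to pass
# the @has_access check.
from flask_appbuilder import BaseView, expose, has_access

class DagActionView(BaseView):  # hypothetical view
    route_base = "/dagaction"

    @expose("/clear")
    @has_access  # checks the auto-generated 'can_clear' permission
    def clear(self):
        return "cleared"
{code}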



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3109) Default user permission should contain 'can_clear'

2018-09-24 Thread Joy Gao (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao updated AIRFLOW-3109:
-
Summary: Default user permission should contain 'can_clear'  (was: Default 
user permission should be 'can_clear' instead of 'clear')

> Default user permission should contain 'can_clear'
> --
>
> Key: AIRFLOW-3109
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3109
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Joy Gao
>Assignee: Joy Gao
>Priority: Major
>
> There's a bug in the default user permission: 'clear' should have been 
> 'can_clear', as FAB automatically prepends model permissions with the 'can_' prefix.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3109) Default user permission should be 'can_clear' instead of 'clear'

2018-09-24 Thread Joy Gao (JIRA)
Joy Gao created AIRFLOW-3109:


 Summary: Default user permission should be 'can_clear' instead of 
'clear'
 Key: AIRFLOW-3109
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3109
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Joy Gao
Assignee: Joy Gao


There's a bug in the default user permission: 'clear' should have been 
'can_clear', as FAB automatically prepends model permissions with the 'can_' prefix.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-3072) Only admin can view logs in RBAC UI

2018-09-20 Thread Joy Gao (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-3072.
--
   Resolution: Fixed
Fix Version/s: 1.10.1

> Only admin can view logs in RBAC UI
> ---
>
> Key: AIRFLOW-3072
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3072
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui
>Affects Versions: 1.10.0
>Reporter: Stefan Seelmann
>Assignee: Stefan Seelmann
>Priority: Major
> Fix For: 1.10.1
>
>
> With RBAC enabled, only users with the Admin role can view logs.
> The default roles (excluding Public) include the permission {{can_log}}, which 
> allows opening the /log page; however, the actual log message is loaded with 
> another XHR request, which requires the additional permission 
> {{get_logs_with_metadata}}.
> My suggestion is to add the permission and assign it to the Viewer role. Or is 
> there a reason why only Admin should be able to see logs?
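
For reference, a hedged one-off sketch of granting the missing permission to the Viewer role (the module path and view/permission names follow the 1.10-era RBAC UI but are assumptions, not the verified fix):
{code:python}
from airflow.www_rbac.app import cached_appbuilder

appbuilder = cached_appbuilder()
sm = appbuilder.sm  # Flask-AppBuilder security manager

role = sm.find_role("Viewer")
# Look up (or create) the permission-on-view pair, then attach it to the role.
perm_view = sm.add_permission_view_menu("can_get_logs_with_metadata", "Airflow")
sm.add_permission_role(role, perm_view)
{code}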



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3085) Log viewing not possible in default RBAC setting

2018-09-19 Thread Joy Gao (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620871#comment-16620871
 ] 

Joy Gao commented on AIRFLOW-3085:
--

oops, thanks!

> Log viewing not possible in default RBAC setting
> 
>
> Key: AIRFLOW-3085
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3085
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Joy Gao
>Priority: Major
>
> Aside from the Admin role, all other roles are unable to view logs right now 
> due to a missing permission in the default setting. The permission should be 
> added to Viewer/User/Op as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-3085) Log viewing not possible in default RBAC setting

2018-09-19 Thread Joy Gao (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao closed AIRFLOW-3085.

Resolution: Duplicate

> Log viewing not possible in default RBAC setting
> 
>
> Key: AIRFLOW-3085
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3085
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Joy Gao
>Priority: Major
>
> Aside from the Admin role, all other roles are unable to view logs right now 
> due to a missing permission in the default setting. The permission should be 
> added to Viewer/User/Op as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3085) Log viewing not possible in default RBAC setting

2018-09-18 Thread Joy Gao (JIRA)
Joy Gao created AIRFLOW-3085:


 Summary: Log viewing not possible in default RBAC setting
 Key: AIRFLOW-3085
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3085
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Joy Gao


Aside from the Admin role, all other roles are unable to view logs right now due 
to a missing permission in the default setting. The permission should be added 
to Viewer/User/Op as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2604) dag_id, task_id, execution_date in task_fail should be indexed

2018-06-26 Thread Joy Gao (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-2604.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3539
[https://github.com/apache/incubator-airflow/pull/3539]

> dag_id, task_id, execution_date in task_fail should be indexed
> -
>
> Key: AIRFLOW-2604
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2604
> Project: Apache Airflow
>  Issue Type: Improvement
>Affects Versions: 1.10
>Reporter: Joy Gao
>Assignee: Stefan Seelmann
>Priority: Major
> Fix For: 1.10.0
>
>
> As a follow-up to AIRFLOW-2602, we should index dag_id, task_id and 
> execution_date to make sure the /gantt page (and any other future UIs relying 
> on task_fail) can still be rendered quickly as the table grows in size.
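
A hedged Alembic sketch of the kind of migration described (revision ids are placeholders; the index name follows Airflow's naming convention but is an assumption here):
{code:python}
from alembic import op

revision = "0000aaaa0000"       # placeholder
down_revision = "ffff0000ffff"  # placeholder

def upgrade():
    op.create_index(
        "idx_task_fail_dag_task_date",
        "task_fail",
        ["dag_id", "task_id", "execution_date"],
        unique=False,
    )

def downgrade():
    op.drop_index("idx_task_fail_dag_task_date", table_name="task_fail")
{code}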



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2678) Fix db scheme unit test to remove checking fab models

2018-06-26 Thread Joy Gao (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-2678.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3548
[https://github.com/apache/incubator-airflow/pull/3548]

> Fix db scheme unit test to remove checking fab models
> -
>
> Key: AIRFLOW-2678
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2678
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Tao Feng
>Assignee: Tao Feng
>Priority: Major
> Fix For: 1.10.0
>
>
> Currently Airflow doesn't ship the FAB models, nor a migration script for 
> them. We should skip checking those models in the unit test.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-2624) Airflow webserver broken out of the box

2018-06-26 Thread Joy Gao (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao closed AIRFLOW-2624.

   Resolution: Fixed
Fix Version/s: 1.10.0

> Airflow webserver broken out of the box
> ---
>
> Key: AIRFLOW-2624
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2624
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Blocker
> Fix For: 1.10.0
>
>
> Run `airflow webserver` and then click on any DAG; I get:
> {code}
>   File "/Users/kevin_yang/ext_repos/incubator-airflow/airflow/www/utils.py", 
> line 364, in view_func
> return f(*args, **kwargs)
>   File "/Users/kevin_yang/ext_repos/incubator-airflow/airflow/www/utils.py", 
> line 251, in wrapper
> user = current_user.user.username
> AttributeError: 'NoneType' object has no attribute 'username'
> {code}
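
A hedged sketch of the kind of guard that avoids this crash (the helper is made up; the actual fix may differ):
{code:python}
from flask_login import current_user

def logged_in_username(default="anonymous"):
    # The failing code assumed current_user always wraps a .user object,
    # which is not true for anonymous/unauthenticated sessions.
    user = getattr(current_user, "user", None)
    return getattr(user, "username", default)
{code}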



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2681) Last execution date is not included in UI for externally triggered DAGs

2018-06-26 Thread Joy Gao (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-2681.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3551
[https://github.com/apache/incubator-airflow/pull/3551]

> Last execution date is not included in UI for externally triggered DAGs
> ---
>
> Key: AIRFLOW-2681
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2681
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: David Hatch
>Assignee: David Hatch
>Priority: Major
> Fix For: 1.10.0
>
>
> If a DAG has no schedule and is only externally triggered, the last run's 
> execution date is not included in the UI.
>  
> This is because {{include_externally_triggered}} is not passed to 
> {{get_last_dagrun}} from the {{dags.html}} template. It was passed before 
> this commit: 
> https://github.com/apache/incubator-airflow/commit/0bf7adb209ce969243ffaf4fc5213ff3957cbbc9#diff-f38558559ea1b4c30ddf132b7f223cf9L299.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2615) Webserver parent not using cached app

2018-06-22 Thread Joy Gao (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-2615.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3506
[https://github.com/apache/incubator-airflow/pull/3506]

> Webserver parent not using cached app
> -
>
> Key: AIRFLOW-2615
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2615
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
> Fix For: 1.10.0
>
>
> From what I can tell, the code 
> [here|https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L790]
>  attempts to cache the app for later use, likely because of the expensive 
> DagBag() creation. Before I dive into the problem of the webserver parsing 
> everything in one process, I was hoping this cached app would save me some 
> time. However, it seems that every subprocess spun up by gunicorn creates its 
> own DagBag() right after being spawned, which makes sense since we don't 
> share the cached app with the subprocesses (and I doubt we can). If what I 
> observed is true, why do we cache the app in the parent process at all?
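
A tiny self-contained illustration of the observation (not Airflow's actual code): a module-level cache is per-process, so each forked gunicorn worker rebuilds it unless the master builds the app before forking (e.g. gunicorn's --preload):
{code:python}
import os

_CACHE = {}

def cached_app():
    # The expensive object (DagBag() in Airflow's case) is built once per
    # *process*; forked workers each start with their own empty _CACHE.
    if "app" not in _CACHE:
        print("building app in pid %d" % os.getpid())
        _CACHE["app"] = object()  # stand-in for the real Flask app
    return _CACHE["app"]
{code}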



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2602) Show failed attempts in Gantt view

2018-06-21 Thread Joy Gao (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-2602.
--
   Resolution: Fixed
Fix Version/s: (was: Airflow 2.0)
   1.10.0

Issue resolved by pull request #3492
[https://github.com/apache/incubator-airflow/pull/3492]

> Show failed attempts in Gantt view
> --
>
> Key: AIRFLOW-2602
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2602
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webapp
>Affects Versions: 1.9.0
>Reporter: Stefan Seelmann
>Assignee: Stefan Seelmann
>Priority: Major
> Fix For: 1.10.0
>
> Attachments: Screenshot_2018-06-13_00-13-21.png
>
>
> The Gantt view only shows the last attempt (successful or failed). It would 
> be nice to also visualize failed attempts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2654) NotFoundError in refresh button in new FAB UI

2018-06-21 Thread Joy Gao (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-2654.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3527
[https://github.com/apache/incubator-airflow/pull/3527]

> NotFoundError in refresh button in new FAB UI
> -
>
> Key: AIRFLOW-2654
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2654
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui
>Affects Versions: 1.10.0, 2.0.0
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Major
> Fix For: 1.10.0
>
> Attachments: airflow-refresh-error.png
>
>
> When you click on the *refresh* button, you get "error: NOT FOUND", as shown 
> in the attached image.
> The issue is that the wrong URL is requested when the refresh button is 
> pressed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2606) Test needed to ensure database schema always match SQLAlchemy model types

2018-06-20 Thread Joy Gao (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-2606.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3516
[https://github.com/apache/incubator-airflow/pull/3516]

> Test needed to ensure database schema always match SQLAlchemy model types
> -
>
> Key: AIRFLOW-2606
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2606
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Joy Gao
>Assignee: Stefan Seelmann
>Priority: Major
> Fix For: 1.10.0
>
>
> An issue was discovered by [this 
> PR|https://github.com/apache/incubator-airflow/pull/3492#issuecomment-396815203]
>  where the database schema does not match its corresponding SQLAlchemy model 
> declaration. We should add a generic unit test for this to prevent similar 
> bugs from occurring in the future. (Alternatively, we can add the policing 
> logic to the `airflow upgradedb` command so each migration can do the check.)
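
A hedged sketch of such a generic test (engine/metadata wiring is assumed; type comparison is dialect-sensitive, so a real test would need per-dialect mapping):
{code:python}
from sqlalchemy import inspect

def assert_models_match_schema(engine, metadata):
    # Compare each mapped column's type name against what the live database
    # reports, via SQLAlchemy's inspector.
    inspector = inspect(engine)
    for table in metadata.sorted_tables:
        db_cols = {c["name"]: c["type"]
                   for c in inspector.get_columns(table.name)}
        for col in table.columns:
            assert col.name in db_cols, \
                "missing column %s.%s" % (table.name, col.name)
            db_type, model_type = db_cols[col.name], col.type
            assert type(db_type).__name__ == type(model_type).__name__, \
                "type mismatch for %s.%s: db=%r model=%r" % (
                    table.name, col.name, db_type, model_type)
{code}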



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2606) Test needed to ensure database schema always match SQLAlchemy model types

2018-06-13 Thread Joy Gao (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao updated AIRFLOW-2606:
-
Summary: Test needed to ensure database schema always match SQLAlchemy 
model types  (was: Test needed to ensure database schema always match 
SQLAlchemy models)

> Test needed to ensure database schema always match SQLAlchemy model types
> -
>
> Key: AIRFLOW-2606
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2606
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Joy Gao
>Priority: Major
>
> An issue was discovered by [this 
> PR|https://github.com/apache/incubator-airflow/pull/3492#issuecomment-396815203]
>  where the database schema does not match its corresponding SQLAlchemy model 
> declaration. We should add a generic unit test for this to prevent similar 
> bugs from occurring in the future. (Alternatively, we can add the policing 
> logic to the `airflow upgradedb` command so each migration can do the check.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2606) Test needed to ensure database schema always match SQLAlchemy models

2018-06-13 Thread Joy Gao (JIRA)
Joy Gao created AIRFLOW-2606:


 Summary: Test needed to ensure database schema always match 
SQLAlchemy models
 Key: AIRFLOW-2606
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2606
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Joy Gao


An issue was discovered by [this 
PR|https://github.com/apache/incubator-airflow/pull/3492#issuecomment-396815203]
 where the database schema does not match its corresponding SQLAlchemy model 
declaration. We should add a generic unit test for this to prevent similar bugs 
from occurring in the future. (Alternatively, we can add the policing logic to 
the `airflow upgradedb` command so each migration can do the check.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-2414) Fix RBAC log display

2018-06-12 Thread Joy Gao (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao closed AIRFLOW-2414.

Resolution: Fixed

> Fix RBAC log display
> 
>
> Key: AIRFLOW-2414
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2414
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Affects Versions: 1.10.0
>Reporter: Oleg Yamin
>Assignee: Oleg Yamin
>Priority: Major
> Fix For: 1.10.0
>
>
> Getting the following error when trying to view the log file in the new RBAC UI.
> {code:java}
> [2018-05-02 17:49:47,716] ERROR in app: Exception on /log [GET]
> Traceback (most recent call last):
>  File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1982, in 
> wsgi_app
>  response = self.full_dispatch_request()
>  File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1614, in 
> full_dispatch_request
>  rv = self.handle_user_exception(e)
>  File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1517, in 
> handle_user_exception
>  reraise(exc_type, exc_value, tb)
>  File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1612, in 
> full_dispatch_request
>  rv = self.dispatch_request()
>  File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1598, in 
> dispatch_request
>  return self.view_functions[rule.endpoint](**req.view_args)
>  File 
> "/usr/lib/python2.7/site-packages/flask_appbuilder/security/decorators.py", 
> line 26, in wraps
>  return f(self, *args, **kwargs)
>  File "/usr/lib/python2.7/site-packages/airflow/www_rbac/decorators.py", line 
> 55, in wrapper
>  return f(*args, **kwargs)
>  File "/usr/lib/python2.7/site-packages/airflow/utils/db.py", line 74, in 
> wrapper
>  return func(*args, **kwargs)
>  File "/usr/lib/python2.7/site-packages/airflow/www_rbac/views.py", line 456, 
> in log
>  logs = log.decode('utf-8')
> AttributeError: 'list' object has no attribute 'decode'{code}
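
Per the traceback, the view treated handler.read(ti)'s return value as a single bytestring, but it is a list of chunks. A minimal normalization sketch (the helper name is made up):
{code:python}
def normalize_logs(raw):
    # Accept either a single bytestring or a list of chunks, and return a
    # list of unicode strings either way.
    chunks = raw if isinstance(raw, list) else [raw]
    return [c.decode("utf-8") if isinstance(c, bytes) else c for c in chunks]
{code}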



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2414) Fix RBAC log display

2018-06-12 Thread Joy Gao (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510358#comment-16510358
 ] 

Joy Gao commented on AIRFLOW-2414:
--

A fix for this was merged recently: 
[https://github.com/apache/incubator-airflow/pull/3310]

[~rushtokunal] let me know if you are still seeing issues with this. Going to 
close the ticket for now.

> Fix RBAC log display
> 
>
> Key: AIRFLOW-2414
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2414
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Affects Versions: 1.10.0
>Reporter: Oleg Yamin
>Assignee: Oleg Yamin
>Priority: Major
> Fix For: 1.10.0
>
>
> Getting the following error when trying to view the log file in the new RBAC UI.
> {code:java}
> [2018-05-02 17:49:47,716] ERROR in app: Exception on /log [GET]
> Traceback (most recent call last):
>  File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1982, in 
> wsgi_app
>  response = self.full_dispatch_request()
>  File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1614, in 
> full_dispatch_request
>  rv = self.handle_user_exception(e)
>  File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1517, in 
> handle_user_exception
>  reraise(exc_type, exc_value, tb)
>  File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1612, in 
> full_dispatch_request
>  rv = self.dispatch_request()
>  File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1598, in 
> dispatch_request
>  return self.view_functions[rule.endpoint](**req.view_args)
>  File 
> "/usr/lib/python2.7/site-packages/flask_appbuilder/security/decorators.py", 
> line 26, in wraps
>  return f(self, *args, **kwargs)
>  File "/usr/lib/python2.7/site-packages/airflow/www_rbac/decorators.py", line 
> 55, in wrapper
>  return f(*args, **kwargs)
>  File "/usr/lib/python2.7/site-packages/airflow/utils/db.py", line 74, in 
> wrapper
>  return func(*args, **kwargs)
>  File "/usr/lib/python2.7/site-packages/airflow/www_rbac/views.py", line 456, 
> in log
>  logs = log.decode('utf-8')
> AttributeError: 'list' object has no attribute 'decode'{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2585) Bug fix in CassandraHook and CassandraToGoogleCloudStorageOperator

2018-06-11 Thread Joy Gao (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao updated AIRFLOW-2585:
-
Description: 
* Issue with UUID type conversion: currently a UUID is converted to a hex 
string, but it should be base64-encoded, as that is the format BigQuery 
requires for uploads.
 * Issue with configuring the load balancing policy in CassandraHook: currently 
the hook only instantiates successfully with the default LB policy, and throws 
an exception when a custom LB policy is passed in the extra field.
 * Issue with connections not being closed properly after use: the operator 
should always shut down the cluster to close all sessions/connections 
associated with the cluster instance.

 

  was:
* Issue with UUID type conversion: currently a UUID is converted to a hex 
string, but it should be base64-encoded, as that is the format BigQuery 
requires for uploads.
 * Issue with configuring the load balancing policy in CassandraHook: currently 
the hook only instantiates successfully with the default LB policy, and throws 
an exception when a custom LB policy is passed in the extra field.

 


> Bug fix in CassandraHook and CassandraToGoogleCloudStorageOperator
> --
>
> Key: AIRFLOW-2585
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2585
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Joy Gao
>Assignee: Joy Gao
>Priority: Major
>
> * Issue with UUID type conversion: currently a UUID is converted to a hex 
> string, but it should be base64-encoded, as that is the format BigQuery 
> requires for uploads.
>  * Issue with configuring the load balancing policy in CassandraHook: 
> currently the hook only instantiates successfully with the default LB policy, 
> and throws an exception when a custom LB policy is passed in the extra field.
>  * Issue with connections not being closed properly after use: the operator 
> should always shut down the cluster to close all sessions/connections 
> associated with the cluster instance.
>  
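
To make the first bullet concrete, a small runnable sketch of the two encodings of a UUID:
{code:python}
import base64
import uuid

u = uuid.uuid4()
hex_form = u.hex                                      # what the hook produced
b64_form = base64.b64encode(u.bytes).decode("ascii")  # what BigQuery expects
print(hex_form, b64_form)
{code}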



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-1115) github enterprise auth fail to fetch user info

2018-06-11 Thread Joy Gao (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-1115.
--
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request #3469
[https://github.com/apache/incubator-airflow/pull/3469]

> github enterprise auth fail to fetch user info
> --
>
> Key: AIRFLOW-1115
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1115
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security
>Affects Versions: Airflow 1.8
>Reporter: Deo
>Assignee: Deo
>Priority: Major
> Fix For: 2.0.0
>
>
> [2017-04-17 13:30:50,540] [68622] {github_enterprise_auth.py:216} ERROR -
> Traceback (most recent call last):
>   File 
> "/xxx/.virtualenvs/airflow/lib/python3.5/site-packages/airflow/contrib/auth/backends/github_enterprise_auth.py",
>  line 210, in oauth_callback
> username, email = self.get_ghe_user_profile_info(ghe_token)
>   File 
> "/xxx/.virtualenvs/airflow/lib/python3.5/site-packages/airflow/contrib/auth/backends/github_enterprise_auth.py",
>  line 140, in get_ghe_user_profile_info
> resp.status if resp else 'None'))
> airflow.contrib.auth.backends.github_enterprise_auth.AuthenticationError: 
> Failed to fetch user profile, status (404)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2585) Bug fix in CassandraHook and CassandraToGoogleCloudStorageOperator

2018-06-08 Thread Joy Gao (JIRA)
Joy Gao created AIRFLOW-2585:


 Summary: Bug fix in CassandraHook and 
CassandraToGoogleCloudStorageOperator
 Key: AIRFLOW-2585
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2585
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Joy Gao
Assignee: Joy Gao


* Issue with UUID type conversion: currently a UUID is converted to a hex 
string, but it should be base64-encoded, as that is the format BigQuery 
requires for uploads.
 * Issue with configuring the load balancing policy in CassandraHook: currently 
the hook only instantiates successfully with the default LB policy, and throws 
an exception when a custom LB policy is passed in the extra field.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2573) Cast TIMESTAMP field to float rather than int

2018-06-07 Thread Joy Gao (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-2573.
--
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request #3471
[https://github.com/apache/incubator-airflow/pull/3471]

> Cast TIMESTAMP field to float rather than int
> -
>
> Key: AIRFLOW-2573
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2573
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Hongyi Wang
>Assignee: Hongyi Wang
>Priority: Blocker
> Fix For: 2.0.0
>
>
> In the current bigquery_hook.py, we have a `_bq_cast(string_field, bq_type)` 
> function that helps cast a BigQuery row to the appropriate data types.
> {quote}elif bq_type == 'INTEGER' or bq_type == 'TIMESTAMP':
>     return int(string_field)
> {quote}
> However, when bq_type is 'TIMESTAMP', this causes a ValueError:
> {quote}>>> int('1.458668898E9')
> ValueError: invalid literal for int() with base 10: '1.458668898E9'
> {quote}
> Because a BigQuery 'TIMESTAMP' is stored as a double in Python, it should be 
> cast to float instead.
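
A hedged sketch of the corrected cast (simplified from the real _bq_cast; the BOOLEAN and None branches are approximations):
{code:python}
def _bq_cast(string_field, bq_type):
    # TIMESTAMP values arrive as strings like '1.458668898E9', which int()
    # rejects but float() parses, so cast TIMESTAMP alongside FLOAT.
    if string_field is None:
        return None
    elif bq_type == 'INTEGER':
        return int(string_field)
    elif bq_type in ('FLOAT', 'TIMESTAMP'):
        return float(string_field)
    elif bq_type == 'BOOLEAN':
        return string_field == 'true'
    else:
        return string_field
{code}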



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2504) Airflow UI Auditing - log username show extra filter

2018-06-01 Thread Joy Gao (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-2504.
--
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request #3438
[https://github.com/apache/incubator-airflow/pull/3438]

> Airflow UI Auditing - log username show extra filter
> 
>
> Key: AIRFLOW-2504
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2504
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Junda Yang
>Assignee: Junda Yang
>Priority: Minor
> Fix For: 2.0.0
>
>
> 1. There is a bug in the 
> [action_logging|https://github.com/apache/incubator-airflow/blob/1f0a717b65e0ea7e0127708b084baff0697f0946/airflow/www/utils.py#L249]
>  of the old UI. The *username* attribute is always present on *current_user*, 
> but it is *None*; we should call *current_user.user.username* to get the 
> username instead. See example usage of 
> [current_user.user.username|https://github.com/apache/incubator-airflow/blob/1f0a717b65e0ea7e0127708b084baff0697f0946/airflow/www/views.py#L1929]
> 2. We also need to add a column filter on *extra* so we can search request 
> content, e.g. who sent what kind of write request from the Airflow UI, as 
> action_logging is [logging all request 
> parameters|https://github.com/apache/incubator-airflow/blob/1f0a717b65e0ea7e0127708b084baff0697f0946/airflow/www/utils.py#L258]
>  in the extra field.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2551) Encode binary data with base64 standard rather than base64 url

2018-06-01 Thread Joy Gao (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-2551.
--
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request #3449
[https://github.com/apache/incubator-airflow/pull/3449]

> Encode binary data with base64 standard rather than base64 url
> --
>
> Key: AIRFLOW-2551
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2551
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Hongyi Wang
>Assignee: Hongyi Wang
>Priority: Major
> Fix For: 2.0.0
>
>
> When we try to load MySQL data into Google BigQuery (mysql -> gcs -> bq), 
> there is a binary field (uuid) that causes the BigQuery job to fail with the 
> message "_Could not decode base64 string to bytes. Field: uuid; Value: 
> _gJbkmC1QTiS-zZ46uiHWg==_"
> This was caused by "_col_val = base64.urlsafe_b64encode(col_val)_" in 
> mysql_to_gcs_operator.
> We should use "_standard_b64encode()_" instead.
> {quote}{{Base64url encoding is basically base64 encoding, except it uses 
> non-reserved URL characters (e.g. - is used instead of + and _ is used 
> instead of /)}}
> {quote}
> Related to [AIRFLOW-2169]
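
For illustration, the two alphabets on a 16-byte value chosen to reproduce the '-' vs '+' difference:
{code:python}
import base64

raw = bytes(bytearray([0x80, 0x96, 0xe4, 0x98, 0x2d, 0x50, 0x4e, 0x24,
                       0xbe, 0xcd, 0x9e, 0x3a, 0xba, 0x21, 0xd6, 0x83]))
print(base64.urlsafe_b64encode(raw))   # b'gJbkmC1QTiS-zZ46uiHWgw=='
print(base64.standard_b64encode(raw))  # b'gJbkmC1QTiS+zZ46uiHWgw=='
{code}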



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2529) Improve graph view performance and usability

2018-06-01 Thread Joy Gao (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-2529.
--
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request #3441
[https://github.com/apache/incubator-airflow/pull/3441]

> Improve graph view performance and usability
> 
>
> Key: AIRFLOW-2529
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2529
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webapp
>Affects Versions: 1.9.0
>Reporter: Stefan Seelmann
>Assignee: Stefan Seelmann
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: Screenshot_2018-05-28_21-32-38.png
>
>
> The "Graph View" has a dropdown which contains all DAG run IDs. If there are 
> many (thousands) of DAG runs the page gets barely usable. It takes multiple 
> seconds to load the page because all DAG runs must be fetched from DB, are 
> processed, and a long option list is rendered in the browser. It is also not 
> very useful because in such a long list it is hard to find a particular DAG 
> run.
> A simple fix to address the load time would be to just limit the number of 
> shown DAG runs. For example only the latest N are shown, N could be 
> "page_size" from airflow.cfg which is also used in other views. If the DAG 
> run that should be shown (via query parameters execution_date or run_id) is 
> not included in the N lastest list it can still be added by a 2nd SQL query.
> A more complex change to improve usability would require a different way to 
> select a DAG run. For example a popup to search for DAG runs with pagination 
> etc. But such functionality already exits in the /dagrun UI.
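
A hedged sketch of the simple fix proposed above (model and session helpers follow Airflow conventions but are assumptions, not the merged change):
{code:python}
from airflow.models import DagRun
from airflow.utils.db import provide_session

@provide_session
def dag_runs_for_dropdown(dag_id, selected_execution_date=None, limit=25,
                          session=None):
    # Fetch only the latest N runs, plus the requested run if it falls
    # outside that window (the second SQL query mentioned above).
    q = (session.query(DagRun)
         .filter(DagRun.dag_id == dag_id)
         .order_by(DagRun.execution_date.desc()))
    runs = q.limit(limit).all()
    if selected_execution_date and all(
            r.execution_date != selected_execution_date for r in runs):
        extra = q.filter(
            DagRun.execution_date == selected_execution_date).first()
        if extra:
            runs.append(extra)
    return runs
{code}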



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2477) Improve time units for task duration and landing times charts for RBAC UI

2018-05-17 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-2477.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3368
[https://github.com/apache/incubator-airflow/pull/3368]

> Improve time units for task duration and landing times charts for RBAC UI
> -
>
> Key: AIRFLOW-2477
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2477
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Tao Feng
>Assignee: Tao Feng
>Priority: Major
> Fix For: 1.10.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2474) Should not attempt to import snakebite in py3

2018-05-16 Thread Joy Gao (JIRA)
Joy Gao created AIRFLOW-2474:


 Summary: Should not attempt to import snakebite in py3
 Key: AIRFLOW-2474
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2474
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Joy Gao
Assignee: Joy Gao


Patch the HDFSHook module to stop importing snakebite in PY3.
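
A minimal sketch of the guard (the import line mirrors what the HDFS hook needs; the surrounding flag is illustrative):
{code:python}
import sys

snakebite_loaded = False
if sys.version_info[0] == 2:
    # snakebite is Python 2 only; don't even attempt the import on py3.
    try:
        from snakebite.client import Client, HAClient, Namenode  # noqa: F401
        snakebite_loaded = True
    except ImportError:
        pass
{code}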



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2458) Create CassandraToGoogleCloudStorageOperator and CassandraHook

2018-05-12 Thread Joy Gao (JIRA)
Joy Gao created AIRFLOW-2458:


 Summary: Create CassandraToGoogleCloudStorageOperator and 
CassandraHook
 Key: AIRFLOW-2458
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2458
 Project: Apache Airflow
  Issue Type: New Feature
Affects Versions: 1.10.0
Reporter: Joy Gao
Assignee: Joy Gao


Create an operator that allows storing Cassandra CQL query results in Google 
Cloud Storage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2457) Upgrade FAB version in setup.py to support timezone

2018-05-11 Thread Joy Gao (JIRA)
Joy Gao created AIRFLOW-2457:


 Summary: Upgrade FAB version in setup.py to support timezone
 Key: AIRFLOW-2457
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2457
 Project: Apache Airflow
  Issue Type: Bug
Affects Versions: 1.10
Reporter: Joy Gao
Assignee: Joy Gao


FAB 1.9.6 doesn't support datetimes with timezones; upgrading to 1.10.0 will 
fix this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2414) Fix RBAC log display

2018-05-07 Thread Joy Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466714#comment-16466714
 ] 

Joy Gao commented on AIRFLOW-2414:
--

 

Hmm, interesting:
{code:java}
if ti is None:
    logs = ["*** Task instance did not exist in the DB\n"]
else:
    logger = logging.getLogger('airflow.task')
    task_log_reader = conf.get('core', 'task_log_reader')
    handler = next((handler for handler in logger.handlers
                    if handler.name == task_log_reader), None)
    try:
        ti.task = dag.get_task(ti.task_id)
        logs = handler.read(ti)
    except AttributeError as e:
        logs = ["Task log handler {} does not support read "
                "logs.\n{}\n".format(task_log_reader, str(e))]

for i, log in enumerate(logs):
    if PY2 and not isinstance(log, unicode):
        logs[i] = log.decode('utf-8')
{code}

Each log entry should already be a string; I wonder whether this bug is related 
to subdags. Can you print out the list object and see what it contains?

> Fix RBAC log display
> 
>
> Key: AIRFLOW-2414
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2414
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Affects Versions: 1.10.0
>Reporter: Oleg Yamin
>Assignee: Oleg Yamin
>Priority: Major
> Fix For: 1.10.0
>
>
> Getting the following error when trying to view the log file in the new RBAC UI.
> {code:java}
> [2018-05-02 17:49:47,716] ERROR in app: Exception on /log [GET]
> Traceback (most recent call last):
>  File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1982, in 
> wsgi_app
>  response = self.full_dispatch_request()
>  File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1614, in 
> full_dispatch_request
>  rv = self.handle_user_exception(e)
>  File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1517, in 
> handle_user_exception
>  reraise(exc_type, exc_value, tb)
>  File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1612, in 
> full_dispatch_request
>  rv = self.dispatch_request()
>  File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1598, in 
> dispatch_request
>  return self.view_functions[rule.endpoint](**req.view_args)
>  File 
> "/usr/lib/python2.7/site-packages/flask_appbuilder/security/decorators.py", 
> line 26, in wraps
>  return f(self, *args, **kwargs)
>  File "/usr/lib/python2.7/site-packages/airflow/www_rbac/decorators.py", line 
> 55, in wrapper
>  return f(*args, **kwargs)
>  File "/usr/lib/python2.7/site-packages/airflow/utils/db.py", line 74, in 
> wrapper
>  return func(*args, **kwargs)
>  File "/usr/lib/python2.7/site-packages/airflow/www_rbac/views.py", line 456, 
> in log
>  logs = log.decode('utf-8')
> AttributeError: 'list' object has no attribute 'decode'{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2431) Add the navigation bar color parameter for RBAC UI

2018-05-07 Thread Joy Gao (JIRA)
Joy Gao created AIRFLOW-2431:


 Summary: Add the navigation bar color parameter for RBAC UI
 Key: AIRFLOW-2431
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2431
 Project: Apache Airflow
  Issue Type: New Feature
Reporter: Licht Takeuchi
Assignee: Licht Takeuchi
 Fix For: 2.0.0


We operate multiple Airflow instances (e.g. production, staging), so we cannot 
tell at a glance which Airflow we are looking at. This feature lets us 
distinguish instances by the color of the navigation bar.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AIRFLOW-2321) RBAC support from new UI's failing on OAuth authentication method

2018-04-13 Thread Joy Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437789#comment-16437789
 ] 

Joy Gao edited comment on AIRFLOW-2321 at 4/13/18 7:32 PM:
---

I replicated the issue you described above.

The work-around is:

(1) Clear the ab_user table 

(2) Set the following config in webserver_config.py
{code:java}
AUTH_USER_REGISTRATION = True  # Will allow user self registration
AUTH_USER_REGISTRATION_ROLE = "Admin"  # The default user self registration 
role {code}
(3) Register the admin user via the UI (do not use the `create_user` command)

(4) Change
{code:java}
AUTH_USER_REGISTRATION = False{code}
to prevent others from registering, or set 
{code:java}
AUTH_USER_REGISTRATION_ROLE = "Viewer"  # or User/Op{code}
to allow view-only self-registration for others.

The reason the 'Invalid login. Please try again.' error appears is that the 
username is incorrect. Flask-Appbuilder generates its own username during the 
OAuth flow (for example, for Google OAuth, it takes the "id" of the user in the 
OAuth response and prefixes it with 'google_', so it would look something 
like `google_)

In the case where a user is created manually via the `create_user` command, I'd 
assume this username is different, so it fails to authenticate.

I don't have a good sense of how to retrieve this id other than through OAuth 
at the moment, so self-registration is the best flow.
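
Consolidated, the workaround's webserver_config.py would look roughly like this (a sketch; the OAuth provider configuration itself is elided):
{code:python}
from flask_appbuilder.security.manager import AUTH_OAUTH

AUTH_TYPE = AUTH_OAUTH                 # plus the usual OAUTH_PROVIDERS list
AUTH_USER_REGISTRATION = True          # step (2): allow self-registration
AUTH_USER_REGISTRATION_ROLE = "Admin"  # first login becomes Admin; afterwards,
                                       # per step (4), set this to "Viewer"
                                       # (or User/Op) or flip the flag to False
{code}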


was (Author: joygao):
I replicated the issue you described above.

The work-around is:

(1) Clear the ab_user table 

(2) Set the following config in webserver_config.py
{code:java}
AUTH_USER_REGISTRATION = True  # Will allow user self registration
AUTH_USER_REGISTRATION_ROLE = "Admin"  # The default user self registration 
role{code}
 

(3) Register the admin user via the UI (do not use the `create_user` command)

(4) Change
{code:java}
AUTH_USER_REGISTRATION = False{code}
to prevent others from registering, or set 
{code:java}
AUTH_USER_REGISTRATION_ROLE = "Viewer"  # or User/Op{code}
to allow view-only self-registration. 

 

The reason the 'Invalid login. Please try again.' error appears is that the 
username is incorrect. Flask-Appbuilder generates its own username during the 
OAuth flow (for example, for Google OAuth, it takes the "id" of the user in the 
OAuth response and prefixes it with 'google_', so it would look something 
like `google_)

In the case where a user is created manually via the `create_user` command, I'd 
assume this username is different, so it fails to authenticate.

I don't have a good sense of how to retrieve this id other than through OAuth 
at the moment, so self-registration is the best flow.

> RBAC support from new UI's failing on OAuth authentication method
> -
>
> Key: AIRFLOW-2321
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2321
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication
>Reporter: Guillermo Rodríguez Cano
>Priority: Major
>
> I tried configuring the RBAC support for the new webserver UI, as provided 
> thanks to this [PR|https://github.com/apache/incubator-airflow/pull/3015] 
> (solving the AIRFLOW-1433 and AIRFLOW-85 issues), but I have encountered 
> issues with OAuth as the authentication method, with Google as the provider.
> I have no issues configuring the authentication details as described in the 
> UPDATING document, but when I test a fresh installation I manage to get to 
> the Google authentication webpage, and on returning to Airflow's site I get 
> the message 'Invalid login. Please try again.', which I have traced down to 
> [here|https://github.com/dpgaspar/Flask-AppBuilder/blob/master/flask_appbuilder/security/views.py#L549].
> As noted there, it seems the user variable is None.
> I have tried to log in using the standard DB authentication method without 
> any problems. The same issue happens even when I try registering a new user, 
> or with that user registered via DB authentication and then switching to the 
> OAUTH authentication method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2321) RBAC support from new UI's failing on OAuth authentication method

2018-04-13 Thread Joy Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437789#comment-16437789
 ] 

Joy Gao commented on AIRFLOW-2321:
--

I replicated the issue you described above.

The work-around is:

(1) Clear the ab_user table 

(2) Set the following config in webserver_config.py
{code:java}
AUTH_USER_REGISTRATION = True  # Will allow user self registration
AUTH_USER_REGISTRATION_ROLE = "Admin"  # The default user self registration 
role{code}
 

(3) Register the admin user via the UI (do not use the `create_admin` command)

(4) Change
{code:java}
AUTH_USER_REGISTRATION = False{code}
to prevent others from registering, or set 
{code:java}
AUTH_USER_REGISTRATION_ROLE = "Viewer"  # or User/Op{code}
to allow view-only self-registration. 

 

The reason the 'Invalid login. Please try again.' error appears is that the 
username is incorrect. Flask-Appbuilder generates its own username during the 
OAuth flow (for example, for Google OAuth, it takes the "id" of the user in the 
OAuth response and prefixes it with 'google_', so it would look something 
like `google_)

In the case where a user is created manually via the `create_user` command, I'd 
assume this username is different, so it fails to authenticate.

I don't have a good sense of how to retrieve this id other than through OAuth 
at the moment, so self-registration is the best flow.

> RBAC support from new UI's failing on OAuth authentication method
> -
>
> Key: AIRFLOW-2321
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2321
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication
>Reporter: Guillermo Rodríguez Cano
>Priority: Major
>
> I tried configuring the RBAC support for the new webserver UI, as provided 
> thanks to this [PR|https://github.com/apache/incubator-airflow/pull/3015] 
> (solving the AIRFLOW-1433 and AIRFLOW-85 issues), but I have encountered 
> issues with OAuth as the authentication method, with Google as the provider.
> I have no issues configuring the authentication details as described in the 
> UPDATING document, but when I test a fresh installation I manage to get to 
> the Google authentication webpage, and on returning to Airflow's site I get 
> the message 'Invalid login. Please try again.', which I have traced down to 
> [here|https://github.com/dpgaspar/Flask-AppBuilder/blob/master/flask_appbuilder/security/views.py#L549].
> As noted there, it seems the user variable is None.
> I have tried to log in using the standard DB authentication method without 
> any problems. The same issue happens even when I try registering a new user, 
> or with that user registered via DB authentication and then switching to the 
> OAUTH authentication method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AIRFLOW-2321) RBAC support from new UI's failing on OAuth authentication method

2018-04-13 Thread Joy Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437789#comment-16437789
 ] 

Joy Gao edited comment on AIRFLOW-2321 at 4/13/18 7:31 PM:
---

I replicated the issue you described above.

The work-around is:

(1) Clear the ab_user table 

(2) Set the following config in webserver_config.py
{code:java}
AUTH_USER_REGISTRATION = True  # Will allow user self registration
AUTH_USER_REGISTRATION_ROLE = "Admin"  # The default user self registration 
role{code}
 

(3) Register the admin user via the UI (do not use the `create_user` command)

(4) Change
{code:java}
AUTH_USER_REGISTRATION = False{code}
to prevent others from registering, or set 
{code:java}
AUTH_USER_REGISTRATION_ROLE = "Viewer"  # or User/Op{code}
to allow view-only self-registration. 

 

The reason the 'Invalid login. Please try again.' error appears is that the 
username is incorrect. Flask-Appbuilder generates its own username during the 
OAuth flow (for example, for Google OAuth, it takes the "id" of the user in the 
OAuth response and prefixes it with 'google_', so it would look something 
like `google_)

In the case where a user is created manually via the `create_user` command, I'd 
assume this username is different, so it fails to authenticate.

I don't have a good sense of how to retrieve this id other than through OAuth 
at the moment, so self-registration is the best flow.


was (Author: joygao):
I replicated the issue you described above.

The work-around is:

(1) Clear the ab_user table 

(2) Set the following config in webserver_config.py
{code:java}
AUTH_USER_REGISTRATION = True  # Will allow user self registration
AUTH_USER_REGISTRATION_ROLE = "Admin"  # The default user self registration 
role{code}
 

(3) Register the admin user via the UI (do not use the `create_admin` command)

(4) Change
{code:java}
AUTH_USER_REGISTRATION = False{code}
to prevent others from registering, or set 
{code:java}
AUTH_USER_REGISTRATION_ROLE = "Viewer"  # or User/Op{code}
to allow view-only self-registration. 

 

The reason the 'Invalid login. Please try again.' error appears is that the 
username is incorrect. Flask-Appbuilder generates its own username during the 
OAuth flow (for example, for Google OAuth, it takes the "id" of the user in the 
OAuth response and prefixes it with 'google_', so it would look something 
like `google_)

In the case where a user is created manually via the `create_user` command, I'd 
assume this username is different, so it fails to authenticate.

I don't have a good sense of how to retrieve this id other than through OAuth 
at the moment, so self-registration is the best flow.

> RBAC support from new UI's failing on OAuth authentication method
> -
>
> Key: AIRFLOW-2321
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2321
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication
>Reporter: Guillermo Rodríguez Cano
>Priority: Major
>
> I tried configuring the RBAC support for the new webserver UI, as provided 
> thanks to this [PR|https://github.com/apache/incubator-airflow/pull/3015] 
> (solving the AIRFLOW-1433 and AIRFLOW-85 issues), but I have encountered 
> issues with OAuth as the authentication method, with Google as the provider.
> I have no issues configuring the authentication details as described in the 
> UPDATING document, but when I test a fresh installation I manage to get to 
> the Google authentication webpage, and on returning to Airflow's site I get 
> the message 'Invalid login. Please try again.', which I have traced down to 
> [here|https://github.com/dpgaspar/Flask-AppBuilder/blob/master/flask_appbuilder/security/views.py#L549].
> As noted there, it seems the user variable is None.
> I have tried to log in using the standard DB authentication method without 
> any problems. The same issue happens even when I try registering a new user, 
> or with that user registered via DB authentication and then switching to the 
> OAUTH authentication method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2321) RBAC support from new UI's failing on OAuth authentication method

2018-04-13 Thread Joy Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437684#comment-16437684
 ] 

Joy Gao commented on AIRFLOW-2321:
--

Hi [~wileeam], if you comment out the following line in webserver_config.py,
{code:java}
# 'whitelist': ['@YOU_COMPANY_DOMAIN'], # optional{code}
does the issue still occur? (Alternatively, if you are using a whitelist, make 
sure the domain matches.)

I should have shipped that entire line commented out; right now it's a bit 
misleading. My bad there.

> RBAC support from new UI's failing on OAuth authentication method
> -
>
> Key: AIRFLOW-2321
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2321
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication
>Reporter: Guillermo Rodríguez Cano
>Priority: Major
>
> I tried configuring the RBAC support for the new webserver UI, as provided 
> thanks to this [PR|https://github.com/apache/incubator-airflow/pull/3015] 
> (solving the AIRFLOW-1433 and AIRFLOW-85 issues), but I have encountered 
> issues with OAuth as the authentication method, with Google as the provider.
> I have no issues configuring the authentication details as described in the 
> UPDATING document, but when I test a fresh installation I manage to get to 
> the Google authentication webpage, and on returning to Airflow's site I get 
> the message 'Invalid login. Please try again.', which I have traced down to 
> [here|https://github.com/dpgaspar/Flask-AppBuilder/blob/master/flask_appbuilder/security/views.py#L549].
> As noted there, it seems the user variable is None.
> I have tried to log in using the standard DB authentication method without 
> any problems. The same issue happens even when I try registering a new user, 
> or with that user registered via DB authentication and then switching to the 
> OAUTH authentication method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2162) Run DAG as user other than airflow does NOT have access to AIRFLOW_ environment variables

2018-04-10 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-2162.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3184
[https://github.com/apache/incubator-airflow/pull/3184]

> Run DAG as user other than airflow does NOT have access to AIRFLOW_ 
> environment variables
> -
>
> Key: AIRFLOW-2162
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2162
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: configuration
>Reporter: Sebastian Radloff
>Assignee: John Arnold
>Priority: Minor
>  Labels: configuration
> Fix For: 1.10.0
>
>
> When running Airflow with LocalExecutor, I inject Airflow environment 
> variables that are supposed to override what is in airflow.cfg, according to 
> the documentation at https://airflow.apache.org/configuration.html. If you 
> specify to run your DAGs as another Linux user, root for example, this is 
> what Airflow executes under the hood:
> {code:java}
> ['bash', '-c', u'sudo -H -u root airflow run docker_sample docker_op_tester 
> 2018-03-01T15:14:55.699668 --job_id 2 --raw -sd 
> DAGS_FOLDER/docker-operator.py --cfg_path /tmp/tmpignV9B']
> {code}
>  
> It uses sudo to switch to the root Linux user; unfortunately, that user won't 
> have access to the environment variables injected to override the config. 
> This is important for people who are trying to inject variables into a Docker 
> container at run time while maintaining a level of security around database 
> credentials.
> I think a decent proposal, made by [~ashb] in gitter, would be to 
> automatically pass all environment variables starting with *AIRFLOW__* to any 
> user. Please let me know if y'all want any help on the documentation, or 
> point me in the right direction and I could create a PR. 
>  
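
One way to realize that proposal, sketched (the helper is hypothetical; the command shape follows the example above):
{code:python}
import os

def sudo_airflow_command(run_as_user, airflow_cmd):
    # sudo strips the caller's environment by default, so explicitly
    # re-inject AIRFLOW__* overrides via env(1).
    keep = ["%s=%s" % (k, v) for k, v in os.environ.items()
            if k.startswith("AIRFLOW__")]
    return ["sudo", "-H", "-u", run_as_user, "env"] + keep + airflow_cmd
{code}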



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2311) Environment variables are accessible to dag execution

2018-04-10 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao updated AIRFLOW-2311:
-
Summary: Environment variables are accessible to dag execution  (was: 
Environment variables from the scheduler process are accessible to dag 
execution)

> Environment variables are accessible to dag execution
> -
>
> Key: AIRFLOW-2311
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2311
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security
>Reporter: Joy Gao
>Priority: Major
>
> Currently, environment variables are accessible to DAG execution for both 
> LocalExecutor and CeleryExecutor (from the machine/container where the 
> `airflow scheduler` process is running).
> I believe passing all environment variables down to task execution, which 
> sometimes include sensitive credentials, is a potential security concern. 
> This means it is the responsibility of (1) the Airflow admin to not store 
> sensitive data in environment variables in production, or (2) the DAG 
> maintainer to properly audit the DAG file and make sure it is not malicious. 
> (1) seems very hard to guarantee; (2) seems easier, but not foolproof.
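
A sketch of one mitigation direction, an environment allowlist (the function and pass-through list are illustrative, not Airflow's behavior):
{code:python}
import os
import subprocess

def run_task_with_clean_env(cmd):
    # Start the task subprocess with a scrubbed environment, passing through
    # only AIRFLOW__* config overrides and a few basics, instead of the
    # scheduler's full environment.
    env = {k: v for k, v in os.environ.items()
           if k.startswith("AIRFLOW__") or k in ("PATH", "HOME", "LANG")}
    return subprocess.check_call(cmd, env=env)
{code}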



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2311) Environment variables from the scheduler process are accessible to dag execution

2018-04-10 Thread Joy Gao (JIRA)
Joy Gao created AIRFLOW-2311:


 Summary: Environment variables from the scheduler process are 
accessible to dag execution
 Key: AIRFLOW-2311
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2311
 Project: Apache Airflow
  Issue Type: Bug
  Components: security
Reporter: Joy Gao


Currently, environment variables are accessible to DAG execution for both 
LocalExecutor and CeleryExecutor (from the machine/container where the 
`airflow scheduler` process is running).

I believe passing all environment variables down to task execution, which 
sometimes include sensitive credentials, is a potential security concern. This 
means it is the responsibility of (1) the Airflow admin to not store sensitive 
data in environment variables in production, or (2) the DAG maintainer to 
properly audit the DAG file and make sure it is not malicious. (1) seems very 
hard to guarantee; (2) seems easier, but not foolproof.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2273) Add Discord webhook operator/hook

2018-04-05 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-2273.
--
   Resolution: Fixed
Fix Version/s: (was: Airflow 2.0)
   1.10.0

Issue resolved by pull request #3178
[https://github.com/apache/incubator-airflow/pull/3178]

> Add Discord webhook operator/hook
> -
>
> Key: AIRFLOW-2273
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2273
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: contrib, hooks, operators
>Reporter: Thomas Buida
>Assignee: Thomas Buida
>Priority: Minor
> Fix For: 1.10.0
>
>
> [Discord|https://discordapp.com/] is used by many as an alternative to Slack. 
> [AIRFLOW 2217|https://issues.apache.org/jira/browse/AIRFLOW-2217] added 
> support for Slack incoming webhooks as a way to post messages to a Slack 
> channel. It would be great to have the same offering for Discord users by 
> using [Discord 
> webhooks|https://discordapp.com/developers/docs/resources/webhook].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2200) Add Snowflake Operator

2018-04-05 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-2200.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3150
[https://github.com/apache/incubator-airflow/pull/3150]

> Add Snowflake Operator
> --
>
> Key: AIRFLOW-2200
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2200
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: contrib, dependencies, hooks, operators
>Reporter: Devin Jones
>Assignee: Devin Jones
>Priority: Major
> Fix For: 1.10.0
>
>
> Add Connection, Hook and Operator to interface with a Snowflake account



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2282) Fix grammar in UPDATING.md

2018-04-05 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-2282.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3189
[https://github.com/apache/incubator-airflow/pull/3189]

> Fix grammar in UPDATING.md
> --
>
> Key: AIRFLOW-2282
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2282
> Project: Apache Airflow
>  Issue Type: Task
>Reporter: Taylor Edmiston
>Assignee: Taylor Edmiston
>Priority: Trivial
>  Labels: documentation
> Fix For: 1.10.0
>
>
> Fixes a small grammatical typo in UPDATING.md. Also auto removes some 
> trailing whitespace in another .md file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2271) Undead tasks are heartbeating and not getting killed

2018-03-30 Thread Joy Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420804#comment-16420804
 ] 

Joy Gao commented on AIRFLOW-2271:
--

Please see [PR 2975|https://github.com/apache/incubator-airflow/pull/2975]; it 
addresses the second change you described above to fix the zombie thread.
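
For reference, a minimal sketch of the _kill_process_tree_ change described in 
the quoted issue below (illustrative psutil code, not the fork's or the PR's 
actual implementation):

{code:python}
import signal

import psutil

def kill_process_tree(pid, kill_root=False):
    """Terminate all descendants of pid; if kill_root, also kill pid itself."""
    root = psutil.Process(pid)
    children = root.children(recursive=True)
    for child in children:
        child.send_signal(signal.SIGTERM)
    gone, alive = psutil.wait_procs(children, timeout=5)
    for child in alive:
        child.kill()  # escalate to SIGKILL for anything that ignored SIGTERM
    if kill_root:
        root.send_signal(signal.SIGTERM)  # the kill_root behavior proposed above
{code}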

> Undead tasks are heartbeating and not getting killed
> 
>
> Key: AIRFLOW-2271
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2271
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: worker
>Affects Versions: 1.9.0
>Reporter: Greg
>Priority: Major
> Attachments: Airflow_zombies_masked.png
>
>
> Background: We had a resource leak in some of our Airflow operators, so after 
> the task completed, the connection pool was not disposed and the processes 
> were still running (see attached screenshot). It caused the execution pool 
> (size=16) to be exhausted after a couple of days.
>  Investigation:
>  We checked those task instances and related jobs in the database, and found 
> a mismatch:
>  SQL:
> {code:sql}
> select 
> ti.execution_date,
> ti.state AS task_state,
> ti.start_date AS task_start_dt,
> ti.end_date As task_end_dt,
> j.id AS job_id,
> j.state AS job_state,
> j.start_date AS job_start_dt,
> j.end_date AS job_end_dt,
> j.latest_heartbeat
> from task_instance ti
> join job j
> on j.id=ti.job_id
> where ti.task_id='backup_data_tables'
> order by task_start_dt DESC
> {code}
> ||execution_date||task_state||task_start_dt||task_end_dt||job_id||job_state||job_start_dt||job_end_dt||latest_heartbeat||
> |2018-03-23 23:00:00|success|2018-03-27 08:42:12.846058|2018-03-27 
> 08:42:17.408723|10925|success|2018-03-27 08:42:12.768759|2018-03-27 
> 08:42:22.815474|2018-03-27 08:42:12.768773|
> |2018-03-22 23:00:00|success|2018-03-23 23:02:44.079996|2018-03-24 
> 01:08:52.842612|9683|running|2018-03-23 23:02:44.010813| |2018-03-26 
> 11:29:15.928836|
> |2018-03-21 23:00:00|success|2018-03-22 23:02:14.254779|2018-03-23 
> 01:07:58.322927|9075|running|2018-03-22 23:02:14.199652| |2018-03-26 
> 11:29:16.570076|
> |2018-03-20 23:00:00|success|2018-03-21 23:02:33.417882|2018-03-22 
> 01:16:56.695002|8475|running|2018-03-21 23:02:33.33754| |2018-03-26 
> 11:29:16.529516|
> |2018-03-19 23:00:00|success|2018-03-21 13:20:36.084062|2018-03-21 
> 15:32:51.263954|8412|running|2018-03-21 13:20:36.026206| |2018-03-26 
> 11:29:16.529413|
> As shown in the result set above, jobs of the completed tasks are still 
> running and heartbeating several days after the actual task completed; they 
> stopped only after we killed them manually.
>  In the log files of the tasks we see a bunch of entries like the ones below, 
> which show that the _kill_process_tree()_ method is invoked every ~5 sec:
> {code:java}
> [2018-03-28 13:03:33,013] {helpers.py:269} DEBUG - There are no descendant 
> processes to kill
> [2018-03-28 13:03:38,211] {helpers.py:269} DEBUG - There are no descendant 
> processes to kill
> [2018-03-28 13:03:43,290] {helpers.py:269} DEBUG - There are no descendant 
> processes to kill
> [2018-03-28 13:03:48,416] {helpers.py:269} DEBUG - There are no descendant 
> processes to kill
> [2018-03-28 13:03:53,604] {helpers.py:269} DEBUG - There are no descendant 
> processes to kill
> {code}
> After some debugging we found that the _LocalTaskJob.terminating_ flag is set 
> to _True_, but the processes are still not getting killed; moreover, the job 
> is still heartbeating.
>  Expected result: Airflow is responsible for shutting down the processes, not 
> leaving undead processes behind, even if a force kill is needed.
> Possible fix:
>  We did the following two changes in the code (we have fixed it in our fork):
>  - _LocalTaskJob._execute_ - do not heartbeat if task is terminating
>  - _kill_process_tree_ - add bool argument kill_root, and kill the root 
> process after descendants if True
> After that all the tasks having that resource leak were shutting down 
> correctly, without leaving any "undead" processes.
> Would love to get some feedback from experts about this issue and the fix.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2248) Fix wrong param name in RedshiftToS3Transfer doc

2018-03-23 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-2248.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3156
[https://github.com/apache/incubator-airflow/pull/3156]

> Fix wrong param name in RedshiftToS3Transfer doc
> 
>
> Key: AIRFLOW-2248
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2248
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: docs, Documentation
>Reporter: Kengo Seki
>Priority: Minor
> Fix For: 1.10.0
>
>
> RedshiftToS3Transfer's docstring says:
> {code}
> :param options: reference to a list of UNLOAD options
> :type options: list
> {code}
> but the correct name is {{unload_options}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2235) Fix wrong docstrings in Transfer operators for MySQL

2018-03-22 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-2235.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3147
[https://github.com/apache/incubator-airflow/pull/3147]

> Fix wrong docstrings in Transfer operators for MySQL
> 
>
> Key: AIRFLOW-2235
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2235
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: docs, Documentation
>Reporter: Kengo Seki
>Assignee: Tao Feng
>Priority: Minor
> Fix For: 1.10.0
>
>
> Docstrings in HiveToMySqlTransfer and PrestoToMySqlTransfer say:
> {code}
> :param sql: SQL query to execute against the MySQL database
> {code}
> but actually these queries are executed against Hive and Presto respectively.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2231) DAG with a relativedelta schedule_interval fails

2018-03-21 Thread Joy Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16408846#comment-16408846
 ] 

Joy Gao commented on AIRFLOW-2231:
--

(That said, I was still unable to reproduce the bug you hit; perhaps we are 
using different versions of the dateutil package. The issue I saw was that a 
dag with a relativedelta would simply never get scheduled.)

> DAG with a relativedelta schedule_interval fails
> 
>
> Key: AIRFLOW-2231
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2231
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG
>Reporter: Kyle Brooks
>Priority: Major
> Attachments: test_reldel.py
>
>
> The documentation for the DAG class says using 
> dateutil.relativedelta.relativedelta as a schedule_interval is supported but 
> it fails:
>  
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 285, 
> in process_file
>     m = imp.load_source(mod_name, filepath)
>   File 
> "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/imp.py",
>  line 172, in load_source
>     module = _load(spec)
>   File "", line 675, in _load
>   File "", line 655, in _load_unlocked
>   File "", line 678, in exec_module
>   File "", line 205, in _call_with_frames_removed
>   File "/Users/k398995/airflow/dags/test_reldel.py", line 33, in 
>     dagrun_timeout=timedelta(minutes=60))
>   File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 2914, 
> in __init__
>     if schedule_interval in cron_presets:
> TypeError: unhashable type: 'relativedelta'
>  
> It looks like the __init__ function for class DAG assumes the 
> schedule_interval is hashable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2231) DAG with a relativedelta schedule_interval fails

2018-03-21 Thread Joy Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16408835#comment-16408835
 ] 

Joy Gao commented on AIRFLOW-2231:
--

Got a chance to look at the code, and it turns out the doc is not accurate: 
dateutil.relativedelta.relativedelta is not supported, but datetime.timedelta 
is. In the case where you need a monthly cadence, you can use either @monthly 
or cron syntax instead.

Alternatively, if you'd like to add relativedelta support, you can submit a PR 
and modify [this 
section|https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L3168-L3196]
 to support relativedelta.
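
For anyone picking this up, the guard could look roughly like this (a 
hypothetical sketch, not the merged fix; the `cron_presets` below is an 
illustrative subset):

{code:python}
from datetime import timedelta

from dateutil.relativedelta import relativedelta

cron_presets = {'@monthly': '0 0 1 * *'}  # illustrative subset

def normalize_schedule_interval(schedule_interval):
    # Only strings can be preset keys, so guarding the dict lookup avoids the
    # TypeError that `schedule_interval in cron_presets` raises for the
    # unhashable relativedelta.
    if isinstance(schedule_interval, str) and schedule_interval in cron_presets:
        return cron_presets[schedule_interval]
    return schedule_interval

print(normalize_schedule_interval(relativedelta(months=1)))  # no TypeError
print(normalize_schedule_interval('@monthly'))               # '0 0 1 * *'
{code}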

> DAG with a relativedelta schedule_interval fails
> 
>
> Key: AIRFLOW-2231
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2231
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG
>Reporter: Kyle Brooks
>Priority: Major
> Attachments: test_reldel.py
>
>
> The documentation for the DAG class says using 
> dateutil.relativedelta.relativedelta as a schedule_interval is supported but 
> it fails:
>  
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 285, 
> in process_file
>     m = imp.load_source(mod_name, filepath)
>   File 
> "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/imp.py",
>  line 172, in load_source
>     module = _load(spec)
>   File "", line 675, in _load
>   File "", line 655, in _load_unlocked
>   File "", line 678, in exec_module
>   File "", line 205, in _call_with_frames_removed
>   File "/Users/k398995/airflow/dags/test_reldel.py", line 33, in 
>     dagrun_timeout=timedelta(minutes=60))
>   File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 2914, 
> in __init__
>     if schedule_interval in cron_presets:
> TypeError: unhashable type: 'relativedelta'
>  
> It looks like the __init__ function for class DAG assumes the 
> schedule_interval is hashable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2231) DAG with a relativedelta schedule_interval fails

2018-03-20 Thread Joy Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407340#comment-16407340
 ] 

Joy Gao commented on AIRFLOW-2231:
--

Can't replicate this. Can you provide the dag file? thanks!

> DAG with a relativedelta schedule_interval fails
> 
>
> Key: AIRFLOW-2231
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2231
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG
>Reporter: Kyle Brooks
>Priority: Major
>
> The documentation for the DAG class says using 
> dateutil.relativedelta.relativedelta as a schedule_interval is supported but 
> it fails:
>  
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 285, 
> in process_file
>     m = imp.load_source(mod_name, filepath)
>   File 
> "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/imp.py",
>  line 172, in load_source
>     module = _load(spec)
>   File "", line 675, in _load
>   File "", line 655, in _load_unlocked
>   File "", line 678, in exec_module
>   File "", line 205, in _call_with_frames_removed
>   File "/Users/k398995/airflow/dags/test_reldel.py", line 33, in 
>     dagrun_timeout=timedelta(minutes=60))
>   File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 2914, 
> in __init__
>     if schedule_interval in cron_presets:
> TypeError: unhashable type: 'relativedelta'
>  
> It looks like the __init__ function for class DAG assumes the 
> schedule_interval is hashable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2232) DAG must be imported for airflow dag discovery

2018-03-20 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-2232.
--
Resolution: Duplicate

Closing since it's a dupe.

> DAG must be imported for airflow dag discovery
> --
>
> Key: AIRFLOW-2232
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2232
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG
>Reporter: andy dreyfuss
>Priority: Critical
>
> repro: put the following in the dags/ directory
> {code}
> from my_dags import MyDag
> d = MyDag()  # this is an airflow.DAG
> {code}
>  
> Expected: airflow list_dags lists the dag
> Actual: airflow does not list the dag unless an unused `from airflow import 
> DAG` is added



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2205) Remove unsupported args from JdbcHook doc

2018-03-13 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-2205.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3121
[https://github.com/apache/incubator-airflow/pull/3121]

> Remove unsupported args from JdbcHook doc
> -
>
> Key: AIRFLOW-2205
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2205
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: docs, Documentation
>Reporter: Kengo Seki
>Assignee: Kengo Seki
>Priority: Minor
> Fix For: 1.10.0
>
>
> The following arguments are in JdbcHook's docstring, but are actually 
> unsupported (the last two have to be specified via the "extra" field in the 
> database, not as arguments):
>  - jdbc_url
>  - sql
>  - jdbc_driver_name
>  - jdbc_driver_loc
> Also, the following functionality doesn't seem to be implemented:
> bq. Otherwise host, port, schema, username and password can be specified on 
> the fly.
> In addition, JdbcHook is missing from the API reference.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2204) Broken webserver debug mode

2018-03-12 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-2204.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3118
[https://github.com/apache/incubator-airflow/pull/3118]

> Broken webserver debug mode
> ---
>
> Key: AIRFLOW-2204
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2204
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webapp, webserver
>Affects Versions: 1.9.0
>Reporter: Bruno Bonagura
>Assignee: Bruno Bonagura
>Priority: Minor
> Fix For: 1.10.0
>
>
> {code}
> $ airflow webserver -d
> [2018-03-09 21:04:25,730] {__init__.py:45} INFO - Using executor LocalExecutor
> [airflow ASCII-art banner]
>  
> [2018-03-09 21:04:26,003] {models.py:196} INFO - Filling up the DagBag from 
> /.../incubator-airflow/dags
> Starting the web server on port 8080 and host 0.0.0.0.
> Traceback (most recent call last):
>   File "/.../.virtualenvs/incubator-airflow/bin/airflow", line 6, in 
> exec(compile(open(__file__).read(), __file__, 'exec'))
>   File "/.../incubator-airflow/airflow/bin/airflow", line 27, in 
> args.func(args)
>   File "/.../incubator-airflow/airflow/bin/cli.py", line 716, in webserver
> app.run(debug=True, port=args.port, host=args.hostname,
> AttributeError: 'DispatcherMiddleware' object has no attribute 'run'
> {code}
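
For context on the failure: the webserver wraps the Flask app in a 
DispatcherMiddleware (to honor a base_url prefix), which is a bare WSGI 
callable with no run() method. A minimal illustration of the mismatch and one 
way to serve the wrapped app (a sketch, not the actual fix from the PR; the 
import path is the werkzeug layout of that era):

{code:python}
from flask import Flask
from werkzeug.serving import run_simple
from werkzeug.wsgi import DispatcherMiddleware

flask_app = Flask(__name__)
app = DispatcherMiddleware(flask_app)  # plain WSGI callable, no .run()

# Hand the WSGI callable to werkzeug's dev server instead of calling
# app.run(), which is exactly what raises the AttributeError above.
run_simple('0.0.0.0', 8080, app, use_reloader=True, use_debugger=True)
{code}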



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2206) Remove unsupported args from JdbcOperator doc

2018-03-12 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-2206.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3122
[https://github.com/apache/incubator-airflow/pull/3122]

> Remove unsupported args from JdbcOperator doc
> -
>
> Key: AIRFLOW-2206
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2206
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: docs, Documentation
>Reporter: Kengo Seki
>Assignee: Kengo Seki
>Priority: Minor
> Fix For: 1.10.0
>
>
> The following arguments are in JdbcOperator's docstring, but are actually 
> unsupported:
> - jdbc_url
> - jdbc_driver_name
> - jdbc_driver_loc



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2207) Fix flaky test that uses app.cached_app()

2018-03-12 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-2207.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3123
[https://github.com/apache/incubator-airflow/pull/3123]

> Fix flaky test that uses app.cached_app()
> -
>
> Key: AIRFLOW-2207
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2207
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: tests
>Reporter: Kengo Seki
>Assignee: Kengo Seki
>Priority: Minor
> Fix For: 1.10.0
>
>
> tests.www.test_views:TestMountPoint.test_mount changes base_url then calls 
> airflow.www.app.cached_app().
> But if another test calls app.cached_app() first without changing base_url, 
> the subsequent test_mount fails on Travis.
> For example, adding the following test causes test_mount to fail,
> whereas test_dummy itself succeeds:
> {code}
> class TestDummy(unittest.TestCase):
>     def setUp(self):
>         super(TestDummy, self).setUp()
>         configuration.load_test_config()
>         app = application.cached_app(testing=True)
>         self.client = Client(app)
>
>     def test_dummy(self):
>         response, _, _ = self.client.get('/', follow_redirects=True)
>         resp_html = b''.join(response)
>         self.assertIn(b"DAGs", resp_html)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2169) Fail to discern between VARBINARY and VARCHAR in MySQL

2018-03-09 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-2169.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3091
[https://github.com/apache/incubator-airflow/pull/3091]

> Fail to discern between VARBINARY and VARCHAR in MySQL
> --
>
> Key: AIRFLOW-2169
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2169
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: db, operators
>Reporter: Hongyi Wang
>Assignee: Hongyi Wang
>Priority: Major
> Fix For: 1.10.0
>
>
> The current MySqlToGoogleCloudStorageOperator has difficulty discerning 
> between VARBINARY and VARCHAR in MySQL (and other similar pairs such as 
> CHAR/BINARY). "Binary-related" MySQL data types like VARBINARY should be 
> mapped to "BYTES" in Google Cloud Storage, rather than "STRING".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2187) Fix Broken Travis CI due to [AIRFLOW-2123]

2018-03-06 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-2187.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3108
[https://github.com/apache/incubator-airflow/pull/3108]

> Fix Broken Travis CI due to [AIRFLOW-2123]
> --
>
> Key: AIRFLOW-2187
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2187
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ci
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Major
> Fix For: 1.10.0
>
>
> Travis CI is failing after merging 
> [AIRFLOW-2123|https://issues.apache.org/jira/browse/AIRFLOW-2123]. 
> This is caused by the fact that apache-beam[gcp] is not available for 
> Python 3.x.
> *Error Log:*
> {code}
> Collecting apache-beam[gcp]==2.3.0 (from 
> google-cloud-dataflow>=2.2.0->apache-airflow==1.10.0.dev0+incubating)
> Could not find a version that satisfies the requirement 
> apache-beam[gcp]==2.3.0 (from 
> google-cloud-dataflow>=2.2.0->apache-airflow==1.10.0.dev0+incubating) (from 
> versions: 0.6.0, 2.0.0, 2.1.0, 2.1.1, 2.2.0)
> No matching distribution found for apache-beam[gcp]==2.3.0 (from 
> google-cloud-dataflow>=2.2.0->apache-airflow==1.10.0.dev0+incubating)
> ERROR: InvocationError: 
> '/home/travis/build/apache/incubator-airflow/.tox/py35-backend_mysql/bin/pip 
> wheel -w /home/travis/.wheelhouse -f /home/travis/.wheelhouse -e .[devel_ci]'
> ___ summary 
> 
> ERROR: py35-backend_mysql: commands failed
> {code}
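
One way to express the constraint (a hypothetical setup.py sketch, not the 
merged change) is to gate the extra on the interpreter version:

{code:python}
import sys

# apache-beam[gcp]==2.3.0 published no Python 3 releases at the time, so only
# require the Dataflow extra on Python 2.
gcp_dataflow = ['google-cloud-dataflow>=2.2.0'] if sys.version_info[0] == 2 else []
{code}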



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2175) Failed to upgradedb 1.8.2 -> 1.9.0

2018-03-06 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-2175.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3104
[https://github.com/apache/incubator-airflow/pull/3104]

> Failed to upgradedb 1.8.2 -> 1.9.0
> --
>
> Key: AIRFLOW-2175
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2175
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: db
>Affects Versions: 1.9.0
>Reporter: Damian Momot
>Priority: Critical
> Fix For: 1.10.0
>
>
> We've got an airflow installation with hundreds of DAGs and thousands of 
> tasks.
> During the upgrade (1.8.2 -> 1.9.0) we got the following error.
> After analyzing the stacktrace I've found that it's most likely caused by a 
> None value in the 'fileloc' column of the dag table. I checked the database 
> and indeed we've got one record with such a value:
>  
>  
> {code:java}
> SELECT COUNT(*) FROM dag WHERE fileloc IS NULL;
> 1
> SELECT COUNT(*) FROM dag;
> 343
> {code}
>  
>  
> {code:java}
> Traceback (most recent call last):
>  File "/usr/local/bin/airflow", line 27, in 
>  args.func(args)
>  File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 913, 
> in upgradedb
>  db_utils.upgradedb()
>  File "/usr/local/lib/python2.7/dist-packages/airflow/utils/db.py", line 320, 
> in upgradedb
>  command.upgrade(config, 'heads')
>  File "/usr/local/lib/python2.7/dist-packages/alembic/command.py", line 174, 
> in upgrade
>  script.run_env()
>  File "/usr/local/lib/python2.7/dist-packages/alembic/script/base.py", line 
> 416, in run_env
>  util.load_python_file(self.dir, 'env.py')
>  File "/usr/local/lib/python2.7/dist-packages/alembic/util/pyfiles.py", line 
> 93, in load_python_file
>  module = load_module_py(module_id, path)
>  File "/usr/local/lib/python2.7/dist-packages/alembic/util/compat.py", line 
> 79, in load_module_py
>  mod = imp.load_source(module_id, path, fp)
>  File "/usr/local/lib/python2.7/dist-packages/airflow/migrations/env.py", 
> line 86, in 
>  run_migrations_online()
>  File "/usr/local/lib/python2.7/dist-packages/airflow/migrations/env.py", 
> line 81, in run_migrations_online
>  context.run_migrations()
>  File "", line 8, in run_migrations
>  File 
> "/usr/local/lib/python2.7/dist-packages/alembic/runtime/environment.py", line 
> 807, in run_migrations
>  self.get_context().run_migrations(**kw)
>  File "/usr/local/lib/python2.7/dist-packages/alembic/runtime/migration.py", 
> line 321, in run_migrations
>  step.migration_fn(**kw)
>  File 
> "/usr/local/lib/python2.7/dist-packages/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py",
>  line 63, in upgrade
>  dag = dagbag.get_dag(ti.dag_id)
>  File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 232, 
> in get_dag
>  filepath=orm_dag.fileloc, only_if_updated=False)
>  File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 249, 
> in process_file
>  if not os.path.isfile(filepath):
>  File "/usr/lib/python2.7/genericpath.py", line 29, in isfile
>  st = os.stat(path)
> TypeError: coercing to Unicode: need string or buffer, NoneType found{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2175) Failed to upgradedb 1.8.2 -> 1.9.0

2018-03-05 Thread Joy Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16386502#comment-16386502
 ] 

Joy Gao commented on AIRFLOW-2175:
--

Perhaps the fileloc attribute didn't get saved to the db successfully. Curious, 
is this a subdag?

Maybe add a null check prior to os.path.isfile(filepath) to avoid this 
TypeError.
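
Something along these lines (a sketch of the suggested guard, not the merged 
patch):

{code:python}
import os

def safe_isfile(filepath):
    # A dag row whose fileloc was never persisted yields None, which
    # os.stat() cannot take; treat it as "no file to parse".
    return filepath is not None and os.path.isfile(filepath)
{code}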

> Failed to upgradedb 1.8.2 -> 1.9.0
> --
>
> Key: AIRFLOW-2175
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2175
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: db
>Affects Versions: 1.9.0
>Reporter: Damian Momot
>Priority: Critical
>
> We've got an airflow installation with hundreds of DAGs and thousands of 
> tasks.
> During the upgrade (1.8.2 -> 1.9.0) we got the following error.
> After analyzing the stacktrace I've found that it's most likely caused by a 
> None value in the 'fileloc' column of the dag table. I checked the database 
> and indeed we've got one record with such a value:
>  
>  
> {code:java}
> SELECT COUNT(*) FROM dag WHERE fileloc IS NULL;
> 1
> SELECT COUNT(*) FROM dag;
> 343
> {code}
>  
>  
> {code:java}
> Traceback (most recent call last):
>  File "/usr/local/bin/airflow", line 27, in 
>  args.func(args)
>  File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 913, 
> in upgradedb
>  db_utils.upgradedb()
>  File "/usr/local/lib/python2.7/dist-packages/airflow/utils/db.py", line 320, 
> in upgradedb
>  command.upgrade(config, 'heads')
>  File "/usr/local/lib/python2.7/dist-packages/alembic/command.py", line 174, 
> in upgrade
>  script.run_env()
>  File "/usr/local/lib/python2.7/dist-packages/alembic/script/base.py", line 
> 416, in run_env
>  util.load_python_file(self.dir, 'env.py')
>  File "/usr/local/lib/python2.7/dist-packages/alembic/util/pyfiles.py", line 
> 93, in load_python_file
>  module = load_module_py(module_id, path)
>  File "/usr/local/lib/python2.7/dist-packages/alembic/util/compat.py", line 
> 79, in load_module_py
>  mod = imp.load_source(module_id, path, fp)
>  File "/usr/local/lib/python2.7/dist-packages/airflow/migrations/env.py", 
> line 86, in 
>  run_migrations_online()
>  File "/usr/local/lib/python2.7/dist-packages/airflow/migrations/env.py", 
> line 81, in run_migrations_online
>  context.run_migrations()
>  File "", line 8, in run_migrations
>  File 
> "/usr/local/lib/python2.7/dist-packages/alembic/runtime/environment.py", line 
> 807, in run_migrations
>  self.get_context().run_migrations(**kw)
>  File "/usr/local/lib/python2.7/dist-packages/alembic/runtime/migration.py", 
> line 321, in run_migrations
>  step.migration_fn(**kw)
>  File 
> "/usr/local/lib/python2.7/dist-packages/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py",
>  line 63, in upgrade
>  dag = dagbag.get_dag(ti.dag_id)
>  File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 232, 
> in get_dag
>  filepath=orm_dag.fileloc, only_if_updated=False)
>  File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 249, 
> in process_file
>  if not os.path.isfile(filepath):
>  File "/usr/lib/python2.7/genericpath.py", line 29, in isfile
>  st = os.stat(path)
> TypeError: coercing to Unicode: need string or buffer, NoneType found{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2170) The Implement Features section in the CONTRIBUTING.md is incomplete

2018-03-02 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-2170.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3089
[https://github.com/apache/incubator-airflow/pull/3089]

> The Implement Features section in the CONTRIBUTING.md is incomplete
> ---
>
> Key: AIRFLOW-2170
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2170
> Project: Apache Airflow
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: Felix Uellendall
>Assignee: Felix Uellendall
>Priority: Trivial
> Fix For: 1.10.0
>
>
> Currently it says:
> {noformat}
> Implement Features
> Look through the Apache Jira for features. Any unassigned "Improvement" issue 
> is open to whoever wants to implement it.
> We've created the operators, hooks, macros and executors we needed, but we 
> made sure that this part of Airflow is extensible. New operators, hooks and 
> operators are very welcomed!{noformat}
> but it would probably be better to change the last sentence to:
> {noformat}
> New operators, hooks, macros and executors are very welcomed!{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1642) An Alembic script not using scoped session causing deadlock

2018-03-02 Thread Joy Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16384225#comment-16384225
 ] 

Joy Gao commented on AIRFLOW-1642:
--

This one fell off my radar. I do have a PR out for it 
[https://github.com/apache/incubator-airflow/pull/2632], but it never got 
merged :( 

> An Alembic script not using scoped session causing deadlock
> ---
>
> Key: AIRFLOW-1642
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1642
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Joy Gao
>Priority: Minor
>
> The bug I'm about to describe is more of an obscure edge case; however, I 
> think it's still worth fixing.
> After upgrading to airflow 1.9, while running `airflow resetdb` on my local 
> machine (with mysql), I encountered a deadlock on the final alembic revision 
> _d2ae31099d61 Increase text size for MySQL (not relevant for other DBs' text 
> types)_.
> The deadlock turned out to be caused by another earlier session that was 
> created and left open in revision _cc1e65623dc7 add max tries column to task 
> instance_. Notably the code below:
> {code}
> sessionmaker = sa.orm.sessionmaker()
> session = sessionmaker(bind=connection)
> dagbag = DagBag(settings.DAGS_FOLDER)
> {code}
> The session created here was not a `scoped_session`, so when the DAGs were 
> being parsed in line 3 above, one of the DAG files made a direct call to the 
> class method `Variable.get()` to acquire an env variable. That call makes a 
> db query against the `variable` table, but raised a KeyError as the env 
> variable was non-existent, thus holding the lock on the `variable` table as a 
> result of that exception.
> Later on, the latter alembic script `_cc1e65623dc7` needs to alter the 
> `Variable` table. Instead of creating its own Session object, it attempts to 
> reuse the same one as above. And because of the exception, it waits 
> indefinitely to acquire the lock on that table. 
> So the DAG file itself could have avoided the KeyError by providing a default 
> value when calling Variable.get(). However I think it would be a good idea to 
> avoid using unscoped sessions in general, as an exception could potentially 
> occur in the future elsewhere.  The easiest fix is replacing *session = 
> sessionmaker(bind=connection)* with *session = settings.Session()*, which is 
> scoped. However, making a change on a migration script is going to make folks 
> anxious.
> If anyone has any thoughts on this, let me know! Thanks :)
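
To spell out the difference described above (a generic SQLAlchemy sketch; 
settings.Session() is Airflow's scoped equivalent):

{code:python}
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine('sqlite://')  # placeholder URI for illustration

# Unscoped: every call mints an independent session, so an exception raised
# mid-parse can leave that session's transaction (and table locks) dangling.
unscoped_session = sessionmaker(bind=engine)()

# Scoped: a thread-local registry hands back the same session everywhere,
# which is what settings.Session() provides inside Airflow.
Session = scoped_session(sessionmaker(bind=engine))
session = Session()
{code}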



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2130) Many Operators are missing from the docs

2018-02-23 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-2130.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3061
[https://github.com/apache/incubator-airflow/pull/3061]

> Many Operators are missing from the docs
> 
>
> Key: AIRFLOW-2130
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2130
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: docs, Documentation
>Affects Versions: 1.10.0
>Reporter: Reid Beels
>Assignee: Reid Beels
>Priority: Critical
> Fix For: 1.10.0
>
>
> * BaseSensorOperator references the wrong import path, so the autodoc fails
> * In the core operators, these are missing:
> ** airflow.operators.check_operator.CheckOperator
> ** airflow.operators.check_operator.IntervalCheckOperator
> ** airflow.operators.check_operator.ValueCheckOperator
> ** airflow.operators.hive_stats_operator.HiveStatsCollectionOperator
> ** airflow.operators.jdbc_operator.JdbcOperator
> ** airflow.operators.latest_only_operator.LatestOnlyOperator
> ** airflow.operators.mysql_operator.MySqlOperator
> ** airflow.operators.oracle_operator.OracleOperator
> ** airflow.operators.pig_operator.PigOperator
> ** airflow.operators.s3_file_transform_operator.S3FileTransformOperator
> ** airflow.operators.sqlite_operator.SqliteOperator
> ** airflow.operators.mysql_to_hive.MySqlToHiveTransfer
> ** airflow.operators.presto_to_mysql.PrestoToMySqlTransfer
> ** airflow.operators.redshift_to_s3_operator.RedshiftToS3Transfer
> * In contrib.operators, these are missing:
> ** airflow.contrib.operators.awsbatch_operator.AWSBatchOperator
> ** airflow.contrib.operators.druid_operator.DruidOperator
> ** airflow.contrib.operators.emr_add_steps_operator.EmrAddStepsOperator
> ** 
> airflow.contrib.operators.emr_create_job_flow_operator.EmrCreateJobFlowOperator
> ** 
> airflow.contrib.operators.emr_terminate_job_flow_operator.EmrTerminateJobFlowOperator
> ** airflow.contrib.operators.gcs_to_bq.GoogleCloudStorageToBigQueryOperator
> ** airflow.contrib.operators.jira_operator.JiraOperator
> ** airflow.contrib.operators.kubernetes_pod_operator.KubernetesPodOperator
> ** airflow.contrib.operators.mlengine_operator.MLEngineBatchPredictionOperator
> ** airflow.contrib.operators.mlengine_operator.MLEngineModelOperator
> ** airflow.contrib.operators.mlengine_operator.MLEngineVersionOperator
> ** airflow.contrib.operators.mlengine_operator.MLEngineTrainingOperator
> ** airflow.contrib.operators.mysql_to_gcs.MySqlToGoogleCloudStorageOperator
> ** 
> airflow.contrib.operators.postgres_to_gcs_operator.PostgresToGoogleCloudStorageOperator
> ** airflow.contrib.operators.sftp_operator.SFTPOperator
> ** airflow.contrib.operators.spark_jdbc_operator.SparkJDBCOperator
> ** airflow.contrib.operators.spark_sql_operator.SparkSqlOperator
> ** airflow.contrib.operators.spark_submit_operator.SparkSubmitOperator
> ** airflow.contrib.operators.sqoop_operator.SqoopOperator
> ** airflow.contrib.operators.hive_to_dynamodb.HiveToDynamoDBTransferOperator



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2131) API Reference includes confusing docs from airflow.utils.AirflowImporter

2018-02-22 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-2131.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3062
[https://github.com/apache/incubator-airflow/pull/3062]

> API Reference includes confusing docs from airflow.utils.AirflowImporter
> 
>
> Key: AIRFLOW-2131
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2131
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: docs, Documentation
>Reporter: Reid Beels
>Assignee: Reid Beels
>Priority: Critical
> Fix For: 1.10.0
>
> Attachments: image-2018-02-20-16-53-04-572.png
>
>
> The generated API documentation includes {{automodule}} declarations for 
> several modules (hooks and operators) that end up pulling in docs from 
> {{airflow.utils.helpers.AirflowImporter}}.
> This leads to a confusing situation for new users who think they're reading 
> docs about what Hooks are, but are instead reading unlabeled docs about the 
> seemingly-deprecated AirflowImporter.
> Like so:
> !image-2018-02-20-16-53-04-572.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-1852) Allow hostname to be overridable

2018-02-20 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao resolved AIRFLOW-1852.
--
   Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3036
[https://github.com/apache/incubator-airflow/pull/3036]

> Allow hostname to be overridable
> 
>
> Key: AIRFLOW-1852
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1852
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Trevor Joynson
>Priority: Major
> Fix For: 1.10.0
>
>
> * https://github.com/apache/incubator-airflow/pull/2472
> This makes running Airflow tremendously easier in common
> production deployments that need a little more than just
> a bare `socket.getfqdn()` hostname for service discovery
> per running instance.
> Personally, I just place the Kubernetes Pod FQDN (or even IP) here.
> Question: Since the web server calls out to the individual
> worker nodes to snag logs, what happens if one dies midway?
> I may later look into that, because that scares me slightly.
> I feel like workers should not ever hold such state, but that's purely a 
> personal bias.
> Thanks,
> Trevor
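
Roughly, the override hook resolves a configurable 'module:attr' spec and 
calls it; the spec format below mirrors the 1.10 hostname_callable config, but 
treat the exact key name and format as an assumption to verify:

{code:python}
import importlib

def resolve_hostname(spec='socket:getfqdn'):
    # Import the named module and call the named attribute, e.g. a function
    # returning a pod FQDN or IP instead of the bare socket.getfqdn().
    module_name, attr = spec.split(':')
    return getattr(importlib.import_module(module_name), attr)()

print(resolve_hostname())  # default behavior: the machine's fqdn
{code}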



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-85) Create DAGs UI

2018-02-08 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao updated AIRFLOW-85:
---
Description: 
Airflow currently provides only an {{/admin}} UI interface for the webapp. This 
UI provides three distinct roles:
 * Admin
 * Data profiler
 * None

In addition, Airflow currently provides the ability to log in, either via a 
secure proxy front-end, or via LDAP/Kerberos, within the webapp.

We run Airflow with LDAP authentication enabled. This helps us control access 
to the UI. However, there is insufficient granularity within the UI. We would 
like to be able to grant users the ability to:
 # View their DAGs, but no one else's.
 # Control their DAGs, but no one else's.

This is not possible right now. You can take away the ability to access the 
connections and data profiling tabs, but users can still see all DAGs, as well 
as control the state of the DB by clearing any DAG status, etc.

 

(From Airflow-1443)

The authentication capabilities in the [RBAC design 
proposal|https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+RBAC+proposal]
 introduce a significant amount of work that is otherwise already built into 
existing frameworks.

Per [community 
discussion|https://www.mail-archive.com/dev@airflow.incubator.apache.org/msg02946.html],
 Flask-AppBuilder (FAB) is the best fit for Airflow as a foundation to 
implementing RBAC. This will support integration with different authentication 
backends out-of-the-box, and generate permissions for views and ORM models that 
will simplify view-level and dag-level access control.

This implies modifying the current flask views, and deprecating the current 
Flask-Admin in favor of FAB's crud.

  was:
Airflow currently provides only an {{/admin}} UI interface for the webapp. This 
UI provides three distinct roles:
 * Admin
 * Data profiler
 * None

In addition, Airflow currently provides the ability to log in, either via a 
secure proxy front-end, or via LDAP/Kerberos, within the webapp.

We run Airflow with LDAP authentication enabled. This helps us control access 
to the UI. However, there is insufficient granularity within the UI. We would 
like to be able to grant users the ability to:
 # View their DAGs, but no one else's.
 # Control their DAGs, but no one else's.

This is not possible right now. You can take away the ability to access the 
connections and data profiling tabs, but users can still see all DAGs, as well 
as control the state of the DB by clearing any DAG status, etc.

 

(From Airflow-1443)

The authentication capabilities in the [RBAC design proposal 
|[https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+RBAC+proposal]] 
introduce a significant amount of work that is otherwise already built into 
existing frameworks.

Per [community 
discussion|https://www.mail-archive.com/dev@airflow.incubator.apache.org/msg02946.html],
 Flask-AppBuilder (FAB) is the best fit for Airflow as a foundation to 
implementing RBAC. This will support integration with different authentication 
backends out-of-the-box, and generate permissions for views and ORM models that 
will simplify view-level and dag-level access control.

This implies modifying the current flask views, and deprecating the current 
Flask-Admin in favor of FAB's crud.


> Create DAGs UI
> --
>
> Key: AIRFLOW-85
> URL: https://issues.apache.org/jira/browse/AIRFLOW-85
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security, ui
>Reporter: Chris Riccomini
>Assignee: Joy Gao
>Priority: Major
>
> Airflow currently provides only an {{/admin}} UI interface for the webapp. 
> This UI provides three distinct roles:
>  * Admin
>  * Data profiler
>  * None
> In addition, Airflow currently provides the ability to log in, either via a 
> secure proxy front-end, or via LDAP/Kerberos, within the webapp.
> We run Airflow with LDAP authentication enabled. This helps us control access 
> to the UI. However, there is insufficient granularity within the UI. We would 
> like to be able to grant users the ability to:
>  # View their DAGs, but no one else's.
>  # Control their DAGs, but no one else's.
> This is not possible right now. You can take away the ability to access the 
> connections and data profiling tabs, but users can still see all DAGs, as 
> well as control the state of the DB by clearing any DAG status, etc.
>  
> (From Airflow-1443)
> The authentication capabilities in the [RBAC design 
> proposal|https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+RBAC+proposal]
>  introduce a significant amount of work that is otherwise already built into 
> existing frameworks.
> Per [community 
> discussion|https://www.mail-archive.com/dev@airflow.incubator.apache.org/msg02946.html],
>  Flask-AppBuilder (FAB) is the best fit for Airflow as a foundation to 
> implementing RBAC. This will support integration with different 
> authentication backends out-of-the-box, and generate permissions for views 
> and ORM models that will simplify view-level and dag-level access control.
> This implies modifying the current flask views, and deprecating the current 
> Flask-Admin in favor of FAB's crud.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (AIRFLOW-85) Create DAGs UI

2018-02-08 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao updated AIRFLOW-85:
---
Description: 
Airflow currently provides only an {{/admin}} UI interface for the webapp. This 
UI provides three distinct roles:
 * Admin
 * Data profiler
 * None

In addition, Airflow currently provides the ability to log in, either via a 
secure proxy front-end, or via LDAP/Kerberos, within the webapp.

We run Airflow with LDAP authentication enabled. This helps us control access 
to the UI. However, there is insufficient granularity within the UI. We would 
like to be able to grant users the ability to:
 # View their DAGs, but no one else's.
 # Control their DAGs, but no one else's.

This is not possible right now. You can take away the ability to access the 
connections and data profiling tabs, but users can still see all DAGs, as well 
as control the state of the DB by clearing any DAG status, etc.

 

(From Airflow-1443)

The authentication capabilities in the [RBAC design proposal 
|[https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+RBAC+proposal]] 
introduce a significant amount of work that is otherwise already built into 
existing frameworks.

Per [community 
discussion|https://www.mail-archive.com/dev@airflow.incubator.apache.org/msg02946.html],
 Flask-AppBuilder (FAB) is the best fit for Airflow as a foundation to 
implementing RBAC. This will support integration with different authentication 
backends out-of-the-box, and generate permissions for views and ORM models that 
will simplify view-level and dag-level access control.

This implies modifying the current flask views, and deprecating the current 
Flask-Admin in favor of FAB's crud.

  was:
Airflow currently provides only an {{/admin}} UI interface for the webapp. This 
UI provides three distinct roles:
 * Admin
 * Data profiler
 * None

In addition, Airflow currently provides the ability to log in, either via a 
secure proxy front-end, or via LDAP/Kerberos, within the webapp.

We run Airflow with LDAP authentication enabled. This helps us control access 
to the UI. However, there is insufficient granularity within the UI. We would 
like to be able to grant users the ability to:
 # View their DAGs, but no one else's.
 # Control their DAGs, but no one else's.

This is not possible right now. You can take away the ability to access the 
connections and data profiling tabs, but users can still see all DAGs, as well 
as control the state of the DB by clearing any DAG status, etc.

 

From Airflow-1443:

The authentication capabilities in the RBAC design proposal introduce a 
significant amount of work that is otherwise already built into existing 
frameworks.

Per community discussion, Flask-AppBuilder (FAB) is the best fit for Airflow as 
a foundation to implementing RBAC. This will support integration with different 
authentication backends out-of-the-box, and generate permissions for views and 
ORM models that will simplify view-level and dag-level access control.

This implies modifying the current flask views, and deprecating the current 
Flask-Admin in favor of FAB's crud.


> Create DAGs UI
> --
>
> Key: AIRFLOW-85
> URL: https://issues.apache.org/jira/browse/AIRFLOW-85
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security, ui
>Reporter: Chris Riccomini
>Assignee: Joy Gao
>Priority: Major
>
> Airflow currently provides only an {{/admin}} UI interface for the webapp. 
> This UI provides three distinct roles:
>  * Admin
>  * Data profiler
>  * None
> In addition, Airflow currently provides the ability to log in, either via a 
> secure proxy front-end, or via LDAP/Kerberos, within the webapp.
> We run Airflow with LDAP authentication enabled. This helps us control access 
> to the UI. However, there is insufficient granularity within the UI. We would 
> like to be able to grant users the ability to:
>  # View their DAGs, but no one else's.
>  # Control their DAGs, but no one else's.
> This is not possible right now. You can take away the ability to access the 
> connections and data profiling tabs, but users can still see all DAGs, as 
> well as control the state of the DB by clearing any DAG status, etc.
>  
> (From Airflow-1443)
> The authentication capabilities in the [RBAC design proposal 
> |[https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+RBAC+proposal]] 
> introduce a significant amount of work that is otherwise already built into 
> existing frameworks.
> Per [community 
> discussion|https://www.mail-archive.com/dev@airflow.incubator.apache.org/msg02946.html],
>  Flask-AppBuilder (FAB) is the best fit for Airflow as a foundation to 
> implementing RBAC. This will support integration with different 
> authentication backends out-of-the-box, and generate permissions for views 
> and ORM models that 

[jira] [Commented] (AIRFLOW-85) Create DAGs UI

2018-02-08 Thread Joy Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357471#comment-16357471
 ] 

Joy Gao commented on AIRFLOW-85:


Closed AIRFLOW-1433 as dupe since the FAB work will directly fix the issue 
here. 

> Create DAGs UI
> --
>
> Key: AIRFLOW-85
> URL: https://issues.apache.org/jira/browse/AIRFLOW-85
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security, ui
>Reporter: Chris Riccomini
>Assignee: Joy Gao
>Priority: Major
>
> Airflow currently provides only an {{/admin}} UI interface for the webapp. 
> This UI provides three distinct roles:
>  * Admin
>  * Data profiler
>  * None
> In addition, Airflow currently provides the ability to log in, either via a 
> secure proxy front-end, or via LDAP/Kerberos, within the webapp.
> We run Airflow with LDAP authentication enabled. This helps us control access 
> to the UI. However, there is insufficient granularity within the UI. We would 
> like to be able to grant users the ability to:
>  # View their DAGs, but no one else's.
>  # Control their DAGs, but no one else's.
> This is not possible right now. You can take away the ability to access the 
> connections and data profiling tabs, but users can still see all DAGs, as 
> well as control the state of the DB by clearing any DAG status, etc.
>  
> From Airflow-1443:
> The authentication capabilities in the RBAC design proposal introduce a 
> significant amount of work that is otherwise already built into existing 
> frameworks.
> Per community discussion, Flask-AppBuilder (FAB) is the best fit for Airflow 
> as a foundation to implementing RBAC. This will support integration with 
> different authentication backends out-of-the-box, and generate permissions 
> for views and ORM models that will simplify view-level and dag-level access 
> control.
> This implies modifying the current flask views, and deprecating the current 
> Flask-Admin in favor of FAB's crud.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-85) Create DAGs UI

2018-02-08 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao updated AIRFLOW-85:
---
Description: 
Airflow currently provides only an {{/admin}} UI interface for the webapp. This 
UI provides three distinct roles:
 * Admin
 * Data profiler
 * None

In addition, Airflow currently provides the ability to log in, either via a 
secure proxy front-end, or via LDAP/Kerberos, within the webapp.

We run Airflow with LDAP authentication enabled. This helps us control access 
to the UI. However, there is insufficient granularity within the UI. We would 
like to be able to grant users the ability to:
 # View their DAGs, but no one else's.
 # Control their DAGs, but no one else's.

This is not possible right now. You can take away the ability to access the 
connections and data profiling tabs, but users can still see all DAGs, as well 
as control the state of the DB by clearing any DAG status, etc.

 

From Airflow-1443:

The authentication capabilities in the RBAC design proposal introduce a 
significant amount of work that is otherwise already built into existing 
frameworks.

Per community discussion, Flask-AppBuilder (FAB) is the best fit for Airflow as 
a foundation to implementing RBAC. This will support integration with different 
authentication backends out-of-the-box, and generate permissions for views and 
ORM models that will simplify view-level and dag-level access control.

This implies modifying the current flask views, and deprecating the current 
Flask-Admin in favor of FAB's crud.

  was:
Airflow currently provides only an {{/admin}} UI interface for the webapp. This 
UI provides three distinct roles:

* Admin
* Data profiler
* None

In addition, Airflow currently provides the ability to log in, either via a 
secure proxy front-end, or via LDAP/Kerberos, within the webapp.

We run Airflow with LDAP authentication enabled. This helps us control access 
to the UI. However, there is insufficient granularity within the UI. We would 
like to be able to grant users the ability to:

# View their DAGs, but no one else's.
# Control their DAGs, but no one else's.

This is not possible right now. You can take away the ability to access the 
connections and data profiling tabs, but users can still see all DAGs, as well 
as control the state of the DB by clearing any DAG status, etc.


> Create DAGs UI
> --
>
> Key: AIRFLOW-85
> URL: https://issues.apache.org/jira/browse/AIRFLOW-85
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security, ui
>Reporter: Chris Riccomini
>Assignee: Joy Gao
>Priority: Major
>
> Airflow currently provides only an {{/admin}} UI interface for the webapp. 
> This UI provides three distinct roles:
>  * Admin
>  * Data profiler
>  * None
> In addition, Airflow currently provides the ability to log in, either via a 
> secure proxy front-end, or via LDAP/Kerberos, within the webapp.
> We run Airflow with LDAP authentication enabled. This helps us control access 
> to the UI. However, there is insufficient granularity within the UI. We would 
> like to be able to grant users the ability to:
>  # View their DAGs, but no one else's.
>  # Control their DAGs, but no one else's.
> This is not possible right now. You can take away the ability to access the 
> connections and data profiling tabs, but users can still see all DAGs, as 
> well as control the state of the DB by clearing any DAG status, etc.
>  
> From Airflow-1443:
> The authentication capabilities in the RBAC design proposal introduce a 
> significant amount of work that is otherwise already built into existing 
> frameworks.
> Per community discussion, Flask-AppBuilder (FAB) is the best fit for Airflow 
> as a foundation to implementing RBAC. This will support integration with 
> different authentication backends out-of-the-box, and generate permissions 
> for views and ORM models that will simplify view-level and dag-level access 
> control.
> This implies modifying the current flask views, and deprecating the current 
> Flask-Admin in favor of FAB's crud.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-1433) Convert Airflow to Use FAB Framework

2018-02-08 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao closed AIRFLOW-1433.

Resolution: Duplicate

Duplicate of AIRFLOW-85.

> Convert Airflow to Use FAB Framework
> 
>
> Key: AIRFLOW-1433
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1433
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Joy Gao
>Assignee: Joy Gao
>Priority: Major
>
> The authentication capabilities in the RBAC design proposal introduce a 
> significant amount of work that is otherwise already built into existing 
> frameworks. 
> Per community discussion, Flask-AppBuilder (FAB) is the best fit for Airflow 
> as a foundation to implementing RBAC. This will support integration with 
> different authentication backends out-of-the-box, and generate permissions 
> for views and ORM models that will simplify view-level and dag-level access 
> control.
> This implies modifying the current flask views, and deprecating the current 
> Flask-Admin in favor of FAB's crud.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2057) Add Overstock to the list of Airflow users

2018-02-01 Thread Joy Gao (JIRA)
Joy Gao created AIRFLOW-2057:


 Summary: Add Overstock to the list of Airflow users
 Key: AIRFLOW-2057
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2057
 Project: Apache Airflow
  Issue Type: Task
Reporter: Joy Gao






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-1904) Correct DAG fileloc

2017-12-08 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao updated AIRFLOW-1904:
-
Description: 
Currently the DAG file location `dag.fileloc` is determined by getting the 
second stack frame from the top, i.e.:

self.fileloc = sys._getframe().f_back.f_code.co_filename

However this fails if the DAG is constructed in an imported module. For 
example, if I import a DagBuilder in my dagfile, the DagBuilder's filepath 
would end up becoming the fileloc, rather than that of the DAG file itself. 

This causes a bug whenever the DAG is refreshed, with the message:
```This DAG isn't available in the web server's DagBag object. It shows up in 
this list because the scheduler marked it as active in the metadata database.```
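
To make the failure mode concrete, a minimal sketch (the module, DAG id, and 
date here are hypothetical, not taken from the report):

{code:python}
# dag_builder.py -- a helper module shared by several DAG files
from datetime import datetime
from airflow.models import DAG

def build_dag(dag_id):
    # DAG.__init__ records the caller's filename via
    # sys._getframe().f_back.f_code.co_filename, so fileloc ends up
    # pointing at this helper module, not the importing DAG file.
    return DAG(dag_id=dag_id, start_date=datetime(2017, 12, 1))

# my_dags.py -- the file the scheduler actually parses:
#     from dag_builder import build_dag
#     dag = build_dag('example')
#     print(dag.fileloc)  # .../dag_builder.py, not .../my_dags.py
{code}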

  was:
Currently the DAG file location `dag.fileloc` is determined by getting the 
second stack frame from the top, i.e.:

self.fileloc = sys._getframe().f_back.f_code.co_filename

However this fails if the DAG is constructed in an imported module. For 
example, if I import a DagBuilder in my dagfile, the DagBuilder's filepath 
would end up becoming the fileloc, rather than that of the DAG file itself. 

This causes a bug whenever a refresh of the DAG is attempted, with the 
message:
```This DAG isn't available in the web server's DagBag object. It shows up in 
this list because the scheduler marked it as active in the metadata database.```


> Correct DAG fileloc
> ---
>
> Key: AIRFLOW-1904
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1904
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Joy Gao
>Assignee: Joy Gao
>
> Currently the DAG file location `dag.fileloc` is determined by getting the 
> second stack frame from the top, i.e.:
> self.fileloc = sys._getframe().f_back.f_code.co_filename
> However this fails if the DAG is constructed in an imported module. For 
> example, if I import a DagBuilder in my dagfile, the DagBuilder's filepath 
> would end up becoming the fileloc, rather than that of the DAG file itself. 
> This causes a bug whenever the DAG is refreshed, with the message:
> ```This DAG isn't available in the web server's DagBag object. It shows up in 
> this list because the scheduler marked it as active in the metadata 
> database.```



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1904) Correct DAG fileloc

2017-12-08 Thread Joy Gao (JIRA)
Joy Gao created AIRFLOW-1904:


 Summary: Correct DAG fileloc
 Key: AIRFLOW-1904
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1904
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Joy Gao
Assignee: Joy Gao


Currently the DAG file location `dag.fileloc` is determined by getting the 
second stack frame from the top, i.e.:

self.fileloc = sys._getframe().f_back.f_code.co_filename

However this fails if the DAG is constructed in an imported module. For 
example, if I import a DagBuilder in my dagfile, the DagBuilder's filepath 
would end up becoming the fileloc, rather than that of the DAG file itself. 

This causes a bug whenever a refresh of the DAG is attempted, with the 
message:
```This DAG isn't available in the web server's DagBag object. It shows up in 
this list because the scheduler marked it as active in the metadata database.```



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1821) Default logging config file is confusing

2017-11-15 Thread Joy Gao (JIRA)
Joy Gao created AIRFLOW-1821:


 Summary: Default logging config file is confusing
 Key: AIRFLOW-1821
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1821
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Joy Gao
Assignee: Joy Gao


The current DEFAULT_LOGGING_CONFIG defines 5 loggers to configure:
- root
- airflow
- airflow.task
- airflow.task_runner
- airflow.processor
The number of loggers could be reduced to make configuration easier.
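
A sketch of what a consolidated config might look like (the handler and 
formatter names here are assumptions, not the actual proposal):

{code:python}
# sketch only: one root logger plus a single task logger instead of
# five separate entries; uses the stdlib logging dictConfig schema
LOGGING_CONFIG = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'airflow': {
            'format': '[%(asctime)s] {%(filename)s:%(lineno)d} '
                      '%(levelname)s - %(message)s',
        },
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'airflow',
        },
    },
    'root': {'handlers': ['console'], 'level': 'INFO'},
    'loggers': {
        'airflow.task': {
            'handlers': ['console'],
            'level': 'INFO',
            'propagate': False,
        },
    },
}
{code}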



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1821) Default logging config file is confusing

2017-11-15 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao updated AIRFLOW-1821:
-
Description: 
The current DEFAULT_LOGGING_CONFIG defines 5 loggers to configure:
- root
- airflow
- airflow.task
- airflow.task_runner
- airflow.processor

The number of loggers could be reduced to make configuration easier.

  was:
The current DEFAULT_LOGGING_CONFIG defines 5 loggers to configure:
- root
- airflow
- airflow.task
- airflow.task_runner
- airflow.processor
The number of loggers could be reduced to make configuration easier.


> Default logging config file is confusing
> 
>
> Key: AIRFLOW-1821
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1821
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Joy Gao
>Assignee: Joy Gao
>
> The current DEFAULT_LOGGING_CONFIG defines 5 loggers to configure:
> - root
> - airflow
> - airflow.task
> - airflow.task_runner
> - airflow.processor
> The number of loggers could be reduced to make configuration easier.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1740) Cannot create/update XCOM via UI in PY3

2017-10-19 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao updated AIRFLOW-1740:
-
Description: 
I cannot create or update XComs via the UI in PY3.

When attempting to update an existing DAG's XCom, the following error is 
received:

{code:java}
Failed to update record. (builtins.TypeError) string argument without an 
encoding [SQL: 'UPDATE xcom SET value=%s WHERE xcom.id = %s'] [parameters: 
[{'xcom_id': 165, 'value': "b'bar'"}]]
{code}

And for creating a new xcom:


{code:java}
Failed to create record. (builtins.TypeError) string argument without an 
encoding [SQL: 'INSERT INTO xcom (`key`, value, timestamp, execution_date, 
task_id, dag_id) VALUES (%s, %s, now(), %s, %s, %s)'] [parameters: 
[{'execution_date': datetime.datetime(2017, 10, 7, 1, 1), 'value': 'bar', 
'task_id': 'test_task', 'key': 'foo', 'dag_id': 'test_dag'}]]
{code}
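
For reference, the TypeError appears to be Python 3's bytes/str strictness 
surfacing through the DB driver; a minimal illustration, independent of 
Airflow:

{code:python}
# Python 3 refuses to construct bytes from a str without an explicit
# encoding -- the same error message the UI save path triggers above
bytes("bar", "utf-8")  # b'bar' -- fine, encoding given
bytes("bar")           # TypeError: string argument without an encoding
{code}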





  was:
I cannot create or update XComs via the UI in PY3.

When attempting to update an existing DAG's XCom, the following error is 
received:

{code:java}
Failed to update record. (builtins.TypeError) string argument without an 
encoding [SQL: 'UPDATE xcom SET value=%s WHERE xcom.id = %s'] [parameters: 
[{'xcom_id': 165, 'value': "b'\\x80\\x03J+\\x92\\xdbYa.'"}]]
{code}

And for creating a new xcom:


{code:java}
Failed to create record. (builtins.TypeError) string argument without an 
encoding [SQL: 'INSERT INTO xcom (`key`, value, timestamp, execution_date, 
task_id, dag_id) VALUES (%s, %s, now(), %s, %s, %s)'] [parameters: 
[{'execution_date': datetime.datetime(2017, 10, 7, 1, 1), 'value': 'bar', 
'task_id': 'test_task', 'key': 'foo', 'dag_id': 'test_dag'}]]
{code}






> Cannot create/update XCOM via UI in PY3
> ---
>
> Key: AIRFLOW-1740
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1740
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.9.0
> Environment: PY3
>Reporter: Joy Gao
>Priority: Minor
>
> I cannot create or update XComs via the UI in PY3.
> When attempting to update an existing DAG's XCom, the following error is 
> received:
> {code:java}
> Failed to update record. (builtins.TypeError) string argument without an 
> encoding [SQL: 'UPDATE xcom SET value=%s WHERE xcom.id = %s'] [parameters: 
> [{'xcom_id': 165, 'value': "b'bar'"}]]
> {code}
> And for creating a new xcom:
> {code:java}
> Failed to create record. (builtins.TypeError) string argument without an 
> encoding [SQL: 'INSERT INTO xcom (`key`, value, timestamp, execution_date, 
> task_id, dag_id) VALUES (%s, %s, now(), %s, %s, %s)'] [parameters: 
> [{'execution_date': datetime.datetime(2017, 10, 7, 1, 1), 'value': 'bar', 
> 'task_id': 'test_task', 'key': 'foo', 'dag_id': 'test_dag'}]]
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1740) Cannot create/update XCOM via UI in PY3

2017-10-19 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao updated AIRFLOW-1740:
-
Summary: Cannot create/update XCOM via UI in PY3  (was: Cannot add XCOM via 
UI in PY3)

> Cannot create/update XCOM via UI in PY3
> ---
>
> Key: AIRFLOW-1740
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1740
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.9.0
> Environment: PY3
>Reporter: Joy Gao
>Priority: Minor
>
> I cannot create or update XComs via the UI in PY3.
> When attempting to update an existing DAG's XCom, the following error is 
> received:
> {code:java}
> Failed to update record. (builtins.TypeError) string argument without an 
> encoding [SQL: 'UPDATE xcom SET value=%s WHERE xcom.id = %s'] [parameters: 
> [{'xcom_id': 165, 'value': "b'\\x80\\x03J+\\x92\\xdbYa.'"}]]
> {code}
> And for creating a new xcom:
> {code:java}
> Failed to create record. (builtins.TypeError) string argument without an 
> encoding [SQL: 'INSERT INTO xcom (`key`, value, timestamp, execution_date, 
> task_id, dag_id) VALUES (%s, %s, now(), %s, %s, %s)'] [parameters: 
> [{'execution_date': datetime.datetime(2017, 10, 7, 1, 1), 'value': 'bar', 
> 'task_id': 'test_task', 'key': 'foo', 'dag_id': 'test_dag'}]]
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1740) Cannot add XCOM via UI

2017-10-19 Thread Joy Gao (JIRA)
Joy Gao created AIRFLOW-1740:


 Summary: Cannot add XCOM via UI
 Key: AIRFLOW-1740
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1740
 Project: Apache Airflow
  Issue Type: Bug
Affects Versions: 1.9.0
Reporter: Joy Gao
Priority: Minor


I cannot create or update XComs via the UI. 

When attempting to update an existing DAG's XCom, the following error is 
received:

{code:java}
Failed to update record. (builtins.TypeError) string argument without an 
encoding [SQL: 'UPDATE xcom SET value=%s WHERE xcom.id = %s'] [parameters: 
[{'xcom_id': 165, 'value': "b'\\x80\\x03J+\\x92\\xdbYa.'"}]]
{code}

And for creating a new xcom:


{code:java}
Failed to create record. (builtins.TypeError) string argument without an 
encoding [SQL: 'INSERT INTO xcom (`key`, value, timestamp, execution_date, 
task_id, dag_id) VALUES (%s, %s, now(), %s, %s, %s)'] [parameters: 
[{'execution_date': datetime.datetime(2017, 10, 7, 1, 1), 'value': 'bar', 
'task_id': 'test_task', 'key': 'foo', 'dag_id': 'test_dag'}]]
{code}







--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AIRFLOW-1708) pass JSON through the DAG pipeline

2017-10-12 Thread Joy Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16202592#comment-16202592
 ] 

Joy Gao commented on AIRFLOW-1708:
--

That's correct: you can specify the `task_ids` param when you call xcom_pull. 
The goal of XCom is communication between tasks, so it serves the use case you 
described fairly well.
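
A minimal sketch of the pattern (the dag id, task ids, and payload are made 
up; assumes the Airflow 1.x provide_context style):

{code:python}
from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def produce(**context):
    # push a JSON-serializable payload for downstream tasks
    context['ti'].xcom_push(key='payload', value={'rows': 42})

def consume(**context):
    payload = context['ti'].xcom_pull(task_ids='produce_task', key='payload')
    print(payload['rows'])

dag = DAG('xcom_example', start_date=datetime(2017, 10, 1))

produce_task = PythonOperator(task_id='produce_task', provide_context=True,
                              python_callable=produce, dag=dag)
consume_task = PythonOperator(task_id='consume_task', provide_context=True,
                              python_callable=consume, dag=dag)
produce_task >> consume_task
{code}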

> pass JSON through the DAG pipeline
> --
>
> Key: AIRFLOW-1708
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1708
> Project: Apache Airflow
>  Issue Type: Wish
>Reporter: Igor Cherepanov
>
> Hello dear community,
> is it a right way to pass a JSON by means of xcom_push function in a task and 
> also get the same JSON through xcom_pull in the next task? Or is there any 
> other ways to do this?
> Thanks! 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AIRFLOW-1709) workers on different machines

2017-10-12 Thread Joy Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16202590#comment-16202590
 ] 

Joy Gao commented on AIRFLOW-1709:
--

You'd want to use 
[CeleryExecutor|https://airflow.incubator.apache.org/configuration.html#scaling-out-with-celery]

A really good tutorial for reference: 
https://stlong0521.github.io/20161023%20-%20Airflow.html

> workers on different machines 
> --
>
> Key: AIRFLOW-1709
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1709
> Project: Apache Airflow
>  Issue Type: Wish
>Reporter: Igor Cherepanov
>
> Hello,
> is there an example how I can distribute workers on different machines?
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (AIRFLOW-1702) access to the count of the happened retries in a python method

2017-10-11 Thread Joy Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200914#comment-16200914
 ] 

Joy Gao edited comment on AIRFLOW-1702 at 10/11/17 8:31 PM:


The TaskInstance has an attribute `try_number`; you can access it via the 
PythonOperator, i.e.:

{code:python}
def foo(**context):
    ti = context['ti']
    retry = ti.try_number - 1
    # do something with the retry count

op = PythonOperator(
    task_id='task',
    provide_context=True,
    python_callable=foo,
    dag=dag)
{code}


Hope this helps!


was (Author: joy.gao54):
The TaskInstance has an attribute `try_number`; you can access it via the 
PythonOperator, i.e.:

def foo(**context):
    ti = context['ti']
    retry = ti.try_number - 1
    # do something with the retry count

op = PythonOperator(
    task_id='task',
    provide_context=True,
    python_callable=foo,
    dag=dag)

Hope this helps!

> access to the count of the happened retries in a python method
> --
>
> Key: AIRFLOW-1702
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1702
> Project: Apache Airflow
>  Issue Type: Wish
>Reporter: Igor Cherepanov
>
> hello, 
> is it possible to access to the count of the happened retries in a python 
> method
> thanks!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AIRFLOW-1702) access to the count of the happened retries in a python method

2017-10-11 Thread Joy Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200914#comment-16200914
 ] 

Joy Gao commented on AIRFLOW-1702:
--

The TaskInstance has an attribute `try_number`; you can access it via the 
PythonOperator, i.e.:

def foo(**context):
    ti = context['ti']
    retry = ti.try_number - 1
    # do something with the retry count

op = PythonOperator(
    task_id='task',
    provide_context=True,
    python_callable=foo,
    dag=dag)

Hope this helps!

> access to the count of the happened retries in a python method
> --
>
> Key: AIRFLOW-1702
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1702
> Project: Apache Airflow
>  Issue Type: Wish
>Reporter: Igor Cherepanov
>
> hello, 
> is it possible to access to the count of the happened retries in a python 
> method
> thanks!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1613) Make MySqlToGoogleCloudStorageOperator compatible with python3

2017-10-10 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao updated AIRFLOW-1613:
-
Description: 
1. 
In Python 3, map(...) returns an iterator, which can only be iterated over 
once. 
Therefore the current implementation exhausts the schema after the first row, 
and every subsequent iteration sees an empty list:

{code}
schema = map(lambda schema_tuple: schema_tuple[0], cursor.description)
file_no = 0
tmp_file_handle = NamedTemporaryFile(delete=True)
tmp_file_handles = {self.filename.format(file_no): tmp_file_handle}

for row in cursor:
    # Convert datetime objects to utc seconds, and decimals to floats
    row = map(self.convert_types, row)
    row_dict = dict(zip(schema, row))
{code}

2.
The file is opened as binary, but strings are written to it, producing the 
error `a bytes-like object is required, not 'str'`. Use mode='w' instead.

3.
The operator currently does not support binary columns in MySQL. We should 
support uploading binary columns from MySQL to cloud storage, as it's a pretty 
common use case. 
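
One way to address item 1, sketched against the snippet above (`cursor` and 
`self` come from the surrounding operator code): materialize the schema 
before the row loop so it survives repeated zips.

{code:python}
# a list comprehension turns the one-shot Python 3 map iterator into a
# real list, so the schema can be zipped against every row
schema = [schema_tuple[0] for schema_tuple in cursor.description]

for row in cursor:
    # Convert datetime objects to utc seconds, and decimals to floats
    row = map(self.convert_types, row)
    row_dict = dict(zip(schema, row))
{code}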

  was:
1. 
In Python 3, map(...) returns an iterator, which can only be iterated over 
once. 
Therefore the current implementation exhausts the schema after the first row, 
and every subsequent iteration sees an empty list:

{code}
schema = map(lambda schema_tuple: schema_tuple[0], cursor.description)
file_no = 0
tmp_file_handle = NamedTemporaryFile(delete=True)
tmp_file_handles = {self.filename.format(file_no): tmp_file_handle}

for row in cursor:
    # Convert datetime objects to utc seconds, and decimals to floats
    row = map(self.convert_types, row)
    row_dict = dict(zip(schema, row))
{code}

2.
The file is opened as binary, but strings are written to it, producing the 
error `a bytes-like object is required, not 'str'`. Use mode='w' instead.

3. Update:
The operator currently does not support binary columns in MySQL. We should 
support uploading binary columns from MySQL to cloud storage, as it's a pretty 
common use case. 


> Make MySqlToGoogleCloudStorageOperator compatible with python3
> ---
>
> Key: AIRFLOW-1613
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1613
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Reporter: Joy Gao
>Assignee: Joy Gao
> Fix For: 1.9.0
>
>
> 1. 
> In Python 3, map(...) returns an iterator, which can only be iterated over 
> once. 
> Therefore the current implementation exhausts the schema after the first 
> row, and every subsequent iteration sees an empty list:
> {code}
> schema = map(lambda schema_tuple: schema_tuple[0], cursor.description)
> file_no = 0
> tmp_file_handle = NamedTemporaryFile(delete=True)
> tmp_file_handles = {self.filename.format(file_no): tmp_file_handle}
> for row in cursor:
>     # Convert datetime objects to utc seconds, and decimals to floats
>     row = map(self.convert_types, row)
>     row_dict = dict(zip(schema, row))
> {code}
> 2.
> The file is opened as binary, but strings are written to it, producing the 
> error `a bytes-like object is required, not 'str'`. Use mode='w' instead.
> 3.
> The operator currently does not support binary columns in MySQL. We should 
> support uploading binary columns from MySQL to cloud storage, as it's a 
> pretty common use case. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Reopened] (AIRFLOW-1613) Make MySqlToGoogleCloudStorageOperator compatible with python3

2017-10-10 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao reopened AIRFLOW-1613:
--

> Make MySqlToGoogleCloudStorageOperator compatible with python3
> ---
>
> Key: AIRFLOW-1613
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1613
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Reporter: Joy Gao
>Assignee: Joy Gao
> Fix For: 1.9.0
>
>
> 1. 
> In Python 3, map(...) returns an iterator, which can only be iterated over 
> once. 
> Therefore the current implementation exhausts the schema after the first 
> row, and every subsequent iteration sees an empty list:
> {code}
> schema = map(lambda schema_tuple: schema_tuple[0], cursor.description)
> file_no = 0
> tmp_file_handle = NamedTemporaryFile(delete=True)
> tmp_file_handles = {self.filename.format(file_no): tmp_file_handle}
> for row in cursor:
>     # Convert datetime objects to utc seconds, and decimals to floats
>     row = map(self.convert_types, row)
>     row_dict = dict(zip(schema, row))
> {code}
> 2.
> The file is opened as binary, but strings are written to it, producing the 
> error `a bytes-like object is required, not 'str'`. Use mode='w' instead.
> 3. Update:
> The operator currently does not support binary columns in MySQL. We should 
> support uploading binary columns from MySQL to cloud storage, as it's a 
> pretty common use case. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1613) Make MySqlToGoogleCloudStorageOperator compatible with python3

2017-10-10 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao updated AIRFLOW-1613:
-
Description: 
1. 
In Python 3, map(...) returns an iterator, which can only be iterated over 
once. 
Therefore the current implementation exhausts the schema after the first row, 
and every subsequent iteration sees an empty list:

{code}
schema = map(lambda schema_tuple: schema_tuple[0], cursor.description)
file_no = 0
tmp_file_handle = NamedTemporaryFile(delete=True)
tmp_file_handles = {self.filename.format(file_no): tmp_file_handle}

for row in cursor:
    # Convert datetime objects to utc seconds, and decimals to floats
    row = map(self.convert_types, row)
    row_dict = dict(zip(schema, row))
{code}

2.
The file is opened as binary, but strings are written to it, producing the 
error `a bytes-like object is required, not 'str'`. Use mode='w' instead.

3. Update:
The operator currently does not support binary columns in MySQL. We should 
support uploading binary columns from MySQL to cloud storage, as it's a pretty 
common use case. 

  was:
1. 
In Python 3, map(...) returns an iterator, which can only be iterated over 
once. 
Therefore the current implementation exhausts the schema after the first row, 
and every subsequent iteration sees an empty list:

{code}
schema = map(lambda schema_tuple: schema_tuple[0], cursor.description)
file_no = 0
tmp_file_handle = NamedTemporaryFile(delete=True)
tmp_file_handles = {self.filename.format(file_no): tmp_file_handle}

for row in cursor:
    # Convert datetime objects to utc seconds, and decimals to floats
    row = map(self.convert_types, row)
    row_dict = dict(zip(schema, row))
{code}

2.
The file is opened as binary, but strings are written to it, producing the 
error `a bytes-like object is required, not 'str'`. Use mode='w' instead.

3. Update:
Currently All 


> Make MySqlToGoogleCloudStorageOperator compatible with python3
> ---
>
> Key: AIRFLOW-1613
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1613
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Reporter: Joy Gao
>Assignee: Joy Gao
> Fix For: 1.9.0
>
>
> 1. 
> In Python 3, map(...) returns an iterator, which can only be iterated over 
> once. 
> Therefore the current implementation exhausts the schema after the first 
> row, and every subsequent iteration sees an empty list:
> {code}
> schema = map(lambda schema_tuple: schema_tuple[0], cursor.description)
> file_no = 0
> tmp_file_handle = NamedTemporaryFile(delete=True)
> tmp_file_handles = {self.filename.format(file_no): tmp_file_handle}
> for row in cursor:
>     # Convert datetime objects to utc seconds, and decimals to floats
>     row = map(self.convert_types, row)
>     row_dict = dict(zip(schema, row))
> {code}
> 2.
> The file is opened as binary, but strings are written to it, producing the 
> error `a bytes-like object is required, not 'str'`. Use mode='w' instead.
> 3. Update:
> The operator currently does not support binary columns in MySQL. We should 
> support uploading binary columns from MySQL to cloud storage, as it's a 
> pretty common use case. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1613) Make MySqlToGoogleCloudStorageOperator compatible with python3

2017-10-10 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao updated AIRFLOW-1613:
-
Description: 
1. 
In Python 3, map(...) returns an iterator, which can only be iterated over 
once. 
Therefore the current implementation exhausts the schema after the first row, 
and every subsequent iteration sees an empty list:

{code}
schema = map(lambda schema_tuple: schema_tuple[0], cursor.description)
file_no = 0
tmp_file_handle = NamedTemporaryFile(delete=True)
tmp_file_handles = {self.filename.format(file_no): tmp_file_handle}

for row in cursor:
    # Convert datetime objects to utc seconds, and decimals to floats
    row = map(self.convert_types, row)
    row_dict = dict(zip(schema, row))
{code}

2.
The file is opened as binary, but strings are written to it, producing the 
error `a bytes-like object is required, not 'str'`. Use mode='w' instead.

3. Update:
Currently All 

  was:
1. 
In Python 3, map(...) returns an iterator, which can only be iterated over 
once. 
Therefore the current implementation exhausts the schema after the first row, 
and every subsequent iteration sees an empty list:

{code}
schema = map(lambda schema_tuple: schema_tuple[0], cursor.description)
file_no = 0
tmp_file_handle = NamedTemporaryFile(delete=True)
tmp_file_handles = {self.filename.format(file_no): tmp_file_handle}

for row in cursor:
    # Convert datetime objects to utc seconds, and decimals to floats
    row = map(self.convert_types, row)
    row_dict = dict(zip(schema, row))
{code}

2.
The file is opened as binary, but strings are written to it, producing the 
error `a bytes-like object is required, not 'str'`. Use mode='w' instead.



> Make MySqlToGoogleCloudStorageOperator compatible with python3
> ---
>
> Key: AIRFLOW-1613
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1613
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Reporter: Joy Gao
>Assignee: Joy Gao
> Fix For: 1.9.0
>
>
> 1. 
> In Python 3, map(...) returns an iterator, which can only be iterated over 
> once. 
> Therefore the current implementation exhausts the schema after the first 
> row, and every subsequent iteration sees an empty list:
> {code}
> schema = map(lambda schema_tuple: schema_tuple[0], cursor.description)
> file_no = 0
> tmp_file_handle = NamedTemporaryFile(delete=True)
> tmp_file_handles = {self.filename.format(file_no): tmp_file_handle}
> for row in cursor:
>     # Convert datetime objects to utc seconds, and decimals to floats
>     row = map(self.convert_types, row)
>     row_dict = dict(zip(schema, row))
> {code}
> 2.
> The file is opened as binary, but strings are written to it, producing the 
> error `a bytes-like object is required, not 'str'`. Use mode='w' instead.
> 3. Update:
> Currently All 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1671) Missing @apply_defaults annotation for gcs operator

2017-10-02 Thread Joy Gao (JIRA)
Joy Gao created AIRFLOW-1671:


 Summary: Missing @apply_defaults annotation for gcs operator
 Key: AIRFLOW-1671
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1671
 Project: Apache Airflow
  Issue Type: Bug
  Components: operators
Affects Versions: 1.9.0
Reporter: Joy Gao
Assignee: Joy Gao
 Fix For: 1.9.0


The @apply_defaults annotation appears to have been accidentally removed in a 
previous PR. It should be added back. 
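
For context, a minimal sketch of how the decorator sits on an operator's 
constructor (the operator class and its argument are hypothetical):

{code:python}
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults

class MyGcsOperator(BaseOperator):
    # apply_defaults fills in missing constructor arguments from the
    # DAG's default_args before __init__ runs; dropping the decorator
    # silently skips that merging.
    @apply_defaults
    def __init__(self, bucket, *args, **kwargs):
        super(MyGcsOperator, self).__init__(*args, **kwargs)
        self.bucket = bucket
{code}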



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1664) Make MySqlToGoogleCloudStorageOperator support binary data again

2017-09-29 Thread Joy Gao (JIRA)
Joy Gao created AIRFLOW-1664:


 Summary: Make MySqlToGoogleCloudStorageOperator support binary 
data again
 Key: AIRFLOW-1664
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1664
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Joy Gao
Assignee: Joy Gao


The default NamedTemporaryFile mode is `w+b`; this was modified to `w` in 
https://github.com/apache/incubator-airflow/pull/2609. This caused a 
regression for Python 2.x Airflow environments, which could no longer support 
binary types in MySQL. 
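
A minimal illustration of the mode difference (Python 3 semantics shown; this 
is not the operator code itself):

{code:python}
from tempfile import NamedTemporaryFile

# the default mode 'w+b' accepts raw bytes, as binary columns need
with NamedTemporaryFile(delete=True) as f:
    f.write(b"\x00\x01\x02")  # OK

# text mode 'w' only accepts str; bytes raise a TypeError on Python 3
with NamedTemporaryFile(mode="w", delete=True) as f:
    f.write(u"plain text")    # OK
    # f.write(b"\x00\x01")    # TypeError: write() argument must be str
{code}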



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1659) Fix invalid attribute bug in FileTaskHandler

2017-09-28 Thread Joy Gao (JIRA)
Joy Gao created AIRFLOW-1659:


 Summary: Fix invalid attribute bug in FileTaskHandler
 Key: AIRFLOW-1659
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1659
 Project: Apache Airflow
  Issue Type: Bug
  Components: logging
Reporter: Joy Gao
Assignee: Joy Gao
 Fix For: 1.9.0


The following line of code is failing in FileTaskHandler

{code}
response = requests.get(url, timeout=self.timeout)
{code}

`self.timeout` is not a valid attribute; the local variable `timeout` should 
be used instead.
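
A sketch of the one-line fix (the URL and timeout value are placeholders; in 
the handler both are assembled earlier in the same method):

{code:python}
import requests

url = "http://worker-host:8793/log/some_dag/some_task/2017-09-28"  # placeholder
timeout = 60  # local variable computed earlier, not an attribute

response = requests.get(url, timeout=timeout)  # was: timeout=self.timeout
{code}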



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AIRFLOW-1642) An Alembic script not using scoped session causing deadlock

2017-09-25 Thread Joy Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179769#comment-16179769
 ] 

Joy Gao commented on AIRFLOW-1642:
--

Ah, just checked: it's not in the 1.8.2 release. Looks like this can be fixed, 
yay!  

> An Alembic script not using scoped session causing deadlock
> ---
>
> Key: AIRFLOW-1642
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1642
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Joy Gao
>Priority: Minor
>
> The bug I'm about to describe is more of an obscure edge case; however, I 
> think it's still worth fixing.
> After upgrading to airflow 1.9, while running `airflow resetdb` on my local 
> machine (with mysql), I encountered a deadlock on the final alembic revision 
> _d2ae31099d61 Increase text size for MySQL (not relevant for other DBs' text 
> types)_.
> The deadlock turned out to be caused by another earlier session that was 
> created and left open in revision _cc1e65623dc7 add max tries column to task 
> instance_. Notably the code below:
> {code}
> sessionmaker = sa.orm.sessionmaker()
> session = sessionmaker(bind=connection)
> dagbag = DagBag(settings.DAGS_FOLDER)
> {code}
> The session created here was not a `scoped_session`, so when the DAGs were 
> being parsed in line 3 above, one of the DAG files made a direct call to the 
> class method `Variable.get()` to acquire an env variable. That call issued a 
> db query against the `variable` table but raised a KeyError, as the env 
> variable was non-existent, thus holding the lock on the `variable` table as 
> a result of that exception.
> Later on, the latter alembic script `_cc1e65623dc7` needs to alter the 
> `Variable` table. Instead of creating its own Session object, it attempts to 
> reuse the same one as above. And because of the exception, it waits 
> indefinitely to acquire the lock on that table. 
> So the DAG file itself could have avoided the KeyError by providing a default 
> value when calling Variable.get(). However, I think it would be a good idea 
> to avoid using unscoped sessions in general, as an exception could occur 
> elsewhere in the future. The easiest fix is replacing *session = 
> sessionmaker(bind=connection)* with *session = settings.Session()*, which is 
> scoped. However, making a change to a migration script is going to make 
> folks anxious.
> If anyone has any thoughts on this, let me know! Thanks :)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1642) An Alembic script not using scoped session causing deadlock

2017-09-25 Thread Joy Gao (JIRA)
Joy Gao created AIRFLOW-1642:


 Summary: An Alembic script not using scoped session causing 
deadlock
 Key: AIRFLOW-1642
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1642
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Joy Gao
Priority: Minor


The bug I'm about to describe is more of an obscure edge case; however, I 
think it's still worth fixing.

After upgrading to airflow 1.9, while running `airflow resetdb` on my local 
machine (with mysql), I encountered a deadlock on the final alembic revision 
_d2ae31099d61 Increase text size for MySQL (not relevant for other DBs' text 
types)_.

The deadlock turned out to be caused by another earlier session that was 
created and left open in revision _cc1e65623dc7 add max tries column to task 
instance_. Notably the code below:

{code}
sessionmaker = sa.orm.sessionmaker()
session = sessionmaker(bind=connection)
dagbag = DagBag(settings.DAGS_FOLDER)
{code}

The session created here was not a `scoped_session`, so when the DAGs were 
being parsed in line 3 above, one of the DAG files made a direct call to the 
class method `Variable.get()` to acquire an env variable. That call issued a 
db query against the `variable` table but raised a KeyError, as the env 
variable was non-existent, thus holding the lock on the `variable` table as a 
result of that exception.

Later on, the latter alembic script `_cc1e65623dc7` needs to alter the 
`Variable` table. Instead of creating its own Session object, it attempts to 
reuse the same one as above. And because of the exception, it waits 
indefinitely to acquire the lock on that table. 

So the DAG file itself could have avoided the KeyError by providing a default 
value when calling Variable.get(). However, I think it would be a good idea to 
avoid using unscoped sessions in general, as an exception could occur 
elsewhere in the future. The easiest fix is replacing *session = 
sessionmaker(bind=connection)* with *session = settings.Session()*, which is 
scoped. However, making a change to a migration script is going to make folks 
anxious.

If anyone has any thoughts on this, let me know! Thanks :)
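
A sketch of the suggested replacement (error handling abbreviated; this shows 
the idea, not the actual migration diff):

{code:python}
from airflow import settings
from airflow.models import DagBag

# settings.Session() is a scoped session, so an exception raised while
# parsing DAGs no longer strands an open transaction that later
# revisions would deadlock against
session = settings.Session()
try:
    dagbag = DagBag(settings.DAGS_FOLDER)
    # ... migration work using `session` ...
    session.commit()
finally:
    session.close()
{code}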



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1613) Make MySqlToGoogleCloudStorageOperator compatible with python3

2017-09-14 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao updated AIRFLOW-1613:
-
Description: 
1. 
In Python 3, map(...) returns an iterator, which can only be iterated over 
once. 
Therefore the current implementation exhausts the schema after the first row, 
and every subsequent iteration sees an empty list:

{code}
schema = map(lambda schema_tuple: schema_tuple[0], cursor.description)
file_no = 0
tmp_file_handle = NamedTemporaryFile(delete=True)
tmp_file_handles = {self.filename.format(file_no): tmp_file_handle}

for row in cursor:
    # Convert datetime objects to utc seconds, and decimals to floats
    row = map(self.convert_types, row)
    row_dict = dict(zip(schema, row))
{code}

2.
The file is opened as binary, but strings are written to it, producing the 
error `a bytes-like object is required, not 'str'`. Use mode='w' instead.


  was:
In Python 3, map(...) returns an iterator, which can only be iterated over 
once. 
Therefore the current implementation exhausts the schema after the first row, 
and every subsequent iteration sees an empty list:


{code}
schema = map(lambda schema_tuple: schema_tuple[0], cursor.description)
file_no = 0
tmp_file_handle = NamedTemporaryFile(delete=True)
tmp_file_handles = {self.filename.format(file_no): tmp_file_handle}

for row in cursor:
    # Convert datetime objects to utc seconds, and decimals to floats
    row = map(self.convert_types, row)
    row_dict = dict(zip(schema, row))
{code}

Moving it inside the loop for re-use.


> Make MySqlToGoogleCloudStorageOperator compatible with python3
> ---
>
> Key: AIRFLOW-1613
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1613
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Reporter: Joy Gao
>
> 1. 
> In Python 3, map(...) returns an iterator, which can only be iterated over 
> once. 
> Therefore the current implementation exhausts the schema after the first 
> row, and every subsequent iteration sees an empty list:
> {code}
> schema = map(lambda schema_tuple: schema_tuple[0], cursor.description)
> file_no = 0
> tmp_file_handle = NamedTemporaryFile(delete=True)
> tmp_file_handles = {self.filename.format(file_no): tmp_file_handle}
> for row in cursor:
>     # Convert datetime objects to utc seconds, and decimals to floats
>     row = map(self.convert_types, row)
>     row_dict = dict(zip(schema, row))
> {code}
> 2.
> The file is opened as binary, but strings are written to it, producing the 
> error `a bytes-like object is required, not 'str'`. Use mode='w' instead.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1613) Make MySqlToGoogleCloudStorageOperator compatible with python3

2017-09-14 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao updated AIRFLOW-1613:
-
Description: 
In Python 3, map(...) returns an iterator, which can only be iterated over 
once. 
Therefore the current implementation exhausts the schema after the first row, 
and every subsequent iteration sees an empty list:


{code:python}
schema = map(lambda schema_tuple: schema_tuple[0], cursor.description)
file_no = 0
tmp_file_handle = NamedTemporaryFile(delete=True)
tmp_file_handles = {self.filename.format(file_no): tmp_file_handle}

for row in cursor:
    # Convert datetime objects to utc seconds, and decimals to floats
    row = map(self.convert_types, row)
    row_dict = dict(zip(schema, row))
{code}

Moving it inside the loop for re-use.

  was:
In Python 3, map(...) returns an iterator, which can only be iterated over 
once. 
Therefore the current implementation exhausts the schema after the first row, 
and every subsequent iteration sees an empty list:


{code:python}
schema = map(lambda schema_tuple: schema_tuple[0], cursor.description)
file_no = 0
tmp_file_handle = NamedTemporaryFile(delete=True)
tmp_file_handles = {self.filename.format(file_no): tmp_file_handle}

for row in cursor:
    # Convert datetime objects to utc seconds, and decimals to floats
    row = map(self.convert_types, row)
    row_dict = dict(zip(schema, row))
{code}

Moving it inside the loop for re-use.


> Make MySqlToGoogleCloudStorageOperator compatible with python3
> ---
>
> Key: AIRFLOW-1613
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1613
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Reporter: Joy Gao
>
> In Python 3, map(...) returns an iterator, which can only be iterated over 
> once. 
> Therefore the current implementation exhausts the schema after the first 
> row, and every subsequent iteration sees an empty list:
> {code:python}
> schema = map(lambda schema_tuple: schema_tuple[0], cursor.description)
> file_no = 0
> tmp_file_handle = NamedTemporaryFile(delete=True)
> tmp_file_handles = {self.filename.format(file_no): tmp_file_handle}
> for row in cursor:
>     # Convert datetime objects to utc seconds, and decimals to floats
>     row = map(self.convert_types, row)
>     row_dict = dict(zip(schema, row))
> {code}
> Moving it inside the loop for re-use.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1613) Make MySqlToGoogleCloudStorageOperator compatible with python3

2017-09-14 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao updated AIRFLOW-1613:
-
Description: 
In Python 3, map(...) returns an iterator, which can only be iterated over 
once. 
Therefore the current implementation exhausts the schema after the first row, 
and every subsequent iteration sees an empty list:


{code}
schema = map(lambda schema_tuple: schema_tuple[0], cursor.description)
file_no = 0
tmp_file_handle = NamedTemporaryFile(delete=True)
tmp_file_handles = {self.filename.format(file_no): tmp_file_handle}

for row in cursor:
    # Convert datetime objects to utc seconds, and decimals to floats
    row = map(self.convert_types, row)
    row_dict = dict(zip(schema, row))
{code}

Moving it inside the loop for re-use.

  was:
In Python 3, map(...) returns an iterator, which can only be iterated over 
once. 
Therefore the current implementation exhausts the schema after the first row, 
and every subsequent iteration sees an empty list:


{code:python}
schema = map(lambda schema_tuple: schema_tuple[0], cursor.description)
file_no = 0
tmp_file_handle = NamedTemporaryFile(delete=True)
tmp_file_handles = {self.filename.format(file_no): tmp_file_handle}

for row in cursor:
    # Convert datetime objects to utc seconds, and decimals to floats
    row = map(self.convert_types, row)
    row_dict = dict(zip(schema, row))
{code}

Moving it inside the loop for re-use.


> Make MySqlToGoogleCloudStorageOperator compatible with python3
> ---
>
> Key: AIRFLOW-1613
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1613
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Reporter: Joy Gao
>
> In Python 3, map(...) returns an iterator, which can only be iterated over 
> once. 
> Therefore the current implementation exhausts the schema after the first 
> row, and every subsequent iteration sees an empty list:
> {code}
> schema = map(lambda schema_tuple: schema_tuple[0], cursor.description)
> file_no = 0
> tmp_file_handle = NamedTemporaryFile(delete=True)
> tmp_file_handles = {self.filename.format(file_no): tmp_file_handle}
> for row in cursor:
>     # Convert datetime objects to utc seconds, and decimals to floats
>     row = map(self.convert_types, row)
>     row_dict = dict(zip(schema, row))
> {code}
> Moving it inside the loop for re-use.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1613) Make MySqlToGoogleCloudStorageOperator compatible with python3

2017-09-14 Thread Joy Gao (JIRA)
Joy Gao created AIRFLOW-1613:


 Summary: Make MySqlToGoogleCloudStorageOperator compatible with 
python3
 Key: AIRFLOW-1613
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1613
 Project: Apache Airflow
  Issue Type: Bug
  Components: contrib
Reporter: Joy Gao


In Python 3, map(...) returns an iterator, which can only be iterated over 
once. 
Therefore the current implementation exhausts the schema after the first row, 
and every subsequent iteration sees an empty list:


{code:python}
schema = map(lambda schema_tuple: schema_tuple[0], cursor.description)
file_no = 0
tmp_file_handle = NamedTemporaryFile(delete=True)
tmp_file_handles = {self.filename.format(file_no): tmp_file_handle}

for row in cursor:
    # Convert datetime objects to utc seconds, and decimals to floats
    row = map(self.convert_types, row)
    row_dict = dict(zip(schema, row))
{code}

Moving it inside the loop for re-use.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1568) Add datastore import/export operator

2017-09-05 Thread Joy Gao (JIRA)
Joy Gao created AIRFLOW-1568:


 Summary: Add datastore import/export operator
 Key: AIRFLOW-1568
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1568
 Project: Apache Airflow
  Issue Type: New Feature
  Components: contrib
Reporter: Joy Gao
Assignee: Joy Gao


Google recently introduced import/export APIs for Cloud Datastore 
(https://cloud.google.com/datastore/docs/reference/rest/), which allow 
Datastore entities to be backed up programmatically. It would be useful to 
introduce operators to handle this. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1474) Add dag_id regex for 'airflow clear' CLI command

2017-07-28 Thread Joy Gao (JIRA)
Joy Gao created AIRFLOW-1474:


 Summary: Add dag_id regex for 'airflow clear' CLI command
 Key: AIRFLOW-1474
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1474
 Project: Apache Airflow
  Issue Type: Improvement
  Components: cli
Reporter: Joy Gao
Assignee: Joy Gao
Priority: Minor


The 'airflow clear' CLI command is currently limited to clearing a single DAG 
per operation. It would be useful to add the capability to clear multiple DAGs 
per operation using regex, similar to how task_id can be filtered via regex.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1433) Convert Airflow to Use FAB Framework

2017-07-19 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao updated AIRFLOW-1433:
-
Description: 
The authentication capabilities in the RBAC design proposal introduces a 
significant amount of work that is otherwise already built-in in existing 
frameworks. 

Per community discussion, Flask-AppBuilder (FAB) is the best fit for Airflow as 
a foundation to implementing RBAC. This will support integration with different 
authentication backends out-of-the-box, and generate permissions for views and 
ORM models that will simplify view-level and dag-level access control.

This implies modifying the current flask views, and deprecating the current 
Flask-Admin in favor of FAB's crud.

  was:
The authentication capabilities in the RBAC design proposal introduces a 
significant amount of work that is otherwise already built-in in existing 
frameworks. 

Per community discussion, Flask-AppBuilder (FAB) is the best fit for Airflow as 
a foundation to implementing RBAC. This will support integration with different 
authentication backends out-of-the-box, and generate permissions for views and 
ORM models that will simplify view-level and dag-level access control.


> Convert Airflow to Use FAB Framework
> 
>
> Key: AIRFLOW-1433
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1433
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Joy Gao
>Assignee: Joy Gao
>
> The authentication capabilities in the RBAC design proposal introduces a 
> significant amount of work that is otherwise already built-in in existing 
> frameworks. 
> Per community discussion, Flask-AppBuilder (FAB) is the best fit for Airflow 
> as a foundation to implementing RBAC. This will support integration with 
> different authentication backends out-of-the-box, and generate permissions 
> for views and ORM models that will simplify view-level and dag-level access 
> control.
> This implies modifying the current flask views, and deprecating the current 
> Flask-Admin in favor of FAB's crud.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1433) Convert Airflow to Use FAB Framework

2017-07-19 Thread Joy Gao (JIRA)
Joy Gao created AIRFLOW-1433:


 Summary: Convert Airflow to Use FAB Framework
 Key: AIRFLOW-1433
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1433
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Joy Gao
Assignee: Joy Gao


The authentication capabilities in the RBAC design proposal introduces a 
significant amount of work that is otherwise already built-in in existing 
frameworks. 

Per community discussion, Flask-AppBuilder (FAB) is the best fit for Airflow as 
a foundation to implementing RBAC. This will support integration with different 
authentication backends out-of-the-box, and generate permissions for views and 
ORM models that will simplify view-level and dag-level access control.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (AIRFLOW-85) Create DAGs UI

2017-07-19 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao reassigned AIRFLOW-85:
--

Assignee: Joy Gao

> Create DAGs UI
> --
>
> Key: AIRFLOW-85
> URL: https://issues.apache.org/jira/browse/AIRFLOW-85
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security, ui
>Reporter: Chris Riccomini
>Assignee: Joy Gao
>
> Airflow currently provides only an {{/admin}} UI interface for the webapp. 
> This UI provides three distinct roles:
> * Admin
> * Data profiler
> * None
> In addition, Airflow currently provides the ability to log in, either via a 
> secure proxy front-end, or via LDAP/Kerberos, within the webapp.
> We run Airflow with LDAP authentication enabled. This helps us control access 
> to the UI. However, there is insufficient granularity within the UI. We would 
> like to be able to grant users the ability to:
> # View their DAGs, but no one else's.
> # Control their DAGs, but no one else's.
> This is not possible right now. You can take away the ability to access the 
> connections and data profiling tabs, but users can still see all DAGs, as 
> well as control the state of the DB by clearing any DAG status, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-430) Support list/add/delete connections in the CLI

2016-08-14 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao updated AIRFLOW-430:

External issue URL: https://github.com/apache/incubator-airflow/pull/1734

> Support list/add/delete connections in the CLI
> --
>
> Key: AIRFLOW-430
> URL: https://issues.apache.org/jira/browse/AIRFLOW-430
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: cli
>Reporter: Joy Gao
>Assignee: Joy Gao
>
> Right now the only way to manage connections is via the UI's connections 
> page.
> To allow connection management via scripts, it would be useful to support 
> these features via the CLI as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (AIRFLOW-430) Support list/add/delete connections in the CLI

2016-08-14 Thread Joy Gao (JIRA)
Joy Gao created AIRFLOW-430:
---

 Summary: Support list/add/delete connections in the CLI
 Key: AIRFLOW-430
 URL: https://issues.apache.org/jira/browse/AIRFLOW-430
 Project: Apache Airflow
  Issue Type: Improvement
  Components: cli
Reporter: Joy Gao
Assignee: Joy Gao


Right now the only way to manage connections is via the UI's connections page.
To allow connection management via scripts, it would be useful to support 
these features via the CLI as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (AIRFLOW-297) Exponential Backoff Retry Delay

2016-06-30 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao updated AIRFLOW-297:

Summary: Exponential Backoff Retry Delay  (was: Exponential Backoff)

> Exponential Backoff Retry Delay
> ---
>
> Key: AIRFLOW-297
> URL: https://issues.apache.org/jira/browse/AIRFLOW-297
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Joy Gao
>Assignee: Joy Gao
>Priority: Minor
>
> The retry delay time is currently fixed. It would be a useful option to 
> support progressively longer waits between retries via exponential backoff.
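
For illustration, a sketch of such a doubling schedule (the names and the 
doubling factor are assumptions, not Airflow's actual implementation):

{code:python}
from datetime import timedelta

def backoff_delay(base_delay, try_number):
    # double the base delay on each successive retry attempt
    # (try_number is 1-indexed: the first retry waits base_delay)
    return base_delay * (2 ** (try_number - 1))

assert backoff_delay(timedelta(minutes=5), 1) == timedelta(minutes=5)
assert backoff_delay(timedelta(minutes=5), 3) == timedelta(minutes=20)
{code}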



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

