[jira] [Updated] (AIRFLOW-3109) Default user permission should contain 'can_clear'
[ https://issues.apache.org/jira/browse/AIRFLOW-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao updated AIRFLOW-3109: - Description: The default user role is missing the 'can_clear' permission, which allows users to clear DAG runs. (was: There's a bug in the default user permission. 'clear' should have been 'can_clear' as FAB automatically prepends model permissions with the 'can_' prefix.) > Default user permission should contain 'can_clear' > -- > > Key: AIRFLOW-3109 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3109 > Project: Apache Airflow > Issue Type: Bug >Reporter: Joy Gao >Assignee: Joy Gao >Priority: Major > > The default user role is missing the 'can_clear' permission, which allows users to > clear DAG runs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-3109) Default user permission should contain 'can_clear'
[ https://issues.apache.org/jira/browse/AIRFLOW-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao updated AIRFLOW-3109: - Summary: Default user permission should contain 'can_clear' (was: Default user permission should be 'can_clear' instead of 'clear') > Default user permission should contain 'can_clear' > -- > > Key: AIRFLOW-3109 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3109 > Project: Apache Airflow > Issue Type: Bug >Reporter: Joy Gao >Assignee: Joy Gao >Priority: Major > > There's a bug in the default user permission. 'clear' should have been > 'can_clear' as FAB automatically prepends model permissions with the 'can_' prefix. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-3109) Default user permission should be 'can_clear' instead of 'clear'
Joy Gao created AIRFLOW-3109: Summary: Default user permission should be 'can_clear' instead of 'clear' Key: AIRFLOW-3109 URL: https://issues.apache.org/jira/browse/AIRFLOW-3109 Project: Apache Airflow Issue Type: Bug Reporter: Joy Gao Assignee: Joy Gao There's a bug in the default user permission. 'clear' should have been 'can_clear' as FAB automatically prepends model permissions with the 'can_' prefix. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
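The 'can_' convention behind this ticket can be sketched as follows; the helper below is hypothetical (not FAB's or Airflow's actual code) and only illustrates how Flask-AppBuilder derives a permission name from a view method name:

```python
# Hypothetical helper illustrating Flask-AppBuilder's convention of
# prepending "can_" to a view method name to form its permission name.
def fab_permission_name(method_name):
    """Return the 'can_'-prefixed permission name FAB would check."""
    if method_name.startswith("can_"):
        return method_name
    return "can_" + method_name

# A default-role definition therefore has to grant "can_clear", not "clear".
assert fab_permission_name("clear") == "can_clear"
```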
[jira] [Resolved] (AIRFLOW-3072) Only admin can view logs in RBAC UI
[ https://issues.apache.org/jira/browse/AIRFLOW-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-3072. -- Resolution: Fixed Fix Version/s: 1.10.1 > Only admin can view logs in RBAC UI > --- > > Key: AIRFLOW-3072 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3072 > Project: Apache Airflow > Issue Type: Bug > Components: ui >Affects Versions: 1.10.0 >Reporter: Stefan Seelmann >Assignee: Stefan Seelmann >Priority: Major > Fix For: 1.10.1 > > > With RBAC enabled, only users with the admin role can view logs. > The default roles (excluding public) include the permission {{can_log}}, which > allows opening the /log page; however, the actual log message is loaded with > another XHR request which requires the additional permission > {{get_logs_with_metadata}}. > My suggestion is to add the permission and assign it to the viewer role. Or is > there a reason why only admin should be able to see logs? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-3085) Log viewing not possible in default RBAC setting
[ https://issues.apache.org/jira/browse/AIRFLOW-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620871#comment-16620871 ] Joy Gao commented on AIRFLOW-3085: -- oops, thanks! > Log viewing not possible in default RBAC setting > > > Key: AIRFLOW-3085 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3085 > Project: Apache Airflow > Issue Type: Bug >Reporter: Joy Gao >Priority: Major > > Aside from Admin role, all other roles are not able to view logs right now > due to a missing permission in the default setting. The permission should be > added to Viewer/User/Op as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (AIRFLOW-3085) Log viewing not possible in default RBAC setting
[ https://issues.apache.org/jira/browse/AIRFLOW-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao closed AIRFLOW-3085. Resolution: Duplicate > Log viewing not possible in default RBAC setting > > > Key: AIRFLOW-3085 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3085 > Project: Apache Airflow > Issue Type: Bug >Reporter: Joy Gao >Priority: Major > > Aside from Admin role, all other roles are not able to view logs right now > due to a missing permission in the default setting. The permission should be > added to Viewer/User/Op as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-3085) Log viewing not possible in default RBAC setting
Joy Gao created AIRFLOW-3085: Summary: Log viewing not possible in default RBAC setting Key: AIRFLOW-3085 URL: https://issues.apache.org/jira/browse/AIRFLOW-3085 Project: Apache Airflow Issue Type: Bug Reporter: Joy Gao Aside from the Admin role, no other role is able to view logs right now due to a missing permission in the default setting. The permission should be added to Viewer/User/Op as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2604) dag_id, task_id, execution_date in dag_fail should be indexed
[ https://issues.apache.org/jira/browse/AIRFLOW-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-2604. -- Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3539 [https://github.com/apache/incubator-airflow/pull/3539] > dag_id, task_id, execution_date in dag_fail should be indexed > - > > Key: AIRFLOW-2604 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2604 > Project: Apache Airflow > Issue Type: Improvement >Affects Versions: 1.10 >Reporter: Joy Gao >Assignee: Stefan Seelmann >Priority: Major > Fix For: 1.10.0 > > > As a follow-up to AIRFLOW-2602, we should index dag_id, task_id and > execution_date to make sure the /gantt page (and any other future UIs relying > on task_fail) can still be rendered quickly as the table grows in size. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
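The effect of the proposed composite index can be sketched with stdlib sqlite3; the simplified table layout and index name below are illustrative, not the actual Airflow migration:

```python
# Sketch: a composite index on (dag_id, task_id, execution_date) so
# /gantt-style lookups avoid a full table scan as task_fail grows.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE task_fail (
        id INTEGER PRIMARY KEY,
        dag_id TEXT, task_id TEXT, execution_date TEXT, duration INTEGER
    )
""")
conn.execute("""
    CREATE INDEX idx_task_fail_dag_task_date
    ON task_fail (dag_id, task_id, execution_date)
""")

# The query planner now reports an index search for a gantt-style lookup.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT * FROM task_fail
    WHERE dag_id = ? AND task_id = ? AND execution_date = ?
""", ("example_dag", "example_task", "2018-06-01")).fetchall()
```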
[jira] [Resolved] (AIRFLOW-2678) Fix db scheme unit test to remove checking fab models
[ https://issues.apache.org/jira/browse/AIRFLOW-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-2678. -- Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3548 [https://github.com/apache/incubator-airflow/pull/3548] > Fix db scheme unit test to remove checking fab models > - > > Key: AIRFLOW-2678 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2678 > Project: Apache Airflow > Issue Type: Bug >Reporter: Tao Feng >Assignee: Tao Feng >Priority: Major > Fix For: 1.10.0 > > > Currently Airflow doesn't have FAB models, nor migration scripts for those > models. We should skip checking those models in the unit test. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (AIRFLOW-2624) Airflow webserver broken out of the box
[ https://issues.apache.org/jira/browse/AIRFLOW-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao closed AIRFLOW-2624. Resolution: Fixed Fix Version/s: 1.10.0 > Airflow webserver broken out of the box > --- > > Key: AIRFLOW-2624 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2624 > Project: Apache Airflow > Issue Type: Bug >Reporter: Kevin Yang >Assignee: Kevin Yang >Priority: Blocker > Fix For: 1.10.0 > > > `airflow webserver` and then click on any DAG, I get > ``` > File "/Users/kevin_yang/ext_repos/incubator-airflow/airflow/www/utils.py", > line 364, in view_func > return f(*args, **kwargs) > File "/Users/kevin_yang/ext_repos/incubator-airflow/airflow/www/utils.py", > line 251, in wrapper > user = current_user.user.username > AttributeError: 'NoneType' object has no attribute 'username' > ``` -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2681) Last execution date is not included in UI for externally triggered DAGs
[ https://issues.apache.org/jira/browse/AIRFLOW-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-2681. -- Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3551 [https://github.com/apache/incubator-airflow/pull/3551] > Last execution date is not included in UI for externally triggered DAGs > --- > > Key: AIRFLOW-2681 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2681 > Project: Apache Airflow > Issue Type: Bug >Reporter: David Hatch >Assignee: David Hatch >Priority: Major > Fix For: 1.10.0 > > > If a DAG has no schedule and is only externally triggered, the last run's > execution date is not included in the UI. > > This is because {{include_externally_triggered}} is not passed to > {{get_last_dagrun}} from the {{dags.html}} template. It used to be before > this commit > https://github.com/apache/incubator-airflow/commit/0bf7adb209ce969243ffaf4fc5213ff3957cbbc9#diff-f38558559ea1b4c30ddf132b7f223cf9L299. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2615) Webserver parent not using cached app
[ https://issues.apache.org/jira/browse/AIRFLOW-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-2615. -- Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3506 [https://github.com/apache/incubator-airflow/pull/3506] > Webserver parent not using cached app > - > > Key: AIRFLOW-2615 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2615 > Project: Apache Airflow > Issue Type: Bug >Reporter: Kevin Yang >Assignee: Kevin Yang >Priority: Major > Fix For: 1.10.0 > > > From what I can tell, the app cached > [here|https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L790] > is cached for later use, likely to avoid the expensive > DagBag() creation. Before I dive into the problem of the webserver parsing everything in one > process, I was hoping this cached app would save me some time. However, > it seems to me that every subprocess spun up by gunicorn tries to create > the DagBag() right after it is created, which makes sense since we > don't share the cached app with the subprocesses (I doubt we can). If what I > observed is true, why do we cache the app at all in the parent process? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2602) Show failed attempts in Gantt view
[ https://issues.apache.org/jira/browse/AIRFLOW-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-2602. -- Resolution: Fixed Fix Version/s: (was: Airflow 2.0) 1.10.0 Issue resolved by pull request #3492 [https://github.com/apache/incubator-airflow/pull/3492] > Show failed attempts in Gantt view > -- > > Key: AIRFLOW-2602 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2602 > Project: Apache Airflow > Issue Type: Improvement > Components: webapp >Affects Versions: 1.9.0 >Reporter: Stefan Seelmann >Assignee: Stefan Seelmann >Priority: Major > Fix For: 1.10.0 > > Attachments: Screenshot_2018-06-13_00-13-21.png > > > The Gantt view only shows the last attempt (successful or failed). It would > be nice to also visualize failed attempts. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2654) NotFoundError in refresh button in new FAB UI
[ https://issues.apache.org/jira/browse/AIRFLOW-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-2654. -- Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3527 [https://github.com/apache/incubator-airflow/pull/3527] > NotFoundError in refresh button in new FAB UI > - > > Key: AIRFLOW-2654 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2654 > Project: Apache Airflow > Issue Type: Bug > Components: ui >Affects Versions: 1.10.0, 2.0.0 >Reporter: Kaxil Naik >Assignee: Kaxil Naik >Priority: Major > Fix For: 1.10.0 > > Attachments: airflow-refresh-error.png > > > When you click on the *refresh* button, you get "error: NOT FOUND" as shown > in the image attachment. > The issue is the wrong URL is requested when the refresh button is pressed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2606) Test needed to ensure database schema always match SQLAlchemy model types
[ https://issues.apache.org/jira/browse/AIRFLOW-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-2606. -- Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3516 [https://github.com/apache/incubator-airflow/pull/3516] > Test needed to ensure database schema always match SQLAlchemy model types > - > > Key: AIRFLOW-2606 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2606 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Joy Gao >Assignee: Stefan Seelmann >Priority: Major > Fix For: 1.10.0 > > > An issue was discovered by [this > PR|https://github.com/apache/incubator-airflow/pull/3492#issuecomment-396815203] > where the database schema does not match its corresponding SQLAlchemy model > declaration. We should add a generic unit test for this to prevent similar bugs > from occurring in the future. (Alternatively, we can add the policing logic > to the `airflow upgradedb` command so each migration can do the check.) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-2606) Test needed to ensure database schema always match SQLAlchemy model types
[ https://issues.apache.org/jira/browse/AIRFLOW-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao updated AIRFLOW-2606: - Summary: Test needed to ensure database schema always match SQLAlchemy model types (was: Test needed to ensure database schema always match SQLAlchemy models) > Test needed to ensure database schema always match SQLAlchemy model types > - > > Key: AIRFLOW-2606 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2606 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Joy Gao >Priority: Major > > An issue was discovered by [this > PR|https://github.com/apache/incubator-airflow/pull/3492#issuecomment-396815203] > where the database schema does not match its corresponding SQLAlchemy model > declaration. We should add a generic unit test for this to prevent similar bugs > from occurring in the future. (Alternatively, we can add the policing logic > to the `airflow upgradedb` command so each migration can do the check.) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2606) Test needed to ensure database schema always match SQLAlchemy models
Joy Gao created AIRFLOW-2606: Summary: Test needed to ensure database schema always match SQLAlchemy models Key: AIRFLOW-2606 URL: https://issues.apache.org/jira/browse/AIRFLOW-2606 Project: Apache Airflow Issue Type: Improvement Reporter: Joy Gao An issue was discovered by [this PR|https://github.com/apache/incubator-airflow/pull/3492#issuecomment-396815203] where the database schema does not match its corresponding SQLAlchemy model declaration. We should add a generic unit test for this to prevent similar bugs from occurring in the future. (Alternatively, we can add the policing logic to the `airflow upgradedb` command so each migration can do the check.) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
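The generic check the ticket proposes can be sketched with stdlib sqlite3: reflect the live schema and compare it to the declared column types. The dict-based "model" below is a stand-in for real SQLAlchemy model metadata:

```python
# Sketch of a schema-vs-model drift check; the `declared` dict stands in
# for the column types a SQLAlchemy model declares.
import sqlite3

declared = {"dag_id": "TEXT", "duration": "REAL"}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_fail (dag_id TEXT, duration REAL)")

# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
reflected = {row[1]: row[2]
             for row in conn.execute("PRAGMA table_info(task_fail)")}

mismatches = {col: (want, reflected.get(col))
              for col, want in declared.items()
              if reflected.get(col) != want}
assert not mismatches, "schema does not match model: %s" % mismatches
```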
[jira] [Closed] (AIRFLOW-2414) Fix RBAC log display
[ https://issues.apache.org/jira/browse/AIRFLOW-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao closed AIRFLOW-2414. Resolution: Fixed > Fix RBAC log display > > > Key: AIRFLOW-2414 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2414 > Project: Apache Airflow > Issue Type: Bug > Components: webserver >Affects Versions: 1.10.0 >Reporter: Oleg Yamin >Assignee: Oleg Yamin >Priority: Major > Fix For: 1.10.0 > > > Getting the following error when trying to view the log file in new RBAC UI. > {code:java} > [2018-05-02 17:49:47,716] ERROR in app: Exception on /log [GET] > Traceback (most recent call last): > File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1982, in > wsgi_app > response = self.full_dispatch_request() > File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1614, in > full_dispatch_request > rv = self.handle_user_exception(e) > File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1517, in > handle_user_exception > reraise(exc_type, exc_value, tb) > File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1612, in > full_dispatch_request > rv = self.dispatch_request() > File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1598, in > dispatch_request > return self.view_functions[rule.endpoint](**req.view_args) > File > "/usr/lib/python2.7/site-packages/flask_appbuilder/security/decorators.py", > line 26, in wraps > return f(self, *args, **kwargs) > File "/usr/lib/python2.7/site-packages/airflow/www_rbac/decorators.py", line > 55, in wrapper > return f(*args, **kwargs) > File "/usr/lib/python2.7/site-packages/airflow/utils/db.py", line 74, in > wrapper > return func(*args, **kwargs) > File "/usr/lib/python2.7/site-packages/airflow/www_rbac/views.py", line 456, > in log > logs = log.decode('utf-8') > AttributeError: 'list' object has no attribute 'decode'{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2414) Fix RBAC log display
[ https://issues.apache.org/jira/browse/AIRFLOW-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510358#comment-16510358 ] Joy Gao commented on AIRFLOW-2414: -- A fix for this was merged recently: [https://github.com/apache/incubator-airflow/pull/3310] [~rushtokunal] let me know if you are still seeing issues with this. Going to close the ticket for now. > Fix RBAC log display > > > Key: AIRFLOW-2414 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2414 > Project: Apache Airflow > Issue Type: Bug > Components: webserver >Affects Versions: 1.10.0 >Reporter: Oleg Yamin >Assignee: Oleg Yamin >Priority: Major > Fix For: 1.10.0 > > > Getting the following error when trying to view the log file in new RBAC UI. > {code:java} > [2018-05-02 17:49:47,716] ERROR in app: Exception on /log [GET] > Traceback (most recent call last): > File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1982, in > wsgi_app > response = self.full_dispatch_request() > File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1614, in > full_dispatch_request > rv = self.handle_user_exception(e) > File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1517, in > handle_user_exception > reraise(exc_type, exc_value, tb) > File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1612, in > full_dispatch_request > rv = self.dispatch_request() > File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1598, in > dispatch_request > return self.view_functions[rule.endpoint](**req.view_args) > File > "/usr/lib/python2.7/site-packages/flask_appbuilder/security/decorators.py", > line 26, in wraps > return f(self, *args, **kwargs) > File "/usr/lib/python2.7/site-packages/airflow/www_rbac/decorators.py", line > 55, in wrapper > return f(*args, **kwargs) > File "/usr/lib/python2.7/site-packages/airflow/utils/db.py", line 74, in > wrapper > return func(*args, **kwargs) > File "/usr/lib/python2.7/site-packages/airflow/www_rbac/views.py", line 456, > in log > 
logs = log.decode('utf-8') > AttributeError: 'list' object has no attribute 'decode'{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-2585) Bug fix in CassandraHook and CassandraToGoogleCloudStorageOperator
[ https://issues.apache.org/jira/browse/AIRFLOW-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao updated AIRFLOW-2585: - Description: * Issue with UUID type conversion: currently UUID is converted to a hex string, but should be base64-encoded, as that is the required format in BigQuery for uploading. * Issue with configuring the load balancing policy in CassandraHook: currently the hook only instantiates successfully with the default LB policy, and throws an exception when a custom LB policy is passed in the extra field. * Issue with connections not closed properly after use: the operator should always shut down the cluster to close all sessions/connections associated with the cluster instance. was: * Issue with UUID type conversion: currently UUID is converted to a hex string, but should be base64-encoded, as that is the required format in BigQuery for uploading. * Issue with configuring the load balancing policy in CassandraHook: currently the hook only instantiates successfully with the default LB policy, and throws an exception when a custom LB policy is passed in the extra field. > Bug fix in CassandraHook and CassandraToGoogleCloudStorageOperator > -- > > Key: AIRFLOW-2585 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2585 > Project: Apache Airflow > Issue Type: Bug >Reporter: Joy Gao >Assignee: Joy Gao >Priority: Major > > * Issue with UUID type conversion: currently UUID is converted to a hex string, > but should be base64-encoded, as that is the required format in > BigQuery for uploading. > * Issue with configuring the load balancing policy in CassandraHook: currently > the hook only instantiates successfully with the default LB policy, and throws > an exception when a custom LB policy is passed in the extra field. 
> * Issue with connections not closed properly after use: should always shut > down the cluster in the operator to close all sessions/connections associated > with the cluster instance. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
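The third bullet (connections not closed after use) comes down to a try/finally around the cluster's lifetime. A minimal sketch with a stand-in cluster object (the real code would use the cassandra-driver Cluster and the operator's query/upload logic):

```python
# Sketch of the cleanup pattern: always shut the cluster down so every
# session/connection tied to it is released, even if the query fails.
class FakeCluster:
    """Stand-in for cassandra.cluster.Cluster (illustrative only)."""
    def __init__(self):
        self.is_shutdown = False

    def connect(self):
        return object()  # the real driver would return a Session

    def shutdown(self):
        self.is_shutdown = True

def execute_and_upload(cluster):
    try:
        session = cluster.connect()
        # ... run the CQL query and upload results to GCS ...
    finally:
        # Closes all sessions/connections of this cluster instance,
        # even if the query or upload raised.
        cluster.shutdown()

cluster = FakeCluster()
execute_and_upload(cluster)
assert cluster.is_shutdown
```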
[jira] [Resolved] (AIRFLOW-1115) github enterprise auth fail to fetch user info
[ https://issues.apache.org/jira/browse/AIRFLOW-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-1115. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request #3469 [https://github.com/apache/incubator-airflow/pull/3469] > github enterprise auth fail to fetch user info > -- > > Key: AIRFLOW-1115 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1115 > Project: Apache Airflow > Issue Type: Bug > Components: security >Affects Versions: Airflow 1.8 >Reporter: Deo >Assignee: Deo >Priority: Major > Fix For: 2.0.0 > > > [2017-04-17 13:30:50,540] [68622] {github_enterprise_auth.py:216} ERROR - > Traceback (most recent call last): > File > "/xxx/.virtualenvs/airflow/lib/python3.5/site-packages/airflow/contrib/auth/backends/github_enterprise_auth.py", > line 210, in oauth_callback > username, email = self.get_ghe_user_profile_info(ghe_token) > File > "/xxx/.virtualenvs/airflow/lib/python3.5/site-packages/airflow/contrib/auth/backends/github_enterprise_auth.py", > line 140, in get_ghe_user_profile_info > resp.status if resp else 'None')) > airflow.contrib.auth.backends.github_enterprise_auth.AuthenticationError: > Failed to fetch user profile, status (404) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2585) Bug fix in CassandraHook and CassandraToGoogleCloudStorageOperator
Joy Gao created AIRFLOW-2585: Summary: Bug fix in CassandraHook and CassandraToGoogleCloudStorageOperator Key: AIRFLOW-2585 URL: https://issues.apache.org/jira/browse/AIRFLOW-2585 Project: Apache Airflow Issue Type: Bug Reporter: Joy Gao Assignee: Joy Gao * Issue with UUID type conversion: currently UUID is converted to a hex string, but should be base64-encoded, as that is the required format in BigQuery for uploading. * Issue with configuring the load balancing policy in CassandraHook: currently the hook only instantiates successfully with the default LB policy, and throws an exception when a custom LB policy is passed in the extra field. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
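The UUID bullet can be sketched with the stdlib: encode the 16 raw bytes with standard base64 (the format BigQuery expects for bytes) rather than using the hex string:

```python
# Sketch of the UUID fix: base64-encode the raw bytes instead of using hex.
import base64
import uuid

u = uuid.UUID("12345678-1234-5678-1234-567812345678")

hex_form = u.hex                                               # old behaviour
b64_form = base64.standard_b64encode(u.bytes).decode("ascii")  # fixed behaviour

# The base64 form round-trips to the original 16 bytes.
assert base64.standard_b64decode(b64_form) == u.bytes
```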
[jira] [Resolved] (AIRFLOW-2573) Cast TIMESTAMP field to float rather than int
[ https://issues.apache.org/jira/browse/AIRFLOW-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-2573. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request #3471 [https://github.com/apache/incubator-airflow/pull/3471] > Cast TIMESTAMP field to float rather than int > - > > Key: AIRFLOW-2573 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2573 > Project: Apache Airflow > Issue Type: Bug >Reporter: Hongyi Wang >Assignee: Hongyi Wang >Priority: Blocker > Fix For: 2.0.0 > > > In the current bigquery_hook.py, we have a `_bq_cast(string_field, bq_type)` > function that helps cast a BigQuery row to the appropriate data types. > {quote}elif bq_type == 'INTEGER' or bq_type == 'TIMESTAMP': > return int(string_field) > {quote} > However, when bq_type equals 'TIMESTAMP', this causes a ValueError: > {quote}>>> int('1.458668898E9') > ValueError: invalid literal for int() with base 10: '1.458668898E9' > {quote} > Because a 'TIMESTAMP' in BigQuery is represented as a double in Python, it > should be cast to float instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2504) Airflow UI Auditing - log username show extra filter
[ https://issues.apache.org/jira/browse/AIRFLOW-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-2504. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request #3438 [https://github.com/apache/incubator-airflow/pull/3438] > Airflow UI Auditing - log username show extra filter > > > Key: AIRFLOW-2504 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2504 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Junda Yang >Assignee: Junda Yang >Priority: Minor > Fix For: 2.0.0 > > > 1. There is a bug in the > [action_logging|https://github.com/apache/incubator-airflow/blob/1f0a717b65e0ea7e0127708b084baff0697f0946/airflow/www/utils.py#L249] > of old UI. The *username* attribute is always in *current_user* but it is > *None*. We should call *current_user.user.username* to get the username. See > example usage of > [current_user.user.username|https://github.com/apache/incubator-airflow/blob/1f0a717b65e0ea7e0127708b084baff0697f0946/airflow/www/views.py#L1929] > 2. We also need to add a column filter on *extra* so we can search for > request content, like who send what kind of write request from Airflow UI, as > the action_logging is [logging all request > parameters|https://github.com/apache/incubator-airflow/blob/1f0a717b65e0ea7e0127708b084baff0697f0946/airflow/www/utils.py#L258] > in extra field. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2551) Encode binary data with base64 standard rather than base64 url
[ https://issues.apache.org/jira/browse/AIRFLOW-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-2551. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request #3449 [https://github.com/apache/incubator-airflow/pull/3449] > Encode binary data with base64 standard rather than base64 url > -- > > Key: AIRFLOW-2551 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2551 > Project: Apache Airflow > Issue Type: Bug >Reporter: Hongyi Wang >Assignee: Hongyi Wang >Priority: Major > Fix For: 2.0.0 > > > When we try to load MySQL data into Google BigQuery (mysql -> gcs -> bq), > there is a binary field (uuid) which causes the BigQuery job to fail with the > message "_Could not decode base64 string to bytes. Field: uuid; Value: > _gJbkmC1QTiS-zZ46uiHWg==_" > This was caused by "_col_val = base64.urlsafe_b64encode(col_val)_" in > mysql_to_gcs_operator. > We should use "_standard_b64encode()_" instead. > {quote}{{Base64url encoding is basically base64 encoding except it uses > non-reserved URL characters (e.g. '-' is used instead of '+' and '_' is used > instead of '/')}} > {quote} > Related to [AIRFLOW-2169] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
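The alphabet difference behind this ticket is easy to demonstrate with the stdlib: bytes whose encoding contains '+' and '/' come out with '-' and '_' under the urlsafe variant, which BigQuery's loader rejects:

```python
# Standard vs. urlsafe base64: same data, different alphabet for values 62/63.
import base64

raw = bytes([0xFB, 0xEF, 0xFF])       # encodes to '+' and '/' characters
std = base64.standard_b64encode(raw)  # b'++//'
url = base64.urlsafe_b64encode(raw)   # b'--__'

assert std != url
assert base64.standard_b64decode(std) == raw
```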
[jira] [Resolved] (AIRFLOW-2529) Improve graph view performance and usability
[ https://issues.apache.org/jira/browse/AIRFLOW-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-2529. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request #3441 [https://github.com/apache/incubator-airflow/pull/3441] > Improve graph view performance and usability > > > Key: AIRFLOW-2529 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2529 > Project: Apache Airflow > Issue Type: Improvement > Components: webapp >Affects Versions: 1.9.0 >Reporter: Stefan Seelmann >Assignee: Stefan Seelmann >Priority: Major > Fix For: 2.0.0 > > Attachments: Screenshot_2018-05-28_21-32-38.png > > > The "Graph View" has a dropdown which contains all DAG run IDs. If there are > many (thousands) of DAG runs the page gets barely usable. It takes multiple > seconds to load the page because all DAG runs must be fetched from DB, are > processed, and a long option list is rendered in the browser. It is also not > very useful because in such a long list it is hard to find a particular DAG > run. > A simple fix to address the load time would be to just limit the number of > shown DAG runs. For example only the latest N are shown, N could be > "page_size" from airflow.cfg which is also used in other views. If the DAG > run that should be shown (via query parameters execution_date or run_id) is > not included in the N latest list it can still be added by a 2nd SQL query. > A more complex change to improve usability would require a different way to > select a DAG run. For example a popup to search for DAG runs with pagination > etc. But such functionality already exists in the /dagrun UI. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2477) Improve time units for task duration and landing times charts for RBAC UI
[ https://issues.apache.org/jira/browse/AIRFLOW-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-2477. -- Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3368 [https://github.com/apache/incubator-airflow/pull/3368] > Improve time units for task duration and landing times charts for RBAC UI > - > > Key: AIRFLOW-2477 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2477 > Project: Apache Airflow > Issue Type: Bug >Reporter: Tao Feng >Assignee: Tao Feng >Priority: Major > Fix For: 1.10.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2474) Should not attempt to import snakebite in py3
Joy Gao created AIRFLOW-2474: Summary: Should not attempt to import snakebite in py3 Key: AIRFLOW-2474 URL: https://issues.apache.org/jira/browse/AIRFLOW-2474 Project: Apache Airflow Issue Type: Bug Reporter: Joy Gao Assignee: Joy Gao Patch in HDFSHook module to stop importing snakebite in PY3. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2458) Create CassandraToGoogleCloudStorageOperator and CassandraHook
Joy Gao created AIRFLOW-2458: Summary: Create CassandraToGoogleCloudStorageOperator and CassandraHook Key: AIRFLOW-2458 URL: https://issues.apache.org/jira/browse/AIRFLOW-2458 Project: Apache Airflow Issue Type: New Feature Affects Versions: 1.10.0 Reporter: Joy Gao Assignee: Joy Gao Create an operator that allows storing Cassandra CQL query results to Google Cloud Storage. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2457) Upgrade FAB version in setup.py to support timezone
Joy Gao created AIRFLOW-2457: Summary: Upgrade FAB version in setup.py to support timezone Key: AIRFLOW-2457 URL: https://issues.apache.org/jira/browse/AIRFLOW-2457 Project: Apache Airflow Issue Type: Bug Affects Versions: 1.10 Reporter: Joy Gao Assignee: Joy Gao FAB 1.9.6 doesn't support datetimes with timezones; upgrading to 1.10.0 will fix this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2414) Fix RBAC log display
[ https://issues.apache.org/jira/browse/AIRFLOW-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466714#comment-16466714 ] Joy Gao commented on AIRFLOW-2414: -- Hmm, interesting: {code:python}
if ti is None:
    logs = ["*** Task instance did not exist in the DB\n"]
else:
    logger = logging.getLogger('airflow.task')
    task_log_reader = conf.get('core', 'task_log_reader')
    handler = next((handler for handler in logger.handlers
                    if handler.name == task_log_reader), None)
    try:
        ti.task = dag.get_task(ti.task_id)
        logs = handler.read(ti)
    except AttributeError as e:
        logs = ["Task log handler {} does not support read logs.\n{}\n".format(task_log_reader, str(e))]

for i, log in enumerate(logs):
    if PY2 and not isinstance(log, unicode):
        logs[i] = log.decode('utf-8')
{code} Each log should be a string; wondering if this bug is related to subdags? Can you print out the list object and see what it contains? > Fix RBAC log display > > > Key: AIRFLOW-2414 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2414 > Project: Apache Airflow > Issue Type: Bug > Components: webserver >Affects Versions: 1.10.0 >Reporter: Oleg Yamin >Assignee: Oleg Yamin >Priority: Major > Fix For: 1.10.0 > > > Getting the following error when trying to view the log file in the new RBAC UI. 
> {code:java}
> [2018-05-02 17:49:47,716] ERROR in app: Exception on /log [GET]
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1982, in wsgi_app
>     response = self.full_dispatch_request()
>   File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1614, in full_dispatch_request
>     rv = self.handle_user_exception(e)
>   File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1517, in handle_user_exception
>     reraise(exc_type, exc_value, tb)
>   File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1612, in full_dispatch_request
>     rv = self.dispatch_request()
>   File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1598, in dispatch_request
>     return self.view_functions[rule.endpoint](**req.view_args)
>   File "/usr/lib/python2.7/site-packages/flask_appbuilder/security/decorators.py", line 26, in wraps
>     return f(self, *args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/airflow/www_rbac/decorators.py", line 55, in wrapper
>     return f(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/airflow/utils/db.py", line 74, in wrapper
>     return func(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/airflow/www_rbac/views.py", line 456, in log
>     logs = log.decode('utf-8')
> AttributeError: 'list' object has no attribute 'decode'
> {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
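The traceback shows `.decode()` being called on the list returned by the handler rather than on its elements. A sketch of the shape of the fix, as a hypothetical helper (not the actual patch):

```python
def decode_logs(logs):
    # handler.read() can return a list of per-attempt log chunks; the bug
    # was calling .decode() on the list itself. Decode each element instead,
    # leaving elements that are already text untouched.
    decoded = []
    for log in logs:
        if isinstance(log, bytes):
            log = log.decode("utf-8")
        decoded.append(log)
    return decoded
```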
[jira] [Created] (AIRFLOW-2431) Add the navigation bar color parameter for RBAC UI
Joy Gao created AIRFLOW-2431: Summary: Add the navigation bar color parameter for RBAC UI Key: AIRFLOW-2431 URL: https://issues.apache.org/jira/browse/AIRFLOW-2431 Project: Apache Airflow Issue Type: New Feature Reporter: Licht Takeuchi Assignee: Licht Takeuchi Fix For: 2.0.0 We operate multiple Airflow instances (e.g. production and staging), so we cannot tell at a glance which instance we are looking at. This feature enables us to distinguish the instances by the color of the navigation bar. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (AIRFLOW-2321) RBAC support from new UI's failing on OAuth authentication method
[ https://issues.apache.org/jira/browse/AIRFLOW-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437789#comment-16437789 ] Joy Gao edited comment on AIRFLOW-2321 at 4/13/18 7:32 PM: --- I replicated this issue you described above. The work-around is: (1) Clear the ab_user table (2) Set the following config in webserver_config.py {code:java} AUTH_USER_REGISTRATION = True # Will allow user self registration AUTH_USER_REGISTRATION_ROLE = "Admin" # The default user self registration role {code} (3) Register the admin user via the UI (do not use the `create_user` command) (4) Change {code:java} AUTH_USER_REGISTRATION = False{code} to prevent others from registering, or set {code:java} AUTH_USER_REGISTRATION_ROLE = "Viewer" # or User/Op{code} to allow view-only self-registration for others. The reason that this 'Invalid login. Please try again.' error appeared is that the username is incorrect. Flask-Appbuilder generates its own username during the OAuth flow (For example, for Google OAuth, it would take "id" of the user in the OAuth response, and prefix it with 'google_', so it would look something like `google_) In the case where a user is created manually via the `create_user` command, I'd assume this username is different, so it fails to authenticate. I don't have a good sense of how to retrieve this id other than through oauth at this moment, so self-registration is the best flow. was (Author: joygao): I replicated this issue you described above. 
The work-around is: (1) Clear the ab_user table (2) Set the following config in webserver_config.py {code:java} AUTH_USER_REGISTRATION = True # Will allow user self registration AUTH_USER_REGISTRATION_ROLE = "Admin" # The default user self registration role{code} (3) Register the admin user via the UI (do not use the `create_user` command) (4) Change {code:java} AUTH_USER_REGISTRATION = False{code} to prevent others from registering, or set {code:java} AUTH_USER_REGISTRATION_ROLE = "Viewer" # or User/Op{code} to allow view-only self-registration. The reason that this 'Invalid login. Please try again.' error appeared is that the username is incorrect. Flask-Appbuilder generates its own username during the OAuth flow (For example, for Google OAuth, it would take "id" of the user in the OAuth response, and prefix it with 'google_', so it would look something like `google_) In the case where a user is created manually via the `create_user` command, I'd assume this username is different, so it fails to authenticate. I don't have a good sense of how to retrieve this id other than through oauth at this moment, so self-registration is the best flow. > RBAC support from new UI's failing on OAuth authentication method > - > > Key: AIRFLOW-2321 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2321 > Project: Apache Airflow > Issue Type: Bug > Components: authentication >Reporter: Guillermo Rodríguez Cano >Priority: Major > > I tried configuring the RBAC support for the new webserver UI as provided > thanks to this [PR|https://github.com/apache/incubator-airflow/pull/3015] > (solving AIRFLOW-1433 and AIRFLOW-85 issues) but I have encountered issues > with OAuth as authentication method with Google as provider. > I have no issues configuring the authentication details as pointed in the > UPDATING document, but when I test a fresh installation I manage to get to > the Google authentication webpage and on returning to Airflow's site I get > the message: 'Invalid login. 
Please try again.' which I have traced down > to coming from > [here|https://github.com/dpgaspar/Flask-AppBuilder/blob/master/flask_appbuilder/security/views.py#L549]. > And as pointed out, it seems the user variable is None. > I have tried to log in using the standard DB authentication method without any > problems. The same issue happens even when I tried registering a new user, or > with that user registered via the DB authentication and then switching to > OAUTH authentication method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2321) RBAC support from new UI's failing on OAuth authentication method
[ https://issues.apache.org/jira/browse/AIRFLOW-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437789#comment-16437789 ] Joy Gao commented on AIRFLOW-2321: -- I replicated this issue you described above. The work-around is: (1) Clear the ab_user table (2) Set the following config in webserver_config.py {code:java} AUTH_USER_REGISTRATION = True # Will allow user self registration AUTH_USER_REGISTRATION_ROLE = "Admin" # The default user self registration role{code} (3) Register the admin user via the UI (do not use the `create_user` command) (4) Change {code:java} AUTH_USER_REGISTRATION = False{code} to prevent others from registering, or set {code:java} AUTH_USER_REGISTRATION_ROLE = "Viewer" # or User/Op{code} to allow view-only self-registration. The reason that this 'Invalid login. Please try again.' error appeared is that the username is incorrect. Flask-Appbuilder generates its own username during the OAuth flow (For example, for Google OAuth, it would take "id" of the user in the OAuth response, and prefix it with 'google_', so it would look something like `google_) In the case where a user is created manually via the `create_user` command, I'd assume this username is different, so it fails to authenticate. I don't have a good sense of how to retrieve this id other than through oauth at this moment, so self-registration is the best flow. > RBAC support from new UI's failing on OAuth authentication method > - > > Key: AIRFLOW-2321 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2321 > Project: Apache Airflow > Issue Type: Bug > Components: authentication >Reporter: Guillermo Rodríguez Cano >Priority: Major > > I tried configuring the RBAC support for the new webserver UI as provided > thanks to this [PR|https://github.com/apache/incubator-airflow/pull/3015] > (solving AIRFLOW-1433 and AIRFLOW-85 issues) but I have encountered issues > with OAuth as authentication method with Google as provider. 
> I have no issues configuring the authentication details as pointed in the > UPDATING document, but when I test a fresh installation I manage to get to > the Google authentication webpage and on returning to Airflow's site I get > the message: 'Invalid login. Please try again.' which I have traced down > to coming from > [here|https://github.com/dpgaspar/Flask-AppBuilder/blob/master/flask_appbuilder/security/views.py#L549]. > And as pointed out, it seems the user variable is None. > I have tried to log in using the standard DB authentication method without any > problems. The same issue happens even when I tried registering a new user, or > with that user registered via the DB authentication and then switching to > OAUTH authentication method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
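The workaround above, collected into one place: a sketch of the relevant part of webserver_config.py. This is an untested illustration of the steps described in the comment, with the OAuth provider block omitted; tighten the settings again once the admin user exists:

```python
# webserver_config.py -- sketch of the workaround described above.
# Only the registration settings relevant to this issue are shown;
# the OAUTH_PROVIDERS block is assumed to be configured separately.
from flask_appbuilder.security.manager import AUTH_OAUTH

AUTH_TYPE = AUTH_OAUTH

# Step (2): let the first user self-register as Admin via the UI,
# instead of pre-creating the user with `create_user`.
AUTH_USER_REGISTRATION = True
AUTH_USER_REGISTRATION_ROLE = "Admin"

# Step (4), after the admin exists: either disable registration...
# AUTH_USER_REGISTRATION = False
# ...or downgrade the default role for everyone else:
# AUTH_USER_REGISTRATION_ROLE = "Viewer"
```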
[jira] [Comment Edited] (AIRFLOW-2321) RBAC support from new UI's failing on OAuth authentication method
[ https://issues.apache.org/jira/browse/AIRFLOW-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437789#comment-16437789 ] Joy Gao edited comment on AIRFLOW-2321 at 4/13/18 7:31 PM: --- I replicated this issue you described above. The work-around is: (1) Clear the ab_user table (2) Set the following config in webserver_config.py {code:java} AUTH_USER_REGISTRATION = True # Will allow user self registration AUTH_USER_REGISTRATION_ROLE = "Admin" # The default user self registration role{code} (3) Register the admin user via the UI (do not use the `create_user` command) (4) Change {code:java} AUTH_USER_REGISTRATION = False{code} to prevent others from registering, or set {code:java} AUTH_USER_REGISTRATION_ROLE = "Viewer" # or User/Op{code} to allow view-only self-registration. The reason that this 'Invalid login. Please try again.' error appeared is that the username is incorrect. Flask-Appbuilder generates its own username during the OAuth flow (For example, for Google OAuth, it would take "id" of the user in the OAuth response, and prefix it with 'google_', so it would look something like `google_) In the case where a user is created manually via the `create_user` command, I'd assume this username is different, so it fails to authenticate. I don't have a good sense of how to retrieve this id other than through oauth at this moment, so self-registration is the best flow. was (Author: joygao): I replicated this issue you described above. 
The work-around is: (1) Clear the ab_user table (2) Set the following config in webserver_config.py {code:java} AUTH_USER_REGISTRATION = True # Will allow user self registration AUTH_USER_REGISTRATION_ROLE = "Admin" # The default user self registration role{code} (3) Register the admin user via the UI (do not use the `create_admin` command) (4) Change {code:java} AUTH_USER_REGISTRATION = False{code} to prevent others from registering, or set {code:java} AUTH_USER_REGISTRATION_ROLE = "Viewer" # or User/Op{code} to allow view-only self-registration. The reason that this 'Invalid login. Please try again.' error appeared is that the username is incorrect. Flask-Appbuilder generates its own username during the OAuth flow (For example, for Google OAuth, it would take "id" of the user in the OAuth response, and prefix it with 'google_', so it would look something like `google_) In the case where a user is created manually via the `create_user` command, I'd assume this username is different, so it fails to authenticate. I don't have a good sense of how to retrieve this id other than through oauth at this moment, so self-registration is the best flow. > RBAC support from new UI's failing on OAuth authentication method > - > > Key: AIRFLOW-2321 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2321 > Project: Apache Airflow > Issue Type: Bug > Components: authentication >Reporter: Guillermo Rodríguez Cano >Priority: Major > > I tried configuring the RBAC support for the new webserver UI as provided > thanks to this [PR|https://github.com/apache/incubator-airflow/pull/3015] > (solving AIRFLOW-1433 and AIRFLOW-85 issues) but I have encountered issues > with OAuth as authentication method with Google as provider. > I have no issues configuring the authentication details as pointed in the > UPDATING document, but when I test a fresh installation I manage to get to > the Google authentication webpage and on returning to Airflow's site I get > the message: 'Invalid login. 
Please try again.' which I have traced down > to coming from > [here|https://github.com/dpgaspar/Flask-AppBuilder/blob/master/flask_appbuilder/security/views.py#L549]. > And as pointed out, it seems the user variable is None. > I have tried to log in using the standard DB authentication method without any > problems. The same issue happens even when I tried registering a new user, or > with that user registered via the DB authentication and then switching to > OAUTH authentication method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2321) RBAC support from new UI's failing on OAuth authentication method
[ https://issues.apache.org/jira/browse/AIRFLOW-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437684#comment-16437684 ] Joy Gao commented on AIRFLOW-2321: -- Hi [~wileeam], if you comment out the following line in webserver_config.py, {code:java} # 'whitelist': ['@YOU_COMPANY_DOMAIN'], # optional{code} does the issue still occur? (Alternatively, if you are using a whitelist, make sure the domain matches.) I should have had that entire line commented out; right now it's a bit misleading. My bad there. > RBAC support from new UI's failing on OAuth authentication method > - > > Key: AIRFLOW-2321 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2321 > Project: Apache Airflow > Issue Type: Bug > Components: authentication >Reporter: Guillermo Rodríguez Cano >Priority: Major > > I tried configuring the RBAC support for the new webserver UI as provided > thanks to this [PR|https://github.com/apache/incubator-airflow/pull/3015] > (solving AIRFLOW-1433 and AIRFLOW-85 issues) but I have encountered issues > with OAuth as authentication method with Google as provider. > I have no issues configuring the authentication details as pointed in the > UPDATING document, but when I test a fresh installation I manage to get to > the Google authentication webpage and on returning to Airflow's site I get > the message: 'Invalid login. Please try again.' which I have traced down > to coming from > [here|https://github.com/dpgaspar/Flask-AppBuilder/blob/master/flask_appbuilder/security/views.py#L549]. > And as pointed out, it seems the user variable is None. > I have tried to log in using the standard DB authentication method without any > problems. The same issue happens even when I tried registering a new user, or > with that user registered via the DB authentication and then switching to > OAUTH authentication method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2162) Run DAG as user other than airflow does NOT have access to AIRFLOW_ environment variables
[ https://issues.apache.org/jira/browse/AIRFLOW-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-2162. -- Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3184 [https://github.com/apache/incubator-airflow/pull/3184] > Run DAG as user other than airflow does NOT have access to AIRFLOW_ > environment variables > - > > Key: AIRFLOW-2162 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2162 > Project: Apache Airflow > Issue Type: Bug > Components: configuration >Reporter: Sebastian Radloff >Assignee: John Arnold >Priority: Minor > Labels: configuration > Fix For: 1.10.0 > > > When running airflow with LocalExecutor, I inject airflow environment > variables that are supposed to override what is in the airflow.cfg, according > to the documentation [https://airflow.apache.org/configuration.html]. If you specify to run your > DAGs as another linux user, root for example, this is what airflow executes > under the hood: > {code:java} > ['bash', '-c', u'sudo -H -u root airflow run docker_sample docker_op_tester > 2018-03-01T15:14:55.699668 --job_id 2 --raw -sd > DAGS_FOLDER/docker-operator.py --cfg_path /tmp/tmpignV9B'] > {code} > > It uses sudo and switches to the root linux user; unfortunately, it won't > have access to the environment variables injected to override the config. > This is important for people who are trying to inject variables into a docker > container at run time while wishing to maintain a level of security around > database credentials. > I think a decent proposal made by [~ashb] in gitter, would be to > automatically pass all environment variables starting with *AIRFLOW__* to any > user. Please lmk if y'all want any help on the documentation or point me in > the right direction and I could create a PR. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
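The proposal above — forwarding only AIRFLOW__-prefixed variables across the sudo boundary — could look roughly like this. The function names are hypothetical, not Airflow's actual implementation; sudo normally resets the environment, so the selected variables are passed as explicit `VAR=value` arguments:

```python
import os

def airflow_env_vars(environ=None):
    # Only AIRFLOW__-prefixed variables are forwarded, so unrelated
    # (possibly sensitive) environment variables are not leaked to the
    # process running as the other user.
    environ = os.environ if environ is None else environ
    return {k: v for k, v in environ.items() if k.startswith("AIRFLOW__")}

def sudo_run_command(run_as_user, airflow_cmd, environ=None):
    # sudo resets the environment by default; prepending explicit
    # VAR=value assignments lets the chosen variables survive the switch.
    env_args = ["{}={}".format(k, v)
                for k, v in sorted(airflow_env_vars(environ).items())]
    return ["sudo", "-H", "-u", run_as_user] + env_args + airflow_cmd
```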
[jira] [Updated] (AIRFLOW-2311) Environment variables are accessible to dag execution
[ https://issues.apache.org/jira/browse/AIRFLOW-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao updated AIRFLOW-2311: - Summary: Environment variables are accessible to dag execution (was: Environment variables from the scheduler process are accessible to dag execution) > Environment variables are accessible to dag execution > - > > Key: AIRFLOW-2311 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2311 > Project: Apache Airflow > Issue Type: Bug > Components: security >Reporter: Joy Gao >Priority: Major > > Currently, environment variables are accessible to dag execution for both > LocalExecutor and CeleryExecutor (from the machine/container where the `airflow > scheduler` process is running). > I believe passing down all environment variables to task > execution is a potential security concern, as they sometimes include sensitive > credentials. This means that it is the responsibility of (1) the airflow > admin to not store sensitive data in environment variables in production or > (2) the dag maintainer to properly audit the dag file and make sure it is not > malicious. (1) seems very hard to guarantee; (2) seems easier, but not > foolproof. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2311) Environment variables from the scheduler process are accessible to dag execution
Joy Gao created AIRFLOW-2311: Summary: Environment variables from the scheduler process are accessible to dag execution Key: AIRFLOW-2311 URL: https://issues.apache.org/jira/browse/AIRFLOW-2311 Project: Apache Airflow Issue Type: Bug Components: security Reporter: Joy Gao Currently, environment variables are accessible to dag execution for both LocalExecutor and CeleryExecutor (from the machine/container where the `airflow scheduler` process is running). I believe passing down all environment variables to task execution is a potential security concern, as they sometimes include sensitive credentials. This means that it is the responsibility of (1) the airflow admin to not store sensitive data in environment variables in production or (2) the dag maintainer to properly audit the dag file and make sure it is not malicious. (1) seems very hard to guarantee; (2) seems easier, but not foolproof. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2273) Add Discord webhook operator/hook
[ https://issues.apache.org/jira/browse/AIRFLOW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-2273. -- Resolution: Fixed Fix Version/s: (was: Airflow 2.0) 1.10.0 Issue resolved by pull request #3178 [https://github.com/apache/incubator-airflow/pull/3178] > Add Discord webhook operator/hook > - > > Key: AIRFLOW-2273 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2273 > Project: Apache Airflow > Issue Type: New Feature > Components: contrib, hooks, operators >Reporter: Thomas Buida >Assignee: Thomas Buida >Priority: Minor > Fix For: 1.10.0 > > > [Discord|https://discordapp.com/] is used by many as an alternative to Slack. > [AIRFLOW 2217|https://issues.apache.org/jira/browse/AIRFLOW-2217] added > support for Slack incoming webhooks as a way to post messages to a Slack > channel. It would be great to have the same offering for Discord users by > using [Discord > webhooks|https://discordapp.com/developers/docs/resources/webhook]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2200) Add Snowflake Operator
[ https://issues.apache.org/jira/browse/AIRFLOW-2200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-2200. -- Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3150 [https://github.com/apache/incubator-airflow/pull/3150] > Add Snowflake Operator > -- > > Key: AIRFLOW-2200 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2200 > Project: Apache Airflow > Issue Type: New Feature > Components: contrib, dependencies, hooks, operators >Reporter: Devin Jones >Assignee: Devin Jones >Priority: Major > Fix For: 1.10.0 > > > Add Connection, Hook and Operator to interface with a Snowflake account -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2282) Fix grammar in UPDATING.md
[ https://issues.apache.org/jira/browse/AIRFLOW-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-2282. -- Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3189 [https://github.com/apache/incubator-airflow/pull/3189] > Fix grammar in UPDATING.md > -- > > Key: AIRFLOW-2282 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2282 > Project: Apache Airflow > Issue Type: Task >Reporter: Taylor Edmiston >Assignee: Taylor Edmiston >Priority: Trivial > Labels: documentation > Fix For: 1.10.0 > > > Fixes a small grammatical typo in UPDATING.md. Also auto removes some > trailing whitespace in another .md file. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2271) Undead tasks are heartbeating and not getting killed
[ https://issues.apache.org/jira/browse/AIRFLOW-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420804#comment-16420804 ] Joy Gao commented on AIRFLOW-2271: -- Please see [PR 2975|https://github.com/apache/incubator-airflow/pull/2975]; it addresses the second change you have described above to fix the zombie processes. > Undead tasks are heartbeating and not getting killed > > > Key: AIRFLOW-2271 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2271 > Project: Apache Airflow > Issue Type: Bug > Components: worker >Affects Versions: 1.9.0 >Reporter: Greg >Priority: Major > Attachments: Airflow_zombies_masked.png > > > Background: We had a resource leak in some of our Airflow operators, so after > the task is completed, the connection pool was not disposed and the processes > were still running (see attached screenshot). It caused the execution pool > (size=16) being exhausted after a couple of days. > Investigation: > We checked those task instances and related jobs in the database, and found a > mismatch: > SQL: > {code:sql}
> select
>   ti.execution_date,
>   ti.state AS task_state,
>   ti.start_date AS task_start_dt,
>   ti.end_date AS task_end_dt,
>   j.id AS job_id,
>   j.state AS job_state,
>   j.start_date AS job_start_dt,
>   j.end_date AS job_end_dt,
>   j.latest_heartbeat
> from task_instance ti
> join job j
>   on j.id = ti.job_id
> where ti.task_id = 'backup_data_tables'
> order by task_start_dt DESC
> {code}
> ||execution_date||task_state||task_start_dt||task_end_dt||job_id||job_state||job_start_dt||job_end_dt||latest_heartbeat||
> |2018-03-23 23:00:00|success|2018-03-27 08:42:12.846058|2018-03-27 08:42:17.408723|10925|success|2018-03-27 08:42:12.768759|2018-03-27 08:42:22.815474|2018-03-27 08:42:12.768773|
> |2018-03-22 23:00:00|success|2018-03-23 23:02:44.079996|2018-03-24 01:08:52.842612|9683|running|2018-03-23 23:02:44.010813| |2018-03-26 11:29:15.928836|
> |2018-03-21 23:00:00|success|2018-03-22 23:02:14.254779|2018-03-23 01:07:58.322927|9075|running|2018-03-22 23:02:14.199652| |2018-03-26 11:29:16.570076|
> |2018-03-20 23:00:00|success|2018-03-21 23:02:33.417882|2018-03-22 01:16:56.695002|8475|running|2018-03-21 23:02:33.33754| |2018-03-26 11:29:16.529516|
> |2018-03-19 23:00:00|success|2018-03-21 13:20:36.084062|2018-03-21 15:32:51.263954|8412|running|2018-03-21 13:20:36.026206| |2018-03-26 11:29:16.529413|
> As shown in the result set above, jobs of the completed tasks are still > running and heartbeating several days after the actual task is completed, and > stopped only after we killed them manually. > In the log files of the tasks we see a bunch of entries like below, which > show that the _kill_process_tree()_ method is invoked every ~5 sec: > {code:java}
> [2018-03-28 13:03:33,013] {helpers.py:269} DEBUG - There are no descendant processes to kill
> [2018-03-28 13:03:38,211] {helpers.py:269} DEBUG - There are no descendant processes to kill
> [2018-03-28 13:03:43,290] {helpers.py:269} DEBUG - There are no descendant processes to kill
> [2018-03-28 13:03:48,416] {helpers.py:269} DEBUG - There are no descendant processes to kill
> [2018-03-28 13:03:53,604] {helpers.py:269} DEBUG - There are no descendant processes to kill
> {code}
> After some debugging we found that the _LocalTaskJob.terminating_ flag is set to > _True_, but the processes are still not getting killed; moreover, the job is still > heartbeating. > Expected result: Airflow is responsible for shutting down the processes, not > leaving undead processes behind, even if a force kill is needed. > Possible fix: > We made the following two changes in the code (we have fixed it in our fork): > - _LocalTaskJob._execute_ - do not heartbeat if the task is terminating > - _kill_process_tree_ - add a bool argument kill_root, and kill the root > process after the descendants if True > After that, all the tasks having that resource leak were shutting down > correctly, without leaving any "undead" processes. 
> Would love to get some feedback from experts about this issue and the fix. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
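The first of the two changes described above — stop heartbeating once termination has begun — can be illustrated with a minimal stand-in class. This is a toy model of the idea, not the actual LocalTaskJob code:

```python
class LocalTaskJobSketch:
    """Toy model of the proposed LocalTaskJob change: once `terminating`
    is set, heartbeats stop, so the job row's latest_heartbeat stops
    advancing and the job can no longer linger as an "undead" running job."""

    def __init__(self):
        self.terminating = False
        self.heartbeat_count = 0

    def heartbeat(self):
        if self.terminating:
            return  # proposed fix: suppress the heartbeat while terminating
        self.heartbeat_count += 1
```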
[jira] [Resolved] (AIRFLOW-2248) Fix wrong param name in RedshiftToS3Transfer doc
[ https://issues.apache.org/jira/browse/AIRFLOW-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-2248. -- Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3156 [https://github.com/apache/incubator-airflow/pull/3156] > Fix wrong param name in RedshiftToS3Transfer doc > > > Key: AIRFLOW-2248 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2248 > Project: Apache Airflow > Issue Type: Bug > Components: docs, Documentation >Reporter: Kengo Seki >Priority: Minor > Fix For: 1.10.0 > > > RedshiftToS3Transfer's docstring says: > {code} > :param options: reference to a list of UNLOAD options > :type options: list > {code} > but the correct name is {{unload_options}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2235) Fix wrong docstrings in Transfer operators for MySQL
[ https://issues.apache.org/jira/browse/AIRFLOW-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-2235. -- Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3147 [https://github.com/apache/incubator-airflow/pull/3147] > Fix wrong docstrings in Transfer operators for MySQL > > > Key: AIRFLOW-2235 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2235 > Project: Apache Airflow > Issue Type: Bug > Components: docs, Documentation >Reporter: Kengo Seki >Assignee: Tao Feng >Priority: Minor > Fix For: 1.10.0 > > > The docstrings in HiveToMySqlTransfer and PrestoToMySqlTransfer say: > {code} > :param sql: SQL query to execute against the MySQL database > {code} > but actually these queries are executed against Hive and Presto respectively. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2231) DAG with a relativedelta schedule_interval fails
[ https://issues.apache.org/jira/browse/AIRFLOW-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16408846#comment-16408846 ] Joy Gao commented on AIRFLOW-2231: -- (That said, I was still unable to reproduce the bug you had; perhaps we are using different versions of the dateutil package. The issue I had was that the dag with relativedelta would simply never get scheduled.) > DAG with a relativedelta schedule_interval fails > > > Key: AIRFLOW-2231 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2231 > Project: Apache Airflow > Issue Type: Bug > Components: DAG >Reporter: Kyle Brooks >Priority: Major > Attachments: test_reldel.py > > > The documentation for the DAG class says using > dateutil.relativedelta.relativedelta as a schedule_interval is supported but > it fails: > > Traceback (most recent call last): > File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 285, > in process_file > m = imp.load_source(mod_name, filepath) > File > "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/imp.py", > line 172, in load_source > module = _load(spec) > File "", line 675, in _load > File "", line 655, in _load_unlocked > File "", line 678, in exec_module > File "", line 205, in _call_with_frames_removed > File "/Users/k398995/airflow/dags/test_reldel.py", line 33, in > dagrun_timeout=timedelta(minutes=60)) > File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 2914, > in __init__ > if schedule_interval in cron_presets: > TypeError: unhashable type: 'relativedelta' > > It looks like the __init__ function for class DAG assumes the > schedule_interval is hashable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2231) DAG with a relativedelta schedule_interval fails
[ https://issues.apache.org/jira/browse/AIRFLOW-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16408835#comment-16408835 ] Joy Gao commented on AIRFLOW-2231: -- Got a chance to look at the code, and it turns out the doc is not accurate: dateutil.relativedelta.relativedelta is not supported, but datetime.timedelta is. In the case where you need a monthly cadence, you can use either @monthly or cron syntax instead. Alternatively, if you'd like to add relativedelta support, you can submit a PR and modify [this section|https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L3168-L3196] to support relativedelta. > DAG with a relativedelta schedule_interval fails > > > Key: AIRFLOW-2231 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2231 > Project: Apache Airflow > Issue Type: Bug > Components: DAG >Reporter: Kyle Brooks >Priority: Major > Attachments: test_reldel.py > > > The documentation for the DAG class says using > dateutil.relativedelta.relativedelta as a schedule_interval is supported but > it fails: > > Traceback (most recent call last): > File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 285, > in process_file > m = imp.load_source(mod_name, filepath) > File > "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/imp.py", > line 172, in load_source > module = _load(spec) > File "", line 675, in _load > File "", line 655, in _load_unlocked > File "", line 678, in exec_module > File "", line 205, in _call_with_frames_removed > File "/Users/k398995/airflow/dags/test_reldel.py", line 33, in > dagrun_timeout=timedelta(minutes=60)) > File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 2914, > in __init__ > if schedule_interval in cron_presets: > TypeError: unhashable type: 'relativedelta' > > It looks like the __init__ function for class DAG assumes the > schedule_interval is hashable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
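The TypeError in the traceback comes from testing an unhashable value for dict membership (`schedule_interval in cron_presets` hashes the key). A PR along the lines suggested above would need a guard like this; a sketch, not the actual models.py code:

```python
from datetime import timedelta

def resolve_schedule_interval(schedule_interval, cron_presets):
    # `schedule_interval in cron_presets` hashes the value, which raises
    # TypeError for unhashable types such as dateutil's relativedelta.
    # Only strings can name a preset, so guard the membership test.
    if isinstance(schedule_interval, str) and schedule_interval in cron_presets:
        return cron_presets[schedule_interval]
    return schedule_interval  # timedelta, relativedelta, raw cron string, ...
```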
[jira] [Commented] (AIRFLOW-2231) DAG with a relativedelta schedule_interval fails
[ https://issues.apache.org/jira/browse/AIRFLOW-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407340#comment-16407340 ] Joy Gao commented on AIRFLOW-2231: -- Can't replicate this. Can you provide the dag file? thanks! > DAG with a relativedelta schedule_interval fails > > > Key: AIRFLOW-2231 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2231 > Project: Apache Airflow > Issue Type: Bug > Components: DAG >Reporter: Kyle Brooks >Priority: Major > > The documentation for the DAG class says using > dateutil.relativedelta.relativedelta as a schedule_interval is supported but > it fails: > > Traceback (most recent call last): > File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 285, > in process_file > m = imp.load_source(mod_name, filepath) > File > "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/imp.py", > line 172, in load_source > module = _load(spec) > File "", line 675, in _load > File "", line 655, in _load_unlocked > File "", line 678, in exec_module > File "", line 205, in _call_with_frames_removed > File "/Users/k398995/airflow/dags/test_reldel.py", line 33, in > dagrun_timeout=timedelta(minutes=60)) > File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 2914, > in __init__ > if schedule_interval in cron_presets: > TypeError: unhashable type: 'relativedelta' > > It looks like the __init__ function for class DAG assumes the > schedule_interval is hashable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2232) DAG must be imported for airflow dag discovery
[ https://issues.apache.org/jira/browse/AIRFLOW-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-2232. -- Resolution: Duplicate Closing since it's a dupe. > DAG must be imported for airflow dag discovery > -- > > Key: AIRFLOW-2232 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2232 > Project: Apache Airflow > Issue Type: Bug > Components: DAG >Reporter: andy dreyfuss >Priority: Critical > > repro: put the following in the dags/ directory > - > from my_dags import MyDag > d = MyDag()  # this is an airflow.DAG > > Expected: airflow list_dags lists the dag > Actual: airflow does not list the dag unless an unused `from airflow import > DAG` is added -- This message was sent by Atlassian JIRA (v7.6.3#76005)
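The behaviour reported here matches DagBag's cheap pre-filter: before actually importing a .py file, it scans the raw text for telltale strings. A sketch of that heuristic (an assumption inferred from the observed behaviour, not a copy of the real code):

```python
def looks_like_dag_file(source: str) -> bool:
    # Sketch of DagBag's safe-mode pre-filter: a .py file is only parsed
    # for DAGs if its text mentions both 'airflow' and 'DAG'. A file that
    # builds its DAG entirely through an imported factory contains neither
    # string, so it is skipped -- hence the unused
    # `from airflow import DAG` workaround in this report.
    return 'airflow' in source and 'DAG' in source
```

The repro file `from my_dags import MyDag` fails both checks, which is why adding the otherwise-unused import makes the DAG appear in `airflow list_dags`.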
[jira] [Resolved] (AIRFLOW-2205) Remove unsupported args from JdbcHook doc
[ https://issues.apache.org/jira/browse/AIRFLOW-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-2205. -- Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3121 [https://github.com/apache/incubator-airflow/pull/3121] > Remove unsupported args from JdbcHook doc > - > > Key: AIRFLOW-2205 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2205 > Project: Apache Airflow > Issue Type: Bug > Components: docs, Documentation >Reporter: Kengo Seki >Assignee: Kengo Seki >Priority: Minor > Fix For: 1.10.0 > > > The following arguments are in JdbcHook's docstring, but unsupported actually > (the last two have to be specified via "extra" field in the database, not as > arguments): > - jdbc_url > - sql > - jdbc_driver_name > - jdbc_driver_loc > Also, the following functionality doesn't seem to be implemented: > bq. Otherwise host, port, schema, username and password can be specified on > the fly. > In addition, JdbcHook is missing from the API reference. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2204) Broken webserver debug mode
[ https://issues.apache.org/jira/browse/AIRFLOW-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-2204. -- Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3118 [https://github.com/apache/incubator-airflow/pull/3118] > Broken webserver debug mode > --- > > Key: AIRFLOW-2204 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2204 > Project: Apache Airflow > Issue Type: Bug > Components: webapp, webserver >Affects Versions: 1.9.0 >Reporter: Bruno Bonagura >Assignee: Bruno Bonagura >Priority: Minor > Fix For: 1.10.0 > > > {code} > $ airflow webserver -d > [2018-03-09 21:04:25,730] {__init__.py:45} INFO - Using executor LocalExecutor > _ > |__( )_ __/__ / __ > /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / / > ___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ / > _/_/ |_/_/ /_//_//_/ \//|__/ > > [2018-03-09 21:04:26,003] {models.py:196} INFO - Filling up the DagBag from > /.../incubator-airflow/dags > Starting the web server on port 8080 and host 0.0.0.0. > Traceback (most recent call last): > File "/.../.virtualenvs/incubator-airflow/bin/airflow", line 6, in > exec(compile(open(__file__).read(), __file__, 'exec')) > File "/.../incubator-airflow/airflow/bin/airflow", line 27, in > args.func(args) > File "/.../incubator-airflow/airflow/bin/cli.py", line 716, in webserver > app.run(debug=True, port=args.port, host=args.hostname, > AttributeError: 'DispatcherMiddleware' object has no attribute 'run' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2206) Remove unsupported args from JdbcOperator doc
[ https://issues.apache.org/jira/browse/AIRFLOW-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-2206. -- Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3122 [https://github.com/apache/incubator-airflow/pull/3122] > Remove unsupported args from JdbcOperator doc > - > > Key: AIRFLOW-2206 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2206 > Project: Apache Airflow > Issue Type: Bug > Components: docs, Documentation >Reporter: Kengo Seki >Assignee: Kengo Seki >Priority: Minor > Fix For: 1.10.0 > > > The following arguments are in JdbcOperator's docstring, but unsupported > actually. > - jdbc_url > - jdbc_driver_name > - jdbc_driver_loc -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2207) Fix flaky test that uses app.cached_app()
[ https://issues.apache.org/jira/browse/AIRFLOW-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-2207. -- Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3123 [https://github.com/apache/incubator-airflow/pull/3123] > Fix flaky test that uses app.cached_app() > - > > Key: AIRFLOW-2207 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2207 > Project: Apache Airflow > Issue Type: Bug > Components: tests >Reporter: Kengo Seki >Assignee: Kengo Seki >Priority: Minor > Fix For: 1.10.0 > > > tests.www.test_views:TestMountPoint.test_mount changes base_url then calls > airflow.www.app.cached_app(). > But if another test calls app.cached_app() first without changing base_url, > succeeding test_mount fails on Travis. > For example, adding the following test causes test_mount to fail, > whereas test_dummy itself succeeds: > {code} > class TestDummy(unittest.TestCase): > def setUp(self): > super(TestDummy, self).setUp() > configuration.load_test_config() > app = application.cached_app(testing=True) > self.client = Client(app) > def test_dummy(self): > response, _, _ = self.client.get('/', follow_redirects=True) > resp_html = b''.join(response) > self.assertIn(b"DAGs", resp_html) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
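The flakiness described above is the classic module-level-cache problem: whichever test calls `cached_app()` first freezes the configuration for everyone else. A stdlib-only reproduction of the pattern (names and the dict stand-in are illustrative, not the real `airflow.www.app` implementation):

```python
# Minimal reproduction of the caching pattern behind app.cached_app().
_app = None

def create_app(base_url='/'):
    return {'base_url': base_url}  # stands in for the real Flask app

def cached_app(base_url='/'):
    global _app
    if _app is None:
        _app = create_app(base_url)
    return _app  # arguments on later calls are silently ignored

def reset_cached_app():
    # What a test like test_mount effectively needs in setUp, so that its
    # custom base_url takes effect regardless of which test ran first.
    global _app
    _app = None
```

Once any test has populated the cache with the default `base_url`, `test_mount`'s custom value is ignored; resetting the cache in setUp restores test independence.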
[jira] [Resolved] (AIRFLOW-2169) Fail to discern between VARBINARY and VARCHAR in MySQL
[ https://issues.apache.org/jira/browse/AIRFLOW-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-2169. -- Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3091 [https://github.com/apache/incubator-airflow/pull/3091] > Fail to discern between VARBINARY and VARCHAR in MySQL > -- > > Key: AIRFLOW-2169 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2169 > Project: Apache Airflow > Issue Type: Bug > Components: db, operators >Reporter: Hongyi Wang >Assignee: Hongyi Wang >Priority: Major > Fix For: 1.10.0 > > > Current MySqlToGoogleCloudStorageOperator has difficulty to discern between > VARBINARY and VARCHAR in MySQL (and other similar fields–CHAR/BINARY, etc). > While "binary-related" MySQL data types, like VARBINARY, should be mapped to > "BYTES" in Google Cloud Storage, rather than "STRING". -- This message was sent by Atlassian JIRA (v7.6.3#76005)
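The core of the fix is a type mapping that distinguishes binary MySQL column types from character ones. A simplified sketch of such a mapping (the entries and fallback here are assumptions for illustration; the merged PR's actual table may differ):

```python
# Illustrative MySQL -> BigQuery type mapping: the point of AIRFLOW-2169
# is that binary types such as VARBINARY must become 'BYTES', not 'STRING'.
MYSQL_TYPE_TO_BQ = {
    'VARCHAR': 'STRING',
    'CHAR': 'STRING',
    'VARBINARY': 'BYTES',
    'BINARY': 'BYTES',
    'BLOB': 'BYTES',
}

def bq_type(mysql_type: str) -> str:
    # Unknown types fall back to STRING in this sketch.
    return MYSQL_TYPE_TO_BQ.get(mysql_type.upper(), 'STRING')
```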
[jira] [Resolved] (AIRFLOW-2187) Fix Broken Travis CI due to [AIRFLOW-2123]
[ https://issues.apache.org/jira/browse/AIRFLOW-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-2187. -- Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3108 [https://github.com/apache/incubator-airflow/pull/3108] > Fix Broken Travis CI due to [AIRFLOW-2123] > -- > > Key: AIRFLOW-2187 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2187 > Project: Apache Airflow > Issue Type: Improvement > Components: ci >Reporter: Kaxil Naik >Assignee: Kaxil Naik >Priority: Major > Fix For: 1.10.0 > > > Travis CI is failing after merging > [AIRFLOW-2123|https://issues.apache.org/jira/browse/AIRFLOW-2123]. > This is caused due to the fact that apache-beam[gcp] is not available for > Python 3.x > *Error Log:* > {code} > Collecting apache-beam[gcp]==2.3.0 (from > google-cloud-dataflow>=2.2.0->apache-airflow==1.10.0.dev0+incubating) > Could not find a version that satisfies the requirement > apache-beam[gcp]==2.3.0 (from > google-cloud-dataflow>=2.2.0->apache-airflow==1.10.0.dev0+incubating) (from > versions: 0.6.0, 2.0.0, 2.1.0, 2.1.1, 2.2.0) > No matching distribution found for apache-beam[gcp]==2.3.0 (from > google-cloud-dataflow>=2.2.0->apache-airflow==1.10.0.dev0+incubating) > ERROR: InvocationError: > '/home/travis/build/apache/incubator-airflow/.tox/py35-backend_mysql/bin/pip > wheel -w /home/travis/.wheelhouse -f /home/travis/.wheelhouse -e .[devel_ci]' > ___ summary > > ERROR: py35-backend_mysql: commands failed > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2175) Failed to upgradedb 1.8.2 -> 1.9.0
[ https://issues.apache.org/jira/browse/AIRFLOW-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-2175. -- Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3104 [https://github.com/apache/incubator-airflow/pull/3104] > Failed to upgradedb 1.8.2 -> 1.9.0 > -- > > Key: AIRFLOW-2175 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2175 > Project: Apache Airflow > Issue Type: Bug > Components: db >Affects Versions: 1.9.0 >Reporter: Damian Momot >Priority: Critical > Fix For: 1.10.0 > > > We've got airflow installation with hundreds of DAGs and thousands of tasks. > During upgrade (1.8.2 -> 1.9.0) we've got following error. > After analyzing stacktrace i've found that it's most likely caused by None > value in 'fileloc' field of Dag column. I checked database and indeed we've > got one record with such value: > > > {code:java} > SELECT COUNT(*) FROM dag WHERE fileloc IS NULL; > 1 > SELECT COUNT(*) FROM dag; > 343 > {code} > > > {code:java} > Traceback (most recent call last): > File "/usr/local/bin/airflow", line 27, in > args.func(args) > File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 913, > in upgradedb > db_utils.upgradedb() > File "/usr/local/lib/python2.7/dist-packages/airflow/utils/db.py", line 320, > in upgradedb > command.upgrade(config, 'heads') > File "/usr/local/lib/python2.7/dist-packages/alembic/command.py", line 174, > in upgrade > script.run_env() > File "/usr/local/lib/python2.7/dist-packages/alembic/script/base.py", line > 416, in run_env > util.load_python_file(self.dir, 'env.py') > File "/usr/local/lib/python2.7/dist-packages/alembic/util/pyfiles.py", line > 93, in load_python_file > module = load_module_py(module_id, path) > File "/usr/local/lib/python2.7/dist-packages/alembic/util/compat.py", line > 79, in load_module_py > mod = imp.load_source(module_id, path, fp) > File "/usr/local/lib/python2.7/dist-packages/airflow/migrations/env.py", > line 86, in 
> run_migrations_online() > File "/usr/local/lib/python2.7/dist-packages/airflow/migrations/env.py", > line 81, in run_migrations_online > context.run_migrations() > File "", line 8, in run_migrations > File > "/usr/local/lib/python2.7/dist-packages/alembic/runtime/environment.py", line > 807, in run_migrations > self.get_context().run_migrations(**kw) > File "/usr/local/lib/python2.7/dist-packages/alembic/runtime/migration.py", > line 321, in run_migrations > step.migration_fn(**kw) > File > "/usr/local/lib/python2.7/dist-packages/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py", > line 63, in upgrade > dag = dagbag.get_dag(ti.dag_id) > File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 232, > in get_dag > filepath=orm_dag.fileloc, only_if_updated=False) > File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 249, > in process_file > if not os.path.isfile(filepath): > File "/usr/lib/python2.7/genericpath.py", line 29, in isfile > st = os.stat(path) > TypeError: coercing to Unicode: need string or buffer, NoneType found{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2175) Failed to upgradedb 1.8.2 -> 1.9.0
[ https://issues.apache.org/jira/browse/AIRFLOW-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386502#comment-16386502 ] Joy Gao commented on AIRFLOW-2175: -- Perhaps the fileloc attribute didn't get saved to db successfully. Curious is this a subdag? Maybe add a null check prior to os.path.isfile(filepath) to avoid this TypeError. > Failed to upgradedb 1.8.2 -> 1.9.0 > -- > > Key: AIRFLOW-2175 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2175 > Project: Apache Airflow > Issue Type: Bug > Components: db >Affects Versions: 1.9.0 >Reporter: Damian Momot >Priority: Critical > > We've got airflow installation with hundreds of DAGs and thousands of tasks. > During upgrade (1.8.2 -> 1.9.0) we've got following error. > After analyzing stacktrace i've found that it's most likely caused by None > value in 'fileloc' field of Dag column. I checked database and indeed we've > got one record with such value: > > > {code:java} > SELECT COUNT(*) FROM dag WHERE fileloc IS NULL; > 1 > SELECT COUNT(*) FROM dag; > 343 > {code} > > > {code:java} > Traceback (most recent call last): > File "/usr/local/bin/airflow", line 27, in > args.func(args) > File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 913, > in upgradedb > db_utils.upgradedb() > File "/usr/local/lib/python2.7/dist-packages/airflow/utils/db.py", line 320, > in upgradedb > command.upgrade(config, 'heads') > File "/usr/local/lib/python2.7/dist-packages/alembic/command.py", line 174, > in upgrade > script.run_env() > File "/usr/local/lib/python2.7/dist-packages/alembic/script/base.py", line > 416, in run_env > util.load_python_file(self.dir, 'env.py') > File "/usr/local/lib/python2.7/dist-packages/alembic/util/pyfiles.py", line > 93, in load_python_file > module = load_module_py(module_id, path) > File "/usr/local/lib/python2.7/dist-packages/alembic/util/compat.py", line > 79, in load_module_py > mod = imp.load_source(module_id, path, fp) > File 
"/usr/local/lib/python2.7/dist-packages/airflow/migrations/env.py", > line 86, in > run_migrations_online() > File "/usr/local/lib/python2.7/dist-packages/airflow/migrations/env.py", > line 81, in run_migrations_online > context.run_migrations() > File "", line 8, in run_migrations > File > "/usr/local/lib/python2.7/dist-packages/alembic/runtime/environment.py", line > 807, in run_migrations > self.get_context().run_migrations(**kw) > File "/usr/local/lib/python2.7/dist-packages/alembic/runtime/migration.py", > line 321, in run_migrations > step.migration_fn(**kw) > File > "/usr/local/lib/python2.7/dist-packages/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py", > line 63, in upgrade > dag = dagbag.get_dag(ti.dag_id) > File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 232, > in get_dag > filepath=orm_dag.fileloc, only_if_updated=False) > File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 249, > in process_file > if not os.path.isfile(filepath): > File "/usr/lib/python2.7/genericpath.py", line 29, in isfile > st = os.stat(path) > TypeError: coercing to Unicode: need string or buffer, NoneType found{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
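The null check suggested in the comment above is a one-liner: `DagBag.process_file` passes `orm_dag.fileloc` straight into `os.path.isfile`, so a NULL `fileloc` row becomes a `TypeError` under Python 2. A minimal sketch of the guard:

```python
import os

def safe_isfile(filepath):
    # Short-circuiting on None avoids the "coercing to Unicode: need
    # string or buffer, NoneType found" failure from the traceback above.
    return filepath is not None and os.path.isfile(filepath)
```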
[jira] [Resolved] (AIRFLOW-2170) The Implement Features section in the CONTRIBUTING.md is incomplete
[ https://issues.apache.org/jira/browse/AIRFLOW-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-2170. -- Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3089 [https://github.com/apache/incubator-airflow/pull/3089] > The Implement Features section in the CONTRIBUTING.md is incomplete > --- > > Key: AIRFLOW-2170 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2170 > Project: Apache Airflow > Issue Type: Improvement >Affects Versions: 1.9.0 >Reporter: Felix Uellendall >Assignee: Felix Uellendall >Priority: Trivial > Fix For: 1.10.0 > > > Currently it says: > {noformat} > Implement Features > Look through the Apache Jira for features. Any unassigned "Improvement" issue > is open to whoever wants to implement it. > We've created the operators, hooks, macros and executors we needed, but we > made sure that this part of Airflow is extensible. New operators, hooks and > operators are very welcomed!{noformat} > but it would probably be better to change the last sentence to: > {noformat} > New operators, hooks, macros and executors are very welcomed!{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-1642) An Alembic script not using scoped session causing deadlock
[ https://issues.apache.org/jira/browse/AIRFLOW-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16384225#comment-16384225 ] Joy Gao commented on AIRFLOW-1642: -- This one fell off my radar, I do have a PR out for it [https://github.com/apache/incubator-airflow/pull/2632] but never got merged :( > An Alembic script not using scoped session causing deadlock > --- > > Key: AIRFLOW-1642 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1642 > Project: Apache Airflow > Issue Type: Bug >Reporter: Joy Gao >Priority: Minor > > The bug I'm about to describe is a more of an obscure edge case, however I > think it's something still worth fixing. > After upgrading to airflow 1.9, while running `airflow resetdb` on my local > machine (with mysql), I encountered a deadlock on the final alembic revision > _d2ae31099d61 Increase text size for MySQL (not relevant for other DBs' text > types)_. > The deadlock turned out to be caused by another earlier session that was > created and left open in revision _cc1e65623dc7 add max tries column to task > instance_. Notably the code below: > {code} > sessionmaker = sa.orm.sessionmaker() > session = sessionmaker(bind=connection) > dagbag = DagBag(settings.DAGS_FOLDER) > {code} > The session created here was not a `scoped_session`, so when the DAGs were > being parsed in line 3 above, one of the DAG files makes a direct call to the > class method `Variable.get()` to acquire an env variable, which makes a db > query to the `variable` table, but raised a KeyError as the env variable was > non-existent, thus holding the lock to the `variable` table as a result of > that exception. > Later on, the latter alembic script `_cc1e65623dc7` needs to alter the > `Variable` table. Instead of creating its own Session object, it attempts to > reuse the same one as above. And because of the exception, it waits > indefinitely to acquire the lock on that table. 
> So the DAG file itself could have avoided the KeyError by providing a default > value when calling Variable.get(). However I think it would be a good idea to > avoid using unscoped sessions in general, as an exception could potentially > occur in the future elsewhere. The easiest fix is replacing *session = > sessionmaker(bind=connection)* with *session = settings.Session()*, which is > scoped. However, making a change on a migration script is going to make folks > anxious. > If anyone have any thoughts on this, let me know! Thanks :) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
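The scoped/unscoped distinction at the heart of this report can be sketched with stdlib only (this mimics what a scoped registry like `settings.Session()` provides; the class and names are illustrative, not SQLAlchemy's implementation):

```python
class ScopedRegistry:
    """Every call in the same scope returns the SAME object until
    remove() is called. Reusing one registered session -- instead of an
    ad-hoc sessionmaker(bind=connection) -- is what lets a later
    migration step see, and clean up after, the session an earlier step
    (here, the DagBag parse) already opened, instead of deadlocking on
    a lock that session still holds."""

    def __init__(self, factory):
        self._factory = factory
        self._instance = None

    def __call__(self):
        if self._instance is None:
            self._instance = self._factory()
        return self._instance

    def remove(self):
        self._instance = None

Session = ScopedRegistry(object)  # object() stands in for a real Session
```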
[jira] [Resolved] (AIRFLOW-2130) Many Operators are missing from the docs
[ https://issues.apache.org/jira/browse/AIRFLOW-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-2130. -- Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3061 [https://github.com/apache/incubator-airflow/pull/3061] > Many Operators are missing from the docs > > > Key: AIRFLOW-2130 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2130 > Project: Apache Airflow > Issue Type: Bug > Components: docs, Documentation >Affects Versions: 1.10.0 >Reporter: Reid Beels >Assignee: Reid Beels >Priority: Critical > Fix For: 1.10.0 > > > * BaseSensorOperator references the wrong import path, so the autodoc fails > * In the core operators, these are missing: > ** airflow.operators.check_operator.CheckOperator > ** airflow.operators.check_operator.IntervalCheckOperator > ** airflow.operators.check_operator.ValueCheckOperator > ** airflow.operators.hive_stats_operator.HiveStatsCollectionOperator > ** airflow.operators.jdbc_operator.JdbcOperator > ** airflow.operators.latest_only_operator.LatestOnlyOperator > ** airflow.operators.mysql_operator.MySqlOperator > ** airflow.operators.oracle_operator.OracleOperator > ** airflow.operators.pig_operator.PigOperator > ** airflow.operators.s3_file_transform_operator.S3FileTransformOperator > ** airflow.operators.sqlite_operator.SqliteOperator > ** airflow.operators.mysql_to_hive.MySqlToHiveTransfer > ** airflow.operators.presto_to_mysql.PrestoToMySqlTransfer > ** airflow.operators.redshift_to_s3_operator.RedshiftToS3Transfer > * In contrib.operators, these are missing: > ** airflow.contrib.operators.awsbatch_operator.AWSBatchOperator > ** airflow.contrib.operators.druid_operator.DruidOperator > ** airflow.contrib.operators.emr_add_steps_operator.EmrAddStepsOperator > ** > airflow.contrib.operators.emr_create_job_flow_operator.EmrCreateJobFlowOperator > ** > airflow.contrib.operators.emr_terminate_job_flow_operator.EmrTerminateJobFlowOperator > ** 
airflow.contrib.operators.gcs_to_bq.GoogleCloudStorageToBigQueryOperator > ** airflow.contrib.operators.jira_operator.JiraOperator > ** airflow.contrib.operators.kubernetes_pod_operator.KubernetesPodOperator > ** airflow.contrib.operators.mlengine_operator.MLEngineBatchPredictionOperator > ** airflow.contrib.operators.mlengine_operator.MLEngineModelOperator > ** airflow.contrib.operators.mlengine_operator.MLEngineVersionOperator > ** airflow.contrib.operators.mlengine_operator.MLEngineTrainingOperator > ** airflow.contrib.operators.mysql_to_gcs.MySqlToGoogleCloudStorageOperator > ** > airflow.contrib.operators.postgres_to_gcs_operator.PostgresToGoogleCloudStorageOperator > ** airflow.contrib.operators.sftp_operator.SFTPOperator > ** airflow.contrib.operators.spark_jdbc_operator.SparkJDBCOperator > ** airflow.contrib.operators.spark_sql_operator.SparkSqlOperator > ** airflow.contrib.operators.spark_submit_operator.SparkSubmitOperator > ** airflow.contrib.operators.sqoop_operator.SqoopOperator > ** airflow.contrib.operators.hive_to_dynamodb.HiveToDynamoDBTransferOperator -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2131) API Reference includes confusing docs from airflow.utils.AirflowImporter
[ https://issues.apache.org/jira/browse/AIRFLOW-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-2131. -- Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3062 [https://github.com/apache/incubator-airflow/pull/3062] > API Reference includes confusing docs from airflow.utils.AirflowImporter > > > Key: AIRFLOW-2131 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2131 > Project: Apache Airflow > Issue Type: Bug > Components: docs, Documentation >Reporter: Reid Beels >Assignee: Reid Beels >Priority: Critical > Fix For: 1.10.0 > > Attachments: image-2018-02-20-16-53-04-572.png > > > The generated API documentation includes {{automodule}} declarations for > several modules (hooks and operators) that end up pulling in docs from > {{airflow.utils.helpers.AirflowImporter}}. > This leads to the confusing situation for new users who think they're reading > docs about what Hooks are, but are instead reading unlabeled docs about the > seemingly-deprecated AirflowImporter. > Like so: > !image-2018-02-20-16-53-04-572.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-1852) Allow hostname to be overridable
[ https://issues.apache.org/jira/browse/AIRFLOW-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-1852. -- Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3036 [https://github.com/apache/incubator-airflow/pull/3036] > Allow hostname to be overridable > > > Key: AIRFLOW-1852 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1852 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Trevor Joynson >Priority: Major > Fix For: 1.10.0 > > > * https://github.com/apache/incubator-airflow/pull/2472 > This makes running Airflow tremendously easier in common > production deployments that need a little more than just > a bare `socket.getfqdn()` hostname for service discovery > per running instance. > Personally, I just place the Kubernetes Pod FQDN (or even IP) here. > Question: Since the web server calls out to the individual > worker nodes to snag logs, what happens if one dies midway? > I may later look into that, because that scares me slightly. > I feel like workers should not ever hold such state, but that's purely a > personal bias. > Thanks, > Trevor -- This message was sent by Atlassian JIRA (v7.6.3#76005)
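The shape of an overridable hostname is simple to sketch. Note the override mechanism below is an assumption for illustration (the env var name is hypothetical; the merged PR may instead use a configurable callable):

```python
import os
import socket

def get_hostname():
    # Prefer an explicit override -- such as a Kubernetes Pod FQDN or
    # IP, as suggested in the report -- over the bare socket.getfqdn()
    # default. 'AIRFLOW_HOSTNAME_OVERRIDE' is an illustrative name only.
    return os.environ.get('AIRFLOW_HOSTNAME_OVERRIDE') or socket.getfqdn()
```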
[jira] [Updated] (AIRFLOW-85) Create DAGs UI
[ https://issues.apache.org/jira/browse/AIRFLOW-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao updated AIRFLOW-85: --- Description: Airflow currently provides only an {{/admin}} UI interface for the webapp. This UI provides three distinct roles: * Admin * Data profiler * None In addition, Airflow currently provides the ability to log in, either via a secure proxy front-end, or via LDAP/Kerberos, within the webapp. We run Airflow with LDAP authentication enabled. This helps us control access to the UI. However, there is insufficient granularity within the UI. We would like to be able to grant users the ability to: # View their DAGs, but no one else's. # Control their DAGs, but no one else's. This is not possible right now. You can take away the ability to access the connections and data profiling tabs, but users can still see all DAGs, as well as control the state of the DB by clearing any DAG status, etc. (From Airflow-1443) The authentication capabilities in the [RBAC design proposal|https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+RBAC+proposal] introduces a significant amount of work that is otherwise already built-in in existing frameworks. Per [community discussion|https://www.mail-archive.com/dev@airflow.incubator.apache.org/msg02946.html], Flask-AppBuilder (FAB) is the best fit for Airflow as a foundation to implementing RBAC. This will support integration with different authentication backends out-of-the-box, and generate permissions for views and ORM models that will simplify view-level and dag-level access control. This implies modifying the current flask views, and deprecating the current Flask-Admin in favor of FAB's crud. was: Airflow currently provides only an {{/admin}} UI interface for the webapp. This UI provides three distinct roles: * Admin * Data profiler * None In addition, Airflow currently provides the ability to log in, either via a secure proxy front-end, or via LDAP/Kerberos, within the webapp. 
We run Airflow with LDAP authentication enabled. This helps us control access to the UI. However, there is insufficient granularity within the UI. We would like to be able to grant users the ability to: # View their DAGs, but no one else's. # Control their DAGs, but no one else's. This is not possible right now. You can take away the ability to access the connections and data profiling tabs, but users can still see all DAGs, as well as control the state of the DB by clearing any DAG status, etc. (From Airflow-1443) The authentication capabilities in the [RBAC design proposal |[https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+RBAC+proposal]] introduces a significant amount of work that is otherwise already built-in in existing frameworks. Per [community discussion|https://www.mail-archive.com/dev@airflow.incubator.apache.org/msg02946.html], Flask-AppBuilder (FAB) is the best fit for Airflow as a foundation to implementing RBAC. This will support integration with different authentication backends out-of-the-box, and generate permissions for views and ORM models that will simplify view-level and dag-level access control. This implies modifying the current flask views, and deprecating the current Flask-Admin in favor of FAB's crud. > Create DAGs UI > -- > > Key: AIRFLOW-85 > URL: https://issues.apache.org/jira/browse/AIRFLOW-85 > Project: Apache Airflow > Issue Type: Bug > Components: security, ui >Reporter: Chris Riccomini >Assignee: Joy Gao >Priority: Major > > Airflow currently provides only an {{/admin}} UI interface for the webapp. > This UI provides three distinct roles: > * Admin > * Data profiler > * None > In addition, Airflow currently provides the ability to log in, either via a > secure proxy front-end, or via LDAP/Kerberos, within the webapp. > We run Airflow with LDAP authentication enabled. This helps us control access > to the UI. However, there is insufficient granularity within the UI. 
We would > like to be able to grant users the ability to: > # View their DAGs, but no one else's. > # Control their DAGs, but no one else's. > This is not possible right now. You can take away the ability to access the > connections and data profiling tabs, but users can still see all DAGs, as > well as control the state of the DB by clearing any DAG status, etc. > > (From Airflow-1443) > The authentication capabilities in the [RBAC design > proposal|https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+RBAC+proposal] > introduces a significant amount of work that is otherwise already built-in > in existing frameworks. > Per [community > discussion|https://www.mail-archive.com/dev@airflow.incubator.apache.org/msg02946.html], > Flask-AppBuilder (FAB) is the best fit for Airflow as a foundation to >
[jira] [Updated] (AIRFLOW-85) Create DAGs UI
[ https://issues.apache.org/jira/browse/AIRFLOW-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao updated AIRFLOW-85: --- Description: Airflow currently provides only an {{/admin}} UI interface for the webapp. This UI provides three distinct roles: * Admin * Data profiler * None In addition, Airflow currently provides the ability to log in, either via a secure proxy front-end, or via LDAP/Kerberos, within the webapp. We run Airflow with LDAP authentication enabled. This helps us control access to the UI. However, there is insufficient granularity within the UI. We would like to be able to grant users the ability to: # View their DAGs, but no one else's. # Control their DAGs, but no one else's. This is not possible right now. You can take away the ability to access the connections and data profiling tabs, but users can still see all DAGs, as well as control the state of the DB by clearing any DAG status, etc. (From Airflow-1443) The authentication capabilities in the [RBAC design proposal |[https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+RBAC+proposal]] introduces a significant amount of work that is otherwise already built-in in existing frameworks. Per [community discussion|https://www.mail-archive.com/dev@airflow.incubator.apache.org/msg02946.html], Flask-AppBuilder (FAB) is the best fit for Airflow as a foundation to implementing RBAC. This will support integration with different authentication backends out-of-the-box, and generate permissions for views and ORM models that will simplify view-level and dag-level access control. This implies modifying the current flask views, and deprecating the current Flask-Admin in favor of FAB's crud. was: Airflow currently provides only an {{/admin}} UI interface for the webapp. This UI provides three distinct roles: * Admin * Data profiler * None In addition, Airflow currently provides the ability to log in, either via a secure proxy front-end, or via LDAP/Kerberos, within the webapp. 
We run Airflow with LDAP authentication enabled. This helps us control access to the UI. However, there is insufficient granularity within the UI. We would like to be able to grant users the ability to: # View their DAGs, but no one else's. # Control their DAGs, but no one else's. This is not possible right now. You can take away the ability to access the connections and data profiling tabs, but users can still see all DAGs, as well as control the state of the DB by clearing any DAG status, etc. From Airflow-1443: The authentication capabilities in the RBAC design proposal introduces a significant amount of work that is otherwise already built-in in existing frameworks. Per community discussion, Flask-AppBuilder (FAB) is the best fit for Airflow as a foundation to implementing RBAC. This will support integration with different authentication backends out-of-the-box, and generate permissions for views and ORM models that will simplify view-level and dag-level access control. This implies modifying the current flask views, and deprecating the current Flask-Admin in favor of FAB's crud. > Create DAGs UI > -- > > Key: AIRFLOW-85 > URL: https://issues.apache.org/jira/browse/AIRFLOW-85 > Project: Apache Airflow > Issue Type: Bug > Components: security, ui >Reporter: Chris Riccomini >Assignee: Joy Gao >Priority: Major > > Airflow currently provides only an {{/admin}} UI interface for the webapp. > This UI provides three distinct roles: > * Admin > * Data profiler > * None > In addition, Airflow currently provides the ability to log in, either via a > secure proxy front-end, or via LDAP/Kerberos, within the webapp. > We run Airflow with LDAP authentication enabled. This helps us control access > to the UI. However, there is insufficient granularity within the UI. We would > like to be able to grant users the ability to: > # View their DAGs, but no one else's. > # Control their DAGs, but no one else's. > This is not possible right now.
You can take away the ability to access the > connections and data profiling tabs, but users can still see all DAGs, as > well as control the state of the DB by clearing any DAG status, etc. > > (From Airflow-1443) > The authentication capabilities in the [RBAC design proposal|https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+RBAC+proposal] > introduces a significant amount of work that is otherwise already built-in in > existing frameworks. > Per [community > discussion|https://www.mail-archive.com/dev@airflow.incubator.apache.org/msg02946.html], > Flask-AppBuilder (FAB) is the best fit for Airflow as a foundation to > implementing RBAC. This will support integration with different > authentication backends out-of-the-box, and generate permissions for views > and ORM models that will simplify view-level and dag-level access control.
[jira] [Commented] (AIRFLOW-85) Create DAGs UI
[ https://issues.apache.org/jira/browse/AIRFLOW-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357471#comment-16357471 ] Joy Gao commented on AIRFLOW-85: Closed AIRFLOW-1433 as dupe since the FAB work will directly fix the issue here. > Create DAGs UI > -- > > Key: AIRFLOW-85 > URL: https://issues.apache.org/jira/browse/AIRFLOW-85 > Project: Apache Airflow > Issue Type: Bug > Components: security, ui >Reporter: Chris Riccomini >Assignee: Joy Gao >Priority: Major > > Airflow currently provides only an {{/admin}} UI interface for the webapp. > This UI provides three distinct roles: > * Admin > * Data profiler > * None > In addition, Airflow currently provides the ability to log in, either via a > secure proxy front-end, or via LDAP/Kerberos, within the webapp. > We run Airflow with LDAP authentication enabled. This helps us control access > to the UI. However, there is insufficient granularity within the UI. We would > like to be able to grant users the ability to: > # View their DAGs, but no one else's. > # Control their DAGs, but no one else's. > This is not possible right now. You can take away the ability to access the > connections and data profiling tabs, but users can still see all DAGs, as > well as control the state of the DB by clearing any DAG status, etc. > > From Airflow-1443: > The authentication capabilities in the RBAC design proposal introduces a > significant amount of work that is otherwise already built-in in existing > frameworks. > Per community discussion, Flask-AppBuilder (FAB) is the best fit for Airflow > as a foundation to implementing RBAC. This will support integration with > different authentication backends out-of-the-box, and generate permissions > for views and ORM models that will simplify view-level and dag-level access > control. > This implies modifying the current flask views, and deprecating the current > Flask-Admin in favor of FAB's crud. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-85) Create DAGs UI
[ https://issues.apache.org/jira/browse/AIRFLOW-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao updated AIRFLOW-85: --- Description: Airflow currently provides only an {{/admin}} UI interface for the webapp. This UI provides three distinct roles: * Admin * Data profiler * None In addition, Airflow currently provides the ability to log in, either via a secure proxy front-end, or via LDAP/Kerberos, within the webapp. We run Airflow with LDAP authentication enabled. This helps us control access to the UI. However, there is insufficient granularity within the UI. We would like to be able to grant users the ability to: # View their DAGs, but no one else's. # Control their DAGs, but no one else's. This is not possible right now. You can take away the ability to access the connections and data profiling tabs, but users can still see all DAGs, as well as control the state of the DB by clearing any DAG status, etc. From Airflow-1443: The authentication capabilities in the RBAC design proposal introduces a significant amount of work that is otherwise already built-in in existing frameworks. Per community discussion, Flask-AppBuilder (FAB) is the best fit for Airflow as a foundation to implementing RBAC. This will support integration with different authentication backends out-of-the-box, and generate permissions for views and ORM models that will simplify view-level and dag-level access control. This implies modifying the current flask views, and deprecating the current Flask-Admin in favor of FAB's crud. was: Airflow currently provides only an {{/admin}} UI interface for the webapp. This UI provides three distinct roles: * Admin * Data profiler * None In addition, Airflow currently provides the ability to log in, either via a secure proxy front-end, or via LDAP/Kerberos, within the webapp. We run Airflow with LDAP authentication enabled. This helps us control access to the UI. However, there is insufficient granularity within the UI.
We would like to be able to grant users the ability to: # View their DAGs, but no one else's. # Control their DAGs, but no one else's. This is not possible right now. You can take away the ability to access the connections and data profiling tabs, but users can still see all DAGs, as well as control the state of the DB by clearing any DAG status, etc. > Create DAGs UI > -- > > Key: AIRFLOW-85 > URL: https://issues.apache.org/jira/browse/AIRFLOW-85 > Project: Apache Airflow > Issue Type: Bug > Components: security, ui >Reporter: Chris Riccomini >Assignee: Joy Gao >Priority: Major > > Airflow currently provides only an {{/admin}} UI interface for the webapp. > This UI provides three distinct roles: > * Admin > * Data profiler > * None > In addition, Airflow currently provides the ability to log in, either via a > secure proxy front-end, or via LDAP/Kerberos, within the webapp. > We run Airflow with LDAP authentication enabled. This helps us control access > to the UI. However, there is insufficient granularity within the UI. We would > like to be able to grant users the ability to: > # View their DAGs, but no one else's. > # Control their DAGs, but no one else's. > This is not possible right now. You can take away the ability to access the > connections and data profiling tabs, but users can still see all DAGs, as > well as control the state of the DB by clearing any DAG status, etc. > > From Airflow-1443: > The authentication capabilities in the RBAC design proposal introduces a > significant amount of work that is otherwise already built-in in existing > frameworks. > Per community discussion, Flask-AppBuilder (FAB) is the best fit for Airflow > as a foundation to implementing RBAC. This will support integration with > different authentication backends out-of-the-box, and generate permissions > for views and ORM models that will simplify view-level and dag-level access > control. 
> This implies modifying the current flask views, and deprecating the current > Flask-Admin in favor of FAB's crud. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (AIRFLOW-1433) Convert Airflow to Use FAB Framework
[ https://issues.apache.org/jira/browse/AIRFLOW-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao closed AIRFLOW-1433. Resolution: Duplicate Duplicate of AIRFLOW-85. > Convert Airflow to Use FAB Framework > > > Key: AIRFLOW-1433 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1433 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Joy Gao >Assignee: Joy Gao >Priority: Major > > The authentication capabilities in the RBAC design proposal introduces a > significant amount of work that is otherwise already built-in in existing > frameworks. > Per community discussion, Flask-AppBuilder (FAB) is the best fit for Airflow > as a foundation to implementing RBAC. This will support integration with > different authentication backends out-of-the-box, and generate permissions > for views and ORM models that will simplify view-level and dag-level access > control. > This implies modifying the current flask views, and deprecating the current > Flask-Admin in favor of FAB's crud. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2057) Add Overstock to the list of Airflow users
Joy Gao created AIRFLOW-2057: Summary: Add Overstock to the list of Airflow users Key: AIRFLOW-2057 URL: https://issues.apache.org/jira/browse/AIRFLOW-2057 Project: Apache Airflow Issue Type: Task Reporter: Joy Gao -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-1904) Correct DAG fileloc
[ https://issues.apache.org/jira/browse/AIRFLOW-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao updated AIRFLOW-1904: - Description: Currently dag file location `dag.fileloc` is determined by getting the second stack frame from the top, i.e.: self.fileloc = sys._getframe().f_back.f_code.co_filename However this fails if the DAG is constructed in an imported module. For example, if I import a DagBuilder in my dagfile, the dagbuilder's filepath would ended up becoming the fileloc, rather than the dag itself. This causes a bug whenever the DAG is refreshed, with the message: ```This DAG isn't available in the web server's DagBag object. It shows up in this list because the scheduler marked it as active in the metadata database.``` was: Currently dag file location `dag.fileloc` is determined by getting the second stack frame from the top, i.e.: self.fileloc = sys._getframe().f_back.f_code.co_filename However this fails if the DAG is constructed in an imported module. For example, if I import a DagBuilder in my dagfile, the dagbuilder's filepath would ended up becoming the fileloc, rather than the dag itself. This causes a bug whenever the DAG is attempted to be refreshed, with the message: ```This DAG isn't available in the web server's DagBag object. It shows up in this list because the scheduler marked it as active in the metadata database.``` > Correct DAG fileloc > --- > > Key: AIRFLOW-1904 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1904 > Project: Apache Airflow > Issue Type: Bug >Reporter: Joy Gao >Assignee: Joy Gao > > Currently dag file location `dag.fileloc` is determined by getting the second > stack frame from the top, i.e.: > self.fileloc = sys._getframe().f_back.f_code.co_filename > However this fails if the DAG is constructed in an imported module. For > example, if I import a DagBuilder in my dagfile, the dagbuilder's filepath > would ended up becoming the fileloc, rather than the dag itself. 
> This causes a bug whenever the DAG is refreshed, with the message: > ```This DAG isn't available in the web server's DagBag object. It shows up in > this list because the scheduler marked it as active in the metadata > database.``` -- This message was sent by Atlassian JIRA (v6.4.14#64029)
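The frame-inspection behavior described in this issue can be reproduced without Airflow. In the sketch below, `FakeDag` and the `dag_builder.py` module are hypothetical stand-ins; only the `fileloc` assignment mirrors the line quoted above.

```python
import importlib.util
import os
import tempfile

# Source of a "builder" module, written to disk so it has its own
# co_filename, just like a real imported DagBuilder module would.
builder_src = """
import sys

class FakeDag:
    def __init__(self):
        # Mimics Airflow's DAG.__init__:
        # fileloc = filename of the frame one level up the stack.
        self.fileloc = sys._getframe().f_back.f_code.co_filename

def build_dag():
    # FakeDag() is invoked HERE, so f_back resolves to this builder
    # module, not the dag file that imported it.
    return FakeDag()
"""

with tempfile.TemporaryDirectory() as tmp:
    builder_path = os.path.join(tmp, "dag_builder.py")
    with open(builder_path, "w") as f:
        f.write(builder_src)
    spec = importlib.util.spec_from_file_location("dag_builder", builder_path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)

    # The "import a DagBuilder in my dagfile" pattern from the report:
    dag = mod.build_dag()
    # fileloc points at the builder module, not the calling file.
    assert dag.fileloc == builder_path
```

This is why the webserver's DagBag, which loads DAGs by `fileloc`, cannot find a DAG built through an imported helper.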
[jira] [Created] (AIRFLOW-1904) Correct DAG fileloc
Joy Gao created AIRFLOW-1904: Summary: Correct DAG fileloc Key: AIRFLOW-1904 URL: https://issues.apache.org/jira/browse/AIRFLOW-1904 Project: Apache Airflow Issue Type: Bug Reporter: Joy Gao Assignee: Joy Gao Currently dag file location `dag.fileloc` is determined by getting the second stack frame from the top, i.e.: self.fileloc = sys._getframe().f_back.f_code.co_filename However this fails if the DAG is constructed in an imported module. For example, if I import a DagBuilder in my dagfile, the dagbuilder's filepath would ended up becoming the fileloc, rather than the dag itself. This causes a bug whenever the DAG is attempted to be refreshed, with the message: ```This DAG isn't available in the web server's DagBag object. It shows up in this list because the scheduler marked it as active in the metadata database.``` -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (AIRFLOW-1821) Default logging config file is confusing
Joy Gao created AIRFLOW-1821: Summary: Default logging config file is confusing Key: AIRFLOW-1821 URL: https://issues.apache.org/jira/browse/AIRFLOW-1821 Project: Apache Airflow Issue Type: Improvement Reporter: Joy Gao Assignee: Joy Gao The current DEFAULT_LOGGING_CONFIG has 5 loggers for configurations: - root - airflow - airflow.task - airflow.task_runner - airflow.processor The number of loggers could be reduced to make configuration easier. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-1821) Default logging config file is confusing
[ https://issues.apache.org/jira/browse/AIRFLOW-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao updated AIRFLOW-1821: - Description: The current DEFAULT_LOGGING_CONFIG has 5 loggers for configurations: - root - airflow - airflow.task - airflow.task_runner - airflow.processor The number of loggers could be reduced to make configuration easier. was: The current DEFAULT_LOGGING_CONFIG has 5 loggers for configurations: - root - airflow - airflow.task - airflow.task_runner - airflow.processor The number of loggers could be reduced to make configuration easier. > Default logging config file is confusing > > > Key: AIRFLOW-1821 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1821 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Joy Gao >Assignee: Joy Gao > > The current DEFAULT_LOGGING_CONFIG has 5 loggers for configurations: > - root > - airflow > - airflow.task > - airflow.task_runner > - airflow.processor > The number of loggers could be reduced to make configuration easier. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-1740) Cannot create/update XCOM via UI in PY3
[ https://issues.apache.org/jira/browse/AIRFLOW-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao updated AIRFLOW-1740: - Description: I cannot create/update XCOM via UI in PY3. When attempting to update an existing dag's xcom, the following error is received: {code:java} Failed to update record. (builtins.TypeError) string argument without an encoding [SQL: 'UPDATE xcom SET value=%s WHERE xcom.id = %s'] [parameters: [{'xcom_id': 165, 'value': "b'bar'"}]] {code} And for creating a new xcom: {code:java} Failed to create record. (builtins.TypeError) string argument without an encoding [SQL: 'INSERT INTO xcom (`key`, value, timestamp, execution_date, task_id, dag_id) VALUES (%s, %s, now(), %s, %s, %s)'] [parameters: [{'execution_date': datetime.datetime(2017, 10, 7, 1, 1), 'value': 'bar', 'task_id': 'test_task', 'key': 'foo', 'dag_id': 'test_dag'}]] {code} was: I cannot create/update XCOM via UI in PY3. When attempting to update an existing dag's xcom, the following error is received: {code:java} Failed to update record. (builtins.TypeError) string argument without an encoding [SQL: 'UPDATE xcom SET value=%s WHERE xcom.id = %s'] [parameters: [{'xcom_id': 165, 'value': "b'\\x80\\x03J+\\x92\\xdbYa.'"}]] {code} And for creating a new xcom: {code:java} Failed to create record. (builtins.TypeError) string argument without an encoding [SQL: 'INSERT INTO xcom (`key`, value, timestamp, execution_date, task_id, dag_id) VALUES (%s, %s, now(), %s, %s, %s)'] [parameters: [{'execution_date': datetime.datetime(2017, 10, 7, 1, 1), 'value': 'bar', 'task_id': 'test_task', 'key': 'foo', 'dag_id': 'test_dag'}]] {code} > Cannot create/update XCOM via UI in PY3 > --- > > Key: AIRFLOW-1740 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1740 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: 1.9.0 > Environment: PY3 >Reporter: Joy Gao >Priority: Minor > > I cannot create/update XCOM via UI in PY3. 
> When attempting to update an existing dag's xcom, the following error is > received: > {code:java} > Failed to update record. (builtins.TypeError) string argument without an > encoding [SQL: 'UPDATE xcom SET value=%s WHERE xcom.id = %s'] [parameters: > [{'xcom_id': 165, 'value': "b'bar'"}]] > {code} > And for creating a new xcom: > {code:java} > Failed to create record. (builtins.TypeError) string argument without an > encoding [SQL: 'INSERT INTO xcom (`key`, value, timestamp, execution_date, > task_id, dag_id) VALUES (%s, %s, now(), %s, %s, %s)'] [parameters: > [{'execution_date': datetime.datetime(2017, 10, 7, 1, 1), 'value': 'bar', > 'task_id': 'test_task', 'key': 'foo', 'dag_id': 'test_dag'}]] > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
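The traceback above is consistent with calling `bytes()` on a `str` without an encoding, which Python 3 forbids (Python 2's `str()` coercion would have succeeded). A minimal, Airflow-free reproduction, assuming that is indeed the failing call when the XCom value is coerced for the BLOB column:

```python
# In Python 3, bytes(str) without an encoding raises the exact error
# from the report above.
try:
    bytes("bar")
    raised = False
except TypeError as exc:
    raised = True
    message = str(exc)

assert raised
assert message == "string argument without an encoding"

# An explicit encoding round-trips cleanly in both directions:
assert bytes("bar", "utf-8") == b"bar"
assert b"bar".decode("utf-8") == "bar"
```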
[jira] [Updated] (AIRFLOW-1740) Cannot create/update XCOM via UI in PY3
[ https://issues.apache.org/jira/browse/AIRFLOW-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao updated AIRFLOW-1740: - Summary: Cannot create/update XCOM via UI in PY3 (was: Cannot add XCOM via UI in PY3) > Cannot create/update XCOM via UI in PY3 > --- > > Key: AIRFLOW-1740 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1740 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: 1.9.0 > Environment: PY3 >Reporter: Joy Gao >Priority: Minor > > I cannot create/update XCOM via UI in PY3. > When attempting to update an existing dag's xcom, the following error is > received: > {code:java} > Failed to update record. (builtins.TypeError) string argument without an > encoding [SQL: 'UPDATE xcom SET value=%s WHERE xcom.id = %s'] [parameters: > [{'xcom_id': 165, 'value': "b'\\x80\\x03J+\\x92\\xdbYa.'"}]] > {code} > And for creating a new xcom: > {code:java} > Failed to create record. (builtins.TypeError) string argument without an > encoding [SQL: 'INSERT INTO xcom (`key`, value, timestamp, execution_date, > task_id, dag_id) VALUES (%s, %s, now(), %s, %s, %s)'] [parameters: > [{'execution_date': datetime.datetime(2017, 10, 7, 1, 1), 'value': 'bar', > 'task_id': 'test_task', 'key': 'foo', 'dag_id': 'test_dag'}]] > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (AIRFLOW-1740) Cannot add XCOM via UI
Joy Gao created AIRFLOW-1740: Summary: Cannot add XCOM via UI Key: AIRFLOW-1740 URL: https://issues.apache.org/jira/browse/AIRFLOW-1740 Project: Apache Airflow Issue Type: Bug Affects Versions: 1.9.0 Reporter: Joy Gao Priority: Minor I cannot create/update XCOM via UI. When attempting to update an existing dag's xcom, the following error is received: {code:java} Failed to update record. (builtins.TypeError) string argument without an encoding [SQL: 'UPDATE xcom SET value=%s WHERE xcom.id = %s'] [parameters: [{'xcom_id': 165, 'value': "b'\\x80\\x03J+\\x92\\xdbYa.'"}]] {code} And for creating a new xcom: {code:java} Failed to create record. (builtins.TypeError) string argument without an encoding [SQL: 'INSERT INTO xcom (`key`, value, timestamp, execution_date, task_id, dag_id) VALUES (%s, %s, now(), %s, %s, %s)'] [parameters: [{'execution_date': datetime.datetime(2017, 10, 7, 1, 1), 'value': 'bar', 'task_id': 'test_task', 'key': 'foo', 'dag_id': 'test_dag'}]] {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (AIRFLOW-1708) pass JSON through the DAG pipeline
[ https://issues.apache.org/jira/browse/AIRFLOW-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16202592#comment-16202592 ] Joy Gao commented on AIRFLOW-1708: -- That's correct, you can specific the param `task_ids` when you call xcom_pull. The goal of xcom is communication between tasks, so it serves the use case you described fairly well. > pass JSON through the DAG pipeline > -- > > Key: AIRFLOW-1708 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1708 > Project: Apache Airflow > Issue Type: Wish >Reporter: Igor Cherepanov > > Hello dear community, > is it a right way to pass a JSON by means of xcom_push function in a task and > also get the same JSON through xcom_pull in the next task? Or is there any > other ways to do this? > Thanks! -- This message was sent by Atlassian JIRA (v6.4.14#64029)
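The push/pull round trip discussed in this thread can be sketched with a toy in-memory store. The helpers below are hypothetical stand-ins, not Airflow's actual XCom implementation; they only illustrate the semantics of keying by task and pulling via `task_ids`.

```python
import json

# Toy XCom store: values keyed by (task_id, key).
store = {}

def xcom_push(task_id, key, value):
    # JSON round-trips dicts/lists cleanly, so a JSON payload
    # survives the push/pull cycle intact.
    store[(task_id, key)] = json.dumps(value)

def xcom_pull(task_ids, key):
    # `task_ids` names the upstream task to read from, mirroring the
    # parameter mentioned in the comment above.
    return json.loads(store[(task_ids, key)])

# An "extract" task pushes a JSON payload; the next task pulls it back.
xcom_push("extract", "payload", {"rows": 3, "ok": True})
result = xcom_pull(task_ids="extract", key="payload")
assert result == {"rows": 3, "ok": True}
```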
[jira] [Commented] (AIRFLOW-1709) workers on different machines
[ https://issues.apache.org/jira/browse/AIRFLOW-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16202590#comment-16202590 ] Joy Gao commented on AIRFLOW-1709: -- You'd want to use [CeleryExecutor|https://airflow.incubator.apache.org/configuration.html#scaling-out-with-celery] A really good tutorial for reference: https://stlong0521.github.io/20161023%20-%20Airflow.html > workers on different machines > -- > > Key: AIRFLOW-1709 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1709 > Project: Apache Airflow > Issue Type: Wish >Reporter: Igor Cherepanov > > Hello, > is there an example how I can distribute workers on different machines? > Thanks! -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (AIRFLOW-1702) access to the count of the happened retries in a python method
[ https://issues.apache.org/jira/browse/AIRFLOW-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200914#comment-16200914 ] Joy Gao edited comment on AIRFLOW-1702 at 10/11/17 8:31 PM: The TaskInstance has an attribute `try_number`, you can access it via the python operator. i.e.
{code:java}
def foo(**context):
    ti = context['ti']
    retry = ti.try_number - 1
    # do something with retry count

op = PythonOperator(
    task_id='task',
    provide_context=True,
    python_callable=foo,
    dag=dag)
{code}
Hope this helps! was (Author: joy.gao54): The TaskInstance has an attribute `try_number`, you can access it via the the python operator. i.e. def foo(**context): ti = context[ti] retry = ti.try_number - 1 # do something with retry count op = PythonOperator( task_id='task', provide_context=True, python_callable=foo, dag=dag) Hope this helps! > access to the count of the happened retries in a python method > -- > > Key: AIRFLOW-1702 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1702 > Project: Apache Airflow > Issue Type: Wish >Reporter: Igor Cherepanov > > hello, > is it possible to access to the count of the happened retries in a python > method > thanks! -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (AIRFLOW-1702) access to the count of the happened retries in a python method
[ https://issues.apache.org/jira/browse/AIRFLOW-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200914#comment-16200914 ] Joy Gao commented on AIRFLOW-1702: -- The TaskInstance has an attribute `try_number`, you can access it via the python operator. i.e.
def foo(**context):
    ti = context['ti']
    retry = ti.try_number - 1
    # do something with retry count

op = PythonOperator(
    task_id='task',
    provide_context=True,
    python_callable=foo,
    dag=dag)
Hope this helps! > access to the count of the happened retries in a python method > -- > > Key: AIRFLOW-1702 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1702 > Project: Apache Airflow > Issue Type: Wish >Reporter: Igor Cherepanov > > hello, > is it possible to access to the count of the happened retries in a python > method > thanks! -- This message was sent by Atlassian JIRA (v6.4.14#64029)
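The snippet in the comment above needs a live Airflow context to run. As a self-contained illustration of the same pattern, here is a sketch with a hypothetical `FakeTI` stub standing in for the real TaskInstance:

```python
# `FakeTI` is a stand-in for Airflow's TaskInstance, whose
# `try_number` is 1 on the first attempt and increments per retry.

class FakeTI:
    try_number = 3  # i.e. currently on the third attempt

def foo(**context):
    ti = context["ti"]          # note: the key is the string "ti"
    retry = ti.try_number - 1   # retries that have already happened
    return retry

assert foo(ti=FakeTI()) == 2
```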
[jira] [Updated] (AIRFLOW-1613) Make MySqlToGoogleCloudStorageOperator compaitible with python3
[ https://issues.apache.org/jira/browse/AIRFLOW-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao updated AIRFLOW-1613: - Description: 1. In Python 3, map(...) returns an iterator, which can only be iterated over once. Therefore the current implementation will return an empty list after the first iteration of schema: {code} schema = map(lambda schema_tuple: schema_tuple[0], cursor.description) file_no = 0 tmp_file_handle = NamedTemporaryFile(delete=True) tmp_file_handles = {self.filename.format(file_no): tmp_file_handle} for row in cursor: # Convert datetime objects to utc seconds, and decimals to floats row = map(self.convert_types, row) row_dict = dict(zip(schema, row)) {code} 2. File opened as binary, but string are written to it. Get error `a bytes-like object is required, not 'str'`. Use mode='w' instead. 3. Operator currently does not support binary columns in mysql. We should support uploading binary columns from mysql to cloud storage as it's a pretty common use-case. was: 1. In Python 3, map(...) returns an iterator, which can only be iterated over once. Therefore the current implementation will return an empty list after the first iteration of schema: {code} schema = map(lambda schema_tuple: schema_tuple[0], cursor.description) file_no = 0 tmp_file_handle = NamedTemporaryFile(delete=True) tmp_file_handles = {self.filename.format(file_no): tmp_file_handle} for row in cursor: # Convert datetime objects to utc seconds, and decimals to floats row = map(self.convert_types, row) row_dict = dict(zip(schema, row)) {code} 2. File opened as binary, but string are written to it. Get error `a bytes-like object is required, not 'str'`. Use mode='w' instead. 3. Update: Operator currently does not support binary columns in mysql. We should support uploading binary columns from mysql to cloud storage as it's a pretty common use-case. 
> Make MySqlToGoogleCloudStorageOperator compaitible with python3 > --- > > Key: AIRFLOW-1613 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1613 > Project: Apache Airflow > Issue Type: Bug > Components: contrib >Reporter: Joy Gao >Assignee: Joy Gao > Fix For: 1.9.0 > > > 1. > In Python 3, map(...) returns an iterator, which can only be iterated over > once. > Therefore the current implementation will return an empty list after the > first iteration of schema: > {code} > schema = map(lambda schema_tuple: schema_tuple[0], cursor.description) > file_no = 0 > tmp_file_handle = NamedTemporaryFile(delete=True) > tmp_file_handles = {self.filename.format(file_no): tmp_file_handle} > for row in cursor: > # Convert datetime objects to utc seconds, and decimals to floats > row = map(self.convert_types, row) > row_dict = dict(zip(schema, row)) > {code} > 2. > File opened as binary, but string are written to it. Get error `a bytes-like > object is required, not 'str'`. Use mode='w' instead. > 3. > Operator currently does not support binary columns in mysql. We should > support uploading binary columns from mysql to cloud storage as it's a pretty > common use-case. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
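Point 1 above can be demonstrated in isolation. The cursor data below is a hypothetical stand-in for `cursor.description` and the fetched rows:

```python
# Python 3 pitfall: map() returns a one-shot iterator, so reusing
# `schema` across rows yields empty dicts after the first row.

description = [("id",), ("name",)]   # stand-in for cursor.description
rows = [(1, "a"), (2, "b")]          # stand-in for cursor rows

schema = map(lambda t: t[0], description)      # iterator: consumed once
broken = [dict(zip(schema, row)) for row in rows]
assert broken == [{"id": 1, "name": "a"}, {}]  # second row lost its keys

schema = list(map(lambda t: t[0], description))  # fix: materialize once
fixed = [dict(zip(schema, row)) for row in rows]
assert fixed == [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
```

Wrapping the `map(...)` in `list(...)` is the minimal change that makes the schema reusable across all rows.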
[jira] [Reopened] (AIRFLOW-1613) Make MySqlToGoogleCloudStorageOperator compaitible with python3
[ https://issues.apache.org/jira/browse/AIRFLOW-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao reopened AIRFLOW-1613: -- > Make MySqlToGoogleCloudStorageOperator compaitible with python3 > --- > > Key: AIRFLOW-1613 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1613 > Project: Apache Airflow > Issue Type: Bug > Components: contrib >Reporter: Joy Gao >Assignee: Joy Gao > Fix For: 1.9.0 > > > 1. > In Python 3, map(...) returns an iterator, which can only be iterated over > once. > Therefore the current implementation will return an empty list after the > first iteration of schema: > {code} > schema = map(lambda schema_tuple: schema_tuple[0], cursor.description) > file_no = 0 > tmp_file_handle = NamedTemporaryFile(delete=True) > tmp_file_handles = {self.filename.format(file_no): tmp_file_handle} > for row in cursor: > # Convert datetime objects to utc seconds, and decimals to floats > row = map(self.convert_types, row) > row_dict = dict(zip(schema, row)) > {code} > 2. > File opened as binary, but string are written to it. Get error `a bytes-like > object is required, not 'str'`. Use mode='w' instead. > 3. Update: > Operator currently does not support binary columns in mysql. We should > support uploading binary columns from mysql to cloud storage as it's a pretty > common use-case. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-1613) Make MySqlToGoogleCloudStorageOperator compaitible with python3
[ https://issues.apache.org/jira/browse/AIRFLOW-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao updated AIRFLOW-1613: - Description: 1. In Python 3, map(...) returns an iterator, which can only be iterated over once. Therefore the current implementation will return an empty list after the first iteration of schema: {code} schema = map(lambda schema_tuple: schema_tuple[0], cursor.description) file_no = 0 tmp_file_handle = NamedTemporaryFile(delete=True) tmp_file_handles = {self.filename.format(file_no): tmp_file_handle} for row in cursor: # Convert datetime objects to utc seconds, and decimals to floats row = map(self.convert_types, row) row_dict = dict(zip(schema, row)) {code} 2. File opened as binary, but string are written to it. Get error `a bytes-like object is required, not 'str'`. Use mode='w' instead. 3. Update: Operator currently does not support binary columns in mysql. We should support uploading binary columns from mysql to cloud storage as it's a pretty common use-case. was: 1. In Python 3, map(...) returns an iterator, which can only be iterated over once. Therefore the current implementation will return an empty list after the first iteration of schema: {code} schema = map(lambda schema_tuple: schema_tuple[0], cursor.description) file_no = 0 tmp_file_handle = NamedTemporaryFile(delete=True) tmp_file_handles = {self.filename.format(file_no): tmp_file_handle} for row in cursor: # Convert datetime objects to utc seconds, and decimals to floats row = map(self.convert_types, row) row_dict = dict(zip(schema, row)) {code} 2. File opened as binary, but string are written to it. Get error `a bytes-like object is required, not 'str'`. Use mode='w' instead. 3. 
Update: Currently All > Make MySqlToGoogleCloudStorageOperator compaitible with python3 > --- > > Key: AIRFLOW-1613 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1613 > Project: Apache Airflow > Issue Type: Bug > Components: contrib >Reporter: Joy Gao >Assignee: Joy Gao > Fix For: 1.9.0 > > > 1. > In Python 3, map(...) returns an iterator, which can only be iterated over > once. > Therefore the current implementation will return an empty list after the > first iteration of schema: > {code} > schema = map(lambda schema_tuple: schema_tuple[0], cursor.description) > file_no = 0 > tmp_file_handle = NamedTemporaryFile(delete=True) > tmp_file_handles = {self.filename.format(file_no): tmp_file_handle} > for row in cursor: > # Convert datetime objects to utc seconds, and decimals to floats > row = map(self.convert_types, row) > row_dict = dict(zip(schema, row)) > {code} > 2. > File opened as binary, but string are written to it. Get error `a bytes-like > object is required, not 'str'`. Use mode='w' instead. > 3. Update: > Operator currently does not support binary columns in mysql. We should > support uploading binary columns from mysql to cloud storage as it's a pretty > common use-case. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-1613) Make MySqlToGoogleCloudStorageOperator compatible with python3
[ https://issues.apache.org/jira/browse/AIRFLOW-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao updated AIRFLOW-1613: - Description: 1. In Python 3, map(...) returns an iterator, which can only be iterated over once. Therefore the current implementation will return an empty list after the first iteration of schema: {code} schema = map(lambda schema_tuple: schema_tuple[0], cursor.description) file_no = 0 tmp_file_handle = NamedTemporaryFile(delete=True) tmp_file_handles = {self.filename.format(file_no): tmp_file_handle} for row in cursor: # Convert datetime objects to utc seconds, and decimals to floats row = map(self.convert_types, row) row_dict = dict(zip(schema, row)) {code} 2. File opened as binary, but strings are written to it, raising the error `a bytes-like object is required, not 'str'`. Use mode='w' instead. 3. Update: Currently All was: 1. In Python 3, map(...) returns an iterator, which can only be iterated over once. Therefore the current implementation will return an empty list after the first iteration of schema: {code} schema = map(lambda schema_tuple: schema_tuple[0], cursor.description) file_no = 0 tmp_file_handle = NamedTemporaryFile(delete=True) tmp_file_handles = {self.filename.format(file_no): tmp_file_handle} for row in cursor: # Convert datetime objects to utc seconds, and decimals to floats row = map(self.convert_types, row) row_dict = dict(zip(schema, row)) {code} 2. File opened as binary, but strings are written to it, raising the error `a bytes-like object is required, not 'str'`. Use mode='w' instead. > Make MySqlToGoogleCloudStorageOperator compatible with python3 > --- > > Key: AIRFLOW-1613 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1613 > Project: Apache Airflow > Issue Type: Bug > Components: contrib >Reporter: Joy Gao >Assignee: Joy Gao > Fix For: 1.9.0 > > > 1. > In Python 3, map(...) returns an iterator, which can only be iterated over > once. 
> Therefore the current implementation will return an empty list after the > first iteration of schema: > {code} > schema = map(lambda schema_tuple: schema_tuple[0], cursor.description) > file_no = 0 > tmp_file_handle = NamedTemporaryFile(delete=True) > tmp_file_handles = {self.filename.format(file_no): tmp_file_handle} > for row in cursor: > # Convert datetime objects to utc seconds, and decimals to floats > row = map(self.convert_types, row) > row_dict = dict(zip(schema, row)) > {code} > 2. > File opened as binary, but strings are written to it, raising the error `a bytes-like > object is required, not 'str'`. Use mode='w' instead. > 3. Update: > Currently All -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (AIRFLOW-1671) Missing @apply_defaults annotation for gcs operator
Joy Gao created AIRFLOW-1671: Summary: Missing @apply_defaults annotation for gcs operator Key: AIRFLOW-1671 URL: https://issues.apache.org/jira/browse/AIRFLOW-1671 Project: Apache Airflow Issue Type: Bug Components: operators Affects Versions: 1.9.0 Reporter: Joy Gao Assignee: Joy Gao Fix For: 1.9.0 The @apply_defaults annotation appears to have been accidentally removed in a previous PR. It should be added back. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (AIRFLOW-1664) Make MySqlToGoogleCloudStorageOperator support binary data again
Joy Gao created AIRFLOW-1664: Summary: Make MySqlToGoogleCloudStorageOperator support binary data again Key: AIRFLOW-1664 URL: https://issues.apache.org/jira/browse/AIRFLOW-1664 Project: Apache Airflow Issue Type: Bug Reporter: Joy Gao Assignee: Joy Gao The default NamedTemporaryFile mode is `w+b`; this was modified to `w` in https://github.com/apache/incubator-airflow/pull/2609. This caused a regression for python 2.x airflow environments, which can no longer support binary types in mysql. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
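The regression above comes down to `NamedTemporaryFile`'s mode. A minimal standard-library sketch (not the operator itself) showing why the two modes conflict: binary mode (`w+b`, the default) rejects str, while text mode (`w`) rejects bytes, so neither mode alone serves both payload types:

```python
from tempfile import NamedTemporaryFile

# Default mode is 'w+b' (binary): bytes are accepted, str is rejected
# with "a bytes-like object is required, not 'str'".
with NamedTemporaryFile(delete=True) as f:
    f.write(b"\x00\x01")  # binary payload works
    try:
        f.write("text")
        binary_error = None
    except TypeError as exc:
        binary_error = str(exc)

# mode='w' (text) is the mirror image: str is accepted, bytes rejected.
with NamedTemporaryFile(mode="w", delete=True) as f:
    f.write("text")  # text payload works
    try:
        f.write(b"\x00\x01")
        text_error = None
    except TypeError as exc:
        text_error = str(exc)
```

This is why the PR's switch to `mode='w'` fixed the Python 3 str-writing error while silently breaking binary column support.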
[jira] [Created] (AIRFLOW-1659) Fix invalid attribute bug in FileTaskHandler
Joy Gao created AIRFLOW-1659: Summary: Fix invalid attribute bug in FileTaskHandler Key: AIRFLOW-1659 URL: https://issues.apache.org/jira/browse/AIRFLOW-1659 Project: Apache Airflow Issue Type: Bug Components: logging Reporter: Joy Gao Assignee: Joy Gao Fix For: 1.9.0 The following line of code is failing in FileTaskHandler {code} response = requests.get(url, timeout=self.timeout) {code} self.timeout is not a valid attribute; the local variable `timeout` should be used instead. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
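A hedged sketch of the bug pattern above: the class name, method, and `fetcher` argument are hypothetical (a lambda stands in for `requests.get` so the sketch stays self-contained). The point is that `timeout` only exists as a local variable inside the method, so `self.timeout` would raise `AttributeError` while the local name works:

```python
# Hypothetical reconstruction of the pattern, not Airflow's actual code.
class FileTaskHandlerSketch:
    def _read(self, url, fetcher):
        timeout = 5  # computed locally; there is no self.timeout attribute
        # Buggy form: fetcher(url, timeout=self.timeout) -> AttributeError
        # Fixed form: use the local variable that was just computed.
        return fetcher(url, timeout=timeout)

result = FileTaskHandlerSketch()._read(
    "http://worker/log", lambda url, timeout: (url, timeout)
)
```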
[jira] [Commented] (AIRFLOW-1642) An Alembic script not using scoped session causing deadlock
[ https://issues.apache.org/jira/browse/AIRFLOW-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179769#comment-16179769 ] Joy Gao commented on AIRFLOW-1642: -- Ah, just checked it's not in the 1.8.2 release. Looks like this can be fixed yay! > An Alembic script not using scoped session causing deadlock > --- > > Key: AIRFLOW-1642 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1642 > Project: Apache Airflow > Issue Type: Bug >Reporter: Joy Gao >Priority: Minor > > The bug I'm about to describe is more of an obscure edge case; however, I > think it's still worth fixing. > After upgrading to airflow 1.9, while running `airflow resetdb` on my local > machine (with mysql), I encountered a deadlock on the final alembic revision > _d2ae31099d61 Increase text size for MySQL (not relevant for other DBs' text > types)_. > The deadlock turned out to be caused by another earlier session that was > created and left open in revision _cc1e65623dc7 add max tries column to task > instance_. Notably the code below: > {code} > sessionmaker = sa.orm.sessionmaker() > session = sessionmaker(bind=connection) > dagbag = DagBag(settings.DAGS_FOLDER) > {code} > The session created here was not a `scoped_session`, so when the DAGs were > being parsed in line 3 above, one of the DAG files makes a direct call to the > class method `Variable.get()` to acquire an env variable, which makes a db > query to the `variable` table, but raised a KeyError as the env variable was > non-existent, thus holding the lock to the `variable` table as a result of > that exception. > Later on, the latter alembic script `_cc1e65623dc7` needs to alter the > `Variable` table. Instead of creating its own Session object, it attempts to > reuse the same one as above. And because of the exception, it waits > indefinitely to acquire the lock on that table. > So the DAG file itself could have avoided the KeyError by providing a default > value when calling Variable.get(). 
However, I think it would be a good idea to > avoid using unscoped sessions in general, as an exception could potentially > occur in the future elsewhere. The easiest fix is replacing *session = > sessionmaker(bind=connection)* with *session = settings.Session()*, which is > scoped. However, making a change to a migration script is going to make folks > anxious. > If anyone has any thoughts on this, let me know! Thanks :) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (AIRFLOW-1642) An Alembic script not using scoped session causing deadlock
Joy Gao created AIRFLOW-1642: Summary: An Alembic script not using scoped session causing deadlock Key: AIRFLOW-1642 URL: https://issues.apache.org/jira/browse/AIRFLOW-1642 Project: Apache Airflow Issue Type: Bug Reporter: Joy Gao Priority: Minor The bug I'm about to describe is more of an obscure edge case; however, I think it's still worth fixing. After upgrading to airflow 1.9, while running `airflow resetdb` on my local machine (with mysql), I encountered a deadlock on the final alembic revision _d2ae31099d61 Increase text size for MySQL (not relevant for other DBs' text types)_. The deadlock turned out to be caused by another earlier session that was created and left open in revision _cc1e65623dc7 add max tries column to task instance_. Notably the code below: {code} sessionmaker = sa.orm.sessionmaker() session = sessionmaker(bind=connection) dagbag = DagBag(settings.DAGS_FOLDER) {code} The session created here was not a `scoped_session`, so when the DAGs were being parsed in line 3 above, one of the DAG files made a direct call to the class method `Variable.get()` to acquire an env variable, which makes a db query to the `variable` table, but raised a KeyError as the env variable was non-existent, thus holding the lock to the `variable` table as a result of that exception. Later on, the latter alembic script `_cc1e65623dc7` needs to alter the `Variable` table. Instead of creating its own Session object, it attempts to reuse the same one as above. And because of the exception, it waits indefinitely to acquire the lock on that table. So the DAG file itself could have avoided the KeyError by providing a default value when calling Variable.get(). However, I think it would be a good idea to avoid using unscoped sessions in general, as an exception could potentially occur in the future elsewhere. The easiest fix is replacing *session = sessionmaker(bind=connection)* with *session = settings.Session()*, which is scoped. 
However, making a change to a migration script is going to make folks anxious. If anyone has any thoughts on this, let me know! Thanks :) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
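The scoped vs. unscoped distinction proposed above can be demonstrated with SQLAlchemy directly (in-memory SQLite here; `settings.Session()` in Airflow is assumed to wrap a `scoped_session` as the issue describes):

```python
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine("sqlite://")  # throwaway in-memory database

# Unscoped: each factory call hands out an INDEPENDENT session, so a
# session abandoned mid-transaction (e.g. after an exception) can keep
# holding locks that later callers cannot see or clean up.
plain_factory = sessionmaker(bind=engine)
unscoped_shared = plain_factory() is plain_factory()  # two distinct sessions

# Scoped: every caller in the same thread gets the SAME registry-managed
# session, so later code reuses (and can dispose of) the one session.
Session = scoped_session(sessionmaker(bind=engine))
scoped_shared = Session() is Session()  # one shared session
Session.remove()  # release the thread-local session
```

This is why `settings.Session()` is the safer drop-in: the migration script and the DAG-parsing code would share one session instead of one of them stranding a lock.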
[jira] [Updated] (AIRFLOW-1613) Make MySqlToGoogleCloudStorageOperator compatible with python3
[ https://issues.apache.org/jira/browse/AIRFLOW-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao updated AIRFLOW-1613: - Description: 1. In Python 3, map(...) returns an iterator, which can only be iterated over once. Therefore the current implementation will return an empty list after the first iteration of schema: {code} schema = map(lambda schema_tuple: schema_tuple[0], cursor.description) file_no = 0 tmp_file_handle = NamedTemporaryFile(delete=True) tmp_file_handles = {self.filename.format(file_no): tmp_file_handle} for row in cursor: # Convert datetime objects to utc seconds, and decimals to floats row = map(self.convert_types, row) row_dict = dict(zip(schema, row)) {code} 2. File opened as binary, but strings are written to it, raising the error `a bytes-like object is required, not 'str'`. Use mode='w' instead. was: In Python 3, map(...) returns an iterator, which can only be iterated over once. Therefore the current implementation will return an empty list after the first iteration of schema: {code} schema = map(lambda schema_tuple: schema_tuple[0], cursor.description) file_no = 0 tmp_file_handle = NamedTemporaryFile(delete=True) tmp_file_handles = {self.filename.format(file_no): tmp_file_handle} for row in cursor: # Convert datetime objects to utc seconds, and decimals to floats row = map(self.convert_types, row) row_dict = dict(zip(schema, row)) {code} Moving it inside the loop for re-use. > Make MySqlToGoogleCloudStorageOperator compatible with python3 > --- > > Key: AIRFLOW-1613 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1613 > Project: Apache Airflow > Issue Type: Bug > Components: contrib >Reporter: Joy Gao > > 1. > In Python 3, map(...) returns an iterator, which can only be iterated over > once. 
> Therefore the current implementation will return an empty list after the > first iteration of schema: > {code} > schema = map(lambda schema_tuple: schema_tuple[0], cursor.description) > file_no = 0 > tmp_file_handle = NamedTemporaryFile(delete=True) > tmp_file_handles = {self.filename.format(file_no): tmp_file_handle} > for row in cursor: > # Convert datetime objects to utc seconds, and decimals to floats > row = map(self.convert_types, row) > row_dict = dict(zip(schema, row)) > {code} > 2. > File opened as binary, but strings are written to it, raising the error `a bytes-like > object is required, not 'str'`. Use mode='w' instead. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-1613) Make MySqlToGoogleCloudStorageOperator compatible with python3
[ https://issues.apache.org/jira/browse/AIRFLOW-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao updated AIRFLOW-1613: - Description: In Python 3, map(...) returns an iterator, which can only be iterated over once. Therefore the current implementation will return an empty list after the first iteration of schema: {code:python} schema = map(lambda schema_tuple: schema_tuple[0], cursor.description) file_no = 0 tmp_file_handle = NamedTemporaryFile(delete=True) tmp_file_handles = {self.filename.format(file_no): tmp_file_handle} for row in cursor: # Convert datetime objects to utc seconds, and decimals to floats row = map(self.convert_types, row) row_dict = dict(zip(schema, row)) {code} Moving it inside the loop for re-use. was: In Python 3, map(...) returns an iterator, which can only be iterated over once. Therefore the current implementation will return an empty list after the first iteration of schema: {code:python} schema = map(lambda schema_tuple: schema_tuple[0], cursor.description) file_no = 0 tmp_file_handle = NamedTemporaryFile(delete=True) tmp_file_handles = {self.filename.format(file_no): tmp_file_handle} for row in cursor: # Convert datetime objects to utc seconds, and decimals to floats row = map(self.convert_types, row) row_dict = dict(zip(schema, row)) {code} Moving it inside the loop for re-use. > Make MySqlToGoogleCloudStorageOperator compatible with python3 > --- > > Key: AIRFLOW-1613 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1613 > Project: Apache Airflow > Issue Type: Bug > Components: contrib >Reporter: Joy Gao > > In Python 3, map(...) returns an iterator, which can only be iterated over > once. 
> Therefore the current implementation will return an empty list after the > first iteration of schema: > {code:python} > schema = map(lambda schema_tuple: schema_tuple[0], cursor.description) > file_no = 0 > tmp_file_handle = NamedTemporaryFile(delete=True) > tmp_file_handles = {self.filename.format(file_no): tmp_file_handle} > for row in cursor: > # Convert datetime objects to utc seconds, and decimals to floats > row = map(self.convert_types, row) > row_dict = dict(zip(schema, row)) > {code} > Moving it inside the loop for re-use. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-1613) Make MySqlToGoogleCloudStorageOperator compatible with python3
[ https://issues.apache.org/jira/browse/AIRFLOW-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao updated AIRFLOW-1613: - Description: In Python 3, map(...) returns an iterator, which can only be iterated over once. Therefore the current implementation will return an empty list after the first iteration of schema: {code} schema = map(lambda schema_tuple: schema_tuple[0], cursor.description) file_no = 0 tmp_file_handle = NamedTemporaryFile(delete=True) tmp_file_handles = {self.filename.format(file_no): tmp_file_handle} for row in cursor: # Convert datetime objects to utc seconds, and decimals to floats row = map(self.convert_types, row) row_dict = dict(zip(schema, row)) {code} Moving it inside the loop for re-use. was: In Python 3, map(...) returns an iterator, which can only be iterated over once. Therefore the current implementation will return an empty list after the first iteration of schema: {code:python} schema = map(lambda schema_tuple: schema_tuple[0], cursor.description) file_no = 0 tmp_file_handle = NamedTemporaryFile(delete=True) tmp_file_handles = {self.filename.format(file_no): tmp_file_handle} for row in cursor: # Convert datetime objects to utc seconds, and decimals to floats row = map(self.convert_types, row) row_dict = dict(zip(schema, row)) {code} Moving it inside the loop for re-use. > Make MySqlToGoogleCloudStorageOperator compatible with python3 > --- > > Key: AIRFLOW-1613 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1613 > Project: Apache Airflow > Issue Type: Bug > Components: contrib >Reporter: Joy Gao > > In Python 3, map(...) returns an iterator, which can only be iterated over > once. 
> Therefore the current implementation will return an empty list after the > first iteration of schema: > {code} > schema = map(lambda schema_tuple: schema_tuple[0], cursor.description) > file_no = 0 > tmp_file_handle = NamedTemporaryFile(delete=True) > tmp_file_handles = {self.filename.format(file_no): tmp_file_handle} > for row in cursor: > # Convert datetime objects to utc seconds, and decimals to floats > row = map(self.convert_types, row) > row_dict = dict(zip(schema, row)) > {code} > Moving it inside the loop for re-use. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (AIRFLOW-1613) Make MySqlToGoogleCloudStorageOperator compatible with python3
Joy Gao created AIRFLOW-1613: Summary: Make MySqlToGoogleCloudStorageOperator compatible with python3 Key: AIRFLOW-1613 URL: https://issues.apache.org/jira/browse/AIRFLOW-1613 Project: Apache Airflow Issue Type: Bug Components: contrib Reporter: Joy Gao In Python 3, map(...) returns an iterator, which can only be iterated over once. Therefore the current implementation will return an empty list after the first iteration of schema: {code:python} schema = map(lambda schema_tuple: schema_tuple[0], cursor.description) file_no = 0 tmp_file_handle = NamedTemporaryFile(delete=True) tmp_file_handles = {self.filename.format(file_no): tmp_file_handle} for row in cursor: # Convert datetime objects to utc seconds, and decimals to floats row = map(self.convert_types, row) row_dict = dict(zip(schema, row)) {code} Moving it inside the loop for re-use. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (AIRFLOW-1568) Add datastore import/export operator
Joy Gao created AIRFLOW-1568: Summary: Add datastore import/export operator Key: AIRFLOW-1568 URL: https://issues.apache.org/jira/browse/AIRFLOW-1568 Project: Apache Airflow Issue Type: New Feature Components: contrib Reporter: Joy Gao Assignee: Joy Gao Google recently introduced import/export APIs for Cloud Datastore https://cloud.google.com/datastore/docs/reference/rest/, which allow Datastore entities to be backed up programmatically. It would be useful to introduce operators to handle this. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (AIRFLOW-1474) Add dag_id regex for 'airflow clear' CLI command
Joy Gao created AIRFLOW-1474: Summary: Add dag_id regex for 'airflow clear' CLI command Key: AIRFLOW-1474 URL: https://issues.apache.org/jira/browse/AIRFLOW-1474 Project: Apache Airflow Issue Type: Improvement Components: cli Reporter: Joy Gao Assignee: Joy Gao Priority: Minor The 'airflow clear' CLI command is currently limited to clearing a single DAG per operation. It would be useful to add the capability to clear multiple DAGs per operation by matching dag_ids against a regex, similar to how task_ids can already be filtered. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
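A minimal sketch of the selection step the proposal implies (the dag ids and pattern here are made up; the real command would then issue one clear per matching DAG):

```python
import re

# Hypothetical illustration: filter known dag_ids down to those matching
# a user-supplied regex, the way `airflow clear <regex>` could.
dag_ids = ["etl_daily", "etl_hourly", "ml_train", "reporting"]
pattern = re.compile(r"^etl_")

matched = [dag_id for dag_id in dag_ids if pattern.search(dag_id)]
```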
[jira] [Updated] (AIRFLOW-1433) Convert Airflow to Use FAB Framework
[ https://issues.apache.org/jira/browse/AIRFLOW-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao updated AIRFLOW-1433: - Description: The authentication capabilities in the RBAC design proposal introduce a significant amount of work that is otherwise already built into existing frameworks. Per community discussion, Flask-AppBuilder (FAB) is the best fit for Airflow as a foundation for implementing RBAC. This will support integration with different authentication backends out-of-the-box, and generate permissions for views and ORM models that will simplify view-level and dag-level access control. This implies modifying the current flask views, and deprecating the current Flask-Admin in favor of FAB's CRUD views. was: The authentication capabilities in the RBAC design proposal introduce a significant amount of work that is otherwise already built into existing frameworks. Per community discussion, Flask-AppBuilder (FAB) is the best fit for Airflow as a foundation for implementing RBAC. This will support integration with different authentication backends out-of-the-box, and generate permissions for views and ORM models that will simplify view-level and dag-level access control. > Convert Airflow to Use FAB Framework > > > Key: AIRFLOW-1433 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1433 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Joy Gao >Assignee: Joy Gao > > The authentication capabilities in the RBAC design proposal introduce a > significant amount of work that is otherwise already built into existing > frameworks. > Per community discussion, Flask-AppBuilder (FAB) is the best fit for Airflow > as a foundation for implementing RBAC. This will support integration with > different authentication backends out-of-the-box, and generate permissions > for views and ORM models that will simplify view-level and dag-level access > control. 
> This implies modifying the current flask views, and deprecating the current > Flask-Admin in favor of FAB's CRUD views. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (AIRFLOW-1433) Convert Airflow to Use FAB Framework
Joy Gao created AIRFLOW-1433: Summary: Convert Airflow to Use FAB Framework Key: AIRFLOW-1433 URL: https://issues.apache.org/jira/browse/AIRFLOW-1433 Project: Apache Airflow Issue Type: Improvement Reporter: Joy Gao Assignee: Joy Gao The authentication capabilities in the RBAC design proposal introduce a significant amount of work that is otherwise already built into existing frameworks. Per community discussion, Flask-AppBuilder (FAB) is the best fit for Airflow as a foundation for implementing RBAC. This will support integration with different authentication backends out-of-the-box, and generate permissions for views and ORM models that will simplify view-level and dag-level access control. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (AIRFLOW-85) Create DAGs UI
[ https://issues.apache.org/jira/browse/AIRFLOW-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao reassigned AIRFLOW-85: -- Assignee: Joy Gao > Create DAGs UI > -- > > Key: AIRFLOW-85 > URL: https://issues.apache.org/jira/browse/AIRFLOW-85 > Project: Apache Airflow > Issue Type: Bug > Components: security, ui >Reporter: Chris Riccomini >Assignee: Joy Gao > > Airflow currently provides only an {{/admin}} UI interface for the webapp. > This UI provides three distinct roles: > * Admin > * Data profiler > * None > In addition, Airflow currently provides the ability to log in, either via a > secure proxy front-end, or via LDAP/Kerberos, within the webapp. > We run Airflow with LDAP authentication enabled. This helps us control access > to the UI. However, there is insufficient granularity within the UI. We would > like to be able to grant users the ability to: > # View their DAGs, but no one else's. > # Control their DAGs, but no one else's. > This is not possible right now. You can take away the ability to access the > connections and data profiling tabs, but users can still see all DAGs, as > well as control the state of the DB by clearing any DAG status, etc. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-430) Support list/add/delete connections in the CLI
[ https://issues.apache.org/jira/browse/AIRFLOW-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao updated AIRFLOW-430: External issue URL: https://github.com/apache/incubator-airflow/pull/1734 > Support list/add/delete connections in the CLI > -- > > Key: AIRFLOW-430 > URL: https://issues.apache.org/jira/browse/AIRFLOW-430 > Project: Apache Airflow > Issue Type: Improvement > Components: cli >Reporter: Joy Gao >Assignee: Joy Gao > > Right now the only way to manage connections is via the UI's connection page. > To allow connection management via scripts, it would be useful to support these > features via the CLI as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (AIRFLOW-430) Support list/add/delete connections in the CLI
Joy Gao created AIRFLOW-430: --- Summary: Support list/add/delete connections in the CLI Key: AIRFLOW-430 URL: https://issues.apache.org/jira/browse/AIRFLOW-430 Project: Apache Airflow Issue Type: Improvement Components: cli Reporter: Joy Gao Assignee: Joy Gao Right now the only way to manage connections is via the UI's connection page. To allow connection management via scripts, it would be useful to support these features via the CLI as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (AIRFLOW-297) Exponential Backoff Retry Delay
[ https://issues.apache.org/jira/browse/AIRFLOW-297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao updated AIRFLOW-297: Summary: Exponential Backoff Retry Delay (was: Exponential Backoff) > Exponential Backoff Retry Delay > --- > > Key: AIRFLOW-297 > URL: https://issues.apache.org/jira/browse/AIRFLOW-297 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Joy Gao >Assignee: Joy Gao >Priority: Minor > > The retry delay time is currently fixed. It would be a useful option to > support progressively longer waits between retries via exponential backoff. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
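A sketch of the proposed behavior, assuming a doubling schedule with a cap (the function name, base delay, and cap are illustrative, not Airflow's actual API): the wait before try N is the fixed base delay multiplied by 2^(N-1).

```python
from datetime import timedelta

# Illustrative exponential backoff: delay = base * 2**(try_number - 1),
# capped so late retries do not wait unboundedly long.
def backoff_delay(base: timedelta, try_number: int,
                  cap: timedelta = timedelta(minutes=60)) -> timedelta:
    delay = base * (2 ** (try_number - 1))
    return min(delay, cap)

# With a 30-second base, the first four retries wait 30s, 60s, 120s, 240s.
delays = [backoff_delay(timedelta(seconds=30), n) for n in (1, 2, 3, 4)]
```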