[jira] [Created] (AIRFLOW-800) Initialize default Google BigQuery Connection with valid conn_type
Wilson Lian created AIRFLOW-800:
-----------------------------------

             Summary: Initialize default Google BigQuery Connection with valid conn_type
                 Key: AIRFLOW-800
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-800
             Project: Apache Airflow
          Issue Type: Bug
          Components: utils
            Reporter: Wilson Lian
            Assignee: Wilson Lian
            Priority: Minor

{{airflow initdb}} creates a connection with conn_id='bigquery_default' and conn_type='bigquery'. However, bigquery is not a valid conn_type according to models.Connection._types, and BigQuery connections should use the google_cloud_platform conn_type.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
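[Editor's note] A minimal sketch of the fix the ticket describes. The helper name and the subset of valid types shown here are illustrative (the real default lives in Airflow's initdb logic); the ticket only establishes that 'bigquery' is not in models.Connection._types while 'google_cloud_platform' is.

```python
# Subset of valid conn_type values, per models.Connection._types
# (illustrative; the full list is longer).
VALID_CONN_TYPES = {"google_cloud_platform", "postgres", "mysql", "http"}


def default_bigquery_connection():
    """Return the default BigQuery connection as a plain dict.

    Hypothetical standalone form of the initdb default: conn_type was
    'bigquery' (invalid) and should be 'google_cloud_platform'.
    """
    conn = {
        "conn_id": "bigquery_default",
        "conn_type": "google_cloud_platform",  # was: 'bigquery'
    }
    assert conn["conn_type"] in VALID_CONN_TYPES
    return conn
```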
[jira] [Assigned] (AIRFLOW-300) Add Google Pubsub hook and operator
[ https://issues.apache.org/jira/browse/AIRFLOW-300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wilson Lian reassigned AIRFLOW-300:
-----------------------------------

    Assignee: Wilson Lian

> Add Google Pubsub hook and operator
> -----------------------------------
>
>                 Key: AIRFLOW-300
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-300
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: contrib, gcp, hooks, operators
>    Affects Versions: Airflow 1.7.1.3
>            Reporter: Chris Riccomini
>            Assignee: Wilson Lian
>
> We've had several use cases where we'd like to publish messages from Airflow to Google Pub/Sub (usually when a DAG finishes, or data is pushed, or something). It'd be nice if Airflow supported this. Should be able to build off of the existing base GCP hook.
> I don't think that we should support consuming as part of this ticket.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (AIRFLOW-1401) Standardize GCP project, region, and zone argument names
[ https://issues.apache.org/jira/browse/AIRFLOW-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16082942#comment-16082942 ]

Wilson Lian commented on AIRFLOW-1401:
--------------------------------------

Thanks for writing this up, Peter. This looks good to me.

> Standardize GCP project, region, and zone argument names
> --------------------------------------------------------
>
>                 Key: AIRFLOW-1401
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1401
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: contrib
>    Affects Versions: 1.8.1
>            Reporter: Peter Dolan
>            Assignee: Peter Dolan
>
> At the moment, there isn't standard usage of operator arguments for Google Cloud Platform across the contributions, primarily in the usage of the parameter meaning the GCP project name/id. This makes it difficult to specify default_arguments that work across all GCP-centric operators in a graph.
> Using the command `grep -r project airflow/contrib/*`, we can see these uses:
> project_id:
> * gcp_dataproc_hook
> * datastore_hook
> * gcp_api_base_hook
> * bigquery_hook
> * dataproc_operator
> * bigquery_sensor
> project:
> * gcp_pubsub_hook (here 'project' means project id or project name, which does not fully capture the distinction within GCP between project id and project name as elements of the REST API)
> * dataflow_operator (see note below)
> * pubsub_operator
> project_name:
> * gcp_cloudml_hook
> * cloudml_operator
> Notably, the Dataflow Operator diverges from the pattern of using top-level operator parameters by specifying an options dict, which can be populated by the dataflow_default_options dict. This can contain 'project' and 'zone'.
> Within the GCP API, there are three fields used: project number, project id, and project name. More details are here: https://cloud.google.com/resource-manager/reference/rest/v1/projects. Briefly, project number is an auto-assigned unique int64 assigned by GCP to identify the project. Project ID is a 6-30 character unique user-assigned id. Project name is a user-assigned display name for the project, which need not be unique, and cannot be used to identify the project to the service. When users think of their project id, name, or other identifier within the context of API calls, they are almost certainly thinking of the project id.
> This improvement proposes to standardize the above operators (at least) on:
> * project_id (meaning the {project-id} placeholder in this example request: GET https://www.googleapis.com/compute/v1/projects/{project-id}/zones/{zone}/instances/)
> * region
> * zone
> This can be done by changing the names of parameters of operators and hooks that were not included in the 1.8.1 release (cloud ml and pubsub), and by adding parameters to operators and hooks that were included in 1.8.1 (and internally copying the old parameter name to the new one, and deprecating the old one).

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
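[Editor's note] The migration path in the last paragraph above (accept the old name, copy it to the new one, deprecate the old) can be sketched as a standalone helper. The function name and the exact warning text are hypothetical; the ticket specifies only the behavior.

```python
import warnings


def normalize_project_kwargs(**kwargs):
    """Copy deprecated project params onto the standardized project_id.

    Illustrative shim: 'project' and 'project_name' are accepted, copied
    to 'project_id' when it is absent, and flagged with a
    DeprecationWarning so callers can migrate.
    """
    for old in ("project", "project_name"):
        if old in kwargs:
            value = kwargs.pop(old)
            if "project_id" not in kwargs:
                kwargs["project_id"] = value
            warnings.warn(
                "Parameter '%s' is deprecated; use 'project_id'" % old,
                DeprecationWarning,
            )
    return kwargs
```

An operator's __init__ could pass its keyword arguments through this helper before reading project_id, so DAGs written against the old names keep working during the deprecation window.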
[jira] [Updated] (AIRFLOW-2062) Support fine-grained Connection encryption
[ https://issues.apache.org/jira/browse/AIRFLOW-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wilson Lian updated AIRFLOW-2062:
---------------------------------
    Description: 
This effort targets containerized tasks (e.g., those launched by KubernetesExecutor). Under that paradigm, each task could potentially operate under different credentials, and fine-grained Connection encryption will enable an administrator to restrict which connections can be accessed by which tasks.

  was:
This entails adding columns to the Connection table to store connection extra field to store a path to a GCP Cloud KMS cryptoKey to be used for decryption.
To avoid a chicken and egg problem, the cryptoKey must be accessible using application default credentials.
In the meantime, a workaround is to create a subclass of SubDagOperator in which the "business" task depends on a task that decrypts the key, places it into a temp file in shared storage, and sets up a new Airflow Connection referencing it; and afterwards another task deletes the temp file and Airflow Connection.

        Summary: Support fine-grained Connection encryption  (was: Support just-in-time decryption of Connection credentials)

> Support fine-grained Connection encryption
> ------------------------------------------
>
>                 Key: AIRFLOW-2062
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2062
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: contrib
>            Reporter: Wilson Lian
>            Priority: Minor
>
> This effort targets containerized tasks (e.g., those launched by KubernetesExecutor). Under that paradigm, each task could potentially operate under different credentials, and fine-grained Connection encryption will enable an administrator to restrict which connections can be accessed by which tasks.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (AIRFLOW-2062) Support just-in-time decryption of Connection credentials
[ https://issues.apache.org/jira/browse/AIRFLOW-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wilson Lian updated AIRFLOW-2062:
---------------------------------
    Description: 
This entails adding columns to the Connection table to store connection extra field to store a path to a GCP Cloud KMS cryptoKey to be used for decryption.
To avoid a chicken and egg problem, the cryptoKey must be accessible using application default credentials.
In the meantime, a workaround is to create a subclass of SubDagOperator in which the "business" task depends on a task that decrypts the key, places it into a temp file in shared storage, and sets up a new Airflow Connection referencing it; and afterwards another task deletes the temp file and Airflow Connection.

  was:
This entails adding a connection extra field to store a path to a GCP Cloud KMS cryptoKey to be used for decryption.
To avoid a chicken and egg problem, the cryptoKey must be accessible using application default credentials.
In the meantime, a workaround is to create a subclass of SubDagOperator in which the "business" task depends on a task that decrypts the key, places it into a temp file in shared storage, and sets up a new Airflow Connection referencing it; and afterwards another task deletes the temp file and Airflow Connection.

> Support just-in-time decryption of Connection credentials
> ---------------------------------------------------------
>
>                 Key: AIRFLOW-2062
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2062
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: contrib
>            Reporter: Wilson Lian
>            Priority: Minor
>
> This entails adding columns to the Connection table to store connection extra field to store a path to a GCP Cloud KMS cryptoKey to be used for decryption.
> To avoid a chicken and egg problem, the cryptoKey must be accessible using application default credentials.
> In the meantime, a workaround is to create a subclass of SubDagOperator in which the "business" task depends on a task that decrypts the key, places it into a temp file in shared storage, and sets up a new Airflow Connection referencing it; and afterwards another task deletes the temp file and Airflow Connection.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
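[Editor's note] A sketch of the first step the ticket describes: reading a KMS cryptoKey path out of a Connection's "extra" JSON. The field name "kms_crypto_key_path" is a hypothetical choice for illustration; the ticket proposes storing such a path but does not name the field.

```python
import json


def kms_key_path_from_extra(extra_json):
    """Extract a Cloud KMS cryptoKey resource path from a Connection's
    'extra' JSON blob, returning None when no path is stored.

    The key 'kms_crypto_key_path' is an assumed field name.
    """
    if not extra_json:
        return None
    extra = json.loads(extra_json)
    return extra.get("kms_crypto_key_path")
```

A decryption hook would then resolve this path (which must itself be accessible via application default credentials, per the ticket) before decrypting the stored credentials.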
[jira] [Updated] (AIRFLOW-2062) Support just-in-time decryption of Connection credentials
[ https://issues.apache.org/jira/browse/AIRFLOW-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wilson Lian updated AIRFLOW-2062:
---------------------------------
    Summary: Support just-in-time decryption of Connection credentials  (was: Support just-in-time decryption of Connection credentials in GoogleCloudBaseHook)

> Support just-in-time decryption of Connection credentials
> ---------------------------------------------------------
>
>                 Key: AIRFLOW-2062
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2062
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: contrib
>            Reporter: Wilson Lian
>            Priority: Minor
>
> This entails adding a connection extra field to store a path to a GCP Cloud KMS cryptoKey to be used for decryption.
> To avoid a chicken and egg problem, the cryptoKey must be accessible using application default credentials.
> In the meantime, a workaround is to create a subclass of SubDagOperator in which the "business" task depends on a task that decrypts the key, places it into a temp file in shared storage, and sets up a new Airflow Connection referencing it; and afterwards another task deletes the temp file and Airflow Connection.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (AIRFLOW-2062) Support just-in-time decryption of Connection credentials in GoogleCloudBaseHook
Wilson Lian created AIRFLOW-2062:
------------------------------------

             Summary: Support just-in-time decryption of Connection credentials in GoogleCloudBaseHook
                 Key: AIRFLOW-2062
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2062
             Project: Apache Airflow
          Issue Type: Improvement
          Components: contrib
            Reporter: Wilson Lian

This entails adding a connection extra field to store a path to a GCP Cloud KMS cryptoKey to be used for decryption.
To avoid a chicken and egg problem, the cryptoKey must be accessible using application default credentials.
In the meantime, a workaround is to create a subclass of SubDagOperator in which the "business" task depends on a task that decrypts the key, places it into a temp file in shared storage, and sets up a new Airflow Connection referencing it; and afterwards another task deletes the temp file and Airflow Connection.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (AIRFLOW-3143) Support auto-zone in DataprocClusterCreateOperator
Wilson Lian created AIRFLOW-3143:
------------------------------------

             Summary: Support auto-zone in DataprocClusterCreateOperator
                 Key: AIRFLOW-3143
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3143
             Project: Apache Airflow
          Issue Type: Improvement
          Components: contrib, operators
            Reporter: Wilson Lian

[Dataproc Auto-zone|https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/auto-zone] allows users to omit the zone when creating a cluster, and the service will pick a zone in the chosen region. Providing an empty string or None for `zone` would match up with how users would request auto-zone via direct API access, but as-is the DataprocClusterCreateOperator makes a bad API request when such values are passed.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
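[Editor's note] A sketch of the requested behavior: when zone is falsy, omit the zone field from the cluster config so the service auto-selects one, rather than emitting a malformed value. The helper name and exact config keys ('projectId', 'zoneUri') are assumptions modeled on the Dataproc GceClusterConfig shape, not the operator's actual code.

```python
def cluster_placement(project_id, zone=None):
    """Build the placement portion of a Dataproc cluster config.

    Sketch: a None or '' zone leaves zoneUri out entirely, which is how
    auto-zone is requested via direct API access.
    """
    config = {"projectId": project_id}
    if zone:  # only emit zoneUri for a real zone name
        config["zoneUri"] = "projects/{}/zones/{}".format(project_id, zone)
    return config
```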
[jira] [Created] (AIRFLOW-3154) Failed attempt to send SLA miss email blocks scheduling for DAG with miss
Wilson Lian created AIRFLOW-3154:
------------------------------------

             Summary: Failed attempt to send SLA miss email blocks scheduling for DAG with miss
                 Key: AIRFLOW-3154
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3154
             Project: Apache Airflow
          Issue Type: Bug
          Components: scheduler
    Affects Versions: 1.9.0
            Reporter: Wilson Lian

I haven't tested non-SendGrid email backends, but when the config sets [email] email_backend = airflow.contrib.utils.sendgrid.send_email and a DAG's SLA miss email fails to send (e.g., due to an authorization error), the scheduler stops scheduling tasks for that DAG. Other DAGs still run fine.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
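[Editor's note] The fix the report implies is to isolate the email send so a backend failure cannot propagate into the scheduling loop. This is a hypothetical standalone sketch, not the scheduler's actual code; send_email stands in for whatever backend is configured.

```python
import logging

log = logging.getLogger(__name__)


def notify_sla_miss(send_email, subject, body):
    """Attempt an SLA-miss notification without letting a send failure
    block scheduling.

    Returns True on success, False on failure; the caller keeps
    scheduling the DAG either way.
    """
    try:
        send_email(subject, body)
        return True
    except Exception:
        # Log and swallow: a broken email backend (e.g., an authorization
        # error) should not stop the DAG from being scheduled.
        log.exception("Could not send SLA miss email; scheduling continues")
        return False
```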
[jira] [Created] (AIRFLOW-3550) GKEClusterHook doesn't use gcp_conn_id
Wilson Lian created AIRFLOW-3550:
------------------------------------

             Summary: GKEClusterHook doesn't use gcp_conn_id
                 Key: AIRFLOW-3550
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3550
             Project: Apache Airflow
          Issue Type: Bug
          Components: contrib
    Affects Versions: 1.10.1, 1.10.0
            Reporter: Wilson Lian

The hook doesn't inherit from GoogleCloudBaseHook. API calls are made using the default service account (if present).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (AIRFLOW-3495) DataProcSparkSqlOperator and DataProcHiveOperator should raise error when query and query_uri are both provided
Wilson Lian created AIRFLOW-3495:
------------------------------------

             Summary: DataProcSparkSqlOperator and DataProcHiveOperator should raise error when query and query_uri are both provided
                 Key: AIRFLOW-3495
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3495
             Project: Apache Airflow
          Issue Type: Bug
          Components: contrib
            Reporter: Wilson Lian

Exactly one of the query and query_uri params will be used. It should be an error to provide more than one. Fixing this will make cases like [this|https://stackoverflow.com/questions/53424091/unable-to-query-using-file-in-data-proc-hive-operator] less confusing.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
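[Editor's note] A sketch of the validation the ticket asks for, as a standalone function rather than the operators' __init__. The at-least-one check is an editorial addition beyond the ticket (which only requires rejecting both); it follows from "exactly one ... will be used".

```python
def validate_query_source(query=None, query_uri=None):
    """Ensure exactly one of query/query_uri is provided and return it.

    Hypothetical standalone form of the check DataProcSparkSqlOperator
    and DataProcHiveOperator would perform at construction time.
    """
    if query is not None and query_uri is not None:
        raise ValueError("Only one of 'query' and 'query_uri' may be set")
    if query is None and query_uri is None:
        raise ValueError("One of 'query' and 'query_uri' is required")
    return query if query is not None else query_uri
```

Raising at construction time surfaces the conflict when the DAG file is parsed, instead of silently ignoring one of the two sources at run time.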
[jira] [Created] (AIRFLOW-3497) Allow for port configuration in KubernetesPodOperator
Wilson Lian created AIRFLOW-3497:
------------------------------------

             Summary: Allow for port configuration in KubernetesPodOperator
                 Key: AIRFLOW-3497
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3497
             Project: Apache Airflow
          Issue Type: Improvement
          Components: contrib
            Reporter: Wilson Lian

Allowing for ports to be configured would enable cross-pod communication.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
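[Editor's note] A sketch of the shape such a ports argument could take. The Port class and container_spec helper are hypothetical; the field names mirror the Kubernetes ContainerPort spec ('name', 'containerPort'), not any existing operator API.

```python
from dataclasses import dataclass


@dataclass
class Port:
    """Minimal port spec modeled on Kubernetes ContainerPort."""
    name: str
    container_port: int


def container_spec(name, image, ports=()):
    """Render a container dict including port entries, the shape a
    ports= argument on KubernetesPodOperator could produce."""
    return {
        "name": name,
        "image": image,
        "ports": [
            {"name": p.name, "containerPort": p.container_port}
            for p in ports
        ],
    }
```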
[jira] [Created] (AIRFLOW-3496) Support multi-container pod in KubernetesPodOperator
Wilson Lian created AIRFLOW-3496:
------------------------------------

             Summary: Support multi-container pod in KubernetesPodOperator
                 Key: AIRFLOW-3496
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3496
             Project: Apache Airflow
          Issue Type: Improvement
          Components: contrib
            Reporter: Wilson Lian

KubernetesPodOperator currently only allows one to run a single-container pod (aside from init containers).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
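[Editor's note] The multi-container shape the ticket asks for can be sketched as a plain pod-spec builder; today the operator constructs exactly one container. The helper name is hypothetical, and the dict layout mirrors the Kubernetes Pod spec ('spec.containers').

```python
def pod_spec(containers):
    """Assemble a pod spec from one or more container dicts.

    Sketch only: accepting a list here is the generalization the ticket
    requests over the operator's single-container assumption.
    """
    containers = list(containers)
    if not containers:
        raise ValueError("at least one container is required")
    return {"spec": {"containers": containers}}
```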
[jira] [Created] (AIRFLOW-3401) Properly encode templated fields in Cloud Pub/Sub example DAG
Wilson Lian created AIRFLOW-3401:
------------------------------------

             Summary: Properly encode templated fields in Cloud Pub/Sub example DAG
                 Key: AIRFLOW-3401
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3401
             Project: Apache Airflow
          Issue Type: Bug
          Components: contrib, examples
            Reporter: Wilson Lian

Context: https://groups.google.com/d/msg/cloud-composer-discuss/McHHu582G7o/7N66GrwsBAAJ

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)