[jira] [Created] (AIRFLOW-800) Initialize default Google BigQuery Connection with valid conn_type

2017-01-24 Thread Wilson Lian (JIRA)
Wilson Lian created AIRFLOW-800:
---

 Summary: Initialize default Google BigQuery Connection with valid 
conn_type
 Key: AIRFLOW-800
 URL: https://issues.apache.org/jira/browse/AIRFLOW-800
 Project: Apache Airflow
  Issue Type: Bug
  Components: utils
Reporter: Wilson Lian
Assignee: Wilson Lian
Priority: Minor


{{airflow initdb}} creates a connection with conn_id='bigquery_default' and 
conn_type='bigquery'. However, 'bigquery' is not a valid conn_type according 
to models.Connection._types; BigQuery connections should use the 
google_cloud_platform conn_type.
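
A minimal sketch of the fix, assuming the merge_conn helper and Connection 
model as they exist in Airflow 1.x:

{code:python}
# Sketch only: seed the default BigQuery connection with a valid
# conn_type. merge_conn inserts the row only if conn_id is absent.
from airflow import models
from airflow.utils.db import merge_conn

merge_conn(
    models.Connection(
        conn_id='bigquery_default',
        conn_type='google_cloud_platform'))  # was conn_type='bigquery'
{code}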



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (AIRFLOW-300) Add Google Pubsub hook and operator

2017-01-05 Thread Wilson Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilson Lian reassigned AIRFLOW-300:
---

Assignee: Wilson Lian

> Add Google Pubsub hook and operator
> ---
>
> Key: AIRFLOW-300
> URL: https://issues.apache.org/jira/browse/AIRFLOW-300
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib, gcp, hooks, operators
>Affects Versions: Airflow 1.7.1.3
>Reporter: Chris Riccomini
>Assignee: Wilson Lian
>
> We've had several use cases where we'd like to publish messages from Airflow 
> to Google pubsub (usually when a DAG finishes, or data is pushed, or 
> something). It'd be nice if Airflow supported this. Should be able to build 
> off of existing base GCP hook.
> I don't think that we should support consuming as part of this ticket.
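
A rough sketch of a publish-only hook built off the existing base GCP hook; 
the class and method names here are assumptions, not the eventual contrib API:

{code:python}
# Hypothetical publish-only Pub/Sub hook built on GoogleCloudBaseHook.
from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
from googleapiclient.discovery import build


class PubSubHook(GoogleCloudBaseHook):
    """Publishes messages to a Google Cloud Pub/Sub topic."""

    def get_conn(self):
        # _authorize() comes from GoogleCloudBaseHook and returns an
        # authorized HTTP client.
        return build('pubsub', 'v1', http=self._authorize())

    def publish(self, project, topic, messages):
        # messages: list of dicts, each with base64-encoded 'data'.
        full_topic = 'projects/{}/topics/{}'.format(project, topic)
        self.get_conn().projects().topics().publish(
            topic=full_topic, body={'messages': messages}).execute()
{code}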



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-1401) Standardize GCP project, region, and zone argument names

2017-07-11 Thread Wilson Lian (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16082942#comment-16082942
 ] 

Wilson Lian commented on AIRFLOW-1401:
--

Thanks for writing this up, Peter. This looks good to me.

> Standardize GCP project, region, and zone argument names
> 
>
> Key: AIRFLOW-1401
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1401
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Affects Versions: 1.8.1
>Reporter: Peter Dolan
>Assignee: Peter Dolan
>
> At the moment, there isn't standard usage of operator arguments for Google 
> Cloud Platform across the contributions, primarily in the usage of the 
> parameter meaning the GCP project name/id. This makes it difficult to specify 
> default_arguments that work across all GCP-centric operators in a graph.
> Using the command `grep -r project airflow/contrib/*`, we can see these uses:
> project_id:
>  * gcp_dataproc_hook
>  * datastore_hook
>  * gcp_api_base_hook
>  * bigquery_hook
>  * dataproc_operator
>  * bigquery_sensor
> project:
>  * gcp_pubsub_hook (here 'project' may mean either project id or project 
> name, which glosses over the distinction the GCP REST API draws between 
> project id and project name)
>  * dataflow_operator (see note below)
>  * pubsub_operator
> project_name:
>  * gcp_cloudml_hook
>  * cloudml_operator
> Notably, the Dataflow Operator diverges from the pattern of using top-level 
> operator parameters by specifying an options dict, which can be populated by 
> the dataflow_default_options dict. This can contain 'project' and 'zone'.
> Within the GCP API, there are three fields used: project number, project id, 
> and project name. More details are here: 
> https://cloud.google.com/resource-manager/reference/rest/v1/projects. 
> Briefly, project number is an auto-assigned unique int64 that GCP uses to 
> identify the project. Project ID is a 6-30 character unique user-assigned id. 
> Project name is a user-assigned display name for the project, which need not 
> be unique and cannot be used to identify the project to the service. When 
> users think of their project id, name, or other identifier within the context 
> of API calls, they are almost certainly thinking of the project id.
> This improvement proposes to standardize the above operators (at least) on
>  * project_id (meaning '{project-id}' in this example request: GET 
> https://www.googleapis.com/compute/v1/projects/{project-id}/zones/{zone}/instances/)
>  * region
>  * zone
> This can be done by changing the names of parameters of operators and hooks 
> that were not included in the 1.8.1 release (cloud ml and pubsub), and by 
> adding parameters to operators and hooks that were included in 1.8.1 (and 
> internally copying the old parameter name to the new one, and deprecating the 
> old one).
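
A hedged sketch of the back-compat shim described in the last paragraph: the 
old keyword is copied to the new one and a deprecation warning is emitted. 
Class and parameter names are illustrative only:

{code:python}
# Illustrative shim for renaming project_name to project_id.
import warnings


class GcpOperatorSketch(object):
    def __init__(self, project_id=None, project_name=None):
        if project_name is not None:
            warnings.warn(
                "'project_name' is deprecated; use 'project_id'",
                DeprecationWarning)
            if project_id is None:
                project_id = project_name
        self.project_id = project_id
{code}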



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-2062) Support fine-grained Connection encryption

2018-06-29 Thread Wilson Lian (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilson Lian updated AIRFLOW-2062:
-
Description: This effort targets containerized tasks (e.g., those launched 
by KubernetesExecutor). Under that paradigm, each task could potentially 
operate under different credentials, and fine-grained Connection encryption 
will enable an administrator to restrict which connections can be accessed by 
which tasks.  (was: This entails adding columns to the Connection table to 
store a path to a GCP Cloud KMS cryptoKey to be used for decryption.

To avoid a chicken and egg problem, the cryptoKey must be accessible using 
application default credentials.

In the meantime, a workaround is to create a subclass of SubDagOperator in 
which the "business" task depends on a task that decrypts the key, places it 
into a temp file in shared storage, and sets up a new Airflow Connection 
referencing it; and afterwards another task deletes the temp file and Airflow 
Connection)
Summary: Support fine-grained Connection encryption  (was: Support 
just-in-time decryption of Connection credentials)

> Support fine-grained Connection encryption
> --
>
> Key: AIRFLOW-2062
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2062
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Reporter: Wilson Lian
>Priority: Minor
>
> This effort targets containerized tasks (e.g., those launched by 
> KubernetesExecutor). Under that paradigm, each task could potentially operate 
> under different credentials, and fine-grained Connection encryption will 
> enable an administrator to restrict which connections can be accessed by 
> which tasks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2062) Support just-in-time decryption of Connection credentials

2018-06-29 Thread Wilson Lian (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilson Lian updated AIRFLOW-2062:
-
Description: 
This entails adding columns to the Connection table to store a path to a GCP 
Cloud KMS cryptoKey to be used for decryption.

To avoid a chicken and egg problem, the cryptoKey must be accessible using 
application default credentials.

In the meantime, a workaround is to create a subclass of SubDagOperator in 
which the "business" task depends on a task that decrypts the key, places it 
into a temp file in shared storage, and sets up a new Airflow Connection 
referencing it; and afterwards another task deletes the temp file and Airflow 
Connection

  was:
This entails adding a connection extra field to store a path to a GCP Cloud KMS 
cryptoKey to be used for decryption.

To avoid a chicken and egg problem, the cryptoKey must be accessible using 
application default credentials.

In the meantime, a workaround is to create a subclass of SubDagOperator in 
which the "business" task depends on a task that decrypts the key, places it 
into a temp file in shared storage, and sets up a new Airflow Connection 
referencing it; and afterwards another task deletes the temp file and Airflow 
Connection


> Support just-in-time decryption of Connection credentials
> -
>
> Key: AIRFLOW-2062
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2062
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Reporter: Wilson Lian
>Priority: Minor
>
> This entails adding columns to the Connection table to store a path to a GCP 
> Cloud KMS cryptoKey to be used for decryption.
> To avoid a chicken and egg problem, the cryptoKey must be accessible using 
> application default credentials.
> In the meantime, a workaround is to create a subclass of SubDagOperator in 
> which the "business" task depends on a task that decrypts the key, places it 
> into a temp file in shared storage, and sets up a new Airflow Connection 
> referencing it; and afterwards another task deletes the temp file and Airflow 
> Connection



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2062) Support just-in-time decryption of Connection credentials

2018-06-27 Thread Wilson Lian (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilson Lian updated AIRFLOW-2062:
-
Summary: Support just-in-time decryption of Connection credentials  (was: 
Support just-in-time decryption of Connection credentials in 
GoogleCloudBaseHook)

> Support just-in-time decryption of Connection credentials
> -
>
> Key: AIRFLOW-2062
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2062
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Reporter: Wilson Lian
>Priority: Minor
>
> This entails adding a connection extra field to store a path to a GCP Cloud 
> KMS cryptoKey to be used for decryption.
> To avoid a chicken and egg problem, the cryptoKey must be accessible using 
> application default credentials.
> In the meantime, a workaround is to create a subclass of SubDagOperator in 
> which the "business" task depends on a task that decrypts the key, places it 
> into a temp file in shared storage, and sets up a new Airflow Connection 
> referencing it; and afterwards another task deletes the temp file and Airflow 
> Connection



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2062) Support just-in-time decryption of Connection credentials in GoogleCloudBaseHook

2018-02-02 Thread Wilson Lian (JIRA)
Wilson Lian created AIRFLOW-2062:


 Summary: Support just-in-time decryption of Connection credentials 
in GoogleCloudBaseHook
 Key: AIRFLOW-2062
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2062
 Project: Apache Airflow
  Issue Type: Improvement
  Components: contrib
Reporter: Wilson Lian


This entails adding a connection extra field to store a path to a GCP Cloud KMS 
cryptoKey to be used for decryption.

To avoid a chicken and egg problem, the cryptoKey must be accessible using 
application default credentials.

In the meantime, a workaround is to create a subclass of SubDagOperator in 
which the "business" task depends on a task that decrypts the key, places it 
into a temp file in shared storage, and sets up a new Airflow Connection 
referencing it; and afterwards another task deletes the temp file and Airflow 
Connection
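
A sketch of that workaround, with hypothetical decrypt_fn and cleanup_fn 
callables supplied by the user:

{code:python}
# Sketch: wrap the business task in a sub-DAG that decrypts
# credentials first and always cleans them up afterwards.
from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def decrypting_subdag(parent_dag_name, child_dag_name, default_args,
                      business_task, decrypt_fn, cleanup_fn):
    subdag = DAG('{}.{}'.format(parent_dag_name, child_dag_name),
                 default_args=default_args)
    decrypt = PythonOperator(task_id='decrypt_credentials',
                             python_callable=decrypt_fn, dag=subdag)
    cleanup = PythonOperator(task_id='cleanup_credentials',
                             python_callable=cleanup_fn,
                             trigger_rule='all_done',  # run even on failure
                             dag=subdag)
    business_task.dag = subdag
    decrypt >> business_task >> cleanup
    return subdag
{code}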



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3143) Support auto-zone in DataprocClusterCreateOperator

2018-10-02 Thread Wilson Lian (JIRA)
Wilson Lian created AIRFLOW-3143:


 Summary: Support auto-zone in DataprocClusterCreateOperator
 Key: AIRFLOW-3143
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3143
 Project: Apache Airflow
  Issue Type: Improvement
  Components: contrib, operators
Reporter: Wilson Lian


[Dataproc 
Auto-zone|https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/auto-zone]
 allows users to omit the zone when creating a cluster, and the service will 
pick a zone in the chosen region.

Providing an empty string or None for `zone` would mirror how users request 
auto-zone via direct API access, but as-is, DataprocClusterCreateOperator 
issues a malformed API request when such values are passed.
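
A sketch of the guard, assuming the operator builds the gceClusterConfig dict 
of the Dataproc v1 REST API; leaving zoneUri unset is what requests auto-zone:

{code:python}
# Only set zoneUri when a zone was actually given; an absent field
# lets the Dataproc service pick a zone within the region.
def build_gce_cluster_config(project_id, zone):
    config = {}
    if zone:  # treats both None and '' as "use auto-zone"
        config['zoneUri'] = (
            'https://www.googleapis.com/compute/v1/'
            'projects/{}/zones/{}'.format(project_id, zone))
    return config
{code}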



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3154) Failed attempt to send SLA miss email blocks scheduling for DAG with miss

2018-10-03 Thread Wilson Lian (JIRA)
Wilson Lian created AIRFLOW-3154:


 Summary: Failed attempt to send SLA miss email blocks scheduling 
for DAG with miss
 Key: AIRFLOW-3154
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3154
 Project: Apache Airflow
  Issue Type: Bug
  Components: scheduler
Affects Versions: 1.9.0
Reporter: Wilson Lian


I haven't tested non-SendGrid email backends, but when [email] email_backend = 
airflow.contrib.utils.sendgrid.send_email is set and a DAG's SLA miss email 
fails to send (e.g., due to an authorization error), the scheduler stops 
scheduling tasks for that DAG.

 

Other DAGs still run fine.
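
One possible hardening, sketched under the assumption that the failure is an 
unhandled exception escaping the SLA-notification path; names are illustrative:

{code:python}
# Sketch: contain a notification failure so it cannot abort SLA
# handling (and with it, scheduling) for the DAG.
import logging

log = logging.getLogger(__name__)


def notify_sla_miss(dag_id, subject, body, send_email_fn):
    try:
        send_email_fn(subject, body)
    except Exception:
        # Log and continue instead of letting the error propagate.
        log.exception('Could not send SLA miss email for DAG %s', dag_id)
{code}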



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3550) GKEClusterHook doesn't use gcp_conn_id

2018-12-20 Thread Wilson Lian (JIRA)
Wilson Lian created AIRFLOW-3550:


 Summary: GKEClusterHook doesn't use gcp_conn_id
 Key: AIRFLOW-3550
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3550
 Project: Apache Airflow
  Issue Type: Bug
  Components: contrib
Affects Versions: 1.10.1, 1.10.0
Reporter: Wilson Lian


The hook doesn't inherit from GoogleCloudBaseHook, so the gcp_conn_id argument 
is ignored and API calls are made using the default service account (if 
present).
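
A sketch of the direction a fix could take, assuming the base hook's 
credential helper in 1.10.x and the google-cloud-container client:

{code:python}
# Sketch: derive client credentials from the configured gcp_conn_id
# instead of falling back to application default credentials.
from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
from google.cloud import container_v1


class GKEClusterHook(GoogleCloudBaseHook):
    def __init__(self, gcp_conn_id='google_cloud_default',
                 delegate_to=None):
        super(GKEClusterHook, self).__init__(gcp_conn_id, delegate_to)

    def get_client(self):
        # _get_credentials() is assumed to be the base hook's helper.
        return container_v1.ClusterManagerClient(
            credentials=self._get_credentials())
{code}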



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3495) DataProcSparkSqlOperator and DataProcHiveOperator should raise error when query and query_uri are both provided

2018-12-10 Thread Wilson Lian (JIRA)
Wilson Lian created AIRFLOW-3495:


 Summary: DataProcSparkSqlOperator and DataProcHiveOperator should 
raise error when query and query_uri are both provided
 Key: AIRFLOW-3495
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3495
 Project: Apache Airflow
  Issue Type: Bug
  Components: contrib
Reporter: Wilson Lian


Exactly one of the query and query_uri params is used. It should be an error 
to provide both. Fixing this will make cases like 
[this|https://stackoverflow.com/questions/53424091/unable-to-query-using-file-in-data-proc-hive-operator]
 less confusing.
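
A minimal sketch of the proposed check, assuming it runs in each operator's 
constructor; parameter names match the existing operators:

{code:python}
# Reject ambiguous input up front instead of silently ignoring one
# of the two query sources.
def validate_query_params(query, query_uri):
    if query is not None and query_uri is not None:
        raise ValueError(
            'Only one of query and query_uri may be provided.')
{code}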



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3497) Allow for port configuration in KubernetesPodOperator

2018-12-10 Thread Wilson Lian (JIRA)
Wilson Lian created AIRFLOW-3497:


 Summary: Allow for port configuration in KubernetesPodOperator
 Key: AIRFLOW-3497
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3497
 Project: Apache Airflow
  Issue Type: Improvement
  Components: contrib
Reporter: Wilson Lian


Allowing for ports to be configured would enable cross-pod communication. 
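
What usage might look like; the ports argument is the proposed addition and 
does not exist yet, and V1ContainerPort comes from the kubernetes Python 
client:

{code:python}
# Hypothetical usage of the proposed `ports` parameter.
from airflow.contrib.operators.kubernetes_pod_operator import (
    KubernetesPodOperator)
from kubernetes.client import V1ContainerPort

serve = KubernetesPodOperator(
    task_id='serve',
    name='serve',
    namespace='default',
    image='nginx:stable',
    ports=[V1ContainerPort(name='http', container_port=80)],  # proposed
)
{code}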



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3496) Support multi-container pod in KubernetesPodOperator

2018-12-10 Thread Wilson Lian (JIRA)
Wilson Lian created AIRFLOW-3496:


 Summary: Support multi-container pod in KubernetesPodOperator
 Key: AIRFLOW-3496
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3496
 Project: Apache Airflow
  Issue Type: Improvement
  Components: contrib
Reporter: Wilson Lian


KubernetesPodOperator currently only allows one to run a single-container pod 
(aside from init containers). 
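
For contrast, a two-container pod spec expressed with the kubernetes Python 
client, which is what the operator cannot currently produce; images are 
illustrative:

{code:python}
# A main container plus a sidecar in one pod spec.
from kubernetes.client import V1Container, V1PodSpec

pod_spec = V1PodSpec(containers=[
    V1Container(name='main', image='myapp:latest'),
    V1Container(name='sidecar', image='envoyproxy/envoy:v1.8.0'),
])
{code}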



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3401) Properly encode templated fields in Cloud Pub/Sub example DAG

2018-11-26 Thread Wilson Lian (JIRA)
Wilson Lian created AIRFLOW-3401:


 Summary: Properly encode templated fields in Cloud Pub/Sub example 
DAG
 Key: AIRFLOW-3401
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3401
 Project: Apache Airflow
  Issue Type: Bug
  Components: contrib, examples
Reporter: Wilson Lian


Context: 
https://groups.google.com/d/msg/cloud-composer-discuss/McHHu582G7o/7N66GrwsBAAJ
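
The Pub/Sub publish API requires the message 'data' field to be base64 text; 
a sketch of encoding a rendered template value before publishing (the helper 
name is illustrative):

{code:python}
# Encode a rendered (str) templated field for the Pub/Sub 'data'
# field: UTF-8 bytes, then base64, then back to text for JSON.
from base64 import b64encode


def encode_message(rendered_text):
    return {'data':
            b64encode(rendered_text.encode('utf-8')).decode('ascii')}
{code}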



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)