[jira] [Assigned] (AIRFLOW-3871) Allow Jinja templating recursively on object attributes

2019-09-18 Thread Galak (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Galak reassigned AIRFLOW-3871:
--

Assignee: Galak  (was: Björn Pollex)

> Allow Jinja templating recursively on object attributes
> ---
>
> Key: AIRFLOW-3871
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3871
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: operators
>Affects Versions: 1.10.0
>Reporter: Galak
>Assignee: Galak
>Priority: Minor
> Fix For: 1.10.6
>
>
> Some {{Operator}} fields can be templated (using Jinja). Template rendering 
> only works for string values (either direct values or values stored in 
> collections).
> But a templated string inside a custom class instance won't be rendered.
> Here is my scenario: 
> I have a Python function {{transform_data_file}} that is designed to call a 
> command object. This command object constructor 
> ({{MyAwesomeDataFileTransformer}}) has parameters that could be templated. 
> These templated parameters are currently not rendered (see the 
> {{BaseOperator.render_template_from_field}} method). 
> {code}
> simple_task = PythonOperator(
>     task_id='simple_task',
>     provide_context=True,
>     python_callable=transform_data_file,
>     templates_dict={
>         'transformer': MyAwesomeDataFileTransformer(
>             "/data/{{ dag.dag_id }}/{{ ts }}/input_file",
>             "/data/{{ dag.dag_id }}/{{ ts }}/output_file",
>         )
>     },
>     dag=dag
> )
> {code}
> I have 3 alternatives in mind to allow rendering inner attributes:
> # Either define an Abstract Base Class declaring an abstract method 
> {{render_template}}; then my command object would have to extend this 
> Abstract Base Class, and then implement {{render_template}} method.
> # Or use duck typing in {{BaseOperator.render_template_from_field}} to call 
> {{render_template}} method when it exists on templated custom objects; then 
> my command object would just have to implement {{render_template}} method.
> # Or traverse object attributes when rendering templates and call 
> {{BaseOperator.render_template}} recursively; then my command object would 
> not need any change.
> My preferred solution is the 3rd one, but I would like to hear your 
> opinion on this first. Maybe there is a 4th, better solution?
> I would be glad to submit a PR if this functionality is accepted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-4833) Jinja templating removes newlines

2019-08-27 Thread Galak (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916714#comment-16916714
 ] 

Galak commented on AIRFLOW-4833:


The additional DAG attribute will actually be called 
{{jinja_environment_kwargs}}.

Usage looks something like this:
{code:python}
DAG(dag_id='my-dag',
    jinja_environment_kwargs={
        'keep_trailing_newline': True,
        # some other jinja2 Environment options here
    }){code}
see this [thread on 
slack|https://apache-airflow.slack.com/archives/CCPRP7943/p1566908457361800] 
for more information.
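A minimal sketch of the idea, assuming only the {{jinja2}} package: the hypothetical {{MiniDag}} class below (not Airflow's {{DAG}}) merges user-supplied {{jinja_environment_kwargs}} over a few defaults similar to those in {{DAG.get_template_env}}, so that options such as {{keep_trailing_newline}} reach the {{jinja2.Environment}}. It illustrates the approach, not the actual Airflow change.

```python
import jinja2


class MiniDag:
    """Hypothetical stand-in for a DAG; only shows how options could flow
    from a jinja_environment_kwargs argument into jinja2.Environment."""

    def __init__(self, dag_id, jinja_environment_kwargs=None):
        self.dag_id = dag_id
        self.jinja_environment_kwargs = jinja_environment_kwargs or {}

    def get_template_env(self):
        # Defaults modelled loosely on the Environment built in Airflow 1.10
        # (the FileSystemLoader and "undefined" handling are omitted here).
        options = {
            "extensions": ["jinja2.ext.do"],
            "cache_size": 0,
        }
        # User-supplied options override the defaults.
        options.update(self.jinja_environment_kwargs)
        return jinja2.Environment(**options)


dag = MiniDag("my-dag", jinja_environment_kwargs={"keep_trailing_newline": True})
env = dag.get_template_env()
print(repr(env.from_string("{{ dag_id }} 1\n").render(dag_id=dag.dag_id)))
# prints 'my-dag 1\n'
```

Letting user options win over the defaults keeps existing DAGs unchanged while allowing any {{jinja2.Environment}} keyword to be opted into per DAG.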

 

> Jinja templating removes newlines
> -
>
> Key: AIRFLOW-4833
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4833
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: 1.10.2
>Reporter: Francesco Macagno
>Assignee: Galak
>Priority: Minor
>
> When using an operator that has Jinja templating enabled for a field, if the 
> field value ends with a newline then the newline is removed, regardless of 
> whether there was a template in the string.
>  
> This came up when attempting to send data to Prometheus pushgateway using the 
> SimpleHttpOperator. Pushgateway requires a newline at the end of every entry, 
> so the removal of the newline at the end of the data parameter causes the 
> request to fail in a way that is difficult to debug.
>  
> This can be gotten around by including a space after the newline character, 
> though this is not a great solution. The space is ignored by pushgateway.





[jira] [Assigned] (AIRFLOW-4833) Jinja templating removes newlines

2019-08-27 Thread Galak (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Galak reassigned AIRFLOW-4833:
--

Assignee: Galak

> Jinja templating removes newlines
> -
>
> Key: AIRFLOW-4833
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4833
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: 1.10.2
>Reporter: Francesco Macagno
>Assignee: Galak
>Priority: Minor
>
> When using an operator that has Jinja templating enabled for a field, if the 
> field value ends with a newline then the newline is removed, regardless of 
> whether there was a template in the string.
>  
> This came up when attempting to send data to Prometheus pushgateway using the 
> SimpleHttpOperator. Pushgateway requires a newline at the end of every entry, 
> so the removal of the newline at the end of the data parameter causes the 
> request to fail in a way that is difficult to debug.
>  
> This can be gotten around by including a space after the newline character, 
> though this is not a great solution. The space is ignored by pushgateway.





[jira] [Commented] (AIRFLOW-4451) [AIRFLOW-1814] converts namedtuples args in PythonOperators to lists

2019-07-03 Thread Galak (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878149#comment-16878149
 ] 

Galak commented on AIRFLOW-4451:


[~rossmechanic]: could you please provide a short source code example?

> [AIRFLOW-1814] converts namedtuples args in PythonOperators to lists
> 
>
> Key: AIRFLOW-4451
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4451
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: 1.10.3
>Reporter: Ross Mechanic
>Priority: Major
>
> Upgrading to Airflow 1.10.3 from Airflow 1.10.2 removed support for passing 
> in `namedtuple`s as `op_kwargs` or `op_args` to `PythonOperator`. The 
> specific PR that made the breaking change is 
> [https://github.com/apache/airflow/pull/4691]
>  
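A likely mechanism, assuming the 1.10.3 renderer still rebuilds lists and tuples with a list comprehension, as in the {{render_template_from_field}} code quoted in the AIRFLOW-2508 notifications further down: the comprehension always returns a {{list}}, so a {{namedtuple}} argument loses its type. The sketch is simplified (a toy string replacement instead of Jinja, a hypothetical {{Config}} namedtuple, hypothetical function names) and contrasts that behaviour with a variant that rebuilds namedtuples positionally.

```python
from collections import namedtuple


def render_string(text):
    """Toy stand-in for Jinja rendering of a single string."""
    return text.replace("{{ ds }}", "2019-07-03")


def render_as_list(content):
    """Simplified 1.10.3-style traversal: lists and tuples both become lists."""
    if isinstance(content, str):
        return render_string(content)
    if isinstance(content, (list, tuple)):
        return [render_as_list(item) for item in content]
    return content


def render_keeping_type(content):
    """Variant that keeps tuples as tuples and rebuilds namedtuples."""
    if isinstance(content, str):
        return render_string(content)
    if isinstance(content, list):
        return [render_keeping_type(item) for item in content]
    if isinstance(content, tuple):
        items = [render_keeping_type(item) for item in content]
        if hasattr(content, "_fields"):  # namedtuple: pass fields positionally
            return type(content)(*items)
        return tuple(items)
    return content


Config = namedtuple("Config", ["path", "retries"])  # hypothetical op_args item
arg = Config(path="/data/{{ ds }}", retries=3)

print(render_as_list(arg))       # ['/data/2019-07-03', 3]
print(render_keeping_type(arg))  # Config(path='/data/2019-07-03', retries=3)
```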





[jira] [Commented] (AIRFLOW-4833) Jinja templating removes newlines

2019-07-03 Thread Galak (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877967#comment-16877967
 ] 

Galak commented on AIRFLOW-4833:


[~thenumenorean]: I agree with you that Jinja's {{keep_trailing_newline}} should 
default to {{True}}; but it currently defaults to {{False}}, and this can't be 
changed for backward compatibility reasons. See 
[https://github.com/pallets/jinja/issues/848] and 
[https://github.com/pallets/jinja/issues/949].

I'll try to submit a PR for this in the coming weeks.

 

> Jinja templating removes newlines
> -
>
> Key: AIRFLOW-4833
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4833
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: 1.10.2
>Reporter: Francesco Macagno
>Priority: Minor
>
> When using an operator that has Jinja templating enabled for a field, if the 
> field value ends with a newline then the newline is removed, regardless of 
> whether there was a template in the string.
>  
> This came up when attempting to send data to Prometheus pushgateway using the 
> SimpleHttpOperator. Pushgateway requires a newline at the end of every entry, 
> so the removal of the newline at the end of the data parameter causes the 
> request to fail in a way that is difficult to debug.
>  
> This can be gotten around by including a space after the newline character, 
> though this is not a great solution. The space is ignored by pushgateway.





[jira] [Commented] (AIRFLOW-4833) Jinja templating removes newlines

2019-06-26 Thread Galak (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16873304#comment-16873304
 ] 

Galak commented on AIRFLOW-4833:


I had the same issue using Airflow 1.10.2: a trailing newline on a template 
field value is removed.

I dug into the code and discovered that this is due to the way a {{DAG}} 
instantiates a jinja2 environment for template rendering:
{code:python}
def get_template_env(self):
    """
    Returns a jinja2 Environment while taking into account the DAGs
    template_searchpath, user_defined_macros and user_defined_filters
    """
    searchpath = [self.folder]
    if self.template_searchpath:
        searchpath += self.template_searchpath

    env = jinja2.Environment(
        loader=jinja2.FileSystemLoader(searchpath),
        undefined=self.template_undefined,
        extensions=["jinja2.ext.do"],
        cache_size=0)
    if self.user_defined_macros:
        env.globals.update(self.user_defined_macros)
    if self.user_defined_filters:
        env.filters.update(self.user_defined_filters)

    return env
{code}
 

jinja2.Environment has a property {{keep_trailing_newline}} set to False by 
default
(see 
[https://stackoverflow.com/questions/40832588/jinja2-ignoring-last-new-line] 
and [http://jinja.pocoo.org/docs/2.10/api/#jinja2.Environment]).
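The default can be reproduced with {{jinja2}} alone, outside Airflow. This sketch (the payload string is made up for illustration) renders a pushgateway-style line with a default {{Environment}}, with {{keep_trailing_newline=True}}, and with the reporter's trailing-space workaround.

```python
import jinja2

# A pushgateway-style payload that must end with a newline (made-up metric).
payload = "job_duration_seconds 42\n"

default_env = jinja2.Environment()
keeping_env = jinja2.Environment(keep_trailing_newline=True)

stripped = default_env.from_string(payload).render()
kept = keeping_env.from_string(payload).render()

# Reporter's workaround: with a space after the newline, the newline is no
# longer the last character, so Jinja does not strip it.
workaround = default_env.from_string(payload + " ").render()

print(repr(stripped))    # 'job_duration_seconds 42'
print(repr(kept))        # 'job_duration_seconds 42\n'
print(repr(workaround))  # 'job_duration_seconds 42\n '
```

Only a single trailing newline is removed by default, which is why the missing newline is easy to overlook when debugging.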

 

I would suggest providing a way to pass {{jinja2.Environment}} options to a 
{{DAG}}. Something like:
{code:python}
DAG(dag_id='my-dag',
    templating_options={
        'keep_trailing_newline': True,
        # some other jinja2 Environment options here
    }){code}

What do you think?

> Jinja templating removes newlines
> -
>
> Key: AIRFLOW-4833
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4833
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: 1.10.2
>Reporter: Francesco Macagno
>Priority: Minor
>
> When using an operator that has Jinja templating enabled for a field, if the 
> field value ends with a newline then the newline is removed, regardless of 
> whether there was a template in the string.
>  
> This came up when attempting to send data to Prometheus pushgateway using the 
> SimpleHttpOperator. Pushgateway requires a newline at the end of every entry, 
> so the removal of the newline at the end of the data parameter causes the 
> request to fail in a way that is difficult to debug.
>  
> This can be gotten around by including a space after the newline character, 
> though this is not a great solution. The space is ignored by pushgateway.





[jira] [Comment Edited] (AIRFLOW-2508) Handle non string types in render_template_from_field

2019-02-21 Thread Galak (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774156#comment-16774156
 ] 

Galak edited comment on AIRFLOW-2508 at 2/21/19 3:17 PM:
-

[~bjoern.pollex]: I've just read your comment here. I had not seen it before, 
or had probably not understood it.

Since then, I've created another issue around the same topic, and suggested 
several solutions (one of them is the same as you suggested: duck typing): 
https://issues.apache.org/jira/browse/AIRFLOW-3871

I actually implemented another solution based on recursively rendering all 
attributes (see [https://github.com/apache/airflow/pull/4743]), but I could 
submit another PR if duck typing (hook method for template rendering) is 
preferred.

Any feedback would be appreciated

:)
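For comparison, the duck-typing option could look roughly like the sketch below. Everything in it is hypothetical: {{render_value}} is not an Airflow function, {{make_renderer}} is a toy string replacement standing in for Jinja so that the example runs with the standard library only, and {{DataFileTransformer}} stands in for a command object that opts in by defining a {{render_template}} method.

```python
def make_renderer(context):
    """Toy stand-in for Jinja: replaces '{{ key }}' with str(context[key])."""
    def render_string(text):
        for key, value in context.items():
            text = text.replace("{{ " + key + " }}", str(value))
        return text
    return render_string


def render_value(value, render_string):
    """Render strings, traverse lists/dicts, and delegate to objects that
    expose a render_template(render_string) hook; return anything else as is."""
    if isinstance(value, str):
        return render_string(value)
    if isinstance(value, list):
        return [render_value(item, render_string) for item in value]
    if isinstance(value, dict):
        return {key: render_value(item, render_string) for key, item in value.items()}
    hook = getattr(value, "render_template", None)
    if callable(hook):
        # Duck typing: no base class required, only the method.
        return hook(render_string)
    return value


class DataFileTransformer:
    """Hypothetical command object that knows how to render itself."""

    def __init__(self, input_path, output_path):
        self.input_path = input_path
        self.output_path = output_path

    def render_template(self, render_string):
        return DataFileTransformer(
            render_string(self.input_path), render_string(self.output_path)
        )


render = make_renderer({"dag.dag_id": "my_dag", "ts": "2019-02-21T00:00:00"})
templates_dict = {
    "transformer": DataFileTransformer(
        "/data/{{ dag.dag_id }}/{{ ts }}/input_file",
        "/data/{{ dag.dag_id }}/{{ ts }}/output_file",
    )
}
rendered = render_value(templates_dict, render)
print(rendered["transformer"].output_path)
# prints /data/my_dag/2019-02-21T00:00:00/output_file
```

The trade-off is that every custom class has to implement the hook, whereas recursive attribute rendering needs no change to the class.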

 


was (Author: galak75):
[~bjoern.pollex]: I've just read your comment here. I had not seen it before, 
or had probably not understood it.

Since then, I've created another issue around the same topic, and suggested 
several solutions (one of them is the same as you suggested: duck typing): 
https://issues.apache.org/jira/browse/AIRFLOW-3871

I actually implemented another solution based on recursively rendering all 
attributes (see https://github.com/apache/airflow/pull/4743), but I could 
submit another PR if duck typing (hook method for template rendering) is 
preferred.

Any feedback would be appreciated

:)

 

> Handle non string types in render_template_from_field
> -
>
> Key: AIRFLOW-2508
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2508
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: models
>Affects Versions: 2.0.0
>Reporter: Eugene Brown
>Assignee: Galak
>Priority: Minor
>  Labels: easyfix, newbie
> Fix For: 1.10.3
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> The render_template_from_field method of the BaseOperator class raises an 
> exception when it encounters content that is not a string_type, list, tuple 
> or dict.
> Example exception:
> {noformat}
> airflow.exceptions.AirflowException: Type '' used for parameter 
> 'job_flow_overrides[Instances][InstanceGroups][InstanceCount]' is not 
> supported for templating{noformat}
> I propose instead that when it encounters content of other types it returns 
> the content unchanged, rather than raising an exception.
> Consider this case: I extended the EmrCreateJobFlowOperator to make the 
> job_flow_overrides argument a templatable field. job_flow_overrides is a 
> dictionary with a mix of strings, integers and booleans for values.
> When I extended the class as such:
> {code:python}
> class EmrCreateJobFlowOperatorTemplateOverrides(EmrCreateJobFlowOperator):
>     template_fields = ['job_flow_overrides']{code}
> And added a task to my dag with this format:
> {code:python}
> step_create_cluster = EmrCreateJobFlowOperatorTemplateOverrides(
>     task_id="create_cluster",
>     job_flow_overrides={
>         "Name": "my-cluster {{ dag_run.conf['run_date'] }}",
>         "Instances": {
>             "InstanceGroups": [
>                 {
>                     "Name": "Master nodes",
>                     "InstanceType": "c3.4xlarge",
>                     "InstanceCount": 1
>                 },
>                 {
>                     "Name": "Slave nodes",
>                     "InstanceType": "c3.4xlarge",
>                     "InstanceCount": 4
>                 }
>             ],
>             "TerminationProtected": False
>         },
>         "BootstrapActions": [{
>             "Name": "Custom action",
>             "ScriptBootstrapAction": {
>                 "Path": "s3://repo/{{ dag_run.conf['branch'] }}/requirements.txt"
>             }
>         }],
>     },
>     aws_conn_id='aws_default',
>     emr_conn_id='aws_default',
>     dag=dag
> )
> {code}
> The exception I gave above was raised and the step failed. I think it would 
> be preferable for the method to pass numeric and boolean values through 
> unchanged instead, as users may want to use template_fields, as I did, to 
> template string values in dictionaries or lists of mixed types.
> Here is the render_template_from_field method from the BaseOperator:
> {code:python}
> def render_template_from_field(self, attr, content, context, jinja_env):
>     """
>     Renders a template from a field. If the field is a string, it will
>     simply render the string and return the result. If it is a collection or
>     nested set of collections, it will traverse the structure and render
>     all strings in it.
>     """
>     rt = self.render_template
>     if isinstance(content, six.string_types):
>         result = jinja_env.from_string(content).render(**context)
>     elif isinstance(content, (list, tuple)):
>         result = 

[jira] [Commented] (AIRFLOW-2508) Handle non string types in render_template_from_field

2019-02-21 Thread Galak (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774156#comment-16774156
 ] 

Galak commented on AIRFLOW-2508:


[~bjoern.pollex]: I've just read your comment here. I had not seen it before, 
or had probably not understood it.

Since then, I've created another issue around the same topic, and suggested 
several solutions (one of them is the same as you suggested: duck typing): 
https://issues.apache.org/jira/browse/AIRFLOW-3871

I actually implemented another solution based on recursively rendering all 
attributes (see https://github.com/apache/airflow/pull/4743), but I could 
submit another PR if duck typing (hook method for template rendering) is 
preferred.

Any feedback would be appreciated

:)

 

> Handle non string types in render_template_from_field
> -
>
> Key: AIRFLOW-2508
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2508
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: models
>Affects Versions: 2.0.0
>Reporter: Eugene Brown
>Assignee: Galak
>Priority: Minor
>  Labels: easyfix, newbie
> Fix For: 1.10.3
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> The render_template_from_field method of the BaseOperator class raises an 
> exception when it encounters content that is not a string_type, list, tuple 
> or dict.
> Example exception:
> {noformat}
> airflow.exceptions.AirflowException: Type '' used for parameter 
> 'job_flow_overrides[Instances][InstanceGroups][InstanceCount]' is not 
> supported for templating{noformat}
> I propose instead that when it encounters content of other types it returns 
> the content unchanged, rather than raising an exception.
> Consider this case: I extended the EmrCreateJobFlowOperator to make the 
> job_flow_overrides argument a templatable field. job_flow_overrides is a 
> dictionary with a mix of strings, integers and booleans for values.
> When I extended the class as such:
> {code:python}
> class EmrCreateJobFlowOperatorTemplateOverrides(EmrCreateJobFlowOperator):
>     template_fields = ['job_flow_overrides']{code}
> And added a task to my dag with this format:
> {code:python}
> step_create_cluster = EmrCreateJobFlowOperatorTemplateOverrides(
>     task_id="create_cluster",
>     job_flow_overrides={
>         "Name": "my-cluster {{ dag_run.conf['run_date'] }}",
>         "Instances": {
>             "InstanceGroups": [
>                 {
>                     "Name": "Master nodes",
>                     "InstanceType": "c3.4xlarge",
>                     "InstanceCount": 1
>                 },
>                 {
>                     "Name": "Slave nodes",
>                     "InstanceType": "c3.4xlarge",
>                     "InstanceCount": 4
>                 }
>             ],
>             "TerminationProtected": False
>         },
>         "BootstrapActions": [{
>             "Name": "Custom action",
>             "ScriptBootstrapAction": {
>                 "Path": "s3://repo/{{ dag_run.conf['branch'] }}/requirements.txt"
>             }
>         }],
>     },
>     aws_conn_id='aws_default',
>     emr_conn_id='aws_default',
>     dag=dag
> )
> {code}
> The exception I gave above was raised and the step failed. I think it would 
> be preferable for the method to pass numeric and boolean values through 
> unchanged instead, as users may want to use template_fields, as I did, to 
> template string values in dictionaries or lists of mixed types.
> Here is the render_template_from_field method from the BaseOperator:
> {code:python}
> def render_template_from_field(self, attr, content, context, jinja_env):
>     """
>     Renders a template from a field. If the field is a string, it will
>     simply render the string and return the result. If it is a collection or
>     nested set of collections, it will traverse the structure and render
>     all strings in it.
>     """
>     rt = self.render_template
>     if isinstance(content, six.string_types):
>         result = jinja_env.from_string(content).render(**context)
>     elif isinstance(content, (list, tuple)):
>         result = [rt(attr, e, context) for e in content]
>     elif isinstance(content, dict):
>         result = {
>             k: rt("{}[{}]".format(attr, k), v, context)
>             for k, v in list(content.items())}
>     else:
>         param_type = type(content)
>         msg = (
>             "Type '{param_type}' used for parameter '{attr}' is "
>             "not supported for templating").format(**locals())
>         raise AirflowException(msg)
>     return result{code}
>  I propose that the method returns content unchanged if the content is of one 
> of (int, float, complex, bool) types. So my solution would include an extra 
> elif in the form:
> {code}
> elif 

[jira] [Commented] (AIRFLOW-3871) Allow Jinja templating recursively on object attributes

2019-02-20 Thread Galak (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1677#comment-1677
 ] 

Galak commented on AIRFLOW-3871:


I opened a new PR for this JIRA issue: 
https://github.com/apache/airflow/pull/4743

It could be a good starting point for a discussion about this functionality.


> Allow Jinja templating recursively on object attributes
> ---
>
> Key: AIRFLOW-3871
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3871
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: operators
>Affects Versions: 1.10.0
>Reporter: Galak
>Priority: Minor
>
> Some {{Operator}} fields can be templated (using Jinja). Template rendering 
> only works for string values (either direct values or values stored in 
> collections).
> But a templated string inside a custom class instance won't be rendered.
> Here is my scenario: 
> I have a Python function {{transform_data_file}} that is designed to call a 
> command object. This command object constructor 
> ({{MyAwesomeDataFileTransformer}}) has parameters that could be templated. 
> These templated parameters are currently not rendered (see the 
> {{BaseOperator.render_template_from_field}} method). 
> {code}
> simple_task = PythonOperator(
>     task_id='simple_task',
>     provide_context=True,
>     python_callable=transform_data_file,
>     templates_dict={
>         'transformer': MyAwesomeDataFileTransformer(
>             "/data/{{ dag.dag_id }}/{{ ts }}/input_file",
>             "/data/{{ dag.dag_id }}/{{ ts }}/output_file",
>         )
>     },
>     dag=dag
> )
> {code}
> I have 3 alternatives in mind to allow rendering inner attributes:
> # Either define an Abstract Base Class declaring an abstract method 
> {{render_template}}; then my command object would have to extend this 
> Abstract Base Class, and then implement {{render_template}} method.
> # Or use duck typing in {{BaseOperator.render_template_from_field}} to call 
> {{render_template}} method when it exists on templated custom objects; then 
> my command object would just have to implement {{render_template}} method.
> # Or traverse object attributes when rendering templates and call 
> {{BaseOperator.render_template}} recursively; then my command object would 
> not need any change.
> My preferred solution is the 3rd one, but I would like to hear your 
> opinion on this first. Maybe there is a 4th, better solution?
> I would be glad to submit a PR if this functionality is accepted.





[jira] [Commented] (AIRFLOW-3871) Allow Jinja templating recursively on object attributes

2019-02-15 Thread Galak (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769448#comment-16769448
 ] 

Galak commented on AIRFLOW-3871:


Any comments, anyone? :-)

> Allow Jinja templating recursively on object attributes
> ---
>
> Key: AIRFLOW-3871
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3871
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: operators
>Affects Versions: 1.10.0
>Reporter: Galak
>Priority: Minor
>
> Some {{Operator}} fields can be templated (using Jinja). Template rendering 
> only works for string values (either direct values or values stored in 
> collections).
> But a templated string inside a custom class instance won't be rendered.
> Here is my scenario: 
> I have a Python function {{transform_data_file}} that is designed to call a 
> command object. This command object constructor 
> ({{MyAwesomeDataFileTransformer}}) has parameters that could be templated. 
> These templated parameters are currently not rendered (see the 
> {{BaseOperator.render_template_from_field}} method). 
> {code}
> simple_task = PythonOperator(
>     task_id='simple_task',
>     provide_context=True,
>     python_callable=transform_data_file,
>     templates_dict={
>         'transformer': MyAwesomeDataFileTransformer(
>             "/data/{{ dag.dag_id }}/{{ ts }}/input_file",
>             "/data/{{ dag.dag_id }}/{{ ts }}/output_file",
>         )
>     },
>     dag=dag
> )
> {code}
> I have 3 alternatives in mind to allow rendering inner attributes:
> # Either define an Abstract Base Class declaring an abstract method 
> {{render_template}}; then my command object would have to extend this 
> Abstract Base Class, and then implement {{render_template}} method.
> # Or use duck typing in {{BaseOperator.render_template_from_field}} to call 
> {{render_template}} method when it exists on templated custom objects; then 
> my command object would just have to implement {{render_template}} method.
> # Or traverse object attributes when rendering templates and call 
> {{BaseOperator.render_template}} recursively; then my command object would 
> not need any change.
> My preferred solution is the 3rd one, but I would like to hear your 
> opinion on this first. Maybe there is a 4th, better solution?
> I would be glad to submit a PR if this functionality is accepted.





[jira] [Created] (AIRFLOW-3871) Allow Jinja templating recursively on object attributes

2019-02-11 Thread Galak (JIRA)
Galak created AIRFLOW-3871:
--

 Summary: Allow Jinja templating recursively on object attributes
 Key: AIRFLOW-3871
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3871
 Project: Apache Airflow
  Issue Type: Wish
  Components: operators
Affects Versions: 1.10.0
Reporter: Galak


Some {{Operator}} fields can be templated (using Jinja). Template rendering 
only works for string values (either direct values or values stored in 
collections).
But a templated string inside a custom class instance won't be rendered.

Here is my scenario: 
I have a Python function {{transform_data_file}} that is designed to call a 
command object. This command object constructor 
({{MyAwesomeDataFileTransformer}}) has parameters that could be templated. 
These templated parameters are currently not rendered (see the 
{{BaseOperator.render_template_from_field}} method). 

{code}
simple_task = PythonOperator(
    task_id='simple_task',
    provide_context=True,
    python_callable=transform_data_file,
    templates_dict={
        'transformer': MyAwesomeDataFileTransformer(
            "/data/{{ dag.dag_id }}/{{ ts }}/input_file",
            "/data/{{ dag.dag_id }}/{{ ts }}/output_file",
        )
    },
    dag=dag
)
{code}

I have 3 alternatives in mind to allow rendering inner attributes:
# Either define an Abstract Base Class declaring an abstract method 
{{render_template}}; then my command object would have to extend this Abstract 
Base Class, and then implement {{render_template}} method.
# Or use duck typing in {{BaseOperator.render_template_from_field}} to call 
{{render_template}} method when it exists on templated custom objects; then my 
command object would just have to implement {{render_template}} method.
# Or traverse object attributes when rendering templates and call 
{{BaseOperator.render_template}} recursively; then my command object would not 
need any change.

My preferred solution is the 3rd one, but I would like to hear your 
opinion on this first. Maybe there is a 4th, better solution?

I would be glad to submit a PR if this functionality is accepted.
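Option 3 can be sketched with the standard library only. The names below ({{render_recursive}}, {{make_renderer}}) are hypothetical, the toy {{make_renderer}} stands in for Jinja, and {{MyAwesomeDataFileTransformer}} is given a made-up body since the issue only shows the constructor call. The sketch renders strings found in lists, tuples, dicts and the attributes of ordinary class instances, works on shallow copies so the objects held by the operator are not mutated, and keeps a memo (as {{copy.deepcopy}} does) so shared or cyclic references do not recurse forever.

```python
import copy


def make_renderer(context):
    """Toy stand-in for Jinja: replaces '{{ key }}' with str(context[key])."""
    def render_string(text):
        for key, value in context.items():
            text = text.replace("{{ " + key + " }}", str(value))
        return text
    return render_string


def render_recursive(value, render_string, memo=None):
    """Render every string reachable from value, including object attributes."""
    memo = {} if memo is None else memo
    if isinstance(value, str):
        return render_string(value)
    if id(value) in memo:  # already rendered (shared or cyclic reference)
        return memo[id(value)]
    if isinstance(value, list):
        result = memo[id(value)] = []
        result.extend(render_recursive(item, render_string, memo) for item in value)
        return result
    if isinstance(value, dict):
        result = memo[id(value)] = {}
        for key, item in value.items():
            result[key] = render_recursive(item, render_string, memo)
        return result
    if isinstance(value, tuple):
        items = [render_recursive(item, render_string, memo) for item in value]
        # Rebuild namedtuples with their own type, plain tuples as tuple.
        result = type(value)(*items) if hasattr(value, "_fields") else tuple(items)
        memo[id(value)] = result
        return result
    if callable(value) or not hasattr(value, "__dict__"):
        # Numbers, None, classes, functions, etc. are returned unchanged.
        return value
    # Ordinary instance: render each attribute on a shallow copy.
    result = memo[id(value)] = copy.copy(value)
    for name, attr in vars(value).items():
        setattr(result, name, render_recursive(attr, render_string, memo))
    return result


class MyAwesomeDataFileTransformer:
    """Made-up body; the issue only shows the constructor call."""

    def __init__(self, input_path, output_path):
        self.input_path = input_path
        self.output_path = output_path


render = make_renderer({"dag.dag_id": "my_dag", "ts": "2019-02-11T00:00:00+00:00"})
templates_dict = {
    "transformer": MyAwesomeDataFileTransformer(
        "/data/{{ dag.dag_id }}/{{ ts }}/input_file",
        "/data/{{ dag.dag_id }}/{{ ts }}/output_file",
    )
}
rendered = render_recursive(templates_dict, render)
print(rendered["transformer"].input_path)
# prints /data/my_dag/2019-02-11T00:00:00+00:00/input_file
```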






[jira] [Work started] (AIRFLOW-1814) Add op_args and op_kwargs in PythonOperator templated fields

2019-02-11 Thread Galak (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-1814 started by Galak.
--
> Add op_args and op_kwargs in PythonOperator templated fields
> 
>
> Key: AIRFLOW-1814
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1814
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: operators
>Affects Versions: 1.8.0
>Reporter: Galak
>Assignee: Galak
>Priority: Minor
>
> *I'm wondering if "_op_args_" and "_op_kwargs_" PythonOperator parameters 
> could be templated.*
> I have 2 different use cases where this change could help a lot:
> +1/ Provide some job execution information as a python callable argument:+
> let's explain it through a simple example:
> {code}
> simple_task = PythonOperator(
>     task_id='simple_task',
>     provide_context=True,
>     python_callable=extract_data,
>     op_args=[
>         "my_db_connection_id",
>         "select * from my_table",
>         "/data/{{ dag.dag_id }}/{{ ts }}/my_export.csv"
>     ],
>     dag=dag
> )
> {code}
> The "extract_data" Python function looks simple here, but it could be 
> anything reusable across multiple DAGs.
> +2/ Provide some XCom value as a python callable argument:+
> Let's say I have a task that retrieves or calculates a value, and 
> then stores it in an XCom for further use by other tasks:
> {code}
> value_producer_task = PythonOperator(
>     task_id='value_producer_task',
>     provide_context=True,
>     python_callable=produce_value,
>     op_args=[
>         "my_db_connection_id",
>         "some_other_static_parameter",
>         "my_xcom_key"
>     ],
>     dag=dag
> )
> {code}
> Then I can just configure a PythonOperator task to use the produced value:
> {code}
> value_consumer_task = PythonOperator(
>     task_id='value_consumer_task',
>     provide_context=True,
>     python_callable=consume_value,
>     op_args=[
>         "{{ task_instance.xcom_pull(task_ids=None, key='my_xcom_key') }}"
>     ],
>     dag=dag
> )
> {code}
> I quickly tried the following class:
> {code}
> from airflow.operators.python_operator import PythonOperator
> 
> class MyPythonOperator(PythonOperator):
>     template_fields = PythonOperator.template_fields + ('op_args', 'op_kwargs')
> {code}
> and it worked like a charm.
> So could these 2 arguments be added to template_fields? Or did I miss some 
> major drawback to this change?





[jira] [Assigned] (AIRFLOW-1814) Add op_args and op_kwargs in PythonOperator templated fields

2019-02-08 Thread Galak (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Galak reassigned AIRFLOW-1814:
--

Assignee: Galak

> Add op_args and op_kwargs in PythonOperator templated fields
> 
>
> Key: AIRFLOW-1814
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1814
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: operators
>Affects Versions: 1.8.0
>Reporter: Galak
>Assignee: Galak
>Priority: Minor
>
> *I'm wondering if "_op_args_" and "_op_kwargs_" PythonOperator parameters 
> could be templated.*
> I have 2 different use cases where this change could help a lot:
> +1/ Provide some job execution information as a python callable argument:+
> let's explain it through a simple example:
> {code}
> simple_task = PythonOperator(
>     task_id='simple_task',
>     provide_context=True,
>     python_callable=extract_data,
>     op_args=[
>         "my_db_connection_id",
>         "select * from my_table",
>         "/data/{{ dag.dag_id }}/{{ ts }}/my_export.csv"
>     ],
>     dag=dag
> )
> {code}
> The "extract_data" Python function looks simple here, but it could be 
> anything reusable across multiple DAGs.
> +2/ Provide some XCom value as a python callable argument:+
> Let's say I have a task that retrieves or calculates a value, and 
> then stores it in an XCom for further use by other tasks:
> {code}
> value_producer_task = PythonOperator(
>     task_id='value_producer_task',
>     provide_context=True,
>     python_callable=produce_value,
>     op_args=[
>         "my_db_connection_id",
>         "some_other_static_parameter",
>         "my_xcom_key"
>     ],
>     dag=dag
> )
> {code}
> Then I can just configure a PythonOperator task to use the produced value:
> {code}
> value_consumer_task = PythonOperator(
>     task_id='value_consumer_task',
>     provide_context=True,
>     python_callable=consume_value,
>     op_args=[
>         "{{ task_instance.xcom_pull(task_ids=None, key='my_xcom_key') }}"
>     ],
>     dag=dag
> )
> {code}
> I quickly tried the following class:
> {code}
> from airflow.operators.python_operator import PythonOperator
> 
> class MyPythonOperator(PythonOperator):
>     template_fields = PythonOperator.template_fields + ('op_args', 'op_kwargs')
> {code}
> and it worked like a charm.
> So could these 2 arguments be added to template_fields? Or did I miss some 
> major drawback to this change?





[jira] [Commented] (AIRFLOW-1814) Add op_args and op_kwargs in PythonOperator templated fields

2019-02-07 Thread Galak (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762788#comment-16762788
 ] 

Galak commented on AIRFLOW-1814:


Since [AIRFLOW-2508] has been fixed, the blocker issue described above is no 
longer relevant.
I can work on this improvement and submit a PR.
Any objections?

> Add op_args and op_kwargs in PythonOperator templated fields
> 
>
> Key: AIRFLOW-1814
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1814
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: operators
>Affects Versions: 1.8.0
>Reporter: Galak
>Priority: Minor
>





[jira] [Commented] (AIRFLOW-2508) Handle non string types in render_template_from_field

2019-01-21 Thread Galak (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748225#comment-16748225
 ] 

Galak commented on AIRFLOW-2508:


[Pull Request #4292|https://github.com/apache/incubator-airflow/pull/4292] is 
waiting for a code review.

> Handle non string types in render_template_from_field
> -
>
> Key: AIRFLOW-2508
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2508
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: models
>Affects Versions: 2.0.0
>Reporter: Eugene Brown
>Assignee: Galak
>Priority: Minor
>  Labels: easyfix, newbie
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> The render_template_from_field method of the BaseOperator class raises an 
> exception when it encounters content that is not a string_type, list, tuple 
> or dict.
> Example exception:
> {noformat}
> airflow.exceptions.AirflowException: Type '' used for parameter 
> 'job_flow_overrides[Instances][InstanceGroups][InstanceCount]' is not 
> supported for templating{noformat}
> I propose instead that when it encounters content of other types it returns 
> the content unchanged, rather than raising an exception.
> Consider this case: I extended the EmrCreateJobFlowOperator to make the 
> job_flow_overrides argument a templatable field. job_flow_overrides is a 
> dictionary with a mix of strings, integers and booleans for values.
> When I extended the class as follows:
> {code}
> from airflow.contrib.operators.emr_create_job_flow_operator import EmrCreateJobFlowOperator
>
> class EmrCreateJobFlowOperatorTemplateOverrides(EmrCreateJobFlowOperator):
>     template_fields = ['job_flow_overrides']
> {code}
> And added a task to my DAG in this form:
> {code}
> step_create_cluster = EmrCreateJobFlowOperatorTemplateOverrides(
>     task_id="create_cluster",
>     job_flow_overrides={
>         "Name": "my-cluster {{ dag_run.conf['run_date'] }}",
>         "Instances": {
>             "InstanceGroups": [
>                 {
>                     "Name": "Master nodes",
>                     "InstanceType": "c3.4xlarge",
>                     "InstanceCount": 1
>                 },
>                 {
>                     "Name": "Slave nodes",
>                     "InstanceType": "c3.4xlarge",
>                     "InstanceCount": 4
>                 }
>             ],
>             "TerminationProtected": False
>         },
>         "BootstrapActions": [{
>             "Name": "Custom action",
>             "ScriptBootstrapAction": {
>                 "Path": "s3://repo/{{ dag_run.conf['branch'] }}/requirements.txt"
>             }
>         }],
>     },
>     aws_conn_id='aws_default',
>     emr_conn_id='aws_default',
>     dag=dag
> )
> {code}
> The exception above was raised and the step failed. I think it would be 
> preferable for the method to pass numeric and boolean values through 
> unchanged, since users may want to use template_fields as I have, to template 
> string values inside dictionaries or lists of mixed types.
> Here is the render_template_from_field method from the BaseOperator:
> {code}
> def render_template_from_field(self, attr, content, context, jinja_env):
>     """
>     Renders a template from a field. If the field is a string, it will
>     simply render the string and return the result. If it is a collection or
>     nested set of collections, it will traverse the structure and render
>     all strings in it.
>     """
>     rt = self.render_template
>     if isinstance(content, six.string_types):
>         result = jinja_env.from_string(content).render(**context)
>     elif isinstance(content, (list, tuple)):
>         result = [rt(attr, e, context) for e in content]
>     elif isinstance(content, dict):
>         result = {
>             k: rt("{}[{}]".format(attr, k), v, context)
>             for k, v in list(content.items())}
>     else:
>         param_type = type(content)
>         msg = (
>             "Type '{param_type}' used for parameter '{attr}' is "
>             "not supported for templating").format(**locals())
>         raise AirflowException(msg)
>     return result
> {code}
> I propose that the method return the content unchanged if it is of one of the 
> types int, float, complex or bool. My solution would add an extra elif branch 
> of this form:
> {code}
>     elif isinstance(content, (int, float, complex, bool)):
>         result = content
> {code}
> Are there any reasons this would be a bad idea?
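For illustration only, here is a standalone sketch of the proposed behavior written as a free function rather than the actual BaseOperator method. Assumptions: render_string is a hypothetical stand-in for jinja_env.from_string(content).render(**context) that only substitutes bare {{ name }} placeholders from a flat dict, and TypeError replaces AirflowException so the snippet runs without Airflow installed.

```python
import re
from typing import Any, Dict


def render_string(template: str, context: Dict[str, Any]) -> str:
    """Hypothetical stand-in for Jinja: substitutes {{ name }} from a flat context."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda match: str(context[match.group(1)]),
        template,
    )


def render_template_from_field(attr: str, content: Any, context: Dict[str, Any]) -> Any:
    """Render strings, traverse collections, pass numeric/boolean values through."""
    if isinstance(content, str):
        return render_string(content, context)
    if isinstance(content, (list, tuple)):
        # List elements keep the parent attr name, as in the original method.
        return [render_template_from_field(attr, item, context) for item in content]
    if isinstance(content, dict):
        return {
            key: render_template_from_field("{}[{}]".format(attr, key), value, context)
            for key, value in content.items()
        }
    if isinstance(content, (int, float, complex, bool)):
        # Proposed change: non-string scalars are returned unchanged.
        return content
    # Any other type is still rejected, as in the original method.
    raise TypeError(
        "Type '{}' used for parameter '{}' is not supported for templating".format(
            type(content), attr
        )
    )


# Usage: a trimmed-down version of the job_flow_overrides example (values made up).
overrides = {
    "Name": "my-cluster {{ run_date }}",
    "Instances": {
        "InstanceGroups": [
            {"Name": "Master nodes", "InstanceType": "c3.4xlarge", "InstanceCount": 1},
        ],
        "TerminationProtected": False,
    },
}
rendered = render_template_from_field("job_flow_overrides", overrides, {"run_date": "2019-01-21"})
print(rendered["Name"])  # my-cluster 2019-01-21
```

Because bool is a subclass of int in Python, listing bool in the isinstance tuple is redundant but harmless; None and arbitrary objects would still be rejected by this function.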


