[jira] [Commented] (AIRFLOW-2774) DataFlowPythonOperator needs to support DirectRunner to speed up end-to-end testing of Airflow dag

2018-12-10 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714671#comment-16714671
 ] 

Iuliia Volkova commented on AIRFLOW-2774:
-

but DirectRunner is not a 'local' version of DataflowRunner; it's just a 
different engine supported by Apache Beam, like SparkRunner, ApexRunner, 
and others. 
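For illustration, a minimal Beam sketch (assuming the apache-beam package is 
installed; not code from this ticket): the runner is an ordinary pipeline 
option, so the same pipeline can run locally on DirectRunner or remotely on 
DataflowRunner.

{code}
# A hedged sketch: the runner name is just a pipeline option.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(runner='DirectRunner')  # or 'DataflowRunner', 'SparkRunner', ...
with beam.Pipeline(options=options) as p:
    (p
     | 'Create' >> beam.Create([1, 2, 3])
     | 'Double' >> beam.Map(lambda x: x * 2)
     | 'Print' >> beam.Map(print))
{code}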

> DataFlowPythonOperator needs to support DirectRunner to speed up end-to-end 
> testing of Airflow dag
> --
>
> Key: AIRFLOW-2774
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2774
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: Dataflow
>Affects Versions: 1.9.0
>Reporter: Evgeny Podlepaev
>Priority: Minor
>
> DataFlowPythonOperator needs to support DirectRunner as a runner option to 
> facilitate local end-to-end testing of the entire Airflow dag. Right now, if 
> DirectRunner is set via job options, the DataFlowHook will wait indefinitely, 
> trying to get the status of a remote job that does not exist:
> _DataflowJob(self.get_conn(), variables['project'], name,
>              variables['region'], self.poll_sleep).wait_for_done()



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1951) kerberos keytab , principal command line argument not getting passed to run function

2018-12-09 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714206#comment-16714206
 ] 

Iuliia Volkova commented on AIRFLOW-1951:
-

@ashb, will we close the duplicate? 

> kerberos keytab , principal command line argument not getting passed to run 
> function
> 
>
> Key: AIRFLOW-1951
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1951
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli
>Reporter: Sanjay Pillai
>Assignee: Iuliia Volkova
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AIRFLOW-1951) kerberos keytab , principal command line argument not getting passed to run function

2018-12-09 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714206#comment-16714206
 ] 

Iuliia Volkova edited comment on AIRFLOW-1951 at 12/10/18 1:46 AM:
---

[~ashb], will we close the duplicate? 


was (Author: xnuinside):
@ashb, will we close the duplicate? 

> kerberos keytab , principal command line argument not getting passed to run 
> function
> 
>
> Key: AIRFLOW-1951
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1951
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli
>Reporter: Sanjay Pillai
>Assignee: Iuliia Volkova
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments

2018-11-28 Thread Iuliia Volkova (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iuliia Volkova reassigned AIRFLOW-987:
--

Assignee: Iuliia Volkova

> `airflow kerberos` ignores --keytab and --principal arguments
> -
>
> Key: AIRFLOW-987
> URL: https://issues.apache.org/jira/browse/AIRFLOW-987
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security
>Affects Versions: 1.8.0
> Environment: 1.8-rc5
>Reporter: Ruslan Dautkhanov
>Assignee: Iuliia Volkova
>Priority: Major
>  Labels: easyfix, kerberos, security
>
> No matter which arguments I pass to `airflow kerberos`, 
> it always executes as `kinit -r 3600m -k -t airflow.keytab -c 
> /tmp/airflow_krb5_ccache airflow`
> So it fails with the expected "kinit: Keytab contains no suitable keys for 
> airf...@corp.some.com while getting initial credentials"
> Tried different arguments, -kt and --keytab, here's one of the runs (some 
> lines wrapped for readability):
> {noformat}
> $ airflow kerberos -kt /home/rdautkha/.keytab rdautkha...@corp.some.com
> [2017-03-14 23:50:11,523] {__init__.py:57} INFO - Using executor LocalExecutor
> [2017-03-14 23:50:12,069] {kerberos.py:43} INFO - Reinitting kerberos from 
> keytab: 
> kinit -r 3600m -k -t airflow.keytab -c /tmp/airflow_krb5_ccache airflow
> [2017-03-14 23:50:12,080] {kerberos.py:55} ERROR -
>  Couldn't reinit from keytab! `kinit' exited with 1.
> kinit: Keytab contains no suitable keys for airf...@corp.some.com 
> while getting initial credentials
> {noformat}
> 1.8-rc5



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1951) kerberos keytab , principal command line argument not getting passed to run function

2018-11-28 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16701715#comment-16701715
 ] 

Iuliia Volkova commented on AIRFLOW-1951:
-

https://github.com/apache/incubator-airflow/pull/4238/files - a PR for principal 
and keytab. I'm not sure whether a CLI arg for ccache is needed; if it is, 
please open a separate ticket.
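A hedged sketch of the idea behind such a fix (not the merged diff; the 
run(principal=..., keytab=...) signature is an assumption): the CLI handler 
forwards its parsed arguments instead of silently falling back to the config 
values.

{code}
# A hedged sketch, not the merged diff: the handler forwards parsed args;
# the krb.run(principal=..., keytab=...) signature is an assumption.
def kerberos(args):
    from airflow.security import kerberos as krb
    krb.run(principal=args.principal, keytab=args.keytab)
{code}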

> kerberos keytab , principal command line argument not getting passed to run 
> function
> 
>
> Key: AIRFLOW-1951
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1951
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli
>Reporter: Sanjay Pillai
>Assignee: Iuliia Volkova
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-1951) kerberos keytab , principal command line argument not getting passed to run function

2018-11-28 Thread Iuliia Volkova (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iuliia Volkova reassigned AIRFLOW-1951:
---

Assignee: Iuliia Volkova

> kerberos keytab , principal command line argument not getting passed to run 
> function
> 
>
> Key: AIRFLOW-1951
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1951
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli
>Reporter: Sanjay Pillai
>Assignee: Iuliia Volkova
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3118) DAGs not successful on new installation

2018-11-25 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698576#comment-16698576
 ] 

Iuliia Volkova commented on AIRFLOW-3118:
-

https://issues.apache.org/jira/browse/AIRFLOW-1561 - the fix was merged. 
[~huyanhvn], [~ashb], it would be good to close this task. 

> DAGs not successful on new installation
> ---
>
> Key: AIRFLOW-3118
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3118
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG
>Affects Versions: 1.10.0
> Environment: Ubuntu 18.04
> Python 3.6
>Reporter: Brylie Christopher Oxley
>Assignee: Huy Nguyen
>Priority: Blocker
> Fix For: 1.10.2
>
> Attachments: Screenshot_20180926_161837.png, 
> image-2018-09-26-12-39-03-094.png
>
>
> When trying out Airflow on localhost, none of the DAG runs are getting to 
> the 'success' state. They are getting stuck in 'running', or I manually label 
> them as failed:
> !image-2018-09-26-12-39-03-094.png!
> h2. Steps to reproduce
>  # create a new conda environment
>  ** conda create -n airflow
>  ** source activate airflow
>  # install airflow
>  ** pip install apache-airflow
>  # initialize the Airflow db
>  ** airflow initdb
>  # disable the default paused setting in airflow.cfg
>  ** dags_are_paused_at_creation = False
>  # run the airflow webserver and scheduler (in separate terminals)
>  ** airflow scheduler
>  ** airflow webserver
>  # unpause example_bash_operator
>  ** airflow unpause example_bash_operator
>  # log in to the Airflow UI
>  # turn on example_bash_operator
>  # click "Trigger DAG" in the `example_bash_operator` row
> h2. Observed result
> The `example_bash_operator` never leaves the "running" state.
> h2. Expected result
> The `example_bash_operator` would quickly enter the "success" state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1934) Unable to Launch Example DAG if ~/AIRFLOW_HOME/dags folder is empty

2018-11-25 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698575#comment-16698575
 ] 

Iuliia Volkova commented on AIRFLOW-1934:
-

https://issues.apache.org/jira/browse/AIRFLOW-1561 - the fix was merged. 
[~ramandumcs], [~ashb], it would be good to close this task. 

> Unable to Launch Example DAG if ~/AIRFLOW_HOME/dags folder is empty
> ---
>
> Key: AIRFLOW-1934
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1934
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG
>Affects Versions: 1.8.0
> Environment: RHEL
>Reporter: raman
>Priority: Major
>
> Steps to reproduce:
> 1. Install airflow
> 2. Keep the ~/{airflow_home}/dags folder empty
> 3. airflow initdb
> 4. Start the airflow webserver and scheduler
> 5. Enable an example DAG and trigger it manually from the web UI.
> Result: a DAG run gets created in the dag_run table and the task_instance 
> table also gets the relevant entries, but the scheduler does not pick up the 
> DAG.
> Workaround: create one sample dag in the ~/{airflow_home}/dags folder and the 
> scheduler picks it up.
> The following code in jobs.py seems to be doing the trick, but it is only 
> triggered if there is a dag inside the ~/{airflow_home}/dags folder.
> File: jobs.py
> Function: _find_executable_task_instances
> ti_query = (
>     session
>     .query(TI)
>     .filter(TI.dag_id.in_(simple_dag_bag.dag_ids))
>     .outerjoin(DR,
>         and_(DR.dag_id == TI.dag_id,
>              DR.execution_date == TI.execution_date))
>     .filter(or_(DR.run_id == None,
>             not_(DR.run_id.like(BackfillJob.ID_PREFIX + '%'))))
>     .outerjoin(DM, DM.dag_id == TI.dag_id)
>     .filter(or_(DM.dag_id == None,
>             not_(DM.is_paused)))
> )



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3357) Scheduler doesn't work on example DAGs unless there's some dag file to process

2018-11-25 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698574#comment-16698574
 ] 

Iuliia Volkova commented on AIRFLOW-3357:
-

https://issues.apache.org/jira/browse/AIRFLOW-1561 - the fix was merged. 
[~villasv], [~ashb], it would be good to close this task. 

> Scheduler doesn't work on example DAGs unless there's some dag file to process
> --
>
> Key: AIRFLOW-3357
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3357
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.10.0, 1.10.1
>Reporter: Victor Villas Bôas Chaves
>Priority: Major
>
> Having a blank Airflow install, if you try to manually run one of the example 
> DAGs no tasks are going to get queued or executed. They're going to stay with 
> state null.
> Steps to reproduce on a new airflow:
>  # Entered the UI, turned on the example_bash_operator, manually triggered 
> the example_bash_operator, 6 tasks went to None state, nothing gets scheduled
>  # Rebooted the scheduler (with debug logging on). Nothing gets scheduled.
>  # Create a mytutorial.py in the dag folder with code from tutorial.py but 
> DAG name changed, everything starts getting scheduled.
> A debug view of the logs is here: 
> [https://gist.github.com/tomfaulhaber/4e72ed0884c9580c606e02e4b745ddff]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1823) API get_task_info is incompatible with manual runs created by UI

2018-11-25 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698471#comment-16698471
 ] 

Iuliia Volkova commented on AIRFLOW-1823:
-

[~ashb], this ticket is no longer relevant for version 1.10

> API get_task_info is incompatible with manual runs created by UI
> 
>
> Key: AIRFLOW-1823
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1823
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: api
>Affects Versions: 2.0.0
> Environment: ubuntu
> Airflow 1.9rc02
> commit: 
> https://github.com/apache/incubator-airflow/blob/master/airflow/www/api/experimental/endpoints.py#L126
>Reporter: Jeremy Lewi
>Assignee: Iuliia Volkova
>Priority: Minor
>
> The API method 
> [task_instance_info|https://github.com/apache/incubator-airflow/blob/master/airflow/www/api/experimental/endpoints.py#L126]
>  doesn't work with manual runs created by the UI.
> The UI creates dag runs with ids with sub second precision in the name. An 
> example of a run created by the UI is
> 2017-11-16T20:23:32.045330
> The endpoint for  
> [task_instance_info|https://github.com/apache/incubator-airflow/blob/master/airflow/www/api/experimental/endpoints.py#L126]
>  however assumes the dag run id is of the form '%Y-%m-%dT%H:%M:%S'.
> Runs triggered via the CLI generate run ids with the form expected by the API.
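A hedged illustration of the format mismatch (standard library only; the run 
id is the example from the description above):

{code}
from datetime import datetime

run_id = '2017-11-16T20:23:32.045330'  # UI-created run, sub-second precision

# The format the endpoint assumes rejects the UI-created id...
try:
    datetime.strptime(run_id, '%Y-%m-%dT%H:%M:%S')
except ValueError as err:
    print(err)  # "unconverted data remains: .045330"

# ...while a format with microseconds parses it fine.
print(datetime.strptime(run_id, '%Y-%m-%dT%H:%M:%S.%f'))
{code}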



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1823) API get_task_info is incompatible with manual runs created by UI

2018-11-25 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698301#comment-16698301
 ] 

Iuliia Volkova commented on AIRFLOW-1823:
-

this issue is only relevant to 1.9

in 1.10 it was already fixed and works correctly

[~bolke], can you close this task? 

> API get_task_info is incompatible with manual runs created by UI
> 
>
> Key: AIRFLOW-1823
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1823
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: api
>Affects Versions: 2.0.0
> Environment: ubuntu
> Airflow 1.9rc02
> commit: 
> https://github.com/apache/incubator-airflow/blob/master/airflow/www/api/experimental/endpoints.py#L126
>Reporter: Jeremy Lewi
>Assignee: Iuliia Volkova
>Priority: Minor
>
> The API method 
> [task_instance_info|https://github.com/apache/incubator-airflow/blob/master/airflow/www/api/experimental/endpoints.py#L126]
>  doesn't work with manual runs created by the UI.
> The UI creates dag runs with ids with sub second precision in the name. An 
> example of a run created by the UI is
> 2017-11-16T20:23:32.045330
> The endpoint for  
> [task_instance_info|https://github.com/apache/incubator-airflow/blob/master/airflow/www/api/experimental/endpoints.py#L126]
>  however assumes the dag run id is of the form '%Y-%m-%dT%H:%M:%S'.
> Runs triggered via the CLI generate run ids with the form expected by the API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-1823) API get_task_info is incompatible with manual runs created by UI

2018-11-25 Thread Iuliia Volkova (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iuliia Volkova reassigned AIRFLOW-1823:
---

Assignee: Iuliia Volkova  (was: Tao Feng)

> API get_task_info is incompatible with manual runs created by UI
> 
>
> Key: AIRFLOW-1823
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1823
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: api
>Affects Versions: 2.0.0
> Environment: ubuntu
> Airflow 1.9rc02
> commit: 
> https://github.com/apache/incubator-airflow/blob/master/airflow/www/api/experimental/endpoints.py#L126
>Reporter: Jeremy Lewi
>Assignee: Iuliia Volkova
>Priority: Minor
>
> The API method 
> [task_instance_info|https://github.com/apache/incubator-airflow/blob/master/airflow/www/api/experimental/endpoints.py#L126]
>  doesn't work with manual runs created by the UI.
> The UI creates dag runs with ids with sub second precision in the name. An 
> example of a run created by the UI is
> 2017-11-16T20:23:32.045330
> The endpoint for  
> [task_instance_info|https://github.com/apache/incubator-airflow/blob/master/airflow/www/api/experimental/endpoints.py#L126]
>  however assumes the dag run id is of the form '%Y-%m-%dT%H:%M:%S'.
> Runs triggered via the CLI generate run ids with the form expected by the API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work started] (AIRFLOW-3395) add to documentation all existed REST API endpoints and example how to pass dag_runs params

2018-11-25 Thread Iuliia Volkova (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-3395 started by Iuliia Volkova.
---
> add to documentation all existed REST API endpoints and example how to pass 
> dag_runs params
> ---
>
> Key: AIRFLOW-3395
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3395
> Project: Apache Airflow
>  Issue Type: Task
>Affects Versions: 1.10.2
>Reporter: Iuliia Volkova
>Assignee: Iuliia Volkova
>Priority: Minor
>
> The docs list only 2 endpoints: https://airflow.apache.org/api.html#endpoints 
> In the source code 
> (https://github.com/apache/incubator-airflow/blob/v1-10-stable/airflow/www_rbac/api/experimental/endpoints.py)
> we have more. This causes issues when users think there are no more methods - 
> I got several questions about it on a work project and also saw related 
> questions on Stack Overflow: 
> https://stackoverflow.com/questions/50121593/pass-parameters-to-airflow-experimental-rest-api-when-creating-dag-run
> I want to add more information about the REST API.
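For context, a hedged example of the kind of usage the ticket wants documented 
(the endpoint path is from the experimental API; the host, dag_id, and conf 
payload are made up):

{code}
# Assumes an Airflow 1.10-style webserver on localhost:8080 and the
# requests package; the conf payload is a made-up example.
import requests

resp = requests.post(
    'http://localhost:8080/api/experimental/dags/example_bash_operator/dag_runs',
    json={'conf': {'key': 'value'}},  # available in the run as dag_run.conf
)
print(resp.status_code, resp.json())
{code}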



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3395) add to documentation all existed REST API endpoints and example how to pass dag_runs params

2018-11-25 Thread Iuliia Volkova (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iuliia Volkova updated AIRFLOW-3395:

Description: 
The docs list only 2 endpoints: https://airflow.apache.org/api.html#endpoints 

In the source code 
(https://github.com/apache/incubator-airflow/blob/v1-10-stable/airflow/www_rbac/api/experimental/endpoints.py)
we have more. This causes issues when users think there are no more methods - 
I got several questions about it on a work project and also saw related 
questions on Stack Overflow: 
https://stackoverflow.com/questions/50121593/pass-parameters-to-airflow-experimental-rest-api-when-creating-dag-run

I want to add more information about the REST API.

  was:
The docs list only 2 endpoints: https://airflow.apache.org/api.html#endpoints 

In the source code we have several more. This causes issues when users think 
there are no more methods - I got several questions about it on a work project 
and also saw related questions on Stack Overflow: 
https://stackoverflow.com/questions/50121593/pass-parameters-to-airflow-experimental-rest-api-when-creating-dag-run

I want to add more information about the REST API.


> add to documentation all existed REST API endpoints and example how to pass 
> dag_runs params
> ---
>
> Key: AIRFLOW-3395
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3395
> Project: Apache Airflow
>  Issue Type: Task
>Affects Versions: 1.10.2
>Reporter: Iuliia Volkova
>Assignee: Iuliia Volkova
>Priority: Minor
>
> The docs list only 2 endpoints: https://airflow.apache.org/api.html#endpoints 
> In the source code 
> (https://github.com/apache/incubator-airflow/blob/v1-10-stable/airflow/www_rbac/api/experimental/endpoints.py)
> we have more. This causes issues when users think there are no more methods - 
> I got several questions about it on a work project and also saw related 
> questions on Stack Overflow: 
> https://stackoverflow.com/questions/50121593/pass-parameters-to-airflow-experimental-rest-api-when-creating-dag-run
> I want to add more information about the REST API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work stopped] (AIRFLOW-3395) add to documentation all existed REST API endpoints and example how to pass dag_runs params

2018-11-25 Thread Iuliia Volkova (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-3395 stopped by Iuliia Volkova.
---
> add to documentation all existed REST API endpoints and example how to pass 
> dag_runs params
> ---
>
> Key: AIRFLOW-3395
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3395
> Project: Apache Airflow
>  Issue Type: Task
>Affects Versions: 1.10.2
>Reporter: Iuliia Volkova
>Assignee: Iuliia Volkova
>Priority: Minor
>
> The docs list only 2 endpoints: https://airflow.apache.org/api.html#endpoints 
> In the source code we have several more. This causes issues when users think 
> there are no more methods - I got several questions about it on a work 
> project and also saw related questions on Stack Overflow: 
> https://stackoverflow.com/questions/50121593/pass-parameters-to-airflow-experimental-rest-api-when-creating-dag-run
> I want to add more information about the REST API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3395) add to documentation all existed REST API endpoints and example how to pass dag_runs params

2018-11-25 Thread Iuliia Volkova (JIRA)
Iuliia Volkova created AIRFLOW-3395:
---

 Summary: add to documentation all existed REST API endpoints and 
example how to pass dag_runs params
 Key: AIRFLOW-3395
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3395
 Project: Apache Airflow
  Issue Type: Task
Affects Versions: 1.10.2
Reporter: Iuliia Volkova
Assignee: Iuliia Volkova


The docs list only 2 endpoints: https://airflow.apache.org/api.html#endpoints 

In the source code we have several more. This causes issues when users think 
there are no more methods - I got several questions about it on a work project 
and also saw related questions on Stack Overflow: 
https://stackoverflow.com/questions/50121593/pass-parameters-to-airflow-experimental-rest-api-when-creating-dag-run

I want to add more information about the REST API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work started] (AIRFLOW-3395) add to documentation all existed REST API endpoints and example how to pass dag_runs params

2018-11-25 Thread Iuliia Volkova (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-3395 started by Iuliia Volkova.
---
> add to documentation all existed REST API endpoints and example how to pass 
> dag_runs params
> ---
>
> Key: AIRFLOW-3395
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3395
> Project: Apache Airflow
>  Issue Type: Task
>Affects Versions: 1.10.2
>Reporter: Iuliia Volkova
>Assignee: Iuliia Volkova
>Priority: Minor
>
> The docs list only 2 endpoints: https://airflow.apache.org/api.html#endpoints 
> In the source code we have several more. This causes issues when users think 
> there are no more methods - I got several questions about it on a work 
> project and also saw related questions on Stack Overflow: 
> https://stackoverflow.com/questions/50121593/pass-parameters-to-airflow-experimental-rest-api-when-creating-dag-run
> I want to add more information about the REST API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments

2018-11-25 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698199#comment-16698199
 ] 

Iuliia Volkova commented on AIRFLOW-987:


[~pratap20], you set yourself as Assignee; do you plan to fix the issue and 
open a PR?

> `airflow kerberos` ignores --keytab and --principal arguments
> -
>
> Key: AIRFLOW-987
> URL: https://issues.apache.org/jira/browse/AIRFLOW-987
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security
>Affects Versions: 1.8.0
> Environment: 1.8-rc5
>Reporter: Ruslan Dautkhanov
>Assignee: Pratap20
>Priority: Major
>  Labels: easyfix, kerberos, security
>
> No matter which arguments I pass to `airflow kerberos`, 
> it always executes as `kinit -r 3600m -k -t airflow.keytab -c 
> /tmp/airflow_krb5_ccache airflow`
> So it fails with the expected "kinit: Keytab contains no suitable keys for 
> airf...@corp.some.com while getting initial credentials"
> Tried different arguments, -kt and --keytab, here's one of the runs (some 
> lines wrapped for readability):
> {noformat}
> $ airflow kerberos -kt /home/rdautkha/.keytab rdautkha...@corp.some.com
> [2017-03-14 23:50:11,523] {__init__.py:57} INFO - Using executor LocalExecutor
> [2017-03-14 23:50:12,069] {kerberos.py:43} INFO - Reinitting kerberos from 
> keytab: 
> kinit -r 3600m -k -t airflow.keytab -c /tmp/airflow_krb5_ccache airflow
> [2017-03-14 23:50:12,080] {kerberos.py:55} ERROR -
>  Couldn't reinit from keytab! `kinit' exited with 1.
> kinit: Keytab contains no suitable keys for airf...@corp.some.com 
> while getting initial credentials
> {noformat}
> 1.8-rc5



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3326) High Sierra Complaining 'in progress in another thread when fork() was called'

2018-11-21 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695621#comment-16695621
 ] 

Iuliia Volkova commented on AIRFLOW-3326:
-

[~ryan.yuan], is this the entire code of the plugin? 

from airflow.contrib.hooks.bigquery_hook import BigQueryHook

class BQHook(BigQueryHook):
    pass 

> High Sierra Complaining 'in progress in another thread when fork() was called'
> --
>
> Key: AIRFLOW-3326
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3326
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.10.0
> Environment: macOS High Sierra 10.13.6 (17G65)
>Reporter: Ryan Yuan
>Assignee: Ryan Yuan
>Priority: Blocker
>
> Inside the plugins folder, I have a hook that is a child class of 
> BigQueryHook. 
> {code:python}
> from airflow.contrib.hooks.bigquery_hook import BigQueryHook
> class BQHook(BigQueryHook):
>     pass{code}
> When I run the airflow server, it keeps throwing messages complaining 'in 
> progress in another thread when fork() was called', and I can't use the web 
> server UI at all.
> {code:java}
> // messages from terminal
> objc[15098]: +[__NSPlaceholderDate initialize] may have been in progress in 
> another thread when fork() was called.
> objc[15098]: +[__NSPlaceholderDate initialize] may have been in progress in 
> another thread when fork() was called. We cannot safely call it or ignore it 
> in the fork() child process. Crashing instead. Set a breakpoint on 
> objc_initializeAfterForkError to debug.
> [2018-11-12 14:03:40 +1100] [15102] [INFO] Booting worker with pid: 15102
> [2018-11-12 14:03:40,792] {__init__.py:51} INFO - Using executor 
> SequentialExecutor
> [2018-11-12 14:03:40,851] {base_hook.py:83} INFO - Using connection to: 
> https://custom-data-z00100-dev.appspot.com/
> objc[15099]: +[__NSPlaceholderDate initialize] may have been in progress in 
> another thread when fork() was called.
> objc[15099]: +[__NSPlaceholderDate initialize] may have been in progress in 
> another thread when fork() was called. We cannot safely call it or ignore it 
> in the fork() child process. Crashing instead. Set a breakpoint on 
> objc_initializeAfterForkError to debug.
> [2018-11-12 14:03:40 +1100] [15103] [INFO] Booting worker with pid: 15103
> [2018-11-12 14:03:40,902] {base_hook.py:83} INFO - Using connection to: 
> https://custom-data-z00100-dev.appspot.com/
> objc[15101]: +[__NSPlaceholderDate initialize] may have been in progress in 
> another thread when fork() was called.
> objc[15101]: +[__NSPlaceholderDate initialize] may have been in progress in 
> another thread when fork() was called. We cannot safely call it or ignore it 
> in the fork() child process. Crashing instead. Set a breakpoint on 
> objc_initializeAfterForkError to debug.
> [2018-11-12 14:03:40 +1100] [15104] [INFO] Booting worker with pid: 15104
> [2018-11-12 14:03:40,948] {base_hook.py:83} INFO - Using connection to: 
> https://custom-data-z00100-dev.appspot.com/
> objc[15100]: +[__NSPlaceholderDate initialize] may have been in progress in 
> another thread when fork() was called.
> objc[15100]: +[__NSPlaceholderDate initialize] may have been in progress in 
> another thread when fork() was called. We cannot safely call it or ignore it 
> in the fork() child process. Crashing instead. Set a breakpoint on 
> objc_initializeAfterForkError to debug.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3118) DAGs not successful on new installation

2018-11-21 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695620#comment-16695620
 ] 

Iuliia Volkova commented on AIRFLOW-3118:
-

[~huyanhvn], it makes sense to open a new PR for this ticket, because the 
mentioned PR is very old: it needs to be rebased, and its author would have to 
come back to it.

> DAGs not successful on new installation
> ---
>
> Key: AIRFLOW-3118
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3118
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG
>Affects Versions: 1.10.0
> Environment: Ubuntu 18.04
> Python 3.6
>Reporter: Brylie Christopher Oxley
>Assignee: Huy Nguyen
>Priority: Blocker
> Fix For: 1.10.2
>
> Attachments: Screenshot_20180926_161837.png, 
> image-2018-09-26-12-39-03-094.png
>
>
> When trying out Airflow on localhost, none of the DAG runs are getting to 
> the 'success' state. They are getting stuck in 'running', or I manually label 
> them as failed:
> !image-2018-09-26-12-39-03-094.png!
> h2. Steps to reproduce
>  # create a new conda environment
>  ** conda create -n airflow
>  ** source activate airflow
>  # install airflow
>  ** pip install apache-airflow
>  # initialize the Airflow db
>  ** airflow initdb
>  # disable the default paused setting in airflow.cfg
>  ** dags_are_paused_at_creation = False
>  # run the airflow webserver and scheduler (in separate terminals)
>  ** airflow scheduler
>  ** airflow webserver
>  # unpause example_bash_operator
>  ** airflow unpause example_bash_operator
>  # log in to the Airflow UI
>  # turn on example_bash_operator
>  # click "Trigger DAG" in the `example_bash_operator` row
> h2. Observed result
> The `example_bash_operator` never leaves the "running" state.
> h2. Expected result
> The `example_bash_operator` would quickly enter the "success" state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3358) POC: Refactor command line to make it more testable and easy to develop

2018-11-20 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694207#comment-16694207
 ] 

Iuliia Volkova commented on AIRFLOW-3358:
-

[~kaxilnaik] something tells me that nobody cares :)) 
https://console.cloud.google.com/home/dashboard?project=hybrid-elysium-118418&_ga=2.138918448.-2031574442.1525778366
 

> POC: Refactor command line to make it  more testable and easy to develop
> 
>
> Key: AIRFLOW-3358
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3358
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: cli
>Affects Versions: 2.0.0
>Reporter: Iuliia Volkova
>Assignee: Iuliia Volkova
>Priority: Major
>
> Hi all! 
> In one PR (https://github.com/apache/incubator-airflow/pull/4174) we had a 
> talk with Ashb about how it would be cool to refactor the CLI to get more 
> testable and readable code.
> I want to prepare a POC based on one command, implemented either with Click 
> (if we want to use it) or with argparse and the Command pattern, covered with 
> tests, as a basis for discussing the Airflow CLI architecture.
> Click already exists in Airflow's dependencies.
> Main motivations: 
> - a more readable and changeable CLI - easy to add or change commands
> - making it possible to add more tests 
> It would be good to know your concerns about this initiative; if there are no 
> objections, I will be happy to start the POC.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3358) POC: Refactor command line to make it more testable and easy to develop

2018-11-16 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690425#comment-16690425
 ] 

Iuliia Volkova commented on AIRFLOW-3358:
-

[~kaxilnaik], actually, Click is just one of the possible options; we could 
stay with argparse and that would be fine. But the CLI needs to be refactored 
to follow the Command pattern, so that command execution, the methods that 
build the final command, and the utility methods are separated, making it 
possible to cover each step with tests. For example, in the PR I attached 
above, I just refactored a part to make it possible to add tests. I also 
looked at the recent CLI changes in PRs, and most of them come without tests, 
which is understandable: there is no easy way to add a test for a new feature 
without refactoring a big part of the code.  
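A hedged sketch of the Command pattern shape being proposed (names like 
KerberosCommand and make_parser are illustrative, not the actual Airflow CLI 
code): parsing, command construction, and execution are separated, so each 
step can be unit-tested in isolation.

{code}
import argparse
import subprocess


class Command:
    """Base class: subclasses implement build() and execute()."""

    def __init__(self, args):
        self.args = args

    def build(self):
        """Return the final command; easy to assert on in tests."""
        raise NotImplementedError

    def execute(self):
        raise NotImplementedError


class KerberosCommand(Command):
    def build(self):
        # A pure function of the parsed args -- trivially testable.
        return ['kinit', '-kt', self.args.keytab, self.args.principal]

    def execute(self):
        subprocess.check_call(self.build())


def make_parser():
    parser = argparse.ArgumentParser(prog='airflow')
    sub = parser.add_subparsers(dest='command')
    krb = sub.add_parser('kerberos')
    krb.add_argument('principal')
    krb.add_argument('-kt', '--keytab', default='airflow.keytab')
    krb.set_defaults(cls=KerberosCommand)
    return parser


if __name__ == '__main__':
    args = make_parser().parse_args()
    args.cls(args).execute()
{code}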

> POC: Refactor command line to make it  more testable and easy to develop
> 
>
> Key: AIRFLOW-3358
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3358
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: cli
>Affects Versions: 2.0.0
>Reporter: Iuliia Volkova
>Assignee: Iuliia Volkova
>Priority: Major
>
> Hi all! 
> In one PR (https://github.com/apache/incubator-airflow/pull/4174) we had a 
> talk with Ashb about how it would be cool to refactor the CLI to get more 
> testable and readable code.
> I want to prepare a POC based on one command, implemented either with Click 
> (if we want to use it) or with argparse and the Command pattern, covered with 
> tests, as a basis for discussing the Airflow CLI architecture.
> Click already exists in Airflow's dependencies.
> Main motivations: 
> - a more readable and changeable CLI - easy to add or change commands
> - making it possible to add more tests 
> It would be good to know your concerns about this initiative; if there are no 
> objections, I will be happy to start the POC.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3358) POC: Refactor command line to make it more testable and easy to develop

2018-11-16 Thread Iuliia Volkova (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iuliia Volkova updated AIRFLOW-3358:

Description: 
Hi all! 

In one PR (https://github.com/apache/incubator-airflow/pull/4174) we had a 
talk with Ashb about how it would be cool to refactor the CLI to get more 
testable and readable code.

I want to prepare a POC based on one command, implemented either with Click 
(if we want to use it) or with argparse and the Command pattern, covered with 
tests, as a basis for discussing the Airflow CLI architecture.

Click already exists in Airflow's dependencies.

Main motivations: 

- a more readable and changeable CLI - easy to add or change commands
- making it possible to add more tests 

It would be good to know your concerns about this initiative; if there are no 
objections, I will be happy to start the POC.

  was:
Hi all! 

In one PR (https://github.com/apache/incubator-airflow/pull/4174) we had a 
talk with Ashb about how it would be cool to refactor the CLI to get more 
testable and readable code.

I want to prepare a POC based on one command, implemented with Click and 
covered with tests, as a basis for discussing the Airflow CLI architecture.

Click already exists in Airflow's dependencies.

Main motivations: 

- a more readable and changeable CLI - easy to add or change commands
- making it possible to add more tests 

It would be good to know your concerns about this initiative; if there are no 
objections, I will be happy to start the POC.


> POC: Refactor command line to make it  more testable and easy to develop
> 
>
> Key: AIRFLOW-3358
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3358
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: cli
>Affects Versions: 2.0.0
>Reporter: Iuliia Volkova
>Assignee: Iuliia Volkova
>Priority: Major
>
> Hi all! 
> In one PR (https://github.com/apache/incubator-airflow/pull/4174) we had a 
> talk with Ashb about how it would be cool to refactor the CLI to get more 
> testable and readable code.
> I want to prepare a POC based on one command, implemented either with Click 
> (if we want to use it) or with argparse and the Command pattern, covered with 
> tests, as a basis for discussing the Airflow CLI architecture.
> Click already exists in Airflow's dependencies.
> Main motivations: 
> - a more readable and changeable CLI - easy to add or change commands
> - making it possible to add more tests 
> It would be good to know your concerns about this initiative; if there are no 
> objections, I will be happy to start the POC.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3358) POC: Refactor command line to make it more testable and easy to develop

2018-11-16 Thread Iuliia Volkova (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iuliia Volkova updated AIRFLOW-3358:

Summary: POC: Refactor command line to make it  more testable and easy to 
develop  (was: POC: Refactor command line to use Click)

> POC: Refactor command line to make it  more testable and easy to develop
> 
>
> Key: AIRFLOW-3358
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3358
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: cli
>Affects Versions: 2.0.0
>Reporter: Iuliia Volkova
>Assignee: Iuliia Volkova
>Priority: Major
>
> Hi all! 
> In one PR (https://github.com/apache/incubator-airflow/pull/4174) we had a 
> talk with Ashb about how it would be cool to refactor the CLI to get more 
> testable and readable code.
> I want to prepare a POC based on one command, implemented with Click and 
> covered with tests, as a basis for discussing the Airflow CLI architecture.
> Click already exists in Airflow's dependencies.
> Main motivations: 
> - a more readable and changeable CLI - easy to add or change commands
> - making it possible to add more tests 
> It would be good to know your concerns about this initiative; if there are no 
> objections, I will be happy to start the POC.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments

2018-11-16 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689493#comment-16689493
 ] 

Iuliia Volkova commented on AIRFLOW-987:


[~pratap20] this issue is about the command-line arguments; you are defining 
your settings in the config file

> `airflow kerberos` ignores --keytab and --principal arguments
> -
>
> Key: AIRFLOW-987
> URL: https://issues.apache.org/jira/browse/AIRFLOW-987
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security
>Affects Versions: 1.8.0
> Environment: 1.8-rc5
>Reporter: Ruslan Dautkhanov
>Assignee: Pratap20
>Priority: Major
>  Labels: easyfix, kerberos, security
>
> No matter which arguments I pass to `airflow kerberos`, 
> it always executes as `kinit -r 3600m -k -t airflow.keytab -c 
> /tmp/airflow_krb5_ccache airflow`
> So it fails with the expected "kinit: Keytab contains no suitable keys for 
> airf...@corp.some.com while getting initial credentials"
> Tried different arguments, -kt and --keytab, here's one of the runs (some 
> lines wrapped for readability):
> {noformat}
> $ airflow kerberos -kt /home/rdautkha/.keytab rdautkha...@corp.some.com
> [2017-03-14 23:50:11,523] {__init__.py:57} INFO - Using executor LocalExecutor
> [2017-03-14 23:50:12,069] {kerberos.py:43} INFO - Reinitting kerberos from 
> keytab: 
> kinit -r 3600m -k -t airflow.keytab -c /tmp/airflow_krb5_ccache airflow
> [2017-03-14 23:50:12,080] {kerberos.py:55} ERROR -
>  Couldn't reinit from keytab! `kinit' exited with 1.
> kinit: Keytab contains no suitable keys for airf...@corp.some.com 
> while getting initial credentials
> {noformat}
> 1.8-rc5



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1945) Pass --autoscale to celery workers

2018-11-16 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689475#comment-16689475
 ] 

Iuliia Volkova commented on AIRFLOW-1945:
-

[~ashb], [~Fokko], please close the task, the PR was already merged: 
https://github.com/apache/incubator-airflow/pull/3989/files 

> Pass --autoscale to celery workers
> --
>
> Key: AIRFLOW-1945
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1945
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: celery, cli
>Reporter: Michael O.
>Assignee: Sai Phanindhra
>Priority: Trivial
>  Labels: easyfix
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Celery supports autoscaling of the worker pool size (number of tasks that can 
> parallelize within one worker node).  I'd like to propose to support passing 
> the --autoscale parameter to {{airflow worker}}.
> Since this is a trivial change, I am not sure if there's any reason it isn't 
> supported already.
> For example
> {{airflow worker --concurrency=4}} will set a fixed pool size of 4.
> With minimal changes in 
> [https://github.com/apache/incubator-airflow/blob/4ce4faaeae7a76d97defcf9a9d3304ac9d78b9bd/airflow/bin/cli.py#L855]
>  it could support
> {{airflow worker --autoscale=2,10}} to set an autoscaled pool size of 2 to 10
> Some references:
> * 
> http://docs.celeryproject.org/en/latest/internals/reference/celery.worker.autoscale.html
> * 
> https://github.com/apache/incubator-airflow/blob/4ce4faaeae7a76d97defcf9a9d3304ac9d78b9bd/airflow/bin/cli.py#L855



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-390) [AIRFLOW-Don't load example dags by default]

2018-11-16 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689486#comment-16689486
 ] 

Iuliia Volkova commented on AIRFLOW-390:


[~ashb], [~Fokko], [~bolke], any concerns about this scope? Could we set 
'load_examples = False' by default?

> [AIRFLOW-Don't load example dags by default]
> 
>
> Key: AIRFLOW-390
> URL: https://issues.apache.org/jira/browse/AIRFLOW-390
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Sunny Sun
>Priority: Trivial
>  Labels: easyfix
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Load examples should by default be set to False, so they are not 
> automatically deployed into production environments. This is especially heavy 
> because the twitter example dag requires Hive, which users may or may not use 
> in their own deployments.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-801) Outdated docstring on baseclass

2018-11-16 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689468#comment-16689468
 ] 

Iuliia Volkova commented on AIRFLOW-801:


[~jackjack10] the PR was merged and the changes are in master; this task 
should be closed [~dseisun][~ashb]

> Outdated docstring on baseclass
> ---
>
> Key: AIRFLOW-801
> URL: https://issues.apache.org/jira/browse/AIRFLOW-801
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Daniel Seisun
>Assignee: Kengo Seki
>Priority: Trivial
>
> The docstring of the BaseOperator still makes reference to it inheriting from 
> SQL Alchemy's Base class, which it no longer does. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3307) Update insecure node dependencies

2018-11-16 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689460#comment-16689460
 ] 

Iuliia Volkova commented on AIRFLOW-3307:
-

[~jmcarp], please do not forget to close the task once the PR is merged ) Thank you!

> Update insecure node dependencies
> -
>
> Key: AIRFLOW-3307
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3307
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Josh Carp
>Assignee: Josh Carp
>Priority: Trivial
>
> `npm audit` shows some node dependencies that are out of date and potentially 
> insecure. We should update them with `npm audit fix`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3306) Disable unused flask-sqlalchemy modification tracking

2018-11-16 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689458#comment-16689458
 ] 

Iuliia Volkova commented on AIRFLOW-3306:
-

[~jmcarp], please do not forget to close the task once the PR is merged ) Thank you!

> Disable unused flask-sqlalchemy modification tracking
> -
>
> Key: AIRFLOW-3306
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3306
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Josh Carp
>Assignee: Josh Carp
>Priority: Trivial
>
> By default, flask-sqlalchemy tracks model changes for its event system, which 
> adds some overhead. Since I don't think we're using the flask-sqlalchemy 
> event system, we should be able to turn off modification tracking and improve 
> performance.
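A minimal sketch of the flag in question (not the Airflow patch itself): 
flask-sqlalchemy's modification tracking is controlled by a single app config 
key.

{code}
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite://'
app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False  # skip change-event bookkeeping
db = SQLAlchemy(app)
{code}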



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1822) Add gaiohttp and gthread gunicorn workerclass option in cli

2018-11-16 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689450#comment-16689450
 ] 

Iuliia Volkova commented on AIRFLOW-1822:
-

covered in this PR: https://github.com/apache/incubator-airflow/pull/4174 

> Add gaiohttp and gthread gunicorn workerclass option in cli
> ---
>
> Key: AIRFLOW-1822
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1822
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Reporter: Sanjay Pillai
>Assignee: Iuliia Volkova
>Priority: Minor
>
> the gunicorn minimum version has been updated to 19.40; 
> we need to add cli support for the gthread and gaiohttp worker classes



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1592) Add keep-alive argument supported by gunicorn backend to the airflow configuration

2018-11-16 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689449#comment-16689449
 ] 

Iuliia Volkova commented on AIRFLOW-1592:
-

covered in this PR: https://github.com/apache/incubator-airflow/pull/4174 

> Add keep-alive argument supported by gunicorn backend to the airflow 
> configuration
> --
>
> Key: AIRFLOW-1592
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1592
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Demian Ginther
>Assignee: Iuliia Volkova
>Priority: Minor
>
> The --keep-alive option is necessary for gunicorn to function properly with 
> AWS ELBs, as gunicorn appears to have an issue with the ELB timeouts as set 
> by default.
> In addition, it makes no sense to provide a wrapper for a program but not 
> allow all configuration options to be set.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-251) Add optional parameter SQL_ALCHEMY_SCHEMA to control schema for metadata repository

2018-11-16 Thread Iuliia Volkova (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iuliia Volkova reassigned AIRFLOW-251:
--

Assignee: Iuliia Volkova

> Add optional parameter SQL_ALCHEMY_SCHEMA to control schema for metadata 
> repository
> ---
>
> Key: AIRFLOW-251
> URL: https://issues.apache.org/jira/browse/AIRFLOW-251
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.0.0
>Reporter: Ed Parcell
>Assignee: Iuliia Volkova
>Priority: Minor
>
> Using SQL Server as a database for metadata, it is preferable to group all 
> Airflow tables into a separate schema, rather than using dbo. I propose 
> adding an optional parameter SQL_ALCHEMY_SCHEMA to control this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3358) POC: Refactor command line to use Click

2018-11-16 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689297#comment-16689297
 ] 

Iuliia Volkova commented on AIRFLOW-3358:
-

[~sanand], [~ashb], [~bolke], [~Fokko], [~kaxilnaik], hi guys, sorry for 
pinging you, but it would be cool to get your comments, or maybe I need to 
ping somebody else from the maintainers team. Thanks in advance! 

> POC: Refactor command line to use Click
> ---
>
> Key: AIRFLOW-3358
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3358
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: cli
>Affects Versions: 2.0.0
>Reporter: Iuliia Volkova
>Assignee: Iuliia Volkova
>Priority: Major
>
> Hi all! 
> In one PR (https://github.com/apache/incubator-airflow/pull/4174) we had a 
> talk with Ashb about how it would be cool to refactor the CLI to get more 
> testable and readable code.
> I want to prepare a POC based on one command, implemented with Click and 
> covered with tests, as a basis for discussing the Airflow CLI architecture.
> Click already exists in Airflow's dependencies.
> Main motivations: 
> - a more readable and changeable CLI - easy to add or change commands
> - making it possible to add more tests 
> It would be good to know your concerns about this initiative; if there are no 
> objections, I will be happy to start the POC.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3358) POC: Refactor command line to use Click

2018-11-16 Thread Iuliia Volkova (JIRA)
Iuliia Volkova created AIRFLOW-3358:
---

 Summary: POC: Refactor command line to use Click
 Key: AIRFLOW-3358
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3358
 Project: Apache Airflow
  Issue Type: Improvement
  Components: cli
Affects Versions: 2.0.0
Reporter: Iuliia Volkova
Assignee: Iuliia Volkova


Hi all! 

In one PR (https://github.com/apache/incubator-airflow/pull/4174) we had a 
talk with Ashb about how it would be cool to refactor the CLI to get more 
testable and readable code.

I want to prepare a POC based on one command, implemented with Click and 
covered with tests, as a basis for discussing the Airflow CLI architecture.

Click already exists in Airflow's dependencies.

Main motivations: 

- a more readable and changeable CLI - easy to add or change commands
- making it possible to add more tests 

It would be good to know your concerns about this initiative; if there are no 
objections, I will be happy to start the POC.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-3353) redis-py 3.0.0 dependency breaks celery executor

2018-11-15 Thread Iuliia Volkova (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iuliia Volkova reassigned AIRFLOW-3353:
---

Assignee: Iuliia Volkova

> redis-py 3.0.0 dependency breaks celery executor
> 
>
> Key: AIRFLOW-3353
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3353
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: celery
>Affects Versions: 1.10.0
>Reporter: Stefan Seelmann
>Assignee: Iuliia Volkova
>Priority: Major
>
> redis-py 3.0.0 was just released. Airflow 1.10.0 defines redis>=2.10.5, so it 
> now installs redis-py 3.0.0.
> Error in worker below.
> Workaround: Pin redis==2.10.6 (e.g. in constraints.txt)
> {code}
> [2018-11-15 12:06:18,441: CRITICAL/MainProcess] Unrecoverable error: 
> AttributeError("'float' object has no attribute 'items'",)
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/celery/worker/worker.py", line 
> 205, in start
> self.blueprint.start(self)
>   File "/usr/local/lib/python3.6/site-packages/celery/bootsteps.py", line 
> 119, in start
> step.start(parent)
>   File "/usr/local/lib/python3.6/site-packages/celery/bootsteps.py", line 
> 369, in start
> return self.obj.start()
>   File 
> "/usr/local/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", 
> line 317, in start
> blueprint.start(self)
>   File "/usr/local/lib/python3.6/site-packages/celery/bootsteps.py", line 
> 119, in start
> step.start(parent)
>   File 
> "/usr/local/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", 
> line 593, in start
> c.loop(*c.loop_args())
>   File "/usr/local/lib/python3.6/site-packages/celery/worker/loops.py", line 
> 91, in asynloop
> next(loop)
>   File "/usr/local/lib/python3.6/site-packages/kombu/asynchronous/hub.py", 
> line 354, in create_loop
> cb(*cbargs)
>   File "/usr/local/lib/python3.6/site-packages/kombu/transport/redis.py", 
> line 1040, in on_readable
> self.cycle.on_readable(fileno)
>   File "/usr/local/lib/python3.6/site-packages/kombu/transport/redis.py", 
> line 337, in on_readable
> chan.handlers[type]()
>   File "/usr/local/lib/python3.6/site-packages/kombu/transport/redis.py", 
> line 724, in _brpop_read
> self.connection._deliver(loads(bytes_to_str(item)), dest)
>   File 
> "/usr/local/lib/python3.6/site-packages/kombu/transport/virtual/base.py", 
> line 983, in _deliver
> callback(message)
>   File 
> "/usr/local/lib/python3.6/site-packages/kombu/transport/virtual/base.py", 
> line 632, in _callback
> self.qos.append(message, message.delivery_tag)
>   File "/usr/local/lib/python3.6/site-packages/kombu/transport/redis.py", 
> line 149, in append
> pipe.zadd(self.unacked_index_key, time(), delivery_tag) \
>   File "/usr/local/lib/python3.6/site-packages/redis/client.py", line 2263, 
> in zadd
> for pair in iteritems(mapping):
>   File "/usr/local/lib/python3.6/site-packages/redis/_compat.py", line 123, 
> in iteritems
> return iter(x.items())
> AttributeError: 'float' object has no attribute 'items'
> {code}
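A hedged illustration of the API change behind the traceback (not Airflow or 
kombu code): redis-py 3.0 changed zadd() to take a {member: score} mapping, so 
kombu's 2.x-style positional call passes a float where a dict is expected.

{code}
from time import time
import redis  # assumes a local redis server for the example

r = redis.Redis()
delivery_tag = 'some-delivery-tag'

# redis-py 2.x style (what kombu calls here) -- breaks on redis-py 3.0:
# r.zadd('unacked_index', time(), delivery_tag)

# redis-py 3.0 style -- the score is passed in a mapping:
r.zadd('unacked_index', {delivery_tag: time()})
{code}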



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1419) Trigger Rule not respected downstream of BranchPythonOperator

2018-11-15 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688037#comment-16688037
 ] 

Iuliia Volkova commented on AIRFLOW-1419:
-

[~conradlee], without a dummy task you don't have a branch; you just have the 
task confluence_op, which depends on branch_op. That means confluence_op is a 
branch by itself: it does not depend on some branch, it *is* a branch. Airflow 
has no edges about which you can say "oh, this edge is a branch without a 
task". 

In your case, in your picture, confluence_op is a branch of the branch 
operator that will never be returned by the branch operator, and it also 
depends on the result of another branch. 
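A hedged sketch of that point (the skip_op task name is illustrative, not from 
the ticket): giving the "do nothing" path its own dummy task gives the 
BranchPythonOperator a real branch to choose, so the skip logic has a task to 
act on.

{code}
import airflow
from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import BranchPythonOperator
from airflow.utils.trigger_rule import TriggerRule

dag = DAG(
    dag_id='branch_with_dummy_edge',
    default_args={'owner': 'airflow',
                  'start_date': airflow.utils.dates.days_ago(2)},
    schedule_interval='@daily')

branch_op = BranchPythonOperator(
    task_id='branch_op',
    python_callable=lambda: 'work_op',  # could also return 'skip_op'
    dag=dag)
work_op = DummyOperator(task_id='work_op', dag=dag)
skip_op = DummyOperator(task_id='skip_op', dag=dag)  # the explicit "no work" branch
confluence_op = DummyOperator(
    task_id='confluence_op',
    trigger_rule=TriggerRule.ALL_DONE,
    dag=dag)

branch_op.set_downstream(work_op)
branch_op.set_downstream(skip_op)
work_op.set_downstream(confluence_op)
skip_op.set_downstream(confluence_op)
{code}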

> Trigger Rule not respected downstream of BranchPythonOperator
> -
>
> Key: AIRFLOW-1419
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1419
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.8.2
>Reporter: Conrad Lee
>Priority: Major
>
> Lets consider the following DAG:
> {noformat}
>  branch_op ----> confluence_op
>       \__ work_op __/
> {noformat}
> This is implemented in the following code:
> {code:java}
> import airflow
> from airflow.operators.python_operator import BranchPythonOperator
> from airflow.operators.dummy_operator import DummyOperator
> from airflow.utils.trigger_rule import TriggerRule
> from airflow.models import DAG
> args = {
> 'owner': 'airflow',
> 'start_date': airflow.utils.dates.days_ago(2)
> }
> dag = DAG(
> dag_id='branch_skip_problem',
> default_args=args,
> schedule_interval="@daily")
> branch_op = BranchPythonOperator(
> task_id='branch_op',
> python_callable=lambda: 'work_op',
> dag=dag)
> work_op = DummyOperator(task_id='work_op', dag=dag)
> confluence_op = DummyOperator(task_id='confluence_op', dag=dag, 
> trigger_rule=TriggerRule.ALL_DONE)
> branch_op.set_downstream(confluence_op)
> branch_op.set_downstream(work_op)
> work_op.set_downstream(confluence_op)
> {code}
> Note that branch_op is a BranchPythonOperator, work_op and confluence_op are 
> DummyOperators, and that confluence_op has its trigger_rule set to ALL_DONE.
> In dag runs where branch_op chooses to activate work_op as its child, 
> confluence_op never runs. This doesn't seem right, because confluence_op has 
> two parents and a trigger_rule set that it'll run as soon as all of its 
> parents are done (whether or not they are skipped).
> I know this example seems contrived and that in practice there are better 
> ways of conditionally executing work_op. However, this is the minimal code to 
> illustrate the problem. You can imagine that this problem might actually 
> creep up in practice where originally there was a good reason to use the 
> BranchPythonOperator, and then time passes and someone modifies one of the 
> branches so that it doesn't really contain any children anymore, thus 
> resembling the example.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3182) 'all_done' trigger rule works incorrectly with BranchPythonOperator upstream tasks

2018-11-15 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687993#comment-16687993
 ] 

Iuliia Volkova commented on AIRFLOW-3182:
-

[~Zeckt], I see your case: after your hourly tasks you need one more branch 
task that checks whether the hour is already 23 or not. All the hourly tasks 
need to be upstream of this branch task, and your aggregation task should be 
downstream of it; see the sketch below. 
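
For illustration, a minimal sketch of that extra branch task, assuming the hourly tasks (hourly_tasks), run_aggregation, and dag from your code already exist — the task names here are my own:

{code:java}
from datetime import datetime
from airflow.operators.python_operator import BranchPythonOperator
from airflow.operators.dummy_operator import DummyOperator
from airflow.utils.trigger_rule import TriggerRule

def check_if_last_hour():
    # route to the aggregation only after the hour-23 run
    if datetime.now().hour == 23:
        return 'daily_aggregation'
    return 'no_aggregation'

is_last_hour = BranchPythonOperator(
    task_id='is_last_hour',
    python_callable=check_if_last_hour,
    trigger_rule=TriggerRule.ALL_DONE,  # fire even if an hourly task failed
    dag=dag)

no_aggregation = DummyOperator(task_id='no_aggregation', dag=dag)

for hourly_task in hourly_tasks:  # every task_for_hour-* goes upstream
    hourly_task.set_downstream(is_last_hour)
is_last_hour.set_downstream([run_aggregation, no_aggregation])
{code}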

> 'all_done' trigger rule works incorrectly with BranchPythonOperator upstream 
> tasks
> --
>
> Key: AIRFLOW-3182
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3182
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Greg H
>Priority: Major
> Attachments: BrannchPythonOperator.png, Screen Shot 2018-11-15 at 
> 13.51.07.png
>
>
> We have a job that runs some data processing every hour. At the end of the 
> day we need to run aggregation on all data generated by the 'hourly' jobs, 
> regardless of whether any 'hourly' job failed. For this purpose we have 
> prepared a DAG that uses BranchPythonOperator to decide which 'hourly' job 
> needs to run at a given time; when the task for hour 23 is done, we trigger 
> the aggregation (downstream). For this to work regardless of the last 
> 'hourly' task's status, the *'all_done'* trigger rule is set on the 
> aggregation task. Unfortunately, such a configuration works incorrectly, 
> causing the aggregation task to run after every 'hourly' task, despite the 
> fact that the aggregation task is set as downstream of 'task_for_hour-23' +only+:
>   !BrannchPythonOperator.png!
> Here's sample code:
> {code:java}
> # coding: utf-8
> from airflow import DAG
> from airflow.operators.python_operator import PythonOperator
> from airflow.operators.python_operator import BranchPythonOperator
> from airflow.operators.dummy_operator import DummyOperator
> from airflow.models import TriggerRule
> from datetime import datetime
> import logging
> dag_id = 'test'
> today = datetime.today().strftime("%Y-%m-%d");
> task_prefix = 'task_for_hour-'
> default_args = {
> 'owner': 'airflow',
> 'depends_on_past': False,
> 'start_date': datetime(2018, 6, 18),
> 'catchup': False,
> }
> dag = DAG(
> dag_id=dag_id,
> default_args=default_args,
> schedule_interval="@hourly",
> catchup=False
> )
> # Setting the current hour
> def get_current_hour():
> return datetime.now().hour
> # Returns the name id of the task to launch next (task_for_hour-0, 
> task_for_hour-1, etc.)
> def branch():
> return task_prefix + str(get_current_hour())
> # Running hourly job
> def run_hourly_job(**kwargs):
> current_hour = get_current_hour()
> logging.info("Running job for hour: %s" % current_hour)
> # Main daily aggregation
> def run_daily_aggregation(**kwargs):
> logging.info("Running daily aggregation for %s" % today)
> 
> start_task = DummyOperator(
> task_id='start',
> dag=dag
> )
> # 'branch' method returns name of the task to be run next.
> hour_branching = BranchPythonOperator(
> task_id='hour_branching',
> python_callable=branch,
> dag=dag)
> run_aggregation = PythonOperator(
> task_id='daily_aggregation',
> python_callable=run_daily_aggregation,
> provide_context=True,
> trigger_rule=TriggerRule.ALL_DONE,
> dag=dag
> )
> start_task.set_downstream(hour_branching)
> # Create tasks for each hour
> for hour in range(24):
> if hour == 23:
> task_for_hour_23 = PythonOperator(
> task_id=task_prefix + '23',
> python_callable=run_hourly_job,
> provide_context=True,
> dag=dag
> )
> hour_branching.set_downstream(task_for_hour_23)
> task_for_hour_23.set_downstream(run_aggregation)
> else:
> hour_branching.set_downstream(PythonOperator(
> task_id=task_prefix + str(hour),
> python_callable=run_hourly_job,
> provide_context=True,
> dag=dag)
> )
> {code}
> This may also be related to AIRFLOW-1419



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AIRFLOW-1419) Trigger Rule not respected downstream of BranchPythonOperator

2018-11-15 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687840#comment-16687840
 ] 

Iuliia Volkova edited comment on AIRFLOW-1419 at 11/15/18 11:05 AM:


[~conradlee], [~ashb], please close this task: the answer to this question is 
in the documentation, which gives a pretty clear description of the 
BranchOperator's behavior and answers this ticket: 
https://airflow.apache.org/concepts.html?highlight=branch%20operator#branching 
Also, the example in the ticket body is incorrect: the lambda returns 
'right_branch_op1', a task_id that does not exist in the code.


was (Author: xnuinside):
[~conradlee], [~ashb], please close this task, because answer on this question 
exist in documentation, there is a pretty clear description of BranchOperator' 
behavior and answer for this ticket: read documentation 
https://airflow.apache.org/concepts.html?highlight=branch%20operator#branching 
also an example in body incorrect lambda: 'right_branch_op1' - such task_id 
does not exist in the code

> Trigger Rule not respected downstream of BranchPythonOperator
> -
>
> Key: AIRFLOW-1419
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1419
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.8.2
>Reporter: Conrad Lee
>Priority: Major
>
> Let's consider the following DAG:
> {noformat}
> branch_op ----------------------> confluence_op
>      \------> work_op ---------------/
> {noformat}
> This is implemented in the following code:
> {code}
> import airflow
> from airflow.operators.python_operator import BranchPythonOperator
> from airflow.operators.dummy_operator import DummyOperator
> from airflow.utils.trigger_rule import TriggerRule
> from airflow.models import DAG
> args = {
> 'owner': 'airflow',
> 'start_date': airflow.utils.dates.days_ago(2)
> }
> dag = DAG(
> dag_id='branch_skip_problem',
> default_args=args,
> schedule_interval="@daily")
> branch_op = BranchPythonOperator(
> task_id='branch_op',
> python_callable=lambda: 'right_branch_op1',
> dag=dag)
> work_op = DummyOperator(task_id='work_op', dag=dag)
> confluence_op = DummyOperator(task_id='confluence_op', dag=dag, 
> trigger_rule=TriggerRule.ALL_DONE)
> branch_op.set_downstream(confluence_op)
> branch_op.set_downstream(work_op)
> work_op.set_downstream(confluence_op)
> {code}
> Note that branch_op is a BranchPythonOperator, work_op and confluence_op are 
> DummyOperators, and that confluence_op has its trigger_rule set to ALL_DONE.
> In dag runs where branch_op chooses to activate work_op as its child, 
> confluence_op never runs.  This doesn't seem right, because confluence_op has 
> two parents and a trigger_rule set that it'll run as soon as all of its 
> parents are done (whether or not they are skipped).
> I know this example seems contrived and that in practice there are better 
> ways of conditionally executing work_op.  However, this is the minimal code 
> to illustrate the problem.  You can imagine that this problem might actually 
> creep up in practice where originally there was a good reason to use the 
> BranchPythonOperator, and then time passes and someone modifies one of the 
> branches so that it doesn't really contain any children anymore, thus 
> resembling the example.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AIRFLOW-1419) Trigger Rule not respected downstream of BranchPythonOperator

2018-11-15 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687840#comment-16687840
 ] 

Iuliia Volkova edited comment on AIRFLOW-1419 at 11/15/18 11:05 AM:


[~conradlee], [~ashb], please close this task: the answer to this question is 
in the documentation, which gives a pretty clear description of the 
BranchOperator's behavior and answers this ticket: 
https://airflow.apache.org/concepts.html?highlight=branch%20operator#branching 
Also, the example in the ticket body is incorrect: the lambda returns 
'right_branch_op1', a task_id that does not exist in the code.


was (Author: xnuinside):
[~conradlee], [~ashb], please close this task, because answer on this question 
exist in documentation, there pretty clear described behavior of BranchOperator 
and answer for this ticket - please, read documentation 
https://airflow.apache.org/concepts.html?highlight=branch%20operator#branching 
also example in body incorrect lambda: 'right_branch_op1' - such task not exist 
in code

> Trigger Rule not respected downstream of BranchPythonOperator
> -
>
> Key: AIRFLOW-1419
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1419
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.8.2
>Reporter: Conrad Lee
>Priority: Major
>
> Let's consider the following DAG:
> {noformat}
> branch_op ----------------------> confluence_op
>      \------> work_op ---------------/
> {noformat}
> This is implemented in the following code:
> {code}
> import airflow
> from airflow.operators.python_operator import BranchPythonOperator
> from airflow.operators.dummy_operator import DummyOperator
> from airflow.utils.trigger_rule import TriggerRule
> from airflow.models import DAG
> args = {
> 'owner': 'airflow',
> 'start_date': airflow.utils.dates.days_ago(2)
> }
> dag = DAG(
> dag_id='branch_skip_problem',
> default_args=args,
> schedule_interval="@daily")
> branch_op = BranchPythonOperator(
> task_id='branch_op',
> python_callable=lambda: 'right_branch_op1',
> dag=dag)
> work_op = DummyOperator(task_id='work_op', dag=dag)
> confluence_op = DummyOperator(task_id='confluence_op', dag=dag, 
> trigger_rule=TriggerRule.ALL_DONE)
> branch_op.set_downstream(confluence_op)
> branch_op.set_downstream(work_op)
> work_op.set_downstream(confluence_op)
> {code}
> Note that branch_op is a BranchPythonOperator, work_op and confluence_op are 
> DummyOperators, and that confluence_op has its trigger_rule set to ALL_DONE.
> In dag runs where branch_op chooses to activate work_op as its child, 
> confluence_op never runs.  This doesn't seem right, because confluence_op has 
> two parents and a trigger_rule set that it'll run as soon as all of its 
> parents are done (whether or not they are skipped).
> I know this example seems contrived and that in practice there are better 
> ways of conditionally executing work_op.  However, this is the minimal code 
> to illustrate the problem.  You can imagine that this problem might actually 
> creep up in practice where originally there was a good reason to use the 
> BranchPythonOperator, and then time passes and someone modifies one of the 
> branches so that it doesn't really contain any children anymore, thus 
> resembling the example.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3182) 'all_done' trigger rule works incorrectly with BranchPythonOperator upstream tasks

2018-11-15 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687820#comment-16687820
 ] 

Iuliia Volkova commented on AIRFLOW-3182:
-

[~Zeckt], your error is not related to the BranchOperator; it comes from a 
misunderstanding of what branching is. You don't need a branch here:
 # Returns the name id of the task to launch next (task_for_hour-0, 
task_for_hour-1, etc.)
 def branch(): return task_prefix + str(get_current_hour())

A BranchPythonOperator is used when you have a condition (as described in the 
docs: 
https://airflow.apache.org/concepts.html?highlight=branch%20operator#branching) 
and you need to define what happens next based on that condition. A simple 
example:
{code:java}
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator, BranchPythonOperator
from airflow.operators.dummy_operator import DummyOperator
from airflow.utils.trigger_rule import TriggerRule

with DAG(dag_id="branch_behavior_operator", start_date=datetime(2018, 11, 15),
 schedule_interval=None) as dag:

def branch_check():
if True:
return 'success_way'
else:
return 'dummy_task'


t1 = BranchPythonOperator(task_id='check_condition', python_callable=branch_check)

def print_hello():
print('Hello!')

t2_0 = DummyOperator(task_id='success_way')
t2 = PythonOperator(task_id='print_task', python_callable=print_hello)
t2_1 = DummyOperator(task_id='i_need_to_be_success_too')

t3 = DummyOperator(task_id='dummy_task')

t1.set_downstream([t3, t2_0])

t2_0.set_downstream([t2, t2_1])

t4 = DummyOperator(task_id='final_task', trigger_rule=TriggerRule.ALL_SUCCESS)

t4.set_upstream([t2_1, t2])

{code}
The result of this DAG will be: 

!Screen Shot 2018-11-15 at 13.51.07.png|height=250!

In your case you just need to use upstream and downstream relationships 
without a BranchOperator, as is done in my example: t2_0.set_downstream([t2, t2_1]) 

> 'all_done' trigger rule works incorrectly with BranchPythonOperator upstream 
> tasks
> --
>
> Key: AIRFLOW-3182
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3182
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Greg H
>Priority: Major
> Attachments: BrannchPythonOperator.png, Screen Shot 2018-11-15 at 
> 13.51.07.png
>
>
> We have a job that runs some data processing every hour. At the end of the 
> day we need to run aggregation on all data generated by the 'hourly' jobs, 
> regardless of whether any 'hourly' job failed. For this purpose we have 
> prepared a DAG that uses BranchPythonOperator to decide which 'hourly' job 
> needs to run at a given time; when the task for hour 23 is done, we trigger 
> the aggregation (downstream). For this to work regardless of the last 
> 'hourly' task's status, the *'all_done'* trigger rule is set on the 
> aggregation task. Unfortunately, such a configuration works incorrectly, 
> causing the aggregation task to run after every 'hourly' task, despite the 
> fact that the aggregation task is set as downstream of 'task_for_hour-23' +only+:
>   !BrannchPythonOperator.png!
> Here's sample code:
> {code:java}
> # coding: utf-8
> from airflow import DAG
> from airflow.operators.python_operator import PythonOperator
> from airflow.operators.python_operator import BranchPythonOperator
> from airflow.operators.dummy_operator import DummyOperator
> from airflow.models import TriggerRule
> from datetime import datetime
> import logging
> dag_id = 'test'
> today = datetime.today().strftime("%Y-%m-%d");
> task_prefix = 'task_for_hour-'
> default_args = {
> 'owner': 'airflow',
> 'depends_on_past': False,
> 'start_date': datetime(2018, 6, 18),
> 'catchup': False,
> }
> dag = DAG(
> dag_id=dag_id,
> default_args=default_args,
> schedule_interval="@hourly",
> catchup=False
> )
> # Setting the current hour
> def get_current_hour():
> return datetime.now().hour
> # Returns the name id of the task to launch next (task_for_hour-0, 
> task_for_hour-1, etc.)
> def branch():
> return task_prefix + str(get_current_hour())
> # Running hourly job
> def run_hourly_job(**kwargs):
> current_hour = get_current_hour()
> logging.info("Running job for hour: %s" % current_hour)
> # Main daily aggregation
> def run_daily_aggregation(**kwargs):
> logging.info("Running daily aggregation for %s" % today)
> 
> start_task = DummyOperator(
> task_id='start',
> dag=dag
> )
> # 'branch' method returns name of the task to be run next.
> hour_branching = BranchPythonOperator(
> task_id='hour_branching',
> python_callable=branch,
> dag=dag)
> run_aggregation = PythonOperator(
> task_id='daily_aggregation',
> 

[jira] [Updated] (AIRFLOW-3182) 'all_done' trigger rule works incorrectly with BranchPythonOperator upstream tasks

2018-11-15 Thread Iuliia Volkova (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iuliia Volkova updated AIRFLOW-3182:

Attachment: Screen Shot 2018-11-15 at 13.51.07.png

> 'all_done' trigger rule works incorrectly with BranchPythonOperator upstream 
> tasks
> --
>
> Key: AIRFLOW-3182
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3182
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Greg H
>Priority: Major
> Attachments: BrannchPythonOperator.png, Screen Shot 2018-11-15 at 
> 13.51.07.png
>
>
> We have a job that runs some data processing every hour. At the end of the 
> day we need to run aggregation on all data generated by the 'hourly' jobs, 
> regardless of whether any 'hourly' job failed. For this purpose we have 
> prepared a DAG that uses BranchPythonOperator to decide which 'hourly' job 
> needs to run at a given time; when the task for hour 23 is done, we trigger 
> the aggregation (downstream). For this to work regardless of the last 
> 'hourly' task's status, the *'all_done'* trigger rule is set on the 
> aggregation task. Unfortunately, such a configuration works incorrectly, 
> causing the aggregation task to run after every 'hourly' task, despite the 
> fact that the aggregation task is set as downstream of 'task_for_hour-23' +only+:
>   !BrannchPythonOperator.png!
> Here's sample code:
> {code:java}
> # coding: utf-8
> from airflow import DAG
> from airflow.operators.python_operator import PythonOperator
> from airflow.operators.python_operator import BranchPythonOperator
> from airflow.operators.dummy_operator import DummyOperator
> from airflow.models import TriggerRule
> from datetime import datetime
> import logging
> dag_id = 'test'
> today = datetime.today().strftime("%Y-%m-%d");
> task_prefix = 'task_for_hour-'
> default_args = {
> 'owner': 'airflow',
> 'depends_on_past': False,
> 'start_date': datetime(2018, 6, 18),
> 'catchup': False,
> }
> dag = DAG(
> dag_id=dag_id,
> default_args=default_args,
> schedule_interval="@hourly",
> catchup=False
> )
> # Setting the current hour
> def get_current_hour():
> return datetime.now().hour
> # Returns the name id of the task to launch next (task_for_hour-0, 
> task_for_hour-1, etc.)
> def branch():
> return task_prefix + str(get_current_hour())
> # Running hourly job
> def run_hourly_job(**kwargs):
> current_hour = get_current_hour()
> logging.info("Running job for hour: %s" % current_hour)
> # Main daily aggregation
> def run_daily_aggregation(**kwargs):
> logging.info("Running daily aggregation for %s" % today)
> 
> start_task = DummyOperator(
> task_id='start',
> dag=dag
> )
> # 'branch' method returns name of the task to be run next.
> hour_branching = BranchPythonOperator(
> task_id='hour_branching',
> python_callable=branch,
> dag=dag)
> run_aggregation = PythonOperator(
> task_id='daily_aggregation',
> python_callable=run_daily_aggregation,
> provide_context=True,
> trigger_rule=TriggerRule.ALL_DONE,
> dag=dag
> )
> start_task.set_downstream(hour_branching)
> # Create tasks for each hour
> for hour in range(24):
> if hour == 23:
> task_for_hour_23 = PythonOperator(
> task_id=task_prefix + '23',
> python_callable=run_hourly_job,
> provide_context=True,
> dag=dag
> )
> hour_branching.set_downstream(task_for_hour_23)
> task_for_hour_23.set_downstream(run_aggregation)
> else:
> hour_branching.set_downstream(PythonOperator(
> task_id=task_prefix + str(hour),
> python_callable=run_hourly_job,
> provide_context=True,
> dag=dag)
> )
> {code}
> This may also be related to AIRFLOW-1419



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1306) Create Dag in a class and in a separate module

2018-11-11 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682876#comment-16682876
 ] 

Iuliia Volkova commented on AIRFLOW-1306:
-

[~MouradKarim], why do you think it's a bug? Did you read the docs? 
From the documentation: 
Note 
When searching for DAGs, Airflow will only consider files where the strings 
“airflow” and “DAG” both appear in the contents of the .py file. 
https://airflow.apache.org/concepts.html 
It's not a bug; it's documented behavior. [~ashb]
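
For illustration, a minimal sketch of why adding such a comment makes discovery work — the module and factory names here are hypothetical:

{code:java}
# my_dags.py -- a DAG built indirectly through a factory class.
# The DagBag only parses .py files whose contents contain both the string
# "airflow" and the string "DAG", so this comment alone makes the file
# eligible for scanning.
from my_project.dag_factory import DagFactory  # hypothetical factory module

dag = DagFactory().create_dag()  # a module-level name lets Airflow pick it up
{code}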

> Create Dag in a class and in a separate module 
> ---
>
> Key: AIRFLOW-1306
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1306
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG
>Affects Versions: 2.0.0
>Reporter: Mourad Benabdelkerim
>Priority: Major
>
> I created a dag in a separate class and in a separate Python module.
> In my main module, I instantiate the class and call a method which creates and 
> returns the dag.
> So when I run *airflow list_dags*, it won't return the dag.
> BUT when I added a comment that contains the word "DAG", the *airflow 
> list_dags* command returns the dag.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1700) Fix airflow cli connections command

2018-11-11 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682871#comment-16682871
 ] 

Iuliia Volkova commented on AIRFLOW-1700:
-

[~ashb], the duplicate task has been resolved: 
https://issues.apache.org/jira/browse/AIRFLOW-1330. It seems this task should 
be closed.
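
For context, the scheme limitation described below is standard urlparse behavior and easy to reproduce:

{code:java}
from urllib.parse import urlparse  # urlparse.urlparse on Python 2

# '_' is not a valid URL scheme character (RFC 3986), so urlparse cannot
# treat "google_cloud_platform" or "hive_cli" as a scheme:
print(urlparse('postgres://user@host:5432/db').scheme)  # 'postgres'
print(urlparse('google_cloud_platform://x/y').scheme)   # ''
{code}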

> Fix airflow cli connections command 
> 
>
> Key: AIRFLOW-1700
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1700
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli
>Affects Versions: 2.0.0
>Reporter: Feng Lu
>Assignee: Feng Lu
>Priority: Major
> Fix For: 1.10.0
>
>
> When creating a new connection via airflow cli, the connection type is 
> inferred from the conn-uri argument (i.e., conn_type = url scheme). However, 
> for connection types like "hive_cli" and "google_cloud_platform", urlparse 
> (by design) was unable to get the connection type as '_' is not a valid 
> scheme character. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2442) Airflow run command leaves database connections open

2018-11-11 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682866#comment-16682866
 ] 

Iuliia Volkova commented on AIRFLOW-2442:
-

[~ashb], it seems this issue should be closed; the fix was merged: 
https://github.com/apache/incubator-airflow/commit/250faad0f557bb8deac8bd0b948112bcaf48004a

> Airflow run command leaves database connections open
> 
>
> Key: AIRFLOW-2442
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2442
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli
>Affects Versions: 1.8.0
>Reporter: Alejandro Fernandez
>Assignee: Alejandro Fernandez
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: connection_duration_1_hour.png, db_connections.png, 
> fixed_before_and_after.jpg, monthly_db_connections.png, running_tasks.png
>
>
> *Summary*
> The "airflow run" command creates a connection to the database and leaves it 
> open (until killed by SQLAlchemy later). The number of these connections can 
> skyrocket whenever hundreds/thousands of tasks are launched simultaneously, 
> and potentially hit the database connection limit.
> The problem is that in cli.py, the run() method first calls 
> {code:java}
> settings.configure_orm(disable_connection_pool=True){code}
> correctly
>  to use a NullPool, but then parses any custom configs and again calls
> {code:java}
> settings.configure_orm(){code}
> , thereby overriding the desired behavior by instead using a QueuePool.
>  The QueuePool uses the default configs for SQL_ALCHEMY_POOL_SIZE and 
> SQL_ALCHEMY_POOL_RECYCLE. This means that while the task is running and the 
> executor is sending heartbeats, the sleeping connection is idle until it is 
> killed by SQLAlchemy.
> This fixes a bug introduced by 
> [https://github.com/apache/incubator-airflow/pull/1934] in 
> [https://github.com/apache/incubator-airflow/pull/1934/commits/b380013634b02bb4c1b9d1cc587ccd12383820b6#diff-1c2404a3a60f829127232842250ff406R344]
>   
> which is present in branches 1-8-stable, 1-9-stable, and 1-10-test
> NOTE: Will create a PR once I've done more testing since I'm on an older 
> branch. For now, attaching a patch file [^AIRFLOW-2442.patch]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments

2018-11-11 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682847#comment-16682847
 ] 

Iuliia Volkova commented on AIRFLOW-987:


[~jackjack10], [~bolke] yes, it's still an issue, because the command-line 
args are not passed to kerberos.run. 

Look at 
https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L1333
 and 
https://github.com/apache/incubator-airflow/blob/master/airflow/security/kerberos.py#L113
 

I can take this issue and cover it with tests, but first I need a review of 
this PR: https://github.com/apache/incubator-airflow/pull/4174 , which is also 
related to the CLI. If it is OK, I will refactor the 'kerberos' part in the 
same way and add tests for it. 
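
For illustration, a minimal sketch of the missing wiring — the names follow the CLI flags, but this is an assumption, not the actual Airflow code:

{code:java}
# cli.py (sketch): fall back to airflow.cfg only when the flags are absent
from airflow import configuration as conf
from airflow.security import kerberos as krb

def kerberos(args):
    principal = args.principal or conf.get('kerberos', 'principal')
    keytab = args.keytab or conf.get('kerberos', 'keytab')
    # kerberos.run() would need to accept these instead of re-reading the config
    krb.run(principal=principal, keytab=keytab)
{code}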

> `airflow kerberos` ignores --keytab and --principal arguments
> -
>
> Key: AIRFLOW-987
> URL: https://issues.apache.org/jira/browse/AIRFLOW-987
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security
>Affects Versions: 1.8.0
> Environment: 1.8-rc5
>Reporter: Ruslan Dautkhanov
>Assignee: Bolke de Bruin
>Priority: Major
>  Labels: easyfix, kerberos, security
>
> No matter which arguments I pass to `airflow kerberos`, 
> it always executes as `kinit -r 3600m -k -t airflow.keytab -c 
> /tmp/airflow_krb5_ccache airflow`
> So it fails with the expected "kinit: Keytab contains no suitable keys for 
> airf...@corp.some.com while getting initial credentials"
> Tried different arguments, -kt and --keytab, here's one of the runs (some 
> lines wrapped for readability):
> {noformat}
> $ airflow kerberos -kt /home/rdautkha/.keytab rdautkha...@corp.some.com
> [2017-03-14 23:50:11,523] {__init__.py:57} INFO - Using executor LocalExecutor
> [2017-03-14 23:50:12,069] {kerberos.py:43} INFO - Reinitting kerberos from 
> keytab: 
> kinit -r 3600m -k -t airflow.keytab -c /tmp/airflow_krb5_ccache airflow
> [2017-03-14 23:50:12,080] {kerberos.py:55} ERROR -
>  Couldn't reinit from keytab! `kinit' exited with 1.
> kinit: Keytab contains no suitable keys for airf...@corp.some.com 
> while getting initial credentials
> {noformat}
> 1.8-rc5



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AIRFLOW-987) `airflow kerberos` ignores --keytab and --principal arguments

2018-11-11 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682847#comment-16682847
 ] 

Iuliia Volkova edited comment on AIRFLOW-987 at 11/11/18 11:32 AM:
---

[~jackjack10], [~bolke] yes, it's still an issue, because the command-line 
args are not passed to kerberos.run. 

Look at 
https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L1333
 and 
https://github.com/apache/incubator-airflow/blob/master/airflow/security/kerberos.py#L113
 

I can take this issue and cover it with tests, but first I need a review of 
this PR: https://github.com/apache/incubator-airflow/pull/4174 , which is also 
related to the CLI. If it is OK, I will refactor the 'kerberos' part in the 
same way and add tests for it. 


was (Author: xnuinside):
[~jackjack10], [~bolke] yes it's still and issue, because command line args not 
sended to kerberos.run 

look at - 
https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L1333
 and 
https://github.com/apache/incubator-airflow/blob/master/airflow/security/kerberos.py#L113
 

I can take this issue and do cover with tests, but first I need to get review 
to this PR https://github.com/apache/incubator-airflow/pull/4174 , it's also 
relative to cli, if it will be ok - I will refactor in same way 'kerberos' part 
and I will add tests to it. 

> `airflow kerberos` ignores --keytab and --principal arguments
> -
>
> Key: AIRFLOW-987
> URL: https://issues.apache.org/jira/browse/AIRFLOW-987
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: security
>Affects Versions: 1.8.0
> Environment: 1.8-rc5
>Reporter: Ruslan Dautkhanov
>Assignee: Bolke de Bruin
>Priority: Major
>  Labels: easyfix, kerberos, security
>
> No matter which arguments I pass to `airflow kerberos`, 
> it always executes as `kinit -r 3600m -k -t airflow.keytab -c 
> /tmp/airflow_krb5_ccache airflow`
> So it fails with the expected "kinit: Keytab contains no suitable keys for 
> airf...@corp.some.com while getting initial credentials"
> Tried different arguments, -kt and --keytab, here's one of the runs (some 
> lines wrapped for readability):
> {noformat}
> $ airflow kerberos -kt /home/rdautkha/.keytab rdautkha...@corp.some.com
> [2017-03-14 23:50:11,523] {__init__.py:57} INFO - Using executor LocalExecutor
> [2017-03-14 23:50:12,069] {kerberos.py:43} INFO - Reinitting kerberos from 
> keytab: 
> kinit -r 3600m -k -t airflow.keytab -c /tmp/airflow_krb5_ccache airflow
> [2017-03-14 23:50:12,080] {kerberos.py:55} ERROR -
>  Couldn't reinit from keytab! `kinit' exited with 1.
> kinit: Keytab contains no suitable keys for airf...@corp.some.com 
> while getting initial credentials
> {noformat}
> 1.8-rc5



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-571) allow gunicorn config to be passed to airflow webserver

2018-10-26 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664912#comment-16664912
 ] 

Iuliia Volkova commented on AIRFLOW-571:


[~ashb], [~Fokko], [~sanand] Hi, guys! I want to prepare a PR that makes it 
possible to pass gunicorn config params via the Airflow CLI and the Airflow 
config (that is the origin and theme of this task). We have some more tickets 
about passing gunicorn params via the airflow webserver CLI: 
https://issues.apache.org/jira/browse/AIRFLOW-1822, 
https://issues.apache.org/jira/browse/AIRFLOW-1592. I want to create a factory 
that will take all gunicorn params and pass them from the CLI through to the 
airflow webserver run (without depending on concrete names). I can also create 
a params_black_list of params that we don't want to allow through for some 
reason (if such params exist). What do you think about it? A rough sketch is 
below. 
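
For illustration, a rough sketch of the factory idea under the assumptions above — the option names and the black list are mine, not an existing Airflow API:

{code:java}
# Hypothetical sketch: turn [webserver] options into gunicorn CLI arguments,
# skipping anything on the black list.
GUNICORN_PARAMS_BLACK_LIST = {'bind', 'workers'}  # assumed examples

def build_gunicorn_args(webserver_conf):
    args = []
    for name, value in webserver_conf.items():
        if name in GUNICORN_PARAMS_BLACK_LIST:
            continue
        args.append('--{}={}'.format(name.replace('_', '-'), value))
    return args

# build_gunicorn_args({'forwarded_allow_ips': '*', 'keep_alive': 75})
# -> ['--forwarded-allow-ips=*', '--keep-alive=75']
{code}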

> allow gunicorn config to be passed to airflow webserver
> ---
>
> Key: AIRFLOW-571
> URL: https://issues.apache.org/jira/browse/AIRFLOW-571
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Reporter: Dennis O'Brien
>Assignee: Iuliia Volkova
>Priority: Major
>
> I have run into an issue when running airflow webserver behind a load 
> balancer where redirects result in https requests forwarded to http.  I ran 
> into a similar issue with Caravel which also uses gunicorn.  
> https://github.com/airbnb/caravel/issues/978  From that issue:
> {quote}
> When gunicorn is run on a different machine from the load balancer (nginx or 
> ELB), it needs to be told explicitly to trust the X-Forwarded-* headers sent. 
> gunicorn takes an option --forwarded-allow-ips which can either be a comma 
> separated list of ip addresses, or "*" to trust all.
> {quote}
> I don't see a simple way to inject custom arguments to the gunicorn call in 
> `webserver()`.  Rather than making a special case to set 
> --forwarded-allow-ips, it would be nice if the caller of `airflow webserver` 
> could pass an additional gunicorn config file.
> The call to gunicorn is already including a -c and I'm not sure gunicorn will 
> take multiple configs, so maybe we have to parse the config and include each 
> name=value on the gunicorn command line.  Any suggestions on how best to 
> allow this?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-923) airflow webserver -D flag doesn't daemonize anymore

2018-10-26 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664905#comment-16664905
 ] 

Iuliia Volkova commented on AIRFLOW-923:


joyce chan, Ash Berlin-Taylor, can we close the task? This issue is no longer 
valid. 

> airflow webserver -D flag doesn't daemonize anymore
> ---
>
> Key: AIRFLOW-923
> URL: https://issues.apache.org/jira/browse/AIRFLOW-923
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: joyce chan
>Priority: Trivial
> Fix For: 1.8.0
>
> Attachments: Screen Shot 2018-02-12 at 10.32.23 AM.png, Screen Shot 
> 2018-02-12 at 10.32.33 AM.png
>
>
> Airflow 1.8 rc4
> airflow webserver -D flag doesn't daemonize anymore



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AIRFLOW-923) airflow webserver -D flag doesn't daemonize anymore

2018-10-26 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664905#comment-16664905
 ] 

Iuliia Volkova edited comment on AIRFLOW-923 at 10/26/18 8:43 AM:
--

[~joyceschan], [~ashb], can we close the task? This issue is no longer valid. 


was (Author: xnuinside):
joyce chan, Ash Berlin-Taylor , can we close the task? This issue is not valid 
now. 

> airflow webserver -D flag doesn't daemonize anymore
> ---
>
> Key: AIRFLOW-923
> URL: https://issues.apache.org/jira/browse/AIRFLOW-923
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: joyce chan
>Priority: Trivial
> Fix For: 1.8.0
>
> Attachments: Screen Shot 2018-02-12 at 10.32.23 AM.png, Screen Shot 
> 2018-02-12 at 10.32.33 AM.png
>
>
> Airflow 1.8 rc4
> airflow webserver -D flag doesn't daemonize anymore



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-1592) Add keep-alive argument supported by gunicorn backend to the airflow configuration

2018-10-26 Thread Iuliia Volkova (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iuliia Volkova reassigned AIRFLOW-1592:
---

Assignee: Iuliia Volkova  (was: Demian Ginther)

> Add keep-alive argument supported by gunicorn backend to the airflow 
> configuration
> --
>
> Key: AIRFLOW-1592
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1592
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Demian Ginther
>Assignee: Iuliia Volkova
>Priority: Minor
>
> The --keep-alive option is necessary for gunicorn to function properly with 
> AWS ELBs, as gunicorn appears to have an issue with the ELB timeouts as set 
> by default.
> In addition, it makes no sense to provide a wrapper for a program but not 
> allow all configuration options to be set.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-1822) Add gaiohttp and gthread gunicorn workerclass option in cli

2018-10-26 Thread Iuliia Volkova (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iuliia Volkova reassigned AIRFLOW-1822:
---

Assignee: Iuliia Volkova

> Add gaiohttp and gthread gunicorn workerclass option in cli
> ---
>
> Key: AIRFLOW-1822
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1822
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Reporter: Sanjay Pillai
>Assignee: Iuliia Volkova
>Priority: Minor
>
> The minimum gunicorn version has been updated to 19.40; 
> we need to add CLI support for the gthread and gaiohttp worker classes



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-571) allow gunicorn config to be passed to airflow webserver

2018-10-26 Thread Iuliia Volkova (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iuliia Volkova reassigned AIRFLOW-571:
--

Assignee: Iuliia Volkova

> allow gunicorn config to be passed to airflow webserver
> ---
>
> Key: AIRFLOW-571
> URL: https://issues.apache.org/jira/browse/AIRFLOW-571
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Reporter: Dennis O'Brien
>Assignee: Iuliia Volkova
>Priority: Major
>
> I have run into an issue when running airflow webserver behind a load 
> balancer where redirects result in https requests forwarded to http.  I ran 
> into a similar issue with Caravel which also uses gunicorn.  
> https://github.com/airbnb/caravel/issues/978  From that issue:
> {quote}
> When gunicorn is run on a different machine from the load balancer (nginx or 
> ELB), it needs to be told explicitly to trust the X-Forwarded-* headers sent. 
> gunicorn takes an option --forwarded-allow-ips which can either be a comma 
> separated list of ip addresses, or "*" to trust all.
> {quote}
> I don't see a simple way to inject custom arguments to the gunicorn call in 
> `webserver()`.  Rather than making a special case to set 
> --forwarded-allow-ips, it would be nice if the caller of `airflow webserver` 
> could pass an additional gunicorn config file.
> The call to gunicorn is already including a -c and I'm not sure gunicorn will 
> take multiple configs, so maybe we have to parse the config and include each 
> name=value on the gunicorn command line.  Any suggestions on how best to 
> allow this?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AIRFLOW-2925) gcp dataflow hook doesn't show traceback

2018-10-23 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660422#comment-16660422
 ] 

Iuliia Volkova edited comment on AIRFLOW-2925 at 10/23/18 10:46 AM:


[~jackjack10] What do you mean? The full log is shown in the task log. This is 
an example of output from a real project: 
https://issues.apache.org/jira/secure/attachment/12945193/Screen%20Shot%202018-10-23%20at%201.40.43%20PM.png
What is wrong with it? The statement from the issue, "This does not show the 
full trace of the error which makes it harder to understand the problem.", is 
incorrect: you see the full traceback in the interface.

The whole log is printed: 
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/gcp_dataflow_hook.py#L160
 and 
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/gcp_dataflow_hook.py#L132


was (Author: xnuinside):
[~jackjack10] What do you mean? It shows the full log in the task log. This is 
example of output from the real project: 
https://issues.apache.org/jira/secure/attachment/12945193/Screen%20Shot%202018-10-23%20at%201.40.43%20PM.png
  what wrong this it?  The statement from the issue "This does not show the 
full trace of the error which makes it harder to understand the problem." Is 
incorrect. You see full traceback in the interface.

> gcp dataflow hook doesn't show traceback
> 
>
> Key: AIRFLOW-2925
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2925
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: jack
>Priority: Major
>  Labels: easyfix
> Attachments: Screen Shot 2018-10-23 at 1.40.43 PM.png
>
>
> The gcp_dataflow_hook.py has:
>  
> {code:java}
> if self._proc.returncode is not 0:   
> raise Exception("DataFlow failed with return code 
> {}".format(self._proc.returncode))
> {code}
>  
> This does not show the full trace of the error which makes it harder to 
> understand the problem.
> [https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/gcp_dataflow_hook.py#L171]
>  
>  
> reported on gitter by Oscar Carlsson



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2925) gcp dataflow hook doesn't show traceback

2018-10-23 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660422#comment-16660422
 ] 

Iuliia Volkova commented on AIRFLOW-2925:
-

[~jackjack10] What do you mean? The full log is shown in the task log. This is 
an example of output from a real project: !Screen Shot 2018-10-23 at 1.40.43 
PM.png! What is wrong with it? The statement from the issue, "This does not 
show the full trace of the error which makes it harder to understand the 
problem.", is incorrect: you see the full traceback in the interface.

> gcp dataflow hook doesn't show traceback
> 
>
> Key: AIRFLOW-2925
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2925
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: jack
>Priority: Major
>  Labels: easyfix
> Attachments: Screen Shot 2018-10-23 at 1.40.43 PM.png
>
>
> The gcp_dataflow_hook.py has:
>  
> {code:java}
> if self._proc.returncode is not 0:   
> raise Exception("DataFlow failed with return code 
> {}".format(self._proc.returncode))
> {code}
>  
> This does not show the full trace of the error which makes it harder to 
> understand the problem.
> [https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/gcp_dataflow_hook.py#L171]
>  
>  
> reported on gitter by Oscar Carlsson



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AIRFLOW-2925) gcp dataflow hook doesn't show traceback

2018-10-23 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660422#comment-16660422
 ] 

Iuliia Volkova edited comment on AIRFLOW-2925 at 10/23/18 10:44 AM:


[~jackjack10] What do you mean? The full log is shown in the task log. This is 
an example of output from a real project: 
https://issues.apache.org/jira/secure/attachment/12945193/Screen%20Shot%202018-10-23%20at%201.40.43%20PM.png
What is wrong with it? The statement from the issue, "This does not show the 
full trace of the error which makes it harder to understand the problem.", is 
incorrect: you see the full traceback in the interface.


was (Author: xnuinside):
[~jackjack10] What do you mean? It shows the full log in the task log. This is 
example of output from the real project:  !Screen Shot 2018-10-23 at 1.40.43 
PM.png!  what wrong this it?  The statement from the issue "This does not show 
the full trace of the error which makes it harder to understand the problem." 
Is incorrect. You see full traceback in the interface.

> gcp dataflow hook doesn't show traceback
> 
>
> Key: AIRFLOW-2925
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2925
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: jack
>Priority: Major
>  Labels: easyfix
> Attachments: Screen Shot 2018-10-23 at 1.40.43 PM.png
>
>
> The gcp_dataflow_hook.py has:
>  
> {code:java}
> if self._proc.returncode is not 0:   
> raise Exception("DataFlow failed with return code 
> {}".format(self._proc.returncode))
> {code}
>  
> This does not show the full trace of the error which makes it harder to 
> understand the problem.
> [https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/gcp_dataflow_hook.py#L171]
>  
>  
> reported on gitter by Oscar Carlsson



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2925) gcp dataflow hook doesn't show traceback

2018-10-23 Thread Iuliia Volkova (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iuliia Volkova updated AIRFLOW-2925:

Attachment: Screen Shot 2018-10-23 at 1.40.43 PM.png

> gcp dataflow hook doesn't show traceback
> 
>
> Key: AIRFLOW-2925
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2925
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: jack
>Priority: Major
>  Labels: easyfix
> Attachments: Screen Shot 2018-10-23 at 1.40.43 PM.png
>
>
> The gcp_dataflow_hook.py has:
>  
> {code:java}
> if self._proc.returncode is not 0:   
> raise Exception("DataFlow failed with return code 
> {}".format(self._proc.returncode))
> {code}
>  
> This does not show the full trace of the error which makes it harder to 
> understand the problem.
> [https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/gcp_dataflow_hook.py#L171]
>  
>  
> reported on gitter by Oscar Carlsson



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2639) Dagrun of subdags is set to RUNNING immediately

2018-10-05 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640011#comment-16640011
 ] 

Iuliia Volkova commented on AIRFLOW-2639:
-

[~ashb], can you help close this task? Thanks!

> Dagrun of subdags is set to RUNNING immediately
> ---
>
> Key: AIRFLOW-2639
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2639
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> This change has a side effect. The subdag run and its task instances are 
> eagerly created, and the subdag is immediately set to the "RUNNING" state. 
> This means it is immediately visible in the UI (tree view and dagrun view).
> In our case we skip the SubDagOperator based on some conditions. However, the 
> subdag run is then still visible in the UI in the "RUNNING" state, which looks 
> scary; see the attached screenshot. Before, there was no subdag run visible at 
> all 
> One option I see is to not set subdags to "RUNNING" state but "NONE". Then it 
> will still be visible in the UI but not as running. Another idea is to try to 
> pass the conf directly in the SubDagOperator.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2761) Parallelize Celery Executor enqueuing

2018-10-05 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639551#comment-16639551
 ] 

Iuliia Volkova commented on AIRFLOW-2761:
-

[~ashb], got it, I will try to do it over the weekend. 

> Parallelize Celery Executor enqueuing
> -
>
> Key: AIRFLOW-2761
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2761
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kevin Yang
>Priority: Major
>
> Currently the celery executor enqueues in an async fashion but still does so 
> in a single-process loop. This can slow down the scheduler loop and create 
> scheduling delay if we have a large # of tasks to schedule in a short time; 
> e.g. at UTC midnight we need to schedule a large # of sensors in a short period.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-417) UI should not print traceback for missing dag/task in URL

2018-10-05 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639553#comment-16639553
 ] 

Iuliia Volkova commented on AIRFLOW-417:


[~ashb], I can take it next week. 

> UI should not print traceback for missing dag/task in URL
> -
>
> Key: AIRFLOW-417
> URL: https://issues.apache.org/jira/browse/AIRFLOW-417
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Reporter: Dan Davydov
>Assignee: Vijay Bhat
>Priority: Major
>  Labels: UI, easy-fix
>
> Right now if a user tries to do certain things in the UI with dags/tasks 
> that don't exist, they get confusing tracebacks rather than an error rendered 
> in HTML like "the dag/task doesn't exist". One such traceback can be seen by 
> going to the tree view for any DAG in the UI and then changing the dag_id in 
> the URL in the address bar to a non-existent dag. The following traceback 
> can be seen:
> {quote}
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/Flask-0.10.1-py2.7.egg/flask/app.py", line 1817, in wsgi_app
>     response = self.full_dispatch_request()
>   File "/usr/local/lib/python2.7/dist-packages/Flask-0.10.1-py2.7.egg/flask/app.py", line 1477, in full_dispatch_request
>     rv = self.handle_user_exception(e)
>   File "/usr/local/lib/python2.7/dist-packages/Flask-0.10.1-py2.7.egg/flask/app.py", line 1381, in handle_user_exception
>     reraise(exc_type, exc_value, tb)
>   File "/usr/local/lib/python2.7/dist-packages/Flask-0.10.1-py2.7.egg/flask/app.py", line 1475, in full_dispatch_request
>     rv = self.dispatch_request()
>   File "/usr/local/lib/python2.7/dist-packages/Flask-0.10.1-py2.7.egg/flask/app.py", line 1461, in dispatch_request
>     return self.view_functions[rule.endpoint](**req.view_args)
>   File "/usr/local/lib/python2.7/dist-packages/Flask_Admin-1.4.0-py2.7.egg/flask_admin/base.py", line 68, in inner
>     return self._run_view(f, *args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/Flask_Admin-1.4.0-py2.7.egg/flask_admin/base.py", line 367, in _run_view
>     return fn(self, *args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/Flask_Login-0.2.11-py2.7.egg/flask_login.py", line 758, in decorated_view
>     return func(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/www/utils.py", line 213, in view_func
>     return f(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/www/utils.py", line 118, in wrapper
>     return f(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/www/views.py", line 1208, in tree
>     base_date = dag.latest_execution_date or datetime.now()
> AttributeError: 'NoneType' object has no attribute 'latest_execution_date'
> {quote}
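
A minimal sketch of the kind of guard the tree view could add, with names taken from the traceback above (this is not the actual fix, and the snippet is a fragment of the view function body):

{code:java}
# views.py (sketch): render a friendly error instead of a traceback
from datetime import datetime
from flask import flash, redirect

dag = dagbag.get_dag(dag_id)  # dagbag/dag_id as used by the existing view
if dag is None:
    flash('DAG "{}" seems to be missing.'.format(dag_id), 'error')
    return redirect('/admin/')
base_date = dag.latest_execution_date or datetime.now()
{code}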



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1182) Contrib Spark Submit operator should template fields

2018-10-05 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639524#comment-16639524
 ] 

Iuliia Volkova commented on AIRFLOW-1182:
-

[~ashb], can you help close this issue? Thanks!

> Contrib Spark Submit operator should template fields
> 
>
> Key: AIRFLOW-1182
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1182
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib, operators
>Affects Versions: 1.8.0, 2.0.0
>Reporter: Vianney FOUCAULT
>Assignee: Vianney FOUCAULT
>Priority: Major
> Fix For: 1.10.0
>
>
> The spark submit operator does not template any field, making {{ ds }} 
> unusable for spark apps.
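
For context, an operator opts fields into Jinja templating by listing them in template_fields; a minimal sketch (the operator and its attribute names are illustrative, not the actual SparkSubmitOperator):

{code:java}
from airflow.models import BaseOperator

class MySparkishOperator(BaseOperator):
    # {{ ds }} and friends get rendered for every attribute listed here
    template_fields = ('application_args', 'name')

    def __init__(self, application_args=None, name=None, *args, **kwargs):
        super(MySparkishOperator, self).__init__(*args, **kwargs)
        self.application_args = application_args or []
        self.name = name
{code}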



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AIRFLOW-1182) Contrib Spark Submit operator should template fields

2018-10-05 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639524#comment-16639524
 ] 

Iuliia Volkova edited comment on AIRFLOW-1182 at 10/5/18 9:16 AM:
--

[~ashb], can you help with closing this issue? Thanks!


was (Author: xnuinside):
[~ashb], can you help to close issue?) Thanks!

> Contrib Spark Submit operator should template fields
> 
>
> Key: AIRFLOW-1182
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1182
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib, operators
>Affects Versions: 1.8.0, 2.0.0
>Reporter: Vianney FOUCAULT
>Assignee: Vianney FOUCAULT
>Priority: Major
> Fix For: 1.10.0
>
>
> the spark submit operator is not templating any field making {{ ds }} 
> unusable for spark apps.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1013) airflow/jobs.py:manage_slas() exception for @once dag

2018-10-05 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639442#comment-16639442
 ] 

Iuliia Volkova commented on AIRFLOW-1013:
-

[~ashb], [~sanand], [~Tagar], Is this fixed? Or is it no longer needed (given 
that 1.10 is already out)?

> airflow/jobs.py:manage_slas() exception for @once dag
> -
>
> Key: AIRFLOW-1013
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1013
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.8.0, 1.8.1, 1.8.2
>Reporter: Ruslan Dautkhanov
>Assignee: Muhammad Ahmmad
>Priority: Critical
>  Labels: dagrun, once, scheduler, sla
> Fix For: 1.10.0
>
>
> Getting following exception 
> {noformat}
> [2017-03-19 20:16:25,786] {jobs.py:354} DagFileProcessor2638 ERROR - Got an 
> exception! Propagating...
> Traceback (most recent call last):
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/jobs.py", 
> line 346, in helper
> pickle_dags)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/utils/db.py",
>  line 53, in wrapper
> result = func(*args, **kwargs)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/jobs.py", 
> line 1581, in process_file
> self._process_dags(dagbag, dags, ti_keys_to_schedule)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/jobs.py", 
> line 1175, in _process_dags
> self.manage_slas(dag)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/utils/db.py",
>  line 53, in wrapper
> result = func(*args, **kwargs)
>   File 
> "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/airflow/jobs.py", 
> line 595, in manage_slas
> while dttm < datetime.now():
> TypeError: can't compare datetime.datetime to NoneType
> {noformat}
> Exception is in airflow/jobs.py:manage_slas() :
> https://github.com/apache/incubator-airflow/blob/v1-8-stable/airflow/jobs.py#L595
> {code}
> ts = datetime.now()
> SlaMiss = models.SlaMiss
> for ti in max_tis:
> task = dag.get_task(ti.task_id)
> dttm = ti.execution_date
> if task.sla:
> dttm = dag.following_schedule(dttm)
>   >>>   while dttm < datetime.now():  <<< here
> following_schedule = dag.following_schedule(dttm)
> if following_schedule + task.sla < datetime.now():
> session.merge(models.SlaMiss(
> task_id=ti.task_id,
> {code}
> It seems that dag.following_schedule() returns None for @once dag?
> Here's how dag is defined:
> {code}
> main_dag = DAG(
> dag_id = 'DISCOVER-Oracle-Load',
> default_args   = default_args,   
> user_defined_macros= dag_macros,   
> start_date = datetime.now(), 
> catchup= False,  
> schedule_interval  = '@once',
> concurrency= 2,  
> max_active_runs= 1,  
> dagrun_timeout = timedelta(days=4),  
> )
> {code}
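
A minimal sketch of the guard manage_slas() would need for this case (not the actual patch; the fragment sits inside the existing `for ti in max_tis` loop):

{code:java}
# sketch: following_schedule() returns None for '@once' DAGs, so skip the
# SLA check instead of comparing None with datetime.now()
if task.sla:
    dttm = dag.following_schedule(dttm)
    if dttm is None:
        continue  # '@once' has no following schedule; nothing to check
    while dttm < datetime.now():
        # ... existing loop body unchanged ...
        dttm = dag.following_schedule(dttm)
{code}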



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-675) Add an error log to the UI of Airflow.

2018-10-05 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639440#comment-16639440
 ] 

Iuliia Volkova commented on AIRFLOW-675:


[~lukem], [~ashb], it seems to me that this issue is no longer needed.  

> Add an error log to the UI of Airflow. 
> ---
>
> Key: AIRFLOW-675
> URL: https://issues.apache.org/jira/browse/AIRFLOW-675
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: logging
>Reporter: Luke Maycock
>Assignee: Luke Maycock
>Priority: Minor
>  Labels: logging
>
> When we started using Airflow, we noticed that some error messages were not 
> presented to a user using the UI. Also, a user using the UI will not 
> necessarily have access to the error log so it makes sense to be able to 
> display the error log in the UI. 
> This should be merged after: 
> https://github.com/apache/incubator-airflow/pull/1921/files
> as this introduces an error log. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-661) Celery Task Result Expiry

2018-10-05 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639436#comment-16639436
 ] 

Iuliia Volkova commented on AIRFLOW-661:


[~ashb], please close the issue; as you mentioned in 
https://github.com/apache/incubator-airflow/pull/2143, it was fixed by 
https://github.com/apache/incubator-airflow/pull/2842

> Celery Task Result Expiry
> -
>
> Key: AIRFLOW-661
> URL: https://issues.apache.org/jira/browse/AIRFLOW-661
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: celery, executor
>Reporter: Robin Miller
>Assignee: Robin Miller
>Priority: Minor
>
> When using RabbitMQ as the Celery Results Backend, it is desirable to be able 
> to set the CELERY_TASK_RESULT_EXPIRES config option to reduce the timeout 
> period of the task tombstones to less than a day. As such we should pull this 
> option from the airflow.cfg file and pass it through.
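
A sketch of the pass-through being asked for (the config key name here is an 
assumption, not necessarily the option name the fix used):

{code:python}
from airflow.configuration import conf

# Read the expiry from airflow.cfg and hand it to Celery so task
# tombstones expire sooner than the one-day default.
CELERY_CONFIG = {
    "result_expires": conf.getint("celery", "result_expires"),
}
{code}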



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-595) PigOperator

2018-10-05 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639433#comment-16639433
 ] 

Iuliia Volkova commented on AIRFLOW-595:


[~hwbj], is this still relevant in version 1.10?

> PigOperator
> ---
>
> Key: AIRFLOW-595
> URL: https://issues.apache.org/jira/browse/AIRFLOW-595
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Affects Versions: 1.7.1.3
>Reporter: wei.he
>Priority: Major
> Attachments: .jpg
>
>
> When I use the PigOperator, I run into two issues.
> h3. 1. How should I add the "-param" option to make the pig script run dynamically? 
> For example,
> I hope that it run like *pig -Dmapreduce.job.name=test 
> -Dmapreduce.job.queuename=mapreduce -param input=/tmp/input/* -param 
> output=/tmp/output -f test.pig*
> {code:title=test.pig|borderStyle=solid}
> run -param log_name=raw_log_bid -param src_folder='${input}'  load.pig;
> A = FOREACH raw_log_bid GENERATE ActionId AS ActionId;
> A = DISTINCT A;
> C = GROUP A ALL;
> D = foreach C GENERATE COUNT(A);
> STORE D INTO '$output';
> {code}
> {code:title=task.pyp|borderStyle=solid}
> ...
> task_pigjob = PigOperator(
> task_id='task_pigjob',
> pigparams_jinja_translate=True,
> pig='test.pig',
> #pig=templated_command,
> dag=dag)
> 
> {code}
> How can I set the "param" ?
> h3. 2. After I add the ConnId "pig_cli_default" for PigHook and set the 
> Extra to {"pig_properties": "-Dpig.tmpfilecompression=true"}, I run a test 
> DAG.
> I got the following log. 
> {quote}
> INFO - pig -f /tmp/airflow_pigop_c83K9T/tmpqw5on_ 
> -Dpig.tmpfilecompression=true 
> [2016-10-25 17:32:44,619] {ipy_pig_hook.py:105} INFO - any environment 
> variables that are set by the pig command.
> [2016-10-25 17:32:44,901] {models.py:1286} ERROR -
> Apache Pig version 0.12.1.2.1.4.0-632 (rexported)
> compiled Jul 29 2014, 18:24:35
> USAGE: Pig [options] [-] : Run interactively in grunt shell.
>Pig [options] -e[xecute] cmd [cmd ...] : Run cmd(s).
>Pig [options] [-f[ile]] file : Run cmds found in file.
>   options include:
> -4, -log4jconf - Log4j configuration file, overrides log conf
> -b, -brief - Brief logging (no timestamps)
> -c, -check - Syntax check
> -d, -debug - Debug level, INFO is default
> -e, -execute - Commands to execute (within quotes)
> -f, -file - Path to the script to execute
> -g, -embedded - ScriptEngine classname or keyword for the ScriptEngine
> -h, -help - Display this message. You can specify topic to get help for 
> that topic.
> properties is the only topic currently supported: -h properties.
> -i, -version - Display version information
> -l, -logfile - Path to client side log file; default is current working 
> directory.
> -m, -param_file - Path to the parameter file
> -p, -param - Key value pair of the form param=val
> -r, -dryrun - Produces script with substituted parameters. Script is not 
> executed.
> -t, -optimizer_off - Turn optimizations off. The following values are 
> supported:
> SplitFilter - Split filter conditions
> PushUpFilter - Filter as early as possible
> MergeFilter - Merge filter conditions
> PushDownForeachFlatten - Join or explode as late as possible
> LimitOptimizer - Limit as early as possible
> ColumnMapKeyPrune - Remove unused data
> AddForEach - Add ForEach to remove unneeded columns
> MergeForEach - Merge adjacent ForEach
> GroupByConstParallelSetter - Force parallel 1 for "group all" 
> statement
> All - Disable all optimizations
> All optimizations listed here are enabled by default. Optimization 
> values are case insensitive.
> -v, -verbose - Print all error messages to screen
> -w, -warning - Turn warning logging on; also turns warning aggregation off
> -x, -exectype - Set execution mode: local|mapreduce, default is mapreduce.
> -F, -stop_on_failure - Aborts execution on the first failed job; default 
> is off
> -M, -no_multiquery - Turn multiquery optimization off; default is on
> -P, -propertyFile - Path to property file
> -printCmdDebug - Overrides anything else and prints the actual command 
> used to run Pig, including
>  any environment variables that are set by the pig 
> command.
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/airflow/models.py", line 1245, in run
> result = task_copy.execute(context=context)
>   File 
> "/usr/lib/python2.7/site-packages/airflow/operators/ipy_pig_operator.py", 
> line 56, in execute
> self.hook.run_cli(pig=self.pig)
>   File "/usr/lib/python2.7/site-packages/airflow/hooks/ipy_pig_hook.py", line 
> 109, in run_cli

[jira] [Commented] (AIRFLOW-506) airflow-pr tool should warn if merge is run before setup_git_remotes

2018-10-05 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639431#comment-16639431
 ] 

Iuliia Volkova commented on AIRFLOW-506:


[~xuanji], [~ashb], is this still needed? The task was opened in September 2016.

> airflow-pr tool should warn if merge is run before setup_git_remotes
> 
>
> Key: AIRFLOW-506
> URL: https://issues.apache.org/jira/browse/AIRFLOW-506
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: PR tool
>Reporter: Li Xuanji
>Assignee: Li Xuanji
>Priority: Trivial
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1673) dagrun.dependency-check stat contains space in metric name

2018-10-05 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639416#comment-16639416
 ] 

Iuliia Volkova commented on AIRFLOW-1673:
-

[~ashb], I assigned this to myself by mistake, sorry. Could you please resolve the task? 
https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L5179 
shows it is already fixed in master.

> dagrun.dependency-check stat contains space in metric name
> --
>
> Key: AIRFLOW-1673
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1673
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.8.2
>Reporter: Andrew Jones
>Assignee: Iuliia Volkova
>Priority: Minor
>  Labels: pull-request-available
>
> In {{models.py}}, we save a stat with the [following 
> code|https://github.com/apache/incubator-airflow/blob/afd927a256d8ff97e59b28a3caf81ac0bf0d07f3/airflow/models.py#L4563]:
> {code}
> Stats.timing("dagrun.dependency-check.{}.{}".
>  format(self.dag_id, self.execution_date), duration)
> {code}
> {{self.execution_date}} is introducing a space in the metric name, so what 
> gets sent to stats is something like this:
> {code}
> airflow.dagrun.dependency-check.dagid.2017-09-25 00:00:00:32.253000|ms
> {code}
> A space isn't valid here and should be removed.
> We could either remove the space from the datetime, save only the date, or 
> remove the datetime from the stat name completely.
> Maybe just change {{self.execution_date}} to 
> {{self.execution_date.isoformat()}}?
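
For illustration, the suggested change would read as below (a sketch of the 
quoted snippet only; note an ISO-8601 timestamp still contains colons, which 
some statsd backends also dislike):

{code:python}
# `Stats`, `self` and `duration` are the names from models.py.
Stats.timing("dagrun.dependency-check.{}.{}".format(
    self.dag_id, self.execution_date.isoformat()), duration)
{code}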



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-417) UI should not print traceback for missing dag/task in URL

2018-10-05 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639429#comment-16639429
 ] 

Iuliia Volkova commented on AIRFLOW-417:


[~ashb], [~sanand], [~aoen], is this task still needed in 1.10? 

> UI should not print traceback for missing dag/task in URL
> -
>
> Key: AIRFLOW-417
> URL: https://issues.apache.org/jira/browse/AIRFLOW-417
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Reporter: Dan Davydov
>Assignee: Vijay Bhat
>Priority: Major
>  Labels: UI
>
> Right now if a user runs tries to do certain things in the UI with dags/tasks 
> that don't exist they get confusing tracebacks rather than an error rendered 
> in html like "the dag/task doesn't exist". One such traceback can be seen by 
> going to the tree view for any DAG in the UI and then changing the url in the 
> address bar for the dag_id to be a non-existent dag. The following traceback 
> can be seen:
> {quote}
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python2.7/dist-packages/Flask-0.10.1-py2.7.egg/flask/app.py", 
> line 1817, in wsgi_app
> response = self.full_dispatch_request()
>   File 
> "/usr/local/lib/python2.7/dist-packages/Flask-0.10.1-py2.7.egg/flask/app.py", 
> line 1477, in full_dispatch_request
> rv = self.handle_user_exception(e)
>   File 
> "/usr/local/lib/python2.7/dist-packages/Flask-0.10.1-py2.7.egg/flask/app.py", 
> line 1381, in handle_user_exception
> reraise(exc_type, exc_value, tb)
>   File 
> "/usr/local/lib/python2.7/dist-packages/Flask-0.10.1-py2.7.egg/flask/app.py", 
> line 1475, in full_dispatch_request
> rv = self.dispatch_request()
>   File 
> "/usr/local/lib/python2.7/dist-packages/Flask-0.10.1-py2.7.egg/flask/app.py", 
> line 1461, in dispatch_request
> return self.view_functions[rule.endpoint](**req.view_args)
>   File 
> "/usr/local/lib/python2.7/dist-packages/Flask_Admin-1.4.0-py2.7.egg/flask_admin/base.py",
>  line 68, in inner
> return self._run_view(f, *args, **kwargs)
>   File 
> "/usr/local/lib/python2.7/dist-packages/Flask_Admin-1.4.0-py2.7.egg/flask_admin/base.py",
>  line 367, in _run_view
> return fn(self, *args, **kwargs)
>   File 
> "/usr/local/lib/python2.7/dist-packages/Flask_Login-0.2.11-py2.7.egg/flask_login.py",
>  line 758, in decorated_view
> return func(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/www/utils.py", line 
> 213, in view_func
> return f(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/www/utils.py", line 
> 118, in wrapper
> return f(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/www/views.py", line 
> 1208, in tree
> base_date = dag.latest_execution_date or datetime.now()
> AttributeError: 'NoneType' object has no attribute 'latest_execution_date'
> {quote}
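
The kind of guard the views need is simple; a hedged sketch (the handler body 
and messages are illustrative, not the actual Airflow code):

{code:python}
from flask import flash, redirect, url_for

def tree(dag_id, dagbag):
    # Resolve the dag_id before touching DAG attributes and render a
    # friendly error instead of a traceback.
    dag = dagbag.get_dag(dag_id)
    if dag is None:
        flash("DAG '{}' does not exist".format(dag_id), "error")
        return redirect(url_for("admin.index"))
    # ... normal tree-view rendering continues here
{code}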



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-460) user Dag`s visibility bug

2018-10-05 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639430#comment-16639430
 ] 

Iuliia Volkova commented on AIRFLOW-460:


[~ashb], [~zstan], could we close bug issues relating to 1.7? 

> user Dag`s visibility bug
> -
>
> Key: AIRFLOW-460
> URL: https://issues.apache.org/jira/browse/AIRFLOW-460
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui
>Affects Versions: 1.7.1
>Reporter: Stanilovsky Evgeny
>Priority: Major
>
> after setting:
> {quote}
> filter_by_owner = True
> authenticate = True
> auth_backend = airflow.contrib.auth.backends.ldap_auth
> [ldap]
> superuser_filter = 
> memberOf=CN=airflow-super-users,OU=Groups,OU=RWC,OU=US,OU=NORAM,DC=example,DC=com
> {quote}
> the DAGs list interface became empty.
> I suppose it's due to:
> *ldap_auth.py*
> {quote}
> if not user:
> user = models.User(
> username=username,
> is_superuser=False)
> session.merge(user)
> session.commit()
> flask_login.login_user(LdapUser(user))
> {quote}
> and after:
> *views.py*
> {quote}session.query(DM)
> .filter(
> ~DM.is_subdag, DM.is_active,
> {color:red}#DM.owners == current_user.username){color}
> DM.owners == current_user.user.username)
> {quote}
> {quote}
> dags = {
> dag.dag_id: dag
> for dag in dags
> if (
> {color:red}#dag.owner == current_user.username and (not 
> dag.parent_dag){color}
> dag.owner == current_user.user.username and (not 
> dag.parent_dag)
> )
> }
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AIRFLOW-1391) airflow trigger_dag cannot serialize exec_date when using the json client

2018-10-05 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639422#comment-16639422
 ] 

Iuliia Volkova edited comment on AIRFLOW-1391 at 10/5/18 7:44 AM:
--

[~ricardogsilva], hi! Did you try 1.10? Does this error still exist in newer 
versions, or could we close the issue? 


was (Author: xnuinside):
[~ricardogsilva], did you try 1.10? Does this error still exist in newer 
versions, or could we close the issue? 

> airflow trigger_dag cannot serialize exec_date when using the json client
> -
>
> Key: AIRFLOW-1391
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1391
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: api
>Affects Versions: 1.8.0
>Reporter: Ricardo Garcia Silva
>Priority: Major
>  Labels: easyfix, newbie
>
> The {{airflow trigger_dag}} command cannot serialize a {{datetime.datetime}} 
> when the cli is configured to use the {{json_client}}.
> The command:
> {code}
> airflow trigger_dag --run_id test1 --exec_date 2017-01-01 
> example_bash_operator
> {code}
> Throws the error:
> {code}
> Traceback (most recent call last):
>   File "/home/geo2/.venvs/cglops-dissemination/bin/airflow", line 28, in 
> 
> args.func(args)
>   File 
> "/home/geo2/.venvs/cglops-dissemination/local/lib/python2.7/site-packages/airflow/bin/cli.py",
>  line 180, in trigger_dag
> execution_date=args.exec_date)
>   File 
> "/home/geo2/.venvs/cglops-dissemination/local/lib/python2.7/site-packages/airflow/api/client/json_client.py",
>  line 32, in trigger_dag
> "execution_date": execution_date,
>   File 
> "/home/geo2/.venvs/cglops-dissemination/local/lib/python2.7/site-packages/requests/api.py",
>  line 112, in post
> return request('post', url, data=data, json=json, **kwargs)
>   File 
> "/home/geo2/.venvs/cglops-dissemination/local/lib/python2.7/site-packages/requests/api.py",
>  line 58, in request
> return session.request(method=method, url=url, **kwargs)
>   File 
> "/home/geo2/.venvs/cglops-dissemination/local/lib/python2.7/site-packages/requests/sessions.py",
>  line 488, in request
> prep = self.prepare_request(req)
>   File 
> "/home/geo2/.venvs/cglops-dissemination/local/lib/python2.7/site-packages/requests/sessions.py",
>  line 431, in prepare_request
> hooks=merge_hooks(request.hooks, self.hooks),
>   File 
> "/home/geo2/.venvs/cglops-dissemination/local/lib/python2.7/site-packages/requests/models.py",
>  line 308, in prepare
> self.prepare_body(data, files, json)
>   File 
> "/home/geo2/.venvs/cglops-dissemination/local/lib/python2.7/site-packages/requests/models.py",
>  line 458, in prepare_body
> body = complexjson.dumps(json)
>   File "/usr/lib/python2.7/json/__init__.py", line 243, in dumps
> return _default_encoder.encode(obj)
>   File "/usr/lib/python2.7/json/encoder.py", line 207, in encode
> chunks = self.iterencode(o, _one_shot=True)
>   File "/usr/lib/python2.7/json/encoder.py", line 270, in iterencode
> return _iterencode(o, 0)
>   File "/usr/lib/python2.7/json/encoder.py", line 184, in default
> raise TypeError(repr(o) + " is not JSON serializable")
> TypeError: datetime.datetime(2017, 1, 1, 0, 0) is not JSON serializable
> {code}
> The same command works fine if airflow is configured to use the 
> {{local_client}} instead.
> \\
> A fix for this would need to encode the {{datetime}} as a string in the 
> client then being able to deserialize back to a datetime in the server.
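
A minimal sketch of that round-trip (the format string is an assumption, not 
the project's actual wire format; {{execution_date}} is the datetime from the CLI):

{code:python}
from datetime import datetime

wire = execution_date.isoformat()                        # client: datetime -> str
restored = datetime.strptime(wire, "%Y-%m-%dT%H:%M:%S")  # server: str -> datetime
{code}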



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1391) airflow trigger_dag cannot serialize exec_date when using the json client

2018-10-05 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639422#comment-16639422
 ] 

Iuliia Volkova commented on AIRFLOW-1391:
-

[~ricardogsilva], did you try 1.10? Does this error still exist in newer 
versions, or could we close the issue? 

> airflow trigger_dag cannot serialize exec_date when using the json client
> -
>
> Key: AIRFLOW-1391
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1391
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: api
>Affects Versions: 1.8.0
>Reporter: Ricardo Garcia Silva
>Priority: Major
>  Labels: easyfix, newbie
>
> The {{airflow trigger_dag}} command cannot serialize a {{datetime.datetime}} 
> when the cli is configured to use the {{json_client}}.
> The command:
> {code}
> airflow trigger_dag --run_id test1 --exec_date 2017-01-01 
> example_bash_operator
> {code}
> Throws the error:
> {code}
> Traceback (most recent call last):
>   File "/home/geo2/.venvs/cglops-dissemination/bin/airflow", line 28, in 
> 
> args.func(args)
>   File 
> "/home/geo2/.venvs/cglops-dissemination/local/lib/python2.7/site-packages/airflow/bin/cli.py",
>  line 180, in trigger_dag
> execution_date=args.exec_date)
>   File 
> "/home/geo2/.venvs/cglops-dissemination/local/lib/python2.7/site-packages/airflow/api/client/json_client.py",
>  line 32, in trigger_dag
> "execution_date": execution_date,
>   File 
> "/home/geo2/.venvs/cglops-dissemination/local/lib/python2.7/site-packages/requests/api.py",
>  line 112, in post
> return request('post', url, data=data, json=json, **kwargs)
>   File 
> "/home/geo2/.venvs/cglops-dissemination/local/lib/python2.7/site-packages/requests/api.py",
>  line 58, in request
> return session.request(method=method, url=url, **kwargs)
>   File 
> "/home/geo2/.venvs/cglops-dissemination/local/lib/python2.7/site-packages/requests/sessions.py",
>  line 488, in request
> prep = self.prepare_request(req)
>   File 
> "/home/geo2/.venvs/cglops-dissemination/local/lib/python2.7/site-packages/requests/sessions.py",
>  line 431, in prepare_request
> hooks=merge_hooks(request.hooks, self.hooks),
>   File 
> "/home/geo2/.venvs/cglops-dissemination/local/lib/python2.7/site-packages/requests/models.py",
>  line 308, in prepare
> self.prepare_body(data, files, json)
>   File 
> "/home/geo2/.venvs/cglops-dissemination/local/lib/python2.7/site-packages/requests/models.py",
>  line 458, in prepare_body
> body = complexjson.dumps(json)
>   File "/usr/lib/python2.7/json/__init__.py", line 243, in dumps
> return _default_encoder.encode(obj)
>   File "/usr/lib/python2.7/json/encoder.py", line 207, in encode
> chunks = self.iterencode(o, _one_shot=True)
>   File "/usr/lib/python2.7/json/encoder.py", line 270, in iterencode
> return _iterencode(o, 0)
>   File "/usr/lib/python2.7/json/encoder.py", line 184, in default
> raise TypeError(repr(o) + " is not JSON serializable")
> TypeError: datetime.datetime(2017, 1, 1, 0, 0) is not JSON serializable
> {code}
> The same command works fine if airflow is configured to use the 
> {{local_client}} instead.
> \\
> A fix for this would need to encode the {{datetime}} as a string in the 
> client then being able to deserialize back to a datetime in the server.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2761) Parallelize Celery Executor enqueuing

2018-10-05 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639418#comment-16639418
 ] 

Iuliia Volkova commented on AIRFLOW-2761:
-

[~yrqls21], oh, okay, great! I just didn't see the open PR, thank you! 

> Parallelize Celery Executor enqueuing
> -
>
> Key: AIRFLOW-2761
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2761
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kevin Yang
>Priority: Major
>
> Currently the celery executor enqueues in an async fashion but still does so 
> in a single-process loop. This can slow down the scheduler loop and create 
> scheduling delay if we have a large # of tasks to schedule in a short time, 
> e.g. at UTC midnight we need to schedule a large # of sensors in a short period.
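
A sketch of the idea only (not the actual PR): fan the Celery apply_async 
calls out over a small pool so the scheduler loop is not serialized on broker 
round-trips. {{execute_command}} and {{queued_tasks}} stand for the executor's 
existing names.

{code:python}
from concurrent.futures import ThreadPoolExecutor

def _enqueue(item):
    # Send one task to the broker; returns the key and the AsyncResult.
    key, command = item
    return key, execute_command.apply_async(args=[command])

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(_enqueue, queued_tasks.items()))
{code}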



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AIRFLOW-1673) dagrun.dependency-check stat contains space in metric name

2018-10-05 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639416#comment-16639416
 ] 

Iuliia Volkova edited comment on AIRFLOW-1673 at 10/5/18 7:39 AM:
--

[~ashb], I assigned this to myself by mistake, sorry. Could you please resolve the task? 
https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L5179 
shows it is already fixed in master.


was (Author: xnuinside):
[~ashb], assigned to myself by mistake, sorry. Could you please resolve the task? 
https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L5179 
shows it is already fixed in master.

> dagrun.dependency-check stat contains space in metric name
> --
>
> Key: AIRFLOW-1673
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1673
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.8.2
>Reporter: Andrew Jones
>Assignee: Iuliia Volkova
>Priority: Minor
>  Labels: pull-request-available
>
> In {{models.py}}, we save a stat with the [following 
> code|https://github.com/apache/incubator-airflow/blob/afd927a256d8ff97e59b28a3caf81ac0bf0d07f3/airflow/models.py#L4563]:
> {code}
> Stats.timing("dagrun.dependency-check.{}.{}".
>  format(self.dag_id, self.execution_date), duration)
> {code}
> {{self.execution_date}} is introducing a space in the metric name, so what 
> gets sent to stats is something like this:
> {code}
> airflow.dagrun.dependency-check.dagid.2017-09-25 00:00:00:32.253000|ms
> {code}
> A space isn't valid here and should be removed.
> We could either remove the space from the datetime, save only the date, or 
> remove the datetime from the stat name completely.
> Maybe just change {{self.execution_date}} to 
> {{self.execution_date.isoformat()}}?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2325) Task logging with AWS Cloud watch

2018-10-05 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639410#comment-16639410
 ] 

Iuliia Volkova commented on AIRFLOW-2325:
-

[~fangpenlin], what is the status of this task and PR 
https://github.com/apache/incubator-airflow/pull/3229? Is it still needed?

> Task logging with AWS Cloud watch
> -
>
> Key: AIRFLOW-2325
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2325
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: logging
>Reporter: Fang-Pen Lin
>Priority: Minor
>
> In many cases, it's ideal to use remote logging while running Airflow in 
> production, as the workers could easily be scaled down or up, or the 
> worker may be running in containers, where the local storage is not meant to 
> persist. In that case, the S3 task logging handler can be used:
> [https://github.com/apache/incubator-airflow/blob/master/airflow/utils/log/s3_task_handler.py]
> However, it comes with a drawback: the S3 logging handler only uploads the log 
> when the task has completed or failed. For long-running tasks, it's hard to 
> know what's going on with the process until it finishes.
> To make more real-time logging, I built a logging handler based on AWS 
> CloudWatch. It uses a third party python package `watchtower`
>  
> [https://github.com/kislyuk/watchtower/tree/master/watchtower]
>  
> I created a PR here [https://github.com/apache/incubator-airflow/pull/3229], 
> basically I just copy-pasted the code I wrote for my own project, it works 
> fine with 1.9 release, but never tested with master branch. Also, there is a 
> bug in watchtower causing task runner to hang forever when it completes. I 
> created an issue in their repo
> [https://github.com/kislyuk/watchtower/issues/57]
> And a PR for addressing that issue 
> [https://github.com/kislyuk/watchtower/pull/58]
>  
> The PR is still far from ready to be reviewed, but I just want to get some 
> feedback before I spend more time on it. I would like to see if you guys want 
> this cloudwatch handler to go into the main repo, or whether you prefer it to 
> be a standalone third-party module. If it's the latter, I can close this 
> ticket and create a standalone repo on my own. If the PR is welcome, then I 
> can spend more time on polishing it, based on your feedback, and add tests / 
> documentation and other stuff.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-1673) dagrun.dependency-check stat contains space in metric name

2018-10-05 Thread Iuliia Volkova (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iuliia Volkova reassigned AIRFLOW-1673:
---

Assignee: Iuliia Volkova  (was: Andrew Jones)

> dagrun.dependency-check stat contains space in metric name
> --
>
> Key: AIRFLOW-1673
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1673
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.8.2
>Reporter: Andrew Jones
>Assignee: Iuliia Volkova
>Priority: Minor
>  Labels: pull-request-available
>
> In {{models.py}}, we save a stat with the [following 
> code|https://github.com/apache/incubator-airflow/blob/afd927a256d8ff97e59b28a3caf81ac0bf0d07f3/airflow/models.py#L4563]:
> {code}
> Stats.timing("dagrun.dependency-check.{}.{}".
>  format(self.dag_id, self.execution_date), duration)
> {code}
> {{self.execution_date}} is introducing a space in the metric name, so what 
> gets sent to stats is something like this:
> {code}
> airflow.dagrun.dependency-check.dagid.2017-09-25 00:00:00:32.253000|ms
> {code}
> A space isn't valid here and should be removed.
> We could either remove the space from the datetime, save only the date, or 
> remove the datetime from the stat name completely.
> Maybe just change {{self.execution_date}} to 
> {{self.execution_date.isoformat()}}?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2001) Make sensors relinquish their execution slots

2018-10-05 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639412#comment-16639412
 ] 

Iuliia Volkova commented on AIRFLOW-2001:
-

[~seelmann], [~ysagade], could we close this issue? Is it solved?

> Make sensors relinquish their execution slots
> -
>
> Key: AIRFLOW-2001
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2001
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: db, scheduler
>Reporter: Yati
>Assignee: Yati
>Priority: Major
>
> A sensor task instance should not take up an execution slot for the entirety 
> of its lifetime (as is currently the case). Indeed, for reasons outlined 
> below, it would be better if sensor execution was preempted by the scheduler 
> by parking it away from the slot till the next poll.
>  Some sensors sense for a condition to be true which is affected only by an 
> external party (e.g., materialization by external means of certain rows in a 
> table). By external, I mean external to the Airflow installation in question, 
> such that the producing entity itself does not need an execution slot in an 
> Airflow pool. If all sensors and their dependencies were of this nature, 
> there would be no issue. Unfortunately, a lot of real world DAGs have sensor 
> dependencies on results produced by another task, typically in some other 
> DAG, but scheduled by the same Airflow scheduler.
> Consider a simple example (arrow direction represents "must happen before", 
> just like in Airflow): DAG1(a >> b) and DAG2(c:sensor(DAG1.b) >> d). In other 
> words, The opening task c of the second dag has a sensor dependency on the 
> ending task b of the first dag. Imagine we have a single pool with 10 
> execution slots, and somehow task instances for c fill up the pool, while the 
> corresponding task instances of DAG1.b have not had a chance to execute (in 
> the real world this happens because of, say, back-fills or reprocesses by 
> clearing those sensors instances and their upstream). This is a deadlock 
> situation, since no progress can be made here – the sensors have filled up 
> the pool waiting on tasks that themselves will never get a chance to run. 
> This problem has been [acknowledged 
> here|https://cwiki.apache.org/confluence/display/AIRFLOW/Common+Pitfalls]
> One way (suggested by Fokko) to solve this is to always run sensors on their 
> pool, and to be careful with the concurrency settings of sensor tasks. This 
> is what a lot of users do now, but there are better solutions to this. Since 
> all the sensor interface allows for is a poll, we can, after each poll, 
> "park" the sensor's execution slot and yield it to other tasks. In the above 
> scenario, there would be no "filling up" of the pool by sensors tasks, as 
> they will be polled, determined to be still unfulfilled, and then parked 
> away, thereby giving a chance to other tasks.
> This would likely have some changes to the DB, and of course to the scheduler.
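
For reference, this idea later surfaced as a sensor "reschedule" mode; a 
sketch of the user-facing shape only (assuming the {{mode}} parameter available 
in newer Airflow releases, not in all versions):

{code:python}
from airflow.sensors.external_task_sensor import ExternalTaskSensor

# In reschedule mode the sensor gives its slot back between pokes
# instead of occupying it for its whole lifetime.
wait_for_dag1_b = ExternalTaskSensor(
    task_id="c",
    external_dag_id="DAG1",
    external_task_id="b",
    mode="reschedule",   # assumption: only in newer releases
    poke_interval=300,   # seconds between polls
    dag=dag,             # `dag` stands for the DAG2 object from the example
)
{code}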



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2516) Deadlock found when trying to update task_instance table

2018-10-05 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639406#comment-16639406
 ] 

Iuliia Volkova commented on AIRFLOW-2516:
-

[~jeffliujing], Airflow 1.10 has already been released, and the community does 
not release fixes for the 1.8 line. 

[~ashb], [~jeffliujing], could we close this task, or is there still something 
we should do here? 

> Deadlock found when trying to update task_instance table
> 
>
> Key: AIRFLOW-2516
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2516
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DagRun
>Affects Versions: 1.8.0
>Reporter: Jeff Liu
>Priority: Major
>
>  
>  
> {code:java}
> [2018-05-23 17:59:57,218] {base_task_runner.py:98} INFO - Subtask: 
> [2018-05-23 17:59:57,217] {base_executor.py:49} INFO - Adding to queue: 
> airflow run production_wipeout_wipe_manager.Carat Carat_20180227 
> 2018-05-23T17:41:18.815809 --local -sd DAGS_FOLDER/wipeout/wipeout.py
> [2018-05-23 17:59:57,231] {base_task_runner.py:98} INFO - Subtask: Traceback 
> (most recent call last):
> [2018-05-23 17:59:57,232] {base_task_runner.py:98} INFO - Subtask: File 
> "/usr/local/bin/airflow", line 27, in 
> [2018-05-23 17:59:57,232] {base_task_runner.py:98} INFO - Subtask: 
> args.func(args)
> [2018-05-23 17:59:57,232] {base_task_runner.py:98} INFO - Subtask: File 
> "/usr/local/lib/python2.7/site-packages/airflow/bin/cli.py", line 392, in run
> [2018-05-23 17:59:57,232] {base_task_runner.py:98} INFO - Subtask: 
> pool=args.pool,
> [2018-05-23 17:59:57,233] {base_task_runner.py:98} INFO - Subtask: File 
> "/usr/local/lib/python2.7/site-packages/airflow/utils/db.py", line 50, in 
> wrapper
> [2018-05-23 17:59:57,233] {base_task_runner.py:98} INFO - Subtask: result = 
> func(*args, **kwargs)
> [2018-05-23 17:59:57,233] {base_task_runner.py:98} INFO - Subtask: File 
> "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1532, in 
> _run_raw_task
> [2018-05-23 17:59:57,234] {base_task_runner.py:98} INFO - Subtask: 
> self.handle_failure(e, test_mode, context)
> [2018-05-23 17:59:57,234] {base_task_runner.py:98} INFO - Subtask: File 
> "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1641, in 
> handle_failure
> [2018-05-23 17:59:57,234] {base_task_runner.py:98} INFO - Subtask: 
> session.merge(self)
> [2018-05-23 17:59:57,235] {base_task_runner.py:98} INFO - Subtask: File 
> "/usr/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 
> 1920, in merge
> [2018-05-23 17:59:57,235] {base_task_runner.py:98} INFO - Subtask: 
> _resolve_conflict_map=_resolve_conflict_map)
> [2018-05-23 17:59:57,235] {base_task_runner.py:98} INFO - Subtask: File 
> "/usr/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 
> 1974, in _merge
> [2018-05-23 17:59:57,236] {base_task_runner.py:98} INFO - Subtask: merged = 
> self.query(mapper.class_).get(key[1])
> [2018-05-23 17:59:57,236] {base_task_runner.py:98} INFO - Subtask: File 
> "/usr/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 882, 
> in get
> [2018-05-23 17:59:57,236] {base_task_runner.py:98} INFO - Subtask: ident, 
> loading.load_on_pk_identity)
> [2018-05-23 17:59:57,236] {base_task_runner.py:98} INFO - Subtask: File 
> "/usr/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 952, 
> in _get_impl
> [2018-05-23 17:59:57,237] {base_task_runner.py:98} INFO - Subtask: return 
> db_load_fn(self, primary_key_identity)
> [2018-05-23 17:59:57,237] {base_task_runner.py:98} INFO - Subtask: File 
> "/usr/local/lib/python2.7/site-packages/sqlalchemy/orm/loading.py", line 247, 
> in load_on_pk_i
> dentity
> [2018-05-23 17:59:57,237] {base_task_runner.py:98} INFO - Subtask: return 
> q.one()
> [2018-05-23 17:59:57,238] {base_task_runner.py:98} INFO - Subtask: File 
> "/usr/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2884, 
> in one
> [2018-05-23 17:59:57,238] {base_task_runner.py:98} INFO - Subtask: ret = 
> self.one_or_none()
> [2018-05-23 17:59:57,238] {base_task_runner.py:98} INFO - Subtask: File 
> "/usr/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2854, 
> in one_or_none
> [2018-05-23 17:59:57,238] {base_task_runner.py:98} INFO - Subtask: ret = 
> list(self)
> [2018-05-23 17:59:57,239] {base_task_runner.py:98} INFO - Subtask: File 
> "/usr/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2925, 
> in __iter__
> [2018-05-23 17:59:57,239] {base_task_runner.py:98} INFO - Subtask: return 
> self._execute_and_instances(context)
> [2018-05-23 17:59:57,239] {base_task_runner.py:98} INFO - Subtask: File 
> "/usr/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2946, 
> in _execute_and_instances
> [2018-05-23 17:59:57,240] {base_task_runner.py:98} INFO - Subtask: 
> 

[jira] [Commented] (AIRFLOW-2639) Dagrun of subdags is set to RUNNING immediately

2018-10-05 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639403#comment-16639403
 ] 

Iuliia Volkova commented on AIRFLOW-2639:
-

[~seelmann], [~ashb], based on the last comment, I take it we still need this 
task, right? Or could we close this ticket? 

> Dagrun of subdags is set to RUNNING immediately
> ---
>
> Key: AIRFLOW-2639
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2639
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Chao-Han Tsai
>Assignee: Chao-Han Tsai
>Priority: Major
>
> This change has a side-effect. The subdag run and its task instances are 
> eagerly created, and the subdag is immediately set to the "RUNNING" state. This 
> means it is immediately visible in the UI (tree view and dagrun view).
> In our case we skip the SubDagOperator based on some conditions. However, the 
> subdag run is then still visible in the UI in the "RUNNING" state, which looks 
> scary, see attached screenshot. Before, there was no subdag run visible at all 
> for skipped subdags.
> One option I see is not to set subdags to the "RUNNING" state but to "NONE". Then it 
> will still be visible in the UI but not as running. Another idea is to try to 
> pass the conf directly in the SubDagOperator.
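
A sketch of the first option above (assumed call site inside SubDagOperator, 
simplified):

{code:python}
from airflow.utils.state import State

# Create the subdag's run in NONE state rather than RUNNING, so it is
# visible in the UI without looking like it is executing.
dag_run = subdag.create_dagrun(
    run_id="trig__{}".format(execution_date.isoformat()),
    state=State.NONE,
    execution_date=execution_date,
    external_trigger=True,
)
{code}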



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2761) Parallelize Celery Executor enqueuing

2018-10-05 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639399#comment-16639399
 ] 

Iuliia Volkova commented on AIRFLOW-2761:
-

[~yrqls21], hi Kevin! Do you have any plans for this task? Update/reopen PR?

> Parallelize Celery Executor enqueuing
> -
>
> Key: AIRFLOW-2761
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2761
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kevin Yang
>Priority: Major
>
> Currently the celery executor enqueues in an async fashion but still does so 
> in a single-process loop. This can slow down the scheduler loop and create 
> scheduling delay if we have a large # of tasks to schedule in a short time, 
> e.g. at UTC midnight we need to schedule a large # of sensors in a short period.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2790) snakebite syntax error: baseTime = min(time * (1L << retries), cap);

2018-10-05 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639375#comment-16639375
 ] 

Iuliia Volkova commented on AIRFLOW-2790:
-

The PR that fixed this issue was already merged to master. Could we close the task? 
[~ashb], [~yohei]

> snakebite syntax error: baseTime = min(time * (1L << retries), cap);
> 
>
> Key: AIRFLOW-2790
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2790
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks
>Affects Versions: 1.9.0
> Environment: Amazon Linux
>Reporter: Yohei Onishi
>Priority: Major
>
> Does anybody know how can I fix this issue?
>  * Got the following error when importing 
> airflow.operators.sensors.ExternalTaskSensor.
>  * apache-airflow 1.9.0 depends on snakebite 2.11.0 and it does not work with 
> Python3. https://github.com/spotify/snakebite/issues/250
> [2018-07-23 06:42:51,828] \{models.py:288} ERROR - Failed to import: 
> /home/airflow/airflow/dags/example_task_sensor2.py
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 285, 
> in process_file
> m = imp.load_source(mod_name, filepath)
>   File "/usr/lib64/python3.6/imp.py", line 172, in load_source
> module = _load(spec)
>   File "", line 675, in _load
>   File "", line 655, in _load_unlocked
>   File "", line 678, in exec_module
>   File "", line 205, in _call_with_frames_removed
>   File "/home/airflow/airflow/dags/example_task_sensor2.py", line 10, in 
> 
> from airflow.operators.sensors import ExternalTaskSensor
>   File "/usr/local/lib/python3.6/site-packages/airflow/operators/sensors.py", 
> line 34, in <module>
> from airflow.hooks.hdfs_hook import HDFSHook
>   File "/usr/local/lib/python3.6/site-packages/airflow/hooks/hdfs_hook.py", 
> line 20, in <module>
> from snakebite.client import Client, HAClient, Namenode, AutoConfigClient
>   File "/usr/local/lib/python3.6/site-packages/snakebite/client.py", line 1473
> baseTime = min(time * (1L << retries), cap);
> ^
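
For context, the failing snakebite line is Python 2 only: {{1L}} is a py2 long 
literal with no Python 3 spelling. The Python 3 form of that line is simply:

{code:python}
# Python 3 ints are unbounded, so the `L` suffix (and the stray
# semicolon) just go away:
baseTime = min(time * (1 << retries), cap)
{code}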



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2805) Display user's local timezone and DAG's timezone on UI

2018-10-05 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639372#comment-16639372
 ] 

Iuliia Volkova commented on AIRFLOW-2805:
-

[~verdan], [~ashb], PR was merged 
https://github.com/apache/incubator-airflow/pull/3687, could we close the 
ticket? 

> Display user's local timezone and DAG's timezone on UI
> --
>
> Key: AIRFLOW-2805
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2805
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Verdan Mahmood
>Assignee: Verdan Mahmood
>Priority: Major
> Attachments: Screen Shot 2018-08-02 at 1.08.53 PM.png
>
>
> The UI currently only displays the UTC timezone which is also not in human 
> readable forms on all places. 
> Make all the date times in human readable forms. 
> Also, we need to display user's local timezone and DAG's timezone along with 
> UTC. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2811) Fix scheduler_ops_metrics.py to work

2018-10-05 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639367#comment-16639367
 ] 

Iuliia Volkova commented on AIRFLOW-2811:
-

[~ashb], [~sekikn], the PR is merged; please close the task.

> Fix scheduler_ops_metrics.py to work
> 
>
> Key: AIRFLOW-2811
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2811
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kengo Seki
>Assignee: Kengo Seki
>Priority: Major
>
> I tried to run {{scripts/perf/scheduler_ops_metrics.py}} but it failed with 
> the following error:
> {code}
> $ python scripts/perf/scheduler_ops_metrics.py 
> (snip)
> Traceback (most recent call last):
>   File "scripts/perf/scheduler_ops_metrics.py", line 192, in 
> main()
>   File "scripts/perf/scheduler_ops_metrics.py", line 188, in main
> job.run()
>   File "/home/sekikn/dev/incubator-airflow/airflow/jobs.py", line 202, in run
> self._execute()
>   File "/home/sekikn/dev/incubator-airflow/airflow/jobs.py", line 1584, in 
> _execute
> self._execute_helper(processor_manager)
>   File "/home/sekikn/dev/incubator-airflow/airflow/jobs.py", line 1714, in 
> _execute_helper
> self.heartbeat()
>   File "scripts/perf/scheduler_ops_metrics.py", line 121, in heartbeat
> for dag in dags for task in dag.tasks])
> TypeError: can't subtract offset-naive and offset-aware datetimes
> {code}
> Also, it'd be nice if {{MAX_RUNTIME_SECS}} were configurable, since the 
> default value (6 seconds) is too short for all TaskInstances to finish in my 
> environment.
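
The error comes from subtracting an offset-aware datetime from a naive one; a 
hedged sketch of the kind of change needed, using the timezone helper that 
ships with Airflow 1.10:

{code:python}
from airflow.utils import timezone

# Compare against an offset-aware "now" instead of the naive
# datetime.datetime.now(); `ti` stands for a TaskInstance whose
# timestamps are offset-aware.
elapsed = timezone.utcnow() - ti.start_date
{code}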



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2933) Enable Codecov on Docker CI

2018-10-05 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639363#comment-16639363
 ] 

Iuliia Volkova commented on AIRFLOW-2933:
-

[~ashb], hi Ash! Please close the issue, the PR is already merged: 
https://github.com/apache/incubator-airflow/pull/3780

> Enable Codecov on Docker CI
> ---
>
> Key: AIRFLOW-2933
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2933
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Fokko Driesprong
>Priority: Major
>
> Right now the Codecov plugin is not working on the docker-ci that we're using 
> right now. This has to be fixed so we can track code coverage over time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3020) LDAP Authentication doesn't check whether a user belongs to a group correctly

2018-10-05 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639360#comment-16639360
 ] 

Iuliia Volkova commented on AIRFLOW-3020:
-

[~zeninpalm], do you plan to reopen the pull request?

> LDAP Authentication doesn't check whether a user belongs to a group correctly
> -
>
> Key: AIRFLOW-3020
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3020
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Yi Wei
>Assignee: Yi Wei
>Priority: Major
>
> According to Airflow documentation at 
> [https://airflow.apache.org/security.html#ldap,] to enable LDAP 
> authentication, we should write airflow.cfg like this:
> [ldap]
> uri = ldap://XXX.YYY.org
> user_filter = objectClass=*
> user_name_attr = sAMAccountName
> superuser_filter = CN=XXX_Programmers
> bind_user = user_on_ldap
> bind_password = insecure
> basedn =OU=Some,DC=other,DC=org
> search_scope = SUBTREE
>  
> But after enabling LDAP authentication, I just cannot log in with a superuser 
> role. I double-checked my membership to the superuser groups and confirmed I 
> belong to the specified group in 'superuser_filter', still Airflow won't 
> recognize me as a superuser.
> So, I checked airflow/contrib/auth/backends/ldap_auth.py, the 
> group_contains_user function doesn't work as I expected:
>  
> This line:
> conn.search(native(search_base), native(search_filter), 
> attributes=[native(user_name_attr)])
> it searches the group and extracts the sAMAccountName attribute of the group, 
> then:
>  for entry in conn.entries:
>   if user_name in getattr(entry, user_name_attr).values:
>      return True
> the code snippet will never return True, because how can user_name occur in 
> group_name anyway? 
> Not sure if this issue only occurs in my company, please correct me if you 
> have any suggestion.
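
A sketch of the membership test the reporter seems to expect (ldap3-style API, 
hypothetical variable names): look the group up by its DN and check whether 
the user's DN appears among its {{member}} values.

{code:python}
def group_contains_user(conn, group_dn, user_dn):
    # `conn` is an ldap3 Connection, as used by ldap_auth.py.
    conn.search(group_dn, "(objectClass=*)", attributes=["member"])
    return any(user_dn in entry.member.values for entry in conn.entries)
{code}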



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3107) Create Dynamic External Task Sensor to handle non exact timeframes

2018-10-05 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639358#comment-16639358
 ] 

Iuliia Volkova commented on AIRFLOW-3107:
-

[~bdesmet], do I understand correctly that you want to add this code to the 
apache-airflow master branch, or to discuss whether it should be added to 
master? If so, the best way is to open a PR and mention somebody from the 
contributors team in this task. 
But I'm not sure it makes sense to do this as a separate sensor; to me, a 
better way might be to add a date_range param to the existing 
ExternalTaskSensor instead of a separate TimeRangeExternalTaskSensor, and use 
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.date_range.html 
for it, for example. That's not a problem because Airflow already depends on 
pandas as an install requirement. I'm not sure a separate sensor is needed here. 
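
A minimal sketch of that date_range idea (illustrative only):

{code:python}
import pandas as pd

# Candidate execution dates for the last 30 days that a range-based
# sensor could poke against.
candidate_dates = pd.date_range(end=pd.Timestamp.utcnow(), periods=30, freq="D")
{code}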

[~ashb], [~Fokko], hi guys! Maybe you can take a look at this sensor and decide 
whether it needs to be added or not?

> Create Dynamic External Task Sensor to handle non exact timeframes
> --
>
> Key: AIRFLOW-3107
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3107
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 2.0.0
>Reporter: Bert Desmet
>Priority: Major
>  Labels: dependencies, dynamic_dependencies, sensors
> Fix For: 2.0.0
>
>
> All, 
> For a project I'm working on it is necessary to check if a specific task has 
> run somewhere between 30 days ago and now. 
> To facilitate this I have patched airflow to include the 
> 'TimeRangeExternalTaskSensor'  as provided by 
> [omnilinguist|https://github.com/omnilinguist] in the 
> following pull request: 
> [https://github.com/apache/incubator-airflow/pull/1641]
> I have updated his code so it aligns better with how the ExternalTaskSensor 
> has been implemented. 
> Currently we are running this patch in production on a 1.9 version of 
> Airflow. I have now merged this with the master branch - but this code has 
> not yet fully been tested. 
>  
> The code can be found here:  
> [https://github.com/biertie/incubator-airflow/blob/master/airflow/sensors/time_range_external_task_sensor.py]
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2933) Enable Codecov on Docker CI

2018-10-02 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635337#comment-16635337
 ] 

Iuliia Volkova commented on AIRFLOW-2933:
-

[~Fokko], the issue is still open, but the PR is already merged.

> Enable Codecov on Docker CI
> ---
>
> Key: AIRFLOW-2933
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2933
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Fokko Driesprong
>Priority: Major
>
> Right now the Codecov plugin is not working on the docker-ci that we're using 
> right now. This has to be fixed so we can track code coverage over time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3123) Allow nested use of DAG as a context manager

2018-10-02 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635103#comment-16635103
 ] 

Iuliia Volkova commented on AIRFLOW-3123:
-

[~newtonle], [~ashb], please don't forget to close the issue; the PR was merged. 

> Allow nested use of  DAG as a context manager
> -
>
> Key: AIRFLOW-3123
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3123
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG
>Reporter: Newton Le
>Assignee: Newton Le
>Priority: Major
>
> DAG context manager fails under some cases with nested contexts:
> {code:python}
> with DAG( ... ) as dag:
>   op1 = Operator()
>   with dag:
>     op2 = Operator()
>   op3 = Operator()
> {code}
> op3 will not continue to be assigned the original DAG after exiting the 
> nested context.
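
The merged fix makes the context manager nest correctly; a simplified sketch 
of stack-based enter/exit semantics (assumed shape, not the literal patch):

{code:python}
class DAG(object):
    _context_stack = []  # simplified stand-in for Airflow's internal state

    def __enter__(self):
        DAG._context_stack.append(self)
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Popping restores the previous (outer) DAG, so `op3` in the
        # example above is assigned to the original DAG again.
        DAG._context_stack.pop()
{code}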



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3124) Broken webserver debug mode (RBAC)

2018-10-02 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635101#comment-16635101
 ] 

Iuliia Volkova commented on AIRFLOW-3124:
-

The PR was merged. Could we close the issue? [~ajkosel] [~TaoFeng]

> Broken webserver debug mode (RBAC)
> --
>
> Key: AIRFLOW-3124
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3124
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webapp, webserver
>Reporter: Aaron Kosel
>Assignee: Aaron Kosel
>Priority: Minor
>
> {code:java}
> Traceback (most recent call last):
> File "/usr/local/bin/airflow", line 7, in 
> exec(compile(f.read(), __file__, 'exec'))
> File "/airflow/airflow/bin/airflow", line 32, in 
> args.func(args)
> File "/airflow/airflow/utils/cli.py", line 74, in wrapper
> return f(*args, **kwargs)
> File "/airflow/airflow/bin/cli.py", line 875, in webserver
> app.run(debug=True, port=args.port, host=args.hostname,
> AttributeError: 'tuple' object has no attribute 'run'
> {code}
> Nearly the same issue as https://issues.apache.org/jira/browse/AIRFLOW-2204, 
> but only affecting RBAC debug mode. The problem is that `create_app` returns 
> a tuple, but the `cli` script expects to just receive the flask app back 
> without the appbuilder.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3125) Add monitoring on Task Instance creation rate

2018-10-02 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635100#comment-16635100
 ] 

Iuliia Volkova commented on AIRFLOW-3125:
-

[~xiamingye], don't forget to close the JIRA issue :) your PR is already merged!
[~ashb]

> Add monitoring on Task Instance creation rate
> -
>
> Key: AIRFLOW-3125
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3125
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Mingye Xia
>Assignee: Mingye Xia
>Priority: Major
>
> Monitoring the Task Instance creation rate can give us some visibility into how 
> much workload we are putting on Airflow. It can be used for resource 
> allocation in the long run (i.e. to determine when we should scale up 
> workers) and for debugging in scenarios where the creation rate for certain 
> types of Task Instances spikes.
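
For illustration, such a counter is a one-liner at the task-instance creation 
site (the metric name is hypothetical; the import location is where Stats 
lives in 1.10.x):

{code:python}
from airflow.settings import Stats

# Bump a counter whenever a TaskInstance is created, keyed by the
# operator type; `task` stands for the operator instance in scope.
Stats.incr("task_instance_created-{}".format(task.__class__.__name__))
{code}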



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3129) Improve test coverage

2018-10-02 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635099#comment-16635099
 ] 

Iuliia Volkova commented on AIRFLOW-3129:
-

[~ashb], maybe it makes sense to split such a big task into subtasks, or to 
create an epic for it? Different developers could then pick up the subtasks or 
the tasks in the epic. Split it into blocks that need to be covered with 
tests. Just a proposal. 

> Improve test coverage
> -
>
> Key: AIRFLOW-3129
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3129
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Josh Carp
>Priority: Minor
>
> Overall test coverage is about 75%. It would be great to improve coverage. 
> I'll start by backfilling some missing tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3036) Upgrading to Airflow 1.10 not possible using GCP Cloud SQL for MYSQL

2018-09-26 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629341#comment-16629341
 ] 

Iuliia Volkova commented on AIRFLOW-3036:
-

[~smith-m], please set the task to unassigned.

> Upgrading to Airflow 1.10 not possible using GCP Cloud SQL for MYSQL
> 
>
> Key: AIRFLOW-3036
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3036
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core, db
>Affects Versions: 1.10.0
> Environment: Google Cloud Platform, Google Kubernetes Engine, Airflow 
> 1.10 on Debian Stretch, Google Cloud SQL MySQL
>Reporter: Smith Mathieu
>Assignee: Iuliia Volkova
>Priority: Blocker
>  Labels: 1.10, google, google-cloud-sql
> Fix For: 2.0.0
>
>
> The upgrade path to airflow 1.10 seems impossible for users of MySQL in 
> Google's Cloud SQL service given new mysql requirements for 1.10.
>  
> When executing "airflow upgradedb"
> ```
>  INFO [alembic.runtime.migration] Running upgrade d2ae31099d61 -> 
> 0e2a74e0fc9f, Add time zone awareness
>  Traceback (most recent call last):
>  File "/usr/local/bin/airflow", line 32, in 
>  args.func(args)
>  File "/usr/local/lib/python3.6/site-packages/airflow/bin/cli.py", line 1002, 
> in initdb
>  db_utils.initdb(settings.RBAC)
>  File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 92, 
> in initdb
>  upgradedb()
>  File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 346, 
> in upgradedb
>  command.upgrade(config, 'heads')
>  File "/usr/local/lib/python3.6/site-packages/alembic/command.py", line 174, 
> in upgrade
>  script.run_env()
>  File "/usr/local/lib/python3.6/site-packages/alembic/script/base.py", line 
> 416, in run_env
>  util.load_python_file(self.dir, 'env.py')
>  File "/usr/local/lib/python3.6/site-packages/alembic/util/pyfiles.py", line 
> 93, in load_python_file
>  module = load_module_py(module_id, path)
>  File "/usr/local/lib/python3.6/site-packages/alembic/util/compat.py", line 
> 68, in load_module_py
>  module_id, path).load_module(module_id)
>  File "", line 399, in 
> _check_name_wrapper
>  File "", line 823, in load_module
>  File "", line 682, in load_module
>  File "", line 265, in _load_module_shim
>  File "", line 684, in _load
>  File "", line 665, in _load_unlocked
>  File "", line 678, in exec_module
>  File "", line 219, in _call_with_frames_removed
>  File "/usr/local/lib/python3.6/site-packages/airflow/migrations/env.py", 
> line 91, in <module>
>  run_migrations_online()
>  File "/usr/local/lib/python3.6/site-packages/airflow/migrations/env.py", 
> line 86, in run_migrations_online
>  context.run_migrations()
>  File "", line 8, in run_migrations
>  File 
> "/usr/local/lib/python3.6/site-packages/alembic/runtime/environment.py", line 
> 807, in run_migrations
>  self.get_context().run_migrations(**kw)
>  File "/usr/local/lib/python3.6/site-packages/alembic/runtime/migration.py", 
> line 321, in run_migrations
>  step.migration_fn(**kw)
>  File 
> "/usr/local/lib/python3.6/site-packages/airflow/migrations/versions/0e2a74e0fc9f_add_time_zone_awareness.py",
>  line 46, in upgrade
>  raise Exception("Global variable explicit_defaults_for_timestamp needs to be 
> on (1) for mysql")
>  Exception: Global variable explicit_defaults_for_timestamp needs to be on 
> (1) for mysql
>  ```
>   
> Reading documentation for upgrading to airflow 1.10, it seems the requirement 
> for explicit_defaults_for_timestamp=1 was intentional. 
>  
> However,  MySQL on Google Cloud SQL does not support configuring this 
> variable and it is off by default. Users of MySQL and Cloud SQL do not have 
> an upgrade path to 1.10. Alas, so close to the mythical Kubernetes Executor.
> In GCP, Cloud SQL is _the_ hosted MySQL solution. 
> [https://cloud.google.com/sql/docs/mysql/flags]
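For anyone triaging this on a self-managed MySQL, a minimal sketch for checking the flag before running `airflow upgradedb` (assumes SQLAlchemy; the connection URI is a placeholder). On Cloud SQL this will simply confirm the blocker, since the flag cannot be changed there:

```python
# Minimal sketch: verify explicit_defaults_for_timestamp ahead of the
# 0e2a74e0fc9f "Add time zone awareness" migration. The URI is hypothetical.
from sqlalchemy import create_engine

engine = create_engine("mysql://user:password@host/airflow")
with engine.connect() as conn:
    value = conn.execute(
        "SELECT @@global.explicit_defaults_for_timestamp"
    ).scalar()

if int(value) != 1:
    raise SystemExit(
        "explicit_defaults_for_timestamp is off; the 1.10 timezone "
        "migration will raise on this server"
    )
```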



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3036) Upgrading to Airflow 1.10 not possible using GCP Cloud SQL for MYSQL

2018-09-26 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629339#comment-16629339
 ] 

Iuliia Volkova commented on AIRFLOW-3036:
-

[~smith-m], I'm not sure that anybody could resolve it without Bolke
[~bolke]

> Upgrading to Airflow 1.10 not possible using GCP Cloud SQL for MYSQL
> 
>
> Key: AIRFLOW-3036
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3036
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core, db
>Affects Versions: 1.10.0
> Environment: Google Cloud Platform, Google Kubernetes Engine, Airflow 
> 1.10 on Debian Stretch, Google Cloud SQL MySQL
>Reporter: Smith Mathieu
>Assignee: Iuliia Volkova
>Priority: Blocker
>  Labels: 1.10, google, google-cloud-sql
> Fix For: 2.0.0
>
>
> The upgrade path to airflow 1.10 seems impossible for users of MySQL in 
> Google's Cloud SQL service given new mysql requirements for 1.10.
>  
> When executing "airflow upgradedb"
> ```
>  INFO [alembic.runtime.migration] Running upgrade d2ae31099d61 -> 
> 0e2a74e0fc9f, Add time zone awareness
>  Traceback (most recent call last):
>  File "/usr/local/bin/airflow", line 32, in 
>  args.func(args)
>  File "/usr/local/lib/python3.6/site-packages/airflow/bin/cli.py", line 1002, 
> in initdb
>  db_utils.initdb(settings.RBAC)
>  File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 92, 
> in initdb
>  upgradedb()
>  File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 346, 
> in upgradedb
>  command.upgrade(config, 'heads')
>  File "/usr/local/lib/python3.6/site-packages/alembic/command.py", line 174, 
> in upgrade
>  script.run_env()
>  File "/usr/local/lib/python3.6/site-packages/alembic/script/base.py", line 
> 416, in run_env
>  util.load_python_file(self.dir, 'env.py')
>  File "/usr/local/lib/python3.6/site-packages/alembic/util/pyfiles.py", line 
> 93, in load_python_file
>  module = load_module_py(module_id, path)
>  File "/usr/local/lib/python3.6/site-packages/alembic/util/compat.py", line 
> 68, in load_module_py
>  module_id, path).load_module(module_id)
>  File "", line 399, in 
> _check_name_wrapper
>  File "", line 823, in load_module
>  File "", line 682, in load_module
>  File "", line 265, in _load_module_shim
>  File "", line 684, in _load
>  File "", line 665, in _load_unlocked
>  File "", line 678, in exec_module
>  File "", line 219, in _call_with_frames_removed
>  File "/usr/local/lib/python3.6/site-packages/airflow/migrations/env.py", 
> line 91, in <module>
>  run_migrations_online()
>  File "/usr/local/lib/python3.6/site-packages/airflow/migrations/env.py", 
> line 86, in run_migrations_online
>  context.run_migrations()
>  File "", line 8, in run_migrations
>  File 
> "/usr/local/lib/python3.6/site-packages/alembic/runtime/environment.py", line 
> 807, in run_migrations
>  self.get_context().run_migrations(**kw)
>  File "/usr/local/lib/python3.6/site-packages/alembic/runtime/migration.py", 
> line 321, in run_migrations
>  step.migration_fn(**kw)
>  File 
> "/usr/local/lib/python3.6/site-packages/airflow/migrations/versions/0e2a74e0fc9f_add_time_zone_awareness.py",
>  line 46, in upgrade
>  raise Exception("Global variable explicit_defaults_for_timestamp needs to be 
> on (1) for mysql")
>  Exception: Global variable explicit_defaults_for_timestamp needs to be on 
> (1) for mysql
>  ```
>   
> Reading documentation for upgrading to airflow 1.10, it seems the requirement 
> for explicit_defaults_for_timestamp=1 was intentional. 
>  
> However,  MySQL on Google Cloud SQL does not support configuring this 
> variable and it is off by default. Users of MySQL and Cloud SQL do not have 
> an upgrade path to 1.10. Alas, so close to the mythical Kubernetes Executor.
> In GCP, Cloud SQL is _the_ hosted MySQL solution. 
> [https://cloud.google.com/sql/docs/mysql/flags]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2961) Speed up test_backfill_examples test

2018-09-21 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624112#comment-16624112
 ] 

Iuliia Volkova commented on AIRFLOW-2961:
-

[~Fokko], [~ashb], it's already merged, please close the task

> Speed up test_backfill_examples test
> 
>
> Key: AIRFLOW-2961
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2961
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Fokko Driesprong
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3003) Pull the krb5 image instead of building it

2018-09-21 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624108#comment-16624108
 ] 

Iuliia Volkova commented on AIRFLOW-3003:
-

[~Fokko], [~ashb], it's already merged, please close the task


> Pull the krb5 image instead of building it
> --
>
> Key: AIRFLOW-3003
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3003
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
> Fix For: 2.0.0
>
>
> For the CI we use a krb5 image to test kerberos functionality. Building it 
> every time is not something we want, since it is faster to pull the finished 
> image.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3060) DAG context manager fails to exit properly in certain circumstances

2018-09-21 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624100#comment-16624100
 ] 

Iuliia Volkova commented on AIRFLOW-3060:
-

[~newtonle], [~ashb], it's already merged, please close the task

> DAG context manager fails to exit properly in certain circumstances
> ---
>
> Key: AIRFLOW-3060
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3060
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG
>Reporter: Newton Le
>Assignee: Newton Le
>Priority: Major
>
> In certain circumstances, such as in more complex DAGs where users utilize 
> helper functions to add tasks, it may be possible to get into a condition 
> where effectively there is a nested DAG context using the same DAG. When this 
> happens, exiting both contexts does not reset `_CONTEXT_MANAGER_DAG`.
> This is especially problematic because the problem is seen in a later DAG, 
> and the source of the error is not apparent.
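A rough reproduction of the failure mode, as far as the description suggests (the helper function and operator choice are illustrative, not taken from the actual report):

```python
# Illustrative sketch of the nesting pattern described above; the helper
# name and DummyOperator usage are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

dag = DAG("example", start_date=datetime(2018, 1, 1))

def add_tasks(dag):
    # Helper re-enters the context manager of the *same* DAG object.
    with dag:
        DummyOperator(task_id="inner")

with dag:
    add_tasks(dag)  # nested `with dag:` on the same DAG
    DummyOperator(task_id="outer")

# After both contexts exit, the module-level _CONTEXT_MANAGER_DAG can still
# point at `dag`, so a task defined later without an explicit dag= argument
# silently attaches to it -- the "error shows up in a later DAG" symptom.
```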



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3073) A note is needed in 'Data Profiling' doc page to reminder users it's no longer supported in new webserver UI

2018-09-21 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624099#comment-16624099
 ] 

Iuliia Volkova commented on AIRFLOW-3073:
-

[~XD-DENG], please close the task, it's already merged.
Or [~ashb]

> A note is needed in 'Data Profiling' doc page to reminder users it's no 
> longer supported in new webserver UI
> 
>
> Key: AIRFLOW-3073
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3073
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
>
> In [https://airflow.incubator.apache.org/profiling.html] it's not mentioned 
> at all that these features are no longer supported in the new webserver 
> (FAB-based) due to security concerns 
> (https://github.com/apache/incubator-airflow/blob/master/UPDATING.md#breaking-changes).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2885) A Bug in www_rbac.utils.get_params

2018-09-21 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624074#comment-16624074
 ] 

Iuliia Volkova commented on AIRFLOW-2885:
-

[~XD-DENG], why did you close the PR? Will you keep working on this task?

> A Bug in www_rbac.utils.get_params
> --
>
> Key: AIRFLOW-2885
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2885
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Critical
>
> *get_params(page=0, search="abc", showPaused=False)* returns 
> "_search=abc&showPaused=False_", while it's supposed to return 
> "page=0&search=abc&showPaused=False".
> This is because Python treats 0 as False when it's used in a conditional 
> statement.
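A minimal sketch of the root cause and the usual fix (the simplified builder below is only an illustration, not the actual www_rbac.utils code):

```python
# Illustrative only: the real get_params is more involved, but the root
# cause is the same truthiness check.
def build_query(page=None, search=None, showPaused=None):
    params = []
    if page:  # BUG: 0 is falsy, so page=0 is silently dropped
        params.append("page={}".format(page))
    if search:
        params.append("search={}".format(search))
    if showPaused is not None:
        params.append("showPaused={}".format(showPaused))
    return "&".join(params)

print(build_query(page=0, search="abc", showPaused=False))
# -> 'search=abc&showPaused=False'  (page=0 is lost)

# Fix: test against None rather than truthiness:
#     if page is not None:
#         params.append("page={}".format(page))
```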



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2874) Enable Flask App Builder theme support

2018-09-21 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623409#comment-16623409
 ] 

Iuliia Volkova commented on AIRFLOW-2874:
-

[~verdan], can you close the task, as the PR was already merged?
Or [~ashb] :)

> Enable Flask App Builder theme support
> --
>
> Key: AIRFLOW-2874
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2874
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Verdan Mahmood
>Assignee: Verdan Mahmood
>Priority: Major
>
> To customize the look and feel of Apache Airflow (an effort towards making 
> Airflow a whitelabel application), we should enable the support of FAB's 
> theme, which can be set in configuration. 
> The theme can be used in conjunction with the existing `navbar_color` 
> configuration, or separately, by simply unsetting the navbar_color config. 
>  
> http://flask-appbuilder.readthedocs.io/en/latest/customizing.html#changing-themes
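For reference, enabling a theme in the FAB-based webserver could look roughly like this in `webserver_config.py` (the `APP_THEME` key comes from the FAB customization docs linked above; the chosen theme name is only an example):

```python
# webserver_config.py -- minimal sketch, not the final implementation.
# APP_THEME is Flask-AppBuilder's theme setting; "slate.css" is only an
# example of a bundled theme stylesheet.
APP_THEME = "slate.css"

# Presumably navbar_color in airflow.cfg would be left unset when a theme
# is active, so the two styling mechanisms don't conflict.
```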



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2829) Brush up the CI script for minikube

2018-09-21 Thread Iuliia Volkova (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623405#comment-16623405
 ] 

Iuliia Volkova commented on AIRFLOW-2829:
-

[~sekikn], could you close the ticket, as the PR was already merged? Or [~ashb]

> Brush up the CI script for minikube
> ---
>
> Key: AIRFLOW-2829
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2829
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ci
>Reporter: Kengo Seki
>Assignee: Kengo Seki
>Priority: Major
>
> Ran {{scripts/ci/kubernetes/minikube/start_minikube.sh}} locally and found 
> some points that can be improved:
> - minikube version is hard-coded
> - Defined but unused variables: {{$_HELM_VERSION}}, {{$_VM_DRIVER}}
> - Undefined variables: {{$unameOut}}
> - The following lines cause warnings if download is skipped:
> {code}
>  69 sudo mv bin/minikube /usr/local/bin/minikube
>  70 sudo mv bin/kubectl /usr/local/bin/kubectl
> {code}
> - {{return}}s at lines 81 and 96 won't work since they're outside of a function
> - To run this script as a non-root user, {{-E}} is required for {{sudo}}. See 
> https://github.com/kubernetes/minikube/issues/1883.
> {code}
> 105 _MINIKUBE="sudo PATH=$PATH minikube"
> 106 
> 107 $_MINIKUBE config set bootstrapper localkube
> 108 $_MINIKUBE start --kubernetes-version=${_KUBERNETES_VERSION}  
> --vm-driver=none
> 109 $_MINIKUBE update-context
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

