[jira] [Updated] (AIRFLOW-2298) Add Kalibrr to who uses Airflow

2018-04-07 Thread Charles Verdad (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Verdad updated AIRFLOW-2298:

Description: https://github.com/apache/incubator-airflow/pull/3194

> Add Kalibrr to who uses Airflow
> ---
>
> Key: AIRFLOW-2298
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2298
> Project: Apache Airflow
> Issue Type: Task
> Reporter: Charles Verdad
> Assignee: Charles Verdad
> Priority: Trivial
>
> https://github.com/apache/incubator-airflow/pull/3194



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2298) Add Kalibrr to who uses Airflow

2018-04-07 Thread Charles Verdad (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Verdad updated AIRFLOW-2298:

External issue URL: https://github.com/apache/incubator-airflow/pull/3194

> Add Kalibrr to who uses Airflow
> ---
>
> Key: AIRFLOW-2298
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2298
> Project: Apache Airflow
> Issue Type: Task
> Reporter: Charles Verdad
> Assignee: Charles Verdad
> Priority: Trivial
>
> https://github.com/apache/incubator-airflow/pull/3194





[jira] [Created] (AIRFLOW-2298) Add Kalibrr to who uses Airflow

2018-04-07 Thread Charles Verdad (JIRA)
Charles Verdad created AIRFLOW-2298:
---

 Summary: Add Kalibrr to who uses Airflow
 Key: AIRFLOW-2298
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2298
 Project: Apache Airflow
 Issue Type: Task
 Reporter: Charles Verdad
 Assignee: Charles Verdad








[jira] [Commented] (AIRFLOW-2128) 'Tall' DAGs scale worse than 'wide' DAGs

2018-04-07 Thread Chris Bandy (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429355#comment-16429355
 ] 

Chris Bandy commented on AIRFLOW-2128:
--

[~szmate1618] what is your {{scheduler.min_file_process_interval}} (or 
{{AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL}} environment) set to?
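
For readers unfamiliar with the two spellings in the question above: Airflow lets a configuration option be overridden by an environment variable named AIRFLOW__{SECTION}__{KEY}, with double underscores around the section. The sketch below only illustrates that naming rule; the helper names airflow_env_var and lookup_override are hypothetical and are not part of Airflow's API.

```python
import os


def airflow_env_var(section, key):
    """Build the AIRFLOW__{SECTION}__{KEY} name for a config option (hypothetical helper)."""
    return "AIRFLOW__{}__{}".format(section.upper(), key.upper())


def lookup_override(section, key, environ=None):
    """Return the environment override for section.key, or None if it is unset."""
    environ = os.environ if environ is None else environ
    return environ.get(airflow_env_var(section, key))


# The option asked about in the comment above.
name = airflow_env_var("scheduler", "min_file_process_interval")
print(name)
print(lookup_override("scheduler", "min_file_process_interval"))
```

If the second line printed is None, no environment override is set and the value would come from the [scheduler] section of the configuration file instead.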

> 'Tall' DAGs scale worse than 'wide' DAGs
> 
>
> Key: AIRFLOW-2128
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2128
> Project: Apache Airflow
> Issue Type: Bug
> Components: DAG, DagRun, scheduler
> Affects Versions: 1.9.0
> Reporter: Máté Szabó
> Priority: Major
> Labels: performance, usability
> Attachments: tall_dag.py, wide_dag.py
>
>
> Tall DAG = a DAG with long chains of dependencies, e.g.: 0 -> 1 -> 2 -> ...
> -> 998 -> 999
> Wide DAG = a DAG with many short, parallel dependencies, e.g.: 0 -> 1; 0 -> 2;
> ... 0 -> 999
> Take a super simple case where both graphs are of 1000 tasks, and all the
> tasks are just "sleep 0.03" bash commands (see the attached files).
> With the default SequentialExecutor (without parallelism), I would expect my
> 2 example DAGs to take (approximately) the same time to run, but apparently
> this is not the case.
> For the wide DAG it was about 80 successfully executed tasks in 10 minutes,
> for the tall one it was 0.
> This anomaly also seems to affect the web UI. Opening up the graph view or the
> tree view for the wide DAG takes about 6 seconds on my machine, but for the
> tall one it takes significantly longer; in fact, currently it does not load at
> all.
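
To make the two shapes in the description concrete, here is a minimal sketch in plain Python. It is not the attached tall_dag.py or wide_dag.py (those are not reproduced in this message) and it does not import Airflow; the function names tall_edges, wide_edges and longest_chain are hypothetical. It only shows that both graphs have the same number of tasks and dependency edges while their longest dependency chains differ enormously.

```python
def tall_edges(n):
    """Edges of a single chain: 0 -> 1 -> 2 -> ... -> n-1."""
    return [(i, i + 1) for i in range(n - 1)]


def wide_edges(n):
    """Edges of a fan-out: 0 -> 1; 0 -> 2; ... 0 -> n-1."""
    return [(0, i) for i in range(1, n)]


def longest_chain(n, edges):
    """Count the tasks on the longest dependency path.

    Assumes every edge goes from a lower task id to a higher one,
    which holds for both generators above.
    """
    depth = [1] * n
    # Sorting by upstream id means a task's depth is final before it is used.
    for upstream, downstream in sorted(edges):
        depth[downstream] = max(depth[downstream], depth[upstream] + 1)
    return max(depth)


n = 1000
print(len(tall_edges(n)), longest_chain(n, tall_edges(n)))
print(len(wide_edges(n)), longest_chain(n, wide_edges(n)))
```

Both graphs have 999 edges, but the tall one has a chain of 1000 tasks and the wide one a chain of 2, so any per-task work that walks upstream dependencies would behave very differently on the two.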





[jira] [Comment Edited] (AIRFLOW-2128) 'Tall' DAGs scale worse than 'wide' DAGs

2018-04-07 Thread Chris Bandy (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429355#comment-16429355
 ] 

Chris Bandy edited comment on AIRFLOW-2128 at 4/7/18 1:21 PM:
--

[~szmate1618] what is your {{scheduler.min_file_process_interval}} (or 
{{AIRFLOW_\_SCHEDULER__MIN_FILE_PROCESS_INTERVAL}} environment) set to?


was (Author: cbandy):
[~szmate1618] what is your {{scheduler.min_file_process_interval}} (or 
{{AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL}} environment) set to?

> 'Tall' DAGs scale worse than 'wide' DAGs
> 
>
> Key: AIRFLOW-2128
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2128
> Project: Apache Airflow
> Issue Type: Bug
> Components: DAG, DagRun, scheduler
> Affects Versions: 1.9.0
> Reporter: Máté Szabó
> Priority: Major
> Labels: performance, usability
> Attachments: tall_dag.py, wide_dag.py
>
>
> Tall DAG = a DAG with long chains of dependencies, e.g.: 0 -> 1 -> 2 -> ...
> -> 998 -> 999
> Wide DAG = a DAG with many short, parallel dependencies, e.g.: 0 -> 1; 0 -> 2;
> ... 0 -> 999
> Take a super simple case where both graphs are of 1000 tasks, and all the
> tasks are just "sleep 0.03" bash commands (see the attached files).
> With the default SequentialExecutor (without parallelism), I would expect my
> 2 example DAGs to take (approximately) the same time to run, but apparently
> this is not the case.
> For the wide DAG it was about 80 successfully executed tasks in 10 minutes,
> for the tall one it was 0.
> This anomaly also seems to affect the web UI. Opening up the graph view or the
> tree view for the wide DAG takes about 6 seconds on my machine, but for the
> tall one it takes significantly longer; in fact, currently it does not load at
> all.





[jira] [Work started] (AIRFLOW-2298) Add Kalibrr to who uses Airflow

2018-04-07 Thread Charles Verdad (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-2298 started by Charles Verdad.
---
> Add Kalibrr to who uses Airflow
> ---
>
> Key: AIRFLOW-2298
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2298
> Project: Apache Airflow
> Issue Type: Task
> Reporter: Charles Verdad
> Assignee: Charles Verdad
> Priority: Trivial
>






[jira] [Assigned] (AIRFLOW-2292) Fix docstring for S3Hook.get_wildcard_key

2018-04-07 Thread Kengo Seki (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kengo Seki reassigned AIRFLOW-2292:
---

Assignee: Kengo Seki

> Fix docstring for S3Hook.get_wildcard_key
> -
>
> Key: AIRFLOW-2292
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2292
> Project: Apache Airflow
> Issue Type: Bug
> Components: docs, Documentation
> Reporter: Kengo Seki
> Assignee: Kengo Seki
> Priority: Minor
>
> S3Hook.get_wildcard_key's docstring reads as follows, but the correct
> parameter name is {{wildcard_key}}, not {{regex_key}}. It's doubly misleading
> since the pattern must be specified as a glob, not a regex.
> {code:python}
> def get_wildcard_key(self, wildcard_key, bucket_name=None, delimiter=''):
>     """
>     Returns a boto3.s3.Object object matching the regular expression
>     :param regex_key: the path to the key
>     :type regex_key: str
> {code}
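
The glob-versus-regex point in the description can be illustrated with the standard library alone. The sketch below does not call S3Hook; it assumes the hook's wildcard behaves like Python's fnmatch-style globbing, and the key names and pattern are made up for illustration.

```python
import fnmatch
import re

# Hypothetical S3 keys and wildcard pattern, for illustration only.
keys = [
    "logs/2018-04-07/part-0.csv",
    "logs/2018-04-07/part-1.csv",
    "logs/readme.txt",
]
pattern = "logs/*/part-*.csv"

# Read as a glob, "*" stands for any run of characters.
glob_matches = [k for k in keys if fnmatch.fnmatchcase(k, pattern)]

# Read as a regex, "/*" means "zero or more slashes", so nothing matches.
regex_matches = [k for k in keys if re.match(pattern, k)]

print(glob_matches)
print(regex_matches)
```

The same string selects both part files as a glob and nothing as a regular expression, which is why a docstring that promises regex matching would mislead callers.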


