[jira] [Updated] (AIRFLOW-3414) reload_module not working with custom logging class

2018-11-29 Thread Kevin Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Yang updated AIRFLOW-3414:

Issue Type: Bug  (was: Improvement)

> reload_module not working with custom logging class
> ---
>
> Key: AIRFLOW-3414
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3414
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.10.2
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
>
> If a custom logging class is used, the reload_module call in dag_processing.py 
> will fail because it tries to reload the default logging class, which was 
> never loaded in the first place.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3414) reload_module not working with custom logging class

2018-11-28 Thread Kevin Yang (JIRA)
Kevin Yang created AIRFLOW-3414:
---

 Summary: reload_module not working with custom logging class
 Key: AIRFLOW-3414
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3414
 Project: Apache Airflow
  Issue Type: Improvement
Affects Versions: 1.10.2
Reporter: Kevin Yang
Assignee: Kevin Yang


If a custom logging class is used, the reload_module call in dag_processing.py 
will fail because it tries to reload the default logging class, which was never 
loaded in the first place.
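The failure mode can be sketched as follows (a minimal illustration, not the actual Airflow code; the helper name is mine): importlib.reload() requires the module to have been imported already, so the reload should be guarded.

```python
import importlib
import sys

def safe_reload(module_name):
    """Reload module_name only if it was actually imported earlier."""
    module = sys.modules.get(module_name)
    if module is None:
        # With a custom logging class, the default logging module may never
        # have been imported, so a blind importlib.reload() on it would fail.
        return None
    return importlib.reload(module)
```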



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3393) Fix bug in usage of reload_module

2018-11-24 Thread Kevin Yang (JIRA)
Kevin Yang created AIRFLOW-3393:
---

 Summary: Fix bug in usage of reload_module
 Key: AIRFLOW-3393
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3393
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Kevin Yang
Assignee: Kevin Yang


The [reload_module 
usage|https://github.com/apache/incubator-airflow/blob/master/airflow/utils/dag_processing.py#L479]
 is wrong; we need to remove the last section in the package string.
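The string manipulation being described can be sketched like this (the helper name is hypothetical): trim the last dotted section so the value refers to the containing module rather than a trailing attribute.

```python
def strip_last_section(dotted_path):
    """Drop the last section of a dotted package string,
    e.g. 'airflow.utils.log.logging_mixin' -> 'airflow.utils.log'."""
    return dotted_path.rsplit(".", 1)[0]
```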



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3392) Add index on dag_id in sla_miss table

2018-11-24 Thread Kevin Yang (JIRA)
Kevin Yang created AIRFLOW-3392:
---

 Summary: Add index on dag_id in sla_miss table
 Key: AIRFLOW-3392
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3392
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Kevin Yang
Assignee: Kevin Yang


The select queries on the sla_miss table produce a large share of DB traffic 
and thus make the DB CPU usage unnecessarily high. Adding an index would be 
low-hanging fruit to reduce the load.
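The effect can be sketched with sqlite3 (the real change would be an Alembic migration; the column list here is trimmed for illustration): with an index on dag_id, the per-DAG SELECTs become index searches instead of full table scans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sla_miss (task_id TEXT, dag_id TEXT, execution_date TEXT)"
)
# The proposed index on dag_id.
conn.execute("CREATE INDEX idx_sla_miss_dag_id ON sla_miss (dag_id)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM sla_miss WHERE dag_id = 'example_dag'"
).fetchone()
# The plan's detail column now mentions the index instead of a full scan.
```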



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3194) Refactor session creation to use with block

2018-10-12 Thread Kevin Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Yang updated AIRFLOW-3194:

Description: There are a lot of usages of session = settings.Session() in the 
code base, and it would be nice to refactor them all to use a with 
create_session() as session block.  (was: There are a lot usage of session = 
settings.Session() in the code base and would be nice to refactor them all to 
use with settings.Session() as session block.)

> Refactor session creation to use with block
> ---
>
> Key: AIRFLOW-3194
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3194
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kevin Yang
>Priority: Minor
>
> There are a lot of usages of session = settings.Session() in the code base, 
> and it would be nice to refactor them all to use a with create_session() as 
> session block.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3194) Refactor session creation to use with block

2018-10-12 Thread Kevin Yang (JIRA)
Kevin Yang created AIRFLOW-3194:
---

 Summary: Refactor session creation to use with block
 Key: AIRFLOW-3194
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3194
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Kevin Yang


There are a lot of usages of session = settings.Session() in the code base, and 
it would be nice to refactor them all to use a with settings.Session() as 
session block.
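The proposed pattern can be sketched as follows (the Session class here is a stand-in for settings.Session so the example is self-contained): a context manager that commits on success, rolls back on failure, and always closes.

```python
from contextlib import contextmanager

class Session:
    """Stand-in for settings.Session(), just to keep the sketch runnable."""
    def __init__(self):
        self.committed = False
        self.closed = False
    def commit(self):
        self.committed = True
    def rollback(self):
        self.committed = False
    def close(self):
        self.closed = True

@contextmanager
def create_session():
    # Replaces the bare `session = settings.Session()` pattern.
    session = Session()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()

# Usage: manual commit/close bookkeeping becomes a single block.
with create_session() as session:
    pass  # run queries here
```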



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-1268) Celery bug can cause tasks to be delayed indefinitely

2018-10-05 Thread Kevin Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640403#comment-16640403
 ] 

Kevin Yang commented on AIRFLOW-1268:
-

[~lbodeen] Thank you! That's great context. I do agree with you if that's the 
case.

[~saguziel] While this issue is no longer valid, I still think we need some 
sort of requeue, at least as an option, to make Airflow more robust to 
surprises on the Celery side. Want to create a new issue for that?

> Celery bug can cause tasks to be delayed indefinitely
> -
>
> Key: AIRFLOW-1268
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1268
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: celery
> Environment: With celery_executor with redis
>Reporter: Alex Guziel
>Priority: Critical
>
> With celery, tasks can get delayed indefinitely (or default 1 hour) due to a 
> bug with celery, see https://github.com/celery/celery/issues/3765





[jira] [Commented] (AIRFLOW-1268) Celery bug can cause tasks to be delayed indefinitely

2018-10-05 Thread Kevin Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640385#comment-16640385
 ] 

Kevin Yang commented on AIRFLOW-1268:
-

Hi [~lbodeen], did you go ahead and verify it in 4.2? From the original Celery 
issue it seems it was supposed to be verified in 4.2, but that hasn't happened 
yet.

> Celery bug can cause tasks to be delayed indefinitely
> -
>
> Key: AIRFLOW-1268
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1268
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: celery
> Environment: With celery_executor with redis
>Reporter: Alex Guziel
>Priority: Critical
>
> With celery, tasks can get delayed indefinitely (or default 1 hour) due to a 
> bug with celery, see https://github.com/celery/celery/issues/3765





[jira] [Commented] (AIRFLOW-2761) Parallelize Celery Executor enqueuing

2018-10-05 Thread Kevin Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639409#comment-16639409
 ] 

Kevin Yang commented on AIRFLOW-2761:
-

[~xnuinside] Hi Luliia, I actually have [an open 
PR|https://github.com/KevinYang21/incubator-airflow/pull/4] for it, but it is 
currently blocked by [this 
PR|https://github.com/apache/incubator-airflow/pull/3873]. The committers seem 
quite busy, so I am not sure when I can get unblocked.

> Parallelize Celery Executor enqueuing
> -
>
> Key: AIRFLOW-2761
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2761
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kevin Yang
>Priority: Major
>
> Currently the Celery executor enqueues tasks in an async fashion but still 
> does so in a single-process loop. This can slow down the scheduler loop and 
> create scheduling delay if we have a large number of tasks to schedule in a 
> short time, e.g. at UTC midnight when we need to schedule many sensors in a 
> short period.





[jira] [Commented] (AIRFLOW-2442) Airflow run command leaves database connections open

2018-09-24 Thread Kevin Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626515#comment-16626515
 ] 

Kevin Yang commented on AIRFLOW-2442:
-

I suppose this issue is resolved. Let's resolve the ticket.

> Airflow run command leaves database connections open
> 
>
> Key: AIRFLOW-2442
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2442
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli
>Affects Versions: 1.8.0
>Reporter: Alejandro Fernandez
>Assignee: Alejandro Fernandez
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: connection_duration_1_hour.png, db_connections.png, 
> fixed_before_and_after.jpg, monthly_db_connections.png, running_tasks.png
>
>
> *Summary*
> The "airflow run" command creates a connection to the database and leaves it 
> open (until killed by SQLALchemy later). The number of these connections can 
> skyrocket whenever hundreds/thousands of tasks are launched simultaneously, 
> and potentially hit the database connection limit.
> The problem is that in cli.py, the run() method first calls 
> {code:java}
> settings.configure_orm(disable_connection_pool=True){code}
> correctly to use a NullPool, but then parses any custom configs and again calls
> {code:java}
> settings.configure_orm(){code}
> thereby overriding the desired behavior by instead using a QueuePool.
> The QueuePool uses the default configs for SQL_ALCHEMY_POOL_SIZE and 
> SQL_ALCHEMY_POOL_RECYCLE. This means that while the task is running and the 
> executor is sending heartbeats, the sleeping connection is idle until it is 
> killed by SQLAlchemy.
> This fixes a bug introduced by 
> [https://github.com/apache/incubator-airflow/pull/1934] in 
> [https://github.com/apache/incubator-airflow/pull/1934/commits/b380013634b02bb4c1b9d1cc587ccd12383820b6#diff-1c2404a3a60f829127232842250ff406R344]
>   
> which is present in branches 1-8-stable, 1-9-stable, and 1-10-test
> NOTE: Will create a PR once I've done more testing since I'm on an older 
> branch. For now, attaching a patch file [^AIRFLOW-2442.patch]
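The fix implied by the quoted report can be sketched like this (function and variable names are assumed, not the actual cli.py code): remember that the pool was disabled so a later configure_orm() call does not silently switch back to a QueuePool.

```python
# Tracks whether any caller asked for the connection pool to be disabled.
_pool_disabled = False

def configure_orm(disable_connection_pool=False):
    """Pick the SQLAlchemy pool class, honoring an earlier NullPool request."""
    global _pool_disabled
    _pool_disabled = _pool_disabled or disable_connection_pool
    return "NullPool" if _pool_disabled else "QueuePool"

# The `airflow run` sequence from the report: the second call, made after
# parsing custom configs, must not revert to QueuePool.
first = configure_orm(disable_connection_pool=True)
second = configure_orm()
```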





[jira] [Assigned] (AIRFLOW-2760) DAG parsing loop coupled with scheduler loop

2018-07-30 Thread Kevin Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Yang reassigned AIRFLOW-2760:
---

Assignee: Kevin Yang

> DAG parsing loop coupled with scheduler loop
> 
>
> Key: AIRFLOW-2760
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2760
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
>
> Currently the DAG parsing loop is coupled with the scheduler loop, meaning 
> that if the scheduler loop becomes slow, we will parse DAGs more slowly.
> As a simple producer and consumer pattern, they should be decoupled, 
> completely removing the scheduling bottleneck imposed by DAG parsing--which 
> Airbnb has identified as the current biggest bottleneck.





[jira] [Work started] (AIRFLOW-2760) DAG parsing loop coupled with scheduler loop

2018-07-30 Thread Kevin Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-2760 started by Kevin Yang.
---
> DAG parsing loop coupled with scheduler loop
> 
>
> Key: AIRFLOW-2760
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2760
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
>
> Currently the DAG parsing loop is coupled with the scheduler loop, meaning 
> that if the scheduler loop becomes slow, we will parse DAGs more slowly.
> As a simple producer and consumer pattern, they should be decoupled, 
> completely removing the scheduling bottleneck imposed by DAG parsing--which 
> Airbnb has identified as the current biggest bottleneck.





[jira] [Commented] (AIRFLOW-2762) Parallelize DAG parsing in webserver

2018-07-18 Thread Kevin Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548352#comment-16548352
 ] 

Kevin Yang commented on AIRFLOW-2762:
-

[~ashb] Good idea. Though I am a bit concerned about the parsing time--we have 
a couple of framework DAGs that take tens of seconds to parse. In this case, 
caching ahead of time during startup may even be better than caching lazily. 
This might also create two sources from which the webserver finds DAGs, and 
potentially create inconsistency within the webserver if the files on the 
scheduler and webservers are not synced. I think parsing the DAGs into simple 
DAGs would be a relatively safer way to approach this.

> Parallelize DAG parsing in webserver
> 
>
> Key: AIRFLOW-2762
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2762
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kevin Yang
>Priority: Major
>
> Currently the webserver parses the DagBag in a single-threaded fashion, which 
> causes the start-up time to be slow when we have a large number of DAG files. 
> The webserver should not need the actual DAG objects, and this parsing should 
> be parallelized.





[jira] [Comment Edited] (AIRFLOW-2762) Parallelize DAG parsing in webserver

2018-07-17 Thread Kevin Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547113#comment-16547113
 ] 

Kevin Yang edited comment on AIRFLOW-2762 at 7/17/18 9:13 PM:
--

[~ashb] Thank you a lot for providing your opinions. I think that is a good 
idea, since it will also provide some sort of consistency between the scheduler 
and webserver. Though to be able to do that, we need to store more of the info 
the webserver needs in the DagModel, e.g. the dependencies. I am also not very 
sure how much extra load that would place on the DB. If we go this route, we 
might want to build a DAG parsing component that parses DAGs for both the 
scheduler and webserver. Before we decide to do that, we can try parallelizing 
the parsing on the webserver--the work can be reused when we have the DAG 
parsing service, since the webserver will be using the serializable info of the 
DAG instead of the DAG object in both cases.


was (Author: yrqls21):
[~ashb] Ty for the opinions. I think that is good idea, since it will also 
provide some sort of consistency between scheduler and webserver. Though to be 
able to do that, we need to store more info in the DagModel that webserver 
needs, e.g. the dependency. I am also not very sure about how much extra load 
that would place on the DB. I think if we go this route, we might want to build 
a DAG parsing component that parses DAG for both scheduler and webserver. I 
think before we decided to do that, we can try parallelize the parsing on 
webserver--the work can be reused when we have the DAG parsing service since 
the webserver will be using the serializable info of the DAG instead of the the 
DAG object in both cases. 

> Parallelize DAG parsing in webserver
> 
>
> Key: AIRFLOW-2762
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2762
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kevin Yang
>Priority: Major
>
> Currently the webserver parses the DagBag in a single-threaded fashion, which 
> causes the start-up time to be slow when we have a large number of DAG files. 
> The webserver should not need the actual DAG objects, and this parsing should 
> be parallelized.





[jira] [Commented] (AIRFLOW-2762) Parallelize DAG parsing in webserver

2018-07-17 Thread Kevin Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547113#comment-16547113
 ] 

Kevin Yang commented on AIRFLOW-2762:
-

[~ashb] Thank you for the opinions. I think that is a good idea, since it will 
also provide some sort of consistency between the scheduler and webserver. 
Though to be able to do that, we need to store more of the info the webserver 
needs in the DagModel, e.g. the dependencies. I am also not very sure how much 
extra load that would place on the DB. If we go this route, we might want to 
build a DAG parsing component that parses DAGs for both the scheduler and 
webserver. Before we decide to do that, we can try parallelizing the parsing on 
the webserver--the work can be reused when we have the DAG parsing service, 
since the webserver will be using the serializable info of the DAG instead of 
the DAG object in both cases.

> Parallelize DAG parsing in webserver
> 
>
> Key: AIRFLOW-2762
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2762
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kevin Yang
>Priority: Major
>
> Currently the webserver parses the DagBag in a single-threaded fashion, which 
> causes the start-up time to be slow when we have a large number of DAG files. 
> The webserver should not need the actual DAG objects, and this parsing should 
> be parallelized.





[jira] [Created] (AIRFLOW-2762) Parallelize DAG parsing in webserver

2018-07-17 Thread Kevin Yang (JIRA)
Kevin Yang created AIRFLOW-2762:
---

 Summary: Parallelize DAG parsing in webserver
 Key: AIRFLOW-2762
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2762
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Kevin Yang


Currently the webserver parses the DagBag in a single-threaded fashion, which 
causes the start-up time to be slow when we have a large number of DAG files. 
The webserver should not need the actual DAG objects, and this parsing should 
be parallelized.





[jira] [Created] (AIRFLOW-2761) Parallelize Celery Executor enqueuing

2018-07-17 Thread Kevin Yang (JIRA)
Kevin Yang created AIRFLOW-2761:
---

 Summary: Parallelize Celery Executor enqueuing
 Key: AIRFLOW-2761
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2761
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Kevin Yang


Currently the Celery executor enqueues tasks in an async fashion but still does 
so in a single-process loop. This can slow down the scheduler loop and create 
scheduling delay if we have a large number of tasks to schedule in a short 
time, e.g. at UTC midnight when we need to schedule many sensors in a short 
period.
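The idea can be sketched with a thread pool (names here are illustrative, not the real executor code; sending a task to the Celery broker is I/O-bound, so fan-out helps):

```python
from concurrent.futures import ThreadPoolExecutor

def enqueue(task_key):
    """Stand-in for the real send-a-task-to-Celery call."""
    return task_key  # pretend the broker acknowledged the task

def enqueue_all(task_keys, parallelism=4):
    # Fan the enqueue calls out instead of looping one task at a time.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return list(pool.map(enqueue, task_keys))
```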





[jira] [Created] (AIRFLOW-2760) DAG parsing loop coupled with scheduler loop

2018-07-17 Thread Kevin Yang (JIRA)
Kevin Yang created AIRFLOW-2760:
---

 Summary: DAG parsing loop coupled with scheduler loop
 Key: AIRFLOW-2760
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2760
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Kevin Yang


Currently the DAG parsing loop is coupled with the scheduler loop, meaning that 
if the scheduler loop becomes slow, we will parse DAGs more slowly.

As a simple producer and consumer pattern, they should be decoupled, completely 
removing the scheduling bottleneck imposed by DAG parsing--which Airbnb has 
identified as the current biggest bottleneck.
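The producer and consumer pattern mentioned above can be sketched with a queue (file names and the parse result are placeholders): the parser keeps producing at its own pace, regardless of how fast the scheduler consumes.

```python
import queue
import threading

def parsing_loop(dag_files, out_q):
    # Producer: parses files independently of the scheduler's speed.
    for path in dag_files:
        out_q.put(path)  # stand-in for a parsed, serializable DAG
    out_q.put(None)      # sentinel: parsing finished

def scheduling_loop(in_q):
    # Consumer: schedules whatever parse results have arrived.
    scheduled = []
    while (item := in_q.get()) is not None:
        scheduled.append(item)
    return scheduled

q = queue.Queue()
producer = threading.Thread(target=parsing_loop, args=(["a.py", "b.py"], q))
producer.start()
result = scheduling_loop(q)
producer.join()
```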





[jira] [Assigned] (AIRFLOW-2756) Marking DAG run does not set start_time and end_time correctly

2018-07-16 Thread Kevin Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Yang reassigned AIRFLOW-2756:
---

Assignee: Kevin Yang

> Marking DAG run does not set start_time and end_time correctly
> --
>
> Key: AIRFLOW-2756
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2756
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
>
> Marking a DAG run right now always sets end_time, while it should set 
> start_time when marking RUNNING and end_time otherwise.





[jira] [Created] (AIRFLOW-2756) Marking DAG run does not set start_time and end_time correctly

2018-07-16 Thread Kevin Yang (JIRA)
Kevin Yang created AIRFLOW-2756:
---

 Summary: Marking DAG run does not set start_time and end_time 
correctly
 Key: AIRFLOW-2756
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2756
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Kevin Yang


Marking a DAG run right now always sets end_time, while it should set 
start_time when marking RUNNING and end_time otherwise.
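The intended behavior can be sketched as follows (a toy model; attribute names are assumed for illustration): marking RUNNING stamps start_time, any other state stamps end_time.

```python
from datetime import datetime, timezone

class DagRun:
    """Toy model of a DAG run, not the actual Airflow class."""
    def __init__(self):
        self.state = None
        self.start_time = None
        self.end_time = None

    def set_state(self, state):
        self.state = state
        now = datetime.now(timezone.utc)
        if state == "running":
            # Previously end_time was always set here, even for RUNNING.
            self.start_time = now
            self.end_time = None
        else:
            self.end_time = now
```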





[jira] [Updated] (AIRFLOW-2648) Mapred job name in HiveOperator hard to parse and order can be improved

2018-06-23 Thread Kevin Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Yang updated AIRFLOW-2648:

Description: 
Existing format: "Airflow HiveOperator task for 
{hostname}.{dag_id}.{task_id}.{execution_date}".
Proposing to make it configurable since it is a bit hard to parse.

  was:
Existing format: "Airflow HiveOperator task for 
{hostname}.{dag_id}.{task_id}.{execution_date}".
Proposing "{dag_id}.{task_id}.{execution_date}.{hostname}"


> Mapred job name in HiveOperator hard to parse and order can be improved
> ---
>
> Key: AIRFLOW-2648
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2648
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
>
> Existing format: "Airflow HiveOperator task for 
> {hostname}.{dag_id}.{task_id}.{execution_date}".
> Proposing to make it configurable since it is a bit hard to parse.





[jira] [Work started] (AIRFLOW-2648) Mapred job name in HiveOperator hard to parse and order can be improved

2018-06-23 Thread Kevin Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-2648 started by Kevin Yang.
---
> Mapred job name in HiveOperator hard to parse and order can be improved
> ---
>
> Key: AIRFLOW-2648
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2648
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
>
> Existing format: "Airflow HiveOperator task for 
> {hostname}.{dag_id}.{task_id}.{execution_date}".
> Proposing to make it configurable since it is a bit hard to parse.





[jira] [Assigned] (AIRFLOW-2648) Mapred job name in HiveOperator hard to parse and order can be improved

2018-06-23 Thread Kevin Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Yang reassigned AIRFLOW-2648:
---

Assignee: Kevin Yang

> Mapred job name in HiveOperator hard to parse and order can be improved
> ---
>
> Key: AIRFLOW-2648
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2648
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
>
> Existing format: "Airflow HiveOperator task for 
> {hostname}.{dag_id}.{task_id}.{execution_date}".
> Proposing "{dag_id}.{task_id}.{execution_date}.{hostname}"





[jira] [Created] (AIRFLOW-2648) Mapred job name in HiveOperator hard to parse and order can be improved

2018-06-19 Thread Kevin Yang (JIRA)
Kevin Yang created AIRFLOW-2648:
---

 Summary: Mapred job name in HiveOperator hard to parse and order 
can be improved
 Key: AIRFLOW-2648
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2648
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Kevin Yang


Existing format: "Airflow HiveOperator task for 
{hostname}.{dag_id}.{task_id}.{execution_date}".
Proposing "{dag_id}.{task_id}.{execution_date}.{hostname}"
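Making the name configurable, as the later revision of this issue proposes, could look like this (the template keys come from the format quoted above; the helper and its defaults are hypothetical):

```python
DEFAULT_TEMPLATE = (
    "Airflow HiveOperator task for {hostname}.{dag_id}.{task_id}.{execution_date}"
)

def mapred_job_name(template=DEFAULT_TEMPLATE, **context):
    """Render the mapred job name from a user-configurable template."""
    return template.format(**context)

# The reordered template proposed in this issue:
name = mapred_job_name(
    "{dag_id}.{task_id}.{execution_date}.{hostname}",
    dag_id="my_dag", task_id="my_task",
    execution_date="2018-06-19T00:00:00", hostname="worker-1",
)
```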





[jira] [Created] (AIRFLOW-2624) Airflow webserver broken out of the box

2018-06-14 Thread Kevin Yang (JIRA)
Kevin Yang created AIRFLOW-2624:
---

 Summary: Airflow webserver broken out of the box
 Key: AIRFLOW-2624
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2624
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Kevin Yang
Assignee: Kevin Yang


Run `airflow webserver` and then click on any DAG; I get:
```
  File "/Users/kevin_yang/ext_repos/incubator-airflow/airflow/www/utils.py", 
line 364, in view_func
return f(*args, **kwargs)
  File "/Users/kevin_yang/ext_repos/incubator-airflow/airflow/www/utils.py", 
line 251, in wrapper
user = current_user.user.username
AttributeError: 'NoneType' object has no attribute 'username'
```





[jira] [Assigned] (AIRFLOW-2615) Webserver parent not using cached app

2018-06-14 Thread Kevin Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Yang reassigned AIRFLOW-2615:
---

Assignee: Kevin Yang

> Webserver parent not using cached app
> -
>
> Key: AIRFLOW-2615
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2615
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
>
> From what I can tell, the code 
> [here|https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L790]
>  attempts to cache the app for later use--likely for the expensive 
> DagBag() creation. Before I dive into the problem of the webserver parsing 
> everything in one process, I was hoping this cached app would save me some 
> time. However, it seems to me that every subprocess spun up by gunicorn 
> creates the DagBag() right after being spawned--which makes sense to me, 
> since we didn't share the cached app with the subprocesses (I doubt we can). 
> If what I observed is true, why do we cache the app at all in the parent 
> process?





[jira] [Updated] (AIRFLOW-2615) Webserver parent not using cached app

2018-06-14 Thread Kevin Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Yang updated AIRFLOW-2615:

Summary: Webserver parent not using cached app  (was: Webserver not using 
cached app)

> Webserver parent not using cached app
> -
>
> Key: AIRFLOW-2615
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2615
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kevin Yang
>Priority: Major
>
> From what I can tell, the code 
> [here|https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L790]
>  attempts to cache the app for later use--likely for the expensive 
> DagBag() creation. Before I dive into the problem of the webserver parsing 
> everything in one process, I was hoping this cached app would save me some 
> time. However, it seems to me that every subprocess spun up by gunicorn 
> creates the DagBag() right after being spawned--which makes sense to me, 
> since we didn't share the cached app with the subprocesses (I doubt we can). 
> If what I observed is true, why do we cache the app at all?





[jira] [Updated] (AIRFLOW-2615) Webserver parent not using cached app

2018-06-14 Thread Kevin Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Yang updated AIRFLOW-2615:

Description: From what I can tell, the code 
[here|https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L790]
 attempts to cache the app for later use--likely for the expensive DagBag() 
creation. Before I dive into the problem of the webserver parsing everything in 
one process, I was hoping this cached app would save me some time. However, it 
seems to me that every subprocess spun up by gunicorn creates the DagBag() 
right after being spawned--which makes sense to me, since we didn't share the 
cached app with the subprocesses (I doubt we can). If what I observed is true, 
why do we cache the app at all in the parent process?  (was: From what I 
can tell, the app cached 
[here|https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L790]
 attempt to cache the app for later use-likely to be for the expensive DagBag() 
creation. Before I dive into the webserver parsing everything in one process 
problem, I was hoping this cached app would save me sometime. However it seems 
to me that every subprocess spun up by gunicorn is trying to create the 
DagBag() right after they've been created--make sense to me since we didn't 
share the cached app to the subprocess( doubt we can). If what I observed is 
true, why do we cache the app at all?)

> Webserver parent not using cached app
> -
>
> Key: AIRFLOW-2615
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2615
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kevin Yang
>Priority: Major
>
> From what I can tell, the code 
> [here|https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L790]
>  attempts to cache the app for later use--likely for the expensive 
> DagBag() creation. Before I dive into the problem of the webserver parsing 
> everything in one process, I was hoping this cached app would save me some 
> time. However, it seems to me that every subprocess spun up by gunicorn 
> creates the DagBag() right after being spawned--which makes sense to me, 
> since we didn't share the cached app with the subprocesses (I doubt we can). 
> If what I observed is true, why do we cache the app at all in the parent 
> process?





[jira] [Commented] (AIRFLOW-2615) Webserver not using cached app

2018-06-14 Thread Kevin Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512059#comment-16512059
 ] 

Kevin Yang commented on AIRFLOW-2615:
-

Adding a little bit of context here: Airbnb has ~2000 DAG files in our 
centralized DAG repo, and it takes a long time to parse the entire repo; this 
extra app creation basically doubles the time we need to refresh a webserver 
worker.
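The caching pattern under discussion can be sketched as follows (names assumed; the real code lives in airflow/bin/cli.py): a module-level cache makes the expensive app build run once per process, but a cache filled in the gunicorn parent is not visible to workers, since each worker process starts fresh.

```python
_app = None

def create_app():
    """Stand-in for the expensive Flask app + DagBag construction."""
    return object()

def cached_app():
    # Build the app once per process; later calls reuse it. A gunicorn
    # worker starts with _app = None again, so a cache populated in the
    # parent process does not help the workers.
    global _app
    if _app is None:
        _app = create_app()
    return _app
```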

> Webserver not using cached app
> --
>
> Key: AIRFLOW-2615
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2615
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kevin Yang
>Priority: Major
>
> From what I can tell, the code 
> [here|https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L790]
>  attempts to cache the app for later use--likely for the expensive 
> DagBag() creation. Before I dive into the problem of the webserver parsing 
> everything in one process, I was hoping this cached app would save me some 
> time. However, it seems to me that every subprocess spun up by gunicorn 
> creates the DagBag() right after being spawned--which makes sense to me, 
> since we didn't share the cached app with the subprocesses (I doubt we can). 
> If what I observed is true, why do we cache the app at all?





[jira] [Updated] (AIRFLOW-2615) Webserver not using cached app

2018-06-14 Thread Kevin Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Yang updated AIRFLOW-2615:

Description: From what I can tell, the code 
[here|https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L790]
 attempts to cache the app for later use--likely for the expensive DagBag() 
creation. Before I dive into the problem of the webserver parsing everything in 
one process, I was hoping this cached app would save me some time. However, it 
seems to me that every subprocess spun up by gunicorn creates the DagBag() 
right after being spawned--which makes sense to me, since we didn't share the 
cached app with the subprocesses (I doubt we can). If what I observed is true, 
why do we cache the app at all?  (was: From what I can tell, the app 
cached here attempt to cache the app for later use-likely to be for the 
expensive DagBag() creation. Before I dive into the webserver parsing 
everything in one process problem, I was hoping this cached app would save me 
sometime. However it seems to me that every subprocess spun up by gunicorn is 
trying to create the DagBag() right after they've been created--make sense to 
me since we didn't share the cached app to the subprocess( doubt we can). If 
what I observed is true, why do we cache the app at all?)

> Webserver not using cached app
> --
>
> Key: AIRFLOW-2615
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2615
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kevin Yang
>Priority: Major
>
> From what I can tell, the app cached 
> [here|https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L790]
>  attempts to cache the app for later use, likely for the expensive 
> DagBag() creation. Before I dive into the problem of the webserver parsing 
> everything in one process, I was hoping this cached app would save me some 
> time. However, it seems that every subprocess spun up by gunicorn tries to 
> create the DagBag() right after it is created, which makes sense since we 
> didn't share the cached app with the subprocesses (and I doubt we can). If 
> what I observed is true, why do we cache the app at all?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AIRFLOW-2615) Webserver not using cached app

2018-06-14 Thread Kevin Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512057#comment-16512057
 ] 

Kevin Yang edited comment on AIRFLOW-2615 at 6/14/18 7:10 AM:
--

[~joygao] Not very confident in the webserver area, would you kindly provide 
your opinion here please? Thank you!


was (Author: yrqls21):
[~joygao] Not very confident in the webserver area, would you kindly provide 
you opinion here please? Thank you!

> Webserver not using cached app
> --
>
> Key: AIRFLOW-2615
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2615
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kevin Yang
>Priority: Major
>
> From what I can tell, the app cached here attempts to cache the app for later 
> use, likely for the expensive DagBag() creation. Before I dive into the 
> problem of the webserver parsing everything in one process, I was hoping this 
> cached app would save me some time. However, it seems that every subprocess 
> spun up by gunicorn tries to create the DagBag() right after it is created, 
> which makes sense since we didn't share the cached app with the subprocesses 
> (and I doubt we can). If what I observed is true, why do we cache the app at 
> all?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2615) Webserver not using cached app

2018-06-14 Thread Kevin Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512057#comment-16512057
 ] 

Kevin Yang commented on AIRFLOW-2615:
-

[~joygao] Not very confident in the webserver area, would you kindly provide 
your opinion here, please? Thank you!

> Webserver not using cached app
> --
>
> Key: AIRFLOW-2615
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2615
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kevin Yang
>Priority: Major
>
> From what I can tell, the app cached here attempts to cache the app for later 
> use, likely for the expensive DagBag() creation. Before I dive into the 
> problem of the webserver parsing everything in one process, I was hoping this 
> cached app would save me some time. However, it seems that every subprocess 
> spun up by gunicorn tries to create the DagBag() right after it is created, 
> which makes sense since we didn't share the cached app with the subprocesses 
> (and I doubt we can). If what I observed is true, why do we cache the app at 
> all?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2615) Webserver not using cached app

2018-06-14 Thread Kevin Yang (JIRA)
Kevin Yang created AIRFLOW-2615:
---

 Summary: Webserver not using cached app
 Key: AIRFLOW-2615
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2615
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Kevin Yang


From what I can tell, the app cached here attempts to cache the app for later 
use, likely for the expensive DagBag() creation. Before I dive into the problem 
of the webserver parsing everything in one process, I was hoping this cached 
app would save me some time. However, it seems that every subprocess spun up by 
gunicorn tries to create the DagBag() right after it is created, which makes 
sense since we didn't share the cached app with the subprocesses (and I doubt 
we can). If what I observed is true, why do we cache the app at all?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2605) MySqlHook().run() will not commit if autocommit is set to True.

2018-06-13 Thread Kevin Yang (JIRA)
Kevin Yang created AIRFLOW-2605:
---

 Summary: MySqlHook().run() will not commit if autocommit is set to 
True.
 Key: AIRFLOW-2605
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2605
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Kevin Yang
Assignee: Kevin Yang


MySQL [sets autocommit in a different 
way|https://github.com/PyMySQL/mysqlclient-python/blob/master/MySQLdb/connections.py#L249-L256]:
 autocommit is a method on the connection, not an attribute. Thus setting it 
via `conn.autocommit = True`, as we currently do, will not enable autocommit.
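A minimal sketch of the difference, using a stand-in class rather than a real MySQLdb connection: because MySQLdb exposes autocommit as a method, assigning to the attribute merely shadows that method and never reaches the server.

```python
# FakeMySQLConnection is illustrative only; it mimics MySQLdb's method-style
# autocommit so the failure mode can be shown without a database.
class FakeMySQLConnection:
    def __init__(self):
        self._server_autocommit = False

    def autocommit(self, on):
        # The real MySQLdb method issues SET AUTOCOMMIT against the server.
        self._server_autocommit = bool(on)

conn = FakeMySQLConnection()
conn.autocommit = True           # buggy: shadows the method, server unchanged
assert conn._server_autocommit is False

conn = FakeMySQLConnection()
conn.autocommit(True)            # correct: calls the method
assert conn._server_autocommit is True
```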



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work started] (AIRFLOW-2597) dbapi hook not committing when autocommit is set to false

2018-06-11 Thread Kevin Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-2597 started by Kevin Yang.
---
> dbapi hook not committing when autocommit is set to false
> -
>
> Key: AIRFLOW-2597
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2597
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
>
> dbapi.run() right now commits only when autocommit is set to true or db does 
> not support autocommit.
> This is breaking CI now.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2597) dbapi hook not committing when autocommit is set to false

2018-06-11 Thread Kevin Yang (JIRA)
Kevin Yang created AIRFLOW-2597:
---

 Summary: dbapi hook not committing when autocommit is set to false
 Key: AIRFLOW-2597
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2597
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Kevin Yang
Assignee: Kevin Yang


dbapi.run() right now commits only when autocommit is set to true or db does 
not support autocommit.

This is breaking CI now.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2590) dbapi hook not committing when conn does not support auto commit

2018-06-10 Thread Kevin Yang (JIRA)
Kevin Yang created AIRFLOW-2590:
---

 Summary: dbapi hook not committing when conn does not support auto 
commit
 Key: AIRFLOW-2590
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2590
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Kevin Yang
Assignee: Kevin Yang


After this commit, DbApiHook.run() will only commit when the autocommit field 
is set for the connection. For connections that don't support autocommit (e.g. 
the sqlite hook), the query won't be committed.

This is currently breaking CI (tests/core.py:CoreTest.test_check_operators).
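A runnable sketch of the commit rule the fix implies, using sqlite3 as the example of a hook without autocommit support; the function name and signature here are illustrative, not the real DbApiHook API.

```python
import sqlite3

def run_statements(conn, statements, supports_autocommit, autocommit=False):
    # Illustrative version of the run() commit rule: commit unless the driver
    # both supports autocommit and has it enabled.
    cur = conn.cursor()
    for sql in statements:
        cur.execute(sql)
    if not (supports_autocommit and autocommit):
        conn.commit()  # without this, drivers like sqlite3 lose the writes

conn = sqlite3.connect(":memory:")
run_statements(
    conn,
    ["CREATE TABLE t (x INTEGER)", "INSERT INTO t VALUES (1)"],
    supports_autocommit=False,
)
assert conn.execute("SELECT COUNT(*) FROM t").fetchone()[0] == 1
```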



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2586) Stop getting AIRFLOW_HOME value from config file in bash operator

2018-06-09 Thread Kevin Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Yang updated AIRFLOW-2586:

Description: Before [this 
commit|https://github.com/apache/incubator-airflow/commit/a0deb506c070637abc3c426bc7d060e3fe6c854d#diff-30054b6fa334216ba6e66c9f07025cd2R35]
 subprocess created by bash operator inherits env vars from the parent process. 
However, it does not inherit the proper env var from the `airflow worker` 
process because we had a bug in the `sudo airflow run --raw` command. The 
commit was created to address the bug for bash operator. The bug was later on 
fixed in [this 
commit|https://github.com/apache/incubator-airflow/commit/354492bc597130f43c76e7bec4bc894fb6deb7fe]
 and thus bash operator does not need and should not get AIRFLOW_HOME value 
from the config (otherwise there might be discrepancy between the AIRFLOW_HOME 
value in the parent process and the child process).  (was: Before [this 
commit|[https://github.com/apache/incubator-airflow/commit/a0deb506c070637abc3c426bc7d060e3fe6c854d#diff-30054b6fa334216ba6e66c9f07025cd2R35]]
 subprocess created by bash operator inherits env vars from the parent process. 
However, it does not inherit the proper env var from the `airflow worker` 
process because we had a bug in the `sudo airflow run --raw` command. The 
commit was created to address the bug for bash operator. The bug was later on 
fixed in [this 
commit|[https://github.com/apache/incubator-airflow/commit/354492bc597130f43c76e7bec4bc894fb6deb7fe]]
 and thus bash operator does not need and should not get AIRFLOW_HOME value 
from the config (otherwise there might be discrepancy between the AIRFLOW_HOME 
value in the parent process and the child process).)

> Stop getting AIRFLOW_HOME value from config file in bash operator
> -
>
> Key: AIRFLOW-2586
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2586
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
>
> Before [this 
> commit|https://github.com/apache/incubator-airflow/commit/a0deb506c070637abc3c426bc7d060e3fe6c854d#diff-30054b6fa334216ba6e66c9f07025cd2R35]
>  subprocess created by bash operator inherits env vars from the parent 
> process. However, it does not inherit the proper env var from the `airflow 
> worker` process because we had a bug in the `sudo airflow run --raw` command. 
> The commit was created to address the bug for bash operator. The bug was 
> later on fixed in [this 
> commit|https://github.com/apache/incubator-airflow/commit/354492bc597130f43c76e7bec4bc894fb6deb7fe]
>  and thus bash operator does not need and should not get AIRFLOW_HOME value 
> from the config (otherwise there might be discrepancy between the 
> AIRFLOW_HOME value in the parent process and the child process).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2586) Stop getting AIRFLOW_HOME value from config file in bash operator

2018-06-09 Thread Kevin Yang (JIRA)
Kevin Yang created AIRFLOW-2586:
---

 Summary: Stop getting AIRFLOW_HOME value from config file in bash 
operator
 Key: AIRFLOW-2586
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2586
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Kevin Yang
Assignee: Kevin Yang


Before [this 
commit|https://github.com/apache/incubator-airflow/commit/a0deb506c070637abc3c426bc7d060e3fe6c854d#diff-30054b6fa334216ba6e66c9f07025cd2R35],
 the subprocess created by the bash operator inherited env vars from the parent 
process. However, it did not inherit the proper env var from the `airflow 
worker` process because we had a bug in the `sudo airflow run --raw` command. 
The commit was created to address that bug for the bash operator. The bug was 
later fixed in [this 
commit|https://github.com/apache/incubator-airflow/commit/354492bc597130f43c76e7bec4bc894fb6deb7fe],
 and thus the bash operator does not need, and should not, get the AIRFLOW_HOME 
value from the config (otherwise there might be a discrepancy between the 
AIRFLOW_HOME value in the parent process and the child process).
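The inheritance behavior the issue depends on can be demonstrated directly; the AIRFLOW_HOME value below is hypothetical. With no explicit env mapping, the child shell sees whatever the parent process exported, so reading the value from the config file instead could disagree with the parent's actual setting.

```python
import os
import subprocess

# Hypothetical value; in Airflow it would be set by the parent worker process.
os.environ["AIRFLOW_HOME"] = "/tmp/airflow_home_example"

# With env=None (the default), the subprocess inherits the parent environment.
out = subprocess.run(
    ["sh", "-c", 'printf %s "$AIRFLOW_HOME"'],
    capture_output=True, text=True,
).stdout
assert out == "/tmp/airflow_home_example"
```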



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-2497) Cgroup task runner doesn't pass down correct env vars

2018-05-22 Thread Kevin Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Yang closed AIRFLOW-2497.
---
Resolution: Fixed

> Cgroup task runner doesn't pass down correct env vars
> -
>
> Key: AIRFLOW-2497
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2497
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
>
> From 
> [https://github.com/apache/incubator-airflow/blob/master/airflow/task/task_runner/base_task_runner.py#L79-L84]
>  only PYTHONPATH is propagated to the child process, which makes the behavior 
> of the bash task runner and the cgroup task runner differ: the bash task 
> runner issues a `bash -c` command that automatically passes all env vars from 
> the parent process to the subprocess. The cgroup task runner should not 
> behave differently.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2497) Cgroup task runner doesn't pass down correct env vars

2018-05-22 Thread Kevin Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483547#comment-16483547
 ] 

Kevin Yang commented on AIRFLOW-2497:
-

This is actually resolved in https://issues.apache.org/jira/browse/AIRFLOW-2162

> Cgroup task runner doesn't pass down correct env vars
> -
>
> Key: AIRFLOW-2497
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2497
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
>
> From 
> [https://github.com/apache/incubator-airflow/blob/master/airflow/task/task_runner/base_task_runner.py#L79-L84]
>  only PYTHONPATH is propagated to the child process, which makes the behavior 
> of the bash task runner and the cgroup task runner differ: the bash task 
> runner issues a `bash -c` command that automatically passes all env vars from 
> the parent process to the subprocess. The cgroup task runner should not 
> behave differently.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-2497) Cgroup task runner doesn't pass down correct env vars

2018-05-22 Thread Kevin Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Yang reassigned AIRFLOW-2497:
---

Assignee: Kevin Yang

> Cgroup task runner doesn't pass down correct env vars
> -
>
> Key: AIRFLOW-2497
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2497
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
>
> From 
> [https://github.com/apache/incubator-airflow/blob/master/airflow/task/task_runner/base_task_runner.py#L79-L84]
>  only PYTHONPATH is propagated to the child process, which makes the behavior 
> of the bash task runner and the cgroup task runner differ: the bash task 
> runner issues a `bash -c` command that automatically passes all env vars from 
> the parent process to the subprocess. The cgroup task runner should not 
> behave differently.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2497) Cgroup task runner doesn't pass down correct env vars

2018-05-20 Thread Kevin Yang (JIRA)
Kevin Yang created AIRFLOW-2497:
---

 Summary: Cgroup task runner doesn't pass down correct env vars
 Key: AIRFLOW-2497
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2497
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Kevin Yang


From 
[https://github.com/apache/incubator-airflow/blob/master/airflow/task/task_runner/base_task_runner.py#L79-L84],
 only PYTHONPATH is propagated to the child process, which makes the behavior 
of the bash task runner and the cgroup task runner differ: the bash task runner 
issues a `bash -c` command that automatically passes all env vars from the 
parent process to the subprocess. The cgroup task runner should not behave 
differently.
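A sketch contrasting the two behaviors described above, with an illustrative variable name. Passing an explicit env mapping containing only PYTHONPATH drops every other variable, while running with the default env inherits the full parent environment, as `bash -c` effectively does.

```python
import os
import subprocess

os.environ["MY_SETTING"] = "from-parent"  # illustrative variable

# cgroup-runner style: only the allow-listed PYTHONPATH survives.
filtered = subprocess.run(
    ["sh", "-c", 'printf %s "${MY_SETTING:-missing}"'],
    env={"PYTHONPATH": os.environ.get("PYTHONPATH", "")},
    capture_output=True, text=True,
).stdout

# bash-runner style: env=None, so the child sees everything the parent exported.
inherited = subprocess.run(
    ["sh", "-c", 'printf %s "${MY_SETTING:-missing}"'],
    capture_output=True, text=True,
).stdout

assert filtered == "missing"
assert inherited == "from-parent"
```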



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2463) Make task instance context available for hive queries

2018-05-14 Thread Kevin Yang (JIRA)
Kevin Yang created AIRFLOW-2463:
---

 Summary: Make task instance context available for hive queries
 Key: AIRFLOW-2463
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2463
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Kevin Yang
Assignee: Kevin Yang


Currently, hive queries run through HiveOperator() receive the task_instance 
context as hive_conf. But the context is not available when 
HiveCliHook()/HiveServer2Hook() is called through PythonOperator(), when the 
hive CLI is called in BashOperator(), or when HiveServer2Hook() is called in 
any operator.

Having the context available would give users the capability to audit hive 
queries.
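As a sketch of what "making the context available" could look like, here is a hypothetical helper that renders a context dict into `-hiveconf` CLI arguments; the key names are illustrative, not the exact ones Airflow injects.

```python
# Hypothetical helper: turn task-instance context into hive CLI -hiveconf
# pairs so queries can be attributed to a dag/task for auditing.
def hive_conf_args(context):
    args = []
    for key, value in sorted(context.items()):
        args += ["-hiveconf", "{}={}".format(key, value)]
    return args

args = hive_conf_args({"airflow.ctx.dag_id": "my_dag",
                       "airflow.ctx.task_id": "my_task"})
assert args == ["-hiveconf", "airflow.ctx.dag_id=my_dag",
                "-hiveconf", "airflow.ctx.task_id=my_task"]
```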



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-2402) Airflow 1.10 Logs UI throws oops error

2018-05-09 Thread Kevin Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Yang reassigned AIRFLOW-2402:
---

Assignee: Kevin Yang  (was: Ramki Subramanian)

> Airflow 1.10 Logs UI throws oops error
> --
>
> Key: AIRFLOW-2402
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2402
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication, ui
>Affects Versions: 1.10.0
>Reporter: Ramki Subramanian
>Assignee: Kevin Yang
>Priority: Major
> Fix For: 1.10.0
>
>
> Hi,
> I am getting an error at
> [incubator-airflow/airflow/www_rbac/views.py|https://github.com/apache/incubator-airflow/blob/4d64ad4928f0188f7532936e8da6612f5ec7170d/airflow/www_rbac/views.py#L454]
> Line 454 in 
> [4d64ad4|https://github.com/apache/incubator-airflow/commit/4d64ad4928f0188f7532936e8da6612f5ec7170d]
> | |logs[i] = log.decode('utf-8')|
>  
> {{/home/user/lib/python2.7/site-packages/airflow/www_rbac/views.py", line 
> 454, in log logs[i] = log.decode('utf-8') AttributeError: 'list' object has 
> no attribute 'decode' }}
> Not sure if someone is already looking into this, or I am missing some config?
> Branch : 1.10_test
> More Info here:
> [https://github.com/apache/incubator-airflow/commit/05e1861e24de42f9a2c649cd93041c5c744504e1#diff-77df5adb32d964f37748c4557ffb3c4c]
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2402) Airflow 1.10 Logs UI throws oops error

2018-05-03 Thread Kevin Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463393#comment-16463393
 ] 

Kevin Yang commented on AIRFLOW-2402:
-

Hi [~rsubra13], I believe the problem was introduced by my previous 
[PR|https://github.com/apache/incubator-airflow/pull/3214]. I think you can 
copy/paste the changes that PR made in www/views.py into www_rbac/views.py.

I don't have rbac set up on my side, but if you want me to do it you can assign 
this ticket to me and I'll follow through. Otherwise you can tag me in the PR 
and I'll review it on a high-priority basis.

 

Cheers,

Kevin Y
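For reference, the symptom quoted in the issue (`'list' object has no attribute 'decode'`) comes from calling `.decode()` on a list of chunks; a defensive sketch of an element-wise fix (not the exact Airflow patch) looks like:

```python
# Illustrative fix: the log handler may return either bytes or a list of
# chunks, so decode element-wise instead of assuming a single bytes object.
def decode_log(log):
    if isinstance(log, bytes):
        return log.decode("utf-8")
    if isinstance(log, list):
        return "".join(
            chunk.decode("utf-8") if isinstance(chunk, bytes) else str(chunk)
            for chunk in log
        )
    return str(log)

assert decode_log(b"hello") == "hello"
assert decode_log([b"a", "b"]) == "ab"
```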

> Airflow 1.10 Logs UI throws oops error
> --
>
> Key: AIRFLOW-2402
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2402
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication, ui
>Affects Versions: 1.10.0
>Reporter: Ramki Subramanian
>Assignee: Ramki Subramanian
>Priority: Major
> Fix For: 1.10.0
>
>
> Hi,
> I am getting an error at
> [incubator-airflow/airflow/www_rbac/views.py|https://github.com/apache/incubator-airflow/blob/4d64ad4928f0188f7532936e8da6612f5ec7170d/airflow/www_rbac/views.py#L454]
> Line 454 in 
> [4d64ad4|https://github.com/apache/incubator-airflow/commit/4d64ad4928f0188f7532936e8da6612f5ec7170d]
> | |logs[i] = log.decode('utf-8')|
>  
> {{/home/user/lib/python2.7/site-packages/airflow/www_rbac/views.py", line 
> 454, in log logs[i] = log.decode('utf-8') AttributeError: 'list' object has 
> no attribute 'decode' }}
> Not sure if someone is already looking into this, or I am missing some config?
> Branch : 1.10_test
> More Info here:
> [https://github.com/apache/incubator-airflow/commit/05e1861e24de42f9a2c649cd93041c5c744504e1#diff-77df5adb32d964f37748c4557ffb3c4c]
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2363) S3 remote logging appending tuple instead of str

2018-04-26 Thread Kevin Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16454925#comment-16454925
 ] 

Kevin Yang commented on AIRFLOW-2363:
-

[~b11c] If the case is that set_context is not called properly, then this 
will not resolve 2379.

The entry point of the task handler's set_context() method should be here: 
[https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L460],
 which is called in every `airflow run` command. I'm not sure how it could be 
missed.

I suspect the reason the log is not being uploaded is the bug being fixed in 
this issue, but I might need your help to confirm that.

> S3 remote logging appending tuple instead of str
> 
>
> Key: AIRFLOW-2363
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2363
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Reporter: Kyle Hamlin
>Priority: Major
> Fix For: 1.10.0
>
>
> A recent merge into master that added support for Elasticsearch logging seems 
> to have broken S3 logging by returning a tuple instead of a string.
> [https://github.com/apache/incubator-airflow/commit/ec38ba9594395de04ec932481212a86fbe9ae107#diff-0442332ecbe42ebbf426911c68d8cd4aR128]
>  
> following errors thrown:
>  
> *Session NoneType error*
>  Traceback (most recent call last):
>    File 
> "/usr/local/lib/python3.6/site-packages/airflow/utils/log/s3_task_handler.py",
>  line 171, in s3_write
>      encrypt=configuration.conf.getboolean('core', 'ENCRYPT_S3_LOGS'),
>    File "/usr/local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py", 
> line 274, in load_string
>      encrypt=encrypt)
>    File "/usr/local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py", 
> line 313, in load_bytes
>      client = self.get_conn()
>    File "/usr/local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py", 
> line 34, in get_conn
>      return self.get_client_type('s3')
>    File 
> "/usr/local/lib/python3.6/site-packages/airflow/contrib/hooks/aws_hook.py", 
> line 151, in get_client_type
>      session, endpoint_url = self._get_credentials(region_name)
>    File 
> "/usr/local/lib/python3.6/site-packages/airflow/contrib/hooks/aws_hook.py", 
> line 97, in _get_credentials
>      connection_object = self.get_connection(self.aws_conn_id)
>    File "/usr/local/lib/python3.6/site-packages/airflow/hooks/base_hook.py", 
> line 82, in get_connection
>      conn = random.choice(cls.get_connections(conn_id))
>    File "/usr/local/lib/python3.6/site-packages/airflow/hooks/base_hook.py", 
> line 77, in get_connections
>      conns = cls._get_connections_from_db(conn_id)
>    File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 
> 72, in wrapper
>      with create_session() as session:
>    File "/usr/local/lib/python3.6/contextlib.py", line 81, in __enter__
>      return next(self.gen)
>    File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 
> 41, in create_session
>      session = settings.Session()
>  TypeError: 'NoneType' object is not callable
>  
> *TypeError must be str not tuple*
>  [2018-04-16 18:37:28,200] ERROR in app: Exception on 
> /admin/airflow/get_logs_with_metadata [GET]
>  Traceback (most recent call last):
>    File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1982, in 
> wsgi_app
>      response = self.full_dispatch_request()
>    File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1614, in 
> full_dispatch_request
>      rv = self.handle_user_exception(e)
>    File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1517, in 
> handle_user_exception
>      reraise(exc_type, exc_value, tb)
>    File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line 33, 
> in reraise
>      raise value
>    File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1612, in 
> full_dispatch_request
>      rv = self.dispatch_request()
>    File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1598, in 
> dispatch_request
>      return self.view_functions[rule.endpoint](**req.view_args)
>    File "/usr/local/lib/python3.6/site-packages/flask_admin/base.py", line 
> 69, in inner
>      return self._run_view(f, *args, **kwargs)
>    File "/usr/local/lib/python3.6/site-packages/flask_admin/base.py", line 
> 368, in _run_view
>      return fn(self, *args, **kwargs)
>    File "/usr/local/lib/python3.6/site-packages/flask_login.py", line 755, in 
> decorated_view
>      return func(*args, **kwargs)
>    File "/usr/local/lib/python3.6/site-packages/airflow/www/utils.py", line 
> 269, in wrapper
>      return f(*args, **kwargs)
>    File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 
> 74, in wrapper
>      return func(*args, **kwargs)
>    File 

[jira] [Closed] (AIRFLOW-2383) Escape colon in partition name when poking inside NamedHivePartitionSensor

2018-04-26 Thread Kevin Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Yang closed AIRFLOW-2383.
---
Resolution: Invalid

> Escape colon in partition name when poking inside NamedHivePartitionSensor
> --
>
> Key: AIRFLOW-2383
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2383
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
>
> The NamedHivePartitionSensor does not escape colons in the partition name, 
> causing behavior different from the HivePartitionSensor when the partition 
> name contains a colon. The colon needs to be escaped as `%3A`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2383) Escape colon in partition name when poking inside NamedHivePartitionSensor

2018-04-26 Thread Kevin Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16454840#comment-16454840
 ] 

Kevin Yang commented on AIRFLOW-2383:
-

When using partition names, users are supposed to specify escaped values; 
closing the JIRA.
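A sketch of the escaping users are expected to apply, using the standard library; `escape_partition_value` is a hypothetical helper, not part of Airflow.

```python
from urllib.parse import quote

def escape_partition_value(value):
    # quote() with no safe characters percent-encodes ':' as '%3A'.
    return quote(value, safe="")

assert escape_partition_value("2018-04-26T00:00:00") == \
    "2018-04-26T00%3A00%3A00"
```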

> Escape colon in partition name when poking inside NamedHivePartitionSensor
> --
>
> Key: AIRFLOW-2383
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2383
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
>
> The NamedHivePartitionSensor does not escape colons in the partition name, 
> causing behavior different from the HivePartitionSensor when the partition 
> name contains a colon. The colon needs to be escaped as `%3A`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-2383) Escape colon in partition name when poking inside NamedHivePartitionSensor

2018-04-26 Thread Kevin Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Yang reassigned AIRFLOW-2383:
---

Assignee: Kevin Yang

> Escape colon in partition name when poking inside NamedHivePartitionSensor
> --
>
> Key: AIRFLOW-2383
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2383
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
>
> The NamedHivePartitionSensor does not escape colons in the partition name, 
> causing behavior different from the HivePartitionSensor when the partition 
> name contains a colon. The colon needs to be escaped as `%3A`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2383) Escape colon in partition name when poking inside NamedHivePartitionSensor

2018-04-26 Thread Kevin Yang (JIRA)
Kevin Yang created AIRFLOW-2383:
---

 Summary: Escape colon in partition name when poking inside 
NamedHivePartitionSensor
 Key: AIRFLOW-2383
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2383
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Kevin Yang


The NamedHivePartitionSensor does not escape colons in the partition name, 
causing behavior different from the HivePartitionSensor when the partition name 
contains a colon. The colon needs to be escaped as `%3A`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2374) Airflow fails to show logs

2018-04-26 Thread Kevin Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16454663#comment-16454663
 ] 

Kevin Yang commented on AIRFLOW-2374:
-

Hi [~b11c], I think this bug is handled in the following JIRA:

https://issues.apache.org/jira/browse/AIRFLOW-2363

> Airflow fails to show logs
> --
>
> Key: AIRFLOW-2374
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2374
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Berislav Lopac
>Assignee: Berislav Lopac
>Priority: Blocker
>
> When viewing a log in the webserver, the page shows a loading gif and the log 
> never appears. Looking in the Javascript console, the problem appears to be 
> error 500 when loading the {{get_logs_with_metadata}} endpoint, givving the 
> following trace:
> {code:java}
> ---
> Node: airflow-nods-dev
> ---
> Traceback (most recent call last):
>   File 
> "/opt/airflow/src/apache-airflow/airflow/utils/log/gcs_task_handler.py", line 
> 113, in _read
> remote_log = self.gcs_read(remote_loc)
>   File 
> "/opt/airflow/src/apache-airflow/airflow/utils/log/gcs_task_handler.py", line 
> 131, in gcs_read
> return self.hook.download(bkt, blob).decode()
>   File "/opt/airflow/src/apache-airflow/airflow/contrib/hooks/gcs_hook.py", 
> line 107, in download
> .get_media(bucket=bucket, object=object) \
>   File "/usr/local/lib/python3.6/dist-packages/oauth2client/_helpers.py", 
> line 133, in positional_wrapper
> return wrapped(*args, **kwargs)
>   File "/usr/local/lib/python3.6/dist-packages/googleapiclient/http.py", line 
> 841, in execute
> raise HttpError(resp, content, uri=self.uri)
> googleapiclient.errors.HttpError: <HttpError 404 when requesting 
> https://www.googleapis.com/storage/v1/b/bucket-af/o/test-logs%2Fgeneric_transfer_single%2Ftransfer_file%2F2018-04-25T13%3A00%3A51.250983%2B00%3A00%2F1.log?alt=media
>  returned "Not Found">
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1982, in 
> wsgi_app
> response = self.full_dispatch_request()
>   File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1614, in 
> full_dispatch_request
> rv = self.handle_user_exception(e)
>   File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1517, in 
> handle_user_exception
> reraise(exc_type, exc_value, tb)
>   File "/usr/local/lib/python3.6/dist-packages/flask/_compat.py", line 33, in 
> reraise
> raise value
>   File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1612, in 
> full_dispatch_request
> rv = self.dispatch_request()
>   File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1598, in 
> dispatch_request
> return self.view_functions[rule.endpoint](**req.view_args)
>   File "/usr/local/lib/python3.6/dist-packages/flask_admin/base.py", line 69, 
> in inner
> return self._run_view(f, *args, **kwargs)
>   File "/usr/local/lib/python3.6/dist-packages/flask_admin/base.py", line 
> 368, in _run_view
> return fn(self, *args, **kwargs)
>   File "/usr/local/lib/python3.6/dist-packages/flask_login.py", line 758, in 
> decorated_view
> return func(*args, **kwargs)
>   File "/opt/airflow/src/apache-airflow/airflow/www/utils.py", line 269, in 
> wrapper
> return f(*args, **kwargs)
>   File 

[jira] [Commented] (AIRFLOW-2363) S3 remote logging appending tuple instead of str

2018-04-26 Thread Kevin Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16454406#comment-16454406
 ] 

Kevin Yang commented on AIRFLOW-2363:
-

[~hamlinkn] Do you mind trying out the change in this PR 
[https://github.com/apache/incubator-airflow/pull/3259] and seeing whether it 
resolves the issue?

> S3 remote logging appending tuple instead of str
> 
>
> Key: AIRFLOW-2363
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2363
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Reporter: Kyle Hamlin
>Assignee: Kevin Yang
>Priority: Major
> Fix For: 1.10.0
>
>
> A recent merge into master that added support for Elasticsearch logging seems 
> to have broken S3 logging by returning a tuple instead of a string.
> [https://github.com/apache/incubator-airflow/commit/ec38ba9594395de04ec932481212a86fbe9ae107#diff-0442332ecbe42ebbf426911c68d8cd4aR128]
>  
> The following errors are thrown:
>  
> *Session NoneType error*
>  Traceback (most recent call last):
>    File 
> "/usr/local/lib/python3.6/site-packages/airflow/utils/log/s3_task_handler.py",
>  line 171, in s3_write
>      encrypt=configuration.conf.getboolean('core', 'ENCRYPT_S3_LOGS'),
>    File "/usr/local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py", 
> line 274, in load_string
>      encrypt=encrypt)
>    File "/usr/local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py", 
> line 313, in load_bytes
>      client = self.get_conn()
>    File "/usr/local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py", 
> line 34, in get_conn
>      return self.get_client_type('s3')
>    File 
> "/usr/local/lib/python3.6/site-packages/airflow/contrib/hooks/aws_hook.py", 
> line 151, in get_client_type
>      session, endpoint_url = self._get_credentials(region_name)
>    File 
> "/usr/local/lib/python3.6/site-packages/airflow/contrib/hooks/aws_hook.py", 
> line 97, in _get_credentials
>      connection_object = self.get_connection(self.aws_conn_id)
>    File "/usr/local/lib/python3.6/site-packages/airflow/hooks/base_hook.py", 
> line 82, in get_connection
>      conn = random.choice(cls.get_connections(conn_id))
>    File "/usr/local/lib/python3.6/site-packages/airflow/hooks/base_hook.py", 
> line 77, in get_connections
>      conns = cls._get_connections_from_db(conn_id)
>    File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 
> 72, in wrapper
>      with create_session() as session:
>    File "/usr/local/lib/python3.6/contextlib.py", line 81, in __enter__
>      return next(self.gen)
>    File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 
> 41, in create_session
>      session = settings.Session()
>  TypeError: 'NoneType' object is not callable
>  
> *TypeError must be str not tuple*
>  [2018-04-16 18:37:28,200] ERROR in app: Exception on 
> /admin/airflow/get_logs_with_metadata [GET]
>  Traceback (most recent call last):
>    File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1982, in 
> wsgi_app
>      response = self.full_dispatch_request()
>    File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1614, in 
> full_dispatch_request
>      rv = self.handle_user_exception(e)
>    File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1517, in 
> handle_user_exception
>      reraise(exc_type, exc_value, tb)
>    File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line 33, 
> in reraise
>      raise value
>    File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1612, in 
> full_dispatch_request
>      rv = self.dispatch_request()
>    File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1598, in 
> dispatch_request
>      return self.view_functions[rule.endpoint](**req.view_args)
>    File "/usr/local/lib/python3.6/site-packages/flask_admin/base.py", line 
> 69, in inner
>      return self._run_view(f, *args, **kwargs)
>    File "/usr/local/lib/python3.6/site-packages/flask_admin/base.py", line 
> 368, in _run_view
>      return fn(self, *args, **kwargs)
>    File "/usr/local/lib/python3.6/site-packages/flask_login.py", line 755, in 
> decorated_view
>      return func(*args, **kwargs)
>    File "/usr/local/lib/python3.6/site-packages/airflow/www/utils.py", line 
> 269, in wrapper
>      return f(*args, **kwargs)
>    File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 
> 74, in wrapper
>      return func(*args, **kwargs)
>    File "/usr/local/lib/python3.6/site-packages/airflow/www/views.py", line 
> 770, in get_logs_with_metadata
>      logs, metadatas = handler.read(ti, try_number, metadata=metadata)
>    File 
> "/usr/local/lib/python3.6/site-packages/airflow/utils/log/file_task_handler.py",
>  line 165, in read
>      logs[i] += log
>  TypeError: must be str, not tuple
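The exception quoted above is just string concatenation with a tuple on the 
right-hand side; a minimal reproduction of the failure mode (the `read_log_*` 
functions are hypothetical stand-ins for the handler read path, which started 
returning `(log, metadata)` pairs after the Elasticsearch change):

```python
def read_log_old(try_number):
    # Pre-change behavior: the handler read path returns a plain string.
    return "log line for try %d\n" % try_number

def read_log_new(try_number):
    # Post-change behavior: it returns a (log, metadata) tuple instead.
    return ("log line for try %d\n" % try_number, {"end_of_log": True})

logs = [""]
logs[0] += read_log_old(1)  # str += str works fine

try:
    logs[0] += read_log_new(1)  # str += tuple raises TypeError
except TypeError as e:
    print(e)  # e.g. "must be str, not tuple" on Python 3.6
```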

[jira] [Commented] (AIRFLOW-2363) S3 remote logging appending tuple instead of str

2018-04-26 Thread Kevin Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16453663#comment-16453663
 ] 

Kevin Yang commented on AIRFLOW-2363:
-

[~jdavidh], it was a tricky one to debug. I actually think the NoneType error 
existed even before 5cb530b455be54e6b58eae19c8c10ef8f5cf955d was merged (at 
least in my naive setup with S3). That error blocks one upload attempt (there 
are actually multiple attempts, one whenever the S3 task handler is closed), 
and the attempt that wasn't blocked was removed by 
5cb530b455be54e6b58eae19c8c10ef8f5cf955d; I made a fix for it in the PR.

My working assumption is that the upload triggered from atexit() was killed 
when the subprocess ended, so it could never finish (according to my debugging 
logs). But I believe there is more to this task-handler-closing issue, and it 
needs more work to be fully solid. I'm going to stop here due to a priority 
change, but I would be very curious to know all the details if you decide to 
dig to the bottom of it.
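The atexit hypothesis is easy to demonstrate in isolation: handlers registered 
with Python's `atexit` run on normal interpreter exit but never when the 
process is killed outright, so an upload started there can be lost silently. A 
minimal sketch (not Airflow code; POSIX-only because of SIGKILL):

```python
import atexit
import os
import signal
import subprocess
import sys
import tempfile
import time

# Child process: register an "upload" in an atexit handler, then either
# exit normally or stay alive waiting to be killed.
CHILD = """
import atexit, sys, time
atexit.register(lambda: open(sys.argv[1], "w").write("uploaded"))
if sys.argv[2] == "exit":
    sys.exit(0)   # normal exit: atexit handlers run
time.sleep(60)    # stay alive until the parent kills us
"""

def upload_happened(mode):
    marker = os.path.join(tempfile.gettempdir(), "atexit_demo_" + mode)
    if os.path.exists(marker):
        os.remove(marker)
    child = subprocess.Popen([sys.executable, "-c", CHILD, marker, mode])
    if mode == "kill":
        time.sleep(1)                  # let the child register its handler
        child.send_signal(signal.SIGKILL)
    child.wait()
    return os.path.exists(marker)

print(upload_happened("exit"))  # True: the handler ran on clean exit
print(upload_happened("kill"))  # False: SIGKILL skips atexit entirely
```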

> S3 remote logging appending tuple instead of str
> 
>
> Key: AIRFLOW-2363
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2363
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Reporter: Kyle Hamlin
>Assignee: Kevin Yang
>Priority: Major
> Fix For: 1.10.0
>
>
> A recent merge into master that added support for Elasticsearch logging seems 
> to have broken S3 logging by returning a tuple instead of a string.
> [https://github.com/apache/incubator-airflow/commit/ec38ba9594395de04ec932481212a86fbe9ae107#diff-0442332ecbe42ebbf426911c68d8cd4aR128]
>  

[jira] [Commented] (AIRFLOW-2363) S3 remote logging appending tuple instead of str

2018-04-25 Thread Kevin Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16453189#comment-16453189
 ] 

Kevin Yang commented on AIRFLOW-2363:
-

Seems like the ORM is somehow not configured. The root cause is not obvious, so 
I'll set up an S3 env on my side to debug. Sorry for any trouble the bug might 
bring.

> S3 remote logging appending tuple instead of str
> 
>
> Key: AIRFLOW-2363
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2363
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Reporter: Kyle Hamlin
>Assignee: Kevin Yang
>Priority: Major
> Fix For: 1.10.0
>
>
> A recent merge into master that added support for Elasticsearch logging seems 
> to have broken S3 logging by returning a tuple instead of a string.
> [https://github.com/apache/incubator-airflow/commit/ec38ba9594395de04ec932481212a86fbe9ae107#diff-0442332ecbe42ebbf426911c68d8cd4aR128]
>  

[jira] [Created] (AIRFLOW-2373) Do not run tasks when DagRun state is not running

2018-04-25 Thread Kevin Yang (JIRA)
Kevin Yang created AIRFLOW-2373:
---

 Summary: Do not run tasks when DagRun state is not running
 Key: AIRFLOW-2373
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2373
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Kevin Yang


Logically it makes sense to stop tasks from being started when their DagRun is 
not in the running state.

 

Note that this will affect the ability to run a task from the UI; it might make 
sense to add an ignore option in the UI when running tasks manually.
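The behavior described, including the manual-run override, can be sketched as a 
simple guard (illustrative names only, not the scheduler's actual API):

```python
RUNNING = "running"  # stand-in for Airflow's State.RUNNING

def schedulable(task_ids, dag_run_state, ignore_dag_run_state=False):
    """Return the tasks that may start: none unless the DagRun is
    running, unless the caller (e.g. a manual run from the UI)
    explicitly asks to ignore the DagRun state."""
    if dag_run_state == RUNNING or ignore_dag_run_state:
        return list(task_ids)
    return []

print(schedulable(["t1", "t2"], "failed"))        # []
print(schedulable(["t1", "t2"], "failed", True))  # ['t1', 't2']
print(schedulable(["t1"], RUNNING))               # ['t1']
```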



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2363) S3 remote logging appending tuple instead of str

2018-04-23 Thread Kevin Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449156#comment-16449156
 ] 

Kevin Yang commented on AIRFLOW-2363:
-

Hi [~hamlinkn], I've created a PR to fix it, but I think I didn't tag you 
correctly on GitHub: [https://github.com/apache/incubator-airflow/pull/3259]

Would it be possible for you to test it end to end in your infra? Setting up S3 
on my side would take some extra work, and you might benefit from a faster 
merge.

Thank you!

> S3 remote logging appending tuple instead of str
> 
>
> Key: AIRFLOW-2363
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2363
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Reporter: Kyle Hamlin
>Assignee: Kevin Yang
>Priority: Major
> Fix For: 1.10.0
>
>
> A recent merge into master that added support for Elasticsearch logging seems 
> to have broken S3 logging by returning a tuple instead of a string.
> [https://github.com/apache/incubator-airflow/commit/ec38ba9594395de04ec932481212a86fbe9ae107#diff-0442332ecbe42ebbf426911c68d8cd4aR128]
>  

[jira] [Assigned] (AIRFLOW-2363) S3 remote logging appending tuple instead of str

2018-04-23 Thread Kevin Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Yang reassigned AIRFLOW-2363:
---

Assignee: Kevin Yang

> S3 remote logging appending tuple instead of str
> 
>
> Key: AIRFLOW-2363
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2363
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging
>Reporter: Kyle Hamlin
>Assignee: Kevin Yang
>Priority: Major
> Fix For: 1.10.0
>
>
> A recent merge into master that added support for Elasticsearch logging seems 
> to have broken S3 logging by returning a tuple instead of a string.
> [https://github.com/apache/incubator-airflow/commit/ec38ba9594395de04ec932481212a86fbe9ae107#diff-0442332ecbe42ebbf426911c68d8cd4aR128]
>  





[jira] [Resolved] (AIRFLOW-1819) Fix slack operator unittest bug

2018-04-23 Thread Kevin Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Yang resolved AIRFLOW-1819.
-
Resolution: Fixed

> Fix slack operator unittest bug
> ---
>
> Key: AIRFLOW-1819
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1819
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
>
> The slack_operator.py unit test is failing and does not cover the code paths 
> for passing in api_params.





[jira] [Resolved] (AIRFLOW-1805) Allow to supply Slack token through connection

2018-04-23 Thread Kevin Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Yang resolved AIRFLOW-1805.
-
Resolution: Fixed

> Allow to supply Slack token through connection
> --
>
> Key: AIRFLOW-1805
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1805
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
>
> To avoid passing the Slack token directly in plain text, it is safer to 
> supply the token as the 'password' field of a connection.





[jira] [Resolved] (AIRFLOW-1787) Fix batch clear RUNNING task instance and inconsistent timestamp format bugs

2018-04-23 Thread Kevin Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Yang resolved AIRFLOW-1787.
-
Resolution: Fixed

> Fix batch clear RUNNING task instance and inconsistent timestamp format bugs
> 
>
> Key: AIRFLOW-1787
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1787
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
> Fix For: 1.10.0
>
>
> * Batch clear in CRUD is not working for task instances in the RUNNING 
> state; this needs to be fixed.
> * Batch clear and set status are not working for manually triggered task 
> instances, because those have a different execution date format.





[jira] [Created] (AIRFLOW-2359) Add set failed for DagRun and TaskInstance in tree view

2018-04-23 Thread Kevin Yang (JIRA)
Kevin Yang created AIRFLOW-2359:
---

 Summary: Add set failed for DagRun and TaskInstance in tree view
 Key: AIRFLOW-2359
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2359
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Kevin Yang
Assignee: Kevin Yang


Users have been requesting the ability to set failed in the tree view for 
DagRun and TaskInstance.





[jira] [Resolved] (AIRFLOW-2202) Support filter in HiveMetastoreHook().max_partition()

2018-03-20 Thread Kevin Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Yang resolved AIRFLOW-2202.
-
Resolution: Fixed

Fixed by https://github.com/apache/incubator-airflow/pull/3117

> Support filter in HiveMetastoreHook().max_partition() 
> --
>
> Key: AIRFLOW-2202
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2202
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kevin Yang
>Priority: Major
>
> The change made in https://issues.apache.org/jira/browse/AIRFLOW-2150 
> removed support for the filter argument in max_partition(), which is a valid 
> use case, so we're adding it back.





[jira] [Resolved] (AIRFLOW-2150) Use get_partition_names() instead of get_partitions() in HiveMetastoreHook().max_partition()

2018-03-20 Thread Kevin Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Yang resolved AIRFLOW-2150.
-
Resolution: Fixed

> Use get_partition_names() instead of get_partitions() in 
> HiveMetastoreHook().max_partition()
> 
>
> Key: AIRFLOW-2150
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2150
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
>
> get_partitions() is extremely expensive for large tables; max_partition() 
> should be using get_partition_names() instead.
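The reason this matters: `get_partitions()` fetches full partition objects 
from the metastore, while `get_partition_names()` returns only strings like 
`ds=2018-01-01`, which is all `max_partition()` needs. A rough sketch of 
extracting the max from names alone (hypothetical helper, not the actual hook 
code):

```python
def max_partition_value(partition_names, key="ds"):
    """Pick the largest value of `key` out of Hive partition name strings.

    Names look like "ds=2018-01-01" or "ds=2018-01-01/hr=23"; plain string
    comparison is enough when the values sort lexicographically.
    """
    values = []
    for name in partition_names:
        for part in name.split("/"):
            k, _, v = part.partition("=")
            if k == key:
                values.append(v)
    return max(values) if values else None

names = ["ds=2018-01-01/hr=01", "ds=2018-02-26/hr=23", "ds=2017-12-31/hr=00"]
print(max_partition_value(names))  # 2018-02-26
```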





[jira] [Work started] (AIRFLOW-2150) Use get_partition_names() instead of get_partitions() in HiveMetastoreHook().max_partition()

2018-03-01 Thread Kevin Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-2150 started by Kevin Yang.
---
> Use get_partition_names() instead of get_partitions() in 
> HiveMetastoreHook().max_partition()
> 
>
> Key: AIRFLOW-2150
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2150
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>Priority: Major
>
> get_partitions() is extremely expensive for large tables; max_partition() 
> should be using get_partition_names() instead.





[jira] [Created] (AIRFLOW-2150) Use get_partition_names() instead of get_partitions() in HiveMetastoreHook().max_partition()

2018-02-26 Thread Kevin Yang (JIRA)
Kevin Yang created AIRFLOW-2150:
---

 Summary: Use get_partition_names() instead of get_partitions() in 
HiveMetastoreHook().max_partition()
 Key: AIRFLOW-2150
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2150
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Kevin Yang
Assignee: Kevin Yang


get_partitions() is extremely expensive for large tables; max_partition() 
should use get_partition_names() instead.
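The cost argument can be sketched in a few lines. This is a simplified stand-in, not Airflow's actual HiveMetastoreHook code: partition names are plain strings such as 'ds=2018-01-01/hour=00', and taking the maximum of a field only needs those strings, so fetching full partition objects from the metastore is wasted work. The `filter_map` parameter is illustrative.

```python
# Simplified stand-in for the idea behind max_partition(): operate on
# partition *name* strings only, optionally filtering on other fields.

def max_partition(partition_names, field, filter_map=None):
    """Return the max value of `field` across names like 'ds=.../hour=...'."""
    filter_map = filter_map or {}
    best = None
    for name in partition_names:
        # 'ds=2018-01-01/hour=00' -> {'ds': '2018-01-01', 'hour': '00'}
        parts = dict(kv.split("=", 1) for kv in name.split("/"))
        if any(parts.get(k) != v for k, v in filter_map.items()):
            continue  # partition does not match the filter
        value = parts.get(field)
        if value is not None and (best is None or value > best):
            best = value
    return best


names = [
    "ds=2018-01-01/hour=00",
    "ds=2018-01-02/hour=00",
    "ds=2018-01-03/hour=01",
]
print(max_partition(names, "ds"))                             # -> 2018-01-03
print(max_partition(names, "ds", filter_map={"hour": "00"}))  # -> 2018-01-02
```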





[jira] [Commented] (AIRFLOW-1805) Allow to supply Slack token through connection

2017-11-15 Thread Kevin Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254572#comment-16254572
 ] 

Kevin Yang commented on AIRFLOW-1805:
-

The bug described in this issue is fixed in AIRFLOW-1819.

> Allow to supply Slack token through connection
> --
>
> Key: AIRFLOW-1805
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1805
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>
> To avoid passing the Slack token around in plain text, it is safer to supply 
> the token as the 'password' field of an Airflow connection.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AIRFLOW-1819) Fix slack operator unittest bug

2017-11-15 Thread Kevin Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254570#comment-16254570
 ] 

Kevin Yang commented on AIRFLOW-1819:
-

This JIRA fixes the bug described in AIRFLOW-1805.

> Fix slack operator unittest bug
> ---
>
> Key: AIRFLOW-1819
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1819
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>
> The slack_operator.py unit test is failing and does not cover the code paths 
> for passing in api_params.





[jira] [Work started] (AIRFLOW-1805) Allow to supply Slack token through connection

2017-11-15 Thread Kevin Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-1805 started by Kevin Yang.
---
> Allow to supply Slack token through connection
> --
>
> Key: AIRFLOW-1805
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1805
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>
> To avoid passing the Slack token around in plain text, it is safer to supply 
> the token as the 'password' field of an Airflow connection.





[jira] [Work started] (AIRFLOW-1819) Fix slack operator unittest bug

2017-11-15 Thread Kevin Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-1819 started by Kevin Yang.
---
> Fix slack operator unittest bug
> ---
>
> Key: AIRFLOW-1819
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1819
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>
> The slack_operator.py unit test is failing and does not cover the code paths 
> for passing in api_params.





[jira] [Assigned] (AIRFLOW-1819) Fix slack operator unittest bug

2017-11-15 Thread Kevin Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Yang reassigned AIRFLOW-1819:
---

Assignee: Kevin Yang

> Fix slack operator unittest bug
> ---
>
> Key: AIRFLOW-1819
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1819
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>
> The slack_operator.py unit test is failing and does not cover the code paths 
> for passing in api_params.





[jira] [Created] (AIRFLOW-1819) Fix slack operator unittest bug

2017-11-15 Thread Kevin Yang (JIRA)
Kevin Yang created AIRFLOW-1819:
---

 Summary: Fix slack operator unittest bug
 Key: AIRFLOW-1819
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1819
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Kevin Yang


The slack_operator.py unit test is failing and does not cover the code paths 
for passing in api_params.
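A sketch of the missing coverage described above. The operator class below is a stand-in, not Airflow's actual SlackAPIOperator: the point is that a test should assert that user-supplied api_params end up in the payload of the Slack API call.

```python
import unittest
from unittest import mock


class SlackAPIPostStub:
    """Stand-in operator that merges api_params into the API payload."""

    def __init__(self, token, channel, text, api_params=None):
        self.token = token
        self.params = {"channel": channel, "text": text}
        if api_params:
            self.params.update(api_params)

    def execute(self, client):
        client.api_call("chat.postMessage", token=self.token, **self.params)


class TestSlackOperatorApiParams(unittest.TestCase):
    def test_api_params_are_forwarded(self):
        client = mock.Mock()
        op = SlackAPIPostStub(
            token="t", channel="#general", text="hi",
            api_params={"icon_emoji": ":robot_face:"},
        )
        op.execute(client)
        # The extra api_params key must appear in the API call payload.
        client.api_call.assert_called_once_with(
            "chat.postMessage", token="t", channel="#general",
            text="hi", icon_emoji=":robot_face:",
        )
```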





[jira] [Assigned] (AIRFLOW-1805) Allow to supply Slack token through connection

2017-11-15 Thread Kevin Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Yang reassigned AIRFLOW-1805:
---

Assignee: Kevin Yang

> Allow to supply Slack token through connection
> --
>
> Key: AIRFLOW-1805
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1805
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kevin Yang
>Assignee: Kevin Yang
>
> To avoid passing the Slack token around in plain text, it is safer to supply 
> the token as the 'password' field of an Airflow connection.





[jira] [Created] (AIRFLOW-1805) Allow to supply Slack token through connection

2017-11-10 Thread Kevin Yang (JIRA)
Kevin Yang created AIRFLOW-1805:
---

 Summary: Allow to supply Slack token through connection
 Key: AIRFLOW-1805
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1805
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Kevin Yang


To avoid passing the Slack token around in plain text, it is safer to supply 
the token as the 'password' field of an Airflow connection.
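A minimal sketch of the idea, with a stub Connection class standing in for Airflow's connection store (in Airflow the lookup would go through BaseHook.get_connection(); the names below are illustrative, not the actual hook API):

```python
# Stub connection store; in Airflow the token would live in the 'password'
# field of a connection configured in the metadata DB or the UI.

class Connection:
    def __init__(self, conn_id, password=None):
        self.conn_id = conn_id
        self.password = password


_CONNECTIONS = {
    "slack_default": Connection("slack_default", password="xoxb-secret-token"),
}


def get_connection(conn_id):
    return _CONNECTIONS[conn_id]


def resolve_slack_token(token=None, slack_conn_id=None):
    """Prefer an explicit token; otherwise read it from the connection."""
    if token is not None:
        return token
    if slack_conn_id is not None:
        return get_connection(slack_conn_id).password
    raise ValueError("Either token or slack_conn_id must be supplied")


print(resolve_slack_token(slack_conn_id="slack_default"))  # -> xoxb-secret-token
```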





[jira] [Created] (AIRFLOW-1787) Fix batch clear RUNNING task instance and inconsistent timestamp format bugs

2017-11-06 Thread Kevin Yang (JIRA)
Kevin Yang created AIRFLOW-1787:
---

 Summary: Fix batch clear RUNNING task instance and inconsistent 
timestamp format bugs
 Key: AIRFLOW-1787
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1787
 Project: Apache Airflow
  Issue Type: Bug
  Components: webserver
Reporter: Kevin Yang
Assignee: Kevin Yang


* Batch clear in the CRUD is not working for task instances in the RUNNING 
state and needs to be fixed.
* Batch clear and set-status are not working for manually triggered task 
instances, because manually triggered task instances use a different 
execution date format.
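One way to picture the format mismatch, as a sketch rather than the actual webserver fix: scheduled runs store execution dates without microseconds while manually triggered runs include them, so parsing with a single strptime format breaks for one of the two. The exact formats below are illustrative.

```python
from datetime import datetime

# The two execution-date shapes described above (illustrative formats).
FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%dT%H:%M:%S.%f")


def parse_execution_date(raw):
    """Accept both timestamp shapes instead of assuming one."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(raw, fmt)
        except ValueError:
            continue
    raise ValueError("Unrecognized execution date: %r" % raw)


print(parse_execution_date("2017-11-06T00:00:00"))         # scheduled run
print(parse_execution_date("2017-11-06T00:00:00.123456"))  # manual trigger
```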





[jira] [Assigned] (AIRFLOW-1681) Create way to batch retry task instances in the CRUD

2017-11-06 Thread Kevin Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Yang reassigned AIRFLOW-1681:
---

Assignee: Kevin Yang

> Create way to batch retry task instances in the CRUD
> 
>
> Key: AIRFLOW-1681
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1681
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Reporter: Dan Davydov
>Assignee: Kevin Yang
>
> The old way to batch retry tasks was to select them on the Task Instances 
> page on the webserver and do a With Selected -> Delete.
> This no longer works as you will get overlapping task instance logs (e.g. the 
> first retry log will be placed in the same location as the first try log). We 
> need an option in the crud called With Selected -> Retry that does the same 
> thing as With Selected -> Delete but follows the logic for task clearing 
> (sets state to none, increases max_tries). Once this feature is stable, With 
> Selected -> Delete should probably be removed, as it leads to bad states 
> with the logs.
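The clearing semantics the ticket wants Retry to reuse can be sketched like this (a simplified stand-in for Airflow's TaskInstance model; the field handling and `extra_retries` parameter are illustrative):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskInstance:
    task_id: str
    state: Optional[str]
    try_number: int   # attempts already made
    max_tries: int    # total attempts allowed


def clear_for_retry(ti, extra_retries=1):
    """Reset state and raise max_tries, the way clearing does, so the
    scheduler re-runs the task and new attempts log under fresh try
    numbers instead of overwriting earlier logs (as delete would)."""
    ti.state = None
    ti.max_tries = ti.try_number + extra_retries


ti = TaskInstance(task_id="load", state="failed", try_number=2, max_tries=2)
clear_for_retry(ti)
print(ti.state, ti.max_tries)  # -> None 3
```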


