[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-18 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932326#comment-16932326
 ] 

ASF subversion and git services commented on AIRFLOW-5447:
--

Commit b88d4c5a7721c061a6488d519be1646ae0fdddea in airflow's branch 
refs/heads/v1-10-test from Daniel Imberman
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=b88d4c5 ]

[AIRFLOW-5447] Scheduler stalls because second watcher thread in default args


> KubernetesExecutor hangs on task queueing
> -
>
> Key: AIRFLOW-5447
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5447
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: executor-kubernetes
>Affects Versions: 1.10.4, 1.10.5
> Environment: Kubernetes version v1.14.3, Airflow version 1.10.4-1.10.5
>Reporter: Henry Cohen
>Assignee: Daniel Imberman
>Priority: Blocker
>
> Starting in 1.10.4, and continuing in 1.10.5, when using the 
> KubernetesExecutor, with the webserver and scheduler running in the 
> kubernetes cluster, tasks are scheduled, but when added to the task queue, 
> the executor process hangs indefinitely. Based on log messages, it appears to 
> be stuck at this line 
> https://github.com/apache/airflow/blob/v1-10-stable/airflow/contrib/executors/kubernetes_executor.py#L761





[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-17 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931826#comment-16931826
 ] 

Daniel Imberman commented on AIRFLOW-5447:
--

[~HPCohen] it's already in the v1-10-test branch. I'm going to work with 
[~kaxilnaik] and [~ash] to see if we can get 1.10.4/1.10.5 hotfix releases out, 
since this is such a critical bug.



[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931779#comment-16931779
 ] 

ASF subversion and git services commented on AIRFLOW-5447:
--

Commit 4fb29030a1f848f1cca64ad5558068509e94472e in airflow's branch 
refs/heads/v1-10-test from Daniel Imberman
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=4fb2903 ]

[AIRFLOW-5447] Scheduler stalls because second watcher thread in default args




[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-17 Thread Henry Cohen (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931681#comment-16931681
 ] 

Henry Cohen commented on AIRFLOW-5447:
--

Thank you guys so much for working on this, any idea on how long until the fix 
is out now that it's been merged?



[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931476#comment-16931476
 ] 

ASF subversion and git services commented on AIRFLOW-5447:
--

Commit c098ff78508f71c038f901903f6ab587c3dc9b01 in airflow's branch 
refs/heads/master from Daniel Imberman
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=c098ff7 ]

[AIRFLOW-5447] Scheduler stalls because second watcher thread in default args





[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931475#comment-16931475
 ] 

ASF GitHub Bot commented on AIRFLOW-5447:
-

dimberman commented on pull request #6129: [AIRFLOW-5447] Scheduler stalls 
because second watcher thread in default args
URL: https://github.com/apache/airflow/pull/6129
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-17 Thread Kaxil Naik (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931309#comment-16931309
 ] 

Kaxil Naik commented on AIRFLOW-5447:
-

Wow, interesting!! Thank you so much [~cwegrzyn] and everyone.

This code has been around for ages; I'm wondering whether the issue has been 
there since the start or only since 1.10.4, [~dimberman]?



[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931069#comment-16931069
 ] 

ASF GitHub Bot commented on AIRFLOW-5447:
-

dimberman commented on pull request #6128: [AIRFLOW-5447] Don't set the 
executor with a default arg
URL: https://github.com/apache/airflow/pull/6128
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931046#comment-16931046
 ] 

ASF GitHub Bot commented on AIRFLOW-5447:
-

dimberman commented on pull request #6129: [AIRFLOW-5447] Scheduler stalls 
because second watcher thread in default args
URL: https://github.com/apache/airflow/pull/6129
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-16 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931031#comment-16931031
 ] 

Daniel Imberman commented on AIRFLOW-5447:
--

Thank you so much for catching this, [~cwegrzyn]. Yeah, with two thread 
managers it would totally make sense for some sort of deadlocking to occur. 
Let's confirm this and then hotpatch it into 1.10.4/5. cc: [~kaxilnaik] [~ash] 
[~schnie]



[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-16 Thread Chris Wegrzyn (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930993#comment-16930993
 ] 

Chris Wegrzyn commented on AIRFLOW-5447:


Hmm... Confirmed. Dropping that airflow.www.app import from our plugin fixed 
the issue. I've pushed up a [PR|https://github.com/apache/airflow/pull/6128] 
that I believe resolves this issue. Will see if I can create a minimal setup 
where I can reproduce the problem tomorrow to confirm that this really is the 
issue.



[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930992#comment-16930992
 ] 

ASF GitHub Bot commented on AIRFLOW-5447:
-

cwegrzyn commented on pull request #6128: [AIRFLOW-5447] Don't set the executor 
with a default arg
URL: https://github.com/apache/airflow/pull/6128
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-5447
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   The call to get a default executor was being executed at module import time. 
If this module is imported by a DAG, it can lead to two executors running in 
the scheduler, which in the case of the KubernetesExecutor can cause a 
deadlock. This defers creating the default executor to initialization.
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
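
For readers following the fix: the PR description above boils down to a default 
argument that is evaluated once, at import time. A minimal sketch of that 
anti-pattern and of the deferred alternative (the names here are illustrative 
only, not Airflow's actual functions):
{code}
# Hypothetical sketch of the bug described above -- not Airflow's real code.

def get_default_executor():
    # Stand-in for building the configured executor, which spins up the
    # multiprocessing manager / watcher machinery as a side effect.
    print("starting an executor")
    return object()

# Anti-pattern: the default value is evaluated when this module is imported,
# so a DAG that merely imports the module ends up starting a second executor
# inside the scheduler process.
def create_app(executor=get_default_executor()):
    return executor

# Deferred version sketched in the PR: only create the executor when the
# function is actually called without one.
def create_app_fixed(executor=None):
    if executor is None:
        executor = get_default_executor()
    return executor
{code}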




[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-16 Thread Chris Wegrzyn (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930889#comment-16930889
 ] 

Chris Wegrzyn commented on AIRFLOW-5447:


I noticed the two thread manager servers earlier too, and I'm currently 
exploring the hypothesis that that is the problem. It's happening because an 
unnecessary import triggers a default argument that creates an executor. 
Certainly seems like the sort of thing that could cause deadlocks if you have 
two of 'em. Got sidetracked, but I hope to take a look later tonight.



[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-16 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930761#comment-16930761
 ] 

Daniel Imberman commented on AIRFLOW-5447:
--

OK, so I've broken down the currently running threads in the hope that this helps us out.
 

Thread 1: attempting to put a new task in the task_queue

 
{code:java}
Thread 0x7f0c13c7d700
  File "/usr/local/airflow/.local/bin/airflow", line 32, in 
    args.func(args)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/utils/cli.py", line 74, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/bin/cli.py", line 1013, in scheduler
    job.run()
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/jobs/base_job.py", line 213, in run
    self._execute()
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1350, in _execute
    self._execute_helper()
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1439, in _execute_helper
    self.executor.heartbeat()
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/executors/base_executor.py", line 132, in heartbeat
    self.trigger_tasks(open_slots)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/executors/base_executor.py", line 156, in trigger_tasks
    executor_config=simple_ti.executor_config)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 767, in execute_async
    self.task_queue.put((key, command, kube_executor_config))
  File "", line 2, in put
  File "/usr/local/lib/python3.7/multiprocessing/managers.py", line 819, in _callmethod
    kind, result = conn.recv()
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
  File "", line 1, in 
  File "", line 5, in 
{code}
 

Thread 2: re-reading plugins files
{code:java}
Thread 0x7f0c01c31700
  File "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)

Thread 0x7f0bff430700
  File "/usr/local/lib/python3.7/multiprocessing/managers.py", line 201, in handle_request
    result = func(c, *args, **kwds)
  File "/usr/local/lib/python3.7/multiprocessing/managers.py", line 422, in accept_connection
    self.serve_client(c)
  File "/usr/local/lib/python3.7/multiprocessing/managers.py", line 234, in serve_client
    request = recv()
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
  File "", line 202, in _lock_unlock_module
  File "", line 98, in acquire
  File "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/multiprocessing/managers.py", line 178, in accepter
    c = self.listener.accept()
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 453, in accept
    c = self._listener.accept()
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 598, in accept
    s, self._last_accepted = self._socket.accept()
  File "/usr/local/lib/python3.7/socket.py", line 212, in accept
    fd, addr = self._accept()
  File "/usr/local/airflow/.local/bin/airflow", line 21, in 
    from airflow import configuration
  File "", line 983, in _find_and_load
  File "", line 967, in _find_and_load_unlocked
  File "", line 677, in _load_unlocked
  File "", line 728, in exec_module
  File "", line 219, in _call_with_frames_removed
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/__init__.py", line 94, in 
    operators._integrate_plugins()
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/operators/__init__.py", line 104, in _integrate_plugins
    from airflow.plugins_manager import operators_modules
  File "", line 983, in _find_and_load
  File "", line 967, in _find_and_load_unlocked
  File "", line 677, in _load_unlocked
{code}
 

Thread 3: thread manager server
{code:java}
Thread 0x7f0c13c7d700
  File "", line 728, in exec_module
  File "", line 219, in _call_with_frames_removed
  File "/usr/local/airflow/.local/lib/python3

[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-16 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930754#comment-16930754
 ] 

Daniel Imberman commented on AIRFLOW-5447:
--

[~cwegrzyn] this is nuts... It might be a race condition/failure in 
multiprocessing. How many tasks are you trying to launch at once? This is 
giving me a lot to investigate, by the way, so thank you.



[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-16 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930751#comment-16930751
 ] 

Daniel Imberman commented on AIRFLOW-5447:
--

Thank you [~cwegrzyn]! That helps massively. Let me look through.

Also mentioning this issue in case there's any relation: 
https://issues.apache.org/jira/browse/AIRFLOW-5506



[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-16 Thread Chris Wegrzyn (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930644#comment-16930644
 ] 

Chris Wegrzyn commented on AIRFLOW-5447:


After a bit of wrestling with pyrasite and probably dumb luck, I managed to get 
what appears to be a telling stack trace:

 
{code:java}
Thread 0x7fb39d56d700
  File "/usr/local/airflow/.local/bin/airflow", line 32, in 
    args.func(args)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/utils/cli.py", line 74, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/bin/cli.py", line 1013, in scheduler
    job.run()
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/jobs/base_job.py", line 213, in run
    self._execute()
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1350, in _execute
    self._execute_helper()
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1439, in _execute_helper
    self.executor.heartbeat()
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/executors/base_executor.py", line 132, in heartbeat
    self.trigger_tasks(open_slots)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/executors/base_executor.py", line 156, in trigger_tasks
    executor_config=simple_ti.executor_config)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 767, in execute_async
    self.task_queue.put((key, command, kube_executor_config))
  File "", line 2, in put
  File "/usr/local/lib/python3.7/multiprocessing/managers.py", line 819, in _callmethod
    kind, result = conn.recv()
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
  File "", line 1, in 
  File "", line 5, in 
{code}
It does seem like something is going wrong with the communication related to 
the put to the task_queue.
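
The frames above end inside multiprocessing's connection code because the task_queue 
here is a manager proxy: every put() is a remote call to a separate manager process, 
and the caller blocks in conn.recv() until that process answers. A minimal sketch of 
the mechanism (plain Python multiprocessing, not Airflow code):
{code}
from multiprocessing import Manager

if __name__ == "__main__":
    # The queue lives in a separate manager process; this object is a proxy.
    manager = Manager()
    task_queue = manager.Queue()

    # put() pickles the item, sends it to the manager process, then blocks in
    # conn.recv() waiting for the acknowledgement -- the same frame seen in
    # the stack trace above. If the manager process is wedged, put() hangs.
    task_queue.put(("task_key", ["airflow", "run", "..."], {}))
    print(task_queue.get())
{code}
So a hang at this exact spot points at the manager process (or a second, competing 
manager) rather than at the scheduler thread itself.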

 



[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-16 Thread Chris Wegrzyn (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930520#comment-16930520
 ] 

Chris Wegrzyn commented on AIRFLOW-5447:


I'm afraid that's not our issue. We're using the helm/charts helm chart, which 
has these permissions granted. Here's the rules section of the role bound to 
the service account used by our pods (this is copied from actual deployed 
values, just in case we drifted from the chart for whatever reason):

 
{code:java}
  rules:
  - apiGroups:
    - ""
    resources:
    - pods
    verbs:
    - create
    - get
    - delete
    - list
    - watch
  - apiGroups:
    - ""
    resources:
    - pods/log
    verbs:
    - get
    - list
  - apiGroups:
    - ""
    resources:
    - pods/exec
    verbs:
    - create
    - get
{code}
For what it's worth, one relevant change I made from the default config was 
overriding the default:
{code:java}
kube_client_request_args = {"_request_timeout" : [60,60] }
{code}
I have it set to {"_request_timeout": null}. If left with a timeout, I get a 
read timeout on the watch, which leads to "Unknown error in 
KubernetesJobWatcher". I've kubectl exec'ed into the pod and used Python and 
the Python Kubernetes client library to run a few calls like 
list_namespaced_pods, and it works fine. So it's not connectivity per se.
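
Spelled out as a config change, the before/after discussed here looks roughly like 
this (assuming the option sits in the [kubernetes] section of airflow.cfg, as in the 
1.10.x config template):
{code}
[kubernetes]
# value being overridden (60s connect/read timeout on client requests):
# kube_client_request_args = {"_request_timeout" : [60,60] }

# override described above: no client-side timeout on the watch
kube_client_request_args = {"_request_timeout" : null}
{code}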

In any event, even supposing that read timeout should NOT have happened, the 
normal order of operations suggests that KubernetesExecutor#sync should call 
AirflowKubernetesScheduler#sync which should health check the job watcher and 
restart it. This does not appear to happen (which also reinforces the 
appearance that some thread is hung).
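
The self-healing path he expects would look something like the following outline (a 
hypothetical sketch of the health-check idea, not the actual 
AirflowKubernetesScheduler implementation):
{code}
import threading
import time

def start_watcher():
    # Stand-in for launching the KubernetesJobWatcher.
    t = threading.Thread(target=time.sleep, args=(60,), daemon=True)
    t.start()
    return t

class SchedulerSyncLoop:
    """Hypothetical sketch of the sync()/health-check flow described above."""

    def __init__(self):
        self.watcher = start_watcher()

    def sync(self):
        # Called on every executor heartbeat: if the watcher died (for
        # example after a read timeout), restart it instead of hanging.
        if not self.watcher.is_alive():
            self.watcher = start_watcher()
{code}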

 



[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-15 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930224#comment-16930224
 ] 

Daniel Imberman commented on AIRFLOW-5447:
--

[~Yuval.Itzchakov] [~cwegrzyn] Thank you guys for getting this info to us. I 
THINK this might have to do with a bug in the Kubernetes Python client which 
requires "create" and "get" privileges for "pods/exec":

[https://stackoverflow.com/questions/53827345/airflow-k8s-operator-xcom-handshake-status-403-forbidden]
[https://github.com/kubernetes-client/python/issues/690]

The reason I believe this is that the lack of running/updating of pods points 
to a failure of the KubernetesJobWatcher. When we finally started seeing 
similar problems, we were seeing these failures from the JobWatcher:
[https://user-images.githubusercontent.com/1036482/64914385-2f0eca80-d71e-11e9-8f8b-44a1c8620b92.png]
I'm going to look into this further tomorrow and get back ASAP.



[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-13 Thread Chris Wegrzyn (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929517#comment-16929517
 ] 

Chris Wegrzyn commented on AIRFLOW-5447:


I just upgraded a deployment from 1.10.2 to 1.10.5 (I also had the same issue 
on 1.10.4, and haven't yet tried 1.10.3) and am experiencing the same behavior.

 

I've tracked it down to the same line. Judging by the log messages, we get this 
log message:

[https://github.com/apache/airflow/blob/1.10.5/airflow/contrib/executors/kubernetes_executor.py#L762]

 

But we never get this log message:

[https://github.com/apache/airflow/blob/1.10.5/airflow/executors/base_executor.py#L135]

 

Here's a slightly sanitized log:

 

[2019-09-13 19:51:24,074] {scheduler_job.py:1438} DEBUG - Heartbeating the executor
[2019-09-13 19:51:24,074] {base_executor.py:124} DEBUG - 0 running task instances
[2019-09-13 19:51:24,074] {base_executor.py:125} DEBUG - 0 in queue
[2019-09-13 19:51:24,074] {base_executor.py:126} DEBUG - 32 open slots
[2019-09-13 19:51:24,075] {base_executor.py:135} DEBUG - Calling the  sync method
[2019-09-13 19:51:24,083] {scheduler_job.py:1459} DEBUG - Ran scheduling loop in 0.01 seconds
[2019-09-13 19:51:24,083] {scheduler_job.py:1462} DEBUG - Sleeping for 1.00 seconds
[2019-09-13 19:51:24,087] {settings.py:54} INFO - Configured default timezone 
[2019-09-13 19:51:24,093] {settings.py:327} DEBUG - Failed to import airflow_local_settings.
Traceback (most recent call last):
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/settings.py", line 315, in import_local_settings
    import airflow_local_settings
ModuleNotFoundError: No module named 'airflow_local_settings'
[2019-09-13 19:51:24,094] {logging_config.py:59} DEBUG - Unable to load custom logging, using default config instead
[2019-09-13 19:51:24,109] {settings.py:170} DEBUG - Setting up DB connection pool (PID 49)
[2019-09-13 19:51:24,110] {settings.py:213} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=49
[2019-09-13 19:51:24,295] {settings.py:238} DEBUG - Disposing DB connection pool (PID 55)
[2019-09-13 19:51:24,380] {settings.py:238} DEBUG - Disposing DB connection pool (PID 59)
[2019-09-13 19:51:25,084] {scheduler_job.py:1474} DEBUG - Sleeping for 0.99 seconds to prevent excessive logging
[2019-09-13 19:51:25,117] {scheduler_job.py:257} DEBUG - Waiting for 
[2019-09-13 19:51:25,118] {scheduler_job.py:257} DEBUG - Waiting for 
[2019-09-13 19:51:25,226] {settings.py:238} DEBUG - Disposing DB connection pool (PID 69)
[2019-09-13 19:51:25,278] {settings.py:238} DEBUG - Disposing DB connection pool (PID 73)
[2019-09-13 19:51:26,076] {scheduler_job.py:1390} DEBUG - Starting Loop...
[2019-09-13 19:51:26,076] {scheduler_job.py:1401} DEBUG - Harvesting DAG parsing results
[2019-09-13 19:51:26,076] {dag_processing.py:637} DEBUG - Received message of type DagParsingStat
[2019-09-13 19:51:26,077] {dag_processing.py:637} DEBUG - Received message of type SimpleDag
[2019-09-13 19:51:26,077] {dag_processing.py:637} DEBUG - Received message of type DagParsingStat
[2019-09-13 19:51:26,078] {dag_processing.py:637} DEBUG - Received message of type DagParsingStat
[2019-09-13 19:51:26,078] {scheduler_job.py:1403} DEBUG - Harvested 1 SimpleDAGs
[2019-09-13 19:51:26,109] {scheduler_job.py:921} INFO - 1 tasks up for execution:
[2019-09-13 19:51:26,122] {scheduler_job.py:953} INFO - Figuring out tasks to run in Pool(name=default_pool) with 128 open slots and 1 task instances ready to be queued
[2019-09-13 19:51:26,123] {scheduler_job.py:981} INFO - DAG parse_log has 0/16 running and queued tasks
[2019-09-13 19:51:26,132] {scheduler_job.py:257} DEBUG - Waiting for 
[2019-09-13 19:51:26,133] {scheduler_job.py:257} DEBUG - Waiting for 
[2019-09-13 19:51:26,142] {scheduler_job.py:1031} INFO - Setting the following tasks to queued state:
[2019-09-13 19:51:26,157] {scheduler_job.py:1107} INFO - Setting the following 1 tasks to queued state:
[2019-09-13 19:51:26,157] {scheduler_job.py:1143} INFO - Sending ('parse_log', 'xyz_parse_log_2019-09-13', datetime.datetime(2019, 9, 12, 0, 0, tzinfo=), 1) to executor with priority 2 and queue default
[2019-09-13 19:51:26,158] {base_executor.py:59} INFO - Adding to queue: ['airflow', 'run', 'parse_log', 'xyz_parse_log_2019-09-13', '2019-09-12T00:00:00+00:00', '--local', '--pool', 'default_pool', '-sd', '/usr/local/airflow/dags/xyz.py']
[2019-09-13 19:51:26,158] {scheduler_job.py:1438} DEBUG - Heartbeating the executor
[2019-09-13 19:51:26,159] {base_executor.py:124} DEBUG - 0 runnin

[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-13 Thread Kaxil Naik (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929200#comment-16929200
 ] 

Kaxil Naik commented on AIRFLOW-5447:
-

We ([~dimberman] and our other colleagues at Astronomer) tried, but we were not 
able to replicate this issue.

I suspect some issue with the environment or configuration.

Where do you run k8s? GKE? EKS?

Can you share your DAG and airflow.cfg file?

It is hard to tell but we can try.



[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-13 Thread Henry Cohen (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929191#comment-16929191
 ] 

Henry Cohen commented on AIRFLOW-5447:
--

If it helps, my pod running the webserver and scheduler is on a node with 5 
CPUs and 6GB of memory.



[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-13 Thread Yuval Itzchakov (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929021#comment-16929021
 ] 

Yuval Itzchakov commented on AIRFLOW-5447:
--

py3



[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-12 Thread Henry Cohen (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928907#comment-16928907
 ] 

Henry Cohen commented on AIRFLOW-5447:
--

[~dimberman] py3



[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-12 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928901#comment-16928901
 ] 

Daniel Imberman commented on AIRFLOW-5447:
--

[~HPCohen] [~Yuval.Itzchakov] also question: Are you running py2 or py3? 



[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-12 Thread Henry Cohen (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928804#comment-16928804
 ] 

Henry Cohen commented on AIRFLOW-5447:
--

This line in particular is what led me through my investigation to 
https://github.com/apache/airflow/blob/v1-10-stable/airflow/contrib/executors/kubernetes_executor.py#L761:
{noformat}
[2019-09-12 17:56:05,186] kubernetes_executor.py:764 INFO - Add task 
('example_subdag_operator', 'start', datetime.datetime(2019, 9, 10, 0, 0, 
tzinfo=), 1) with command ['airflow', 
'run', 'example_subdag_operator', 'start', '2019-09-10T00:00:00+00:00', 
'--local', '--pool', 'default_pool', '-sd', 
'/usr/local/lib/python3.7/site-packages/airflow/example_dags/example_subdag_operator.py']
 with executor_config {}{noformat}



[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-12 Thread Henry Cohen (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928788#comment-16928788
 ] 

Henry Cohen commented on AIRFLOW-5447:
--

[2019-09-12 17:56:03,034] {kubernetes_executor.py:698} INFO - TaskInstance:  found in queued state but was not launched, rescheduling
[2019-09-12 17:56:03,043] {scheduler_job.py:1376} INFO - Resetting orphaned tasks for active dag runs
[2019-09-12 17:56:03,085] {base_job.py:308} INFO - Reset the following 30 TaskInstances:
[2019-09-12 17:56:03,092] {dag_processing.py:545} INFO - Launched DagFileProcessorManager with pid: 35
[2019-09-12 17:56:03,093] {scheduler_job.py:1390} DEBUG - Starting Loop...
[2019-09-12 17:56:03,093] {scheduler_job.py:1401} DEBUG - Harvesting DAG parsing results
[2019-09-12 17:56:03,093] {scheduler_job.py:1403} DEBUG - Harvested 0 SimpleDAGs
[2019-09-12 17:56:03,093] {scheduler_job.py:1438} DEBUG - Heartbeating the executor
[2019-09-12 17:56:03,093] {base_executor.py:124} DEBUG - 0 running task instances
[2019-09-12 17:56:03,094] {base_executor.py:125} DEBUG - 0 in queue
[2019-09-12 17:56:03,094] {base_executor.py:126} DEBUG - 96 open slots
[2019-09-12 17:56:03,094] {base_executor.py:135} DEBUG - Calling the  sync method
[2019-09-12 17:56:03,100] {scheduler_job.py:1459} DEBUG - Ran scheduling loop in 0.01 seconds
[2019-09-12 17:56:03,101] {scheduler_job.py:1462} DEBUG - Sleeping for 1.00 seconds
[2019-09-12 17:56:03,107] {settings.py:54} INFO - Configured default timezone 
[2019-09-12 17:56:03,109] {settings.py:327} DEBUG - Failed to import airflow_local_settings.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/settings.py", line 315, in import_local_settings
    import airflow_local_settings
ModuleNotFoundError: No module named 'airflow_local_settings'
[2019-09-12 17:56:03,111] {logging_config.py:47} INFO - Successfully imported user-defined logging config from log_config.LOGGING_CONFIG
[2019-09-12 17:56:03,120] {settings.py:170} DEBUG - Setting up DB connection pool (PID 35)
[2019-09-12 17:56:03,121] {settings.py:213} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=35
[2019-09-12 17:56:03,289] {settings.py:238} DEBUG - Disposing DB connection pool (PID 45)
[2019-09-12 17:56:03,356] {settings.py:238} DEBUG - Disposing DB connection pool (PID 41)
[2019-09-12 17:56:04,101] {scheduler_job.py:1474} DEBUG - Sleeping for 0.99 seconds to prevent excessive logging
[2019-09-12 17:56:04,126] {scheduler_job.py:257} DEBUG - Waiting for 
[2019-09-12 17:56:04,127] {scheduler_job.py:257} DEBUG - Waiting for 
[2019-09-12 17:56:04,162] {settings.py:238} DEBUG - Disposing DB connection pool (PID 55)
[2019-09-12 17:56:04,223] {settings.py:238} DEBUG - Disposing DB connection pool (PID 58)
[2019-09-12 17:56:05,095] {scheduler_job.py:1390} DEBUG - Starting Loop...
[2019-09-12 17:56:05,095] {scheduler_job.py:1401} DEBUG - Harvesting DAG parsing results
[2019-09-12 17:56:05,097] {dag_processing.py:637} DEBUG - Received message of type DagParsingStat
[2019-09-12 17:56:05,098] {dag_processing.py:637} DEBUG - Received message of type SimpleDag
[2019-09-12 17:56:05,098] {dag_processing.py:637} DEBUG - Received message of type SimpleDag
[2019-09-12 17:56:05,099] {dag_processing.py:637} DEBUG - Received message of type SimpleDag
[2019-09-12 17:56:05,099] {dag_processing.py:637} DEBUG - Received message of type SimpleDag
[2019-09-12 17:56:05,100] {dag_processing.py:637} DEBUG - Received message of type DagParsingStat
[2019-09-12 17:56:05,101] {dag_processing.py:637} DEBUG - Received message of type DagParsingStat
[2019-09-12 17:56:05,101] {scheduler_job.py:1403} DEBUG - Harvested 4 SimpleDAGs
[2019-09-12 17:56:05,128] {scheduler_job.py:921} INFO - 5 tasks up for execution:
[2019-09-12 17:56:05,138] {scheduler_job.py:953} INFO - Figuring out tasks to run in Pool(name=default_pool) with 128 open slots and 5 task instances ready to be queued
[2019-09-12 17:56:05,139] {scheduler_job.py:981} INFO - DAG example_subdag_operator has 0/48 running and queued tasks
[2019-09-12 17:56:05,139] {scheduler_job.py:981} INFO - DAG latest_only_with_trigger has 0/48 running and queued tasks
[2019-09-12 17:56:05,139] {scheduler_job.py:981} INFO - DAG latest_only_with_trigger has 1/48 running and queued tasks
[2019-09-12 17:56:05,139] {scheduler_job.py:981} INFO - DAG latest_only_with_trigger has 2/48 running and queued tasks
[2019-09-12 17:56:05,139] {scheduler_job.py:981} INFO - DAG latest_only_with_trigger has 3/48 running and queued tasks
[2019-09-12 17:56:05,139] {scheduler_job.py:257} DEBUG - Waiting for 
[2019-09-12 17:56:05,140] {sched

[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-12 Thread Kaxil Naik (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928757#comment-16928757
 ] 

Kaxil Naik commented on AIRFLOW-5447:
-

Yes, please add a stack trace and we will see at our end if we can replicate 
it.




[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-12 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928741#comment-16928741
 ] 

Daniel Imberman commented on AIRFLOW-5447:
--

[~Yuval.Itzchakov] [~HPCohen] could you please post logs and/or a breaking DAG? 
I'm going to look into this.



[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-12 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928738#comment-16928738
 ] 

Daniel Imberman commented on AIRFLOW-5447:
--

Oof thank you for bringing this to my attention. cc: [~kaxilnaik] [~ash]



[jira] [Commented] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-11 Thread Yuval Itzchakov (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927961#comment-16927961
 ] 

Yuval Itzchakov commented on AIRFLOW-5447:
--

Can confirm I've experienced the exact same behavior. Rolling back to 1.10.3 
works.
