[jira] [Commented] (AIRFLOW-401) scheduler gets stuck without a trace

2020-03-27 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17069129#comment-17069129
 ] 

Daniel Imberman commented on AIRFLOW-401:
-

This issue has been moved to https://github.com/apache/airflow/issues/7935

> scheduler gets stuck without a trace
> 
>
> Key: AIRFLOW-401
> URL: https://issues.apache.org/jira/browse/AIRFLOW-401
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: executors, scheduler
>Affects Versions: 1.7.1.3
>Reporter: Nadeem Ahmed Nazeer
>Assignee: Ash Berlin-Taylor
>Priority: Minor
>  Labels: celery, kombu
> Attachments: Dag_code.txt, schduler_cpu100%.png, scheduler_stuck.png, 
> scheduler_stuck_7hours.png
>
>
> The scheduler gets stuck without a trace or error. When this happens, the CPU 
> usage of the scheduler service is at 100%. No jobs get submitted and everything 
> comes to a halt. It looks like it goes into some kind of infinite loop. 
> The only way I could make it run again is by manually restarting the 
> scheduler service. But again, after running some tasks it gets stuck. I've 
> tried with both the Celery and Local executors, but the same issue occurs. I am 
> using the -n 3 parameter while starting the scheduler. 
> Scheduler configs:
> job_heartbeat_sec = 5
> scheduler_heartbeat_sec = 5
> executor = LocalExecutor
> parallelism = 32
> Please help. I would be happy to provide any other information needed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-401) scheduler gets stuck without a trace

2020-03-26 Thread Ash Berlin-Taylor (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067824#comment-17067824
 ] 

Ash Berlin-Taylor commented on AIRFLOW-401:
---

[~marcin.kuthan] what is the full command line of the process that is stuck at 
100% CPU? If you wait a few minutes, do you get a warning in the UI about the 
scheduler not running/heartbeating?
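
A quick way to capture exactly that is a short psutil script -- a minimal 
sketch, assuming the psutil package is installed and that "airflow scheduler" 
appears on the process command line (an assumption about how it was launched):

{code:java}
# Sketch: print every scheduler process with its full command line and CPU use.
# Assumes psutil is installed; matching on "airflow scheduler" is an assumption
# about how the process was started.
import psutil

for proc in psutil.process_iter(["pid", "cmdline", "cpu_percent"]):
    cmdline = " ".join(proc.info["cmdline"] or [])
    if "airflow scheduler" in cmdline:
        print(proc.info["pid"], f"{proc.info['cpu_percent']}%", cmdline)
{code}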

> scheduler gets stuck without a trace
> 
>
> Key: AIRFLOW-401
> URL: https://issues.apache.org/jira/browse/AIRFLOW-401
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: executors, scheduler
>Affects Versions: 1.7.1.3
>Reporter: Nadeem Ahmed Nazeer
>Assignee: Ash Berlin-Taylor
>Priority: Minor
>  Labels: celery, kombu
> Attachments: Dag_code.txt, schduler_cpu100%.png, scheduler_stuck.png, 
> scheduler_stuck_7hours.png
>
>
> The scheduler gets stuck without a trace or error. When this happens, the CPU 
> usage of the scheduler service is at 100%. No jobs get submitted and everything 
> comes to a halt. It looks like it goes into some kind of infinite loop. 
> The only way I could make it run again is by manually restarting the 
> scheduler service. But again, after running some tasks it gets stuck. I've 
> tried with both the Celery and Local executors, but the same issue occurs. I am 
> using the -n 3 parameter while starting the scheduler. 
> Scheduler configs:
> job_heartbeat_sec = 5
> scheduler_heartbeat_sec = 5
> executor = LocalExecutor
> parallelism = 32
> Please help. I would be happy to provide any other information needed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-401) scheduler gets stuck without a trace

2020-03-26 Thread Marcin Kuthan (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067815#comment-17067815
 ] 

Marcin Kuthan commented on AIRFLOW-401:
---

The same issue on 1.10.9: no traces in the logs, CPU at 100%.

> scheduler gets stuck without a trace
> 
>
> Key: AIRFLOW-401
> URL: https://issues.apache.org/jira/browse/AIRFLOW-401
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: executors, scheduler
>Affects Versions: 1.7.1.3
>Reporter: Nadeem Ahmed Nazeer
>Assignee: Ash Berlin-Taylor
>Priority: Minor
>  Labels: celery, kombu
> Attachments: Dag_code.txt, schduler_cpu100%.png, scheduler_stuck.png, 
> scheduler_stuck_7hours.png
>
>
> The scheduler gets stuck without a trace or error. When this happens, the CPU 
> usage of the scheduler service is at 100%. No jobs get submitted and everything 
> comes to a halt. It looks like it goes into some kind of infinite loop. 
> The only way I could make it run again is by manually restarting the 
> scheduler service. But again, after running some tasks it gets stuck. I've 
> tried with both the Celery and Local executors, but the same issue occurs. I am 
> using the -n 3 parameter while starting the scheduler. 
> Scheduler configs:
> job_heartbeat_sec = 5
> scheduler_heartbeat_sec = 5
> executor = LocalExecutor
> parallelism = 32
> Please help. I would be happy to provide any other information needed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-401) scheduler gets stuck without a trace

2019-12-26 Thread Anuj (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003592#comment-17003592
 ] 

Anuj commented on AIRFLOW-401:
--

Facing the same issue in Airflow 1.10.4: the scheduler gets stuck with 100% CPU 
usage. Using Airflow with the Celery executor.

> scheduler gets stuck without a trace
> 
>
> Key: AIRFLOW-401
> URL: https://issues.apache.org/jira/browse/AIRFLOW-401
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: executors, scheduler
>Affects Versions: 1.7.1.3
>Reporter: Nadeem Ahmed Nazeer
>Assignee: Ash Berlin-Taylor
>Priority: Minor
>  Labels: celery, kombu
> Attachments: Dag_code.txt, schduler_cpu100%.png, scheduler_stuck.png, 
> scheduler_stuck_7hours.png
>
>
> The scheduler gets stuck without a trace or error. When this happens, the CPU 
> usage of the scheduler service is at 100%. No jobs get submitted and everything 
> comes to a halt. It looks like it goes into some kind of infinite loop. 
> The only way I could make it run again is by manually restarting the 
> scheduler service. But again, after running some tasks it gets stuck. I've 
> tried with both the Celery and Local executors, but the same issue occurs. I am 
> using the -n 3 parameter while starting the scheduler. 
> Scheduler configs:
> job_heartbeat_sec = 5
> scheduler_heartbeat_sec = 5
> executor = LocalExecutor
> parallelism = 32
> Please help. I would be happy to provide any other information needed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-401) scheduler gets stuck without a trace

2019-09-18 Thread Ash Berlin-Taylor (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932357#comment-16932357
 ] 

Ash Berlin-Taylor commented on AIRFLOW-401:
---

[~ms-nmcalabroso] The problem you are seeing there may be caused by 
AIRFLOW-5447 (which affects the Kube and Local executors); the fix will be 
released soon in 1.10.6.

> scheduler gets stuck without a trace
> 
>
> Key: AIRFLOW-401
> URL: https://issues.apache.org/jira/browse/AIRFLOW-401
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: executors, scheduler
>Affects Versions: 1.7.1.3
>Reporter: Nadeem Ahmed Nazeer
>Assignee: Bolke de Bruin
>Priority: Minor
>  Labels: celery, kombu
> Attachments: Dag_code.txt, schduler_cpu100%.png, scheduler_stuck.png, 
> scheduler_stuck_7hours.png
>
>
> The scheduler gets stuck without a trace or error. When this happens, the CPU 
> usage of the scheduler service is at 100%. No jobs get submitted and everything 
> comes to a halt. It looks like it goes into some kind of infinite loop. 
> The only way I could make it run again is by manually restarting the 
> scheduler service. But again, after running some tasks it gets stuck. I've 
> tried with both the Celery and Local executors, but the same issue occurs. I am 
> using the -n 3 parameter while starting the scheduler. 
> Scheduler configs:
> job_heartbeat_sec = 5
> scheduler_heartbeat_sec = 5
> executor = LocalExecutor
> parallelism = 32
> Please help. I would be happy to provide any other information needed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-401) scheduler gets stuck without a trace

2019-09-18 Thread Neil Calabroso (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932286#comment-16932286
 ] 

Neil Calabroso commented on AIRFLOW-401:


Currently experiencing this issue on `Ubuntu 14.04` using `python 3.6.8`. This 
started when we upgraded our staging environment from `1.10.1` to `1.10.4`. 
We're using `LocalExecutor` and the process is managed by upstart.

I'm also getting the warning in the Web UI: "The scheduler does not appear to be 
running. Last heartbeat was received 9 minutes ago."

For this sample, I got 3 stuck processes:

 
{code:java}
root@airflow-staging:/home/ubuntu# ps aux | grep scheduler
airflow  21595  0.2  1.3  469868 109976 ?      S    09:52   0:04 /usr/bin/python3.6 /usr/local/bin/airflow scheduler -n 5
airflow  21602  0.0  1.1 1500268  95992 ?      Tl   09:52   0:00 /usr/bin/python3.6 /usr/local/bin/airflow scheduler -n 5
airflow  21648  0.0  1.1  467796  94628 ?      S    09:52   0:00 /usr/bin/python3.6 /usr/local/bin/airflow scheduler -n 5
root     25735  0.0  0.0   10472    920 pts/3  S+   10:24   0:00 grep --color=auto scheduler
{code}
 

Running py-spy against each process gives:

 
{code:java}
Collecting samples from 'pid: 21595' (python v3.6.8)
Total Samples 500
GIL: 0.00%, Active: 100.00%, Threads: 1

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)
100.00% 100.00%    5.00s      5.00s  _recv (multiprocessing/connection.py:379)
  0.00% 100.00%   0.000s      5.00s  wrapper (airflow/utils/cli.py:74)
  0.00% 100.00%   0.000s      5.00s  scheduler (airflow/bin/cli.py:1013)
  0.00% 100.00%   0.000s      5.00s  end (airflow/executors/local_executor.py:233)
  0.00% 100.00%   0.000s      5.00s  <module> (airflow:32)
  0.00% 100.00%   0.000s      5.00s  recv (multiprocessing/connection.py:250)
  0.00% 100.00%   0.000s      5.00s  _execute (airflow/jobs/scheduler_job.py:1323)
  0.00% 100.00%   0.000s      5.00s  end (airflow/executors/local_executor.py:212)
  0.00% 100.00%   0.000s      5.00s  _callmethod (multiprocessing/managers.py:757)
  0.00% 100.00%   0.000s      5.00s  join (<string>:2)
  0.00% 100.00%   0.000s      5.00s  _recv_bytes (multiprocessing/connection.py:407)
  0.00% 100.00%   0.000s      5.00s  _execute_helper (airflow/jobs/scheduler_job.py:1463)
  0.00% 100.00%   0.000s      5.00s  run (airflow/jobs/base_job.py:213)
{code}
 
{code:java}
root@airflow-staging:/home/ubuntu# py-spy --pid 21602
Error: Failed to suspend process
Reason: EPERM: Operation not permitted{code}
 
{code:java}
Collecting samples from 'pid: 21648' (python v3.6.8)
Total Samples 28381
GIL: 0.00%, Active: 100.00%, Threads: 1

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)
100.00% 100.00%   283.8s     283.8s  _try_wait (subprocess.py:1424)
  0.00% 100.00%   0.000s     283.8s  call (subprocess.py:289)
  0.00% 100.00%   0.000s     283.8s  start (airflow/executors/local_executor.py:184)
  0.00% 100.00%   0.000s     283.8s  wrapper (airflow/utils/cli.py:74)
  0.00% 100.00%   0.000s     283.8s  _bootstrap (multiprocessing/process.py:258)
  0.00% 100.00%   0.000s     283.8s  _execute_helper (airflow/jobs/scheduler_job.py:1347)
  0.00% 100.00%   0.000s     283.8s  execute_work (airflow/executors/local_executor.py:86)
  0.00% 100.00%   0.000s     283.8s  <module> (airflow:32)
  0.00% 100.00%   0.000s     283.8s  _launch (multiprocessing/popen_fork.py:73)
  0.00% 100.00%   0.000s     283.8s  run (airflow/jobs/base_job.py:213)
  0.00% 100.00%   0.000s     283.8s  check_call (subprocess.py:306)
  0.00% 100.00%   0.000s     283.8s  start (multiprocessing/process.py:105)
  0.00% 100.00%   0.000s     283.8s  run (airflow/executors/local_executor.py:116)
  0.00% 100.00%   0.000s     283.8s  wait (subprocess.py:1477)
  0.00% 100.00%   0.000s     283.8s  scheduler (airflow/bin/cli.py:1013)
  0.00% 100.00%   0.000s     283.8s  _Popen (multiprocessing/context.py:277)
  0.00% 100.00%   0.000s     283.8s  _Popen (multiprocessing/context.py:223)
  0.00% 100.00%   0.000s     283.8s  start (airflow/executors/local_executor.py:224)
  0.00% 100.00%   0.000s     283.8s  _execute (airflow/jobs/scheduler_job.py:1323)
  0.00% 100.00%   0.000s     283.8s  __init__ (multiprocessing/popen_fork.py:19)
{code}
 

We will try to downgrade to `1.10.3` first and see if this problem persists.
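
Taken together, the two readable traces have the shape of a pipe deadlock: the 
scheduler (21595) is blocked in LocalExecutor.end() waiting to recv a result, 
while the worker (21648) is blocked in subprocess wait() on a task that never 
exits. A minimal, self-contained sketch of that shape (not Airflow's actual 
code; "sleep infinity" stands in for a hung task):

{code:java}
# Minimal sketch of the hang shape in the traces above (not Airflow's actual
# code): the parent blocks in recv() on a result pipe while its worker is
# blocked in subprocess wait() on a command that never exits.
import multiprocessing
import subprocess

def worker(conn):
    subprocess.call(["sleep", "infinity"])  # stands in for a stuck task
    conn.send("done")                       # never reached

if __name__ == "__main__":
    parent_conn, child_conn = multiprocessing.Pipe()
    multiprocessing.Process(target=worker, args=(child_conn,)).start()
    parent_conn.recv()  # like LocalExecutor.end(): blocks here forever
{code}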

 

> scheduler gets stuck without a trace
> 
>
> Key: AIRFLOW-401
> URL: https://issues.apache.org/jira/browse/AIRFLOW-401
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: executors, scheduler
>Affects Versions: 1.7.1.3
>Reporter: Nadeem Ahmed Nazeer
>Assignee: Bolke de Bruin
>Priority: Minor
>  Labels: celery, kombu
> Attachments: Dag_code.txt, schduler_cpu100%.png, scheduler_stuck.png, 
> scheduler_stuck_7hours.png
>
>
> The scheduler gets stuck without a trace or error. 

[jira] [Commented] (AIRFLOW-401) scheduler gets stuck without a trace

2019-08-20 Thread Roland de Boo (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911106#comment-16911106
 ] 

Roland de Boo commented on AIRFLOW-401:
---

Airflow 1.10.4, running on Kubernetes (Azure AKS).

The scheduler gets stuck, and the portal shows the message "The scheduler does 
not appear to be running. Last heartbeat was received 9 minutes ago."

py-spy output:

 
{code:java}
Total Samples 40100
GIL: 0.00%, Active: 100.00%, Threads: 1

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)
100.00% 100.00%   29.57s     29.57s  poll (multiprocessing/popen_fork.py:28)
  0.00%   0.00%   0.000s     0.030s  _decref (multiprocessing/managers.py:809)
  0.00%   0.00%   0.000s     0.040s  one_or_none (sqlalchemy/orm/query.py:3261)
  0.00%   0.00%   0.000s     0.030s  answer_challenge (multiprocessing/connection.py:732)
  0.00%   0.00%   371.3s     371.3s  recv_into (OpenSSL/SSL.py:1821)
  0.00%   0.00%   0.000s     0.020s  __missing__ (sqlalchemy/orm/path_registry.py:303)
  0.00%   0.00%   0.000s     0.030s  wait_procs (psutil/__init__.py:1667)
  0.00%   0.00%   0.000s     0.020s  _emit_update_statements (sqlalchemy/orm/persistence.py:996)
  0.00%   0.00%   0.000s     0.020s  setup (sqlalchemy/orm/interfaces.py:547)
  0.00%   0.00%   0.000s     0.020s  save_obj (sqlalchemy/orm/persistence.py:236)
  0.00%   0.00%   0.000s     0.020s  flush (sqlalchemy/orm/session.py:2459)
  0.00%   0.00%   0.000s     0.030s  recv_bytes (multiprocessing/connection.py:216)
  0.00%   0.00%   0.000s     371.3s  request (kubernetes/client/rest.py:166)
  0.00%   0.00%   0.000s     0.010s  __iter__ (sqlalchemy/orm/query.py:3334)
  0.00%   0.00%   0.000s     0.010s  connection (sqlalchemy/orm/session.py:1124)
  0.00%   0.00%   0.000s     0.010s  _contextual_connect (sqlalchemy/engine/base.py:2231)
  0.00%   0.00%   0.000s     0.010s  _connection_from_session (sqlalchemy/orm/query.py:3349)
  0.00%   0.00%   0.000s     0.020s  execute (sqlalchemy/orm/unitofwork.py:422)
  0.00%   0.00%   0.000s     0.010s  ping_connection (airflow/utils/sqlalchemy.py:70)
  0.00%   0.00%   0.000s     371.3s  POST (kubernetes/client/rest.py:266)
  0.00%   0.00%   0.000s     0.010s  __call__ (sqlalchemy/event/attr.py:297)
  0.00%   0.00%   0.000s     0.010s  __init__ (sqlalchemy/engine/base.py:125)
  0.00%   0.00%   0.000s     0.010s  on_terminate (airflow/utils/helpers.py:297)
  0.00%   0.00%   0.000s     0.030s  end (airflow/utils/dag_processing.py:691)
  0.00%   0.00%   0.000s     0.020s  wait (psutil/__init__.py:1384)
  0.00%   0.00%   0.000s     0.010s  check_gone (psutil/__init__.py:1640)
  0.00%   0.00%   0.000s     371.3s  __call_api (kubernetes/client/api_client.py:168)
  0.00%   0.00%   0.000s     0.010s  makeRecord (logging/__init__.py:1413)
  0.00%   0.00%   0.000s     0.020s  commit (sqlalchemy/orm/session.py:494)
  0.00%   0.00%   0.000s     371.3s  begin (http/client.py:307)
  0.00%   0.00%   0.000s     0.020s  check_gone (psutil/__init__.py:1632)
  0.00%   0.00%   0.000s     0.020s  wait (psutil/_pslinux.py:1719)
  0.00%   0.00%   0.000s     0.030s  Client (multiprocessing/connection.py:493)
  0.00%   0.00%   0.010s     0.010s  do_commit (sqlalchemy/engine/default.py:505)
  0.00%   0.00%   0.000s     0.010s  _connection_for_bind (sqlalchemy/orm/session.py:1130)
  0.00%   0.00%   0.000s     0.020s  _get_context_loader (sqlalchemy/orm/interfaces.py:521)
  0.00%   0.00%   0.000s     0.010s  _get_bind_args (sqlalchemy/orm/query.py:3371)
  0.00%   0.00%   0.020s     0.020s  check_timeout (psutil/_psposix.py:71)
  0.00%   0.00%   0.000s     0.010s  _do_commit (sqlalchemy/engine/base.py:1747)
  0.00%   0.00%   0.030s     0.030s  _recv (multiprocessing/connection.py:379)
  0.00%   0.00%   0.000s     0.040s  load_on_pk_identity (sqlalchemy/orm/loading.py:282)
  0.00%   0.00%   0.000s     0.010s  _commit_impl (sqlalchemy/engine/base.py:761)
  0.00%   0.00%   0.000s     0.010s  commit (sqlalchemy/orm/session.py:498)
  0.00%   0.00%   0.000s     371.3s  readinto (socket.py:586)
{code}
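
Note that every 371.3s entry sits on a single chain: a kubernetes-client POST 
blocked in recv_into on an SSL socket, i.e. an API call waiting on the network 
with no read timeout. A minimal sketch of the usual mitigation, assuming the 
kubernetes Python client (the method and namespace here are illustrative):

{code:java}
# Sketch: the kubernetes Python client accepts _request_timeout on API calls;
# without it, a half-dead connection can block in recv_into indefinitely,
# matching the 371.3s chain above. Values and the namespace are illustrative.
from kubernetes import client, config

config.load_incluster_config()  # assumes the scheduler runs inside the cluster
v1 = client.CoreV1Api()
pods = v1.list_namespaced_pod(
    "airflow",
    _request_timeout=(5, 30),   # (connect, read) timeouts in seconds
)
{code}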
 

> scheduler gets stuck without a trace
> 
>
> Key: AIRFLOW-401
> URL: https://issues.apache.org/jira/browse/AIRFLOW-401
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: executors, scheduler
>Affects Versions: 1.7.1.3
>Reporter: Nadeem Ahmed Nazeer
>Assignee: Bolke de Bruin
>Priority: Minor
>  Labels: celery, kombu
> Attachments: Dag_code.txt, schduler_cpu100%.png, scheduler_stuck.png, 
> scheduler_stuck_7hours.png
>
>
> The scheduler gets stuck without a trace or error. When this happens, the CPU 
> usage of the scheduler service is at 100%. No jobs get submitted and everything 
> comes to a halt. It looks like it goes into some kind of infinite loop. 
> The only way I could make it run again is by manually 

[jira] [Commented] (AIRFLOW-401) scheduler gets stuck without a trace

2019-07-24 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16891918#comment-16891918
 ] 

Ash Berlin-Taylor commented on AIRFLOW-401:
---

When someone next reproduces this, please run https://github.com/benfred/py-spy 
against the stuck process and give us the report -- it may help us track down 
where/why it is getting stuck.
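
For reference, a typical invocation looks like the following -- a sketch that 
assumes py-spy is installed via pip (subcommand names vary slightly between 
py-spy releases):

{code:java}
pip install py-spy
py-spy dump --pid <scheduler-pid>   # one-shot stack dump of the stuck process
py-spy --pid <scheduler-pid>        # live, top-like sampling view
{code}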

> scheduler gets stuck without a trace
> 
>
> Key: AIRFLOW-401
> URL: https://issues.apache.org/jira/browse/AIRFLOW-401
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: executors, scheduler
>Affects Versions: 1.7.1.3
>Reporter: Nadeem Ahmed Nazeer
>Assignee: Bolke de Bruin
>Priority: Minor
>  Labels: celery, kombu
> Attachments: Dag_code.txt, schduler_cpu100%.png, scheduler_stuck.png, 
> scheduler_stuck_7hours.png
>
>
> The scheduler gets stuck without a trace or error. When this happens, the CPU 
> usage of the scheduler service is at 100%. No jobs get submitted and everything 
> comes to a halt. It looks like it goes into some kind of infinite loop. 
> The only way I could make it run again is by manually restarting the 
> scheduler service. But again, after running some tasks it gets stuck. I've 
> tried with both the Celery and Local executors, but the same issue occurs. I am 
> using the -n 3 parameter while starting the scheduler. 
> Scheduler configs:
> job_heartbeat_sec = 5
> scheduler_heartbeat_sec = 5
> executor = LocalExecutor
> parallelism = 32
> Please help. I would be happy to provide any other information needed.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (AIRFLOW-401) scheduler gets stuck without a trace

2019-07-04 Thread msempere (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878443#comment-16878443
 ] 

msempere commented on AIRFLOW-401:
--

I'm facing the same issue on *1.10.3*. Scheduler stuck, no errors, and 100% CPU 
(memory usage looks OK) on the scheduler process. Worth mentioning that 
restarting the scheduler fixes the issue temporarily; it happens consistently 
after 6-8 hours.

This wasn't happening before when running the same version with *Redis* as the 
*results backend*, and it started happening when we changed the backend to *MySQL*.

> scheduler gets stuck without a trace
> 
>
> Key: AIRFLOW-401
> URL: https://issues.apache.org/jira/browse/AIRFLOW-401
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: executors, scheduler
>Affects Versions: 1.7.1.3
>Reporter: Nadeem Ahmed Nazeer
>Assignee: Bolke de Bruin
>Priority: Minor
>  Labels: celery, kombu
> Attachments: Dag_code.txt, schduler_cpu100%.png, scheduler_stuck.png, 
> scheduler_stuck_7hours.png
>
>
> The scheduler gets stuck without a trace or error. When this happens, the CPU 
> usage of the scheduler service is at 100%. No jobs get submitted and everything 
> comes to a halt. It looks like it goes into some kind of infinite loop. 
> The only way I could make it run again is by manually restarting the 
> scheduler service. But again, after running some tasks it gets stuck. I've 
> tried with both the Celery and Local executors, but the same issue occurs. I am 
> using the -n 3 parameter while starting the scheduler. 
> Scheduler configs:
> job_heartbeat_sec = 5
> scheduler_heartbeat_sec = 5
> executor = LocalExecutor
> parallelism = 32
> Please help. I would be happy to provide any other information needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-401) scheduler gets stuck without a trace

2019-07-02 Thread Jia-Liang, GUO (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876762#comment-16876762
 ] 

Jia-Liang, GUO commented on AIRFLOW-401:


Also facing the same problem in version 1.10.3.

> scheduler gets stuck without a trace
> 
>
> Key: AIRFLOW-401
> URL: https://issues.apache.org/jira/browse/AIRFLOW-401
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: executors, scheduler
>Affects Versions: 1.7.1.3
>Reporter: Nadeem Ahmed Nazeer
>Assignee: Bolke de Bruin
>Priority: Minor
>  Labels: celery, kombu
> Attachments: Dag_code.txt, schduler_cpu100%.png, scheduler_stuck.png, 
> scheduler_stuck_7hours.png
>
>
> The scheduler gets stuck without a trace or error. When this happens, the CPU 
> usage of the scheduler service is at 100%. No jobs get submitted and everything 
> comes to a halt. It looks like it goes into some kind of infinite loop. 
> The only way I could make it run again is by manually restarting the 
> scheduler service. But again, after running some tasks it gets stuck. I've 
> tried with both the Celery and Local executors, but the same issue occurs. I am 
> using the -n 3 parameter while starting the scheduler. 
> Scheduler configs:
> job_heartbeat_sec = 5
> scheduler_heartbeat_sec = 5
> executor = LocalExecutor
> parallelism = 32
> Please help. I would be happy to provide any other information needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-401) scheduler gets stuck without a trace

2019-06-17 Thread Xiao Zhu (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866128#comment-16866128
 ] 

Xiao Zhu commented on AIRFLOW-401:
--

Facing this in 1.10.0. Yep, when it happens we just restart the scheduler :| 
I'm still looking for the real cause...

> scheduler gets stuck without a trace
> 
>
> Key: AIRFLOW-401
> URL: https://issues.apache.org/jira/browse/AIRFLOW-401
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: executors, scheduler
>Affects Versions: 1.7.1.3
>Reporter: Nadeem Ahmed Nazeer
>Assignee: Bolke de Bruin
>Priority: Minor
>  Labels: celery, kombu
> Attachments: Dag_code.txt, schduler_cpu100%.png, scheduler_stuck.png, 
> scheduler_stuck_7hours.png
>
>
> The scheduler gets stuck without a trace or error. When this happens, the CPU 
> usage of the scheduler service is at 100%. No jobs get submitted and everything 
> comes to a halt. It looks like it goes into some kind of infinite loop. 
> The only way I could make it run again is by manually restarting the 
> scheduler service. But again, after running some tasks it gets stuck. I've 
> tried with both the Celery and Local executors, but the same issue occurs. I am 
> using the -n 3 parameter while starting the scheduler. 
> Scheduler configs:
> job_heartbeat_sec = 5
> scheduler_heartbeat_sec = 5
> executor = LocalExecutor
> parallelism = 32
> Please help. I would be happy to provide any other information needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-401) scheduler gets stuck without a trace

2019-05-11 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837834#comment-16837834
 ] 

t oo commented on AIRFLOW-401:
--

Facing this in 1.10.3. Are folks stopping/starting the scheduler regularly as a 
workaround?

> scheduler gets stuck without a trace
> 
>
> Key: AIRFLOW-401
> URL: https://issues.apache.org/jira/browse/AIRFLOW-401
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: executors, scheduler
>Affects Versions: 1.7.1.3
>Reporter: Nadeem Ahmed Nazeer
>Assignee: Bolke de Bruin
>Priority: Minor
>  Labels: celery, kombu
> Attachments: Dag_code.txt, schduler_cpu100%.png, scheduler_stuck.png, 
> scheduler_stuck_7hours.png
>
>
> The scheduler gets stuck without a trace or error. When this happens, the CPU 
> usage of the scheduler service is at 100%. No jobs get submitted and everything 
> comes to a halt. It looks like it goes into some kind of infinite loop. 
> The only way I could make it run again is by manually restarting the 
> scheduler service. But again, after running some tasks it gets stuck. I've 
> tried with both the Celery and Local executors, but the same issue occurs. I am 
> using the -n 3 parameter while starting the scheduler. 
> Scheduler configs:
> job_heartbeat_sec = 5
> scheduler_heartbeat_sec = 5
> executor = LocalExecutor
> parallelism = 32
> Please help. I would be happy to provide any other information needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)