[jira] [Commented] (AIRFLOW-401) scheduler gets stuck without a trace
[ https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17069129#comment-17069129 ]

Daniel Imberman commented on AIRFLOW-401:
-----------------------------------------

This issue has been moved to https://github.com/apache/airflow/issues/7935

> scheduler gets stuck without a trace
> ------------------------------------
>
>                 Key: AIRFLOW-401
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-401
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: executors, scheduler
>    Affects Versions: 1.7.1.3
>            Reporter: Nadeem Ahmed Nazeer
>            Assignee: Ash Berlin-Taylor
>            Priority: Minor
>              Labels: celery, kombu
>         Attachments: Dag_code.txt, schduler_cpu100%.png, scheduler_stuck.png, scheduler_stuck_7hours.png
>
> The scheduler gets stuck without a trace or error. When this happens, the CPU usage of the scheduler service is at 100%. No jobs get submitted and everything comes to a halt. It looks like it goes into some kind of infinite loop.
> The only way I could make it run again is by manually restarting the scheduler service. But again, after running some tasks it gets stuck. I've tried with both the Celery and Local executors but the same issue occurs. I am using the -n 3 parameter while starting the scheduler.
> Scheduler configs:
> job_heartbeat_sec = 5
> scheduler_heartbeat_sec = 5
> executor = LocalExecutor
> parallelism = 32
> Please help. I would be happy to provide any other information needed.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (AIRFLOW-401) scheduler gets stuck without a trace
[ https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067824#comment-17067824 ]

Ash Berlin-Taylor commented on AIRFLOW-401:
-------------------------------------------

[~marcin.kuthan] what is the full command line of the process that is stuck at 100% CPU? If you wait a few minutes, do you get a warning in the UI about the scheduler not running/heartbeating?
[jira] [Commented] (AIRFLOW-401) scheduler gets stuck without a trace
[ https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067815#comment-17067815 ]

Marcin Kuthan commented on AIRFLOW-401:
---------------------------------------

The same issue for 1.10.9. No traces in the logs, CPU 100%.
[jira] [Commented] (AIRFLOW-401) scheduler gets stuck without a trace
[ https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003592#comment-17003592 ]

Anuj commented on AIRFLOW-401:
------------------------------

Facing the same issue in Airflow 1.10.4. The scheduler gets stuck with 100% CPU usage. Using Airflow with the Celery executor.
[jira] [Commented] (AIRFLOW-401) scheduler gets stuck without a trace
[ https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932357#comment-16932357 ]

Ash Berlin-Taylor commented on AIRFLOW-401:
-------------------------------------------

[~ms-nmcalabroso] The problem you are seeing there may be caused by AIRFLOW-5447 (which affects the Kube and Local executors), the fix for which will be released soon as 1.10.6.
[jira] [Commented] (AIRFLOW-401) scheduler gets stuck without a trace
[ https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932286#comment-16932286 ]

Neil Calabroso commented on AIRFLOW-401:
----------------------------------------

Currently experiencing this issue on Ubuntu 14.04 using Python 3.6.8. This started when we upgraded our staging environment from 1.10.1 to 1.10.4. We're using the LocalExecutor and the process is managed by upstart.

I'm also getting this message in the web UI: "The scheduler does not appear to be running. Last heartbeat was received 9 minutes ago."

For this sample, I got 3 stuck processes:

{code:java}
root@airflow-staging:/home/ubuntu# ps aux | grep scheduler
airflow  21595  0.2  1.3  469868 109976 ?      S   09:52  0:04 /usr/bin/python3.6 /usr/local/bin/airflow scheduler -n 5
airflow  21602  0.0  1.1 1500268  95992 ?      Tl  09:52  0:00 /usr/bin/python3.6 /usr/local/bin/airflow scheduler -n 5
airflow  21648  0.0  1.1  467796  94628 ?      S   09:52  0:00 /usr/bin/python3.6 /usr/local/bin/airflow scheduler -n 5
root     25735  0.0  0.0   10472    920 pts/3  S+  10:24  0:00 grep --color=auto scheduler
{code}

Running py-spy against each process gives:

{code:java}
Collecting samples from 'pid: 21595' (python v3.6.8)
Total Samples 500
GIL: 0.00%, Active: 100.00%, Threads: 1

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)
100.00% 100.00%    5.00s      5.00s  _recv (multiprocessing/connection.py:379)
  0.00% 100.00%   0.000s      5.00s  wrapper (airflow/utils/cli.py:74)
  0.00% 100.00%   0.000s      5.00s  scheduler (airflow/bin/cli.py:1013)
  0.00% 100.00%   0.000s      5.00s  end (airflow/executors/local_executor.py:233)
  0.00% 100.00%   0.000s      5.00s  (airflow:32)
  0.00% 100.00%   0.000s      5.00s  recv (multiprocessing/connection.py:250)
  0.00% 100.00%   0.000s      5.00s  _execute (airflow/jobs/scheduler_job.py:1323)
  0.00% 100.00%   0.000s      5.00s  end (airflow/executors/local_executor.py:212)
  0.00% 100.00%   0.000s      5.00s  _callmethod (multiprocessing/managers.py:757)
  0.00% 100.00%   0.000s      5.00s  join (:2)
  0.00% 100.00%   0.000s      5.00s  _recv_bytes (multiprocessing/connection.py:407)
  0.00% 100.00%   0.000s      5.00s  _execute_helper (airflow/jobs/scheduler_job.py:1463)
  0.00% 100.00%   0.000s      5.00s  run (airflow/jobs/base_job.py:213)
{code}

{code:java}
root@airflow-staging:/home/ubuntu# py-spy --pid 21602
Error: Failed to suspend process
Reason: EPERM: Operation not permitted
{code}

{code:java}
Collecting samples from 'pid: 21648' (python v3.6.8)
Total Samples 28381
GIL: 0.00%, Active: 100.00%, Threads: 1

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)
100.00% 100.00%   283.8s     283.8s  _try_wait (subprocess.py:1424)
  0.00% 100.00%   0.000s     283.8s  call (subprocess.py:289)
  0.00% 100.00%   0.000s     283.8s  start (airflow/executors/local_executor.py:184)
  0.00% 100.00%   0.000s     283.8s  wrapper (airflow/utils/cli.py:74)
  0.00% 100.00%   0.000s     283.8s  _bootstrap (multiprocessing/process.py:258)
  0.00% 100.00%   0.000s     283.8s  _execute_helper (airflow/jobs/scheduler_job.py:1347)
  0.00% 100.00%   0.000s     283.8s  execute_work (airflow/executors/local_executor.py:86)
  0.00% 100.00%   0.000s     283.8s  (airflow:32)
  0.00% 100.00%   0.000s     283.8s  _launch (multiprocessing/popen_fork.py:73)
  0.00% 100.00%   0.000s     283.8s  run (airflow/jobs/base_job.py:213)
  0.00% 100.00%   0.000s     283.8s  check_call (subprocess.py:306)
  0.00% 100.00%   0.000s     283.8s  start (multiprocessing/process.py:105)
  0.00% 100.00%   0.000s     283.8s  run (airflow/executors/local_executor.py:116)
  0.00% 100.00%   0.000s     283.8s  wait (subprocess.py:1477)
  0.00% 100.00%   0.000s     283.8s  scheduler (airflow/bin/cli.py:1013)
  0.00% 100.00%   0.000s     283.8s  _Popen (multiprocessing/context.py:277)
  0.00% 100.00%   0.000s     283.8s  _Popen (multiprocessing/context.py:223)
  0.00% 100.00%   0.000s     283.8s  start (airflow/executors/local_executor.py:224)
  0.00% 100.00%   0.000s     283.8s  _execute (airflow/jobs/scheduler_job.py:1323)
  0.00% 100.00%   0.000s     283.8s  __init__ (multiprocessing/popen_fork.py:19)
{code}

We will try to downgrade to 1.10.3 first and see if this problem persists.
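The pid 21595 trace above is blocked in `Connection._recv` under `LocalExecutor.end()`: the parent process waits forever for a result that a stuck worker never sends. A minimal sketch of that pattern (illustrative code, not Airflow's; the worker and names are made up) and of how `poll()` with a timeout bounds the wait instead:

```python
# Sketch of the hang pattern in the py-spy traces: a bare recv() on a
# multiprocessing Pipe blocks indefinitely if the peer never writes.
# Polling with a timeout lets the caller notice the stall and react.
import multiprocessing as mp
import time

def worker(conn):
    time.sleep(5)        # stand-in for a task that never finishes in time
    conn.send("done")
    conn.close()

if __name__ == "__main__":
    parent, child = mp.Pipe()
    p = mp.Process(target=worker, args=(child,))
    p.start()
    # parent.recv() here would block until the worker sends something;
    # poll() bounds the wait so we can log, retry, or kill instead.
    if parent.poll(timeout=1.0):
        print(parent.recv())
    else:
        print("worker not responding")  # prints: nothing sent within 1 s
    p.terminate()
    p.join()
```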
[jira] [Commented] (AIRFLOW-401) scheduler gets stuck without a trace
[ https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911106#comment-16911106 ]

Roland de Boo commented on AIRFLOW-401:
---------------------------------------

Airflow 1.10.4, running on Kubernetes (Azure AKS). The scheduler gets stuck and the portal shows the message "The scheduler does not appear to be running. Last heartbeat was received 9 minutes ago."

py-spy output:

{code:java}
Total Samples 40100
GIL: 0.00%, Active: 100.00%, Threads: 1

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)
100.00% 100.00%   29.57s     29.57s  poll (multiprocessing/popen_fork.py:28)
  0.00%   0.00%   0.000s     0.030s  _decref (multiprocessing/managers.py:809)
  0.00%   0.00%   0.000s     0.040s  one_or_none (sqlalchemy/orm/query.py:3261)
  0.00%   0.00%   0.000s     0.030s  answer_challenge (multiprocessing/connection.py:732)
  0.00%   0.00%   371.3s     371.3s  recv_into (OpenSSL/SSL.py:1821)
  0.00%   0.00%   0.000s     0.020s  __missing__ (sqlalchemy/orm/path_registry.py:303)
  0.00%   0.00%   0.000s     0.030s  wait_procs (psutil/__init__.py:1667)
  0.00%   0.00%   0.000s     0.020s  _emit_update_statements (sqlalchemy/orm/persistence.py:996)
  0.00%   0.00%   0.000s     0.020s  setup (sqlalchemy/orm/interfaces.py:547)
  0.00%   0.00%   0.000s     0.020s  save_obj (sqlalchemy/orm/persistence.py:236)
  0.00%   0.00%   0.000s     0.020s  flush (sqlalchemy/orm/session.py:2459)
  0.00%   0.00%   0.000s     0.030s  recv_bytes (multiprocessing/connection.py:216)
  0.00%   0.00%   0.000s     371.3s  request (kubernetes/client/rest.py:166)
  0.00%   0.00%   0.000s     0.010s  __iter__ (sqlalchemy/orm/query.py:3334)
  0.00%   0.00%   0.000s     0.010s  connection (sqlalchemy/orm/session.py:1124)
  0.00%   0.00%   0.000s     0.010s  _contextual_connect (sqlalchemy/engine/base.py:2231)
  0.00%   0.00%   0.000s     0.010s  _connection_from_session (sqlalchemy/orm/query.py:3349)
  0.00%   0.00%   0.000s     0.020s  execute (sqlalchemy/orm/unitofwork.py:422)
  0.00%   0.00%   0.000s     0.010s  ping_connection (airflow/utils/sqlalchemy.py:70)
  0.00%   0.00%   0.000s     371.3s  POST (kubernetes/client/rest.py:266)
  0.00%   0.00%   0.000s     0.010s  __call__ (sqlalchemy/event/attr.py:297)
  0.00%   0.00%   0.000s     0.010s  __init__ (sqlalchemy/engine/base.py:125)
  0.00%   0.00%   0.000s     0.010s  on_terminate (airflow/utils/helpers.py:297)
  0.00%   0.00%   0.000s     0.030s  end (airflow/utils/dag_processing.py:691)
  0.00%   0.00%   0.000s     0.020s  wait (psutil/__init__.py:1384)
  0.00%   0.00%   0.000s     0.010s  check_gone (psutil/__init__.py:1640)
  0.00%   0.00%   0.000s     371.3s  __call_api (kubernetes/client/api_client.py:168)
  0.00%   0.00%   0.000s     0.010s  makeRecord (logging/__init__.py:1413)
  0.00%   0.00%   0.000s     0.020s  commit (sqlalchemy/orm/session.py:494)
  0.00%   0.00%   0.000s     371.3s  begin (http/client.py:307)
  0.00%   0.00%   0.000s     0.020s  check_gone (psutil/__init__.py:1632)
  0.00%   0.00%   0.000s     0.020s  wait (psutil/_pslinux.py:1719)
  0.00%   0.00%   0.000s     0.030s  Client (multiprocessing/connection.py:493)
  0.00%   0.00%   0.010s     0.010s  do_commit (sqlalchemy/engine/default.py:505)
  0.00%   0.00%   0.000s     0.010s  _connection_for_bind (sqlalchemy/orm/session.py:1130)
  0.00%   0.00%   0.000s     0.020s  _get_context_loader (sqlalchemy/orm/interfaces.py:521)
  0.00%   0.00%   0.000s     0.010s  _get_bind_args (sqlalchemy/orm/query.py:3371)
  0.00%   0.00%   0.020s     0.020s  check_timeout (psutil/_psposix.py:71)
  0.00%   0.00%   0.000s     0.010s  _do_commit (sqlalchemy/engine/base.py:1747)
  0.00%   0.00%   0.030s     0.030s  _recv (multiprocessing/connection.py:379)
  0.00%   0.00%   0.000s     0.040s  load_on_pk_identity (sqlalchemy/orm/loading.py:282)
  0.00%   0.00%   0.000s     0.010s  _commit_impl (sqlalchemy/engine/base.py:761)
  0.00%   0.00%   0.000s     0.010s  commit (sqlalchemy/orm/session.py:498)
  0.00%   0.00%   0.000s     371.3s  readinto (socket.py:586)
{code}
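The long pole in the trace above is a Kubernetes API call stuck in `recv_into`/`readinto`, i.e. a socket read with no deadline. A generic sketch of that failure mode (this is an assumption about what is happening, not Airflow's actual fix): a read on a socket with no timeout blocks forever, while a socket timeout turns the hang into a catchable exception.

```python
import socket

# A connected pair stands in for the scheduler's connection to the
# Kubernetes API server. Nothing is ever sent on `b`, so a plain
# a.recv(1) would block indefinitely, like the stuck read above.
a, b = socket.socketpair()
a.settimeout(1.0)  # bound every read/write on this socket to 1 second

try:
    a.recv(1)
except socket.timeout:
    print("read timed out instead of hanging")
finally:
    a.close()
    b.close()
```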
[jira] [Commented] (AIRFLOW-401) scheduler gets stuck without a trace
[ https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891918#comment-16891918 ]

Ash Berlin-Taylor commented on AIRFLOW-401:
-------------------------------------------

When someone next reproduces this, please run https://github.com/benfred/py-spy against the stuck process and give us the report -- it may help us track down where/why it is getting stuck.
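For anyone following up on this request, a minimal sketch of capturing such a report. It assumes a recent py-spy release with the `dump`/`top` subcommands; older 0.1.x builds used the bare `py-spy --pid <PID>` form seen elsewhere in this thread. The PID is an example.

```shell
pip install py-spy

# Find the stuck scheduler PID; the [a]irflow bracket trick stops
# grep from matching its own command line.
ps aux | grep '[a]irflow scheduler'

# One-shot stack dump of every thread (quickest way to see where it is stuck):
sudo py-spy dump --pid 21595

# Live sampling view, like the reports pasted in this thread:
sudo py-spy top --pid 21595
```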
[jira] [Commented] (AIRFLOW-401) scheduler gets stuck without a trace
[ https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878443#comment-16878443 ]

msempere commented on AIRFLOW-401:
----------------------------------

I'm facing the same issue on *1.10.3*. Scheduler stuck, no errors, and 100% CPU (memory usage looks OK) on the scheduler process. Worth mentioning that restarting the scheduler fixes the issue temporarily; it happens consistently after 6-8 hours. This wasn't happening before when running the same version with *Redis* as the *results backend*, and started happening when we changed the backend to *MySQL*.
[jira] [Commented] (AIRFLOW-401) scheduler gets stuck without a trace
[ https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876762#comment-16876762 ]

Jia-Liang, GUO commented on AIRFLOW-401:
----------------------------------------

Also facing the same problem in version 1.10.3.
[jira] [Commented] (AIRFLOW-401) scheduler gets stuck without a trace
[ https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16866128#comment-16866128 ]

Xiao Zhu commented on AIRFLOW-401:
----------------------------------

Facing this in 1.10.0. Yep, when it happens we just restart the scheduler :| I'm looking for its real cause...
[jira] [Commented] (AIRFLOW-401) scheduler gets stuck without a trace
[ https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16837834#comment-16837834 ]

t oo commented on AIRFLOW-401:
------------------------------

Facing this in 1.10.3. Are folks stopping/starting the scheduler regularly as a workaround?