When Pulp is used with Qpid (the default broker), there is a hard-to-reproduce deadlocking bug [0]. The bug is in Qpid, not Pulp, but we are very interested in seeing it resolved.

In terms of clearing out your task operations, this will happen naturally if all the Pulp workers are killed and restarted. If it's really bad, you could consider running `sudo pkill -9 -f celery`, which sends SIGKILL to every process whose command line contains "celery", i.e. all Pulp workers.


You could also cancel all outstanding tasks with pulp-admin and then kill and restart the workers; once the processes finish starting, your system will have no tasks left. Note that deadlocked workers usually need to be killed with SIGKILL before being restarted.
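
As a rough sketch of that cancel-then-restart sequence (not a tested procedure): the `--state waiting` filter and `tasks cancel --task-id` are how I remember the pulp-admin CLI, the "Task Id:" line format is an assumption about what `tasks list` prints, and the service names are the usual Pulp 2 systemd units, so check all of these on your own system first. It also assumes you have already run `pulp-admin login`, and that the workers run as the apache user, as in the ps output quoted below.

```shell
#!/bin/sh
# Sketch only: helper functions are hypothetical, not part of Pulp.

# Read `pulp-admin tasks list` output on stdin and print one task ID per line.
# Assumes each task is printed with a line of the form "Task Id: <id>".
extract_task_ids() {
    awk -F': *' '/^Task Id:/ { print $2 }'
}

# Ask Pulp to cancel every task currently in the waiting state.
cancel_waiting_tasks() {
    pulp-admin tasks list --state waiting | extract_task_ids |
    while read -r task_id; do
        pulp-admin tasks cancel --task-id "$task_id"
    done
}

# SIGKILL the celery processes owned by apache, then start them again.
# Service names are assumed; adjust them if your install differs.
kill_and_restart_workers() {
    sudo pkill -9 -u apache -f celery
    sudo systemctl restart pulp_workers pulp_resource_manager pulp_celerybeat
}
```

Nothing runs until you call the functions, e.g. `cancel_waiting_tasks` followed by `kill_and_restart_workers`. Restricting pkill to `-u apache` keeps it from matching unrelated processes that merely have "celery" somewhere in their command line.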

Many users never experience this problem. A few users do, and those who hit it once usually hit it again. Several devs have tried to reproduce it, but we have not been able to.

The Qpid project is aware of the issue and is investigating. I believe they have some RPMs that provide a new version of python-qpid patched specifically for this issue. I'm waiting for them to produce RPMs for the different distros so that affected users can evaluate whether it resolves their issue.

One other option to be aware of: Pulp also supports RabbitMQ, which has not shown this deadlocking issue. See the docs and server.conf for more info. FYI, Pulp currently only tests its releases against Qpid.

[0]: https://issues.apache.org/jira/browse/QPID-7317

-Brian


On 09/21/2016 09:00 PM, Erinn Looney-Triggs wrote:
I have 52 tasks that are stuck in a waiting state with nothing in a
running state. I don't know much about pulp at this point, I am just
fighting my way through satellite in an attempt to make it stable, but
this looks a bit odd to me:

pulp-admin -u admin -p  tasks list | grep -i waiting | wc -l
52

pulp-admin -u admin -p tasks list --state running
+----------------------------------------------------------------------+
                                 Tasks
+----------------------------------------------------------------------+

No tasks found

The tasks, with the exception of one, are all unit_update operations; the
remaining one is a sync operation.

I have done many restarts of the pulp processes with no luck in clearing
these out. I can kill them off of course, but I would prefer to know
what is going on here. Also, chances are very good this will happen again.

Thanks,
-Erinn

The technical details:
RHEL 7.2

rpm -qa | grep pulp
pulp-katello-1.0.1-1.el7sat.noarch
rubygem-smart_proxy_pulp-1.2.2-1.el7sat.noarch
python-pulp-repoauth-2.8.3.4-1.el7sat.noarch
python-pulp-client-lib-2.8.3.4-1.el7sat.noarch
pulp-docker-plugins-2.0.1.1-1.el7sat.noarch
pulp-selinux-2.8.3.4-1.el7sat.noarch
pulp-server-2.8.3.4-1.el7sat.noarch
pulp-client-1.0-1.noarch
python-pulp-common-2.8.3.4-1.el7sat.noarch
pulp-rpm-admin-extensions-2.8.3.5-1.el7sat.noarch
python-pulp-docker-common-2.0.1.1-1.el7sat.noarch
pulp-ostree-plugins-1.1.1-2.el7sat.noarch
pulp-puppet-plugins-2.8.3.3-1.el7sat.noarch
python-pulp-bindings-2.8.3.4-1.el7sat.noarch
python-isodate-0.5.0-4.pulp.el7sat.noarch
python-pulp-streamer-2.8.3.4-1.el7sat.noarch
python-pulp-oid_validation-2.8.3.4-1.el7sat.noarch
python-pulp-agent-lib-2.8.3.4-1.el7sat.noarch
python-pulp-ostree-common-1.1.1-2.el7sat.noarch
pulp-rpm-handlers-2.8.3.5-1.el7sat.noarch
pulp-admin-client-2.8.3.4-1.el7sat.noarch
pulp-rpm-plugins-2.8.3.5-1.el7sat.noarch
python-pulp-rpm-common-2.8.3.5-1.el7sat.noarch
python-pulp-puppet-common-2.8.3.3-1.el7sat.noarch
pulp-puppet-tools-2.8.3.3-1.el7sat.noarch

ps -awfux | grep celery
root      65959  0.0  0.0 112648   972 pts/0    S+   18:57   0:00  |
              \_ grep --color=auto celery
apache    52282  0.1  0.0 685240 63396 ?        Ssl  18:46   0:00
/usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n
resource_manager@%h -Q resource_manager -c 1 --events --umask 18
--pidfile=/var/run/pulp/resource_manager.pid --heartbeat-interval=30
apache    52406  0.0  0.0 595524 53092 ?        S    18:46   0:00  \_
/usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n
resource_manager@%h -Q resource_manager -c 1 --events --umask 18
--pidfile=/var/run/pulp/resource_manager.pid --heartbeat-interval=30
apache    52424  0.1  0.0 685240 63272 ?        Ssl  18:46   0:01
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-0.pid
--heartbeat-interval=30
apache    52692  0.0  0.0 610828 56536 ?        Sl   18:46   0:00  \_
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-0.pid
--heartbeat-interval=30
apache    52426  0.1  0.0 684664 63404 ?        Ssl  18:46   0:01
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-1@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-1.pid
--heartbeat-interval=30
apache    52714  0.0  0.0 595524 53052 ?        S    18:46   0:00  \_
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-1@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-1.pid
--heartbeat-interval=30
apache    52428  0.1  0.0 684668 63428 ?        Ssl  18:46   0:01
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-2@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-2.pid
--heartbeat-interval=30
apache    52715  0.0  0.0 595528 53056 ?        S    18:46   0:00  \_
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-2@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-2.pid
--heartbeat-interval=30
apache    52430  0.1  0.0 684660 63236 ?        Ssl  18:46   0:01
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-3@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-3.pid
--heartbeat-interval=30
apache    52745  0.0  0.0 595520 53072 ?        S    18:46   0:00  \_
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-3@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-3.pid
--heartbeat-interval=30
apache    52432  0.1  0.0 684664 63224 ?        Ssl  18:46   0:01
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-4@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-4.pid
--heartbeat-interval=30
apache    52749  0.0  0.0 595520 53096 ?        S    18:46   0:00  \_
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-4@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-4.pid
--heartbeat-interval=30
apache    52434  0.1  0.0 684668 63388 ?        Ssl  18:46   0:01
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-5@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-5.pid
--heartbeat-interval=30
apache    52750  0.0  0.0 595528 53056 ?        S    18:46   0:00  \_
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-5@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-5.pid
--heartbeat-interval=30
apache    52436  0.1  0.0 684660 65480 ?        Ssl  18:46   0:01
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-6@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-6.pid
--heartbeat-interval=30
apache    52724  0.0  0.0 595524 55092 ?        S    18:46   0:00  \_
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-6@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-6.pid
--heartbeat-interval=30
apache    52440  0.1  0.0 684664 63364 ?        Ssl  18:46   0:01
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-7@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-7.pid
--heartbeat-interval=30
apache    52720  0.0  0.0 595524 53088 ?        S    18:46   0:00  \_
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-7@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-7.pid
--heartbeat-interval=30
apache    52444  0.1  0.0 684664 63248 ?        Ssl  18:46   0:01
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-8@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-8.pid
--heartbeat-interval=30
apache    52747  0.0  0.0 595524 53080 ?        S    18:46   0:00  \_
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-8@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-8.pid
--heartbeat-interval=30
apache    52453  0.1  0.0 684684 65432 ?        Ssl  18:46   0:01
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-9@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-9.pid
--heartbeat-interval=30
apache    52752  0.0  0.0 595516 53060 ?        S    18:46   0:00  \_
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-9@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-9.pid
--heartbeat-interval=30
apache    52459  0.1  0.0 684696 63416 ?        Ssl  18:46   0:01
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-10@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-10.pid
--heartbeat-interval=30
apache    52725  0.0  0.0 595524 53056 ?        S    18:46   0:00  \_
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-10@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-10.pid
--heartbeat-interval=30
apache    52468  0.1  0.0 684696 63400 ?        Ssl  18:46   0:01
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-11@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-11.pid
--heartbeat-interval=30
apache    52716  0.0  0.0 595524 53044 ?        S    18:46   0:00  \_
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-11@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-11.pid
--heartbeat-interval=30
apache    52472  0.1  0.0 684688 63416 ?        Ssl  18:46   0:01
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-12@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-12.pid
--heartbeat-interval=30
apache    52729  0.0  0.0 595516 53044 ?        S    18:46   0:00  \_
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-12@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-12.pid
--heartbeat-interval=30
apache    52479  0.1  0.0 684692 63424 ?        Ssl  18:46   0:01
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-13@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-13.pid
--heartbeat-interval=30
apache    52722  0.0  0.0 595520 53084 ?        S    18:46   0:00  \_
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-13@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-13.pid
--heartbeat-interval=30
apache    52486  0.1  0.0 685272 63432 ?        Ssl  18:46   0:01
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-14@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-14.pid
--heartbeat-interval=30
apache    52708  0.0  0.0 669764 54612 ?        Sl   18:46   0:00  \_
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-14@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-14.pid
--heartbeat-interval=30
apache    52491  0.1  0.0 684652 63360 ?        Ssl  18:46   0:01
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-15@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-15.pid
--heartbeat-interval=30
apache    52731  0.0  0.0 595516 53040 ?        S    18:46   0:00  \_
/usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-15@%h
-A pulp.server.async.app -c 1 --events --umask 18
--pidfile=/var/run/pulp/reserved_resource_worker-15.pid
--heartbeat-interval=30
apache    52570  0.7  0.0 690292 44016 ?        Ssl  18:46   0:05
/usr/bin/python /usr/bin/celery beat
--app=pulp.server.async.celery_instance.celery
--scheduler=pulp.server.async.scheduler.Scheduler

_______________________________________________
Pulp-list mailing list
Pulp-list@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-list

