On the few occasions when I've had issues (usually after doing something like deleting the database while a task was still running), running "redis-cli FLUSHALL" has solved my problems as well. So if canceling the stuck tasks (suggested in the message quoted below) does not resolve things, try that.
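For reference, here is a minimal Python sketch of how the cancel step suggested in the quoted message below might be scripted. It picks out tasks that are "waiting" with no worker assigned (like the task shown at the bottom of this thread) and builds a cancel request for each. The function names are hypothetical. The "cancel/" suffix on the task href and the use of POST are my assumptions for this pulpcore version, and authentication is left out, so please check the REST API docs linked below before relying on it.

```python
import urllib.request


def stuck_waiting_tasks(tasks):
    """Return tasks that are 'waiting' and were never handed to a worker."""
    return [
        t for t in tasks
        if t.get("state") == "waiting"
        and t.get("worker") is None
        and t.get("started_at") is None
    ]


def cancel_url(base_url, task_href):
    # ASSUMPTION: the cancel endpoint is "<task href>cancel/".
    # Verify against the tasks_cancel operation in the REST API docs.
    return base_url.rstrip("/") + task_href + "cancel/"


def cancel_task(base_url, task_href):
    # ASSUMPTION: cancel is an empty-body POST; no authentication is
    # added here, so a real deployment would need credentials as well.
    req = urllib.request.Request(
        cancel_url(base_url, task_href), data=b"", method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Usage would be something like calling cancel_task("http://127.0.0.1:24817", t["pulp_href"]) for each task returned by stuck_waiting_tasks(), where 127.0.0.1:24817 is the gunicorn bind address from the ps output below. Note that FLUSHALL removes every key from every Redis database, so any jobs still queued in RQ are lost; canceling the stuck tasks first is the less drastic option.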
On Mon, Apr 6, 2020 at 1:37 PM Brian Bouterse <[email protected]> wrote:
> Thank you. We will look into this bug you've filed.
>
> I believe you can recover your current installation by canceling the tasks
> stuck in the "waiting" state. To cancel, use this API call:
> https://docs.pulpproject.org/restapi.html#operation/tasks_cancel
>
> Let me know if this doesn't help get your system back on track.
>
> Thanks,
> Brian
>
>
> On Mon, Apr 6, 2020 at 8:04 AM Bin Li (BLOOMBERG/ 120 PARK) <
> [email protected]> wrote:
>
>> Brian,
>>
>> I filed a bug to track this issue "https://pulp.plan.io/issues/6449". In
>> the meantime, is it possible to recover from this issue, or do we need to
>> erase the database and reinstall?
>>
>> Thanks
>>
>>
>> From: [email protected] At: 04/03/20 16:45:00
>> To: Bin Li (BLOOMBERG/ 120 PARK ) <[email protected]>
>> Cc: [email protected]
>> Subject: Re: [Pulp-list] Pulp 3 waiting tasks
>>
>> So the problematic thing I see in this output is the
>> "resource-manager | 0". This tells me that Pulp's record of the task is in
>> PostgreSQL (and was never run), but RQ has lost the task from the
>> "resource-manager" queue in Redis. So the next question is how did that
>> happen?
>>
>> Would you be willing to file a bug and link it here so that I could try
>> to reproduce it on our end?
>>
>> Thanks!
>> Brian
>>
>>
>> On Fri, Apr 3, 2020 at 4:35 PM Bin Li (BLOOMBERG/ 120 PARK) <
>> [email protected]> wrote:
>>
>>> Brian,
>>>
>>> Here is rq info output. Thanks for looking into this.
>>>
>>> # rq --version
>>> rq, version 1.2.2
>>>
>>> # rq info
>>> 134692@pulpmaster |██ 7
>>> 182536@pulpmaster | 0
>>> 134343@pulpmaster |██ 7
>>> 191144@pulpmaster | 0
>>> 130945@pulpmaster |██ 7
>>> 135922@pulpmaster | 0
>>> 182528@pulpmaster | 0
>>> 182532@pulpmaster | 0
>>> 191145@pulpmaster | 0
>>> 135796@pulpmaster | 0
>>> 191148@pulpmaster | 0
>>> 191152@pulpmaster | 0
>>> 191151@pulpmaster | 0
>>> 135306@pulpmaster | 0
>>> 135679@pulpmaster | 0
>>> 182539@pulpmaster | 0
>>> 182547@pulpmaster | 0
>>> 182530@pulpmaster | 0
>>> 134332@pulpmaster |██ 7
>>> 191147@pulpmaster | 0
>>> 131701@pulpmaster |██ 5
>>> 134330@pulpmaster |██ 7
>>> 134688@pulpmaster |██ 5
>>> 182548@pulpmaster | 0
>>> 134929@pulpmaster | 0
>>> 135180@pulpmaster | 0
>>> 135503@pulpmaster | 0
>>> 182546@pulpmaster | 0
>>> 131485@pulpmaster |██ 7
>>> 131269@pulpmaster |██ 7
>>> 32603@pulpmaster | 0
>>> 191146@pulpmaster | 0
>>> 131053@pulpmaster |██ 7
>>> 134339@pulpmaster |██ 7
>>> 134336@pulpmaster |██ 7
>>> 191150@pulpmaster | 0
>>> 182542@pulpmaster | 0
>>> 182540@pulpmaster | 0
>>> 32609@pulpmaster | 0
>>> 191153@pulpmaster | 0
>>> 131593@pulpmaster |████████ 21
>>> 135051@pulpmaster | 0
>>> 134696@pulpmaster |██ 7
>>> 191149@pulpmaster | 0
>>> 131377@pulpmaster |██ 5
>>> 134694@pulpmaster |██ 7
>>> 134690@pulpmaster |██ 7
>>> 131161@pulpmaster |██ 5
>>> 136342@pulpmaster | 0
>>> 32626@pulpmaster | 0
>>> 131810@pulpmaster |██ 7
>>> 136462@pulpmaster | 0
>>> 130836@pulpmaster |██ 5
>>> resource-manager | 0
>>> 54 queues, 144 jobs total
>>>
>>> 191146@pulpmaster (b'pulpp-ob-581' 191146): idle 191146@pulpmaster
>>> 191147@pulpmaster (b'pulpp-ob-581' 191147): idle 191147@pulpmaster
>>> 191153@pulpmaster (b'pulpp-ob-581' 191153): idle 191153@pulpmaster
>>> resource-manager (b'pulpp-ob-581' 187238): idle resource-manager
>>> 191144@pulpmaster (b'pulpp-ob-581' 191144): idle 191144@pulpmaster
>>> 191151@pulpmaster (b'pulpp-ob-581' 191151): idle 191151@pulpmaster
>>> 191149@pulpmaster (b'pulpp-ob-581' 191149): idle 191149@pulpmaster
>>> 191145@pulpmaster (b'pulpp-ob-581' 191145): idle 191145@pulpmaster
>>> 191148@pulpmaster (b'pulpp-ob-581' 191148): idle 191148@pulpmaster
>>> 191150@pulpmaster (b'pulpp-ob-581' 191150): idle 191150@pulpmaster
>>> 191152@pulpmaster (b'pulpp-ob-581' 191152): idle 191152@pulpmaster
>>> 11 workers, 54 queues
>>>
>>> Updated: 2020-04-03 16:30:23.373244
>>>
>>> From: [email protected] At: 04/03/20 16:23:22
>>> To: Bin Li (BLOOMBERG/ 120 PARK ) <[email protected]>
>>> Cc: [email protected]
>>> Subject: Re: [Pulp-list] Pulp 3 waiting tasks
>>>
>>> Since the task that is stalled has a "worker" unassigned, it tells me it
>>> has not traveled through the resource-manager yet. All tasks in Pulp3
>>> (currently) go through the resource-manager. I can see from your ps output
>>> there is 1 resource-manager running (which is good), and the status API
>>> agrees with that (also good).
>>>
>>> So what does RQ think the situation is? Can you paste the output of `rq
>>> info` please?
>>>
>>> Also, what version of RQ do you have installed?
>>>
>>> Thanks,
>>> Brian
>>>
>>>
>>> On Fri, Apr 3, 2020 at 9:39 AM Bin Li (BLOOMBERG/ 120 PARK) <
>>> [email protected]> wrote:
>>>
>>>> Here is more info. The log is very big. I will send it to you shortly.
>>>> >>>> # ./sget status >>>> { >>>> "database_connection": { >>>> "connected": true >>>> }, >>>> "online_content_apps": [ >>>> { >>>> "last_heartbeat": "2020-04-03T13:10:30.135954Z", >>>> "name": "187254@pulpp-ob-581" >>>> }, >>>> { >>>> "last_heartbeat": "2020-04-03T13:10:30.132849Z", >>>> "name": "187257@pulpp-ob-581" >>>> } >>>> ], >>>> "online_workers": [ >>>> { >>>> "last_heartbeat": "2020-04-03T13:10:29.898377Z", >>>> "name": "[email protected]", >>>> "pulp_created": "2020-04-02T13:36:11.796937Z", >>>> "pulp_href": >>>> "/pulp/api/v3/workers/268261b9-f46d-4d37-ab47-0b50ca382637/" >>>> }, >>>> { >>>> "last_heartbeat": "2020-04-03T13:10:19.087502Z", >>>> "name": "[email protected]", >>>> "pulp_created": "2020-04-02T13:36:11.807418Z", >>>> "pulp_href": >>>> "/pulp/api/v3/workers/4fb4d87c-2c3c-4f64-b6f3-e05d9aaf6fc0/" >>>> }, >>>> { >>>> "last_heartbeat": "2020-04-03T13:10:29.498852Z", >>>> "name": "[email protected]", >>>> "pulp_created": "2020-04-02T13:36:11.810402Z", >>>> "pulp_href": >>>> "/pulp/api/v3/workers/7b15b6bd-1437-47b8-9832-0b44b326e0fa/" >>>> }, >>>> { >>>> "last_heartbeat": "2020-04-03T13:10:29.798941Z", >>>> "name": "[email protected]", >>>> "pulp_created": "2020-04-02T13:36:11.817391Z", >>>> "pulp_href": >>>> "/pulp/api/v3/workers/62523740-e109-4828-bcbb-e8459c0944c5/" >>>> }, >>>> { >>>> "last_heartbeat": "2020-04-03T13:10:29.598962Z", >>>> "name": "[email protected]", >>>> "pulp_created": "2020-04-02T13:36:11.818322Z", >>>> "pulp_href": >>>> "/pulp/api/v3/workers/02e33d62-797d-4797-8fdc-b999efc8cd12/" >>>> }, >>>> { >>>> "last_heartbeat": "2020-04-03T13:10:16.685771Z", >>>> "name": "[email protected]", >>>> "pulp_created": "2020-04-02T13:36:11.831154Z", >>>> "pulp_href": >>>> "/pulp/api/v3/workers/23e2a484-a877-4083-bcd4-38a0e89fcb49/" >>>> }, >>>> { >>>> "last_heartbeat": "2020-04-03T13:10:18.487964Z", >>>> "name": "[email protected]", >>>> "pulp_created": "2020-04-02T13:36:11.869871Z", >>>> "pulp_href": >>>> 
"/pulp/api/v3/workers/9e63708f-bbc0-473d-8de1-8788a1c91f51/" >>>> }, >>>> { >>>> "last_heartbeat": "2020-04-03T13:10:29.898354Z", >>>> "name": "[email protected]", >>>> "pulp_created": "2020-04-02T13:36:11.880995Z", >>>> "pulp_href": >>>> "/pulp/api/v3/workers/ddd49126-5531-471a-bea1-3aab07bcf8b4/" >>>> }, >>>> { >>>> "last_heartbeat": "2020-04-03T13:10:18.887949Z", >>>> "name": "[email protected]", >>>> "pulp_created": "2020-04-02T13:36:11.893280Z", >>>> "pulp_href": >>>> "/pulp/api/v3/workers/2ef1e562-845f-4ae7-8007-9b7db8cf73a0/" >>>> }, >>>> { >>>> "last_heartbeat": "2020-04-03T13:10:29.798877Z", >>>> "name": "[email protected]", >>>> "pulp_created": "2020-04-02T13:36:11.917095Z", >>>> "pulp_href": >>>> "/pulp/api/v3/workers/6e2cf918-af8e-4c5d-bc8f-bef3d3a83dca/" >>>> }, >>>> { >>>> "last_heartbeat": "2020-04-03T13:10:15.684710Z", >>>> "name": "resource-manager", >>>> "pulp_created": "2020-01-23T18:24:49.246717Z", >>>> "pulp_href": >>>> "/pulp/api/v3/workers/d46e4da0-9735-445b-a502-2aff7ce13ef7/" >>>> } >>>> ], >>>> "redis_connection": { >>>> "connected": true >>>> }, >>>> "storage": { >>>> "free": 32543019880448, >>>> "total": 33521607376896, >>>> "used": 978587496448 >>>> }, >>>> "versions": [ >>>> { >>>> "component": "pulpcore", >>>> "version": "3.2.1" >>>> }, >>>> { >>>> "component": "pulp_rpm", >>>> "version": "3.2.0" >>>> }, >>>> { >>>> "component": "pulp_file", >>>> "version": "0.2.0" >>>> } >>>> ] >>>> >>>> >>>> # ps -awfux |grep pulp >>>> root 180078 0.0 0.0 107992 616 pts/1 S+ Apr02 0:00 | \_ tail -f >>>> /var/log/pulp/pulp-config.log >>>> root 184836 0.0 0.0 124448 2044 pts/2 S+ Apr02 0:00 | \_ vi bbpulp3.py >>>> root 43270 0.0 0.0 112708 984 pts/3 S+ 09:11 0:00 \_ grep --color=auto >>>> pulp >>>> pulp 187224 0.0 0.0 228600 19188 ? Ss Apr02 0:04 >>>> /opt/utils/venv/pulp/3.7.3/bin/python3 >>>> /opt/utils/venv/pulp/3.7.3/bin/gunicorn pulpcore.app.wsgi:application >>>> --bind 127.0.0.1:24817 --access-logfile - >>>> pulp 187251 1.4 0.0 528708 109752 ? 
S Apr02 20:48 \_ >>>> /opt/utils/venv/pulp/3.7.3/bin/python3 >>>> /opt/utils/venv/pulp/3.7.3/bin/gunicorn pulpcore.app.wsgi:application >>>> --bind 127.0.0.1:24817 --access-logfile - >>>> pulp 187231 0.0 0.0 269476 27976 ? Ss Apr02 0:05 >>>> /opt/utils/venv/pulp/3.7.3/bin/python3 >>>> /opt/utils/venv/pulp/3.7.3/bin/gunicorn pulpcore.content:server --bind >>>> 127.0.0.1:24816 --worker-class aiohttp.GunicornWebWorker -w 2 >>>> --access-logfile - >>>> pulp 187254 0.0 0.0 485860 68592 ? S Apr02 0:18 \_ >>>> /opt/utils/venv/pulp/3.7.3/bin/python3 >>>> /opt/utils/venv/pulp/3.7.3/bin/gunicorn pulpcore.content:server --bind >>>> 127.0.0.1:24816 --worker-class aiohttp.GunicornWebWorker -w 2 >>>> --access-logfile - >>>> pulp 187257 0.0 0.0 486132 68604 ? S Apr02 0:19 \_ >>>> /opt/utils/venv/pulp/3.7.3/bin/python3 >>>> /opt/utils/venv/pulp/3.7.3/bin/gunicorn pulpcore.content:server --bind >>>> 127.0.0.1:24816 --worker-class aiohttp.GunicornWebWorker -w 2 >>>> --access-logfile - >>>> pulp 187238 0.0 0.0 486428 71128 ? Ss Apr02 1:20 >>>> /opt/utils/venv/pulp/3.7.3/bin/python3 /opt/utils/venv/pulp/3.7.3/bin/rq >>>> worker -w pulpcore.tasking.worker.PulpWorker -n resource-manager >>>> --pid=/var/run/pulpcore-resource-manager/resource-manager.pid -c >>>> pulpcore.rqconfig --disable-job-desc-logging >>>> pulp 191144 0.0 0.0 486392 71064 ? Ss Apr02 0:51 >>>> /opt/utils/venv/pulp/3.7.3/bin/python3 /opt/utils/venv/pulp/3.7.3/bin/rq >>>> worker -w pulpcore.tasking.worker.PulpWorker >>>> --pid=/var/run/pulpcore-worker-1/reserved-resource-worker-1.pid -c >>>> pulpcore.rqconfig --disable-job-desc-logging >>>> pulp 191145 0.0 0.0 486404 71064 ? Ss Apr02 0:50 >>>> /opt/utils/venv/pulp/3.7.3/bin/python3 /opt/utils/venv/pulp/3.7.3/bin/rq >>>> worker -w pulpcore.tasking.worker.PulpWorker >>>> --pid=/var/run/pulpcore-worker-2/reserved-resource-worker-2.pid -c >>>> pulpcore.rqconfig --disable-job-desc-logging >>>> pulp 191146 0.0 0.0 486404 71044 ? 
Ss Apr02 0:50 >>>> /opt/utils/venv/pulp/3.7.3/bin/python3 /opt/utils/venv/pulp/3.7.3/bin/rq >>>> worker -w pulpcore.tasking.worker.PulpWorker >>>> --pid=/var/run/pulpcore-worker-3/reserved-resource-worker-3.pid -c >>>> pulpcore.rqconfig --disable-job-desc-logging >>>> pulp 191147 0.0 0.0 486404 71036 ? Ss Apr02 0:52 >>>> /opt/utils/venv/pulp/3.7.3/bin/python3 /opt/utils/venv/pulp/3.7.3/bin/rq >>>> worker -w pulpcore.tasking.worker.PulpWorker >>>> --pid=/var/run/pulpcore-worker-4/reserved-resource-worker-4.pid -c >>>> pulpcore.rqconfig --disable-job-desc-logging >>>> pulp 191148 0.0 0.0 486164 71056 ? Ss Apr02 0:51 >>>> /opt/utils/venv/pulp/3.7.3/bin/python3 /opt/utils/venv/pulp/3.7.3/bin/rq >>>> worker -w pulpcore.tasking.worker.PulpWorker >>>> --pid=/var/run/pulpcore-worker-5/reserved-resource-worker-5.pid -c >>>> pulpcore.rqconfig --disable-job-desc-logging >>>> pulp 191149 0.0 0.0 486168 71060 ? Ss Apr02 0:52 >>>> /opt/utils/venv/pulp/3.7.3/bin/python3 /opt/utils/venv/pulp/3.7.3/bin/rq >>>> worker -w pulpcore.tasking.worker.PulpWorker >>>> --pid=/var/run/pulpcore-worker-6/reserved-resource-worker-6.pid -c >>>> pulpcore.rqconfig --disable-job-desc-logging >>>> pulp 191150 0.0 0.0 486148 71040 ? Ss Apr02 0:50 >>>> /opt/utils/venv/pulp/3.7.3/bin/python3 /opt/utils/venv/pulp/3.7.3/bin/rq >>>> worker -w pulpcore.tasking.worker.PulpWorker >>>> --pid=/var/run/pulpcore-worker-7/reserved-resource-worker-7.pid -c >>>> pulpcore.rqconfig --disable-job-desc-logging >>>> pulp 191151 0.0 0.0 486400 71060 ? Ss Apr02 0:51 >>>> /opt/utils/venv/pulp/3.7.3/bin/python3 /opt/utils/venv/pulp/3.7.3/bin/rq >>>> worker -w pulpcore.tasking.worker.PulpWorker >>>> --pid=/var/run/pulpcore-worker-8/reserved-resource-worker-8.pid -c >>>> pulpcore.rqconfig --disable-job-desc-logging >>>> pulp 191152 0.0 0.0 486164 71044 ? 
Ss Apr02 0:52 >>>> /opt/utils/venv/pulp/3.7.3/bin/python3 /opt/utils/venv/pulp/3.7.3/bin/rq >>>> worker -w pulpcore.tasking.worker.PulpWorker >>>> --pid=/var/run/pulpcore-worker-9/reserved-resource-worker-9.pid -c >>>> pulpcore.rqconfig --disable-job-desc-logging >>>> pulp 191153 0.0 0.0 486392 71068 ? Ss Apr02 0:52 >>>> /opt/utils/venv/pulp/3.7.3/bin/python3 /opt/utils/venv/pulp/3.7.3/bin/rq >>>> worker -w pulpcore.tasking.worker.PulpWorker >>>> --pid=/var/run/pulpcore-worker-10/reserved-resource-worker-10.pid -c >>>> pulpcore.rqconfig --disable-job-desc-logging >>>> >>>> From: [email protected] At: 04/03/20 09:05:47 >>>> To: Bin Li (BLOOMBERG/ 120 PARK ) <[email protected]> >>>> Cc: [email protected] >>>> Subject: Re: [Pulp-list] Pulp 3 waiting tasks >>>> >>>> While you are experiencing the issue, can you capture the status API >>>> output? >>>> >>>> Also can you paste an output of the workers on that system with `ps >>>> -awfux | grep pulp`. >>>> >>>> Also do you see any errors in the log? Could you share a copy of the >>>> log? >>>> >>>> On Fri, Apr 3, 2020 at 9:01 AM Bin Li (BLOOMBERG/ 120 PARK) < >>>> [email protected]> wrote: >>>> >>>>> We have been seeing many waiting tasks. They seem to be stuck forever. >>>>> e.g. 
>>>>> pulpp-ob-581 /home/bli4/pulp3-script # ./get /pulp/api/v3/tasks/14b76b27-9f34-4297-88ed-5ec13cbe5e50/
>>>>> HTTP/1.1 200 OK
>>>>> Allow: GET, PATCH, DELETE, HEAD, OPTIONS
>>>>> Connection: keep-alive
>>>>> Content-Length: 323
>>>>> Content-Type: application/json
>>>>> Date: Fri, 03 Apr 2020 12:56:02 GMT
>>>>> Server: nginx/1.16.1
>>>>> Vary: Accept, Cookie
>>>>> X-Frame-Options: SAMEORIGIN
>>>>>
>>>>> {
>>>>>     "created_resources": [],
>>>>>     "error": null,
>>>>>     "finished_at": null,
>>>>>     "name": "pulpcore.app.tasks.base.general_update",
>>>>>     "progress_reports": [],
>>>>>     "pulp_created": "2020-04-02T13:00:14.881212Z",
>>>>>     "pulp_href": "/pulp/api/v3/tasks/14b76b27-9f34-4297-88ed-5ec13cbe5e50/",
>>>>>     "reserved_resources_record": [],
>>>>>     "started_at": null,
>>>>>     "state": "waiting",
>>>>>     "worker": null
>>>>> }
>>>>>
>>>>> What could be the reason for these stuck waiting tasks? How should we
>>>>> troubleshoot the issue?
>>>>> _______________________________________________
>>>>> Pulp-list mailing list
>>>>> [email protected]
>>>>> https://www.redhat.com/mailman/listinfo/pulp-list
