I have also observed hanging syncs, similar to those described here. I
hadn't got around to documenting it properly and complaining yet, although
I did rant about something similar on pulp-list in mid-January 2015.
I have observed sync hangs since I commissioned my pulp system in December
2014 with pulp 2.5.1. I have 760 repos, including all kinds of variants of
RHEL 5-7, EPEL, ELRepo, pulp, ... and I am not done yet. But the sync hangs
are really getting me down.
I use a script to initialize my repos, exploiting the similarities in the
variants of the repos. I can send in the script if it helps to reproduce
things.
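For context, the script follows roughly this shape (a minimal sketch, not the
actual script; the repo ids, releases, and feed URL pattern below are made-up
placeholders). It only emits the pulp-admin commands rather than running them,
so the generated list can be reviewed first:

```shell
#!/bin/sh
# Sketch of a repo-initialisation script that exploits the similarity
# between repo variants: loop over release and arch, emit one
# "pulp-admin rpm repo create" command per combination.
gen_repo_cmds() {
  base_feed="http://mirror.example.com"   # assumed mirror root, not a real URL
  for rel in 5 6 7; do
    for arch in x86_64 i386; do
      echo "pulp-admin rpm repo create --repo-id epel${rel}-${arch} --feed ${base_feed}/epel/${rel}/${arch}/"
    done
  done
}
gen_repo_cmds
```

Piping the output through `sh` (after inspection) would then create all the
repos in one pass.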
Thanks for describing the way that you restart the whole system. You have
included more services in the list than I have tried.
Regards,
Ben Stanley.
On 31 January 2015 10:10:06 AM "Pier, Bryce" <[email protected]> wrote:
I’ve been having a lot of trouble with my pulp server lately related to RPM
repo syncs hanging/stalling. I thought the issue might have been related to
the 2.6 beta build I was running because it fixed my bug (1176698), but it
doesn’t appear to be just that version.
I built a new pulp server this week on version 2.5.3-0.2.rc. This RHEL 6.6
VM has 8 vCPUs, 8 GB of RAM, and 400 GB of SAN LUNs attached to it. Both
/var/lib/pulp and /var/lib/mongodb are symlinked to the SAN LUN for
performance.
Initially this new server was working great. I created and synced several RPM
repos without any issues, but today the hangs/stalls of the syncs started
again. I’m beginning to wonder if something about the 2.5+ architecture
isn’t handling the nearly 100,000 RPMs that have been pulled into it.
When the stall happens, it is always during the downloading of RPMs from the
feed, but nothing is logged and no errors are thrown. I’ve let the process
sit and run overnight and it never resumes. After canceling the sync task,
I have to stop all of the pulp processes, and one of the workers never stops:
# for s in {goferd,pulp_celerybeat,pulp_resource_manager,pulp_workers,httpd}; do service $s stop; done
goferd: unrecognized service
celery init v10.0.
Using configuration: /etc/default/pulp_workers, /etc/default/pulp_celerybeat
Stopping pulp_celerybeat... OK
celery init v10.0.
Using config script: /etc/default/pulp_resource_manager
celery multi v3.1.11 (Cipater)
> Stopping nodes...
> [email protected]: QUIT -> 2387
> Waiting for 1 node -> 2387.....
> [email protected]: OK
celery init v10.0.
Using config script: /etc/default/pulp_workers
celery multi v3.1.11 (Cipater)
> Stopping nodes...
> [email protected]: QUIT -> 2664
> [email protected]: QUIT -> 2570
> [email protected]: QUIT -> 2633
> [email protected]: QUIT -> 2540
> [email protected]: QUIT -> 2723
> [email protected]: QUIT -> 2602
> [email protected]: QUIT -> 2692
> [email protected]: QUIT -> 2513
> Waiting for 8 nodes -> 2664, 2570, 2633, 2540, 2723, 2602, 2692, 2513............
> [email protected]: OK
> Waiting for 7 nodes -> 2570, 2633, 2540, 2723, 2602, 2692, 2513....
> [email protected]: OK
> Waiting for 6 nodes -> 2633, 2540, 2723, 2602, 2692, 2513....
> [email protected]: OK
> Waiting for 5 nodes -> 2540, 2723, 2602, 2692, 2513....
> [email protected]: OK
> Waiting for 4 nodes -> 2723, 2602, 2692, 2513....
> [email protected]: OK
> Waiting for 3 nodes -> 2602, 2692, 2513....
> [email protected]: OK
> Waiting for 2 nodes -> 2692, 2513.....
> [email protected]: OK
> Waiting for 1 node -> 2692..........................................................^C
Session terminated, killing shell... ...killed.
If I run the for loop again, everything appears to clean up, but there is
always a single process that I have to kill manually:
apache 2763 1 2 15:34 ? 00:02:09 /usr/bin/python -m
celery.__main__ worker -c 1 -n
[email protected] --events
--app=pulp.server.async.app --loglevel=INFO
--logfile=/var/log/pulp/reserved_resource_worker-6.log
--pidfile=/var/run/pulp/reserved_resource_worker-6.pid
After killing this final process, I usually stop mongodb, start everything
back up and try the sync again. I’ve also tried rebooting the VM but it
doesn’t seem to be more effective than just stopping and starting the services.
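The leftover-worker hunt can be scripted rather than done by eye. The sketch
below (an assumption on my part, matched against the `ps` output shown above,
not part of anyone's actual tooling) picks the PIDs of surviving
reserved_resource_worker processes out of a `ps -ef` style listing:

```shell
#!/bin/sh
# Read a ps-style listing on stdin and print the PID (field 2) of any
# pulp celery worker that survived the service stops. The match pattern
# is taken from the command line shown in the ps output above.
find_leftover_workers() {
  awk '/celery.__main__ worker/ && /reserved_resource_worker/ {print $2}'
}
# In practice you would feed it live data and kill what it finds, e.g.:
#   ps -ef | find_leftover_workers | xargs -r kill
```

`xargs -r` keeps the kill from running at all when no stragglers are found.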
Below are the repos I’ve successfully synced so far on the new server.
(Notice the rhel-6-optional one has 7368 RPM units but hasn’t successfully
finished downloading yet, even though I’ve killed it and restarted it 3 times
this afternoon.)
# pulp-admin rpm repo list
+----------------------------------------------------------------------+
RPM Repositories
+----------------------------------------------------------------------+
Id:           ol5_x86_64_latest
Display Name: ol5_x86_64_latest
Description:  None
Content Unit Counts:
  Erratum:          1116
  Package Category: 9
  Package Group:    103
  Rpm:              6761
  Srpm:             2292

Id:           ol6_x86_64_latest
Display Name: ol6_x86_64_latest
Description:  None
Content Unit Counts:
  Erratum:          1659
  Package Category: 14
  Package Group:    207
  Rpm:              13215
  Srpm:             3812

Id:           epel6
Display Name: epel6
Description:  None
Content Unit Counts:
  Erratum:                 3668
  Package Category:        3
  Package Group:           208
  Rpm:                     11135
  Yum Repo Metadata File:  1

Id:           epel5
Display Name: epel5
Description:  None
Content Unit Counts:
  Erratum:                 1953
  Package Category:        5
  Package Group:           36
  Rpm:                     6678
  Yum Repo Metadata File:  1

Id:           epel7
Display Name: epel7
Description:  None
Content Unit Counts:
  Erratum:                 1252
  Package Category:        4
  Package Environment:     1
  Package Group:           209
  Rpm:                     7161
  Yum Repo Metadata File:  1

Id:           rhel-6-os
Display Name: rhel-6-os
Description:  None
Content Unit Counts:
  Erratum:                 2842
  Package Category:        10
  Package Group:           202
  Rpm:                     14574
  Yum Repo Metadata File:  1

Id:           rhel-5-os
Display Name: rhel-5-os
Description:  None
Content Unit Counts:
  Erratum:                 3040
  Package Category:        6
  Package Group:           99
  Rpm:                     16668
  Yum Repo Metadata File:  1

Id:           rhel-6-optional
Display Name: rhel-6-optional
Description:  None
Content Unit Counts:
  Rpm: 7368
Thanks,
- Bryce
_______________________________________________
Pulp-list mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/pulp-list