I’ve been having a lot of trouble with my pulp server lately related to rpm
repo syncs hanging/stalling. I thought the issue might have been related to the
2.6 beta build I was running (I was on that build because it fixed my bug,
1176698), but it doesn’t appear to be just that version.
I built a new pulp server this week on version 2.5.3-0.2.rc. This RHEL 6.6 VM
has 8 vCPUs, 8 GB of RAM, and 400 GB of SAN LUN storage attached to it. Both
/var/lib/pulp and /var/lib/mongodb are symlinked onto the SAN LUN for
performance.
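(In case the symlink layout matters, it was set up roughly like this; /san is
just a stand-in here for the LUN’s actual mount point:

# mv /var/lib/pulp /san/pulp && ln -s /san/pulp /var/lib/pulp
# mv /var/lib/mongodb /san/mongodb && ln -s /san/mongodb /var/lib/mongodb
)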
Initially this new server was working great. I created and synced several rpm
repos without any issues, but today the sync hangs/stalls started again.
I’m beginning to wonder if something about the 2.5+ architecture isn’t
handling the nearly 100,000 rpms that have been pulled into it.
When the stall happens, it is always during the downloading of RPMs from the
feed, but nothing is logged and no errors are thrown. I’ve let the process sit
and run overnight and it never resumes. After canceling the sync task (cancel
command shown after the output below), I have to stop all of the pulp
processes, and one of the workers never stops:
# for s in {goferd,pulp_celerybeat,pulp_resource_manager,pulp_workers,httpd}; do service $s stop; done
goferd: unrecognized service
celery init v10.0.
Using configuration: /etc/default/pulp_workers, /etc/default/pulp_celerybeat
Stopping pulp_celerybeat... OK
celery init v10.0.
Using config script: /etc/default/pulp_resource_manager
celery multi v3.1.11 (Cipater)
> Stopping nodes...
> [email protected]: QUIT -> 2387
> Waiting for 1 node -> 2387.....
> [email protected]: OK
celery init v10.0.
Using config script: /etc/default/pulp_workers
celery multi v3.1.11 (Cipater)
> Stopping nodes...
> [email protected]: QUIT -> 2664
> [email protected]: QUIT -> 2570
> [email protected]: QUIT -> 2633
> [email protected]: QUIT -> 2540
> [email protected]: QUIT -> 2723
> [email protected]: QUIT -> 2602
> [email protected]: QUIT -> 2692
> [email protected]: QUIT -> 2513
> Waiting for 8 nodes -> 2664, 2570, 2633, 2540, 2723, 2602, 2692,
> 2513............
> [email protected]: OK
> Waiting for 7 nodes -> 2570, 2633, 2540, 2723, 2602, 2692, 2513....
> [email protected]: OK
> Waiting for 6 nodes -> 2633, 2540, 2723, 2602, 2692, 2513....
> [email protected]: OK
> Waiting for 5 nodes -> 2540, 2723, 2602, 2692, 2513....
> [email protected]: OK
> Waiting for 4 nodes -> 2723, 2602, 2692, 2513....
> [email protected]: OK
> Waiting for 3 nodes -> 2602, 2692, 2513....
> [email protected]: OK
> Waiting for 2 nodes -> 2692, 2513.....
> [email protected]: OK
> Waiting for 1 node -> 2692..........[dots continue; it never exits, so I eventually hit Ctrl-C]^C
Session terminated, killing shell... ...killed.
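For reference, the cancel itself is the standard pulp-admin task cancel, with
<task-id> taken from the task list output, i.e. something like:

# pulp-admin tasks list
# pulp-admin tasks cancel --task-id <task-id>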
If I run the for loop again, everything appears to clean up, but there is
always a single process left over that I have to kill manually:
apache 2763 1 2 15:34 ? 00:02:09 /usr/bin/python -m celery.__main__ worker -c 1 -n [email protected] --events --app=pulp.server.async.app --loglevel=INFO --logfile=/var/log/pulp/reserved_resource_worker-6.log --pidfile=/var/run/pulp/reserved_resource_worker-6.pid
After killing this final process, I usually stop mongodb, start everything
back up, and try the sync again. I’ve also tried rebooting the VM, but it
doesn’t seem to be any more effective than just stopping and starting the
services.
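So the full recovery sequence ends up being roughly the following (mongod is
the service name on this box, and the leftover worker PID obviously varies
each time):

# kill -9 2763
# service mongod stop
# service mongod start
# for s in {pulp_celerybeat,pulp_resource_manager,pulp_workers,httpd}; do service $s start; done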
Below are the repos I’ve synced so far on the new server. (Notice that
rhel-6-optional shows 7368 rpm units but hasn’t successfully finished
downloading yet, even though I’ve killed and restarted that sync three times
this afternoon.)
# pulp-admin rpm repo list
+----------------------------------------------------------------------+
RPM Repositories
+----------------------------------------------------------------------+
Id:                   ol5_x86_64_latest
Display Name:         ol5_x86_64_latest
Description:          None
Content Unit Counts:
  Erratum:          1116
  Package Category: 9
  Package Group:    103
  Rpm:              6761
  Srpm:             2292

Id:                   ol6_x86_64_latest
Display Name:         ol6_x86_64_latest
Description:          None
Content Unit Counts:
  Erratum:          1659
  Package Category: 14
  Package Group:    207
  Rpm:              13215
  Srpm:             3812

Id:                   epel6
Display Name:         epel6
Description:          None
Content Unit Counts:
  Erratum:                3668
  Package Category:       3
  Package Group:          208
  Rpm:                    11135
  Yum Repo Metadata File: 1

Id:                   epel5
Display Name:         epel5
Description:          None
Content Unit Counts:
  Erratum:                1953
  Package Category:       5
  Package Group:          36
  Rpm:                    6678
  Yum Repo Metadata File: 1

Id:                   epel7
Display Name:         epel7
Description:          None
Content Unit Counts:
  Erratum:                1252
  Package Category:       4
  Package Environment:    1
  Package Group:          209
  Rpm:                    7161
  Yum Repo Metadata File: 1

Id:                   rhel-6-os
Display Name:         rhel-6-os
Description:          None
Content Unit Counts:
  Erratum:                2842
  Package Category:       10
  Package Group:          202
  Rpm:                    14574
  Yum Repo Metadata File: 1

Id:                   rhel-5-os
Display Name:         rhel-5-os
Description:          None
Content Unit Counts:
  Erratum:                3040
  Package Category:       6
  Package Group:          99
  Rpm:                    16668
  Yum Repo Metadata File: 1

Id:                   rhel-6-optional
Display Name:         rhel-6-optional
Description:          None
Content Unit Counts:
  Rpm: 7368
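In case it matters, the syncs are all kicked off the standard way, nothing
exotic, e.g.:

# pulp-admin rpm repo sync run --repo-id rhel-6-optional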
Thanks,
- Bryce