Howdy team,

We in GSS are currently working a sev1 case from Comcast, in which the customer is hitting HTTP 503 error pages when trying to access the Systems tab in the Satellite 5.4 webUI. All other functions (yum, channel creation, etc.) work as expected.

[Fri May 20 14:30:00 2011] [error] (111)Connection refused: proxy: AJP: attempt to connect to 127.0.0.1:8009 (*) failed
[Fri May 20 14:30:00 2011] [error] proxy: AJP: failed to make connection to backend: localhost
[Fri May 20 14:30:04 2011] [error] (111)Connection refused: proxy: AJP: attempt to connect to 127.0.0.1:8009 (*) failed
[Fri May 20 14:30:04 2011] [error] proxy: AJP: failed to make connection to backend: localhost

Customer environment:

* External database: Oracle 11
* RHN Satellite 5.4
* Issue: when using the webUI and clicking the Systems tab, the customer receives a 503 HTTP error. The other tabs work (a little slowly), but without any 503 errors.

Diagnostic steps:

Database query times

To check whether the external DB was the bottleneck, we ran the SQL manually, and it ran fairly quickly.

-- Show systems (rhn/systems/Overview.do)
SELECT DISTINCT S.id, S.name,
       (SELECT 1
          FROM rhnServerFeaturesView SFV
         WHERE SFV.server_id = S.id
           AND SFV.label = 'ftr_system_grouping') AS selectable
  FROM rhnServer S
       inner join rhnUserServerPerms USP on S.id = USP.server_id
 WHERE USP.user_id = &rhnuser_id;

-- Show systems in Group (rhn/systems/Overview.do?showgroups=true)
SELECT SGM.server_id AS ID, S.name AS NAME,
       (SELECT 1
          FROM rhnServerFeaturesView SFV
         WHERE SFV.server_id = S.id
           AND SFV.label = 'ftr_system_grouping') AS selectable
  FROM rhnServer S, rhnServerGroupMembers SGM
 WHERE SGM.server_group_id = &rhnServerGroup_id
   AND SGM.server_id = S.id
   AND EXISTS (SELECT 1
                 FROM rhnServerFeaturesView SFV
                WHERE SFV.server_id = S.id
                  AND SFV.label = 'ftr_system_grouping')
 ORDER BY UPPER(NVL(S.NAME, '(none)')), S.ID;

The queries seem to be OK.

{SNIP}
...
1000018206 espmon-po-1p.cable.comcast.com 1
1000018235 ocepcui-wc-1p.sys.comcast.net 1

2477 rows selected.

Elapsed: 00:00:11.14

{SNIP}
1000016310 xg3 1
1000013457 xtaweb-nb-01p.philadelphia.pa.bo.comcast.net 1

2377 rows selected.

Elapsed: 00:00:10.65

The customer has roughly 3,800 servers registered in Satellite:

SQL> select count(*) from rhnserver;

  COUNT(*)
----------
      3748

We asked the customer for a DB dump and imported it on an internal reproducer.

Hostname: dhcp12.gsslab.rdu.redhat.com
SSH: root/redhat
webUI: satadmin/redhat

Using the customer DB, we **COULD NOT** reproduce the issue directly. The first load of the Systems tab took 1-2 minutes to return; subsequent loads took almost 50s.

To reproduce the issue in-house, we forced the timeout to a very low value, and then we got the 503 plus the AJP timeout error.

/etc/httpd/conf/httpd.conf

From:
  Timeout 120

To:
  Timeout 10

/etc/httpd/conf.d/zz-spacewalk-www.conf

From:
  <IfModule proxy_ajp_module>
  RewriteRule ^/rhn(.*) ajp://localhost:8009/rhn$1 [P]
  RewriteRule ^(/.*\.(do|jsp)(\?.*)?)$ ajp://localhost:8009/$1 [P]
  </IfModule>

To:
  <IfModule proxy_ajp_module>
  RewriteRule ^/rhn(.*) ajp://localhost:8009/rhn$1 [P] timeout=10
  RewriteRule ^(/.*\.(do|jsp)(\?.*)?)$ ajp://localhost:8009/$1 [P] timeout=10
  </IfModule>

Afterwards, we restarted Satellite.

[root@dhcp12 conf.d]# tail -f /var/log/httpd/error_log
[Tue May 24 11:58:19 2011] [notice] Digest: done
[Tue May 24 11:58:19 2011] [notice] mod_python: Creating 4 session mutexes based on 256 max processes and 0 max threads.
[Tue May 24 11:58:19 2011] [notice] Apache configured -- resuming normal operations
[Tue May 24 11:58:23 2011] [error] (111)Connection refused: proxy: AJP: attempt to connect to 127.0.0.1:8009 (*) failed
[Tue May 24 11:58:23 2011] [error] proxy: AJP: failed to make connection to backend: localhost
[Tue May 24 11:58:27 2011] [error] (111)Connection refused: proxy: AJP: attempt to connect to 127.0.0.1:8009 (*) failed
[Tue May 24 11:58:27 2011] [error] proxy: AJP: failed to make connection to backend: localhost
[Tue May 24 11:58:50 2011] [error] (70007)The timeout specified has expired: ajp_ilink_receive() can't receive header
[Tue May 24 11:59:37 2011] [error] (70007)The timeout specified has expired: ajp_ilink_receive() can't receive header
[Tue May 24 12:00:25 2011] [error] (70007)The timeout specified has expired: ajp_ilink_receive() can't receive header

[root@dhcp12 conf.d]# tail -f /var/log/httpd/ssl_access_log
10.11.9.75 - - [24/May/2011:12:09:37 -0400] "GET /rhn/software/channels/All.do HTTP/1.1" 200 137344
10.11.9.75 - - [24/May/2011:12:09:38 -0400] "GET /rhn/dwr/engine.js HTTP/1.1" 200 46055
10.11.9.75 - - [24/May/2011:12:09:43 -0400] "GET /rhn/systems/Overview.do HTTP/1.1" 503 402

I'm running out of options on this case. Do you have any clues about what we can do to identify or debug the issue? Why does the Systems tab take so long to return if the SQL returns fairly quickly?

Thanks for your attention.

Cheers,
--marcelo

-- 
Marcelo Moreira de Mello
RHCA RHCSS RHCVA
Software Maintenance Engineer/SEG
gpg id: 2048R/FDB110E5
gpg fingerprint: 3BE7 EF71 4DD7 6812 D309 8F18 BD42 D095 FDB1 10E5
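
P.S. The error_log excerpts above contain two different kinds of AJP errors: "(111)Connection refused", which means nothing was accepting connections on port 8009 at that moment, and "(70007)The timeout specified has expired", which means the backend was reached but did not answer in time. A minimal Python sketch that counts them separately might help when reading a longer log. The names classify, summarize, and SAMPLE are hypothetical, and the line pattern is only assumed to match the error_log format shown in this message.

```python
import re
from collections import Counter

# Assumed format, taken from the excerpts in this message:
#   [timestamp] [level] message
LINE_RE = re.compile(r"\[(?P<ts>[^\]]+)\] \[(?P<level>\w+)\] (?P<msg>.*)")


def classify(msg):
    """Return a coarse category for an AJP-related error message, or None."""
    if "(111)Connection refused" in msg:
        return "refused"    # nothing was listening on the AJP port
    if "(70007)The timeout specified has expired" in msg:
        return "timeout"    # backend reached, but no reply within the timeout
    if "proxy: AJP:" in msg:
        return "other_ajp"  # any other message from the AJP proxy
    return None


def summarize(lines):
    """Count AJP error categories over an iterable of error_log lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match or match.group("level") != "error":
            continue  # skip notices and lines that do not fit the pattern
        category = classify(match.group("msg"))
        if category:
            counts[category] += 1
    return counts


# Sample lines copied from the error_log excerpt in this message.
SAMPLE = [
    "[Tue May 24 11:58:19 2011] [notice] Apache configured -- resuming normal operations",
    "[Tue May 24 11:58:23 2011] [error] (111)Connection refused: proxy: AJP: attempt to connect to 127.0.0.1:8009 (*) failed",
    "[Tue May 24 11:58:23 2011] [error] proxy: AJP: failed to make connection to backend: localhost",
    "[Tue May 24 11:58:50 2011] [error] (70007)The timeout specified has expired: ajp_ilink_receive() can't receive header",
]

if __name__ == "__main__":
    for category, count in sorted(summarize(SAMPLE).items()):
        print(category, count)
```

To run it against a real log, something like summarize(open("/var/log/httpd/error_log")) should work, assuming the file is readable and follows the same format.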