Hi,

We have recently experienced a problem with our Spacewalk 1.9 installation.  We 
maintain a series of channels, Dev, Staging, and Production, with Dev being 
either synced from public repositories (Scientific Linux, EPEL, etc.) or 
populated with rhnpush.  We schedule patches by cloning the Dev channels into 
Staging, then Staging into Production.

Following the same pattern, we have Dev, Staging, and Production kickstarts 
for each type of server, each using the matching channels.  This worked 
perfectly until recently.

Recently we started deploying a new server type, and last week it was time to 
start deploying the production systems.  We cloned our Staging kickstart, 
created an appropriate new activation key, and set the appropriate channels 
(the relevant channels had already existed for a long time).  When we went to 
kickstart the system, we began seeing "retrying download" for comps.xml (and 
at times for repomd.xml).

In the log on vt3 of the installing system, we saw the following message, 
repeated from try 1/10 through try 10/10:

  WARNING : Try 1/10 for http://<sw-proxy-ip>/ks/dist/child/prod-clone-sl-6.2-x86_64-epel/SL-62-x86_64-2012-02-06/repodata/comps.xml failed: [Errno -1] Metadata file does not match checksum

On the proxy we saw this in squid's access.log:

  1380214098.913    166 127.0.0.1 TCP_MISS/200 4126 GET http://<sw-proxy-hostname>/ks/dist/child/prod-clone-sl-6.2-x86_64-epel/SL-62-x86_64-2012-02-06/repodata/comps.xml - DIRECT/<sw-proxy-ip> text/html

On the main SW server we saw messages such as:

  <sw-proxy-ip> - - [26/Sep/2013:11:24:32 -0700] "GET /ks/dist/child/prod-clone-sl-6.2-x86_64-epel/SL-62-x86_64-2012-02-06/repodata/comps.xml HTTP/1.1" 200 3724 "-" "Scientific Linux (anaconda)/6.2"
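As I understand it, the vt3 error means yum hashed the comps.xml it downloaded and got a value that differs from the checksum recorded for that file in repomd.xml. Squid also logged the response as text/html. Below is a minimal Python 3 sketch for checking this by hand. It assumes you have saved repomd.xml and comps.xml from the same kickstart tree, for example with wget. It looks up the checksum that repomd.xml lists for the <data type="group"> entry, which is where createrepo normally records comps.xml, and compares it with a hash of the downloaded bytes. It also flags a body that looks like an HTML page rather than XML. The function names are hypothetical and nothing here comes from Spacewalk itself.

```python
import hashlib
import xml.etree.ElementTree as ET

# Namespace used by createrepo for repomd.xml elements.
REPO_NS = "{http://linux.duke.edu/metadata/repo}"


def expected_checksum(repomd_xml: bytes, data_type: str = "group"):
    """Return (algorithm, hexdigest, href) recorded in repomd.xml for data_type.

    comps.xml is normally listed as <data type="group">.
    """
    root = ET.fromstring(repomd_xml)
    for data in root.findall(REPO_NS + "data"):
        if data.get("type") != data_type:
            continue
        checksum = data.find(REPO_NS + "checksum")
        location = data.find(REPO_NS + "location")
        algo = checksum.get("type")
        if algo == "sha":  # yum/createrepo use "sha" to mean SHA-1
            algo = "sha1"
        href = location.get("href") if location is not None else None
        return algo, checksum.text.strip(), href
    raise KeyError("no <data type=%r> entry in repomd.xml" % data_type)


def looks_like_html(payload: bytes) -> bool:
    """True if the body looks like an HTML page rather than XML metadata."""
    head = payload.lstrip()[:200].lower()
    return head.startswith((b"<!doctype html", b"<html"))


def checksum_matches(repomd_xml: bytes, payload: bytes, data_type: str = "group") -> bool:
    """Hash payload with repomd.xml's algorithm and compare it to the recorded value."""
    algo, expected, _href = expected_checksum(repomd_xml, data_type)
    return hashlib.new(algo, payload).hexdigest() == expected
```

Usage: read both files in binary mode (open(path, "rb").read()) and call checksum_matches(repomd, comps) and looks_like_html(comps). Doing this once for the failing Production tree and once for the working Dev tree would show whether the mismatch is in the stored metadata or in what gets served.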

We were not able to reproduce this with the kickstart we cloned from.  We 
verified that we get the same results when kickstarting through the proxy and 
directly against the main Spacewalk server.  We also tried creating a new 
kickstart clone and saw the same thing.

When I try to load the URL in a browser through the proxy, I get "file 
download failed."  If I try it in the browser against the main SW server, I get 
a pop-up that says "A server error has occurred."  However, since these log 
messages differ from the ones we see during the kickstart, I am not convinced 
that this is a valid way to check.  Is it?
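A browser request differs from anaconda's in at least its User-Agent. One way to make the test closer is to request the file with the same User-Agent that appears in the main server's access log, then look at the status, Content-Type and body size. Below is a minimal Python 3 sketch that does this. The helper names are hypothetical, and it is an assumption that the server treats this request the same way it treats anaconda's. It does not reproduce anything else the installer might send.

```python
import urllib.error
import urllib.request

# User-Agent copied from the main server's access log for the anaconda request.
ANACONDA_UA = "Scientific Linux (anaconda)/6.2"


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request carrying anaconda's User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": ANACONDA_UA})


def describe(url: str, timeout: float = 30.0) -> dict:
    """Fetch url and summarize status, Content-Type, body length and first bytes."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
            body = resp.read()
            status = resp.status
            content_type = resp.headers.get("Content-Type")
    except urllib.error.HTTPError as err:
        # Non-2xx responses (e.g. a 500 "server error" page) end up here.
        body = err.read()
        status = err.code
        content_type = err.headers.get("Content-Type")
    return {
        "status": status,
        "content_type": content_type,
        "length": len(body),
        "head": body[:80],
    }
```

For example, you could call describe() on the comps.xml URL from the vt3 log, once via the proxy and once via the main server. Comparing the two results, and comparing them with the same URL under the Staging kickstart, should show whether the same document comes back each time.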

We can eliminate the problem by changing the new kickstart to use our Dev 
channels.  If we use the Production channels, or the Staging channels (which 
apparently still work with the Staging kickstart we cloned from), the problem 
reappears.

I am unsure whether I've run into a bug or there has been some sort of 
corruption somewhere.  We haven't had any hardware events of note, and the 
PostgreSQL database doesn't appear to have any bad blocks or other issues.  One 
possibly related symptom is that Spacewalk monitoring had stopped working 
before we started working on the new kickstarts, and restarting all Spacewalk 
services did not clear the issue.  We ended up rebooting the Spacewalk server, 
and that did clear up the monitoring issue.

I've already tried adding and removing packages in our channels that are 
populated with rhnpush.  For the others, I have tried resyncing our Dev 
channels and then deleting and re-adding packages in the clone channels.  For 
some of them I can't fully do this, because we are running into 
https://bugzilla.redhat.com/show_bug.cgi?id=970315.  I had planned an upgrade 
to SW 2.0 to address that once we had completed the production release we were 
working on.  However, I am loath to upgrade to 2.0 with this issue unresolved, 
unless of course it's a bug in 1.9 that 2.0 fixes.  Searching this list and the 
wider internet hasn't turned up anyone else who has experienced this.

Everything else in Spacewalk appears to be working fine.  Even the channels 
that fail during kickstart can be assigned to systems.  After running yum clean 
all, yum installs packages from them with no complaints.

Does anyone have any ideas what might be going on here, or other ways to 
troubleshoot it?  And of course, any ideas on how to fix it are most welcome.

Thanks,  
Michael Guidero 
Sococo IT 

_______________________________________________
Spacewalk-list mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/spacewalk-list