> What's being done to diagnose this further?

The current diagnosis is that the disk driver reports SCSI errors on the
RAID controller. They don't get logged to disk, as the disk has failed...
Things go downhill from there, apparently ending in a kernel hang.
The RAID controller itself reports that it is in "optimal" condition
(i.e. no hardware failures). I'm hesitant to perform a firmware update
to the controller without physical access, in case the update fails.

> 2 outages in the space of 3 or 4 days doesn't look too good.

The current plan is to change the hosting "soon". If that fails or
stalls again, I would try to get the controller replaced (again; this
would be the third controller); it's uncertain whether doing so would
help. How exactly the replacement could be performed is also unclear.

> How does this correlate with pypi being moved to dinsdale?

Perhaps not at all. The controller also had a series of failures last
summer, but then "fixed" itself somehow. It could be overheating of
some component (not necessarily the RAID controller), which could
explain why the failures occur randomly, and with increased frequency
after putting more load on the machine.

Regards,
Martin
_______________________________________________
pydotorg-www mailing list
[email protected]
http://mail.python.org/mailman/listinfo/pydotorg-www
