> What's being done to diagnose this further?

The current diagnosis is that the disk driver reports SCSI errors on the
RAID controller. They don't get logged to disk, as the disk has
failed...

Things go downhill from there, apparently ending in a kernel hang.

The RAID controller itself reports that it is in "optimal" condition
(i.e. no hardware failures).

I'm hesitant to perform a firmware update to the controller without
physical access in case the update fails.

> 2 outages in the space of 3 or 4 days doesn't look too good.

The current plan is to change the hosting "soon". If that fails
or stalls again, I would try to get the controller replaced
(again; this would be the third controller); it's uncertain whether
doing so would help. How exactly the replacement could be performed
is also unclear.

> How does this correlate with pypi being moved to dinsdale?

Perhaps not at all. The controller had a series of failures last summer
also, but then "fixed" itself somehow. It could be overheating of some
component (not necessarily the RAID controller), which could explain
why it occurs randomly, and with increased frequency after putting more
load on the machine.

Regards,
Martin
_______________________________________________
pydotorg-www mailing list
[email protected]
http://mail.python.org/mailman/listinfo/pydotorg-www
