At 01:51 PM 1/2/2006, you wrote:
> I believe you when you say that you guys spend
> a lot of effort on the server, but I can't help wondering why it needs so
> much attention in the first place. None of my web apps need much
> intervention at all (and I have several high traffic ones under my care),
> and I don't even consider myself that much of a wizard sysadmin. Do you
> have any idea why your server is falling off the net so often? And why has
> it gotten so much worse in the past few months, compared to a couple of
> years back?

Restarting the server when it goes dead is a minimum amount of effort.

My best guess is that it dies for three reasons.

1) Bad data sent by a client that isn't rigorously checked by the server.
Buffer overruns on user names, passwords, or some other field are probably
putting the primenet server application into a hung state.

2) The backend database goes down.  An upgrade to SQL Server 2005 might help.
If the primenet server application handled errors better, that would help too.
This is the infamous ERROR 3 problem.

3) The manual web pages provide an opportunity to flood the server with arbitrary
text results.  Buffer overruns or mis-parsing of this text might have led to some
outages.

4) This isn't really an outage, but at the top of each hour, building the hourly
reports takes about ten minutes.  When the stats reports were first brought
online, the server was managing far, far fewer exponents.
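
A minimal sketch of the kind of rigorous checking that items 1 and 3 call for,
written in Python purely for illustration. The field names, the length limits,
and the validate_request helper are all hypothetical; none of them come from
the actual primenet server code or its protocol. The idea is only that every
client-supplied field is checked for presence, type, length, and stray control
characters before it gets anywhere near a fixed-size buffer or a parser.

```python
# Hypothetical limits and field names; NOT taken from the real primenet server.
MAX_LEN = {"user_id": 20, "password": 20, "result_text": 200}


def validate_request(fields):
    """Return a list of problems; an empty list means the request passed."""
    problems = []
    for name, limit in MAX_LEN.items():
        value = fields.get(name)
        if value is None:
            problems.append(f"{name}: missing")
        elif not isinstance(value, str):
            problems.append(f"{name}: not text")
        elif len(value) > limit:
            # Reject oversized input instead of copying it into a buffer.
            problems.append(f"{name}: longer than {limit} characters")
        elif not value.isprintable():
            # Control characters (NUL, newlines, ...) can confuse a parser.
            problems.append(f"{name}: contains control characters")
    return problems
```

A server built along these lines would answer a request that fails validation
with an error reply and carry on, rather than passing the data further in.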

> I would like to see some information on what's wrong, and what you guys are
> doing to fix it. The primenet server appears to be a neglected box stuffed
> away somewhere out of sight because I never saw any news about the ongoing
> work.

That's a pretty accurate account of the current state of affairs. Nothing is being
done to track down these bugs and fix them.

> Perhaps what we really need to do at this stage is to publish a formal spec
> for the server and its interaction with the client.

For the curious, a spec for the next client-server interface is at http://v5.mersenne.org/v5design/v5webAPI_0.96.html

> At that stage it just
> might be possible that someone would invest time in rewriting the server code
> so that it can be implemented in a distributed, hardware/OS independent
> method so that reliance on a single box and effectively a single sysadmin can
> be removed from this project.

All efforts have been directed toward replacing the primenet server application
with a new, more robust, bulletproof application written from scratch.  Scott
and I have been working on it for about 4 months, but at nowhere near the
40 hours a week required to make this happen quickly.

Sorry for the problems,
George
_______________________________________________
Prime mailing list
[email protected]
http://hogranch.com/mailman/listinfo/prime
