"Paddy" <[EMAIL PROTECTED]> writes:
> > But you're proposing cargo cult programming.
> i don't know that term.

http://en.wikipedia.org/wiki/Cargo_cult_programming

> What I'm proposing is that if, for example, a process stops running
> three times in a year at roughly three to four month intervals,
> and it should have stayed up; then restart the server sooner, at a
> time of your choosing,

What makes you think that restarting the server will make it less
likely to fail? It sounds to me like there's zero evidence of that,
since you say "roughly three or four month intervals" and talk about
threading and race conditions. If it's failing every 3 months, 15 days
and 2.43 hours like clockwork, that's different; sure, restart it every
three months. But the description I see so far sounds like a random
failure caused by some events occurring with low enough probability
that they only happen on average every few months of operation. That
kind of thing is very common and is often best diagnosed by
instrumenting the hell out of the code.

> > There is no reason whatsoever to expect that restarting the server
> > now and then will help the problem in the slightest.
> That's where we most likely differ.

Do you think there is a reason to expect that restarting the server
will help the problem in the slightest? I realize you seem to expect
that, but you have not given a REASON. That's what I mean by cargo
cult programming.

> Whilst you sit agreeing on how many fairies can dance on the end of a
> pin or not, your company could be losing customers. You and Nick seem
> to be saying it *must* be Poisson, therefore we can't do...

I dunno about Nick; I'm saying it's best to assume that it's Poisson
and do whatever is necessary to diagnose and fix the bug, and that the
voodoo measure you're proposing is not all that likely to help, and it
will take years to find out whether it helps or not (i.e. restarting
after 3 months and going another 3 months without a failure proves
nothing).

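To make the Poisson point concrete, here is a rough simulation sketch.
It assumes the failures really are memoryless (exponentially
distributed time to failure) and that a restart simply resets the
process. The function name, the 100-day mean and the 90-day restart
schedule are all made up for illustration; nothing here comes from the
actual server.

```python
import random


def count_failures(mean_days, horizon_days, restart_every=None, seed=0):
    """Count unplanned failures over horizon_days of simulated operation.

    Assumption: time to failure is exponential with mean mean_days,
    i.e. the failure process is memoryless (Poisson).
    If restart_every is given, the server is restarted on that schedule
    whenever it survives that long; a failure also restarts it.
    """
    rng = random.Random(seed)
    now = 0.0
    failures = 0
    while now < horizon_days:
        time_to_failure = rng.expovariate(1.0 / mean_days)
        if restart_every is not None and time_to_failure >= restart_every:
            # The planned restart happens before this failure would have.
            now += restart_every
        else:
            now += time_to_failure
            if now < horizon_days:
                failures += 1
    return failures


if __name__ == "__main__":
    horizon = 1_000_000.0  # simulated days, long enough to average out noise
    print("no restarts:     ", count_failures(100.0, horizon, None, seed=1))
    print("restart every 90:", count_failures(100.0, horizon, 90.0, seed=2))
```

Under that assumption both runs should report roughly the same number
of failures, because a memoryless process does not care how long it
has been up. Scheduled restarts only help if the failure rate actually
grows with uptime, and that is exactly the thing there is no evidence
for yet.
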
> I'm sorry, but your argument reminds me of when Western statistical
> quality control first met with the Japanese Zero defects methodologies.
> We had argued ourselves into accepting a certain amount of defective
> cars getting out to customers as the result of our theories. The
> Japanese practices emphasized *no* defects were acceptable at the
> customer, and they seemed to deliver better made cars.

I don't see your point. You're the one who wants to keep operating
defective software instead of fixing it.

> "at random" - "every few months"
> Me thinking it happens "every few months" allows me to search for a
> fix. If thinking it happens "at random" leads you to a brick wall,
> then switch!

But you need evidence before you can say it happens every few months.
Do you have, say, a graph of the exact dates and times of failure, the
number of requests processed so far, etc.? If it happened at some exact
or almost exact uniform time interval or precisely once every 1.273
million requests or whatever, that tells you something. But the earlier
description didn't sound like that. Restarting the server is not much
better than carrying a lucky rabbit's foot.
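
A minimal sketch of the kind of evidence meant here: take the logged
failure timestamps and look at how regular the gaps between them are.
The coefficient of variation (standard deviation of the gaps divided by
their mean) is near 0 for clockwork failures and tends toward 1 for
exponential, Poisson-style gaps. The function name and the timestamps
below are hypothetical, and with only three or four failures the number
is a weak hint at best.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev


def interval_cv(failure_times):
    """Return the coefficient of variation of the gaps between failures.

    Near 0: failures arrive at an almost fixed interval.
    Around 1: gaps look like what a memoryless process would produce.
    """
    times = sorted(failure_times)
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three failure timestamps")
    return stdev(gaps) / mean(gaps)


if __name__ == "__main__":
    # Hypothetical failure log, invented purely for illustration.
    first = datetime(2008, 1, 14, 3, 20)
    failures = [
        first,
        first + timedelta(days=97, hours=5),
        first + timedelta(days=221, hours=19),
        first + timedelta(days=305, hours=2),
    ]
    print("coefficient of variation:", interval_cv(failures))
```

The same calculation works on requests served between failures
instead of wall-clock time, if the server logs a request counter.
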