I can't help flogging a dead horse, but... are you serious? The next server to go down is the one your Zabbix alert names when it says, "Server X is down."
Until then, do something productive, dammit.

Sent from my iPhone

On Sep 10, 2011, at 1:10 AM, Lance Norskog <[email protected]> wrote:

> Ah! The Butter-Side-Down Predictor.
>
> On Fri, Sep 9, 2011 at 10:38 PM, Matt Pinner <[email protected]> wrote:
>
>> Easy. The most important, least redundant, and single points of
>> failure will fail next.
>>
>> On Sep 9, 2011 8:33 PM, "Mike Nute" <[email protected]> wrote:
>>> IMO, the best approach would depend on your beliefs about the
>>> survival curve of the server. If you believe the general hazard rate
>>> is relatively constant (i.e. time-since-startup is not a huge
>>> factor), you could make it into a basic time series logistic
>>> regression problem: Let Y_i_t be 1 if server i fails at time t, 0 if
>>> it does not. Let X_i_(t-1) be the vector of measurements on server i
>>> at time (t-1). Then do logistic regression of Y on X. You could then
>>> add X_i_(t-2) to your predictors and see if it adds accuracy, and so
>>> on with previous time periods until they stop being predictive.
>>>
>>> That would also facilitate experimenting with transformations like
>>> the change in certain measurements at (t-1), (t-2), etc., or
>>> interactions between certain measurements.
>>>
>>> If different failure classes are important, you could similarly
>>> apply that to multinomial logistic regression.
>>>
>>> If the failure rate depends heavily on time since startup, you could
>>> apply some kind of survival modeling technique like a Cox
>>> proportional hazards model, or incorporate some prior belief about
>>> the shape of the survival curve. That could end up being technically
>>> similar to the logistic regression above, but with a more exotic
>>> link function and/or offset term. (I have a good brief chapter on
>>> the CPH model from an old actuarial exam study guide in pdf if you
>>> want it. Survival models are actuary staples :-).)
>>>
>>> Hope that helps.
>>>
>>> Mike Nute
>>>
>>> ------Original Message------
>>> From: Lance Norskog
>>> To: user
>>> ReplyTo: [email protected]
>>> Subject: Predictive analysis problem
>>> Sent: Sep 9, 2011 10:45 PM
>>>
>>> Let's say you manage 2000 servers in a huge datacenter. You have
>>> regularly sampled stats, with uniform methods: that is, they are all
>>> sampled the same way across all servers across the full time series.
>>> This data is a cube of (server X time X measurement type), with a
>>> measurement in each cell.
>>>
>>> You also have a time series of system failures, a matrix of server X
>>> failure class. What algorithm will predict which server will fail
>>> next, and when and how?
>>>
>>> --
>>> Lance Norskog
>>> [email protected]
>
> --
> Lance Norskog
> [email protected]
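
For anyone who wants to try the lagged logistic regression setup Mike Nute describes in the quoted thread, here is a minimal sketch in plain Python. It is not anyone's production method. The synthetic data, the names (`cube`, `failures`, `build_lagged_rows`, `fit_logistic`, `predict_proba`), the two made-up measurements, and the simple gradient-descent fit are all assumptions for illustration. The "true" failure rule used to generate the data is invented so there is something to recover.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def build_lagged_rows(cube, failures, lags=1):
    """Pair Y_i_t with the measurements X_i_(t-1) ... X_i_(t-lags).

    cube[i][t] is the list of measurements for server i at time t;
    failures[i][t] is 1 if server i failed at time t, else 0.
    """
    X, y = [], []
    for series, fails in zip(cube, failures):
        for t in range(lags, len(series)):
            row = []
            for k in range(1, lags + 1):
                row.extend(series[t - k])
            X.append(row)
            y.append(fails[t])
    return X, y


def fit_logistic(X, y, lr=0.5, epochs=300):
    # Batch gradient descent on the mean logistic loss (no regularization).
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - target
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, row):
    # Estimated probability that a server fails in the next period.
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, row)))


# Synthetic stand-in for the (server x time x measurement) cube.
# Measurement 0 drives failures (invented rule); measurement 1 is noise.
random.seed(0)
N_SERVERS, N_STEPS = 20, 50
cube = [
    [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(N_STEPS)]
    for _ in range(N_SERVERS)
]
failures = []
for series in cube:
    fails = [0]  # no lagged measurement exists for t = 0
    for t in range(1, N_STEPS):
        p_fail = sigmoid(-3.0 + 2.5 * series[t - 1][0])
        fails.append(1 if random.random() < p_fail else 0)
    failures.append(fails)

X, y = build_lagged_rows(cube, failures, lags=1)
w, b = fit_logistic(X, y)
hot = predict_proba(w, b, [2.0, 0.0])
cool = predict_proba(w, b, [-2.0, 0.0])
print("weights:", w, "intercept:", b)
print("P(fail | high reading):", hot, " P(fail | low reading):", cool)
```

Calling `build_lagged_rows(cube, failures, lags=2)` appends X_i_(t-2) to each row, which is one way to check whether an extra lag adds accuracy on held-out data. Scoring the latest row for every server with `predict_proba` and sorting gives a rough "most likely to fail next" ranking, under the constant-hazard assumption from the thread.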
