Ah! The Butter-Side-Down Predictor.

On Fri, Sep 9, 2011 at 10:38 PM, Matt Pinner <[email protected]> wrote:
> Easy. The most important, least redundant, and single points of
> failure will fail next.
>
> On Sep 9, 2011 8:33 PM, "Mike Nute" <[email protected]> wrote:
>
> > IMO, the best approach would depend on your beliefs about the
> > survival curve of the server. If you believe the general hazard
> > rate is relatively constant (i.e. time-since-startup is not a huge
> > factor) you could make it into a basic time series logistic
> > regression problem: Let Y_i_t be 1 if server i fails at time t, 0
> > if it does not. Let X_i_(t-1) be the vector of measurements on
> > server i at time (t-1). Then do logistic regression of Y on X. You
> > could then add X_i_(t-2) to your predictors and see if it adds
> > accuracy, and so on with previous time periods until they stop
> > being predictive.
> >
> > That would also facilitate experimenting with transformations like
> > the change in certain measurements at (t-1), (t-2), etc., or
> > interactions between certain measurements.
> >
> > If different failure classes are important, you could similarly
> > apply that to multinomial logistic regression.
> >
> > If the failure rate depends heavily on time since startup, you
> > could apply some kind of survival modeling technique like a Cox
> > Proportional Hazard model, or incorporate some prior belief about
> > the shape of the survival curve. That could end up being
> > technically similar to the logistic regression above, but with a
> > more exotic link function and/or offset term. (I have a good brief
> > chapter on the CPH model from an old actuarial exam study guide in
> > pdf if you want it. Survival models are actuary staples :-).)
> >
> > Hope that helps.
> >
> > Mike Nute
> >
> > ------Original Message------
> > From: Lance Norskog
> > To: user
> > ReplyTo: [email protected]
> > Subject: Predictive analysis problem
> > Sent: Sep 9, 2011 10:45 PM
> >
> > Let's say you manage 2000 servers in a huge datacenter. You have
> > regularly sampled stats, with uniform methods: i.e., they are all
> > sampled the same way across all servers across the full time
> > series. This data is a cube of (server X time X measurement type),
> > with a measurement in each cell.
> >
> > You also have a time series of system failures, a matrix of
> > server X failure class. What algorithm will predict which server
> > will fail next, and when and how?
> >
> > --
> > Lance Norskog
> > [email protected]

--
Lance Norskog
[email protected]
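
For anyone who wants to try the lagged logistic regression idea quoted above, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the data is synthetic, there is a single made-up measurement per server, and the helper names (synthetic_data, make_lagged_rows, fit_logistic, predict) are hypothetical, not from any library or from the thread. The fit is a simple batch gradient descent rather than whatever solver a real stats package would use.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_lagged_rows(cube, failures, lags=1):
    """Build (x, y) rows: x = measurements at t-1 .. t-lags, y = Y_i_t.

    cube[i][t] is the measurement vector for server i at time t;
    failures[i][t] is 1 if server i failed at time t, else 0.
    """
    rows = []
    for i, series in enumerate(cube):
        for t in range(lags, len(series)):
            x = []
            for k in range(1, lags + 1):
                x.extend(series[t - k])
            rows.append((x, failures[i][t]))
    return rows


def predict(w, x):
    # w[0] is the intercept; w[1:] are the coefficients on x.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(rows, steps=300, lr=0.1):
    """Fit weights by batch gradient descent on the mean log loss."""
    w = [0.0] * (len(rows[0][0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in rows:
            err = predict(w, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(rows) for wj, gj in zip(w, grad)]
    return w


def synthetic_data(n_servers=50, n_steps=40, seed=0):
    """Assumed toy data: failure odds at t rise with the reading at t-1."""
    rng = random.Random(seed)
    cube, failures = [], []
    for _ in range(n_servers):
        series = [[rng.gauss(0.0, 1.0)] for _ in range(n_steps)]
        fails = [0] * n_steps
        for t in range(1, n_steps):
            p = sigmoid(-2.0 + 2.0 * series[t - 1][0])
            fails[t] = 1 if rng.random() < p else 0
        cube.append(series)
        failures.append(fails)
    return cube, failures


cube, failures = synthetic_data()
rows = make_lagged_rows(cube, failures, lags=1)
w = fit_logistic(rows)
```

To test whether X_i_(t-2) adds anything, one could rebuild the rows with lags=2, refit, and compare the log loss of the two models on held-out servers. On real data the measurements would likely need scaling first, and failures are usually rare enough that class imbalance becomes its own problem.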
