Ah! The Butter-Side-Down Predictor.

On Fri, Sep 9, 2011 at 10:38 PM, Matt Pinner <[email protected]> wrote:

> Easy. The most important, least redundant, and single points of failure
> will fail next.
> On Sep 9, 2011 8:33 PM, "Mike Nute" <[email protected]> wrote:
> > IMO, the best approach would depend on your beliefs about the survival
> > curve of the server. If you believe the general hazard rate is relatively
> > constant (i.e. time-since-startup is not a huge factor), you could make it
> > into a basic time series logistic regression problem: Let Y_i_t be 1 if
> > server i fails at time t, 0 if it does not. Let X_i_(t-1) be the vector of
> > measurements on server i at time (t-1). Then do logistic regression of Y
> > on X. You could then add X_i_(t-2) to your predictors and see if it adds
> > accuracy, and so on with previous time periods until they stop being
> > predictive.
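
A minimal sketch of the lagged logistic regression described above, in plain
Python with no external libraries. The names (make_lagged_rows, fit_logistic,
predict_proba), the cube[i][t] layout, and the toy data are all assumptions
made for illustration; nothing here comes from a real dataset, and a real
setup would more likely use an existing statistics package.

```python
import math
import random

def make_lagged_rows(cube, failures, n_lags=1):
    """cube[i][t] is the measurement vector for server i at time t;
    failures[i][t] is 1 if server i failed at time t, else 0.
    Each row of X concatenates the n_lags previous measurement vectors
    (X_i_(t-1), X_i_(t-2), ...) and y holds Y_i_t."""
    X, y = [], []
    for i, series in enumerate(cube):
        for t in range(n_lags, len(series)):
            row = []
            for lag in range(1, n_lags + 1):
                row.extend(series[t - lag])
            X.append(row)
            y.append(failures[i][t])
    return X, y

def sigmoid(z):
    # Split by sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, b, row):
    """Estimated probability of failure for one feature row."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, row)))

def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Batch gradient descent on the mean logistic loss (no regularization)."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for row, target in zip(X, y):
            err = predict_proba(w, b, row) - target
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Toy data (made up): 20 servers, 30 time steps, one measurement in [0, 1).
# A server "fails" at time t when its measurement at t-1 exceeded 0.7.
rng = random.Random(0)
cube = [[[rng.random()] for _ in range(30)] for _ in range(20)]
failures = [[0] * 30 for _ in range(20)]
for i in range(20):
    for t in range(1, 30):
        failures[i][t] = 1 if cube[i][t - 1][0] > 0.7 else 0

X, y = make_lagged_rows(cube, failures, n_lags=1)
w, b = fit_logistic(X, y)
p_high = predict_proba(w, b, [0.9])  # server whose last reading was high
p_low = predict_proba(w, b, [0.1])   # server whose last reading was low
```

Calling make_lagged_rows with n_lags=2 adds X_i_(t-2) to each row, so the
"does another lag help" question becomes a comparison of held-out accuracy
between the two fits.
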
> >
> > That would also facilitate experimenting with transformations like the
> > change in certain measurements at (t-1), (t-2), etc., or interactions
> > between certain measurements.
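
One way the transformations mentioned above could look, as a rough sketch.
The function name and the choice of which two measurements to multiply (a
and b) are hypothetical; the toy series is made up.

```python
def delta_and_interaction_row(series, t, a=0, b=1):
    """Build a feature row for predicting time t from one server's series.

    series[t] is that server's measurement vector at time t. The row holds
    the measurements at (t-1), their change from (t-2) to (t-1), and the
    product of measurements a and b at (t-1) as an interaction term.
    """
    prev, prev2 = series[t - 1], series[t - 2]
    deltas = [x1 - x2 for x1, x2 in zip(prev, prev2)]
    interaction = prev[a] * prev[b]
    return list(prev) + deltas + [interaction]

# Made-up series for one server: two measurements at three time steps.
series = [[0.25, 1.0], [0.5, 2.0], [1.0, 4.0]]
row = delta_and_interaction_row(series, t=2)
```

Rows built this way can be fed to the same logistic regression in place of
the raw lagged measurements.
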
> >
> > If different failure classes are important, you could similarly apply
> > that to multinomial logistic regression.
> >
> > If the failure rate depends heavily on time since startup, you could
> > apply some kind of survival modeling technique, like a Cox Proportional
> > Hazards model, or incorporate some prior belief about the shape of the
> > survival curve. That could end up being technically similar to the
> > logistic regression above, but with a more exotic link function and/or
> > offset term. (I have a good brief chapter on the CPH model from an old
> > actuarial exam study guide in pdf if you want it. Survival models are
> > actuary staples :-).)
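
To get a feel for the shape of the survival curve before picking between the
two approaches, an empirical estimate may be enough. The sketch below is a
Kaplan-Meier estimator, which is a simpler technique than the Cox model
named above and is swapped in here only because it fits in a few lines. The
function name and the toy uptimes are made up for illustration.

```python
def kaplan_meier(durations, observed):
    """Empirical survival curve from server uptimes.

    durations[k]: uptime of run k until it failed or observation stopped.
    observed[k]:  True if run k ended in a failure, False if it was still
                  up when observation stopped (censored).
    Returns a list of (time, survival_probability) pairs, one per distinct
    failure time, in increasing time order.
    """
    event_times = sorted({d for d, o in zip(durations, observed) if o})
    surv = 1.0
    curve = []
    for t in event_times:
        # Runs still up just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Runs that failed exactly at time t.
        failed = sum(1 for d, o in zip(durations, observed) if o and d == t)
        surv *= 1.0 - failed / at_risk
        curve.append((t, surv))
    return curve

# Made-up uptimes (in days) for five server runs; two were still running
# when observation stopped.
durations = [2, 4, 4, 6, 8]
observed = [True, True, False, True, False]
curve = kaplan_meier(durations, observed)
```

If the log of the estimated survival probability falls roughly linearly with
time, a constant hazard rate is plausible and the plain logistic regression
is probably fine; a strong bend suggests time since startup matters.
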
>
> >
> > Hope that helps.
> >
> > Mike Nute
> >
> >
> > ------Original Message------
> > From: Lance Norskog
> > To: user
> > ReplyTo: [email protected]
> > Subject: Predictive analysis problem
> > Sent: Sep 9, 2011 10:45 PM
> >
> > Let's say you manage 2000 servers in a huge datacenter. You have
> > regularly sampled stats, with uniform methods: i.e., they are all sampled
> > the same way across all servers across the full time series. This data is
> > a cube of (server X time X measurement type), with a measurement in each
> > cell.
> >
> > You also have a time series of system failures, a matrix of server X
> > failure class. What algorithm will predict which server will fail next,
> > and when and how?
> >
> > --
> > Lance Norskog
> > [email protected]
> >
> >
>



-- 
Lance Norskog
[email protected]
