IMO, the best approach depends on your beliefs about the survival curve of 
the servers. If you believe the hazard rate is relatively constant (i.e. 
time-since-startup is not a huge factor), you could treat it as a basic 
time-series logistic regression problem: Let Y_i_t be 1 if server i fails at 
time t, 0 if it does not. Let X_i_(t-1) be the vector of measurements on server 
i at time (t-1). Then do a logistic regression of Y on X. You could then add 
X_i_(t-2) to your predictors and see if it improves accuracy, and so on with 
earlier time periods until they stop being predictive. 
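
A minimal sketch of that setup, assuming the cube is held as nested lists 
(cube[i][t] is the measurement vector for server i at time t) and failures[i][t] 
is 0 or 1. The function names, the synthetic data and the plain gradient-descent 
fit are all illustrative assumptions; in practice you would use whatever 
logistic regression routine your stats package provides.

```python
import math
import random

def build_lagged_rows(cube, failures, n_lags=1):
    """Pair Y_i_t with the predictors X_i_(t-1), ..., X_i_(t-n_lags)."""
    rows, labels = [], []
    for series, fails in zip(cube, failures):
        for t in range(n_lags, len(series)):
            features = []
            for lag in range(1, n_lags + 1):
                features.extend(series[t - lag])
            rows.append(features)
            labels.append(fails[t])
    return rows, labels

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, x):
    # weights = [bias, w1, w2, ...]
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))

def fit_logistic(rows, labels, lr=0.1, epochs=200):
    """Batch gradient descent on the log-loss; returns [bias, w1, ...]."""
    n_features = len(rows[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(rows, labels):
            err = predict_proba(weights, x) - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

# Synthetic stand-in data (an assumption, not real server stats):
# 50 servers, 40 periods, one measurement; a high reading at (t-1)
# raises the chance of failure at t.
random.seed(0)
cube, failures = [], []
for _ in range(50):
    series = [[random.gauss(0.0, 1.0)] for _ in range(40)]
    fails = [0]
    for t in range(1, 40):
        p_fail = sigmoid(2.0 * series[t - 1][0] - 2.0)
        fails.append(1 if random.random() < p_fail else 0)
    cube.append(series)
    failures.append(fails)

rows, labels = build_lagged_rows(cube, failures, n_lags=1)
weights = fit_logistic(rows, labels)
```

Calling build_lagged_rows with n_lags=2 appends X_i_(t-2) to each row, so you 
can refit and compare held-out accuracy to see whether the extra lag helps.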

That setup also makes it easy to experiment with transformations, like the 
change in certain measurements at (t-1), (t-2), etc., or interactions between 
certain measurements.
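
As a rough illustration of such transformations, the helper below (a 
hypothetical name, not from any library) takes the measurement vectors at 
(t-1) and (t-2) and appends their difference plus any pairwise products you 
ask for. Which deltas and interactions are worth keeping is something to 
find out by fitting and comparing.

```python
def engineer_features(x_t1, x_t2, interaction_pairs=()):
    """Return the (t-1) measurements, their change since (t-2),
    and products of the requested measurement pairs at (t-1)."""
    deltas = [a - b for a, b in zip(x_t1, x_t2)]
    interactions = [x_t1[j] * x_t1[k] for j, k in interaction_pairs]
    return list(x_t1) + deltas + interactions

# Two made-up measurements per period, with one interaction between them.
features = engineer_features([3.0, 10.0], [1.0, 4.0], interaction_pairs=[(0, 1)])
```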

If different failure classes are important, you could apply the same setup 
with multinomial logistic regression.
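
To make the multinomial version concrete, here is a small sketch of how such 
a model turns one feature vector into a probability per outcome. The weights 
and the failure class names (disk, network) are made up for illustration and 
are not fitted to anything; treating "no failure" as the reference class is 
also an assumption.

```python
import math

def softmax_probs(class_weights, x):
    """class_weights[k] = [bias, w1, w2, ...] for outcome k.
    Returns one probability per outcome; they sum to 1."""
    scores = [w[0] + sum(c * v for c, v in zip(w[1:], x)) for w in class_weights]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical weights: outcome 0 = no failure (reference class, all zeros),
# outcome 1 = disk failure, outcome 2 = network failure.
class_weights = [
    [0.0, 0.0, 0.0],
    [-3.0, 1.5, 0.0],
    [-3.0, 0.0, 1.5],
]
probs = softmax_probs(class_weights, [2.0, 0.0])
```

With these made-up weights a high first measurement shifts probability toward 
the disk class and leaves the network class unlikely.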

If the failure rate depends heavily on time since startup, you could apply some 
kind of survival modeling technique, such as a Cox proportional hazards (CPH) 
model, or incorporate some prior belief about the shape of the survival curve. 
That could end up being technically similar to the logistic regression above, 
but with a more exotic link function and/or offset term. (I have a good brief 
chapter on the CPH model from an old actuarial exam study guide in pdf if you 
want it. Survival models are actuary staples :-).) 
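
One way to see the "similar to logistic regression" point: a common choice for 
discrete time periods (my assumption here, not the only option) is to swap the 
logit for a complementary log-log link and give each uptime interval its own 
intercept. The sketch below uses made-up baseline intercepts and a made-up 
coefficient; nothing is fitted.

```python
import math

def cloglog_hazard(eta):
    """Inverse complementary log-log link:
    P(fail in this interval | survived so far)."""
    return 1.0 - math.exp(-math.exp(eta))

def survival_curve(baseline, beta, x):
    """baseline[k] is a hypothetical intercept for uptime interval k;
    beta and x are the coefficient and measurement vectors."""
    score = sum(b * v for b, v in zip(beta, x))
    surv, curve = 1.0, []
    for alpha_k in baseline:
        surv *= 1.0 - cloglog_hazard(alpha_k + score)
        curve.append(surv)
    return curve

baseline = [-3.0, -2.5, -2.0]   # hazard rising with time since startup
beta = [0.7]
base_curve = survival_curve(baseline, beta, [0.0])
risky_curve = survival_curve(baseline, beta, [1.0])
```

Under this link the covariates act proportionally on the hazard: the survival 
curve for a server with score beta.x equals the baseline survival curve raised 
to the power exp(beta.x), which is the same structure the CPH model assumes.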

Hope that helps.

Mike Nute


------Original Message------
From: Lance Norskog
To: user
ReplyTo: [email protected]
Subject: Predictive analysis problem
Sent: Sep 9, 2011 10:45 PM

Let's say you manage 2000 servers in a huge datacenter. You have regularly
sampled stats, with uniform methods: i.e., they are all sampled the same way
across all servers across the full time series. This data is a cube of
(server X time X measurement type), with a measurement in each cell.

You also have a time series of system failures, a matrix of server X failure
class. What algorithm will predict which server will fail next, and when and
how?

-- 
Lance Norskog
[email protected]

