I can't help flogging a dead horse, but... are you serious?

The next server that goes down is the one your Zabbix alert names when it
says, "Server X is down."

Until then, do something productive dammit. 

Sent from my iPhone

On Sep 10, 2011, at 1:10 AM, Lance Norskog <[email protected]> wrote:

> Ah! The Butter-Side-Down Predictor.
> 
> On Fri, Sep 9, 2011 at 10:38 PM, Matt Pinner <[email protected]> wrote:
> 
>> Easy. Whatever is most important, least redundant, and a single point of
>> failure will fail next.
>> On Sep 9, 2011 8:33 PM, "Mike Nute" <[email protected]> wrote:
>>> IMO, the best approach depends on your beliefs about the survival curve
>>> of the server. If you believe the hazard rate is roughly constant (i.e.
>>> time since startup is not a big factor), you could treat it as a basic
>>> time-series logistic regression problem: let Y_i_t be 1 if server i
>>> fails at time t and 0 if it does not, and let X_i_(t-1) be the vector of
>>> measurements on server i at time (t-1). Then do a logistic regression of
>>> Y on X. You could then add X_i_(t-2) to your predictors and see if it
>>> improves accuracy, and so on with earlier time periods until they stop
>>> being predictive.
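
A minimal sketch of the lagged setup described above, in plain Python. The
function names, the toy data generator, and the two made-up measurements are
all assumptions for illustration and are not from the thread; the fit is a
bare-bones gradient descent standing in for whatever regression package you
would really use.

```python
import math
import random

def build_lagged_rows(cube, failures, n_lags=1):
    """cube[i][t] is the list of measurements for server i at time t;
    failures[i][t] is 1 if server i failed at time t, else 0.
    Each row of X concatenates the measurements at t-1, ..., t-n_lags."""
    X, y = [], []
    for i, series in enumerate(cube):
        for t in range(n_lags, len(series)):
            row = []
            for lag in range(1, n_lags + 1):
                row.extend(series[t - lag])
            X.append(row)
            y.append(failures[i][t])
    return X, y

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, b, row):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b)

def fit_logistic(X, y, lr=0.1, epochs=200):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for row, target in zip(X, y):
            err = predict_proba(w, b, row) - target
            for j in range(d):
                gw[j] += err * row[j]
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def make_toy_data(n_servers=20, n_steps=40, seed=0):
    """Synthetic stand-in for the real cube (hypothetical): failure at time t
    is more likely when measurement 0 was high at t-1."""
    rng = random.Random(seed)
    cube, failures = [], []
    for _ in range(n_servers):
        series = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_steps)]
        fails = [0] + [int(rng.random() < sigmoid(2.0 * series[t - 1][0] - 3.0))
                       for t in range(1, n_steps)]
        cube.append(series)
        failures.append(fails)
    return cube, failures

if __name__ == "__main__":
    cube, failures = make_toy_data()
    X, y = build_lagged_rows(cube, failures, n_lags=1)
    w, b = fit_logistic(X, y)
    # Rank servers by predicted failure probability for the next time step,
    # using each server's most recent measurements as the lag-1 predictors.
    scores = [(predict_proba(w, b, series[-1]), i) for i, series in enumerate(cube)]
    print(sorted(scores, reverse=True)[:3])
```

Raising n_lags adds X_i_(t-2) and earlier to each row, so you can compare
held-out accuracy with and without the extra lags.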
>>> 
>>> That would also make it easy to experiment with transformations, like
>>> the change in certain measurements at (t-1), (t-2), etc., or with
>>> interactions between certain measurements.
>>> 
>>> If different failure classes are important, you could apply the same
>>> setup with multinomial logistic regression.
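
For reference, the multinomial version can be written as follows, assuming
K failure classes with "no failure" as reference class 0. The symbols
\beta_k and K are notation introduced here, not from the thread.

```latex
P(Y_{i,t} = k \mid X_{i,t-1})
  = \frac{\exp(\beta_k^\top X_{i,t-1})}
         {1 + \sum_{j=1}^{K} \exp(\beta_j^\top X_{i,t-1})},
  \qquad k = 1, \dots, K,
\qquad
P(Y_{i,t} = 0 \mid X_{i,t-1})
  = \frac{1}{1 + \sum_{j=1}^{K} \exp(\beta_j^\top X_{i,t-1})}.
```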
>>> 
>>> If the failure rate depends heavily on time since startup, you could
>>> apply a survival modeling technique such as a Cox proportional hazards
>>> model, or incorporate some prior belief about the shape of the survival
>>> curve. That could end up being technically similar to the logistic
>>> regression above, but with a more exotic link function and/or an offset
>>> term. (I have a good brief chapter on the CPH model from an old actuarial
>>> exam study guide in PDF if you want it. Survival models are actuary
>>> staples :-).)
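
One way to see the "more exotic link function" remark, as a sketch: take
the Cox model in continuous time and group it into the sampling intervals,
assuming the measurements are held constant within each interval. The
symbols h_0, \alpha_t and p_{i,t} are notation introduced here.

```latex
h_i(s) = h_0(s)\,\exp(\beta^\top X_{i,t-1}), \qquad s \in (t-1,\, t]

p_{i,t} = P(\text{server } i \text{ fails in } (t-1, t] \mid \text{up at } t-1)
        = 1 - \exp\!\Big(-\exp(\beta^\top X_{i,t-1}) \int_{t-1}^{t} h_0(s)\,ds\Big)

\log\!\big(-\log(1 - p_{i,t})\big) = \alpha_t + \beta^\top X_{i,t-1},
\qquad \alpha_t = \log \int_{t-1}^{t} h_0(s)\,ds
```

So the grouped model is a binary regression on the same (X, Y) rows as the
logistic setup, but with a complementary log-log link in place of the logit
and a per-interval intercept \alpha_t carrying the baseline hazard.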
>> 
>>> 
>>> Hope that helps.
>>> 
>>> Mike Nute
>>> 
>>> 
>>> ------Original Message------
>>> From: Lance Norskog
>>> To: user
>>> ReplyTo: [email protected]
>>> Subject: Predictive analysis problem
>>> Sent: Sep 9, 2011 10:45 PM
>>> 
>>> Let's say you manage 2000 servers in a huge datacenter. You have
>>> regularly sampled stats, collected with uniform methods: i.e., they are
>>> all sampled the same way across all servers and across the full time
>>> series. This data is a cube of (server X time X measurement type), with
>>> a measurement in each cell.
>>> 
>>> You also have a time series of system failures, a matrix of server X
>>> failure class. What algorithm will predict which server will fail next,
>>> and when and how?
>>> 
>>> --
>>> Lance Norskog
>>> [email protected]
>>> 
>>> 
>> 
> 
> 
> 
> -- 
> Lance Norskog
> [email protected]
