"Prediction is very difficult, especially about the future" Niels Bohr

I would first ask about evaluation techniques: how would one verify
that the predictions "make sense"? And second: how will the prediction
be used? One can predict that on average N servers will fail within a
month, or even narrow the prediction to a group of servers with a
higher probability of failure, but how will that prediction be used?
It seems that future actions should affect what the model predicts,
and a model built on past "non-actionable" data may have little
predictive power for the future, if any at all.
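
For instance, one could backtest on a temporal holdout: train on the
first part of the history, predict failures in a held-out window, and
check precision and recall against the failures that actually happened.
A minimal sketch (assuming scikit-learn and a hypothetical per-server
feature table df with a 0/1 "failed" column; CUTOFF is a made-up split
point):

    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import precision_score, recall_score

    # rows before the cutoff train the model; later rows evaluate it
    train = df[df["time"] < CUTOFF]
    test = df[df["time"] >= CUTOFF]

    model = LogisticRegression(max_iter=1000)
    model.fit(train[features], train["failed"])
    pred = model.predict(test[features])
    print(precision_score(test["failed"], pred),
          recall_score(test["failed"], pred))

And even a model that backtests well degrades once operators act on it:
replacing the servers it flags changes the very population the model
was trained on.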

--Konstantin

On Sat, Sep 10, 2011 at 1:38 AM, highpointe <[email protected]> wrote:
> I can't help but flog a dead horse but...  Are you serious?
>
> The next server that goes down is the one your Zabbix alerts say, "Server X 
> is down."
>
> Until then, do something productive dammit.
>
> Sent from my iPhone
>
> On Sep 10, 2011, at 1:10 AM, Lance Norskog <[email protected]> wrote:
>
>> Ah! The Butter-Side-Down Predictor.
>>
>> On Fri, Sep 9, 2011 at 10:38 PM, Matt Pinner <[email protected]> wrote:
>>
>>> Easy. The most important, least redundant, and single points of failure
>>> will
>>> fail next.
>>> On Sep 9, 2011 8:33 PM, "Mike Nute" <[email protected]> wrote:
>>>> IMO, the best approach would depend on your beliefs about the survival
>>> curve of the server. If you believe the general hazard rate is relatively
>>> constant (i.e. time-since-startup is not a huge factor) you could make it
>>> into a basic time series logistic regression problem: Let Y_i_t be 1 if
>>> server i fails at time t, 0 if it does not. Let X_i_(t-1) be the vector of
>>> measurements on server i at time (t-1). Then do logistic regression of X on
>>> Y. You could then add X_i_(t-2) to your predictors and see if it adds
>>> accuracy, and so on with previous time periods until they stop being
>>> predictive.
>>>>
>>>> That would also facilitate experimenting with transformations like the
>>> change in certain measurements at (t-1), (t-2), etc..., or interactions
>>> between certain measurements.
>>>>
>>>> If different failure classes are important, you could similarly apply
>>> that
>>> to multinomial logistic regression.
>>>>
>>>> If the failure rate depends heavily on time since startup, you could
>>> apply
>>> some kind of survival modeling technique like a Cox Proportional Hazard
>>> model or incorporating some prior belief about the shape of the survival
>>> curve. That could end up being technically similar to the logistic
>>> regression above, but with a more exotic link function and/or offset term.
>>> (I have a good brief chapter on the CPH model from an old actuarial exam
>>> study guide in pdf if you want it. Survival models are actuary staples
>>> :-).)
>>>
>>>>
>>>> Hope that helps.
>>>>
>>>> Mike Nute
>>>>
>>>>
>>>> ------Original Message------
>>>> From: Lance Norskog
>>>> To: user
>>>> ReplyTo: [email protected]
>>>> Subject: Predictive analysis problem
>>>> Sent: Sep 9, 2011 10:45 PM
>>>>
>>>> Let's say you manage 2000 servers in a huge datacenter. You have
>>> regularly
>>>> sampled stats, with uniform methods: aka, they are all sampled the same
>>> way
>>>> across all servers across the full time series This data is a cube of
>>>> (server X time X measurement type), with a measurement in each cell.
>>>>
>>>> You also have a time series of system failures, a matrix of server X
>>> failure
>>>> class. What algorithm will predict which server will fail next, and when
>>> and
>>>> how?
>>>>
>>>> --
>>>> Lance Norskog
>>>> [email protected]
>>>>
>>>>
>>>
>>
>>
>>
>> --
>> Lance Norskog
>> [email protected]
>



-- 
ksh:
