On 9/9/2011 10:45 PM, Lance Norskog wrote:
Let's say you manage 2000 servers in a huge datacenter. You have regularly
sampled stats, with uniform methods: that is, they are all sampled the same way
across all servers across the full time series. This data is a cube of
(server X time X measurement type), with a measurement in each cell.

You also have a time series of system failures, a matrix of server X failure
class. What algorithm will predict which server will fail next, and when and
how?


A more sophisticated algorithm could exploit the time series structure, but as a first approximation, and as a good baseline, you might try a simple binary classifier trained on failure/no-failure data. Here's a paper that does something similar:

cseweb.ucsd.edu/~dturnbul/Papers/ServerPrediction.pdf
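
To make the baseline idea concrete, here is a minimal sketch in plain Python. It is not the method from the paper above, just one way to set it up. The data layout (stats[server][t] as a list of measurements, failures[server] as a set of time steps), the window-mean features, and all function names are assumptions made for illustration. The classifier is a hand-rolled logistic regression, so inputs should be scaled to a similar range first.

```python
import math
import random


def make_examples(stats, failures, window=3, horizon=1):
    """Flatten the (server x time x measurement) cube into labeled rows.

    Assumed layout (hypothetical): stats[server][t] is a list of measurements
    at time step t; failures[server] is a set of time steps with a failure.
    A row is labeled 1 if the server fails within `horizon` steps after the
    window ends, else 0.
    """
    rows = []
    for server, series in stats.items():
        failed_at = failures.get(server, set())
        for end in range(window, len(series) - horizon + 1):
            recent = series[end - window:end]
            # One feature per measurement type: its mean over the window.
            feats = [sum(step[i] for step in recent) / window
                     for i in range(len(recent[0]))]
            label = int(any(t in failed_at for t in range(end, end + horizon)))
            rows.append((feats, label))
    return rows


def _sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp so exp() cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, epochs=200, lr=0.1, seed=0):
    """Fit logistic regression with stochastic gradient descent."""
    rng = random.Random(seed)
    rows = list(rows)
    w = [0.0] * len(rows[0][0])
    b = 0.0
    for _ in range(epochs):
        rng.shuffle(rows)
        for x, y in rows:
            err = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def failure_probability(model, x):
    """Estimated probability that the window described by x precedes a failure."""
    w, b = model
    return _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

To answer "which server next", you could score the most recent window of every server with failure_probability and rank them. Predicting "how" (the failure class) would need one such classifier per class, or a multiclass model.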

More generally, this is known as failure prediction, which is a subtype of rare event prediction. Similar problems are explored in intrusion detection.
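
One practical consequence of the rare-event framing: plain accuracy is a misleading score, because a model that never predicts a failure is right almost all the time. The sketch below (function names are my own, not from any library) computes precision and recall next to accuracy so the difference is visible.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive (failure) class, labeled 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# 10 failures among 1000 samples; the "model" always predicts no failure.
y_true = [1] * 10 + [0] * 990
y_pred = [0] * 1000
print(accuracy(y_true, y_pred))          # 0.99
print(precision_recall(y_true, y_pred))  # (0.0, 0.0)
```

The always-negative predictor scores 99% accuracy while catching none of the failures, which is why recall and precision on the failure class are the numbers to watch.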

