In that case, another Faloutsos paper would be of interest:

2002 Performance - best student paper award:
Mengzhi Wang, Anastassia Ailamaki and Christos Faloutsos,
"Capturing the spatio-temporal behavior of real traffic data",
Performance 2002 (IFIP Int. Symp. on Computer Performance Modeling, Measurement and Evaluation), Rome, Italy, Sept. 2002
http://www.cs.cmu.edu/~christos/PUBLICATIONS/performance02.pdf

I don't see that these techniques will necessarily scale, but violation of a traffic model might well be a good anomaly detector for your problem.

On Sun, Oct 3, 2010 at 6:57 AM, Latency Buster <[email protected]> wrote:

> > That's a nice non-answer. The terms "network data" and "identify some
> > definite patterns" don't say enough to answer anything specifically. If
> > you know what kind of patterns you are looking for, or if you could say
> > what the data contains, you could get some help here. With open source,
> > you get back much more if you give a little to start with.
>
> Thanks for the pointers. Fuzzing was not my intention, but I was
> afraid that I might coalesce two different topics under one heading.
>
> Now, to provide some clarity around this: we have around 1TB of data as
> the ultimate aim, but 50GB of data to start with. The task is to
> identify a definite pattern that arises when people are making calls.
> We are a call center using IP phones. There are certain times of the
> day when not only the volume but also the encoded speech patterns
> contain a certain overlap in their frequency spectrum. So our aim is
> to cluster/extract some representative feature sets of the decoded
> voice signals (or signal fingerprinting), correlate them with the
> network feature sets (say ratio of 3sigma after (0,1) normalization)
> and check whether there is any useful information in these.
>
> Thanks,
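
For what it's worth, one possible reading of the "ratio of 3sigma after (0,1) normalization" feature is: standardize a network measurement to zero mean and unit variance, then take the fraction of samples that land more than three standard deviations out. The sketch below assumes that reading and may not be what was meant. The function name three_sigma_ratio, the threshold parameter k, and the jitter values are made up for illustration.

```python
from statistics import fmean, pstdev


def three_sigma_ratio(samples, k=3.0):
    """Fraction of samples more than k standard deviations from the mean.

    Assumed interpretation of "ratio of 3sigma after (0,1) normalization":
    z-score the samples, then count how many have |z| > k.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    mu = fmean(samples)
    sigma = pstdev(samples)  # population standard deviation
    if sigma == 0:
        return 0.0  # constant signal: no sample can be an outlier
    z_scores = [(x - mu) / sigma for x in samples]
    return sum(1 for z in z_scores if abs(z) > k) / len(z_scores)


if __name__ == "__main__":
    # Hypothetical per-packet jitter values (ms) with one large spike.
    jitter_ms = [10.0] * 99 + [1000.0]
    print(three_sigma_ratio(jitter_ms))  # -> 0.01
```

Computed per time window, a value like this could serve as one column of the network feature set to correlate against the voice features. For 1TB of data the mean and deviation would need to be accumulated in a streaming or map-reduce fashion rather than in memory as done here.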
