Dear Ted,

Thanks for pointing me to the Dirichlet mixture model. I shall look into that.
Basically, I am looking into the autocorrelation function, control charts, moving averages, population stability, and Poisson regression (much of the data can be described as daily or hourly counts). I'd like to build a tool that blends these approaches into a scorecard for proactive alerting on outliers.

For the above, I am interested in seeing how the time-series data can be broken into manageable segments and distributed to different machines in a Hadoop cluster.

Thanks again,
Sri.

On Mon, Nov 1, 2010 at 10:21 AM, Ted Dunning wrote:

> There is nothing explicit in Mahout for this, but you could use the
> Dirichlet mixture model clustering to do this.
>
> The idea would be to express your different observed time series or short
> segments of time sequences as mixture models and then find regions that
> are not well described by this mixture model. Ideally, you would have a
> Markov model underneath the mixture coefficients, but that is out of scope
> for what Mahout does for you right off the bat. It wouldn't be too hard to
> merge the HMM code and the DP clustering to get this, though.
>
> So the answer is no.
>
> But Mahout would be a decent substrate for building your own.
>
> On Mon, Nov 1, 2010 at 8:02 AM, Srivathsan Srinivas wrote:
>
> > Hi,
> > Any pointers to techniques/papers that detect outliers in time series
> > of very large data sets using Mahout? I am interested in seeing what
> > techniques are favorable for use in large-scale distributed systems
> > using Hadoop/Mahout.
> >
> > Thanks,
> > Sri.
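As a rough illustration of two of the ideas Sri lists (a control chart on Poisson-like hourly counts, and splitting the series into segments that separate machines could process), here is a minimal Python sketch. It assumes the counts are roughly Poisson and that a baseline window is outlier-free, and uses a Shewhart c-chart, whose limits for Poisson counts are c̄ ± 3√c̄. The function names (`segment`, `c_chart_limits`, `flag_outliers`) and the data are hypothetical; the loop over segments only stands in for handing chunks to Hadoop tasks, and no Hadoop or Mahout API is used.

```python
import math
from typing import List, Tuple


def segment(series: List[int], size: int) -> List[Tuple[int, List[int]]]:
    """Split a count series into fixed-size chunks tagged with their start offset.

    Each (offset, chunk) pair is the kind of self-contained unit that could be
    shipped to a separate worker (e.g. one map task); here it is just a list.
    """
    return [(i, series[i:i + size]) for i in range(0, len(series), size)]


def c_chart_limits(baseline: List[int], k: float = 3.0) -> Tuple[float, float]:
    """Shewhart c-chart limits for Poisson counts: c_bar +/- k * sqrt(c_bar)."""
    c_bar = sum(baseline) / len(baseline)
    half_width = k * math.sqrt(c_bar)
    # Counts cannot be negative, so the lower limit is clipped at zero.
    return max(0.0, c_bar - half_width), c_bar + half_width


def flag_outliers(offset: int, chunk: List[int], lcl: float, ucl: float) -> List[int]:
    """Return global indices of counts in this chunk that fall outside the limits."""
    return [offset + j for j, c in enumerate(chunk) if c < lcl or c > ucl]


# Hypothetical data: a quiet baseline window, then new hourly counts to score.
baseline = [10, 12, 9, 11, 10, 13, 8, 11]
series = [11, 9, 30, 10, 12, 0, 10, 11]

lcl, ucl = c_chart_limits(baseline)          # roughly (0.78, 20.22)
flagged: List[int] = []
for offset, chunk in segment(series, size=4):  # each chunk is scored independently
    flagged.extend(flag_outliers(offset, chunk, lcl, ucl))
print(flagged)  # [2, 5]
```

Because the limits are computed once from the baseline and each chunk only needs those two numbers, chunks can be scored independently; the only shared state a distributed version would need to broadcast is the pair of limits.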
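Ted's suggestion is to describe the observed data with a mixture model and flag regions the mixture explains poorly. The sketch below is a deliberately simplified stand-in, not Mahout's Dirichlet-process clustering or its API. It fits a finite two-component Poisson mixture to individual hourly counts with EM (the number of components is fixed up front, whereas a Dirichlet process mixture would infer it) and flags counts whose log-likelihood under the fitted mixture falls below a threshold. The data, starting rates, and the threshold of -10 are hypothetical choices for illustration.

```python
import math
from typing import List, Sequence, Tuple


def poisson_logpmf(x: int, lam: float) -> float:
    """log P(X = x) for X ~ Poisson(lam)."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def fit_poisson_mixture(xs: List[int], init_rates: List[float],
                        iters: int = 100) -> Tuple[List[float], List[float]]:
    """Fit a K-component Poisson mixture by EM; K = len(init_rates) is fixed."""
    k = len(init_rates)
    weights = [1.0 / k] * k
    rates = list(init_rates)
    for _ in range(iters):
        # E-step: responsibility of each component for each count.
        resp = []
        for x in xs:
            logs = [math.log(w) + poisson_logpmf(x, r) for w, r in zip(weights, rates)]
            norm = logsumexp(logs)
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: re-estimate mixing weights and Poisson rates.
        for j in range(k):
            nj = sum(row[j] for row in resp)
            weights[j] = max(nj / len(xs), 1e-12)  # floor avoids log(0)
            mean = sum(row[j] * x for row, x in zip(resp, xs)) / max(nj, 1e-12)
            rates[j] = max(mean, 1e-9)              # Poisson rate must stay positive
    return weights, rates


def mixture_loglik(x: int, weights: List[float], rates: List[float]) -> float:
    """Log-likelihood of one count under the fitted mixture."""
    return logsumexp([math.log(w) + poisson_logpmf(x, r) for w, r in zip(weights, rates)])


# Hypothetical hourly counts: quiet hours near 2, busy hours near 20, one spike.
counts = [2, 3, 1, 2, 20, 22, 19, 21, 2, 3, 60, 20]
weights, rates = fit_poisson_mixture(counts, init_rates=[2.0, 20.0])

threshold = -10.0  # hypothetical cut-off on per-count log-likelihood
flagged_mix = [i for i, c in enumerate(counts)
               if mixture_loglik(c, weights, rates) < threshold]
print(flagged_mix)  # [10]
```

Scoring short segments rather than single counts, or putting a Markov model under the mixture weights as Ted describes, would need more than this sketch.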
