Re: outlier detection in time-series using Mahout

Srivathsan Srinivas Mon, 01 Nov 2010 08:54:50 -0700

Dear Ted,

Thanks for pointing me to the Dirichlet mixture model. I shall look into that.

Basically, I am looking into the autocorrelation function, control charts,
moving averages, population stability, and Poisson regression (much of the
data can be described as daily or hourly counts). I'd like to build a tool
that blends these approaches into a scorecard for proactive alerting on
outliers.
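A minimal sketch of such a scorecard, covering only the moving-average,
control-chart and Poisson pieces (autocorrelation and population stability
are left out). It assumes hourly counts are non-negative integers in a plain
Python list; the function names, the 24-point trailing window, the 3-sigma
limit and the equal weights are all hypothetical choices, not anything from
Mahout. The Poisson part uses a two-sided tail probability against the
moving-average rate rather than a fitted Poisson regression.

```python
import math
from statistics import mean, stdev

def control_chart_score(history, value, k=3.0):
    """Shewhart-style score: |value - mean| in units of k standard deviations,
    capped at 1.0 (1.0 means at or beyond the k-sigma control limit)."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return 0.0 if value == mu else 1.0
    return min(1.0, abs(value - mu) / (k * sigma))

def poisson_cdf(lam, count):
    """P(X <= count) for X ~ Poisson(lam), summed in log space to avoid underflow."""
    if count < 0:
        return 0.0
    if lam <= 0:
        return 1.0
    log_lam = math.log(lam)
    return min(1.0, sum(math.exp(i * log_lam - lam - math.lgamma(i + 1))
                        for i in range(count + 1)))

def poisson_score(lam, count):
    """1 - two-sided Poisson p-value: near 0 for typical counts, near 1 for
    counts far above or below the expected rate lam."""
    lower = poisson_cdf(lam, count)
    upper = 1.0 - poisson_cdf(lam, count - 1)
    return 1.0 - min(1.0, 2.0 * min(lower, upper))

def scorecard(counts, window=24, k=3.0, weights=(0.5, 0.5)):
    """Score each integer count against a trailing moving-average baseline of
    `window` points; returns (index, score) pairs with score in [0, 1]."""
    if window < 2:
        raise ValueError("window must be at least 2 for stdev")
    scores = []
    for t in range(window, len(counts)):
        history = counts[t - window:t]  # trailing window = moving-average baseline
        cc = control_chart_score(history, counts[t], k)
        ps = poisson_score(mean(history), counts[t])
        scores.append((t, weights[0] * cc + weights[1] * ps))
    return scores
```

For example, scorecard([9, 10, 11, 10] * 12 + [60]) gives the final spike a
score close to 1 while the regular points stay well below 0.5. Each component
is squashed into [0, 1] so they can be weighted and summed; an alert would
then be a threshold on the combined score.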

For the above, I am interested in seeing how the time-series data can be
broken into manageable segments and distributed to different machines in
a Hadoop cluster.
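One way to do the segmenting, sketched here under assumptions: cut the
series into fixed-length time segments and copy a short overlap of the
preceding segment into each one, so that moving-window statistics near a
boundary still have history. In a Hadoop job the segment id would play the
role of the map output key; the functions below (segment_keys, partition,
owned) are hypothetical and simply simulate the map and shuffle in memory.

```python
from collections import defaultdict

def segment_keys(ts, segment_len, overlap):
    """Segment ids that should receive a point at integer time `ts`.
    Segment s owns [s*segment_len, (s+1)*segment_len) and additionally gets the
    last `overlap` time units of segment s-1 as lead-in history."""
    primary = ts // segment_len
    keys = [primary]
    if ts >= (primary + 1) * segment_len - overlap:
        keys.append(primary + 1)  # also serves as history for the next segment
    return keys

def partition(points, segment_len, overlap):
    """Group (ts, value) pairs by segment id, mimicking map output + shuffle."""
    if not 0 <= overlap <= segment_len:
        raise ValueError("overlap must be between 0 and segment_len")
    groups = defaultdict(list)
    for ts, value in points:
        for key in segment_keys(ts, segment_len, overlap):
            groups[key].append((ts, value))
    return dict(groups)

def owned(points, key, segment_len):
    """Points a segment should actually score (borrowed history excluded),
    so each timestamp is alerted on by exactly one worker."""
    return [(ts, v) for ts, v in points if ts // segment_len == key]
```

With segment_len=4 and overlap=2, segment 1 receives timestamps 2 to 7 but
only scores 4 to 7. The overlap should be at least as long as the longest
trailing window used by the detectors.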

Thanks again,
Sri.

On Mon, Nov 1, 2010 at 10:21 AM, Ted Dunning <[email protected]> wrote:

> There is nothing explicit in Mahout for this, but you could use the
> Dirichlet mixture model clustering to do this.
>
> The idea would be to express your different observed time series, or short
> segments of time sequences, as mixture models and then find regions that
> are not well described by this mixture model. Ideally, you would have a
> Markov model underneath the mixture coefficients, but that is out of scope
> for what Mahout does for you right off the bat. It wouldn't be too hard to
> merge the HMM code and the DP (Dirichlet process) clustering to get this,
> though.
>
> So the answer is no.
>
> But Mahout would be a decent substrate for building your own.
>
> On Mon, Nov 1, 2010 at 8:02 AM, Srivathsan Srinivas <[email protected]> wrote:
>
> > Hi,
> > Any pointers to techniques or papers for detecting outliers in time
> > series of very large data sets using Mahout? I am interested in seeing
> > which techniques are well suited to large-scale distributed systems
> > built on Hadoop/Mahout.
> >
> > Thanks,
> > Sri.
> >
>
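Ted's suggestion, to describe segments with a mixture model and then look
for regions the mixture describes poorly, can be sketched as follows. This
is NOT Mahout's Dirichlet process clustering: it swaps in a plain
fixed-k Gaussian mixture with diagonal covariances fitted by EM, with no
Markov layer, and assumes each segment has already been reduced to a
numeric feature vector. The names fit_gmm and log_likelihood are
hypothetical. The mixture is fitted on historical segments, and new
segments with a low log-likelihood are outlier candidates.

```python
import math
import random

def _log_gauss(x, mu, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mu, var))

def _logsumexp(vals):
    """Numerically stable log(sum(exp(v) for v in vals))."""
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))

def fit_gmm(data, k, iters=50, seed=0, min_var=1e-3):
    """Fit a fixed-k diagonal Gaussian mixture with plain EM.
    data: list of equal-length tuples (e.g. per-segment feature vectors).
    Returns (weights, means, variances)."""
    n, dim = len(data), len(data[0])
    rng = random.Random(seed)
    # Farthest-point initialisation: a random first mean, then repeatedly the
    # point farthest from every mean chosen so far.
    means = [list(data[rng.randrange(n)])]
    while len(means) < k:
        far = max(data, key=lambda x: min(sum((a - b) ** 2 for a, b in zip(x, m))
                                          for m in means))
        means.append(list(far))
    centre = [sum(x[d] for x in data) / n for d in range(dim)]
    spread = [max(min_var, sum((x[d] - centre[d]) ** 2 for x in data) / n)
              for d in range(dim)]
    variances = [spread[:] for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: resp[i][j] = P(component j | point i)
        resp = []
        for x in data:
            logs = [math.log(weights[j]) + _log_gauss(x, means[j], variances[j])
                    for j in range(k)]
            z = _logsumexp(logs)
            resp.append([math.exp(l - z) for l in logs])
        # M-step: re-estimate each component from its weighted points
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12  # guard against empty components
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[d] for r, x in zip(resp, data)) / nj
                        for d in range(dim)]
            variances[j] = [max(min_var, sum(r[j] * (x[d] - means[j][d]) ** 2
                                              for r, x in zip(resp, data)) / nj)
                            for d in range(dim)]
    return weights, means, variances

def log_likelihood(x, model):
    """Mixture log-likelihood of one point; low values mean 'poorly described'."""
    weights, means, variances = model
    return _logsumexp([math.log(w) + _log_gauss(x, m, v)
                       for w, m, v in zip(weights, means, variances)])
```

Fitting on history that contains two regimes (features near (0, 0) and near
(10, 10)) and then scoring (5.0, 5.0) gives that point a far lower
log-likelihood than points inside either regime. Fitting on history and
scoring new data, rather than fitting on everything, avoids a component
collapsing onto a single outlier. Choosing k is the part a Dirichlet process
mixture would handle for you.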
