We did that with the openPDC classification system, where we broke up
high-resolution PMU/sensor data into "block of time + sensor id"
buckets, with some overlap between adjacent blocks.

Code at: http://openpdc.codeplex.com
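
To make the bucketing idea concrete, here is a rough Python sketch. It is
not the openPDC code. The function names, the 60-unit block length, the
5-unit overlap, and the choice to copy the first few samples of each block
into the previous block are all assumptions made for illustration.

```python
from collections import defaultdict


def bucket_keys(sensor_id, ts, block_len=60, overlap=5):
    """Return every (sensor_id, block_index) bucket a sample belongs to.

    Hypothetical layout: block k covers
    [k * block_len, (k + 1) * block_len + overlap), so a sample that falls
    in the first `overlap` time units of a block is also copied into the
    previous block.
    """
    block = int(ts // block_len)
    keys = [(sensor_id, block)]
    if block > 0 and ts % block_len < overlap:
        keys.append((sensor_id, block - 1))
    return keys


def bucketize(samples, block_len=60, overlap=5):
    """Group (sensor_id, ts, value) samples into overlapping buckets."""
    buckets = defaultdict(list)
    for sensor_id, ts, value in samples:
        for key in bucket_keys(sensor_id, ts, block_len, overlap):
            buckets[key].append((ts, value))
    return dict(buckets)
```

In a MapReduce job the mapper would emit one record per key returned by
bucket_keys, so each reducer sees a block plus a little of the next one and
a pattern that straddles a block boundary is not lost.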

The Cloudera article is just a basic example illustrating the
secondary sort mechanism, which is key for time series on Hadoop: the
shuffle phase delivers each series to the reducer already ordered by
time (sort for free).
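
For anyone who has not seen secondary sort before, the sketch below
simulates it in plain Python. It is not Hadoop code and not the code from
the Cloudera articles: the sorted() call stands in for the shuffle sorting
on a composite (sensor_id, timestamp) key, and groupby() stands in for a
grouping comparator that looks only at sensor_id. The record layout and
function names are assumptions.

```python
from itertools import groupby
from operator import itemgetter


def shuffle_with_secondary_sort(records):
    """Simulate the shuffle for (sensor_id, ts, value) records.

    Sort on the full composite key (sensor_id, ts), then group on the
    natural key (sensor_id) only, so each "reducer call" receives its
    points already in time order.
    """
    ordered = sorted(records, key=lambda r: (r[0], r[1]))
    for sensor_id, group in groupby(ordered, key=itemgetter(0)):
        yield sensor_id, [(ts, value) for _, ts, value in group]


def moving_average(points, window=3):
    """Simple moving average over time-ordered (ts, value) points."""
    values = [v for _, v in points]
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]
```

The point is that moving_average never has to sort or buffer the whole
series itself; it can assume the order because the framework provided it.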

openPDC has one MapReduce job that scans time series for fuzzy patterns
using Keogh's SAX/iSAX technique and a 1NN classifier based on a
BallTree.
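
As a rough illustration of the SAX step (z-normalize, piecewise aggregate
approximation, then map each segment to a letter using Gaussian
breakpoints), here is a small Python sketch. It is not the openPDC
implementation: it covers plain SAX only, not iSAX, and the 1NN step is a
brute-force linear scan over PAA vectors with Euclidean distance rather
than a BallTree. All function names and defaults are assumptions.

```python
import bisect
import math
from statistics import NormalDist, mean, pstdev


def znorm(series):
    """Scale a series to zero mean and unit variance."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise aggregate approximation: the mean of each segment.

    Assumes len(series) is divisible by `segments`.
    """
    size = len(series) // segments
    return [mean(series[i * size:(i + 1) * size]) for i in range(segments)]


def sax_word(series, segments=4, alphabet="abcd"):
    """Turn a series into a SAX word using equiprobable Gaussian bins."""
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    reduced = paa(znorm(series), segments)
    return "".join(alphabet[bisect.bisect_left(breakpoints, v)] for v in reduced)


def classify_1nn(query, labelled, segments=4):
    """Label of the nearest training series (linear scan, not a BallTree)."""
    q = paa(znorm(query), segments)

    def dist(item):
        return math.dist(q, paa(znorm(item[0]), segments))

    return min(labelled, key=dist)[1]
```

With labelled = [(series, label), ...], classify_1nn(query, labelled)
returns the label of the closest training series. A BallTree serves the
same purpose but avoids comparing the query against every training series.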

Josh

On Tue, Dec 6, 2011 at 5:52 PM, Raphael Cendrillon
<[email protected]> wrote:
> If the data series is large it might be interesting to further split the job 
> over time using overlap/add or overlap/save, or even an FFT suitably 
> partitioned.
>
> On Dec 6, 2011, at 1:48 PM, Josh Patterson <[email protected]> wrote:
>
>> Mahout currently does not have, afaik, much (if any) time series
>> specific code. If I were to point someone at some good resources I'd
>> start with:
>>
>> - Box and Jenkins book
>> - Dr Keogh's line of research on time series pattern matching
>>
>> And then beyond that it begins to become "what are you specifically
>> looking for?". R is typically the "go to" resource for a lot of time
>> series work, but there has been some very successful work with Hadoop
>> and large scale time series data. Below I link to a few articles where
>> time series techniques are demonstrated with Hadoop. Specifically, here
>> is a blog series on general time series processing with Hadoop:
>>
>> http://www.cloudera.com/blog/2011/03/simple-moving-average-secondary-sort-and-mapreduce-part-1/
>> http://www.cloudera.com/blog/2011/03/simple-moving-average-secondary-sort-and-mapreduce-part-2/
>> http://www.cloudera.com/blog/2011/04/simple-moving-average-secondary-sort-and-mapreduce-part-3/
>>
>> Beyond that you could take a look at how we applied these concepts to
>> the US powergrid PMU / smartgrid data back in 2009:
>>
>> http://openpdc.codeplex.com
>> http://www.slideshare.net/jpatanooga/oscon-data-2011-lumberyard
>>
>> Hope that gets you going,
>>
>> Josh
>>
>> 2011/12/4 myn <[email protected]>:
>>> Does Mahout contain this method?
>>> Or is there any other open source project about this?
>>
>>
>>
>> --
>> Twitter: @jpatanooga
>> Solution Architect @ Cloudera
>> hadoop: http://www.cloudera.com



-- 
Twitter: @jpatanooga
Solution Architect @ Cloudera
hadoop: http://www.cloudera.com
