We did that with the openPDC classification system where we broke up high resolution PMU/sensor data into "blocks of time + sensor id" buckets, with some overlap.
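A minimal sketch of that kind of bucketing follows. It assumes integer timestamps, a fixed block size, and an overlap shorter than one block. The functions `bucket_keys` and `bucketize` are hypothetical and are not the openPDC implementation. The sketch only shows how a reading near the end of one block can also be emitted into the next block, so each "time block + sensor id" bucket carries a little history and can be processed on its own.

```python
from collections import defaultdict


def bucket_keys(sensor_id, timestamp, block_size, overlap):
    """Return every (sensor_id, block_index) bucket a reading belongs to.

    Hypothetical helper. Block k nominally covers [k*block_size, (k+1)*block_size).
    Assumes non-negative integer timestamps and 0 <= overlap < block_size.
    """
    k = timestamp // block_size
    keys = [(sensor_id, k)]
    # A reading within `overlap` of the end of block k is also copied into
    # block k+1, so that block sees some history before its nominal start.
    if timestamp >= (k + 1) * block_size - overlap:
        keys.append((sensor_id, k + 1))
    return keys


def bucketize(readings, block_size, overlap):
    """Group (sensor_id, timestamp, value) readings into overlapping buckets."""
    buckets = defaultdict(list)
    for sensor_id, ts, value in readings:
        for key in bucket_keys(sensor_id, ts, block_size, overlap):
            buckets[key].append((ts, value))
    return dict(buckets)
```

In a MapReduce job, the mapper would emit one record per key returned by `bucket_keys`, which duplicates only the readings that fall inside the overlap.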
code at: http://openpdc.codeplex.com

The Cloudera article is just a basic example illustrating the secondary sort mechanic, which is key for time series on Hadoop (you get the sort for free). The openPDC has one MR job that scans time series for fuzzy patterns using Keogh's SAX/iSAX technique and a 1NN classifier based on a BallTree.

Josh

On Tue, Dec 6, 2011 at 5:52 PM, Raphael Cendrillon <[email protected]> wrote:
> If the data series is large it might be interesting to further split the job
> over time using overlap/add or overlap/save, or even an FFT suitably
> partitioned.
>
> On Dec 6, 2011, at 1:48 PM, Josh Patterson <[email protected]> wrote:
>
>> Mahout currently does not have, afaik, much/any time series specific
>> code for it. If I were to point someone at some good resources I'd
>> start with:
>>
>> - Box and Jenkins book
>> - Dr Keogh's line of research on time series pattern matching
>>
>> And then beyond that it begins to become "what are you specifically
>> looking for?". R is typically the "go to" resource for a lot of time
>> series work, but there has been some very successful work with Hadoop
>> and large scale time series data. Below I link to a few articles where
>> time series techniques are demonstrated with Hadoop. Specifically, here
>> is a blog series on general time series processing with Hadoop:
>>
>> http://www.cloudera.com/blog/2011/03/simple-moving-average-secondary-sort-and-mapreduce-part-1/
>> http://www.cloudera.com/blog/2011/03/simple-moving-average-secondary-sort-and-mapreduce-part-2/
>> http://www.cloudera.com/blog/2011/04/simple-moving-average-secondary-sort-and-mapreduce-part-3/
>>
>> Beyond that you could take a look at how we applied these concepts to
>> the US powergrid PMU / smartgrid data back in 2009:
>>
>> http://openpdc.codeplex.com
>> http://www.slideshare.net/jpatanooga/oscon-data-2011-lumberyard
>>
>> Hope that gets you going,
>>
>> Josh
>>
>> 2011/12/4 myn <[email protected]>:
>>> does mahout contain this method?
>>> or is there any other open source project about this?
>>
>> --
>> Twitter: @jpatanooga
>> Solution Architect @ Cloudera
>> hadoop: http://www.cloudera.com
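The secondary sort mechanic from the Cloudera moving-average articles can be imitated in plain Python to show why the reducer gets its sort for free. This is a hypothetical sketch, not Hadoop API code: `map_phase`, `shuffle_and_sort`, `reduce_sma` and `run_job` are made-up names, and Hadoop's composite key, partitioner and grouping comparator are stood in for by a sort on `(sensor_id, timestamp)` followed by grouping on `sensor_id` alone.

```python
from collections import deque
from itertools import groupby
from operator import itemgetter


def map_phase(records):
    # Emit a composite key (natural key, timestamp) -> value.
    for sensor_id, ts, value in records:
        yield (sensor_id, ts), value


def shuffle_and_sort(pairs):
    # Stand-in for the framework: sort on the full composite key (the
    # "secondary sort"), then group on the natural key only, so each reduce
    # call sees one sensor's values already in timestamp order.
    ordered = sorted(pairs, key=itemgetter(0))
    for sensor_id, group in groupby(ordered, key=lambda kv: kv[0][0]):
        yield sensor_id, [(key[1], value) for key, value in group]


def reduce_sma(sensor_id, ordered_values, window):
    # Values arrive sorted by timestamp, so one pass with a fixed-size deque
    # computes the simple moving average without buffering the whole series.
    buf = deque()
    total = 0.0
    for ts, value in ordered_values:
        buf.append(value)
        total += value
        if len(buf) > window:
            total -= buf.popleft()
        if len(buf) == window:
            yield sensor_id, ts, total / window


def run_job(records, window):
    results = []
    for sensor_id, values in shuffle_and_sort(map_phase(records)):
        results.extend(reduce_sma(sensor_id, values, window))
    return results
```

For example, `run_job([("a", 3, 30.0), ("a", 1, 10.0), ("b", 1, 5.0), ("a", 2, 20.0), ("b", 2, 7.0)], 2)` yields a 2-point moving average per sensor even though the input records arrive out of order.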
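For readers new to Keogh's SAX, here is a rough sketch of the basic symbolic representation. It z-normalizes the series, reduces it with Piecewise Aggregate Approximation (PAA), and then maps each segment mean to a letter using breakpoints that split the standard normal distribution into equiprobable regions. The function names are hypothetical, and the sketch assumes the series length is a multiple of the segment count. iSAX indexing and the BallTree-based 1NN classifier used in openPDC are not shown.

```python
import bisect
import statistics


def znorm(series):
    # Z-normalize; a constant series maps to all zeros.
    mu = statistics.fmean(series)
    sd = statistics.pstdev(series)
    if sd == 0:
        return [0.0] * len(series)
    return [(x - mu) / sd for x in series]


def paa(series, segments):
    # Piecewise Aggregate Approximation: mean of each equal-width segment.
    # Assumption: len(series) is divisible by `segments`.
    n = len(series)
    if n % segments:
        raise ValueError("series length must be a multiple of segments")
    w = n // segments
    return [statistics.fmean(series[i * w:(i + 1) * w]) for i in range(segments)]


def sax_word(series, segments, alphabet_size):
    # Breakpoints divide N(0, 1) into `alphabet_size` equiprobable regions.
    nd = statistics.NormalDist()
    breakpoints = [nd.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]
    letters = "abcdefghijklmnopqrstuvwxyz"
    return "".join(
        letters[bisect.bisect_right(breakpoints, v)]
        for v in paa(znorm(series), segments)
    )
```

With 4 segments and a 4-letter alphabet, a steadily rising series such as `[1, 2, 3, 4, 5, 6, 7, 8]` becomes the word `"abcd"`. Fuzzy pattern matching then works on these short words instead of the raw samples.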
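Raphael's overlap/add suggestion can be illustrated with a short sketch. Each block of the input is convolved with the filter on its own, which is the part that could be spread across map tasks. The tails, `len(h) - 1` samples long, are then summed into the output at the block's offset. The function names are hypothetical. For brevity the sketch uses direct convolution per block, where a real implementation would usually use an FFT for each block.

```python
def convolve(x, h):
    # Direct linear convolution of x with filter h (length len(x) + len(h) - 1).
    # A production version would typically replace this with an FFT per block.
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[i + j] += xi * hj
    return out


def overlap_add(x, h, block_len):
    # Split x into blocks of `block_len`, convolve each block independently,
    # and add each partial result into the output starting at the block's
    # offset; the len(h) - 1 tail overlaps the start of the next block.
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), block_len):
        partial = convolve(x[start:start + block_len], h)
        for k, v in enumerate(partial):
            y[start + k] += v
    return y
```

In a MapReduce setting, one possible mapping is for each mapper to emit `(start + k, partial[k])` pairs and for a reducer to sum values per output index. That mapping is an assumption for illustration and is not described in the thread.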
