Data mining/pattern recognition software in Python?

2012-03-23 Thread Grzegorz Staniak
Hello,

I've been asked by a colleague for help with a small educational
project that involves recognizing patterns in a live feed of data
points (readings from a measuring appliance), followed by a more
general search for patterns in archival data. The language of
preference is Python, since the lab already uses software written in
Python. I can see there are packages such as OpenCV, scikit-learn and
Orange that could perhaps be of use for the mining phase, and even if
they are slanted towards image pattern recognition, I think I'd be
able to find an appropriate package for the time-series analyses. But
I'm wondering about the live phase: what approach would you suggest?
I wouldn't want to reinvent the wheel; perhaps there are already
packages/modules that could be used to read data in a loop (e.g.
every 10 seconds), maintain a buffer of 15 readings, and ring a bell
when the data in the buffer form a specific pattern (a spike, a
trough, whatever)?

I'll be grateful for a push in the right direction. Thanks,

GS
-- 
Grzegorz Staniak   gstaniak _at_ gmail [dot] com
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Data mining/pattern recognition software in Python?

2012-03-23 Thread Nelle Varoquaux
Hello,

There are two steps in using a supervised learning algorithm: fitting the
classifier on labeled data, and predicting on new data.
If you want to fit on incoming data, you are looking for online
algorithms: algorithms that take chunks of data and fit the model on
the fly. scikit-learn has a couple of online algorithms (k-means,
for example).
If you want to predict on chunks of data, that can easily be done
with any already fitted classifier. Hence, you only need to find a
way to retrieve the data. Twisted may come in handy for that, or any other
asynchronous framework.
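
A rough sketch of what fitting on the fly can look like, using only the
standard library: a sequential (online) k-means update on 1-D readings.
The class OnlineKMeans1D, its method names and the starting centroids are
made up for illustration; this is not scikit-learn's implementation, only
the same idea in miniature.

```python
# Sequential (online) k-means on 1-D readings, standard library only.
# All names here are hypothetical and chosen for illustration.


class OnlineKMeans1D:
    def __init__(self, initial_centroids):
        # Start from caller-supplied guesses; counts track readings per cluster.
        self.centroids = [float(c) for c in initial_centroids]
        self.counts = [0] * len(self.centroids)

    def predict(self, x):
        # Index of the centroid nearest to the reading x.
        return min(range(len(self.centroids)),
                   key=lambda i: abs(x - self.centroids[i]))

    def partial_fit(self, chunk):
        # Move the nearest centroid towards each reading (running-mean update).
        for x in chunk:
            i = self.predict(x)
            self.counts[i] += 1
            self.centroids[i] += (x - self.centroids[i]) / self.counts[i]
        return self


model = OnlineKMeans1D(initial_centroids=[0.0, 10.0])
model.partial_fit([1.0, 9.0, 2.0])    # first chunk from the feed
model.partial_fit([11.0, 0.0, 10.0])  # second chunk, fitted on the fly
print(model.centroids)
print(model.predict(8.5))  # cluster index for a new reading
```

Note that the first reading assigned to a centroid replaces the initial
guess outright, so the guesses only decide the early assignments.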

scikit-learn is not image oriented. You can do time-series analysis with it;
there is probably already an example in the gallery.

Hope that helped,
N


Re: Data mining/pattern recognition software in Python?

2012-03-23 Thread Jon Clements
It might also be worth checking out pandas[1] and scikits.statsmodels[2].

In terms of reading data in a loop, I would probably go for a producer-consumer
model (possibly using a Queue[3]). Have the producer constantly try to get
another reading and notify the consumer, which can then determine whether it has
enough data to calculate a peak/trough. This article is also a fairly good
read[4].
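
The producer-consumer idea could be sketched roughly like this, under a few
assumptions: the data source is any iterable of numbers (here a canned list),
the buffer holds 15 readings, and a "spike" is a newest reading more than
three times the mean of the earlier ones. The function names, the STOP
sentinel and the spike rule are all made up for illustration.

```python
import queue
import threading
import time
from collections import deque

STOP = object()  # sentinel telling the consumer that the feed has ended


def producer(source, out_queue, interval=10.0):
    # Push each reading from the (hypothetical) source onto the queue.
    for reading in source:
        out_queue.put(reading)
        time.sleep(interval)
    out_queue.put(STOP)


def is_spike(window, factor=3.0):
    # Illustrative rule only: the newest reading exceeds `factor` times the
    # mean of the earlier readings. Needs at least two readings.
    *earlier, latest = window
    baseline = sum(earlier) / len(earlier)
    return latest > factor * baseline


def consumer(in_queue, alerts, size=15):
    window = deque(maxlen=size)  # ring buffer: the oldest reading drops out
    while True:
        reading = in_queue.get()  # blocks until the producer delivers
        if reading is STOP:
            break
        window.append(reading)
        if len(window) == size and is_spike(window):
            alerts.append(reading)  # stand-in for "ring a bell"


# Demo with canned readings and no delay; a real feed would use interval=10.
fake_feed = [1.0] * 20 + [9.0] + [1.0] * 5
readings = queue.Queue()
alerts = []
worker = threading.Thread(target=consumer, args=(readings, alerts))
worker.start()
producer(fake_feed, readings, interval=0)
worker.join()
print(alerts)
```

The queue does the notifying: the consumer simply blocks in get() until a
reading arrives, so no explicit locks or events are needed for this case.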

That's some pointers anyway,

hth,

Jon.


[1] http://pandas.pydata.org/
[2] http://statsmodels.sourceforge.net/
[3] http://docs.python.org/library/queue.html
[4] http://www.laurentluce.com/posts/python-threads-synchronization-locks-rlocks-semaphores-conditions-events-and-queues/


Re: Data mining/pattern recognition software in Python?

2012-03-23 Thread Grzegorz Staniak
Thanks for the suggestions.

GS
-- 
Grzegorz Staniak   gstaniak _at_ gmail [dot] com