Dear all - I hope this is of interest to many of you. I've talked to some of you before about a streaming machine learning service that we'd love to get some feedback on. So please meet featurestream.io (http://featurestream.io)!
The idea is to provide a service that removes much of the difficulty in applying machine learning/analytics techniques to streaming (and non-streaming) data, and makes it easy to extract insights and signals from data streams, which you can use for building apps, improving infrastructure, and more. The current API is very simple, but I think it's still powerful enough to be valuable.

An example is worth a thousand words, so take a look at http://www.featurestream.io/info (and the IPython notebook link from there). You feed it a stream of JSON events (e.g. from your infrastructure, from other feeds, or from CSV/ARFF files). The prediction API takes a JSON event and gives predictions for missing fields, or can show you if some fields have unexpected values, and so on. The algorithms handle missing values, and you don't need to define a schema in advance. For the technically curious, these are variants of streaming, distributed random forest-type algorithms.

We've deployed a pre-launch version on AWS and would love to get some early feedback on it. So if you're interested, please go to http://featurestream.io to get an access key (or contact me directly). Although this version is hosted, we're open to helping people deploy it locally - please let me know!

We plan to extend this by adding more analytics capabilities, but we'd like to get early feedback from developers to help us shape the direction and the roadmap.

Thanks for reading - please share and spread the word!

Thanks,
Andy
--
http://featurestream.io
@featurestream
