For a new project I'm researching different options for integrating
machine-learned models into a stream processing system based on Apache
Storm [1].

I've considered other frameworks (Spark MLlib, for instance), but so far
none of them seems as mature and agile as sklearn, and I don't have
enormous scalability needs anyway.

So one of the options would be to have a "prediction server" that is called
over, say, HTTP from the processing pipeline. I've done that in the past
using Cyclone [2]. The main drawback is transporting the feature data over
HTTP, which can be heavy on both the network and marshaling.
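
For illustration, here is a minimal sketch of such a prediction server. It
uses only the Python standard library rather than Cyclone. `StubModel`, the
`/predict` route, the port, and the JSON payload shape are all assumptions
made for the example. A fitted sklearn estimator loaded once at startup
would take the stub's place.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubModel:
    """Hypothetical stand-in for a fitted sklearn estimator."""

    def predict(self, rows):
        # Placeholder rule: label 1 when a row's features sum to a positive value.
        return [1 if sum(row) > 0 else 0 for row in rows]


# Assumption: the model is loaded once, not per request.
MODEL = StubModel()


def predict_payload(body):
    """Decode a JSON request body, run the model, encode the JSON reply."""
    features = json.loads(body)["features"]  # assumed shape: list of feature rows
    # A real estimator returns a NumPy array; convert it with .tolist() first.
    predictions = MODEL.predict(features)
    return json.dumps({"predictions": predictions}).encode("utf-8")


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        reply = predict_payload(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)


def serve(host="127.0.0.1", port=8000):
    """Block and answer prediction requests until interrupted."""
    HTTPServer((host, port), PredictHandler).serve_forever()
```

Calling `serve()` starts the server. Sending several feature rows per
request, as the payload shape above allows, is one way to spread the HTTP
overhead over more predictions, though the marshaling cost remains.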

Another option is to call the prediction directly from the processing code,
which is in Java. I'm wary of making a native call to the Python
interpreter, as I've had many disappointments with such setups in the past
(horrible performance, difficult error handling...).

Jython for sklearn seems unrealistic at first sight.

So I'm wondering if there are other viable options...

Any comment or feedback welcome :)

Eustache

[1] https://storm.incubator.apache.org/
[2] http://cyclone.io/ https://twistedmatrix.com
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
